How best to control which states are expanded in the planner?

Discussion in 'AI & Navigation Previews' started by Gnarf, Nov 23, 2020.

  1. Gnarf

    Gnarf

    Joined:
    Jul 29, 2015
    Posts:
    6
    Hello!

    I am having trouble making the planner search breadth-first. In my example, only 2 of the 5 possible states at each level have completed their subplan. This seems to be driven by the highest estimated reward, but it pains me to know that if the planner would just expand the first-level resulting states, it would find an action with a big immediate reward.

    This also pushes the plan to a depth of 1000+, which is useless in my case.

    So far I have tried:
    • Balancing the immediate rewards so the cumulative ones don't balloon (see the sketch below).
    • Changing the DecisionController plan settings: raising the state expansion budget, capping the plan size, and setting the selection job mode to parallel instead of sequential.
    • Changing the execution settings.
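    The reward balancing from the first bullet looked roughly like this (a concept sketch with made-up names, not planner API; the idea was to clamp the reward instead of scaling it by a trait value):

    Code (CSharp):
    using UnityEngine;

    // Concept sketch: squash each action's immediate reward into a small,
    // fixed range so no single branch's cumulative reward dwarfs the others.
    public static class RewardBalancing
    {
        const float k_MaxReward = 3f;

        // 'traitValue' stands in for whatever trait field the reward
        // used to be scaled by.
        public static float BalancedReward(float traitValue)
        {
            return Mathf.Clamp(traitValue, -k_MaxReward, k_MaxReward);
        }
    }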
    Any (hopefully deterministic) advice?

    Cheers!

    [Attachment: plan.jpg]
    Noxalus likes this.
  2. JvanOpstal

    JvanOpstal

    Joined:
    Oct 12, 2016
    Posts:
    8
    What worked for me was to create a reward estimator that returns higher bounded values from its Estimate function. If I understand correctly, any branch whose reward falls outside the min/max bounds of the reward estimator is discarded, so widening those bounds makes the planner evaluate more branches. This was on an earlier version, though, so your mileage may vary.
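    Something like this (going from memory of the 0.2.x preview, so treat the namespaces and signatures as assumptions and check them against your project's generated code; StateData is generated per plan):

    Code (CSharp):
    using Unity.AI.Planner;
    // Adjust to your project's generated state representation namespace.
    using Generated.AI.Planner.StateRepresentation;

    // A custom cumulative reward estimator that returns deliberately wide
    // bounds, so fewer branches are strictly dominated and skipped.
    public struct WideBoundsEstimator : ICustomCumulativeRewardEstimator<StateData>
    {
        public BoundedValue Estimate(StateData state)
        {
            // (lower bound, average, upper bound) of the future cumulative
            // reward achievable from this state. Wider bounds -> more
            // branches stay in contention and get expanded.
            return new BoundedValue(-100f, 0f, 100f);
        }
    }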
     
    eugolana likes this.
  3. Gnarf

    Gnarf

    Joined:
    Jul 29, 2015
    Posts:
    6
    Thank you @JvanOpstal; by using the Default Cumulative Reward Estimator with bounds of [-100, 0, 100], and keeping all my actions' rewards in the [-3, 3] range (instead of scaling them with relevant trait values), I was able to get breadth expansions.

    I am, however, still hazy on the whole Bounded Value concept. From the VacuumRobot example project, I had the impression that the Reward Estimator gave a ballpark guess of what a state could produce, not actual bounds beyond which the planner stops searching.

    Do you know of any articles or wiki pages (besides the API reference) that can clear this up?

    Thank you! :)
     
    Last edited: Nov 24, 2020
  4. JvanOpstal

    JvanOpstal

    Joined:
    Oct 12, 2016
    Posts:
    8
    Noxalus and Gnarf like this.
  5. Gnarf

    Gnarf

    Joined:
    Jul 29, 2015
    Posts:
    6
    So, each branch's lower/upper bounds are pessimistic/optimistic predictions based on future states. And the reason my states (3, 4, 5) were not expanding was that their upper bounds were lower than the lower bounds of states (1, 2), making them strictly worse.

    And by giving the estimator bounds much wider than the plan's cumulative reward, those states were still worse on average but no longer strictly worse, so they were evaluated.
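    With made-up numbers, the pruning rule I was running into looks like this (a concept sketch, not the planner's actual internals):

    Code (CSharp):
    using UnityEngine;

    // Concept sketch of bounded-value pruning, not planner code.
    struct Estimate
    {
        public float Lower, Upper;   // pessimistic / optimistic future reward
        public Estimate(float lower, float upper) { Lower = lower; Upper = upper; }
    }

    static class PruningDemo
    {
        // A branch is strictly worse when even its best case (upper bound)
        // cannot reach another branch's worst case (lower bound).
        static bool IsStrictlyWorse(Estimate candidate, Estimate best)
            => candidate.Upper < best.Lower;

        public static void Run()
        {
            // Narrow bounds: states 3-5 top out at 4 while states 1-2 never
            // drop below 5, so states 3-5 are strictly worse: never expanded.
            Debug.Log(IsStrictlyWorse(new Estimate(1f, 4f), new Estimate(5f, 9f)));   // true

            // Wide bounds like [-100, 100]: the intervals overlap, so nothing
            // is strictly worse and every state stays eligible for expansion.
            Debug.Log(IsStrictlyWorse(new Estimate(-100f, 100f), new Estimate(-100f, 100f)));   // false
        }
    }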

    Cool, thanks! :)