Question Sustainably chopping wood

Discussion in 'ML-Agents' started by VladDerGauner, Jan 8, 2021.

  1. VladDerGauner


    Joined:
    Dec 3, 2015
    Posts:
    21
    Hey everyone,

    I am trying to create an Agent that observes the state of a forest and then decides whether or not to chop a tree. The general setup is that each tree in the forest spawns new trees with a certain probability, so over time the forest fills up completely.

    As input, the Agent gets the number of trees in the forest and how much wood it has gathered so far.

    It gets rewarded if it chops a tree while other trees remain in the forest, and it gets punished if it chops the last tree. (I also tried punishing it slightly for every step in which it does not chop a tree, but this yielded the same result.)

    The behaviour I expect is that the agent chops wood whenever there is more than a single tree, and stops when there is just one tree left.

    However, so far it always ends up with a policy where it just does not want to chop at all. (Quite the druid...)

    Any suggestions on how to proceed? Or is this generally a problem that cannot be solved well by reinforcement learning?

    Also here is my config file:
    Code (YAML):
    behaviors:
      Chopper:
        trainer_type: ppo
        hyperparameters:
          batch_size: 10
          buffer_size: 100
          learning_rate: 3.0e-4
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 3
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:
            strength: 0.02
            gamma: 0.99
            encoding_size: 256
            learning_rate: 3.0e-4

        max_steps: 100000
        time_horizon: 1000
        summary_freq: 1000
        threaded: true

    Any help is greatly appreciated!

    Cheers
     
  2. seyyedmahdi69


    Joined:
    Dec 2, 2020
    Posts:
    25
    Hi. In my humble opinion, this is definitely a simple enough problem for the agent to be able to solve. However, the situation is still a little vague. Could you please elaborate a little? What kind of observations are you providing the agent with? Are you sure the agent is exposed to the objects it needs to interact with? Also, how is the reward calculated?
     
  3. VladDerGauner


    Joined:
    Dec 3, 2015
    Posts:
    21
    Hi!
    Sorry for my late response.

    Sensordata:
    - number of trees in the forest
    - average age of the trees (0 - 100)
    - amount of wood in storage

    Actions:
    - Chop a tree (The oldest tree is picked automatically)
    - Don't chop

    Rewards:
    - If a tree is chopped, get age/100 as the reward (so chopping an old tree yields a reward of almost 1)
    - If the last tree is chopped, get a reward of -1
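
    For reference, here is a minimal Python sketch of the environment dynamics and reward scheme as I understand them from your description. The spawn probability and starting forest size are my own assumptions, not values from your project, and the actual agent would of course be a C# ML-Agents script rather than this toy class:

    ```python
    import random

    SPAWN_PROB = 0.1  # assumed value; your project may use something else


    class Forest:
        """Toy model of the forest: a list of tree ages (0-100)."""

        def __init__(self, n_trees=5):
            self.ages = [0] * n_trees

        def observe(self, wood_in_storage):
            """The three observations described above."""
            n = len(self.ages)
            avg_age = sum(self.ages) / n if n else 0.0
            return (n, avg_age, wood_in_storage)

        def step(self, chop):
            """Apply one action and return the reward as described."""
            reward = 0.0
            if chop and self.ages:
                oldest = max(self.ages)  # the oldest tree is picked automatically
                self.ages.remove(oldest)
                # chopping the last tree is punished; otherwise reward scales with age
                reward = -1.0 if not self.ages else oldest / 100.0
            # remaining trees age, and each may spawn a new tree
            self.ages = [min(age + 1, 100) for age in self.ages]
            for _ in range(len(self.ages)):
                if random.random() < SPAWN_PROB:
                    self.ages.append(0)
            return reward
    ```

    Note that the reward is computed before the spawn step, so chopping the only remaining tree is always punished with -1 even if a new tree would have spawned in the same step.
    
    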
     
  4. seyyedmahdi69


    Joined:
    Dec 2, 2020
    Posts:
    25
    Based on your sensor data, I am still unclear whether the agent has any idea that the forest even exists at a certain location. If possible, can you share your project so I can take a look?
     
  5. VladDerGauner


    Joined:
    Dec 3, 2015
    Posts:
    21
    No, the agent is not a physical entity. It's an abstract "decision maker", at least that is what I want it to be. Yep, will share! Just need to get home.