Feedback Observation + Reward Suggestions

Discussion in 'ML-Agents' started by IrishCyborg, Apr 2, 2021.

  1. IrishCyborg

    Joined:
    May 29, 2018
    Posts:
    6
    I'm trying to make an ML-Agent that plays Tetris, having completed a couple of other projects and wanting to ramp things up. I've programmed Tetris in Unity, and the board is represented by a 10x20 2D array of integers. I'm converting each integer to a float and using those as the observations for the ML-Agent.

    I'm wondering what you think of using these rewards and observations:
    Rewards:
    • -1 reward for losing the game (i.e. touching the top)
    • 0.05 reward every time a block lands, for each filled square on the highest level at the time of landing
    • 1 reward every time a row is filled (and then cleared by the game)
    Observations:
    • All 200 squares in the game.
    • The 10 squares in the currently highest row. (Is this redundant?)
    Actions: (in case you're interested, there are 5 discrete actions, swapping isn't implemented in this version)
    • Move left
    • Move right
    • Move down
    • Rotate left
    • Rotate right
    Any other suggestions or feedback would be super appreciated!
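    For reference, the landing reward described above can be sketched like this (Python for brevity; the actual project is C#/Unity, and all names here are made up for illustration):

    ```python
    # Illustrative sketch of the proposed landing reward (the project itself
    # is C#/Unity; function and variable names here are made up).
    # Board is 10x20: row 0 is the top, row 19 the bottom; 0 = empty cell.

    def highest_filled_row(board):
        """Index of the topmost row containing any filled square, else None."""
        for r, row in enumerate(board):
            if any(cell != 0 for cell in row):
                return r
        return None

    def landing_reward(board, per_square=0.05):
        """0.05 for each filled square on the highest occupied row at landing."""
        r = highest_filled_row(board)
        if r is None:
            return 0.0
        return per_square * sum(1 for cell in board[r] if cell != 0)

    # A board whose only filled squares are a complete bottom row:
    board = [[0] * 10 for _ in range(20)]
    board[19] = [1] * 10
    print(landing_reward(board))  # 0.5 (ten filled squares * 0.05)
    ```

    The -1 loss penalty and +1 per cleared row would be added separately at the corresponding game events.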
     

  2. ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    I'd say overall it sounds reasonable to start with.

    > 0.05 reward every time a block lands, for each filled square on the highest level at the time of landing
    I'd suggest rewarding each filled square below the highest level instead, if the goal is to encourage the agent to keep the stack low when possible.

    > The 10 squares in the currently highest row.
    This doesn't sound necessary, since those squares are already part of the full-board observation and don't provide extra information.

    The one thing I'd suggest is to use a sensor that captures the 2D board structure, like the Camera, Grid Sensor, or Match3 visual sensors, rather than a vector sensor, which puts everything into a 1D vector. The relative position of each cell in the grid (i.e. which cell is next to which) is important information for making decisions in this case, and the agent might benefit from it.
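    To illustrate the flattening point with a toy example (not ML-Agents API code): in a 1D vector observation of a 10-wide board, vertically adjacent cells end up 10 indices apart, so the network is never told they are neighbours.

    ```python
    # Toy illustration (not ML-Agents code): flattening a 10-wide board
    # into a 1D vector separates vertically adjacent cells by 10 indices.
    WIDTH = 10

    def flat_index(row, col):
        """Index of board cell (row, col) in the flattened observation vector."""
        return row * WIDTH + col

    # (5, 3) and (6, 3) touch on the board, but not in the flat vector:
    print(flat_index(6, 3) - flat_index(5, 3))  # 10
    ```

    A 2D sensor keeps that neighbourhood structure, so convolution-style processing can exploit it.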
     
  3. IrishCyborg

    Joined:
    May 29, 2018
    Posts:
    6
    Thanks!

    Here's what I'm going to try for the next training run:
    • Reward for completing a row
    • Positive reward for placing a block lower (lower means more reward)
    • Negative reward for each empty block below a newly placed block
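    A rough sketch of those last two terms (Python for brevity; the project is C#/Unity, and the names, coefficients, and the landing_cells list are all illustrative):

    ```python
    # Illustrative sketch of the two new reward terms (names, coefficients,
    # and the landing_cells representation are made up; the project is C#).
    # Board is 10x20 with row 0 at the top, so larger row index = lower placement.

    def placement_depth_reward(landing_cells, scale=0.01):
        """Positive reward that grows the lower the piece lands."""
        avg_row = sum(r for r, _ in landing_cells) / len(landing_cells)
        return scale * avg_row

    def holes_below(board, landing_cells):
        """Count empty squares below each newly placed square, in its column.
        (Naive reading: a gap is counted once per placed square above it.)"""
        holes = 0
        for r, c in landing_cells:
            holes += sum(1 for rr in range(r + 1, len(board)) if board[rr][c] == 0)
        return holes

    # Example: an O-piece resting with one empty square under its left column.
    board = [[0] * 10 for _ in range(20)]
    cells = [(17, 4), (17, 5), (18, 4), (18, 5)]
    for r, c in cells:
        board[r][c] = 1
    board[19][5] = 1  # floor filled under the right column only
    print(holes_below(board, cells))  # 2 (the gap at (19, 4), seen from both rows above)
    ```

    The hole count would then be multiplied by some negative coefficient and added to the step reward.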

    I'll try some of the sensors you mentioned; the Camera sensor looks like the easiest to start with, so I'll use that first. :)

    Thanks for your help!