I'm trying to train an agent to shoot slow-moving bullets at a moving target. The action is the angle at which the bullet is fired. The problem is that the agent might fire several bullets before the first bullet even reaches the target. So, if the first bullet hits the target and generates a reward, won't that reward be associated with the later actions (which might point in the opposite direction from the target)? Essentially, my question is this: is there any way to associate rewards with specific actions that occurred in the past? That way, I could associate the bullet-hit reward with the exact action that fired that specific bullet (thus creating an accurate action/reward tuple). If not, what would be the best way of training for this specific situation (slow-moving bullets fired at a moving target)?
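
To make the idea concrete, here is a rough sketch of the kind of bookkeeping I have in mind, assuming the environment can give each bullet an ID and report when it hits or misses. The names `Transition`, `DelayedRewardBuffer`, `record_shot`, and `resolve` are hypothetical and not from any library; this only shows the retroactive reward assignment, not a full training loop.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transition:
    state: tuple          # observation at the moment of firing
    action: float         # firing angle in radians
    reward: float = 0.0   # filled in later, once the bullet resolves


class DelayedRewardBuffer:
    """Hypothetical buffer that credits a reward to the step that fired the bullet."""

    def __init__(self) -> None:
        self.transitions: List[Transition] = []
        # Maps a bullet ID to the index of the transition that fired it.
        self.bullet_to_step: Dict[int, int] = {}

    def record_shot(self, bullet_id: int, state: tuple, angle: float) -> None:
        # Remember which transition this bullet belongs to.
        self.bullet_to_step[bullet_id] = len(self.transitions)
        self.transitions.append(Transition(state, angle))

    def resolve(self, bullet_id: int, reward: float) -> None:
        # Called when the bullet hits or leaves the arena: the reward goes
        # to the originating transition, not to the most recent action.
        step = self.bullet_to_step.pop(bullet_id)
        self.transitions[step].reward += reward


buffer = DelayedRewardBuffer()
buffer.record_shot(bullet_id=0, state=(0.0, 1.0), angle=0.5)   # aimed at target
buffer.record_shot(bullet_id=1, state=(0.1, 1.0), angle=-2.0)  # aimed away
buffer.resolve(bullet_id=0, reward=1.0)  # first bullet hits, after the second shot
buffer.resolve(bullet_id=1, reward=0.0)  # second bullet misses
rewards = [t.reward for t in buffer.transitions]
```

With this, the hit reward ends up on the first transition even though it arrives after the second shot was fired. Whether training on tuples relabeled like this is sound for a given algorithm is part of what I'm asking.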