So I'm trying to play around with the stored TensorFlow PPO model, but I can't seem to get the discrete actions when I feed in my vector observations. I see that the TFPolicy class uses the action output named policy_1/concat:0; however, this is not present in my frozen graph, only policy_1/concat_2/axis:0 and policy_1/concat_2:0 are. I have attached my code below. What am I doing wrong? I have 2 discrete action branches with 3 possible values each.

import numpy as np
from mlagents.tf_utils import tf

def load_pb(path_to_pb):
    with tf.gfile.GFile(path_to_pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph

GRAPH_PB_PATH = './my_path/frozen_graph_def.pb'
graph = load_pb(GRAPH_PB_PATH)

with tf.compat.v1.Session(graph=graph) as sess:
    output_node = graph.get_tensor_by_name('policy_1/concat_2:0')
    input_node = graph.get_tensor_by_name('vector_observation:0')
    action_masks = graph.get_tensor_by_name('action_masks:0')
    model_result = sess.run(
        output_node,
        feed_dict={
            input_node: np.array(
                [[ 0.        ,  0.        , -0.29925328, -0.18826045, -0.40039593,
                   0.        ,  0.        , -0.92760783,  0.13117751, -0.56681085]]),
            action_masks: np.array([[1, 1, 1, 1, 1, 1]])})

model_result.round(1)
# Results in:
# array([[-15.4, -15.4, -0. , -15.4, -0. , -15.4]], dtype=float32)
Hi, I think the tensors you want are "action:0" (and possibly "action_output_shape:0"). For discrete actions, the output values are actually the log probabilities for each branch. You can see how we do it here: https://github.com/Unity-Technologi...l-agents/Runtime/Inference/ApplierImpl.cs#L44
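Something like the rough sketch below might work, reusing the graph, input_node, and action_masks from your snippet. I'm assuming here that your frozen graph exposes an "action:0" tensor and that the two branches' log probabilities come back concatenated as 3 + 3 values; taking the argmax within each branch then gives the discrete action indices:

import numpy as np

# Fetch "action:0" instead of "policy_1/concat_2:0" (assumption: this tensor
# exists in your frozen graph and holds per-branch log probabilities).
action_node = graph.get_tensor_by_name('action:0')

with tf.compat.v1.Session(graph=graph) as sess:
    log_probs = sess.run(
        action_node,
        feed_dict={
            input_node: np.array(
                [[ 0.        ,  0.        , -0.29925328, -0.18826045, -0.40039593,
                   0.        ,  0.        , -0.92760783,  0.13117751, -0.56681085]]),
            action_masks: np.array([[1, 1, 1, 1, 1, 1]])})

branch_sizes = [3, 3]  # 2 discrete branches, 3 possible values each
offset = 0
discrete_actions = []
for size in branch_sizes:
    # The highest log probability within each branch is the chosen action.
    discrete_actions.append(int(np.argmax(log_probs[0, offset:offset + size])))
    offset += size

print(discrete_actions)  # e.g. [2, 1]

That mirrors how the C# applier in the link above splits the output by branch and takes the max per branch, but I haven't run it against your exact graph, so treat the tensor name and layout as things to verify.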