How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi,
I have a custom environment and model that work fine with PPO and IMPALA. I am trying to use the same setup with ApexDQN, but I am running into trouble.
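For context, here is roughly how I register everything and build the trainer (a minimal sketch: `MyCustomEnv` and `MyCustomModel` stand in for my actual classes, and the import path is my guess for Ray 2.x, where ApexTrainer is the older alias of ApexDQN):

```python
import ray
from ray import tune
from ray.rllib.models import ModelCatalog
# Import path as of Ray 2.x; ApexTrainer is the older alias of ApexDQN.
from ray.rllib.algorithms.apex_dqn import ApexDQN

from my_project.envs import MyCustomEnv      # placeholder for my actual env
from my_project.models import MyCustomModel  # placeholder for my actual model

ray.init()

# Same registrations that work for PPO and IMPALA.
tune.register_env("my_env", lambda env_config: MyCustomEnv(env_config))
ModelCatalog.register_custom_model("my_custom_model", MyCustomModel)

config = {
    "env": "my_env",
    "framework": "torch",
    "model": {"custom_model": "my_custom_model"},
}

# The shape error below is raised during this construction.
trainer = ApexDQN(config=config)
```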
When I instantiate the ApexTrainer class, I get the error below. It seems to come from within model.get_q_value_distributions(), which takes the output of model():
```
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 613, in __init__
    self._build_policy_map(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1803, in _build_policy_map
    self.policy_map.create_policy(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/policy_map.py", line 123, in create_policy
    self[policy_id] = create_policy_for_framework(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 80, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/policy_template.py", line 330, in __init__
    self._initialize_loss_from_dummy_batch(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 1053, in _initialize_loss_from_dummy_batch
    actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 320, in compute_actions_from_input_dict
    return self._compute_action_helper(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
    return func(self, *a, **k)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 953, in _compute_action_helper
    dist_inputs, dist_class, state_out = self.action_distribution_fn(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/algorithms/dqn/dqn_torch_policy.py", line 234, in get_distribution_inputs_and_class
    q_vals = compute_q_values(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/algorithms/dqn/dqn_torch_policy.py", line 424, in compute_q_values
    (action_scores, logits, probs_or_logits) = model.get_q_value_distributions(
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/algorithms/dqn/dqn_torch_model.py", line 146, in get_q_value_distributions
    action_scores = self.advantage_module(model_out)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/models/torch/misc.py", line 169, in forward
    return self._model(x)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x17 and 16x256)
```
The problem is that I am not sure which model's get_q_value_distributions() is being called inside compute_q_values. It does not seem to be the custom model I built for PPO and IMPALA and am now passing into the ApexTrainer config: when I add a print inside my custom model's get_q_value_distributions(), nothing is printed, so I assume it is never called.
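For reference, here is a simplified sketch of the model; the class name, layer sizes, and the flat-observation assumption are placeholders for my real architecture:

```python
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2

class MyCustomModel(TorchModelV2, nn.Module):
    """Simplified placeholder for my actual model (works with PPO/IMPALA)."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs, model_config, name)
        nn.Module.__init__(self)
        # Assumes a flat Box observation; the final layer emits num_outputs
        # logits (num_outputs == action_space.n == 17 under PPO/IMPALA).
        self.net = nn.Sequential(
            nn.Linear(obs_space.shape[0], 64),
            nn.ReLU(),
            nn.Linear(64, num_outputs),
        )

    def forward(self, input_dict, state, seq_lens):
        return self.net(input_dict["obs"].float()), state

    def get_q_value_distributions(self, model_out):
        print("custom get_q_value_distributions called")  # never shows up
        # Plain (non-distributional) Q-values, mirroring the
        # (action_scores, logits, probs_or_logits) unpacking seen in
        # dqn_torch_policy.compute_q_values in the trace above.
        return model_out, model_out, model_out
```

Judging by the trace, the failing call is in RLlib's own ray/rllib/algorithms/dqn/dqn_torch_model.py (DQNTorchModel.get_q_value_distributions), not in my class, which is why I suspect my method is being bypassed entirely.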
Does anyone know what might be causing this 32x17 vs. 16x256 mismatch? model() outputs 17 dimensions because my environment's action_space.n is 17, but for some reason model.get_q_value_distributions() expects a 16-dimensional input. I am not sure how to change what get_q_value_distributions() expects, or even which model is being used when this method is called.
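To make the mismatch concrete, here is a plain-PyTorch sketch of how I read the error; the layer sizes are inferred from the message itself, not from any RLlib source. A Linear layer stores its weight as (out_features, in_features) and it appears transposed in the matmul, so the 16x256 in the message points to a Linear(16, 256), i.e. a head built for 16-dimensional input:

```python
import torch
import torch.nn as nn

# The head that fails, reconstructed from the error message: a
# Linear(16, 256) first layer (weight 256x16, reported transposed as
# 16x256), followed by what I assume is a 256 -> action_space.n layer.
advantage_head = nn.Sequential(
    nn.Linear(16, 256),
    nn.ReLU(),
    nn.Linear(256, 17),  # assumption: final layer maps to action_space.n
)

# My model's forward() output: batch of 32 with 17 features
# (action_space.n), rather than the 16 the head was built for.
model_out = torch.randn(32, 17)

advantage_head(model_out)
# RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x17 and 16x256)
```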
Thanks in advance for your help!