Hello, I ran into an error when using the Sophia optimizer to train GPT-3 with Megatron. The problem is that the gradient passed to the optimizer does not have requires_grad=True, so the second-order (Hessian) estimate cannot be computed. Do you know how to solve this problem?
File "/root/miniconda3/envs/torch18/lib/python3.7/site-packages/torch/autograd/__init__.py", line 277, in grad allow_unused, accumulate_grad=False) # Calls into the C++ engine to run the backward pass RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.