In decoupled_optimizer.py, one finds the code fragment:
# Iterate through the named modules of the model.
for module_name, module in model.named_modules():
    # Check if the current module is an instance of any of the desired
    # types (LayerNorm or torch.nn.Embedding).
    for ndim in [LayerNorm, torch.nn.Embedding]:
        if isinstance(module, ndim):
            # If torch.nn.Embedding, append its name with a ".weight"
            # suffix to the no_decay list.
            if module_name == exclude_module:
                no_decay.append(f"{module_name}.weight")
            else:
                # If the module is an instance of LayerNorm
                no_decay.append(f"{module_name}.gamma")
            # Exit the inner loop since the desired module has been found.
            break
If module_name != exclude_module, this code appends a parameter named gamma to the no_decay list. In this case, the layer is a LayerNorm, defined in torch.nn.LayerNorm, which only has the parameters weight and bias. Thus, .gamma should be replaced by .weight.
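A quick check (a minimal sketch, not part of the original file) confirms this: listing the parameters of a torch.nn.LayerNorm shows only weight and bias, so a name ending in .gamma can never match an actual parameter.

import torch

ln = torch.nn.LayerNorm(8)
print([name for name, _ in ln.named_parameters()])
# ['weight', 'bias'] -- no parameter named "gamma"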
Of course, I do not really know why bias is not included. But that is for another day.
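For illustration, the corrected fragment might look like the sketch below, which changes only the appended suffix (and renames the loop variable for clarity); model, no_decay, exclude_module, and LayerNorm are assumed to be defined as in the surrounding function.

for module_name, module in model.named_modules():
    for module_type in [LayerNorm, torch.nn.Embedding]:
        if isinstance(module, module_type):
            if module_name == exclude_module:
                # torch.nn.Embedding: exclude its "weight" from weight decay.
                no_decay.append(f"{module_name}.weight")
            else:
                # torch.nn.LayerNorm: its learnable scale is also called
                # "weight" (there is no "gamma"), so append that suffix too.
                no_decay.append(f"{module_name}.weight")
            break

Once both branches append the same .weight suffix, the conditional could arguably be collapsed, but keeping the original structure makes the one-line fix easy to see.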