In the module MambaTransformer/mamba_transformer, you execute the following in class MambaTransformerblock:
# Layernorm
self.norm = nn.LayerNorm(dim)

def forward(self, x: Tensor) -> Tensor:
    for mamba, attn, ffn in zip(
        self.mamba_blocks,
        self.transformer_blocks,
        self.ffn_blocks,
    ):
        x = self.norm(x)
        x = mamba(x) + x
        x = self.norm(x)
        x = attn(x) + x
        x = self.norm(x)
        x = ffn(x) + x
    return x
Since nn.LayerNorm has trainable parameters, you appear to be applying the same layer norm three times per iteration of the forward loop, so all three normalizations (before the Mamba, attention, and FFN sub-layers) share tied parameters. Is that what you really want?
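For comparison, here is a minimal sketch of the untied alternative: one LayerNorm per sub-layer, keeping the same residual structure as the quoted forward. The class name, constructor signature, and the names mamba_norms / attn_norms / ffn_norms are illustrative, not taken from the repository:

import torch.nn as nn
from torch import Tensor

class MambaTransformerblockUntied(nn.Module):
    """Hypothetical sketch: same loop as the quoted block, but each
    sub-layer gets its own independently trained LayerNorm."""

    def __init__(self, dim: int, depth: int, mamba_blocks, transformer_blocks, ffn_blocks):
        super().__init__()
        self.mamba_blocks = mamba_blocks
        self.transformer_blocks = transformer_blocks
        self.ffn_blocks = ffn_blocks
        # One LayerNorm per sub-layer and per depth step, so no parameters are tied
        self.mamba_norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))
        self.attn_norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))
        self.ffn_norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(depth))

    def forward(self, x: Tensor) -> Tensor:
        for mamba, attn, ffn, n_m, n_a, n_f in zip(
            self.mamba_blocks,
            self.transformer_blocks,
            self.ffn_blocks,
            self.mamba_norms,
            self.attn_norms,
            self.ffn_norms,
        ):
            # Same residual pattern as the original, with separate norm parameters
            x = n_m(x)
            x = mamba(x) + x
            x = n_a(x)
            x = attn(x) + x
            x = n_f(x)
            x = ffn(x) + x
        return x

Whether the shared norm is intentional (as a form of weight tying) or an oversight is the question; most pre-norm Transformer-style stacks use a distinct LayerNorm per sub-layer as sketched above.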