I noticed that there are two versions of the MoE class in the repo. One is in model.py, named SwitchMoE, and is used in MambaMoE. The other is in block.py, named SwitchMixtureOfExperts, and is not used in MambaMoE. What's the purpose of this, and what's the difference between the two?