Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

Creator: kyegomez
Stars: 42
License: MIT License
Repository: GitHub
Website: discord.gg