32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers.
#1 opened 1 year ago in kyegomez/Blockwise-Parallel-Transformer
#2 opened 1 year ago in kyegomez/Blockwise-Parallel-Transformer