I could not find a rigorous definition of the feature norms in the paper. Which layer or block do the tokens originate from?
Regarding the attention maps, I assume the norms are computed from the linearly transformed tokens used to calculate the attention matrices. However, after LayerNorm every token should have a norm of √d (where d is the feature dimension, assuming identity affine parameters), so I would expect the norms to be essentially identical across tokens.
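For concreteness, here is a minimal PyTorch sketch of what I mean (the dimension 768 and the shapes are just examples, and I am assuming LayerNorm without learnable affine parameters):

```python
import torch

d = 768                              # hypothetical feature dimension
tokens = torch.randn(4, 16, d)       # (batch, sequence, dim), random example

# LayerNorm without learnable affine parameters
ln = torch.nn.LayerNorm(d, elementwise_affine=False)
normed = ln(tokens)

# Each normalized token has zero mean and unit variance across its d
# features, so the sum of squares is d and the L2 norm is sqrt(d).
norms = normed.norm(dim=-1)
print(norms)        # every entry ≈ sqrt(768) ≈ 27.71
print(d ** 0.5)
```

With learnable affine parameters the norms can of course differ per token, which is why I am asking where exactly the feature norms are taken from.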
Am I misunderstanding something?