In `activation_quant(Tensor)` in file `bitlinear.py`, the quantization seems quite suspicious to me in line 16:

```python
y = (x * scale).round().clamp_(-128, 127) / scale
```
Two issues I see here:

1. You scale your floats with `x * scale`, then round them with `round()`. This does not change the representation of the data; the values are still in their original floating-point precision. In other words, you do introduce the errors of quantization (due to rounding), but you do not save any space by reducing to an int8 representation.
2. The trailing division by `scale`. Why is that? You rescale your floating-point rounded numbers from [-128.0, 127.0] back to near-original values (not exactly the original values, because of the rounding errors). I'm not sure what the point is here.

Maybe I'm missing something, but I thought that the function `activation_quant` is meant to take a `torch.Tensor(dtype=torch.float16)` and reduce it to a `torch.Tensor(dtype=torch.int8)`. That is not what's happening.
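For context, here is a minimal, self-contained sketch of the behaviour I am describing. The per-token absmax `scale` is my assumption (taken from the BitNet b1.58 paper's pseudocode); the actual `bitlinear.py` may compute it differently, but the point stands either way: the output dtype stays floating point, and the values are only snapped onto the int8 grid and then rescaled.

```python
import torch

def activation_quant_sketch(x: torch.Tensor) -> torch.Tensor:
    # Assumed per-token absmax scale (from the BitNet b1.58 paper's pseudocode);
    # the repository's actual scale computation may differ.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp_(min=1e-5)
    # The line in question: round on the int8 grid, clamp, then divide the
    # scale right back out. This is quantize-dequantize, not a dtype change.
    y = (x * scale).round().clamp_(-128, 127) / scale
    return y

x = torch.randn(2, 8, dtype=torch.float16)
y = activation_quant_sketch(x)
print(y.dtype)               # torch.float16 -- still floating point, not int8
print((y - x).abs().max())   # typically non-zero: rounding error was introduced
```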