In `activation_quant(Tensor)` in file `bitlinear.py`, the quantization seems quite suspicious to me in line 16:

```python
y = (x * scale).round().clamp_(-128, 127) / scale
```
Two issues I see here:

1. You scale your floats with `x * scale`, then round them with `round()`. This does not change the representation of the data; the values are still in their original floating-point precision. In other words, you do introduce the errors of quantization (due to rounding), but you do not save any space by reducing to an int8 representation.
2. The trailing division by `scale`. Why is that? You rescale your floating-point rounded numbers from [-128.0, 127.0] back to near-original values (not exactly the original values, because of the rounding errors). I'm not sure what the point is here.

Maybe I'm missing something, but I thought that the function `activation_quant` is meant to take a `torch.Tensor(dtype=torch.float16)` and reduce it to a `torch.Tensor(dtype=torch.int8)`. That is not what's happening.
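For context, here is a minimal, self-contained sketch of the behaviour I am describing. The per-token absmax `scale` is my assumption (taken from the BitNet b1.58 paper's pseudocode); the actual `bitlinear.py` may compute it differently, but the point stands either way: the output dtype stays floating point, and the values are only snapped onto the int8 grid and then rescaled.

```python
import torch

def activation_quant_sketch(x: torch.Tensor) -> torch.Tensor:
    # Assumed per-token absmax scale (from the BitNet b1.58 paper's pseudocode);
    # the repository's actual scale computation may differ.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp_(min=1e-5)
    # The line in question: round on the int8 grid, clamp, then divide the
    # scale right back out. This is quantize-dequantize, not a dtype change.
    y = (x * scale).round().clamp_(-128, 127) / scale
    return y

x = torch.randn(2, 8, dtype=torch.float16)
y = activation_quant_sketch(x)
print(y.dtype)               # torch.float16 -- still floating point, not int8
print((y - x).abs().max())   # typically non-zero: rounding error was introduced
```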