What hardware (GPU) was this code run on? After running train.py, the loss stays above 5.2, and the text generated during validation is unreadable. Why does this happen?

kyegomez
kyegomez / BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
