
Why does an AI (LLM) model’s quality degrade when converting from bfloat16 to float16?

mlapi
Jul 29, 2024 · 2 min read

As you may have noticed, most AI (LLM) models are trained in bfloat16 instead of float16.


Why bfloat16?

In bfloat16, the 16 bits are allocated as follows:

1 bit - sign bit
8 bits - exponent bits
7 bits - fraction bits

and in float16, the 16 bits are allocated like this:

1 bit - sign bit
5 bits - exponent bits
10 bits - fraction bits

As we can see, bfloat16 has 8 exponent bits compared to 5 in float16. Now compare this to float32, whose bits are allocated like this:

1 bit - sign bit
8 bits - exponent bits
23 bits - fraction bits

float32 has 8 exponent bits, the same as bfloat16.

So bfloat16 stores exponent values with the same range as float32, while still occupying only 16 bits.

This is why most AI models are now trained in bfloat16 instead of float16.
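Here is a quick sketch (assuming PyTorch is installed) that makes the trade-off concrete: bfloat16 shares float32’s dynamic range but has coarser precision, while float16 has finer precision but a much smaller range.

```python
import torch

for dtype in (torch.float32, torch.bfloat16, torch.float16):
    info = torch.finfo(dtype)
    # max is ~3.4e38 for float32 and bfloat16, but only ~6.5e4 for float16;
    # eps reflects precision: bfloat16 (7 fraction bits) is coarser than float16 (10).
    print(f"{str(dtype):15s} max={info.max:.3e}  tiny={info.tiny:.3e}  eps={info.eps:.3e}")
```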

Ok, now coming back to the main point: as mentioned above, float16 has only 5 exponent bits, so when we convert bfloat16 to float16 we lose 3 exponent bits, which in turn throws away a lot of valuable information; values too large or too small for float16’s range overflow to infinity or underflow to zero.
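A small sketch (again assuming PyTorch) of what that looks like in practice: values that bfloat16 handles fine get destroyed by the cast to float16.

```python
import torch

x = torch.tensor([1e30, 1e-30, 3.0]).to(torch.bfloat16)
print(x.to(torch.float16))
# tensor([inf, 0., 3.], dtype=torch.float16)
# 1e30 overflows (float16 max is ~65504) and 1e-30 underflows to zero,
# so any weight or activation outside float16's range is silently corrupted.
```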

How to solve it

We can overcome this issue by converting bfloat16 to float32 instead: since the exponent width matches, we only need to pad the fraction bits with zeros, so the conversion preserves all the information. This doubles the model size, but that is the price of keeping the accuracy.
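A minimal sketch (PyTorch assumed) showing that the upcast to float32 is exact: every bfloat16 value maps to the same real number in float32, at the cost of twice the memory.

```python
import torch

w_bf16 = torch.randn(4).to(torch.bfloat16)
w_fp32 = w_bf16.to(torch.float32)

# Round-tripping back to bfloat16 recovers the original tensor exactly,
# confirming the upcast loses nothing.
print(torch.equal(w_bf16, w_fp32.to(torch.bfloat16)))  # True
print(w_fp32.element_size() / w_bf16.element_size())   # 2.0 -> twice the storage
```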
