Fast inverse square root using 0x5f3759df

0x0001 · March 10, 2023

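I happened to come across the story of Carmack and Quake III, and it's rather interesting: an inverse square root in two steps. I was thoroughly shown up.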

float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;
 
	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                       // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
 
	return y;
}
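A side note on the snippet above: it assumes long is 32 bits wide (it is 64 bits on most 64-bit Linux and macOS toolchains), and the pointer casts break strict aliasing rules in modern C. Below is a minimal sketch of the same computation using uint32_t and memcpy instead. The name q_rsqrt_portable is my own, not from the Quake III source, and the sketch assumes float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Same steps as Q_rsqrt, but the bit reinterpretation is done with memcpy.
 * Assumes float is a 32-bit IEEE 754 type and number is positive and finite.
 * The function name is hypothetical, chosen for this sketch. */
float q_rsqrt_portable(float number)
{
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);            /* read the float's bits as an integer */
    i = 0x5f3759dfu - (i >> 1);          /* magic constant gives the initial guess */
    memcpy(&y, &i, sizeof y);            /* write the bits back into a float */
    y = y * (threehalfs - (x2 * y * y)); /* one Newton iteration */

    return y;
}
```

Calling q_rsqrt_portable(4.0f) should give a value close to 0.5, off by a fraction of a percent.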

The math behind 0x5f3759df - ACdreamers' blog - CSDN Blog

hsxfjames · March 10, 2023

An author I follow on Zhihu recently happened to share another solution for Fast Inverse Square Root, using a larger (64-bit) magic number, 0x5fe6ec85e7de30da. It is also worth studying.
What algorithms are complex in theory but simple to implement? - 酱紫君's answer - Zhihu
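For illustration, here is a minimal sketch of how the larger constant could be used. I am assuming it is meant for 64-bit double, with the same structure as the float version; the name q_rsqrt64 is my own and not taken from the linked answer.

```c
#include <stdint.h>
#include <string.h>

/* Double-precision analogue of Q_rsqrt using the 64-bit constant quoted above.
 * Assumes double is a 64-bit IEEE 754 type and number is positive and finite.
 * The function name is hypothetical. */
double q_rsqrt64(double number)
{
    double x2 = number * 0.5;
    double y = number;
    uint64_t i;

    memcpy(&i, &y, sizeof i);                /* double bits -> integer */
    i = 0x5fe6ec85e7de30daULL - (i >> 1);    /* initial guess from the magic constant */
    memcpy(&y, &i, sizeof y);                /* integer bits -> double */
    y = y * (1.5 - (x2 * y * y));            /* one Newton iteration */

    return y;
}
```

As with the float version, one iteration only gives a rough approximation; more iterations would be needed to get anywhere near full double precision.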

I also found a very detailed explanation of the original magic number 0x5f3759df and how it works, though I suspect it is at least inspired by Wikipedia or similar articles.

Bintou · March 11, 2023

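0x0001 It looks like this is no longer science and technology but an art (in English, "art" also means craft), beyond what ordinary people can understand. It looks like an art in pursuit of efficiency (or maximum payoff), but my guess is that nobody creates this kind of art for payoff (utility) alone. I don't know whether that guess is right.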

0x0001 · March 11, 2023

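Bintou I think so too. My guess is that whoever wrote this put real thought into it, happened to land on this magic number, and it turned out to be very practical as well.

There is a bit of a drive for perfection behind it. Someone driven purely by utility would stop caring once the goal was more or less met.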

qazxcdswe123 · March 11, 2023

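I first saw this algorithm in my freshman year and was instantly blown away. Now that I'm a sophomore, I'm finally writing it up, which feels a little commemorative :D
You can also read it on my blog, where the wiki links actually work (in case you want to read the IEEE 754 material).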

Why care

The cryptic function above computes the inverse (reciprocal) square root, namely $\frac{1}{\sqrt{x}}$. But why care?
It turns out that if we want to implement physics or lighting in a game engine, it helps when the vectors we calculate with are normalized to have length 1. The length of a vector can be calculated using the Pythagorean theorem, $\sqrt{x^2+y^2+z^2}$ (in 3D, of course), so each component of the normalized vector is the original component times the inverse length, e.g. $x \times \frac{1}{\sqrt{x^{2} + y^{2} + z^2}}$. You might see where this is going.
However, even though squaring and adding are cheap on a computer, taking a square root and dividing are slow and expensive. In a performance-critical scenario such as gaming, this "evil" function could dramatically improve performance (in that particular era).

Aside: Normal vectors are unit vectors aligned perpendicularly to a surface, defining its direction. They are commonly used for lighting, collisions, and other operations involving surfaces. Most of the time we only care about the direction of the light or physics and bringing in magnitude may even bring in some weird bugs, so a normalized vector is all we need. It also helps to separate the direction from the magnitude of your vector. For example, you can keep the speed of a character constant, even when they travel in weird diagonal directions.

But How

It mostly involves [[Binary]] calculation, Newton's method, a few math tricks and [[Floating Point Representation]]. An additional blog post on IEEE 754 can be found [[Introduction to Floating Point Number|here]].
A detailed visual proof can be found here.
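As a rough sketch of the two ingredients (this is the standard textbook argument, not necessarily how the original author arrived at the constant): we want $y = \frac{1}{\sqrt{x}}$, which is the positive root of $f(y) = \frac{1}{y^2} - x$. Newton's method on this $f$ gives

$$
y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} = y_n + \frac{y_n^3}{2}\left(\frac{1}{y_n^2} - x\right) = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^2\right)
$$

which is the line y * ( threehalfs - ( x2 * y * y ) ) with x2 = number * 0.5F. The magic number supplies the initial guess. For a positive 32-bit float with bit pattern $I_x$ read as an integer, $\log_2 x \approx \frac{I_x}{2^{23}} - 127 + \sigma$ for a small correction $\sigma$. Plugging this into $\log_2 y = -\frac{1}{2}\log_2 x$ gives

$$
I_y \approx \frac{3}{2}\cdot 2^{23}\,(127 - \sigma) - \frac{I_x}{2}
$$

Here the constant term plays the role of 0x5f3759df (it corresponds to $\sigma \approx 0.045$), and $\frac{I_x}{2}$ is the i >> 1.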

What Now

The first time I saw this algorithm, I was thrilled and immediately compared it with y = 1 / sqrt(x). But it disappointed me, by a lot. This magic function is now 10x slower!!!
Modern compilers: "No one knows optimization better than me."

It is still a good starting point to learn IEEE 754 though.

Aside: In a recent Lex Fridman podcast episode with Carmack, he actually says that he didn't invent this algorithm. The code was found in the codebase, but somehow everyone credits the function to him. He left Meta in late 2022 and now works on AI at his own company, Keen Technologies.

References

Vector math — Godot Engine (stable) documentation in English
Fast Inverse Square Root — A Quake III Algorithm - YouTube
Why would you normalize a vector ? : r/gamedev

0x0001 · March 12, 2023

qazxcdswe123 My rough little post has drawn out a gem!

hsxfjames · March 13, 2023

qazxcdswe123 The first time I saw this algorithm, I was thrilled and immediately compared it with y = 1 / sqrt(x). But it disappointed me, by a lot. This magic function is now 10x slower!!!
Modern compilers: "No one knows optimization better than me."

Now we have handy, fast CPU instructions for this, like rsqrt in Intel's SSE instructions (or this). It takes only a few CPU cycles. A modern compiler can recognise code like 1 / sqrt(x) and optimise it into something like an rsqrt instruction. The magic-number version of Fast Inverse Square Root, by contrast, is a pure software approximation that may compile to about ten instructions or more, which is why it's slower.

Example:
On a common x86_64 platform with gcc, we get rsqrtss for float and a combination of sqrtsd and divsd for double. Note that I passed the -ffast-math option to the compiler; see the GNU docs.

On a MIPS64 platform we get rsqrt for both float and double (with similar results on 64-bit RISC-V):

The results above were generated with Compiler Explorer.