from Hacker News

Did any processor implement an integer square root instruction?

by rwallace on 4/7/24, 10:55 AM with 72 comments

by corsix on 4/7/24, 2:30 PM
AArch64 NEON has the URSQRTE instruction, which gets closer to the OP's question than you might think; view a 32-bit value as a fixed-precision integer with 32 fractional bits (so the representable range is evenly spaced 0 through 1-ε, where ε=2^-32), then URSQRTE computes the approximate inverse square root, halves it, then clamps it to the range 0 through 1-ε. Fixed-precision integers aren't quite integers, and approximate inverse square root isn't quite square root, but it might get you somewhere close.
The related FRSQRTE instruction is much more conventional, operating on 32-bit floats, again giving approximate inverse square root.
by fjfaase on 4/7/24, 12:17 PM
Is it possible in a single clock-cycle. Yes, with a very large lookup table. It is probably possible to reduce the size depending on how many serial logical gates can be executed within the clock-cycle. Think about that the binary root of 10000 is rather similar to that of 100 only with respect to different number of zero's.
by msarnoff on 4/7/24, 7:45 PM
If you wanted to expand the definition of “processor” to electromechanical contraptions, the Friden SRQ could perform square roots using just additions and shifts, with not a single electronic component other than a motor. And since you had to position the decimal points manually, it would _technically_ be an integer operation…
Video: https://youtu.be/o44a1ao5h8w
by treprinum on 4/7/24, 12:20 PM
Can't you use the sequence 1 + 3 + 5 + ... + 2k + 1 to get the integer square root of any integer number? It's basically the k of the nearest lower number to your number in this sequence.
by kelnos on 4/8/24, 7:29 PM
This bit in an answer further down made me chuckle:
> My implementation of square root using binary search, that doesn't depend on a multiplier. Only basic ALU instructions are used. It is vigorously undocumented. I have no idea what I wrote but it seems to work.
A fine reminder that if we write clever code, we're probably not going to remember how it works.
by moomin on 4/7/24, 2:06 PM
You need to read down a bit, but the answer “ENIAC” is hilarious.
by Dwedit on 4/7/24, 8:04 PM
2 ^ (1/2 * Log2(X)) = sqrt(X)
You can get a really really rough approximation if you replace Log2(x) with 'count leading zeroes'. With a better approximation of Log(2), you can get closer to the answer.
by dahart on 4/7/24, 2:34 PM
For an approximate (very rough) answer, as opposed to one accurate to the nearest integer, a right shift by half the number of bits of the leading 1’s position will do, and of course nearly every processor has a shift instruction. I’m not sure how often processors haven’t had something like FLO (Find Leading One) or FFS (Find First Set) instruction, those seem ubiquitous as well.
The super rough approximation for some uses can be approximately as good as an accurate answer. When you just need a decent starting place for some further Newton-Raphson iteration, for example. (Of course the right-shift trick is a nice way to seed a more accurate square root calculation. :P)
by bryanlarsen on 4/7/24, 3:08 PM
IIRC, most (all?) fixed point DSP's have a square root instruction and/or helper instructions.
by jlarcombe on 4/8/24, 5:13 AM
Semi-related and of interest to 6502 fans, exhaustive analysis of square root algorithms: https://github.com/TobyLobster/sqrt_test
by pajko on 4/7/24, 8:40 PM
ARM VFP has VSQRT
https://developer.arm.com/documentation/dui0473/m/vfp-instru...
by Pet_Ant on 4/8/24, 1:46 PM
I'm sure the VAX must have? (if we are including microcode)