A question came up on StackOverflow today regarding testing whether integers were perfect squares. The person asking the question said his first idea was to take the (floating point) square root, round the result to the nearest integer, square that, and see whether he got the original number back. But he thought maybe there was a faster way that avoided floating point arithmetic.
My suggestion was to first look at the number in hexadecimal (base 16) and see whether the number could be ruled out as a square before proceeding. Squares in base 16 end in 0, 1, 4, or 9. (You can see this by brute force: square every number from 0 to 15 and take the remainder dividing by 16.) You can find the end of the number in hex by a fast bit operation, then rule out 12 out of every 16 numbers. Then do the square root and square procedure on the 4 out of 16 numbers that slip through the hexadecimal filter. Here’s C++ code for the algorithm.
int PerfectSquare(int n) { int h = n & 0xF; // last hexadecimal "digit" if (h > 9) return 0; // return immediately in 6 cases out of 16. // Take advantage of Boolean short-circuit evaluation if ( h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8 ) { // take square root if you must int t = (int) floor( sqrt((double) n) + 0.5 ); return t*t == n; } return 0; }
This algorithm ran about twice as fast as always taking the square root. The analogous C# code also ran about twice as fast as the more direct method.
What about looking at other bases? We want to use bases that are powers of 2 so we can get the last “digit” quickly. Half the numbers in base 4 are potential squares. Three out of eight numbers in base 8 are potential squares. Four out of 16 are potential squares in base 16. So taking smaller powers of 2 is probably not faster. Seven out of 32 numbers are potential squares base 32, a slightly lower ratio than base 16, but at the cost of more comparisons. I haven’t benchmarked using bases other than 16, but I doubt they are faster. If any are faster, I imagine the difference is small.
Here are a couple number theory problem that comes out of the problem above. First, how many numbers have square roots mod 2n, i.e. for how many values y does x2 ≡ y (mod 2n) have a solution? Call that number g(n). For example, g(3) = 3, g(4) = 4, g(5) = 7, g(6) = 12. Is there a simple formula for g(n)? Second, what is the minimum value of g(n)/2n?
Update: See Michael Lugo’s solution to the number theory questions in the comments below or more discussion here.
Update: According to the StackOverflow discussion, the base 64 technique was a little faster than the base 16 technique, at least when implemented in C#. The relative speeds of variations on the algorithm depend on what language you’re using.
Related: Applied number theory
The encyclopedia of integer sequences gives the simple formula g(n) = [(2^n + 10)/6], where [x] is the greatest integer less than or equal to x. (g(6) is in fact 12; the possible residues are 0, 1, 4, 9, 16, 17, 25, 33, 36, 41, 49, 57.) So {g(n)/2^n} is a sequence which decreases towards a limit of 1/6, although it never gets there.
Useless but… oh!… so beautiful. Thank you.
I think you’d speed up the code considerably if you used the table-in-a-constant trick:
if ((1<<0 | 1<<1 | 1<<4 | 1<> (n & 0xF) & 1) do_the_sqrt;
or, if you have 64-bit integers,
if (0x2001002010213ULL >> (n & 0x3F) & 1) do_the_sqrt;
Know this is going back in time, but why is there a + 0.5 in the algorithm?
The floor() function takes the integer part of a floating point number, so floor(x + 0.5) rounds x to the nearest integer.
Recently I have been working on a problem around testing if a particular big integer is a perfect square.
I don’t have BigInteger.SquareRoot() in my language (C#), so I implemented my own.
First, using Newton’s iteration new_estimate = old_estimate + S/old_estimate. Work within the rational (since they are big integer, I don’t have floating point arithmetic with them anyway), then iterates.
The problem with that is, first, you don’t know whether it converged or not, and second, the numerator and denominator grows too quickly making the computation go real slow.
The solution to the first is to recognize Newton’s iteration result always give upper bound to the actual root, so S/new_estimate gives a lower bound of it. If the upper bound and the lower bound’s difference is less than 1, then we are done, because either the interval enclose an integer that is the root, or the interval does not enclose an integer which prove the number is not a perfect square.
The solution to the second problem is to truncate away the decimals in each iteration, the optimization make the program run blinking fast for moderate size.
BTW, in your solution, if you know that the number mod 4 equals zero, you may want to divide by 4 (until you can’t) before you take root. That should help considerably too because divide by 4 is just bit shifting.
You could use a non deterministic approch.
If x in Z is that x = y^2, so it is modulo every prime p. Being a square modulo a prime is known as been Quadratic Residue modulo p. Testing whenever an integer u in Zp is QR modulo p is easy by computing Legendre symbol (u/p) which is polynomial in p. Also by Euler criterion if u is QR mod p the u^((p-1)/2) is congruent to 1 mod p. So if x = y^2, p is a prime then x^((p-1)/2) = (y^2)^((p-1)/2) = y^(p-1) =p= 1 by Fermat little theorem (=p= means congruent mod p). So the statement above is proved.
We can use this to implement this probabilistic test: given a square u to be tested, pick up r primes p1,…,pr and test whetever (u/p1),…,(u/pr) are all 1. Is not, u is not a square, else it MAY be. Since there are about (p-1)/2 QR mod p, the probability of u to false pass the test with r primes is 2^-r. So with just 31 prime you got less 1/1000000000 probability. You better not use only the very first small primes however.
How to compute legendre symbol can be googled.
This method is implemented, for example, in the GNFS to test squareness in the ring Z[m] for polynomial with very big coefficients.
If we write:
int t = (int) floor(sqrt((double) n));
program will work correctly, isn’t it?
No, it won’t. sqrt((double) n) might be something like x.99999, so floor will be x instead of x+1.
Hi. Really nice blog you have here :)
Just a (perhaps) very minor improvement, but if n is a positive number, you can simply skip the floor function,
i.e., int t = (int) ( sqrt((double) n) + 0.5 );
int t = (int) floor( sqrt((double) n) + 0.5 );
Additionally, you could replace the 6 comparisons with a negation of 4:
if ( !(h == 0 || h==1 || h ==4 || h ==9 ) )
Intermediate stored results such as h add time as do multiple compares. To increase the speed perform the modulo 16 check in one operation using a mask as follows:
If you’re doing this often enough for it to matter, you can store an array of true/false values instead of making explicit comparisons, and include the array as initialized data in the program. Depending on how patient you are for such things, 16, 64, and 256 are obvious sizes, or 4096 or 64K if you really want.
Sorry for my layman contribution here. I wont pretend to understand my contribution either, but I will offer it anyway. If instead of checking n & 0xF against 0, 1, 4, or 9 as you have in your algorithm, you changed your algorithm so that you check n & 7 against 1, n & 31 against a 4, n & 127 against the value 16, and lastly n & 191 against 0, you are basically performing the same test, logically and computationally speaking. Except you reject far more non-squares by doing it like that. In my simple experiments, I cut the number of false reports of squareness nearly in half, and resorted to square root computation far less frequently.
Here’s a discussion which is geared toward big numbers (BigInts) http://mersenneforum.org/showpost.php?p=110896. SymPy has an internal sympy.ntheory.primtest.is_square() which is based on this.
I tried out the 3 variants in c#:
Even though I use 64bit in dotNet 4.5.1, for me the first variant was slower than the second one.
Here the fastest algorithm I have chosen for 64 bit c#:
Correction: The 64 bit version is the fastest for me.
I forgot the “L” after “1” to make use of the 64 bit shift left operator.
static bool IsSquareFast64(long n)
{
if ((0x202021202030213 & (1L << (int)(n & 63))) > 0)
{
long t = (long)Math.Round(Math.Sqrt((double)n));
bool result = t * t == n;
return result;
}
return false;
}
Additions (valid for c#)
– Removing Math.Round improves performance more.
– Odd but there is a speed gain by removing the variable “result”, and directly returning t * t == n
– Manually Inline the code for more speed gain. (Attribute MethodImplOptions.AggressiveInlining is not helping much.)
– I tried to replace the t*t multiplication: Changed type of t to double, and then check for (t % 1 == 0) which checks if t is an integer. This does help slightly but if you have a good test method, you will find out it produces wrong results if (n > 4400000000000000L). And if you then add an if for size check then it gets slower again. So for me t * t == n is not avoidable for fastest correct results in full long range.
So your inlined code could look something like this:
if ((0x202021202030213L & (1L << (int)(n & 63))) > 0)
{
long t = (long)Math.Sqrt((double)n);
if (t * t == n)
{
// It is a square, do stuff in this case
}
}