A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2
. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)
For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. This says
and in general
Here the sum is over all positive integers d that divide n.
Unfortunately this formula is backward. It gives is a formula for something well known, 2n, as a sum of things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that
where μ is the Möbius function.
We could compute f(n) with Python as follows:
from sympy.ntheory import mobius, divisors def num_primitive(n): return sum( mobius(n/d)*2**d for d in divisors(n) )
The latest version of SymPy, version 0.7.6, comes with a function mobius
for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius
function:
from sympy.ntheory import factorint def mobius(n): exponents = factorint(n).values() lenexp = len(exponents) m = 0 if lenexp == 0 else max(exponents) return 0 if m > 1 else (-1)**lenexp
The version of mobius
that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.
Related: Applied number theory
You could also just rearrange the first formula around to get a recursive formula for f(n).
2^n = sum{f(d), d|n}
2^n = f(n) + sum{f(d), d|n; d<n}
f(n) = 2^n - sum{f(d), d|n; d<n}
The base cases are prime n where f(n) is just 2^n.
Here’s the SymPy code:
It looks like it’s not more efficient. As far as I know, SymPy’s factorint() doesn’t support streaming factors. That would be a useful feature, although it really only matters if you are dealing with numbers with very large prime factors (e.g., the algorithm in factorint() does trial division on primes up to 2^15 before it even gets into the Pollard rho algorithm).
An advantage of the SymPy function is that it works symbolically, i.e., you can put symbolic values in like mobius(n) and do manipulation on them and substitute them later
I was going to open an issue but it looks like there already is one https://github.com/sympy/sympy/issues/6919.
I think you have a bug in your mobius() function.
mobius(12) should be 0, but your code returns 1
I think it shouldn’t look for min(exponents), but just for any(exponent > 1).
Thanks. My min should have been max.
@Joe Taber,
f(p) where p is prime is actually 2^p – 2, because 00…0 and 11…1 are primitive.
There was a question on StackOverflow a few months ago about determining whether a string is primitive or not, but I can’t find it. It basically duplicated the string and checked all cyclic patterns of it to see if any is repeated at all. Pretty neat to find the number of them though!
http://oeis.org/A027375 “Number of aperiodic binary strings of length n; also number of binary sequences with primitive period n”
0, 2, 2, 6, 12, 30, 54, 126, 240, 504, 990, 2046, 4020, 8190, 16254, 32730, 65280, 131070, 261576, 524286, 1047540, 2097018, 4192254, 8388606, 16772880, 33554400, 67100670, 134217216, 268419060, 536870910, 1073708010, 2147483646, 4294901760