Stand on a large clock, say on the 1. Now flip a coin and move ahead one hour if the coin turns up heads, and back one hour otherwise. Keep repeating the process until you’ve stood on all 12 numbers. How long, on average, will this random walk take? If you generalize to clocks with p positions, how does the expected time vary with p?
Here’s a little Python code to experiment with the problem.
from random import random p = 12 def time_to_cover_circle(): circle = [0 for i in range(p)] count, pos = 0, 0 while True: count += 1 pos += 1 if random() > 0.5 else -1 pos %= p circle[pos] = 1 if min(circle) == 1: return count
I’m curious, is this problem different from ‘How long would it take, on average, to reach the number 7 from the number 1?’ (The answer is 36 flips, if I’m not terribly mistaken)
My immediate impulse is 144. Trying the code now …
Well, I was wrong, obviously. Cool answer, though. Now I’ve got the “why?” to ponder in this silly meeting I must attend.
[edit/remove in moderation: t = 1 + \sum_n^{p-1} n]
The python is inconsistent with how I would interpret the problem. It is off by one. For example, with p=2, you are standing on 0, and with one step you stand on 1, thus covering the graph. The python returns 2 steps, not 1 step.
This is a small difference, but counting the initial position does lead to a slightly nicer formula (if my formula is correct).
Using Jeff Garrett’s interpretation (which was also how I interpreted it), I get that the formula for the mean number of steps for p positions is
m = p*(p-1)/2.
Simple indeed.
With such a simple result, I feel that there should be an easy way to see why it is correct. (I inferred the formula from simulation.)
@the cyclist: How did you infer an analytic formula through a numeric procedure like simulation?
I have a proof, that I feel is at least conceptual if not easy.
First, I need a basic fact about random walks and hitting times.
Let S_0 = 0 and let S_k = X_1 + … + X_k for k > 0 where X_i are independent, identically distributed variables with X_i = 1 with probability 1/2, and X_i = -1 with probability 1/2.
Let t = min { k : S_k = -A or S_k = B } for some fixed integers A, B > 0.
Basic Fact: E(t) = A*B.
From here, we can get to the clock problem. Let V_k = the visited set at time k, the collection of all vertices already visited up to this time. Let t(n) = min { k : #V_k = n } for n <= p be the first time that we have visited n distinct vertices. In my interpretation, t(1) = 0, and t(2) = 1. We want to calculate E(t(p)).
The key realization is that we can get from t(n) to t(n+1) without too much trouble. Consider the situation at time t(n) for n < p. We have visited n vertices and we are at one extreme of V_t(n). Adding labels, without loss of generality, we can say we are at 0, and we have visited 1, 2, …, n-1. To calculate t(n+1) – t(n) we want to know the first time we hit either -1 or n (in the case of n = p, these are the same vertex, but that is irrelevant). Therefore from the basic fact stated above:
E(t(n+1) – t(n)) = n for n < p
Which yields:
E(t(n)) = 1 + … + (n-1) = n(n-1)/2 for n <= p
In particular:
E(t(p)) = p(p-1)/2
@vonjd:
For a given number of hours H, I ran 50k iterations of the random walk to see the number of steps. Then I calculated the mean (call it “M”) of the those 50k iterations. The results were:
H=1 –> M=0 [no need to simulate]
H=2 –> M=1 [no need to simulate]
H=3 –> M=3
H=4 –> M=6
H=5 –> M=10
H=6 –> M=15
H=7 –> M=21
H=8 –> M=28
H=9 –> M=36
H=10 –> M=45
H=11 –> M=55
H=12 –> M=66
Each of these calculations of M had a some error, typically ~0.01. The pattern is clear, and the formula is easy to see.
Of course this is not a proof; the pattern is just a guide. It might go off the rails at the very next value of H! But, that seems unlikely, given the simplicity of the random walk rules here.
To me, this is very reminiscent of the “fly flying between two trains” problem (see here, e.g.: http://mathforum.org/dr.math/faq/faq.fly.trains.html). There is the infinite sum approach, but I feel that there may be a much simpler one as well.
Interesting.
A little technical point: `[0 for i in range(p)]` is more simply `[0]*p`.
Another (Python) technical point: checking whether all the positions have been reached would be conceptually simpler and asymptotically faster with `not_yet_reached = set(range(p))` and `not_yet_reached.discard(pos)` and `if not not_yet_reached:`.
This is neat!
@EOL’s suggestion is excellent, you can have the while loop triggered by not_yet_reached (and return afterwards).
J.Garrett’s proof above is illuminating and exact (simpler still would be to consider A==B, since the transition probabilities are equal).
WP shows that the expected exit time from an interval [-r, r] centered around the starting location is r^2 ( https://en.wikipedia.org/wiki/Hitting_time#Examples ) .
In fact, we’re considering the saturation of a Brownian diffusion, and for this random process the std.dev grows as the square root of time. If we set the mean of the process to 0 and invert, we retrieve the scaling law for times. (Fast and loose, apologies to all mathematicians :D)
I’m getting an answer of about 64. But that was just with some brute force unpretty R code. Don’t scroll down unless you want to see some bleary eyed stubborn code!
p <- 12L
avgs <- integer(100)
for(s in 1:length(avgs)) {
store <- integer(1000)
for(k in 1:length(store)) {
x <- i <- 1L
j <- 1
while(length(x) 0.5) {
if(i < p){
i <- i + 1
if(!(i %in% x)) {
x <- c(x, i)
}
} else {
i 1) {
i <- i – 1
if(!(i %in% x)) {
x <- c(x, i)
}
} else {
i <- p
}
}
j <- j + 1
}
store[k] <- j
}
avgs[s] <- median(store)
}
hist(avgs)
summary(avgs)
table(avgs)
i used matlab code to solve it and repeated the code 1000 times and got average of 22.4 toss with deviation of 20.7006 , but i want meaning of those numbers , please
% code
z=12;
A=[1 2 3 4 5 6 7 8 9 10 11 0];
B=inf(10000000,1);
r=1;
count=0;
while true
x=round(rand(1,1));
if x==0
z=mod(z-1,12);
B(r)=z;
lia= ismember(A,B);
for n=1:size(lia)
if lia(n)==true
count=count+1;
end
end
if count==12
break;
else r=r+1;
end
else z=mod(z+1,12);
B(r)=z;
lia= ismember(A,B);
for n=1:size(lia)
if lia(n)==true
count=count+1;
end
end
if count==12
break;
else r=r+1;
end
end
end
disp(r)
Here is my MATLAB code, which got 66 (within error).
HOURS = 12;
DONE = 0:(HOURS-1);
NTRIALS = 100000;
steps = zeros(NTRIALS,1);
for nt=1:NTRIALS
if mod(nt,10000)==0
disp([num2str(nt),’ out of ‘,num2str(NTRIALS),’ trials completed.’])
end
current = 0;
visited = zeros(1,HOURS);
while not(isequal(visited,DONE))
steps(nt) = steps(nt) + 1;
current = mod(current+2*(rand<0.5)-1,HOURS);
visited(current+1)=current;
end
end
figure
histogram(steps,min(steps)-0.5:max(steps)+0.5)
xlim([HOURS-0.5 HOURS^2+HOURS+0.5])
xlabel('Steps')
ylabel('Frequency')
disp(['Mean number of steps is ',num2str(mean(steps))])
Via simulation I got ~66 steps on average, bit the curious thing is that I found with fitter that the number of steps follows a gamma distribution. Has anybody else gotten the same result?
@Julio Luna, I do not think I agree. If I’m not mistaken, I get a pgf of:
E(z**t) = z**11/((z**2 – 2)*(z**4 – 16*z**2 + 16)*(z**5 – 6*z**4 – 12*z**3 + 32*z**2 + 16*z – 32))