Question: You have a stream of bytes from which you can read one byte at a time. You only have enough space to store one byte. After processing those bytes, you have to return a random byte. Note: The probability of picking any one of those bytes should be equal.
Answer: Since we can only store one byte at a time, we have to be able to work with the bytes as they come in. We can’t store everything, count the number of bytes and pick one. That would be too easy wouldn’t it?
Let’s think it through. We don’t know how many bytes there are, so at any point in time we should have a random byte picked from all the bytes we have seen so far. We shouldn’t worry about the total number of bytes at any time.
When we get the first byte, it is simple. There is only one so we store it. When we get the second one, it has 1/2 probability of being picked and the one we have stored has a 1/2 probability of being picked (we can use any programming language’s built-in random function). After picking one, we replace the current stored byte with the one we picked. Now it gets interesting. When we get the third one, it has 1/3 probability of being picked and the one we have stored has a 2/3 probability of being picked. We pick one of the two bytes based on this probability. We can keep doing this forever.
Just generalizing, when we get the nth byte, it has a 1/n probability of being picked and the byte we have stored has (n-1)/n probability of being picked.
Sometimes, this question is a little confusing. The question said you have only space to store one byte but there is an assumption that you have space to store variables to track the number of bytes and so on. Another assumption is that the stream of bytes is not infinite. If not, we won’t be able to calculate (n-1)/n.
If you have suggestions or comments, they are always welcome.
Like this post? Please Digg it or Stumble it.
Related posts:


Yes this should state that there is space for other calculations, as you need to store n !
———–
When we get the third one, it has 1/3 probability of being picked and the one we have stored has a 2/3 probability of being picked.
———–
Why do you think the probability of being picked for the one already stored is 2/3 ? All the 3 bytes should have equal prob. of 1/3 each being picked rt ?
- Pavan
Pavan,
Since the one we stored represents all the bytes which have passed (2 bytes so far), we have to give it more probability. Let’s label the bytes, A, B and C (where A and B are the first two bytes).
P(A) = (Probability of A being picked in the first round and then again in the next round) = 1/2 (A out of A,B) * 2/3 ( A out of A/B and C ) = 1/3
P(B) = (Probability of B being picked in the first round and then again in the next round) = 1/2 (B out of A,B) * 2/3 ( B out of A/B and C ) = 1/3
P(C) = (Probability of C being picked in the second round) = 1/3 ( C out of A/B and C ) = 1/3
Does that make it clear?
Just to make it clear. (n-1)/n is not undefined when n tends to infinity. it is
1 – 1/n = 1
In words it means that the previous byte always gets picked as n tends to infinity. So no assumptions are necessary.
Hi
I still have doubt:
when we have A B and C.
P(A)c = (when A is stored and C is the new byte, A has been stored then there are only 2 bytes that A and C) = 1/2
P(A)abc = P(A)b * P(A)c
= 1/2 * 1/2
Similarly P(B)abc = 1/2 * 1/2
And P(C) = 1/2
P(A) = 1/2(probability of A being chosen of A and B)* 1/2(now)
Ok Let’s walk through the probability calculations for picking A.
P(A out of AB) = 1/2 (probability of picking A is 1/2 and B is 1/2)
Once we move on to the next round. P(A out of AC) is 2/3 and P(C out of AC) = 1/3 since A carries the weight of B as well.
P(A) = P(A out of AB)*P(A out of AC) = 1/2 * 2/3 = 1/3
Similarly,
P(B) = P(B out of AB)*P(B out of BC) = 1/2 * 2/3 = 1/3
P(C) = P(A out of AB)*P(C out of AC) + P(B out of AB)*P(C out of BC) = 1/2 * 1/3 + 1/2 * 1/3 = 1/3
Still having doubts?
In the theory of computation, the solution involves a log space random Turing machine.
The logarithmic space is for storing the counter that counts the number of byte seen.