Pick a Random Byte

Question: You have a stream of bytes from which you can read one byte at a time. You only have enough space to store one byte. After processing those bytes, you have to return a random byte. Note: The probability of picking any one of those bytes should be equal.

Answer: Since we can only store one byte at a time, we have to be able to work with the bytes as they come in. We can’t store everything, count the number of bytes and pick one. That would be too easy wouldn’t it?

Let’s think it through. We don’t know how many bytes there are, so at any point in time we should have a random byte picked from all the bytes we have seen so far. We shouldn’t worry about the total number of bytes at any time.

When we get the first byte, it is simple. There is only one so we store it. When we get the second one, it has 1/2 probability of being picked and the one we have stored has a 1/2 probability of being picked (we can use any programming language’s built-in random function). After picking one, we replace the current stored byte with the one we picked. Now it gets interesting. When we get the third one, it has 1/3 probability of being picked and the one we have stored has a 2/3 probability of being picked. We pick one of the two bytes based on this probability. We can keep doing this forever.

Just generalizing, when we get the nth byte, it has a 1/n probability of being picked and the byte we have stored has (n-1)/n probability of being picked.

Sometimes, this question is a little confusing. The question said you have only space to store one byte but there is an assumption that you have space to store variables to track the number of bytes and so on. Another assumption is that the stream of bytes is not infinite. If not, we won’t be able to calculate (n-1)/n.

If you have suggestions or comments, they are always welcome.

Like this post? Please Digg it or Stumble it.

Get a free subscription to Oracle magazine published by Oracle Corp.
 Powered by Max Banner Ads 

Related posts:

  1. 10 Google Interview Puzzles
  2. Red and Blue Marbles
  3. Probability of a Car Passing By
  4. Boys and Girls
  5. The Ant Problem

7 Responses

  1. thetoolman says:

    Yes this should state that there is space for other calculations, as you need to store n !

  2. Pavan says:

    ———–
    When we get the third one, it has 1/3 probability of being picked and the one we have stored has a 2/3 probability of being picked.
    ———–

    Why do you think the probability of being picked for the one already stored is 2/3 ? All the 3 bytes should have equal prob. of 1/3 each being picked rt ?

    - Pavan

  3. Aswin says:

    Pavan,
    Since the one we stored represents all the bytes which have passed (2 bytes so far), we have to give it more probability. Let’s label the bytes, A, B and C (where A and B are the first two bytes).

    P(A) = (Probability of A being picked in the first round and then again in the next round) = 1/2 (A out of A,B) * 2/3 ( A out of A/B and C ) = 1/3
    P(B) = (Probability of B being picked in the first round and then again in the next round) = 1/2 (B out of A,B) * 2/3 ( B out of A/B and C ) = 1/3
    P(C) = (Probability of C being picked in the second round) = 1/3 ( C out of A/B and C ) = 1/3

    Does that make it clear?

  4. Sancho says:

    Just to make it clear. (n-1)/n is not undefined when n tends to infinity. it is
    1 – 1/n = 1

    In words it means that the previous byte always gets picked as n tends to infinity. So no assumptions are necessary.

  5. Hi
    I still have doubt:
    when we have A B and C.

    P(A)c = (when A is stored and C is the new byte, A has been stored then there are only 2 bytes that A and C) = 1/2
    P(A)abc = P(A)b * P(A)c
    = 1/2 * 1/2
    Similarly P(B)abc = 1/2 * 1/2

    And P(C) = 1/2
    P(A) = 1/2(probability of A being chosen of A and B)* 1/2(now)

  6. Doggie says:

    Ok Let’s walk through the probability calculations for picking A.

    P(A out of AB) = 1/2 (probability of picking A is 1/2 and B is 1/2)
    Once we move on to the next round. P(A out of AC) is 2/3 and P(C out of AC) = 1/3 since A carries the weight of B as well.

    P(A) = P(A out of AB)*P(A out of AC) = 1/2 * 2/3 = 1/3

    Similarly,
    P(B) = P(B out of AB)*P(B out of BC) = 1/2 * 2/3 = 1/3
    P(C) = P(A out of AB)*P(C out of AC) + P(B out of AB)*P(C out of BC) = 1/2 * 1/3 + 1/2 * 1/3 = 1/3

    Still having doubts?

  7. Miyuki says:

    In the theory of computation, the solution involves a log space random Turing machine.

    The logarithmic space is for storing the counter that counts the number of byte seen.

Leave a Reply

Using Gravatars in the comments - get your own and be recognized!

XHTML: These are some of the tags you can use: <a href=""> <b> <blockquote> <code> <em> <i> <strike> <strong>