You observe a sample of measurements taken with a fixed-length ruler. If an object is shorter than the ruler, you observe its actual length; otherwise, you observe the length of the ruler. What would be a good estimator of the ruler length?

Question

Anonymous · Accepted Answer

Get rid of the measurements that are equal to the ruler length. Then take the average of the remaining measurements, which lie in the range (0, ruler_length); the ruler length is 2 times this average.
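A minimal Python sketch of this answer, under two assumptions the question does not state: the largest reading is taken to be the ruler length, and the object lengths below the ruler are uniform on (0, ruler_length), which is what makes the factor of 2 work. The function name is hypothetical.

```python
import random
import statistics


def ruler_length_from_uncensored_mean(measurements):
    """Drop readings equal to the ruler length, then double the mean of the rest.

    Assumptions (not given in the question): the ruler length is the sample
    maximum, and uncensored lengths are uniform on (0, ruler_length), so their
    mean is ruler_length / 2.
    """
    cap = max(measurements)
    # Readings sitting exactly at the maximum are treated as capped by the ruler.
    uncensored = [x for x in measurements if x < cap]
    if not uncensored:
        raise ValueError("every reading is capped; the mean of the rest is undefined")
    return 2 * statistics.mean(uncensored)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated example (illustrative): objects ~ Uniform(0, 15), ruler length 10.
    readings = [min(rng.uniform(0, 15), 10.0) for _ in range(10_000)]
    print(ruler_length_from_uncensored_mean(readings))  # roughly 10
```

If no object was actually longer than the ruler, this drops a genuine reading and doubles the mean of the rest, which then estimates the range of the object lengths rather than the ruler.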

Anonymous · Answer

My initial answer was to use the maximum of the sample. However, that is a biased estimator: the sample maximum can never exceed the ruler length, so it tends to fall short. How can you account for the bias and come up with an unbiased estimator? I think this is where you need to start making assumptions about the distribution; assuming a uniform distribution would let you estimate the bias.
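A small Monte Carlo sketch of that bias, assuming for illustration only that object lengths are uniform on (0, object_max) and that readings are capped at ruler_length. Both parameter names are hypothetical.

```python
import random
import statistics


def mean_of_sample_max(ruler_length, object_max, n, trials, seed=0):
    """Monte Carlo estimate of E[max reading] for samples of size n.

    Illustrative assumption: objects ~ Uniform(0, object_max), and each reading
    is capped at ruler_length, as described in the question.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        readings = [min(rng.uniform(0, object_max), ruler_length) for _ in range(n)]
        maxima.append(max(readings))
    return statistics.mean(maxima)


if __name__ == "__main__":
    # No object exceeds the ruler: the average maximum falls well short of 10.
    print(mean_of_sample_max(10.0, 10.0, n=5, trials=20_000))
    # Most samples contain an oversized object: the average maximum is close to 10.
    print(mean_of_sample_max(10.0, 20.0, n=5, trials=20_000))
```

When no object exceeds the ruler, the average maximum for n = 5 should sit near 5/6 of the ruler length. Once most samples contain an oversized object, the maximum equals the ruler length almost every time and the bias nearly vanishes.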

Anonymous · Answer

Assuming the measurements are on a continuous scale, there would be a lot of probability mass on the single point exactly corresponding to the ruler's length, so I'd imagine you could use something akin to a mode.

Anonymous · Answer

I came up with a solution: if we know the distribution of the actual lengths (including its parameters), then the expected fraction of capped measurements (those recorded as the ruler length) should equal the probability that an actual length is greater than the length of the ruler.
Not sure this is correct; I interview on Friday. Good luck to me.
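A sketch of that idea, assuming hypothetically that the true lengths are exponential with a known rate, so that P(X > L) = exp(-rate * L). Setting this equal to the observed fraction of capped readings and solving gives L = -ln(fraction) / rate. Readings equal to the sample maximum are treated as capped; the function name is hypothetical.

```python
import math
import random


def ruler_length_from_capped_fraction(measurements, rate):
    """Solve P(X > L) = (observed fraction of capped readings) for L.

    Hypothetical assumption: true object lengths are exponential with a known
    rate, so P(X > L) = exp(-rate * L) and L = -ln(fraction) / rate.
    Readings equal to the sample maximum are treated as capped.
    """
    cap = max(measurements)
    n_capped = sum(1 for x in measurements if x == cap)
    if n_capped < 2:
        raise ValueError("fewer than two readings at the maximum; censoring is not evident")
    capped_fraction = n_capped / len(measurements)
    return -math.log(capped_fraction) / rate


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated example (illustrative): exponential lengths with rate 0.5, ruler length 3.
    readings = [min(rng.expovariate(0.5), 3.0) for _ in range(10_000)]
    print(ruler_length_from_capped_fraction(readings, rate=0.5))  # roughly 3
```

In this setting the capped readings already sit at the ruler length, so the sketch mainly shows that the equation is consistent; it uses only how many readings were capped, not where the cap sits.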

Jacob Curtis · Answer

The mode should work, right? The length of the ruler is likely to be the only specific value that shows up more than once in the data.
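A minimal sketch of the mode-based estimate, assuming the lengths are recorded on a continuous scale so that uncensored readings rarely repeat. The function name is hypothetical.

```python
from collections import Counter


def ruler_length_from_mode(measurements):
    """Return the most frequent reading, taken to be the ruler length.

    Assumption: lengths are measured on a continuous scale, so uncensored readings
    rarely repeat, while every oversized object is recorded as exactly the ruler length.
    """
    value, count = Counter(measurements).most_common(1)[0]
    if count < 2:
        raise ValueError("no repeated reading, so no evidence that any reading was capped")
    return value
```

For example, `ruler_length_from_mode([0.31, 1.72, 2.0, 0.94, 2.0])` returns `2.0`. If readings are rounded to a coarse resolution, genuine lengths can tie too, so in practice the mode may need to be checked against the largest value.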

Anonymous · Answer

$\hat{L} = \frac{N}{N+1} \max(X_1, X_2, \dots, X_N)$ is an unbiased estimator.

Anonymous · Answer

$\hat{L} = \frac{2}{N} \sum_{i=1}^{N} X_i$ is another unbiased estimator.
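Both formulas above treat every reading as an uncensored draw from Uniform(0, L), an assumption the question does not state. Under that assumption, a quick Monte Carlo sketch can check that 2 times the sample mean averages out to L; the helper names are hypothetical.

```python
import random
import statistics


def twice_the_mean(sample):
    # Under Uniform(0, L), E[X] = L / 2, so 2 * (sample mean) has expectation L.
    return 2 * statistics.mean(sample)


def average_estimate(estimator, true_length, n, trials, seed=0):
    """Average an estimator over `trials` simulated samples of size n from Uniform(0, true_length)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.uniform(0, true_length) for _ in range(n)]
        total += estimator(sample)
    return total / trials


if __name__ == "__main__":
    print(average_estimate(twice_the_mean, true_length=10.0, n=5, trials=20_000))  # roughly 10
```

The same harness can be pointed at the max-based formulas. For the same N, the bias-corrected maximum has the smaller variance under this assumption: $L^2 / (N(N+2))$ versus $L^2 / (3N)$ for twice the mean.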

Anonymous · Answer

The length of the ruler would be the censored value in the data. If you draw a histogram of the observed values, there should be a point mass at the largest value, which is the length of the ruler. The more observations you have, the better the estimate.
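A minimal sketch of the histogram check, assuming the readings are a Python list of floats on a continuous scale. It reports the largest reading and the share of the sample sitting exactly on it, which is also the size of the jump in the empirical CDF at that value. The function name is hypothetical.

```python
def top_point_mass(measurements):
    """Return the largest reading and the share of the sample sitting exactly on it.

    Assumption: lengths are continuous, so a share well above 1 / len(measurements)
    at the top value signals readings capped at the ruler length.
    """
    top = max(measurements)
    share = measurements.count(top) / len(measurements)
    return top, share
```

For example, `top_point_mass([1.2, 4.0, 2.5, 4.0])` returns `(4.0, 0.5)`: half of the readings sit on the largest value, which points to 4.0 as the ruler length.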

Anonymous · Answer

I think it should be $\frac{N+1}{N} \max(X_1, \dots, X_N)$. Does anyone agree with me?
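Assuming (this is not stated in the question) that the readings $X_1, \dots, X_N$ are i.i.d. Uniform$(0, L)$ with none capped, the expectation of the sample maximum works out as follows, which supports the $\frac{N+1}{N}$ factor:

```latex
\begin{aligned}
P\left(\max_i X_i \le x\right) &= \left(\frac{x}{L}\right)^{N}, \qquad 0 \le x \le L,\\
\mathbb{E}\left[\max_i X_i\right] &= \int_0^L x \cdot \frac{N x^{N-1}}{L^{N}}\,dx = \frac{N}{N+1}\,L,\\
\mathbb{E}\left[\frac{N+1}{N}\max_i X_i\right] &= L,
\qquad
\mathbb{E}\left[\frac{N}{N+1}\max_i X_i\right] = \frac{N^2}{(N+1)^2}\,L < L.
\end{aligned}
```

With censoring, the maximum equals $L$ exactly as soon as one object is longer than the ruler, so this correction matters mainly when few or no readings are capped.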

Anonymous · Answer

If we know the distribution, I'd analyse the tail of the cumulative distribution function.

Anonymous · Answer

round(central tendency) * 2

Anonymous · Answer

Please ignore my previous two answers above. I misread the question and thought it was a regression problem, when it wasn't.

Ludovico Grossi · Answer

What kind of distribution is it?
What do we do with the data?
What precision do we need?
What happens if we "lose" the oversized data?

Anonymous · Answer

We should first ask whether we have any prior knowledge about the distribution of object lengths.

Google

Google interview question