Google interview question

You observe a sample of measurements coming from a fixed length ruler. If the object is shorter than the ruler you observe the actual measurement. Otherwise you observe the length of the ruler. What would be a good estimator of the ruler length?

Interview Answers

Anonymous

4 Dec 2016

Get rid of the measurements that are equal to the ruler length, then take the average of the remaining measurements, which lie in the range (0, ruler_length). Assuming object lengths are uniformly distributed, ruler_length is 2 times this average value.
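A minimal sketch of this idea, under two assumptions that are not stated in the answer: object lengths are uniformly distributed, and at least one object was longer than the ruler, so the largest reading is the ruler length. The function name `estimate_from_uncensored_mean` and the simulation parameters are hypothetical.

```python
import random


def estimate_from_uncensored_mean(measurements):
    """Drop readings equal to the largest value, then return 2 * mean of the rest."""
    cap = max(measurements)  # assumed to be the ruler length
    below = [x for x in measurements if x < cap]
    if not below:
        raise ValueError("need at least one measurement below the largest value")
    return 2 * sum(below) / len(below)


# Simulated data (assumption: object lengths uniform on (0, 15), ruler length 10).
rng = random.Random(0)
ruler = 10.0
sample = [min(rng.uniform(0, 15), ruler) for _ in range(10_000)]
print(estimate_from_uncensored_mean(sample))
```

If the object lengths are not uniform, the factor of 2 no longer applies and this estimate can be far off.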

5

Anonymous

6 Mar 2017

Assuming that the measurements are on a continuous scale, you would have a lot of mass on the point exactly corresponding to the ruler's length, so you could use something akin to a mode I'd imagine.

1

Anonymous

25 Oct 2017

The mode should work, right? The length of the ruler is likely to be the only specific value that shows up more than once in the data.
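As a rough illustration of the mode idea, the sketch below returns the most frequent value and gives up when nothing repeats. It assumes measurements are recorded on a continuous scale, so that ties among genuine measurements are unlikely; the helper name `estimate_by_mode` is hypothetical.

```python
from collections import Counter


def estimate_by_mode(measurements):
    """Return the most frequent value, or None if no value appears more than once."""
    value, count = Counter(measurements).most_common(1)[0]
    if count < 2:
        return None  # no repeated value, so no evidence of censoring
    return value


print(estimate_by_mode([3.2, 10.0, 7.9, 10.0, 5.1, 10.0]))  # → 10.0
```

With rounded or coarsely recorded measurements, other values may repeat as well, so the mode is only a reasonable guess when the repeated value is also the largest one.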

Anonymous

6 Oct 2015

I came up with a solution: if we know the distribution of the actual measurements and their values, then the expected proportion of wrong (capped) measurements should equal the probability that an actual measurement is greater than the length of the ruler. Not sure this is correct, and I will interview on Friday. Good luck to me.

1

Anonymous

17 July 2018

L^{hat} = N/(N+1) * max(X1,X2,X3,...XN) is an unbiased estimator

Anonymous

17 July 2018

L^{hat} = 2*sum(X)/N is another unbiased estimator

Anonymous

22 July 2018

The length of the ruler would be the censored value in the data. If you draw a histogram of the observed values, there should be a mass at the largest value, which is the length of the ruler. The more observations you have, the better the estimate.

Anonymous

15 Mar 2019

I think it should be (N+1)/N * max(X1,...,XN). Does anyone agree with me?
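One way to check which scaling is unbiased is a quick simulation. The sketch below assumes the measurements are i.i.d. uniform on (0, L) with no readings capped at the ruler length, which appears to be the model behind the max-based answers in this thread; the function name `mean_scaled_max` and all parameter values are hypothetical.

```python
import random


def mean_scaled_max(scale, n, length, trials, seed=0):
    """Average of scale * max(sample) over many simulated uniform(0, length) samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += scale * max(rng.uniform(0, length) for _ in range(n))
    return total / trials


n, length = 5, 10.0
# Theoretical mean is 10 for the (N+1)/N scaling.
print(mean_scaled_max((n + 1) / n, n, length, 20_000))
# Theoretical mean is 10 * (5/6)**2, about 6.94, for the N/(N+1) scaling.
print(mean_scaled_max(n / (n + 1), n, length, 20_000))
```

Under this uniform assumption, the (N+1)/N scaling averages out to the true length, while N/(N+1) shrinks an estimate that is already too small.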

Anonymous

2 Nov 2015

If we know the distribution... I'd analyse the tail of the cumulative distribution function.

Anonymous

28 Aug 2016

round(central tendency) * 2

Anonymous

28 Nov 2016

Please ignore my previous two answers above. I misread the question and thought it was a regression problem, when it wasn't.

Anonymous

15 June 2015

My initial answer was to use the MAX of the sample. That, however, is a biased estimator. How can you account for the bias and come up with an unbiased estimator? I think this is where you need to start making assumptions about the distribution. A uniform distribution would allow us to estimate the bias.
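To make the bias concrete, here is a short derivation under the assumption suggested above: the measurements X_1, ..., X_N are i.i.d. uniform on (0, L) and none of them is capped at the ruler length. The symbols L, N and X_i are notation chosen for this sketch.

```latex
\begin{align*}
P\left(\max_i X_i \le x\right) &= \left(\frac{x}{L}\right)^{N}, \qquad 0 \le x \le L \\
E\left[\max_i X_i\right] &= \int_0^L x \cdot \frac{N x^{N-1}}{L^{N}} \, dx = \frac{N}{N+1}\, L \\
\text{bias} &= E\left[\max_i X_i\right] - L = -\frac{L}{N+1} \\
\hat{L} &= \frac{N+1}{N} \max_i X_i, \qquad E[\hat{L}] = L
\end{align*}
```

So under the uniform assumption the sample maximum underestimates L by L/(N+1) on average, and rescaling by (N+1)/N removes that bias. If some readings are capped at the ruler length, the maximum already equals L for those samples and the correction is not needed.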

2

Anonymous

21 June 2015

What kind of distribution is it? What do we do with the data? What precision should we get? What happens if we "lose" the oversized data?

Anonymous

16 Dec 2017

We should first ask whether we have any prior knowledge about the distribution of object lengths.