Spectrum interview question

How do you benchmark LLM performance?
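
The question does not say what "performance" means, so one possible starting point is to measure both answer quality and speed over a fixed prompt set. The sketch below is an assumption-laden illustration, not a standard harness: `fake_model` is a hypothetical stand-in for a real model call, `benchmark` is a made-up helper name, exact match is only one of many possible quality metrics, and whitespace-split words are a crude proxy for tokens.

```python
import math
import statistics
import time
from typing import Callable, Dict, Sequence


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call: echoes the prompt's last word."""
    return prompt.split()[-1]


def benchmark(
    generate: Callable[[str], str],
    prompts: Sequence[str],
    references: Sequence[str],
) -> Dict[str, float]:
    """Run each prompt once and report exact-match accuracy, latency, and throughput."""
    if not prompts or len(prompts) != len(references):
        raise ValueError("prompts and references must be non-empty and equal in length")

    latencies = []
    correct = 0
    total_tokens = 0
    for prompt, reference in zip(prompts, references):
        start = time.perf_counter()
        output = generate(prompt)
        latencies.append(time.perf_counter() - start)
        # Assumption: whitespace-separated words approximate token count.
        total_tokens += len(output.split())
        correct += int(output.strip() == reference.strip())

    n = len(latencies)
    total_time = sum(latencies)
    # Nearest-rank 95th percentile of per-request latency.
    p95_index = max(0, math.ceil(0.95 * n) - 1)
    return {
        "n": n,
        "exact_match": correct / n,
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": sorted(latencies)[p95_index],
        "tokens_per_s": total_tokens / total_time if total_time > 0 else float("inf"),
    }


if __name__ == "__main__":
    prompts = ["say yes", "say no", "say maybe"]
    references = ["yes", "no", "perhaps"]
    for name, value in benchmark(fake_model, prompts, references).items():
        print(f"{name}: {value}")
```

In a real setup, `generate` would wrap the model or API under test, the prompt set would come from a held-out evaluation dataset, and each prompt would typically be run several times after a warm-up so that latency figures are stable.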