Oracle interview question

Describe 3 different optimisations applied to LLM inference.

Interview Answer

Anonymous

7 July 2025

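KV caching (store the attention keys and values of earlier tokens and reuse them at each decoding step instead of recomputing them), speculative decoding (a small draft model proposes several tokens, which the large model then verifies in a single forward pass), and operator fusion (merge adjacent operations into one kernel to cut memory traffic and kernel-launch overhead).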
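
As an illustration of the first item, KV caching, here is a minimal pure-Python sketch, not a production implementation. It assumes a single attention head and uses a hypothetical `project` helper (simple scaling) as a stand-in for the learned key, value, and query projections; the function names and the toy inputs are all made up for this example. It contrasts recomputing keys and values for the whole prefix at every step with appending only the newest token's key and value to a cache.

```python
import math


def project(x, scale):
    # Hypothetical stand-in for a learned linear projection (assumption).
    return [scale * xi for xi in x]


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def decode_without_cache(tokens):
    """Recompute keys and values for the whole prefix at every step."""
    outputs = []
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        keys = [project(x, 0.5) for x in prefix]    # redone each step
        values = [project(x, 2.0) for x in prefix]  # redone each step
        query = project(prefix[-1], 1.0)
        outputs.append(attend(query, keys, values))
    return outputs


def decode_with_cache(tokens):
    """Project each token once and keep its key and value in a cache."""
    k_cache, v_cache, outputs = [], [], []
    for x in tokens:
        k_cache.append(project(x, 0.5))  # only the new token is projected
        v_cache.append(project(x, 2.0))
        outputs.append(attend(project(x, 1.0), k_cache, v_cache))
    return outputs
```

Both functions return the same attention outputs. The cached version does one key/value projection per token, while the uncached version repeats the projections for the entire prefix at every step, so its projection work grows quadratically with sequence length.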