Imagine our data platform's query performance is degrading because too many small files have accumulated across its partitions. We have a limited daily budget for 'compaction' (merging small files into larger ones).
How would you design an algorithm to decide which partitions to compact each day to get the biggest performance improvement within that budget?
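One way to frame this is as a budgeted selection problem, essentially a 0/1 knapsack: each partition has a compaction cost and an estimated benefit, and we want the highest total benefit whose cost fits the daily budget. The sketch below assumes a cost measured as data rewritten (`rewrite_gb`) and a benefit proxy of small-file count weighted by how often queries read the partition. The `Partition` fields, the benefit model, and the function names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical per-partition statistics; names and units are illustrative.
    name: str
    small_files: int        # files below the target size
    rewrite_gb: float       # cost proxy: data rewritten by compaction
    queries_per_day: float  # how often queries touch this partition


def estimated_benefit(p: Partition) -> float:
    # Assumed benefit model: fewer small files to open, weighted by read frequency.
    return p.small_files * p.queries_per_day


def choose_partitions(partitions: list[Partition], budget_gb: float) -> list[str]:
    """Greedy knapsack heuristic: consider partitions in order of benefit per
    GB rewritten and take each one that still fits in the remaining budget."""
    candidates = [
        p for p in partitions if p.rewrite_gb > 0 and estimated_benefit(p) > 0
    ]
    candidates.sort(key=lambda p: estimated_benefit(p) / p.rewrite_gb, reverse=True)

    chosen: list[str] = []
    spent = 0.0
    for p in candidates:
        # Skip (rather than stop at) items that do not fit, so cheaper
        # partitions later in the order can still use leftover budget.
        if spent + p.rewrite_gb <= budget_gb:
            chosen.append(p.name)
            spent += p.rewrite_gb
    return chosen
```

For example, with partitions costing 2.0, 1.0, and 4.0 GB and a 3.0 GB budget, the function picks the two with the best benefit-per-GB ratios that fit. Ratio-greedy is a heuristic and is not guaranteed optimal for 0/1 knapsack; if costs can be discretized, a dynamic-programming solution gives the exact optimum, and in practice the benefit estimates would need to come from real query statistics.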