Performance work has a glamour problem. The fun part is the trick: the allocation removed, the cache miss avoided, the branch made predictable, the bit-twiddled inverse square root everyone remembers from Quake.
The useful part is usually less photogenic: finding out what is actually slow.
A microbenchmark can lie in very respectable ways. The compiler removes the work. The CPU predicts the neat branch. The inputs are too friendly. The function is fast in isolation and irrelevant once it is back inside the real program.
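Here is the first lie in miniature, as a Go sketch (names illustrative; save it in a _test.go file and run `go test -bench=.`). Whether the compiler actually deletes the unused call depends on the toolchain and on inlining, which is the point: the sink idiom makes the result observable so the work cannot legally vanish.

```go
package bench

import "testing"

// fib is deliberately naive so there is real work on the table.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// The result is discarded. The compiler is allowed to notice that
// nothing observes fib(20) and drop the call; if it does, the loop
// times an empty body and reports a heroic number.
func BenchmarkFibDiscarded(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fib(20)
	}
}

// A package-level sink keeps the result observable, so the measured
// work has to actually happen.
var sink int

func BenchmarkFibKept(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = fib(20)
	}
}
```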
Services lie differently. The code path you optimized is not hot. The network dominates. The database is the problem. The average improved and p99 got worse, which is the kind of victory that ruins a morning.
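The mean-versus-p99 trap is easy to reproduce with made-up numbers. In this sketch (every latency hypothetical), a change speeds up 98% of requests and shoves the other 2% onto a slow path:

```go
package main

import (
	"fmt"
	"sort"
)

func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// percentile is a toy; a real service reads this off a histogram.
func percentile(sorted []float64, p float64) float64 {
	return sorted[int(p*float64(len(sorted)-1))]
}

func main() {
	// Before: every request takes a flat 10ms.
	before := make([]float64, 1000)
	for i := range before {
		before[i] = 10
	}

	// After: 98% of requests drop to 4ms, but 2% now hit a slow
	// path (a retry, a cold cache, lock contention) at 200ms.
	after := make([]float64, 1000)
	for i := range after {
		if i%50 == 0 {
			after[i] = 200
		} else {
			after[i] = 4
		}
	}

	sort.Float64s(before)
	sort.Float64s(after)
	fmt.Printf("before: mean=%.1fms  p99=%.0fms\n", mean(before), percentile(before, 0.99))
	fmt.Printf("after:  mean=%.1fms  p99=%.0fms\n", mean(after), percentile(after, 0.99))
}
```

It prints mean 10.0ms → 7.9ms and p99 10ms → 200ms. Same change, opposite verdicts, depending on which number made it into the report.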
The optimizations I trust have receipts:
- What workload was measured?
- What changed?
- What got better?
- What got worse?
- Is this still true next week?
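A receipt can be as small as a benchmark whose name encodes the workload. A sketch in Go (the function under test and the sizes are stand-ins): sub-benchmarks answer the first question by putting the workload in the name, and running them with `go test -bench=. -count=10` before and after the change, then diffing with benchstat, answers most of the rest, noise included.

```go
package bench

import (
	"fmt"
	"strings"
	"testing"
)

var result []string

// The workload lives in the benchmark name, so the receipt is
// legible later: BenchmarkSplit/fields=100 says what was measured.
func BenchmarkSplit(b *testing.B) {
	for _, n := range []int{1, 100, 10000} {
		input := strings.Repeat("field,", n)
		b.Run(fmt.Sprintf("fields=%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				result = strings.Split(input, ",")
			}
		})
	}
}
```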
The best answers are often boring. Stop parsing the same thing twice. Batch writes. Move work out of the request path. Avoid dragging a large object through code that needs three fields. Boring is fine. Boring is often the clue that the system got simpler, not just faster.
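"Stop parsing the same thing twice" in miniature, with a hypothetical log-scraping helper: the first version recompiles a regular expression on every call; the boring version compiles it once. The fix is also the simplification, since the hot function shrinks to the part that matters.

```go
package hotpath

import "regexp"

// Before: respectable-looking code that re-parses the same pattern
// on every call.
func extractIDSlow(line string) string {
	re := regexp.MustCompile(`id=([0-9]+)`) // recompiled every time
	m := re.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

// After: compile once at package init and reuse. A compiled Regexp
// is safe for concurrent use, so one is enough for every request.
var idPattern = regexp.MustCompile(`id=([0-9]+)`)

func extractID(line string) string {
	m := idPattern.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}
```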