2025-05-27

Profiling C++ code

Performance optimization starts with measurement. Working on sfsim taught me that assumptions about performance bottlenecks are often wrong. Profiling reveals where time actually goes.

Sometimes you just need to know how long something takes. A basic timer covers most day-to-day profiling needs.
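A minimal sketch of such a timer, assuming a scope-based design. ScopedTimer and its label are names made up for illustration, not code from sfsim. It uses std::chrono::steady_clock because that clock is monotonic.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical helper (not from sfsim): measures wall-clock time for a scope.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;
    std::string label_;
    Clock::time_point start_;

public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)), start_(Clock::now()) {}

    // Milliseconds elapsed since construction.
    double elapsed_ms() const {
        return std::chrono::duration<double, std::milli>(Clock::now() - start_).count();
    }

    // Prints the elapsed time when the enclosing scope ends.
    ~ScopedTimer() {
        std::cout << label_ << ": " << elapsed_ms() << " ms\n";
    }
};
```

To use it, wrap the code of interest in a block and declare a timer at the top, for example { ScopedTimer t("physics step"); /* work to measure */ }. The measurement prints when the block exits.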

For deeper analysis without modifying code, the Linux perf sampling profiler shows where CPU time goes:

perf record -g ./sfsim

perf report

The report shows the percentage of samples attributed to each function. Functions at the top are the likeliest bottlenecks. The -g flag records call stacks, showing not just what's slow but what calls the slow code.

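Flame graphs visualize this data beautifully. The commands below use the scripts from Brendan Gregg's FlameGraph repository, assumed here to be cloned into ./FlameGraph: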

perf record -g ./sfsim
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > profile.svg

Wide bars are expensive functions. The hierarchical view shows the complete call chain. One glance reveals optimization opportunities.

For real-time applications, tracking frame times reveals performance patterns:

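#include <chrono>
#include <iostream>

class FrameTimer {
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by system clock changes
    // Start at construction time so the first tick() does not report a bogus, huge frame.
    Clock::time_point last_frame = Clock::now();

public:
    // Call once per frame; logs any frame that misses the 60 FPS budget.
    void tick() {
        auto now = Clock::now();
        auto frame_time = std::chrono::duration<double, std::milli>(now - last_frame).count();

        if (frame_time > 16.67) {  // Slower than 60 FPS
            std::cout << "Slow frame: " << frame_time << " ms\n";
        }

        last_frame = now;
    }
};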

Logging slow frames helps correlate performance drops with game events. Maybe particle explosions cause spikes. Maybe certain camera angles stress the renderer.

Hot functions dominate the profile. In sfsim, collision detection often takes 30-40% of frame time. That's where optimization efforts pay off. Unexpected costs hide everywhere - string formatting in logging statements, vector resizing in loops, temporary object creation. Profiling exposes these hidden expenses.
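Vector resizing is an easy one to demonstrate. The sketch below is illustrative only (count_reallocations is a made-up helper, not sfsim code). It counts how often a std::vector changes capacity while elements are appended, with and without an up-front reserve().

```cpp
#include <cstddef>
#include <vector>

// Counts how many times a vector reallocates while n elements are appended.
// If reserve_first is true, the full capacity is requested up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> values;
    if (reserve_first) {
        values.reserve(n);
    }

    std::size_t reallocations = 0;
    std::size_t last_capacity = values.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != last_capacity) {  // capacity changed, so storage was reallocated
            ++reallocations;
            last_capacity = values.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set, the count is zero. Without it, the vector reallocates and copies its contents several times as it grows, and the exact number depends on the standard library's growth policy.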

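Cache misses can matter as much as algorithmic complexity on modern CPUs. Random memory access can be one to two orders of magnitude slower than sequential access, depending on the hardware and the size of the working set. Entity-Component-System architectures help by storing components contiguously and processing them in cache-friendly order.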
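A rough way to see the effect is to sum the same array twice, once in ascending index order and once in shuffled order, and time each pass. The helpers below are a sketch with made-up names. Both orderings go through an index array, so the comparison isolates the access pattern rather than the loop overhead.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sums data by visiting elements in the order given by `order`.
long long sum_in_order(const std::vector<int>& data,
                       const std::vector<std::size_t>& order) {
    long long total = 0;
    for (std::size_t i : order) {
        total += data[i];
    }
    return total;
}

// Indices 0..n-1 in ascending order (sequential access pattern).
std::vector<std::size_t> sequential_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    return order;
}

// The same indices shuffled (random access pattern); fixed seed for repeatable runs.
std::vector<std::size_t> shuffled_order(std::size_t n) {
    std::vector<std::size_t> order = sequential_order(n);
    std::mt19937 rng(42);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

Both calls return the same sum, so any timing difference comes from memory access alone. The gap only becomes visible once the array is much larger than the CPU caches, and the ratio varies from machine to machine.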

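Build with optimization flags but keep debug symbols so perf can map samples back to function names (if call stacks from perf record -g look truncated, adding -fno-omit-frame-pointer usually helps):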

g++ -O2 -g main.cpp -o sfsim

Profile realistic scenarios. A thousand entities behave differently than ten. Load typical game states and measure steady-state performance, not startup. Focus on the biggest bottlenecks. Optimizing a function that takes 1% of runtime cuts total runtime by at most 1%, even if that function becomes free. Find functions taking 20-30% and fix those first.
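The arithmetic behind that rule is Amdahl's law. A small sketch follows; overall_speedup is a name chosen here for illustration.

```cpp
// Amdahl's law: overall speedup when a `fraction` of total runtime (0..1)
// is made `local_speedup` times faster and the rest is left unchanged.
double overall_speedup(double fraction, double local_speedup) {
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

Making a 30% function twice as fast gives 1 / (0.70 + 0.15), roughly a 1.18x overall speedup. A 1% function gives barely more than 1.01x, no matter how fast it becomes.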

Simple profiling tools revealed massive optimization opportunities. No complex analysis needed - just measurement and targeted fixes where data showed real bottlenecks.
