2025-05-25

curiously recurring template pattern (c++ idiom)

CRTP (Curiously Recurring Template Pattern): Compile-Time Polymorphism in C++

The Curiously Recurring Template Pattern (CRTP) enables compile-time polymorphism through template metaprogramming. I discovered this pattern while working on sfsim and realized it could eliminate virtual function overhead in performance-critical simulation code. The pattern's defining characteristic is a base class template that accepts its derived class as a parameter:

template <typename Derived>
class Base {
    // Base uses Derived as a compile-time type
};

class Concrete : public Base<Concrete> {
    // Concrete passes itself to Base
};

This self-referential structure allows the base class to call derived class methods without virtual dispatch (the runtime process of looking up which function to call). The compiler knows the exact type at compile time and can inline functions (copying function code directly into the calling location rather than jumping to it).

In traditional object-oriented programming, virtual functions enable polymorphism - one interface with multiple implementations. When you call a virtual function, the program must look up the correct implementation at runtime. This happens through a virtual function table (vtable), essentially a list of function pointers that each object maintains.

Consider calling shape->draw() where shape could be a Circle or Square. With virtual functions, the program must look up the object's vtable pointer (8 bytes of memory per object), find the correct draw() function in the table, then jump to that function's location in memory. This indirect function call prevents the compiler from optimizing across function boundaries. In a simulation like sfsim where thousands of objects update 60 times per second, these microseconds of overhead accumulate into noticeable performance degradation.

CRTP resolves all function calls at compile time. Instead of runtime lookups, the compiler generates direct function calls:

template <typename Derived>
class Shape {
public:
    void draw() {
        static_cast<Derived*>(this)->drawImpl();
    }
    
    double area() {
        return static_cast<Derived*>(this)->areaImpl();
    }
};

class Circle : public Shape<Circle> {
public:
    void drawImpl() { /* circle drawing logic */ }
    double areaImpl() { return 3.14 * radius * radius; }
private:
    double radius;
};

class Square : public Shape<Square> {
public:
    void drawImpl() { /* square drawing logic */ }
    double areaImpl() { return side * side; }
private:
    double side;
};

The static_cast<Derived*>(this) tells the compiler exactly which type we're working with. No runtime decisions needed. The compiler can even inline these functions, replacing the function call with the actual code, eliminating call overhead entirely.

CRTP also enables fluent interfaces where methods can be chained together while preserving type information:

template <typename Derived>
class Builder {
protected:
    Derived& self() { return static_cast<Derived&>(*this); }
    
public:
    Derived& setName(const std::string& name) {
        // name setting logic
        return self();
    }
    
    Derived& setSize(int size) {
        // size setting logic
        return self();
    }
};

class WidgetBuilder : public Builder<WidgetBuilder> {
public:
    WidgetBuilder& setColor(const std::string& color) {
        // color setting logic
        return *this;
    }
};

// Usage: builder.setName("widget").setSize(10).setColor("red");

Without CRTP, calling a base class method would return a base class reference, losing access to derived class methods; CRTP maintains the concrete type throughout the chain.

Virtual functions provide flexibility - you can store different object types in the same container and decide behavior at runtime. CRTP trades this flexibility for speed. All types must be known when the code compiles. You cannot store different CRTP-derived types in a single container. The performance difference becomes significant in tight loops. Where a virtual function call might take 10-20 nanoseconds (including the vtable lookup and indirect jump), a CRTP call can be completely inlined, taking effectively zero time.

Template code in C++ gets instantiated for each type used. A Shape<Circle> and Shape<Square> create completely separate code in the final executable. This means longer compile times as the compiler generates all variations, larger executable files due to code duplication, and more complex error messages that reference template parameters.

Debugging becomes trickier. Instead of seeing "Shape::draw()" in stack traces, you see "Shape<Circle>::draw()" with full template parameters. Error messages can span hundreds of lines for simple mistakes. Template code typically lives in header files (.h files) rather than separate implementation files (.cpp), exposing all implementation details. This means changing the base template requires recompiling every file that uses it.

sfsim uses an Entity-Component-System (ECS) architecture where game objects are composed of data components processed by systems. Traditional ECS implementations use virtual functions for component interfaces, creating overhead when iterating through thousands of entities. CRTP could optimize these hot paths. Instead of virtual calls for each component update, the compiler would generate direct calls or inline the code entirely. Since sfsim knows all component types at compile time, we don't need runtime polymorphism. The pattern aligns with modern CPU architecture. Processors predict and prefetch instructions. Virtual function calls break this prediction, causing pipeline stalls. Direct calls from CRTP allow the CPU to maintain full speed.

The self-referential nature of CRTP seems bizarre initially but there a major performance payoffs when we avoid the use of virtual functions and allow the compiler to apply optimizations.

0.0 ms