AI × Quant Trader Series — Day 22
Memory Layout in C++
Reading time: ~20 minutes
Prerequisites: Basic C++, CPU Cache Optimization, Shared Memory IPC
Focus: understanding how memory layout affects performance in modern trading systems
Part 1: Introduction
Two programs may execute exactly the same algorithm.
Both may have the same computational complexity.
Yet one consistently runs twice as fast.
Why?
In many cases, the answer has nothing to do with the algorithm itself.
It has everything to do with how data is arranged in memory.
For modern processors, memory layout is often one of the largest determinants of performance.
This is especially true in High Frequency Trading, where millions of market events are processed every second.
Good software is not only about writing efficient code.
It is also about organizing data efficiently.
Part 2: What Is Memory Layout?
Memory layout describes how objects are organized in memory.
For example, consider this simple structure:
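As a minimal sketch, a market-data record might look like the following. The Tick type and its field names are hypothetical, and the sizes mentioned afterwards assume a typical 64-bit ABI such as x86-64 Linux.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical market-data record; names are illustrative only.
struct Tick {
    std::uint32_t instrument_id;  // 4 bytes
    double        price;          // 8 bytes, normally 8-byte aligned
    std::uint32_t quantity;       // 4 bytes
};

// The compiler's layout decisions can be inspected at compile time.
constexpr std::size_t kTickSize      = sizeof(Tick);           // total bytes, padding included
constexpr std::size_t kTickAlignment = alignof(Tick);          // required alignment of the object
constexpr std::size_t kPriceOffset   = offsetof(Tick, price);  // where the price field begins
```

On x86-64 these typically evaluate to 24, 8, and 8: the fields hold 16 bytes of data, and the compiler adds 8 bytes of padding.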
Although it appears straightforward, the compiler decides:
- Where each field is stored
- How much padding is inserted
- How the object is aligned in memory
These decisions directly affect cache efficiency and memory bandwidth.
Part 3: Object Alignment
Modern CPUs access aligned memory more efficiently than unaligned memory.
Suppose a processor expects an 8-byte value to begin at an 8-byte boundary.
A naturally aligned value never straddles a cache-line boundary, so the processor can typically read it in a single access; a misaligned value may need two.
To control alignment, C++ provides the alignas specifier, and alignof reports the alignment a type requires.
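The sketch below shows one way to request cache-line alignment with alignas. The AlignedQuote type, the is_aligned helper, and the 64-byte line size are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most current x86-64 processors.
constexpr std::size_t kCacheLine = 64;

// Hypothetical quote record forced onto a cache-line boundary.
struct alignas(kCacheLine) AlignedQuote {
    double bid;
    double ask;
};

// alignas raises the alignment, and sizeof is rounded up to a multiple of it.
static_assert(alignof(AlignedQuote) == kCacheLine, "alignas sets the alignment");
static_assert(sizeof(AlignedQuote) % kCacheLine == 0, "size is a multiple of the alignment");

// Returns true when p lies on an alignment-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Note the trade-off: the two doubles need only 16 bytes, but the aligned object occupies a full 64.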
Alignment is particularly important for cache-sensitive applications.
Part 4: Padding
Compilers often insert unused bytes between fields.
Example:
Memory may actually look like:
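A sketch of such a struct and its likely layout follows. PaddedOrder is a hypothetical type, and the offsets in the comments assume a typical 64-bit ABI where double is 8-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record declared in an order that forces padding.
struct PaddedOrder {
    char          side;      // offset 0, 1 byte
                             // offsets 1-7: padding so price starts at offset 8
    double        price;     // offset 8, 8 bytes
    std::uint32_t quantity;  // offset 16, 4 bytes
                             // offsets 20-23: tail padding so array elements stay aligned
};

// Bytes that carry data versus bytes the object occupies.
constexpr std::size_t kPayloadBytes =
    sizeof(char) + sizeof(double) + sizeof(std::uint32_t);
constexpr std::size_t kPaddingBytes = sizeof(PaddedOrder) - kPayloadBytes;
```

Under those assumptions, 13 bytes of data occupy 24 bytes of memory, so almost half of the object is padding.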
The extra bytes improve alignment but increase memory usage.
Understanding padding helps reduce unnecessary memory traffic.
Part 5: Field Ordering
The order of fields matters.
Example A:
Example B:
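The two examples could look like the sketch below. OrderA and OrderB are hypothetical types holding identical fields, and the sizes assume a typical 64-bit ABI.

```cpp
#include <cstdint>

// Example A (hypothetical): small and large fields interleaved.
struct OrderA {
    char          side;      // offset 0, then 7 bytes of padding
    double        price;     // offset 8
    char          flag;      // offset 16, then 3 bytes of padding
    std::uint32_t quantity;  // offset 20
};                           // typically 24 bytes

// Example B (hypothetical): the same fields, largest alignment first.
struct OrderB {
    double        price;     // offset 0
    std::uint32_t quantity;  // offset 8
    char          side;      // offset 12
    char          flag;      // offset 13, then 2 bytes of tail padding
};                           // typically 16 bytes
```

Sorting fields by decreasing alignment is a common rule of thumb, not a guarantee; checking sizeof on the target platform confirms the result.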
Both structures represent the same information.
However, the second layout often contains less padding.
Less padding makes each object smaller, so more objects fit in every cache line.
Part 6: Array of Structures (AoS)
Many beginners naturally organize data as:
Memory looks like:
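A minimal sketch of the AoS organization, using a hypothetical OrderAoS record. The helper names are illustrative, and the 16-byte stride assumes 8-byte double and std::int64_t fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical order record: one struct per order.
struct OrderAoS {
    double       price;
    std::int64_t quantity;
};

// Memory: [price0 qty0][price1 qty1][price2 qty2] ...
using OrdersAoS = std::vector<OrderAoS>;

// Reads only prices, yet every cache line it touches also carries quantities.
inline double sum_prices_aos(const OrdersAoS& orders) {
    double total = 0.0;
    for (const OrderAoS& o : orders) total += o.price;
    return total;
}

// Distance in bytes between consecutive prices; requires at least two orders.
inline std::ptrdiff_t price_stride_bytes(const OrdersAoS& orders) {
    assert(orders.size() >= 2);
    return reinterpret_cast<const char*>(&orders[1].price) -
           reinterpret_cast<const char*>(&orders[0].price);
}
```

Consecutive prices sit 16 bytes apart here, so half of each loaded cache line holds quantities the price loop never reads.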
This approach is intuitive.
However, an algorithm that needs only prices still pulls the neighbouring quantities into cache, because memory is fetched a whole cache line at a time.
That unused data wastes memory bandwidth.
Part 7: Structure of Arrays (SoA)
An alternative organization is:
Memory becomes:
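A minimal sketch of the SoA organization follows. OrdersSoA and its member names are hypothetical; the key idea is one array per field, with the index tying the fields of an order together.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical order store: one array per field, linked by index.
struct OrdersSoA {
    std::vector<double>       prices;      // [price0 price1 price2 ...]
    std::vector<std::int64_t> quantities;  // [qty0 qty1 qty2 ...]

    // Appends one order, keeping both arrays the same length.
    void add(double price, std::int64_t quantity) {
        prices.push_back(price);
        quantities.push_back(quantity);
        assert(prices.size() == quantities.size());
    }
};

// Touches only the prices array; quantities are never read.
inline double sum_prices_soa(const OrdersSoA& orders) {
    return std::accumulate(orders.prices.begin(), orders.prices.end(), 0.0);
}
```

The cost of this layout is that code needing every field of a single order must now read from several arrays, so it suits column-wise scans better than per-order lookups.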
Now a pricing algorithm loads only prices.
Every byte of each fetched cache line is now a price, so cache efficiency improves.
Many numerical libraries and HFT systems prefer this layout for data-intensive workloads.
Part 8: Contiguous Memory
Processors perform best when data is stored contiguously.
Example:
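As a sketch, std::vector is the usual contiguous container in C++. The function names below are hypothetical helpers, written only to make the address pattern visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// std::vector stores its elements in one contiguous block.
inline double sum_contiguous(const std::vector<double>& prices) {
    double total = 0.0;
    // Addresses grow by sizeof(double) each step, a pattern the
    // hardware prefetcher can follow.
    for (std::size_t i = 0; i < prices.size(); ++i) total += prices[i];
    return total;
}

// Byte distance between two neighbouring elements; requires i + 1 < v.size().
inline std::ptrdiff_t stride_bytes(const std::vector<double>& v, std::size_t i) {
    assert(i + 1 < v.size());
    return reinterpret_cast<const char*>(&v[i + 1]) -
           reinterpret_cast<const char*>(&v[i]);
}
```

The stride always equals sizeof(double), which is what makes the access pattern predictable.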
Sequential access allows hardware prefetchers to load future cache lines automatically.
In contrast, pointer-based structures such as linked lists scatter their nodes across the heap, so each step of a traversal may jump to an unrelated address and miss the cache.
Whenever possible, contiguous memory should be preferred.
Part 9: Hot Data and Cold Data
Not every field is accessed equally often.
Example:
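A sketch of a record that mixes both kinds of field is shown below. OrderRecord, the 48-byte comment buffer, and the 64-byte cache line are assumptions chosen to make the arithmetic easy to follow.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical order record mixing hot and cold fields.
struct OrderRecord {
    double       price;        // read on every strategy tick
    std::int64_t quantity;     // read on every strategy tick
    char         comment[48];  // free-text note, rarely read
};

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Bytes the strategy uses versus bytes that merely ride along.
constexpr std::size_t kHotBytes  = sizeof(double) + sizeof(std::int64_t);
constexpr std::size_t kColdBytes = sizeof(OrderRecord) - kHotBytes;
```

With these sizes each record fills an entire 64-byte line, although only 16 of those bytes are hot.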
A trading strategy may only use:
- Price
- Quantity
The comment field is rarely accessed.
Professional systems often separate:
- Hot data: frequently accessed, and must remain cache-friendly.
- Cold data: rarely accessed, and can be stored elsewhere.
Separating hot and cold data reduces cache pollution.
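One possible way to split the data is sketched here, assuming the hot fields are a price and a quantity and the cold field is a free-text comment. HotOrder, ColdOrder, and OrderStore are hypothetical names, and parallel vectors are only one of several ways to do the split.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hot part: small, fixed size, scanned constantly.
struct HotOrder {
    double       price;
    std::int64_t quantity;
};

// Cold part: rarely read, kept out of the hot array.
struct ColdOrder {
    std::string comment;
};

class OrderStore {
public:
    // Stores both parts under the same index and returns that index.
    std::size_t add(double price, std::int64_t quantity, std::string comment) {
        hot_.push_back(HotOrder{price, quantity});
        cold_.push_back(ColdOrder{std::move(comment)});
        assert(hot_.size() == cold_.size());
        return hot_.size() - 1;
    }

    // Hot path: reads only the densely packed hot array.
    double total_notional() const {
        double total = 0.0;
        for (const HotOrder& o : hot_) {
            total += o.price * static_cast<double>(o.quantity);
        }
        return total;
    }

    // Cold path: looked up by index only when needed.
    const std::string& comment(std::size_t index) const {
        return cold_.at(index).comment;
    }

private:
    std::vector<HotOrder>  hot_;
    std::vector<ColdOrder> cold_;
};
```

Assuming 64-byte cache lines, four 16-byte HotOrder entries fit in each line, so a scan over prices and quantities never drags comment data into the cache.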
Part 10: False Sharing
Memory layout also affects multi-threaded performance.
Suppose two threads modify different variables stored inside the same cache line.
Although the variables are unrelated, each write forces the cache-coherence protocol to invalidate the line in the other core's cache, so the line bounces back and forth between cores.
Performance can degrade dramatically.
This phenomenon is known as False Sharing.
Proper alignment and padding can eliminate this issue.
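The layout side of that fix can be sketched as follows. The type names are hypothetical and the 64-byte cache line is an assumption; in a real system each counter would be updated by its own thread.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most current x86-64 processors.
constexpr std::size_t kCacheLine = 64;

// Prone to false sharing: both counters normally land in the same line.
struct SharedLineCounters {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// alignas pads each counter out to a full cache line of its own.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Free of false sharing: a and b start on different cache lines.
struct SeparateLineCounters {
    PaddedCounter a;
    PaddedCounter b;
};

// The work one thread would do on its own counter.
inline void bump(PaddedCounter& counter, std::uint64_t times) {
    for (std::uint64_t i = 0; i < times; ++i) {
        counter.value.fetch_add(1, std::memory_order_relaxed);
    }
}
```

The padded version spends 128 bytes on two counters instead of 16, which is the usual price of removing the contention.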
Part 11: Memory Layout in High Frequency Trading
Trading systems continuously process:
- Market data
- Order books
- Positions
- Risk limits
- Execution reports
These structures are accessed millions of times every second.
Poor memory layout increases:
- Cache misses
- Memory bandwidth consumption
- CPU stalls
Professional trading platforms therefore invest significant effort in organizing data efficiently before optimizing algorithms.
Part 12: Where godzilla.dev Fits
The design of godzilla.dev places strong emphasis on data-oriented programming and cache-aware memory layouts.
Core trading structures are organized to reduce unnecessary memory accesses while supporting high-throughput event processing.
Rather than treating memory organization as an implementation detail, the framework considers it a fundamental part of system architecture.
This philosophy complements other low-latency techniques such as shared memory communication, lock-free programming, and event-driven processing.
Part 13: Key Takeaways
Memory layout determines how data is organized inside memory.
Good layouts improve:
- Cache locality
- Effective use of memory bandwidth
- CPU utilization
- Overall throughput
Key optimization techniques include:
- Proper alignment
- Reducing padding
- Field reordering
- Contiguous storage
- Structure of Arrays (SoA)
- Separating hot and cold data
For High Frequency Trading systems, memory layout is often as important as algorithm design itself.
Performance Engineering Notes
When optimizing low-latency software, engineers often focus on reducing computation.
In practice, memory movement frequently dominates execution time.
Organizing data to match modern CPU architectures can produce larger performance improvements than replacing one algorithm with another.
Design your data first.
Then optimize your code.
What's Next?
The next article explores one of the most common memory-related performance issues in multi-threaded software:
- What is False Sharing?