4 Months of Developing a Memory Allocator: Updating "Hakozuna" to v3.0 (hz3/hz4)
Introduction
I am excited to announce the v3.0 release of Hakozuna, a high-performance memory allocator.
- Repository: [https://github.com/hakorune/hakozuna]
- Paper & Artifacts (Zenodo v3.0): [https://zenodo.org/records/18674502]
Over the past four months, I’ve gone through countless cycles of implementation and benchmarking, tuning performance against established allocators such as mimalloc and tcmalloc.
The biggest takeaway from this journey? Instead of trying to create a "one-size-fits-all" configuration to win every race, the real solution was to branch out into specialized profiles based on use cases.
- hz3: Optimized for local-heavy / Redis-like workloads.
- hz4: Optimized for remote-heavy / high-thread workloads.
What is Hakozuna?
Hakozuna is built on Box Theory—a design philosophy centered on aggregating boundaries to isolate responsibilities. During development, I prioritized:
- Zero overhead in the hot path: Eliminating unnecessary operations where it matters most.
- Reversibility: Ensuring every optimization can be toggled via compile flags.
- Observability: Using A/B benchmarking and one-shot counters to make performance data transparent.
Benchmark Summary (Ubuntu Native, Representative Values at RUNS=10)
MT lane x remote% (Ops/s)
| Configuration | hz3 | hz4 | mimalloc | tcmalloc |
|---|---|---|---|---|
| main_r0 | 375.4M | 137.4M | 224.2M | 232.7M |
| main_r50 | 66.5M | 78.1M | 17.9M | 84.3M |
| main_r90 | 62.6M | 67.6M | 13.0M | 54.9M |
| cross128_r90 | 1.80M | 50.65M | 10.94M | 7.50M |
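To make the table easier to read at a glance, the short Python sketch below picks the fastest allocator in each row and computes its lead over the runner-up. The numbers are copied from the table above; the script and its function names are illustrative only and are not part of the Hakozuna repository.

```python
# Throughput in millions of ops/s, copied from the benchmark table.
RESULTS = {
    "main_r0":      {"hz3": 375.4, "hz4": 137.4, "mimalloc": 224.2, "tcmalloc": 232.7},
    "main_r50":     {"hz3": 66.5,  "hz4": 78.1,  "mimalloc": 17.9,  "tcmalloc": 84.3},
    "main_r90":     {"hz3": 62.6,  "hz4": 67.6,  "mimalloc": 13.0,  "tcmalloc": 54.9},
    "cross128_r90": {"hz3": 1.80,  "hz4": 50.65, "mimalloc": 10.94, "tcmalloc": 7.50},
}


def winner(row):
    """Return the name of the allocator with the highest throughput."""
    return max(row, key=row.get)


def lead_over_runner_up(row):
    """Return best throughput divided by second-best throughput."""
    ordered = sorted(row.values(), reverse=True)
    return ordered[0] / ordered[1]


def summarize(results=RESULTS):
    """Map each configuration to (winning allocator, lead over runner-up)."""
    return {
        name: (winner(row), lead_over_runner_up(row))
        for name, row in results.items()
    }


if __name__ == "__main__":
    for name, (best, lead) in summarize().items():
        print(f"{name:14s} best={best:9s} lead={lead:.2f}x")
```

On these numbers, hz3 leads main_r0, tcmalloc leads main_r50, and hz4 leads both main_r90 and cross128_r90, which is the split that motivates keeping two profiles.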
Redis-like (Median ops/s)
- hz3: 571,199
- mimalloc: 568,740
- tcmalloc: 568,052
- hz4: 560,576
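For a sense of how close these medians are, the sketch below expresses each allocator's gap to the best result as a percentage. The values are the ones listed above; the helper name is hypothetical and chosen only for this illustration.

```python
# Median ops/s for the Redis-like workload, copied from the list above.
REDIS_MEDIAN_OPS = {
    "hz3": 571_199,
    "mimalloc": 568_740,
    "tcmalloc": 568_052,
    "hz4": 560_576,
}


def percent_behind_best(results):
    """Return, per allocator, how far it trails the best median (in percent)."""
    best = max(results.values())
    return {name: 100.0 * (best - ops) / best for name, ops in results.items()}


if __name__ == "__main__":
    for name, gap in percent_behind_best(REDIS_MEDIAN_OPS).items():
        print(f"{name:9s} {gap:.2f}% behind best")
```

All four allocators land within 2% of the best median here, which suggests this workload separates them far less than the multi-threaded lanes do.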
Choosing the Right Version
For most scenarios, I recommend starting with hz3 as the default. Switch to hz4 only if your workload is strictly remote-heavy or involves extremely high thread counts.
```bash
# hz3 (Default)
cd hakozuna/hz3
make clean all_ldpreload_scale
LD_PRELOAD=./libhakozuna_hz3_scale.so ./your_app

# hz4 (Remote-heavy / High-thread)
cd ../hz4
make clean all
LD_PRELOAD=./libhakozuna_hz4.so ./your_app
```
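If it is unclear which profile suits a given application, one option is to measure both. The following is a minimal A/B sketch in Python, not the benchmark harness used for the numbers in this post: it runs a command several times with and without `LD_PRELOAD` and reports the median wall-clock time. The function names are hypothetical, and wall-clock time includes process startup, so it is only meaningful for runs that last long enough to dominate that cost.

```python
import os
import statistics
import subprocess
import time


def median_wall_time(cmd, preload=None, runs=10):
    """Run cmd `runs` times and return the median wall-clock time in seconds.

    If `preload` is given, it is passed to the child process via LD_PRELOAD
    (Linux dynamic loader). Otherwise LD_PRELOAD is removed so the child
    uses whatever allocator it was linked against.
    """
    env = dict(os.environ)
    if preload is not None:
        env["LD_PRELOAD"] = os.path.abspath(preload)
    else:
        env.pop("LD_PRELOAD", None)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_allocators(cmd, libraries, runs=10):
    """Return {label: median seconds} for the baseline and each preload library."""
    results = {"system": median_wall_time(cmd, None, runs)}
    for label, path in libraries.items():
        results[label] = median_wall_time(cmd, path, runs)
    return results
```

A call might look like `compare_allocators(["./your_app"], {"hz3": "hakozuna/hz3/libhakozuna_hz3_scale.so", "hz4": "hakozuna/hz4/libhakozuna_hz4.so"})`, assuming both libraries were built with the commands above and the script runs from the directory that contains the `hakozuna` checkout.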
Key Lessons Learned
Through this development process, I gained two major insights:
- Identify "NO-GOs" early: Documenting and archiving optimizations that didn't work was just as important as the successes. Moving on quickly from failed paths ultimately accelerated progress.
- There is no single "winning" path: Performance numbers only stabilized after separating the logic into hz3 for local-heavy workloads and hz4 for remote-heavy / high-thread-count workloads. Specialization is the key to outperforming general-purpose allocators in specific niches.
Links
- GitHub: [https://github.com/hakorune/hakozuna]
- Zenodo v3.0: [https://zenodo.org/records/18674502]
- DOI: [10.5281/zenodo.18674502]