DEV Community

CharmPic

4 Months of Developing a Memory Allocator: Updating "Hakozuna" to v3.0 (hz3/hz4)


Introduction

I am excited to announce the release of Hakozuna v3.0, a high-performance memory allocator.

Over the past four months, I've gone through countless cycles of implementation and benchmarking, tuning performance against established allocators such as mimalloc and tcmalloc.

The biggest takeaway from this journey? Instead of trying to create a "one-size-fits-all" configuration to win every race, the real solution was to branch out into specialized profiles based on use cases.

  • hz3: Optimized for local-heavy / Redis-like workloads.
  • hz4: Optimized for remote-heavy / high-thread workloads.

What is Hakozuna?

Hakozuna is built on Box Theory—a design philosophy centered on aggregating boundaries to isolate responsibilities. During development, I prioritized:

  • Zero overhead in the hot path: Eliminating unnecessary operations where it matters most.
  • Reversibility: Ensuring every optimization can be toggled via compile flags.
  • Observability: Using A/B benchmarking and one-shot counters to make performance data transparent.

Benchmark Summary (Ubuntu Native, Representative Values at RUNS=10)

MT lane x remote% (Ops/s)

Configuration    hz3       hz4       mimalloc   tcmalloc
main_r0          375.4M    137.4M    224.2M     232.7M
main_r50         66.5M     78.1M     17.9M      84.3M
main_r90         62.6M     67.6M     13.0M      54.9M
cross128_r90     1.80M     50.65M    10.94M     7.50M
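
To make the gap between the two profiles concrete, the ratios can be worked out directly from the table. The sketch below uses a small helper named ratio, which is a hypothetical name for illustration and not part of Hakozuna; the inputs are the table values with the "M" suffix dropped.

```shell
#!/bin/sh
# ratio A B: print A / B to one decimal place.
# Hypothetical helper for illustration; not part of Hakozuna.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
        if (b == 0) exit 1          # avoid division by zero
        printf "%.1f\n", a / b
    }'
}

ratio 50.65 1.80    # cross128_r90: hz4 relative to hz3 -> 28.1
ratio 375.4 137.4   # main_r0:      hz3 relative to hz4 -> 2.7
```

In other words, hz4 is roughly 28x faster than hz3 on cross128_r90, while hz3 is roughly 2.7x faster than hz4 on main_r0.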

Redis-like (Median ops/s)

  1. hz3: 571,199
  2. mimalloc: 568,740
  3. tcmalloc: 568,052
  4. hz4: 560,576
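
For readers who want to reproduce a "median ops/s" figure from their own runs, here is a minimal sketch of computing a median in POSIX shell. The helper name median is hypothetical, and it assumes each run produced one plain numeric ops/s value (no thousands separators).

```shell
#!/bin/sh
# median N1 N2 ...: print the median of the numeric arguments.
# Hypothetical helper for illustration; not part of Hakozuna.
median() {
    [ "$#" -gt 0 ] || return 1      # no values, no median
    printf '%s\n' "$@" | LC_ALL=C sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) {
                # Odd count: the middle value, printed as given.
                print v[(NR + 1) / 2]
            } else {
                # Even count: mean of the two middle values.
                printf "%.1f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
            }
        }'
}
```

With an even number of runs, such as RUNS=10, the result is the mean of the fifth and sixth values after sorting.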

Choosing the Right Version

For most scenarios, I recommend starting with hz3 as the default. Switch to hz4 only if your workload is strictly remote-heavy or involves extremely high thread counts.

# hz3 (Default)
cd hakozuna/hz3
make clean all_ldpreload_scale
LD_PRELOAD=./libhakozuna_hz3_scale.so ./your_app

# hz4 (Remote-heavy / High-thread)
cd ../hz4
make clean all
LD_PRELOAD=./libhakozuna_hz4.so ./your_app
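
Before comparing numbers, it is worth checking that the preload actually took effect, since the dynamic loader skips a library it cannot load and the program then runs on the system allocator. The following is a rough sketch under stated assumptions: it is Linux-only (it reads /proc/self/maps), the function names preload_active and ab_run are hypothetical, and the library file is assumed not to be a symlink to a differently named file.

```shell
#!/bin/sh
# preload_active LIB: succeed if LIB is mapped into a process started
# with LD_PRELOAD=LIB. Linux only; matches on the file name of LIB.
# Hypothetical helper for illustration; not part of Hakozuna.
preload_active() {
    lib=$1
    LD_PRELOAD="$lib" grep -q -F -- "$(basename "$lib")" /proc/self/maps 2>/dev/null
}

# ab_run RUNS LIB CMD...: run CMD RUNS times with LIB preloaded.
# An empty LIB runs CMD with the system allocator as the baseline.
ab_run() {
    runs=$1
    lib=$2
    shift 2
    i=0
    while [ "$i" -lt "$runs" ]; do
        if [ -n "$lib" ]; then
            LD_PRELOAD="$lib" "$@"
        else
            "$@"
        fi
        i=$((i + 1))
    done
}
```

For example, `preload_active ./libhakozuna_hz3_scale.so && ab_run 10 ./libhakozuna_hz3_scale.so ./your_app` runs the application ten times only if the hz3 library really loads, and `ab_run 10 "" ./your_app` gives the baseline to compare against.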

Key Lessons Learned

Through this development process, I gained two major insights:

  1. Identify "NO-GOs" early: Documenting and archiving optimizations that didn't work was just as important as recording the successes. Moving on quickly from failed paths sped up overall progress.
  2. There is no single "winning" path: Performance numbers only stabilized after separating the logic into hz3 for local-heavy workloads and hz4 for remote-heavy or high-thread-count workloads. Specialization is the key to outperforming general-purpose allocators in specific niches.

Links

  • GitHub: [hakorune/hakozuna]
  • Zenodo v3.0: [View Records]
  • DOI: [10.5281/zenodo.18674502]
