Alex Rosito

Posted on Jun 12

Because in a Life-Threatening Situation, Every Millisecond Counts

#cpp #c #embeddedsystems #machinelearning

Removing expf() from a fire detector: one header, 1.95x faster, zero accuracy loss

A smoke detector is not a demo project.

When it fires, someone either evacuates in time or doesn't. The firmware running on that microcontroller has one job, and it needs to do it without hesitation, without bloat, and without dependencies that can fail in unexpected ways.

Last May 28th I published a bare-metal fire detection system built with Hasaki 刃先 — a neural network trainer that exports standalone C headers with no runtime, no Python, no TensorFlow. The model is a 12-8-4-1 MLP trained on 28,596 sensor readings. It fits in 3.8 kB of Flash and achieves 99.93% accuracy on held-out data, with a single missed fire event out of 3,599.

But there was something in that header that bothered me.

```c
#include <math.h>

static inline float sigmoid(float x) {
    return 1.0f / (1.0f + expf(-x));
}
```

expf(). Right there in a life-safety application. On a microcontroller that may not have a hardware FPU.

The problem with expf() on bare metal

On processors with a hardware FPU, such as the ESP32-S3 or a Cortex-M4F, expf() is fast. But the moment you deploy to an ATmega328P, an ATtiny85, any Cortex-M0 target, or even an ESP32-C3 (whose RISC-V core has no FPU), that call becomes software floating point. Every float multiply, add, and divide inside it is emulated with integer instructions by the compiler's runtime library.

It works. But it carries a hidden cost: unpredictable latency, a dependency on math.h, and a transcendental function sitting in the critical path of every single inference.

For a smoke detector sampling at 1 Hz this might seem irrelevant. But inference latency adds to the time already spent on sensor reads, normalization, and communication. And more importantly, if you're deploying to a truly constrained target, expf() might be the difference between fitting in Flash and not.

The fix: one header from kigu-quant

kigu-quant (coming soon) is a new tool in the Rosito Bench ecosystem. It generates ready-to-include C headers for evaluating mathematical functions on microcontrollers: no FPU, no libm, no dependencies.

One command:

```bash
kigu-quant --method lut --func sigmoid --size 256 --fmt q15 -o lut_sigmoid.h
```

One change in the model header:

```c
// Before
#include <math.h>
static inline float sigmoid(float x) {
    return 1.0f / (1.0f + expf(-x));
}

// After
#include "lut_sigmoid.h"
// sigmoid is now lut_sigmoid_lookup(), called directly in predict()
```

The generated header contains a 256-entry Q1.15 lookup table covering the range [-6, 6], an inline interpolated lookup function, and nothing else. No math.h. No expf(). No heap allocation. 512 bytes of Flash for the table.

Benchmark — ATmega328P, 16MHz, no FPU

Measured with micros() on an Arduino Nano over 1,000 evaluations, with the results summed into an accumulator so the compiler cannot optimize the calls away:

| Method | 1,000 evaluations | Per evaluation | Speedup |
|---|---|---|---|
| `expf()` | 227,292 µs | ≈227.3 µs | baseline |
| `lut_sigmoid_lookup()` | 116,512 µs | ≈116.5 µs | 1.95x faster |

Max error vs expf(): 0.000021
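That figure is in line with the textbook error bound for linear interpolation. A quick check, assuming the 256 samples include both endpoints of [-6, 6] (spacing h = 12/255) and that the error was measured inside that range:

$$
|\sigma(x) - p(x)| \le \frac{h^2}{8}\,\max_x |\sigma''(x)| + \tfrac{1}{2}\cdot 2^{-15},
\qquad
\sigma''(x) = \sigma(x)\,\bigl(1-\sigma(x)\bigr)\,\bigl(1-2\sigma(x)\bigr)
$$

$$
\max_x |\sigma''(x)| = \frac{\sqrt{3}}{18} \approx 0.0962,
\qquad
\frac{(12/255)^2}{8}\cdot 0.0962 \approx 2.7\times 10^{-5},
\qquad
\tfrac{1}{2}\cdot 2^{-15} \approx 1.5\times 10^{-5}
$$

The first term is the interpolation error and the second is the worst-case Q1.15 rounding of the stored samples, for a combined ceiling of roughly 4.2 × 10⁻⁵. The measured maximum sits below it.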

These are honest numbers from real hardware, not simulations.

Model accuracy: unchanged

The sigmoid sits at the output layer — one evaluation per inference, converting the final logit to a probability. The LUT covers [-6, 6] with 256 points and linear interpolation. For this model, the pre-activation values at the output layer fall well within that range during normal operation.
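Whether the logits really stay inside [-6, 6] can be checked offline. A minimal sketch with hypothetical names: feed every output-layer pre-activation from the validation set into `logit_stats_add()` just before the sigmoid, then inspect the observed range and the number of values the table would have to clamp. If the lookup clamps out-of-range inputs and the decision threshold is 0.5 (both assumptions; the post states neither), a clamped logit still lands on the same side of the threshold, since σ(6) ≈ 0.9975 and σ(−6) ≈ 0.0025.

```c
#include <stddef.h>

#define LUT_RANGE_MIN (-6.0f)  /* table range stated in the post */
#define LUT_RANGE_MAX (6.0f)

/* Running statistics over output-layer pre-activations (logits). */
typedef struct {
    float  min_logit;
    float  max_logit;
    size_t out_of_range;  /* logits outside the table range */
    size_t count;         /* logits seen so far */
} logit_stats;

/* Zero-initialize before the first call: logit_stats s = {0}; */
static void logit_stats_add(logit_stats *s, float z) {
    if (s->count == 0 || z < s->min_logit) s->min_logit = z;
    if (s->count == 0 || z > s->max_logit) s->max_logit = z;
    if (z < LUT_RANGE_MIN || z > LUT_RANGE_MAX) s->out_of_range++;
    s->count++;
}
```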

Validation on 7,150 held-out samples, never seen during training:

```
[3547    4]   ← TN  FP
[   1 3598]   ← FN  TP

Accuracy:    99.93%
FN:          1 / 3,599 fire events
```

Identical to the float32 baseline. The LUT approximation introduces no measurable degradation at the model level.
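The headline numbers follow directly from the matrix: accuracy is (TN + TP) / total = 7,145 / 7,150, and fire recall is TP / (TP + FN) = 3,598 / 3,599. A small helper makes the arithmetic explicit; the `confusion` struct and the function names are illustrative, not part of Hasaki's output.

```c
/* Binary confusion matrix with fire as the positive class. */
typedef struct {
    long tn, fp, fn, tp;
} confusion;

/* Fraction of all samples classified correctly. */
static double accuracy(confusion c) {
    return (double)(c.tn + c.tp) / (double)(c.tn + c.fp + c.fn + c.tp);
}

/* Fraction of real fires that were detected (1 minus the miss rate). */
static double recall(confusion c) {
    return (double)c.tp / (double)(c.tp + c.fn);
}

/* Fraction of alarms that were real fires. */
static double precision(confusion c) {
    return (double)c.tp / (double)(c.tp + c.fp);
}
```

With `confusion c = {3547, 4, 1, 3598};` these give about 99.930% accuracy, 99.972% recall, and 99.889% precision, the last reflecting the four false alarms.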

The two-tool pipeline

This project is the first demonstration of Hasaki and kigu-quant working together:

1. Hasaki 刃先 trains the model and exports a standalone C header: weights, biases, and activation functions in pure C++.
2. kigu-quant generates the fixed-point math headers that replace the expensive activations.

The integration is intentionally minimal. kigu-quant doesn't touch the model. It doesn't rewrite the header. You drop in one #include and replace one function call. Everything else stays the same.

```
Train → hasaki → smoke-detector-model-float.h
                        ↓
Generate → kigu-quant → lut_sigmoid.h
                        ↓
              #include both → flash → done
```

The repository

The full project — modified model header, generated LUT, and Arduino sketch — is available here:

hasaki-smoke-detector-v2

```
├── smoke-detector-model-float.h   Hasaki model — sigmoid replaced
├── lut_sigmoid.h                  kigu-quant Q1.15 LUT
└── hasaki_kigu_smoke_detector.ino Arduino sketch
```

One last thing

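The 1.95x speedup on ATmega328P is real and measured: roughly 227 µs versus 117 µs per sigmoid call at 16 MHz. On targets where this matters even more, the absolute gap widens. An AVR running the same code at 8 MHz needs twice as long per call, so each sigmoid saves roughly 220 µs instead of 110 µs. Cortex-M0 parts with no FPU and low-power MCUs in battery-operated systems pay the same kind of soft-float cost on every inference.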

A fire detector doesn't need to be fast to be useful. But it should never be slower than it has to be.

Every millisecond you give back to the scheduler is a millisecond available for sensor reads, communication, or simply a faster response to the next sample.

Every millisecond matters in a life-threatening scenario.

expf() was a dependency this model never needed.

*Built with Hasaki 刃先 and kigu-quant (a member of the Kigu 器具 family, coming soon). Rosito Bench*

DEV Community