Alex Rosito

Posted on May 28 • Edited on Jun 12

The 3.8 kB Alarm: A Zero-Bloat Edge AI Smoke Detector in Pure C

#cpp #c #embeddedsystems #iot

You are building a critical IoT safety device — a smart smoke and fire detector.
It monitors 12 environmental variables in parallel: temperature, humidity, TVOC, eCO2, raw hydrogen, ethanol, barometric pressure, and five particulate matter metrics.

You want a neural network to catch the non-linear chemical signatures of an imminent fire before the flames start.

Then you look at the industry standard for Edge AI.

TensorFlow Lite for Microcontrollers brings an interpreter, a tensor arena you size by hand, a library of operator kernels, and a dependency chain long enough to make you reconsider the whole project. On a cheap, ultra-low-power microcontroller (an ESP32, an ATtiny), that machinery eats the silicon and leaves little for the WiFi stack or peripheral control.

The hardware isn't the problem.

1. The Data

No synthetic data here. We used the Smoke Detection Dataset — originally collected in the field at 1 Hz by German researcher Stefan Blattmann for his project Real-time Smoke Detection with AI-based Sensor Fusion, and later curated and published on Kaggle by Dataset Grandmaster Deep Contractor.

Blattmann's setup captured real environmental readings under real conditions: normal rooms, controlled wood-burning tests, outdoor air, active indoor smoke formation. His goal was the same as ours: prove that sensor fusion is more reliable for saving lives than a single optical or ionization detector that trips on kitchen smoke.

The dataset provides 12 environmental features:

Air: Temperature [°C], Humidity [%], Barometric Pressure [hPa]
Gas: Raw H₂, Raw Ethanol, eCO₂ [ppm], TVOC [ppb]
Particulates: PM1.0, PM2.5, NC0.5, NC1.0, NC2.5

A note on hardware: 12 features does not mean 12 pins. In a real deployment, Temperature and Humidity come from a single DHT22 (one data pin) or SHT31 (I2C). PM1.0 through NC2.5 come from one particle sensor over UART. eCO₂ and TVOC come from a single SGP30 over I2C, which also reports the Raw H₂ and Raw Ethanol signals, and barometric pressure comes from a pressure sensor on the same I2C bus. The full sensor stack runs on 4–6 physical pins on an ESP32, sharing I2C and UART buses.

Raw data is a problem for a micro-network. The original CSV also contains a UTC timestamp, a row index, and a CNT column — a sequential counter. We dropped all three. Timestamps and sequence counters have zero correlation with real chemistry; they cause networks to memorize ordering rather than learn physics.
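The column drop is simple enough to sketch. A minimal C version, assuming the Kaggle CSV layout is row index, UTC, the 12 sensor columns, CNT, then the label (16 fields per row). `parse_row` and that exact layout are illustrative assumptions, not the author's actual parser:

```c
#include <stdlib.h>

#define N_FIELDS   16  /* assumed layout: index, UTC, 12 sensors, CNT, label */
#define N_FEATURES 12

/* Parse one CSV data row, keeping only the 12 physical sensor columns and
   the label. Returns 0 on success, -1 on a malformed or non-numeric row. */
int parse_row(const char *line, float features[N_FEATURES], int *label) {
    const char *p = line;
    int kept = 0;
    for (int field = 0; field < N_FIELDS; field++) {
        char *end;
        double v = strtod(p, &end);
        if (end == p) return -1;              /* field is not a number */
        if (field >= 2 && field <= 13)        /* fields 2..13: sensor readings */
            features[kept++] = (float)v;
        else if (field == N_FIELDS - 1)       /* last field: 0/1 label */
            *label = (int)v;
        /* fields 0 (row index), 1 (UTC) and 14 (CNT) are discarded */
        if (field < N_FIELDS - 1) {
            if (*end != ',') return -1;       /* too few fields */
            p = end + 1;
        }
    }
    return 0;
}
```

A header line fails the numeric parse and returns -1, so the caller can simply skip it.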

2. Preprocessing: A Custom C++ Parser

No Jupyter notebooks. No Pandas. A fast, standalone parser in pure C++.

Four things it does:

The Cleanup. UTC, row index, and CNT are dropped. They are position artifacts that cause instant overfitting.

Min-Max Scaling. All 12 sensor readings mapped to a strict [0.0, 1.0] boundary. This prevents high-magnitude features like eCO₂ from drowning out small signals like temperature fluctuations.

Target Lock. A critical bug surfaced during early testing: the binary classification column (0 = clean air, 1 = alarm) was accidentally swept into the normalization pool and corrupted into floating-point fractions. We locked the label column to ensure it stayed as hard integers.

Class Balance. The dataset is skewed — 44,757 fire events vs. 17,873 clean air samples. Without correction, the network takes the easy path: predict "Alarm" constantly and collect 71% accuracy for free. We undersampled the majority class to 17,873, producing a balanced 35,746-row dataset.

The output: a clean, normalized CSV with exactly 12 inputs and 1 binary label.
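A sketch of fitting and applying Min-Max scaling in C, assuming the features sit in a row-major float matrix and the labels live in their own int array. Keeping the label out of the float matrix is one way to make the target lock structural rather than a matter of care. The function names are illustrative:

```c
#include <stddef.h>

#define N_FEATURES 12

/* rows is row-major: n_rows x N_FEATURES sensor values, no label column. */
void fit_min_max(const float *rows, size_t n_rows,
                 float col_min[N_FEATURES], float col_max[N_FEATURES]) {
    for (int j = 0; j < N_FEATURES; j++) {
        col_min[j] = col_max[j] = rows[j];
        for (size_t i = 1; i < n_rows; i++) {
            float v = rows[i * N_FEATURES + j];
            if (v < col_min[j]) col_min[j] = v;
            if (v > col_max[j]) col_max[j] = v;
        }
    }
}

/* Scale every feature in place to [0, 1] using the fitted ranges. */
void apply_min_max(float *rows, size_t n_rows,
                   const float col_min[N_FEATURES], const float col_max[N_FEATURES]) {
    for (size_t i = 0; i < n_rows; i++) {
        for (int j = 0; j < N_FEATURES; j++) {
            float *v = &rows[i * N_FEATURES + j];
            float range = col_max[j] - col_min[j];
            /* a constant column maps to 0 instead of dividing by zero */
            *v = range > 0.0f ? (*v - col_min[j]) / range : 0.0f;
        }
    }
}
```

The col_min/col_max arrays fitted on training data are the kind of values the firmware later hard-codes as its normalization ranges.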

3. Architecture and the Training Trap

We fed 12 features into a funnel architecture: 12 → 8 → 4 → 1

[Input Layer]     12 Neurons (Normalized Sensor Features)
                        |
[Hidden Layer 1]   8 Neurons (ReLU)
                        |
[Hidden Layer 2]   4 Neurons (ReLU)
                        |
[Output Layer]     1 Neuron  (Sigmoid → Probability 0.0 to 1.0)

The network has 145 trainable parameters: 132 weights (12×8 + 8×4 + 4×1 = 96 + 32 + 4) and 13 biases (8 + 4 + 1).
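The parameter count follows directly from the layer widths: each dense layer contributes inputs × outputs weights plus one bias per neuron. A small C helper that does the arithmetic (the name `count_params` is illustrative, not part of Hasaki):

```c
/* Count weights and biases of a fully connected network, given the width
   of each layer from input to output. */
void count_params(const int *widths, int n_layers, int *weights, int *biases) {
    *weights = 0;
    *biases = 0;
    for (int l = 1; l < n_layers; l++) {
        *weights += widths[l - 1] * widths[l];  /* one weight per input-output pair */
        *biases  += widths[l];                  /* one bias per neuron */
    }
}
```

For 12-8-4-1 this gives 96 + 32 + 4 = 132 weights and 8 + 4 + 1 = 13 biases, 145 parameters in all. Stored as float32, that is 580 bytes of raw parameter data.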

The Lazy Network

The first training run looked great on paper. It was wrong. During validation, the model output the exact same value for every input. Because the raw dataset was skewed roughly 2.5:1 toward fire events, the network found the easy path: freeze the biases, say "Alarm" constantly, and collect the accuracy applause from the loss function.

The fix: aggressive data shuffling, class balancing in the preprocessor, and a lower learning rate to keep the weights out of the activations' dead zones (ReLU units stuck at zero, or a Sigmoid saturated at one end).
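The post doesn't show the preprocessor's balancing code. A minimal sketch of random undersampling followed by a Fisher-Yates shuffle in C, assuming labels are held in a separate int array. `balance`, `shuffle` and the xorshift32 generator are illustrative choices, not the author's implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32: tiny deterministic PRNG. The state must start nonzero. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fisher-Yates shuffle of an index array. */
static void shuffle(size_t *idx, size_t n, uint32_t *rng) {
    if (n < 2) return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = xorshift32(rng) % (i + 1);  /* slight modulo bias, fine here */
        size_t tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }
}

/* Random undersampling: keep every minority-class row plus an equal number
   of randomly chosen majority-class rows, then shuffle the result.
   scratch and out must each hold n entries. Returns the number of kept rows. */
size_t balance(const int *labels, size_t n, int majority,
               size_t *scratch, size_t *out, uint32_t *rng) {
    size_t n_major = 0, n_out = 0;
    for (size_t i = 0; i < n; i++) {
        if (labels[i] == majority) scratch[n_major++] = i;
        else out[n_out++] = i;                 /* minority rows all survive */
    }
    size_t n_minor = n_out;
    shuffle(scratch, n_major, rng);            /* random pick of majority rows */
    for (size_t k = 0; k < n_minor && k < n_major; k++)
        out[n_out++] = scratch[k];
    shuffle(out, n_out, rng);                  /* mix classes before training */
    return n_out;
}
```

With 44,757 fire rows and 17,873 clean rows, this keeps all 17,873 clean rows and 17,873 randomly drawn fire rows, 35,746 in total.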

The CNT Leakage Bug

A subtler problem took longer to catch. The CNT column — a sequential row counter — was included as a feature in early runs. The network learned to use position in the dataset as a proxy for class membership. The results looked excellent. They were meaningless. CNT was dropped and the entire preprocessing pipeline was rebuilt from scratch.

4. Training

Training was done with Hasaki 刃先 v3.2.0 — a CLI C++ tool for desktop neural network training that exports standalone C headers for firmware deployment. No Python, no runtime, no dependencies.

Configuration:

Optimizer: Adam
Learning rate: 0.001
Max epochs: 500,000
Early stopping: 1,000 epochs patience, min_delta 0.0001
Split: 80% train / 20% internal validation

The model converged at epoch 2,013. Final validation loss: 0.000008.
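Hasaki's exact early-stopping rule isn't spelled out here. One common interpretation of "1,000 epochs patience, min_delta 0.0001", sketched in C under that assumption: an epoch counts as an improvement only if validation loss drops more than min_delta below the best value so far, and training stops after `patience` consecutive epochs without one. The `EarlyStop` type and its functions are illustrative:

```c
#include <math.h>
#include <stdbool.h>

/* Early-stopping state. */
typedef struct {
    double best;       /* best validation loss seen so far */
    int    wait;       /* epochs since the last significant improvement */
    int    patience;   /* epochs to wait before stopping */
    double min_delta;  /* minimum drop that counts as an improvement */
} EarlyStop;

void early_stop_init(EarlyStop *es, int patience, double min_delta) {
    es->best = HUGE_VAL;  /* any finite loss improves on this */
    es->wait = 0;
    es->patience = patience;
    es->min_delta = min_delta;
}

/* Call once per epoch with the validation loss; returns true to stop. */
bool early_stop_update(EarlyStop *es, double val_loss) {
    if (val_loss < es->best - es->min_delta) {
        es->best = val_loss;
        es->wait = 0;
    } else {
        es->wait++;
    }
    return es->wait >= es->patience;
}
```

The patience window is why a run can end thousands of epochs in, far short of the 500,000-epoch cap.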

5. Validation Against Unseen Data

After training, we evaluated the model against a held-out test set of 7,150 samples — data the model never touched during training or early stopping.

Confusion Matrix (rows: actual clean / actual fire; columns: predicted clean / predicted alarm):
[3547    4]
[   1 3598]

Accuracy: 99.9301%

Breaking this down:

True Negatives: 3,547 — clean air correctly identified
True Positives: 3,598 — fire events correctly flagged
False Positives: 4 — alarm triggers without fire. 0.11% of clean-air cases.
False Negatives: 1

That last number. One missed fire event out of 3,599. 99.97% sensitivity.

In a smoke detector, a false positive wakes someone up at 2 AM. A false negative lets a fire go undetected. This network achieved 99.97% sensitivity on data it had never seen, packaged in a 3.8 kB header file.
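The headline figures follow directly from the four cells of the confusion matrix. A small C helper, assuming the rows-are-actual, columns-are-predicted layout implied by the breakdown above (the function and struct names are illustrative):

```c
/* Binary-classification metrics from the four confusion-matrix cells. */
typedef struct {
    double accuracy;     /* (TP + TN) / total */
    double sensitivity;  /* TP / (TP + FN): share of fires caught */
    double specificity;  /* TN / (TN + FP): share of clean air left alone */
    double fp_rate;      /* FP / (TN + FP) = 1 - specificity */
} Metrics;

Metrics metrics_from_confusion(long tn, long fp, long fn, long tp) {
    Metrics m;
    double total = (double)(tn + fp + fn + tp);
    m.accuracy    = (double)(tp + tn) / total;
    m.sensitivity = (double)tp / (double)(tp + fn);
    m.specificity = (double)tn / (double)(tn + fp);
    m.fp_rate     = (double)fp / (double)(tn + fp);
    return m;
}
```

Fed 3,547 / 4 / 1 / 3,598, this yields 99.9301% accuracy, 99.97% sensitivity and a 0.11% false-positive rate, matching the figures in the text.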

6. The 3.8 kB Header File

The end product is a single, self-contained C header: smoke-detector-model-float.h. The entire file — weights, biases, activation functions, and inference code — is 3.8 kB on disk.

Generated by Hasaki v3.2.0:

```c
#ifndef SMOKE_DETECTOR_MODEL_H
#define SMOKE_DETECTOR_MODEL_H

#include <math.h>

// Generated by hasaki v3.2.0
// Architecture: 12-8-4-1 | float32

static inline float relu(float x)    { return x > 0.0f ? x : 0.0f; }
static inline float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

// Layer weights — float32
const float w1[8][12] = { ... };
const float b1[8]     = { ... };

const float w2[4][8]  = { ... };
const float b2[4]     = { ... };

const float w3[1][4]  = { ... };
const float b3[1]     = { ... };

static inline void predict(const float* input, float* output) {
    float a[8], b[4];

    // Layer 1: ReLU
    for (int i = 0; i < 8; i++) {
        float dot = 0.0f;
        for (int j = 0; j < 12; j++) dot += w1[i][j] * input[j];
        a[i] = relu(b1[i] + dot);
    }

    // Layer 2: ReLU
    for (int i = 0; i < 4; i++) {
        float dot = 0.0f;
        for (int j = 0; j < 8; j++) dot += w2[i][j] * a[j];
        b[i] = relu(b2[i] + dot);
    }

    // Layer 3: Sigmoid
    float dot = 0.0f;
    for (int j = 0; j < 4; j++) dot += w3[0][j] * b[j];
    output[0] = sigmoid(b3[0] + dot);
}

#endif
```

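No heap allocations. No malloc. No dynamic object trees. Flat array indexing that runs in microseconds. One portability note: in C (unlike C++), file-scope const arrays such as w1 through b3 have external linkage, so declare them static const if the header is included from more than one .c file.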

7. Firmware Integration

The header drops directly into any C/C++ firmware. The predict() function writes its result into an output array — one float per output neuron.

```cpp
#include "smoke-detector-model-float.h"

#define ALARM_PIN         3
#define TRIGGER_THRESHOLD 3

// Normalization ranges from training data
const float INPUT_MIN[12] = {-22.01f, 10.74f,    0.0f,    400.0f,
                              10668.0f, 15317.0f, 930.852f, 0.0f,
                              0.0f,     0.0f,     0.0f,     0.0f};
const float INPUT_MAX[12] = { 59.93f,  75.2f,  60000.0f, 60000.0f,
                              13803.0f, 21410.0f, 939.861f, 14333.69f,
                              45432.26f, 61482.03f, 51914.68f, 30026.438f};

// read_temp(), read_hum(), ...: your own sensor-driver wrappers (not shown)

void setup() {
    pinMode(ALARM_PIN, OUTPUT);  // configure the pin before digitalWrite() drives it
}

void loop() {
    // 1. Read sensors (order must match the column order used in training)
    float input[12] = {
        read_temp(), read_hum(),      read_tvoc(),     read_eco2(),
        read_rawh2(), read_ethanol(), read_pressure(),
        read_pm1(),  read_pm25(),    read_nc05(), read_nc1(), read_nc25()
    };

    // 2. Normalize to [0.0, 1.0] with the training ranges
    //    (readings outside those ranges land outside [0, 1])
    for (int i = 0; i < 12; i++)
        input[i] = (input[i] - INPUT_MIN[i]) / (INPUT_MAX[i] - INPUT_MIN[i]);

    // 3. Inference
    float output[1];
    predict(input, output);

    // 4. Saturating up/down counter: +1 per positive (capped at the threshold),
    //    -1 per negative (floored at 0); at least 3 net positives to trigger
    static int accumulator = 0;
    if (output[0] >= 0.5f) {
        if (accumulator < TRIGGER_THRESHOLD) accumulator++;
    } else {
        if (accumulator > 0) accumulator--;
    }

    // 5. Drive alarm pin
    digitalWrite(ALARM_PIN, accumulator >= TRIGGER_THRESHOLD ? HIGH : LOW);

    delay(1000); // ~1 Hz, matching the dataset's sample rate
}
```

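The state accumulator is a saturating up/down counter: each prediction of 0.5 or above adds one (up to 3), each clean reading subtracts one, and the alarm sounds while the count sits at 3. Starting from a clean state, at least three positive predictions are needed before the alarm can trigger. A real fire doesn't disappear in 3 seconds. An isolated sensor spike does. With 99.97% sensitivity on the test split, a sustained fire should drive the counter to the threshold within a few seconds, provided misclassifications on consecutive samples are not strongly correlated.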
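The accumulator is easiest to reason about as a pure function. A sketch that factors the counter logic out of `loop()` so it can be tested on a desktop; `accumulator_step` and `alarm_on` are illustrative names, and the behavior mirrors the firmware code:

```c
#define TRIGGER_THRESHOLD 3

/* One step of the saturating counter: +1 on a positive prediction (capped
   at the threshold), -1 on a negative one (floored at 0). */
int accumulator_step(int acc, float p) {
    if (p >= 0.5f) return acc < TRIGGER_THRESHOLD ? acc + 1 : acc;
    return acc > 0 ? acc - 1 : 0;
}

/* The alarm is asserted while the counter sits at the threshold. */
int alarm_on(int acc) { return acc >= TRIGGER_THRESHOLD; }
```

Note that the sequence positive, positive, negative, positive, positive also reaches 3, so the rule is "three net positives" rather than strictly three in a row. Resetting the counter to 0 on any negative reading would enforce strict consecutiveness, at the cost of slower triggering when a fire produces occasional misses.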

Conclusion: Respect the Silicon

When something runs slow or takes too much space, the reflex is to upgrade the chip, buy more cloud credits, or add RAM.

We wrote our own parser, caught two data bugs before they made it into print, and ended up with a 3.8 kB inference engine running on hardware that costs less than a cup of coffee. The sensor fusion approach Stefan Blattmann set out to validate, that multi-variable chemical signatures outperform single-sensor smoke detectors, holds up on this dataset's held-out test split.

The silicon was never the constraint.

Errata

Update (2026-05-29): After publishing, I ran Hasaki's quantize_test command against the exported INT8 header. It failed — mean error 0.140836 against a threshold of 0.000518. The INT8 quantization was introducing unacceptable precision loss for this specific model.

The header in the repo has been replaced with the float32 export. The accuracy numbers are unchanged: 99.9301%, 1 false negative on 7,150 unseen samples. The file size is practically identical: 3.8 kB vs 3.7 kB. For a network with 145 parameters, the difference between INT8 and float32 on disk is noise; the fixed overhead of comments and inference code dominates.

The lesson: always run quantize_test before publishing a header. I didn't. Now I do.

Hasaki 刃先 companion repository: github.com/AlexRosito67/hasaki-smoke-detector
Hasaki 刃先: github.com/AlexRosito67/hasaki
Dataset: Smoke Detection Dataset by Deep Contractor, original data by Stefan Blattmann

Top comments (3)

Gilder Miller • May 29

Cool idea, love the tight 3.7KB constraint, that’s real embedded discipline.
But I’d be careful with those near-perfect metrics, real sensor drift usually breaks that fast.
CNT leakage is a classic, I’ve hit the same issue with time-based features before.
Have you tried it on long-term noisy data or just the test split?

Alex Rosito • May 29 • Edited

Good point on sensor drift: this was validated on the Kaggle test split, not long-term field deployment. Real-world drift from aging sensors, humidity changes, and calibration shift would almost certainly degrade those metrics over time.

The CNT leakage was exactly that kind of subtle bug: it looked great until it didn't. Caught it before it made it to print, but it's a good reminder that position artifacts are everywhere.

For production deployment you'd want periodic recalibration or retraining on field data. Hasaki makes retraining straightforward: same pipeline, new CSV. But the drift problem is a hardware + data problem first.

Gilder Miller • May 29

Yeah that makes sense. Kaggle numbers always look cleaner than real deployment, drift hits way harder in practice. CNT leakage is one of those silent bugs that looks great until it quietly breaks everything. Periodic recalibration feels like the only realistic path once you leave controlled data.