iterflow 1.0: What changed after someone pointed out JS already has lazy iterators

#javascript #iterators #typescript #webdev

iterflow 1.0 shipped last week. It's a different library than what I wrote about in January.

Back then, at RC2, someone rightly pointed out that JavaScript already has native Iterator Helpers - map, filter, take, drop, flatMap. Most of what iterflow offered was redundant with the language. Fair. The only things it had beyond native iterators were window(), chunk(), and a few terminal statistics with naive implementations.

Between RC2 and 1.0, the library found a different reason to exist.

Welford's algorithm changed how I thought about the library

RC2 swapped variance from two-pass sum-of-squares to Welford's online algorithm. Single pass, three scalars of state (count, running mean, running sum of squared deviations), no array materialization. You feed it values one at a time and it maintains the correct variance at every step.

I noticed this is the same contract as a JavaScript generator - values arrive one by one, state is local, output goes downstream. So I tried something: instead of variance consuming the whole iterator and returning one number, what if it yielded a running result at each step?

import { iter } from '@mathscapes/iterflow';

iter([2, 4, 4, 4, 5, 5, 7, 9]).streamingVariance().toArray();
// [0, 1, 0.889, 0.75, 0.96, 1, 1.959, 4]
// ^ population variance after each observation

That's streamingVariance() - a generator wrapping Welford's recurrence. It chains with filter, take, window like any other transform. You could write this generator yourself (generator objects inherit the native Iterator Helper methods, so the result still chains), but you'd be implementing Welford's recurrence from scratch, and again for mean, and again for covariance, and again for correlation.
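For a sense of what such a stage involves, here is a minimal sketch of Welford's recurrence written as a plain generator function. It is a standalone stand-in with a hypothetical signature (it takes the source iterable as an argument), not iterflow's actual source, and it assumes population variance to match the output shown earlier.

```javascript
// Hypothetical stand-in for a streaming-variance stage; not iterflow's source.
function* streamingVariance(source) {
  let n = 0;     // observations seen so far
  let mean = 0;  // running mean
  let m2 = 0;    // running sum of squared deviations from the mean

  for (const x of source) {
    n += 1;
    const delta = x - mean;   // deviation from the old mean
    mean += delta / n;        // update the mean
    m2 += delta * (x - mean); // old deviation times new deviation
    yield m2 / n;             // population variance (n denominator)
  }
}

const variances = [...streamingVariance([2, 4, 4, 4, 5, 5, 7, 9])];
console.log(variances.map(v => Number(v.toFixed(3))));
// [0, 1, 0.889, 0.75, 0.96, 1, 1.959, 4]
```

The whole state is three numbers, and nothing is buffered: each yield reports the variance of everything seen so far.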

Once I had this working, every release after that added another online algorithm as a pipeline stage.

What 1.0 looks like

By 1.0, iterflow has: streaming mean, streaming variance, EWMA, streaming covariance, streaming Pearson correlation (Chan et al.'s bivariate Welford extension, six scalars of state), z-score anomaly detection, and monotonic deque windowed min/max (Lemire's algorithm). All composable, all lazy.

Here's a composed pipeline vs. writing it by hand. Running Pearson correlation between price and volume, 10 observations:

import { iter } from '@mathscapes/iterflow';

const prices  = [100, 102, 104, 103, 107, 110, 108, 112, 115, 113];
const volumes = [500, 480, 520, 510, 550, 600, 570, 620, 650, 610];

iter(prices).zip(volumes).streamingCorrelation().toArray();
// [NaN, -1, 0.5, 0.543, 0.851, 0.941, 0.950, 0.968, 0.980, 0.978]

Without iterflow:

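const iterP = prices[Symbol.iterator]();
const iterV = volumes[Symbol.iterator]();
// Six scalars of state: count, two means, two sums of squared deviations, one co-moment.
let n = 0, mx = 0, my = 0, m2x = 0, m2y = 0, cxy = 0;
const results = [];
for (;;) {
  const p = iterP.next(), v = iterV.next();
  if (p.done || v.done) break; // stop when either input is exhausted
  n++;
  // Welford update for x: dx uses the old mean, dx2 the new one.
  const dx = p.value - mx; mx += dx / n;
  const dx2 = p.value - mx; m2x += dx * dx2;
  // Same update for y.
  const dy = v.value - my; my += dy / n;
  const dy2 = v.value - my; m2y += dy * dy2;
  // Co-moment: old x deviation times new y deviation.
  cxy += dx * dy2;
  const d = m2x * m2y;
  results.push(d > 0 ? cxy / Math.sqrt(d) : NaN); // NaN until both series vary
}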

Same output. But now try adding a filter upstream, or a threshold downstream, or swapping correlation for z-score. Each combination means rewriting the loop.

With iterflow, those combinations are just different chains. Here's anomaly detection on sensor data:

import { iter } from '@mathscapes/iterflow';

const sensorReadings = [
  { valid: true,  value: 10 },
  { valid: false, value: 999 },
  { valid: true,  value: 12 },
  { valid: true,  value: 11 },
  { valid: true,  value: 50 },  // anomaly
  { valid: true,  value: 13 },
  { valid: true,  value: 12 },
];

iter(sensorReadings)
  .filter(r => r.valid)
  .map(r => r.value)
  .streamingZScore()
  .filter(z => Math.abs(z) > 2)
  .toArray();
// [47.77]

The z-score of 47.77 looks extreme, but it's correct - the prior valid readings are [10, 12, 11] with mean 11 and population stddev of about 0.82, so a jump to 50 is a ~48-sigma event. The z-score stage uses Welford state from prior observations only. The current value doesn't influence its own score. The first two elements yield NaN (not enough data for a standard deviation) and get dropped by the threshold filter, since Math.abs(NaN) > 2 is false.
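The pre-observation convention is easiest to see in code. This is a minimal sketch under my own assumptions (a hypothetical standalone generator, population stddev, NaN whenever the prior stddev is zero), not iterflow's implementation. The key detail is that the score is yielded before the current value is folded into the state.

```javascript
// Hypothetical pre-observation z-score stage; a sketch, not iterflow's source.
function* streamingZScore(source) {
  let n = 0, mean = 0, m2 = 0; // Welford state over PRIOR observations only

  for (const x of source) {
    // Score x against the state as it was before x arrived.
    const sd = n > 0 ? Math.sqrt(m2 / n) : 0; // population stddev of prior values
    yield sd > 0 ? (x - mean) / sd : NaN;

    // Only now fold x into the running state.
    n += 1;
    const delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean);
  }
}

const scores = [...streamingZScore([10, 12, 11, 50, 13, 12])];
console.log(scores.filter(z => Math.abs(z) > 2).map(z => Number(z.toFixed(2))));
// [47.77]
```

If the update ran before the yield, the 50 would inflate its own mean and stddev and score far lower, which is why the ordering matters for anomaly detection.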

Why a standalone library and not a contribution to @stdlib?

@stdlib/stats/incr has solid streaming accumulators. But they're imperative accumulators - you create one, call it with each new value in a loop, and read back the updated statistic. iterflow's contribution isn't the algorithms (Welford is Welford) but the composition model: online algorithms as generator stages inside JavaScript's iterator protocol. "filter, window, variance, threshold, take 100" as a single chained expression, not a loop with manual accumulator wiring.

If you only need one streaming statistic with no pipeline composition, @stdlib will be faster. Generators have per-element overhead from yield/next() dispatch. The paper benchmarks this - iterflow wins on multi-stage composition with early termination, loses on standalone single-statistic computation. I tried to be honest about both sides.
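Early termination is a property of pull-based generators in general, so it can be shown without the library. In this sketch every name (readings, runningMean, take, pulled) is hypothetical; it only demonstrates that a downstream limit stops the upstream stages from doing further work, even when the source is unbounded.

```javascript
// All names here are hypothetical; the point is the pull-based control flow.
let pulled = 0;

function* readings() {
  // Unbounded source that counts how many values were actually requested.
  for (let i = 1; ; i++) {
    pulled += 1;
    yield i;
  }
}

function* runningMean(source) {
  let n = 0, mean = 0;
  for (const x of source) {
    n += 1;
    mean += (x - mean) / n;
    yield mean;
  }
}

function* take(source, limit) {
  if (limit <= 0) return;
  let taken = 0;
  for (const x of source) {
    yield x;
    taken += 1;
    if (taken >= limit) return; // leaving for...of also closes the upstream generators
  }
}

const firstThree = [...take(runningMean(readings()), 3)];
console.log(firstThree); // [1, 1.5, 2]
console.log(pulled);     // 3
```

An accumulator fed from a plain loop gives the same numbers, but the stopping condition has to be wired into that loop by hand.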

The paper

I wrote a paper formalizing the design: "iterflow: Composable Streaming Statistics for JavaScript".

It covers the algorithms (Welford, Chan et al., Lemire, Hoare's Quickselect) and benchmarks against native array methods, @stdlib/stats-incr-mvariance, and hand-written loops. It also documents the limitations I know about. Array.shift() on the monotonic deque is O(k) per front removal, not O(1) - fine for small windows, degrades for large k. And all benchmarks ran on a single platform (V8/Node.js, ARM64 Linux).
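To make the deque limitation concrete, here is a minimal sketch of a sliding-window maximum built on a monotonic deque. The function name, signature, and the choice to store { index, value } pairs are my assumptions for illustration, not iterflow's code. The shift() call is where the front-removal cost shows up.

```javascript
// Hypothetical sliding-window maximum; a sketch of the monotonic-deque idea.
function* windowedMax(source, k) {
  const deque = []; // candidates { index, value }, values strictly decreasing front to back
  let i = 0;

  for (const x of source) {
    // Candidates at the back that are <= x can never be the maximum again.
    while (deque.length > 0 && deque[deque.length - 1].value <= x) deque.pop();
    deque.push({ index: i, value: x });

    // Drop the front candidate once it slides out of the window.
    // Array.prototype.shift() moves the remaining entries, so this is not O(1).
    if (deque[0].index <= i - k) deque.shift();

    if (i >= k - 1) yield deque[0].value; // emit once the first window is full
    i += 1;
  }
}

console.log([...windowedMax([1, 3, 2, 5, 4, 1, 0], 3)]);
// [3, 5, 5, 5, 4]
```

Each value is pushed and popped at most once, so the back of the deque is amortized O(1) per element; only the front removal depends on how the array implements shift().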

Implementation choices: population variance (n denominator, not n-1) since the typical use case is complete streams. Pre-observation z-score convention. Circular buffer for window() instead of shift-based arrays.
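The circular-buffer choice can be sketched as follows. This is an illustration under my own assumptions (a hypothetical slidingWindow generator that yields a fresh array per full window), not the library's window() implementation. The idea is that the oldest slot is overwritten in place, so nothing is ever shifted.

```javascript
// Hypothetical sliding window backed by a fixed-size ring buffer.
function* slidingWindow(source, size) {
  const ring = new Array(size);
  let count = 0; // total values seen so far

  for (const x of source) {
    ring[count % size] = x; // overwrite the oldest slot
    count += 1;

    if (count >= size) {
      // Copy out in arrival order, starting at the oldest slot.
      const start = count % size;
      const out = new Array(size);
      for (let j = 0; j < size; j++) out[j] = ring[(start + j) % size];
      yield out;
    }
  }
}

console.log([...slidingWindow([1, 2, 3, 4, 5], 3)]);
// [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Writing a value is O(1) regardless of window size; the O(size) copy here exists only so each yielded window is an independent array.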

Things I'm not planning: async iterables, streaming quantile estimation, operator fusion.