Java Meets Whisper: Speech-to-Text Without Python (and Without JNI Pain)

#java #quarkus #ai #speech

I’ve done the “Java talks to native code” dance many times.

You know the steps:

A bit of JNI.
A wrapper generator that breaks on Tuesdays.
A Python sidecar process “for convenience.”
A quiet moment where you ask yourself: Is this really my life now?

Good news: Java 22 changed the choreography.

With the Foreign Function & Memory API (FFM), which was a preview feature in Java 21 and became final in Java 22, Java can call native libraries directly, with a real type system, and with memory handling you can actually reason about. No black magic, no JNI framework bingo.

So I built something fun: a local, offline speech-to-text app.

You press a button, speak into your mic, and Java returns text.

No cloud.
No Python.
No shell calls.

Just Quarkus + whisper.cpp + FFM.

This post is the teaser. The full tutorial (with all the sharp edges, fixes, and “why is this failing in Dev Mode?” moments) is linked at the end.

The “wait, Java can do that?” demo

Here’s the goal:

Browser UI records audio from your microphone
Client resamples to 16 kHz (what Whisper expects)
Quarkus receives raw PCM floats
Java calls whisper.cpp through FFM
Native inference runs on your machine

You speak. Java transcribes. That’s the whole trick.

And yes, it runs offline.
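
To make the "Quarkus receives raw PCM floats" step concrete, here is a minimal sketch of turning a request body into a float[]. It assumes the browser sends the bytes of a Float32Array as little-endian 32-bit floats; that wire format and the PcmDecoder name are my assumptions for illustration, not necessarily what the tutorial uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmDecoder {

    /**
     * Decodes a request body of little-endian 32-bit float PCM samples.
     * Assumption: mono audio, already resampled to 16 kHz by the client.
     */
    static float[] decode(byte[] body) {
        if (body.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("Body length must be a multiple of 4 bytes");
        }
        float[] samples = new float[body.length / Float.BYTES];
        // View the bytes as floats and bulk-copy them into the array.
        ByteBuffer.wrap(body)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asFloatBuffer()
                .get(samples);
        return samples;
    }
}
```

The length check rejects truncated uploads early, before anything reaches native code.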

The actual magic trick isn’t Whisper

Whisper is cool, but it’s not the plot twist.

The plot twist is this:

We can build a clean native bridge in Java without writing JNI.

Instead of hand-writing glue code, we generate bindings from the C headers using jextract.

That means you end up with a typed Java API for Whisper functions.

If you’ve ever had to debug a JNI crash dump at 3 a.m., this feels like therapy.
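
If you have never seen an FFM downcall, this is roughly the plumbing that jextract saves you from writing by hand. The sketch below calls the C standard library's strlen as a stand-in for a Whisper function, assuming Java 22 or later and a 64-bit platform where size_t maps to a Java long. The StrlenDemo class is hypothetical, and the code jextract actually generates looks different.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

final class StrlenDemo {

    /** Calls the native strlen(const char*) through FFM and returns its result. */
    static long strlen(String text) {
        Linker linker = Linker.nativeLinker();
        // Find the function in the libraries the linker exposes by default.
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        // Describe the C signature: size_t strlen(const char *s).
        MethodHandle handle = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string into native memory as a NUL-terminated C string.
            MemorySegment cString = arena.allocateFrom(text);
            return (long) handle.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("Native call to strlen failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("whisper"));
    }
}
```

With jextract, the lookup, the FunctionDescriptor, and the MethodHandle are generated from the header, so your own code only sees a plain static Java method.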

The part nobody tells you about: Dev Mode vs native loading

Here’s the honest version: the “happy path” is short, but the “real path” has a boss fight.

Quarkus Dev Mode uses ClassLoaders in a way that breaks the default native symbol lookup that jextract generates.

On macOS, you can also hit another fun surprise: symbol names can be prefixed with an underscore.

So the tutorial includes the “I tried the obvious thing and it didn’t work” fix:

Load the library from an absolute path
Use a robust SymbolLookup
Try both whisper_init... and _whisper_init...

This is the difference between a demo and something you can actually run while developing.

(And yes, it took me longer than I want to admit. You’re welcome.)
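
Here is a minimal sketch of that three-part fix, assuming Java 22 or later. The NativeSymbols class and its method names are hypothetical; the tutorial patches the jextract-generated lookup instead, so treat this as the idea rather than the exact code.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.nio.file.Path;

final class NativeSymbols {

    /**
     * Opens a shared library from an absolute path, so loading does not
     * depend on java.library.path or on which ClassLoader asks for it.
     * The library stays loaded for as long as the arena is alive.
     */
    static SymbolLookup open(Path libraryPath, Arena arena) {
        if (!libraryPath.isAbsolute()) {
            throw new IllegalArgumentException("Expected an absolute path: " + libraryPath);
        }
        return SymbolLookup.libraryLookup(libraryPath, arena);
    }

    /** Looks up the plain symbol name first, then the underscore-prefixed variant. */
    static MemorySegment find(SymbolLookup lookup, String name) {
        return lookup.find(name)
                .or(() -> lookup.find("_" + name))
                .orElseThrow(() -> new UnsatisfiedLinkError("Symbol not found: " + name));
    }
}
```

A caller might write something like NativeSymbols.find(NativeSymbols.open(path, Arena.global()), "whisper_full"), where Arena.global() keeps the library loaded for the lifetime of the JVM.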

A tiny taste of the Java side

The core service is basically:

load the native library
init the Whisper context
allocate native memory with an Arena
call whisper_full(...)
read the segments back as strings

That’s it.

No JNI framework. No “native helper process.” No ritual.

And when it works, the logs look glorious because you can literally see Whisper picking the right backend (CPU/GPU) on your machine.
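
The Arena and "read back as strings" steps from the list above can be sketched without Whisper installed. This assumes Java 22 or later; the ArenaSketch class is hypothetical, and the comments mark where the real native call would sit, because the exact Java signature of whisper_full depends on the generated bindings.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class ArenaSketch {

    /** Copies samples into native memory and reads them back again. */
    static float[] roundTrip(float[] samples) {
        // A confined arena frees everything it allocated when the try block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment nativeSamples = arena.allocateFrom(ValueLayout.JAVA_FLOAT, samples);
            // A real service would pass nativeSamples and samples.length
            // to the generated whisper_full(...) binding at this point.
            return nativeSamples.toArray(ValueLayout.JAVA_FLOAT);
        }
    }

    /** Reads a NUL-terminated C string from native memory into a Java String. */
    static String readBack(String text) {
        try (Arena arena = Arena.ofConfined()) {
            // Stand-in for a char* that native code would hand back.
            MemorySegment cString = arena.allocateFrom(text);
            return cString.getString(0);
        }
    }
}
```

Scoping the arena to a single transcription call means the audio buffer is released deterministically, without waiting for the garbage collector.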

Why you should care (even if you don’t need speech-to-text)

Because speech-to-text is just the demo.

The bigger takeaway is:

Java can now integrate with modern native ML stacks cleanly
FFM is practical, not “academic”
Quarkus is a great host when you want a tight runtime and a clear lifecycle

If you’re building local AI tools, edge workloads, or anything “Java + native,” this is a new door opening.

The full tutorial (with all commands + code)

If you want the full end-to-end walkthrough (build whisper.cpp as a shared library, download models, run jextract, patch the generated lookup for Quarkus Dev Mode, wire the Quarkus REST endpoint, and add the simple UI), it’s here:

👉 https://www.the-main-thread.com/p/java-speech-to-text-quarkus-whisper-ffm

Have fun. And if you still have a Python sidecar somewhere “just for Whisper”… you can probably delete it now.
