DEV Community

S M Tahosin

I Replaced 200 Threads With 10,000. Java Finished 13.5x Faster.

Low overhead for blocking tasks

I expected the fans to spin.

I had just asked Java to start 10,000 tasks, give each task its own virtual
thread, and make every one wait for 100 milliseconds.

Instead, the program finished before I could move my hand away from Enter.

So I ran it again. Then three more times.

On my 12-logical-processor laptop, the median result looked like this:

Executor                              Time for 10,000 waiting tasks
Fixed pool of 200 platform threads    5,116 ms
One virtual thread per task             378 ms

That is 13.5x faster completion after changing the executor, not the task.

[Figure: Benchmark results comparing platform and virtual threads]

This is not proof that virtual threads make Java code 13.5x faster.

It is proof that I had been thinking about threads incorrectly.

Let us rebuild that mental model from the inside.

First, Make a Prediction

Each task does this:

Thread.sleep(Duration.ofMillis(100));

There are 10,000 tasks.

How long should the whole program take?

  • A: About 1,000 seconds, because 10,000 x 100 ms = 1,000 seconds
  • B: About 5 seconds, because 200 platform threads process the work in waves
  • C: Well under 1 second, because waiting virtual threads can step aside

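All three answers can be correct. A single thread gives A, a fixed pool of 200
platform threads gives B, and one virtual thread per task gives C. The
executor decides which world you live in.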

The Old Mental Model

For most of Java's life, a Java thread was a thin wrapper around an operating
system thread.

That made threads useful, but expensive enough to treat as a limited resource.

If your server had a pool of 200 platform threads and all 200 were waiting for
a slow database, request 201 had to stand in line.

request -> platform thread -> OS thread -> wait
request -> platform thread -> OS thread -> wait
request ->       queue       ->          -> wait for a free thread

The code was blocked, but the operating system thread assigned to it was still
occupied.

Virtual threads break that one-to-one relationship.

[Figure: Platform threads compared with virtual threads]

A virtual thread is still a real java.lang.Thread.

The difference is that it does not permanently own an OS thread. The JVM
schedules many virtual threads onto a smaller number of platform threads,
called carrier threads.

You can see the distinction directly:

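// Platform thread: backed one-to-one by an OS thread.
Thread platform = Thread.ofPlatform().start(
        () -> System.out.println(Thread.currentThread().isVirtual())
);
platform.join(); // finish first so the two lines print in a fixed order

// Virtual thread: scheduled by the JVM onto a carrier thread.
Thread virtual = Thread.ofVirtual().start(
        () -> System.out.println(Thread.currentThread().isVirtual())
);
virtual.join();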

Output:

false
true

Same Thread API. Different scheduling model.

What Happens When a Virtual Thread Waits?

Imagine a virtual thread running on carrier thread 3.

It calls a supported blocking operation, such as Thread.sleep() or blocking
network I/O.

The JVM can:

  1. Pause the virtual thread.
  2. Unmount it from carrier thread 3.
  3. Use carrier thread 3 to run other virtual threads.
  4. Remount the original virtual thread when its wait is over.

[Figure: Timeline showing a virtual thread stepping aside while waiting]

The virtual thread did not make the database, network, or timer faster.

It stopped wasting a scarce carrier thread while waiting.

That sentence is the whole feature:

Virtual threads make waiting cheap. They do not make work cheap.
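
The mount and unmount steps above can be made visible with a small sketch. It starts many sleeping virtual threads and records which carrier each one reports after it wakes up. The class and method names are made up for illustration, and the sketch assumes that a mounted virtual thread's toString() ends with "@" followed by the carrier name. Current JDKs do that, but it is an implementation detail, not a documented contract.

```java
import java.time.Duration;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;

class CarrierDemo {

    // Assumption: while mounted, a virtual thread's toString() ends with
    // "@<carrier name>". Used here for illustration only.
    static String carrierName() {
        String description = Thread.currentThread().toString();
        int at = description.indexOf('@');
        return at < 0 ? "unmounted" : description.substring(at + 1);
    }

    // Runs `tasks` sleeping virtual threads and returns the distinct
    // carrier names they reported after waking up.
    static Set<String> carriersUsed(int tasks) {
        Set<String> carriers = ConcurrentHashMap.newKeySet();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(50)); // unmounts while waiting
                    carriers.add(carrierName());         // remounted, possibly elsewhere
                    return null;
                });
            }
        } // close() waits for every task to finish
        return carriers;
    }

    public static void main(String[] args) {
        Set<String> carriers = carriersUsed(1_000);
        System.out.println("1,000 virtual threads shared "
                + carriers.size() + " carrier threads");
    }
}
```

The exact count depends on the machine, but it should stay close to the number of logical processors rather than the number of tasks.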

The Experiment

Here is the important part of the benchmark.

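import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

private static final int TASKS = 10_000;
private static final Duration WAIT = Duration.ofMillis(100);

private static void run(ExecutorService executor) throws Exception {
    // ExecutorService is AutoCloseable since Java 19; close() waits for
    // submitted tasks to finish before shutting the executor down.
    try (executor) {
        List<Future<Integer>> futures = new ArrayList<>(TASKS);

        // Submit every task up front; each one only waits.
        for (int task = 0; task < TASKS; task++) {
            int taskId = task;

            futures.add(executor.submit(() -> {
                Thread.sleep(WAIT);
                return taskId;
            }));
        }

        // Block until every task has completed.
        for (Future<Integer> future : futures) {
            future.get();
        }
    }
}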

I ran the same method with two executors:

run(Executors.newFixedThreadPool(200));

run(Executors.newVirtualThreadPerTaskExecutor());

The first executor lets at most 200 tasks wait at once.

The virtual-thread executor starts one virtual thread for every task. When the
tasks sleep, the JVM can unmount them and keep its carrier threads available.

That is why the fixed pool behaves roughly like this:

10,000 tasks / 200 threads = 50 waves
50 waves x 100 ms          = about 5 seconds

The virtual-thread version does not need 50 waves. Almost every task can begin,
sleep, and get out of the carriers' way.
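
The wave arithmetic generalizes to any pool size. Here is a minimal sketch of it as a lower-bound estimate; the class and method names are hypothetical and the model ignores scheduling and thread start-up cost.

```java
class WaveEstimate {

    // Lower bound for a pool of `threads` workers: tasks run in
    // ceil(tasks / threads) waves and each wave lasts at least one wait.
    static long estimateMillis(int tasks, int threads, long waitMillis) {
        long waves = (tasks + threads - 1L) / threads; // ceiling division
        return waves * waitMillis;
    }

    public static void main(String[] args) {
        // Fixed pool of 200 platform threads: 50 waves x 100 ms
        System.out.println(estimateMillis(10_000, 200, 100));    // prints 5000
        // One thread per task: a single wave
        System.out.println(estimateMillis(10_000, 10_000, 100)); // prints 100
        // A single thread: answer A from the prediction
        System.out.println(estimateMillis(10_000, 1, 100));      // prints 1000000
    }
}
```

Real runs land above these bounds because starting and scheduling 10,000 tasks is not free.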

The measured medians from three runs were:

WAITING WORK
200 platform threads        5,116 ms
virtual thread per task       378 ms

CPU WORK
platform threads            2,387 ms
virtual threads             2,300 ms

The waiting result changed dramatically.

The CPU result did not.

The Benchmark Trap

Virtual threads are not tiny turbo buttons.

To test that, I also submitted 48 CPU-heavy tasks that counted primes up to
1,000,000.

Both executors finished in roughly the same time because my laptop still had
only 12 logical processors.

You can create one million virtual threads.

You cannot create one million CPU cores.

[Figure: Decision tree for choosing virtual threads]

Good virtual-thread workloads spend meaningful time waiting:

  • HTTP requests
  • database queries
  • many file operations, after profiling
  • message queues
  • remote API calls
  • many independent sleep() or timer waits

Poor candidates spend most of their time calculating:

  • image processing
  • video encoding
  • compression
  • machine-learning inference
  • large in-memory transformations
  • number crunching

For CPU-bound work, use bounded parallelism near the amount of CPU your machine
can actually execute.
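
A rough sketch of what bounded parallelism can look like for the prime-counting test. This is not the lab's actual code: the class name, the trial-division counter, and the choice of availableProcessors() as the pool size are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CpuBoundLab {

    // Counts primes in [2, limit] by trial division: deliberately CPU-heavy.
    static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n <= limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) { prime = false; break; }
            }
            if (prime) count++;
        }
        return count;
    }

    // Runs `tasks` copies of the computation on the given executor and
    // returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(ExecutorService executor, int tasks, int limit) throws Exception {
        long start = System.nanoTime();
        try (executor) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> countPrimes(limit)));
            }
            for (Future<Integer> future : futures) future.get();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Assumed pool size: one worker per logical processor.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("bounded pool:    "
                + timeMillis(Executors.newFixedThreadPool(cores), 48, 1_000_000) + " ms");
        System.out.println("virtual threads: "
                + timeMillis(Executors.newVirtualThreadPerTaskExecutor(), 48, 1_000_000) + " ms");
    }
}
```

Both lines should print similar times, since neither executor adds cores. The bounded pool simply makes the limit explicit.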

The Simplest Useful Rule

When tasks mostly wait:

try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    Future<String> user = executor.submit(() -> loadUser());
    Future<List<Order>> orders = executor.submit(() -> loadOrders());

    renderProfile(user.get(), orders.get());
}

This code is ordinary, blocking, and readable.

That is intentional.

For years, developers often had to choose between simple thread-per-request
code that did not scale and asynchronous code that scaled but split the
workflow across callbacks, futures, or reactive operators.

Virtual threads make the simple shape practical for many high-throughput
blocking applications.

They do not remove every concurrency problem. They remove one expensive
assumption: that every concurrent task needs its own OS thread.

Do Not Pool Virtual Threads

This feels wrong at first.

We learned to pool threads because platform threads were expensive. A pool
limited how many of those scarce threads existed.

Virtual threads are designed to be created per task.

So this is the normal pattern:

Executors.newVirtualThreadPerTaskExecutor();

Not this:

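// Anti-pattern: a tiny pool of reusable virtual threads (the size 20 is illustrative)
Executors.newFixedThreadPool(20, Thread.ofVirtual().factory());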

If you must limit access to something scarce, limit that thing.

Suppose a partner API permits only 20 concurrent requests:

Semaphore partnerApiSlots = new Semaphore(20);

String callPartnerApi() throws InterruptedException {
    partnerApiSlots.acquire();

    try {
        return makeBlockingHttpRequest();
    } finally {
        partnerApiSlots.release();
    }
}

[Figure: Many virtual threads passing through a semaphore before a partner API]

The executor can still create a virtual thread per task.

The semaphore protects the actual bottleneck.

This separation is useful far beyond virtual threads:

Concurrency is how much work can be in progress. Capacity is how much work a
dependency can safely accept.
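
To check that the semaphore really caps the dependency and not the executor, a small self-contained sketch can record the highest number of calls in flight. The class name and the 10 ms sleep standing in for the partner API call are assumptions for illustration.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreLimitDemo {

    // Starts `tasks` virtual threads that all call a simulated dependency
    // guarded by `permits` slots; returns the highest concurrency observed.
    static int peakConcurrency(int tasks, int permits) {
        Semaphore slots = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    slots.acquire();
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(Duration.ofMillis(10)); // stand-in for the remote call
                    } finally {
                        inFlight.decrementAndGet();
                        slots.release();
                    }
                    return null;
                });
            }
        } // close() waits for every task to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent calls: " + peakConcurrency(1_000, 20));
    }
}
```

All 1,000 virtual threads exist at once, yet the reported peak never exceeds the 20 permits.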

The Quiet ThreadLocal Trap

Virtual threads support ThreadLocal, so request context such as a user ID or
trace ID can continue to work.

The dangerous pattern is using ThreadLocal as a tiny object pool:

private static final ThreadLocal<ExpensiveClient> CLIENT =
        ThreadLocal.withInitial(ExpensiveClient::new);

That may look efficient when 200 pooled platform threads reuse 200 clients.

With one virtual thread per task, it can quietly become thousands of expensive
clients that are barely reused.

Keep context in thread-local variables only when it truly belongs to the task.
Do not use them to cache heavy reusable objects per virtual thread.
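
The effect is easy to measure. The sketch below counts how many cached objects get built under each executor. A plain Object stands in for ExpensiveClient, and the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalCacheDemo {

    // Runs `tasks` tasks that each touch a per-thread cached object and
    // returns how many of those objects were constructed.
    static int instancesCreated(ExecutorService executor, int tasks) {
        AtomicInteger created = new AtomicInteger();
        // Stand-in for an expensive client: we only count constructions.
        ThreadLocal<Object> client = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new Object();
        });

        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> client.get());
            }
        } // close() waits for every task to finish
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("pool of 4 platform threads: "
                + instancesCreated(Executors.newFixedThreadPool(4), 1_000));
        System.out.println("virtual thread per task:    "
                + instancesCreated(Executors.newVirtualThreadPerTaskExecutor(), 1_000));
    }
}
```

The pooled executor builds at most four objects because its threads are reused. The virtual-thread executor builds one per task, because every task runs on a fresh thread with empty thread-locals.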

You Can Observe Them

Virtual threads are invisible to the operating system because the OS sees
carrier threads, not every virtual thread.

The JDK understands them, though.

You can create a virtual-thread-aware dump with:

jcmd <pid> Thread.dump_to_file -format=json threads.json

That distinction matters during debugging. An OS dashboard may show a modest
thread count while the JVM is managing thousands of virtual threads.

The right question is not only "how many threads exist?"

It is "what are those threads waiting for?"
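
The same kind of dump can also be requested from inside the process. This is a sketch, assuming a HotSpot-based JDK 21 or later where com.sun.management.HotSpotDiagnosticMXBean offers dumpThreads; the class name, temp-file location, and sleep durations are illustrative choices.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.HotSpotDiagnosticMXBean.ThreadDumpFormat;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.concurrent.Executors;

class ThreadDumpDemo {

    // Parks `sleepers` virtual threads, writes a JSON thread dump while they
    // wait, and returns the size of the dump file in bytes.
    static long dumpBytes(int sleepers) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < sleepers; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofSeconds(1));
                    return null;
                });
            }
            Thread.sleep(Duration.ofMillis(200)); // give the tasks time to park

            // dumpThreads needs an absolute path to a file that does not exist yet.
            Path file = Files.createTempDirectory("vthreads")
                    .resolve("threads.json").toAbsolutePath();
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class)
                    .dumpThreads(file.toString(), ThreadDumpFormat.JSON);
            System.out.println("dump written to " + file);
            return Files.size(file);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("thread dump failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dumpBytes(100) + " bytes");
    }
}
```

Opening the resulting file shows the sleeping virtual threads alongside the platform threads, which an OS-level tool would not list.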

One Outdated Warning

You may have read this advice:

Never block inside synchronized code when using virtual threads, because it
pins the carrier thread.

That warning mattered when virtual threads became final in Java 21.

Java 24 changed the implementation through JEP 491. Virtual threads can now
release their carrier when blocking inside synchronized code in the normal
case.

Pinning has not vanished completely. Native and foreign-function calls can
still pin a virtual thread.

But the blanket "virtual threads and synchronized do not mix" rule is
outdated on modern JDKs.

This is one reason I ran the experiment on Java 25 LTS instead of repeating an
old Java 21 checklist.

A Five-Minute Migration Checklist

Do not rewrite an application because virtual threads sound exciting.

Take one blocking workflow and inspect it.

  1. Confirm the workload waits. Look for database calls, HTTP calls, file access, queues, and sleeps.
  2. Replace the task executor. Try Executors.newVirtualThreadPerTaskExecutor().
  3. Keep downstream limits. Connection pools, API quotas, and rate limits still exist.
  4. Load test the real path. A sleep benchmark teaches the model, not your production capacity.
  5. Measure CPU and memory too. Cheap threads can still run expensive code or retain large objects.
  6. Check native integrations. Native calls are one of the remaining pinning cases.

The goal is not "use virtual threads everywhere."

The goal is "stop paying for idle OS threads where you do not need them."

The Mental Model I Am Keeping

Before this experiment, I thought:

More concurrent Java work requires a larger thread pool.

Now I think:

Waiting work wants cheap virtual threads. CPU work wants bounded
parallelism. Scarce dependencies want explicit limits.

That model is simple enough for a beginner and accurate enough to prevent a
surprising number of production mistakes.

The full runnable lab behind the numbers uses only the JDK. No framework, build
tool, or dependency is required.

Compile and run it with Java 25:

javac VirtualThreadsLab.java
java VirtualThreadsLab

Open the complete runnable VirtualThreadsLab.java

Virtual threads became final in Java 21. Java 25 is not required for the basic
API, but it gives us the current LTS behavior, including the post-Java-24
improvements discussed above.

What should I put through this lab next: a database connection pool, 10,000
real HTTP calls, or a ThreadLocal-heavy application?

Top comments (16)

Tamim Rao

Since virtual threads are getting a lot of attention lately, experiments like this are a good reminder of why. The surprising part isn't that 10,000 tasks ran, it's how little overhead there was compared to the mental model many of us still have from platform threads.

What I find interesting is that it also highlights a common misconception. Seeing 10,000 threads complete smoothly doesn't mean we should start creating threads everywhere. It means the cost model has changed, so we can focus more on expressing concurrency in a straightforward way and less on building complex pooling strategies for I/O-heavy workloads.

I'd be curious to see the same experiment with blocking network calls, database operations, and some CPU-bound work mixed in. That's usually where the real trade-offs start to show up.

S M Tahosin

That's exactly the takeaway I was hoping readers would get from the experiment. The interesting part isn't the number itself, it's that virtual threads let us go back to a much simpler concurrency model without paying the same cost we used to associate with threads.

I also agree that "10,000 threads worked" can easily turn into the wrong conclusion if people stop there. Virtual threads make waiting cheap, but they don't magically make CPU work cheaper.

The mixed workload scenario you mentioned would be a great follow-up. Network I/O, database calls, and CPU-bound tasks in the same benchmark would probably show a much more nuanced picture of where virtual threads shine and where the underlying hardware limits still dominate. That's actually the direction I'm thinking of exploring next.

mote

The "cheap waiting, not cheap work" framing is the part that took me longest to internalize. I kept trying to use virtual threads as a drop-in for thread pools on CPU-bound tasks, then wondering why memory usage spiked without speed improvement. The Semaphore pattern for rate-limiting actual bottlenecks is underrated — most devs reach for it too late, after they've already blown up a downstream API's rate limit.

One thing I'd push back on slightly: the article mentions ThreadLocal as a gotcha, but the deeper issue is that virtual threads fundamentally change the cost model. In Rust's async model, you'd handle this differently — instead of Semaphore + blocking calls, you'd reach for async channels or futures that yield without thread blocking. Same problem, different primitives. Neither is wrong, just requires rethinking what "waiting" means in your specific runtime.

What's your take on structured concurrency here? Virtual threads make it easier to accidentally spawn fire-and-forget tasks that outlive their parent scope.

S M Tahosin

That's a great point. I think the ThreadLocal example is really just one symptom of the broader shift in the cost model. Virtual threads let us write code in a more direct style, but they also force us to revisit assumptions that were built around expensive threads.

I also agree with the Rust comparison. The primitives are different, but the underlying challenge is the same: expressing concurrency without confusing waiting with useful work.

As for structured concurrency, I'm a big fan of it for exactly the reason you mentioned. Once spawning work becomes cheap, lifecycle management becomes more important, not less. It's very easy to create tasks that technically work but are no longer tied to the scope that created them. Structured concurrency feels like the missing guardrail that keeps that power manageable.

Ankita Sarkar

Really enjoyed this experiment. A lot of developers still think "threads are expensive" without considering what those threads are actually doing. Your results are a good reminder that modern JVMs and operating systems handle idle threads much better than many of us expect.

What stood out to me is how easy it is to carry old assumptions forward without testing them. It would be interesting to see the same experiment with CPU-heavy work instead of sleeping threads to compare where the real limits start showing up.

Thanks for sharing actual measurements instead of just repeating common wisdom.

S M Tahosin

Exactly. That was one of the main motivations behind the experiment. It's surprisingly easy to inherit assumptions from older threading models and never revisit them.

A CPU-heavy version would be a great comparison because that's where I'd expect the hardware limits to become much more visible. Thanks for the thoughtful observation.

Ismail Hasan

This experiment is fascinating because it really challenges our intuition about what modern hardware can handle. Most people assume starting thousands of threads would instantly bring a laptop to its knees, but seeing it barely notice is eye-opening. It also makes me think about how much the JVM and modern operating systems optimize thread management behind the scenes. I wonder how this would scale on different workloads, especially when threads are doing more than just sleeping. It’s a great reminder that sometimes our assumptions about performance bottlenecks are outdated, and testing can reveal surprising truths about the tools we use every day.

S M Tahosin

I completely agree. One of the biggest lessons for me was realizing how many performance assumptions I was carrying around without ever testing them.

You're also right that the workload matters. Sleeping threads are one thing, but CPU-heavy work or blocking I/O can tell a very different story. That's why benchmarks are so valuable. They often reveal that the bottleneck isn't where we expected it to be.

Thanks for sharing your thoughts.

Mansa Datta

What I liked about this experiment is that it challenges a common assumption many developers have: seeing a huge thread count and immediately expecting the system to fall apart. The interesting takeaway isn't that 10,000 threads worked, but understanding why they worked. Most of them were likely waiting rather than actively competing for CPU time.

It's also a good reminder that concurrency discussions are often more nuanced than "more threads = bad." Thread state, memory usage, and workload type matter just as much as the raw number of threads.

A follow-up comparison with CPU-bound tasks or Java virtual threads would be really interesting. That would show where traditional threads start to hit their limits and how different concurrency approaches compare in practice.

Great experiment and a nice reality check for many of the assumptions we carry about threads.

S M Tahosin

That's a great way to put it. I think many of us still carry the mental model that a large thread count automatically means trouble, because that's often true with platform threads. What surprised me most wasn't the number itself, but how little actual contention there was once you look at what those threads were doing.

I also like your point that concurrency discussions often get reduced to a single metric. The raw thread count is easy to focus on, but thread state and workload characteristics usually tell a much more useful story.

A CPU-bound comparison is definitely on my list. My expectation is that the gap becomes much smaller there, which would reinforce the idea that virtual threads make waiting cheap, not computation cheap. That's where the distinction between concurrency and parallelism becomes really interesting in practice.

Adrian Ng

What stood out to me is how this highlights the gap between theory and reality. We often hear "threads are expensive" and stop there, but seeing 10,000 threads barely make a modern laptop sweat puts that advice into context. The most interesting part wasn't the number itself, it was the reminder to test assumptions instead of repeating them. Nice experiment and a fun read.

S M Tahosin

I couldn't agree more. The phrase "threads are expensive" isn't wrong, but it's often repeated without enough context.

What surprised me most was how different the actual result was from the picture I had in my head before running the test. That's exactly why I love small experiments like this. They have a way of exposing assumptions we didn't even realize we were carrying around.

Daniel Markus

This was a fun reminder that many of us still think about Java threads using rules from a different era. The most interesting part wasn't that 10,000 threads could be created, but how little impact it had when those threads weren't actively doing work.

It also highlights an important distinction between thread count and actual concurrency pressure. Numbers alone can be misleading. Thanks for sharing a simple experiment that challenges assumptions instead of repeating them.

S M Tahosin

Exactly. I think that's the key distinction people often miss. A large thread count sounds scary until you look at what those threads are actually doing.

The experiment was less about proving that 10,000 is a magic number and more about questioning an assumption I've seen repeated for years. Sometimes the mental model becomes outdated long before we realize it.

Hani Lieu

Really cool experiment. It's amazing how something that used to feel impossible is now running comfortably on a regular laptop thanks to virtual threads.

S M Tahosin

That's what surprised me too. A few years ago, "10,000 threads on a laptop" would have sounded like a terrible idea. Virtual threads really change what feels practical for highly concurrent workloads.