AranaDeDoros

What Happens When One Parallel Call Fails? Structured Concurrency in Scala

When building backend systems, we often fan out to multiple services in parallel:

  • Price providers
  • Recommendation engines
  • Search indexes
  • Payment gateways

The real question isn't "How do I run these in parallel?"

It's:

  • What happens if one fails?
  • What happens if one times out?
  • Do retries leak work?
  • Can I keep partial results?

I explored this while building a small price aggregation simulator in Scala 3 using Cats Effect.


The Scenario

The setup was simple:

  • Multiple providers
  • Each returns a price
  • We call them in parallel
  • We aggregate the results

But the interesting part wasn't parallelism.

It was failure semantics.

The Core Abstraction

trait PriceProvider[F[_]]:
  def name: ProviderName
  def fetchPrice(productId: ProductId): F[Price]

Providers may fail.

One specific failure is modeled explicitly:

sealed abstract class ProviderError(message: String) extends Throwable(message)

final case class TimeOutError(provider: String, time: LocalDateTime)
  extends ProviderError(s"$provider timed out at $time")

Timeouts are not transport hacks.

They are part of the domain.

Retry as a Decorator

Instead of baking retry into the provider, I wrapped providers with a retry policy:

def retryOnTimeout[A](
  fa: IO[A],
  maxRetries: Int,
  delay: FiniteDuration
): IO[A] =

  def loop(attempt: Int): IO[A] =
    fa.handleErrorWith {
      case _: TimeOutError if attempt < maxRetries =>
        IO.sleep(delay) *> loop(attempt + 1)
      case e =>
        IO.raiseError(e)
    }

  loop(0)

Then composed it:

final class RetryingProvider(
  underlying: PriceProvider[IO],
  maxRetries: Int,
  delay: FiniteDuration
) extends PriceProvider[IO]:

  override def name: ProviderName = underlying.name

  override def fetchPrice(productId: ProductId): IO[Price] =
    retryOnTimeout(
      underlying.fetchPrice(productId),
      maxRetries,
      delay
    )

Important detail:

  • Only timeouts retry
  • Other failures propagate immediately
  • The underlying provider remains untouched

Retry is a policy layer, not embedded behavior.
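To see the decorator in action, here is a runnable sketch. The provider name "acme", the price, and the attempt counting via `Ref` are made up for illustration; `ProviderError` is modeled as an abstract class so it can carry a message:

```scala
import scala.concurrent.duration.*
import java.time.LocalDateTime
import cats.effect.{IO, Ref}
import cats.effect.unsafe.implicits.global

// Error hierarchy, mirroring the article's TimeOutError
sealed abstract class ProviderError(message: String) extends Throwable(message)
final case class TimeOutError(provider: String, time: LocalDateTime)
  extends ProviderError(s"$provider timed out")

def retryOnTimeout[A](fa: IO[A], maxRetries: Int, delay: FiniteDuration): IO[A] =
  def loop(attempt: Int): IO[A] =
    fa.handleErrorWith {
      case _: TimeOutError if attempt < maxRetries =>
        IO.sleep(delay) *> loop(attempt + 1)
      case e => IO.raiseError(e)
    }
  loop(0)

// A flaky "provider" that times out twice, then succeeds.
// Returns the final price together with the number of attempts made.
// IO is lazy, so the effect is re-evaluated on every retry.
def retryDemo: IO[(BigDecimal, Int)] =
  for
    attempts <- Ref.of[IO, Int](0)
    flaky = attempts.updateAndGet(_ + 1).flatMap { n =>
      if n < 3 then IO.raiseError(TimeOutError("acme", LocalDateTime.now()))
      else IO.pure(BigDecimal(9.99))
    }
    price <- retryOnTimeout(flaky, maxRetries = 3, delay = 10.millis)
    total <- attempts.get
  yield (price, total)

@main def retryMain(): Unit =
  println(retryDemo.unsafeRunSync()) // (9.99,3): two timeouts, one success
```

A non-timeout failure in `flaky` would propagate on the first attempt, which is the whole point of classifying errors first.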

Aggregation Strategy #1 — Fail Fast

def fetchAll(
  providers: List[PriceProvider[IO]],
  productId: ProductId
): IO[List[Price]] =
  providers.parTraverse(_.fetchPrice(productId))

Semantics:

  • All providers run in parallel
  • If one fails, the whole operation fails
  • Remaining providers are cancelled

This is strict and correct when all results are required.
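These semantics are easy to observe in a self-contained demo. The delays and the "boom" failure are invented stand-ins for real providers; note that the whole thing completes almost immediately, because the failure cancels the ten-second sibling:

```scala
import scala.concurrent.duration.*
import cats.effect.IO
import cats.syntax.all.*
import cats.effect.unsafe.implicits.global

// Three stand-in provider calls: two succeed after a delay, one fails fast
def failFastDemo: IO[Either[Throwable, List[BigDecimal]]] =
  val fast   = IO.sleep(10.millis).as(BigDecimal(10))
  val slow   = IO.sleep(10.seconds).as(BigDecimal(12)) // cancelled, never completes
  val broken = IO.raiseError[BigDecimal](new RuntimeException("boom"))
  List(fast, slow, broken).parTraverse(identity).attempt

@main def failFastMain(): Unit =
  println(failFastDemo.unsafeRunSync()) // Left(java.lang.RuntimeException: boom)
```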

Aggregation Strategy #2 — Keep Partial Results

Sometimes partial success is acceptable.

So instead of failing the entire computation, we capture results explicitly:

def fetchSafe(
  provider: PriceProvider[IO],
  productId: ProductId
): IO[ProviderResult] =
  provider.fetchPrice(productId)
    .map(price =>
      ProviderResult.Success(provider.name, price)
    )
    // handleError takes a total function: matching only ProviderError here
    // would throw a MatchError on any other failure
    .handleError(e =>
      ProviderResult.Failure(provider.name, e)
    )

Then aggregate:

def fetchAllPartial(
  providers: List[PriceProvider[IO]],
  productId: ProductId
): IO[List[ProviderResult]] =
  providers.parTraverse(p => fetchSafe(p, productId))

Now:

  • All providers run in parallel
  • Failures don't cancel siblings
  • Partial results are preserved
  • The operation always completes

Different policy. Same building blocks.
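Here is a compact end-to-end sketch of the partial path. The provider names and prices are illustrative, and since the article never shows the `ProviderResult` ADT, this version assumes `Failure` stores a plain `Throwable`:

```scala
import cats.effect.IO
import cats.syntax.all.*
import cats.effect.unsafe.implicits.global

// Stand-in result ADT; the article's version uses ProviderName and Price
enum ProviderResult:
  case Success(provider: String, price: BigDecimal)
  case Failure(provider: String, error: Throwable)

def fetchSafe(name: String, fetch: IO[BigDecimal]): IO[ProviderResult] =
  fetch
    .map(ProviderResult.Success(name, _))
    .handleError(ProviderResult.Failure(name, _))

// One provider is down; the other two still report prices
def partialDemo: IO[List[ProviderResult]] =
  List(
    "acme"    -> IO.pure(BigDecimal(9)),
    "globex"  -> IO.raiseError[BigDecimal](new RuntimeException("down")),
    "initech" -> IO.pure(BigDecimal(11))
  ).parTraverse { case (name, io) => fetchSafe(name, io) }

@main def partialMain(): Unit =
  // Prints one Failure (globex) and two Successes; input order is preserved
  partialDemo.unsafeRunSync().foreach(println)
```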

Optional: Enforcing a Quorum

You can even require a minimum number of successes:

def requireAtLeast(
  n: Int,
  results: List[ProviderResult]
): IO[List[(ProviderName, Price)]] =
  val successes = results.collect {
    case ProviderResult.Success(p, price) => (p, price)
  }

  if successes.size >= n then IO.pure(successes)
  else IO.raiseError(new RuntimeException("Not enough providers"))

Now the system supports:

  • Fail-fast
  • Partial aggregation
  • Quorum-based acceptance

All composed explicitly.
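A quorum check composes the same way. This self-contained sketch reuses an assumed `ProviderResult` ADT (the article never defines it) and invented sample data:

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global

// Assumed result ADT, as in the partial-aggregation example
enum ProviderResult:
  case Success(provider: String, price: BigDecimal)
  case Failure(provider: String, error: Throwable)

def requireAtLeast(
  n: Int,
  results: List[ProviderResult]
): IO[List[(String, BigDecimal)]] =
  val successes = results.collect {
    case ProviderResult.Success(p, price) => (p, price)
  }
  if successes.size >= n then IO.pure(successes)
  else IO.raiseError(
    new RuntimeException(s"Only ${successes.size} of $n required providers succeeded")
  )

// Two successes out of three providers
val sample = List(
  ProviderResult.Success("acme", BigDecimal(9)),
  ProviderResult.Failure("globex", new RuntimeException("down")),
  ProviderResult.Success("initech", BigDecimal(11))
)

@main def quorumMain(): Unit =
  println(requireAtLeast(2, sample).attempt.unsafeRunSync().isRight) // true
  println(requireAtLeast(3, sample).attempt.unsafeRunSync().isLeft)  // true
```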

What This Experiment Revealed

The interesting part wasn't syntax. It was separation of concerns.

The system has three independent policy layers:

1. Failure classification (timeouts vs other errors)

2. Retry behavior (decorator)

3. Aggregation strategy (fail-fast vs partial vs quorum)

None of them are tangled together.

That's the real value.

  • When to retry
  • When to cancel
  • When partial results are acceptable
  • When to fail the whole operation

The abstraction you choose determines how clearly you can express those decisions.

And that's a backend engineering concern, not a language feature.
