Petr Macek

Posted on Feb 19 • Originally published at saasforge.cz

Propagating User Identity in Axon 5 Query Handlers

#java #cqrs #security #axonframework

During internal testing of our latest release, we hit a puzzling bug: owners couldn't see their own entities on a management page. The error message was simply "Unable to load" — despite the record clearly existing in the database.

Our application uses Axon Framework 5 with Spring WebFlux and Netflix DGS (GraphQL). The query handler was doing something seemingly reasonable — checking whether the authenticated user was the owner before returning INACTIVE records:

@QueryHandler
public Mono<Venue> handle(FindVenueByIdQuery query) {
    return venueRepository.findByIdWithAllRelationships(query.venueId().toString())
            .filterWhen(venue -> canViewVenueReactive(venue))
            .map(VenueNode::toDomain);
}

private Mono<Boolean> canViewVenueReactive(VenueNode venue) {
    return ReactiveSecurityContextHolder.getContext()
            .map(ctx -> ctx.getAuthentication().getName())
            .map(userId -> isOwner(venue, userId) || venue.isActive())
            .defaultIfEmpty(venue.isActive());
}

Can you spot the problem?

Why ReactiveSecurityContextHolder Is Always Empty Inside Axon Handlers

The reactive security context in Spring WebFlux is propagated through the Reactor Context — a subscriber-scoped mechanism attached to the reactive chain. It works within a single reactive pipeline. But Axon queries introduce two critical disruptions.

Disruption 1: The Scheduler Hop

Our original ReactiveQueryGateway wrapped Axon's QueryGateway (which returns CompletableFuture) in a Mono with a dedicated blocking scheduler:

return Mono.fromFuture(() -> queryGateway.query(query, responseType))
        .subscribeOn(blockingScheduler);

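The subscribeOn(blockingScheduler) shifts execution to a different thread pool. Reactor's Hooks.enableAutomaticContextPropagation() restores ThreadLocals across scheduler hops only for values that have a registered ThreadLocalAccessor (MDC, in our setup). The security context has no such accessor by default, and Axon does not restore it either, so code running inside queryGateway.query() on the new thread sees no authentication.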

(As it turns out, this scheduler hop is unnecessary in Axon 5 — but the toFuture() boundary below breaks context propagation regardless.)
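
To make the thread-local half of this concrete, here is a minimal JDK-only sketch. It assumes nothing about Spring, Reactor, or Axon: ThreadHopDemo and CURRENT_USER are hypothetical names standing in for any thread-bound security state. The value set on the calling thread is simply not there once work moves to another thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for thread-bound security state; not a Spring class.
final class ThreadHopDemo {

    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Reads the thread-local on a freshly created worker thread and returns what it saw.
    static String readUserOnWorkerThread() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture
                    .supplyAsync(() -> CURRENT_USER.get(), worker)
                    .join();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        CURRENT_USER.set("alice");
        try {
            System.out.println("caller thread sees: " + CURRENT_USER.get());
            System.out.println("worker thread sees: " + readUserOnWorkerThread());
        } finally {
            CURRENT_USER.remove();
        }
    }
}
```

The caller thread prints alice and the worker thread prints null, because a plain ThreadLocal is never copied to another thread unless something copies it explicitly.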

Disruption 2: The toFuture() Boundary

Even if the security context survived the scheduler hop, Axon internally calls .toFuture() on the Mono returned by query handlers. This creates an independent subscription — a completely separate reactive chain that has no knowledge of the original subscriber's context.

The result: ReactiveSecurityContextHolder.getContext() inside any Axon query handler returns Mono.empty(). Always.

Three Approaches We Considered

Approach A: Add callerId to Query Objects

The quick fix: just put the user ID in the query record.

public record FindVenueByIdQuery(VenueId venueId, String callerId) {}

We rejected this. Query objects are domain concepts — they express what you want to find, not who is asking. Polluting every query with authentication concerns violates CQRS principles and creates a leaky abstraction.

Approach B: SecurityContext ThreadLocalAccessor

ContextRegistry.getInstance().registerThreadLocalAccessor("security",
    () -> SecurityContextHolder.getContext(),
    SecurityContextHolder::setContext,
    SecurityContextHolder::clearContext);

We rejected this too. Axon 5 deliberately moved away from thread-local-based patterns. The toFuture() boundary still breaks this approach — ThreadLocalAccessors only help with subscribeOn hops within a single Reactor chain, not across independent subscriptions.

Approach C: Axon Metadata (What We Chose)

Axon has a first-class mechanism for propagating cross-cutting concerns with messages: Metadata. Every Axon message (commands, queries, events) can carry a Map<String, String> of metadata alongside the payload.

This is the CQRS-correct approach:

Query objects remain pure domain objects — no authentication concerns
Identity travels WITH the message — not as ambient thread-local state
Works regardless of threading model — scheduler hops, toFuture(), serialization boundaries... none of it matters
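
The idea behind these three points can be sketched without Axon at all. The snippet below is a simplified model, not Axon's API: Envelope, FindVenue, and EnvelopeDemo are hypothetical names. It shows a payload that says only what to find, plus a metadata map that travels with it across a thread hop.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical envelope, loosely modelled on the idea of a message; not an Axon type.
record Envelope<P>(P payload, Map<String, String> metadata) {}

// Hypothetical query: it only says what to find, not who is asking.
record FindVenue(String venueId) {}

final class EnvelopeDemo {

    // The "handler" reads the caller from the envelope, not from ambient state.
    static String handle(Envelope<FindVenue> message) {
        String caller = message.metadata().getOrDefault("userId", "anonymous");
        return caller + " asked for " + message.payload().venueId();
    }

    // Dispatches on a different thread; the metadata arrives intact because it is
    // part of the message itself.
    static String dispatch(Envelope<FindVenue> message) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(() -> handle(message), worker).join();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        Envelope<FindVenue> message =
                new Envelope<>(new FindVenue("v-1"), Map.of("userId", "alice"));
        System.out.println(dispatch(message)); // prints "alice asked for v-1"
    }
}
```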

The Solution: Auth-Injecting Query Gateway

The Key Axon 5 API Insight

Axon 5's QueryGateway doesn't expose a metadata parameter directly (unlike CommandGateway.send(command, metadata)).

But there's a workaround. Looking at DefaultQueryGateway.asQueryMessage():

private <Q, R> QueryMessage<Q, R> asQueryMessage(Q query, Class<R> responseType) {
    if (query instanceof QueryMessage<?, ?> queryMessage) {
        return (QueryMessage<Q, R>) queryMessage;  // Used directly!
    }
    // ... wraps in GenericQueryMessage otherwise
}

If the query object is already a QueryMessage, Axon uses it directly — metadata and all. So we construct a GenericQueryMessage with metadata attached and pass it to the gateway.

Implementation

We enhanced our existing ReactiveQueryGateway — the single choke point that all query callers go through:

@Component
public class DefaultReactiveQueryGateway implements ReactiveQueryGateway {

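    private final QueryGateway queryGateway;
    private final ReactiveAuthenticationSupplier authenticationSupplier;

    // Constructor injection: the final fields have to be assigned somewhere.
    public DefaultReactiveQueryGateway(QueryGateway queryGateway,
                                       ReactiveAuthenticationSupplier authenticationSupplier) {
        this.queryGateway = queryGateway;
        this.authenticationSupplier = authenticationSupplier;
    }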

    @Override
    public <R, Q> Mono<R> query(Q query, Class<R> responseType) {
        return buildAuthMetadata()
                .flatMap(metadata -> {
                    Object queryWithMetadata = wrapWithMetadata(query, metadata);
                    return Mono.fromFuture(
                            () -> queryGateway.query(queryWithMetadata, responseType));
                });
    }

    private Mono<Metadata> buildAuthMetadata() {
        return authenticationSupplier.getAuthentication()
                .map(auth -> {
                    Metadata metadata = Metadata.with("userId", auth.getName());
                    String roles = auth.getAuthorities().stream()
                            .map(GrantedAuthority::getAuthority)
                            .collect(Collectors.joining(","));
                    if (!roles.isEmpty()) {
                        metadata = metadata.and("roles", roles);
                    }
                    return metadata;
                })
                .defaultIfEmpty(Metadata.emptyInstance());
    }

    private <Q> Object wrapWithMetadata(Q query, Metadata metadata) {
        if (metadata.isEmpty()) {
            return query;
        }
        MessageType messageType = new MessageType(query.getClass());
        return new GenericQueryMessage(
                new GenericMessage(messageType, query, Map.copyOf(metadata)),
                null
        );
    }
}

The critical ordering: buildAuthMetadata() runs in the DGS reactive chain where ReactiveSecurityContextHolder works. The result is captured in the flatMap closure before Mono.fromFuture() crosses the async boundary.
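
The same ordering can be shown in plain JDK code. This is an analogy under stated assumptions rather than the gateway itself: CaptureBeforeBoundary and CALLER are hypothetical, with a ThreadLocal standing in for context that is only readable on the calling side. The value is read first, then carried across the boundary inside the lambda.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CaptureBeforeBoundary {

    // Hypothetical stand-in for context that is only readable on the calling side.
    static final ThreadLocal<String> CALLER = new ThreadLocal<>();

    // Reads the caller first, then crosses the boundary with the value in the closure.
    static CompletableFuture<String> query(String venueId, ExecutorService worker) {
        String userId = CALLER.get();                     // step 1: capture on the calling thread
        return CompletableFuture.supplyAsync(
                () -> handle(venueId, userId), worker);   // step 2: hop to the worker thread
    }

    // Runs on the worker thread and never touches CALLER.
    static String handle(String venueId, String userId) {
        return venueId + " requested by " + userId;
    }

    // Convenience wrapper: sets the caller, runs one query, cleans up.
    static String runAs(String user, String venueId) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CALLER.set(user);
        try {
            return query(venueId, worker).join();
        } finally {
            CALLER.remove();
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAs("alice", "v-1")); // prints "v-1 requested by alice"
    }
}
```

If the read in step 1 were moved inside the lambda, the handler would see null, which is the failure mode described earlier.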

Query Handler Side

Query handlers consume the metadata via @MetadataValue parameter injection:

@QueryHandler
public Mono<Venue> handle(FindVenueByIdQuery query,
        @MetadataValue(value = "userId", required = false) String callerId,
        @MetadataValue(value = "roles", required = false) String roles) {

    return venueRepository.findByIdWithAllRelationships(query.venueId().toString())
            .filter(venue -> canViewVenue(venue, callerId, roles))
            .map(VenueNode::toDomain);
}

private boolean canViewVenue(VenueNode venue, String callerId, String roles) {
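    // Exact match on each role; a substring check would also accept e.g. ROLE_ADMINISTRATOR.
    if (roles != null && java.util.Arrays.asList(roles.split(",")).contains("ROLE_ADMIN")) {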
        return true;
    }
    if (callerId != null && venue.getCreatedBy() != null
            && venue.getCreatedBy().getId() != null
            && venue.getCreatedBy().getId().equals(callerId)) {
        return true;
    }
    return venue.getStatusEnum() == VenueStatus.ACTIVE;
}

Notice: canViewVenue is now a pure function. It takes explicit inputs and returns a deterministic result. No Mono<Boolean>, no ReactiveSecurityContextHolder, no ambient state. That makes it straightforward to unit-test and to debug.
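
As an illustration of how little is needed to exercise such a function, here is a self-contained sketch. VenueView is a hypothetical, simplified stand-in for VenueNode (just an owner ID and an active flag). As a choice made for this sketch, the comma-joined roles string is split so that each role is matched exactly.

```java
import java.util.Arrays;

final class VenueVisibility {

    // Hypothetical, simplified stand-in for the article's VenueNode.
    record VenueView(String ownerId, boolean active) {}

    // Same rules as the handler: admins and owners see everything, others only active venues.
    static boolean canViewVenue(VenueView venue, String callerId, String roles) {
        boolean admin = roles != null
                && Arrays.asList(roles.split(",")).contains("ROLE_ADMIN");
        boolean owner = callerId != null && callerId.equals(venue.ownerId());
        return admin || owner || venue.active();
    }

    public static void main(String[] args) {
        VenueView inactive = new VenueView("alice", false);
        System.out.println(canViewVenue(inactive, "alice", null));                  // true: owner
        System.out.println(canViewVenue(inactive, "bob", "ROLE_USER"));             // false: stranger
        System.out.println(canViewVenue(inactive, "bob", "ROLE_USER,ROLE_ADMIN"));  // true: admin
        System.out.println(canViewVenue(inactive, null, null));                     // false: anonymous
    }
}
```

Each case is a plain method call with literal arguments; no reactive test harness or security context setup is involved.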

The Query Object Stays Clean

public record FindVenueByIdQuery(VenueId venueId) {
    public static FindVenueByIdQuery of(String id) {
        return new FindVenueByIdQuery(VenueId.of(id));
    }
}

No callerId. No security concerns. Just a domain query.

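The Architecture at a Glance

(Diagram not reproduced here. In short: the DGS layer calls ReactiveQueryGateway, which reads the reactive security context, attaches userId and roles as metadata, and dispatches through Axon's QueryGateway; the query handler reads them via @MetadataValue.)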

Beyond Identity: Propagating Roles

The initial implementation only propagated userId. This solved the owner-visibility problem but created a subtler bug: admins who weren't owners couldn't see INACTIVE records — not in the detail view, and not in the list view.

The DGS layer uses @PreAuthorize("hasRole('ADMIN')") for admin endpoints, so the GraphQL request succeeds. But the query handler's canViewVenue() couldn't distinguish an admin from a regular user.

The fix: extend the metadata to include roles. The key insight is that if a query handler needs any security context to make a decision, that context must travel as metadata. The @PreAuthorize annotation and the canViewVenue() check serve different purposes:

@PreAuthorize is a gate — can this user invoke this operation at all?
canViewVenue() is a filter — which results should this user see?

Both need role information, but they access it from different layers. The gateway bridges the gap.

Dropping the Blocking Scheduler

Our original ReactiveQueryGateway used subscribeOn(blockingScheduler) to avoid tying up Netty event-loop threads — a reasonable precaution when queryGateway.query() might block. But in Axon 5 with SimpleQueryBus, the entire query dispatch path is non-blocking:

queryGateway.query() does lightweight synchronous work — routing, message wrapping — and returns a CompletableFuture immediately
The query handler returns Mono<T>, which Axon converts via toFuture() — a non-blocking operation that just wires up the completion signal
The CompletableFuture completes when the handler's Mono emits, on whatever scheduler the reactive chain was already using

There's no blocking I/O anywhere in this path. The subscribeOn(blockingScheduler) only added an unnecessary context switch — and one fewer moving part means one fewer thing that can break context propagation.
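
What "returns a CompletableFuture immediately" means can be sketched with the JDK alone. The dispatcher below is a hypothetical model, not Axon's query bus: a helper thread and a latch merely simulate a handler whose result arrives later. The point is that the caller gets its future back before any result exists and is never parked waiting for it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

final class NonBlockingDispatch {

    // Hypothetical dispatcher: hands back a future right away; the result is supplied
    // later by another thread once `release` has been counted down.
    static CompletableFuture<String> dispatch(CountDownLatch release) {
        CompletableFuture<String> result = new CompletableFuture<>();
        Thread handler = new Thread(() -> {
            try {
                release.await();                  // simulates the handler's pending work
                result.complete("venue-42");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                result.completeExceptionally(e);
            }
        });
        handler.setDaemon(true);
        handler.start();
        return result;                            // returned before any result exists
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        CompletableFuture<String> future = dispatch(release);
        System.out.println("done right after dispatch? " + future.isDone()); // false
        release.countDown();
        System.out.println("result: " + future.join());                      // venue-42
    }
}
```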

Caveat: If you use AxonServerQueryBus (connecting to Axon Server), there's serialization and gRPC I/O involved. Even that is mostly async in Axon 5, but if you observe Netty thread starvation under high load, a bounded scheduler for gateway calls might still make sense. Profile first — don't add it preemptively.

Lessons Learned

Messages should be self-contained. In a CQRS system, cross-cutting concerns belong in message metadata, not in reactive context or thread-locals. Start with userId, but plan for roles and other security context.
Extract context early, propagate explicitly. The DGS layer is the last point where the reactive security context is available. Extract what you need there and pass it forward — don't rely on it surviving framework boundaries.
Gateway layers are powerful choke points. By modifying a single class, we transparently added auth injection to all query calls without touching any caller. Infrastructure concerns belong in infrastructure code.
Pure functions beat reactive context lookups. Replacing canViewVenueReactive() (Mono-returning, context-dependent) with canViewVenue() (boolean-returning, explicit inputs) made the code easier to test, easier to debug, and easier to reason about.
Axon 5's API has gaps — but workarounds exist. The QueryGateway doesn't support metadata parameters directly, but the instanceof QueryMessage check in asQueryMessage() provides a clean workaround.

Axon 5 Migration Note

If you're migrating from Axon 4:

@MetaDataValue (Axon 4) is now @MetadataValue (Axon 5) — note the lowercase 'd'
Package changed from org.axonframework.messaging.annotation to org.axonframework.messaging.core.annotation
MessageType is now a record with a MessageType(Class<?>) constructor
GenericMessage constructor signature: GenericMessage(MessageType, Object, Map<String, String>)

Summary

The reactive security context doesn't survive Axon query handler boundaries in Spring WebFlux applications. Rather than fighting the framework with thread-local propagation hacks, lean into Axon's own messaging model: extract identity and roles at the edge, attach them as Metadata, and read them with @MetadataValue in your handlers. Your query objects stay clean, your handlers become pure functions, and the fix is transparent to every caller in the system.
