Pablo Albaladejo

The Middy After Hook Problem: Why Streaming Lambda Observability Is Broken

In Part 1 we built a streaming Lambda that pipes an LLM response through API Gateway to the client using Middy and the AI SDK. It works. The client sees tokens arrive in real time, structured JSON builds incrementally in the browser, and the architecture is clean.

Now we want observability. We want to log the full LLM response, publish token usage metrics, and record the structured output for auditing. The natural place for this in a Middy-based Lambda is an after hook. That is exactly what we tried. And it is exactly what does not work.

The problem is a timing issue between Middy's middleware lifecycle and the streaming response lifecycle. It is not documented in the Middy docs, the AWS Lambda streaming docs, or any community resource we could find.

We spent significant time diagnosing it, and this post is the documentation that should have existed.

TL;DR: Middy's after hook fires at T1 (when the handler returns the ReadableStream), but the stream body is consumed at T2-T4. Any observability code in after (logging, metrics, publishing) sees empty data because the stream has not been read yet. onFinish is fire-and-forget and Lambda post-response execution is unreliable. The solution is in Part 3: TransformStream.flush().

Why does Middy's after hook see empty data when streaming?

Here is what you would naturally write. A Middy after hook that reads from a data store populated by the LLM call and publishes it:

const publishLlmLog: middy.MiddlewareObj = {
  after: async (request) => {
    const payload = llmLogDataStore.get();
    if (!payload) return;

    await publishToSns({
      payload: {
        llmParam: payload.llmParam,
        llmResult: payload.llmResult,
      },
      projectName: 'my-service',
    });

    llmLogDataStore.clear();
  },
};

This pattern works perfectly for non-streaming responses. The handler calls the LLM, waits for the full result, stores it, and returns a JSON body.

Middy's after hook fires after the handler returns, reads the store, and publishes the data. Everything is sequential and deterministic.
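
For contrast, here is a rough sketch of the buffered version; generateSummary and llmLogDataStore.set are illustrative stand-ins, not the companion code:

// Sketch of the buffered (non-streaming) handler. generateSummary and
// llmLogDataStore.set are hypothetical names used for illustration.
const bufferedHandler = async (
  event: APIGatewayProxyEvent,
): Promise<APIGatewayProxyResult> => {
  const body = JSON.parse(event.body ?? '{}');
  const prompt: string = body.prompt ?? 'Summarize the benefits of serverless architecture';

  // The handler waits for the complete LLM result...
  const result = await generateSummary({ prompt });

  // ...so the store is populated before the handler returns,
  // and therefore before Middy's after hook runs.
  llmLogDataStore.set({ llmParam: { prompt }, llmResult: result });

  return {
    body: JSON.stringify(result),
    headers: { 'Content-Type': 'application/json' },
    statusCode: 200,
  };
};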

For a streaming Lambda, the handler looks like this:

const streamHandler = async (
  event: APIGatewayProxyEvent,
): Promise<HttpStreamResponse> => {
  const body = JSON.parse(event.body ?? '{}');
  const prompt: string = body.prompt ?? 'Summarize the benefits of serverless architecture';

  const result = streamingService({ prompt }, onError);
  const response = result.toTextStreamResponse();

  return {
    body: response.body ?? '',
    headers: Object.fromEntries(response.headers.entries()),
    statusCode: response.status,
  };
};

The handler returns a ReadableStream as the body. It does not wait for the stream to finish. The LLM has barely started generating tokens. The data store is empty.
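
For context, the handler is wrapped in Middy's streaming mode set up in Part 1. A rough sketch of the wiring, assuming Middy's streamifyResponse option and the publishLlmLog middleware from above:

import middy from '@middy/core';

// Sketch of the wiring (assumed from Part 1): streamifyResponse tells Middy
// to pipe the ReadableStream body into Lambda's response stream instead of
// serializing it.
export const handler = middy({ streamifyResponse: true })
  .use(publishLlmLog) // the after hook from the first snippet
  .handler(streamHandler);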

When does Middy's after hook fire relative to stream consumption?

This is the exact sequence of events when Middy processes a streaming response:

T0  Handler returns { body: ReadableStream, statusCode: 200 }
    |
T1  Middy after hook fires             <-- stream is UNREAD
    |                                       llmLogDataStore.get() === undefined
    |
T2  Middy pipes response.body to Lambda responseStream
    |
T3  Client receives chunks             <-- tokens flowing
    |   ...
    |   ...
T4  Stream ends                         <-- data is HERE
        DataStoreMiddleware.flush() populates the store
        But the after hook already ran at T1.

The after hook runs at T1. The data you need only exists at T4. The gap between T1 and T4 can be seconds or minutes, depending on how long the LLM takes to generate the full response.

Here is another way to see it. In a non-streaming Lambda, the timeline is linear:

Handler executes -> LLM completes -> Store populated -> Handler returns -> After hook fires
                                                                           (store has data)

In a streaming Lambda, the handler returns before the LLM completes:

Handler executes -> LLM starts -> Handler returns -> After hook fires -> LLM streams -> LLM completes
                                                      (store is empty)                   (store has data)

The after hook and the stream consumption are on different timelines. The handler returning does not mean the work is done. It means the work has barely started.

Why Tests Don't Catch It

This is the part that delayed our diagnosis. The unit tests passed. Every time.

In a typical unit test for this handler, you mock the AI SDK service:

import { vi } from 'vitest';

vi.mock('./streaming-service', () => ({
  streamingService: () => ({
    toTextStreamResponse: () => new Response('{"summary":"test"}'),
  }),
}));

The mock returns a resolved Response with a complete body. When the test calls the handler, the mock populates the data store synchronously. By the time the after hook runs in the test, the store has data. The test passes.

In production, streamText() returns immediately with an unresolved stream. The AI SDK creates a ReadableStream backed by an HTTP connection to the LLM. The data flows asynchronously through internal TransformStream chains as the model generates tokens.

The DataStoreMiddleware uses a TransformStream with a flush() callback that only fires after the last chunk passes through:

// This TransformStream wraps the LLM's internal stream.
// flush() fires only when the stream closes -- at T4.
return {
  ...result,
  stream: createStreamTransform(result.stream, (streamParts) => {
    setData({
      llmParam: params,
      llmResult: {
        request: result.request,
        response: result.response,
        streamParts,
      },
    });
  }),
};
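
createStreamTransform itself lives in the companion code. A minimal sketch of the idea, assuming it buffers chunks while passing them through and reports them from flush():

// Sketch of the idea behind createStreamTransform (the name and signature
// come from the snippet above; the body is an assumption).
const createStreamTransform = <T>(
  source: ReadableStream<T>,
  onComplete: (streamParts: Array<T>) => void,
): ReadableStream<T> => {
  const streamParts: Array<T> = [];

  return source.pipeThrough(
    new TransformStream<T, T>({
      transform(chunk, controller) {
        streamParts.push(chunk); // remember the chunk for observability
        controller.enqueue(chunk); // forward it unchanged to the client
      },
      flush() {
        onComplete(streamParts); // fires after the last chunk -- this is T4
      },
    }),
  );
};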

The flush() callback runs at T4. The after hook runs at T1. In the mock, both happen at T0. The test collapses the entire timeline into a single tick and hides the race condition.

This is not a testing mistake. It is a fundamental limitation of mocking streams. The mock correctly simulates the interface but eliminates the temporal behavior that causes the bug.
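
One way to make a test feel that gap (a sketch, not taken from the companion repo) is to back the mocked Response with a ReadableStream that enqueues its chunks on later ticks, so the store is still empty at the moment the after hook runs:

import { vi } from 'vitest';

// Hypothetical mock that keeps the temporal behavior: the body arrives
// asynchronously, so code running when the handler returns sees an
// unread stream and an empty data store.
vi.mock('./streaming-service', () => ({
  streamingService: () => ({
    toTextStreamResponse: () =>
      new Response(
        new ReadableStream<Uint8Array>({
          async start(controller) {
            const encoder = new TextEncoder();
            for (const chunk of ['{"summary":', '"test"}']) {
              await new Promise((resolve) => setTimeout(resolve, 10));
              controller.enqueue(encoder.encode(chunk));
            }
            controller.close();
          },
        }),
      ),
  }),
}));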

Three Approaches That Don't Work

Before we found the solution (covered in Part 3), we explored three approaches that each fail for a different reason.

1. Move publishing to the AI SDK's onFinish callback

The AI SDK provides an onFinish callback on streamText(). It fires when the stream completes. That sounds like exactly what we need:

streamText({
  model,
  prompt,
  output: Output.object({ schema }),
  onFinish: async ({ object, usage }) => {
    await publishToSns({ llmResult: { object, usage } });
  },
});

The problem: onFinish is fire-and-forget. The AI SDK calls it but does not await the returned promise. The promise resolves (or rejects) silently in the background.

In Lambda, the execution environment can freeze or terminate after responseStream.end(). If onFinish triggers an async operation like an SNS publish, there is no guarantee it completes before the environment is reclaimed.

This is a known limitation. The Vercel community has discussed it in the context of Next.js serverless functions.

On Vercel's platform, waitUntil() can extend the function lifetime. On Lambda, no such mechanism exists for streaming functions.

2. Await the stream in the after hook, then reconstruct it

You could consume the ReadableStream in the after hook to extract the data:

const afterHook: middy.MiddlewareObj = {
  after: async (request) => {
    const response = request.response as HttpStreamResponse;
    if (!(response.body instanceof ReadableStream)) return;

    const reader = response.body.getReader();
    const chunks: Array<Uint8Array> = [];
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      if (value) chunks.push(value);
    }

    // Now we have the data... but the stream is consumed.
    // We need to give Middy a new stream to pipe to the client.
    response.body = new ReadableStream({
      start(controller) {
        for (const chunk of chunks) {
          controller.enqueue(chunk);
        }
        controller.close();
      },
    });

    await publishData(chunks);
  },
};

This technically works, but it defeats the purpose of streaming. You buffer the entire response in memory before the client receives a single byte. The time to first byte (TTFB) becomes the full LLM generation time.

You have rebuilt a buffered response with extra steps.

3. Use Lambda post-response execution

Lambda continues executing code after responseStream.end() is called. You could set a flag when the stream completes and run your publish logic after:

// After the stream ends, Lambda still has execution time
void (async () => {
  for await (const chunk of result.textStream) {
    passThrough.write(chunk);
  }
  passThrough.end();

  // Post-stream work
  const data = llmLogDataStore.get();
  if (data) {
    await publishToSns(data);
  }
})();

The AWS documentation explicitly warns against relying on this:

"The runtime does not wait for asynchronous work to complete after the response stream has errored or been destroyed."

If the client disconnects mid-stream (a common event in production), the response stream is destroyed. Any async work scheduled after that point may not complete. Your publish call might execute, or it might not. There is no guarantee.

This approach also moves your observability logic into the handler's streaming IIFE, coupling it to the handler code and bypassing Middy's middleware chain entirely.

How do you handle errors in streaming Lambda responses?

Before we move to the solution, we need to address a related problem: error handling in a streaming Lambda has two distinct phases, and your error handler must account for both.

Pre-stream errors

Before the handler returns a ReadableStream, errors behave normally. A validation failure, an auth error, a missing parameter: these throw before any bytes are written.

Middy catches them, runs the onError chain, and returns a standard HTTP error response with an appropriate status code. This is the familiar path.
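
A sketch of what that pre-stream path can look like with a custom onError middleware; the middleware name and the error check are illustrative:

// Pre-stream errors throw before any bytes are written, so onError can
// still replace the response with a normal HTTP error payload.
const preStreamErrorHandler: middy.MiddlewareObj = {
  onError: async (request) => {
    if (request.error?.name !== 'ValidationError') return; // illustrative check

    request.response = {
      body: JSON.stringify({ message: request.error.message }),
      headers: { 'Content-Type': 'application/json' },
      statusCode: 400,
    };
  },
};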

Mid-stream errors

Once the handler returns a ReadableStream and Middy starts piping it to the client, the HTTP status code (200) and headers are already sent. If the LLM fails mid-generation, or the client disconnects, or a timeout fires, the error manifests as a stream destruction.

There is no way to change the status code retroactively.

The AI SDK wraps these errors in nested cause chains. A client disconnect might surface as:

Error: Stream processing failed
  cause: TypeError [ERR_INVALID_STATE]: The reader is not attached to a stream

Or as a direct AbortError from the AbortController. Or as an AI SDK wrapper error with a nested AbortError in the cause property.

The streamErrorHandler in the step-2 companion code handles this by recursively checking the cause chain:

const isAbortError = (error: Error): boolean => {
  if (error.name === 'AbortError') return true;

  if (
    error.name === 'TypeError' &&
    'code' in error &&
    error.code === 'ERR_INVALID_STATE'
  )
    return true;

  if (error.cause instanceof Error) {
    return isAbortError(error.cause);
  }

  return false;
};

This function handles three cases:

  1. Direct AbortError: the AbortController signal fired (timeout or explicit abort).
  2. ERR_INVALID_STATE: a TypeError thrown when reading from a detached stream reader. This happens when the client disconnects and the underlying stream is destroyed while a for await loop is still reading from it.
  3. Nested cause chains: the AI SDK wraps errors. A stream error from the HTTP connection gets wrapped in an AI SDK error, which may itself be wrapped in another error by the streaming pipeline. The AbortError can be several levels deep in the cause chain.

The log level strategy follows from the semantics: AbortErrors are expected in production (users close tabs, navigate away, connections drop) and get logged at warn level. All other errors are genuine failures and get logged at error level.

export const streamErrorHandler = (logger) => {
  return ({ error }: { error: unknown }): void => {
    if (!(error instanceof Error)) {
      logger.error('Unknown streaming error', { error });
      return;
    }

    const cause = extractCause(error);

    if (isAbortError(error)) {
      logger.warn('Stream aborted (client disconnect or timeout)', {
        cause,
        errorType: 'AbortError',
        message: error.message,
      });
      return;
    }

    logger.error('Streaming error', {
      cause,
      errorType: 'StreamError',
      message: error.message,
      stack: error.stack,
    });
  };
};

We also extract AI SDK-specific properties (text, usage) from the cause when present. The NoObjectGeneratedError, for example, includes the partial text the model generated before failing and the token usage at the time of failure.

These are valuable for debugging and cost tracking.
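
extractCause is referenced in the handler above but not shown. A minimal sketch of what it might look like, reading text and usage defensively since only some AI SDK errors carry them:

// Hypothetical sketch of extractCause: surface the immediate cause plus the
// AI SDK-specific fields (text, usage) when they happen to be present.
const extractCause = (error: Error): Record<string, unknown> | undefined => {
  if (!(error.cause instanceof Error)) return undefined;

  const cause = error.cause;

  return {
    message: cause.message,
    name: cause.name,
    // e.g. NoObjectGeneratedError exposes the partial text and token usage
    ...('text' in cause ? { text: cause.text } : {}),
    ...('usage' in cause ? { usage: cause.usage } : {}),
  };
};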

The Documentation Gap

We searched for this problem in every place you would expect to find it:

  • Middy documentation: The streaming section describes streamifyResponse: true and the PassThrough/ReadableStream pattern. It does not mention the after hook timing issue.
  • AWS Lambda streaming docs: Cover awslambda.streamifyResponse, response format, and limits. No mention of middleware timing.
  • AI SDK documentation: Covers onFinish, streamText, and stream protocols. Does not address the fire-and-forget nature of onFinish in serverless contexts.
  • Community posts and Stack Overflow: Multiple posts about Lambda streaming setup. None about middleware lifecycle interaction with streams.

The combination of Middy's streaming mode, the AI SDK's stream lifecycle, and Lambda's execution model creates a timing problem of which each project documents only its own part. No single source connects the three.

What's Next

The problem is clear: Middy's after hook fires before the stream is consumed. The data we need for observability only exists after the stream ends. The three obvious workarounds each fail for a different structural reason.

The solution exists, and it is elegant. In Part 3, we will show how TransformStream's flush() callback gives us exactly what we need: a guaranteed execution point after the entire stream is consumed, running inline with the stream pipeline, before the Lambda execution environment can freeze.

It works within Middy's after hook, preserves the streaming behavior for the client, and requires no changes to the handler code.

The stream itself becomes the mechanism for its own observability.
