This is the final post in a four-part series. In Part 1, we set up the architecture: API Gateway REST streaming through Lambda with Middy and the Vercel AI SDK. In Part 2, we discovered that Middy's after hook fires before the stream body is consumed, making it useless for observability.
In Part 3, we built the solution: a TransformStream pipeline that defers logging and data publishing to the flush() callback, which the WHATWG Streams spec guarantees runs only after all chunks have been consumed.
Now we deploy it. This post walks through every piece: the CDK infrastructure, the handler composition, the test suite, and the deployment. The companion repo contains the complete, runnable code as step-4.
By the end, you will have a working stack that streams structured JSON from Bedrock Claude through API Gateway to the client, with observability that fires at the right time: after the stream completes, not before.
TL;DR: A single CDK stack deploys API Gateway REST with streaming, a Lambda function running Middy + AI SDK v6, and Bedrock IAM permissions. The handler composes AI SDK middlewares (logging + data store) with a Middy middleware (publishLog) that uses the Dual Middleware Bridge and TransformStream Pipeline patterns. Deploy with npx cdk deploy, test with curl --no-buffer. Full test suite included: 13 tests covering flush timing, error classification, and the streaming pipeline.
How do you deploy API Gateway REST streaming with CDK?
The infrastructure is a single CDK stack with three resources: a Lambda function, an API Gateway REST API, and the IAM permissions to call Bedrock.
// lib/streaming-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
import type { Construct } from 'constructs';
import * as path from 'node:path';
export class StreamingStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
// --- Lambda Function ---
const fn = new lambda.Function(this, 'StreamingHandler', {
runtime: lambda.Runtime.NODEJS_22_X,
handler: 'index.handler',
code: lambda.Code.fromAsset(path.join(__dirname, '..', 'lambda'), {
bundling: {
image: lambda.Runtime.NODEJS_22_X.bundlingImage,
command: [
'bash',
'-c',
[
'npx esbuild index.ts --bundle --platform=node --target=node22 --outfile=/asset-output/index.mjs --format=esm --external:@aws-sdk/*',
'cp /asset-output/index.mjs /asset-output/index.js',
].join(' && '),
],
local: {
tryBundle(outputDir: string): boolean {
try {
const { execSync } = require('node:child_process');
execSync(
`npx esbuild lambda/index.ts --bundle --platform=node --target=node22 --outfile=${outputDir}/index.js --format=esm --external:@aws-sdk/*`,
{ cwd: path.join(__dirname, '..') },
);
return true;
} catch {
return false;
}
},
},
},
}),
memorySize: 512,
timeout: cdk.Duration.seconds(60),
logRetention: logs.RetentionDays.ONE_WEEK,
environment: {
NODE_OPTIONS: '--enable-source-maps',
},
});
// Grant Bedrock model invocation permissions
fn.addToRolePolicy(
new iam.PolicyStatement({
actions: [
'bedrock:InvokeModel',
'bedrock:InvokeModelWithResponseStream',
],
resources: ['arn:aws:bedrock:*::foundation-model/*'],
}),
);
// --- API Gateway REST API ---
const api = new apigateway.RestApi(this, 'StreamingApi', {
restApiName: 'streaming-llm-api',
description: 'API Gateway REST with Lambda response streaming',
deployOptions: {
stageName: 'v1',
},
});
const streamResource = api.root.addResource('stream');
// OPTIONS for CORS preflight
streamResource.addCorsPreflight({
allowOrigins: apigateway.Cors.ALL_ORIGINS,
allowMethods: ['POST', 'OPTIONS'],
allowHeaders: ['Content-Type'],
});
// POST /stream -- Lambda proxy integration
const postMethod = streamResource.addMethod(
'POST',
new apigateway.LambdaIntegration(fn, { proxy: true }),
);
// --- CfnMethod escape hatch for streaming ---
const cfnMethod = postMethod.node.defaultChild as apigateway.CfnMethod;
cfnMethod.addPropertyOverride(
'Integration.ResponseTransferMode', 'STREAM',
);
cfnMethod.addPropertyOverride('Integration.TimeoutInMillis', 60_000);
cfnMethod.addPropertyOverride(
'Integration.Uri',
cdk.Fn.sub(
'arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${FnArn}/response-streaming-invocations',
{ FnArn: fn.functionArn },
),
);
// --- Outputs ---
new cdk.CfnOutput(this, 'ApiUrl', {
value: `${api.url}stream`,
description: 'POST endpoint for streaming LLM responses',
});
}
}
A few things to note.
Node.js 22 with esbuild bundling. The local.tryBundle hook runs esbuild on your machine if it is available, falling back to Docker if not.
We bundle as ESM, exclude @aws-sdk/* (already in the Lambda runtime), and enable source maps for readable stack traces.
Bedrock permissions. We grant both bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream. The AI SDK uses the streaming variant, but having both keeps the door open for non-streaming calls in the same handler.
The CfnMethod escape hatch. As we covered in Part 1, CDK's L2 addMethod construct does not expose ResponseTransferMode or the response-streaming-invocations Lambda invocation URI.
We drop down to the L1 CfnMethod and override three properties: the transfer mode, the timeout, and the integration URI. This is the only CDK workaround required to enable response streaming through API Gateway REST.
CORS via addCorsPreflight. This generates the OPTIONS mock integration automatically. The httpCors Middy middleware adds the headers to the actual POST response.
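To deploy the stack, a standard CDK app entry point instantiates it. A minimal sketch follows; the file name and environment wiring are assumptions, while the stack id matches the StreamingLlmStack output you will see at deploy time.
// bin/app.ts -- sketch of the app entry point; file name and env wiring are
// assumptions, the stack id matches the deploy output shown later.
import * as cdk from 'aws-cdk-lib';
import { StreamingStack } from '../lib/streaming-stack';

const app = new cdk.App();

new StreamingStack(app, 'StreamingLlmStack', {
  env: {
    account: process.env.CDK_DEFAULT_ACCOUNT,
    region: process.env.CDK_DEFAULT_REGION,
  },
});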
How do you compose Middy and AI SDK middlewares for streaming?
The handler wires together two middleware systems: AI SDK middlewares (which operate on the LLM stream) and Middy middlewares (which operate on the HTTP response).
// lambda/index.ts
import middy from '@middy/core';
import httpCors from '@middy/http-cors';
import type { APIGatewayProxyEvent } from 'aws-lambda';
import { DataStoreMiddleware } from './ai-middleware/data-store-middleware';
import { LoggingMiddleware } from './ai-middleware/logging-middleware';
import { publishLog } from './middleware/publish-log';
import { streamErrorHandler } from './middleware/stream-error-handler';
import { streamingService } from './streaming-service';
import { llmLogDataStore } from './utils/log-data-store';
type HttpStreamResponse = {
body: ReadableStream<Uint8Array> | string;
headers: Record<string, string>;
statusCode: number;
};
const logger = {
info: (...args: Array<unknown>) => console.log('[INFO]', ...args),
warn: (...args: Array<unknown>) => console.warn('[WARN]', ...args),
error: (...args: Array<unknown>) => console.error('[ERROR]', ...args),
};
const onError = streamErrorHandler(logger);
// AI SDK middlewares -- operate on the LanguageModel stream
const aiMiddlewares = [
LoggingMiddleware(logger),
DataStoreMiddleware(llmLogDataStore.set),
];
const streamHandler = async (
event: APIGatewayProxyEvent,
): Promise<HttpStreamResponse> => {
const body = JSON.parse(event.body ?? '{}');
const prompt: string =
body.prompt ?? 'Summarize the benefits of serverless architecture';
const result = streamingService({ prompt }, onError, aiMiddlewares);
const response = result.toTextStreamResponse();
return {
body: response.body ?? '',
headers: Object.fromEntries(response.headers.entries()),
statusCode: response.status,
};
};
// Middy middleware chain:
// 1. httpCors -- adds CORS headers
// 2. publishLog -- before: clears store
// after (streaming): wraps body with TransformStream
// after (non-streaming): publishes immediately
//
// Data flow for a streaming response:
//
// streamText()
// -> AI SDK stream (LanguageModelV3StreamPart)
// -> LoggingMiddleware.wrapStream (flush: logs completion)
// -> DataStoreMiddleware.wrapStream (flush: stores data)
// -> toTextStreamResponse()
// -> ReadableStream<Uint8Array>
// -> publishLog TransformStream (flush: publishes stored data)
// -> Middy pipes to Lambda responseStream
// -> API Gateway streams to client
export const handler = middy<APIGatewayProxyEvent, HttpStreamResponse>({
streamifyResponse: true,
})
.use(httpCors({ origins: ['*'] }))
.use(publishLog(llmLogDataStore, logger))
.handler(streamHandler);
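The handler imports streamingService, which was assembled in Part 3. For reference, here is a minimal sketch of what it might look like with AI SDK v6 and the Bedrock provider; the model ID, option names, and types are assumptions, so check the companion repo for the exact code.
// streaming-service.ts -- illustrative sketch only; see the companion repo for
// the real Part 3 implementation.
import { bedrock } from '@ai-sdk/amazon-bedrock';
import { streamText, wrapLanguageModel } from 'ai';
import type { LanguageModelMiddleware } from 'ai';

type StreamErrorHandler = (event: { error: unknown }) => void;

export const streamingService = (
  { prompt }: { prompt: string },
  onError: StreamErrorHandler,
  middleware: Array<LanguageModelMiddleware>,
) =>
  streamText({
    // Wrap the Bedrock model so the AI SDK middlewares see every stream part
    model: wrapLanguageModel({
      model: bedrock('anthropic.claude-3-5-sonnet-20241022-v2:0'),
      middleware,
    }),
    prompt,
    onError, // called with { error } -- classified by streamErrorHandler
  });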
The data flow comment in the handler is the map of the entire pipeline. Read it top to bottom to trace how data flows from streamText() to the client, and to see the order in which each flush() fires once the stream completes.
flowchart LR
A["streamText()"] --> B["LoggingMiddleware\n(TransformStream)"]
B --> C["DataStoreMiddleware\n(TransformStream)"]
C --> D["toTextStreamResponse()"]
D --> E["publishLog\n(TransformStream)"]
E --> F["Lambda\nresponseStream"]
F --> G["API Gateway\n→ Client"]
B -. "flush: log completion" .-> B
C -. "flush: store data" .-> C
E -. "flush: publish log" .-> E
style B fill:#e1f5fe
style C fill:#e1f5fe
style E fill:#fff3e0
Three TransformStream instances are chained together:
- LoggingMiddleware: wraps the AI SDK's raw LanguageModelV3StreamPart stream. On flush, it logs the chunk count and model ID.
- DataStoreMiddleware: wraps the same stream (after logging). On flush, it writes the full request params and response data into the in-memory store.
- publishLog: wraps the final ReadableStream<Uint8Array> (after toTextStreamResponse() converts the AI SDK stream to bytes). On flush, it reads the store and publishes the log.
The in-memory llmLogDataStore is the bridge between the two middleware systems. AI SDK middlewares write to it. The Middy middleware reads from it.
Both are connected through flush() ordering: the AI SDK flushes fire first (they are upstream in the pipeline), then the Middy flush fires last (it is downstream, wrapping the byte stream).
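The store itself can be a few lines of module state. A minimal sketch, assuming the factory name and shape used by the tests in the next section (the real module lives in the companion repo):
// utils/log-data-store.ts -- sketch of the bridge store; the data shape is an
// assumption based on what the middlewares and tests put into it.
type LlmLogData = Record<string, unknown>;

export const createLlmLogDataStore = () => {
  let data: LlmLogData = {};
  return {
    // AI SDK middlewares write here from their flush() callbacks
    set: (partial: LlmLogData): void => {
      data = { ...data, ...partial };
    },
    // publishLog reads here from its own flush()
    get: (): LlmLogData => data,
    // publishLog's before hook clears it, so warm containers start clean
    clear: (): void => {
      data = {};
    },
  };
};

// Shared singleton: the handler passes .set to the AI SDK middlewares and the
// whole store to publishLog
export const llmLogDataStore = createLlmLogDataStore();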
Testing the Pipeline
We test three units independently: the stream transform utility, the publish log middleware, and the stream error handler.
createStreamTransform
This is the generic utility that wraps any ReadableStream with a TransformStream and calls a callback on flush.
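The utility was built in Part 3; here is a minimal sketch consistent with the behaviors the tests below assert (the real implementation is in the companion repo):
// utils/create-stream-transform.ts -- illustrative sketch; accumulates chunks,
// passes them through unchanged, and calls onFlush only after the source closes.
export const createStreamTransform = <T>(
  source: ReadableStream<T>,
  onFlush: (chunks: Array<T>) => void | Promise<void>,
): ReadableStream<T> => {
  const chunks: Array<T> = [];
  return source.pipeThrough(
    new TransformStream<T, T>({
      transform(chunk, controller): void {
        chunks.push(chunk); // accumulate for the flush callback
        controller.enqueue(chunk); // pass through unchanged
      },
      async flush(): Promise<void> {
        try {
          await onFlush(chunks); // fires only after all chunks are consumed
        } catch {
          // swallow: observability must never break the client stream
        }
      },
    }),
  );
};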
// test/create-stream-transform.test.ts
describe('createStreamTransform', () => {
it('passes all chunks through unchanged', async () => {
const source = new ReadableStream<string>({
start(controller): void {
controller.enqueue('hello');
controller.enqueue(' ');
controller.enqueue('world');
controller.close();
},
});
const onFlush = vi.fn();
const transformed = createStreamTransform(source, onFlush);
const result = await readStream(transformed);
expect(result).toEqual(['hello', ' ', 'world']);
});
it('calls onFlush with all accumulated chunks after stream ends', async () => {
const source = new ReadableStream<string>({
start(controller): void {
controller.enqueue('a');
controller.enqueue('b');
controller.enqueue('c');
controller.close();
},
});
const onFlush = vi.fn();
const transformed = createStreamTransform(source, onFlush);
await readStream(transformed);
expect(onFlush).toHaveBeenCalledOnce();
expect(onFlush).toHaveBeenCalledWith(['a', 'b', 'c']);
});
it('swallows errors thrown in onFlush', async () => {
const source = new ReadableStream<string>({
start(controller): void {
controller.enqueue('data');
controller.close();
},
});
const onFlush = vi.fn().mockRejectedValue(new Error('publish failed'));
const transformed = createStreamTransform(source, onFlush);
const result = await readStream(transformed);
expect(result).toEqual(['data']);
expect(onFlush).toHaveBeenCalledOnce();
});
it('does not call onFlush when stream is cancelled', async () => {
const source = new ReadableStream<string>({
start(controller): void {
controller.enqueue('first');
// Stream stays open -- simulates an ongoing LLM stream
},
});
const onFlush = vi.fn();
const transformed = createStreamTransform(source, onFlush);
const reader = transformed.getReader();
await reader.read(); // Read 'first'
await reader.cancel('client disconnected');
expect(onFlush).not.toHaveBeenCalled();
});
});
Four behaviors, four tests. The critical ones are the last two: flush swallows errors (observability never breaks the client stream), and flush does not fire on cancellation (a client disconnect should not trigger publication of incomplete data).
publishLog
This tests the dual-path behavior of the Middy middleware: immediate publication for non-streaming responses, deferred publication for streaming responses.
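The middleware itself (from Part 3) branches on the body type in its after hook. A minimal sketch, with the store and logger shapes assumed from the tests below:
// middleware/publish-log.ts -- sketch of the dual-path middleware; not the
// exact Part 3 code, but consistent with the behavior tested below.
import type { MiddlewareObj } from '@middy/core';
import { createStreamTransform } from '../utils/create-stream-transform';

type Logger = { info: (...args: Array<unknown>) => void };
type LogDataStore = { get: () => Record<string, unknown>; clear: () => void };

export const publishLog = (
  store: LogDataStore,
  logger: Logger,
): MiddlewareObj => ({
  before: (): void => {
    store.clear(); // warm containers reuse module state -- reset per invocation
  },
  after: (request): void => {
    const response = request.response as { body: unknown };
    if (response.body instanceof ReadableStream) {
      // Streaming path: defer publication to flush(), after the last chunk
      response.body = createStreamTransform(response.body, () => {
        logger.info('LLM log data (streaming)', { data: store.get() });
      });
      return;
    }
    // Non-streaming path: the body is already complete, publish now
    logger.info('LLM log data (non-streaming)', { data: store.get() });
  },
});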
// test/publish-log.test.ts
describe('publishLog middleware', () => {
it('publishes immediately for non-streaming responses', async () => {
const store = createLlmLogDataStore();
const logger = createMockLogger();
const middleware = publishLog(store, logger);
store.set({
llmParam: { model: 'test' },
llmResult: { text: 'hello' },
});
const request = {
response: {
body: JSON.stringify({ result: 'ok' }),
headers: {},
statusCode: 200,
},
};
await middleware.after!(request as never);
expect(logger.info).toHaveBeenCalledWith(
'LLM log data (non-streaming)',
expect.objectContaining({ data: expect.any(Object) }),
);
});
it('defers publication to flush() for streaming responses', async () => {
const store = createLlmLogDataStore();
const logger = createMockLogger();
const middleware = publishLog(store, logger);
const sourceStream = new ReadableStream<Uint8Array>({
start(controller): void {
controller.enqueue(new TextEncoder().encode('chunk1'));
controller.enqueue(new TextEncoder().encode('chunk2'));
controller.close();
},
});
store.set({
llmParam: { model: 'test' },
llmResult: { tokens: 42 },
});
const request = {
response: {
body: sourceStream,
headers: { 'content-type': 'text/plain' },
statusCode: 200,
},
};
// After hook wraps the stream -- does NOT publish yet
await middleware.after!(request as never);
expect(logger.info).not.toHaveBeenCalled();
// Consume the wrapped stream -- flush() fires and publishes
const body = await readStream(
request.response.body as ReadableStream<Uint8Array>,
);
expect(body).toBe('chunk1chunk2');
expect(logger.info).toHaveBeenCalledWith(
'LLM log data (streaming)',
expect.objectContaining({ data: expect.any(Object) }),
);
});
});
The key assertion is in the streaming test: after the after hook runs, logger.info has not been called. It is only called after we consume the stream with readStream(). This proves that the deferred execution model works.
streamErrorHandler
Streaming introduces error types you do not see in request/response handlers. The streamErrorHandler classifies them.
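A sketch of the classifier, matching the behaviors the tests below assert (the real implementation is in the companion repo):
// middleware/stream-error-handler.ts -- illustrative sketch of the classifier.
type Logger = {
  warn: (...args: Array<unknown>) => void;
  error: (...args: Array<unknown>) => void;
};

// Walk the error and its cause chain looking for client-disconnect signatures
const isClientAbort = (error: unknown): boolean => {
  if (!(error instanceof Error)) return false;
  if (error.name === 'AbortError') return true;
  if ((error as { code?: string }).code === 'ERR_INVALID_STATE') return true;
  return error.cause !== undefined && isClientAbort(error.cause);
};

export const streamErrorHandler =
  (logger: Logger) =>
  ({ error }: { error: unknown }): void => {
    if (isClientAbort(error)) {
      // Expected during streaming: the client went away -- not actionable
      logger.warn('Stream aborted by client', { error });
      return;
    }
    logger.error('Streaming error', { error });
  };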
// test/stream-error-handler.test.ts
describe('streamErrorHandler', () => {
it('logs AbortError at warn level', () => {
const logger = createMockLogger();
const handler = streamErrorHandler(logger);
const error = new Error('The operation was aborted');
error.name = 'AbortError';
handler({ error });
expect(logger.warn).toHaveBeenCalledOnce();
expect(logger.error).not.toHaveBeenCalled();
});
it('logs ERR_INVALID_STATE as abort at warn level', () => {
const logger = createMockLogger();
const handler = streamErrorHandler(logger);
const error = new TypeError('Invalid state: Reader released');
(error as unknown as { code: string }).code = 'ERR_INVALID_STATE';
handler({ error });
expect(logger.warn).toHaveBeenCalledOnce();
expect(logger.error).not.toHaveBeenCalled();
});
it('detects nested AbortError in cause chain', () => {
const logger = createMockLogger();
const handler = streamErrorHandler(logger);
const cause = new Error('aborted');
cause.name = 'AbortError';
const error = new Error('AI SDK error', { cause });
handler({ error });
expect(logger.warn).toHaveBeenCalledOnce();
expect(logger.error).not.toHaveBeenCalled();
});
it('logs non-abort errors at error level', () => {
const logger = createMockLogger();
const handler = streamErrorHandler(logger);
handler({ error: new Error('connection timeout') });
expect(logger.error).toHaveBeenCalledOnce();
expect(logger.warn).not.toHaveBeenCalled();
});
});
AbortError and ERR_INVALID_STATE are normal in streaming. They mean the client disconnected. The AI SDK wraps these in its own errors, so we walk the cause chain.
These go to warn, not error, because they are not actionable failures. Everything else goes to error.
The awslambda mock fixture
Testing streamifyResponse handlers outside the Lambda runtime requires mocking the awslambda global. The test fixture creates a minimal implementation:
// test/fixtures/streaming.fixtures.ts
export const setupAwsLambdaStreamingMock = (): void => {
(globalThis as unknown as { awslambda: unknown }).awslambda = {
HttpResponseStream: {
from: (
stream: LambdaResponseStream,
metadata: HttpResponseMetadata,
): LambdaResponseStream => {
stream.__httpResponseMetadata = metadata;
return stream;
},
},
streamifyResponse: <T>(handler: T): T => handler,
};
};
streamifyResponse is a pass-through (it returns the handler as-is). HttpResponseStream.from attaches metadata to the writable stream. This is enough to test the full Middy chain locally without deploying.
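A hypothetical usage, importing the handler only after the global is in place (the file paths and the dynamic import are assumptions, since the global must exist before Middy wraps the handler at module load):
// test/handler.test.ts -- hypothetical wiring; adjust paths to your repo layout.
import { beforeAll, describe, expect, it } from 'vitest';
import { setupAwsLambdaStreamingMock } from './fixtures/streaming.fixtures';

describe('handler', () => {
  beforeAll(() => {
    setupAwsLambdaStreamingMock();
  });

  it('exposes a streamified handler', async () => {
    // Import lazily so awslambda is defined when Middy builds the handler
    const { handler } = await import('../lambda/index');
    expect(handler).toBeTypeOf('function');
  });
});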
Deployment and Verification
Prerequisites
- Node.js 22+
- AWS CLI configured with credentials
- AWS CDK CLI (npm install -g aws-cdk)
- An AWS account with Bedrock model access enabled for Claude 3.5 Sonnet v2
Deploy
cd step-4
npm install
npx cdk deploy
CDK will output the API URL:
Outputs:
StreamingLlmStack.ApiUrl = https://abc123.execute-api.us-east-1.amazonaws.com/v1/stream
Test with curl
curl -i --no-buffer -X POST \
https://<api-id>.execute-api.<region>.amazonaws.com/v1/stream \
-H 'Content-Type: application/json' \
-d '{"prompt": "Explain the benefits of serverless architecture"}'
The --no-buffer flag tells curl to display chunks as they arrive instead of buffering the entire response. You should see:
- Headers arrive immediately: Transfer-Encoding: chunked, Content-Type: text/plain; charset=utf-8.
- A 1-2 second pause. This is Lambda's initial buffering, not a bug. Lambda buffers roughly the first 6 KB before beginning to stream.
- Chunks arriving progressively. JSON fragments flowing in as Bedrock generates them.
- The connection closes when the stream is complete.
Verify observability
Open CloudWatch Logs for the Lambda function. After the response finishes streaming, you should see two log entries:
- [INFO] LLM streaming call completed: from the LoggingMiddleware flush, with chunk count and model ID.
- [INFO] LLM log data (streaming): from the publishLog flush, with the full request parameters and response data.
Both entries appear after the last byte has been sent to the client. This is the whole point.
If we had relied on Middy's after hook, these logs would appear before any streaming started, with no data to report.
Run the tests locally
npm test
Tear down
npx cdk destroy
What are the production considerations for streaming Lambda responses?
The companion code is intentionally minimal. Here is what you would add before shipping to production.
Structured logging. Replace the console.log wrapper with Powertools for AWS Lambda (TypeScript). It gives you structured JSON logs, correlation IDs, and log sampling out of the box.
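A sketch of that swap; the service name is an assumption:
// Sketch: replacing the console wrapper with Powertools Logger. The same
// info/warn/error shape keeps the middlewares unchanged.
import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger({ serviceName: 'streaming-llm-api' });

// Drop-in for the existing wrapper: structured JSON instead of console.log
logger.info('LLM log data (streaming)', { data: { tokens: 42 } });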
Event publishing. The publishLog middleware currently writes to CloudWatch. In production, replace logger.info('LLM log data (streaming)', { data }) with a publish to SNS, Kinesis Data Firehose, or whatever event bus your platform uses. The flush() pattern works the same regardless of the destination.
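For example, swapping the logger call inside the flush callback for an SNS publish might look like this; the topic ARN environment variable is an assumption:
// Sketch: publishing the flushed log data to SNS instead of CloudWatch.
import { PublishCommand, SNSClient } from '@aws-sdk/client-sns';

const sns = new SNSClient({});

export const publishLlmLog = async (
  data: Record<string, unknown>,
): Promise<void> => {
  await sns.send(
    new PublishCommand({
      TopicArn: process.env.LLM_LOG_TOPIC_ARN, // hypothetical env var
      Message: JSON.stringify(data),
    }),
  );
};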
Authentication. Add a Middy before hook that validates a JWT or calls an authorizer. It runs before the handler, so it has no interaction with the streaming pipeline.
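A sketch of such a hook; the header check is illustrative only, and real validation would verify the token:
// Sketch: a hypothetical auth guard as a Middy `before` hook.
import type { MiddlewareObj } from '@middy/core';
import type { APIGatewayProxyEvent } from 'aws-lambda';

export const requireAuth = (): MiddlewareObj<APIGatewayProxyEvent> => ({
  before: (request): void => {
    const token = request.event.headers?.Authorization;
    if (!token) {
      // Throwing here means the handler never runs and no stream is created
      const error = new Error('Unauthorized') as Error & { statusCode: number };
      error.statusCode = 401;
      throw error;
    }
    // Verify the JWT here (issuer, audience, expiry) before continuing
  },
});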
Request validation. Add a Zod schema and a parser middleware to validate the incoming body. Reject bad input before you invoke Bedrock.
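A sketch with a hypothetical schema; field names and limits are assumptions:
// Sketch: validating the body with Zod before calling Bedrock.
import { z } from 'zod';

const streamRequestSchema = z.object({
  prompt: z.string().min(1).max(4000),
});

export type StreamRequest = z.infer<typeof streamRequestSchema>;

// Call this at the top of streamHandler instead of the bare JSON.parse
export const parseStreamRequest = (rawBody: string | null): StreamRequest =>
  streamRequestSchema.parse(JSON.parse(rawBody ?? '{}'));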
VPC considerations. Lambda Function URLs do not support response streaming when the function is in a VPC. API Gateway REST does, because it uses the InvokeWithResponseStream API from its own managed infrastructure.
If your Lambda needs VPC access (to reach an RDS database, for example), API Gateway REST is your path.
Timeout alignment. Set the API Gateway integration timeout equal to or greater than the Lambda function timeout. In our stack, both are 60 seconds. If the API Gateway timeout is shorter, it will close the connection while Lambda is still streaming, and the client will get a truncated response with no error signal.
Pricing and quotas. Lambda response streaming uses InvokeWithResponseStream, which is billed identically to standard Invoke (duration plus memory). However, streaming responses are not interrupted when the client disconnects. You are billed for the full function duration even if the client closes the connection mid-stream.
API Gateway REST has a hard 10 MB payload limit for streamed responses. Lambda supports up to 200 MB via InvokeWithResponseStream. For LLM responses (typically tens of KB), neither limit is a concern.
The default integration timeout of 29 seconds can be increased for regional and private APIs through Service Quotas.
Series Recap
Four posts, one problem, one solution.
Part 1: The Architecture. API Gateway REST can stream Lambda responses using ResponseTransferMode: STREAM and the response-streaming-invocations URI. Middy's streamifyResponse option preserves the full middleware chain. The AI SDK provides a clean interface to Bedrock with streamText, Output.object(), and wrapLanguageModel.
Part 2: The Problem. Middy's after hook fires when the handler returns, not when the stream body has been consumed. For streaming responses, the handler returns a ReadableStream that has not been read yet. Any observability work in after runs against empty data.
Part 3: The Solution. The WHATWG Streams TransformStream has a flush() callback that fires after all chunks have passed through. We use this as a deferred execution point: accumulate data during transform(), publish it during flush(). The AI SDK's wrapStream middleware pattern lets us do the same at the LLM stream level.
Part 4: The Implementation. A complete CDK stack, a composed Lambda handler, and a test suite that verifies every behavior.
The key takeaway: when working with streaming responses in serverless, attach your observability to the data pipeline, not to lifecycle hooks. Middleware frameworks were designed for request/response. Streams are a different paradigm.
The WHATWG Streams spec gives you flush() as a guaranteed execution point after stream completion. Use it.
The complete source code is available in the step-4 branch of the companion repository.
References
- Building responsive APIs with Amazon API Gateway response streaming: AWS announcement (Nov 2025)
- Response streaming for Lambda functions: AWS Lambda Developer Guide
- Bedrock model access configuration: enabling Claude in your account
- WHATWG Streams Standard: TransformStream specification
- AI SDK streamText reference: Vercel AI SDK
- AI SDK Middleware: LanguageModelMiddleware
- AI SDK Amazon Bedrock Provider: Bedrock integration
- Middy streaming integration: Middy documentation
- Powertools for AWS Lambda (TypeScript): structured logging
- Running code after returning a response from Lambda: post-response execution caveats