Pratik Ponde

Posted on Feb 17

How to Get Real-Time Notifications When AWS Glue Schema Registry Changes

#ai #devops #aws #beginners

A real-world DevOps journey with AWS Glue, EventBridge, and Lambda

👋 Hey there! This is Pratik, a Senior DevOps Consultant with a strong background in automating and optimizing cloud infrastructure, particularly on AWS. Over the years, I have designed and implemented scalable solutions for enterprises, focusing on infrastructure as code, CI/CD pipelines, cloud security, and resilience. My expertise lies in translating complex cloud requirements into efficient, reliable, and cost-effective architectures.

In this article, I walk through a real-world approach to detecting AWS Glue Schema Registry updates in real time and why making schema changes observable matters in production systems.

🚨The Problem That Started It All

While working as a DevOps engineer, I supported a Kafka-based event-driven system where schemas mattered a lot. Producers and consumers depended heavily on schema versions. Any change, whether intentional or accidental, could quietly disrupt downstream services.

We were already using AWS Glue Schema Registry to manage schemas for Amazon MSK. It provided us with versioning and compatibility checks.

But one question kept coming up during reviews:

“How do we know when someone updates a schema?”

There was no alert, no trigger, and no automation, just a silent update sitting in Glue.

That’s when I decided to build an event-driven notification system for schema changes.

💡 The Idea: Make Schema Changes Event-Driven

Instead of polling Glue or depending on manual communication, the idea was straightforward:
Whenever a schema changes, trigger an action automatically.

That action calls a POST API, which could:

Notify teams
Trigger validations
Update dashboards
Or kick off CI/CD pipelines

⚙️Comprehensive Architecture Flow

Here’s how the flow looks in real life:

1. A schema is created or updated in Glue Schema Registry.
2. CloudTrail records the API activity.
3. EventBridge listens for specific Glue schema events.
4. Lambda is triggered automatically.
5. Lambda calls a POST API with schema details.

No polling. No cron jobs. Fully event-driven.

💻 Source Code and GitHub Repository

The complete implementation for this architecture is available on GitHub.

🔗 GitHub Repository:

pratiksponde / AWS-Glue-Schema-EventBridge-Lambda


Real-time notifications for AWS Glue Schema Registry changes using EventBridge and Lambda.

This repository provides a fully event-driven solution to detect and respond to schema changes in AWS Glue Schema Registry in real time. It uses AWS CloudTrail, Amazon EventBridge, and AWS Lambda to automatically capture schema-related API events and trigger downstream notifications or integrations.


🚀 Architecture Overview

This solution follows a serverless, event-driven architecture:

AWS Glue Schema Registry
- Stores and manages schemas for streaming and data applications.
- Events occur when schemas are created, updated, or new versions are registered.
AWS CloudTrail
- Records all Glue Schema Registry API calls such as
  - CreateSchema
  - RegisterSchemaVersion
  - UpdateSchema
- These events are automatically available to EventBridge.
Amazon EventBridge
- Captures relevant Glue events from CloudTrail.
- Filters only schema-related changes.
- Triggers the Lambda function.
AWS Lambda
- Receives…


Feel free to clone the repository and try it in your own AWS environment.

Now, let’s dive deep into the hands-on implementation step by step.

Step 1: Capturing Schema Changes with CloudTrail

Steps to Create a Default Trail (Console):

1. Open CloudTrail: Go to the AWS CloudTrail console.
2. Create Trail: Click Create trail.

3. Configure Name: Enter a trail name (e.g., DefaultTrail).
4. Storage Location: Select Create new S3 bucket to let CloudTrail handle permissions automatically.
5. Log File Validation: Leave enabled (default) to ensure log integrity.

6. KMS Encryption: Leave enabled (default) for security.
7. Finish: Click Next, then Create trail.

AWS Glue API calls such as CreateSchema, RegisterSchemaVersion, and UpdateSchema are logged automatically by CloudTrail.

This was a big win: no custom instrumentation required.

Every schema change already produced a reliable audit event.
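To make that audit event concrete, here is a rough sketch of what a Glue schema call can look like once it reaches EventBridge. The event is heavily abbreviated and every value in it (names, account ID, ARN) is made up for illustration; the helper `is_schema_change` is a hypothetical name, not part of any AWS SDK.

```python
# Hypothetical, heavily abbreviated CloudTrail record as delivered by EventBridge.
# Real events carry many more fields; all values below are invented.
SAMPLE_EVENT = {
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.glue",
    "detail": {
        "eventSource": "glue.amazonaws.com",
        "eventName": "CreateSchema",
        "eventTime": "2024-01-01T12:00:00Z",
        "awsRegion": "us-east-1",
        "recipientAccountId": "123456789012",
        "requestParameters": {
            "registryId": {"registryName": "example-registry"},
            "schemaName": "example-schema",
        },
        "responseElements": {
            "schemaArn": "arn:aws:glue:us-east-1:123456789012:schema/example-registry/example-schema",
        },
    },
}

# The three API calls this article cares about.
SCHEMA_EVENT_NAMES = {"CreateSchema", "RegisterSchemaVersion", "UpdateSchema"}


def is_schema_change(event):
    """Return True when the event is one of the Glue schema API calls of interest."""
    detail = event.get("detail") or {}
    return (
        detail.get("eventSource") == "glue.amazonaws.com"
        and detail.get("eventName") in SCHEMA_EVENT_NAMES
    )


print(is_schema_change(SAMPLE_EVENT))
```

A sample like this is also handy later as a test event when trying out the Lambda function from the console.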

Step 2: Filtering the Right Events Using EventBridge

Next, I created an EventBridge rule that listens only to Glue schema-related CloudTrail events.

Instead of triggering Lambda for every Glue operation, the rule filters on:

• Event source: glue.amazonaws.com
• Event names related to schema updates

This kept the system clean, efficient, and cost-effective.
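The filtering idea can be tried locally before touching the console. The sketch below is a deliberately simplified matcher, not EventBridge's real matching engine: it only handles "value is one of this list" and nested objects, which is all this rule needs. The function name `matches` and the sample events are assumptions for illustration.

```python
# Filter on the Glue event source and the three schema-related event names.
PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["glue.amazonaws.com"],
        "eventName": ["RegisterSchemaVersion", "UpdateSchema", "CreateSchema"],
    },
}


def matches(pattern, event):
    """Simplified matcher: a list means 'value is one of', a dict recurses."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: the event must have a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


schema_event = {
    "source": "aws.glue",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "glue.amazonaws.com", "eventName": "UpdateSchema"},
}
other_glue_event = {
    "source": "aws.glue",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "glue.amazonaws.com", "eventName": "GetTable"},
}

print(matches(PATTERN, schema_event))      # schema change: should pass the filter
print(matches(PATTERN, other_glue_event))  # unrelated Glue call: should be dropped
```

Real EventBridge patterns support more operators (prefix matching, anything-but, and so on), so treat this only as a quick sanity check of the rule's intent.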

Steps to Create an EventBridge Rule (Console):

1. Open EventBridge: Navigate to the Amazon EventBridge console.
2. Create Rule: In the navigation pane, choose Rules, and then choose Create rule.
3. Define Rule: Enter a Name and optional Description for the rule, and select the default event bus.
4. Configure the Event Source: Use the visual builder or JSON editor to define your event pattern.

Use the event pattern below for the AWS Glue Schema Registry.

  {
    "source": ["aws.glue"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
      "eventSource": ["glue.amazonaws.com"],
      "eventName": ["RegisterSchemaVersion", "UpdateSchema", "CreateSchema"]
    }
  }

5. Select Targets: From the Select a target list, choose the AWS Lambda function to invoke when an event matches, and configure the required details for the selected target.

6. Finish: Choose Create rule to activate your rule.
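If you prefer scripting over the console, the same rule can be sketched with boto3. This is a minimal sketch under assumptions: the rule name, target ID, and statement ID are placeholders I made up, and the clients are passed in so the function can be exercised without AWS access. Unlike the console, the API does not grant EventBridge permission to invoke the function for you, so the sketch adds it explicitly.

```python
import json

EVENT_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["glue.amazonaws.com"],
        "eventName": ["RegisterSchemaVersion", "UpdateSchema", "CreateSchema"],
    },
}


def create_schema_rule(events_client, lambda_client, function_arn,
                       rule_name="glue-schema-changes"):
    """Create the rule, attach the Lambda target, and allow EventBridge to invoke it.

    events_client and lambda_client are expected to behave like
    boto3.client("events") and boto3.client("lambda").
    rule_name and the IDs below are placeholder values.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(EVENT_PATTERN),
        State="ENABLED",
        Description="Glue Schema Registry create/update events",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "schema-change-lambda", "Arn": function_arn}],
    )
    # Resource-based permission so this specific rule may invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    return rule["RuleArn"]


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# create_schema_rule(boto3.client("events"), boto3.client("lambda"), "<your function ARN>")
```

Note that running it twice with the same names would fail on the duplicate permission statement, so a real script would need to handle that case.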

Step 3: Lambda as the Brain of the System

Lambda became the decision-maker, and its responsibilities were very clear:

• Parse the CloudTrail event
• Identify what changed
• Extract schema details (registry, name, version ARN)
• Prepare a meaningful payload
• Call the external POST API

This approach kept Lambda lightweight and focused.

Steps to Create a Lambda Function (Console):

1. Open Lambda: Navigate to the AWS Lambda console.
2. Create Function: Select Author from scratch, enter a function name, and choose a Python runtime to match the code below (I selected Python 3.11). Select Create a new role with basic Lambda permissions, then click Create function.
3. Attach Permission: Open the function you created, go to Configuration, select Permissions, and click the role attached to the function. Add the AWSGlueSchemaRegistryFullAccess permission policy to this role.

Note: Modify the policy as needed to align with your security requirements and best practices.

4. Set Environment Variable: Go to Configuration, select Environment variables, and add the following:

Key: API_URL
Value: http://ip.of.instance:port/contextpath/of/your/api
(This is only a sample format. Replace it with your actual API URL.)

5. Upload Code: Paste or upload the code below, which calls an external API.

import json
import os
import urllib3

http = urllib3.PoolManager()

API_URL = os.environ.get("API_URL")

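def extract_schema_info(detail):
    # CloudTrail can set these keys to null (for example on failed calls),
    # so fall back to empty dicts instead of relying on .get() defaults.
    request_params = detail.get("requestParameters") or {}
    response_elements = detail.get("responseElements") or {}

    # CreateSchema names the registry via registryId; RegisterSchemaVersion and
    # UpdateSchema identify the schema via a nested schemaId, so check both.
    registry_id = request_params.get("registryId") or {}
    schema_id = request_params.get("schemaId") or {}

    schema_arn = response_elements.get("schemaArn") or schema_id.get("schemaArn")
    schema_version_arn = response_elements.get("schemaVersionArn")

    registry_name = registry_id.get("registryName") or schema_id.get("registryName")
    schema_name = request_params.get("schemaName") or schema_id.get("schemaName")

    return {
        "registry_name": registry_name,
        "schema_name": schema_name,
        "schema_arn": schema_arn,
        "schema_version_arn": schema_version_arn
    }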

def lambda_handler(event, context):
    try:
        detail = event.get("detail", {})
        event_name = detail.get("eventName")


        if event_name not in [
            "CreateSchema",
            "RegisterSchemaVersion",
            "UpdateSchema"
        ]:
            print(f"Ignored event: {event_name}")
            return {"status": "ignored"}

        schema_info = extract_schema_info(detail)

        payload = {
            "eventType": event_name,
            "schemaUpdated": True,
            "schemaDetails": schema_info,
            "timestamp": detail.get("eventTime"),
            "awsAccount": detail.get("recipientAccountId"),
            "region": detail.get("awsRegion")
        }

        encoded_body = json.dumps(payload).encode("utf-8")

        response = http.request(
            "POST",
            API_URL,
            body=encoded_body,
            headers={
                "Content-Type": "application/json"
            },
            timeout=urllib3.Timeout(connect=5.0, read=10.0)
        )

        print("API response status:", response.status)
        print("API response body:", response.data.decode())

        return {
            "status": "success",
            "httpStatus": response.status
        }

    except Exception as e:
        print("Lambda execution failed:", str(e))
        raise

Step 4: Calling an API

The final step was integration. The Lambda function sends a POST request to an API with details such as:
  • Schema name
  • Registry name
  • Schema version ARN
  • Event type (create/update)
  • Timestamp and region
From here, the API can:
  • Notify consumers
  • Trigger validations
  • Store audit records
  • Or block deployments if needed.
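For local experiments, the receiving side can be sketched with nothing but the Python standard library. This is an assumption-heavy stand-in for whatever API you actually run: the handler class, the `serve` helper, the port, and the choice of required keys are all mine; only the payload field names come from the Lambda payload described above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Keys the sketch insists on; taken from the payload the Lambda function builds.
REQUIRED_KEYS = {"eventType", "schemaDetails", "timestamp", "region"}


def validate_payload(payload):
    """Return a sorted list of required keys missing from the payload."""
    return sorted(REQUIRED_KEYS - set(payload))


class SchemaEventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self._reply(400, {"error": "invalid JSON"})
            return
        if not isinstance(payload, dict):
            self._reply(400, {"error": "expected a JSON object"})
            return
        missing = validate_payload(payload)
        if missing:
            self._reply(400, {"error": "missing fields", "fields": missing})
            return
        # A real service would notify consumers or store an audit record here.
        self._reply(202, {"accepted": payload["eventType"]})

    def _reply(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real service would log requests


def serve(port=8080):
    """Run the sketch receiver until interrupted (port is a placeholder)."""
    HTTPServer(("0.0.0.0", port), SchemaEventHandler).serve_forever()
```

Calling `serve()` on a host the Lambda function can reach and pointing API_URL at it is enough to watch payloads arrive; it is not hardened for production use.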

🚀Why This Approach Worked So Well

What I liked most about this solution:
• ✅ Fully event-driven
• ✅ No polling or scheduled jobs
• ✅ Zero impact on producers or consumers
• ✅ Scales automatically
• ✅ Clear audit trail for schema changes
It also aligned perfectly with serverless best practices.

📚Real Lessons Learned

While implementing this, a few things stood out:
• CloudTrail events are rich but noisy → filtering is critical
• Lambda should not assume all fields exist in every event
• Timeouts and retries matter when calling external APIs
• Logging schema ARNs saved a lot of debugging time later
These small details made the difference between a demo and a production-ready solution.
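The point about timeouts and retries can be illustrated with a small backoff helper. This is a sketch, not what the Lambda code above does: `post_with_retries`, the set of retryable status codes, and the delay values are all assumptions. The actual HTTP call is passed in as `send`, so the helper works with any client.

```python
import time

# Status codes treated as transient in this sketch (an assumption, tune as needed).
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def post_with_retries(send, attempts=3, base_delay=0.5,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable that performs the POST and returns the
    HTTP status code. Exceptions listed in retry_on are treated as transient.
    Delays double after each failed attempt: base_delay, 2 * base_delay, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            status = send()
            if status not in RETRYABLE_STATUSES:
                return status
            last_error = RuntimeError(f"retryable HTTP status {status}")
        except retry_on as exc:
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # Every attempt failed: surface the last problem so the invocation fails loudly.
    raise last_error
```

With urllib3, `send` could wrap the `http.request(...)` call and return `response.status`; urllib3 raises its own exception types rather than OSError, so you would pass something like `retry_on=(urllib3.exceptions.HTTPError,)`. Keep the total retry time below the Lambda function timeout.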

✨Final Thoughts

Schema changes are not just metadata updates; they are potential breaking changes. By turning schema updates into events, this setup changed Glue Schema Registry from a passive store into an active part of the system architecture. If you’re running Kafka on AWS and care about schema governance, it’s a good idea to adopt this pattern.

💬 Let’s Keep the Conversation Going

Have thoughts, questions, or any experience with event-driven architectures to share? I would love to hear from you! Feel free to leave a comment or connect with me on LinkedIn. Let's learn and grow together as a community of builders.
Keep exploring, keep automating and see you in the next one!
