Detection without response is operational noise.
GuardDuty alerts are valuable, but if a human has to read, decide, and manually isolate an instance, your blast radius window is still open.
I wanted high-confidence findings to trigger automatic containment.
So I built a minimal AWS-native SOAR pipeline.
No third-party tooling.
No overengineering.
Just deterministic, event-driven response.
Objective
Build an automated containment workflow that:
- Responds only to high-severity GuardDuty findings
- Automatically isolates compromised EC2 instances
- Preserves forensic access
- Avoids recursive execution
- Is observable and debuggable
All event-driven. No polling. No manual trigger.
Architecture Overview
GuardDuty Finding
↓
EventBridge Rule (severity >= 7)
↓
Lambda Function (Isolation Logic)
↓
Modify EC2 Security Group → Quarantine SG
↓
SNS Notification (Visibility Layer)
Minimal. Deterministic. Cheap.
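The flow above fits in a single handler. Below is a minimal sketch of that wiring, assuming the quarantine SG ID and SNS topic ARN come from environment variables with the hypothetical names `QUARANTINE_SG_ID` and `ALERT_TOPIC_ARN`. The clients are passed in as parameters so the logic can be exercised offline; in Lambda they would be `boto3.client("ec2")` and `boto3.client("sns")`.

```python
import json
import os

# Hypothetical environment variable names; set them on the Lambda function.
QUARANTINE_SG_ID = os.environ.get("QUARANTINE_SG_ID", "sg-quarantine-placeholder")
ALERT_TOPIC_ARN = os.environ.get(
    "ALERT_TOPIC_ARN", "arn:aws:sns:us-east-1:111122223333:placeholder"
)


def handle_finding(event, ec2, sns):
    """Isolate the instance named in a GuardDuty finding, then notify via SNS."""
    detail = event["detail"]
    instance_id = detail["resource"]["instanceDetails"]["instanceId"]

    # Containment: replace the instance's security groups with the quarantine SG.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG_ID])

    # Visibility: tell humans what the automation just did.
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject=f"Quarantined {instance_id}",
        Message=json.dumps({
            "instance_id": instance_id,
            "finding_type": detail.get("type"),
            "severity": detail.get("severity"),
        }),
    )
    return instance_id


def lambda_handler(event, context):
    # Imported lazily so the module can be loaded without boto3 for local tests.
    import boto3
    return handle_finding(event, boto3.client("ec2"), boto3.client("sns"))
```

Keeping `handle_finding` separate from `lambda_handler` is a design choice: the AWS-specific glue stays thin, and the containment logic can be tested with stub clients.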
Filtering at the Event Layer (Not Inside Lambda)
Instead of checking severity inside the Lambda function, I filtered directly in EventBridge.
Why this matters:
- Reduces unnecessary Lambda invocations
- Makes response criteria explicit
- Improves audit clarity
- Lowers operational cost
Example event pattern:
{
  "detail-type": ["GuardDuty Finding"],
  "detail": {
    "severity": [ { "numeric": [">=", 7] } ]
  }
}
Only high-confidence findings trigger automation.
Everything else remains visible, but is not auto-remediated.
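For reference, here is a sketch of registering that pattern with boto3's EventBridge client (`put_rule` and `put_targets`). It adds `"source": ["aws.guardduty"]` as an extra guard, which is my assumption rather than part of the pattern above. The rule name is a hypothetical placeholder, and the Lambda function also needs a resource-based permission allowing `events.amazonaws.com` to invoke it (Lambda `add_permission`), which is not shown.

```python
import json

# Same severity filter as above, plus "source" so only GuardDuty events can match.
GUARDDUTY_HIGH_SEVERITY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}


def create_isolation_rule(events, lambda_arn, rule_name="guardduty-high-severity-isolation"):
    """Create the EventBridge rule and point it at the isolation Lambda.

    `events` is a boto3 EventBridge client (boto3.client("events")).
    The default rule name is hypothetical.
    """
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(GUARDDUTY_HIGH_SEVERITY_PATTERN),
        State="ENABLED",
        Description="Auto-isolate EC2 on GuardDuty severity >= 7",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "isolation-lambda", "Arn": lambda_arn}],
    )
    return rule_name
```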
Quarantine Security Group Design
Containment is not termination.
Terminating an instance destroys forensic evidence.
My quarantine security group:
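- No outbound internet access
- No inbound access from public IP ranges
- Inbound access only from the SOC bastion IP and the forensic collection host
- Optional: outbound access to a monitoring endpoint (VPC Flow Logs need no rule, since they are captured outside the instance)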
The goal is isolation with controlled investigation access.
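A quarantine group like this could be created with boto3 roughly as follows. This is a sketch under assumptions: the group name, the SSH-only ingress on tcp/22, and the CIDR parameters are all illustrative. Because security groups are allow-lists and stateful, removing every egress rule still lets replies to the allowed inbound sessions flow back. If your VPC uses IPv6, also check for and revoke any `::/0` egress rule.

```python
def create_quarantine_sg(ec2, vpc_id, soc_bastion_cidr, forensic_host_cidr):
    """Create a quarantine security group: no egress, inbound only from two hosts.

    `ec2` is a boto3 EC2 client. The group name and the SSH-only (tcp/22)
    ingress are assumptions; adapt them to your forensic tooling.
    """
    sg_id = ec2.create_security_group(
        GroupName="quarantine-sg",
        Description="GuardDuty auto-isolation: forensic access only",
        VpcId=vpc_id,
    )["GroupId"]

    # New VPC security groups start with an allow-all IPv4 egress rule;
    # revoke it so the instance cannot reach the internet.
    ec2.revoke_security_group_egress(
        GroupId=sg_id,
        IpPermissions=[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    )

    # Anything not authorized here (including public IP ranges) is denied inbound.
    ec2.authorize_security_group_ingress(
        GroupId=sg_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [
                {"CidrIp": soc_bastion_cidr, "Description": "SOC bastion"},
                {"CidrIp": forensic_host_cidr, "Description": "Forensic collection host"},
            ],
        }],
    )
    return sg_id
```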
Isolation Logic (Lambda Example)
Core logic:
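import boto3

ec2 = boto3.client('ec2')

def isolate_instance(instance_id, quarantine_sg_id):
    # Replace the instance's entire set of security groups with the quarantine SG.
    # Groups= swaps the full list; it does not append to the existing groups.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_sg_id]
    )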
Additional safeguards I added:
- Check the instance state before modifying it
- Tag the instance Quarantined=true
- Exit if the instance is already isolated
- Log the original security groups for rollback
Containment must be idempotent.
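Those four safeguards could be combined into one idempotent function along these lines. This is a sketch, not the exact code from this pipeline: treating only `running` and `stopped` as actionable states is an assumption, and so is writing the rollback record as a JSON line with `print`, which Lambda forwards to CloudWatch Logs. Tagging happens last, so a failed swap leaves the instance untagged and a retry will attempt it again. Instances with more than one network interface may need per-ENI changes via `modify_network_interface_attribute` instead (not shown).

```python
import json

QUARANTINE_TAG = {"Key": "Quarantined", "Value": "true"}


def isolate_instance_safely(ec2, instance_id, quarantine_sg_id):
    """Idempotent isolation. Returns a short status string.

    `ec2` is a boto3 EC2 client. The actionable states and the JSON rollback
    record are assumptions for illustration.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]

    # 1. Only act on instances whose security groups can be changed.
    state = instance["State"]["Name"]
    if state not in ("running", "stopped"):
        return f"skipped: state={state}"

    # 2. Exit early if an earlier invocation already isolated this instance.
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    if tags.get(QUARANTINE_TAG["Key"]) == QUARANTINE_TAG["Value"]:
        return "skipped: already quarantined"

    # 3. Record the original security groups before touching anything.
    original_sgs = [g["GroupId"] for g in instance.get("SecurityGroups", [])]
    print(json.dumps({"action": "isolate", "instance_id": instance_id,
                      "original_security_groups": original_sgs}))

    # 4. Swap security groups, then tag (tag last so a failed swap is retried).
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])
    ec2.create_tags(Resources=[instance_id], Tags=[QUARANTINE_TAG])
    return "isolated"
```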
Idempotency: Preventing Recursive Triggers
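When the Lambda function modifies security groups, CloudTrail records those API calls, and the resulting events can reach EventBridge.
If a rule matches those events and targets the same function, you risk an infinite loop.
Mitigation:
- Tag check before modification
- Structured event filtering
- Explicit function logging
- DLQ configured for failure cases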
Automation that can repeat blindly is dangerous.
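"Structured event filtering" can also be enforced inside the function as a second line of defense. The sketch below accepts only GuardDuty findings about EC2 instances and ignores anything else, such as a CloudTrail `AWS API Call via CloudTrail` event that reached the function through an over-broad rule. GuardDuty can also re-publish an existing finding as new occurrences accumulate, so the same instance may arrive more than once; that is why the tag check still matters. The function name is hypothetical.

```python
def extract_target_instance(event):
    """Return the EC2 instance ID from a GuardDuty finding event, or None.

    Any event that is not a GuardDuty finding about an EC2 instance is
    ignored, so API-call events generated by the function's own changes
    cannot drive another isolation.
    """
    if event.get("source") != "aws.guardduty":
        return None
    if event.get("detail-type") != "GuardDuty Finding":
        return None
    resource = event.get("detail", {}).get("resource", {})
    if resource.get("resourceType") != "Instance":
        return None  # e.g. S3 or IAM findings are out of scope for EC2 isolation
    return resource.get("instanceDetails", {}).get("instanceId")
```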
Failure Modes I Modeled
Automation amplifies mistakes.
I explicitly accounted for:
- IAM permission drift
- Partial security group modification
- Concurrent findings on the same instance
- Cross-region GuardDuty setup
- High-volume alert bursts
Mitigations:
- Dead Letter Queue (DLQ)
- Lambda concurrency limits
- CloudWatch error metrics + alarms
- Explicit structured logs (JSON format)
- Permission boundary controls
Automation without observability becomes silent failure.
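Explicit structured logs (JSON format) only need the standard library. Here is a minimal sketch of a JSON formatter that writes one object per line, so CloudWatch Logs Insights can query fields such as `instance_id` directly. The field names and the `fields` convention for passing extra data are illustrative choices, not a fixed schema.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Structured fields passed via logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


logger = logging.getLogger("isolation")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# The Lambda Python runtime attaches its own handler to the root logger;
# stop propagation so each record is not written twice.
logger.propagate = False
```

Usage: `logger.info("instance isolated", extra={"fields": {"instance_id": "i-0abc", "finding_id": "..."}})`.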
Impact
This reduced:
- MTTR, from minutes to seconds
- Human triage fatigue
- Decision bottlenecks
- Inconsistent containment actions
But the real improvement was consistency.
Humans improvise during incidents.
Code executes predictably.
Trade-Offs & Risks
Auto-isolating compute is not trivial.
You must consider:
- False positives at high severity
- Production-critical workloads
- Stateful applications
- Lateral movement that already happened before isolation
- Multi-account architecture
Severity threshold tuning took longer than writing the Lambda function.
That surprised me.
Lessons Learned
- Detection maturity does not equal response maturity.
- Event-driven architecture scales better than polling remediation.
- Idempotency is mandatory.
- Multi-account containment becomes architecture work.
- Automation exposes operational blind spots you didn't know existed.
Next Iterations
If I evolve this into a more mature Cloud SOAR pattern:
- Step Functions for multi-stage workflows
- Automated EBS snapshot before isolation
- Memory capture integration
- Slack/Jira enrichment with context
- Cross-account orchestration via AWS Organizations
- Integration with a central GuardDuty delegated administrator account
At that point, it becomes a response framework, not a script.
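The "EBS snapshot before isolation" step is small enough to sketch now. This assumes a boto3 EC2 client and uses hypothetical tag keys (`ForensicSource`, `GuardDutyFindingId`). EBS snapshots are crash-consistent disk copies, so they complement memory capture rather than replace it.

```python
def snapshot_instance_volumes(ec2, instance_id, finding_id):
    """Snapshot every EBS volume attached to the instance; return snapshot IDs.

    `ec2` is a boto3 EC2 client. Tag keys are hypothetical.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        ebs = mapping.get("Ebs")
        if not ebs:
            continue  # skip mappings that are not EBS volumes
        snap = ec2.create_snapshot(
            VolumeId=ebs["VolumeId"],
            Description=f"Forensic snapshot of {instance_id} for GuardDuty finding {finding_id}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [
                    {"Key": "ForensicSource", "Value": instance_id},
                    {"Key": "GuardDutyFindingId", "Value": finding_id},
                ],
            }],
        )
        snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids
```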
Final Thought
You don't need a commercial SOAR platform to start automating response.
Start with:
- Deterministic triggers
- Guardrails
- Observability
- Explicit blast radius control
If detection isn't wired to action, it's just telemetry.