Extract OTP Codes From Email, Automatically

#automation #email #api #agents

What does your automation do when the login flow it's driving sends a six-digit code instead of a confirmation link? For most teams the honest answer is "a human goes and checks a shared inbox," which is a strange bottleneck to leave in the middle of an otherwise fully automated pipeline.

There's a cleaner shape: the agent owns the mailbox the code lands in. With a Nylas Agent Account — a hosted mailbox controlled entirely through the API, currently in beta — the OTP email arrives, a webhook fires, your handler extracts the code, and whatever orchestrates the login gets it back. No human, no inbox-checking Slack message, no screen-scraping Gmail.

Step one: make sure it's the right email

A message.created webhook fires on every inbound message, so the first job is filtering down to the one that actually carries the code. The recipe uses two signals together — sender domain and a subject heuristic:

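import express from "express";

const app = express();
app.use(express.json()); // parse the JSON webhook payload into req.body

// Assumed here to come from the environment: the grant ID of the agent's mailbox.
const AGENT_GRANT_ID = process.env.AGENT_GRANT_ID;

app.post("/webhooks/otp", async (req, res) => {
  // Acknowledge first; extraction runs after the response is sent.
  res.status(200).end();

  const event = req.body;
  if (event.type !== "message.created") return;

  const msg = event.data.object;
  if (msg.grant_id !== AGENT_GRANT_ID) return;

  const sender = msg.from?.[0]?.email ?? "";
  const subject = msg.subject ?? "";

  // Require both signals: the expected sender and an OTP-looking subject.
  const senderMatches = sender.endsWith("@no-reply.example.com");
  const subjectLooksRight = /code|verif|one.?time|passcode/i.test(subject);
  if (!senderMatches || !subjectLooksRight) return;

  try {
    await handleOtp(msg.id); // defined elsewhere: fetches the body and extracts the code
  } catch (err) {
    // The response is already sent, so a rejection here would otherwise go unhandled.
    console.error("OTP handling failed:", err.message);
  }
});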

Neither check alone is enough. Sender-only matching trips on welcome emails from the same domain; subject-only matching trips on anything that mentions "verification."

Regex first, LLM second

Most OTP emails follow one of a few shapes: a standalone 4–8 digit number, or a code after a label like "Your code is:". Three patterns, tried in order from most to least specific, cover the vast majority of services:

const patterns = [
  /(?:code|passcode|one[\s-]?time)[^\d]{0,20}(\d{4,8})/i, // "Your code is: 123456"
  /\b(\d{6})\b/,        // bare 6-digit
  /\b(\d{4,8})\b/,      // bare 4–8 digit (last resort)
];

One detail that's easy to miss: strip the HTML before matching. Inline styles and hidden tracking pixels are full of digit sequences that will happily satisfy your last-resort pattern.
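To make the stripping step concrete, here is a minimal sketch that flattens HTML to text and then tries the three patterns in order. The helper names `htmlToText` and `extractWithRegex` are hypothetical, and the tag stripping is deliberately crude; a real HTML parser would be more robust.

```javascript
// Hypothetical helper: a rough HTML-to-text pass, good enough for OTP matching.
function htmlToText(html) {
  return html
    .replace(/<(style|script|head)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop non-visible blocks
    .replace(/<[^>]+>/g, " ")            // drop remaining tags, attributes included
    .replace(/&#\d+;|&[a-z]+;/gi, " ")   // drop entities, which can contain digits
    .replace(/\s+/g, " ")
    .trim();
}

// Same three tiers as above, most specific first.
const patterns = [
  /(?:code|passcode|one[\s-]?time)[^\d]{0,20}(\d{4,8})/i,
  /\b(\d{6})\b/,
  /\b(\d{4,8})\b/,
];

// Returns { code, tier } for the first pattern that matches, or null.
function extractWithRegex(html) {
  const text = htmlToText(html);
  for (let tier = 0; tier < patterns.length; tier++) {
    const match = patterns[tier].exec(text);
    if (match) return { code: match[1], tier };
  }
  return null;
}
```

Returning the tier alongside the code makes it easy to record which pattern fired without recording the code itself.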

When regex strikes out — usually a code buried in a noisy marketing layout — fall back to a small LLM with a deliberately narrow prompt:

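import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function extractWithLlm(plaintext) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You extract one-time verification codes from email bodies. " +
          "Respond with JSON only: {\"code\": \"<the code>\"} or " +
          "{\"code\": null} if no code is present.",
      },
      // Cap the input: the code is expected near the top of the email.
      { role: "user", content: plaintext.slice(0, 4000) },
    ],
    response_format: { type: "json_object" },
  });

  const parsed = JSON.parse(response.choices[0].message.content);
  // returnCode is defined elsewhere in the recipe (not shown here).
  if (parsed.code) return returnCode(parsed.code);
  return null; // no code found; make the miss explicit
}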

Only the first 4,000 characters of plaintext go in, and the model is asked for one thing in one shape — JSON with a code field or null. Don't ask the model to "understand" the email. Banking and enterprise senders sometimes rotate formats across sessions (6 digits, 8 digits, alphanumeric), and the LLM fallback is what absorbs those shifts without a regex update.
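One way to keep the probabilistic step honest is to check the model's reply against the email before trusting it. The sketch below is an addition, not part of the recipe: `parseLlmReply` is a hypothetical helper, and the 4 to 12 character alphanumeric bound is an assumed range wide enough for the formats mentioned above.

```javascript
// Hypothetical guard for the LLM fallback: accept a code only if the reply
// parses, has the expected shape, and appears verbatim in the email text.
function parseLlmReply(rawContent, plaintext) {
  let parsed;
  try {
    parsed = JSON.parse(rawContent);
  } catch {
    return null; // not JSON at all
  }

  const code = parsed?.code;
  if (typeof code !== "string") return null; // covers {"code": null}

  const trimmed = code.trim();
  // Assumed bounds: 4 to 12 letters, digits, or hyphens.
  if (!/^[A-Za-z0-9-]{4,12}$/.test(trimmed)) return null;

  // A code the model invented will not be present in the source text.
  if (!plaintext.includes(trimmed)) return null;

  return trimmed;
}
```

The verbatim check is cheap and rejects the case where the model produces a plausible-looking code that was never in the message.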

Getting the code back to whoever's waiting

The signup or login that triggered all this is blocked, waiting. The simplest bridge is a promise registry keyed by a correlation value — session ID, expected sender, run ID — with a timeout (the recipe defaults to 60 seconds):

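// In-memory registry of callers waiting for a code, keyed by correlation value.
const pending = new Map();

export function awaitCode(correlationKey, timeoutMs = 60_000) {
  return new Promise((resolve, reject) => {
    // Give up, and clean up the entry, if no code arrives in time.
    const timer = setTimeout(() => {
      pending.delete(correlationKey);
      reject(new Error("OTP timeout"));
    }, timeoutMs);
    pending.set(correlationKey, { resolve, reject, timer });
  });
}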

In production, swap the in-memory Map for a real queue or pub/sub. Webhook handlers often run on short-lived processes, and not necessarily the same process as the waiting login flow, so a restart between "code arrived" and "code consumed" loses the code.
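For the in-memory version, the other half of the registry is the side the webhook handler calls once a code has been extracted. A minimal sketch, where `deliverCode` is a hypothetical name and `awaitCode` is repeated (without the export) so the block runs on its own:

```javascript
const pending = new Map();

function awaitCode(correlationKey, timeoutMs = 60_000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      pending.delete(correlationKey);
      reject(new Error("OTP timeout"));
    }, timeoutMs);
    pending.set(correlationKey, { resolve, reject, timer });
  });
}

// Hypothetical counterpart: resolve the waiting caller, if there is one.
// Returns false when nobody is waiting (already timed out, or never asked).
function deliverCode(correlationKey, code) {
  const entry = pending.get(correlationKey);
  if (!entry) return false;
  clearTimeout(entry.timer);      // stop the timeout from firing later
  pending.delete(correlationKey); // each waiter is resolved at most once
  entry.resolve(code);
  return true;
}
```

The boolean return lets the handler notice a code that arrived with no one waiting, which is worth counting even if the value itself is never logged.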

The failure modes that actually bite

The recipe's warning list is the best part, because each item is a production incident in miniature:

Codes expire fast. Most services invalidate OTPs in 5–15 minutes. Check message.date freshness before returning a code — a slow agent will confidently hand back a dead one.
Multiple codes in the inbox. A stale code from an earlier attempt plus a fresh one means your regex can grab the wrong match. Sort by message timestamp, newest first, always.
Never log the code. OTPs are credentials. Log that one was received and returned; never the value.
Back off on failure. A tight retry loop requesting code after code looks like an attack from the service's side and gets the agent's address blocked.
Dedup redelivered webhooks. Nylas delivers webhooks at least once. A redelivered message.created can re-trigger extraction and hand a stale code back to a fresh login attempt — the duplicate-reply prevention patterns apply here too.
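Several of these checks are small enough to sketch together. All names and thresholds below are hypothetical: the five-minute cutoff is the low end of the range above, `msg.date` is assumed to be a Unix timestamp in seconds, and the in-memory Set would need to be a shared store once more than one process handles webhooks.

```javascript
const MAX_AGE_MS = 5 * 60 * 1000; // assumed cutoff; tune per service
const seenMessageIds = new Set(); // in-memory only; a sketch, not durable

// Freshness: assumes msg.date is a Unix timestamp in seconds.
function isFresh(msg, nowMs = Date.now()) {
  return nowMs - msg.date * 1000 <= MAX_AGE_MS;
}

// Multiple codes: drop stale messages, then take the newest.
function pickNewestFresh(messages, nowMs = Date.now()) {
  const fresh = messages.filter((m) => isFresh(m, nowMs));
  fresh.sort((a, b) => b.date - a.date); // newest first
  return fresh[0] ?? null;
}

// Dedup: true only the first time a message ID is seen.
function firstDelivery(messageId) {
  if (seenMessageIds.has(messageId)) return false;
  seenMessageIds.add(messageId);
  return true;
}

// Backoff: exponential delay before requesting another code, capped.
function backoffDelayMs(attempt, baseMs = 2000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Logging: record the event and the message ID, never the code value.
function logCodeReceived(messageId) {
  console.log(`OTP received and returned for message ${messageId}`);
}
```

Passing `nowMs` explicitly keeps the freshness logic testable without mocking the clock.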

There's also a defense you set up before any of this code runs: lock the inbox down at the mail layer. Agent Account policies and rules can constrain inbound so only expected sender domains ever reach the agent's inbox. An OTP mailbox that accepts mail from anyone is an OTP mailbox someone will eventually try to confuse with look-alike messages; an allowlist makes the "match the right email" step mostly a formality.

Quick answers

What about magic links instead of codes? Same architecture, different regex — match a URL pattern instead of digits and follow the link instead of returning a value. The signup recipe covers that variant.
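A rough sketch of the link variant follows. The function name, the host allowlist, and the path keywords are all assumptions made for illustration, not taken from the signup recipe.

```javascript
// Hypothetical helper: find the first HTTPS link on an allowed host whose
// path or query looks like a verification link.
function extractMagicLink(text, allowedHosts) {
  const candidates = text.match(/https:\/\/[^\s"'<>)]+/g) ?? [];
  for (const raw of candidates) {
    const cleaned = raw.replace(/[.,;]+$/, ""); // trailing sentence punctuation
    let url;
    try {
      url = new URL(cleaned);
    } catch {
      continue; // not a parseable URL
    }
    // Only follow links on hosts you expect; the agent will visit this URL.
    if (!allowedHosts.includes(url.hostname)) continue;
    // Assumed keywords; adjust per service.
    if (/verif|confirm|magic|token|login/i.test(url.pathname + url.search)) {
      return url.href;
    }
  }
  return null;
}
```

The host check matters more here than in the digit case, because the agent ends up making a request to whatever the function returns.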

Why fetch the message body separately? The message.created webhook payload only carries summary fields — sender, subject, snippet. The full body comes from GET /v3/grants/{grant_id}/messages/{message_id}, which is the first call inside the extraction handler.
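A minimal sketch of that fetch, using the global `fetch` available in Node 18 and later. The function name is hypothetical, and three details are assumptions to check against the Nylas documentation for your account: the US base URL, Bearer authentication with an API key, and a response shaped like `{ data: { body } }`.

```javascript
// Hypothetical helper around GET /v3/grants/{grant_id}/messages/{message_id}.
// fetchImpl is injectable so the function can be exercised without the network.
async function fetchMessageBody(
  grantId,
  messageId,
  { apiKey, baseUrl = "https://api.us.nylas.com", fetchImpl = fetch } = {}
) {
  const url =
    `${baseUrl}/v3/grants/${encodeURIComponent(grantId)}` +
    `/messages/${encodeURIComponent(messageId)}`;

  const res = await fetchImpl(url, {
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      Accept: "application/json",
    },
  });
  if (!res.ok) {
    throw new Error(`Message fetch failed with status ${res.status}`);
  }

  const payload = await res.json();
  return payload.data?.body ?? ""; // assumed response shape
}
```

The returned body is typically HTML, so it still needs flattening to text before any digit matching.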

Why not just use the LLM for everything? Cost and latency, but mostly determinism. Regex either matches or it doesn't; you want the probabilistic component to be the fallback, not the front door.

Where this slots in

OTP extraction is rarely the whole feature. It's the middle step of something bigger, usually an agent signing up for a third-party service end to end: provision the mailbox, submit the form, catch the verification, finish onboarding.

Try this with a service you control first: point a test signup at the agent's address, watch the webhook land, and check which of the three regex tiers actually matched. If you've built OTP extraction before — what's the weirdest code format you've had to parse? I'm collecting nominations.
