DEV Community

Brian Spann

Presidio as an LLM Guardrail

Every previous part of this series has been building toward this one. You can detect PII. You can anonymize it with the right operator for each entity type. You can build custom recognizers for your organization's specific data patterns. Now we put it all together into the architecture that matters most in 2026: a PII guardrail that sits between your users and your LLM.

The problem is straightforward. Users type personal information into prompts. Support agents paste customer records into chat interfaces. Developers pipe production data into debugging workflows. All of that PII flows to your model provider's API endpoint. Even if the provider says they don't train on your data, the information still transits their infrastructure. For regulated industries, that transit itself can be a compliance violation.

The PII Proxy Pattern

The solution is a proxy that intercepts every LLM request, scrubs PII from the prompt, forwards the clean version, and then restores the PII in the response.

The flow looks like this:

  1. User sends a prompt containing PII
  2. Proxy detects and encrypts all PII entities
  3. Clean prompt (with encrypted tokens) goes to the LLM
  4. LLM responds using the encrypted tokens
  5. Proxy decrypts the tokens in the response, restoring original PII
  6. User sees a response with their real data intact

The user never notices the proxy exists. The LLM never sees any PII that Presidio detects; anything the analyzer misses still goes through. The encryption key stays on your infrastructure.

Building the Proxy in Python

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine, DeanonymizeEngine
from presidio_anonymizer.entities import OperatorConfig, OperatorResult
import openai

# Initialize Presidio engines once; the analyzer loads a spaCy model at startup
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
deanonymizer = DeanonymizeEngine()

ENCRYPTION_KEY = "WmZq4t7w!z%C*F-J"  # 128-bit AES key; in production, pull from Key Vault

# Create the client once and reuse it across requests
client = openai.AzureOpenAI(
    azure_endpoint="https://your-endpoint.openai.azure.com/",
    api_key="your-api-key",
    api_version="2024-02-01"
)

def scrub_prompt(text: str) -> tuple:
    """Detect and encrypt PII in the prompt."""
    results = analyzer.analyze(text=text, language="en")

    if not results:
        return text, None

    anonymized = anonymizer.anonymize(
        text=text,
        analyzer_results=results,
        operators={
            "DEFAULT": OperatorConfig("encrypt", {"key": ENCRYPTION_KEY})
        }
    )

    # Each item's .text is the ciphertext token that replaced one entity
    return anonymized.text, anonymized.items

def restore_response(text: str, items: list) -> str:
    """Decrypt PII tokens in the LLM response."""
    if not items:
        return text

    # The offsets in `items` point into the scrubbed prompt, not into the
    # LLM's response, so find each ciphertext token in the response and
    # build fresh offsets for the deanonymizer.
    located = []
    for item in items:
        start = text.find(item.text)
        while start != -1:
            end = start + len(item.text)
            located.append(OperatorResult(start=start, end=end, entity_type=item.entity_type))
            start = text.find(item.text, end)

    if not located:
        return text  # the model didn't echo any tokens back

    deanonymized = deanonymizer.deanonymize(
        text=text,
        entities=sorted(located, key=lambda r: r.start),
        operators={
            "DEFAULT": OperatorConfig("decrypt", {"key": ENCRYPTION_KEY})
        }
    )

    return deanonymized.text

def chat_with_guardrail(user_message: str) -> str:
    """Send a message to the LLM with PII protection."""
    # Step 1: Scrub
    clean_prompt, pii_items = scrub_prompt(user_message)

    # Step 2: Send to LLM ("model" is your Azure OpenAI deployment name)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": clean_prompt}]
    )

    llm_response = response.choices[0].message.content or ""

    # Step 3: Restore
    final_response = restore_response(llm_response, pii_items)

    return final_response

Test it:

user_input = """
Summarize this customer case: John Smith (john.smith@acme.com, 
SSN 123-45-6789) reported unauthorized charges on his Visa 
ending 4242. He can be reached at 206-555-0147.
"""

response = chat_with_guardrail(user_input)
print(response)

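What the LLM sees: encrypted tokens where the PII was. What the user sees: a response with their real customer data. The LLM processes the request without ever handling the detected PII. This relies on the model copying each token back verbatim. A token that the model paraphrases, truncates, or drops can't be restored.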
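Because restoration depends on exact copies, a proxy can scan the raw response for ciphertext-looking strings that match none of the tokens it issued. The sketch below is a heuristic written for this example; it is not a Presidio feature. It treats long base64 runs as candidate tokens because Presidio's encrypt operator emits base64 text. The helper name `find_suspect_tokens` and the sample tokens are hypothetical.

```python
import re
from typing import Iterable, List

# Heuristic: Presidio's encrypt operator emits base64 text, so a long run of
# base64 characters in the response is a likely (possibly damaged) token.
_BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def find_suspect_tokens(llm_response: str, known_tokens: Iterable[str]) -> List[str]:
    """Return base64-looking runs in the response that match no issued token.

    Run this on the raw LLM output before decryption. A hit usually means the
    model truncated or altered a token, so that value cannot be restored.
    """
    known = set(known_tokens)
    suspects = []
    for match in _BASE64_RUN.finditer(llm_response):
        candidate = match.group(0)
        # Skip ordinary long words: ciphertext almost always contains a digit,
        # '+', '/', or '=' padding somewhere.
        if not re.search(r"[0-9+/=]", candidate):
            continue
        if candidate not in known:
            suspects.append(candidate)
    return suspects


# Hypothetical token standing in for [item.text for item in anonymized.items]
tokens = ["S184CMt9Drj7QaKQ21JTrpYzghnboTF9pn/neN8JME0="]
response = ("Customer S184CMt9Drj7QaKQ21JTrpYzghnboTF9pn/neN8JME0= and "
            "S184CMt9Drj7QaKQ21JTrpYzgh reported internationalization issues.")
print(find_suspect_tokens(response, tokens))
```

A non-empty result is a good trigger for logging the request ID, or for retrying with an instruction to copy the tokens exactly. Silently returning a half-restored answer is the outcome to avoid.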

Moving the Guardrail into Azure API Management

The Python proxy works, but it lives inside one application. Every team that wants the same protection has to wire in the same code and keep it current. A guardrail belongs at the edge, where every model call already passes through. On Azure, that edge is API Management.

Put APIM in front of Azure OpenAI and point your applications at the APIM endpoint instead of the model endpoint. Now APIM is the one place that sees every prompt and every completion. An inbound policy scrubs PII out of the prompt before it reaches the model. An outbound policy restores it on the way back, so the caller still gets their real values. You can run the inbound scrub on its own, or pair it with the outbound restore.

The flow with APIM:

  1. App calls the APIM endpoint with a prompt containing PII
  2. Inbound policy sends the prompt to Presidio, which encrypts the PII entities
  3. APIM stashes the entity map in a context variable and forwards the scrubbed prompt to Azure OpenAI
  4. Azure OpenAI responds, echoing back the encrypted tokens
  5. Outbound policy sends the response plus the saved entity map to Presidio to decrypt
  6. APIM returns the restored response to the app

The model never sees real PII. The encryption key and the entity map never leave your APIM instance and its backend. No application code changes.

In this setup Presidio sits behind two small endpoints, /deidentify and /reidentify, that wrap the analyzer and anonymizer (a thin container that encrypts on the way in, decrypts on the way out, with the key pulled from Key Vault). The APIM policy calls them with send-request:

<policies>
  <inbound>
    <base />
    <!-- Pull the user's prompt out of the chat completion body -->
    <set-variable name="userPrompt"
      value="@(context.Request.Body.As<JObject>(preserveContent: true)["messages"].Last["content"].ToString())" />

    <!-- De-identify: send the prompt to Presidio before the model sees it -->
    <send-request mode="new" response-variable-name="deidentified" timeout="10">
      <set-url>https://presidio.internal/deidentify</set-url>
      <set-method>POST</set-method>
      <set-header name="Content-Type" exists-action="override">
        <value>application/json</value>
      </set-header>
      <set-body>@(new JObject(new JProperty("text", (string)context.Variables["userPrompt"])).ToString())</set-body>
    </send-request>

    <!-- Save the entity map so the outbound step can re-identify.
         preserveContent: true keeps the body readable for the set-body below. -->
    <set-variable name="entityMap"
      value="@(((IResponse)context.Variables["deidentified"]).Body.As<JObject>(preserveContent: true)["entities"].ToString())" />

    <!-- Swap the scrubbed prompt back into the request before it hits the model -->
    <set-body>@{
      var body = context.Request.Body.As<JObject>();
      var clean = ((IResponse)context.Variables["deidentified"]).Body.As<JObject>()["text"].ToString();
      body["messages"].Last["content"] = clean;
      return body.ToString();
    }</set-body>
  </inbound>

  <backend>
    <base />
  </backend>

  <outbound>
    <base />
    <!-- Re-identify: decrypt the PII back into the model's response -->
    <send-request mode="new" response-variable-name="reidentified" timeout="10">
      <set-url>https://presidio.internal/reidentify</set-url>
      <set-method>POST</set-method>
      <set-header name="Content-Type" exists-action="override">
        <value>application/json</value>
      </set-header>
      <set-body>@{
        var resp = context.Response.Body.As<JObject>(preserveContent: true);
        var content = resp["choices"][0]["message"]["content"].ToString();
        return new JObject(
          new JProperty("text", content),
          new JProperty("entities", JArray.Parse((string)context.Variables["entityMap"]))
        ).ToString();
      }</set-body>
    </send-request>

    <set-body>@{
      var resp = context.Response.Body.As<JObject>();
      var restored = ((IResponse)context.Variables["reidentified"]).Body.As<JObject>()["text"].ToString();
      resp["choices"][0]["message"]["content"] = restored;
      return resp.ToString();
    }</set-body>
  </outbound>

  <on-error>
    <base />
  </on-error>
</policies>

With this policy in place, every application pointed at the APIM endpoint gets PII protection without changing a line of its own code. The inbound block works on its own if you only need to scrub prompts. The outbound block reads the entityMap variable that the inbound block saves, so restoring real values in the response requires both.
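The policy only assumes a JSON contract with the wrapper. /deidentify takes {"text"} and returns {"text", "entities"}. /reidentify takes {"text", "entities"} and returns {"text"}.

Below is a minimal sketch of the wrapper's core logic under that contract. It assumes the wrapper receives an anonymizer result shaped like Presidio's, {"text", "items"}, where each item's "text" is the ciphertext. The decrypt step is passed in as a callable. The names `to_entity_map`, `locate_tokens`, and `reidentify`, and the "token" field, are hypothetical. The toy cipher in the usage lines is for illustration only.

Offsets are recomputed at re-identify time because the model's response is a different string from the scrubbed prompt.

```python
from typing import Callable, Dict, List


def to_entity_map(anonymizer_result: Dict) -> Dict:
    """Shape an anonymizer result ({"text", "items"}) into the /deidentify
    response: the scrubbed text plus one {entity_type, token} per entity."""
    entities = [
        {"entity_type": item["entity_type"], "token": item["text"]}
        for item in anonymizer_result["items"]
    ]
    return {"text": anonymizer_result["text"], "entities": entities}


def locate_tokens(text: str, entities: List[Dict]) -> List[Dict]:
    """Find every occurrence of each token in `text`, sorted by position."""
    spans = []
    for entity in entities:
        token = entity["token"]
        start = text.find(token)
        while start != -1:
            end = start + len(token)
            spans.append({"start": start, "end": end, "entity_type": entity["entity_type"]})
            start = text.find(token, end)
    return sorted(spans, key=lambda span: span["start"])


def reidentify(body: Dict, decrypt: Callable[[str], str]) -> Dict:
    """Handle a /reidentify request: swap each token back for its plaintext."""
    text = body["text"]
    # Replace from the last span backward so earlier offsets stay valid
    for span in reversed(locate_tokens(text, body["entities"])):
        token = text[span["start"]:span["end"]]
        text = text[:span["start"]] + decrypt(token) + text[span["end"]:]
    return {"text": text}


# Toy reversible "cipher" (string reversal inside ENC(...)) in place of AES
def toy_decrypt(token: str) -> str:
    return token[4:-1][::-1]  # strip "ENC(" and ")", then un-reverse


anonymizer_result = {
    "text": "Email ENC(moc.emca@htims.nhoj) today",
    "items": [{"start": 6, "end": 30, "entity_type": "EMAIL_ADDRESS",
               "text": "ENC(moc.emca@htims.nhoj)", "operator": "encrypt"}],
}
deidentified = to_entity_map(anonymizer_result)
llm_output = "Reply to ENC(moc.emca@htims.nhoj) by Friday."
restored = reidentify({"text": llm_output, "entities": deidentified["entities"]}, toy_decrypt)
print(restored["text"])
```

In a real wrapper, `decrypt` would call Presidio's decrypt operator with the Key Vault key. APIM never needs to understand the entity list; it just carries the list from the inbound step to the outbound step.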

Two decisions shape the setup:

Reversibility. The policy above uses Presidio's encrypt operator so the outbound step can decrypt. If you only need to keep PII away from the model and never need it back, switch the wrapper to Presidio's replace operator and drop the outbound policy. It's simpler, and there's no key to manage.
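In the one-way variant, the scrubbed prompt looks like the output of Presidio's replace operator. By default, that operator substitutes a placeholder such as <PERSON> for each entity. The stdlib sketch below only mimics that output shape for illustration; `redact` is a hypothetical helper. In the actual wrapper you would pass `OperatorConfig("replace")` to the anonymizer instead.

```python
from typing import List, Tuple

# Spans are (start, end, entity_type) and are assumed not to overlap; Presidio's
# anonymizer resolves overlapping detections before applying operators.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with "<ENTITY_TYPE>", mimicking replace's default output."""
    # Work from the last span backward so earlier offsets stay valid
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


prompt = "John Smith can be reached at 206-555-0147."
print(redact(prompt, [(0, 10, "PERSON"), (29, 41, "PHONE_NUMBER")]))
```

Nothing here can be reversed, so there is no entity map to carry into an outbound step. That is exactly why the outbound policy can go.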

Where Presidio runs. The send-request calls point at an internal Presidio endpoint. Keep it on the same VNet as APIM so prompts never touch the public internet. The next section covers those deployment options.

Deploying on Azure

For production, you need Presidio running as a service, not embedded in your application code. Here are the deployment options on Azure, from the quickest to stand up to the most production-ready.

Azure App Service

The fastest path to production. Deploy the Presidio Docker containers to App Service with minimal configuration.

# Create a resource group
az group create --name rg-presidio --location eastus

# Create an App Service plan
az appservice plan create \
  --name presidio-plan \
  --resource-group rg-presidio \
  --is-linux \
  --sku B2

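# Deploy the analyzer (web app names must be globally unique)
az webapp create \
  --name presidio-analyzer-prod \
  --resource-group rg-presidio \
  --plan presidio-plan \
  --deployment-container-image-name mcr.microsoft.com/presidio-analyzer:latest

# Deploy the anonymizer
az webapp create \
  --name presidio-anonymizer-prod \
  --resource-group rg-presidio \
  --plan presidio-plan \
  --deployment-container-image-name mcr.microsoft.com/presidio-anonymizer:latest

# The Presidio containers listen on port 3000; route App Service traffic there
az webapp config appsettings set \
  --name presidio-analyzer-prod \
  --resource-group rg-presidio \
  --settings WEBSITES_PORT=3000

az webapp config appsettings set \
  --name presidio-anonymizer-prod \
  --resource-group rg-presidio \
  --settings WEBSITES_PORT=3000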

Azure Container Apps

For more control over scaling, networking, and multi-container deployments:

# Create an ACA environment
az containerapp env create \
  --name presidio-env \
  --resource-group rg-presidio \
  --location eastus

# Deploy analyzer
az containerapp create \
  --name presidio-analyzer \
  --resource-group rg-presidio \
  --environment presidio-env \
  --image mcr.microsoft.com/presidio-analyzer:latest \
  --target-port 3000 \
  --ingress internal \
  --min-replicas 1 \
  --max-replicas 10

# Deploy anonymizer
az containerapp create \
  --name presidio-anonymizer \
  --resource-group rg-presidio \
  --environment presidio-env \
  --image mcr.microsoft.com/presidio-anonymizer:latest \
  --target-port 3000 \
  --ingress internal \
  --min-replicas 1 \
  --max-replicas 10

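Using --ingress internal means the Presidio services aren't exposed to the internet; only other apps in the same ACA environment can reach them. Your /deidentify and /reidentify wrapper sits in the same environment and calls the analyzer and anonymizer over that internal network. APIM lives outside the environment, so the wrapper needs its own private path. One option is to create the environment on your VNet with --internal-only and --infrastructure-subnet-resource-id, then give only the wrapper external ingress. That makes the wrapper reachable from the VNet, where APIM is connected, but not from the internet.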
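Inside the environment, the wrapper calls the two containers through their REST APIs: the analyzer's POST /analyze and the anonymizer's POST /anonymize.

The stdlib sketch below shows that call chain. The base URLs assume ACA's by-app-name addressing within one environment and are placeholders to verify against your setup. The helper names are hypothetical. The anonymizer payload keeps only the four fields it needs from each analyzer result.

```python
import json
import urllib.request
from typing import Dict, List

# Assumed internal base URLs (app names from the az containerapp commands above)
ANALYZER_URL = "http://presidio-analyzer"
ANONYMIZER_URL = "http://presidio-anonymizer"


def analyze_payload(text: str, language: str = "en") -> Dict:
    return {"text": text, "language": language}


def anonymize_payload(text: str, analyzer_results: List[Dict], key: str) -> Dict:
    # Keep only the fields the anonymizer needs from each detection
    trimmed = [
        {field: result[field] for field in ("entity_type", "start", "end", "score")}
        for result in analyzer_results
    ]
    return {
        "text": text,
        "analyzer_results": trimmed,
        # The key travels in the request body, so keep this hop internal
        "anonymizers": {"DEFAULT": {"type": "encrypt", "key": key}},
    }


def post_json(url: str, payload: Dict, timeout: float = 5.0):
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def deidentify_via_rest(text: str, key: str) -> Dict:
    """Analyze, then encrypt; returns the anonymizer's {"text", "items"} result."""
    results = post_json(f"{ANALYZER_URL}/analyze", analyze_payload(text))
    return post_json(f"{ANONYMIZER_URL}/anonymize", anonymize_payload(text, results, key))
```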

Kubernetes

For enterprise deployments with existing AKS clusters, Presidio publishes Helm charts. The setup is more involved but gives you full control over resource limits, HPA scaling, pod affinity, and network policies.

Production Hardening

Logging and Monitoring

Log every detection for audit trails, but never log the actual PII values. Log the entity types, confidence scores, and positions.

import logging

from presidio_anonymizer.entities import OperatorConfig

logger = logging.getLogger("presidio-guardrail")

# Reuses analyzer, anonymizer, and ENCRYPTION_KEY from the proxy above
def scrub_with_logging(text: str, request_id: str) -> tuple:
    results = analyzer.analyze(text=text, language="en")

    # Log a detection summary (type, score, offsets), never the matched text
    for r in results:
        logger.info(
            "request=%s entity_type=%s score=%.2f start=%d end=%d",
            request_id, r.entity_type, r.score, r.start, r.end,
        )

    logger.info("request=%s total_entities=%d", request_id, len(results))

    anonymized = anonymizer.anonymize(
        text=text,
        analyzer_results=results,
        operators={"DEFAULT": OperatorConfig("encrypt", {"key": ENCRYPTION_KEY})}
    )

    return anonymized.text, anonymized.items
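Per-entity log lines are useful for debugging. For dashboards and audits, a single structured record per request is often easier to aggregate.

The sketch below builds such a record from analyzer results. It reads only `.entity_type` and `.score`, never the matched text. `detection_summary` is a hypothetical helper, and the `Detection` namedtuple is a stand-in for Presidio's `RecognizerResult` so the example runs on its own.

```python
import json
import logging
from collections import Counter, namedtuple
from typing import Iterable

logger = logging.getLogger("presidio-guardrail")


def detection_summary(request_id: str, results: Iterable) -> dict:
    """Build one audit record per request from analyzer results.

    Only entity types, counts, and the lowest score are recorded; the
    matched text is never read, so the record carries no PII values.
    """
    results = list(results)
    scores = [r.score for r in results]
    return {
        "request_id": request_id,
        "total_entities": len(results),
        "entity_counts": dict(sorted(Counter(r.entity_type for r in results).items())),
        "min_score": round(min(scores), 2) if scores else None,
    }


def log_detection_summary(request_id: str, results: Iterable) -> None:
    logger.info(json.dumps(detection_summary(request_id, results)))


# Stand-in for presidio_analyzer.RecognizerResult in this example
Detection = namedtuple("Detection", "entity_type start end score")

summary = detection_summary("req-42", [
    Detection("PERSON", 0, 10, 0.85),
    Detection("EMAIL_ADDRESS", 12, 31, 1.0),
    Detection("PERSON", 40, 45, 0.6),
])
print(summary)
```

The `min_score` field makes it easy to spot requests that were scrubbed on weak evidence. Those are the first place to look when tuning thresholds.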

False Positive Handling

Presidio will occasionally flag non-PII as PII. A place name like "Jordan" or "Austin" might be detected as a person name. A product SKU might match a phone number pattern. For production systems, build a feedback mechanism that turns reviewed false positives into an allow list:

# Maintain an allow list of known false positives
FALSE_POSITIVE_ALLOWLIST = {
    "PERSON": ["Jordan", "Phoenix", "Austin"],  # Place names that are also first names
    "PHONE_NUMBER": ["555-0100"],  # Known test number
}

def filter_false_positives(text: str, results: list) -> list:
    filtered = []
    for r in results:
        value = text[r.start:r.end].strip()
        allowlist = FALSE_POSITIVE_ALLOWLIST.get(r.entity_type, [])
        if value not in allowlist:
            filtered.append(r)
    return filtered

Performance Considerations

Presidio's analyzer is CPU-intensive, especially with the large spaCy model. For high-throughput workloads:

Keep the analyzer engine warm. Initializing AnalyzerEngine() loads the NLP model, which takes a few seconds. Do it once at startup, not per request.
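One way to guarantee a single load per process, even when several request threads ask for the engine at once, is a lock-guarded lazy factory. The sketch below takes that approach; `load_once`, `get_analyzer`, and `_make_analyzer` are hypothetical names. Multi-process servers such as gunicorn still load one engine per worker process.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def load_once(factory: Callable[[], T]) -> Callable[[], T]:
    """Wrap an expensive factory so it runs at most once per process."""
    lock = threading.Lock()
    instance = []  # holds the single created object once it exists

    def get() -> T:
        if not instance:
            with lock:
                if not instance:  # re-check after acquiring the lock
                    instance.append(factory())
        return instance[0]

    return get


def _make_analyzer():
    # Imported lazily so this module loads even where Presidio isn't installed
    from presidio_analyzer import AnalyzerEngine
    return AnalyzerEngine()


get_analyzer = load_once(_make_analyzer)
# Call get_analyzer() from your app's startup hook so the model loads before
# the first request arrives; request handlers then reuse the same engine.
```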

Set a score threshold. Low-confidence detections mostly add false positives, and every extra span is more work for the anonymizer. Start with 0.5 and adjust based on your accuracy requirements.
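Presidio's `AnalyzerEngine.analyze` accepts a single global threshold, e.g. `analyzer.analyze(text=text, language="en", score_threshold=0.5)`. If one entity type needs a stricter bar, a post-filter can apply per-entity overrides.

The sketch below is that post-filter. `apply_thresholds` and the override values are hypothetical; tune them from your own evaluation data. `Detection` is a stand-in for `RecognizerResult`, which exposes the same `.entity_type` and `.score` attributes.

```python
from collections import namedtuple
from typing import Dict, Iterable, List

# Stand-in for presidio_analyzer.RecognizerResult; only .entity_type and
# .score are read below.
Detection = namedtuple("Detection", "entity_type start end score")

DEFAULT_THRESHOLD = 0.5
# Hypothetical override: demand more confidence for PERSON, a common source
# of false positives.
ENTITY_THRESHOLDS: Dict[str, float] = {"PERSON": 0.7}


def apply_thresholds(results: Iterable, default: float = DEFAULT_THRESHOLD,
                     overrides: Dict[str, float] = ENTITY_THRESHOLDS) -> List:
    """Keep results whose score meets the threshold for their entity type."""
    return [r for r in results if r.score >= overrides.get(r.entity_type, default)]


results = [
    Detection("PERSON", 0, 5, 0.6),
    Detection("PERSON", 10, 15, 0.85),
    Detection("EMAIL_ADDRESS", 20, 30, 0.5),
    Detection("PHONE_NUMBER", 35, 47, 0.3),
]
print(apply_thresholds(results))
```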

Use the right NLP model size. en_core_web_lg is more accurate but slower. en_core_web_sm is faster but misses more entities. Profile your specific workload to find the right tradeoff.
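Presidio chooses the spaCy model through an NLP configuration passed to `NlpEngineProvider`. The sketch below builds that configuration for either model and adds a rough timing helper for comparing them on your own sample prompts.

`spacy_nlp_configuration`, `build_analyzer`, and `mean_latency_ms` are hypothetical helpers. Calling `build_analyzer` requires presidio-analyzer and the chosen spaCy model to be installed.

```python
import time
from typing import Dict, List


def spacy_nlp_configuration(model_name: str = "en_core_web_sm") -> Dict:
    """NLP configuration for one English spaCy model."""
    return {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": "en", "model_name": model_name}],
    }


def build_analyzer(model_name: str = "en_core_web_sm"):
    # Lazy imports; the model must be installed first, for example with
    # `python -m spacy download en_core_web_sm`
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration=spacy_nlp_configuration(model_name))
    return AnalyzerEngine(nlp_engine=provider.create_engine(), supported_languages=["en"])


def mean_latency_ms(analyzer, samples: List[str], language: str = "en") -> float:
    """Average wall-clock milliseconds per analyze() call over the samples."""
    if not samples:
        return 0.0
    start = time.perf_counter()
    for text in samples:
        analyzer.analyze(text=text, language=language)
    return (time.perf_counter() - start) * 1000 / len(samples)
```

Run `mean_latency_ms` for both models on a few hundred real (or realistic) prompts, and compare detections on the same set. Latency alone doesn't tell you what the small model misses.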

Cache recognizer results for repeated text. If the same support template gets processed thousands of times, cache the detection results and only run the anonymizer.
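A small LRU cache in front of the analyzer is enough for repeated templates. The sketch below keys entries by a SHA-256 digest of the text, so raw prompts are not kept in memory as dictionary keys.

`AnalysisCache` is a hypothetical class written for this example, and the entry limit is an arbitrary default. It hands out deep copies so downstream code can't mutate cached results.

```python
import copy
import hashlib
import threading
from collections import OrderedDict
from typing import Callable, List


class AnalysisCache:
    """LRU cache of analyzer results, keyed by a hash of the input text."""

    def __init__(self, analyze: Callable[[str], List], max_entries: int = 10_000):
        self._analyze = analyze
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, List]" = OrderedDict()
        self._lock = threading.Lock()  # request threads share one cache

    def get(self, text: str) -> List:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                return copy.deepcopy(self._entries[key])
        results = self._analyze(text)  # run outside the lock
        with self._lock:
            self._entries[key] = results
            self._entries.move_to_end(key)
            if len(self._entries) > self._max_entries:
                self._entries.popitem(last=False)  # evict least recently used
        return copy.deepcopy(results)


# Usage with the analyzer from the proxy:
# cache = AnalysisCache(lambda text: analyzer.analyze(text=text, language="en"))
# results = cache.get(prompt)
```

Because Presidio's results store offsets and scores rather than the matched text, cached entries hold little sensitive data. The prompt itself still passes through memory on every call.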

When the guardrail runs inside APIM, two more things matter. Set a sane timeout on the send-request calls so a slow Presidio response can't hang the whole model call, and decide how to fail. Failing closed (block the request if Presidio is unreachable) protects PII at the cost of availability. Failing open does the reverse. For regulated workloads, fail closed and put Presidio behind enough replicas that it rarely comes to that.
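In APIM, the send-request timeout and the on-error section enforce this decision. For the in-process Python proxy, the same choice can be made explicit in code.

The sketch below is hypothetical: `scrub_or_block`, `GuardrailUnavailable`, the 2-second default, and the pool size are illustrative choices, and `scrub` stands for a function like `scrub_prompt`. It fails closed unless the caller opts into `fail_open=True`.

```python
import concurrent.futures
from typing import Callable, Tuple


class GuardrailUnavailable(Exception):
    """Raised when PII scrubbing can't complete and the policy is fail-closed."""


_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def scrub_or_block(text: str, scrub: Callable[[str], Tuple[str, object]],
                   timeout_s: float = 2.0, fail_open: bool = False):
    """Run `scrub` with a deadline; block the request if it fails or times out."""
    future = _executor.submit(scrub, text)
    try:
        return future.result(timeout=timeout_s)
    except Exception as exc:  # includes timeouts and errors raised by scrub
        # A timed-out call keeps running in its worker thread; only the wait ends.
        if fail_open:
            # Availability over privacy: forward the original text unscrubbed
            return text, None
        raise GuardrailUnavailable("PII scrubbing failed; request blocked") from exc
```

The caller turns `GuardrailUnavailable` into an error response instead of sending the raw prompt to the model. This mirrors the fail-closed behavior recommended above for regulated workloads.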

Series Wrap-Up

Over these five parts we've gone from zero to a production-ready PII detection and anonymization pipeline. You can install and run Presidio, detect PII in text, images, and structured data, build custom recognizers for your organization's specific patterns, choose the right anonymization strategy for each use case, and deploy Presidio as an LLM guardrail at the APIM edge that keeps sensitive data off third-party infrastructure.

The framework is actively maintained, the Docker images are production-ready, and the extensibility model (custom recognizers, custom operators, external NLP services) means it adapts to whatever compliance requirements your organization throws at it.


This is Part 5 of the Hands-On Microsoft Presidio series. I write about PII detection, AI infrastructure, and building with Claude Code on Dev.to.
