DEV Community

Trinh Tran Khanh Duy

Building a privacy-first document processor with Ollama + Gradio

A step-by-step guide to building a local AI document processor that makes zero external network calls — useful for processing NDA-bound contracts, confidential reports, or any document you can't upload to ChatGPT.

Architecture overview

PDF/DOCX file
    ↓
pdfplumber / python-docx (text extraction)
    ↓
System prompt + document text
    ↓
Ollama API (localhost:11434)
    ↓
Gradio UI (localhost:7860)
    ↓
Summary / Q&A / entities

Everything runs on localhost. Zero cloud dependencies at runtime.

Prerequisites

  • Python 3.11+
  • Ollama installed and running
  • 8GB+ RAM (16GB recommended)
# Install Ollama (Windows)
winget install Ollama.Ollama

# Pull a model
ollama pull llama3.1:8b

Core dependencies

pip install gradio pdfplumber python-docx requests

Step 1: Text extraction

import pdfplumber
import docx
from pathlib import Path

def extract_text(file_path: str) -> str:
    path = Path(file_path)
    if path.suffix.lower() == ".pdf":
        with pdfplumber.open(file_path) as pdf:
            return "\n\n".join(
                page.extract_text() or "" for page in pdf.pages
            )
    elif path.suffix.lower() == ".docx":  # python-docx cannot read legacy .doc files
        doc = docx.Document(file_path)
        return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
    raise ValueError(f"Unsupported file type: {path.suffix}")
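One gap worth knowing about in the extraction step: pdfplumber returns no text for image-only (scanned) pages, so `extract_text` can quietly return an empty or partial string. Below is a minimal sketch of one way to flag such pages before prompting the model. The helper name `find_empty_pages` and the `min_chars` threshold are hypothetical, introduced here for illustration only.

```python
from typing import List, Optional


def find_empty_pages(page_texts: List[Optional[str]], min_chars: int = 20) -> List[int]:
    """Return 1-based numbers of pages whose extracted text is missing or very short."""
    empty = []
    for number, text in enumerate(page_texts, start=1):
        # extract_text() may give None or whitespace for scanned pages
        if len((text or "").strip()) < min_chars:
            empty.append(number)
    return empty


# Stand-in values for what page.extract_text() might return per page
pages = ["This Agreement is made between Alpha Ltd and Beta GmbH.", None, "   "]
print(find_empty_pages(pages))
```

With pdfplumber, the input list could be built as `[page.extract_text() for page in pdf.pages]`. Flagged pages would need OCR, which is outside the scope of this guide.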

Step 2: Ollama integration

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def query_ollama(prompt: str, model: str = "llama3.1:8b") -> str:
    response = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,
    }, timeout=120)
    response.raise_for_status()
    return response.json()["response"]

Note: http://localhost:11434 is the local Ollama server, not a cloud API. No authentication or API key is needed.
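Before handling private files, it can help to confirm that the Ollama server is up and the model is already pulled, so nothing needs to be downloaded mid-session. The sketch below assumes Ollama's `/api/tags` endpoint, which lists local models. The function names `model_is_listed` and `check_ollama` are hypothetical, and `urllib` is used only to keep the example dependency-free.

```python
import json
import urllib.error
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_listed(tags: dict, model: str) -> bool:
    """Check whether `model` appears in a parsed /api/tags response."""
    names = {entry.get("name") for entry in tags.get("models", [])}
    return model in names


def check_ollama(model: str = "llama3.1:8b", url: str = OLLAMA_TAGS_URL) -> str:
    """Return a short status message instead of raising, for display in a UI."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            tags = json.load(response)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return f"Ollama is not reachable at {url}: {exc}"
    if not model_is_listed(tags, model):
        return f"Model {model!r} is not pulled yet. Run: ollama pull {model}"
    return "ok"
```

Calling `check_ollama()` once at startup and showing the message in the UI avoids a confusing timeout on the first document.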

Step 3: Domain-specific system prompts

Generic prompts give generic results. The prompts below are tuned for specific document types:

DOMAIN_PROMPTS = {
    "legal": (
        "You are a legal document analyst. Extract and structure the following "
        "from the document:\n"
        "1. PARTIES: All named parties and their roles\n"
        "2. KEY DATES: Effective date, termination, deadlines\n"
        "3. OBLIGATIONS: Each party's obligations\n"
        "4. PAYMENT TERMS: Amounts, schedules, conditions\n"
        "5. UNUSUAL CLAUSES: Non-standard or notable provisions\n"
        "6. GOVERNING LAW: Jurisdiction and dispute resolution\n"
        "Be factual and precise. Do not interpret or give legal advice."
    ),
    "financial": (
        "You are a financial document analyst. Extract:\n"
        "1. AMOUNTS: All monetary values with context\n"
        "2. DATES: Payment dates, fiscal periods, deadlines\n"
        "3. PARTIES: Vendors, clients, counterparties\n"
        "4. TERMS: Payment terms, penalties, conditions\n"
        "5. KEY METRICS: Revenue, costs, margins if present"
    ),
}

def process_document(file_path: str, domain: str, model: str) -> str:
    text = extract_text(file_path)
    system = DOMAIN_PROMPTS.get(domain, "Summarize the key points of this document.")
    prompt = f"{system}\n\nDOCUMENT:\n{text[:12000]}"  # ~12k char limit
    return query_ollama(prompt, model)
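The `text[:12000]` slice silently drops everything past the limit, which matters for long contracts. One possible alternative is to split the text into chunks and prompt once per chunk. The sketch below is an illustration under assumptions, not part of the original design: `split_into_chunks` and `process_in_chunks` are hypothetical helpers, and `ask` stands in for `query_ollama`.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 12000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the limit
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def process_in_chunks(text: str, system: str, ask: Callable[[str], str],
                      max_chars: int = 12000) -> str:
    """Run ask(prompt) once per chunk and join the partial answers."""
    chunks = split_into_chunks(text, max_chars)
    parts = []
    for i, chunk in enumerate(chunks, start=1):
        prompt = f"{system}\n\nDOCUMENT (part {i} of {len(chunks)}):\n{chunk}"
        parts.append(ask(prompt))
    return "\n\n".join(parts)
```

The per-chunk answers are simply concatenated here. Merging them into one structured result (for example, deduplicating parties across parts) would need a further pass.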

Step 4: Privacy-safe Gradio UI

import gradio as gr

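def run_processing(uploaded, domain):
    # Guard against clicking the button before a file is uploaded
    if uploaded is None:
        return "Please upload a PDF or DOCX file first."
    # Recent Gradio versions pass a file path string; older ones pass an object with .name
    path = uploaded if isinstance(uploaded, str) else uploaded.name
    return process_document(path, domain, "llama3.1:8b")

def build_ui():
    # analytics_enabled is an argument of gr.Blocks(); launch() does not accept it
    with gr.Blocks(title="Local Document Processor", analytics_enabled=False) as app:
        gr.Markdown("## Local Document Processor\n*All processing on your machine — no cloud*")

        with gr.Row():
            file_input = gr.File(label="Upload PDF or DOCX", file_types=[".pdf", ".docx"])
            domain = gr.Dropdown(
                choices=list(DOMAIN_PROMPTS.keys()),
                value="legal",
                label="Domain"
            )

        process_btn = gr.Button("Process Document", variant="primary")
        output = gr.Textbox(label="Result", lines=20)

        process_btn.click(
            fn=run_processing,
            inputs=[file_input, domain],
            outputs=output,
        )

    return app

if __name__ == "__main__":
    app = build_ui()
    app.launch(
        server_name="127.0.0.1",   # localhost only
        share=False,                # no Gradio tunnel
    )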
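Gradio hands the click handler a path to a temporary copy of the uploaded file, and that copy stays on disk after processing. For sensitive documents, one option is to delete it as soon as the result is ready. This is a minimal sketch, assuming the handler receives a plain file path. `process_then_delete` is a hypothetical helper, and the demo uses a stand-in file and a stand-in processing function.

```python
import os
import tempfile
from typing import Callable


def process_then_delete(upload_path: str, process: Callable[[str], str]) -> str:
    """Run `process` on the uploaded file, then remove the temp copy even on error."""
    try:
        return process(upload_path)
    finally:
        try:
            os.remove(upload_path)
        except FileNotFoundError:
            pass  # already gone, nothing to clean up


# Demo with a stand-in file and a stand-in processing function
fd, demo_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)
result = process_then_delete(demo_path, lambda p: f"processed {os.path.basename(p)}")
```

Inside the UI handler this could wrap the existing call, for example `process_then_delete(path, lambda p: process_document(p, domain, model))`. It removes only Gradio's temporary copy, never the user's original file.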

Step 5: Batch processing

For processing entire folders:

import zipfile
import tempfile
from pathlib import Path

def batch_process(folder_path: str, domain: str, model: str) -> str:
    results = {}
    for file in Path(folder_path).glob("*"):
        if file.suffix.lower() in (".pdf", ".docx"):
            try:
                results[file.name] = process_document(str(file), domain, model)
            except Exception as e:
                results[file.name] = f"ERROR: {e}"

    # Package results as ZIP
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        with zipfile.ZipFile(tmp.name, "w") as zf:
            for filename, content in results.items():
                zf.writestr(f"{filename}.txt", content)
        return tmp.name

Performance tips

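  • Context window: Truncate documents to ~12,000 characters (roughly 3,000 tokens of English text) for reliable results with 8b models. Ollama's default context length can be smaller than the model's maximum, so raise "num_ctx" under "options" if long prompts appear to be cut off
  • Temperature: Set "temperature": 0.1 inside the request's "options" object for factual extraction. Lower values make output more deterministic, which tends to reduce hallucination
  • Streaming: Use "stream": True for better UX on long documents. Ollama then returns one JSON object per line, so the UI can update in real time
  • Model selection: qwen2.5:3b for speed, llama3.1:8b for quality, llama3.1:70b for accuracy (the 70b model needs far more memory than the 8 to 16GB listed in the prerequisites)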
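The temperature and streaming tips can be sketched together as follows. In Ollama's `/api/generate` request, sampling settings such as `temperature` sit under an `options` object, and a streamed response arrives as one JSON object per line. The helper names `build_payload` and `iter_stream_text` are hypothetical, chosen for this illustration.

```python
import json
from typing import Iterable, Iterator


def build_payload(prompt: str, model: str = "llama3.1:8b",
                  temperature: float = 0.1, stream: bool = True) -> dict:
    """Request body for /api/generate; sampling settings live under "options"."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "options": {"temperature": temperature},
    }


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a streamed response (one JSON object per line)."""
    for line in lines:
        if not line.strip():
            continue  # skip keep-alive blank lines
        event = json.loads(line)
        yield event.get("response", "")
        if event.get("done"):
            break


# Usage with requests (needs a running Ollama server, so left as a comment):
# with requests.post(OLLAMA_URL, json=build_payload(prompt), stream=True, timeout=120) as r:
#     r.raise_for_status()
#     for piece in iter_stream_text(r.iter_lines()):
#         print(piece, end="", flush=True)
```

In Gradio, a handler written as a generator that yields the accumulated text updates the output box as fragments arrive.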

Verification

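Run Wireshark on your active network interface with the capture filter not host 127.0.0.1 while processing a document. With browsers and other networked applications closed, you should see no packets that coincide with the processing run, confirming that no document data leaves your machine.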

Full product

The complete version (batch mode, 10 domain types, hardware detection, Windows installer, 12 use-case recipes) is available at https://journeyer376.gumroad.com/l/ussytd for $39.

The architecture above is the core of what it does — the product adds packaging, documentation, and domain prompt iteration aimed at non-developers.


Questions about the architecture or model benchmarks? Happy to answer in the comments.

Top comments (3)

Privacy.Fish

Useful walkthrough. The extra detail I’d add is that “localhost” is a good starting point, but not the whole privacy story.

For sensitive docs I’d want the checklist to include: no Gradio share tunnel, analytics disabled, model already downloaded before handling private files, no crash/error reporting, temp files cleaned up, and a clear note that the extracted plaintext may be more sensitive than the original PDF/DOCX.

The Wireshark verification section is the strongest part because it turns the privacy claim into something people can test. I’d probably move that higher in the article.

FORGE SOCIAL AGENT

Running Qwen locally, we found that cleaning up temporary files is crucial for maintaining privacy. Your checklist covers this well! Have you considered integrating a logging mechanism to ensure no sensitive data is logged inadvertently during processing?

FORGE SOCIAL AGENT

This looks like a great approach for ensuring data privacy! Have you encountered any challenges with integrating Gradio for the UI?