If you run a small business, you know the drill. Every month you sit down with your bank statements, open QuickBooks, and start the tedious process of matching transactions. Did that $47.99 charge from "SQ *COFFEE HOUSE LLC" match the office supplies entry or the client meeting expense? Multiply that ambiguity by hundreds of transactions and you have a full afternoon gone.
I built a pipeline that handles this automatically. It pulls transactions from Plaid, categorizes them with AI, reconciles them against expected journal entries, and flags anything it is not confident about. The whole thing runs on Python with Supabase as the data store.
This article walks through the architecture and the code. I am Parker Gawne, founder of Syntora, and this came out of a real project we built for accounting automation.
Architecture Overview
The pipeline has four stages:
- Pull - Plaid API syncs bank transactions into Supabase
- Categorize - Each transaction gets classified. Ambiguous ones go through Claude for AI categorization.
- Reconcile - Categorized transactions get matched against expected entries from QuickBooks
- Review - High-confidence matches auto-approve. Low-confidence matches get flagged for human review.
Every step writes to Supabase so there is a full audit trail. Nothing gets deleted. Nothing gets overwritten. Financial data demands that level of care.
Setting Up the Plaid Client
Plaid gives you access to bank transaction data through a clean API. You connect a bank account once through their Link flow, then pull transactions on a schedule.
import plaid
from plaid.api import plaid_api
from plaid.model.transactions_sync_request import TransactionsSyncRequest
import os
def get_plaid_client():
    configuration = plaid.Configuration(
        host=plaid.Environment.Production,
        api_key={
            "clientId": os.environ["PLAID_CLIENT_ID"],
            "secret": os.environ["PLAID_SECRET"],
        },
    )
    api_client = plaid.ApiClient(configuration)
    return plaid_api.PlaidApi(api_client)
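The Link flow runs in the browser and hands back a short-lived public token. The server exchanges it once for the long-lived access token that the sync code below uses. Here is a minimal sketch of that exchange, assuming the standard plaid-python request model; where you store the token is up to you (the pipeline later reads it from a plaid_accounts table).

from plaid.model.item_public_token_exchange_request import ItemPublicTokenExchangeRequest

def exchange_public_token(public_token: str) -> str:
    # One-time exchange after the user finishes the Link flow in the browser
    client = get_plaid_client()
    response = client.item_public_token_exchange(
        ItemPublicTokenExchangeRequest(public_token=public_token)
    )
    # Persist this access token wherever your scheduled sync can read it
    return response.access_token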
I use the Transactions Sync endpoint rather than the older Transactions Get. Sync gives you a cursor-based approach so you only pull new transactions each run instead of re-fetching everything.
Syncing Transactions
from supabase import create_client
from datetime import datetime

supabase = create_client(
    os.environ["SUPABASE_URL"],
    os.environ["SUPABASE_SERVICE_KEY"],
)
def sync_transactions(access_token: str, account_id: str):
    client = get_plaid_client()

    # Get the last cursor from Supabase (empty string on the first run)
    result = supabase.table("plaid_sync_cursors").select("cursor").eq(
        "account_id", account_id
    ).execute()
    cursor = result.data[0]["cursor"] if result.data else ""

    has_more = True
    all_added = []
    while has_more:
        request = TransactionsSyncRequest(
            access_token=access_token,
            cursor=cursor,
        )
        response = client.transactions_sync(request)
        all_added.extend(response.added)
        has_more = response.has_more
        cursor = response.next_cursor

    # Store transactions in Supabase, keyed on the Plaid transaction ID
    for txn in all_added:
        supabase.table("transactions").upsert({
            "plaid_transaction_id": txn.transaction_id,
            "account_id": account_id,
            "amount": float(txn.amount),
            "date": txn.date.isoformat(),
            "merchant_name": txn.merchant_name,
            "name": txn.name,
            "category": txn.personal_finance_category.primary if txn.personal_finance_category else None,
            "pending": txn.pending,
            "status": "pending_categorization",
            "synced_at": datetime.utcnow().isoformat(),
        }).execute()

    # Update cursor so the next run only pulls new transactions
    supabase.table("plaid_sync_cursors").upsert({
        "account_id": account_id,
        "cursor": cursor,
    }).execute()

    return len(all_added)
Plaid provides its own category field, but it is too broad for accounting purposes. "Food and Drink" does not tell you whether it was a client dinner or a personal lunch. That is where AI categorization comes in.
AI Categorization with Claude
Most transactions are straightforward. "ADOBE CREATIVE CLOUD" is obviously a software subscription. "AMZN MKTP US" is probably office supplies, but it could be anything. The ambiguous ones are where Claude earns its keep.
I define a set of accounting categories that map to my chart of accounts, then ask Claude to classify each transaction.
import anthropic
import json

claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

ACCOUNTING_CATEGORIES = [
    "software_subscriptions",
    "office_supplies",
    "travel",
    "meals_entertainment",
    "professional_services",
    "advertising",
    "utilities",
    "insurance",
    "payroll",
    "contractor_payments",
    "client_reimbursable",
    "owner_distribution",
    "other",
]

def categorize_transaction(txn: dict) -> dict:
    prompt = f"""You are a bookkeeper categorizing bank transactions for a small tech consultancy.

Given this transaction, classify it into exactly one category and provide a confidence score from 0.0 to 1.0.

Transaction:
- Merchant: {txn["merchant_name"]}
- Description: {txn["name"]}
- Amount: ${txn["amount"]}
- Date: {txn["date"]}
- Plaid Category: {txn.get("category", "unknown")}

Valid categories: {", ".join(ACCOUNTING_CATEGORIES)}

Respond in JSON format:
{{"category": "category_name", "confidence": 0.85, "reasoning": "brief explanation"}}

Rules:
- If the merchant name is ambiguous (e.g., Amazon, Square), use the amount and Plaid category as context clues
- If you are not confident, say so with a low score rather than guessing
- "other" should only be used when nothing else fits"""

    response = claude.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.content[0].text)
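Models occasionally wrap the JSON in a markdown fence or hand back something that does not parse at all. The listing above assumes clean output; in practice I would wrap the call so a bad response degrades to a zero-confidence result instead of crashing the run. A sketch of that guard (the wrapper name is my own):

def categorize_with_fallback(txn: dict) -> dict:
    # Malformed or unexpected model output should never take down the pipeline
    try:
        result = categorize_transaction(txn)
        if result.get("category") not in ACCOUNTING_CATEGORIES:
            raise ValueError(f"unexpected category: {result.get('category')}")
        return result
    except (json.JSONDecodeError, KeyError, ValueError) as exc:
        # Zero confidence pushes the transaction into the manual review queue downstream
        return {"category": "other", "confidence": 0.0, "reasoning": f"parse failure: {exc}"}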
The confidence score is critical. Two thresholds split the results into three buckets:
- 0.85 and above: Auto-approve. The categorization gets applied directly.
- Between 0.5 and 0.85: Needs review. Stored with a "needs_review" status.
- Below 0.5: Flagged as uncertain. These go into a manual review queue.
def process_pending_transactions():
    result = supabase.table("transactions").select("*").eq(
        "status", "pending_categorization"
    ).execute()

    for txn in result.data:
        classification = categorize_transaction(txn)
        confidence = classification["confidence"]

        if confidence >= 0.85:
            status = "categorized"
        elif confidence >= 0.5:
            status = "needs_review"
        else:
            status = "uncertain"

        supabase.table("transactions").update({
            "ai_category": classification["category"],
            "ai_confidence": confidence,
            "ai_reasoning": classification["reasoning"],
            "status": status,
            "categorized_at": datetime.utcnow().isoformat(),
        }).eq("plaid_transaction_id", txn["plaid_transaction_id"]).execute()
In practice, about 70% of transactions land above the 0.85 threshold. Another 20% fall into the review bucket. Only about 10% are truly ambiguous. That is a significant time savings compared to manually categorizing everything.
Reconciliation Against Expected Entries
Once transactions are categorized, they need to be matched against the entries you expect to see. Recurring charges like software subscriptions, payroll, and contractor payments should show up on predictable schedules.
I keep a table of expected entries in Supabase that gets populated from QuickBooks recurring transactions.
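The QuickBooks export itself is outside the scope of this article. Once the recurring templates are out of QuickBooks, seeding a month of expected entries is just a copy into that table. A minimal sketch, assuming the templates have already been pulled into a list of dicts (the function and field names here are my own):

def seed_expected_entries(period: str, recurring_templates: list[dict]) -> None:
    # Copy recurring templates into expected_entries for one period, e.g. "2025-06"
    for template in recurring_templates:
        supabase.table("expected_entries").insert({
            "period": period,
            "category": template["category"],
            "expected_amount": template["expected_amount"],
            "merchant_pattern": template["merchant_pattern"],
            "description": template.get("description"),
        }).execute()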
def reconcile_transactions():
    # Get categorized transactions that have not been reconciled
    txns = supabase.table("transactions").select("*").eq(
        "status", "categorized"
    ).execute()

    # Get expected entries for this period
    expected = supabase.table("expected_entries").select("*").eq(
        "period", datetime.utcnow().strftime("%Y-%m")
    ).execute()

    for txn in txns.data:
        match = find_best_match(txn, expected.data)

        if match:
            supabase.table("journal_entries").insert({
                "transaction_id": txn["plaid_transaction_id"],
                "expected_entry_id": match["id"],
                "amount": txn["amount"],
                "category": txn["ai_category"],
                "match_confidence": match["score"],
                "status": "matched",
                "created_at": datetime.utcnow().isoformat(),
            }).execute()
            supabase.table("transactions").update({
                "status": "reconciled",
            }).eq("plaid_transaction_id", txn["plaid_transaction_id"]).execute()
        else:
            # No match found, flag as new/unexpected
            supabase.table("journal_entries").insert({
                "transaction_id": txn["plaid_transaction_id"],
                "expected_entry_id": None,
                "amount": txn["amount"],
                "category": txn["ai_category"],
                "match_confidence": 0.0,
                "status": "unmatched",
                "created_at": datetime.utcnow().isoformat(),
            }).execute()

def find_best_match(txn: dict, expected_entries: list) -> dict | None:
    best = None
    best_score = 0.0

    for entry in expected_entries:
        score = 0.0

        # Category match
        if txn["ai_category"] == entry["category"]:
            score += 0.4

        # Amount within 5% tolerance
        if entry["expected_amount"]:
            diff = abs(txn["amount"] - entry["expected_amount"])
            tolerance = entry["expected_amount"] * 0.05
            if diff <= tolerance:
                score += 0.4
            elif diff <= tolerance * 2:
                score += 0.2

        # Merchant name similarity
        if txn.get("merchant_name") and entry.get("merchant_pattern"):
            if entry["merchant_pattern"].lower() in txn["merchant_name"].lower():
                score += 0.2

        if score > best_score and score >= 0.6:
            best_score = score
            best = {"id": entry["id"], "score": score}

    return best
The matching is intentionally conservative. With a 0.6 cutoff, no single signal is enough on its own; at least two of the three signals (category, amount, merchant pattern) have to line up before anything auto-matches. Anything that does not meet that bar goes into the unmatched queue.
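To make the weights concrete, here is how a typical recurring charge scores (the values are illustrative):

# A $52.40 Adobe charge against an expected ~$52.99 software subscription
txn = {
    "ai_category": "software_subscriptions",
    "amount": 52.40,
    "merchant_name": "ADOBE CREATIVE CLOUD",
}
entry = {
    "id": "expected-entry-uuid",  # illustrative placeholder
    "category": "software_subscriptions",
    "expected_amount": 52.99,
    "merchant_pattern": "adobe",
}
find_best_match(txn, [entry])
# Category match (0.4) + amount within 5% (0.4) + merchant substring (0.2) = 1.0
# Returns {"id": "expected-entry-uuid", "score": 1.0}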
The Supabase Schema
The data model is straightforward. Four tables handle the core workflow.
-- Transactions from Plaid
create table transactions (
plaid_transaction_id text primary key,
account_id text not null,
amount numeric not null,
date date not null,
merchant_name text,
name text,
category text,
pending boolean default false,
status text default 'pending_categorization',
ai_category text,
ai_confidence numeric,
ai_reasoning text,
synced_at timestamptz,
categorized_at timestamptz
);
-- Expected recurring entries from QuickBooks
create table expected_entries (
id uuid primary key default gen_random_uuid(),
period text not null,
category text not null,
expected_amount numeric,
merchant_pattern text,
description text
);
-- Reconciled journal entries
create table journal_entries (
id uuid primary key default gen_random_uuid(),
transaction_id text references transactions(plaid_transaction_id),
expected_entry_id uuid references expected_entries(id),
amount numeric not null,
category text not null,
match_confidence numeric,
status text default 'pending',
created_at timestamptz default now()
);
-- Plaid sync cursors for incremental sync
create table plaid_sync_cursors (
account_id text primary key,
cursor text not null
);
Every AI categorization stores the reasoning alongside the confidence score. When a human reviewer looks at a flagged transaction, they can see why the system was uncertain and make a faster decision.
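Pulling that review queue is a single query. A small helper I would add on the review side, not shown in the pipeline above, surfaces the least confident items first along with the stored reasoning:

def get_review_queue(limit: int = 50) -> list[dict]:
    # Flagged transactions, least confident first, with the AI reasoning attached
    result = supabase.table("transactions").select(
        "plaid_transaction_id, merchant_name, amount, ai_category, ai_confidence, ai_reasoning"
    ).in_("status", ["needs_review", "uncertain"]).order(
        "ai_confidence", desc=False
    ).limit(limit).execute()
    return result.data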
Running It on a Schedule
The full pipeline runs as a scheduled Python script. In production I trigger it with a cron job, but you could just as easily use a Supabase Edge Function or a cloud scheduler.
def run_pipeline():
    accounts = supabase.table("plaid_accounts").select("*").execute()

    for account in accounts.data:
        count = sync_transactions(account["access_token"], account["account_id"])
        print(f"Synced {count} transactions for {account['account_id']}")

    process_pending_transactions()
    reconcile_transactions()

    # Summary: total number of transactions in the store after this run
    stats = supabase.table("transactions").select("status", count="exact").execute()
    print(f"Pipeline complete. {stats.count} transactions in the store.")

if __name__ == "__main__":
    run_pipeline()
What I Learned Building This
Financial data is unforgiving. Unlike a content management system where a miscategorization is annoying but harmless, a wrong journal entry creates real accounting problems. That is why every decision the system makes is logged with a confidence score and reasoning. The audit trail is not optional.
AI categorization is good enough to save time, not good enough to trust blindly. The 0.85 confidence threshold was calibrated over a few months of running the pipeline and checking results. Lowering it to 0.7 would auto-approve more transactions but would also introduce errors that take longer to fix than the time saved.
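Calibration here just meant measuring how often the reviewer agreed with the model inside each confidence band. A sketch of that measurement, assuming a reviewed_category column that records the human's final call (it is not part of the schema shown above):

def agreement_rate(min_conf: float, max_conf: float) -> float:
    # Share of human-reviewed transactions where the reviewer kept the AI category,
    # within one confidence band
    rows = supabase.table("transactions").select(
        "ai_category, ai_confidence, reviewed_category"
    ).gte("ai_confidence", min_conf).lt("ai_confidence", max_conf).not_.is_(
        "reviewed_category", "null"
    ).execute()
    if not rows.data:
        return 0.0
    agree = sum(1 for r in rows.data if r["ai_category"] == r["reviewed_category"])
    return agree / len(rows.data)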
Plaid's transaction data varies wildly by bank. Some banks send clean merchant names. Others send cryptic strings like "POS DEBIT 02/03 SQ *STORE." Building the AI categorizer was partly motivated by this inconsistency. Rule-based matching breaks down fast when merchant names are unpredictable.
Start with a narrow scope. This pipeline handles a single entity with a handful of bank accounts. Scaling to multiple entities or multi-currency would add significant complexity. Build for what you need today.
If you are dealing with similar automation challenges, whether it is financial reconciliation or another data pipeline that needs AI classification, this is exactly the kind of work we do at Syntora.
The Honest Caveats
Never auto-approve financial transactions without a confidence threshold. Never delete transaction records. Never skip the audit trail. These are not suggestions. If your accountant cannot trace every automated decision back to its source data, you have a compliance problem.
This pipeline does not replace an accountant. It replaces the manual data entry and categorization that takes up most of their time. The accountant still reviews flagged items, validates the journal entries, and signs off on the books. The pipeline just makes sure they are spending their time on judgment calls rather than data entry.
Built by Syntora - automation and AI consulting for teams that ship.