If you run a business that touches more than two platforms, you already know the problem. Shopify calls it an "order." QuickBooks calls it an "invoice." Your CRM calls it a "deal." They all represent the same transaction, but none of them agree on the ID, the field names, or even the dollar amount after fees and tax adjustments.
I am Parker Gawne, founder of Syntora, and this is one of the most common problems I solve for clients. Not the kind of problem where you install a Zapier integration and walk away. The kind where you have thousands of records drifting out of sync across five systems and no one knows which version of the truth is correct.
Here is how I build reconciliation pipelines that actually work.
## The Real Problem: No System Agrees on Anything
Consider a typical e-commerce operation. You might have:
- Shopify for orders and products
- QuickBooks Online for invoices and accounting
- a CRM (HubSpot, Salesforce, etc.) for customer records
- Plaid for bank transaction matching
- a shipping platform (ShipStation, EasyPost) for fulfillment
Each system has its own internal ID for a customer, its own format for addresses, its own rules for how refunds get recorded. When a customer places an order, that single event creates records in at least three of these systems. Within a week, those records have diverged.
The symptoms are familiar: duplicate customer entries, invoices that do not match orders, revenue numbers that differ between your accounting software and your storefront, bank transactions that cannot be traced back to a specific order.
Manual reconciliation does not scale. I have watched operations teams burn entire days cross-referencing spreadsheets. The fix is a dedicated reconciliation service.
## The Architecture: A Canonical Data Model
The core idea is simple. Instead of trying to sync System A directly to System B, you define a single internal data model that every system maps into. I call this the canonical model.
```python
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass
class CanonicalCustomer:
    canonical_id: str
    email: str
    first_name: str
    last_name: str
    phone: Optional[str] = None
    source_ids: dict = field(default_factory=dict)
    created_at: Optional[datetime] = None


@dataclass
class CanonicalOrder:
    canonical_id: str
    customer_canonical_id: str
    total: Decimal
    currency: str
    status: str
    line_items: list = field(default_factory=list)
    source_ids: dict = field(default_factory=dict)
    created_at: Optional[datetime] = None
```
The source_ids dictionary is the key piece. It maps each external system to its own ID for that record. A single customer might look like this:
```python
customer = CanonicalCustomer(
    canonical_id="cust_abc123",
    email="jane@example.com",
    first_name="Jane",
    last_name="Doe",
    phone="+15551234567",
    source_ids={
        "shopify": "7891011",
        "quickbooks": "CUS-442",
        "hubspot": "contact_98765",
    },
)
```
This gives you a single place to look up any customer regardless of which system you are working with.
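In practice that lookup is an index keyed by (system, external ID), so any webhook payload or API response can be resolved to its canonical record in O(1). A minimal sketch, with an abridged copy of the dataclass standing in for the full model above:

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalCustomer:  # abridged copy of the model above
    canonical_id: str
    email: str
    source_ids: dict = field(default_factory=dict)


def build_source_index(customers: list) -> dict:
    """Map (system, external_id) -> canonical record for O(1) lookups."""
    index = {}
    for customer in customers:
        for system, external_id in customer.source_ids.items():
            index[(system, external_id)] = customer
    return index


jane = CanonicalCustomer(
    canonical_id="cust_abc123",
    email="jane@example.com",
    source_ids={"shopify": "7891011", "quickbooks": "CUS-442"},
)
index = build_source_index([jane])
index[("shopify", "7891011")].canonical_id  # -> "cust_abc123"
```

The index is rebuilt (or incrementally updated) on each run, so a Shopify webhook and a QuickBooks sync both land on the same canonical record.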
## Normalizer Functions: Making Every System Speak the Same Language
Each platform returns data in a different shape. Shopify gives you nested JSON with snake_case fields. QuickBooks uses PascalCase with deeply nested reference objects. Your CRM has its own conventions entirely.
I write a normalizer for each system. Each one takes the raw API response and returns a canonical object.
```python
from datetime import datetime
from decimal import Decimal
from typing import Optional


def parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z' and None."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_shopify_order(raw: dict) -> CanonicalOrder:
    return CanonicalOrder(
        canonical_id="",  # assigned during reconciliation
        customer_canonical_id="",  # resolved via customer matching
        total=Decimal(str(raw["total_price"])),
        currency=raw["currency"].upper(),
        status=map_shopify_status(raw["financial_status"]),
        line_items=[
            {
                "sku": item["sku"],
                "quantity": item["quantity"],
                "unit_price": Decimal(str(item["price"])),
            }
            for item in raw["line_items"]
        ],
        source_ids={"shopify": str(raw["id"])},
        created_at=parse_iso(raw["created_at"]),
    )


def normalize_quickbooks_invoice(raw: dict) -> CanonicalOrder:
    return CanonicalOrder(
        canonical_id="",
        customer_canonical_id="",
        total=Decimal(str(raw["TotalAmt"])),
        currency=raw.get("CurrencyRef", {}).get("value", "USD").upper(),
        status=map_qb_status(raw.get("Balance", 0), raw["TotalAmt"]),
        line_items=[
            {
                "sku": line.get("SalesItemLineDetail", {})
                .get("ItemRef", {})
                .get("value", ""),
                "quantity": int(
                    line.get("SalesItemLineDetail", {}).get("Qty", 0)
                ),
                "unit_price": Decimal(
                    str(line.get("SalesItemLineDetail", {}).get("UnitPrice", 0))
                ),
            }
            for line in raw.get("Line", [])
            if line.get("DetailType") == "SalesItemLineDetail"
        ],
        source_ids={"quickbooks": str(raw["Id"])},
        created_at=parse_iso(raw.get("TxnDate")),
    )
```
The status mapping functions handle the vocabulary differences between systems. Shopify uses "paid," "partially_refunded," "refunded." QuickBooks does not have a status field at all; you infer payment status by comparing Balance to TotalAmt.
```python
def map_shopify_status(financial_status: str) -> str:
    mapping = {
        "paid": "completed",
        "partially_refunded": "partial_refund",
        "refunded": "refunded",
        "pending": "pending",
        "authorized": "pending",
        "voided": "cancelled",
    }
    return mapping.get(financial_status, "unknown")


def map_qb_status(balance: float, total: float) -> str:
    if balance == 0:
        return "completed"
    if balance < total:
        return "partial_payment"
    return "pending"
```
This is where I encourage clients to invest time upfront. Getting the normalizers right means every downstream comparison is clean. If you want to see how we approach this for other integration types, check out our data automation services.
## Deduplication: Matching Records Across Systems
Once everything is normalized, you need to figure out which records refer to the same real-world entity. For customers, email is usually the strongest match key. But email alone is not enough because people use different addresses on different platforms.
I use a scoring approach:
```python
from difflib import SequenceMatcher


def calculate_match_score(a: CanonicalCustomer, b: CanonicalCustomer) -> float:
    score = 0.0
    if a.email and b.email and a.email.lower() == b.email.lower():
        score += 0.5
    if a.phone and b.phone:
        a_digits = "".join(filter(str.isdigit, a.phone))[-10:]
        b_digits = "".join(filter(str.isdigit, b.phone))[-10:]
        if a_digits == b_digits:
            score += 0.3
    name_a = f"{a.first_name} {a.last_name}".lower().strip()
    name_b = f"{b.first_name} {b.last_name}".lower().strip()
    name_similarity = SequenceMatcher(None, name_a, name_b).ratio()
    score += name_similarity * 0.2
    return score


MATCH_THRESHOLD = 0.7


def find_duplicates(
    customers: list[CanonicalCustomer],
) -> list[tuple[CanonicalCustomer, CanonicalCustomer, float]]:
    duplicates = []
    for i, a in enumerate(customers):
        for b in customers[i + 1 :]:
            score = calculate_match_score(a, b)
            if score >= MATCH_THRESHOLD:
                duplicates.append((a, b, score))
    return duplicates
```
For orders, matching is usually easier. Most businesses have an order number that appears in both Shopify and QuickBooks (often as a custom field or memo line on the invoice). When that is not available, I match on date plus total amount plus customer, which catches the vast majority of cases.
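That two-tier strategy can be sketched as a small helper. The `order_number` field is a hypothetical addition to the canonical model for the shared order number, and the three-day date window is illustrative, not universal:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Optional


@dataclass
class CanonicalOrder:  # abridged copy of the model above
    canonical_id: str
    customer_canonical_id: str
    total: Decimal
    created_at: Optional[datetime] = None
    order_number: Optional[str] = None  # hypothetical shared order number


def find_matching_invoice(order, invoices, window_days: int = 3):
    """Prefer an exact order-number match; fall back to
    customer + total + date proximity."""
    for inv in invoices:
        if order.order_number and inv.order_number == order.order_number:
            return inv
    for inv in invoices:
        if (
            inv.customer_canonical_id == order.customer_canonical_id
            and inv.total == order.total
            and inv.created_at is not None
            and order.created_at is not None
            and abs(inv.created_at - order.created_at) <= timedelta(days=window_days)
        ):
            return inv
    return None
```

The fallback is deliberately conservative: same customer, exact total, and a tight date window, so a coincidental amount match on a different customer never pairs up.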
## The Reconciliation Flow
Here is the full pipeline, run on a schedule (usually nightly for most clients, hourly for high-volume operations):
```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    canonical_id: str
    field: str
    source_a: str
    value_a: str
    source_b: str
    value_b: str
    severity: str


def reconcile_orders(
    shopify_orders: list[CanonicalOrder],
    qb_invoices: list[CanonicalOrder],
) -> list[Discrepancy]:
    discrepancies = []
    for order in shopify_orders:
        # matches on a shared order number when one exists,
        # falling back to date + total + customer
        matched_invoice = find_matching_invoice(order, qb_invoices)
        if matched_invoice is None:
            discrepancies.append(
                Discrepancy(
                    canonical_id=order.canonical_id,
                    field="existence",
                    source_a="shopify",
                    value_a=order.source_ids.get("shopify", ""),
                    source_b="quickbooks",
                    value_b="MISSING",
                    severity="high",
                )
            )
            continue
        if order.total != matched_invoice.total:
            discrepancies.append(
                Discrepancy(
                    canonical_id=order.canonical_id,
                    field="total",
                    source_a="shopify",
                    value_a=str(order.total),
                    source_b="quickbooks",
                    value_b=str(matched_invoice.total),
                    severity="high",
                )
            )
        if order.status != matched_invoice.status:
            discrepancies.append(
                Discrepancy(
                    canonical_id=order.canonical_id,
                    field="status",
                    source_a="shopify",
                    value_a=order.status,
                    source_b="quickbooks",
                    value_b=matched_invoice.status,
                    severity="medium",
                )
            )
    return discrepancies
```
The pipeline follows these steps:
- Pull fresh data from each API (Shopify orders, QuickBooks invoices, CRM contacts)
- Normalize everything into canonical objects using the per-system normalizers
- Match records across systems using IDs, emails, or scoring
- Compare matched records field by field
- Flag discrepancies with severity levels
- Push corrections back to the appropriate system or queue them for review
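The steps above reduce to a single driver function. Everything in this sketch is a stand-in: the fetchers, normalizers, and handler are placeholders for the per-system code shown earlier:

```python
def run_reconciliation(fetchers: dict, normalizers: dict, reconcile, handle) -> list:
    """One scheduled pass: fetch raw records per system, normalize them,
    compare, then route each discrepancy to a handler."""
    normalized = {
        system: [normalizers[system](raw) for raw in fetch()]
        for system, fetch in fetchers.items()
    }
    discrepancies = reconcile(normalized)
    for d in discrepancies:
        handle(d)  # auto-correct or queue for review
    return discrepancies


# illustrative wiring with trivial stand-in callables
fetchers = {"shopify": lambda: [{"total": "10.00"}]}
normalizers = {"shopify": lambda raw: raw["total"]}
flagged = run_reconciliation(
    fetchers,
    normalizers,
    reconcile=lambda data: [t for t in data["shopify"] if t != "10.00"],
    handle=print,
)
```

Keeping the driver this thin means each per-system piece can be tested in isolation and swapped out when a client adds a platform.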
Not every discrepancy should be auto-corrected. I categorize them by severity. A missing invoice in QuickBooks is high severity and might be auto-created. A minor name spelling difference is low severity and goes into a review queue. Total mismatches always get flagged for human review because they could indicate a real accounting issue.
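One way to encode that routing policy, using the `Discrepancy` shape from above. The auto-create branch is a sketch; what is safe to auto-fix is a per-client decision:

```python
def route_discrepancy(discrepancy, auto_fix, review_queue: list) -> None:
    """Total mismatches always go to humans; missing high-severity
    records may be auto-created; everything else is queued for review."""
    if discrepancy.field == "total":
        review_queue.append(discrepancy)  # possible real accounting issue
    elif discrepancy.severity == "high" and discrepancy.field == "existence":
        auto_fix(discrepancy)  # e.g. create the missing invoice
    else:
        review_queue.append(discrepancy)
```

Because the policy lives in one function, an auditor can read exactly which discrepancies the pipeline is allowed to fix on its own.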
## Bank Reconciliation with Plaid
One area where this approach pays off immediately is bank reconciliation. We have built pipelines for clients that pull transaction data from Plaid and match it against expected deposits from Shopify payouts and recorded revenue in QuickBooks.
The logic is similar. Normalize Plaid transactions into a canonical format, then match them against expected payouts by date range and amount. The tricky part is that payment processors batch transactions, so a single bank deposit might correspond to dozens of individual orders minus processing fees. You have to account for that grouping and the fee deductions when comparing amounts.
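A sketch of that batch comparison, with made-up processor fees; in a real pipeline the fee amounts come from the processor's payout report, not a hardcoded rate:

```python
from decimal import Decimal


def match_payout(
    deposit: Decimal,
    order_totals: list,
    fee_rate: Decimal = Decimal("0.029"),       # illustrative percentage fee
    per_txn_fee: Decimal = Decimal("0.30"),     # illustrative flat fee
    tolerance: Decimal = Decimal("0.01"),
) -> bool:
    """Compare a bank deposit against a batch of orders net of
    processing fees."""
    gross = sum(order_totals, Decimal("0"))
    fees = sum((t * fee_rate + per_txn_fee for t in order_totals), Decimal("0"))
    expected = gross - fees
    return abs(deposit - expected) <= tolerance


totals = [Decimal("100.00"), Decimal("50.00")]
match_payout(Decimal("145.05"), totals)  # -> True (150.00 gross - 4.95 fees)
```

The tolerance absorbs sub-cent rounding in the processor's own math; anything outside it becomes a discrepancy for review.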
This is where Decimal arithmetic matters. Floating-point rounding errors will create phantom discrepancies across thousands of records. Always use Decimal for money.
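A short demonstration of why:

```python
from decimal import Decimal

# Summing float money values drifts almost immediately...
float_total = sum(0.10 for _ in range(3))
print(float_total)  # 0.30000000000000004

# ...while Decimal arithmetic stays exact.
dec_total = sum((Decimal("0.10") for _ in range(3)), Decimal("0"))
print(dec_total)  # 0.30
```

At three values the drift is harmless; across thousands of order comparisons it generates a steady stream of false-positive discrepancies.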
## Conflict Resolution: Deciding Which System Wins
When two systems disagree, you need a policy for which one is the source of truth. This varies by field:
- Price and totals: the payment processor or accounting system wins (QuickBooks for invoices, Shopify for order amounts before fees)
- Customer contact info: the most recently updated record wins
- Order status: the system closest to the actual event wins (shipping platform for fulfillment status, payment processor for payment status)
- Product data: the storefront wins (Shopify for SKUs and descriptions)
I encode these rules explicitly:
```python
from datetime import datetime

AUTHORITY_MAP = {
    "total": "quickbooks",
    "status": "shopify",
    "email": "most_recent",
    "shipping_status": "shipstation",
    "sku": "shopify",
    "description": "shopify",
}


def resolve_conflict(
    field: str,
    value_a: str,
    source_a: str,
    value_b: str,
    source_b: str,
    updated_a: datetime,
    updated_b: datetime,
) -> tuple[str, str]:
    authority = AUTHORITY_MAP.get(field)
    if authority == "most_recent":
        if updated_a >= updated_b:
            return value_a, source_a
        return value_b, source_b
    if authority == source_a:
        return value_a, source_a
    if authority == source_b:
        return value_b, source_b
    # no rule for this field: keep the first value rather than guess
    return value_a, source_a
```
Having this codified means the pipeline can auto-resolve most conflicts without human intervention. The ones it cannot resolve get queued with full context so the operations team can make a decision in seconds instead of minutes.
## What This Looks Like in Production
For a typical client running Shopify plus QuickBooks plus a CRM, the reconciliation service runs as a scheduled Python job. It processes the last 48 hours of data on each run (overlapping windows catch late-arriving records), writes discrepancies to a database table, and sends a daily summary to Slack or email.
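The overlapping window is just a lookback computed at the start of each run. A minimal sketch, assuming timestamps are handled in UTC:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def lookback_window(hours: int = 48, now: Optional[datetime] = None):
    """Return the (start, end) window for one run. Re-reading the last
    48 hours on every run catches records that arrived late in any
    source system."""
    end = now or datetime.now(timezone.utc)
    return end - timedelta(hours=hours), end
```

Each system's fetch step then filters on its own "updated at" field against this window, so a record edited yesterday gets re-compared today.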
The first run usually surfaces hundreds of issues that have been accumulating for months. After the initial cleanup, the daily count drops to single digits. Most of those are legitimate timing differences (an order placed at 11:58 PM that does not hit QuickBooks until the next day) that resolve themselves on the next run.
The result is that the client's operations team stops spending hours on manual reconciliation and starts trusting their numbers again. When the CEO asks "what was revenue last month," the answer is the same regardless of which system you pull it from.
## Wrapping Up
Data mismatch across business systems is not a tooling problem. It is an architecture problem. No off-the-shelf connector will solve it because the real work is in defining what "correct" means for your specific business and encoding those rules into a pipeline that runs reliably.
The pattern is always the same: canonical model, normalizers per system, matching logic, field-by-field comparison, and explicit conflict resolution. The specifics change for every client, but the structure does not.
If your team is burning time on manual reconciliation or you have stopped trusting the numbers coming out of your systems, this is exactly the kind of problem we solve at Syntora.
Built by Syntora -- automation infrastructure for growing businesses.