At 9:14am on a Tuesday, the system flagged an incoming purchase order from a large enterprise buyer as a duplicate.
The PO had arrived in two separate emails over 48 hours — sent by different procurement contacts, both for the same batch of stainless steel flanges, same quantities, same delivery window.
Under the old system, a staff member would have read both, entered both into Tally, and allocated raw material stock twice. The first sign of the error would have been an inventory shortfall two weeks later.
The client is a Jaipur-based precision manufacturer serving enterprise buyers in India and overseas. At roughly ₹60Cr annual revenue, their team handled a steady flow of purchase orders across a demanding customer base.
Every one of those orders arrived as a PDF in a shared Gmail inbox.
Every one of those PDFs was read and entered into Tally by hand.
This is the build log for the system built to replace that process.
The Problem: PDFs in Gmail, Nobody Watching
The procurement workflow before the build:
- Emails arrive in a shared Gmail inbox.
- A staff member opens each attachment.
- Reads part numbers, quantities, delivery deadlines, and supplier codes.
- Manually enters everything into Tally.
On a slow day this took around 90 minutes.
On heavier order days, it could stretch to 3–4 hours.
The inbox had no workflow state:
- No processed flag
- No queue
- No audit trail outside Tally
If the same PO arrived twice, the team would know only by accident.
Duplicate Orders
There was no detection mechanism.
Two contacts at the same enterprise customer could send the same PO independently, and neither Gmail nor Tally would flag it.
No Operational Visibility
Knowing which orders were due the following week required opening Tally and manually cross-referencing entries.
There was:
- No dashboard
- No queue view
- No workload overview
Manual Raw Material Calculations
Once a PO was entered, another manual calculation followed:
- Pulling specifications from the PDF
- Checking stock levels
- Estimating requirements
This introduced a second opportunity for human error.
What Was Built: A Four-Stage Pipeline
Each stage solved a different operational problem.
1. Gmail Push Notifications
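Instead of polling Gmail every few minutes, the system registers a Gmail watch on the shared inbox. Gmail publishes a notification to a Google Cloud Pub/Sub topic whenever a new email arrives, and a push subscription on that topic calls the system's webhook.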
Benefits:
- Near real-time processing
- Lower infrastructure overhead
- Faster visibility for operations teams
A PO is processed before a staff member would have opened the email.
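A minimal sketch of the webhook's first step, assuming the standard Pub/Sub push envelope and the Gmail notification payload (`emailAddress` and `historyId`). The function name `decode_gmail_push` is hypothetical and not taken from the project. The notification only says that the mailbox changed, so the real handler would still need to call the Gmail API to fetch the new message and its attachment.

```python
import base64
import json


def decode_gmail_push(envelope: dict) -> dict:
    """Decode the Gmail notification carried inside a Pub/Sub push envelope."""
    message = envelope.get("message")
    if not message or "data" not in message:
        raise ValueError("not a Pub/Sub push envelope")

    data = message["data"]
    # Restore any stripped padding so the decoder accepts the string.
    data += "=" * (-len(data) % 4)
    payload = json.loads(base64.urlsafe_b64decode(data).decode("utf-8"))

    return {
        "email_address": payload["emailAddress"],
        # historyId may arrive as a string or a number; normalize to int.
        "history_id": int(payload["historyId"]),
    }
```

In a FastAPI route, the parsed request body would be passed to this function, and the returned `history_id` used to ask Gmail what changed since the last processed notification.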
2. GPT-4 Structured Extraction
The system sends the purchase order to GPT-4 using a strict JSON schema.
Required fields include:
- PO number
- Supplier details
- Delivery date
- Line items
- Quantities
The model returns structured JSON directly.
No:
- Regex maintenance
- Template mapping
- Field-position assumptions
For scanned PDFs, the vision endpoint is used.
For text PDFs, extracted text is sent directly.
Both paths produce the same JSON output.
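To make the "strict JSON schema" idea concrete, here is a sketch of what such a schema and a defensive parser for the model's reply might look like. The field names (`po_number`, `supplier`, `delivery_date`, `line_items`, `part_number`, `quantity`) and the helper `parse_extraction` are assumptions for illustration; the project's actual schema is not shown in this post. The model call itself is omitted.

```python
import json
from dataclasses import dataclass
from datetime import date

# Hypothetical schema covering the required fields listed above.
PO_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["po_number", "supplier", "delivery_date", "line_items"],
    "properties": {
        "po_number": {"type": "string"},
        "supplier": {"type": "string"},
        "delivery_date": {"type": "string", "description": "YYYY-MM-DD"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "required": ["part_number", "quantity"],
                "properties": {
                    "part_number": {"type": "string"},
                    "quantity": {"type": "number"},
                },
            },
        },
    },
}


@dataclass(frozen=True)
class LineItem:
    part_number: str
    quantity: float


@dataclass(frozen=True)
class PurchaseOrder:
    po_number: str
    supplier: str
    delivery_date: date
    line_items: tuple


def parse_extraction(raw_json: str) -> PurchaseOrder:
    """Turn the model's JSON reply into typed objects, failing loudly on gaps."""
    data = json.loads(raw_json)
    missing = [key for key in PO_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    items = tuple(
        LineItem(str(item["part_number"]), float(item["quantity"]))
        for item in data["line_items"]
    )
    if not items:
        raise ValueError("no line items")
    return PurchaseOrder(
        po_number=str(data["po_number"]),
        supplier=str(data["supplier"]),
        delivery_date=date.fromisoformat(data["delivery_date"]),
        line_items=items,
    )
```

Re-checking the reply on the application side, even when the model is constrained by a schema, keeps the scanned and text paths honest about producing the same shape.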
3. Tally Integration
The extracted JSON is converted into a Tally-compatible XML purchase voucher and sent to Tally Prime's local HTTP server.
Once accepted:
- The voucher is created automatically.
- Inventory calculations run immediately.
- The order appears exactly as if a user entered it manually.
4. Duplicate Detection
Every incoming PO is stored in PostgreSQL.
A normalized fingerprint is generated using:
- Line items
- Quantities
- Delivery windows
This allows the system to detect duplicate orders before they reach Tally.
The duplicate mentioned earlier was caught using this mechanism.
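One way such a fingerprint could be computed is sketched below. The normalization rules (uppercasing part numbers, stripping whitespace, sorting line items, rounding quantities) and the function name `po_fingerprint` are assumptions, not the project's actual rules. The PO number and sender are left out on purpose, since the duplicate described earlier came from two different contacts.

```python
import hashlib
import json
from datetime import date


def po_fingerprint(line_items, window_start: date, window_end: date) -> str:
    """Hash the order contents so that re-sent copies collapse to one value.

    line_items is an iterable of (part_number, quantity) pairs.
    """
    normalized_items = sorted(
        ("".join(part.upper().split()), round(float(quantity), 3))
        for part, quantity in line_items
    )
    canonical = json.dumps(
        {
            "items": normalized_items,
            "window": [window_start.isoformat(), window_end.isoformat()],
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two emails describing the same batch, with different ordering and casing.
first = po_fingerprint(
    [("FL-100", 40), ("FL-200", 10)], date(2024, 3, 1), date(2024, 3, 15)
)
second = po_fingerprint(
    [("fl-200 ", 10.0), ("fl-100", 40)], date(2024, 3, 1), date(2024, 3, 15)
)
print(first == second)  # True
```

Storing the digest in an indexed PostgreSQL column would let the lookup happen before the voucher is built, though the post does not say how the project stores it.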
The Technology Stack
Backend
- FastAPI
- PostgreSQL
- Gmail API
- Google Pub/Sub
Frontend
- React
- Framer Motion
AI Layer
- GPT-4 Structured Outputs
- GPT-4 Vision
ERP Layer
- Tally Prime via XML voucher imports
Working with the Tally API
The official documentation leaves a lot to be desired.
The most reliable approach is:
- Enable Tally's HTTP server.
- Generate XML vouchers in Tally's import format.
- POST them to Tally.
- Parse the acknowledgement response.
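The last two steps can be sketched with the standard library alone. Several things here are assumptions to verify against your own installation: the URL `http://localhost:9000` (a commonly used default port), and the response tags `CREATED`, `ERRORS`, and `LINEERROR`, which are what Tally typically returns but which may be nested differently depending on version. The function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def post_to_tally(
    voucher_xml: str,
    url: str = "http://localhost:9000",  # assumed default; check your setup
    timeout: float = 10.0,
) -> str:
    """POST an XML envelope to Tally's HTTP server and return the raw reply."""
    request = urllib.request.Request(
        url,
        data=voucher_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_import_response(response_xml: str) -> dict:
    """Summarize the acknowledgement; tags are searched at any depth."""
    root = ET.fromstring(response_xml)

    def count(tag: str) -> int:
        node = next(root.iter(tag), None)
        text = (node.text or "").strip() if node is not None else ""
        return int(text) if text.isdigit() else 0

    line_errors = [n.text.strip() for n in root.iter("LINEERROR") if n.text]
    created = count("CREATED")
    error_count = count("ERRORS")
    return {
        "created": created,
        "error_count": error_count,
        "line_errors": line_errors,
        "ok": created > 0 and error_count == 0 and not line_errors,
    }
```

Treating anything other than an explicit "created, no errors" reply as a failure is a deliberately conservative choice: a voucher that silently failed to import is harder to notice than one that is retried.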
Enable Tally HTTP Server
Tally ERP 9:
Gateway of Tally
→ F12
→ Advanced Configuration
→ Enable ODBC/HTTP Server
Tally Prime:
F1 (Help)
→ Settings
→ Connectivity
→ Client/Server configuration
Minimal Purchase Voucher Example
<ENVELOPE>
  <HEADER>
    <TALLYREQUEST>Import Data</TALLYREQUEST>
  </HEADER>
  <BODY>
    <IMPORTDATA>
      <REQUESTDESC>
        <REPORTNAME>Vouchers</REPORTNAME>
        <STATICVARIABLES>
          <SVCURRENTCOMPANY>[Company Name]</SVCURRENTCOMPANY>
        </STATICVARIABLES>
      </REQUESTDESC>
      <REQUESTDATA>
        <TALLYMESSAGE xmlns:UDF="TallyUDF">
          <VOUCHER VCHTYPE="Purchase" ACTION="Create">
            <DATE>[YYYYMMDD]</DATE>
            <VOUCHERTYPENAME>Purchase</VOUCHERTYPENAME>
            <PARTYLEDGERNAME>[Supplier Name]</PARTYLEDGERNAME>
            <ALLLEDGERENTRIES.LIST>
              <LEDGERNAME>[Ledger]</LEDGERNAME>
              <AMOUNT>[Amount]</AMOUNT>
            </ALLLEDGERENTRIES.LIST>
          </VOUCHER>
        </TALLYMESSAGE>
      </REQUESTDATA>
    </IMPORTDATA>
  </BODY>
</ENVELOPE>
The first working voucher took three days.
Everything after that was extension work.
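Filling the placeholders by string substitution breaks as soon as a company or ledger name contains an ampersand, so one option is to build the envelope with an XML library that escapes values for you. The sketch below mirrors the minimal voucher layout from the example; the function name `build_purchase_voucher` and its parameters are hypothetical, and a production voucher would likely need more entries than this single ledger line.

```python
import xml.etree.ElementTree as ET
from datetime import date


def build_purchase_voucher(
    company: str, voucher_date: date, party: str, ledger: str, amount: float
) -> str:
    """Build the minimal import envelope with properly escaped values."""
    envelope = ET.Element("ENVELOPE")
    header = ET.SubElement(envelope, "HEADER")
    ET.SubElement(header, "TALLYREQUEST").text = "Import Data"

    body = ET.SubElement(envelope, "BODY")
    import_data = ET.SubElement(body, "IMPORTDATA")
    request_desc = ET.SubElement(import_data, "REQUESTDESC")
    ET.SubElement(request_desc, "REPORTNAME").text = "Vouchers"
    static_vars = ET.SubElement(request_desc, "STATICVARIABLES")
    ET.SubElement(static_vars, "SVCURRENTCOMPANY").text = company

    request_data = ET.SubElement(import_data, "REQUESTDATA")
    message = ET.SubElement(request_data, "TALLYMESSAGE", {"xmlns:UDF": "TallyUDF"})
    voucher = ET.SubElement(
        message, "VOUCHER", {"VCHTYPE": "Purchase", "ACTION": "Create"}
    )
    # Tally expects dates as YYYYMMDD with no separators.
    ET.SubElement(voucher, "DATE").text = voucher_date.strftime("%Y%m%d")
    ET.SubElement(voucher, "VOUCHERTYPENAME").text = "Purchase"
    ET.SubElement(voucher, "PARTYLEDGERNAME").text = party

    entry = ET.SubElement(voucher, "ALLLEDGERENTRIES.LIST")
    ET.SubElement(entry, "LEDGERNAME").text = ledger
    ET.SubElement(entry, "AMOUNT").text = f"{amount:.2f}"

    return ET.tostring(envelope, encoding="unicode")
```

The returned string is the body that would be POSTed to Tally's HTTP server.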
The Admin Dashboard
The dashboard became the team's primary operational interface.
Orders Panel
Shows:
- Status
- Supplier
- PO Number
- Delivery Deadline
- Part Count
Duplicate orders are highlighted and blocked from entering Tally until reviewed.
Tally Panel
Provides:
- Revenue trends
- Receivables
- Payment status
Without opening Tally.
Operations Panel
Handles:
- Dispatch notes
- QA checklists
- Production milestones
All driven from the same purchase-order data.
What Failed First
The original extraction system used:
- pdf-parse
- OCR
- Regex pipelines
The assumption:
Purchase order formats would remain stable.
They didn't.
Enterprise customers used:
- Multiple templates
- Different table structures
- Scanned PDFs
- International formats
The result:
- ~60% success rate
- Frequent maintenance
- Silent extraction failures
One part-number transposition eventually led to production of an incorrect batch.
That was the turning point.
The system was rebuilt around GPT-4 structured outputs.
When validated against historical purchase orders, the rebuilt system's extraction accuracy rose from roughly 60% to over 98%.
Three Lessons From the Project
1. PDF Extraction Has a Clear ROI
For teams processing purchase orders every week:
- AI costs remain small.
- Manual labor costs do not.
The business case becomes obvious once the numbers are compared.
2. Tally Integration Is Easier Than Its Reputation
The difficult part is finding a working example.
Once you successfully create one XML voucher, the rest is straightforward engineering.
3. Push Beats Polling
The biggest benefit wasn't lower latency.
It was trust.
A dashboard updated within seconds becomes an operational tool.
A dashboard updated every few minutes becomes a report.
That difference determines whether teams actually adopt the system.
Could This Work For Your Business?
If your team receives PDFs in Gmail and manually enters them into:
- Tally
- ERP systems
- Inventory software
the workflow can likely be automated.
The core pattern is repeatable:
Gmail
→ AI Extraction
→ Validation
→ ERP Integration
→ Operations Dashboard
This project went from discovery call to production deployment in six weeks.
FAQ
Does this work with Tally ERP 9?
Yes.
The XML import approach works with both:
- Tally Prime
- Tally ERP 9
The main requirement is enabling Tally's HTTP server.
Can it process scanned PDFs?
Yes.
Scanned PDFs use GPT-4 Vision.
Text PDFs use direct extraction.
Both produce the same structured output.
What if the AI extracts incorrect data?
The system includes:
- Confidence scoring
- Validation rules
- Human review queues
Suspicious records are blocked before reaching Tally.
This prevents errors from entering downstream operations.
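A sketch of how the three safeguards might combine into a single gate. Everything specific here is an assumption: the 0.9 confidence threshold, the individual rules, the field names, and the helpers `review_reasons` and `route`. How the confidence score itself is produced is not described in this post, so it is taken as an input.

```python
from datetime import date


def review_reasons(
    po: dict, confidence: float, today: date, min_confidence: float = 0.9
) -> list:
    """Return every reason this record should be held for human review."""
    reasons = []
    if confidence < min_confidence:
        reasons.append("low extraction confidence")
    if not str(po.get("po_number", "")).strip():
        reasons.append("missing PO number")

    items = po.get("line_items") or []
    if not items:
        reasons.append("no line items")
    for index, item in enumerate(items, start=1):
        quantity = item.get("quantity")
        is_number = isinstance(quantity, (int, float)) and not isinstance(quantity, bool)
        if not is_number or quantity <= 0:
            reasons.append(f"line {index}: quantity must be a positive number")

    try:
        if date.fromisoformat(str(po.get("delivery_date", ""))) < today:
            reasons.append("delivery date is in the past")
    except ValueError:
        reasons.append("delivery date is not a valid ISO date")
    return reasons


def route(po: dict, confidence: float, today: date) -> str:
    """Send clean records to Tally and everything else to the review queue."""
    return "review_queue" if review_reasons(po, confidence, today) else "tally"
```

Collecting all reasons, rather than stopping at the first, gives the reviewer the full picture in one pass.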