Problem-Solution 09 May 2026 9 min read

Automate digital invoice validation and routing: the half UK businesses skip

Most invoicing automation covers sending. The harder half is receiving: capturing PDFs, reading them, validating, routing, and posting to the ledger.

David Perkins · Founder, Perkins SmartOps

Quick answer

Automating invoice validation and routing means handling the side of invoicing most businesses do by hand: reading the inbound PDF, checking it is genuine, deciding which budget it hits, sending it for approval, and posting it to the ledger. The pattern uses an email trigger, structured data extraction, validation rules, and an approval step that requires a second approver above a threshold.

The full story

Most articles about automating invoicing are about sending invoices. That is the easier half. The harder half, the one that quietly costs UK businesses real time every week, is the inbound side. An invoice arrives by email. Someone opens it, reads it, decides if it is genuine, decides which budget it hits, sends it to the right person for approval, and posts it to the ledger. Multiply that by 50 invoices a month and a real chunk of someone’s week is gone.

This is the part of invoicing that people who sell “automated invoicing” tools tend to skip over, because the receiving side is messier than the sending side. It is also the side where the time savings actually live.

The five jobs nobody calls “invoice processing”

Most businesses think of invoice processing as one job. It is five.

From inbox to ledger, in five stages

Capture

Get the PDF from where it lives, the inbox or a supplier portal, into a single queue.

Read

Turn the PDF into structured data: supplier, date, total, VAT, line items, PO number.

Validate

Check the data is correct, complete, and not a duplicate. Catch fraud and OCR errors before they reach the ledger.

Route

Decide who needs to approve and get it to them. Two-step approval above a threshold.

Post

Write it to the ledger with the right GL code, project, and cost centre. Attach the original PDF to the journal entry.

Each stage can be automated independently. You do not need to do all five at once. Picking the most painful one and automating just that is usually the right starting point.

Capture: the inbox is not a database

The default capture point in most UK businesses is a shared accounts inbox. Someone, usually the same person, opens each email, downloads the attachment, renames it, and drops it in a folder. That works until volume passes around 30 invoices a week, at which point the inbox becomes the bottleneck.

The fix is an email trigger that watches the accounts inbox. When a new email arrives with a PDF attachment, the workflow saves the PDF to a single store (Drive, S3, OneDrive, or a database file column), logs the source email and timestamp, and adds a row to a queue table. From this point onward there is one place to look, and one place to audit.
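Here is a minimal sketch of the capture step in Python. It assumes the raw email has already been fetched, for example by a mail trigger in your automation platform or an IMAP poll, and that the queue is a SQLite table. The `invoice_queue` table, its columns and the content-hash filenames are illustrative choices, not a standard.

```python
import email
import hashlib
import sqlite3
from datetime import datetime, timezone
from email import policy
from pathlib import Path


def init_queue(db: sqlite3.Connection) -> None:
    # Hypothetical queue table: one row per captured PDF, keyed by content hash.
    db.execute(
        """CREATE TABLE IF NOT EXISTS invoice_queue (
               sha256 TEXT PRIMARY KEY,
               path TEXT NOT NULL,
               sender TEXT,
               subject TEXT,
               captured_at TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'new'
           )"""
    )


def capture_pdfs(raw_email: bytes, store: Path, db: sqlite3.Connection) -> list:
    """Save each PDF attachment to a single store and log it in the queue table."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    captured = []
    for part in msg.iter_attachments():
        if part.get_content_type() != "application/pdf":
            continue  # skip logos, signatures, calendar invites
        data = part.get_content()  # decoded bytes for application/* parts
        digest = hashlib.sha256(data).hexdigest()
        path = store / f"{digest}.pdf"  # hash as filename: no manual renaming
        path.write_bytes(data)
        # INSERT OR IGNORE: the same PDF sent twice is queued once.
        db.execute(
            "INSERT OR IGNORE INTO invoice_queue "
            "(sha256, path, sender, subject, captured_at) VALUES (?, ?, ?, ?, ?)",
            (digest, str(path), str(msg["From"] or ""), str(msg["Subject"] or ""),
             datetime.now(timezone.utc).isoformat()),
        )
        captured.append(digest)
    db.commit()
    return captured
```

Because each file is named by its hash, a PDF forwarded twice still produces only one queue row.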

For supplier portals (Amazon Business, niche industry portals, the various SaaS billing dashboards), a scheduled poll picks up new invoices on a cadence. Once a day is usually enough.

Read: get the data out of the PDF

Most invoices arriving in 2026 are PDFs created by software, which means the text inside them is already there in a form a computer can read. The workflow does not need to “look” at the invoice the way a human does; it just lifts the text straight out of the file. Standard tools handle this cleanly.

The exception is scanned invoices, screenshots, or image-only PDFs. These are essentially photos of an invoice rather than a real document. For those, the workflow needs an extra step that looks at the picture and works out what the letters say. The shorthand for this step is OCR, which stands for Optical Character Recognition. It is reliable on clean scans, less reliable on phone photos taken at an angle. If most of your suppliers send proper PDFs, you can skip this step entirely.

The actual work is turning the raw text into structured data. The invoice has a supplier name, an invoice number, a date, a total, a VAT line, and usually some line items. Getting these out reliably needs one of two approaches:

A template per supplier. Works if you have a small number of regular suppliers and they do not change their invoice format often. Cheap to run, brittle to maintain.
An LLM extraction step using a structured-output schema. Works for variable suppliers; the schema acts as the contract. The model receives the invoice text plus a JSON schema describing the fields you want, and returns the schema filled in. The workflow validates the result against the schema before continuing. If validation fails, the invoice goes to a human review queue rather than corrupting the ledger.
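Here is a minimal Python sketch of the template approach. The supplier name, field labels and regular expressions are hypothetical. Every real supplier would need its own patterns, and that is the maintenance cost described above.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical templates: one set of regexes per regular supplier.
TEMPLATES = {
    "Acme Supplies Ltd": {
        "invoice_number": r"Invoice No[.:]?\s*(\S+)",
        "date": r"Date:\s*(\d{2}/\d{2}/\d{4})",
        "total": r"Total due:\s*£?([\d,]+\.\d{2})",
    },
}


def extract_with_template(supplier: str, text: str) -> Optional[dict]:
    """Return extracted fields, or None if there is no template or it no longer fits."""
    template = TEMPLATES.get(supplier)
    if template is None:
        return None  # unknown supplier: send to human review
    out = {"supplier": supplier}
    for name, pattern in template.items():
        match = re.search(pattern, text)
        if match is None:
            return None  # supplier changed their layout: fail loudly, do not guess
        out[name] = match.group(1)
    out["total"] = Decimal(out["total"].replace(",", ""))
    return out
```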

The LLM approach is what makes this practical at small business scale. You stop maintaining 40 brittle templates and start trusting a single contract.
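Here is a sketch of the contract side in Python. The model call is left out because it depends on your provider; assume it returns a JSON string shaped by the schema. The schema's field names are illustrative. The check is written by hand so the example needs only the standard library. In practice, a JSON Schema validator library could handle the structural part.

```python
import json
from datetime import date
from decimal import Decimal, InvalidOperation

# The contract you would pass to the model's structured-output option.
# Field names are illustrative, not any standard.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["supplier", "invoice_number", "date", "net", "vat", "total", "line_items"],
    "properties": {
        "supplier": {"type": "string"},
        "invoice_number": {"type": "string"},
        "date": {"type": "string", "description": "ISO 8601, e.g. 2026-04-03"},
        "net": {"type": "string", "description": "Decimal amount in GBP"},
        "vat": {"type": "string"},
        "total": {"type": "string"},
        "po_number": {"type": ["string", "null"]},
        "line_items": {"type": "array", "items": {"type": "object"}},
    },
}


def check_contract(raw_output: str):
    """Return (invoice, None) if the model honoured the contract, else (None, reason)."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc}"
    if not isinstance(data, dict):
        return None, "top level is not an object"
    missing = [f for f in INVOICE_SCHEMA["required"] if f not in data]
    if missing:
        return None, f"missing fields: {missing}"
    if not isinstance(data["line_items"], list):
        return None, "line_items is not a list"
    try:
        date.fromisoformat(data["date"])
    except (TypeError, ValueError):
        return None, "date is not ISO 8601"
    for name in ("net", "vat", "total"):
        try:
            amount = Decimal(str(data[name]))  # Decimal, never float, for money
        except InvalidOperation:
            return None, f"{name} is not a number"
        if not amount.is_finite():
            return None, f"{name} is not a finite number"
        data[name] = amount
    return data, None


def accept_or_review(raw_output: str, review_queue: list):
    """Pass a valid extraction on; park anything else for a human."""
    invoice, reason = check_contract(raw_output)
    if invoice is None:
        review_queue.append({"raw": raw_output, "reason": reason})
    return invoice
```

The schema keeps amounts as strings, and they are converted to `Decimal` afterwards. That keeps floating-point rounding out of the pennies.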

Validate: catch errors before they hit the ledger

Validation is the stage that catches duplicate invoices, supplier fraud attempts, and OCR errors. It is also the stage most homemade automations skip, which is why those automations break trust the first time something odd slips through.

A practical validation pass checks:

Duplicate detection. Has this supplier sent an invoice with this number before? If yes, route to a duplicate review queue, do not post.
Total reconciliation. Do the line items sum to the stated total, including VAT? Pence differences are a quiet sign of OCR or extraction error.
VAT validity. Is the supplier’s VAT registration number a real one? HMRC has a free lookup endpoint; the workflow can hit it once per supplier per month and cache the result.
PO matching. If the business uses purchase orders, is there a matching open PO and is the invoice value within tolerance? Out-of-tolerance values go to a query queue.
Supplier bank-account whitelist. Is the bank account on the invoice the same as the bank account on the supplier record? This single check is the best defence against invoice redirection fraud.
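Three of those checks need nothing more than the extracted data and your own records. Here is a minimal Python sketch. It assumes amounts arrive as `Decimal` and that open POs are a simple lookup from PO number to net value. The 2% tolerance and the failure names are hypothetical. The VAT lookup and bank-account check are left out here.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Invoice:
    supplier_id: str
    invoice_number: str
    net: Decimal
    vat: Decimal
    total: Decimal
    line_items: List[Decimal] = field(default_factory=list)  # net line amounts
    po_number: Optional[str] = None


PO_TOLERANCE = Decimal("0.02")  # hypothetical: invoice net may differ from PO by up to 2%


def validate(inv: Invoice, seen: Set[Tuple[str, str]], open_pos: Dict[str, Decimal]) -> List[str]:
    """Run the deterministic checks; return the names of any that fail."""
    failures = []

    # Duplicate detection: same supplier, same (normalised) invoice number.
    key = (inv.supplier_id, inv.invoice_number.strip().upper())
    if key in seen:
        failures.append("duplicate")
    else:
        seen.add(key)

    # Total reconciliation to the penny: lines sum to net, net + VAT equals total.
    if sum(inv.line_items, Decimal("0")) != inv.net or inv.net + inv.vat != inv.total:
        failures.append("totals_mismatch")

    # PO matching, only when the invoice quotes a PO.
    if inv.po_number is not None:
        po_value = open_pos.get(inv.po_number)
        if po_value is None:
            failures.append("po_not_found")
        elif abs(inv.net - po_value) > po_value * PO_TOLERANCE:
            failures.append("po_out_of_tolerance")

    return failures
```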

The single best fraud control most businesses miss

Invoice redirection fraud is one of the fastest-growing fraud types affecting UK businesses. The pattern: a fraudster impersonates a real supplier, sends an otherwise valid-looking invoice with a different bank account, and waits for finance to pay. The whitelist check compares the bank account on the invoice with the bank account on the supplier record. If they do not match, the invoice does not reach an approver until finance has confirmed the change by calling the supplier on a phone number already on file, never one taken from the email.
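The comparison itself is simple; the discipline is in what happens on a mismatch. Here is a minimal Python sketch. It assumes UK sort code and account number fields, and the field names are hypothetical. It treats an invoice with no printed bank details as safe, since payment would go to the account on record.

```python
import re
from typing import Dict


def _digits(value: str) -> str:
    # Normalise formatting so "12-34-56" equals "123456" and "1234 5678" equals "12345678".
    return re.sub(r"\D", "", value or "")


def whitelist_check(invoice_bank: Dict[str, str], supplier_record: Dict[str, str]) -> str:
    """Compare bank details printed on the invoice with those held on the supplier record."""
    printed_sort = _digits(invoice_bank.get("sort_code", ""))
    printed_acct = _digits(invoice_bank.get("account_number", ""))
    if not printed_sort and not printed_acct:
        return "ok"  # nothing printed: payment goes to the account on record anyway
    if (printed_sort, printed_acct) != (
        _digits(supplier_record["sort_code"]),
        _digits(supplier_record["account_number"]),
    ):
        # Hold before any approver sees it. Finance calls the supplier on a number
        # already on file, never one from the email, before changing the record.
        return "hold_for_callback"
    return "ok"
```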

Each check that fails sends the invoice to a different review queue. A duplicate goes back to the supplier as a polite query. A bank-account mismatch goes to finance with a high-priority flag. A VAT lookup failure goes to a clarify-with-supplier queue. Failures are not errors; they are the system doing its job.
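One way to express that routing is a lookup from failed check to destination queue. The queue names and priorities below are hypothetical, loosely following the examples above. Anything unmapped falls back to a manual review queue rather than being dropped.

```python
from typing import List, Tuple

# Hypothetical queue names and priorities, one destination per failed check.
QUEUE_FOR_FAILURE = {
    "duplicate": ("supplier_query", "normal"),
    "bank_account_mismatch": ("finance_review", "high"),
    "vat_lookup_failed": ("clarify_with_supplier", "normal"),
    "totals_mismatch": ("extraction_review", "normal"),
    "po_out_of_tolerance": ("po_query", "normal"),
}


def route_failures(failures: List[str]) -> List[Tuple[str, str]]:
    """Return the (queue, priority) destinations for an invoice; empty means go on to approval."""
    routes = {QUEUE_FOR_FAILURE.get(f, ("manual_review", "normal")) for f in failures}
    # High-priority destinations first, so fraud risks are seen before anything else.
    return sorted(routes, key=lambda r: (r[1] != "high", r[0]))
```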

Route: approval is a workflow, not an email thread

Most businesses route invoices by email. The accounts person sees the invoice, asks the budget owner for approval, the budget owner replies, and the accounts person logs it. The audit trail is the inbox, and an inbox is not really an audit trail.

A clean automated route uses a structured approval step. The validated invoice arrives in an approval surface, anything from a Slack message with Approve and Query buttons through to a proper approvals product. The budget owner sees the invoice (the actual PDF, not a paraphrase), approves or queries it, and the system records who approved, when, and on what evidence. If approved, it moves on. If queried, it sits in a “with supplier” state until the query closes.
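Treating approval as a small state machine is what turns it into an audit trail. Here is a minimal Python sketch. The states loosely follow the paragraph above, while the state names, transition table and record fields are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed moves between states (names are illustrative).
TRANSITIONS = {
    "awaiting_approval": {"approved", "with_supplier"},
    "with_supplier": {"awaiting_approval"},  # query closed: back into approval
    "approved": {"posted"},
}


@dataclass
class ApprovalRecord:
    invoice_id: str
    state: str = "awaiting_approval"
    audit: list = field(default_factory=list)

    def act(self, new_state: str, who: str, evidence: str) -> None:
        """Move to a new state, refusing illegal moves and logging every decision."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        # Record who decided, when, and on what evidence.
        self.audit.append({
            "from": self.state,
            "to": new_state,
            "who": who,
            "at": datetime.now(timezone.utc).isoformat(),
            "evidence": evidence,
        })
        self.state = new_state
```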

Two-step approval kicks in above a threshold. Anything above £5,000, or any new supplier, or any change to a supplier’s bank account, requires a second approver. That rule lives in the workflow, not in someone’s head, which is the whole point.
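The rule is short enough to write down exactly, which is the point. Here is a Python sketch. It assumes the £5,000 threshold applies to the gross total, since the text does not say whether VAT is included. It also assumes the second approver must be a different person.

```python
from decimal import Decimal
from typing import Iterable

SECOND_APPROVER_THRESHOLD = Decimal("5000.00")  # "anything above £5,000"


def approvers_required(total: Decimal, new_supplier: bool, bank_details_changed: bool) -> int:
    """One approver normally; two above the threshold, for new suppliers, or for bank changes."""
    if total > SECOND_APPROVER_THRESHOLD or new_supplier or bank_details_changed:
        return 2
    return 1


def is_fully_approved(approvers: Iterable[str], required: int) -> bool:
    # Count distinct people, so one person approving twice is still one approval.
    return len(set(approvers)) >= required
```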

Post: the ledger is the last step, not the first

Once approved, the invoice posts to the ledger with the right GL code, project, and cost centre. The codes can come from rules (supplier X always codes to GL line Y) or from the approver’s choice during the routing step. Either way, the ledger only ever sees clean, validated, approved data.
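Here is a sketch of the coding decision in Python. The rule table, the codes and the precedence are assumptions for illustration; here an approver's explicit choice beats the supplier rule. An invoice with neither a rule nor a choice is refused rather than posted with a guessed code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Coding:
    gl_code: str
    cost_centre: str
    project: Optional[str] = None


# Hypothetical rule table: supplier X always codes to GL line Y.
SUPPLIER_RULES = {
    "acme-supplies": Coding("5000", "OPS"),
    "cloudhost": Coding("7500", "IT"),
}


def coding_for(supplier_id: str, approver_choice: Optional[Coding] = None) -> Coding:
    """Pick the ledger coding for an approved invoice."""
    if approver_choice is not None:
        return approver_choice  # an explicit choice made during approval wins
    rule = SUPPLIER_RULES.get(supplier_id)
    if rule is None:
        # No rule and no choice: refuse to post rather than guess a code.
        raise LookupError(f"no coding rule for supplier {supplier_id!r}")
    return rule
```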

The posting step is also where the invoice file gets attached to the ledger record. This matters for audit. Most ledgers (Xero, QuickBooks, Sage, NetSuite) accept a file attachment via API. The workflow uploads both the original PDF and the structured extraction alongside the journal entry. Six months later, when an auditor asks “show me the source for this expense”, you click one link.
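Here is a sketch of preparing the two attachments in Python. The upload call is left out because each ledger's API differs. The entry ID, filenames and the `source_pdf_sha256` field are hypothetical. Stamping the PDF's hash into the extraction lets an auditor confirm that the attached file is the one the data came from.

```python
import hashlib
import json


def build_attachments(entry_id: str, pdf_bytes: bytes, extraction: dict) -> list:
    """Bundle the original PDF and the structured extraction for upload to the ledger."""
    pdf_hash = hashlib.sha256(pdf_bytes).hexdigest()
    # Link the extraction to the exact file it was read from.
    extraction_doc = dict(extraction, source_pdf_sha256=pdf_hash)
    return [
        {"filename": f"{entry_id}.pdf", "content_type": "application/pdf", "body": pdf_bytes},
        {
            "filename": f"{entry_id}-extraction.json",
            "content_type": "application/json",
            # default=str serialises Decimal amounts as strings, e.g. "120.00"
            "body": json.dumps(extraction_doc, sort_keys=True, default=str).encode("utf-8"),
        },
    ]
```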

What about data protection?

This is a fair question, and one most “automate invoicing” pieces gloss over.

A supplier invoice contains personal data (contact names, sometimes director names, addresses), bank details, and commercial pricing. Under UK GDPR you are the data controller for that information once it reaches you. If your automation calls a public LLM (the free or consumer tier of Claude or ChatGPT), you are sharing that data with a third-party processor. That may be acceptable, depending on the provider’s data handling policies and your own assessment, but it is not a question to skip.

A production build handles this with some combination of four measures:

Strip what the model does not need. Bank details, sort codes, IBANs, and contact names are validated by deterministic rules against your supplier database. The LLM only sees the parts of the invoice it actually has to read (line items, totals, descriptions). If a field can be checked without an LLM, it should be.
Use a model that does not train on your data. The enterprise and API tiers of Claude and ChatGPT both contractually exclude training on your inputs, with a Data Processing Agreement attached. The free consumer tiers do not always offer this. The difference matters.
Or run the model on your own infrastructure. Open-source AI models can run on your own server, which means the data never leaves the network you control. They are slower and less capable than the leading models from Anthropic or OpenAI, but the data sovereignty argument is decisive for some businesses, particularly accountancy practices and HR firms that hold third-party personal data professionally.
Choose UK or EU data residency. Both Anthropic and OpenAI offer enterprise plans that pin processing to UK or EU regions. If your contracts with suppliers or your sector regulator require this, it is non-negotiable.
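The first measure, stripping what the model does not need, can start as a redaction pass before any text leaves your system. Here is a minimal Python sketch. The patterns cover only UK sort codes, labelled eight-digit account numbers and GB IBANs. They are approximations, not a complete personal-data filter, and they err towards over-redacting: a date written as 03-04-26 would also be masked. The deterministic bank-account check still runs on the original text.

```python
import re

# Approximate patterns for UK bank details; not a complete personal-data filter.
IBAN = re.compile(r"\bGB\d{2}\s?[A-Z]{4}(?:\s?\d){14}\b")
SORT_CODE = re.compile(r"\b\d{2}-\d{2}-\d{2}\b")
ACCOUNT_NO = re.compile(r"(?i)\b(account\s*(?:no\.?|number)?\s*:?\s*)\d{8}\b")


def redact_for_llm(text: str) -> str:
    """Mask bank details so the model only sees what it needs to read."""
    text = IBAN.sub("[IBAN]", text)  # first, so its digits are not half-matched below
    text = SORT_CODE.sub("[SORT CODE]", text)
    text = ACCOUNT_NO.sub(r"\1[ACCOUNT]", text)  # keep the label, mask the number
    return text
```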

The right combination depends on the sensitivity of your invoices, your sector, and your appetite for trade-offs between capability and control. There is no single right answer. There is a wrong one: pretending the question does not exist.

This is also why we build automation with n8n on infrastructure you own, rather than on a hosted SaaS that decides for you. The full argument is in n8n vs Zapier on UK data sovereignty.

Where to host this

This pattern can run on any automation platform that can receive emails, talk to other systems, and hold a queue of work. n8n is the obvious choice for UK businesses because it self-hosts, which keeps invoice data on infrastructure you control, and it avoids the per-execution pricing trap that other platforms hit at volume.

You can start small and grow it. The single most common mistake is trying to automate all five stages at once and getting overwhelmed. The second is buying a packaged accounts payable automation product that does all five stages out of the box, but only works the way the vendor decided you should work.


What this leaves you free to do

The work that vanishes when this runs is not glamorous: opening attachments, renaming files, copying numbers from PDFs to spreadsheets, chasing budget owners on Teams, finding last quarter’s audit trail by trawling email. None of it produces revenue. All of it costs time.

What it leaves you free to do is the work that does produce revenue: actually negotiating with suppliers, reviewing margin, planning capacity. The barrier is rarely desire. It is usually knowing where to start.

What to do this week

Three concrete moves.

Count. Open your accounts inbox and count how many invoices came in last month. If the answer is over 30, this is already costing real time.
Categorise. Of those, how many came from the same five suppliers? If more than half, the template-per-supplier approach is enough; you do not need LLM extraction yet.
Pick one stage. Do not try to automate all five at once. Pick the one that costs you the most time today (usually capture or routing) and automate just that. The rest can come later.
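If your invoices already sit in a spreadsheet or an accounts export, the first two moves are a few lines of arithmetic. Here is a Python sketch. It assumes a list of (date, supplier) pairs and uses the thresholds given above.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple


def inbox_summary(invoices: List[Tuple[date, str]], year: int, month: int) -> Dict[str, object]:
    """Count one month's invoices and how concentrated they are among the top five suppliers."""
    suppliers = [s for d, s in invoices if (d.year, d.month) == (year, month)]
    top_five = sum(n for _, n in Counter(suppliers).most_common(5))
    share = top_five / len(suppliers) if suppliers else 0.0
    return {
        "count": len(suppliers),
        "costing_real_time": len(suppliers) > 30,  # move 1: over 30 a month
        "top_five_share": share,
        "templates_enough": share > 0.5,  # move 2: more than half from five suppliers
    }
```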

Once the receiving side is automated, the next layer is sending the right invoices in the first place, and chasing the ones that do not get paid. For those, see automated invoicing for UK small businesses and automate late payment chasing.

The takeaways

Most articles about automated invoicing cover sending. The harder half, where the real time saving lives, is receiving.
Invoice processing is five jobs, not one. Capture, Read, Validate, Route, Post. Each one can be automated independently.
Validate is the stage most homemade automations skip. Duplicate detection, total reconciliation, VAT validity, and the supplier bank-account whitelist are the four checks that pay for the build.
Route should be a structured approval step with an audit trail, not an email thread. Two-step approval kicks in above a threshold or for any new supplier or bank-account change.
Start with one stage, not five. Capture or routing usually drains the most time today. The rest can come later.

How this was written

Drafted by Otto, the Perkins SmartOps AI assistant. Reviewed, edited and published by David Perkins, the human.

Curious how this could work for your business?

Take the 2-minute assessment, or send me an email. I'll come back with something useful, not a sales pitch.
