Day with AI: 8 Models, a Pile of Receipts, and the Answer to "Which AI Reads Your Invoices Best".
Compared 8 AI models on real invoices and receipts. Plus cold email pipeline and blog automation. 1 commit, 3 sessions.

What I worked on
Tuesday. Quieter day — one big task instead of twenty small ones.
What I did
AI model benchmark for invoice extraction
A client project needed to pick the best AI model for reading invoices and receipts. So I took real documents and ran 8 different models on them, from the premium tier down to the cheapest ones.
What I tested:
- Extraction accuracy (tax IDs, amounts, VAT, line items)
- How they handle poor scans vs clean PDFs
- Speed and cost per request
- Edge cases — missing fields, non-standard formats
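The post doesn't include the benchmark harness itself. As an illustration only, here is a minimal sketch of field-level accuracy scoring. It assumes each model's output and the hand-checked ground truth are flat dicts mapping field name to raw string. The field names (`tax_id`, `total`, `vat`) and the amount-normalization rules are hypothetical, not the client project's schema:

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical field list; a real invoice schema would also cover line items.
FIELDS = ["tax_id", "total", "vat"]
AMOUNT_FIELDS = {"total", "vat"}


def parse_amount(raw: str) -> Optional[Decimal]:
    """Parse '1 234,50', '1.234,50' or '1,234.50' into Decimal('1234.50')."""
    s = raw.replace(" ", "").replace("\u00a0", "")
    if "," in s and "." in s:
        # Whichever separator comes last is treated as the decimal point.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif "," in s:
        # Assumption: a lone comma is a decimal comma, not a thousands separator.
        s = s.replace(",", ".")
    try:
        return Decimal(s).quantize(Decimal("0.01"))
    except InvalidOperation:
        return None


def normalize(field: str, value: Optional[str]) -> Optional[str]:
    """Strip formatting differences so they don't count as extraction errors."""
    if value is None:
        return None
    value = value.strip()
    if field == "tax_id":
        return value.replace(" ", "").replace("-", "").upper()
    if field in AMOUNT_FIELDS:
        amount = parse_amount(value)
        # Keep unparsable text as-is so it can't accidentally match a missing field.
        return str(amount) if amount is not None else value
    return value


def field_accuracy(predictions: list, truths: list) -> dict:
    """Per-field share of documents where the model matches the ground truth."""
    if not truths:
        return {field: 0.0 for field in FIELDS}
    scores = {}
    for field in FIELDS:
        hits = sum(
            normalize(field, p.get(field)) == normalize(field, t.get(field))
            for p, t in zip(predictions, truths)
        )
        scores[field] = hits / len(truths)
    return scores
```

Normalizing before comparing matters here. Without it, a model that returns `1 234,50` for an amount labelled `1234.50` would be scored as wrong even though it read the invoice correctly.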
Result? On simple documents, the differences between models are surprisingly small. But once you throw in a bad scan or a non-standard layout, the premium models win by a mile. I wrapped it all up in a complete comparison report with an HTML visualization.
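A stdlib-only sketch of the kind of HTML report described above. It assumes per-model results have already been aggregated into dicts with the hypothetical keys `model`, `accuracy` (0 to 1), `latency_s` and `cost_usd`. The real report's layout and metrics aren't shown in the post:

```python
import html


def render_report(results: list) -> str:
    """Render benchmark results as a self-contained HTML page, best accuracy first."""
    rows = sorted(results, key=lambda r: r["accuracy"], reverse=True)
    body = []
    for r in rows:
        # Bar width in the last column is proportional to accuracy (0.0 to 1.0).
        width = round(r["accuracy"] * 100)
        body.append(
            "<tr>"
            f"<td>{html.escape(r['model'])}</td>"
            f"<td>{r['accuracy']:.1%}</td>"
            f"<td>{r['latency_s']:.2f} s</td>"
            f"<td>${r['cost_usd']:.4f}</td>"
            f'<td><div style="background:#4a90d9;width:{width}%">&nbsp;</div></td>'
            "</tr>"
        )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<title>Invoice extraction benchmark</title></head><body>"
        "<table><thead><tr><th>Model</th><th>Accuracy</th>"
        "<th>Latency / request</th><th>Cost / request</th><th></th></tr></thead>"
        "<tbody>" + "".join(body) + "</tbody></table></body></html>"
    )
```

Writing the returned string to a `.html` file gives a single page that can be sent to a client as-is, with no plotting library or server needed. Model names are escaped, so odd characters in a name can't break the markup.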
Cold email pipeline
Morning session on the cold outreach pipeline: tuning the automation and the flow.
Blog automation
Set up automatic "Day with AI" post generation, so at the end of each day a blog post writes itself from the source data (git log, daily log, sessions). Meta, I know.
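The post doesn't describe how the generator is built. Here is a minimal sketch of the "collect source data" step. It assumes the daily log is a single file and the session notes are `*.md` files in one directory. The file layout, function names and prompt wording are hypothetical, and the call to whichever model actually writes the post is left out. Only the `git log` flags are standard git:

```python
import subprocess
from pathlib import Path


def todays_commits(repo: str = ".") -> str:
    """One line per commit since midnight; empty string if git fails or repo is invalid."""
    try:
        result = subprocess.run(
            ["git", "-C", repo, "log", "--since=midnight", "--pretty=format:%h %s"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout.strip()


def build_prompt(commits: str, daily_log: str, sessions: list) -> str:
    """Bundle the day's raw material into one prompt (hypothetical wording)."""
    parts = [
        "Write a short 'Day with AI' blog post from the material below.",
        "## Commits\n" + (commits or "(none)"),
        "## Daily log\n" + daily_log.strip(),
    ]
    for i, text in enumerate(sessions, start=1):
        parts.append(f"## Session {i}\n{text.strip()}")
    return "\n\n".join(parts)


def gather(log_file: Path, sessions_dir: Path, repo: str = ".") -> str:
    """Read the (hypothetical) daily log and session notes, then build the prompt."""
    sessions = [p.read_text() for p in sorted(sessions_dir.glob("*.md"))]
    return build_prompt(todays_commits(repo), log_file.read_text(), sessions)
```

`todays_commits` returns an empty string instead of raising when git is missing or the path isn't a repository, so a day without commits still produces a post.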
Time: AI vs without AI
| Task | Without AI | With AI |
|---|---|---|
| Benchmark 8 models on real data | 3-4 days | ~4h |
| Comparison report + HTML visualization | 1 day | ~30min |
| Cold email pipeline tuning | n/a | ~1h |
| Blog automation | 1 day | ~1h |
| Total | ~25h | ~6.5h |
What I learned
- Cheap models do fine on clean documents — premium is worth it only for messy stuff (bad scans, non-standard formats)
- Automated benchmark with HTML output is a great client deliverable — they see data, not just "trust me"
- Blog automation is a perfect example of AI saving time on things you just wouldn't do otherwise
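The first lesson suggests a cost pattern: send every document to a cheap model and escalate to a premium one only when the result looks wrong. The post doesn't say such a router exists. This is a hypothetical sketch in which the two extractor callables stand in for real API calls. "Looks wrong" is approximated by a required-fields check plus net + VAT = total within one cent, assuming amounts arrive as plain decimal strings or numbers:

```python
from decimal import Decimal
from typing import Callable, Tuple

# Hypothetical signature: raw document bytes -> dict of extracted fields.
Extractor = Callable[[bytes], dict]

REQUIRED = ("tax_id", "net", "vat", "total")


def looks_valid(fields: dict) -> bool:
    """Cheap sanity checks that need no ground truth."""
    # Missing or empty required fields are a typical bad-scan symptom.
    if any(fields.get(k) in (None, "") for k in REQUIRED):
        return False
    try:
        net, vat, total = (Decimal(str(fields[k])) for k in ("net", "vat", "total"))
    except ArithmeticError:  # decimal.InvalidOperation subclasses ArithmeticError
        return False
    # Net + VAT should equal the gross total, allowing one cent of rounding.
    return abs(net + vat - total) <= Decimal("0.01")


def extract(doc: bytes, cheap: Extractor, premium: Extractor) -> Tuple[dict, str]:
    """Try the cheap model first; call the premium model only if validation fails."""
    result = cheap(doc)
    if looks_valid(result):
        return result, "cheap"
    return premium(doc), "premium"
```

If messy documents are the minority, most requests would stay on the cheap model, and the arithmetic check catches one common failure (a misread amount) without needing ground truth.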
Interested in the article?
Let's discuss what this kind of automation can do in your company.