
How AI Document Classification Turns Piles of Mail into Actionable Data in Seconds
Dive into the OCR and transformer models Alfred uses to route 2,000 letters/min with 97% accuracy.
Manual tagging is dead. Modern mailrooms plug straight into business apps via AI…
1. From Scanner to Tokeniser
The journey from physical letter to actionable data begins with high-resolution scanning. Swiss Post's ePost service captures documents at 300 DPI, creating TIFF images that preserve every detail. But raw images are just the beginning.
Advanced OCR for Swiss Documents
Alfred employs a Tesseract-based OCR engine specifically tuned for Swiss requirements:
- Fraktur Support: Historical German script recognition for older documents
- Swiss-German Street Names: Custom dictionary for "Strasse/Straße" variations
- Multi-language Detection: Automatic language identification for German, French, Italian, and Romansh
- Layout Analysis: Column detection, table extraction, and form field recognition
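As a rough illustration of the street-name dictionary idea, the sketch below folds "Straße" and "Str." spellings into the Swiss "strasse" form after OCR. The function name `normalise_street` and the regular expression are illustrative assumptions, not Alfred's actual dictionary.

```python
import re

# Hypothetical pattern: street suffix variants at the end of a word.
_SUFFIX_VARIANTS = re.compile(r"(stra(?:ß|ss)e|str\.)(?=\s|$)", re.IGNORECASE)


def normalise_street(name: str) -> str:
    """Map 'Straße' and 'Str.' variants to the Swiss 'strasse' spelling."""

    def _replace(match: re.Match) -> str:
        # Keep a leading capital when the suffix is a standalone word ("Zürcher Straße").
        starts_upper = match.group(0)[0].isupper()
        return "Strasse" if starts_upper else "strasse"

    return _SUFFIX_VARIANTS.sub(_replace, name.strip())


for raw in ["Bahnhofstraße 12", "Bahnhofstr. 12", "Zürcher Straße"]:
    print(raw, "->", normalise_street(raw))
```

A lookup like this would typically run on OCR output before address matching, so that all spellings of one street resolve to the same key.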
OCR Performance Metrics:
- Character accuracy: 99.8% for printed text
- Handwriting recognition: 94% accuracy
- Processing speed: 0.3 seconds per page
- Language detection: 99.5% accuracy
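To see what 0.3 seconds per page means next to the 2,000 letters/min headline, here is a back-of-the-envelope capacity calculation. It assumes single-page letters and perfectly parallel OCR workers, neither of which the figures above state.

```python
import math

LETTERS_PER_MINUTE = 2000   # headline throughput from the article
OCR_MS_PER_PAGE = 300       # 0.3 seconds per page, expressed in milliseconds
PAGES_PER_LETTER = 1        # assumption: single-page letters


def ocr_workers_needed(letters_per_minute: int, ms_per_page: int, pages_per_letter: int) -> int:
    """Minimum number of parallel OCR workers needed to keep up with the scanner."""
    # Total OCR work (in ms) that arrives during one minute of wall-clock time.
    busy_ms_per_minute = letters_per_minute * pages_per_letter * ms_per_page
    # Each worker can supply 60,000 ms of work per minute.
    return math.ceil(busy_ms_per_minute / 60_000)


workers = ocr_workers_needed(LETTERS_PER_MINUTE, OCR_MS_PER_PAGE, PAGES_PER_LETTER)
print(f"Parallel OCR workers needed: {workers}")
```

Under these assumptions, a single OCR process handles 200 pages per minute, so sustaining 2,000 letters per minute takes about ten of them running side by side.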
2. Transformer Magic
Once text is extracted, Alfred's AI brain takes over. We use a fine-tuned BERT-like transformer model that understands the nuances of business correspondence.
Model Architecture
Technical Specifications:
- Base Model: Swiss-BERT (multilingual)
- Parameters: 110M, fine-tuned on 2M Swiss documents
- Input: 512-token sequences
- Output: 15 document classes + confidence scores
- Inference time: <50 ms per document
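The "15 document classes + confidence scores" output can be pictured as a softmax over 15 raw scores (logits). The sketch below assumes that is how the confidence is derived; the label names beyond invoice, legal, and personal are placeholders, and the logits are made up for illustration.

```python
import math

# Only three class names appear in the article; the rest are hypothetical placeholders.
LABELS = ["invoice", "legal", "personal"] + [f"class_{i:02d}" for i in range(3, 15)]


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the most likely label and its probability as the confidence score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


example_logits = [4.0, 0.5, 0.2] + [0.0] * 12  # illustrative values only
label, confidence = predict(example_logits, LABELS)
print(label, round(confidence, 3))
```

Whatever the real scoring head looks like, the key property is the same: one label wins, and its probability is the number the routing rules later compare against their thresholds.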
The model's precision, recall, and F1-score by document type:
| Document Type | Precision | Recall | F1-Score |
|---|---|---|---|
| Invoice | 98.5% | 97.8% | 98.1% |
| Legal | 96.2% | 95.9% | 96.0% |
| Personal | 97.1% | 96.5% | 96.8% |
| Overall | 97.3% | 96.7% | 97.0% |
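The F1 column can be sanity-checked from the other two, assuming it is the usual harmonic mean of precision and recall. The short script below recomputes it for each row of the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
ROWS = {
    "Invoice": (98.5, 97.8, 98.1),
    "Legal": (96.2, 95.9, 96.0),
    "Personal": (97.1, 96.5, 96.8),
    "Overall": (97.3, 96.7, 97.0),
}

for name, (precision, recall, reported) in ROWS.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed:.1f}%, reported = {reported}%")
```

All four rows agree with the reported values to one decimal place.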
AI-powered mailrooms cut processing time by 40% on average in 2025 [packagex.io], [metasource.com].
3. Confidence Thresholds & Fallbacks
Not every prediction is equally reliable, so Alfred routes each document according to the model's confidence score:
Confidence Handling:
- Above 0.95: Automatic routing, no human review needed
- 0.80 to 0.95: Route with flag for optional review
- Below 0.80: Hold for manual classification
Below 0.80 confidence, Alfred leaves the document unrouted and asks the user for feedback. Those corrections feed nightly retraining through active learning.
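The three confidence bands translate directly into a small decision function. This is a minimal sketch: the function and return-value names are invented for illustration, and it assumes a score of exactly 0.95 or 0.80 falls into the middle band.

```python
AUTO_ROUTE_ABOVE = 0.95  # above this: route automatically
HOLD_BELOW = 0.80        # below this: hold for manual classification


def routing_decision(confidence: float) -> str:
    """Map a confidence score in [0, 1] to one of the three handling bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    if confidence > AUTO_ROUTE_ABOVE:
        return "auto_route"
    if confidence >= HOLD_BELOW:
        return "route_with_flag"
    return "hold_for_manual"


for score in (0.973, 0.90, 0.62):
    print(score, "->", routing_decision(score))
```

Keeping the two thresholds as named constants makes it easy to tighten or relax the bands without touching the routing logic.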
Continuous Learning Pipeline
1. User corrects a misclassified document
2. Feedback is logged with document features
3. Nightly batch retraining on corrections
4. Model validation on a holdout set
5. Deployment if performance improves
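Steps 4 and 5 amount to a promotion gate: the retrained model replaces the live one only if it scores better on held-out data. The sketch below uses toy keyword classifiers and a four-document holdout set, all hypothetical, to show the shape of that check.

```python
from typing import Callable

Classifier = Callable[[str], str]


def holdout_accuracy(classify: Classifier, holdout: list[tuple[str, str]]) -> float:
    """Share of holdout documents whose predicted label matches the true label."""
    correct = sum(1 for text, label in holdout if classify(text) == label)
    return correct / len(holdout)


def should_deploy(current_acc: float, candidate_acc: float, min_gain: float = 0.0) -> bool:
    """Promote the candidate only if it beats the live model by more than min_gain."""
    return candidate_acc > current_acc + min_gain


# Hypothetical holdout set and toy stand-ins for the live and retrained models.
HOLDOUT = [
    ("Rechnung Nr. 1", "invoice"),
    ("Vertrag über Miete", "legal"),
    ("Rechnung Nr. 2", "invoice"),
    ("Liebe Anna", "personal"),
]


def live_model(text: str) -> str:
    return "invoice"


def retrained_model(text: str) -> str:
    return "legal" if "Vertrag" in text else "invoice"


live_acc = holdout_accuracy(live_model, HOLDOUT)
new_acc = holdout_accuracy(retrained_model, HOLDOUT)
print(f"live={live_acc:.2f} retrained={new_acc:.2f} deploy={should_deploy(live_acc, new_acc)}")
```

A non-zero `min_gain` is one way to avoid swapping models over differences that are within noise on a small holdout set.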
API Integration
Alfred's classification engine is accessible via a simple REST API:

    POST /v1/documents/1234/classify
    {
      "document_id": "1234",
      "extracted_text": "Rechnung Nr. 2025-001...",
      "language": "de",
      "confidence_threshold": 0.80
    }

Response:

    {
      "classifications": [
        {
          "type": "invoice",
          "confidence": 0.973,
          "sub_type": "supplier_invoice",
          "entities": {
            "vendor": "Swisscom AG",
            "amount": 249.50,
            "currency": "CHF",
            "due_date": "2025-02-15"
          }
        }
      ],
      "processing_time_ms": 47,
      "model_version": "2.3.1"
    }

Real-World Performance
Headline metrics: letters per minute, classification accuracy, and average processing time.
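For readers who want to track these three metrics themselves, they can be derived from a log of classification responses. The sketch below uses the `classifications`, `confidence`, and `processing_time_ms` fields from the API response shown earlier; the sample records, the ground-truth labels, and the two-second window are invented for illustration.

```python
from statistics import mean

# Hypothetical response log, trimmed to the fields used here.
RESPONSES = [
    {"classifications": [{"type": "invoice", "confidence": 0.973}], "processing_time_ms": 47},
    {"classifications": [{"type": "legal", "confidence": 0.961}], "processing_time_ms": 45},
    {"classifications": [{"type": "invoice", "confidence": 0.842}], "processing_time_ms": 52},
    {"classifications": [{"type": "personal", "confidence": 0.957}], "processing_time_ms": 48},
]
TRUE_TYPES = ["invoice", "legal", "personal", "personal"]  # hypothetical ground truth
WINDOW_SECONDS = 2  # hypothetical wall-clock time covered by the log


def top_type(response: dict) -> str:
    """Document type of the highest-confidence classification in one response."""
    best = max(response["classifications"], key=lambda c: c["confidence"])
    return best["type"]


def summarise(responses: list[dict], true_types: list[str], window_seconds: float) -> dict:
    """Compute throughput, accuracy, and mean processing time for a batch."""
    hits = sum(top_type(r) == t for r, t in zip(responses, true_types))
    return {
        "letters_per_minute": len(responses) * 60 / window_seconds,
        "classification_accuracy": hits / len(responses),
        "average_processing_ms": mean(r["processing_time_ms"] for r in responses),
    }


stats = summarise(RESPONSES, TRUE_TYPES, WINDOW_SECONDS)
print(stats)
```

Accuracy needs ground-truth labels, which in practice would come from the user corrections and manual classifications described in the confidence section.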
The Future of Intelligent Mail Processing
As we look ahead, Alfred's AI capabilities continue to evolve:
- Vision Transformers: Next-gen models that understand document layouts without OCR
- Few-shot Learning: Adapt to new document types with just 10 examples
- Explainable AI: Show exactly why a document was classified a certain way
- Multimodal Processing: Combine text, layout, and visual features for 99%+ accuracy
Ready to Automate Your Mailroom?
See Alfred's AI classification in action with your own documents