Scanned Tenders and Handwritten Notes: How AI Processes Difficult Documents

Technology
Team BlackSwanAI · February 12, 2026 · 9 min

OCR (Optical Character Recognition) for tender documents converts scanned PDFs, photographed documents and handwritten notes into machine-readable text. Advanced OCR systems go well beyond simple text recognition: they capture the structure of bills of quantities, recognize tables in scanned documents and process the handwritten additions, stamps, margin notes and corrections that are commonplace in the construction industry. For door manufacturers, window builders and precast plants that work daily with a mix of digital and analog documents, OCR closes the last gap in AI-powered tender analysis.

What Is OCR for Tender Documents?

OCR for tender documents refers to the automated conversion of scanned or photographed tender documents into machine-readable, structured text. Unlike generic OCR that simply extracts text from images, OCR specialized for tenders understands the document structure: position numbers, quantities, units, short text and long text are not only recognized but correctly assigned. The challenge with tender documents is their variety: cleanly scanned bills of quantities, skewed plan scans, PDFs from photographed documents, documents with stamps, handwritten corrections and sticky notes. Each of these formats requires a different processing strategy. Modern OCR systems combine multiple recognition technologies — from classical character recognition through handwriting recognition to AI-powered context analysis that delivers reliable results even with poor image quality.

The Reality: Not All Tenders Are Digital

In an ideal world, all tenders would be cleanly structured GAEB files. The reality looks different:
Scanned existing plans for renovation projects, often decades old, with handwritten dimension entries and corrections.
Photographed site-visit reports with notes on installation situations.
Older tenders archived as scanned PDFs and reused for follow-up projects.
Documents with official stamps, signatures and handwritten notes that may be contractually relevant.

The figures vary by industry, but conversations with door manufacturers, window manufacturers and precast plants show that 10-30% of all tender documents contain scanned components. For renovation projects and existing buildings, the proportion is even higher. A system that can only process clean digital documents ignores a relevant portion of the daily volume.

For public tenders under VOB/VgV, the main documents are typically digital, but appendices, particularly existing plans and expert reports, frequently arrive as scans. For private tenders, the level of digitization is even more variable.

How Advanced OCR Works for Construction Tenders

Processing scanned tender documents follows a multi-stage process:
1. Image preprocessing: Skewed scans are deskewed, contrast and brightness are optimized, and noise is reduced. Multi-page documents are automatically oriented.
2. Layout analysis: Before text is recognized, the system analyzes the page structure: where are tables, headings, body text and illustrations? This structural recognition is critical for correctly capturing BoQ positions.
3. Text recognition: Printed text is recognized with high accuracy. For tenders in German, the models are optimized for technical terminology, so specialist terms like fire protection class, frame dimensions or exposure class are reliably recognized even with poor print quality.
4. Handwriting recognition: Handwritten notes, corrections and additions are recognized separately and marked. Recognition is reliable for legible handwriting; illegible passages are transparently flagged.
5. Stamp and marking recognition: Official stamps, review marks and colored markings are identified and assigned to their context.
6. Quality validation: Recognized texts are checked for plausibility: are position numbers correct, are quantities realistic, do units match the described services?
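
The plausibility checks in step 6 could be sketched roughly as follows. This is a minimal illustration, not BlackSwanAI's actual implementation: the `BoqPosition` record, the position-number pattern, the unit table and the quantity ceiling are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class BoqPosition:
    """Hypothetical record for one recognized BoQ position."""
    number: str       # e.g. "01.02.0030"
    quantity: float
    unit: str         # e.g. "pcs", "m2"
    short_text: str


# Assumed numbering scheme: dot-separated groups of digits.
POSITION_PATTERN = re.compile(r"^\d{1,2}(\.\d{1,4}){1,3}$")

# Assumed mapping from keywords in the short text to sensible units.
EXPECTED_UNITS = {
    "door": {"pcs"},
    "window": {"pcs", "m2"},
    "precast": {"pcs", "m3", "t"},
}

# Arbitrary illustrative ceiling for a single position.
MAX_PLAUSIBLE_QUANTITY = 100_000


def validate_position(pos: BoqPosition) -> list[str]:
    """Return a list of plausibility issues; an empty list means no findings."""
    issues = []
    if not POSITION_PATTERN.match(pos.number):
        issues.append(f"position number '{pos.number}' does not match the expected pattern")
    if not 0 < pos.quantity <= MAX_PLAUSIBLE_QUANTITY:
        issues.append(f"quantity {pos.quantity} is outside the plausible range")
    text = pos.short_text.lower()
    for keyword, units in EXPECTED_UNITS.items():
        if keyword in text and pos.unit not in units:
            issues.append(f"unit '{pos.unit}' is unusual for a {keyword} position")
    return issues


good = BoqPosition("01.02.0030", 12, "pcs", "Fire door T30")
bad = BoqPosition("O1.02.0030", 0, "m3", "Interior door")  # letter O instead of zero
good_issues = validate_position(good)
bad_issues = validate_position(bad)
```

In this sketch the second position triggers all three checks: the misread leading character breaks the number pattern, the quantity is zero, and cubic meters is not a plausible unit for a door. A real system would likely derive such rules from trade-specific catalogs rather than a hard-coded table.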

GAEB + PDF + Scans: Processing the Format Mix

In practice, tender documents rarely consist of just one format. A typical package contains: GAEB files with the structured bill of quantities, PDF documents with building description, contract terms and technical specifications, scanned plans and existing documentation, and occasionally photos from site visits with handwritten notes. BlackSwanAI processes this entire format mix and merges the results into a unified analysis. GAEB files provide the structured BoQ basis, PDFs supplement the technical details and scanned documents close the gaps. The merging is critical: when the GAEB BoQ contains a door position but the fire protection requirement is specified in a scanned addendum, the AI must make this connection. That is exactly what cross-format analysis delivers — all requirements are consolidated position by position, regardless of which document section they originate from.
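
To make the position-by-position consolidation concrete, here is a small sketch under stated assumptions. The tuple layout, the source labels and the attribute names are hypothetical and only illustrate the idea of merging findings from different formats and surfacing disagreements.

```python
from collections import defaultdict

# Hypothetical extraction results: (position number, attribute, value, source).
findings = [
    ("01.02.0030", "item", "interior door", "gaeb"),
    ("01.02.0030", "width_mm", "1010", "gaeb"),
    ("01.02.0030", "fire_protection", "T30", "scan"),  # only in the scanned addendum
    ("01.02.0030", "width_mm", "1135", "scan"),        # disagrees with the GAEB file
]


def consolidate(findings):
    """Group findings by position and report attributes whose sources disagree."""
    merged = defaultdict(lambda: defaultdict(dict))
    for position, attribute, value, source in findings:
        merged[position][attribute][source] = value

    consolidated, conflicts = {}, []
    for position, attributes in merged.items():
        consolidated[position] = {}
        for attribute, by_source in attributes.items():
            values = set(by_source.values())
            if len(values) == 1:
                # All sources agree (or only one source mentions the attribute).
                consolidated[position][attribute] = values.pop()
            else:
                # Disagreement becomes a clarification point, not a silent pick.
                conflicts.append((position, attribute, dict(by_source)))
    return consolidated, conflicts


consolidated, conflicts = consolidate(findings)
```

Here the fire protection requirement from the scanned addendum is attached to the door position from the GAEB file, while the conflicting width is held back as a clarification point instead of being resolved automatically.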

Quality Assurance: How AI Validates OCR Results

OCR is not infallible, particularly with poor scan quality, old documents or hard-to-read handwriting. Quality assurance of OCR results is therefore a central component of the process:
Confidence scores: Every recognized character receives a confidence score. Areas with low scores are flagged and presented to the user for manual review.
Context checking: Recognized values are checked for plausibility. A door height of '21.50 m' instead of '2.15 m' is flagged as a probable OCR error.
Cross-referencing: When the same information appears in multiple document sections, for example dimension specifications in both the BoQ and the plans, the values are compared. Discrepancies are flagged as clarification points.
Transparency: The system always shows the user which information comes from scanned sources and what confidence the recognition has. There are no hidden uncertainties.

This validation approach aims to ensure that OCR-based analyses approach the reliability of analyses of clean digital documents, with the difference that manual review effort arises for low-confidence areas.
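
The confidence and context checks can be illustrated with a short sketch. The threshold, the plausible height range and the `RecognizedValue` structure are assumptions made for this example; the article does not state the values the actual system uses.

```python
from dataclasses import dataclass


@dataclass
class RecognizedValue:
    """Hypothetical OCR result with one confidence score per character."""
    text: str
    char_confidences: list[float]  # each between 0.0 and 1.0


CONFIDENCE_THRESHOLD = 0.85       # assumed cut-off for manual review
DOOR_HEIGHT_RANGE_M = (1.8, 3.5)  # assumed plausible door heights in meters


def review_reasons(value: RecognizedValue) -> list[str]:
    """Return the reasons why a recognized door height needs manual review."""
    reasons = []
    # The weakest character decides whether the whole value is trusted.
    if min(value.char_confidences, default=0.0) < CONFIDENCE_THRESHOLD:
        reasons.append("low confidence")
    try:
        height = float(value.text)
    except ValueError:
        reasons.append("not a number")
        return reasons
    low, high = DOOR_HEIGHT_RANGE_M
    if not low <= height <= high:
        reasons.append("implausible door height")
    return reasons


clean = review_reasons(RecognizedValue("2.15", [0.99, 0.98, 0.97, 0.99]))
shifted = review_reasons(RecognizedValue("21.50", [0.99, 0.99, 0.99, 0.99, 0.99]))
smudged = review_reasons(RecognizedValue("2.15", [0.99, 0.98, 0.60, 0.99]))
```

The '21.50' case shows why both checks are needed: every character was read with high confidence, yet the value is flagged because it falls outside the assumed range for a door height.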

Industry Examples: Doors, Windows, Precast — Real Document Challenges

Door manufacturers: A renovation project in an existing building, where the existing plans are scanned drawings from 1975 with handwritten dimension corrections. The AI recognizes the opening dimensions, marks handwritten corrections as 'verified via handwriting' and warns where plan quality does not permit reliable dimension extraction. Result: sales immediately knows where an on-site survey is necessary.

Window manufacturers: A private tender is submitted as a photographed document of 50 pages, partly at an angle. The OCR deskews the images, recognizes the BoQ positions with U-value requirements and sound insulation classes, and flags three positions where image quality does not permit reliable number recognition. Result: 95% of positions are immediately analyzable, 5% require follow-up.

Precast plant: Older tender documents from a public client, partly as scanned PDFs with official stamps and review marks. The OCR recognizes the BoQ structure, identifies stamps as contractually relevant notes and extracts the precast positions with exposure classes and strength specifications. Result: despite poor scan quality, the initial analysis is ready in minutes.

Frequently Asked Questions

How well does OCR work with handwritten notes?
Recognition accuracy depends on the handwriting: for legible handwriting, numbers and short notes are reliably recognized. For hard-to-read passages, the system flags the affected areas and shows the user the original image for manual review. In every case, it is transparent what was automatically recognized and what should be manually checked.
Are original scans stored?
All original documents are stored in accordance with GDPR standards and used exclusively for your analysis.
How does the system handle poor scan quality?
The system uses image preprocessing (deskewing, contrast optimization, noise reduction) to extract usable results even from poor scans. For extremely poor quality, this is transparently communicated so the user can decide whether a new scan or manual entry is more practical.
Does OCR also work for foreign-language documents?
The system is primarily optimized for German and English tender documents. The 5-lens analysis is supported in both languages.
Can GAEB files contain scanned appendices?
Yes, in practice GAEB files are often supplemented by scanned appendices. The system processes the format mix and merges the results.

Conclusion

Not all tenders arrive as clean digital files — scanned PDFs, handwritten notes and stamps are part of everyday life in the construction industry. Advanced OCR technology closes this gap and makes even difficult documents accessible for AI-powered tender analysis. With transparent quality assurance and cross-format analysis, door manufacturers, window builders and precast plants receive a complete initial assessment — regardless of whether the documents arrive as GAEB files, PDFs or scanned documents. Test the processing of your tender documents at /en/kostenlose-analyse.

Try it on your own tender

Upload a tender document and get a free Tender Dossier within 48 hours — no risk, no registration.