Arsalan Younus.

Medical Document Entity Extraction

AI-powered extraction that raised accuracy from 84% to 88% and removed weekly retraining cycles, cutting manual document review overhead by around 60%.

The Business Problem

Medical and insurance teams were spending hours manually extracting data from documents that mix printed forms, handwritten notes, and messy multi-page layouts. Critical fields were missed, and the previous system required weekly fine-tuning just to keep up with new form types, with accuracy stuck at 84%.

The client needed a production-ready pipeline that handles varied document layouts without constant model retraining, while pushing accuracy past 84% and reducing operational overhead.

The Technical Solution

I built a multimodal extraction pipeline using LLMs (OpenAI, QWEN3) orchestrated with LangGraph. The system ingests raw document images and layout metadata, runs structured extraction with fallbacks for low-confidence regions, and resolves multi-page entities (patient name, policy number) consistently across pages.
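As a rough illustration of the multi-page resolution step, here is a minimal sketch that picks one value per field by summing confidence across pages. It is not the production code: the `PageExtraction` record, the `resolve_entities` helper, the field names, and the 0.5 confidence floor are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageExtraction:
    """One extracted field from one page (hypothetical record shape)."""
    page: int
    field: str
    value: str
    confidence: float


def resolve_entities(extractions, min_confidence=0.5):
    """Return one value per field, chosen by total confidence across pages."""
    scores = defaultdict(lambda: defaultdict(float))
    spelling = {}
    for ex in extractions:
        if ex.confidence < min_confidence:
            # Assumed behavior: low-confidence reads are left out of the vote
            # (in a real pipeline they would go to a fallback path instead).
            continue
        value = " ".join(ex.value.split())  # collapse stray whitespace
        key = value.casefold()              # compare case-insensitively
        scores[ex.field][key] += ex.confidence
        spelling.setdefault((ex.field, key), value)  # keep first-seen spelling
    return {
        field: spelling[(field, max(candidates, key=candidates.get))]
        for field, candidates in scores.items()
    }


pages = [
    PageExtraction(1, "patient_name", "Jane Doe", 0.9),
    PageExtraction(2, "patient_name", "jane  doe", 0.8),
    PageExtraction(3, "patient_name", "Jane Dae", 0.95),
    PageExtraction(1, "policy_number", "PN-123", 0.4),
]
resolved = resolve_entities(pages)
```

In this example the two matching reads of the patient name outvote the single higher-confidence misread on page 3, and the policy number is left unresolved because its only read falls below the floor.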

Output passes through normalization and validation gates before downstream systems consume it, so bad extractions are caught in the pipeline, not in production databases.
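A normalization and validation gate of this kind could look roughly like the sketch below. The field names, the `AB-123456` policy number format, and the ISO date requirement are hypothetical stand-ins, not the client's actual schema.

```python
import re
from datetime import datetime

# Hypothetical policy number format: two letters, a dash, six digits.
POLICY_PATTERN = re.compile(r"[A-Z]{2}-\d{6}")


def normalize(record):
    """Trim and canonicalize raw extracted strings."""
    return {
        "patient_name": " ".join((record.get("patient_name") or "").split()),
        "policy_number": (record.get("policy_number") or "").strip().upper(),
        "date_of_service": (record.get("date_of_service") or "").strip(),
    }


def validate(record):
    """Return a list of human-readable problems; empty means the record passes."""
    errors = []
    if not record["patient_name"]:
        errors.append("patient_name is empty")
    if not POLICY_PATTERN.fullmatch(record["policy_number"]):
        errors.append("policy_number has unexpected format")
    try:
        datetime.strptime(record["date_of_service"], "%Y-%m-%d")
    except ValueError:
        errors.append("date_of_service is not YYYY-MM-DD")
    return errors


def gate(record):
    """Normalize, then validate; callers forward the record only if errors is empty."""
    clean = normalize(record)
    return clean, validate(clean)


good, good_errors = gate({
    "patient_name": "  Jane   Doe ",
    "policy_number": " ab-123456",
    "date_of_service": "2024-03-01",
})
bad, bad_errors = gate({
    "patient_name": "",
    "policy_number": "123",
    "date_of_service": "03/01/2024",
})
```

Returning the error list instead of raising lets the caller decide whether a failed record goes to manual review or back through extraction.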

The Scalability Factor

The pipeline ships via Docker containers on AWS with GitHub Actions CI/CD. Every model or prompt change goes through automated build and staged deployment before reaching production traffic.

LangGraph orchestration includes provider fallbacks (OpenAI, QWEN3) so a single API outage does not halt extraction. Output validation gates and monitoring of extraction confidence scores catch regressions before they reach downstream systems.
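The provider fallback idea can be sketched in plain Python as below. This deliberately does not use LangGraph's API. The `extract_with_fallback` helper and the stub provider functions are hypothetical; in practice each callable would wrap a real model client.

```python
class ProviderError(Exception):
    """Raised when every configured provider has failed."""


def extract_with_fallback(document, providers):
    """Try each (name, callable) pair in order; return (name, result) from the first success."""
    failures = []
    for name, call in providers:
        try:
            return name, call(document)
        except Exception as exc:  # any provider failure triggers the next option
            failures.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(failures))


# Stub providers standing in for real API clients (assumptions for the example).
def primary_provider(document):
    raise TimeoutError("simulated outage")


def secondary_provider(document):
    return {"patient_name": "Jane Doe"}


used, result = extract_with_fallback(
    "page-1.png",
    [("primary", primary_provider), ("secondary", secondary_provider)],
)
```

Recording which provider produced each result makes it possible to track accuracy and confidence separately per provider.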

Business Impact

Extraction accuracy improved from 84% to 88%.

Weekly fine-tuning cycles were eliminated; the system generalizes to new form types with fewer updates. Manual review time dropped by around 60%.

Built with

OpenAI
QWEN3
LangGraph
LLMs
AWS
Docker
Python