Arsalan Younus.
Solutions

What I Deliver

Production AI across document intelligence, agentic RAG, and cloud infrastructure — measurable outcomes, not slide decks.

Document Intelligence & Extraction

Automate document intake from scanned images to structured data, without weekly model retraining. Production pipelines that handle messy layouts, handwriting, and multi-page forms at scale.

  • OCR and layout parsing for invoices, forms, and ID documents
  • 88%+ field-level accuracy on production document volumes
PyTorch
Computer Vision
OCR
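
To make "field-level accuracy" concrete, here is a minimal sketch of one common way to score extraction output: the share of ground-truth fields whose extracted value matches after trimming whitespace and ignoring case. The function name `field_accuracy`, the matching rule, and the sample documents are illustrative assumptions, not the production evaluation.

```python
from typing import Mapping, Sequence


def field_accuracy(predicted: Sequence[Mapping[str, str]],
                   expected: Sequence[Mapping[str, str]]) -> float:
    """Fraction of ground-truth fields extracted correctly.

    A field counts as correct when the predicted value equals the expected
    value after stripping whitespace and lowercasing (an assumed rule).
    Missing fields count as wrong.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must cover the same documents")
    total = 0
    correct = 0
    for pred_doc, true_doc in zip(predicted, expected):
        for field, true_value in true_doc.items():
            total += 1
            pred_value = pred_doc.get(field, "")
            if pred_value.strip().lower() == true_value.strip().lower():
                correct += 1
    return correct / total if total else 0.0


# Hypothetical two-document sample, not real production data.
expected = [
    {"invoice_no": "INV-001", "total": "120.50"},
    {"invoice_no": "INV-002", "total": "89.00"},
]
predicted = [
    {"invoice_no": "inv-001 ", "total": "120.50"},  # matches after normalization
    {"invoice_no": "INV-002", "total": "98.00"},    # digits transposed: wrong
]
score = field_accuracy(predicted, expected)
print(f"{score:.2f}")  # 3 of 4 fields correct -> 0.75
```

Stricter or looser matching (for example, numeric tolerance on amounts) changes the number, so the matching rule should be agreed on before any accuracy target is quoted.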

Agentic AI & RAG Systems

Replace hours of manual research with natural-language interfaces grounded in your data. Chatbots and agentic workflows that query, analyze, and answer complex multi-step questions from a single conversation.

  • RAG over private docs, databases, and APIs
  • Multi-step agent workflows with LangGraph
LLMs & RAG
LangGraph
Python

Production AI Infrastructure

Take AI from prototype to production with CI/CD, containerized deployments on AWS/Kubernetes, and self-hosted LLM serving, so your models ship reliably and stay running.

  • Containerized services on AWS EKS with rolling deploys
  • Self-hosted LLM serving (two instances in production)
AWS & Kubernetes
CI/CD (Jenkins, GitHub Actions)
Docker

Computer Vision & Detection

Custom vision models for detection, localization, and classification on real-world imagery, from handwriting regions on forms to real-time object detection in video streams.

  • Handwriting localization and noise classification on scanned docs
  • YOLO-based detection with 90%+ accuracy in production
YOLO
OpenCV
PyTorch

Self-Hosted LLM & GPU Inference

Run open-weight models on your own hardware for lower cost, lower latency, and full data privacy. Tuned serving with KV cache and prefix caching on dedicated GPUs.

  • Production serving on RTX 4090 and A100 80GB hardware
  • KV cache and prefix caching for throughput gains
LLMs & RAG
GPU Inference
Docker
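
As a rough illustration of why prefix caching helps, the toy sketch below tracks which token-block prefixes have already been processed, so a second request that shares a system prompt only pays for its new tokens. The `PrefixCache` class, the block size, and the integer token IDs are all hypothetical; real serving engines cache the attention key/value tensors for each block rather than a simple marker.

```python
from typing import List, Set, Tuple

BLOCK_SIZE = 4  # tokens per cached block; arbitrary value for illustration


class PrefixCache:
    """Toy prefix cache: records which block-aligned prefixes were seen."""

    def __init__(self) -> None:
        # Each entry is the full token prefix up to a block boundary.
        # Simple but memory-hungry; fine for a sketch.
        self._seen: Set[Tuple[int, ...]] = set()

    def process(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (reused_tokens, computed_tokens) for one request."""
        reused = 0
        still_matching = True
        full = len(tokens) - len(tokens) % BLOCK_SIZE  # only whole blocks are cached
        for end in range(BLOCK_SIZE, full + 1, BLOCK_SIZE):
            key = tuple(tokens[:end])
            if still_matching and key in self._seen:
                reused = end  # this block's work can be skipped
            else:
                still_matching = False
                self._seen.add(key)  # cache it for later requests
        return reused, len(tokens) - reused


cache = PrefixCache()
system_prompt = list(range(8))  # stand-in for a shared 8-token system prompt

first = cache.process(system_prompt + [100, 101, 102])
second = cache.process(system_prompt + [200, 201, 202, 203, 204])
print(first)   # (0, 11): nothing cached yet
print(second)  # (8, 5): the shared prompt is reused
```

The gain scales with how much of each request is shared, which is why long system prompts and repeated document context benefit most.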

Model Benchmarking & Selection

Compare cloud APIs and open-source models on your actual workload before committing. Data-driven picks on price, accuracy, and latency so you do not overpay for the wrong model.

  • Benchmarked Azure OpenAI, AWS Bedrock, Qwen, and Llama 4
  • Price vs accuracy trade-offs on real production tasks
LLMs & RAG
AWS SageMaker
Python
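
One simple way to frame the price, accuracy, and latency trade-off is to discard any model that another model beats on every axis, then choose among the rest. The sketch below shows that filter; the `Result` fields, the model names, and every number are placeholders for illustration, not benchmark results.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    name: str
    cost_per_1k: float     # USD per 1,000 requests (placeholder unit)
    accuracy: float        # task accuracy in [0, 1]
    p95_latency_s: float   # 95th percentile latency in seconds


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    no_worse = (a.cost_per_1k <= b.cost_per_1k
                and a.accuracy >= b.accuracy
                and a.p95_latency_s <= b.p95_latency_s)
    strictly_better = (a.cost_per_1k < b.cost_per_1k
                       or a.accuracy > b.accuracy
                       or a.p95_latency_s < b.p95_latency_s)
    return no_worse and strictly_better


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep only the models that no other model dominates."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


# Placeholder numbers only; real values come from running your own workload.
results = [
    Result("model_a", cost_per_1k=12.0, accuracy=0.93, p95_latency_s=2.1),
    Result("model_b", cost_per_1k=3.0, accuracy=0.90, p95_latency_s=0.9),
    Result("model_c", cost_per_1k=5.0, accuracy=0.88, p95_latency_s=1.4),
]
front = pareto_front(results)
print([r.name for r in front])  # ['model_a', 'model_b']
```

Here `model_c` drops out because `model_b` is cheaper, more accurate, and faster; choosing between the two survivors is then a business decision about how much the extra accuracy is worth.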

AI Architecture & Team Leadership

Technical leadership for GenAI teams: product direction, sprint delivery, and the architecture decisions that keep AI initiatives shipping instead of stalling in POC limbo.

  • Led GenAI team delivering 10+ production ML systems
  • Sprint planning, boards, and delivery rhythm across products
LangGraph
FastAPI
AWS

How I work

From first call to production, with a process tuned for AI work, where data and deployment matter as much as the model.

Step 1

Discover

Map your data, constraints, and success metrics before writing code.

Step 2

Build

Iterate in tight loops with working prototypes you can evaluate early.

Step 3

Ship

Deploy to production with monitoring, CI/CD, and handoff documentation.

Beyond work

Open source, ML reading, travel, and gaming.

Open Source
Tech Reading
Traveling
Gaming

Ready to build something?

See production case studies, or get in touch to discuss your project.