Amrood Labs
MARIA - AI-Powered Government Transparency Platform
Led team of 6 developers building an LLM-powered government transparency platform with a full RAG pipeline — document intelligence via Claude 3.5 Sonnet Vision API, vector-embedded semantic search across 10,000+ procurement PDFs, LangChain orchestration, 95% extraction accuracy, and ~$50K annual cost savings.
Project Overview
MARIA is an LLM-powered government procurement transparency platform for the Dominican Republic — combining a full RAG pipeline, Claude 3.5 Sonnet Vision API document intelligence, and semantic search to detect fraud and overvaluation in public tenders at scale.
Led technical architecture for team of 6 developers, implementing hybrid Python/Node.js microservices with custom sliding window rate limiter (8 req/min) for Anthropic API achieving 99.8% compliance. Implemented jsonrepair + Zod validation improving extraction success from 75% to 98%.
Designed AI document intelligence pipeline using Claude 3.5 Sonnet Vision API with 4 specialized prompts for Spanish procurement documents (general info, awards, timelines, bids), processing 10,000+ PDFs at 95% extraction accuracy — reducing manual review by 40 hrs/week and delivering ~$50K annual cost savings.
Built a RAG (Retrieval-Augmented Generation) pipeline using LangChain: extracted document text is chunked, embedded into a vector store, and indexed for semantic retrieval — enabling natural language querying across the full corpus of government procurement records powering the public transparency dashboard.
Built Selenium-based web scraper for Dominican procurement portal (comunidad.comprasdominicana.gob.do) with advanced search filtering, date range selection, pagination handling, and automated PDF extraction from detail pages.
Implemented Python-based PDF processing pipeline using pdfplumber, pytesseract, and pdf2image for multi-page document conversion to PNG, supporting batch processing with 50 concurrent workers and state persistence.
Architected BullMQ job queue system with Redis for async document processing, implementing separate queues for PDF downloads, image conversion, AI extraction, and result persistence with automatic retry and dead-letter queue handling.
Integrated AWS S3 for PDF storage (10,000+ documents) with CloudFront signed URLs for secure access, implementing IAM authentication and automated bucket synchronization between raw and processed document stores.
Developed public-facing transparency dashboard with Next.js 15, React Query, and real-time data visualization, featuring fraud detection metrics, agency efficiency leaderboards, full-text search, and date range filtering.
Key Features
RAG Pipeline — Semantic Document Intelligence
Built end-to-end Retrieval-Augmented Generation pipeline using LangChain: processed documents are chunked, embedded into a vector store, and indexed for semantic retrieval. Enables natural language querying across 10,000+ government procurement records — powering fraud detection insights and the public transparency dashboard.
AI Document Intelligence (95% Extraction Accuracy)
Claude 3.5 Sonnet Vision API with 4 specialized prompt templates for Spanish procurement documents — extracting institution details, award amounts, contract timelines, and winning bidders with 95% accuracy across 10,000+ PDFs.
Custom Rate Limiter (80% → 16% Failure Reduction)
Sliding window rate limiter (8 req/min) for Anthropic API with exponential backoff strategy, reducing job queue failures from 80% to 16% and ensuring API compliance.
Selenium Web Scraper
Automated scraper for Dominican procurement portal with advanced search filtering, date range selection (calendar interaction), pagination handling, and automated PDF extraction from detail pages.
Python Document Processing Pipeline
Multi-page PDF-to-PNG conversion using pdfplumber, pytesseract, and pdf2image, supporting batch processing with 50 concurrent workers and state persistence for reliable document handling.
BullMQ Job Queue Architecture
Redis-backed async processing with separate queues for PDF downloads, image conversion, AI extraction, and result persistence, featuring automatic retry, dead-letter queue, and health monitoring.
AWS S3/CloudFront Integration (10,000+ Documents)
Secure document storage with presigned URLs, IAM authentication, and automated synchronization between raw (avahi-unzip-tenders-og-dump) and processed document buckets.
Fraud Detection & Transparency Dashboard
Real-time fraud metrics, overvaluation percentage calculation, agency efficiency leaderboards, full-text search, and date range filtering (7d, 15d, 1m, 6m, 12m) for public transparency.
Team Leadership & Architecture
Led 6-developer team through technical architecture design, code reviews, PR-based workflow, and multi-service deployment coordination using PM2 process management.
Challenges & Solutions
Anthropic API Rate Limiting (80% Failure Rate)
Processing 10,000+ documents with Claude Vision API hit rate limits (8 req/min), causing 80% job failures and blocking pipeline progress with 429 errors.
Solution
Designed custom sliding window rate limiter tracking requests per minute, implementing exponential backoff (2s → 4s → 8s), and coordinating across multiple BullMQ workers, reducing failures from 80% to 16%.
Spanish Document AI Extraction Accuracy
Extracting structured data from Spanish procurement PDFs with inconsistent formatting, multi-page layouts, and complex table structures required precise prompt engineering.
Solution
Developed custom Claude Vision prompts with Spanish-specific instructions, multi-image processing for page-by-page analysis, and deterministic extraction (temperature 0), achieving 95% accuracy across 10,000+ documents.
Hybrid Python/Node Architecture
Coordinating PDF processing (Python pdfplumber, pytesseract) with Node.js API servers and BullMQ job queues required inter-process communication and state synchronization.
Solution
Built Flask microservice for PDF-to-image conversion, exposed REST endpoints for Node workers, implemented shared Redis state for progress tracking, and used BullMQ for async job coordination across services.
Web Scraping with Dynamic JavaScript Portal
Dominican procurement portal (comunidad.comprasdominicana.gob.do) used dynamic JavaScript loading, AJAX pagination, and calendar date pickers requiring browser automation.
Solution
Implemented Selenium WebDriver with headless Chrome, handled calendar interactions for date range selection, automated pagination through detail pages, and extracted PDF URLs from dynamically loaded content.
Project Gallery
MARIA platform processing government documents
Project Details
Timeline
May 2025 - September 2025
Company
Amrood Labs
My Role
Team Lead / Full-Stack Engineer
Tech Stack
Links
Visit Live Site