MARIA - AI-Powered Government Transparency Platform
Project Overview
MARIA is an AI-powered government procurement transparency platform for the Dominican Republic, using Claude 3.5 Sonnet Vision API to detect fraud and overvaluation in public tenders through automated document analysis.
Led technical architecture for team of 6 developers, implementing hybrid Python/Node.js microservices with custom rate limiter (8 req/min sliding window) for Anthropic API, reducing job queue failures from 80% to 16% with exponential backoff strategy.
Designed AI-powered data extraction system using Claude Vision API with custom prompt engineering for Spanish documents, processing 10,000+ procurement PDFs and achieving 95% extraction accuracy for institution details, award amounts, timelines, and winning companies.
Built Selenium-based web scraper for Dominican procurement portal (comunidad.comprasdominicana.gob.do) with advanced search filtering, date range selection, pagination handling, and automated PDF extraction from detail pages.
Implemented Python-based PDF processing pipeline using pdfplumber, pytesseract, and pdf2image for multi-page document conversion to PNG, supporting batch processing with 50 concurrent workers and state persistence.
Architected BullMQ job queue system with Redis for async document processing, implementing separate queues for PDF downloads, image conversion, AI extraction, and result persistence with automatic retry and dead-letter queue handling.
Integrated AWS S3 for PDF storage (10,000+ documents) with CloudFront signed URLs for secure access, implementing IAM authentication and automated bucket synchronization between raw and processed document stores.
Developed public-facing transparency dashboard with Next.js 15, React Query, and real-time data visualization, featuring fraud detection metrics, agency efficiency leaderboards, full-text search, and date range filtering.
Key Features
AI-Powered Data Extraction (95% Accuracy)
Claude Vision API with custom prompt engineering for Spanish documents, extracting institution details, award amounts, timelines, and winning companies from 10,000+ procurement PDFs with 95% accuracy.
Custom Rate Limiter (80% → 16% Failure Reduction)
Sliding window rate limiter (8 req/min) for Anthropic API with exponential backoff strategy, reducing job queue failures from 80% to 16% and ensuring API compliance.
Selenium Web Scraper
Automated scraper for Dominican procurement portal with advanced search filtering, date range selection (calendar interaction), pagination handling, and automated PDF extraction from detail pages.
Python Document Processing Pipeline
Multi-page PDF-to-PNG conversion using pdfplumber, pytesseract, and pdf2image, supporting batch processing with 50 concurrent workers and state persistence for reliable document handling.
BullMQ Job Queue Architecture
Redis-backed async processing with separate queues for PDF downloads, image conversion, AI extraction, and result persistence, featuring automatic retry, dead-letter queue, and health monitoring.
AWS S3/CloudFront Integration (10,000+ Documents)
Secure document storage with presigned URLs, IAM authentication, and automated synchronization between raw (avahi-unzip-tenders-og-dump) and processed document buckets.
Fraud Detection & Transparency Dashboard
Real-time fraud metrics, overvaluation percentage calculation, agency efficiency leaderboards, full-text search, and date range filtering (7d, 15d, 1m, 6m, 12m) for public transparency.
Team Leadership & Architecture
Led 6-developer team through technical architecture design, code reviews, PR-based workflow, and multi-service deployment coordination using PM2 process management.
Challenges & Solutions
Anthropic API Rate Limiting (80% Failure Rate)
Processing 10,000+ documents with Claude Vision API hit rate limits (8 req/min), causing 80% job failures and blocking pipeline progress with 429 errors.
Solution:
Designed custom sliding window rate limiter tracking requests per minute, implementing exponential backoff (2s → 4s → 8s), and coordinating across multiple BullMQ workers, reducing failures from 80% to 16%.
Spanish Document AI Extraction Accuracy
Extracting structured data from Spanish procurement PDFs with inconsistent formatting, multi-page layouts, and complex table structures required precise prompt engineering.
Solution:
Developed custom Claude Vision prompts with Spanish-specific instructions, multi-image processing for page-by-page analysis, and deterministic extraction (temperature 0), achieving 95% accuracy across 10,000+ documents.
Hybrid Python/Node Architecture
Coordinating PDF processing (Python pdfplumber, pytesseract) with Node.js API servers and BullMQ job queues required inter-process communication and state synchronization.
Solution:
Built Flask microservice for PDF-to-image conversion, exposed REST endpoints for Node workers, implemented shared Redis state for progress tracking, and used BullMQ for async job coordination across services.
Web Scraping with Dynamic JavaScript Portal
Dominican procurement portal (comunidad.comprasdominicana.gob.do) used dynamic JavaScript loading, AJAX pagination, and calendar date pickers requiring browser automation.
Solution:
Implemented Selenium WebDriver with headless Chrome, handled calendar interactions for date range selection, automated pagination through detail pages, and extracted PDF URLs from dynamically loaded content.
Project Gallery
MARIA platform processing government documents
Project Details
Timeline
May 2025 - September 2025
Company
Amrood Labs
My Role
Team Lead / Full-Stack Engineer