Back to projects

MARIA - AI-Powered Government Transparency Platform

MARIA - AI-Powered Government Transparency Platform
Amrood Labs

Project Overview

MARIA is an AI-powered government procurement transparency platform for the Dominican Republic, using Claude 3.5 Sonnet Vision API to detect fraud and overvaluation in public tenders through automated document analysis.

Led technical architecture for team of 6 developers, implementing hybrid Python/Node.js microservices with custom rate limiter (8 req/min sliding window) for Anthropic API, reducing job queue failures from 80% to 16% with exponential backoff strategy.

Designed AI-powered data extraction system using Claude Vision API with custom prompt engineering for Spanish documents, processing 10,000+ procurement PDFs and achieving 95% extraction accuracy for institution details, award amounts, timelines, and winning companies.

Built Selenium-based web scraper for Dominican procurement portal (comunidad.comprasdominicana.gob.do) with advanced search filtering, date range selection, pagination handling, and automated PDF extraction from detail pages.

Implemented Python-based PDF processing pipeline using pdfplumber, pytesseract, and pdf2image for multi-page document conversion to PNG, supporting batch processing with 50 concurrent workers and state persistence.

Architected BullMQ job queue system with Redis for async document processing, implementing separate queues for PDF downloads, image conversion, AI extraction, and result persistence with automatic retry and dead-letter queue handling.

Integrated AWS S3 for PDF storage (10,000+ documents) with CloudFront signed URLs for secure access, implementing IAM authentication and automated bucket synchronization between raw and processed document stores.

Developed public-facing transparency dashboard with Next.js 15, React Query, and real-time data visualization, featuring fraud detection metrics, agency efficiency leaderboards, full-text search, and date range filtering.

Key Features

AI-Powered Data Extraction (95% Accuracy)

Claude Vision API with custom prompt engineering for Spanish documents, extracting institution details, award amounts, timelines, and winning companies from 10,000+ procurement PDFs with 95% accuracy.

Custom Rate Limiter (80% → 16% Failure Reduction)

Sliding window rate limiter (8 req/min) for Anthropic API with exponential backoff strategy, reducing job queue failures from 80% to 16% and ensuring API compliance.

Selenium Web Scraper

Automated scraper for Dominican procurement portal with advanced search filtering, date range selection (calendar interaction), pagination handling, and automated PDF extraction from detail pages.

Python Document Processing Pipeline

Multi-page PDF-to-PNG conversion using pdfplumber, pytesseract, and pdf2image, supporting batch processing with 50 concurrent workers and state persistence for reliable document handling.

BullMQ Job Queue Architecture

Redis-backed async processing with separate queues for PDF downloads, image conversion, AI extraction, and result persistence, featuring automatic retry, dead-letter queue, and health monitoring.

AWS S3/CloudFront Integration (10,000+ Documents)

Secure document storage with presigned URLs, IAM authentication, and automated synchronization between raw (avahi-unzip-tenders-og-dump) and processed document buckets.

Fraud Detection & Transparency Dashboard

Real-time fraud metrics, overvaluation percentage calculation, agency efficiency leaderboards, full-text search, and date range filtering (7d, 15d, 1m, 6m, 12m) for public transparency.

Team Leadership & Architecture

Led 6-developer team through technical architecture design, code reviews, PR-based workflow, and multi-service deployment coordination using PM2 process management.

Challenges & Solutions

Anthropic API Rate Limiting (80% Failure Rate)

Processing 10,000+ documents with Claude Vision API hit rate limits (8 req/min), causing 80% job failures and blocking pipeline progress with 429 errors.

Solution:

Designed custom sliding window rate limiter tracking requests per minute, implementing exponential backoff (2s → 4s → 8s), and coordinating across multiple BullMQ workers, reducing failures from 80% to 16%.

Spanish Document AI Extraction Accuracy

Extracting structured data from Spanish procurement PDFs with inconsistent formatting, multi-page layouts, and complex table structures required precise prompt engineering.

Solution:

Developed custom Claude Vision prompts with Spanish-specific instructions, multi-image processing for page-by-page analysis, and deterministic extraction (temperature 0), achieving 95% accuracy across 10,000+ documents.

Hybrid Python/Node Architecture

Coordinating PDF processing (Python pdfplumber, pytesseract) with Node.js API servers and BullMQ job queues required inter-process communication and state synchronization.

Solution:

Built Flask microservice for PDF-to-image conversion, exposed REST endpoints for Node workers, implemented shared Redis state for progress tracking, and used BullMQ for async job coordination across services.

Web Scraping with Dynamic JavaScript Portal

Dominican procurement portal (comunidad.comprasdominicana.gob.do) used dynamic JavaScript loading, AJAX pagination, and calendar date pickers requiring browser automation.

Solution:

Implemented Selenium WebDriver with headless Chrome, handled calendar interactions for date range selection, automated pagination through detail pages, and extracted PDF URLs from dynamically loaded content.

Project Gallery

MARIA platform processing government documents

MARIA platform processing government documents

Project Details

Timeline

May 2025 - September 2025

Company

Amrood Labs

My Role

Team Lead / Full-Stack Engineer

Technologies Used

Node.js
Express
Python
Flask
Claude Vision API
Anthropic AI
Selenium
pdfplumber
pytesseract
pandas
Next.js 15
React 19
TypeScript
TanStack React Query
MongoDB
Mongoose
Redis
BullMQ
AWS S3
AWS CloudFront
PM2
REST APIs
Microservices
Event-driven Architecture
Web Scraping
OCR