PROJECT STORY — Luc.IA: A Multi-Agent Application

Inspiration

Latin America is facing a silent epidemic. Over 7.6 million people are living with dementia across the region today — a number projected to exceed 27 million by 2050. Yet the system meant to catch it early is broken in three specific, connected ways: families don't get a diagnosis until the disease is already advanced, care is fragmented across providers who don't talk to each other, and the informal caregivers — mostly women — are left to absorb everything alone, with no training, no respite, and no support.

What struck us wasn't just the scale of the problem. It was the precision of the failure. Three things go wrong, in sequence, every time. Late diagnosis leads to fragmented care. Fragmented care overwhelms caregivers. Overwhelmed caregivers burn out. And the cycle repeats.

We built Luc.IA because we believed that if you could address all three failures in a single, coordinated system — one that works in Spanish and Portuguese, designed for both urban and rural low-resource settings — you could actually change the trajectory for patients and families across the region.

The name Luc.IA comes from luz (light in Spanish and Portuguese) and IA (Inteligencia Artificial). It's meant to evoke clarity in a system that has been opaque for too long.


What It Does

Luc.IA is a multi-agent AI application that combines three specialized agents into a unified care pipeline, orchestrated by a central supervisor.

  • The Screening Agent guides families or primary care workers through a validated cognitive screening protocol — the Mini-Cog (3-word recall + clock drawing description) combined with an Activities of Daily Living questionnaire (6 activities) — in a natural, conversational format adapted for low-literacy users. It computes a weighted composite score (Mini-Cog 60%, ADL 40%) and classifies risk as low, moderate, or high. The structured screening record is persisted to a shared SQLite database via an idempotent upsert keyed on session ID.

  • The Care Coordination Agent picks up where screening ends. It generates referral recommendations with validated urgency levels (immediate, within 72 hours, or routine), schedules appointments with time-overlap detection to prevent double-booking, produces de-identified inter-provider handoff notes with PII automatically stripped, and monitors care plan adherence using optimistic locking for concurrent updates — flagging gaps before patients fall through the cracks.

  • The Caregiver Support Agent runs in parallel, focused entirely on the informal caregiver. It conducts structured wellbeing check-ins across five domains (sleep, emotional, physical, social, burden), computes a weighted burnout score (0–100) with domain-specific weights, surfaces support resources matched to the caregiver's language, country, and region from the database, and escalates to a human coordinator when burnout risk crosses a critical threshold (≥70). Escalation includes country-specific crisis line numbers for 8 LATAM countries and is DB-backed idempotent per check-in via an escalation_log table to prevent duplicate notifications.

All three agents share a common patient and family record stored in SQLite with 11 database tables and an audit log. A Main Orchestrator classifies intent (screening, care coordination, caregiver support, or general inquiry), detects language (Spanish or Portuguese), routes every incoming message to the right specialist, maintains session context across turns via an in-process TTL cache, and ensures continuity across the full care journey — so users never have to repeat themselves.
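The routing step can be sketched as a dispatch table. In Luc.IA the intent classification itself is model-driven; the intent labels and handler names below are illustrative stand-ins.

```python
# Sketch of orchestrator routing: dispatch an already-classified
# intent label to the matching specialist handler. The real system
# classifies intent with the model; these handlers are placeholders.

def handle_screening(msg: str) -> str:
    return f"[screening] {msg}"

def handle_care(msg: str) -> str:
    return f"[care-coordination] {msg}"

def handle_caregiver(msg: str) -> str:
    return f"[caregiver-support] {msg}"

ROUTES = {
    "screening": handle_screening,
    "care_coordination": handle_care,
    "caregiver_support": handle_caregiver,
}

def route(intent: str, message: str) -> str:
    # Unknown intents fall back to a general-inquiry handler.
    handler = ROUTES.get(intent, lambda m: f"[general] {m}")
    return handler(message)
```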

The system currently operates in Spanish and Portuguese via a web chat interface (MVP), with a FastAPI REST layer for health system integrations. Planned expansion includes WhatsApp, SMS, Alexa, and voice channels via Amazon Nova 2 Sonic.


How We Built It

We built Luc.IA on Amazon Bedrock and the Strands Agents SDK, using the Agent-to-Agent (A2A) protocol to wire the specialist agents together.

All agents run on Amazon Nova 2 Lite — fast, cost-effective, and well-suited for both complex reasoning and structured conversational interactions. Voice input and output are planned for Phase 2 using Nova 2 Sonic for natural speech in Spanish and Portuguese.

The backend is Python 3.12+ with FastAPI for the REST layer, SQLite (via async SQLAlchemy + aiosqlite) for the shared record, and structlog for structured JSON observability with automatic PII scrubbing. All configuration is centralized in a config.py dataclass singleton loaded from .env via python-dotenv — no scattered os.getenv calls, no surprises at runtime.
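A minimal sketch of the config singleton pattern described above. Field names and defaults here are hypothetical, and the sketch reads `os.environ` directly rather than going through python-dotenv.

```python
import os
from dataclasses import dataclass
from functools import lru_cache

# Sketch of a centralized config singleton: values are read once at
# first access, so a missing variable fails fast at startup rather
# than mid-request. Field names and defaults are illustrative.

@dataclass(frozen=True)
class Config:
    database_url: str
    bedrock_region: str
    session_ttl_seconds: int

@lru_cache(maxsize=1)
def get_config() -> Config:
    return Config(
        database_url=os.environ.get(
            "DATABASE_URL", "sqlite+aiosqlite:///lucia.db"),
        bedrock_region=os.environ.get("BEDROCK_REGION", "us-east-1"),
        session_ttl_seconds=int(
            os.environ.get("SESSION_TTL_SECONDS", "1800")),
    )
```

The `lru_cache(maxsize=1)` wrapper makes every caller see the same frozen instance.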

The shared infrastructure layer (shared/) provides:

  • Database: Async SQLAlchemy engine with transactional session management and optimistic locking (version columns) on mutable tables (patients, care plans, caregiver records)
  • Caching: In-process TTL caches (via cachetools) for sessions (30 min), resources (1 hour), question banks (1 hour), and risk statistics (24 hours)
  • PII protection: HMAC-SHA256 pseudonymization (truncated to 32 hex chars) using a dedicated PII_HMAC_KEY (falls back to the deployment's secret key), integrated into both the logging pipeline and API responses
  • Bedrock integration: Model factory with support for both IAM credentials and Bedrock API keys (Bearer token auth), plus exponential backoff retry for transient errors
  • HTTP client: Shared async httpx singleton for inter-agent escalation notifications
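The pseudonymization scheme above (HMAC-SHA256 truncated to 32 hex characters) can be sketched in a few lines; key sourcing (PII_HMAC_KEY with a fallback) is simplified here to a plain function argument.

```python
import hashlib
import hmac

# HMAC-SHA256 pseudonymization truncated to 32 hex characters, as
# described above. Key handling is simplified for the sketch.

def pseudonymize(identifier: str, key: bytes) -> str:
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]
```

The output is deterministic for a given key (so records can be joined) but unlinkable across deployments with different keys.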

The API layer uses Bearer token authentication on all endpoints except the health check, which reports liveness status for API, database, and Bedrock connectivity. The sessions endpoint supports both streaming (SSE) and synchronous response modes.

We designed the system with property-based testing in mind from the start. Every agent tool has formal correctness properties — invariants, monotonicity conditions, round-trip guarantees, idempotence checks — specified before a line of implementation code was written. This gave us a clear definition of "correct" that we could test against, not just a list of features to ship.
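To illustrate the style of property being specified, here is a stdlib sketch checking idempotence of an upsert keyed on session ID (the same guarantee the screening record relies on). The project itself uses Hypothesis; this stand-in generates random inputs by hand, and the in-memory store is a placeholder for the database.

```python
import random
import string

# Stdlib sketch of a property check: applying the same upsert twice
# must leave the store unchanged. The dict store is a stand-in for
# the SQLite-backed upsert; the project uses Hypothesis for this.

def upsert(store: dict, session_id: str, record: dict) -> None:
    store[session_id] = record  # last write wins per session

def check_idempotent(trials: int = 500) -> bool:
    rng = random.Random(0)
    for _ in range(trials):
        store: dict = {}
        sid = "".join(rng.choices(string.ascii_lowercase, k=8))
        record = {"score": rng.randint(0, 100)}
        upsert(store, sid, record)
        once = dict(store)
        upsert(store, sid, record)
        assert store == once, sid
    return True
```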

The architecture follows a working-backwards approach: we started from the outcomes we needed (earlier diagnosis, reduced caregiver burnout, fewer avoidable hospitalizations) and worked back to the capabilities required to produce them.


Challenges We Ran Into

  • Designing for low-resource environments. Most AI applications assume reliable connectivity and reasonable literacy. Luc.IA has to work for a rural community health worker in Colombia with intermittent 3G and a patient who has never used a smartphone. That constraint shaped every design decision — from the conversational screening format (≤10 words per sentence for low-literacy users) to the offline-first Phase 2 architecture to the plain-language requirements for agent responses.

  • The clock drawing problem. The Mini-Cog screening protocol includes a clock drawing task — a validated cognitive test that requires visual interpretation. In a text-based MVP, there's no clean way to administer it. We adapted it to a verbal description task: the patient describes the clock they drew, and the agent evaluates the description using keyword matching for correctness indicators in both Spanish and Portuguese. Phase 2 with Nova 2 Omni multimodal input could enable actual image analysis.
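The keyword-matching adaptation can be sketched like this; the indicator lists below are illustrative examples, not the project's actual keyword banks.

```python
# Sketch of verbal clock-description scoring: count correctness
# indicators via keyword matching. These keyword lists are
# illustrative, not the project's actual banks.

INDICATORS = {
    "es": ["doce", "números", "manecillas", "círculo", "hora"],
    "pt": ["doze", "números", "ponteiros", "círculo", "hora"],
}

def clock_indicators(description: str, lang: str) -> int:
    text = description.lower()
    return sum(1 for kw in INDICATORS[lang] if kw in text)
```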

  • Keeping PII out of model context. Health data is sensitive. Dementia health data is especially sensitive. Ensuring that no raw personally identifiable information — names, phone numbers, national IDs — ever reaches a Bedrock model prompt required careful data architecture: HMAC-SHA256 pseudonymized identifiers, a strip_pii() utility integrated into both the structlog pipeline and API responses, and explicit audit trails on every database write.
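A minimal sketch of a `strip_pii()`-style utility: mask phone-number and national-ID shaped tokens before text reaches logs or model prompts. The regex patterns here are assumptions for illustration; production scrubbing needs country-specific rules.

```python
import re

# Sketch of PII stripping: redact phone-like and national-ID-like
# tokens. The patterns are illustrative assumptions, not the
# project's actual rules.

_PATTERNS = [
    re.compile(r"\+?\d[\d\s\-]{7,}\d"),  # phone-like sequences
    re.compile(r"\b\d{8,11}\b"),         # national-ID-like numbers
]

def strip_pii(text: str) -> str:
    for pattern in _PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```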

  • Coordinating four agents without losing context. Multi-agent systems are easy to design on a whiteboard and hard to get right in practice. Session state has to survive agent handoffs, process restarts, and partial failures. We implemented in-process TTL caches for session state (30-minute TTL, 1024 max entries) and optimistic locking with version columns on mutable database tables to handle concurrent updates safely.
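The version-column pattern behind those concurrent updates can be shown with plain sqlite3 (the real system does this through async SQLAlchemy; table and column names here are assumptions).

```python
import sqlite3

# Sketch of optimistic locking: the UPDATE only succeeds if the
# version the writer originally read is still current. Plain sqlite3
# keeps the sketch self-contained; the project uses async SQLAlchemy.

def update_care_plan(conn: sqlite3.Connection, plan_id: int,
                     status: str, expected_version: int) -> bool:
    cur = conn.execute(
        "UPDATE care_plans SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (status, plan_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means a concurrent write won

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE care_plans (id INTEGER PRIMARY KEY, "
             "status TEXT, version INTEGER)")
conn.execute("INSERT INTO care_plans VALUES (1, 'active', 0)")
conn.commit()
```

A writer holding a stale version gets `False` back and must re-read before retrying, instead of silently overwriting a concurrent change.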

  • Defining burnout algorithmically. The burnout monitor needs to produce a score in [0, 100] that is clinically meaningful, testable, and explainable to a caregiver. We implemented a weighted domain model: burden (30%), emotional (25%), sleep (20%), physical (15%), social (10%), with each domain scored 0–10 and the weighted sum normalized to 0–100. The dominant contributing domain is identified alongside the score. Whether to align this more closely with validated clinical scales (Zarit Burden Interview, PHQ-9) remains an open question.
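The weighted domain model above can be sketched directly from the stated weights; the function shape and return type are assumptions.

```python
# The weighted burnout model described above: five domains scored
# 0-10, weights from the text (burden 30%, emotional 25%, sleep 20%,
# physical 15%, social 10%), normalized to 0-100, with the dominant
# contributing domain reported alongside the score.

WEIGHTS = {"burden": 0.30, "emotional": 0.25, "sleep": 0.20,
           "physical": 0.15, "social": 0.10}

def burnout_score(domains: dict[str, float]) -> tuple[float, str]:
    contributions = {d: WEIGHTS[d] * domains[d] * 10 for d in WEIGHTS}
    total = sum(contributions.values())
    dominant = max(contributions, key=contributions.get)
    return round(total, 1), dominant
```

Maxing every domain yields exactly 100, and the dominant-domain output is what makes the score explainable to the caregiver.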


Accomplishments That We're Proud Of

We're proud that Luc.IA addresses a real, documented, and growing crisis — not a hypothetical one. The 7.6 million people living with dementia in Latin America today are not an abstraction. The women burning out as unpaid caregivers are not an abstraction. We built something aimed directly at them.

We're proud of the architecture. Four agents, one shared record with 11 database tables and an audit log, optimistic locking for safe concurrent updates, a clean A2A protocol, and a FastAPI layer that makes the whole system integrable with existing health infrastructure — without requiring health systems to rip and replace anything they already have.

We're proud of the correctness-first approach. Property-based test specifications written before implementation, covering invariants that matter clinically: risk classification uses weighted composite scoring (Mini-Cog 60% + ADL 40%), burnout scores are bounded to [0, 100] with explicit domain weights, appointments never double-book thanks to overlap detection, escalations are idempotent per (caregiver_id, checkin_id) pair, and care plan updates use optimistic locking. These aren't just unit tests — they're executable specifications of what the system must be.
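The no-double-booking property rests on a standard interval check: two half-open intervals overlap iff each starts before the other ends. A minimal sketch (function name assumed):

```python
from datetime import datetime

# Sketch of the time-overlap check behind the no-double-booking
# property: two half-open intervals [start, end) overlap iff each
# starts before the other ends, so back-to-back slots are allowed.

def overlaps(start_a: datetime, end_a: datetime,
             start_b: datetime, end_b: datetime) -> bool:
    return start_a < end_b and start_b < end_a
```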

We're proud of the multilingual design. Spanish and Portuguese are first-class citizens, not afterthoughts. Separate question banks for both languages in screening (10 questions) and caregiver check-ins (5 questions), culturally adapted prompts, plain-language requirements calibrated to a 6th-grade reading level, and a resource recommender that filters by country, region, and language from the database — these details matter for adoption in the communities that need this most.


What We Learned

Building for health in Latin America is not the same as building for health in a high-income country. The constraints are different, the trust dynamics are different, the infrastructure is different. Designing for the hardest case — rural, low-literacy, intermittent connectivity — produces a better system for everyone, not just the edge cases.

Multi-agent systems require a different kind of specification discipline. When four agents share state and hand off to each other, the failure modes multiply. Writing correctness properties before writing code forced us to think clearly about what each component must guarantee — and revealed ambiguities in the design that would have been expensive to discover in production.

The working-backwards method works. Starting from outcomes (earlier diagnosis, reduced burnout) and working back to capabilities (screening tool, burnout monitor, escalation tool) kept the design grounded in what actually matters. Every feature in the system traces back to a specific failure in the current care pathway.

We also learned that the hardest problems in this space are not technical. The clock drawing question, the burnout algorithm, the pilot country selection, the EHR authentication strategy — these are decisions that require clinical expertise, community input, and policy context that no amount of engineering can substitute for.


What's Next for Luc.IA

  • The MVP establishes the core pipeline: web chat, four agents, shared SQLite record with 11 tables, FastAPI layer with Bearer token auth, Spanish and Portuguese. That's the foundation.

  • Phase 2 expands reach: WhatsApp and SMS channels for mobile-first access, voice interaction via Nova 2 Sonic, an Alexa skill for voice-first home interactions, and a Community Health Worker copilot mode for offline-first screenings in rural areas where patients can't interact with the system directly.

  • Phase 3 scales impact: a population risk stratification agent for ministry-level resource planning (the /analytics/population-risk endpoint stub is already in place), integration with national health registries (SISPRO in Colombia, IMSS in Mexico, SUS in Brazil), EHR medication data in provider summaries, and an outcome analytics dashboard for pilot monitoring and scale decisions.

The immediate next step is a pilot in one country — likely Colombia, Mexico, or Peru, all of which have high dementia prevalence and existing community health worker networks. The pilot will test the screening pipeline end-to-end, measure referral completion rates, and establish a baseline for caregiver burnout scores that we can track over 90 days.

The longer-term vision is a system that any LATAM health ministry, insurer, or community organization can deploy — one that shifts dementia care from crisis response to early, coordinated, and sustainable management. Luc.IA is the first step toward that.

Built With

  • a2a-protocol
  • aiobotocore
  • aiosqlite
  • amazon-bedrock
  • amazon-nova-2-lite
  • amazon-nova-2-pro
  • anyio
  • boto3
  • cachetools
  • fastapi
  • httpx
  • hypothesis
  • mypy
  • pydantic
  • pytest
  • pytest-asyncio
  • python-3.12
  • ruff
  • sqlalchemy (async)
  • sqlite
  • strands-agents-sdk
  • structlog
  • uvicorn