WorryLens — Multimodal Health Signal Intelligence

Inspiration

Medical anxiety often starts small. A minor symptom leads to a search, which quickly spirals into worst-case scenarios. Existing tools provide information but not clarity: they leave users to interpret risk on their own.

WorryLens was inspired by a simple idea:

Instead of asking users to guess what’s wrong, what if we could read signals directly from the body?

Your voice and skin already contain measurable indicators of stress, fatigue, and potential health irregularities. We wanted to turn those raw signals into something structured, grounded, and actionable—before anxiety takes over.


What it does

WorryLens is a multimodal AI health signal analyzer that uses:

  • Voice input to detect stress, fatigue, and vocal irregularities
  • Camera input to assess skin condition, hydration, and visible anomalies

These signals are transformed into structured features and combined into a unified interpretation.

Instead of returning generic AI responses, WorryLens provides:

  • signal-based health insights
  • confidence-aware feedback
  • grounded, non-diagnostic guidance
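As a minimal sketch, confidence-aware feedback can be modeled as a small record that carries its own confidence and adjusts its wording accordingly. The class and field names below are illustrative, not WorryLens's actual schema:

```python
from dataclasses import dataclass

# Hypothetical shape of a single insight; field names are illustrative,
# not the project's actual schema.
@dataclass
class Insight:
    signal: str        # which input produced it, e.g. "voice" or "skin"
    finding: str       # plain-language, non-diagnostic observation
    confidence: float  # 0.0-1.0, surfaced to the user rather than hidden

    def render(self) -> str:
        # Hedge the wording when confidence is low instead of overclaiming.
        prefix = "Possible" if self.confidence < 0.6 else "Likely"
        return f"{prefix}: {self.finding} (confidence {self.confidence:.0%})"

print(Insight("voice", "elevated speaking rate", 0.45).render())
```

Surfacing the confidence value in the output itself is what keeps the guidance grounded rather than diagnostic.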

At its core:

$$ \text{Insight} = f(\text{Voice}, \text{Skin}, \text{User Context}) $$


How we built it

We designed WorryLens as a signal → structure → reasoning pipeline:

  1. Input Layer

    • Browser-based microphone and camera capture
    • Real-time voice and image acquisition
  2. Feature Extraction

    • Voice → speech patterns, pacing, and tone analysis
    • Skin → computer vision heuristics (texture, color variation, contrast)
  3. Multimodal Fusion

    • Convert signals into structured features
    • Combine into a unified representation of user state
  4. Reasoning Layer

    • AI interprets structured inputs
    • Outputs grounded insights instead of raw generated text
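The four layers above can be sketched end to end. The specific features here (signal energy, a zero-crossing pacing proxy, brightness statistics) are illustrative stand-ins for the real extractors, chosen only to show the signal → structure → reasoning flow:

```python
import statistics

def voice_features(samples):
    """Toy stand-ins for the voice extractor: loudness and a pacing proxy."""
    energy = statistics.mean(s * s for s in samples)
    # Zero-crossing rate as a crude proxy for speech rate / pitch activity.
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / (len(samples) - 1)
    return {"energy": energy, "zcr": zcr}

def skin_features(pixels):
    """Toy stand-ins for the vision heuristics: brightness and texture spread."""
    return {"brightness": statistics.mean(pixels),
            "variation": statistics.pstdev(pixels)}

def fuse(voice, skin, context):
    """Multimodal fusion: combine per-modality features into one structured
    user-state record that the reasoning layer interprets."""
    return {"voice": voice, "skin": skin, "context": context}

state = fuse(voice_features([0.1, -0.2, 0.3, -0.1]),
             skin_features([120, 128, 131, 125]),
             {"self_report": "mild headache"})
```

The key design point is that the reasoning layer receives this structured `state`, not raw audio or pixels, which is what makes the outputs inspectable.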

Challenges we ran into

  • Signal noise & variability
    Lighting conditions and background noise impacted reliability, requiring normalization and fallback handling.

  • Avoiding the “GPT wrapper” trap
    We moved beyond text-in/text-out by building a structured pipeline with real intermediate signals.

  • Trust vs. overclaiming
    Providing useful insight without implying diagnosis required careful framing and confidence-aware outputs.
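The normalization and fallback handling mentioned above can be illustrated with a simple brightness example. The target level and threshold below are made-up values for the sketch, not the project's actual calibration:

```python
def normalize_brightness(pixels, target_mean=128.0):
    """Shift frame brightness toward a reference level to reduce
    lighting-dependent variation (the target value is illustrative)."""
    mean = sum(pixels) / len(pixels)
    return [min(255, max(0, p + (target_mean - mean))) for p in pixels]

def with_fallback(pixels, min_range=10):
    """Fall back when the frame carries too little information
    (e.g. near-uniform darkness); the threshold is a made-up example."""
    if max(pixels) - min(pixels) < min_range:
        return None  # caller should skip this modality rather than guess
    return normalize_brightness(pixels)
```

Returning `None` instead of a low-quality estimate is the fallback half of the fix: a missing signal is more trustworthy than a noisy one.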


Accomplishments that we're proud of

  • Built a true multimodal system combining voice and visual signals
  • Designed a structured reasoning pipeline, not just a chatbot
  • Addressed a real and widely experienced problem (health anxiety)
  • Delivered a working real-time product within hackathon constraints

What we learned

  • Multimodal systems are significantly more complex than text-based AI
  • Users trust systems that show how conclusions are formed
  • Explicit uncertainty improves credibility and usability
  • Solving real problems requires balancing technical accuracy with human psychology

What's next for WorryLens

  • Improve signal accuracy with better normalization and calibration
  • Add confidence scoring and uncertainty visualization
  • Integrate clinical grounding and medical reference layers
  • Expand to additional health signals beyond voice and skin
  • Develop into a pre-diagnostic companion tool for everyday health awareness
