Inspiration

Skin conditions are one of the most common reasons people search symptoms online, yet most advice ignores the single variable that matters most: where you are standing right now. A rash in Los Angeles after a beach day is a different story from the same rash in a humid, high-pollen city, and a UV index of 11 is a very different exposure from a UV index of 3. Existing dermatology apps treat images in isolation, as if the skin lived in a vacuum.

We wanted to build something that asked a more honest question: given the image, the person, and the environment they were in, what is the most likely explanation, and how urgent is it? That question became EnviroSkin.

What it does

EnviroSkin is a browser app where a user:

  1. Uploads a photo of a skin concern.
  2. Shares lightweight context (ZIP code, recent exposure, symptoms).
  3. Receives a structured triage response — a green / yellow / red severity level, a likely condition, environment-linked drivers, and clear next steps.

Under the hood, each analysis fuses three signals:

  • Visual evidence — a local skin-classification model that produces top-$k$ candidate labels from the uploaded image.
  • Personal context — what the user entered about symptoms and recent exposures.
  • Environmental context — real air-quality readings (AirNow), UV exposure, and oceanographic context pulled from the CalCOFI dataset near the user's ZIP code.

All three streams are packaged into a prompt for Gemini 2.5 Flash, which returns a strictly-structured JSON object (triage level, confidence, causes, symptoms, treatments, environmental drivers, voice narration). The frontend renders this as an explainable breakdown rather than a black-box label, and every piece of copy can be translated on demand into 10 languages.

How we built it

Frontend — React + Vite + Tailwind. 3D scene rendering with Three.js, and a map layer combining react-map-gl, maplibre-gl, and deck.gl for a UV heatmap overlay. Routing via react-router-dom.

Agent backend (FastAPI on Hugging Face Spaces) — wraps the skin classifier and the Gemini call. The server enforces a normalized schema on the model's output so the frontend never has to handle malformed fields. In pseudocode the pipeline is:

$$\text{scores}_{1..k} = f_{\text{classifier}}(\text{image}), \quad \text{context} = g(\text{form}, \text{scores}_{1..k})$$

$$\text{result} = \text{normalize}\big(\text{Gemini}(\text{system\_prompt}, \text{image}, \text{context})\big)$$
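As a rough illustration of the context step $g$ above, here is a minimal Python sketch that keeps the top-$k$ classifier labels and bundles them with the form fields. The function names, the dictionary layout, and the example labels are assumptions made for illustration, not the project's actual code.

```python
from typing import Dict, List, Tuple


def top_k(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def build_context(form: Dict[str, str], scores: Dict[str, float], k: int = 3) -> Dict[str, object]:
    """Combine user-entered form fields with the classifier's top-k candidates.

    The output layout ("form" / "candidates") is hypothetical.
    """
    return {
        "form": dict(form),  # copy so the caller's dict is never mutated
        "candidates": [
            {"label": label, "score": round(score, 3)}
            for label, score in top_k(scores, k)
        ],
    }


if __name__ == "__main__":
    # Hypothetical classifier output and form input.
    example_scores = {"eczema": 0.2, "sunburn": 0.7, "acne": 0.1}
    print(build_context({"zip": "90001", "exposure": "beach"}, example_scores, k=2))
```

The resulting dictionary would then be serialized into the prompt alongside the image and the environmental readings.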

Data backend (Express on Render) — serves CalCOFI-based environmental context keyed by ZIP code and date, plus a batch translation endpoint that supports Spanish, French, Italian, Korean, Chinese, Arabic, Somali, Sesotho, Portuguese, and Sango.

Deployment topology — the production site lives on Vercel. Its vercel.json rewrites split traffic by path: /api/analyze* goes to the Hugging Face Space (GPU-adjacent, model-serving), and all other /api/* calls go to the Render Express service (data + i18n). This let each backend live where it was cheapest and most natural, with the frontend treating them as one origin.

What we learned

  • Structured output is the real unlock. Our first version asked Gemini for a narrative paragraph; the UI looked impressive and was useless. Forcing a strict JSON schema — with enumerated triage levels, clamped confidences, and bounded list lengths — turned the model from a storyteller into a component we could actually wire into a UI.
  • Environmental data is messy. CalCOFI records, air-quality stations, and user ZIP codes almost never line up in space or time. We ended up building a ranker with three modes — balanced, closest, and recent — so the same query could be tuned depending on whether freshness or proximity mattered more.
  • Deployment shape is a product decision. Splitting the ML backend (HF Space) from the data backend (Render) cost us an afternoon of CORS debugging, but it meant we could iterate on each half without redeploying the other.
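To make the three ranker modes mentioned above concrete, here is a minimal Python sketch. It assumes each candidate record carries a distance from the ZIP code and an age in days, and that a mode is simply a pair of weights over those two normalized costs. The `Record` fields and the weight values are hypothetical; the real ranker may score records differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    name: str
    distance_km: float  # distance from the user's ZIP code
    age_days: float     # time since the observation was taken


# Hypothetical (distance weight, age weight) per mode.
MODE_WEIGHTS = {
    "balanced": (0.5, 0.5),
    "closest": (1.0, 0.0),
    "recent": (0.0, 1.0),
}


def rank(records: List[Record], mode: str = "balanced") -> List[Record]:
    """Sort records from best to worst for the given mode (lower cost is better)."""
    if mode not in MODE_WEIGHTS:
        raise ValueError(f"unknown mode: {mode!r}")
    if not records:
        return []
    w_dist, w_age = MODE_WEIGHTS[mode]
    # Normalize each cost to [0, 1] so kilometres and days are comparable.
    max_dist = max(r.distance_km for r in records) or 1.0
    max_age = max(r.age_days for r in records) or 1.0

    def cost(r: Record) -> float:
        return w_dist * r.distance_km / max_dist + w_age * r.age_days / max_age

    return sorted(records, key=cost)
```

With this shape, switching between freshness and proximity is a change of mode string rather than a change of query.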

Challenges we ran into

  • CORS across three origins. With the frontend on Vercel and API calls landing on both the HF Space and Render, both backends had to agree on allowed origins. We eventually routed every call through Vercel rewrites so the browser only ever saw one origin, and kept an env-driven allowlist on the FastAPI side as defense in depth.
  • Cold starts. Render's free-tier service sleeps, and the HF Space can spin down as well. The first analysis of the session could take $\sim 30\text{–}60\text{s}$; we added loading states and pre-warming pings to hide most of it.
  • Model drift vs. guardrails. Gemini occasionally returned markdown-wrapped JSON, extra commentary, or missing fields. The backend now strips code fences, parses defensively, clamps numeric fields to $[0, 1]$, validates enum values, and falls back to safe defaults — so a misbehaving model never takes down the UI.
  • Translation without an API key. We needed 10 languages without adding another paid dependency, so the translation service proxies Google's public translate_a endpoint with request deduplication and parallel batching.
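The guardrails described in the model-drift item above can be sketched in a few lines of Python. This is an illustration under assumptions, not the backend's actual code: the key names (`triage_level`, `confidence`, `causes`), the list cap, and the choice of "yellow" as the fallback level are all hypothetical, and only three of the schema's fields are shown. Instead of stripping code fences explicitly, the sketch slices from the first `{` to the last `}`, which drops fences and surrounding commentary in one step.

```python
import json
from typing import Any, Dict

TRIAGE_LEVELS = {"green", "yellow", "red"}
DEFAULT_LEVEL = "yellow"  # assumed safe default
MAX_LIST_ITEMS = 5        # assumed cap on list length


def extract_json(raw: str) -> Dict[str, Any]:
    """Pull a JSON object out of text that may include code fences or commentary."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        parsed = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def normalize(raw: str) -> Dict[str, Any]:
    """Return a dict that always has a valid level, confidence, and cause list."""
    data = extract_json(raw)

    # Validate the enum, falling back to the default level.
    level = str(data.get("triage_level", "")).lower()
    if level not in TRIAGE_LEVELS:
        level = DEFAULT_LEVEL

    # Coerce confidence to a float and clamp it to [0, 1].
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(1.0, max(0.0, confidence))

    # Bound list length and force string items.
    causes = data.get("causes")
    if not isinstance(causes, list):
        causes = []
    causes = [str(c) for c in causes][:MAX_LIST_ITEMS]

    return {"triage_level": level, "confidence": confidence, "causes": causes}
```

Because `normalize` never raises on bad input, the frontend can render whatever comes back without its own error handling for malformed fields.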

What's next

  • Tighter integration of the UV map with the triage result, so the "why" of a red-level flag is visualized, not just described.
  • On-device inference for the skin classifier to cut latency and keep images off the network.
  • A longitudinal mode: let users re-upload the same concern days later and get a "trend" read rather than a one-shot classification.
