Locus — gene-editing target finder for crop stress

Rice x Salinity

Inspiration

The crops that smallholder farmers in the world's most food-insecure regions depend on — sorghum, millet, cowpea, cassava — are the ones agricultural science has invested in least. The major commercial crops get the research dollars; the staples feeding the Sahel, the Horn of Africa, and South Asia's drylands get a fraction of the attention, even as the droughts and saline soils they face intensify. Engineering these crops to survive stress means finding the right protein to edit — but a researcher faces thousands of candidate proteins, has to check whether each has a reliable 3D structure, and must read scattered, inconsistent literature to judge whether editing it would actually help. That triage is slow and expensive, which is exactly why under-resourced crops stay under-researched.

I built Locus to collapse that triage from weeks into minutes — and to be honest about the strength of the evidence behind every suggestion, so a researcher without a large lab behind them can trust what they're looking at instead of chasing dead ends.

What it does

Given a crop and a climate stress (e.g. rice × salinity), Locus:

Discovers candidate proteins live from UniProt, filtered by the stress's Gene Ontology term and ranked by curation quality.
Checks structure by pulling each protein's AlphaFold model and reporting its pLDDT confidence, shown as an interactive 3D structure colored by per-residue reliability.
Grades the evidence by searching the real literature (Europe PMC) and using Gemini to tag each paper by evidence type — causal, mechanism, homolog, correlation, review — and, critically, flagging papers that don't actually study the protein as IRRELEVANT rather than counting them.
Ranks candidates in a sortable dashboard, by structural confidence or evidence strength, with every citation clickable for verification.

The defining feature is honesty. A protein that's well-folded but unstudied is graded UNGRADED and labeled an open target — not dressed up as a sure thing. The agent shows its work and won't be fooled by topically-adjacent literature.

How I built it

Gemini 2.5 Flash on Vertex AI for agent reasoning and structured evidence grading. (Gemini 3 was not available in my project region; 2.5 Flash handled grading and tool orchestration well.)
Google Agent Development Kit (ADK) for the agent and its six purpose-built tools.
MongoDB Atlas as the agent's growing research memory: discovered proteins are stored as rich documents, embedded with Vertex AI embeddings, and made searchable by meaning via Atlas Vector Search — so a conceptual query like "proteins for surviving high-salt soil" surfaces the right candidates without keyword matches.
MongoDB's MCP server, integrated over Streamable HTTP, gives the agent native access to query the research database directly — listing collections, counting and filtering documents in natural language — complementing the purpose-built research tools.
AlphaFold and UniProt public APIs for structure and discovery; Europe PMC for literature.
A FastAPI backend serving a two-panel dashboard with a 3Dmol.js structure viewer, deployed on Cloud Run.

Challenges I ran into

Integrating MongoDB's MCP server on Windows was the hardest part — the default stdio transport hit a broken-pipe bug in the async subprocess layer. I solved it by running the MCP server as a standalone HTTP service and connecting the agent over Streamable HTTP, which sidestepped the platform issue entirely.

Grading also taught me to make the agent conservative. Early versions over-credited tangential papers, so I built explicit evidence-type tagging and an IRRELEVANT flag — turning "looks supported" into "is actually supported."

Accomplishments that I'm proud of

An agent that grades its own confidence on two independent axes — structural and evidential — and is honest when the evidence isn't there. The IRRELEVANT-tagging behavior, where it reads six on-topic papers and correctly discounts all of them, is the thing I'm proudest of: it's the difference between a tool a scientist can trust and one they can't.

What I learned

Building zero-to-working across Google Cloud, ADK, and MongoDB Atlas in a single sprint; that honest "I don't know" answers are a feature, not a gap; and that real agentic value comes from chaining real data sources with judgment, not from a single clever prompt.

What's next for Locus

Field-trial validation flags, deeper modifiable-vs-conserved region analysis for editing strategy, and growing the research corpus so vector search discriminates across a larger protein space.

Built With

3dmol.js
alphafold
atlas-vector-search
europe-pmc
fastapi
gemini
google-adk
google-cloud
mcp
mongodb
mongodb-atlas
python
uniprot
vertex-ai

Updates

Locke082 Bracken started this project — Jun 11, 2026 04:48 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.