In many research labs, a surprising amount of time is spent on tasks that don’t require scientific insight at all. Collecting gene metadata, copying values from web portals, reformatting tables, and keeping datasets up to date are often assigned to undergrad research assistants. I’ve been that assistant.

During one project, I was asked to extract gene sequence information from a website with a broken API. I built a Selenium-based UI scraper to get the job done. It worked until a minor UI update broke the entire pipeline: buttons moved, element IDs changed, and the scraper failed silently. The science hadn’t changed, but the tooling collapsed.

SLC Structure Explorer is inspired by that experience and explores a better approach.

Instead of hardcoding selectors or relying on fragile DOM assumptions, this project uses a Computer-Using Agent (CUA) that interacts with scientific websites the way a human researcher would. The agent navigates the RCSB Protein Data Bank, searches for specific SLC genes, identifies the most relevant protein structures, and extracts core metadata such as PDB ID, experimental method, resolution, organism, and release date. It then writes this information directly to a structured CSV file using terminal commands.
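As a rough illustration of the record shape described above, here is a minimal Python sketch of one CSV row per structure. The class name `StructureRecord`, the helper `records_to_csv`, the column names, and the sample values are all assumptions made for this example. The agent itself writes the file through terminal commands, so this only shows the intended layout, not the project's actual code.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class StructureRecord:
    """One extracted structure; field names are illustrative assumptions."""
    gene: str
    pdb_id: str
    method: str
    resolution_angstrom: float
    organism: str
    release_date: str  # ISO 8601 date string, e.g. "2020-01-15"


def records_to_csv(records):
    """Serialize records to CSV text with a single header row."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(StructureRecord)]
    writer = csv.DictWriter(buffer, fieldnames=column_names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only; this is not a real PDB entry.
    example = StructureRecord(
        gene="SLC_EXAMPLE",
        pdb_id="0XXX",
        method="ELECTRON MICROSCOPY",
        resolution_angstrom=3.2,
        organism="Homo sapiens",
        release_date="2020-01-15",
    )
    print(records_to_csv([example]), end="")
```

Keeping the column order fixed in one place means every row the agent appends lines up with the same header, which is what the downstream dashboard would rely on.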

Because the agent reasons about what it sees on the screen rather than targeting specific UI elements, it is designed to tolerate layout changes, renamed buttons, and reordered pages. A website redesign alone should not break the workflow.

To demonstrate scale without unnecessary compute cost, the system augments real scraped entries with realistic synthetic structure records, simulating what a fully automated lab assistant could produce over time. The resulting dataset is automatically visualized in a Streamlit dashboard, allowing researchers to filter by gene and method, track structure counts, and explore resolution trends over time.
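The filtering and trend views described above boil down to a couple of small aggregations. The sketch below uses plain Python rather than Streamlit so it runs anywhere. The function names `filter_rows` and `yearly_summary`, the dictionary keys, and the sample rows are hypothetical and not taken from the project's dashboard code.

```python
from collections import defaultdict


def filter_rows(rows, gene=None, method=None):
    """Keep rows matching the selected gene and method; None means no filter."""
    return [
        row for row in rows
        if (gene is None or row["gene"] == gene)
        and (method is None or row["method"] == method)
    ]


def yearly_summary(rows):
    """Group rows by release year; report structure count and mean resolution."""
    resolutions_by_year = defaultdict(list)
    for row in rows:
        year = int(row["release_date"][:4])  # assumes ISO dates (YYYY-MM-DD)
        resolutions_by_year[year].append(float(row["resolution_angstrom"]))
    return {
        year: {"count": len(values), "mean_resolution": sum(values) / len(values)}
        for year, values in sorted(resolutions_by_year.items())
    }


if __name__ == "__main__":
    # Placeholder rows for illustration only, not real PDB entries.
    rows = [
        {"gene": "SLC_A", "method": "X-RAY DIFFRACTION",
         "resolution_angstrom": "3.0", "release_date": "2019-05-01"},
        {"gene": "SLC_A", "method": "X-RAY DIFFRACTION",
         "resolution_angstrom": "2.0", "release_date": "2019-11-20"},
        {"gene": "SLC_B", "method": "ELECTRON MICROSCOPY",
         "resolution_angstrom": "3.5", "release_date": "2021-02-10"},
    ]
    print(yearly_summary(filter_rows(rows, gene="SLC_A")))
```

In a dashboard, the `gene` and `method` arguments would come from user-selected widgets, and the yearly summary would feed the count and resolution charts.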

This project reframes computer-using agents not as chatbots, but as durable research assistants. The goal is not to replace scientific thinking, but to eliminate the repetitive, error-prone data collection work that slows down discovery and burns out junior researchers.
