Inspiration

Our project was born from the need for a more efficient networking platform for bio-researchers. Many researchers struggle to find collaborators with similar interests and complementary expertise, yet existing platforms make little use of automated data extraction or recommendation. This inspired us to create a tool that makes it easy to share and access research profiles and that recommends potential collaborators based on analysis of each researcher's published work.

What it does

Our platform lets researchers create a profile by linking their Google Scholar data. It automatically extracts key information such as paper abstracts, experimental equipment, and reagents using external APIs and LLM-powered parsers. The system stores this data in both a relational database and a vector database, so that similarity searches over the extracted content can produce tailored recommendations and connect researchers with related work.
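To illustrate the similarity search behind the recommendations, here is a minimal sketch in plain Python. It assumes each profile has already been reduced to an embedding vector and ranks other profiles by cosine similarity. The names `cosine_similarity`, `recommend`, and the toy `profiles` data are hypothetical, and a real vector database would replace the in-memory dictionary.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id, embeddings, top_k=3):
    """Rank every other profile by similarity to the query profile."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): real ones would come from an embedding model.
profiles = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.9, 0.1, 0.0],
    "carol": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    for researcher, score in recommend("alice", profiles, top_k=2):
        print(researcher, round(score, 3))
```

A dedicated vector database performs the same ranking with approximate nearest-neighbor indexes, which matters once the number of profiles grows beyond what a linear scan can handle.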

How we built it

We built the platform on a modern tech stack that balances simplicity and power. The front end is written in React to deliver a responsive and intuitive user experience. The back end runs on FastAPI and handles asynchronous operations such as requests to Google Scholar and Sci-Hub, PDF text extraction with PyPDF2, and calls to LLM parsers. PostgreSQL stores structured data, while a vector database such as Pinecone supports fast similarity searches for our recommendation engine.
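The asynchronous back-end flow can be sketched with the standard library alone. This is not our production code: `fetch_paper` and `parse_paper` are hypothetical stand-ins for the network request and the blocking PDF parsing step, and the concurrency limit of 5 is an arbitrary assumption. It requires Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio


async def fetch_paper(paper_id: str) -> bytes:
    """Stand-in for a network request (hypothetical; sleeps instead of fetching)."""
    await asyncio.sleep(0.01)
    return f"contents of {paper_id}".encode()


def parse_paper(raw: bytes) -> str:
    """Stand-in for blocking work such as PDF text extraction."""
    return raw.decode().upper()


async def process_paper(paper_id: str, limit: asyncio.Semaphore) -> str:
    # The semaphore caps how many fetches are in flight at once.
    async with limit:
        raw = await fetch_paper(paper_id)
    # Run the blocking parser in a worker thread so the event loop stays free.
    return await asyncio.to_thread(parse_paper, raw)


async def process_all(paper_ids, max_concurrent=5):
    limit = asyncio.Semaphore(max_concurrent)
    # gather returns results in the same order as the input IDs.
    return await asyncio.gather(*(process_paper(pid, limit) for pid in paper_ids))


def run_pipeline(paper_ids):
    """Synchronous entry point for scripts and tests."""
    return asyncio.run(process_all(paper_ids))


if __name__ == "__main__":
    print(run_pipeline(["paper-1", "paper-2", "paper-3"]))
```

Inside a FastAPI application the same coroutines could be awaited directly from an `async def` route handler, since the framework already runs an event loop.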

Challenges we ran into

During development, we encountered several challenges. Managing API rate limits and ensuring reliable data extraction from diverse sources like Google Scholar and Sci-Hub required robust error handling and retry mechanisms. Integrating asynchronous workflows to handle long-running processes such as PDF parsing and embedding generation was complex. Additionally, maintaining data consistency between our relational and vector databases while enabling effective similarity searching pushed us to refine our data pipelines.
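A retry mechanism of the kind described above might look like the following sketch, which uses exponential backoff with a small random jitter. The `TransientError` class, the default attempt count, and the delay values are assumptions for illustration, not the settings we shipped.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a rate-limit response."""


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func until it succeeds or max_attempts is reached.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) with up to
    10% random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_request():
        # Fails twice, then succeeds, to exercise the retry loop.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientError("rate limited")
        return "ok"

    print(retry_with_backoff(flaky_request, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the helper easy to test, because a test can substitute a function that records the delays instead of actually waiting.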

Accomplishments that we're proud of

We are proud of how quickly we prototyped a robust platform that integrates multiple technologies. Our system automates the tedious process of data extraction and analysis and delivers recommendations based on nuanced research similarities. By connecting every stage, from Google Scholar data fetching to LLM-powered content parsing, we produced a single platform that makes research networking easier.

What we learned

This project taught us valuable lessons in agile development, asynchronous programming, and data integration. We learned how to efficiently manage multiple APIs and build robust data pipelines to handle complex tasks. Moreover, the process deepened our understanding of both relational and vector databases, and the importance of designing systems that can scale. Working collaboratively under a time constraint also highlighted the significance of clear communication and strategic planning.
