NASA Bioscience Knowledge Engine

💡 Inspiration

NASA publishes a vast amount of bioscience research on spaceflight, human health, and the life sciences. Although this research is publicly available, much of it is buried in long, highly technical papers that are hard to search and slow to extract insights from.

We were inspired to build an intelligent, search-engine–style system that can retrieve, understand, and summarize NASA bioscience research, making high-quality scientific knowledge more accessible without compromising accuracy or citations.

🤖 What it does

NASA Bioscience Knowledge Engine is an AI-powered research search engine that answers natural-language questions using NASA bioscience papers.

Instead of returning a list of documents, the system:

Retrieves relevant research papers.
Reranks them for relevance.
Generates concise, citation-backed answers.
Falls back to extractive evidence (passages quoted directly from the papers) when generative models are unavailable.
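
The generate-or-fall-back step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: `Passage`, `answer`, `extractive_fallback`, and the `generate` callable are hypothetical names, and the simple term-overlap score stands in for whatever evidence selection the real system uses. The passages are assumed to arrive already retrieved and reranked.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Passage:
    paper_id: str  # hypothetical citation key for the source paper
    text: str


def _terms(text: str) -> set[str]:
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extractive_fallback(question: str, passages: list[Passage], k: int = 2) -> str:
    """Return the k sentences sharing the most terms with the question, each cited."""
    q = _terms(question)
    scored = []
    for p in passages:
        for sentence in re.split(r"(?<=[.!?])\s+", p.text.strip()):
            if sentence:
                scored.append((len(q & _terms(sentence)), sentence, p.paper_id))
    # Stable sort: ties keep the incoming (already reranked) passage order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return " ".join(f"{s} [{pid}]" for _, s, pid in scored[:k])


def answer(
    question: str,
    passages: list[Passage],
    generate: Optional[Callable[[str, list[Passage]], str]] = None,
) -> str:
    """Use the generator when one is available; otherwise quote evidence verbatim."""
    if generate is not None:
        try:
            return generate(question, passages)
        except Exception:
            pass  # model failed or is offline: fall through to extractive evidence
    return extractive_fallback(question, passages)
```

Because the fallback only ever quotes sentences from the retrieved passages and tags each with its `paper_id`, every string it returns is traceable to a source, which is the property the list above relies on.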

The system is designed to be consumed via an API or command-line interface, focusing entirely on search quality, accuracy, and explainability.
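
As a rough idea of what a command-line entry point could look like, here is a small `argparse` sketch. The program name `nasa-bio-search` and the flags `--top-k` and `--extractive-only` are illustrative assumptions, not the project's real interface, and the function only parses arguments rather than running a search.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All names below are hypothetical; they are not the project's documented CLI.
    parser = argparse.ArgumentParser(
        prog="nasa-bio-search",
        description="Ask a question over indexed NASA bioscience papers.",
    )
    parser.add_argument("question", help="natural-language question")
    parser.add_argument("--top-k", type=int, default=5,
                        help="number of passages to keep after reranking")
    parser.add_argument("--extractive-only", action="store_true",
                        help="skip generation and return quoted evidence")
    return parser


def main(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real entry point would hand args to the retrieval pipeline here;
    # this sketch stops after parsing.
    return args
```

Calling `main(["How does microgravity affect bone density?", "--top-k", "3"])` yields a namespace with `top_k` set to 3 and `extractive_only` left at its default of `False`.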

⚙️ How we built it

Indexing: Embedded NASA bioscience research papers with an MPNet embedding model and stored the vectors in a FAISS index.
Retrieval: Added a neural reranker on top of the initial FAISS retrieval to improve search relevance.
RAG Pipeline: Built a Retrieval-Augmented Generation (RAG) pipeline for grounded answers.
Safety: Integrated LLMs with extractive fallbacks to guard against hallucinations.
Architecture: Designed a modular Python-based backend usable as a research search engine.
Hardware: Added GPU acceleration with CPU fallback for wider accessibility.
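
The two-stage retrieval and the hardware fallback described above can be illustrated with the standard library alone. This is a sketch under assumptions, not the project's code: the brute-force inner-product search stands in for FAISS (a flat inner-product index over normalized vectors produces the same ranking, though which index type the project uses is not stated), the `score` callable stands in for the neural reranker, and every function name here is hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def search(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int) -> list[int]:
    """Stage 1: brute-force top-k document indices by cosine similarity."""
    q = normalize(query_vec)
    scores = [
        (sum(a * b for a, b in zip(q, normalize(d))), i)
        for i, d in enumerate(doc_vecs)
    ]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scores[:k]]


def rerank(
    query: str,
    candidates: list[int],
    docs: Sequence[str],
    score: Callable[[str, str], float],
) -> list[int]:
    """Stage 2: reorder the candidates with a slower, more precise scorer."""
    return sorted(candidates, key=lambda i: score(query, docs[i]), reverse=True)


def pick_device() -> str:
    """Prefer CUDA when PyTorch is installed and reports a GPU; otherwise use CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The split matters because the first stage is cheap enough to scan the whole collection, while the reranker only has to score the handful of candidates that survive it.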

🧠 Challenges we ran into

Ensuring non-hallucinated, scientifically grounded answers.
Balancing abstractive summaries with verifiable source evidence.
Optimizing retrieval accuracy for long and complex scientific documents.
Designing a system that works efficiently through an API and CLI alone, without requiring a graphical frontend.

🏅 Accomplishments that we're proud of

Built a fully functional RAG-based research search engine.
Achieved 100% citation-backed responses: every answer the system returns cites the papers it draws on.
Successfully combined retrieval, reranking, and generation into a single pipeline.
Created a system that can operate via API and CLI, independent of any frontend.
Prioritized trust, transparency, and explainability in scientific AI.