NASA Bioscience Knowledge Engine
💡 Inspiration
NASA releases a vast amount of bioscience research related to spaceflight, human health, and life sciences. While this data is publicly available, it is often buried inside long, highly technical research papers that are difficult to search and extract insights from efficiently.
We were inspired to build an intelligent, search-engine–style system that can retrieve, understand, and summarize NASA bioscience research, making high-quality scientific knowledge more accessible without compromising accuracy or citations.
🤖 What it does
NASA Bioscience Knowledge Engine is an AI-powered research search engine that answers natural-language questions using NASA bioscience papers.
Instead of returning a list of documents, the system:
- Retrieves relevant research papers.
- Reranks them for relevance.
- Generates concise, citation-backed answers.
- Provides extractive evidence when generative models are unavailable.
The system is designed to be consumed via an API or command-line interface, focusing entirely on search quality, accuracy, and explainability.
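The answer flow described above, including the extractive fallback, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `Passage` class, the `answer` function, the `generate` callback, and the `[paper_id]` citation format are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Passage:
    paper_id: str  # hypothetical citation key for the source paper
    text: str      # passage text taken verbatim from the paper
    score: float   # relevance score assigned by retrieval/reranking


def answer(
    question: str,
    passages: List[Passage],
    generate: Optional[Callable[[str, List[Passage]], str]] = None,
    top_k: int = 2,
) -> Dict[str, object]:
    """Return a cited answer, falling back to extractive evidence."""
    # Keep only the highest-scoring passages as evidence.
    ranked = sorted(passages, key=lambda p: p.score, reverse=True)[:top_k]
    citations = [p.paper_id for p in ranked]

    if generate is not None:
        try:
            text = generate(question, ranked)
            return {"mode": "generative", "answer": text, "citations": citations}
        except Exception:
            # Assumption: any generator failure triggers the fallback below.
            pass

    # Extractive fallback: quote the evidence verbatim with its citation.
    evidence = " ".join(f"{p.text} [{p.paper_id}]" for p in ranked)
    return {"mode": "extractive", "answer": evidence, "citations": citations}
```

Calling `answer(question, passages)` without a `generate` callback returns the top passages verbatim, each tagged with its paper identifier, so a response always carries citations even when no language model is reachable.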
⚙️ How we built it
- Indexing: Embedded NASA bioscience research papers with mpnet and stored the vectors in a FAISS index.
- Retrieval: Implemented neural reranking to improve search relevance.
- RAG Pipeline: Built a Retrieval-Augmented Generation (RAG) pipeline for grounded answers.
- Safety: Integrated LLMs with extractive fallbacks to guard against hallucinations.
- Architecture: Designed a modular Python-based backend usable as a research search engine.
- Hardware: Added GPU acceleration with CPU fallback for wider accessibility.
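The two-stage retrieve-then-rerank idea in the list above can be illustrated with a dependency-free sketch. To keep it runnable with the standard library only, it swaps in simpler techniques than the ones the project used: a normalized bag-of-words vector stands in for mpnet embeddings, a brute-force dot-product scan stands in for a FAISS index, and Jaccard token overlap stands in for the neural reranker. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Dict[str, float]:
    """Stand-in for an mpnet embedding: L2-normalized token counts."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def retrieve(query: str, index: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Stand-in for a FAISS search: brute-force cosine similarity, top k."""
    q = embed(query)
    scored = [
        (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def rerank(query: str, candidates: List[str], docs: Dict[str, str]) -> List[str]:
    """Stand-in for a neural reranker: order candidates by Jaccard overlap."""
    q_tokens = set(tokenize(query))

    def score(doc_id: str) -> float:
        d_tokens = set(tokenize(docs[doc_id]))
        return len(q_tokens & d_tokens) / (len(q_tokens | d_tokens) or 1)

    return sorted(candidates, key=score, reverse=True)


# Toy corpus (invented sentences, not real NASA abstracts).
docs = {
    "bone": "Microgravity causes bone loss in astronauts during long spaceflight.",
    "plant": "Plant root growth changes under microgravity conditions.",
    "rad": "Space radiation increases DNA damage in mouse cells.",
}
index = {doc_id: embed(text) for doc_id, text in docs.items()}

candidates = retrieve("bone loss in microgravity", index, k=2)
ranked = rerank("bone loss in microgravity", candidates, docs)
```

The design point is the split itself: the first stage is cheap and scans the whole corpus, while the second stage is more expensive per document and only sees the short candidate list.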
🧠 Challenges we ran into
- Ensuring non-hallucinated, scientifically grounded answers.
- Balancing abstractive summaries with verifiable source evidence.
- Optimizing retrieval accuracy for long and complex scientific documents.
- Designing a system that works efficiently without requiring a user interface.
🏅 Accomplishments that we're proud of
- Built a fully functional RAG-based research search engine.
- Achieved 100% citation-backed responses.
- Successfully combined retrieval, reranking, and generation into a single pipeline.
- Created a system that can operate via API and CLI, independent of any frontend.
- Prioritized trust, transparency, and explainability in scientific AI.
📚 What we learned
- How to design and deploy a search-engine–style RAG system.
- The trade-offs between search accuracy, latency, and compute cost.
- The importance of citations and extractive grounding in research tools.
- How to build AI systems that prioritize reliability over flashy interfaces.
🚀 What's next for NASA-Bioscience-Knowledge-Engine
- Expand indexing to include additional NASA bioscience datasets.
- Improve reranking and retrieval with domain-specific fine-tuning.
- Add evaluation metrics for search quality and answer faithfulness.
- Integrate the engine with external research tools and platforms.
- Develop optional interfaces (web or notebook-based) for broader adoption.
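As a rough idea of what the planned evaluation metrics could look like, the sketch below computes recall@k for retrieval and a simple citation-coverage ratio for answers. Both functions are hypothetical, and the citation check assumes bracketed markers such as `[P1]` placed before the sentence-ending punctuation. Coverage only shows that a citation is present, not that the cited paper supports the claim, so it is a weak proxy for faithfulness.

```python
import re
from typing import Iterable, List


def recall_at_k(retrieved: List[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant_set)
    return hits / len(relevant_set)


def citation_coverage(answer: str) -> float:
    """Fraction of sentences containing at least one [bracketed] citation."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if re.search(r"\[[^\]]+\]", s))
    return cited / len(sentences)
```

For example, `citation_coverage("Bone density drops in microgravity [P1]. This claim has no source.")` counts one cited sentence out of two.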
