My AI Prescription News Bot

Uses MongoDB Atlas Vector Search paired with Google Cloud Vertex AI to let users get answers about diseases and prescription drugs from pharma news articles and Google Search.

Choose from a list of over 400 diseases and ask either your own question or one of the top 15 common questions. Upload a photo of a prescription label (not kept by the app) to get details about your prescription and ask questions about it.

With people living longer and an aging population, there's an increased thirst for knowledge about diseases and treatments, which this app is designed to quench.

The Pharma News Dataset

I scraped over 2,700 public news articles from the top pharma news websites PharmaVoice (https://www.pharmavoice.com) and BioPharma Dive (https://www.biopharmadive.com), published from Jan 2021 to May 2025. I then extracted the text from these more than 4 years' worth of pharma news and used Unstructured (https://unstructured.io) to chunk each article's text, producing over 37,000 chunks. That works out to about 14 chunks per article with Unstructured's strategy of partitioning each article into its semantically meaningful elements.
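To illustrate the idea of element-based chunking, here is a minimal sketch. It is not Unstructured's algorithm: it's a simplified stdlib stand-in that treats blank-line-separated paragraphs as the "elements" of an article and packs consecutive ones into chunks up to a hypothetical `max_chars` limit, so semantic units stay intact unless a single paragraph is too long.

```python
from typing import List


def chunk_article(text: str, max_chars: int = 500) -> List[str]:
    """Split article text into chunks built from whole paragraphs.

    Simplified stand-in for element-based chunking (not Unstructured's
    actual logic): paragraphs are only split when one paragraph alone
    exceeds max_chars.
    """
    # Treat blank-line-separated blocks as the article's "elements".
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        if len(para) > max_chars:
            # Oversized element: flush what we have, then hard-split it.
            if current:
                chunks.append(current)
                current = ""
            for start in range(0, len(para), max_chars):
                chunks.append(para[start:start + max_chars])
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # element still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk, start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks
```

As a sanity check on the numbers above, 37,000 chunks over 2,700 articles is about 13.7, which rounds to the roughly 14 chunks per article reported.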

Using Vertex AI's latest stable Text Embedding 005 model (https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions#latest-stable), I embedded each text chunk as a 768-dimensional vector, the model's maximum (https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api).
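A batched embedding loop might look like the sketch below. It assumes the google-cloud-aiplatform SDK's `vertexai.language_models` module and that `vertexai.init(project=..., location=...)` has already been called. The `RETRIEVAL_DOCUMENT` task type and the batch size of 100 are my assumptions, not details from the write-up; `batched` and `embed_chunks` are hypothetical helper names.

```python
from typing import Iterator, List, Sequence

EMBEDDING_MODEL = "text-embedding-005"
DIMENSIONS = 768
BATCH_SIZE = 100  # assumption: a conservative number of texts per request


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_chunks(chunks: Sequence[str]) -> List[List[float]]:
    """Embed text chunks with Vertex AI, one request per batch."""
    # Imported lazily so batched() is usable without the SDK installed.
    from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

    model = TextEmbeddingModel.from_pretrained(EMBEDDING_MODEL)
    vectors: List[List[float]] = []
    for batch in batched(chunks, BATCH_SIZE):
        # Document-side task type (assumed); queries would use RETRIEVAL_QUERY.
        inputs = [TextEmbeddingInput(text, "RETRIEVAL_DOCUMENT") for text in batch]
        results = model.get_embeddings(inputs, output_dimensionality=DIMENSIONS)
        vectors.extend(r.values for r in results)
    return vectors
```

With about 37,000 chunks and 100 texts per request, this would be roughly 370 embedding calls.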

To store the resulting 37,000-plus vectors for RAG, I used a MongoDB Atlas Vector Search index with the matching 768 dimensions: https://take.ms/jv4vt, https://take.ms/fiKY1. I inserted each vector along with its metadata: article URL, chunk number, total chunks, published-at date, title, and companies. To support filtering a vector search, I specified key metadata fields as filters in the index: https://take.ms/gRWQu. The text chunks from the articles were stored in the text field of the news_articles collection: https://take.ms/Fdo7N.
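Here's a sketch of what one chunk document and the index definition could look like. Only the `text` field and the news_articles collection come from the write-up; the field names `embedding` and `published_year`, the index name `vector_index`, and the cosine similarity metric are hypothetical choices for illustration. `create_vector_index` assumes pymongo 4.7 or later, where `SearchIndexModel` accepts `type="vectorSearch"`.

```python
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical names; only `text` and news_articles come from the write-up.
VECTOR_FIELD = "embedding"
INDEX_NAME = "vector_index"


def make_index_definition(dimensions: int = 768) -> Dict[str, Any]:
    """Atlas Vector Search definition: one vector field plus a filter field."""
    return {
        "fields": [
            # cosine is an assumed similarity choice
            {"type": "vector", "path": VECTOR_FIELD,
             "numDimensions": dimensions, "similarity": "cosine"},
            # Filter fields let $vectorSearch pre-filter by metadata.
            {"type": "filter", "path": "published_year"},
        ]
    }


def make_chunk_document(url: str, title: str, published_at: datetime,
                        companies: List[str], chunk_number: int,
                        total_chunks: int, text: str,
                        embedding: List[float]) -> Dict[str, Any]:
    """One document per chunk: text, vector, and article metadata."""
    return {
        "url": url,
        "title": title,
        "published_at": published_at,
        "published_year": published_at.year,  # denormalized for year filters
        "companies": companies,
        "chunk_number": chunk_number,
        "total_chunks": total_chunks,
        "text": text,
        VECTOR_FIELD: embedding,
    }


def create_vector_index(collection: Any) -> None:
    """Create the vector index on a pymongo Collection."""
    from pymongo.operations import SearchIndexModel  # lazy import

    model = SearchIndexModel(definition=make_index_definition(),
                             name=INDEX_NAME, type="vectorSearch")
    collection.create_search_index(model=model)
```

Storing the year as its own field keeps the year filter a simple equality or `$in` match instead of a date-range computation.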

Answering Questions Using RAG and Grounding With Google Search

When you ask a question about a disease in the app, your question is embedded using the same Vertex AI Text Embedding 005 model. The resulting vector is then sent to MongoDB Atlas Vector Search (https://www.mongodb.com/products/platform/atlas-vector-search) to find the 10 (configurable) most semantically similar vectors in the index. Atlas returns each match with its metadata and the chunk of text from the original news article. You can also optionally limit the search to articles published in selected years between 2021 and 2025, which Atlas applies as a filter on the chunk metadata.
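The query step maps naturally onto a `$vectorSearch` aggregation stage. This sketch builds that pipeline; the index name `vector_index`, the field names `embedding` and `published_year`, and the choice of 15 candidates per requested result are hypothetical. The query vector is assumed to come from the same text-embedding-005 model.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_vector_search_pipeline(query_vector: Sequence[float],
                                 limit: int = 10,
                                 years: Optional[Sequence[int]] = None,
                                 candidates_per_result: int = 15
                                 ) -> List[Dict[str, Any]]:
    """Build an Atlas $vectorSearch pipeline with an optional year filter."""
    stage: Dict[str, Any] = {
        "index": "vector_index",      # hypothetical index name
        "path": "embedding",          # hypothetical vector field
        "queryVector": list(query_vector),
        # Approximate search considers more candidates than it returns.
        "numCandidates": limit * candidates_per_result,
        "limit": limit,
    }
    if years:
        # Only works on fields declared as "filter" in the index.
        stage["filter"] = {"published_year": {"$in": sorted(set(years))}}
    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "text": 1, "url": 1, "title": 1,
                      "published_at": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


def search(collection: Any, query_vector: Sequence[float], limit: int = 10,
           years: Optional[Sequence[int]] = None) -> List[Dict[str, Any]]:
    """Run the pipeline against a pymongo Collection."""
    return list(collection.aggregate(
        build_vector_search_pipeline(query_vector, limit, years)))
```

Leaving `filter` out entirely when no years are chosen searches the whole corpus.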

The app then sends these top 10 text chunks, along with your question, to Gemini 2.0 Flash (https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-0-flash), an LLM on Vertex AI. Gemini Flash is prompted to answer your question using only the info in the text chunks, grounding the answer in reliable reporting from the top pharma news sites noted earlier. Gemini Flash is also prompted to cite its sources using the URL and published-at metadata fields. This lets you follow up on an answer by clicking the source links to open the original free, public articles on PharmaVoice and BioPharma Dive.
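A grounded prompt of this kind could be assembled as below. The wording, the numbered-excerpt layout, and the `NOT_FOUND` reply the model is asked to give when the excerpts lack an answer are all hypothetical; the write-up doesn't publish the actual prompt.

```python
from typing import Dict, List

NOT_FOUND = "NOT_FOUND"  # hypothetical sentinel for "no answer in excerpts"


def build_rag_prompt(question: str, chunks: List[Dict[str, str]]) -> str:
    """Number each retrieved chunk and attach its URL and date for citing."""
    context = "\n\n".join(
        f"[{i}] (source: {c['url']}, published: {c['published_at']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the numbered news excerpts below. "
        "Cite every source you use with its URL and published date. "
        f"If the excerpts do not contain the answer, reply exactly {NOT_FOUND}.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
```

Putting the URL and date next to each excerpt gives the model exactly the metadata it needs to produce clickable citations.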

If Gemini Flash can't find the answer in the chunks returned by the vector search, it's then prompted to use Grounding with Google Search (https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/grounding-with-google-search) to answer your question. I opted for Google Search so the latest info available on the Internet is used, since otherwise answers would be limited by Gemini 2.0 Flash's knowledge cutoff of June 2024, a year ago.
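One way to express the RAG-first, search-second flow is sketched below, assuming the RAG prompt asks the model to reply with a hypothetical `NOT_FOUND` sentinel when the chunks don't answer the question. The model call is injected as a function so the control flow is testable; `vertex_generate` uses the google-genai SDK as a swapped-in illustration (the project's stack lists LangChain and LangGraph, and the actual wiring isn't described). It assumes `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` are set in the environment.

```python
from typing import Callable

NOT_FOUND = "NOT_FOUND"  # hypothetical sentinel from the RAG prompt

# generate(prompt, use_google_search) -> answer text
Generate = Callable[[str, bool], str]


def answer_with_fallback(rag_prompt: str, question: str,
                         generate: Generate) -> str:
    """Try RAG over the news chunks first; fall back to Google Search."""
    answer = generate(rag_prompt, False)
    if answer.strip() != NOT_FOUND:
        return answer
    # The chunks didn't contain the answer: retry with search grounding.
    return generate(question, True)


def vertex_generate(prompt: str, use_google_search: bool) -> str:
    """Call Gemini 2.0 Flash on Vertex AI, optionally grounded in Search."""
    from google import genai  # lazy import: google-genai SDK
    from google.genai import types

    client = genai.Client(vertexai=True)  # project/location from env vars
    tools = ([types.Tool(google_search=types.GoogleSearch())]
             if use_google_search else None)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config=types.GenerateContentConfig(tools=tools),
    )
    return response.text or ""
```

Only enabling the search tool on the second call keeps answers tied to the pharma news articles whenever they're sufficient.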

To avoid abuse of the Google Search feature, Gemini Flash is prompted to only answer questions related to pharma topics such as diseases, drugs, therapies, treatments, research, medicines, etc.

To guide you on the types of questions the app can answer, it presents a list of 15 top questions you can optionally choose from:

  • What are the latest treatment advancements for [Disease]?
  • Are there any newly approved medications for [Disease]?
  • What promising new therapies for [Disease] are being studied in clinical trials?
  • What are the results of recent clinical trials for [Disease] treatments?
  • Are there different types of new treatments being developed for [Disease]?
  • What research breakthroughs could lead to new treatments for [Disease]?
  • What new treatments for [Disease] might become available in the near future?
  • What are the latest updates on drug approvals for [Disease]?
  • What aspects of [Disease] are current treatments still trying to improve?
  • What innovative approaches are being explored for treating [Disease]?
  • Are there new developments for managing symptoms or side effects related to [Disease] treatments?
  • What progress is being made on treatments for different stages or types of [Disease]?
  • Are new combination therapies being developed or studied for [Disease]?
  • What are scientists learning about [Disease] that could lead to better therapies?
  • What are the most significant recent updates for treating [Disease]?
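Each question above is a template with a `[Disease]` placeholder filled in from the disease you select. A minimal sketch of that substitution follows; `QUESTION_TEMPLATES` and `fill_template` are hypothetical names, and only three of the 15 templates are shown.

```python
from typing import List

# Hypothetical constant; the app's full list has 15 templates.
QUESTION_TEMPLATES: List[str] = [
    "What are the latest treatment advancements for [Disease]?",
    "Are there any newly approved medications for [Disease]?",
    "What are the most significant recent updates for treating [Disease]?",
]


def fill_template(template: str, disease: str) -> str:
    """Substitute the selected disease into a question template."""
    if "[Disease]" not in template:
        raise ValueError("template has no [Disease] placeholder")
    return template.replace("[Disease]", disease.strip())
```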

Vision/OCR From Prescription Labels

The app also lets you upload a photo of a prescription label. Gemini 2.0 Flash's multimodal ability is used to read the name of the prescription drug from the label and look up its info to give you a description, instructions, and side effects, noting that you should always follow your doctor's specific directions. You can then ask questions about the drug, which are answered the same way as your questions about diseases: first with a RAG query using MongoDB Atlas Vector Search on the news articles, followed by Grounding with Google Search if needed.
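Reading the drug name from a label photo could be done by asking Gemini for a small JSON answer, as in this sketch. The prompt, the JSON shape (`is_label`, `drug_name`), and the helper names are hypothetical; `read_label` uses the google-genai SDK and assumes Vertex AI project and location are configured through environment variables.

```python
import json
from typing import Optional

LABEL_PROMPT = (
    "Is this image a prescription or medicine label? Respond with JSON: "
    '{"is_label": true or false, "drug_name": string or null}.'
)


def parse_label_response(raw: str) -> Optional[str]:
    """Return the drug name, or None if the image isn't a usable label."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not data.get("is_label"):
        return None
    name = data.get("drug_name")
    return name.strip() if isinstance(name, str) and name.strip() else None


def read_label(image_bytes: bytes, mime_type: str = "image/jpeg") -> Optional[str]:
    """Send the photo plus prompt to Gemini 2.0 Flash and parse the reply."""
    from google import genai  # lazy import: google-genai SDK
    from google.genai import types

    client = genai.Client(vertexai=True)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[types.Part.from_bytes(data=image_bytes, mime_type=mime_type),
                  LABEL_PROMPT],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return parse_label_response(response.text or "")
```

A `None` result is the natural trigger for the "not a prescription label" error.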

For safe usage, if the uploaded image isn't a prescription or medicine label, the app shows an error. Also, for privacy, no uploaded image is ever kept; it's deleted after it's checked for a prescription drug name.
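A `try`/`finally` block is a simple way to make sure an uploaded image is removed even when the check raises an error. This is a minimal sketch with a hypothetical `check_then_delete` helper and an injected `extract` function; it isn't necessarily how the app handles its uploads.

```python
import os
import tempfile
from typing import Callable, Optional


def check_then_delete(image_bytes: bytes,
                      extract: Callable[[str], Optional[str]],
                      suffix: str = ".jpg") -> Optional[str]:
    """Write the image to a temp file, run the check, always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(image_bytes)
        return extract(path)  # e.g. returns a drug name or None
    finally:
        # Runs on success and on exceptions, so the image never lingers.
        if os.path.exists(path):
            os.remove(path)
```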

Downloading the Answers

For your convenience, you can download all the questions, answers, and prescription details shown in the app. They're formatted as a web page you can view at any time in your browser. Make sure you download before you click Reset, because Reset wipes everything from the app's memory. For privacy, only you can download your questions and answers.
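The download page can be as simple as a standalone HTML file. The project lists Jinja and a Markdown-to-HTML step; this stdlib sketch (hypothetical `render_history`) instead escapes each question and answer as plain text, preserving line breaks, so it shows the page structure without rendering Markdown.

```python
import html
from typing import List, Tuple


def render_history(entries: List[Tuple[str, str]],
                   title: str = "My AI Prescription News Bot") -> str:
    """Render (question, answer) pairs as a self-contained HTML page."""
    sections = "\n".join(
        f"<section>\n<h2>{html.escape(q)}</h2>\n"
        f'<div style="white-space: pre-wrap">{html.escape(a)}</div>\n</section>'
        for q, a in entries
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n<h1>{html.escape(title)}</h1>\n{sections}\n</body></html>\n"
    )
```

Escaping every user-supplied string keeps a question like "Is <drug> safe?" from being interpreted as markup when the file is opened.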

Built With

  • atlas-vector-search
  • docker
  • gcp
  • gcr
  • gemini-2.0-flash
  • google-cloud-run
  • google-text-embedding-005
  • gradio
  • grounding-with-google-search
  • jinja
  • langchain
  • langgraph
  • markdown-to-html
  • mongodb
  • multimodal-llm
  • ocr
  • prompt-engineering
  • python3.12
  • tenacity
  • text-llm
  • unstructured-io
  • vertex-ai
  • vision-llm
  • web-scraping