Inspiration

The creator was inspired by the Tricorder from Star Trek, a multi-function device able to scan, record, and analyze its immediate surroundings.

What it does

This web client is designed to accept a wide range of inputs, including both text and images, which it uses to gather rich contextual information in the form of knowledge graph triples from a graph database. The client combines this context with the user's prompt for multi-agent reasoning before sending it to the SambaNova API for processing. After receiving the API's output, the client performs additional post-processing and finally renders the results on screen for the user.
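The flow above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the toy triple store, `retrieve_triples`, and `build_prompt` are all assumptions standing in for the real graph database and retrieval logic.

```python
# Hypothetical sketch of the request flow: retrieve KG triples relevant to
# the user's question, fold them into the prompt, then send the prompt to
# an OpenAI-compatible endpoint such as the SambaNova API.

TRIPLES = [  # toy stand-in for the graph database
    ("spinach", "is_rich_in", "iron"),
    ("iron", "supports", "oxygen transport"),
]

def retrieve_triples(query: str, store=TRIPLES):
    """Return triples whose subject or object appears in the query."""
    q = query.lower()
    return [t for t in store if t[0] in q or t[2] in q]

def build_prompt(question: str) -> str:
    """Combine retrieved triples with the user's question."""
    context = "\n".join(f"({s}, {p}, {o})"
                        for s, p, o in retrieve_triples(question))
    return f"Context triples:\n{context}\n\nQuestion: {question}"

# The assembled prompt would then be sent for processing, e.g. via an
# OpenAI-compatible client:
#   client.chat.completions.create(
#       model=..., messages=[{"role": "user", "content": build_prompt(q)}])
```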

How we built it

The project began with a research and design phase that spanned several days, ultimately leading to a focus on the domain of human nutrition. A thorough review of existing and emerging technologies was conducted to identify the most promising candidates for the upcoming sprint. To enhance the capabilities of large language models (LLMs) and vision language models (VLMs) with retrieval-augmented generation, the creators turned their attention to modeling knowledge graph (KG) data, schema, and queries. This effort resulted in the automated generation of KG triples, which formed the foundation of a global KG designed to support graph-based question answering (QA).
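Automated triple generation of the kind described above is often done by prompting an LLM to emit structured (subject, predicate, object) lines and parsing its response. The prompt wording, delimiter, and parser below are illustrative assumptions, not the project's actual implementation:

```python
# Hedged sketch of automated KG triple generation: an LLM is asked to emit
# one triple per line, and the response is parsed into tuples that can be
# merged into a global knowledge graph for graph-based QA.

TRIPLE_PROMPT = (
    "Extract knowledge graph triples from the text below.\n"
    "Emit one per line as: subject | predicate | object\n\n"
    "Text: {text}"
)

def parse_triples(llm_output: str):
    """Parse 'subject | predicate | object' lines into 3-tuples,
    skipping malformed lines."""
    triples = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples

# Parsing a hypothetical model response:
sample = "vitamin C | found_in | citrus fruit\nthis line is malformed"
triples = parse_triples(sample)
```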

Following the implementation of text-only natural language QA, the creator extended the system by integrating vision-capable models for image captioning and visual question answering.
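A visual QA request in the OpenAI-compatible chat format typically embeds the image inline as base64. The helper and model name below are illustrative assumptions, not the project's code:

```python
# Illustrative sketch: build a multimodal chat message carrying both the
# user's question and an inline base64-encoded image, in the
# OpenAI-compatible format.
import base64

def build_vqa_messages(image_bytes: bytes, question: str):
    """Build a chat message list with a text part and an inline image part."""
    b64 = base64.b64encode(image_bytes).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]

# messages = build_vqa_messages(open("label.jpg", "rb").read(),
#                               "What ingredients are listed?")
# client.chat.completions.create(model=<vision model>, messages=messages)
```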

Challenges we ran into

  • Limited context length on VLMs
  • Aggressive rate limits on VLMs
  • Modeling knowledge graph data and queries

Accomplishments that we're proud of

  • Getting LLM output to fit within the context length constraints of VLMs
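One simple way to keep LLM-generated context within a VLM's window is to trim retrieved lines to a rough token budget before assembling the prompt. The 4-characters-per-token heuristic and function names below are illustrative assumptions, not the approach the project necessarily used:

```python
# Sketch: keep whole context lines until an approximate token budget is
# exhausted, so the assembled prompt stays within the VLM's context length.

def fit_to_budget(lines, max_tokens=512, chars_per_token=4):
    """Greedily keep whole lines until the rough token budget runs out."""
    budget = max_tokens * chars_per_token  # crude character budget
    kept, used = [], 0
    for line in lines:
        if used + len(line) > budget:
            break
        kept.append(line)
        used += len(line)
    return kept
```

A real implementation would count tokens with the model's actual tokenizer, or summarize the overflow instead of dropping it, but the budget-and-trim shape is the same.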

What we learned

  • Integration of both VLMs and LLMs with semantic caching and graph-based retrieval-augmented generation is a non-trivial endeavor.

What's next for Nutrition Tricorder

  • Recognize toxins and hazardous compounds on ingredient lists and food labels
  • Extend the VLM pipeline to process video input

Built With

  • llama-index
  • openai
  • python
  • sambanova
  • vllm