Inspiration

Grokipedia currently has no images. An article that is purely text is much harder to read and digest than one that uses visuals and presents information in a more visual way (e.g., through widgets). We wanted to see these features in Grokipedia, so we made them!

What it does

  • Brings content enhancements to Grokipedia articles
  • Embeds relevant, context-aware images & widgets in articles to make them more visually appealing and digestible
  • Generates a smart caption with Grok for each image

How we built it

We built a Python pipeline that:

  • Reads the article and understands its structure
  • Uses Grok to figure out where images & widgets would be most helpful
  • Searches Google for relevant candidate images
  • Uses Grok's vision to look at the candidate images and pick the best one
  • Generates accurate captions based on what's actually in the image
  • Injects everything back into the article HTML
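
The steps above can be sketched as a small orchestration function. This is a minimal sketch, not our actual code: `Placement`, `Enrichment`, `enrich`, and the four stage callables (`plan`, `search`, `pick`, `caption`) are hypothetical names, and the stages are passed in as plain functions so the flow can be shown without any real Grok or Google calls. Injection into the HTML is left out here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Placement:
    """A spot where an image was suggested (hypothetical shape)."""
    block_id: int  # index of the article block the image should follow
    query: str     # search query used to find candidate images


@dataclass
class Enrichment:
    """The chosen image and its caption for one placement."""
    block_id: int
    image_url: str
    caption: str


def enrich(
    blocks: list[str],
    plan: Callable[[list[str]], list[Placement]],
    search: Callable[[str], list[str]],
    pick: Callable[[Placement, list[str]], Optional[str]],
    caption: Callable[[str], str],
) -> list[Enrichment]:
    """Run plan -> search -> pick -> caption for every suggested placement."""
    results = []
    for placement in plan(blocks):
        candidates = search(placement.query)
        best = pick(placement, candidates)
        if best is None:  # no suitable image found: skip this placement
            continue
        results.append(Enrichment(placement.block_id, best, caption(best)))
    return results
```

Keeping each stage behind a callable makes it easy to swap a real model or search call for a stub while testing the rest of the pipeline.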

Refer to the Excalidraw link for a visual of the technical pipeline!

Challenges we ran into

  • Trying to limit the number of tokens we feed to Grok while maintaining full context of the article
  • Figuring out how to allow Grok to insert content into specific parts of the article in a token-efficient manner (i.e., without Grok re-outputting the entire article with content inserted)
  • Dealing with issues related to image search integration (e.g., edge cases where the image can't be retrieved)
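
One way to handle the "image can't be retrieved" case is to walk the candidate list and keep the first one that downloads cleanly. The sketch below assumes a hypothetical helper `first_retrievable` and a caller-supplied `fetch` function; the `min_bytes` threshold is an illustrative guess, not a value we tuned.

```python
from typing import Callable, Iterable, Optional


def first_retrievable(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    min_bytes: int = 1024,
) -> Optional[tuple[str, bytes]]:
    """Return the first (url, data) pair that downloads cleanly, or None."""
    for url in urls:
        try:
            data = fetch(url)
        except (OSError, ValueError):  # network failure, timeout, malformed URL
            continue
        if len(data) < min_bytes:  # probably an error page or a tracking pixel
            continue
        return url, data
    return None
```

Returning `None` lets the caller skip that placement instead of injecting a broken image.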

Accomplishments that we're proud of

  • Using Grok and various evaluation/validation strategies to output the best possible image/content to the user
  • Producing final articles that are a lot easier to digest than the text-only originals
  • Coming up with a clever strategy to convert articles to a token-efficient representation and back, giving Grok a precise way to insert content efficiently
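
To illustrate the idea of a token-efficient representation with precise insertion, here is a minimal sketch under stated assumptions: the article is already split into a list of HTML blocks, the model sees a numbered outline, and it answers with `{"after": id, "html": ...}` instructions instead of re-outputting the article. The function names, the instruction shape, and the truncation to `max_chars` are all hypothetical choices for illustration; our real representation may differ.

```python
def to_outline(blocks: list[str], max_chars: int = 80) -> str:
    """Render each block as '[id] snippet' so the model can refer to blocks by id."""
    lines = []
    for i, text in enumerate(blocks):
        snippet = text if len(text) <= max_chars else text[:max_chars] + "..."
        lines.append(f"[{i}] {snippet}")
    return "\n".join(lines)


def apply_insertions(blocks: list[str], insertions: list[dict]) -> list[str]:
    """Insert each {'after': id, 'html': str} after the block it names.

    Ids always refer to positions in the original list, so insertions
    do not shift one another.
    """
    by_anchor: dict[int, list[str]] = {}
    for ins in insertions:
        if 0 <= ins["after"] < len(blocks):  # ignore ids that do not exist
            by_anchor.setdefault(ins["after"], []).append(ins["html"])
    out = []
    for i, block in enumerate(blocks):
        out.append(block)
        out.extend(by_anchor.get(i, []))
    return out
```

With this shape, the model's output grows with the number of insertions rather than with the length of the article.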

What we learned

  • How to build a context-aware content embedding pipeline in a token-efficient way
  • How to interact with the Grok API and various enterprise tooling offered by xAI
  • How to use the Google Custom Search API for fast image retrieval
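
As a rough illustration of the image retrieval step, the sketch below builds a request URL for the Custom Search JSON API and extracts candidates from a parsed response. The helper names `build_image_search_url` and `extract_candidates` are hypothetical, and the API key and search engine ID are placeholders supplied by the caller; no network request is made here.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(query: str, api_key: str, engine_id: str, num: int = 5) -> str:
    """Build a Custom Search JSON API request URL restricted to image results."""
    params = {
        "key": api_key,
        "cx": engine_id,
        "q": query,
        "searchType": "image",
        "num": max(1, min(num, 10)),  # the API accepts 1 to 10 results per request
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def extract_candidates(response: dict) -> list[dict]:
    """Pull the image URL, title, and source page out of a parsed JSON response."""
    candidates = []
    for item in response.get("items", []):  # "items" is missing when nothing matched
        candidates.append({
            "url": item.get("link"),
            "title": item.get("title", ""),
            "page": item.get("image", {}).get("contextLink"),
        })
    return candidates
```

Treating a missing `items` key as an empty list keeps the no-results case from raising downstream.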

What's next for Multimodal Grokipedia Enricher

  • Optimizing code efficiency and quality for scalability (e.g., adding parallelization & batch requests)
  • Backfilling all Grokipedia articles with images and widgets
  • Adding test suites, such as unit tests, to ensure robustness and factuality
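
As a sketch of what the parallelization step could look like, the snippet below fans a per-article function out over a thread pool, which suits work dominated by network calls. `enrich_many` and `enrich_one` are hypothetical names, and the default of 8 workers is an arbitrary assumption, not a measured setting.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def enrich_many(
    article_ids: list[str],
    enrich_one: Callable[[str], str],
    max_workers: int = 8,
) -> tuple[dict[str, str], dict[str, Exception]]:
    """Enrich articles concurrently; return (results, failures) keyed by article id."""
    results: dict[str, str] = {}
    failures: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(enrich_one, aid): aid for aid in article_ids}
        for future in as_completed(futures):
            aid = futures[future]
            try:
                results[aid] = future.result()
            except Exception as exc:  # one failed article should not stop the batch
                failures[aid] = exc
    return results, failures
```

Collecting failures separately makes it possible to retry only the articles that did not succeed during a backfill.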
