Inspiration
Grokipedia currently has no images, and an article that is purely text is much harder to read and digest than one that uses visuals and presents textual information in a more visual way (i.e., through widgets). We wanted to see these features in Grokipedia, so we made them!
What it does
- Brings content enhancements to Grokipedia articles
- Embeds relevant, context-aware images & widgets in articles to make them more visually appealing and digestible
- Generates a smart caption with Grok for each image
How we built it
We built a Python pipeline that:
- Reads the article and understands its structure
- Uses Grok to figure out where images & widgets would be most helpful
- Searches Google for relevant candidate images
- Uses Grok's vision to actually look at the images and pick the best one
- Generates accurate captions based on what's actually in the image
- Injects everything back into the article HTML
Refer to the Excalidraw link for a visual of the technical pipeline!
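To make the flow of the steps above concrete, here is a minimal sketch of how the stages could be chained. It is not our actual code: the names `enrich`, `Placement`, and `ChosenImage` are hypothetical, and the Grok and image search calls are passed in as plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Placement:
    block_index: int  # article block the image should follow (hypothetical field)
    query: str        # image search query suggested by the model


@dataclass
class ChosenImage:
    block_index: int
    url: str
    caption: str


def enrich(
    blocks: list[str],
    plan_placements: Callable[[list[str]], list[Placement]],
    search_images: Callable[[str], list[str]],
    pick_best: Callable[[str, list[str]], Optional[str]],
    write_caption: Callable[[str, str], str],
) -> list[ChosenImage]:
    """Run the stages in order and return one chosen image per usable placement."""
    chosen = []
    for placement in plan_placements(blocks):
        candidates = search_images(placement.query)
        if not candidates:
            continue  # no candidates found: leave this slot empty
        context = blocks[placement.block_index]
        url = pick_best(context, candidates)
        if url is None:
            continue  # the vision step rejected every candidate
        caption = write_caption(context, url)
        chosen.append(ChosenImage(placement.block_index, url, caption))
    return chosen
```

Passing the model and search steps in as callables is an assumption made for the sketch; it lets each stage be swapped for a stub without touching the loop.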
Challenges we ran into
- Trying to limit the number of tokens we feed to Grok while maintaining full context of the article
- Figuring out how to allow Grok to insert content into specific parts of the article in a token-efficient manner (i.e., without Grok re-outputting the entire article with content inserted)
- Dealing with issues related to image search integration (e.g., edge cases where the image can't be retrieved)
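One way to approach the first two challenges is to show the model a short numbered outline instead of the full article, and have it answer with block indices rather than rewritten text. The sketch below illustrates that idea under assumptions: the function names and the `{"after": ..., "html": ...}` instruction shape are hypothetical, and the article is treated as a simple list of block strings.

```python
def build_outline(blocks: list[str], max_chars: int = 60) -> str:
    """Render one short numbered line per block instead of the full text."""
    lines = []
    for i, text in enumerate(blocks):
        snippet = text if len(text) <= max_chars else text[:max_chars] + "..."
        lines.append(f"[{i}] {snippet}")
    return "\n".join(lines)


def apply_insertions(blocks: list[str], insertions: list[dict]) -> list[str]:
    """Insert each snippet after the block index it names.

    Instructions with an out-of-range index are ignored. Working from the
    highest index down keeps the lower indices valid as the list grows.
    """
    result = list(blocks)
    for ins in sorted(insertions, key=lambda d: d["after"], reverse=True):
        if 0 <= ins["after"] < len(blocks):
            result.insert(ins["after"] + 1, ins["html"])
    return result
```

With this shape, the model only has to output a few small instructions, and the article itself is never re-emitted.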
Accomplishments that we're proud of
- Using Grok and various evaluation/validation strategies to select the best possible images and content for the user
- The final article is a lot easier to digest, and it turned out really well
- A clever strategy for converting articles to a token-efficient representation and back, giving Grok a precise way to insert content efficiently
What we learned
- How to build a context-aware content embedding pipeline in a token-efficient way
- How to interact with the Grok API and various enterprise tooling offered by xAI
- How to use the Google Custom Search API for fast image retrieval
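As an illustration of the image retrieval step, here is a small sketch of building an image query for the Google Custom Search JSON API and reading the result links. The helper names are hypothetical, and the API key and search engine ID are placeholders you would supply yourself; the sketch only builds the URL and parses a response body, so it makes no network call.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(query: str, api_key: str, engine_id: str, count: int = 5) -> str:
    """Build a Custom Search request URL restricted to image results."""
    params = {
        "key": api_key,
        "cx": engine_id,
        "q": query,
        "searchType": "image",
        "num": max(1, min(count, 10)),  # the API returns at most 10 results per request
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_image_links(response_body: str) -> list[str]:
    """Return the image URLs from a JSON response, or an empty list if none."""
    data = json.loads(response_body)
    # "items" is absent when the search has no results, so default to [].
    return [item["link"] for item in data.get("items", []) if "link" in item]
```

In practice the URL would be fetched with something like `urllib.request.urlopen(url, timeout=10)` and the decoded body passed to `extract_image_links`. Returning an empty list rather than raising makes the "no image could be retrieved" case easy to handle upstream.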
What's next for Multimodal Grokipedia Enricher
- Optimizing code efficiency and quality for scalability (e.g., adding parallelization & batch requests)
- Backfilling all Grokipedia articles with images and widgets
- Adding test suites, such as unit tests, to ensure robustness and factuality
Built With
- bs4
- css
- google-custom-search
- grok
- grok-cli
- html
- python