Inspiration
In today’s rapidly evolving digital landscape, the internet holds vast amounts of information—yet it remains fragile. Websites disappear, change, or restrict access, leaving users with incomplete records and lost knowledge. Our vision was to empower everyday users to take control of their browsing, preserving the web's essence through decentralized, semantic organization.
We imagined a future where every click contributes to a resilient, user-owned knowledge base—where communities safeguard their data, free from censorship and manipulation. Samba Crawler was born from this vision, transforming browsing into a powerful act of preservation and empowerment.
What it does
Samba Crawler is a web browser extension agent that scrapes and processes the websites users visit. Here's how it works:
- Semantic schema generation: Using SambaNova AI, we create website-specific schemas that deconstruct pages into meaningful semantic units.
- Data collection: As users browse, the extension collects data and organizes it according to the schema.
- Offline save: The collected data is then saved both offline and online.
- AI chat for stored data: Users can chat with their saved data to retrieve particular information.
- Manual editing: Users can validate and refine data through our Schema Editor for even greater accuracy.
The result: decentralized, reliable backups, 10x faster data processing, and censorship-free access.
How we built it
The integration of SambaNova's lightning-fast inference speeds supercharges the AI, making schema generation and validation nearly instantaneous. Our stack includes:
- Front-end: React.js, TypeScript, Ant Design
- Back-end: Node.js, NestJS, Rust
- AI: SambaNova Cloud
Building Samba Crawler taught us that innovation often lies in the collaboration between cutting-edge AI and the everyday actions of users. We gained insights into:
- Semantic web technologies: Structuring data for maximum utility.
- User behavior: Designing seamless integrations that make complex tasks effortless for users.
- AI optimization: Leveraging models like SambaNova Cloud Llama 405B to handle massive amounts of data quickly and efficiently.
Most importantly, we learned the power of community in preserving knowledge and protecting access to information.
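The schema-driven collection step can be sketched roughly as below. The schema shape (field name mapped to a pattern), the function names, and the sample page are all illustrative assumptions rather than Samba Crawler's actual API; in the real extension the schema comes from the model and is applied to the live DOM, while here the DOM step is simulated with regexes so the sketch runs outside a browser.

```typescript
// A generated schema maps semantic field names to extraction patterns.
type SemanticSchema = Record<string, RegExp>;

interface ExtractedRecord {
  url: string;
  capturedAt: string;                      // ISO timestamp of the visit
  fields: Record<string, string | null>;   // null = field absent on this page
}

// Apply a website-specific schema to a page, producing one structured record.
function applySchema(url: string, html: string, schema: SemanticSchema): ExtractedRecord {
  const fields: Record<string, string | null> = {};
  for (const [name, pattern] of Object.entries(schema)) {
    const match = html.match(pattern);
    fields[name] = match ? match[1].trim() : null;
  }
  return { url, capturedAt: new Date().toISOString(), fields };
}

// Example: a hypothetical schema for a simple article page.
const articleSchema: SemanticSchema = {
  title: /<h1[^>]*>(.*?)<\/h1>/,
  author: /<span class="author">(.*?)<\/span>/,
};

const record = applySchema(
  "https://example.com/post/1",
  '<h1>Hello</h1><span class="author">Ada</span>',
  articleSchema,
); // record.fields.title === "Hello", record.fields.author === "Ada"
```

Records produced this way are uniform per site, which is what makes the later steps (offline storage and chat-based retrieval) possible.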
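The "chat with stored data" feature typically works by retrieving the saved records most relevant to a question and packing them into the model's prompt. The sketch below shows that pattern under stated assumptions: the record shape, the naive keyword-overlap scoring, and the prompt format are all illustrative, not the real Samba Crawler implementation.

```typescript
// A saved page, reduced to the fields this sketch needs.
interface SavedRecord {
  url: string;
  text: string;
}

// Naive relevance score: how many significant words of the question
// (length > 2) appear in the record's text.
function score(question: string, record: SavedRecord): number {
  const words = question.toLowerCase().split(/\W+/).filter(w => w.length > 2);
  const body = record.text.toLowerCase();
  return words.filter(w => body.includes(w)).length;
}

// Select the top-k matching records and build a context-stuffed prompt.
function buildPrompt(question: string, records: SavedRecord[], k = 2): string {
  const top = [...records]
    .sort((a, b) => score(question, b) - score(question, a))
    .slice(0, k)
    .filter(r => score(question, r) > 0); // drop records with no overlap at all
  const context = top.map(r => `Source: ${r.url}\n${r.text}`).join("\n---\n");
  return `Answer using only the saved pages below.\n\n${context}\n\nQuestion: ${question}`;
}

const saved: SavedRecord[] = [
  { url: "https://example.com/a", text: "The launch event is on March 3." },
  { url: "https://example.com/b", text: "Recipe for sourdough bread." },
];
const prompt = buildPrompt("When is the launch event?", saved);
// prompt contains the launch record but not the unrelated recipe
```

A production version would swap the keyword overlap for embedding similarity and send the prompt to an LLM endpoint, but the retrieve-then-ask structure is the same.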
Challenges we ran into
Innovation is never without hurdles.
AI Token Limits: Parsing whole websites challenged token constraints, but SambaNova’s high-speed inference helped mitigate delays.
UX Design: Simplifying advanced features into user-friendly workflows took several iterations and valuable feedback.
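A common workaround for the token-limit problem above is to split a large page into chunks that each fit the model's context window, process the chunks separately, and merge the results. The sketch below illustrates that idea under stated assumptions (the 4-characters-per-token budget is a rough heuristic, and this is not necessarily the exact approach used in Samba Crawler):

```typescript
// Split text into chunks that fit an approximate token budget, breaking only
// at paragraph boundaries so each chunk stays semantically coherent.
// Note: a single paragraph larger than the budget is kept as one oversized
// chunk rather than split mid-sentence.
function chunkByTokenBudget(text: string, maxTokens: number): string[] {
  const maxChars = maxTokens * 4; // rough heuristic: ~4 characters per token
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n\s*\n/)) {
    if (current.length + paragraph.length + 2 > maxChars && current.length > 0) {
      chunks.push(current); // current chunk is full; start a new one
      current = "";
    }
    current += (current ? "\n\n" : "") + paragraph;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

With fast inference, the extra per-chunk model calls this creates add little user-visible latency, which is why high-speed inference mitigates the delays the token limits would otherwise cause.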
Accomplishments that we're proud of
It worked
Reposted - https://x.com/SambaNovaAI/status/1841153603920167260
Comment - https://x.com/WisdomN69527/status/1856604406893916177
Quote Tweeted - https://x.com/WisdomN69527/status/1856604172449120440
What we learned
SambaNova Cloud AI's speed.
What's next for Samba Crawler
The journey doesn’t end here. We're working to:
- Enhance the user experience with real-time notifications and seamless manual editing.
- Introduce data validation features to ensure the highest accuracy.
- Add a community sharing feature.
- Expand into personalized AI agents capable of assisting users with the structured data they've collected.
With SambaNova's cutting-edge technology, we're excited to push the boundaries of what's possible, making browsing a transformative act for individuals and communities alike.