Inspiration

In today’s rapidly evolving digital landscape, the internet holds vast amounts of information—yet it remains fragile. Websites disappear, change, or restrict access, leaving users with incomplete records and lost knowledge. Our vision was to empower everyday users to take control of their browsing, preserving the web's essence through decentralized, semantic organization.

We imagined a future where every click contributes to a resilient, user-owned knowledge base—where communities safeguard their data, free from censorship and manipulation. Samba Crawler was born from this vision, transforming browsing into a powerful act of preservation and empowerment.

What it does

Samba Crawler is a browser extension that turns everyday browsing into preservation. As you visit websites, it deconstructs each page into meaningful semantic units, saves the structured data both offline and online, and lets you chat with your personal archive—giving you decentralized, censorship-free access to everything you've seen.
How we built it

Building Samba Crawler taught us that innovation often lies in collaboration between cutting-edge AI and the everyday actions of users. We gained insights into:

- Semantic web technologies: structuring data for maximum utility.
- User behavior: designing seamless integrations that make complex tasks effortless for users.
- AI optimization: leveraging models like SambaNova Cloud Llama 405B to handle massive data quickly and efficiently.

Most importantly, we learned the power of community in preserving knowledge and protecting access to information.

Samba Crawler is implemented as a browser extension agent that scrapes and processes the websites users visit. Here's how it works:

- Semantic schema generation: using SambaNova AI, we create website-specific schemas that deconstruct pages into meaningful semantic units.
- Data collection: as users browse, the extension collects page data and organizes it according to the schema.
- Offline save: the collected data is then saved both offline and online.
- AI chat for stored data: users can chat with their saved data to pull out particular information.

The result is decentralized, reliable backups, roughly 10x faster data processing, and censorship-free access.
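The schema-driven collection above can be sketched in TypeScript. `SiteSchema`, `SemanticUnit`, and `organizePage` are hypothetical names for illustration, not the extension's actual API:

```typescript
// A website-specific schema maps named semantic units to extraction hints
// (e.g. CSS selectors chosen by the AI when the schema is generated).
interface SiteSchema {
  site: string;
  fields: Record<string, string>; // unit name -> selector/extraction hint
}

interface SemanticUnit {
  name: string;
  value: string;
}

// Organize raw extracted text into semantic units according to the schema.
// `extracted` stands in for whatever the scraper pulled from the live page.
function organizePage(
  schema: SiteSchema,
  extracted: Record<string, string>
): SemanticUnit[] {
  return Object.keys(schema.fields)
    .filter((name) => extracted[name] !== undefined)
    .map((name) => ({ name, value: extracted[name] }));
}

const newsSchema: SiteSchema = {
  site: "example-news.com",
  fields: { headline: "h1", author: ".byline", body: "article p" },
};

const units = organizePage(newsSchema, {
  headline: "Web Archives Matter",
  author: "A. Reader",
});
// `units` now holds the page's content as named semantic units,
// ready to be saved offline and queried later.
```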

Users can also validate and refine the data by hand through our Schema Editor for even greater accuracy. The integration of SambaNova's lightning-fast inference speeds supercharges the AI, making schema generation and validation nearly instantaneous. Our stack includes:

- Front-end: React.js, TypeScript, Ant Design
- Back-end: Node.js, Nest.js, Rust
- AI: SambaNova Cloud
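The "AI chat for stored data" feature ultimately reduces to retrieving relevant saved units and handing them to the model as context. A minimal retrieval sketch in TypeScript—`SavedPage` and `findRelevant` are illustrative assumptions, and a real implementation would rank by embedding similarity rather than keyword overlap:

```typescript
interface SavedPage {
  url: string;
  units: { name: string; value: string }[];
}

// Naive relevance: count how many query words appear in a page's units.
function findRelevant(pages: SavedPage[], query: string, topK = 3): SavedPage[] {
  const words = query.toLowerCase().split(/\s+/);
  const score = (p: SavedPage) => {
    const text = p.units.map((u) => u.value).join(" ").toLowerCase();
    return words.filter((w) => text.includes(w)).length;
  };
  return pages
    .map((p) => ({ p, s: score(p) }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.p);
}

// The matched pages would then be injected into the chat prompt as context
// before calling the SambaNova Cloud model.
const hits = findRelevant(
  [
    { url: "a.com", units: [{ name: "body", value: "Rust and Node.js tips" }] },
    { url: "b.com", units: [{ name: "body", value: "Gardening basics" }] },
  ],
  "node.js tips"
);
```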

Challenges we ran into

Innovation is never without hurdles.

AI token limits: parsing whole websites strained the models' token constraints, but SambaNova's high-speed inference helped mitigate the resulting delays.
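One common mitigation for this kind of token-limit problem is splitting a page into budgeted chunks before sending them to the model. A rough sketch of that idea—the ~4-characters-per-token estimate and the `chunkForModel` helper are assumptions for illustration, not our exact implementation:

```typescript
// Rough token estimate: ~4 characters per token for English text.
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

// Split text into paragraph-aligned chunks that each fit a token budget,
// so every chunk can go to the model in a separate request.
// Limitation: a single paragraph larger than the budget is not subdivided.
function chunkForModel(text: string, maxTokens: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n\n+/)) {
    const candidate = current ? current + "\n\n" + para : para;
    if (estimateTokens(candidate) > maxTokens && current) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Two 300-character paragraphs (~75 tokens each) under a 100-token budget
// end up in separate chunks.
const chunks = chunkForModel("a".repeat(300) + "\n\n" + "b".repeat(300), 100);
```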

UX Design: Simplifying advanced features into user-friendly workflows took several iterations and valuable feedback.

Accomplishments that we're proud of

It worked

Reposted - https://x.com/SambaNovaAI/status/1841153603920167260

Comment - https://x.com/WisdomN69527/status/1856604406893916177

Quote Tweeted - https://x.com/WisdomN69527/status/1856604172449120440

What we learned

SambaNova Cloud AI's speed.

What's next for Samba Crawler

The journey doesn’t end here. We're working to:

- Enhance the user experience with real-time notifications and seamless manual editing.
- Introduce data validation features to ensure the highest accuracy.
- Add a community sharing feature.
- Expand into personalized AI agents capable of assisting users with the structured data they've collected.

With SambaNova's cutting-edge technology, we're excited to push the boundaries of what's possible, making browsing a transformative act for individuals and communities alike.

Built With

ant-design, nest.js, node.js, react, rust, sambanova, typescript
