Inspiration

The web is rich with information, but much of it remains locked in intricate layouts and complex visuals that are challenging to extract. Most conventional approaches to data extraction are limited to handling individual page elements, which often leaves valuable data untouched. We wanted to create a system that could intelligently look at entire websites—analyzing their visual and textual components together to derive comprehensive insights. This vision led us to build Webstral AI, a system designed to redefine data extraction from the ground up.

What it does

Webstral AI is a next-gen agentic system that leverages advanced visual and reasoning models to extract and structure data from entire websites holistically. Instead of just focusing on tables or individual page components, it treats the whole page as a visual entity, understanding the relationship between the layout, text, and images.

Powered by Pixtral 12b for visual analysis and mistral-large-latest for everything else, Webstral AI uses a combination of cutting-edge technologies to simultaneously interpret visual and textual data. It is guided by an algorithmic reasoning layer that develops an effective approach to extract the data, no matter how complex the webpage is. Users only need to issue a single prompt to initiate this entire process, making data extraction from intricate websites easy and accessible.

How we built it

The foundation of Webstral AI lies in its ability to analyze entire websites visually. For the visual component, we integrated Pixtral 12b, a large visual analysis model capable of interpreting the overall structure of a webpage—capturing its layout, images, graphs, and any visual relationships. This model processes the entire screenshot of a webpage, enabling it to understand not only individual components but also their spatial relationships and visual context.

The algorithmic guidance and reasoning are provided by mistral-large-latest, a powerful model that generates strategies for solving each extraction problem. It guides the visual analysis model to determine which parts of the page are relevant, and what data needs to be extracted.

Once the approach is defined, Pixtral 12b and mistral-large-latest work together to interpret and extract both visual and textual elements, ensuring that structured data is obtained without losing any critical information.

Challenges we ran into

One major challenge was dealing with the complexity of websites where multiple "images" or graphs are embedded within a single larger image. For instance, when analyzing an entire webpage visually, elements like mixed charts, infographics, and overlays can create ambiguity for the visual model. Training Pixtral 12b to handle such situations was a time-consuming process, and it still faces limitations when parsing such intricate visual arrangements.

Another challenge we faced was the size of certain websites. Websites with large or high-resolution visuals can exceed the processing capabilities of the current model, limiting the ability to analyze the entire page effectively. We are working on optimizing the model to handle larger input sizes without compromising on the accuracy of extraction.

Accomplishments that we're proud of

We are proud of creating a system that looks at an entire webpage as a visual and informational entity, rather than simply extracting isolated data points. This holistic approach allows for deeper insights and makes Webstral AI more adaptable to different webpage formats.

Another accomplishment is the effective integration of visual and textual analysis through the collaboration of Pixtral 12b and mistral-large-latest. By merging these models, we’ve achieved simultaneous data extraction that preserves the relationships between different elements, resulting in more accurate and comprehensive data retrieval.

What we learned

Developing Webstral AI taught us the value of combining visual analytics with intelligent reasoning. By looking at entire websites and analyzing their components collectively, we learned that it is possible to extract data with far greater accuracy and context than with traditional methods.

We also learned about the limitations of current models when dealing with mixed content, such as multiple embedded images and large-scale visuals. This insight is helping us refine our models to improve their ability to handle more complex visual scenarios.

What's next for Webstral AI: Intelligent Data Extraction Redefined

Looking forward, we aim to enhance Webstral AI by fine-tuning our models for even better algorithmic guidance and visual interpretation. For the algorithmic guidance provided by mistral-large-latest, we plan to fine-tune the model specifically on complex extraction scenarios to further improve its accuracy and adaptability.

The visual component, powered by Pixtral 12b, also requires improvements to handle scenarios with multiple embedded images and graphs within a larger visual context. Addressing these challenges will be key to ensuring the system can parse even the most intricate page layouts accurately.

Another area for improvement is expanding the system’s capability to process larger website sizes more effectively. Optimizing both the visual analysis and data extraction pipeline will be crucial to overcoming current size limitations and delivering high-quality extraction results regardless of the page scale.

With these advancements, Webstral AI will continue to push the boundaries of what is possible in data extraction—making it more intelligent, adaptable, and capable of transforming complex web content into structured, usable information.

Built With

Share this project:

Updates