Inspiration

The internet is full of valuable information, but extracting and processing that data is often a challenge, and parsing and scraping it efficiently requires specialized tools. We wanted to develop a system that not only makes it easier to scrape data from websites but also uses the cloud to run these tasks at scale. A web interface lets users interact with our cloud scrapers directly, making the process more accessible and manageable.

What it does

Scavenger is a self-hostable web interface designed to launch cloud scrapers. Users can interact with the scrapers through a WebSocket connection, allowing for real-time feedback. The platform simplifies the process of gathering useful web data, providing a convenient and efficient way to collect and process information from multiple sources on the web.

How we built it

The web interface is powered by Go, which was chosen for its performance, simplicity, and scalability. We used Go's HTML templating system to render dynamic content and manage user interactions. The backend handles scraper management, and WebSockets carry communication between the UI and cloud services in real time. The data is stored in MongoDB, allowing for scalable and flexible storage. Our cloud scrapers are designed to be modular, making it easy to scale the system as needed.

Challenges we ran into

One of the primary challenges we faced was handling cloud deployments. Ensuring our app would run smoothly in a cloud environment proved to be difficult. Additionally, since some team members were working with new technologies, like Go and MongoDB, there was a learning curve that we had to overcome. Debugging and optimizing the performance of cloud-based scrapers added another layer of complexity to the project.

Accomplishments that we're proud of

For many of our group members, this was their first experience with Go. Despite the challenges, we were able to navigate the learning curve and implement core features successfully. We were impressed by Go's flexibility and performance, which made it an excellent choice for handling our web scraping needs. We also managed to integrate MongoDB as our database, providing efficient and scalable storage for the data gathered from web scraping tasks.

What we learned

Throughout the development process, we gained valuable experience working with Go and MongoDB. We learned how Go's simplicity and efficiency let us handle backend logic with minimal overhead. MongoDB's flexibility helped us organize and store large amounts of data without needing a rigid schema. The project also gave us insight into cloud deployment strategies, as we worked with services that required scaling and managing resources dynamically. Each team member came away with a deeper understanding of full-stack development and the integration of cloud services.

What's next for Scavenger

Looking ahead, we plan to integrate Scavenger with additional cloud providers to increase flexibility and scalability. We aim to provide users with a seamless experience regardless of the cloud service they prefer to use. Additionally, we want to improve the user interface, making it even more intuitive and powerful for both novice and experienced users.
