Inspiration

The inspiration for building Housekeep arose from the growing need for a streamlined file management solution. Whilst brainstorming ideas for the hackathon and a future startup, we began sharing resources we had compiled over the years, such as books in PDF format, decks and legal contract templates. However, we repeatedly ran into the frustrating problem of searching for a file that had been labelled incorrectly. And so the idea for Housekeep was born: we envisioned a solution that leveraged AI and similarity search to simplify the process.

What it does

Housekeep is an intuitive tool that scans files and embeds the sentences within them, so that users can locate content using any keywords or sentences they recall from the documents. Our goal is to enable users to focus on more important tasks by eliminating endless scrolling and frustrating searches.

How we built it

Google Drive API Integration: We started by working closely with the Google Drive API, enabling seamless integration with the popular cloud storage platform. This allowed us to access and manipulate files stored on Google Drive.

Text Embedding with OpenAI's Text-Embedding-ADA Model: Recognising the importance of understanding the content within files, we utilised the state-of-the-art text-embedding-ada model from OpenAI. This model enabled us to extract meaningful representations of text, which served as the foundation for our similarity search capabilities.

Storing Vectors in Pinecone: To efficiently handle the indexing and retrieval of file vectors, we leveraged Pinecone. By storing the file vectors in Pinecone, we could perform fast and accurate similarity queries.
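As a rough illustration of the storage step, the sketch below builds one record per sentence so that a search hit can be traced back to its source file. The id/values/metadata layout mirrors the shape commonly used for vector database records, but the id scheme, the metadata fields, the helper names and the fake_embed stand-in are all assumptions made for this example, not Housekeep's actual code.

```python
from typing import Callable, Dict, List

# Any function that turns text into a vector; hypothetical type alias.
Embedder = Callable[[str], List[float]]


def build_records(file_id: str, file_name: str,
                  sentences: List[str], embed: Embedder) -> List[Dict]:
    """Build one record per sentence, keyed so a hit maps back to its file."""
    records = []
    for position, sentence in enumerate(sentences):
        records.append({
            "id": f"{file_id}#{position}",  # assumed id scheme: file id + sentence index
            "values": embed(sentence),      # the embedding vector
            "metadata": {                   # assumed fields, used to display results
                "file_id": file_id,
                "file_name": file_name,
                "text": sentence,
            },
        })
    return records


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call; NOT semantically meaningful."""
    return [float(len(text)), float(text.count(" ")),
            float(sum(map(ord, text)) % 97)]


records = build_records(
    "drive-file-123",
    "nda_template.pdf",
    ["Term is two years.", "Either party may terminate."],
    fake_embed,
)
```

In a real pipeline the embed argument would call the embedding model, and the resulting records would be sent to the vector store in batches.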

Embedding User Search Queries and Running Similarity Queries: To facilitate user searches, we embedded their search queries using the same text-embedding-ada model. We then executed similarity queries on the file vectors stored in Pinecone, allowing us to identify the most relevant files based on the user's query.
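To make the similarity query concrete, here is a minimal in-memory sketch of what a vector search does conceptually: score every stored vector against the query vector by cosine similarity and return the best matches. It is a toy stand-in for a managed vector database, not how Pinecone is implemented, and the three-dimensional vectors and ids are invented for illustration (real embeddings have far more dimensions).

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for sentence embeddings (hypothetical data).
toy_index = {
    "contract_template.pdf#s4": [0.9, 0.1, 0.0],
    "pitch_deck.pdf#s12": [0.1, 0.9, 0.1],
    "ml_book.pdf#s230": [0.0, 0.2, 0.9],
}
results = top_k([0.8, 0.2, 0.1], toy_index, k=2)
```

Because the query and the stored sentences are embedded with the same model, their vectors live in the same space, which is what makes the similarity scores comparable. This brute-force scan is linear in the number of vectors, whereas a dedicated vector index avoids scanning everything.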

Django Web Server Integration: To provide a user-friendly interface and seamless interaction with Housekeep, we implemented the core logic within Django web server endpoints. This ensured smooth integration between the user interface and the backend functionality.

Google Drive Plugin: To enhance the user experience and simplify file interactions, we developed a quick and convenient Google Drive plugin. This plugin streamlined the process of accessing and managing files within Housekeep.

Challenges we ran into

Extracting Information from PDF and Word: Extracting clean and meaningful data from PDF and Word files proved to be a significant challenge. The inherent complexities and variations in file structures often resulted in messy data extraction. We dedicated substantial time and effort to refine our extraction algorithms and ensure accurate information retrieval.
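The kind of clean-up involved can be sketched with a few regular expressions. The example below assumes the raw text has already been pulled out of the file by some extraction library and only shows post-processing: normalising line endings, re-joining words hyphenated across line breaks, collapsing whitespace, and splitting into sentences. The function names and rules are illustrative assumptions, not the extraction algorithms we actually shipped.

```python
import re
from typing import List


def clean_extracted_text(raw: str) -> str:
    """Normalise text pulled out of a PDF or Word file."""
    # Unify Windows and old Mac line endings.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Re-join words hyphenated across a line break: "agree-\nment" -> "agreement".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat remaining line breaks as spaces and collapse whitespace runs.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part for part in parts if part]


raw_page = ("This agree-\nment is made   between\nthe parties. "
            "It starts\r\non signing.")
sentences = split_sentences(clean_extracted_text(raw_page))
```

These rules are deliberately simple and have known gaps: the de-hyphenation step also merges genuine compounds that happen to break at a line end, and the splitter will cut after abbreviations such as "e.g.".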

Authentication Logic for Google Drive Plugin: Setting up robust authentication logic for the Google Drive plugin presented its own set of challenges. We opted to use JSON Web Tokens (JWTs) to authenticate user access to Google Drive, ensuring secure and reliable interactions. Implementing this authentication flow required careful consideration and thorough testing to ensure seamless integration.
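For readers unfamiliar with how such a token is checked, the sketch below signs and verifies a JWT using only the Python standard library. The HS256 algorithm, the shared secret, the claim names and the function names are assumptions chosen for illustration; the write-up does not say which algorithm or library Housekeep used, and a production system would normally rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(payload: dict, secret: bytes) -> str:
    """Create a compact HS256 token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_jwt(token: str, secret: bytes,
               now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = header_b64 + "." + payload_b64
        expected = hmac.new(secret, signing_input.encode("ascii"),
                            hashlib.sha256).digest()
        # Constant-time comparison avoids leaking where the bytes differ.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None  # malformed token
    if not isinstance(header, dict) or not isinstance(payload, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # accept only the algorithm we sign with
    exp = payload.get("exp")
    current = time.time() if now is None else now
    if isinstance(exp, (int, float)) and current >= exp:
        return None  # token has expired
    return payload


token = sign_jwt({"sub": "user-1", "exp": 2000}, b"demo-secret")
claims = verify_jwt(token, b"demo-secret", now=1000)
```

The server would run a check like verify_jwt on each request from the plugin and reject the request whenever it returns None.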

Google Drive Development: Working with the Google Drive development environment was difficult in a different way. Unlike traditional HTML/CSS web development, we had to navigate the intricacies of the Google JavaScript package. This required a deep understanding of the Google Drive API and its nuances, as well as adapting our development practices to align with the platform's requirements.

Accomplishments that we're proud of

  • Successful Integration of Text Extraction

  • Seamless Google Drive Integration

  • Advanced Search Capabilities

  • User-Friendly Interface

What we learned

Technical Challenges: Developing Housekeep exposed us to various technical challenges, such as extracting clean and meaningful data from complex file formats like PDF and Word. We learnt the importance of adaptability, perseverance, and continuous iteration in overcoming these challenges and refining our algorithms and processes.

Embracing New Technologies: We learnt the value of adopting new technologies, notably Pinecone.

API Integration: Integrating with the Google Drive API taught us valuable lessons about working with external services. We gained insights into the intricacies of API authentication, data synchronisation, and handling the limitations and requirements of third-party platforms.

Collaboration and Teamwork: Building Housekeep required strong collaboration and teamwork. We learnt the value of effective communication, task delegation, and leveraging each other's strengths. Through open dialogue and collective problem-solving, we were able to overcome challenges more efficiently and achieve our common goals.

What's next for Housekeep

Our future roadmap is filled with exciting developments and advancements. We are dedicated to expanding platform support, incorporating advanced AI capabilities, and introducing collaborative features to enhance user productivity. We aim to prioritise continuous improvements based on user feedback and are committed to upholding data security and privacy standards. Our vision is to evolve Housekeep into the ultimate search tool, empowering individuals and organisations to effortlessly organise, discover, and collaborate on not only their digital files but their data and content as a whole.
