Inspiration
The inspiration for PulsarWave came from the sheer volume of data that is generated daily, particularly from digital news sources. We recognized the potential within this wealth of information, but also understood the challenge of extracting meaningful and actionable insights from it. The idea was to create a tool that goes beyond the superficial tracking of what's "trending," to provide a more in-depth understanding of significant events and developments in various fields. We aimed to build a system that could utilize the power of machine learning to sift through and prioritize the relevance of emerging news, thereby helping users make informed decisions.
What it does
PulsarWave is an Automated Trend Monitoring Radar (TMR) that uses advanced Machine Learning algorithms to evaluate and draw insights from a sea of data from news feeds. It leverages Natural Language Processing (NLP) to understand text-based data and to identify and follow emerging trends in digital media. The system prioritizes the importance and relevance of emerging news, focusing on the quality of information rather than its popularity or frequency. PulsarWave can be applied in various fields, such as portfolio management, daily news analysis, public health monitoring, policy making, and combating disinformation, to name just a few.
How we built it
We built PulsarWave on the foundation of Zalando's public Tech Radar, modifying it to incorporate machine learning and natural language processing capabilities. The system uses an array of ML models to parse through data, analyze it, and identify critical trends. The scanning process runs continuously, every six hours, while the summarization process runs on a weekly basis. We've tested our platform on cloud services like Amazon Web Services (AWS) and Google Cloud Platform (GCP) for scalability and versatility. We also used the 2022 Wikipedia dataset to enrich the model's knowledge base beyond its original cutoff date, adopting LangChain's retrieval-augmentation approach to achieve this.
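The cadence described above (scans every six hours, summaries every week) can be sketched as a simple interval scheduler. This is an illustrative sketch, not our production code; the job names and timestamps are made up.

```python
from datetime import datetime, timedelta

# Illustrative job table: scan every 6 hours, summarize weekly.
JOBS = {
    "scan_feeds": timedelta(hours=6),
    "summarize_trends": timedelta(weeks=1),
}

def due_jobs(last_run: dict, now: datetime) -> list:
    """Return the names of jobs whose interval has elapsed since their last run."""
    return [name for name, interval in JOBS.items()
            if now - last_run[name] >= interval]

last_run = {
    "scan_feeds": datetime(2023, 7, 1, 0, 0),
    "summarize_trends": datetime(2023, 7, 1, 0, 0),
}

# Seven hours after the last run, only the scan is due; a week later, both are.
print(due_jobs(last_run, datetime(2023, 7, 1, 7, 0)))  # → ['scan_feeds']
```

In practice a cloud scheduler (e.g. a cron-style trigger on AWS or GCP) plays this role; the sketch only shows the two cadences side by side.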
Challenges we ran into
One of the foremost challenges we grappled with while building PulsarWave was achieving optimal performance, which included minimizing AI hallucination, enhancing factual accuracy, and boosting creativity. To address this, we adopted a multi-faceted approach:
Prompt Engineering: We spent a considerable amount of effort crafting precise instructions to guide the AI's output. This was especially useful for our business users, who employed PulsarWave for tasks such as news summarization. Precisely engineered prompts guided the AI to generate comprehensive and relevant summaries of news articles.
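A prompt for news summarization along these lines can be sketched as a small template function. The wording and function name here are illustrative examples, not the exact prompts we ship.

```python
def summarization_prompt(article: str, focus: str = "emerging trends") -> str:
    """Build a precise summarization prompt (illustrative wording)."""
    return (
        "You are a news analyst. Summarize the article below in 3 bullet points.\n"
        f"Prioritize {focus}; cite only facts stated in the article; "
        "if a claim is uncertain, say so explicitly.\n\n"
        f"Article:\n{article}"
    )

prompt = summarization_prompt("Central banks signal further rate hikes...")
```

The key ideas are constraining the output format, scoping claims to the source text, and making uncertainty explicit, all of which reduce vague or fabricated summaries.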
Temperature Setting: Balancing encouraging creativity against ensuring factual accuracy was a major challenge, and adjusting the temperature setting played a critical role in striking this balance. This technique is not especially precise, but it comes at a low cost. We applied it for general users who use the AI for casual interactions, such as a Q&A chatbot answering general knowledge questions.
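Mechanically, temperature divides the model's next-token logits before the softmax: low temperature sharpens the distribution toward the top token (more deterministic, fewer creative leaps), high temperature flattens it (more varied output). The toy logits below are made up purely to show the effect.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: low T sharpens, high T flattens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                        # toy next-token scores
cold = softmax_with_temperature(logits, 0.2)    # near-deterministic sampling
hot = softmax_with_temperature(logits, 1.5)     # more exploratory sampling
# The top token's probability is much higher at low temperature.
```

This is why a low temperature suits fact-oriented Q&A while a higher one suits open-ended generation.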
Verification with Multiple AIs: To ensure accuracy and depth, we adopted a strategy of cross-verifying results using various AI engines. This method proved crucial for our users involved in rigorous academic research, such as analyzing legal documents, contracts, and case law.
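Cross-verification can be reduced to a simple agreement check over the engines' answers. This sketch uses a majority vote over normalized strings; the engine names and answers are hypothetical, and real answer comparison would need semantic matching rather than exact string equality.

```python
from collections import Counter

def cross_verify(answers: dict) -> tuple:
    """Majority vote across engines; flags non-unanimous results for review."""
    normalized = [a.strip().lower() for a in answers.values()]
    top, count = Counter(normalized).most_common(1)[0]
    unanimous = count == len(normalized)
    return top, unanimous

answers = {
    "engine_a": "Article 5 applies.",
    "engine_b": "article 5 applies.",
    "engine_c": "Article 7 applies.",
}
consensus, unanimous = cross_verify(answers)
# consensus is the majority answer; unanimous is False here, so a human reviews it.
```

For high-stakes legal or academic use, any non-unanimous result is escalated rather than silently accepting the majority.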
Verification with Credible Online Sources: Our process includes automated cross-checking of information against reliable, up-to-date online sources via a Pinecone vector database, into which we incorporated the 2022 Wikipedia dataset. This strategy proved particularly useful for our business and academic users.
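The retrieval step behind this cross-checking can be sketched with an in-memory index standing in for Pinecone: embed the query, rank stored passages by cosine similarity, and prepend the top hits to the prompt. The passages and three-dimensional "embeddings" below are toy values for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy index standing in for the Pinecone + 2022 Wikipedia setup.
index = {
    "Inflation peaked in 2022 across major economies.": [0.9, 0.1, 0.0],
    "The 2022 World Cup was held in Qatar.":            [0.1, 0.9, 0.1],
}

def retrieve(query_vec, k=1):
    """Return the k passages most similar to the query embedding."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

context = retrieve([1.0, 0.0, 0.0])  # embedding of an economics question
# The retrieved passages are prepended to the prompt so answers stay grounded.
```

A real vector database does the same ranking at scale over millions of embedded passages, with the embeddings produced by a trained model rather than hand-written.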
Fine-Tuning: We took advantage of the power of fine-tuning, using specific data in certain areas to enhance the AI's training and improve its accuracy in those domains. This was especially critical for our highly sensitive policy users, such as those involved in geopolitical risk assessment. The AI was fine-tuned with relevant data, making it adept at analyzing and forecasting risks in international relations. This process will be our next step as we plan to implement LLMs from scratch with the LANTA supercomputer in Thailand.
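Preparing domain data for fine-tuning mostly means serializing question-answer pairs into a training file. The sketch below targets a chat-style JSONL layout modeled on OpenAI's fine-tuning format; the example Q&A pair is invented, and the exact field layout should be checked against the provider's current documentation.

```python
import json

# Invented example for domain fine-tuning (geopolitical risk).
examples = [
    {"question": "How would new export controls affect semiconductor supply chains?",
     "answer": "Likely near-term shortages for affected fabs; monitor licensing exemptions."},
]

def to_jsonl(examples):
    """Serialize Q&A pairs into chat-style JSONL training records."""
    lines = []
    for ex in examples:
        record = {"messages": [
            {"role": "system", "content": "You are a geopolitical risk analyst."},
            {"role": "user", "content": ex["question"]},
            {"role": "assistant", "content": ex["answer"]},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = to_jsonl(examples)
```

Each line becomes one training example; the system message pins the analyst persona so the fine-tuned model stays in-domain.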
In terms of cost and efficiency, each method has its trade-offs. For instance, while the use of temperature setting might be more cost-effective for general users, for highly sensitive policy users, a combination of multiple strategies was necessary despite the higher cost. These efforts greatly helped us in overcoming the challenges of handling a wide array of user requirements while ensuring optimal performance.
Accomplishments that we're proud of
Navigating the complexities and nuances of implementing and managing a vector database was a steep learning curve. It involved understanding advanced techniques and technologies that were initially challenging, and, more importantly, learning how to appropriately vectorize the dataset. The system's dynamic nature required a deep understanding of how to structure and manage data effectively, and how to leverage this data to enable the AI to perform at its peak potential.
However, overcoming this learning curve has been one of our proudest achievements. Once we were able to anticipate and adapt to the intricacies of the vector database, we discovered the power it holds in extending the capabilities of our AI model. By continuously updating the database with incoming daily information, we were able to keep the AI's knowledge up-to-date and relevant, vastly improving the value and accuracy of its insights.
The integration of the vector database with our AI model unlocked a new level of potential in the field of trend monitoring and data analysis. Through this accomplishment, we were able to transform PulsarWave into a powerful tool capable of sifting through massive data streams, identifying significant trends, and providing valuable, timely insights.
What we learned
We learned about the intricacies of handling and analyzing vast amounts of data, and the importance of keeping models updated with the most recent information. We also learned about the value of open-source licenses in promoting collaboration and the broad applicability of our tool in various fields.
What's next for PulsarWave
Our vision for the future of PulsarWave is anchored in our commitment to continuous learning and improvement. We have seen the benefits of fine-tuning, and we intend to explore this potential further. We have found that using specific data in certain areas to refine the AI's training significantly improves its accuracy in those domains. This has been particularly important for our highly sensitive policy users, such as those analyzing and forecasting risks in international relations.
Our next step in this direction is to implement large language models (LLMs) from scratch. We're excited about the potential benefits that creating our own tailored LLMs can offer. This will allow us to customize our models to better suit our users' diverse needs. Moreover, we plan to leverage the processing power of LANTA, a supercomputer based in Thailand, to carry out this task. Harnessing such advanced computational power will help us push the boundaries of what our AI can achieve.
At the same time, we will also be focusing on enhancing the overall functionality of PulsarWave and improving its user interface and user experience. We aim to expand into multimodal AI with media such as video, make our current data pipeline more efficient, and further test our existing techniques to find room for improvement. In addition, we plan to develop more capable and reliable methods for keeping PulsarWave's insights credible and up to date.
Built With
- amazon-web-services
- javascript
- langchain
- openai
- pinecone
- python
- shell