Inspiration
We come from a background of doing log analysis for our cybersecurity club, and once we saw the news articles scraped by Primer.AI, we knew what we wanted to do. Drawing on our familiarity with sentiment analysis, we aimed to apply this technology to national security interests, focusing on tracking news trends across different countries.
What it does
Senti.net scrapes news articles from around the world, parses them into keywords and phrases, and tracks those words by the origin country of each article.
How we built it
There are three main parts to senti.net. First, a scraper collects news data from many source countries and in many different languages, using the GDELT API along with a translation service. The scraper is written in Python and outputs CSV files containing the articles and their metadata. Second, the backend is built in R, with the front end generated and deployed directly through ShinyApps. It runs a sentiment analysis based on the Bing sentiment lexicon, through a data pipeline that records the sentiment of each word in the dataset along with its news source and origin country. Finally, we display the results in a web application created with Shiny.
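The word-level sentiment step can be illustrated with a minimal sketch. Our actual pipeline is in R; the Python below is only an illustration of the idea, and the tiny `LEXICON`, the record fields (`country`, `source`, `text`), and both function names are hypothetical stand-ins rather than our real code or the real Bing word list.

```python
import re
from collections import defaultdict

# Tiny stand-in for a Bing-style lexicon (hypothetical entries, not the real list).
LEXICON = {
    "peace": "positive",
    "growth": "positive",
    "support": "positive",
    "crisis": "negative",
    "attack": "negative",
    "threat": "negative",
}


def word_sentiments(articles):
    """Yield (country, source, word, label) for every lexicon word in each article."""
    for article in articles:
        for word in re.findall(r"[a-z']+", article["text"].lower()):
            label = LEXICON.get(word)
            if label is not None:
                yield article["country"], article["source"], word, label


def net_by_country(rows):
    """Sum +1 per positive word and -1 per negative word, grouped by country."""
    totals = defaultdict(int)
    for country, _source, _word, label in rows:
        totals[country] += 1 if label == "positive" else -1
    return dict(totals)
```

Keeping the per-word rows separate from the aggregation means the same rows can also be grouped by news source or by keyword without rescoring the text.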
Challenges we ran into
During the web scraping process, it was challenging to find a generalized method to scrape the complete article and title. However, by using the GDELT API, we were able to collect the links and several pieces of metadata for each article. Then, we could easily scrape the articles from their respective news sites.
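As a rough sketch of the link-collection step, the snippet below builds an article-list query and flattens the response into CSV. It assumes the GDELT DOC 2.0 endpoint, the `sourcecountry:` query operator, and the listed response fields; these details are assumptions to check against the GDELT documentation, and the function names are hypothetical rather than taken from our scraper.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the GDELT DOC 2.0 API; verify before relying on it.
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

# Metadata fields we assume each returned article record carries.
FIELDS = ["url", "title", "seendate", "domain", "language", "sourcecountry"]


def build_query_url(keyword, source_country, max_records=50):
    """Build an article-list query restricted to one source country."""
    params = {
        "query": f"{keyword} sourcecountry:{source_country}",
        "mode": "artlist",
        "format": "json",
        "maxrecords": str(max_records),
    }
    return GDELT_DOC_API + "?" + urllib.parse.urlencode(params)


def fetch_article_list(url, timeout=30):
    """Download and decode the JSON response (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def articles_to_csv(payload):
    """Flatten the 'articles' list of a response into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for article in payload.get("articles", []):
        writer.writerow({field: article.get(field, "") for field in FIELDS})
    return buffer.getvalue()
```

Each `url` in the resulting CSV can then be fetched separately to scrape the full article text from its news site.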
During the computational stage, where we turn the data into a visual format, there were more issues with cleaning the dataset and creating a standardized format that all of our processes could consume. We learned some tools and libraries on the fly to make this project happen. The languages and tools we chose also imposed some limitations, which is something to consider for future iterations.
Accomplishments that we're proud of
We take pride in successfully creating a comprehensive system capable of aggregating, analyzing, and visualizing sentiment data from diverse global news sources. Our platform's integration of sentiment analysis into the realm of national security underscores the practical application of advanced technologies in safeguarding geopolitical interests.
What we learned
The development of senti.net provided us with valuable insights into handling large-scale data sources, implementing sentiment analysis algorithms, and designing intuitive user interfaces. We also deepened our understanding of language processing nuances, particularly in multilingual contexts.
What's next for senti.net
Moving forward, we aim to make our sentiment analysis tool more sophisticated by adding trend analysis, such as finding spikes in certain keyword trends. We are working on caching sentiment queries to allow for a smoother user experience, as well as support for additional countries and languages. We also plan to expand the platform's capabilities by integrating other user-generated content, such as social media posts, into our data.
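One possible way to find spikes in a keyword trend is a trailing-window z-score, sketched below. This is not implemented in senti.net yet; the function name, the 7-day window, the threshold of 3, and the floor of 1.0 on the standard deviation are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is far above the trailing-window mean.

    A day counts as a spike when it exceeds the mean of the previous
    `window` days by more than `threshold` standard deviations.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1.0 (an assumed choice) so a perfectly
        # flat history does not cause division by zero or flag tiny changes.
        sigma = max(pstdev(history), 1.0)
        if (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes
```

Comparing each day only against the days before it means a spike does not get hidden by its own value, although it does inflate the baseline for the following week.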
Also, if this project is successful, we hope to acquire the "senti.net" domain. Someone already owns it, and we aren't financially able to purchase it.
Note: our demo is slow at the moment because we are using a free hosting service with a large amount of data. It may take some time to finish loading.