https://docs.google.com/presentation/d/1Lx6D1q3_yY2eYLKGxmk-eIaRjx8g09WCdEHxoV6Vg_w/edit#slide=id.p
Inspiration
In the presentation from last night, reliable, unbiased primary news sources proved crucial for understanding real-time events, both for journalists and for ordinary people. Twitter is advertised as "a place to create and share ideas and information" with potentially millions of users, and people at ground zero frequently use it to share live updates about important events happening near them. With PrimaryLarry, we aspire to organize the thousands of tweets posted every second and make them useful by analyzing live streams of tweets from significant locations. Our proprietary algorithm weighs factors such as event proximity, tweet readability and relevance, and content quality to surface useful insights from first-person perspectives.
What it does
- Looks online for new significant events and queries Twitter for relevant information.
- Streams tweets from the location of significance, looking at various quality factors.
- Displays easily understandable, useful tweets for journalists and the public to use.
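
The steps above can be illustrated with a minimal sketch of how several quality factors might be combined into one ranking score. This is not the project's actual algorithm: the `Tweet` fields, the linear proximity falloff, the keyword-overlap relevance measure, the 50 km radius, and the weights are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass
class Tweet:
    text: str
    lat: float
    lon: float
    readability: float  # assumed to be pre-computed and scaled to [0, 1]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def proximity_score(tweet, event_lat, event_lon, radius_km=50.0):
    """1.0 at the event location, falling linearly to 0.0 at radius_km."""
    d = haversine_km(tweet.lat, tweet.lon, event_lat, event_lon)
    return max(0.0, 1.0 - d / radius_km)


def relevance_score(tweet, keywords):
    """Fraction of the event keywords that appear as words in the tweet."""
    if not keywords:
        return 0.0
    words = set(tweet.text.lower().split())
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits / len(keywords)


def rank_tweets(tweets, event_lat, event_lon, keywords,
                weights=(0.4, 0.3, 0.3)):
    """Sort tweets by a weighted sum of the three factors (hypothetical weights)."""
    w_prox, w_read, w_rel = weights

    def score(t):
        return (w_prox * proximity_score(t, event_lat, event_lon)
                + w_read * t.readability
                + w_rel * relevance_score(t, keywords))

    return sorted(tweets, key=score, reverse=True)
```

Calling `rank_tweets(tweets, event_lat, event_lon, ["fire"])` would return the tweets ordered from most to least useful under these assumed weights. A weighted sum is only one option; it keeps each factor easy to tune independently.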
How we built it
- We built a worker to look for critical events that may benefit from first-person perspectives.
- We built a worker to open streams of potentially important tweets and save them to a database.
- We built a worker to analyze the tweets and provide a JSON endpoint for consumption.
- We built a web front end to display the tweets and associated articles.
A sample page is shown below:

![logo](http://image.prntscr.com/image/e3caa6ad0d8f45abac50b6f9d432a5ad.png)
Challenges we ran into
- The readability of a tweet is hard to determine. Most modern readability measures are designed for long English texts, whereas a tweet is a 140-character message that often has missing punctuation, misspellings, leetspeak, hashtags, and URLs. This proved troublesome because readability measures rely on metrics such as the number of characters, words, sentences, and syllables, or even the number of polysyllabic words. We attempted to mitigate this issue by analyzing a parsed version of the tweet with hashtags and URLs removed.
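
As a rough illustration of the mitigation described above, the sketch below strips hashtags and URLs from a tweet and then applies the Flesch Reading Ease formula. The choice of Flesch Reading Ease, the regular expressions, and the vowel-group syllable heuristic are assumptions for this example; the project may have used a different measure.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_tweet(text):
    """Remove URLs and hashtags, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease of text, or None if it contains no words."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return None
    # Tweets often lack terminal punctuation, so assume at least one sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / n_sentences)
            - 84.6 * (syllables / len(words)))


def tweet_readability(raw_tweet):
    """Score the cleaned tweet; higher values mean easier to read."""
    return flesch_reading_ease(clean_tweet(raw_tweet))
```

Because the syllable counter is only a heuristic and tweets are so short, scores from a sketch like this are noisy and are probably best used for coarse comparisons between tweets, not as absolute grades.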
Accomplishments that we're proud of
- Thinking of useful algorithms for judging the utility of tweets.
- Mixing diverse computing solutions to produce a functioning product.
- Using MongoDB to process and understand geospatial relationships.
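
To illustrate the kind of geospatial relationship MongoDB can evaluate, here is a minimal sketch that builds a GeoJSON point for a tweet and a `$near` filter for tweets within a given distance of an event. The field name `location` and both helper functions are hypothetical, and the sketch only constructs the documents without connecting to a database. A `$near` query with `$geometry` expects a `2dsphere` index on the queried field.

```python
def geojson_point(lat, lon):
    """Build a GeoJSON Point; GeoJSON orders coordinates as [lon, lat]."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    return {"type": "Point", "coordinates": [lon, lat]}


def near_event_filter(event_lat, event_lon, max_km, field="location"):
    """Build a MongoDB filter matching documents within max_km of the event.

    `field` is a hypothetical name for where each tweet's GeoJSON point
    is stored. $maxDistance is given in metres for GeoJSON points.
    """
    return {
        field: {
            "$near": {
                "$geometry": geojson_point(event_lat, event_lon),
                "$maxDistance": max_km * 1000,
            }
        }
    }


# Example: a filter for tweets within 5 km of a point in Philadelphia.
query = near_event_filter(39.9526, -75.1652, max_km=5)
```

With a driver such as PyMongo, a filter like `query` could be passed to a collection's `find` method; results from `$near` come back sorted from nearest to farthest.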
What we learned
- Twitter Streaming API
- MongoDB
- JavaScript
What's next for PrimaryLarry
- Implementing/improving the readability functionality
- A way to view previous headlines and dates
- Determining a better method of finding adequate headlines using graph connectivity
- Hyperlinking URLs