Group number: 16
Group name: Trying but Crying
Project: Manage Me
Members: Elina, Emily, Hao, Oliver
Hackathon/event: CalgaryHacks 2022
- Most Innovative Idea
- Best use of a cloud computing service (we used Oracle Cloud)
- Best use of machine learning
- Best UX/UI design
Teenagers often dispute how relevant a healthy lifestyle is to them, even though its significance is hard to deny. On average, teenage students spend 3 hours and 17 minutes on social media each day, with no clear picture of where that time actually goes. Here at Manage Me, we believe that recognizing a problem is the first step toward fixing it. Manage Me makes users aware of how their social-media time is distributed across different topics, so they can see how poor (or outstanding) their time management really is and, hopefully, change for the better. We decided to automate usage analysis for our two biggest time sinks: Discord and YouTube!
What it does
The Discord watcher tracks your activity on Discord, presenting you with blunt and specific facts. The YouTube manager parses the YouTube history that Google records and uses a hybrid technique we created precisely for this task to distinguish educational from recreational use of the platform. Read more in our Technical Specifications.
How we built it
For a sleek data-analysis dashboard, we build on ActivityWatch, a handy open-source Python toolkit that simplifies the development of time-tracking software. Given the hackathon's time constraints, we imported Discord usage data by parsing the browser history that the browser and other ActivityWatch watchers had been recording all along. Manage Me not only loads this historical data from the user's browser history, but also runs as a userscript or a BetterDiscord plugin to collect and analyze data on the fly! With this live mode, updates to your Discord consumption go straight to your ActivityWatch dashboard. Furthermore, a custom visualization query lets us re-evaluate how our time spent is affecting us mentally, socially, and academically. Now we know whether the massive portion of our day goes to chatting with internet strangers about cat pics or to group work (like this hackathon! :D).
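To make that import step concrete, here is a minimal sketch of turning browser-history rows into ActivityWatch-shaped events. The row format (url, ISO timestamp, seconds) and the Discord URL filter are assumptions for the example; a real run would hand these events to the aw-client library rather than build plain dicts.

```python
from datetime import datetime, timezone

def history_to_events(rows):
    """Turn Discord visits from a browser-history export into
    ActivityWatch-style event dicts (timestamp, duration, data).

    `rows` is assumed to be an iterable of (url, visit_time_iso, seconds);
    real history exports vary, so treat this shape as illustrative.
    """
    events = []
    for url, visit_time, seconds in rows:
        if "discord.com" not in url:
            continue  # keep only Discord activity for the Discord watcher
        events.append({
            # ActivityWatch events carry a UTC timestamp and a duration in seconds
            "timestamp": datetime.fromisoformat(visit_time)
                                 .astimezone(timezone.utc).isoformat(),
            "duration": seconds,
            "data": {"url": url, "app": "Discord"},
        })
    return events

if __name__ == "__main__":
    rows = [
        ("https://discord.com/channels/123/456", "2022-02-19T14:00:00+00:00", 300),
        ("https://youtube.com/watch?v=abc", "2022-02-19T15:00:00+00:00", 120),
    ]
    for ev in history_to_events(rows):
        print(ev["data"]["url"], ev["duration"])
```

In the live userscript/BetterDiscord mode the same event shape would be emitted continuously instead of batch-converted from history.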
Unfortunately, the same approach could not work for YouTube, which people use across a plethora of devices, not just the browser. To overcome this, we found Google Takeout's data-export feature to be the most practical solution for this occasion.
We combine Google's automatic speech recognition (ASR) technology and several public state-of-the-art natural language processing (NLP) models to pull off this task. The models used are as follows:
- Automatic Speech Recognition: unknown
- Prompt Generator, Grammar and Punctuator: distill_1_02_2021
- Summarization Engine (GPT): j1-large
- Classification Engine (GPT): gptj_6B
The GPT models employed are billion-parameter autoregressive generative pre-trained language models, implementing state-of-the-art machine learning / artificial intelligence research. We prompt the neural network zero-shot, with decoding settings (argmax, i.e. greedy, sampling) that we found most effective for our NLP classification task.
For more on the way we used our models, read “Challenges we ran into”.
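As a rough sketch of that zero-shot setup: the (punctuated) transcript goes into a prompt, and we take the argmax over the two candidate labels. The prompt wording below is illustrative, and `score_completion` is a stub standing in for a language-model log-probability call so the snippet runs on its own.

```python
LABELS = ("Educational", "Recreational")

def build_prompt(transcript: str) -> str:
    """Illustrative zero-shot prompt: transcript plus a binary question."""
    return (
        "Transcript of a YouTube Short:\n"
        f"{transcript}\n\n"
        "Question: Is this video Educational or Recreational?\n"
        "Answer:"
    )

def score_completion(prompt: str, label: str) -> float:
    # Placeholder scorer. A real run would ask the GPT model for the
    # log-probability of `label` as the continuation of `prompt`; this
    # keyword heuristic only exists so the sketch is self-contained.
    cues = {
        "Educational": ["learn", "history", "explain"],
        "Recreational": ["funny", "meme", "prank"],
    }
    text = prompt.lower()
    return sum(text.count(cue) for cue in cues[label])

def classify(transcript: str) -> str:
    # Argmax over labels = greedy choice of the higher-scoring completion
    prompt = build_prompt(transcript)
    return max(LABELS, key=lambda label: score_completion(prompt, label))
```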
Brief summary of video: Five cool things Google can do
Five cool things Google can do. Number one, if you don't have cash but need to make a decision, search flip a coin and Google will help you decide. Number two, if you ever want to make sure something is straight, Google has a bubble level. Number three, if you're a fan of Friends, searching the names of the main characters will bring up an interactive widget that best summarizes them. Ross is quite amusing. Four, if you're ever unsure how much to tip someone, Google's tip calculator can factor in how much the total bill was, what percentage of that bill you want to give, and how many people are with you to figure it out. Finally, if you ever want a taste of tech from the past, searching google in 1998 will take you to a 23 year old version of the search engine; it's amazing how much we've improved since then. Follow us for more tech tidbits like this.
- https://youtu.be/IQYLu5QsPTI => Arabs are Generous (Recreational)
- https://youtu.be/1fcdpEPd7pI => The Secret Service (Educational)
- https://youtu.be/JlMn3Pt05QQ => Rare Rolex (Recreational)
- https://youtu.be/ImWenmB12M8 => Astronauts See Things In Space That Are Unexplainable (Educational)
These examples are not cherry-picked; they are the most recent YouTube Shorts I’ve watched.
The full list of test cases is available on GitHub in JSON format.
For classification, we adapted the approach of the following 2011 paper: Roemmele, M., Bejan, C., and Gordon, A. (2011). Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning, Stanford University, March 21-23, 2011.
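In the spirit of that COPA paper, classification becomes a choice between alternatives scored for plausibility rather than free generation. The sketch below is illustrative only: `plausibility` is stubbed out as a word-overlap heuristic where the real system would query the language model for log-probabilities.

```python
def plausibility(premise: str, alternative: str) -> float:
    """Stub scorer so the sketch is runnable: favour the alternative that
    shares more words with the premise. A real implementation would sum the
    language model's token log-probs of `alternative` given `premise`."""
    premise_words = set(premise.lower().split())
    alt_words = set(alternative.lower().split())
    return len(premise_words & alt_words) / max(len(alt_words), 1)

def choose(premise: str, alt1: str, alt2: str) -> str:
    # COPA-style selection: keep whichever alternative scores as more
    # plausible given the premise (here, the video transcript/summary).
    return alt1 if plausibility(premise, alt1) >= plausibility(premise, alt2) else alt2
```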
Challenges we ran into
Problem: Oliver accidentally committed and pushed secrets to GitHub.
Solution: We followed a leak remediation tutorial by GitGuardian.
Problem: Some YouTube Shorts don’t have audible speech and thus no transcripts.
Solution: We found that most of these videos are not educational, so Shorts with no transcript default to Recreational. This makes sense because speech is usually necessary in educational videos, whereas recreational videos are characterized by (potentially meme) music.
Problem: It was difficult to gather data on our own technology use within the hackathon's 24-hour window.
Solution: We took data from Oliver's browsing history and wrote a script to translate it into events we could visualize in ActivityWatch (described above).
Problem: Google’s ASR output lacks punctuation and proper grammar (e.g. capitalization), which inhibits the language model’s ability to handle structured language. The models are trained to produce text coherent with their context (our input), so if we pass garbage as input, we get garbage as output. And we want quality, not garbage.
Solution: We used additional NLP models (designed and fine-tuned for the particular task of fixing bad English) to restore punctuation and grammar before passing the text to the model that generates the output. The results are impressively good.
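For illustration only, here is a toy stand-in for that cleanup stage. The real pipeline calls fine-tuned models; this rule-based sketch just shows the kind of normalization (casing, terminal punctuation) the ASR text goes through before reaching the generator.

```python
import re

def tidy_asr(text: str) -> str:
    """Very rough stand-in for the punctuation/grammar models: capitalize
    the sentence start and the pronoun 'i', and ensure terminal punctuation.
    The real stage is learned, not rule-based; this is only a sketch."""
    text = text.strip()
    if not text:
        return text
    text = re.sub(r"\bi\b", "I", text)       # standalone 'i' -> 'I'
    text = text[0].upper() + text[1:]        # capitalize sentence start
    if text[-1] not in ".!?":
        text += "."                          # ensure terminal punctuation
    return text
```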
Problem: YouTube Shorts often have captions which don’t resemble the contents of the video.
Solution: We used AI to generate a title for each short based on the transcript.
What we learned
We learned a lot about time management while creating this application to help people understand their own internet behavior. We came away with a better understanding of where our tendency to procrastinate comes from, and we explored the psychological reasons behind it. We learned how social media overstimulates us and how our modern lifestyles mess with our brains. In a world of supernormal stimuli, driven by relentless technological advancement and commercial incentives, the natural stimuli our biology evolved around are gradually being drowned out. It makes us question the power dopamine rushes hold over our lives.
In terms of the nature of hackathons, we learned to set strict deadlines and to plan ahead in case of emergencies. This time, we actually submitted with half an hour left, instead of 3 minutes like last time! Obviously, it was a piece of cake with such amazing teammates.
During the development of the YouTube watcher, we decided that the stock ActivityWatch libraries wouldn’t give us the flexibility we desired. Because designing a web app in vanilla HTML/JS/CSS is complex, the GUI is still a work in progress. We still wanted to show you the 95% accuracy rate of the AI backend we built, though.
Since we already have transcripts for the YouTube videos, we would love to build an AI-based search engine over them as follow-up work.