Do you know the feeling of being in a crowded room where everybody is talking, some people so loudly that you cannot hear the voices you actually want to hear? We had this feeling at times during the opening ceremonies and the challenge talks that followed, but we also know it from our daily lives at home. And we know it from another world, our digital one. Today, browsing the web, reading news and following our social streams is exhausting. It takes a huge cognitive effort just to scan through the vast amount of fake news, hate speech, shit storms, and the weeks of media reports about every single incident. At some point, this negatively affects the way we interact with the web. So we thought about how to make the web the safe and sane place it used to be for everybody, by combining technologies that have advanced greatly over the last few years.
What it does
moopoo is a website and a browser extension. Users can set words and phrases they do not want to see on a daily basis when browsing the web, and define suitable replacements. Via the browser extension, moopoo then transforms webpages into better versions for this user. To show the potential of moopoo, the website also includes demos that translate live speech into nicer versions and process images.
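The replacement step described above can be illustrated with a minimal sketch. It is not moopoo's actual implementation: the `ReplacementRule` shape and the `applyRules` function are hypothetical names chosen for this example, and the sketch assumes simple whole-word, case-insensitive matching.

```typescript
// Hypothetical shape of one user-defined rule (an assumption, not moopoo's real format).
interface ReplacementRule {
  phrase: string;      // word or phrase the user does not want to see
  replacement: string; // text shown instead
}

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every whole-word, case-insensitive occurrence of each phrase.
// All phrases are matched in a single pass, so a replacement is never
// rewritten again by another rule.
function applyRules(text: string, rules: ReplacementRule[]): string {
  const lookup = new Map<string, string>();
  for (const rule of rules) {
    if (rule.phrase.trim().length > 0) {
      lookup.set(rule.phrase.toLowerCase(), rule.replacement);
    }
  }
  if (lookup.size === 0) return text;

  // Longer phrases first, so "fake news" wins over a shorter rule like "fake".
  const alternation = Array.from(lookup.keys())
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const pattern = new RegExp(`\\b(?:${alternation})\\b`, "gi");

  return text.replace(pattern, (match) => lookup.get(match.toLowerCase()) ?? match);
}

// Example usage with made-up rules.
const rules: ReplacementRule[] = [
  { phrase: "fake news", replacement: "unverified reports" },
];
console.log(applyRules("Fake news again: more FAKE NEWS today", rules));
// prints "unverified reports again: more unverified reports today"
```

Matching all phrases in one pass is a deliberate choice in this sketch: applying rules one after another could turn the output of one rule into the input of the next.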
How we built it
We made use of many Google services: the Google Voice API/WebkitSynthesis to convert speech to text (and vice versa), Google Cloud Functions for serverless NLP processing, and the Google Vision APIs for the OCR-based image processing demo. The website with the user-customizable settings is built on React, and the Google Chrome browser extension uses Chrome's extension API.
Challenges we ran into
We had issues with the performance and stability of Google's Cloud Functions, but appreciated the accuracy of Google's speech-to-text and text-to-speech services and of the OCR service in the Vision API. However, recording audio and transmitting it to Google's services is hard to do from a website alone (being website-only is one of the requirements we set ourselves, to make the app as accessible as possible), which is why we moved the voice recognition part to Webkit's built-in Speech API (slightly worse quality, but stable and fast). The text-to-speech part relies fully on Google's Voice API. Browser extensions and their limitations were an incredible headache: persisting settings without a login, offering a usable UI for them, and exposing them to the browser extension was a hard task that required some trickery. Finally, the NLP part, making thoughtful replacements that don't break websites or the experience, turned out to be tricky as well, but we had expected that.
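One common way to replace text without breaking a page is to touch only text nodes and to leave markup, scripts, and styles alone. The sketch below illustrates that idea under stated assumptions. It is not moopoo's actual code: `NodeLike`, `SKIPPED_ELEMENTS`, and `rewriteTextNodes` are hypothetical names, and `NodeLike` is a minimal structural view that browser DOM nodes should satisfy.

```typescript
// Minimal structural view of a DOM node (hypothetical helper type).
// Using it instead of the global `Node` keeps the sketch runnable outside a browser.
interface NodeLike {
  nodeType: number;
  nodeName: string;
  nodeValue: string | null;
  childNodes: ArrayLike<NodeLike>;
}

const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM standard

// Elements whose text must not be rewritten, because changing it could
// break page behavior or alter what the user is typing.
const SKIPPED_ELEMENTS = ["SCRIPT", "STYLE", "NOSCRIPT", "TEXTAREA"];

// Walk the tree below `root`, apply `transform` to every text node,
// and return how many text nodes were actually changed.
function rewriteTextNodes(
  root: NodeLike,
  transform: (text: string) => string
): number {
  let changed = 0;

  const visit = (node: NodeLike): void => {
    if (node.nodeType === TEXT_NODE) {
      const before = node.nodeValue ?? "";
      const after = transform(before);
      if (after !== before) {
        node.nodeValue = after; // only the text changes, never the markup
        changed++;
      }
      return;
    }
    if (SKIPPED_ELEMENTS.indexOf(node.nodeName.toUpperCase()) !== -1) {
      return; // leave scripts, styles, and user input untouched
    }
    for (let i = 0; i < node.childNodes.length; i++) {
      visit(node.childNodes[i]);
    }
  };

  visit(root);
  return changed;
}
```

In an extension's content script, a call such as `rewriteTextNodes(document.body, someReplacementFunction)` would be the assumed entry point. Pages that load content dynamically would also need the walk to be repeated when new nodes appear, which this sketch does not cover.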
Accomplishments that we're proud of
moopoo does an excellent job in most of the situations we tested it in. The replacements are almost impossible to spot and in no way disrupt the experience. Adding and removing words and phrases is easy and fast, while still being powerful. The voice and image demos took a great amount of time and came out unexpectedly well.
What we learned
Sleeping at night and working during the day really pays off. So does thinking longer about ideas. Focusing on the most important part and crafting an MVP early helps a lot to get creative in the details. Not every plan goes well, but there is always a way around it. UX is as important as engineering: one cannot make the product useful without the other.
What's next for moopoo
We would really, really like to go further with moopoo. This includes expanding the website filtering to all kinds of media, such as videos on YouTube, where we could replace audio tracks and even do image filtering in real time. Segmenting the user base to improve moopoo for individuals, but also for parents, businesses or governments, is definitely something to look into. On the NLP side, there is almost no limit. It can always get better, up to moving from token-based replacement to a sentiment-based transformation engine that changes not only the words but also the tone of statements.