We took inspiration from That '70s Show, putting our own automated spin on the scene between Michael, Fez and Jackie.
Have you ever been stuck on the phone with a nagging parent? A venting girlfriend or boyfriend? A spam call that's droning on? Ever felt your mind drift off while they were on a tirade?
Daydream was made for people who value their time, be they students, professionals, or anyone else. Like you, we understand the importance of productivity, and our web app lets you seamlessly get on with life while on a phone call.
What it does
Daydream enables you to zone out of a phone conversation and do what you want to do. It does this by listening to your phone calls, giving you a play-by-play live log of what the other person is saying. When it detects that they've finished talking and have asked you a question, it prompts you to respond with a pop-up of the question and a quick context rundown of what the other person has said. This allows you to effectively multitask and feign listening to the other party.
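The log-and-prompt behaviour described here can be pictured with a small data structure. This is only a minimal sketch, not our actual code: the `CallLog` class, its `context_size` parameter and the shape of the returned prompt are all hypothetical names chosen for illustration, and the caller is assumed to already know whether an utterance is a question.

```python
from collections import deque


class CallLog:
    """Hypothetical helper: keeps the live log and builds question prompts."""

    def __init__(self, context_size=3):
        self.entries = []                          # full play-by-play log
        self.recent = deque(maxlen=context_size)   # statements since the last question

    def add(self, utterance, is_question):
        """Log an utterance; return a prompt dict if it is a question, else None."""
        self.entries.append(utterance)
        if not is_question:
            self.recent.append(utterance)
            return None
        # Bundle the question with a short rundown of what came before it.
        prompt = {"question": utterance, "context": list(self.recent)}
        self.recent.clear()
        return prompt
```

A front end could show `prompt["question"]` in the pop-up and list `prompt["context"]` underneath as the quick rundown.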
How we built it
We captured the call audio as a .wav file and used Google Cloud Speech-to-Text to transcribe it, chunk by chunk, into a text file. These text files were passed to language detection functions that scanned them for question keywords (e.g. 'who', 'what', 'when', 'where', 'why') to decide whether each chunk was a question. If it was, it was sent to the user front end as a prompt and added to the log; otherwise it was only added to the log.
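The keyword check described above might look roughly like the following. This is a sketch under assumptions rather than our exact functions: `is_question` and `route` are hypothetical names, the keyword set contains only the five words listed above, and the transcribed chunks are assumed to arrive as plain strings.

```python
import string

# Only the keywords named in the write-up; a real list would likely be longer.
QUESTION_KEYWORDS = {"who", "what", "when", "where", "why"}


def is_question(chunk):
    """Return True if any word in the chunk is a question keyword."""
    words = (w.strip(string.punctuation).lower() for w in chunk.split())
    return any(w in QUESTION_KEYWORDS for w in words)


def route(chunks):
    """Add every chunk to the log; also collect questions as prompts."""
    log, prompts = [], []
    for chunk in chunks:
        log.append(chunk)
        if is_question(chunk):
            prompts.append(chunk)
    return log, prompts
```

Matching a keyword anywhere in the chunk is deliberately naive: a statement such as "I know what you mean" would be flagged as a question, and a question with no keyword ("Are you listening") would be missed.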
Challenges we ran into
We struggled to minimise the delay introduced by speech-to-text conversion and natural language processing. For a product like this, low latency is a must, so that the log appears to update in real time. Some delay was unavoidable in our prototype, but there are solutions that, given more time, would allow us to reduce it.
Challenges also surfaced with the natural language processing approach we originally used: a Bayes classifier that we trained and tested with data sets. We ran into errors when merging it with the speech processing, so we went for a simpler, more naive approach to detecting whether something was a question.
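For readers unfamiliar with the Bayes approach mentioned above, here is a toy multinomial naive Bayes classifier with add-one smoothing. It is not the classifier we trained: the `TinyNaiveBayes` class is a hypothetical stand-in written with the standard library only, and the handful of training sentences are made up purely for illustration.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Toy word-count naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {}          # label -> Counter of word frequencies
        self.label_counts = Counter()  # label -> number of training examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in text.lower().split():
                counts[word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in text.lower().split():
                # Add-one smoothing so unseen words do not zero the score.
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, far too small for real use.
EXAMPLES = [
    ("what are you doing", "question"),
    ("when will you be home", "question"),
    ("why did you do that", "question"),
    ("where are you going", "question"),
    ("i went to the shop", "statement"),
    ("the weather is nice today", "statement"),
    ("my boss was rude again", "statement"),
    ("we had pasta for dinner", "statement"),
]
```

After `clf = TinyNaiveBayes()` and `clf.train(EXAMPLES)`, calling `clf.predict(...)` returns whichever label scores highest for the given text.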
Accomplishments that we're proud of and what we've learned
Interaction with the spaCy library was adapted from https://spacy.io/usage/linguistic-features#sbd-manual
Third-party libraries: NLTK, spaCy, speech_recognition, textblob. Standard library modules: os, time
- Abhay: came up with the idea, explored front end options, helped connect front and back end
- Aleney: analysed textfile for question detection, helped connect front and back end
- Ana: analysed textfile for question detection, helped connect front and back end, created mock phone app design
- Arya: converted speech to textfile, helped connect front and back end, helped analyse textfile for question detection
- Barbara: converted speech to textfile, designed user web app
- Ewan: explored options for architecture of the process, helped connect front and back end
What's next for Daydream
- connecting front end to back end communication using Flask
- parallel processing for our speech-to-text and text analysis stages
- using a Natural Language Processing algorithm that we can actually get to work
- implementing security measures for our chat logs, including detection of intruders and attacks against the database, and appropriate firewalls/encryption to prevent them
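The parallel processing item in the list above could be approached with a simple two-stage pipeline, where one thread transcribes audio chunks while another analyses the text already produced. This is a sketch of one possible design, not something we have built: `run_pipeline` and the stage functions are hypothetical names, and the `transcribe` and `analyse` callables are placeholders for the real speech-to-text and question detection steps.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def transcribe_stage(audio_chunks, out_q, transcribe):
    """Stage 1: turn each audio chunk into text and pass it on."""
    for chunk in audio_chunks:
        out_q.put(transcribe(chunk))
    out_q.put(SENTINEL)


def analyse_stage(in_q, results, analyse):
    """Stage 2: analyse text as soon as stage 1 produces it."""
    while True:
        text = in_q.get()
        if text is SENTINEL:
            break
        results.append((text, analyse(text)))


def run_pipeline(audio_chunks, transcribe, analyse):
    """Run both stages concurrently and return (text, verdict) pairs in order."""
    q = queue.Queue()
    results = []
    producer = threading.Thread(
        target=transcribe_stage, args=(audio_chunks, q, transcribe)
    )
    consumer = threading.Thread(target=analyse_stage, args=(q, results, analyse))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

Threads are a reasonable fit if the transcription step spends most of its time waiting on a network call; if both stages were CPU-bound, separate processes would probably be the better choice.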