Drones are becoming increasingly popular, and with that popularity comes a rising need for a better user experience so that drones can appeal to a wider consumer audience. We believe speech input for drone navigation matters to users, and building it on a powerful cognitive engine, IBM Watson, is revolutionary because it greatly simplifies the complexity of speech-to-text translation.
What it does
IBM's cognitive engine, launched in August 2015, was designed with desktop devices and enterprises in mind. Meanwhile, drones are helping map new areas, aiding the military, and driving the push towards autonomous devices, and the drone is one of the easiest devices to control through a smartphone app or a developer API. The Speech to Text API translates the user's voice into meaningful text (a command, in computer jargon), which is fed to the AR.Drone to navigate it in the real world.
How I built it
The voice-controlled drone navigation system is built on Node.js, using the Express web framework and real-time I/O through Socket.IO. The idea is to take in audio in real time from a web application, whether on a desktop, a mobile device, or a smart TV, which can also stream live video from the drone's cameras, making the interaction even more intuitive and efficient. IBM Watson's Speech to Text API let us submit audio either as a media file or as a live stream, and we designed a user-friendly, interactive UI similar to a game controller. We interact with the service directly over its HTTP REST API, using the broadband model. The output of Watson's STT engine is a transcript of the sentence spoken by the user, delivered to the server side built on Node.js. The web server then parses the transcript into commands, which are transmitted to the AR.Drone's controller over Wi-Fi; the drone responds by navigating in the requested direction or by performing multi-dimensional maneuvers such as rotating or flipping.
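The transcript-to-command step can be sketched roughly as follows. This is a minimal illustration rather than our exact code: it assumes the `ar-drone` npm module as the drone client, and the keyword table and the `parseCommand`/`execute` helper names are hypothetical.

```javascript
// Minimal sketch: map a Watson STT transcript to an AR.Drone action.
// Assumes the `ar-drone` npm client; the keyword table is illustrative only.

const COMMANDS = {
  'take off': (drone) => drone.takeoff(),
  'land':     (drone) => drone.land(),
  'up':       (drone) => drone.up(0.5),
  'down':     (drone) => drone.down(0.5),
  'left':     (drone) => drone.left(0.3),
  'right':    (drone) => drone.right(0.3),
  'flip':     (drone) => drone.animate('flipAhead', 1000),
};

// Find the first known keyword in the transcript, longest match first,
// so multi-word commands like "take off" win over shorter fragments.
function parseCommand(transcript) {
  const text = transcript.toLowerCase();
  const keys = Object.keys(COMMANDS).sort((a, b) => b.length - a.length);
  return keys.find((k) => text.includes(k)) || null;
}

// Run the matched command against the drone client; returns the matched
// keyword (or null) so the caller can log or acknowledge it.
function execute(transcript, drone) {
  const cmd = parseCommand(transcript);
  if (cmd) COMMANDS[cmd](drone);
  return cmd;
}
```

On the server, each transcript arriving over the Socket.IO connection would be passed to `execute` along with the drone client object.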
Challenges I ran into
The biggest challenge was parsing speech, which varies with each user's accent and speaking style, into a format the drone can understand. Watson's engine mitigates this by learning every time the user interacts with the system. Node.js was a relatively new platform to the team, and we spent many hours understanding how its frameworks work. The restricted drone time per team made it harder to test the code as we built it, so we had to adopt a plan-driven approach rather than the more robust agile approach, and the battery had to be recharged frequently due to the intensive communication with the drone.
Accomplishments that I'm proud of
IBM Watson's API is still relatively new to the developer community and little support is available yet, so connecting it to a drone is another milestone for the team. Implementing the REST API and Node.js frameworks on the back end broadened our outlook towards developing a web application rather than committing to a particular mobile operating system. Using the appropriate networking protocols, UDP for audio and TCP for the video stream, made the system accessible from any device, anywhere, at any time.
What I learned
The opportunity for our team to meet and interact with the programmer community in and around College Station, and the constant encouragement from industry experts and company representatives, was a truly remarkable experience. Being able to discuss a problem as you encounter it lets you clear development roadblocks with ease. Using a drone API and navigating the drone according to a computer program was fantastic.
What's next for Voice Controlled Drone Navigation
Voice-controlled drone navigation is a radical idea and the basis for our future development: we look forward to understanding natural language better with an NLP classifier, and to integrating more robust APIs to build an autonomous drone capable of reacting to stimuli and of context awareness. Solving the problem of estimating property damage from an aerial survey could help small, medium, and large enterprises expand to other parts of the world, and the prospect of finding survivors after a terrorist attack, or tracking suspicious activity, without risking a human life is a powerful motivator for our team to carry this momentum forward and contribute to the developer community.