We were inspired by the recent interest of many companies in drone delivery and drone search. In particular, we wanted to bring drone capabilities to the consumer – and we ended up doing even more.
Many applications can stem from our work, from search-and-rescue missions and drone delivery to just finding your keys. In addition, we've made it possible for a consumer to train an object-recognition classifier using just their voice.
What it does
We built a pipeline that allows anyone to visually search for objects using a drone, with the computer vision and machine learning handled behind the scenes.
It consists of three main parts: 1) A search drone (controlled normally with your phone) that performs image classification in real time for a given object. 2) The ability to train an image classifier on any object using just your voice. 3) A voice-controlled drone that can perform targeted delivery.
How we built it
We used an Amazon Echo to handle voice input, and the transcribed input was sent to an AWS Lambda function. Depending on the text, it was classified into one of several categories (such as commands). The Lambda function then updated a Firebase database with the appropriate commands/information. Our local computers were notified whenever the database changed and executed the appropriate action -- whether that was training an image classifier or flying the drone.
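The categorization step could look something like the following minimal sketch. The category names and keyword lists here are hypothetical, not the ones we actually used:

```python
# Hypothetical command categories and trigger keywords for bucketing
# transcribed Alexa input before writing it to the database.
CATEGORIES = {
    "train": ["train", "learn", "recognize"],
    "search": ["search", "find", "look for"],
    "fly": ["fly", "take off", "land", "deliver"],
}

def classify_transcript(text: str) -> str:
    """Return the first command category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(kw in lowered for kw in keywords):
            return category
    return "unknown"
```

The classified command is what gets written to Firebase; the listening computer only has to switch on the category rather than re-parse raw speech.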
To get the non-programmable drone to become a search drone, we had it live-stream its video feed to an Android phone, and we ran a script that constantly took screenshots of the Android phone and stored them on our computer. We could then use these images either as training data or for real-time classification, using image segmentation and IBM Watson.
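The screenshot-pulling loop can be sketched roughly as below, assuming the phone is connected over adb (the standard Android Debug Bridge CLI); the helper names are hypothetical:

```python
# Sketch of pulling frames off the phone that receives the drone's
# video stream. Assumes a phone is attached via `adb`.
import subprocess
import time

def frame_path(out_dir: str, ts_ms: int) -> str:
    """Timestamped filename so captured frames can be ordered later."""
    return f"{out_dir}/frame_{ts_ms}.png"

def capture_frame(out_dir: str = "frames") -> str:
    """Grab one screenshot of the phone's screen and save it locally."""
    path = frame_path(out_dir, int(time.time() * 1000))
    # `adb exec-out screencap -p` streams the current screen as PNG bytes.
    png = subprocess.run(["adb", "exec-out", "screencap", "-p"],
                         capture_output=True, check=True).stdout
    with open(path, "wb") as f:
        f.write(png)
    return path

# In the real pipeline this runs in a loop, feeding each frame either
# into the training set or into the real-time classifier.
```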
To train a classifier with only your voice, we took the spoken search term and used the Bing Search API to fetch images associated with that term. These served as the training data, which we then fed into IBM Watson to build a classifier. That classifier could later be used by the search drone. All the consumer had to do was use their voice -- we took care of gathering the data and applying the machine learning.
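Turning a spoken term into an image-search request might look like this sketch. The endpoint and header names follow Microsoft's documented Bing Image Search v7 API; the subscription key is a placeholder:

```python
# Build (but don't send) a Bing Image Search v7 request for a search term.
# The results would serve as training images for the Watson classifier.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search"

def build_bing_request(term: str, count: int = 50):
    """Return (url, headers, params) for fetching candidate training images."""
    headers = {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}  # placeholder key
    params = {"q": term, "count": count, "imageType": "photo"}
    return BING_ENDPOINT, headers, params
```

The downloaded images would then be zipped and uploaded to Watson as positive examples for the new classifier.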
Challenges we ran into
We were working with sandboxed technologies meant for the average consumer – not for developers. It wasn't possible to take pictures or move the drone programmatically, so we had to hack together creative ways to enable these new capabilities, such as the screenshot pulling described above.
Additionally, coordinating communication across Alexa's servers, the database, and the drone itself was quite a relay.
Accomplishments that we're proud of
- Creating a super consumer-friendly way of training image classifiers.
- Taking a non-programmable drone and still being able to hack with it.
- Being able to do voice control in general!
What we learned
Hardware is finicky.
What's next for Recon
Even more precise control of the drone, as well as potentially controlling multiple drones at once.