My inspiration is personal. With aging parents and relatives losing their eyesight, I wanted to help them revisit old memories and experience new ones without depending on others to describe their precious family pictures.
What it does
The application lets users select images on their device and narrates the contents of each image in the language of their choosing. It announces the title of every page to keep users informed of where they are. Users can select their preferred language on the Settings page. The application does not store any personal data. It only saves the language preference so that it can be reused in subsequent sessions.
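To illustrate the "store only the language preference" idea, here is a minimal sketch. It is written in Python for brevity rather than the C# used in the app, and the JSON file format, the function names, and the `"en"` default are all assumptions made for this example, not details of the actual implementation.

```python
import json
from pathlib import Path

# Hypothetical default; the real app's default language is not stated.
DEFAULT_LANGUAGE = "en"


def save_language(settings_file: Path, language: str) -> None:
    """Persist only the language code; nothing else about the user is written."""
    settings_file.write_text(json.dumps({"language": language}), encoding="utf-8")


def load_language(settings_file: Path) -> str:
    """Return the saved language, or the default if nothing usable is stored."""
    try:
        data = json.loads(settings_file.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return DEFAULT_LANGUAGE
    if not isinstance(data, dict):
        return DEFAULT_LANGUAGE
    return data.get("language", DEFAULT_LANGUAGE)
```

On first launch `load_language` falls back to the default, and after the user picks a language on the Settings page a single call to `save_language` is enough for later sessions to pick it up.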
How we built it
I used C# and Xamarin.Forms in Visual Studio to develop a cross-platform application that runs on iOS, Android and Windows devices. The application uses Microsoft Azure's Computer Vision and Translator Cognitive Services. The Computer Vision service retrieves the picture description and relevant tags. The Translator service translates the description and tags in real time. The narration text is displayed in a large font and read aloud to the user at the same time. The image descriptions are cached in memory for the current session to minimize calls to the Computer Vision and Translator services.
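The session cache described above could look roughly like the following sketch. It is in Python rather than the app's C#, and it does not call the real Azure SDKs: `describe` and `translate` are hypothetical callables standing in for the Computer Vision and Translator requests, and the class and method names are invented for illustration.

```python
from typing import Callable, Dict, Tuple


class NarrationCache:
    """In-memory cache that lives only for the current session."""

    def __init__(
        self,
        describe: Callable[[str], str],        # stand-in for the Computer Vision call
        translate: Callable[[str, str], str],  # stand-in for the Translator call
    ) -> None:
        self._describe = describe
        self._translate = translate
        self._descriptions: Dict[str, str] = {}
        self._narrations: Dict[Tuple[str, str], str] = {}

    def narration_for(self, image_path: str, language: str) -> str:
        """Return the narration text, contacting a service only on a cache miss."""
        key = (image_path, language)
        if key not in self._narrations:
            if image_path not in self._descriptions:
                # Describe each image at most once per session.
                self._descriptions[image_path] = self._describe(image_path)
            # Translate each (image, language) pair at most once per session.
            self._narrations[key] = self._translate(
                self._descriptions[image_path], language
            )
        return self._narrations[key]
```

Keeping the untranslated description separate from the translated narration means that switching languages for an already-viewed picture costs one translation request and no new image analysis. Because the cache is a plain in-memory object, it disappears when the session ends, which is consistent with not storing personal data.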
Challenges we ran into
The challenges I ran into during implementation were:
- Setting up the development environment
- Designing the interface for simplicity
- Ensuring that picture narration and translation do not impede the app's usability (user interaction)
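One common way to address the last challenge is to run the description and translation requests in the background and to drop a pending request when the user moves on to another picture. The sketch below shows that pattern with Python's `asyncio`; it is an assumption about the general approach, not the app's actual C# code, and `Narrator`, `fetch_narration`, and `show` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Narrator:
    """Fetches narration in the background so picture selection never blocks."""

    def __init__(
        self,
        fetch_narration: Callable[[str], Awaitable[str]],  # hypothetical service call
        show: Callable[[str], None],                       # hypothetical display/speech hook
    ) -> None:
        self._fetch = fetch_narration
        self._show = show
        self._task: Optional[asyncio.Task] = None

    def select_image(self, image_path: str) -> None:
        """Return immediately; must be called from inside a running event loop."""
        if self._task is not None and not self._task.done():
            # The user moved on, so the earlier narration is no longer wanted.
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._narrate(image_path))

    async def _narrate(self, image_path: str) -> None:
        text = await self._fetch(image_path)
        self._show(text)
```

With this shape, selecting a second picture before the first narration arrives cancels the first request, so only the narration for the picture currently on screen is displayed and read.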
Accomplishments that we're proud of
The importance of simplicity cannot be overstated. An intuitive user interface turns a simple application into a powerful tool that users can depend on. I am particularly proud of keeping the design easy to understand, with minimal user configuration. By relying on sophisticated AI services, the application takes the complexity away from users and puts them in the driver's seat.
What we learned
I learned to use a few of the Azure Cognitive Services while building the application. Computer Vision is an Azure AI service that identifies the objects in the pictures provided to it. Translator is another Azure AI service that translates text from one supported language to another.
What's next for PicNarrator
PicNarrator can already empower many users with its current feature set. Here is what I envision next for it:
- Use the Speech to Text cognitive service in the selected language to support voice commands
- Extend the Computer Vision model with face recognition of loved ones, for users willing to assign first names to the faces in their images, to make the experience more personal
- Interface with OneDrive, iCloud, or other cloud services to easily browse and narrate pictures stored in the cloud
- Interface with the camera and microphone so that users can record more personal descriptions of new photos in the cloud as events occur, preserving precious memories for years to come with a personal touch
- Use the Text Analytics cognitive service to assign sentiments to pictures and let users search for pictures by sentiment (applicable to pictures with a personal description)
- Publish to all major app stores