Blind People's Eyes

"What's happened?" the blind people think

Then he takes a photo and clicks the screen

"A woman walking down a street with a cell phone." said by phone.

"OK Got it!"

AI to help blind people to see the world!

How we built it

1. Model Training: (Python)

  • Training Environment: CentOS 7 with one NVIDIA-V100
  • Model Constructed: Transformer Picture Encoder + Transformer Text Decoder
  • Dataset: COCO 2014

2. Front-end: (Java)

  • Andriod (TexttoSpeech, Vibrate... )

3. Back-end: (Python)

  • RESTful API with Flask
  • Tensorflow Serving Docker
  • Training Environment: CentOS 7

Challenges we ran into & What we learned

1. Model Training

  • Positional Embedding for 1D and 2D inputs
  • Design the model inputs signature for deploying with Tensorflow Serving
  • Select randomly from the validation dataset to monitor the robustness of the model, whether overfit in training data

2. Front-end

  • Use fundamental java libraries to POST files, by constricting fundamental HTTP request structure
  • Order the camera within App and save images with temporary space and address
  • The relationship between the multithread and Intend

3. Back-end

  • Tensorflow Serving Deployment in two Linux systems
  • Restful API Design with Flask and cooperating with TensorFlow Serving

What's next for Blind People's Eye

Model Side

  • Expand Dataset, by covering more events pictures.
  • Use Alberta and some pretrianed model for decoder, could have better text generating effect
  • Use distilled models, such as electra, distilled-bert, thus the model can become smaller enough to inference on the phone
  • Concentrate the picture decoder and voice decoder model, generate more fluent audio

User Interface Side

  • Need more clear instructions for people to click buttons
  • Modify vibration style, give different rhythms for different situations

Looing for Internship

Pete Yu: hao.yu2@mail.mcgill.ca

  • Backend & Operation (CentOS7//Docker/VPS)
  • Artificial Intelligence (TensorFlow)
  • Software Developer

Max Shen: ao.shen@mail.mcgill.ca

  • Java, C
  • Android Developer
  • Full-Stack Developer

Built With

Share this project:

Updates