Inspiration

Twitter already offers several accessibility features, such as alt text for images, adjustable color contrast, and reduced motion for in-app animations. However, Twitter could do much more to improve accessibility for people with disabilities.

What it does

Our project has the following features:

  • For voice tweets, the user can display written transcripts or play the audio in another language.
  • For videos with voice, the user can display subtitles and a written summary of the video.
  • For images, the user can display a written description of the image.
  • For text tweets, the user can convert the tweet into a voice note that reads the original text aloud.
  • All format conversions are generated automatically whenever a user tweets. This differs from, for instance, the existing feature that lets users write image descriptions manually before they tweet an image.

How we built it

In our base file:

  • We fetched text, video, audio, and animated-GIF tweets from the Twitter API.
  • We filtered the tweets to identify those containing media.
  • We called the functions that transcribe voice/video tweets into text, caption images, summarize text, and convert text tweets into audio, depending on the tweet type.
  • We translated text tweets into different languages using the Google Cloud Translation API.
  • We updated our database from CSV files containing the tweets and their media URLs.
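The classification and routing steps above might look roughly like the following sketch. It works on hand-written sample dicts rather than live Twitter API calls. The field names follow the Twitter API v1.1 media entities (extended_entities, type, media_url_https, video_info), while SAMPLE_TWEETS, CONVERSIONS, plan_rows and rows_to_csv are hypothetical names, and mapping GIFs to image captioning is an assumption.

```python
import csv
import io

# Hypothetical sample data in a simplified Twitter API v1.1 shape: media lives
# under "extended_entities" -> "media", and each item has a "type" of
# "photo", "video", or "animated_gif".
SAMPLE_TWEETS = [
    {"id": "1", "text": "Hello world"},
    {"id": "2", "text": "Look!", "extended_entities": {"media": [
        {"type": "photo", "media_url_https": "https://example.com/a.jpg"}]}},
    {"id": "3", "text": "Clip", "extended_entities": {"media": [
        {"type": "video", "media_url_https": "https://example.com/b.jpg",
         "video_info": {"variants": [{"url": "https://example.com/b.mp4"}]}}]}},
]

# Hypothetical routing table from tweet kind to the conversions to run
# (animated_gif -> image_caption is an assumption).
CONVERSIONS = {
    "text": ["text_to_speech", "translate"],
    "photo": ["image_caption"],
    "video": ["transcribe", "subtitles", "summarize"],
    "animated_gif": ["image_caption"],
}

def first_media(tweet):
    """Return the tweet's first media item, or None for text-only tweets."""
    media = (tweet.get("extended_entities") or {}).get("media") or []
    return media[0] if media else None

def media_url(item):
    """For videos/GIFs the playable file is in video_info["variants"];
    media_url_https is only the preview image."""
    variants = (item.get("video_info") or {}).get("variants") or []
    return variants[0]["url"] if variants else item.get("media_url_https", "")

def plan_rows(tweets):
    """Classify each tweet and record which conversions it should get."""
    rows = []
    for tweet in tweets:
        item = first_media(tweet)
        kind = item["type"] if item else "text"
        rows.append({
            "id": tweet["id"],
            "kind": kind,
            "media_url": media_url(item) if item else "",
            "conversions": ";".join(CONVERSIONS.get(kind, [])),
        })
    return rows

def rows_to_csv(rows):
    """Serialize the rows to CSV text, ready to be loaded into the database."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "kind", "media_url", "conversions"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(rows_to_csv(plan_rows(SAMPLE_TWEETS)))
```

Keeping the routing in a single table makes it easy to add a new conversion for one tweet type without touching the fetch or CSV code.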

To transcribe voice/video tweets into text:

  • We downloaded the voice/video file from the media URL retrieved from the Twitter API.
  • We applied noise filtering to the audio track of the voice/video files.
  • We connected to Google Cloud Speech-to-Text to convert the audio into text
  • We produced captions and subtitles from this textual output.
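Google Cloud Speech-to-Text can return per-word time offsets (the enable_word_time_offsets option in its recognition config), which is one way to turn a transcript into timed subtitles. The sketch below assumes the words have already been extracted into a list of (text, start, end) records; Word, srt_timestamp and words_to_srt are hypothetical helpers, and grouping a fixed number of words per cue is a simplifying assumption that ignores pauses in speech.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """One recognized word with its start/end time in seconds (assumed input)."""
    text: str
    start: float
    end: float

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def words_to_srt(words, max_words=7):
    """Group consecutive words into numbered SRT cues of at most max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w.text for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(chunk[0].start)} --> {srt_timestamp(chunk[-1].end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)

words = [Word("hello", 0.0, 0.4), Word("world", 0.5, 1.2), Word("again", 61.0, 61.25)]
print(words_to_srt(words, max_words=2))
```

Each cue starts at its first word's start time and ends at its last word's end time, so the subtitles follow the recognizer's timing rather than a fixed interval.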

For image captioning:

We used an attention-based deep-learning model, similar to the architecture in "Show, Attend and Tell". InceptionV3 serves as the feature extractor, and an encoder-decoder model then generates the captions. The model was trained on the MS-COCO dataset and implemented in TensorFlow. To build it:

  • We trained the model on Google Colab and saved the trained model.
  • When an image URL is received, we download the image if it is not already in the cache.
  • We use the model to automatically generate a caption for the image.
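The "attend" step of a Show, Attend and Tell-style decoder can be sketched in NumPy as additive (Bahdanau-style) attention: at each decoding step, the decoder state scores every image location, a softmax turns the scores into weights, and the context vector is the weighted average of the image features. The weights below are random rather than trained, and the function names and toy shapes are assumptions; in the project this logic lived inside a TensorFlow model.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(features, hidden, W1, W2, v):
    """Bahdanau-style (additive) attention over image locations.

    features: (L, D) encoder features, one row per spatial location
    hidden:   (H,)   current decoder hidden state
    W1: (D, A), W2: (H, A), v: (A,)  learned weights (random here)
    Returns the context vector (D,) and attention weights (L,).
    """
    scores = np.tanh(features @ W1 + hidden @ W2) @ v  # one score per location, (L,)
    weights = softmax(scores)                         # non-negative, sum to 1
    context = weights @ features                      # weighted average of features, (D,)
    return context, weights

# Toy shapes: InceptionV3's last conv map for a 299x299 input is 8x8x2048,
# i.e. 64 locations; D is shrunk here to keep the demo small.
rng = np.random.default_rng(0)
L, D, H, A = 64, 32, 16, 8
features = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
W1 = rng.standard_normal((D, A))
W2 = rng.standard_normal((H, A))
v = rng.standard_normal(A)

context, weights = additive_attention(features, hidden, W1, W2, v)
print(context.shape, weights.shape)
```

Setting v to zeros gives every location the same score, so the weights become uniform and the context reduces to the mean feature vector, which is a handy sanity check.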

For Text Summary:

We used a Python library (pysummarization) that implements an LSTM-based encoder/decoder, which improves summarization accuracy through sequence-to-sequence (Seq2Seq) learning. The process:

  • The script reads the generated video transcript.
  • It summarizes the transcript in no more than three sentences.
  • It displays the summary.
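The "no more than three sentences" step can be illustrated with a much simpler technique. This sketch is not the LSTM encoder/decoder the project used: it is a frequency-based extractive stand-in that keeps the sentences whose content words recur most in the transcript. The stopword list, the naive sentence splitter, and all function names are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the project).
STOPWORDS = {"a", "an", "and", "are", "the", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "we", "you", "i", "they", "when"}

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(transcript, max_sentences=3):
    """Extractive summary: keep the max_sentences sentences whose content words
    are most frequent in the whole transcript, in their original order."""
    sentences = split_sentences(transcript)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    freq = Counter(content_words(transcript))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Stable sort: ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

transcript = ("Cats are great pets. Cats sleep most of the day. Dogs bark loudly. "
              "Many cats purr when they are happy. Birds fly south in winter.")
print(summarize(transcript))
```

An extractive summary can only reuse sentences from the transcript, whereas a Seq2Seq model can generate new wording; the cap on sentence count works the same way in both.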

Challenges we ran into

  • While transcribing voice/video tweets, our main challenge was making the noise filter adapt to different levels of noise in the audio. We managed this for audio that is not too noisy.

  • Training the image-captioning model demanded substantial computational resources, which was a problem given our limited time.
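The first challenge, making the noise filter adapt to different noise levels, can be illustrated with a simple energy-based noise gate whose threshold comes from each clip's own noise floor. This is a swapped-in illustration, not the project's filter (the write-up does not say which filter was used). The function name and the values of frame_len, noise_percentile and margin_db are assumptions.

```python
import numpy as np

def adaptive_noise_gate(signal, frame_len=512, noise_percentile=10, margin_db=6.0):
    """Silence frames whose RMS is close to the clip's own noise floor.

    The floor is estimated from the quietest frames of this clip (a low
    percentile of frame RMS), so the threshold rises for noisier recordings
    and falls for cleaner ones. Samples after the last full frame are kept.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    out = signal.copy()
    if n_frames == 0:
        return out
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    floor = np.percentile(rms, noise_percentile)
    threshold = floor * 10 ** (margin_db / 20)  # margin_db above the floor
    keep = rms > threshold                       # True for frames louder than the noise
    out[: n_frames * frame_len] = np.where(keep[:, None], frames, 0.0).ravel()
    return out

# Demo: 20 frames of background noise followed by 10 frames of a tone.
rng = np.random.default_rng(0)
frame = 512
tone = np.sin(2 * np.pi * 440 * np.arange(10 * frame) / 16_000)
for noise_level in (0.01, 0.1):
    noise = noise_level * rng.standard_normal(20 * frame)
    cleaned = adaptive_noise_gate(np.concatenate([noise, tone]), frame_len=frame)
    print(noise_level, np.count_nonzero(cleaned[: 20 * frame]))
```

A gate like this only silences the pauses between sounds; it does not remove noise that overlaps speech, which is one reason very noisy audio stays hard to transcribe.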

Accomplishments that we're proud of

We finished an impactful project over the weekend and submitted it on time, facing and solving the challenges above along the way. Most importantly, we had fun building it!

What we learned

  • How to use various APIs
  • How to make use of various open-source resources.
  • How to collaborate and work in/as a team with people of different technical backgrounds, skills, timezones, and nationalities.

What's next

  • To improve the subtitles so that they are aligned with the speech in the video files.
  • To further improve image captioning by training our model on more datasets.
  • To further improve our video/audio transcription so that it works even on noisier videos and audio.
  • To add a feature that lets users give feedback on the voice transcripts, summaries, and image captions.

Built With

  • ai
  • bulma-css-framework
  • convolutional-neural-network-(cnn)
  • csv
  • deep-learning
  • flask
  • flask-migrate
  • google-cloud-speech-to-text-api
  • google-cloud-text-to-speech-api
  • google-cloud-translate-api
  • html
  • image-analysis
  • image-captioning
  • keras
  • machine-learning
  • moviepy
  • natural-language-processing
  • pandas
  • pysummarization
  • python
  • recurrent-neural-network-(rnn)
  • sha-256
  • sqlalchemy-(orm)
  • tensorflow
  • text-summarization
  • tweepy
  • twitter
  • twitter-rest-api
  • wtforms