The ConverTwitter

Inspiration

Twitter already offers accessibility features such as alt text, adjustable color contrast, and reduced motion for in-app animations. However, Twitter can do much more to improve accessibility for people who face accessibility challenges.

What it does

Our project has the following features:

For voice tweets, the user can display written transcripts or play the audio in another language.
For videos with voice, the user can display subtitles and a written summary of the video.
For images, the user can display a written summary of the image.
For text tweets, the user can switch to a voice note that reads the original tweet aloud.
All format conversions are generated automatically whenever a user tweets. This differs from, for instance, the existing feature that lets users write image descriptions manually before they tweet an image.

How we built it

In our base file:

We fetched text, video, audio, and animated-GIF tweets from the Twitter API
We filtered the media tweets
We called the appropriate function for each tweet: transcribing voice/video tweets into text, image captioning, text summarization, or converting text tweets into audio.
We translated text tweets into different languages using the Google Translation API
We updated our database using CSV files that hold the tweets and their media URLs.
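
The filtering and CSV steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes tweets arrive as dictionaries shaped like Twitter API v1.1 tweet objects (with media under extended_entities), and the function names media_items and tweets_to_csv, as well as the CSV column names, are hypothetical.

```python
import csv
import io

# Media types reported in a v1.1 tweet's extended_entities.
MEDIA_TYPES = {"photo", "video", "animated_gif"}


def media_items(tweet):
    """Return (media_type, url) pairs for the media attached to one tweet dict."""
    media = tweet.get("extended_entities", {}).get("media", [])
    items = []
    for m in media:
        if m.get("type") not in MEDIA_TYPES:
            continue
        if m["type"] == "photo":
            url = m.get("media_url_https")
        else:
            # Videos and GIFs list several encodings; keep the highest-bitrate MP4.
            variants = [
                v for v in m.get("video_info", {}).get("variants", [])
                if v.get("content_type") == "video/mp4"
            ]
            if not variants:
                continue
            url = max(variants, key=lambda v: v.get("bitrate", 0))["url"]
        items.append((m["type"], url))
    return items


def tweets_to_csv(tweets, out):
    """Write one CSV row per media item, or one 'text' row for a plain tweet."""
    writer = csv.writer(out)
    writer.writerow(["tweet_id", "text", "media_type", "media_url"])
    for t in tweets:
        text = t.get("full_text", t.get("text", ""))
        for media_type, url in media_items(t) or [("text", "")]:
            writer.writerow([t["id_str"], text, media_type, url])
```

Each row can then be routed by its media_type column to the matching conversion (captioning for photos, transcription for videos, text-to-speech for plain text).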

To transcribe voice/video tweets into text:

We imported the voice/video from the tweet URL retrieved from the Twitter API
We performed some noise filtering on the audio of the voice/video files
We connected to Google Cloud Speech-to-Text to convert the audio into text
We produced captions and subtitles from this textual output.
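
As an illustration of the last step, here is a minimal sketch of turning a transcript into SRT-style subtitles. It assumes the transcript is available as (word, start_seconds, end_seconds) tuples; Google Cloud Speech-to-Text can return per-word time offsets, but whether the project requested them is not stated, so that input shape is an assumption. The helper names format_timestamp and words_to_srt and the max_words parameter are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group (word, start, end) tuples into numbered SRT cues of max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word for word, _, _ in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)
```

Grouping by a fixed word count keeps the sketch short; splitting on pauses or sentence boundaries would likely read better on screen.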

For image captioning:

We used an attention-based deep-learning model similar to the architecture in "Show, Attend and Tell". InceptionV3 serves as the feature extractor, and an encoder-decoder model then generates the captions. The model is trained on the MS-COCO dataset and implemented in TensorFlow. To build it:

We train the model on Colab and save the trained model
When an image URL is received, we download it if it is not already in the cache
We use the model to automatically generate a caption for the image.
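
The download-if-not-cached step can be sketched like this. It is a minimal illustration under stated assumptions: the cache is a local directory, and the cache key is the SHA-256 hash of the image URL (SHA-256 appears in the project's tool list, but the source does not say it was used for this purpose). The names cache_path, fetch_url, and get_image are hypothetical.

```python
import hashlib
import os
import urllib.request


def cache_path(url, cache_dir):
    """Map a URL to a stable file path inside cache_dir using its SHA-256 digest."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest)


def fetch_url(url):
    """Download the raw bytes behind a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_image(url, cache_dir, fetch=fetch_url):
    """Return the local path of the image, downloading it only on a cache miss."""
    path = cache_path(url, cache_dir)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as f:
            f.write(fetch(url))
    return path
```

Passing the downloader as the fetch argument keeps the caching logic testable without network access; the returned path is what would be handed to the captioning model.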

For text summary:

We used a Python library that implements an LSTM-based encoder/decoder, which improves summarization accuracy through sequence-to-sequence (Seq2Seq) learning. Process:

The script reads the generated video transcript
It summarizes the transcript in no more than three sentences
It displays the summary
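
To make the three-sentence limit concrete, here is a small stand-in sketch. It is not the LSTM Seq2Seq approach described above: it swaps in a simple word-frequency extractive summarizer so the example stays self-contained. The function names and the scoring rule are assumptions made for illustration only.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=3):
    """Keep at most max_sentences sentences, chosen by average word frequency."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return sentences
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order so the summary reads naturally.
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Calling summarize(transcript) returns a list of at most three sentences from the transcript, in their original order, ready to display.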

Challenges we ran into

When transcribing voice/video tweets into text, the challenge was making the noise filter adapt to different levels of noise in the audio file. We managed this for audio that is not too noisy.
When building image captioning, the challenge was the computational resources needed to train the model, as time was limited.

Accomplishments that we're proud of

We finished an impactful project over the weekend, in time to submit, while facing and solving the challenges above. Most importantly, we had fun building this project!

What we learned

How to use various APIs
How to make use of various open-source resources.
How to collaborate and work in/as a team with people of different technical backgrounds, skills, timezones, and nationalities.

What's next

To improve the subtitles so that they are aligned with the voice in the video files.
To further improve image captioning by training our model on more datasets.
To further improve our video/audio transcription so that it works even for noisier videos and audio.
To add a feature that lets users give feedback on the voice transcripts, the summaries, and the image captions.

Built With

ai
bulma-css-framework
convolutional-neural-network-(cnn)
csv
deep-learning
flask
flask-migrate
google-cloud-speech-to-text-api
google-cloud-text-to-speech-api
google-cloud-translate-api
html
image-analysis
image-captioning
keras
machine-learning
moviepy
natural-language-processing
pandas
pysummarization
python
recurrent-neural-network-(rnn)
sha-256
sqlalchemy-(orm)
tensorflow
text-summarization
tweepy
twitter
twitter-rest-api
wtforms

Submitted to

#Codechella

Created by

I worked on the image captioning. We used an attention-based deep-learning model: InceptionV3 is the feature extractor, and an encoder-decoder model then generates the captions. I've learnt a lot during these days and I've loved working with my teammates.

Pedro Lara Benitez
Contribution and Challenges:
1. Implementing a full-stack Twitter application using Flask and SQLAlchemy.
2. Integrating the models for each functionality into the application to make #twitter more accessible for everyone!

Freya Mehta
I worked on the speech-to-text features. It was my first time doing this and it was a fun experience.

John Pougué Biyong
Khushal Chekuri
namrata agrawal