Inspiration
A picture can paint a thousand words. But people who are visually impaired cannot see these images and miss out on important content. While the Markdown alternate text suggestor specifically targets detecting Markdown images with no alt text and generating relevant captions for them, the concept and solution can be applied to many other use cases, ultimately increasing social inclusivity. Another reason I chose this product is that alternate text contributes a lot to SEO, and having it can greatly increase a project's visibility in search engines. Solving this problem would not only improve my programming skills but also let me develop a practical and feasible solution. Lastly, I have always wanted to contribute to open source and create a market-ready product that can be released to the public to share and use. Hence, I picked this problem statement to help me achieve my goals.
What it does
The solution, AltSuggestor, is a simple yet game-changing, fully automated Markdown alternate text suggestor. With AltSuggestor, you can enhance both the social inclusiveness and the web accessibility of your project. It scans your project's README.md file, detects inline images that have no alt text, and suggests alt text for them.
How we built it
Python is the main programming language for AltSuggestor. The whole process is automated with a GitHub Actions workflow that runs whenever a pull request is created to merge a branch into the main branch. First, the README.md Markdown file is converted to HTML, which is parsed with the BeautifulSoup library. The workflow then scans the HTML and finds all the images by their <img> tag. If an <img> tag does not have an alt attribute, the image has no alternate text. Those images are fed to the Azure Cognitive Services API, which returns a relevant caption for each of them, and the results are displayed in the GitHub Actions job details.
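As a rough sketch (not the exact code in the repo), the detection and captioning steps could look something like this in Python. To keep it self-contained it uses the standard library's html.parser in place of BeautifulSoup, and it treats an empty alt attribute the same as a missing one. The caption request assumes the Computer Vision v3.2 "Describe Image" REST operation; which exact Azure API the project calls is an assumption here. The names MissingAltFinder, images_missing_alt, build_caption_request and best_caption are made up for illustration, and the endpoint and key are placeholders for your own Azure resource.

```python
import json
import urllib.request
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> whose alt attribute is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attributes = dict(attrs)
        # A valueless attribute (<img alt>) is reported as None, so fall back to "".
        alt = attributes.get("alt") or ""
        if not alt.strip():
            self.missing.append(attributes.get("src"))


def images_missing_alt(html):
    """Return the src values of images that have no usable alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


def build_caption_request(endpoint, key, image_url):
    """Build (but do not send) a caption request for one image URL.

    Assumption: the Computer Vision v3.2 Describe Image operation.
    """
    url = endpoint.rstrip("/") + "/vision/v3.2/describe?maxCandidates=1"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def best_caption(response_json):
    """Pick the highest-confidence caption from a parsed response, or None."""
    captions = response_json.get("description", {}).get("captions", [])
    if not captions:
        return None
    return max(captions, key=lambda c: c.get("confidence", 0.0))["text"]
```

To actually get a caption, you would send the request with urllib.request.urlopen, parse the response body with json.load, and pass the result to best_caption. Note that sending a URL only works for images that are publicly reachable; images referenced by a relative path in the README would need to be resolved or uploaded differently.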
Please watch the video for a better live demonstration!
Please see my GitHub for the full explanation and setup/installation steps: https://github.com/jenniupdates/AltSuggestor
Challenges we ran into
The first challenge was working out which images did or did not have alternate text. While researching alt text, I found out that it can be seen as an attribute of an image tag. So the next question was: how do we recognise an image tag? Luckily, Markdown can easily be converted to HTML, which BeautifulSoup can then parse. The next challenge was generating a relevant caption for an image's alt text. With very little knowledge of AI, I attended the available CWB AI-900 workshop/training, and it was through that training that I came across Azure Cognitive Services, which offers pre-trained models and an API that can directly generate an appropriate caption for an image. The last and hardest part was putting all of this together as a pre-merge check. I had to research the whole Git flow and how a pull or merge request is made... Creating the .yml file was even harder, and it took a lot of trial and error to get the Action/workflow right (as can be seen from the many Action workflow runs and 100 commits in my GitHub repo...). Nevertheless, I am very glad that I pulled through all of these challenges and created a fully working and feasible product. :)
Accomplishments that we're proud of
CRASH COURSE ON AZURE COGNITIVE SERVICES! -- I AM SUPER KNOWLEDGEABLE ON THIS NOW
~100 COMMITS, ~80 GITHUB ACTION/WORKFLOW RUNS, AND MANY MORE blood sweat tears research and effort on this submission
A WORKING PRODUCT THAT PEOPLE CAN ACTUALLY USE!
A product that does good and makes a difference for the visually impaired community!
What we learned
A lot on Azure services
Git workflows
GitHub Actions and Secrets - how to create and use
What's next for AltSuggestor
While OpenAI's DALL-E can generate an image from a text prompt, the reverse (generating text from an image) was not possible with it, as far as my research showed. Nevertheless, I will keep looking for ways to fine-tune the model used to generate the captions so that they become even more accurate and relevant to the images!
Please visit my GitHub: https://github.com/jenniupdates/AltSuggestor. It contains a full explanation of the project and how to set up, install and use it in your own projects!