Computer Vision Models As Service

Inspiration

I have worked for a long time in computer graphics, game development, and video editing. I enjoy using computer vision algorithms to enhance images and extract useful information from them.

What it does

This project implements several computer vision deep learning models as a service and demonstrates how these models can be used to build ML applications.

The models are deployed as microservices, and you can embed them in any application with one line of code by calling the endpoints that the Models API provides.
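
As a rough illustration of what calling an endpoint from another application could look like, here is a minimal client sketch using only the Python standard library. The URL, the route name, and the JSON field `image` are assumptions made for this example, not the project's documented API, so adjust them to match the real deployment.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint URL; the real host, port, and route depend on the deployment.
API_URL = "http://localhost:5000/predict/super-resolution"


def build_inference_request(image_bytes: bytes, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying a base64-encoded image."""
    # Assumed payload shape: {"image": "<base64 string>"}.
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enhance(image_bytes: bytes, url: str = API_URL) -> bytes:
    """Send the image to the service and return the decoded result bytes."""
    with urllib.request.urlopen(build_inference_request(image_bytes, url)) as response:
        body = json.load(response)
    # Assumed response shape: {"image": "<base64 string>"}.
    return base64.b64decode(body["image"])
```

With a service running at that address, the call from an application reduces to a single line such as `result = enhance(open("photo.png", "rb").read())`.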

Models

Currently deployed deep learning models:

Image Super Resolution

Super Resolution is a deep learning model that enhances a low-quality image into a high-quality one.

It has applications in many domains, including surveillance and security, medical imaging, and the enhancement of satellite images.

Layout Parser

Layout Parser is a deep learning model for document image analysis.

Layout Parser supports the analysis of complex documents and the processing of the hierarchical structure in their layouts. It can be used to recognize and segment figure and text regions in any document image and to extract data from them.

Project Structure

The project consists of the following:

Models API: Flask web service that provides an inference endpoint for each deep learning model. This service is the core of the project; the other services integrate with it to process and analyze input images.
Super Resolution App: Streamlit app integrated with the Models API. For each input image, it requests the single-image super resolution inference endpoint to turn a low-resolution image into a high-resolution one.
Layout Parser App: Streamlit app integrated with the Models API. It detects the different layout regions (text regions, image regions) in any document image. It can also recognize the text in text regions using the Layout OCR model and return the recognized text.
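
To make the role of the Models API more concrete, here is a minimal sketch of how one Flask inference endpoint could be structured. The route name, the JSON field `image`, and the `run_model` helper are hypothetical placeholders for illustration; the actual service may accept a different request format and calls a real model instead of returning the input unchanged.

```python
import base64
import binascii

from flask import Flask, jsonify, request

app = Flask(__name__)


def run_model(image_bytes: bytes) -> bytes:
    """Hypothetical stand-in for model inference; returns the input unchanged."""
    return image_bytes


# Assumed route name; the real service may expose a different path.
@app.route("/predict/super-resolution", methods=["POST"])
def super_resolution():
    # Assumed request shape: {"image": "<base64 string>"}.
    payload = request.get_json(silent=True)
    encoded = payload.get("image") if isinstance(payload, dict) else None
    if not isinstance(encoded, str):
        return jsonify(error="expected a JSON body with a base64 'image' field"), 400
    try:
        image_bytes = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return jsonify(error="'image' is not valid base64"), 400
    result = run_model(image_bytes)
    # Return the processed image in the same base64-in-JSON form.
    return jsonify(image=base64.b64encode(result).decode("ascii"))
```

A Streamlit app would then only need to post the uploaded image to this route and display or offer the returned bytes for download, which keeps the model code isolated in one service.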

How I built it

I built it with a Python tech stack, including these libraries and frameworks: Flask, OpenCV, TorchVision, OpenVINO, and Streamlit. I also used Docker and Docker Compose to deploy and run the ML models.

Challenges I ran into

I ran into several challenges. The first was serializing the input image so it could be sent as a parameter in the POST request to the inference endpoint, and converting the result from a NumPy array to bytes that can be downloaded to the host machine. I also had trouble building the Docker images because of conflicting versions among the Python package dependencies. Finally, the Detectron2 dependencies of the deep learning model are hard to install on Windows, which I resolved by installing and running them on a Linux distribution.

Accomplishments I am proud of

I am really proud that all the model inference endpoints work and that the Models API is integrated with a Streamlit application for each model. All the models can also be built and deployed using Docker.

What I learned

I learned a lot about containerizing and deploying ML models with Docker, and about how Docker containers streamline running the models on any machine regardless of the host OS.

Future Plan

I plan to keep improving the service and to add more computer vision models that are useful across image processing applications.

Built With

dl
docker
docker-compose
flask
layoutparser
mlops
opencv
openvino
python
streamlit
torch

Updates

Mohamad Oghli started this project — Nov 07, 2023 06:37 PM EST
