KompressAI

Problem Statement

-Deploying large-scale AI models on low-power or embedded devices is a major challenge.

These models require high computational power and large memory capacity.
They often depend on continuous cloud access, limiting their use in remote or edge-based environments.
This makes them unsuitable for cost-sensitive applications in industries such as agriculture, robotics, and healthcare.
On-device intelligence is essential in these sectors for real-time decision-making and reliability.
KompressAI addresses these challenges by enabling efficient and accessible AI model deployment.
It directly supports Sustainable Development Goal (SDG) 9 by promoting innovation and fostering sustainable infrastructure.

Enable efficient AI model deployment on resource-constrained hardware.
Simplify compression through a no-code, user-friendly interface.
Reduce power consumption and latency without sacrificing accuracy.
Democratize AI deployment for innovators, educators, and small-scale developers. ## Solution
We made a website where users are able to create projects and submit their AI models
then they are able to tell us what hardware and how they want their AI models to compress. These are the following features we include
- Hardware
- GPU's
- CPU's
- MCU's
- Edge Devices
- Optimization Level
- Lantency(to increase speed of processing)
- Size(decrease size of models)
- Accuracy (focus more on retaining accuracy)
Bassed on these the usser chooses any commbination of the two that suits his needs best
then we use our in house built AI model to predict the best way to achive the best compression for the give situation
our AI is able to understandwhat exactlly the model needs so if someone has access to good hardware and wants to increaase accuracy or latency our AI model understands it and can improve the performance of the model by increasing its size and reducing latency or increassing accuracy
and when thier are thgiht constraints on hardware our model can reduce the size of the users model by 70% while retaaining accruacy within a 0.001% margin and without increasing latency at all

We first recieve data from the usser and have the file names and configarations of the user in a config json file
we then use the json fole to acces the userrs model and test data they have sent
then we pass the model and test daata through the model allong with their prefrences and then our model compresses it
we trainned our model on variouse compression algorithms adn their advantages and disadvantages and it figures out the best way to compress the model bassed on them and then does it
the following are some of the algorithms we trainned it on
- Dynamic Quantanization
- Weight Sharring
- Unstructurred Pruning
- Layer Dropping
- JIT Fussion
- Etc

All algorithms ussed are Post training so their is no need for the usser to trian the algorithm again and it helps us stream line the process
Adaptive Model Scaling: The system performs real-time hardware profiling and automatically adjusts model size—compressing for constrained devices and expanding architectures on high-performance hardware—to maximize accuracy, throughput, and resource utilization in any deployment environment. ## TechStack
Torch
padnas
numpy
json
sci-kit-learn