Inspiration

As a software performance test engineer who recently moved into machine learning, I found this challenge intriguing. I am used to identifying CPU- and memory-related issues in applications during load tests, and the PLiOPS challenge inspired me to research and build up knowledge about LLMs, GPUs, and the KV cache.

What it does

The 'GPU Memory Enhancer' solution proposes quantizing the KV cache to reduce GPU memory usage when serving LLMs with large context windows.
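To see why this matters, here is a back-of-the-envelope sizing of the KV cache at FP16 versus 2-bit (a minimal sketch; the Llama-2-7B-like configuration and the sequence/batch sizes are illustrative assumptions, not measured figures):

```python
# KV cache bytes = 2 (K and V) x layers x kv_heads x head_dim
#                  x seq_len x batch x bits / 8
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits):
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bits / 8

# Assumed Llama-2-7B-like config: 32 layers, 32 KV heads, head_dim 128
fp16 = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8, bits=16)
int2 = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8, bits=2)
print(f"FP16 KV cache:  {fp16 / 2**30:.1f} GiB")  # 16.0 GiB
print(f"2-bit KV cache: {int2 / 2**30:.1f} GiB")  # 2.0 GiB (ignoring scale/zero-point overhead)
```

An 8x reduction on the cache alone, which is often what limits batch size and context length once the model weights are loaded.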

How we built it

The solution is based on the KIVI algorithm, a tuning-free 2-bit KV cache quantization scheme that quantizes the key cache per channel and the value cache per token.
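To illustrate the per-channel versus per-token distinction, here is a minimal asymmetric 2-bit quantization sketch (my own simplification for intuition, not the KIVI reference implementation; the tensor shapes and grouping granularity are assumptions):

```python
import torch

def quantize_2bit(x, dim):
    """Asymmetric 2-bit quantization: map values along `dim` onto 4 levels (0..3)."""
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-8) / 3  # 2 bits -> 3 quantization steps
    q = ((x - xmin) / scale).round().clamp(0, 3).to(torch.uint8)
    return q, scale, xmin

def dequantize(q, scale, xmin):
    return q.float() * scale + xmin

# Toy KV tensors shaped (num_tokens, num_channels)
keys, values = torch.randn(1024, 128), torch.randn(1024, 128)

# KIVI observes that key-cache outliers cluster in a few channels, while the
# value cache shows no such pattern, so keys are grouped per channel
# (statistics computed across tokens) and values per token (across channels).
k_q, k_scale, k_zero = quantize_2bit(keys, dim=0)    # per-channel keys
v_q, v_scale, v_zero = quantize_2bit(values, dim=1)  # per-token values

err = (dequantize(k_q, k_scale, k_zero) - keys).abs().mean()
print(f"mean key reconstruction error: {err:.4f}")
```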

Challenges we ran into

  • Building up knowledge about LLMs, token generation, vLLM, GPUs, and the KV cache
  • Gathering relevant information and articles/research papers

Accomplishments that we're proud of

  • I am extremely proud that I could learn about new ideas and topics as part of this challenge.
  • I am confident that this learning will help in further research and work.

What we learned

  1. How Large Language Models generate tokens
  2. How GPUs accelerate LLM inference, and the memory bottlenecks involved
  3. How critical the KV cache is in LLM inference (see the sketch after this list)
  4. How vLLM works
  5. How KV cache quantization can reduce memory pressure on GPUs
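To make the role of the KV cache concrete, here is a toy single-head decode loop (a minimal sketch; the dimensions and random weights are illustrative, and this is not how vLLM implements its paged attention):

```python
import torch

d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x_t):
    """One autoregressive step; x_t is the new token's hidden state, shape (1, d)."""
    q = x_t @ W_q
    k_cache.append(x_t @ W_k)  # project only the newest token; older K/V are reused
    v_cache.append(x_t @ W_v)
    K = torch.cat(k_cache)     # (t, d): the tensor that grows with context length
    V = torch.cat(v_cache)
    scores = torch.softmax(q @ K.T / d**0.5, dim=-1)
    return scores @ V          # attention output for the new token

for _ in range(16):
    decode_step(torch.randn(1, d))
print("cached K entries:", len(k_cache))  # one per generated token
```

Without the cache, every step would re-project keys and values for the entire prefix; with it, each step does only the new token's projections, but the cache grows linearly with context, and that is exactly the memory quantization shrinks.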

What's next for GPU Memory Enhancer

  • Explore more about Large Language Models, GPUs, vLLM, and KV Cache

Built With

  • gpu
  • kvcache
  • llm
  • memory
  • vllm