Inspiration

With the rapid proliferation of IoT devices and the corresponding rise in ARM-based hardware, there is increasing concern about security incidents that exploit vulnerabilities in these devices. Existing static analysis tools detect vulnerabilities by relying on known patterns, making it difficult to respond to new types of attacks. To address this limitation, we envisioned a solution leveraging machine learning algorithms that learn from both vulnerable and secure code. By doing so, we can detect potential risks beyond the scope of traditional pattern-based analysis. This motivation served as the foundation for CodeBoomBoom.

What it does

CodeBoomBoom is an AI-driven automated inspection tool that analyzes C source code and firmware for ARM and x86 devices to uncover potential security vulnerabilities. It converts C code or decompiled firmware into LLVM IR, then uses a machine learning pipeline that combines Word2Vec embeddings with an LSTM classifier to predict whether the code is vulnerable. The analysis results are presented in a PDF report or via a web interface for clear visualization. Additionally, CodeBoomBoom automatically identifies unsafe function calls or patterns and proposes remediation strategies, helping to enhance code quality while reducing development and maintenance time.
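As a rough illustration of the LLVM IR stage, the sketch below applies the preprocessing steps named under "How we built it" (dropping metadata and debug information, renaming globals, stack slots, and structure names) to textual IR and splits the result into tokens of the kind a Word2Vec model could consume. It is not the project's actual code: the regular expressions, the `@G` / `%SP` / `%struct.S` placeholder names, the reading of "flags" as attribute-group references such as `#0`, and the function name `normalize_ir` are all assumptions made for this example. It also treats the input as a single scope and does not handle `;` inside string constants.

```python
import re

# Hypothetical patterns for this sketch; real IR may need a proper parser.
IDENT = re.compile(r"[%@][\w.$-]+")
TOKEN = re.compile(r"[%@][\w.$-]+|\w+|[^\s\w]")
DBG_ATTACHMENT = re.compile(r",?\s*!\w+\s+!\d+")   # e.g. ", !dbg !12"
ATTR_GROUP = re.compile(r"\s#\d+")                 # e.g. " #0"
SKIP_PREFIXES = ("!", "source_filename", "target ", "attributes ")


def normalize_ir(ir_text):
    """Return one token list per kept IR line, with names normalized."""
    # First pass: collect the names that should be replaced by placeholders.
    global_vars = re.findall(r"^(@[\w.$-]+)\s*=", ir_text, flags=re.M)
    stack_slots = re.findall(r"(%[\w.$-]+)\s*=\s*alloca\b", ir_text)
    structs = re.findall(r"%struct\.[\w.$-]+", ir_text)
    renames = {}
    for prefix, names in (("@G", global_vars),
                          ("%SP", stack_slots),
                          ("%struct.S", structs)):
        # dict.fromkeys keeps first-seen order and removes duplicates.
        for index, name in enumerate(dict.fromkeys(names)):
            renames[name] = f"{prefix}{index}"

    # Second pass: drop metadata/debug lines, strip attachments, rename, tokenize.
    rows = []
    for raw in ir_text.splitlines():
        line = raw.split(";", 1)[0].strip()        # drop trailing comments
        if not line or line.startswith(SKIP_PREFIXES) or "@llvm.dbg." in line:
            continue
        line = DBG_ATTACHMENT.sub("", line)
        line = ATTR_GROUP.sub("", line)
        line = IDENT.sub(lambda m: renames.get(m.group(0), m.group(0)), line)
        rows.append(TOKEN.findall(line))
    return rows


SAMPLE = """\
source_filename = "demo.c"
@src = global [6 x i8] zeroinitializer
define i32 @main() #0 !dbg !7 {
  %buf = alloca [4 x i8], align 1, !dbg !9
  %p = getelementptr [4 x i8], [4 x i8]* %buf, i64 0, i64 0
  %r = call i8* @strcpy(i8* %p, i8* getelementptr ([6 x i8], [6 x i8]* @src, i64 0, i64 0)), !dbg !10
  ret i32 0
}
!7 = distinct !DISubprogram(name: "main")
"""

if __name__ == "__main__":
    for row in normalize_ir(SAMPLE):
        print(" ".join(row))
```

Function names such as `@strcpy` are deliberately left untouched in this sketch, on the assumption that calls to unsafe library functions are a useful signal for the classifier.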

How we built it

Data Collection and Preprocessing

We utilized vulnerable and secure code samples from NIST SARD (Test-Suite #112). The code was transformed into LLVM IR, followed by preprocessing steps such as removing metadata and flags, normalizing global variables, stack pointers, and structure names, and stripping debug information.

Machine Learning Model Setup

We applied Word2Vec to convert the preprocessed LLVM IR tokens into vectors. We then trained an LSTM model on these vectors, producing a classifier that labels a piece of code as vulnerable or secure.

Firmware Decompilation

We used RetDec to decompile .bin files and other firmware formats. We mounted the internal filesystems (such as squashfs) and extracted the ELF files to obtain the code required for generating LLVM IR.

Result Visualization and Reporting

Code uploaded through the web interface is analyzed automatically. The results are provided as a PDF report or through a web dashboard for ease of interpretation.

Challenges we ran into

Project Management

Coordinating schedules among five team members proved challenging. We maintained at least basic progress tracking by holding small group meetings for each role and then sharing the outcomes with the entire team.

Handling Firmware Files

Initially, we assumed .bin firmware files could be directly converted into LLVM IR, but soon realized that we needed to mount squashfs filesystems to locate ELF files. With our mentor’s guidance, we mounted squashfs and were able to resolve this issue.
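Once the squashfs filesystem is mounted or unpacked to a directory, locating the ELF files can be automated by checking file headers rather than file names. The following is a minimal sketch of that idea, not the team's actual script: the function names `elf_machine` and `find_elf_files` and the `root` directory argument are hypothetical, and only the first 20 bytes of each file are inspected (the ELF magic, the byte-order flag, and the `e_machine` field).

```python
import os
import struct

ELF_MAGIC = b"\x7fELF"
# e_machine values from the ELF specification for the architectures of interest.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_machine(path):
    """Return an architecture label for an ELF file, or None for anything else."""
    try:
        with open(path, "rb") as handle:
            header = handle.read(20)
    except OSError:
        return None
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None
    # Byte 5 (EI_DATA) is 1 for little-endian and 2 for big-endian files.
    byte_order = ">" if header[5] == 2 else "<"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return MACHINES.get(machine, f"machine-{machine}")


def find_elf_files(root):
    """Map each ELF file under root (relative path) to its architecture label."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # firmware trees often contain dangling symlinks
            arch = elf_machine(path)
            if arch is not None:
                results[os.path.relpath(path, root)] = arch
    return results
```

Calling `find_elf_files` on the extracted filesystem root would give a list of candidate binaries to hand to the decompiler, along with a hint about whether each one targets ARM or x86.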

Accomplishments that we're proud of

We established a framework that applies both static and machine learning–based analysis across diverse codebases, including ARM firmware. By utilizing an open-source decompiler (RetDec), we successfully automated the previously unfamiliar process of firmware analysis. Through official vulnerability code sets from NIST, we strengthened the model’s reliability while also providing clear guidelines for enhancing code quality.

What we learned

Decompilation Tools: We gained a deeper understanding of decompilation tools such as RetDec and learned how to effectively analyze firmware, including filesystem structures and ELF extraction.

Project Planning and Management: Over a prolonged project timeline, we realized that having a structured plan and clear role distribution from the outset is far more critical than assuming there will be sufficient time later. As complexity grows, well-defined milestones and goals are essential.

Machine Learning Model Optimization: Careful design from data preprocessing through model architecture is vital. Minimizing false positives and false negatives is crucial for real-world applicability.
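To make the false positive and false negative trade-off concrete, the sketch below shows one way such a classifier could be scored. It is an illustration only: the helper names `confusion_counts` and `error_rates` and the label convention (1 = vulnerable, 0 = secure) are assumptions, and no results from the actual model are implied.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = vulnerable)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


def _ratio(numerator, denominator):
    # Avoid division by zero when a class is absent from the sample.
    return numerator / denominator if denominator else 0.0


def error_rates(counts):
    """Derive common metrics from the confusion counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": _ratio(fp, fp + tn),  # secure code flagged as vulnerable
        "false_negative_rate": _ratio(fn, fn + tp),  # vulnerable code missed
    }
```

For a security tool, the false negative rate is the share of vulnerable samples that slip through, while the false positive rate drives how much noise developers have to triage, so both are worth tracking alongside overall accuracy.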

What's next for CodeBoomBoom

Strengthening Dynamic Analysis: We plan to integrate dynamic analysis, capturing hardware-bound packets in real time and applying machine learning to detect vulnerabilities.

Ongoing Data Updates: We aim to develop a “safety service” that automatically scans code repositories like GitHub and alerts developers if they introduce vulnerable code.

Expanding Capabilities: Beyond C/C++, we intend to broaden our scope to other languages and binary formats, improving our tool’s versatility.

Security Community Collaboration: We plan to work with global security conferences and open-source communities to boost CodeBoomBoom’s detection accuracy and real-world use cases.
