About the Project
Inspiration
Modern deep learning models continue to grow larger and more expensive to train, while real-world deployments increasingly demand faster convergence, robustness, and efficient inference on constrained hardware. This project was inspired by recent work on dendritic neural architectures, which suggests that adding structured, error-correcting branches to a standard neural network can improve learning dynamics without changing the core backbone.
Rather than designing a new detector from scratch, I focused on a practical question:
Can dendritic computation be integrated into a production-grade object detector like YOLO to improve convergence and enable efficient edge deployment?
YOLO was chosen because it is widely used in real-world systems and represents realistic engineering constraints, making it a strong testbed for evaluating dendritic optimization beyond toy datasets.
What I Built
This project implements Dendritic YOLO, a modified YOLOv8 pipeline where dendritic convolutional layers are injected into the detection head while keeping the backbone unchanged.
Each dendritic layer augments a standard convolution with multiple lightweight, parallel branches that act as error-correcting pathways. The output of a dendritic convolution is defined as:
$$ y = f(x) + \alpha \cdot \frac{1}{N} \sum_{i=1}^{N} g_i(x) $$
where:
- \( f(x) \) is the original convolution,
- \( g_i(x) \) are dendritic branches,
- \( N \) is the number of branches,
- \( \alpha \) controls the dendritic contribution.
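To make the formula concrete, here is a minimal, framework-free sketch that evaluates it on plain Python lists. The function name `dendritic_output` and the toy stand-ins for \( f \) and \( g_i \) are hypothetical; in the project itself these are convolutions, not simple scalings.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Sequence[float]], Vector]


def dendritic_output(x: Sequence[float], f: Layer,
                     branches: Sequence[Layer], alpha: float) -> Vector:
    """Compute y = f(x) + alpha * (1/N) * sum_i g_i(x), elementwise."""
    if not branches:
        raise ValueError("at least one dendritic branch is required")
    main = f(x)
    branch_outputs = [g(x) for g in branches]
    n = len(branch_outputs)
    # Average the branch outputs position by position.
    mean_branch = [sum(vals) / n for vals in zip(*branch_outputs)]
    return [m + alpha * b for m, b in zip(main, mean_branch)]


# Toy stand-ins (hypothetical): f doubles the input, each g_i scales it by k.
f = lambda x: [2.0 * v for v in x]
branches = [lambda x, k=k: [k * v for v in x] for k in (1.0, 2.0, 3.0)]

y = dendritic_output([1.0, -1.0], f, branches, alpha=0.5)         # [3.0, -3.0]
baseline = dendritic_output([1.0, -1.0], f, branches, alpha=0.0)  # [2.0, -2.0]
```

Setting \( \alpha = 0 \) recovers the output of the original layer, which is why the dendritic term can be read as a correction added on top of an unchanged convolution.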
In this implementation:
- 6 dendritic branches are used per convolution,
- branches are depthwise-separable to reduce parameter overhead,
- dendrites are injected only into YOLO’s detection head (cv3 layers),
- the backbone is frozen to reduce training cost and isolate dendritic effects.
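The parameter cost of the branches can be estimated with simple arithmetic. The sketch below assumes a 3x3 convolution with 256 input and 256 output channels and ignores bias and normalization parameters; those numbers are illustrative assumptions, since the real channel widths depend on the YOLOv8 variant.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


# Assumed layer shape, for illustration only.
C_IN, C_OUT, K, N_BRANCHES = 256, 256, 3, 6

main_conv = conv_params(C_IN, C_OUT, K)                                    # 589,824
one_branch = depthwise_separable_params(C_IN, C_OUT, K)                    # 67,840
separable_branches = N_BRANCHES * one_branch                               # 407,040
full_branches = N_BRANCHES * conv_params(C_IN, C_OUT, K)                   # 3,538,944
```

Under these assumptions, six depthwise-separable branches add roughly 69 percent of the weights of one full convolution, about 8.7 times fewer than six full convolutional branches would.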
The full pipeline consists of:
- Loading pretrained YOLOv8 weights
- Injecting dendritic convolutions into the detection head
- Training only the head and dendrites on COCO
- Running hyperparameter sweeps for learning rate
- Applying 40 percent post-training pruning for edge deployment
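The write-up does not state which pruning criterion was used. As one plausible reading, the sketch below assumes unstructured magnitude pruning: the 40 percent of weights with the smallest absolute value are set to zero. The helper `magnitude_prune` is hypothetical and operates on a flat list; PyTorch provides comparable utilities in `torch.nn.utils.prune`.

```python
from typing import List


def magnitude_prune(weights: List[float], amount: float) -> List[float]:
    """Zero out the `amount` fraction of weights with the smallest magnitude."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    n_prune = int(round(amount * len(weights)))
    # Indices sorted by ascending magnitude; the first n_prune are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6, 0.1, -0.3]
pruned = magnitude_prune(weights, amount=0.4)
sparsity = pruned.count(0.0) / len(pruned)  # 0.4
```

Zeroed weights only translate into faster edge inference when the runtime exploits sparsity or the pruning is structured (whole channels removed), so the choice of criterion matters for deployment.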
What I Learned
This project provided several key insights:
- Dendritic branches improve convergence speed, achieving higher mAP earlier than the baseline under identical settings.
- Structural inductive bias matters: small architectural changes can meaningfully affect learning dynamics with only a modest increase in model size.
- Production constraints influence research decisions, especially when working with CPU-only inference, video processing, and memory limits.
- Full end-to-end retraining is not always necessary: freezing the backbone while training dendrites and the detection head was sufficient to observe gains.
The project also strengthened my understanding of YOLO internals, PyTorch model modification, and the trade-offs between accuracy, latency, and deployability.
Challenges Faced
Several challenges emerged during development:
- Inference speed on CPU: YOLOv8l is very slow in CPU-only environments, so video inference required careful use of frame skipping and reduced input resolution.
- Video inference stability: containerized environments sometimes produced partially written video outputs, requiring explicit handling of file finalization.
- Deployment limitations: hosting interactive demos introduced constraints unrelated to model correctness, highlighting the gap between research code and production systems.
- Balancing rigor and feasibility: trade-offs were required to keep the project both technically meaningful and hackathon-appropriate.
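The first two challenges lend themselves to a small sketch. The helpers below are hypothetical and use only the standard library: one picks which frames to run the detector on, one computes a reduced resolution, and one writes an output file so that a partially written result is never visible under its final name. The stride, target size, and temp-file approach are assumptions, not the project's exact settings.

```python
import os
import tempfile
from typing import Iterator, Tuple


def frames_to_process(total_frames: int, stride: int) -> Iterator[int]:
    """Yield every `stride`-th frame index; skipped frames can reuse the last detections."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return iter(range(0, total_frames, stride))


def scaled_size(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side, keeping aspect ratio."""
    longer = max(width, height)
    if longer <= max_side:
        new_w, new_h = width, height
    else:
        new_w = width * max_side // longer
        new_h = height * max_side // longer
    # Round down to even numbers; some codecs (e.g. H.264 with 4:2:0 chroma) expect them.
    return (new_w // 2 * 2, new_h // 2 * 2)


def write_then_finalize(data: bytes, final_path: str) -> None:
    """Write to a temp file in the target directory, flush to disk, then rename atomically."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, `list(frames_to_process(10, 3))` gives `[0, 3, 6, 9]`, and `scaled_size(1920, 1080, 640)` gives `(640, 360)`. With an OpenCV-based writer, the equivalent of the flush step would be calling `VideoWriter.release()` before renaming the file.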
Takeaway
This project demonstrates that dendritic optimization can be applied to large-scale, real-world object detection models like YOLO. With careful integration, dendritic architectures can improve convergence and support aggressive compression without redesigning the entire network.
Overall, the work serves as a proof-of-concept that biologically inspired computation can coexist with modern deep learning pipelines in a practical, engineering-focused setting.