Inspiration

We were inspired by the UPJS challenge.

What it does

Our tool streamlines the analysis of microscopic data. You submit a BMP image of your microscope scan, and the system runs it through our trained models to return a classification and diagnostic insights.
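As a minimal sketch of the input step, assuming Pillow is used for image I/O (the file name is a placeholder for a user-submitted scan):

```python
import numpy as np
from PIL import Image

def load_scan(path):
    """Load a BMP microscope scan as an RGB numpy array (H, W, 3)."""
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))

# Usage (hypothetical path): load_scan("scan.bmp")
```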

How we built it

We focused on a robust machine learning pipeline consisting of four main phases:

Preprocessing: We extracted images from their frames and noted their scales to ensure data consistency.
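The deframing step could look roughly like this; the uniform 10-pixel border width and the `microns_per_pixel` parameter are illustrative assumptions, since the real frame geometry and scale annotations come from the dataset:

```python
import numpy as np

def strip_frame(image, border=10):
    """Remove a uniform border of `border` pixels from each side (assumed frame width)."""
    return image[border:-border, border:-border]

def preprocess(image, microns_per_pixel):
    """Return the deframed image together with its recorded scale."""
    return {"image": strip_frame(image), "scale_um_per_px": microns_per_pixel}
```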

Feature Extraction: We experimented with several state-of-the-art pretrained convolutional neural networks (CNNs) and Vision Transformers, including ResNet, EfficientNetV2, ConvNextV2, DinoV2 (pretrained on similar datasets), and SwinV2.

Classification & Ensembling: We trained a variety of classifiers (Logistic Regression, SVM, Naive Bayes, KNN, XGBoost, LightGBM, Random Forest, and LDA), tuned their hyperparameters via grid search, and compared them on cross-validation scores, since the dataset is rather small.
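The comparison loop can be sketched with scikit-learn; only two of the listed classifiers are shown, and the feature matrix is synthetic stand-in data rather than real backbone features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))      # stand-in for extracted backbone features
y = rng.integers(0, 2, size=60)    # stand-in labels

# Grid-search each candidate and compare best cross-validation scores.
candidates = {
    "logreg": GridSearchCV(LogisticRegression(max_iter=1000),
                           {"C": [0.1, 1.0, 10.0]}, cv=5),
    "svm": GridSearchCV(SVC(),
                        {"C": [0.1, 1.0], "kernel": ["rbf", "linear"]}, cv=5),
}

scores = {name: gs.fit(X, y).best_score_ for name, gs in candidates.items()}
best = max(scores, key=scores.get)
```

Cross-validation scores, rather than a single held-out split, drive the choice because the dataset is small.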

Augmentation: To improve generalization, we implemented data augmentation (rotations and shifts), as microscope orientation is often arbitrary.
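Because microscope orientation is arbitrary, each sample can be expanded with lossless 90-degree rotations plus small shifts; a minimal numpy sketch (shift sizes are illustrative):

```python
import numpy as np

def augment(image, shifts=(-8, 8)):
    """Yield rotated and vertically shifted variants of an (H, W, C) image."""
    out = []
    for k in range(4):                    # 0, 90, 180, 270 degrees
        rotated = np.rot90(image, k)
        out.append(rotated)
        for dy in shifts:
            out.append(np.roll(rotated, dy, axis=0))  # circular shift
    return out
```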

Challenges we ran into

One of the main hurdles was finding the right balance between complex architectures and simple classifiers. While we tested heavy-hitters like DinoV2 and SwinV2, identifying which model extracted the most relevant features for this specific task required extensive validation. In the end, the general ConvNextV2 backbone outperformed the more domain-specific approaches.

Accomplishments that we're proud of

We achieved a high weighted F1-score of 75 on the dev set by identifying ConvNextV2 as our top-performing backbone. We are also proud of our comprehensive testing suite, which allowed us to systematically compare dozens of backbone-classifier combinations.

What we learned

We hit the limits of ensembling: in our final stages, naively ensembling neural networks of varying quality did not yield further improvements. This taught us that a single, well-optimized model can be superior to a crowded "committee" of weaker ones.
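The check that exposes this kind of limit can be sketched as comparing a soft-voting committee against its best member on the same folds; the members and data below are illustrative stand-ins, not our actual models:

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 10))
y = (X[:, 0] + 0.3 * rng.normal(size=80) > 0).astype(int)

members = [("lr", LogisticRegression(max_iter=1000)),
           ("nb", GaussianNB()),
           ("dt", DecisionTreeClassifier(random_state=0))]
committee = VotingClassifier(members, voting="soft")

# Score each member and the committee on identical cross-validation folds.
single_scores = {n: cross_val_score(m, X, y, cv=5).mean() for n, m in members}
committee_score = cross_val_score(committee, X, y, cv=5).mean()
```

If `committee_score` does not beat `max(single_scores.values())`, the committee is adding noise rather than signal.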

What's next for Cry for help

We wanted to try some further methods and explorations, such as clustering the available data.
