Inspiration
Active learning is based on the notion that not every data point is equally valuable when training a model. In large datasets with many unlabeled data points, conventionally labelling them all can be expensive and extremely time-consuming, especially since labelling requires knowledgeable annotators specific to each use case.
For instance, in image classification, assuming a cost of USD $0.10 per image, labelling 100,000 images would cost around USD $10,000. These costs vary with each company's requirements and use cases. We address this pain point by focusing on the most informative samples, which can save both time and money.
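The arithmetic behind that estimate can be checked with a short sketch. The per-image price and dataset size come from the example above; the 20% labelling fraction is purely an illustrative assumption, not a measured result from Labella, and the function name is hypothetical.

```python
def labelling_cost_usd(n_images: int, cents_per_image: int) -> float:
    """Total labelling cost in USD, computed in whole cents to avoid float drift."""
    return n_images * cents_per_image / 100


# Figures from the example above: 100,000 images at USD $0.10 (10 cents) each.
full_cost = labelling_cost_usd(100_000, 10)

# Hypothetical scenario: active learning lets us label only 20% of the pool.
# The 20% figure is an assumption for illustration only.
assumed_fraction_labelled = 0.2
reduced_cost = labelling_cost_usd(int(100_000 * assumed_fraction_labelled), 10)

print(f"Label everything: ${full_cost:,.2f}")
print(f"Label the assumed 20%: ${reduced_cost:,.2f}")
```

The actual saving depends on how quickly the model reaches acceptable performance, which varies by dataset and task.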
Our inspiration to develop Labella was driven by the desire to enhance the efficiency of data labelling processes. We aimed to:
- Reduce Costs: Minimise labelling expenses by selecting only the most valuable data points for annotation. Avoid unnecessary labelling of redundant or less informative samples.
- Save Time: Speed up the labelling process by concentrating on the most uncertain or diverse data points. Enable quicker model improvements by iteratively training on the most informative samples.
- Optimise Resource Allocation: Use expert annotators efficiently by directing them to the most critical data points, and reduce their workload by decreasing the volume of data needing annotation.
Labella aims to provide a solution that leverages active learning to streamline the data labelling process, making it more cost-effective and time-efficient, and ultimately leading to better-performing models.
What it does
This platform enables users to:
- Create various machine learning projects, including image classification (single and multi-label) and sentiment analysis.
- Upload their custom dataset or choose from existing ones in each project.
- Label data with a user-friendly interface.
- Prioritise more informative data: The active learning model scores the unlabelled images and identifies the most informative ones for the user to label. Users then label these images and re-train the model with the newly labelled images, improving its accuracy and other performance measures.
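The scoring step in the last point can be sketched as follows. This is a minimal illustration that assumes entropy-based uncertainty sampling; the write-up does not state which scoring strategy Labella actually uses, and the function names and file names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_informative(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k unlabelled items whose predictions are most uncertain."""
    ranked = sorted(predictions, key=lambda name: entropy(predictions[name]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs: class probabilities for three unlabelled images.
predictions = {
    "cat_01.jpg": [0.98, 0.01, 0.01],     # confident prediction, low value to label
    "blurry_07.jpg": [0.34, 0.33, 0.33],  # near-uniform prediction, very uncertain
    "dog_12.jpg": [0.60, 0.30, 0.10],     # moderately uncertain
}

to_label_next = most_informative(predictions, k=2)
print(to_label_next)
```

Entropy is only one possible score; margin-based or diversity-based criteria would slot into the same ranking step.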
How we built it
Our team focused on two main aspects: developing a scalable, user-friendly platform and creating an active learning cycle for image classification (single and multi-label) and sentiment analysis. We integrated these into a comprehensive pipeline that involves labelling, training, and re-labelling until the user is satisfied with their model’s performance.
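The label, train, and re-label cycle described above can be sketched on a toy problem. Everything here is an assumption made for illustration: the "model" is a one-dimensional threshold, the `oracle` callable stands in for a human annotator, and the stopping target replaces the user's own judgement of when the model is good enough. It is not Labella's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def fit_threshold(labelled: Dict[float, int]) -> float:
    """Toy 'training': midpoint between the largest labelled 0 and the smallest labelled 1."""
    negatives = [x for x, y in labelled.items() if y == 0]
    positives = [x for x, y in labelled.items() if y == 1]
    return (max(negatives) + min(positives)) / 2


def active_learning_cycle(
    pool: Sequence[float],
    oracle: Callable[[float], int],
    validation: Sequence[Tuple[float, int]],
    batch_size: int = 2,
    target_accuracy: float = 0.95,
    max_rounds: int = 10,
) -> List[Tuple[int, float]]:
    """Run label -> train -> evaluate -> query rounds; return (labels used, accuracy) per round."""
    # Seed set: the two extremes of the pool, assumed to cover both classes.
    labelled = {x: oracle(x) for x in (min(pool), max(pool))}
    history = []
    for _ in range(max_rounds):
        threshold = fit_threshold(labelled)  # train on what is labelled so far
        correct = sum(int(x > threshold) == y for x, y in validation)
        accuracy = correct / len(validation)  # evaluate on held-out data
        history.append((len(labelled), accuracy))
        if accuracy >= target_accuracy:  # stand-in for "the user is satisfied"
            break
        unlabelled = [x for x in pool if x not in labelled]
        if not unlabelled:
            break
        # Query the points closest to the decision boundary (most uncertain).
        unlabelled.sort(key=lambda x: abs(x - threshold))
        for x in unlabelled[:batch_size]:
            labelled[x] = oracle(x)  # the annotator labels the queried points
    return history


# Toy data: 101 points in [0, 1]; the hidden rule is "x > 0.62".
pool = [i / 100 for i in range(101)]
oracle = lambda x: int(x > 0.62)
validation = [(i / 50, oracle(i / 50)) for i in range(51)]

for n_labels, acc in active_learning_cycle(pool, oracle, validation):
    print(f"{n_labels} labels -> validation accuracy {acc:.3f}")
```

In this toy setting the loop reaches the target after labelling only a small fraction of the pool, which is the effect the pipeline aims for on real datasets.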
Challenges we ran into
- Implementing active learning: Working out how to apply an active learning cycle to different kinds of projects.
- Choosing the most suitable active learning algorithms that balance accuracy and time: We researched and experimented with various algorithms and models (such as DeBERTa-v3 and ResNet50) to identify those that best suit each kind of machine learning project.
- Ensuring the platform can handle large-scale datasets: We leveraged cloud computing resources (AWS S3 buckets for datasets).
- Designing an intuitive user flow when integrating the GUI with the active learning process: We experimented with different user flows to find and refine the most intuitive one.
Accomplishments that we're proud of
1. Supports various machine learning tasks
Our solution supports various use cases, such as sentiment analysis and image classification.
2. Ability to handle large datasets
We ensured that Labella can handle large datasets efficiently, maintaining performance and responsiveness.
3. User-friendly interface
We have developed an intuitive and user-friendly interface for Labella, making the labelling process seamless and efficient for users. For example, we included descriptions for the different types of models.
What we learned
Our key takeaways when developing Labella:
- Customisation options are key to meeting the different requirements of different users.
- Developing a platform that supports various data types and machine learning tasks required flexible and scalable solutions.
- Each use case, such as image classification and sentiment analysis, presented unique challenges and required tailored approaches.
- As datasets grow, ensuring the platform remains fast and responsive is a significant challenge.
What's next for Labella
With further development, we envision several enhancements and expansions for Labella.
- Pretrained model selection
  - Provide users with a wider range of pretrained models to choose from for various tasks
  - Allow more customisation of pretrained models to fit specific user needs and datasets
- Broader range of use cases
  - Expand active learning capabilities to object detection, semantic segmentation, and instance segmentation tasks
- Provide detailed analytics and insights on the labelling progress
- Facilitate collaborative labelling efforts by supporting multiple annotators working on the same project with seamless integration and version control
- Allow users to view the status and estimated time to completion of training jobs
Built With
- amazon-web-services
- flask
- modal
- nextjs
- pytorch
- react
- s3
- scikit-learn
- swagger