When I was working at the ABS, we wanted to deploy an ML model, but we were not allowed to use the public cloud (AWS, GCP, etc.) due to privacy concerns. Additionally, investing in infrastructure for a single use case made zero business sense.
What it does
Our platform solves the issues above, along with one other important problem: lowering the barrier to entry for ML. We accept custom machine learning models, but we can also train a model from a single data upload.
This means ANYONE is capable of using and deploying ML models.
We have implemented Excel extensions so that the barrier to entry is even lower. Simply click and use.
We automatically infer the best model to use, along with the preprocessing that needs to be applied.
How we built it
We use existing solutions for automatic training, largely because this is a theoretically and engineering-heavy problem in its own right. We used Ludwig to infer the preprocessing that needs to be done.
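To give a feel for what Ludwig handles for us, here is a simplified sketch of automatic feature-type inference from raw column values. This is illustrative only; the function name, thresholds, and logic are our own simplification, not Ludwig's actual implementation.

```python
# Simplified sketch of automatic feature-type inference, in the spirit of
# what Ludwig does for us (illustrative only; not Ludwig's actual logic).
def infer_feature_type(values, category_threshold=10):
    """Guess a feature type from a column of raw string values."""
    non_null = [v for v in values if v not in ("", None)]
    if not non_null:
        return "text"
    if all(v.lower() in ("true", "false", "0", "1") for v in non_null):
        return "binary"
    try:
        [float(v) for v in non_null]
        return "numerical"
    except ValueError:
        pass
    # Few distinct values -> treat as a category; otherwise free text.
    if len(set(non_null)) <= category_threshold:
        return "category"
    return "text"

# Hypothetical columns from a single data upload.
columns = {
    "age": ["34", "28", "51"],
    "churned": ["true", "false", "true"],
    "plan": ["basic", "pro", "basic"],
}
config = {name: infer_feature_type(vals) for name, vals in columns.items()}
print(config)  # → {'age': 'numerical', 'churned': 'binary', 'plan': 'category'}
```

An inferred mapping like this is what lets a non-technical user go from a spreadsheet upload to a trained model without hand-writing a configuration.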
Scalr provides a working, easy, and quickly deployable PaaS for on-premises or cloud use.
Our architecture can be split into three logical components:
- User Interface, providing an easy way to monitor and deploy machine learning workloads
- Control Layer, enabling dynamic routing, load balancing and metrics
- Execution Layer, processing incoming requests, training and applying machine learning models
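The Control Layer's routing job can be sketched as a small round-robin router that skips unhealthy Execution Layer nodes. Node names and the routing policy here are illustrative assumptions, not our actual Golang controller code:

```python
from itertools import cycle

# Toy sketch of the Control Layer's job: route incoming requests across
# healthy Execution Layer nodes (names and policy are illustrative).
class Router:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(nodes)
        self._rr = cycle(self.nodes)

    def mark_down(self, node):
        # Health checks would call this when a node stops responding.
        self.healthy.discard(node)

    def route(self):
        # Round-robin over all nodes, skipping any marked unhealthy.
        for _ in range(len(self.nodes)):
            node = next(self._rr)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy execution nodes")

router = Router(["exec-1", "exec-2", "exec-3"])
router.mark_down("exec-2")
picked = [router.route() for _ in range(4)]
print(picked)  # → ['exec-1', 'exec-3', 'exec-1', 'exec-3']
```

Keeping this logic in one layer is what lets the UI and the Python workers stay simple: neither needs to know which nodes are currently alive.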
It was important that Scalr kept developer complexity to a minimum, and central to that goal is reliability. To ensure that machine learning workloads remain accessible in spite of arbitrary failures, we implemented a peer-to-peer gossip model for replicating machine learning models across nodes. This greatly increases uptime and throughput capacity.
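The replication idea can be sketched as an anti-entropy gossip loop: each round, every node exchanges its model inventory with one random peer, and newer versions win. This is a simplified simulation under assumed names, not our production code:

```python
import random

# Minimal sketch of gossip-based model replication (hypothetical simulation;
# node and model names are illustrative, not Scalr's actual code).
class Node:
    def __init__(self, name):
        self.name = name
        self.models = {}  # model_name -> version number held locally

    def gossip_with(self, peer):
        # Anti-entropy exchange: each side pulls any model the other
        # holds at a newer version (or that it lacks entirely).
        for src, dst in ((self.models, peer.models),
                         (peer.models, self.models)):
            for model, version in src.items():
                if dst.get(model, -1) < version:
                    dst[model] = version

def gossip_round(nodes):
    # Each node contacts one random peer per round.
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        node.gossip_with(peer)

nodes = [Node(f"node-{i}") for i in range(8)]
nodes[0].models["churn-model"] = 3     # a new model lands on one node

for _ in range(6):                     # a few rounds spread it cluster-wide
    gossip_round(nodes)

replicated = sum("churn-model" in n.models for n in nodes)
print(f"{replicated}/8 nodes hold the model")
```

Because the spread is epidemic, the model typically reaches every node within a handful of rounds, so any single node failure leaves plenty of replicas to serve requests.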
- User interface: React, Bootstrap
- Controller API: Golang, Docker
- Execution: Python, Sanic, Ludwig, Docker
Challenges we ran into
Creating reliable software is hard. Creating reliable distributed software that runs on many machines is very hard. In scaling our network from a single server to nearly a dozen nodes for testing, we encountered plenty of headaches. Fortunately, through this trial by fire, we had a chance to smooth out the rough edges.
Accomplishments that we're proud of
Building something from start to finish.
What we learned
A whole lot more about CORS than we wanted to know. How to create a bespoke distributed machine learning system. The value of shopping your idea to as many mentors as you can, and the non-technical considerations that should be front of mind.
What's next for Scalr
Scale! Build out monitoring, reporting, and metrics so that Scalr becomes the solution that ticks all the boxes. Expand our UI for non-technical users and double down on user experience, with established SLAs for response time and deployment uptime.