Inspiration

Machine learning has a steep learning curve, and even those with experience spend a lot of time prototyping and executing their ideas. As machine learning engineers, we realized that building a custom model pipeline is a gruelling, redundant, and time-consuming task.

After talking to some senior data scientists, we realized how much of a blocker it is to deploy and test every single model whenever they experiment with A/B testing, ensembling, or chaining of machine learning models. Eliminating these redundant tasks would let people working in the industry focus on what matters more. Building model pipelines is not easy even for experienced engineers and data scientists, and it is near impossible for people without a background in computer science. This leaves a huge void in industry and academia: educators, students, researchers, entrepreneurs, and enterprises arrive with great ideas but cannot act on them because of the high barrier to entry. So we came up with the idea of a platform that lets you build your machine learning recipes without writing a single line of code.

What it does

SimpliSmart is a universal platform that allows anyone to create their machine learning recipes. The platform lets you combine machine learning models into complex data pipelines within minutes, without code, to fulfil your machine learning needs.

SimpliSmart allows users to build and execute machine learning pipelines for use cases including, but not limited to, ensemble modelling, model A/B testing, and chaining machine learning models to get desired results. One example could be extracting the summary and keywords from photographs of a book. SimpliSmart creates a Directed Acyclic Graph (DAG) of tasks in real time according to the user's specification, which can be scheduled or manually triggered. This is done by taking the user specification, breaking it into node tasks, creating a dependency graph, and topologically sorting it to produce the final DAG.
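The specification-to-DAG step can be sketched with Python's standard-library `graphlib`; the spec format and the task names below are illustrative, not the platform's actual schema:

```python
from graphlib import TopologicalSorter

# Hypothetical user specification: each entry maps a task to the
# upstream tasks whose outputs it consumes.
spec = {
    "ocr": [],                            # extract text from book photographs
    "summarize": ["ocr"],                 # summarize the extracted text
    "keywords": ["ocr"],                  # pull keywords from the same text
    "report": ["summarize", "keywords"],  # merge both results
}

def build_execution_order(spec):
    """Topologically sort the dependency graph into a runnable order."""
    return list(TopologicalSorter(spec).static_order())

order = build_execution_order(spec)
print(order)
```

A real scheduler would execute independent nodes (here, `summarize` and `keywords`) in parallel; the flat order simply guarantees every task runs after its dependencies.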

How we built it

Django, Angular, TailwindCSS, Apache Airflow, Python, SQLite, Modzy

We used Angular to build the UI and Python/Django for the backend server. We leveraged Modzy's machine learning models and SDK to provide a framework for model pipeline creation.

We wrote adapters to expose a universal interface for all Modzy machine learning models in our platform. These models can then further be used for dynamic model pipeline generation. We built an easy-to-use UI that makes it intuitive for the end-users to build and visualize complex pipelines irrespective of their level of expertise in machine learning.
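The adapter idea can be sketched as follows; `ModzyAdapter`, its constructor arguments, and the `client.submit` call are hypothetical placeholders for illustration, not the real Modzy SDK surface:

```python
from abc import ABC, abstractmethod

class ModelAdapter(ABC):
    """Universal interface every model exposes to the pipeline,
    regardless of the vendor SDK behind it."""

    @abstractmethod
    def run(self, inputs: dict) -> dict:
        ...

class ModzyAdapter(ModelAdapter):
    """Wraps a single remote model behind the universal interface.
    The client object and identifiers here are illustrative."""

    def __init__(self, client, model_id: str, version: str):
        self.client = client
        self.model_id = model_id
        self.version = version

    def run(self, inputs: dict) -> dict:
        # In the real platform this would submit a job to the model
        # host and block on its result; here we delegate to the
        # injected client so the adapter stays testable offline.
        return self.client.submit(self.model_id, self.version, inputs)
```

Because every model looks identical to the pipeline generator, swapping one model for another (or adding a new model version) never changes the DAG-building code.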

We used Apache Airflow to build, generate, and manage the model pipelines. The unique thing we do here is build the DAGs (Directed Acyclic Graphs) at runtime based on the inputs received by the server. Every task node has three abstract steps that can be parameterized: first, the node pulls the required data from Airflow XCom (Airflow's cross-communication data store); second, it performs the required operations/computations on the data; finally, it pushes the result back to XCom. This makes the platform robust, extensible, and adaptive to any pipeline specification it may receive.
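A minimal stand-alone sketch of the three-step node, using a plain dict in place of Airflow's XCom store (the node names and operations are made up for illustration):

```python
def make_task(node_id, upstream_ids, operation):
    """Build a task callable with the three abstract steps:
    pull from the XCom store, compute, push the result back."""
    def task(xcom):
        # 1. Pull the outputs of upstream nodes.
        pulled = {uid: xcom[uid] for uid in upstream_ids}
        # 2. Run this node's computation on the pulled data.
        result = operation(pulled)
        # 3. Push the result back for downstream nodes to consume.
        xcom[node_id] = result
        return result
    return task

# Tiny two-node chain: produce some text, then count its words.
xcom = {}
make_task("ocr", [], lambda _: "the quick brown fox")(xcom)
make_task("count", ["ocr"], lambda d: len(d["ocr"].split()))(xcom)
print(xcom["count"])  # 4
```

Because every node shares this shape, the DAG generator only needs to parameterize `node_id`, the upstream list, and the operation; the pull/push plumbing stays identical across all pipelines.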

Challenges we ran into

  • Concretising the nebulous concept of dynamic model pipeline generation into scalable, extensible modules, and designing its architecture, was an interesting challenge. We had to ideate on the theory of node interaction, dependency generation, and graph creation to make the entire system as robust and resilient as possible.
  • One of the major challenges we ran into was building an adaptive and extensible interface that could incorporate all the machine learning models available on the Modzy platform while remaining easy and intuitive for the end-user. Our solution was to fetch sample requests/responses for all the available models and represent the dependent parameters in a tree structure, making the chaining of models more intuitive.
  • Dynamically generating raw code in real time according to user specifications and pipeline requirements was another critical feature, and a challenging one. It alone took a significant amount of time to ideate and implement, since it is one of the most sensitive parts of DAG generation and had to be highly robust and stable.
  • The lack of compute resources (a single-core machine with 1 GB of memory) for such a computationally heavy platform proved to be a big challenge. We approached it by optimizing the configuration of the Airflow task scheduler and webserver: by tuning its workers, parallel threads, and scheduler heartbeat, among other parameters, we improved the system's performance. We also made OS-level optimizations, using disk space as swap to compensate for the limited RAM.
  • We came across this hackathon only a week ago and were eager to implement our idea in the best possible way. That left us with very little time to ideate, design, and execute it.
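For reference, the kind of Airflow tuning described above might look like the fragment below. The values are purely illustrative (ours were tuned empirically for a 1-core / 1 GB host), and some option names differ between Airflow 1.x and 2.x:

```ini
# airflow.cfg — illustrative low-resource settings, not our exact values
[core]
executor = SequentialExecutor   # one task at a time; no worker pool overhead
parallelism = 4                 # cap concurrently running task instances

[scheduler]
scheduler_heartbeat_sec = 30    # poll less often to save CPU

[webserver]
workers = 1                     # a single gunicorn worker fits in 1 GB
```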

Accomplishments that we're proud of

We are proud to have built:

  • A general-purpose platform that lets users create, within a few minutes, model pipelines suited to their machine learning needs that would take a good machine learning engineer a few days to build.
  • An easy-to-use platform that significantly lowers the barrier to entry for creating and using machine learning pipelines, especially for users without a strong background in machine learning.
  • An adaptive and extensible platform that already incorporates all the models available on Modzy and can easily consume any new models or model versions as needed.
  • A robust and resilient system that can consume any valid user specification and generate a stable, optimized model pipeline.
  • A highly scalable containerized system that can be easily deployed and horizontally/vertically scaled.

What we learned

We learned:

  • How to use and leverage Modzy's platform and SDK to provide our users with a diverse range of best-in-class machine learning models.
  • The intricacies of the Python interpreter, local context management, and variable scoping needed to dynamically generate resilient code blocks in real time.
  • Airflow in depth: DAG management, task cross-communication, and the intricacies of dynamically generating DAGs from a single file.
  • How to leverage traditional graph theory and algorithms for inter-dependency generation and resolution of the actionable tasks.
  • How to make the most out of a minimal computation unit by optimizing OS and web server configurations.
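A minimal sketch of the scoping idea behind our dynamic code generation: running generated source in a fresh namespace so it can neither leak into nor read from the caller's scope. The `result` variable convention here is our illustrative choice, not the platform's actual contract:

```python
def run_generated(source: str, inputs):
    """Execute a dynamically generated code block in an isolated
    namespace, exposing only `inputs` and collecting `result`."""
    # A fresh dict as the globals namespace keeps the generated code
    # from seeing or mutating the caller's variables.
    namespace = {"inputs": inputs}
    code = compile(source, "<generated>", "exec")  # named for tracebacks
    exec(code, namespace)
    return namespace.get("result")

# A generated block sees only `inputs` and must assign `result`.
print(run_generated("result = sum(inputs) * 2", [1, 2, 3]))  # 12
```

Isolating each generated block this way is what made the runtime-built DAG code stable: a bug in one generated node cannot corrupt the state of another.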

What's next for SimpliSmart

We plan to:

  • Extend the platform to support custom model training pipelines. This would allow our users to train their own models on their own data just as easily, and leverage them in our inference pipeline generation system.
  • Add extensions for data lakes and make the code of the nodes editable. Introducing this feature will also require a more active validation of the pipelines created by the user.
  • Make the UI more intuitive and simpler. Rather than expecting the end-user to learn the platform, we plan to learn from the user's textual description of the pipeline they wish to create. It would require extracting the action items and generating a plan which can be consumed by the system to further generate model pipelines.