ensemble_factory

ensemble graph for mnist, SOTA accuracy!

Ensemble learning is the practice of taking multiple weak learners, and combining them to create one strong learner. Ensemble techniques are known dominate the world of data science in accuracy, with techniques such as Random Forests taking top spots in challenges, and most winning solutions involving ensembling a couple of strong learners. However, current ensembles must be designed by hand, and machine learning libraries do not support building complex heterogeneous ensembles.

To fix this, we built ensemble_factory, which allows programmers to create complex heterogeneous graphs of models combined with ensembling methods, and given a dataset, can perform AutoML to autonomously search for optimal ensemble topology. We support common ML methods such as decision trees, regressions, SVMs, etc, and a wide variety of powerful ensembling methods for boosting, stacking, and bagging. Our library also has powerful tooling for inspecting ensemble graphs, allowing data scientists to make precise bias-variance tradeoffs using the ensembling toolkit.

Astonishingly, our AutoML is able to find ensemble topologies for big datasets like MNIST that consistently outperform ensemble benchmarks such as Random Forests in test. Attached is a visualization of one such topology produced by our tool. In the future, we'd like to perform more experiments on more datasets, write a paper, and get ensemble_factory into the hands of real data scientists.