AutoML, or automated machine learning, is one of the fastest-growing subfields of machine learning. It is the process of automating the time-consuming, iterative tasks of machine learning model development, and it allows data scientists, data analysts, and developers to build ML models with high scale, efficiency, and productivity without sacrificing model quality.
Traditional machine learning model development is resource-intensive and requires significant domain knowledge, as well as time to produce and compare dozens of models. AutoML shortens the time it takes to get ML models production-ready and automates the most tedious and time-consuming parts of designing, training, and deploying an ML pipeline.
AutoML has not only made building ML models simpler and faster, but has also helped non-experts use ML models and techniques without first becoming experts themselves. If you are a data scientist, an analyst, or a developer looking to get started with AutoML, these 10 open source AutoML tools are a good place to start.
Auto-Sklearn
Auto-Sklearn is one of the most popular AutoML systems available right now, since it provides out-of-the-box supervised machine learning. Built around the scikit-learn machine learning library, auto-sklearn automatically searches for the right learning algorithm for a new machine learning dataset and optimises its hyperparameters.
It is written in Python and is a drop-in replacement for scikit-learn classifiers. As an open-source tool, all of its development is done on GitHub. Alejandro Piad Morffis notes on Twitter that auto-sklearn gives its users a black-box AutoML wrapper that abstracts away most of scikit-learn’s estimators.
Auto-WEKA
Auto-WEKA is another well-known AutoML framework, based on WEKA, a popular and widely adopted machine learning framework. Unlike Auto-Sklearn, the project is no longer in active development, but it is still widely used by the machine learning community.
Its popularity comes mainly from the way Auto-WEKA treats selecting a learning algorithm and setting its hyperparameters as a single, simultaneous problem. It solves this problem with a fully automated approach that leverages developments in Bayesian optimisation.
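To make the idea of a joint search concrete, here is a minimal sketch in plain Python. It is not Auto-WEKA code: the toy dataset, the two tiny learners, and every name in it are hypothetical, and it uses plain random search where Auto-WEKA uses Bayesian optimisation. What it shows is only the shape of the problem, where each trial picks an algorithm and that algorithm's hyperparameters together.

```python
import random

def make_data(n, rng):
    """Hypothetical toy 1-D dataset: the label is 1 when x > 0.5."""
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.5)) for x in xs]

def threshold_predict(train, x, cut):
    """Learner 1: predict 1 when x is above a fixed cut-off."""
    return int(x > cut)

def knn_predict(train, x, k):
    """Learner 2: majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

# Joint search space: every algorithm carries its own hyperparameter sampler.
SEARCH_SPACE = {
    "threshold": (threshold_predict, lambda rng: {"cut": rng.uniform(0.0, 1.0)}),
    "knn": (knn_predict, lambda rng: {"k": rng.choice([1, 3, 5, 7])}),
}

def accuracy(predict, params, train, valid):
    """Fraction of validation points that are classified correctly."""
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)

def joint_random_search(train, valid, n_trials=30, seed=0):
    """Pick an algorithm and its hyperparameters together in each trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        predict, sample = SEARCH_SPACE[name]
        params = sample(rng)
        score = accuracy(predict, params, train, valid)
        if best is None or score > best[2]:
            best = (name, params, score)
    return best

data_rng = random.Random(42)
train, valid = make_data(60, data_rng), make_data(40, data_rng)
best_name, best_params, best_score = joint_random_search(train, valid)
print(best_name, best_params, round(best_score, 3))
```

A Bayesian optimiser would replace the random sampling with a model that proposes promising configurations based on earlier trials, but the search space would keep the same structure.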
AutoKeras
AutoKeras is an open source software library developed by DATA Lab at Texas A&M University and community contributors. It provides functions to automatically search for architecture and hyperparameters of deep learning models.
It also focuses on neural architecture search using Keras and gives users a high-level API inspired by scikit-learn, where you only need to select a task-specific model. The tool also offers a low-level API to completely customise the neural search space. This tool is ideal for those building deep learning models.
AutoGluon
AutoGluon is based on Amazon’s Gluon framework and was released in January 2020. ML application developers can use this easy-to-use, open source toolkit to extend AutoML with an emphasis on deep learning and real-world applications involving text, image, or tabular data.
With AutoGluon, developers can prototype deep learning solutions in less time and improve data pipelines and models. It also enables automatic hyperparameter tuning. AutoGluon creates models for a number of use cases, from object detection and image classification to text classification and supervised learning on tabular datasets.
TPOT
TPOT, or Tree-based Pipeline Optimization Tool, is another AutoML framework based on scikit-learn. It is interesting because of its tree-based approach to pipeline optimisation, a design that makes the tool very flexible in how it combines preprocessing, feature extraction, and feature selection.
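One way to picture a tree-shaped pipeline is as a nested expression whose leaves are copies of the input data and whose inner nodes are processing steps. The sketch below is only an illustration in plain Python. The operator names and the tuple encoding are assumptions made for this example and are not TPOT's own representation or API.

```python
from statistics import mean, pstdev

def evaluate(node, rows):
    """Recursively apply a pipeline tree to a list of feature rows."""
    op = node[0]
    if op == "input":      # leaf: a copy of the raw features
        return [list(r) for r in rows]
    if op == "scale":      # standardise every column of the child's output
        data = evaluate(node[1], rows)
        stats = [(mean(c), pstdev(c) or 1.0) for c in zip(*data)]
        return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in data]
    if op == "select":     # keep the first k columns of the child's output
        k, child = node[1], node[2]
        return [r[:k] for r in evaluate(child, rows)]
    if op == "combine":    # concatenate the features from two subtrees
        left = evaluate(node[1], rows)
        right = evaluate(node[2], rows)
        return [l + r for l, r in zip(left, right)]
    raise ValueError(f"unknown operator: {op}")

# Hypothetical pipeline: scaled features joined with the first raw feature.
tree = ("combine", ("scale", ("input",)), ("select", 1, ("input",)))
rows = [[1.0, 10.0], [3.0, 30.0]]
features = evaluate(tree, rows)
print(features)
```

Because every subtree is itself a valid pipeline, a search procedure can swap or modify subtrees freely, which is where the flexibility in preprocessing and feature selection comes from.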
RECIPE
RECIPE, or REsilient ClassifIcation Pipeline Evolution, is another interesting AutoML tool that is also built on top of scikit-learn. It is unique because it overcomes drawbacks of previous evolutionary frameworks, such as the generation of invalid individuals, and organises a large number of suitable data pre-processing and classification methods into a grammar. RECIPE uses genetic programming to evolve pipelines defined by a context-free grammar, which allows for a new level of flexibility.
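The reason a grammar rules out invalid individuals can be shown with a small sketch in plain Python. The grammar and step names below are hypothetical and far smaller than anything RECIPE uses, and the code covers only the random derivation of an initial population, not the evolutionary operators. Every pipeline is derived from the grammar's start symbol, so each one is valid by construction.

```python
import random

# Hypothetical grammar: keys in angle brackets are non-terminals,
# every other string is a terminal step name.
GRAMMAR = {
    "<pipeline>": [["<preprocess>", "<classifier>"], ["<classifier>"]],
    "<preprocess>": [["<scaler>"], ["<scaler>", "<selector>"], ["<selector>"]],
    "<scaler>": [["standard_scaler"], ["min_max_scaler"]],
    "<selector>": [["select_k_best"], ["pca"]],
    "<classifier>": [["decision_tree"], ["naive_bayes"], ["logistic_regression"]],
}

def derive(symbol, rng):
    """Expand a symbol into a flat list of terminal step names."""
    if symbol not in GRAMMAR:
        return [symbol]                       # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])  # pick one rule for this non-terminal
    steps = []
    for part in production:
        steps.extend(derive(part, rng))
    return steps

def random_population(size, seed=0):
    """Derive `size` pipelines from the start symbol with a fixed seed."""
    rng = random.Random(seed)
    return [derive("<pipeline>", rng) for _ in range(size)]

CLASSIFIERS = {production[0] for production in GRAMMAR["<classifier>"]}
population = random_population(5)
for pipeline in population:
    print(" -> ".join(pipeline))
```

No derivation can place a classifier before a scaler or omit the classifier, because the grammar offers no rule that would produce such a sequence. A grammar-based genetic programming system keeps that guarantee by making crossover and mutation respect the same rules.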
Hyperopt-sklearn
Hyperopt-sklearn is a wrapper for Hyperopt, an open source library for large scale AutoML. It brings Hyperopt-based AutoML to the scikit-learn machine learning library, including its suite of data preparation transforms and classification and regression algorithms. Hyperopt-sklearn inherits the flexible search space definition of Hyperopt and gives users a drop-in replacement for a standard scikit-learn estimator.
TransmogrifAI
TransmogrifAI is an end-to-end AutoML toolkit for structured data that is written in Scala and runs on Apache Spark. The toolkit was introduced to accelerate machine learning developers’ productivity through AutoML and uses an API to enforce “compile-time type-modularity and safety.”
It is used in five areas of an ML workflow: extracting features from a given dataset or converting them into numeric values, reducing dimensions, identifying potential bias, searching across thousands of potential models, and tuning hyperparameter configurations.
H2O AutoML
H2O AutoML is built on top of H2O, an open source, distributed in-memory machine learning platform with linear scalability. It combines some of the most useful ML models in practice, including “GLMs, XGBoost, Random Forests, NNs, and several types of ensembles.”
There is also Driverless AI, another product from H2O designed specifically for automatic machine learning. It fully automates some of the most challenging and productive tasks in applied data science. Driverless AI can be used for feature engineering, model tuning, model ensembling, and model deployment.
AutoGOAL
AutoGOAL is a Python library for automatically finding the best way to solve a given task. Designed mainly for AutoML, it is being used in a number of scenarios where the developer has several possible ways to solve a given task. It comes pre-packaged with a number of low-level machine learning algorithms that can be automatically assembled into pipelines for solving different problems.
It also acts as a framework for program synthesis, which means finding the best program to solve a given problem. This works only if the user is able to describe the space of all possible programs. ML programmers are likely to appreciate it, since it offers flexibility not usually found in such tools.