Model blueprint is our core technology
At DataRobot, we pride ourselves on being the best automated machine learning (AML) platform on the market. Our platform runs many state-of-the-art open source algorithms in parallel and deploys the best models in real time. To achieve full automation, DataRobot searches through millions of combinations of modeling approaches and selects the top models to run. Each modeling approach is what we call a model blueprint: a unique sequence of data processing, feature engineering, algorithm training, algorithm tuning, and more.
Traditionally, when people think about modeling, they think about training an algorithm such as XGBoost, Random Forest, or a GLM. However, it takes more than training an algorithm to build a great model. In fact, algorithm training is only a small piece of the puzzle. Data processing and feature engineering are often overlooked, even though they are essential to building a great model and are much harder to master than algorithm training.
With the help of our Kaggle-top-ranked data scientists, DataRobot built a comprehensive, best-in-class machine learning framework to help anyone develop and deploy great models regardless of data science skill level. The diagram below is an example of a model blueprint generated automatically by DataRobot for training a Regularized Logistic Regression algorithm:
The model blueprint clearly outlines the process DataRobot went through, from the uploaded data set all the way to the finished Regularized Logistic Regression model. Users can see exactly which feature processing and engineering steps took place under the hood before the Regularized Logistic Regression algorithm was trained at the end.
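To make the idea of a blueprint as an ordered sequence of steps concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation or API: the step names, the choice of median imputation and standardization, the gradient-descent trainer, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import median

def impute_median(rows):
    """Data processing: fill missing values (None) with the column median."""
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def standardize(rows):
    """Feature engineering: rescale each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def fit_logistic_l2(rows, labels, l2=0.1, lr=0.5, epochs=500):
    """Algorithm training: L2-regularized logistic regression via gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad_w[j] += (p - y) * x[j]
            grad_b += p - y
        # The L2 penalty shrinks the weights; the bias is left unpenalized.
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical blueprint: named preprocessing steps, applied in order.
PREPROCESSING = [
    ("Missing values imputed", impute_median),
    ("Standardize", standardize),
]

def run_blueprint(rows, labels):
    """Run every step in order and record which steps were executed."""
    executed = []
    for name, step in PREPROCESSING:
        rows = step(rows)
        executed.append(name)
    weights, bias = fit_logistic_l2(rows, labels)
    executed.append("Regularized Logistic Regression")
    return {"steps": executed, "features": rows, "weights": weights, "bias": bias}

# Toy data (made up): two numeric features, some values missing.
rows = [[1.0, 10.0], [2.0, None], [3.0, 12.0],
        [8.0, 11.0], [9.0, 13.0], [10.0, None]]
labels = [0, 0, 0, 1, 1, 1]

result = run_blueprint(rows, labels)
print(" -> ".join(result["steps"]))
```

Recording the executed step names alongside the fitted model is what lets the sketch report how the data was handled, not just which algorithm was trained.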
Model blueprint is the key to making models transparent
DataRobot built a successful business around integrating community-driven, open source technologies and machine learning algorithms. These algorithms are extremely powerful and help reveal insights from massive amounts of data, but some of them are hard to interpret because of their complexity. These are often referred to as “black box” models. Their opacity makes them one of the biggest roadblocks to machine learning adoption: both business executives and regulators hesitate to allow black box models in production.
Model blueprint is a key step toward addressing this problem. Each model blueprint is a sequence of building blocks that helps answer a number of important questions, including:
- How does it process data?
- What features did it engineer?
- What algorithms did it train?
For every building block, DataRobot shares full documentation that contains:
- A detailed description of the block
- Default parameter settings and choices
- External links and references to original sources
This complete set of documentation not only helps new users learn from the automation process, but also helps experienced users justify the models DataRobot builds.
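One way to picture documented building blocks is as small records that carry their own description, defaults, and references. The sketch below is purely illustrative: `BuildingBlock`, `questions_answered`, `describe`, the block names, the parameter values, and the placeholder reference strings are all hypothetical and do not reflect DataRobot's actual data model.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BuildingBlock:
    """One documented step in a blueprint (hypothetical structure)."""
    name: str
    stage: str  # "data processing", "feature engineering", or "algorithm training"
    description: str
    default_params: dict = field(default_factory=dict)
    references: tuple = ()

def questions_answered(blueprint):
    """Group block names by stage: how data was processed, which features
    were engineered, and which algorithms were trained."""
    answers = {}
    for block in blueprint:
        answers.setdefault(block.stage, []).append(block.name)
    return answers

def describe(blueprint):
    """Render the per-block documentation as plain text, in blueprint order."""
    lines = []
    for i, block in enumerate(blueprint, start=1):
        lines.append(f"{i}. {block.name} [{block.stage}]")
        lines.append(f"   {block.description}")
        for key, value in block.default_params.items():
            lines.append(f"   default {key} = {value!r}")
        for ref in block.references:
            lines.append(f"   see: {ref}")
    return "\n".join(lines)

# Illustrative blueprint; every value below is an assumption, not a real default.
blueprint = [
    BuildingBlock(
        name="Median imputation",
        stage="data processing",
        description="Fill missing numeric values with the column median.",
        default_params={"strategy": "median"},
    ),
    BuildingBlock(
        name="Standardize",
        stage="feature engineering",
        description="Rescale numeric features to zero mean and unit variance.",
    ),
    BuildingBlock(
        name="Regularized Logistic Regression",
        stage="algorithm training",
        description="Fit a logistic regression with an L2 penalty.",
        default_params={"penalty": "l2"},
        references=("(placeholder for a link to the original source)",),
    ),
]

print(describe(blueprint))
```

Keeping the documentation on the block itself means the same record can serve a new user reading the description and an experienced user checking the defaults behind a model.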
As the creator of automated machine learning, DataRobot values model transparency. That’s why model blueprint is so important. It ensures that each and every step of the model-building process is valued and addressed, which eliminates the confusion and mystery of the black box effect. This attention to detail and commitment to transparency are unique to DataRobot – just like our automated machine learning technology and expertise.
About the Author
Cliff Yang is a Customer Facing Data Scientist at DataRobot. He has over 10 years of experience working in a number of verticals, including Insurance, Banking, Healthcare, and Technology. His roles have spanned data science, tech sales, product development, and program management.