MLFlow + DataRobot API for Tracking Experimentation


As illustrated below, you will use the orchestration notebook to design and run the experiment notebook, with permutations of parameters handled automatically. At the end of the experiments, copies of the experiment notebook will be available, with the outputs for each permutation for collaboration and reference.

Experimentation is a routine part of any machine learning developer's work. For time series projects, the number of parameters and settings to tune to reach the best model is itself a vast search space.

About this Accelerator

Many of the experiments in time series use cases are common and repeatable. Tracking these experiments and logging their results is a task that needs streamlining. Manual errors and time limitations can lead to the selection of suboptimal models, leaving better models undiscovered.

The integration of the DataRobot API, Papermill, and MLFlow automates machine learning experimentation so that it becomes easier, more robust, and simple to share.


Run the mlflow ui command in the same directory to get the dashboard.

1. Use MLFlow with the DataRobot API for Experimentation and Logging

This notebook provides a framework that showcases the integration of MLFlow and Papermill to track machine learning experiments with DataRobot.

This framework outlines how to:

  • Use MLFlow with the DataRobot API to track and log ML experiments
    • Benefit: Consistent comparison of results across experiments
  • Use Papermill with the DataRobot API to create artifacts from machine learning experiments to reduce effort needed for collaboration
    • Benefit: Automation of experiments to avoid errors and reduce manual effort
  • Execute Jupyter notebooks with parameters like Python scripts
  • Loop through parameter combinations to run multiple projects; build a Model Factory.

This notebook is the experimentation notebook for running individual time series experiments. Papermill is used to receive parameters from the main notebook (orchestration_notebook.ipynb) and run a copy of this notebook for each combination of the parameters.

The experiment notebook doesn’t require any updates, as the parameters are passed from the main notebook. However, it will need to be updated for a different modeling approach such as AutoML or unsupervised learning.


Bind inputs

FDW = 35
KIA = False
UUID = str("bcf6c090-1899-11ed-a7a1-f018981f05a4")
ACC_OPT = False
SRCH_INT = False
MODE = "quick"
TRAINING_DATA = "./DR_Demo_Sales_Multiseries_training (1).xlsx"
DATE_COL = "Date"
TRAINING_STOP_DATE = "01-06-2014"
DR_AUTH_YAML_FILE = "~/.config/datarobot/drconfig.yaml"
TARGET_COL = "Sales"
KIA_COLS = ["Marketing", "Near_Xmas", "Near_BlackFriday", "Holiday", "DestinationEvent"]

Import libraries

import matplotlib.pyplot as plt
import mlflow
import numpy as np
import pandas as pd
from permetrics.regression import RegressionMetric  # permetrics library for simplifying metric calculation

Connect to DataRobot

Read more about different options for connecting to DataRobot from the client.

# Authenticate to your DataRobot instance
import os

import datarobot as dr
import yaml

# Expand "~" so the path resolves to the home directory before opening
with open(os.path.expanduser(DR_AUTH_YAML_FILE), "r") as cred_file:
    credentials = yaml.safe_load(cred_file)

DATAROBOT_API_TOKEN = credentials["token"]
DATAROBOT_ENDPOINT = credentials["endpoint"]

client = dr.Client(
    token=DATAROBOT_API_TOKEN,
    endpoint=DATAROBOT_ENDPOINT,
    user_agent_suffix="AIA-AE-MLF-1",  # Optional but helps DataRobot improve this workflow
)

dr.client._global_client = client

Import training data

df = pd.DataFrame()
if TRAINING_DATA.find(".csv") != -1:
    df = pd.read_csv(TRAINING_DATA, parse_dates=[DATE_COL])
elif TRAINING_DATA.find(".xls") != -1:
    df = pd.read_excel(TRAINING_DATA, parse_dates=[DATE_COL])
else:
    # Unsupported file type: fall back to an empty DataFrame
    df = pd.DataFrame()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7140 entries, 0 to 7139
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Store             7140 non-null   object        
 1   Date              7140 non-null   datetime64[ns]
 2   Sales             7140 non-null   int64         
 3   Store_Size        7140 non-null   int64         
 4   Num_Employees     7140 non-null   int64         
 5   Returns_Pct       7140 non-null   float64       
 6   Num_Customers     7140 non-null   int64         
 7   Pct_On_Sale       7130 non-null   float64       
 8   Marketing         7140 non-null   object        
 9   Near_Xmas         7140 non-null   int64         
 10  Near_BlackFriday  7140 non-null   int64         
 11  Holiday           7140 non-null   object        
 12  DestinationEvent  7140 non-null   object        
 13  Pct_Promotional   7140 non-null   float64       
 14  Econ_ChangeGDP    80 non-null     float64       
 15  EconJobsChange    1020 non-null   float64       
 16  AnnualizedCPI     240 non-null    float64       
dtypes: datetime64[ns](1), float64(6), int64(6), object(4)
memory usage: 948.4+ KB

Private holdout

Set a cutoff date for private holdout. This is necessary to enable the same holdout for all experiments irrespective of feature derivation windows and forecast windows.
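
As a small illustration of why a fixed cutoff matters, the sketch below parses the cutoff string with an explicit day-first format and splits a handful of dates into training and holdout rows. The sample dates and the helper name `split_at_cutoff` are hypothetical and exist only for this example; the notebook itself performs the equivalent filtering with pandas.

```python
from datetime import datetime

# Cutoff string and format specifier as given in this accelerator's settings
TRAINING_STOP_DATE = "01-06-2014"
DATE_FORMAT = "%d-%m-%Y"  # day-month-year: 1 June 2014, not 6 January 2014

cutoff = datetime.strptime(TRAINING_STOP_DATE, DATE_FORMAT)


def split_at_cutoff(dates, cutoff):
    """Return (training, holdout): dates strictly before the cutoff vs. the rest."""
    training = [d for d in dates if d < cutoff]
    holdout = [d for d in dates if d >= cutoff]
    return training, holdout


# Hypothetical sample dates around the cutoff
sample_dates = [
    datetime(2014, 5, 30),
    datetime(2014, 5, 31),
    datetime(2014, 6, 1),
    datetime(2014, 6, 2),
]

training, holdout = split_at_cutoff(sample_dates, cutoff)
print(len(training), len(holdout))
```

Because the split depends only on the date, every experiment is scored on the same holdout rows regardless of its feature derivation window.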

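training_stop_date = pd.to_datetime(
    TRAINING_STOP_DATE, format="%d-%m-%Y"  # day-month-year, as in the use case settings
)
df_train = df[df[DATE_COL] < training_stop_date]
df_train.info()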
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7000 entries, 0 to 7125
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Store             7000 non-null   object        
 1   Date              7000 non-null   datetime64[ns]
 2   Sales             7000 non-null   int64         
 3   Store_Size        7000 non-null   int64         
 4   Num_Employees     7000 non-null   int64         
 5   Returns_Pct       7000 non-null   float64       
 6   Num_Customers     7000 non-null   int64         
 7   Pct_On_Sale       6990 non-null   float64       
 8   Marketing         7000 non-null   object        
 9   Near_Xmas         7000 non-null   int64         
 10  Near_BlackFriday  7000 non-null   int64         
 11  Holiday           7000 non-null   object        
 12  DestinationEvent  7000 non-null   object        
 13  Pct_Promotional   7000 non-null   float64       
 14  Econ_ChangeGDP    80 non-null     float64       
 15  EconJobsChange    1000 non-null   float64       
 16  AnnualizedCPI     230 non-null    float64       
dtypes: datetime64[ns](1), float64(6), int64(6), object(4)
memory usage: 984.4+ KB


Create a DataRobot project

# Upload data and create a new DataRobot project
project = dr.Project.create(df_train, project_name="Repex_" + UUID)


Configure project settings

Set up time series settings for the newly created project.

known_in_advance = KIA_COLS
feature_settings = [
    dr.FeatureSettings(feat_name, known_in_advance=True)
    for feat_name in known_in_advance
]

time_partition = dr.DatetimePartitioningSpecification(
    datetime_partition_column=DATE_COL,
    use_time_series=True,
    feature_derivation_window_start=-1 * FDW,
    feature_derivation_window_end=0,  # assumed: window ends at the forecast point
)

if KIA:
    time_partition.feature_settings = feature_settings

if IS_MULTISERIES:
    time_partition.multiseries_id_columns = MULTISERIES_COLS

advanced_options = dr.AdvancedOptions(
    accuracy_optimized_mb=ACC_OPT, autopilot_with_feature_discovery=SRCH_INT
)

Initiate Autopilot

After creating settings objects, Autopilot is started using the analyze_and_model function.
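
A minimal sketch of how this step might look, assuming the `project`, `time_partition`, and `advanced_options` objects created above and the `analyze_and_model` and `wait_for_autopilot` methods of the DataRobot Python client. The wrapper name `start_autopilot` is hypothetical and not part of the accelerator.

```python
def start_autopilot(project, target, mode, partitioning, advanced_options):
    """Start Autopilot on an existing project and block until it finishes.

    `project` is assumed to be a datarobot.Project; the remaining arguments
    are the target column name, the Autopilot mode, and the settings objects
    built in the previous cells.
    """
    project.analyze_and_model(
        target=target,
        mode=mode,
        partitioning_method=partitioning,
        advanced_options=advanced_options,
    )
    # Wait for Autopilot to complete before asking for a recommended model
    project.wait_for_autopilot()
    return project
```

In the experiment notebook this would be called as `start_autopilot(project, TARGET_COL, MODE, time_partition, advanced_options)`.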

After Autopilot completes, get the recommended model from DataRobot.

recommendation = dr.ModelRecommendation.get(project.id)
recommended_model = recommendation.get_model()
recommended_model

DatetimeModel('eXtreme Gradient Boosted Trees Regressor with Early Stopping (learning rate =0.3)')

Performance validation

Create the private holdout from the original dataset and get predictions from the DataRobot recommended model. Once predictions are available, they are compared to actuals using regression metrics.
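
To make the comparison concrete, here is a plain-Python sketch of three of the regression metrics logged later (MAE, RMSE, MAPE). The actuals and predictions are hypothetical values chosen for illustration. The notebook relies on the permetrics library instead, and library implementations may differ in details such as scaling or zero handling.

```python
import math


def mae(actuals, preds):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / len(actuals)


def rmse(actuals, preds):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actuals, preds)) / len(actuals))


def mape(actuals, preds):
    """Mean absolute percentage error as a fraction; actuals must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actuals, preds)) / len(actuals)


# Hypothetical actuals and predictions, for illustration only
actuals = [100.0, 200.0, 400.0]
preds = [110.0, 180.0, 400.0]

print(mae(actuals, preds), rmse(actuals, preds), mape(actuals, preds))
```

Because every experiment is scored on the same holdout rows with the same metrics, the resulting numbers are directly comparable across runs.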

dataset = project.upload_dataset(df, forecast_point=training_stop_date)
pred_job = recommended_model.request_predictions(dataset.id)
preds = pred_job.get_result_when_complete()
preds["timestamp"] = pd.to_datetime(preds["timestamp"], utc=True)
df[DATE_COL] = pd.to_datetime(df[DATE_COL], utc=True)
preds.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   row_id             130 non-null    int64              
 1   prediction         130 non-null    float64            
 2   forecast_distance  130 non-null    int64              
 3   forecast_point     130 non-null    object             
 4   timestamp          130 non-null    datetime64[ns, UTC]
 5   series_id          130 non-null    object             
dtypes: datetime64[ns, UTC](1), float64(1), int64(2), object(2)
memory usage: 6.2+ KB
if IS_MULTISERIES:
    df_comparison = df[MULTISERIES_COLS + [DATE_COL, TARGET_COL]].merge(
        preds[["prediction", "timestamp", "series_id"]],
        left_on=MULTISERIES_COLS + [DATE_COL],
        right_on=["series_id", "timestamp"],
    )
else:
    df_comparison = df[[DATE_COL, TARGET_COL]].merge(
        preds[["prediction", "timestamp"]], left_on=[DATE_COL], right_on=["timestamp"]
    )
# Every prediction row should be matched to an actual
assert df_comparison.shape[0] == preds.shape[0]
df_comparison.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 130 entries, 0 to 129
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype              
---  ------      --------------  -----              
 0   Store       130 non-null    object             
 1   Date        130 non-null    datetime64[ns, UTC]
 2   Sales       130 non-null    int64              
 3   prediction  130 non-null    float64            
 4   timestamp   130 non-null    datetime64[ns, UTC]
 5   series_id   130 non-null    object             
dtypes: datetime64[ns, UTC](2), float64(1), int64(1), object(2)
memory usage: 7.1+ KB
# Plot actuals vs. predicted for visual verification
if IS_MULTISERIES:
    df_viz = df_comparison[
        df_comparison["series_id"] == df_comparison.series_id.unique()[0]
    ]
    plt.plot(df_viz["timestamp"], df_viz["prediction"], label="Prediction", color="red")
    plt.plot(
        df_viz["timestamp"], df_viz["Sales"], label="Actuals", color="blue", alpha=0.5
    )
# Validate experiment performance
evaluator = RegressionMetric(
    df_comparison[TARGET_COL].values, df_comparison["prediction"].values
)

Tracking and logging experiments

Log experiment metrics and parameters for display and comparison on the MLFlow UI.

with mlflow.start_run():
    mlflow.log_param("Project URL", project.get_uri())  # URL for DataRobot Project
    mlflow.log_param(
        "Notebook Location", REFERENCE_NOTEBOOK
    )  # location of final notebook for reference
    mlflow.log_param("Feature Derivation Window", FDW)  # feature derivation used
    mlflow.log_param(
        "Enabled Known In Advance features", KIA
    )  # known in advance setting
    mlflow.log_param(
        "Ran Accuracy Optimized BPs", ACC_OPT
    )  # accuracy optimized setting
    mlflow.log_param(
        "Enabled Search Interactions option", SRCH_INT
    )  # search for interactions setting
    mlflow.log_param("Autopilot Mode", MODE)  # autopilot mode

    mlflow.log_artifact(REFERENCE_NOTEBOOK)  # location of final notebook for reference

    # logging model performance metrics
    mlflow.log_metric("MASE", evaluator.MASE())
    mlflow.log_metric("MAPE", evaluator.MAPE())
    mlflow.log_metric("RMSE", evaluator.RMSE())
    mlflow.log_metric("MAE", evaluator.MAE())
    mlflow.log_metric("R2", evaluator.R2())
    mlflow.log_metric("Support", preds.shape[0])

2. Integrate MLFlow and Papermill to Track ML Experiments with DataRobot

This notebook outlines how to:

  • Use MLFlow with DataRobot API to track and log machine learning experiments
    • Benefit: Consistent comparison of results across experiments
  • Use Papermill with DataRobot API to create artifacts from machine learning experiments to reduce effort needed for collaboration
    • Benefit: Automation of experiments to avoid errors and reduce manual effort.
  • Execute Jupyter notebooks with parameters like Python scripts
  • Loop through parameter combinations to run multiple projects; build a Model Factory.

This orchestration notebook illustrates the framework to integrate MLFlow and Papermill with the DataRobot API to run the experiment notebook with different parameters per experiment.

This notebook runs experiment_notebook.ipynb once for each set of parameters.

Required Python Libraries:


Import libraries

uuid is used to generate unique identifiers for our experimentation. itertools is used to generate permutations of all experiments.

import itertools
import os
import uuid

import papermill as pm

Use the snippet below to create requisite folders.

if not os.path.isdir("./experiments_bkup"):
    os.mkdir("./experiments_bkup")

Configure use case settings

These are the basic settings needed to run Time Series projects through the DataRobot API. These settings have to be updated for the intended use case.

DR_AUTH_YAML_FILE = "~/.config/datarobot/drconfig.yaml"  # yaml file with authentication details
TRAINING_DATA = "./DR_Demo_Sales_Multiseries_training (1).xlsx"  # location of training dataset
DATE_COL = "Date"  # datetime column
TRAINING_STOP_DATE = "01-06-2014"  # cutoff date for private holdout for experiments
TRAINING_STOP_DATE_FORMAT = "%d-%m-%Y"  # datetime format specifier for TRAINING_STOP_DATE (variable name assumed)
TARGET_COL = "Sales"  # target column for the use case
KIA_COLS = [
    "Marketing", "Near_Xmas", "Near_BlackFriday", "Holiday", "DestinationEvent",
]  # known in advance features
IS_MULTISERIES = True  # does the dataset have multiple time series
MULTISERIES_COLS = [
    "Store",
]  # if the dataset has multiple ts, columns that uniquely identify a ts.


Many experiments are worth trying in Time Series projects. The most basic ones vary the feature derivation window and toggle known in advance features. These two parameters alone can produce at least six different experiments, as shown by the example in the cell below.
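
To see how quickly the grid grows, the combinations can be enumerated with `itertools.product`. The window values below are hypothetical placeholders chosen for illustration; three windows and two known in advance settings give six experiments.

```python
import itertools

# Hypothetical values; replace with the windows relevant to your use case
fdws = [7, 14, 35]  # feature derivation window values
kias = [False, True]  # known in advance features disabled / enabled

# Cartesian product: every window paired with every known in advance setting
experiments = list(itertools.product(fdws, kias))

for fdw, kia in experiments:
    print(f"FDW={fdw}, KIA={kia}")

print(len(experiments))
```

Each additional parameter multiplies the number of runs, which is why the later sections add new parameter values selectively.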

First experiment series set

This example starts with a basic set of experiments to quickly identify whether the dataset has any signal. You will use a combination of feature derivation windows and known in advance features to do so.

fdws = [
]  # The Time Series feature derivation window parameter values to experiment
kias = [False, True]  # The known in advance parameter values to experiment with

Run multiple projects for all permutations of the values from the above two parameter sets. This can be seen as a “DataRobot Project Factory” where you run multiple projects using Papermill. Papermill lets you pass parameters to a Jupyter notebook and execute it with those parameters. It also saves a copy of each executed notebook in a specified folder.

INPUT_PATH = "./experiment_notebook.ipynb"
for item in itertools.product(fdws, kias):
    UUID = str(uuid.uuid1())
    OUTPUT_PATH = "./experiments_bkup/experiment_{}.ipynb".format(UUID)
    pm.execute_notebook(
        INPUT_PATH,
        OUTPUT_PATH,
        parameters={
            "FDW": item[0],
            "KIA": item[1],
            "UUID": UUID,
            "DATE_COL": DATE_COL,
            "TARGET_COL": TARGET_COL,
            "KIA_COLS": KIA_COLS,
        },
    )

Experiment results

After the above set of experiments completes, open the MLFlow dashboard to review the results. Run the cell below, or run its contents from the command line, to start the MLFlow server and UI.

# Stop the execution of this cell before running the next cells
!mlflow ui

Further experimentation

Once comfortable with the initial set of experiments and results, you can further expand the experiment combinations as below. The advantage of parameterization of the notebook is that you can run only the experiments that are needed and you can keep building on the experiments you already ran.

For example, the accuracy optimized blueprints option is set to False by default, so the experiments in the prior cells have already covered that setting. You can save time and compute by using only the True option for that parameter in subsequent experiments.
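
One way to avoid repeating work is to keep a record of the combinations already run and filter them out of the next grid. The sketch below assumes such a record is kept as a set of tuples; the `completed` set and its contents are hypothetical.

```python
import itertools

# Hypothetical record of (FDW, KIA, ACC_OPT) combinations already run
completed = {(35, False, False), (14, False, False)}

fdws = [35, 14]
kias = [False]
acc_opt = [False, True]

candidates = itertools.product(fdws, kias, acc_opt)

# Keep only the combinations that have not been run yet
pending = [combo for combo in candidates if combo not in completed]
print(pending)
```

Here only the two runs with accuracy optimized blueprints enabled remain, which matches the effect of passing just `[True]` for that parameter.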

# Import datarobot library for the enums
import datarobot as dr
fdws = [35, 14]  # TS feature derivation window parameter values to experiment
kias = [False]  # Known in advance parameter values to experiment
acc_opt = [True]  # Enable accuracy optimized blueprints
search_int = [True]  # Search for interactions between features
mode = [dr.enums.AUTOPILOT_MODE.FULL_AUTO]  # Autopilot mode values to experiment
INPUT_PATH = "./experiment_notebook.ipynb"
for item in itertools.product(fdws, kias, acc_opt, search_int, mode):
    UUID = str(uuid.uuid1())
    OUTPUT_PATH = "./experiments_bkup/experiment_{}.ipynb".format(UUID)
    pm.execute_notebook(
        INPUT_PATH,
        OUTPUT_PATH,
        parameters={
            "FDW": item[0],
            "KIA": item[1],
            "ACC_OPT": item[2],
            "SRCH_INT": item[3],
            "MODE": item[4],
            "UUID": UUID,
            "DATE_COL": DATE_COL,
            "TARGET_COL": TARGET_COL,
            "KIA_COLS": KIA_COLS,
        },
    )

!mlflow ui
