DataRobot R&D Archives | DataRobot AI Platform
https://www.datarobot.com/blog/category/datarobot-rd/

2023: A Year of Innovation and Impact
https://www.datarobot.com/blog/2023-a-year-of-innovation-and-impact/ | Thu, 21 Dec 2023
Explore the 2023 highlights: discover the DataRobot advances in Generative AI and the expansion of our global partner ecosystem.

2023 brings to a close a remarkable year.

Generative AI made AI a household name and ushered in one of the most exciting and fast-paced periods of technological advancement in recent history.

And we’ve been busy at DataRobot bringing to market the newest cutting-edge innovations with generative AI to accelerate impact for our customers around the world. 

This year, we “capped a comeback” and delivered three value-packed launches to significantly advance the DataRobot AI Platform. We unveiled new enterprise-grade generative AI functionality to close the confidence gap and accelerate adoption, including generative AI application cost and performance monitoring, a unified observability console and registry for governance, a multi-provider comparison playground, and other enhancements that deliver greater transparency and governance.

We also strengthened our partner ecosystem to provide maximum optionality and flexibility for our customers, announcing new capabilities and solutions with SAP, Microsoft, Google Cloud, and AWS. 

Finally, we celebrated industry recognition for our market-leading platform, including most recently being named a Leader in the inaugural IDC MarketScape: Worldwide AI Governance Platforms 2023 Vendor Assessment. Of the 10 AI/ML platform vendors evaluated, DataRobot was one of just four leaders, with the report citing strengths including extensive experience in regulated industries and continuous innovation.

A Leader like DataRobot sets the bar for AI at a time when both expertise in governance and rapid innovation are paramount – particularly for organizations with the most stringent regulatory requirements, like banking and healthcare. DataRobot understands the unique challenges and requirements businesses face today, with an AI platform designed to avoid silos and vendor lock-in, and the tools that help customers accelerate AI deployments and ROI.
Ritu Jyoti

Group Vice President, Worldwide Artificial Intelligence and Automation Research Practice, IDC

Why DataRobot?

DataRobot is the fastest path to AI value, empowering organizations to accelerate AI from idea to impact. 

With over a decade at the forefront of AI innovation, we know what it takes to make a real difference – to your bottom line, to your business vision, and to the world around us. 

Organizations across industries and geographies like CVS Health, Freddie Mac, Aflac, Warner Bros., BMW, and the U.S. Army trust DataRobot to help solve their biggest challenges with AI, leveraging generative and predictive capabilities today while providing the flexibility to adapt to the innovations of tomorrow. 

We’re incredibly proud of the work we do to help organizations across industries to deliver real-world value from AI solutions. 

We use DataRobot because it just works. We can rapidly build new AI use cases and have the optionality we need, especially with generative AI, that otherwise would not be possible.
Frederique De Letter

Senior Director Business Insights & Analytics, Keller Williams

 

What’s next for DataRobot in 2024?

Next year, we are laser-focused on continuing to push the boundaries of innovation, listening to our customers and building what is needed most to deliver value for your organizations. 

Thank you to our customers and partners for the privilege of working together and delivering more and more impact with AI, every day.

Thank you to Robots around the world for always pushing to be better than yesterday. Here’s to 2024!


Deep Dive into JITR: The PDF Ingesting and Querying Generative AI Tool
https://www.datarobot.com/blog/deep-dive-into-jitr-the-pdf-ingesting-and-querying-generative-ai-tool/ | Thu, 07 Dec 2023
Learn how to utilize LLMs to answer user questions based on ingested PDFs at runtime. Accelerate generative AI innovation and real-world value using DataRobot’s GenAI Accelerators.

Motivation

Accessing, understanding, and retrieving information from documents are central to countless processes across various industries. Whether you work in finance, healthcare, or a mom-and-pop carpet store, or are a student at a university, there are situations where you face a large document that you need to read through to answer questions. Enter JITR, a game-changing tool that ingests PDF files and leverages LLMs (Large Language Models) to answer user queries about the content. Let’s explore the magic behind JITR.

What Is JITR?

JITR, which stands for Just In Time Retrieval, is one of the newest tools in DataRobot’s GenAI Accelerator suite designed to process PDF documents, extract their content, and deliver accurate answers to user questions and queries. Imagine having a personal assistant that can read and understand any PDF document and then provide answers to your questions about it instantly. That’s JITR for you.

How Does JITR Work?

Ingesting PDFs: The initial stage involves ingesting a PDF into the JITR system. Here, the tool converts the static content of the PDF into a digital format ingestible by the embedding model. The embedding model converts each sentence in the PDF file into a vector. This process creates a vector database of the input PDF file.

Applying your LLM: Once the content is ingested, the tool calls the LLM. LLMs are state-of-the-art AI models trained on vast amounts of text data. They excel at understanding context, discerning meaning, and generating human-like text. JITR employs these models to understand and index the content of the PDF.

Interactive Querying: Users can then pose questions about the PDF’s content. The LLM fetches the relevant information and presents the answers in a concise and coherent manner.

Benefits of Using JITR

Every organization produces a variety of documents that are generated in one department and consumed by another. Often, retrieval of information for employees and teams can be time-consuming. Utilization of JITR improves employee efficiency by reducing the review time of lengthy PDFs and providing instant and accurate answers to their questions. In addition, JITR can handle any type of PDF content, which enables organizations to embed and utilize it in different workflows without concern for the input document. 

Many organizations may not have the resources and expertise in software development to develop tools that utilize LLMs in their workflow. JITR enables teams and departments that are not fluent in Python to convert a PDF file into a vector database as context for an LLM. By simply having an endpoint to send PDF files to, JITR can be integrated into any web application such as Slack (or other messaging tools), or external portals for customers. No knowledge of LLMs, Natural Language Processing (NLP), or vector databases is required.

Real-World Applications

Given its versatility, JITR can be integrated into almost any workflow. Below are some of the applications.

Business Reports: Professionals can swiftly get insights from lengthy reports, contracts, and whitepapers. Similarly, the tool can be integrated into internal processes, enabling employees and teams to interact with internal documents.  

Customer Service: From understanding technical manuals to diving deep into tutorials, JITR can enable customers to interact with manuals and documents related to the products and tools. This can increase customer satisfaction and reduce the number of support tickets and escalations. 

Research and Development: R&D teams can quickly extract relevant and digestible information from complex research papers to implement state-of-the-art techniques in their products or internal processes.

Alignment with Guidelines: Many organizations have guidelines that should be followed by employees and teams. JITR enables employees to retrieve relevant information from the guidelines efficiently. 

Legal: JITR can ingest legal documents and contracts and answer questions based on the information provided in the input documents.

How to Build the JITR Bot with DataRobot

The workflow for building a JITR Bot is similar to the workflow for deploying any LLM pipeline using DataRobot. The two main differences are:

  1. Your vector database is defined at runtime
  2. You need logic to handle an encoded PDF

For the latter we can define a simple function that takes an encoding and writes it back to a temporary PDF file within our deployment.

```python
def base_64_to_file(b64_string, filename: str = 'temp.PDF', directory_path: str = "./storage/data") -> str:
    """Decode a base64 string into a PDF file."""
    import codecs  # needed for codecs.decode below
    import os

    if not os.path.exists(directory_path):
        os.makedirs(directory_path)
    file_path = os.path.join(directory_path, filename)
    with open(file_path, "wb") as f:
        f.write(codecs.decode(b64_string, "base64"))
    return file_path
```

With this helper function defined, we can go through and write our hooks. “Hooks” is just a fancy term for functions with a specific name. In our case, we just need to define a hook called `load_model` and another hook called `score_unstructured`. In `load_model`, we’ll set the embedding model we want to use to find the most relevant chunks of text, as well as the LLM we’ll ping with our context-aware prompt.

```python
def load_model(input_dir):
    """Custom model hook for loading our knowledge base."""
    import os

    import datarobot_drum as drum
    from langchain.chat_models import AzureChatOpenAI
    from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

    try:
        # Pull credentials from deployment
        key = drum.RuntimeParameters.get("OPENAI_API_KEY")["apiToken"]
    except ValueError:
        # Pull credentials from environment (when running locally)
        key = os.environ.get('OPENAI_API_KEY', '')

    embedding_function = SentenceTransformerEmbeddings(
        model_name="all-MiniLM-L6-v2",
        cache_folder=os.path.join(input_dir, 'storage/deploy/sentencetransformers'),
    )

    # The remaining OPENAI_* values (deployment name, API base, version, organization)
    # are assumed to be defined elsewhere in the notebook as module-level constants.
    llm = AzureChatOpenAI(
        deployment_name=OPENAI_DEPLOYMENT_NAME,
        openai_api_type=OPENAI_API_TYPE,
        openai_api_base=OPENAI_API_BASE,
        openai_api_version=OPENAI_API_VERSION,
        openai_api_key=key,  # credential resolved in the try/except above
        openai_organization=OPENAI_ORGANIZATION,
        model_name=OPENAI_DEPLOYMENT_NAME,
        temperature=0,
        verbose=True,
    )

    return llm, embedding_function
```

Ok, so we have our embedding function and our LLM. We also have a way to take an encoding and get back to a PDF. So now we get to the meat of the JITR Bot, where we’ll build our vector store at run time and use it to query the LLM.

```python
def score_unstructured(model, data, query, **kwargs) -> str:
    """Custom model hook for making completions with our knowledge base.

    When requesting predictions from the deployment, pass a dictionary
    with the following keys:
    - 'question' the question to be passed to the retrieval chain
    - 'document' a base64 encoded document to be loaded into the vector database

    datarobot-user-models (DRUM) handles loading the model and calling
    this function with the appropriate parameters.

    Returns:
    --------
    rv : str
        Json dictionary with keys:
            - 'question' user's original question
            - 'answer' the generated answer to the question
    """
    import json
    import os

    from langchain.chains import ConversationalRetrievalChain
    from langchain.document_loaders import PyPDFLoader
    from langchain.vectorstores.base import VectorStoreRetriever
    from langchain.vectorstores.faiss import FAISS

    llm, embedding_function = model
    DIRECTORY = "./storage/data"
    temp_file_name = "temp.PDF"

    data_dict = json.loads(data)

    # Write encoding to file
    base_64_to_file(data_dict['document'].encode(), filename=temp_file_name, directory_path=DIRECTORY)

    # Load up the file
    loader = PyPDFLoader(os.path.join(DIRECTORY, temp_file_name))
    docs = loader.load_and_split()

    # Remove file when done
    os.remove(os.path.join(DIRECTORY, temp_file_name))

    # Create our vector database
    texts = [doc.page_content for doc in docs]
    metadatas = [doc.metadata for doc in docs]
    db = FAISS.from_texts(texts, embedding_function, metadatas=metadatas)

    # Define our chain
    retriever = VectorStoreRetriever(vectorstore=db)
    chain = ConversationalRetrievalChain.from_llm(
        llm,
        retriever=retriever,
    )

    # Run it
    response = chain(inputs={'question': data_dict['question'], 'chat_history': []})
    return json.dumps({"result": response})
```

With our hooks defined, all that’s left to do is deploy our pipeline so that we have an endpoint people can interact with. To some, the process of creating a secure, monitored, and queryable endpoint out of arbitrary Python code may sound intimidating, or at least time-consuming, to set up. Using the drx package, we can deploy our JITR Bot in one function call.

```python
import datarobotx as drx

# `now` is assumed to be a timestamp string defined earlier in the notebook,
# used here to give the deployment a unique name.
deployment = drx.deploy(
    "./storage/deploy/",  # Path with embedding model
    name=f"JITR Bot {now}",
    hooks={
        "score_unstructured": score_unstructured,
        "load_model": load_model,
    },
    extra_requirements=["pyPDF"],  # Add a package for parsing PDF files
    environment_id="64c964448dd3f0c07f47d040",  # GenAI Dropin Python environment
)
```

How to Use JITR

Ok, the hard work is over. Now we get to enjoy interacting with our newfound deployment. Through Python, we can again take advantage of the drx package to answer our most pressing questions.

```python
import base64
import io

import requests

# Find a PDF
url = "https://s3.amazonaws.com/datarobot_public_datasets/drx/Instantnoodles.PDF"
resp = requests.get(url).content
encoding = base64.b64encode(io.BytesIO(resp).read())  # encode it

# Interact
response = deployment.predict_unstructured(
    {
        "question": "What does this say about noodle rehydration?",
        "document": encoding.decode(),
    }
)['result']

# Sample response:
# {'question': 'What does this say about noodle rehydration?',
#  'chat_history': [],
#  'answer': 'The article mentions that during the frying process, many tiny holes are created
#             due to mass transfer, and they serve as channels for water penetration upon
#             rehydration in hot water. The porous structure created during frying facilitates
#             rehydration.'}
```

But more importantly, we can hit our deployment in any language we want since it’s just an endpoint. Below, I show a screenshot of me interacting with the deployment right through Postman. This means we can integrate our JITR Bot into essentially any application we want by just having the application make an API call.

[Screenshot: interacting with the JITR Bot deployment through Postman]
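For reference, here is a minimal sketch of what such a call could look like in plain Python. The host, route, and auth headers below are illustrative placeholders, not the exact values for your deployment; copy the real prediction URL and keys from the deployment in DataRobot.

```python
# Illustrative sketch only: the host, route, and auth headers are placeholders.
# Copy the real prediction URL and keys from your deployment in DataRobot.
import base64
import requests

PREDICTION_URL = "https://example.datarobot.com/predApi/v1.0/deployments/<deployment-id>/predictionsUnstructured"
HEADERS = {
    "Authorization": "Bearer <api-token>",
    "DataRobot-Key": "<datarobot-key>",  # only needed on managed cloud prediction servers
    "Content-Type": "application/json",
}

with open("my_document.pdf", "rb") as f:
    encoded_pdf = base64.b64encode(f.read()).decode()

resp = requests.post(
    PREDICTION_URL,
    headers=HEADERS,
    json={"question": "What is this document about?", "document": encoded_pdf},
)
print(resp.json())
```

Any tool that can make an HTTP POST request, from Postman to a Slack bot, can follow the same pattern.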

Once embedded in an application, using JITR is very easy. For example, in the Slackbot application used at DataRobot internally, users simply upload a PDF with a question to start a conversation related to the document. 

JITR makes it easy for anyone in an organization to start driving real-world value from generative AI, across countless touchpoints in employees’ day-to-day workflows. Check out this video to learn more about JITR. 

Things You Can Do to Make the JITR Bot More Powerful

In the code I showed, we ran through a straightforward implementation of the JITR Bot, which takes an encoded PDF and makes a vector store at runtime in order to answer questions. Since they weren’t relevant to the core concept, I opted to leave out a number of bells and whistles we implemented internally with the JITR Bot, such as:

  • Returning context-aware prompt and completion tokens
  • Answering questions based on multiple documents
  • Answering multiple questions at once
  • Letting users provide conversation history
  • Using other chains for different types of questions
  • Reporting custom metrics back to the deployment

There’s also no reason why the JITR Bot has to work only with PDF files! So long as a document can be encoded and converted back into a string of text, we could build more logic into our `score_unstructured` hook to handle any file type a user provides, as sketched below.
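As a rough illustration of that idea, the snippet below dispatches to different LangChain loaders based on a hypothetical file-type field supplied alongside the document. The field name and the loader mapping are not part of the JITR accelerator; they are just one way such an extension could look.

```python
# Hypothetical extension: pick a LangChain loader based on a user-supplied file type.
# The "file_type" field and this mapping are illustrative, not part of the JITR accelerator.
from langchain.document_loaders import Docx2txtLoader, PyPDFLoader, TextLoader

LOADERS = {
    "pdf": PyPDFLoader,
    "txt": TextLoader,
    "docx": Docx2txtLoader,
}

def load_any_document(file_path: str, file_type: str = "pdf"):
    """Load and split a document with the loader registered for its declared type."""
    loader_cls = LOADERS.get(file_type.lower(), PyPDFLoader)  # default to PDF
    return loader_cls(file_path).load_and_split()
```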

Start Leveraging JITR in Your Workflow

JITR makes it easy to interact with arbitrary PDFs. If you’d like to give it a try, you can follow along with the notebook here.


Improve Model Performance with DataRobot Sliced Insights
https://www.datarobot.com/blog/improve-model-performance-with-datarobot-sliced-insights/ | Thu, 31 Aug 2023
Model segmentation unveils hidden performance insights. DataRobot's Sliced Insights allows focused evaluation for better decisions.

There are countless metrics that help data scientists better understand model performance. But model accuracy metrics and diagnostic charts, despite their usefulness, are all aggregations — they can obscure critical information about situations in which a model might not perform as expected. We might build a model that has a high overall accuracy, but unknowingly underperforms in specific scenarios, akin to how a vinyl record may appear whole, but has scratches that are impossible to discover until you play a specific portion of the record. 

Any person who uses models — from data scientists to executives — may need more details to decide whether a model is truly ready for production and, if it’s not, how to improve it. These insights may lie within specific segments of your modeling data. 

Why Model Segmentation Matters

In many cases, building separate models for different segments of the data will yield better overall model performance than the “one model to rule them all” approach.

Let’s say that you are forecasting revenue for your business. You have two main business units: an Enterprise/B2B unit and a Consumer/B2C unit. You might start by building a single model to forecast overall revenue. But when you measure your forecast quality, you may find that it’s not as good as your team needs it to be. In that situation, building a model for your B2B unit and a separate model for your B2C unit will likely improve the performance of both.

By splitting a model up into smaller, more specific models trained on subgroups of our data, we can develop more specific insights, tailor the model to that distinct group (population, SKU, etc.), and ultimately improve the model’s performance. 

This is particularly true if:

  1. Your data has natural clusters — like your separate B2B and B2C units.
  2. You have groupings that are imbalanced in the dataset. Larger groups in the data can dominate smaller ones, and a model with high overall accuracy might be masking lower performance for subgroups. If your B2B business makes up 80% of your revenue, your “one model to rule them all” approach may be wildly off for your B2C business, but this fact gets hidden by the relative size of your B2B business. 

But how far do you go down this path? Is it helpful to further split the B2B business by each of 20 different channels or product lines? Knowing that a single overall accuracy metric for your entire dataset might hide important information, is there an easy way to know which subgroups are most important, or which subgroups are suffering from poor performance? What about the insights – are the same factors driving sales in both the B2B and B2C businesses, or are there differences between those segments? To guide these decisions, we need to quickly understand model insights for different segments of our data — insights related to both performance and model explainability. DataRobot Sliced Insights make that easy. 

DataRobot Sliced Insights, now available in the DataRobot AI Platform, allow users to examine model performance on specific subsets of their data. Users can quickly define segments of interest in their data, called Slices, and evaluate performance on those segments. They can also quickly generate related insights and share them with stakeholders. 

How to Generate Sliced Insights

Sliced Insights can be generated entirely in the UI — no code required. First, define a Slice based on up to three Filters: numeric or categorical features that define a segment of interest. By layering multiple Filters, users can define custom groups that are of interest to them. For instance, if I’m evaluating a hospital readmissions model, I could define a custom Slice based on gender, age range, the number of procedures a patient has had, or any combination thereof.

[Screenshot: defining a custom Slice in DataRobot]

After defining a Slice, users generate Sliced Insights by applying that Slice to the primary performance and explainability tools within DataRobot: Feature Effects, Feature Impact, Lift Chart, Residuals, and the ROC Curve.

[Screenshot: Feature Impact in DataRobot]

This process is frequently iterative. As a data scientist, I might start by defining Slices for key segments of my data — for example, patients who were admitted for a week or longer versus those who stayed only a day or two. 

From there, I can dig deeper by adding more Filters. In a meeting, my leadership may ask me about the impact of preexisting conditions. Now, in a couple of clicks, I can see the effect this has on my model performance and related insights. Toggling back and forth between Slices leads to new and different Sliced Insights. For more in-depth information on configuring and using Slices, visit the documentation page.

Case Study: Hospital No-Shows

I was recently working with a hospital system that had built a patient no-show model. The performance looked pretty accurate: the model distinguished the patients at lowest risk for no-show from those at higher risk, and it looked well-calibrated (the predicted and actual lines closely follow one another). Still, they wanted to be sure it would drive value for their end-user teams when they rolled it out.

[Screenshot: Lift Chart in the DataRobot AI Platform]

The team believed that there would be very different behavioral patterns between departments. They had a few large departments (Internal Medicine, Family Medicine) and a long tail of smaller ones (Oncology, Gastroenterology, Neurology, Transplant). Some departments had a high rate of no-shows (up to 20%), whereas others rarely had no-shows at all (<5%). 

They wanted to know whether they should be building a model for each department or if one model for all departments would be good enough.

Using Sliced Insights, it quickly became clear that building one model for all departments was the wrong choice. Because of the class imbalance in the data, the model fit the large departments well and had a high overall accuracy that obscured poor performance in small departments. 

Slice: Internal Medicine

[Screenshot: Lift Chart for the Internal Medicine Slice]
The model fit well for the Internal Medicine department, which was large.

Slice: Gastroenterology

[Screenshot: predictions for the Gastroenterology Slice]
The model fit extremely poorly for a smaller department, Gastroenterology, generating predictions that were often far from the true values.

As a result, the team chose to limit the scope of their “general” model to only the departments where they had the most data and where the model added value. For smaller departments, the team used domain expertise to cluster departments based on the types of patients they saw, then trained a model for each cluster. Sliced Insights guided this medical team to build the right set of groups and models for their specific use case, so that each department could realize value.

Sliced Insights for Better Model Segmentation

Sliced Insights help users evaluate the performance of their models at a deeper level than by looking at overall metrics. A model that meets overall accuracy requirements might consistently fail for important segments of the data, such as for underrepresented demographic groups or smaller business units. By defining Slices and evaluating model insights in relation to those Slices, users can more easily determine if model segmentation is necessary or not, quickly surface these insights to communicate better with stakeholders, and, ultimately, help organizations make more informed decisions about how and when a model should be applied. 
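To make the underlying idea concrete outside of DataRobot, here is a small, self-contained sketch in plain pandas and scikit-learn (not the Sliced Insights API) showing how an overall metric can hide a weak segment. The tiny dataset and department names are made up for illustration.

```python
# Conceptual illustration only (plain pandas/scikit-learn, not the DataRobot Slices API):
# an overall metric can look fine while one segment performs much worse.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.DataFrame({
    "department": ["Internal Medicine"] * 6 + ["Gastroenterology"] * 4,
    "actual":     [1, 0, 0, 1, 0, 0, 1, 0, 1, 1],
    "predicted":  [0.9, 0.2, 0.1, 0.8, 0.3, 0.2, 0.4, 0.6, 0.3, 0.5],
})

print("Overall AUC:", roc_auc_score(df["actual"], df["predicted"]))
for department, slice_df in df.groupby("department"):
    # Per-slice metric: the equivalent of evaluating a model on one Slice at a time
    print(department, roc_auc_score(slice_df["actual"], slice_df["predicted"]))
```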


Optimizing Large Language Model Performance with ONNX on DataRobot MLOps
https://www.datarobot.com/blog/optimizing-large-language-model-performance-with-onnx-on-datarobot-mlops/ | Thu, 01 Jun 2023
Learn how to easily convert a pre-trained foundation model from Tensorflow or PyTorch to ONNX for better performance. Speed up model inference on DataRobot MLOps with native ONNX support.

In our previous blog post we talked about how to simplify the deployment and monitoring of foundation models with DataRobot MLOps. We took a pre-trained model from HuggingFace using Tensorflow, and we wrote a simple inference script and uploaded the script and the saved model as a custom model package to DataRobot MLOps. We then easily deployed the pre-trained foundation model on DataRobot servers in just a few minutes.

In this blog post, we’ll showcase how you can effortlessly gain a significant improvement in the inference speed of the same model while decreasing its resource consumption. In our walkthrough, you’ll learn that the only thing needed is to convert your language model to the ONNX format. The native support for the ONNX Runtime in DataRobot will take care of the rest.

Why Are Large Language Models Challenging for Inference?

Previously, we talked about what language models are. The neural architecture of large language models can have billions of parameters. Having a huge number of parameters means these models will be hungry for resources and slow to predict. Because of this, they are challenging to serve for inference with high performance. In addition, we want these models to not only process one input at a time, but also process batches of inputs and consume them more efficiently. The good news is that we have a way of improving their performance and throughput by accelerating their inference process, thanks to the capabilities of the ONNX Runtime.

What Is ONNX and the ONNX Runtime?

ONNX (Open Neural Network Exchange) is an open standard format for representing machine learning (ML) models built in various frameworks such as PyTorch, Tensorflow/Keras, and scikit-learn. ONNX Runtime is also an open source project that’s built on the ONNX standard. It’s an inference engine optimized to accelerate the inference process of models converted to the ONNX format across a wide range of operating systems, languages, and hardware platforms. 

[Figure: ONNX Runtime Execution Providers. Source: ONNX Runtime]

ONNX and its runtime together form the basis of standardizing and accelerating model inference in production environments. Through certain optimization techniques1, ONNX Runtime accelerates model inference on different platforms, such as mobile devices, cloud, or edge environments. It provides an abstraction by leveraging these platforms’ compute capabilities through a single API interface. 

In addition, by converting models to ONNX, we gain the advantage of framework interoperability as we can export models trained in various ML frameworks to ONNX and vice versa, load previously exported ONNX models into memory, and use them in our ML framework of choice. 
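As a minimal illustration of that interoperability, the snippet below loads an exported ONNX file with ONNX Runtime and runs a single prediction. The file name and input shape are placeholders for your own model.

```python
# Minimal sketch: load an ONNX model and run inference with ONNX Runtime.
# "model.onnx" and the dummy input shape are placeholders for your own artifact.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy_input = np.random.rand(1, 8).astype(np.float32)  # shape depends on your model

outputs = sess.run(None, {input_name: dummy_input})  # None means "return all outputs"
print(outputs[0])
```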

Accelerating Transformer-Based Model Inference with ONNX Runtime

Various benchmarks executed by independent engineering teams in the industry have demonstrated that transformer-based models can significantly benefit from ONNX Runtime optimizations to reduce latency and increase throughput on CPUs. 

Some examples include Microsoft’s work around BERT model optimization using ONNX Runtime2, Twitter benchmarking results for transformer CPU inference in Google Cloud3, and sentence transformers acceleration with Hugging Face Optimum4.

[Figure: quantized BERT inference benchmark. Source: Microsoft5]

These benchmarks demonstrate that we can significantly increase throughput and performance for transformer-based NLP models, especially through quantization. For example, in Microsoft team’s benchmark above, the quantized BERT 12-layer model with Intel® DL Boost: VNNI and ONNX Runtime can achieve up to 2.9 times performance gains. 
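For readers who want to experiment with quantization themselves, ONNX Runtime ships post-training quantization tooling. A minimal sketch, with placeholder paths, might look like this:

```python
# Sketch of post-training dynamic quantization with ONNX Runtime's tooling.
# Paths are placeholders; actual speedups depend on the model and target hardware.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",
    model_output="model-quantized.onnx",
    weight_type=QuantType.QInt8,  # store weights as 8-bit integers
)
```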

How Does DataRobot MLOps Natively Support ONNX?

For your modeling or inference workflows, you can integrate your custom code into DataRobot with these two mechanisms:  

  1. As a custom task: While DataRobot provides hundreds of built-in tasks, there are situations where you need preprocessing or modeling methods that are not currently supported out-of-the-box. To fill this gap, you can bring a custom task that implements a missing method, plug that task into a blueprint inside DataRobot, and then train, evaluate, and deploy that blueprint in the same way as you would for any DataRobot-generated blueprint. You can review how the process works here.
  2. As a custom inference model: This might be a pre-trained model or user code prepared for inference, or a combination of both. An inference model can have a predefined input/output schema for classification/regression/anomaly detection or be completely unstructured. You can read more details on deploying your custom inference models here.

In both cases, in order to run your custom models on the DataRobot AI Platform with MLOps support, you first select one of our public_dropin_environments such as Python3 + PyTorch, Python3 + Keras/Tensorflow or Python3 + ONNX. These environments each define the libraries available in the environment and provide a template. Your own dependency requirements can be applied to one of these base environments to create a runtime environment for your custom tasks or custom inference models. 

A bonus perk of DataRobot execution environments is that if you have a single model artifact and your model conforms to certain input/output structures (meaning you do not need a custom inference script to transform the input request or the raw predictions), you do not even need to provide a custom script in your uploaded model package. In the ONNX case, if you only want to predict with a single .onnx file that conforms to the structured specification, simply select the Python3+ONNX base environment for your custom model package and DataRobot MLOps will know how to load the model into memory and predict with it. To learn more and get your hands on easy-to-reproduce examples, please visit the custom inference model templates section in our custom models repository. 

Walkthrough

After reading all this information about the performance benefits and the relative simplicity of implementing models through ONNX, I’m sure you’re more than excited to get started.

To demonstrate an end-to-end example, we’ll perform the following steps: 

  1. Grab the same foundation model in our previous blog post and save it on your local drive.
  2. Export the saved Tensorflow model to ONNX format.
  3. Package the ONNX model artifact along with a custom inference (custom.py) script.
  4. Upload the custom model package to DataRobot MLOps on the ONNX base environment.
  5. Create a deployment.
  6. Send prediction requests to the deployment endpoint.

For brevity, we’ll only show the additional and modified steps you’ll perform on top of the walkthrough from our previous blog, but the end-to-end implementation is available on this Google Colab notebook under the DataRobot Community repository. 

Converting the Foundation Model to ONNX

For this tutorial, we’ll use the transformers library’s ONNX conversion tool to convert our question-answering LLM into the ONNX format, as below.

```python
FOUNDATION_MODEL = "bert-large-uncased-whole-word-masking-finetuned-squad"

!python -m transformers.onnx --model=$FOUNDATION_MODEL --feature=question-answering $BASE_PATH
```
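Optionally, you can sanity-check the exported artifact before packaging it. The snippet below is a small sketch that assumes the exporter wrote model.onnx into BASE_PATH and that the onnx package is installed.

```python
# Optional sanity check on the exported artifact (assumes `onnx` is installed and
# that the exporter wrote model.onnx into BASE_PATH).
import os
import onnx

onnx_model = onnx.load(os.path.join(BASE_PATH, "model.onnx"))
onnx.checker.check_model(onnx_model)  # raises if the graph is malformed
print("Exported ONNX graph looks valid.")
```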


Modifying Your Custom Inference Script to Use Your ONNX Model

For inferencing with this model on DataRobot MLOps, we’ll have our custom.py script load the ONNX model into memory in an ONNX runtime session and handle the incoming prediction requests as follows:

```python
%%writefile $BASE_PATH/custom.py
"""
Copyright 2021 DataRobot, Inc. and its affiliates.
All rights reserved.
This is proprietary source code of DataRobot, Inc. and its affiliates.
Released under the terms of DataRobot Tool and Utility Agreement.
"""
import json
import os
import io

from transformers import AutoTokenizer
import onnxruntime as ort
import numpy as np
import pandas as pd


def load_model(input_dir):
    global model_load_duration
    onnx_path = os.path.join(input_dir, "model.onnx")
    tokenizer_path = os.path.join(input_dir)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
    sess = ort.InferenceSession(onnx_path)
    return sess, tokenizer


def _get_answer_in_text(output, input_ids, idx, tokenizer):
    answer_start = np.argmax(output[0], axis=1)[idx]
    answer_end = (np.argmax(output[1], axis=1) + 1)[idx]
    answer = tokenizer.convert_tokens_to_string(
        tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
    )
    return answer


def score_unstructured(model, data, query, **kwargs):
    global model_load_duration
    sess, tokenizer = model
    # Assume batch input is sent with mimetype:"text/csv"
    # Treat as single prediction input if no mimetype is set
    is_batch = kwargs["mimetype"] == "text/csv"
    if is_batch:
        input_pd = pd.read_csv(io.StringIO(data), sep="|")
        input_pairs = list(zip(input_pd["context"], input_pd["question"]))
        inputs = tokenizer.batch_encode_plus(
            input_pairs, add_special_tokens=True, padding=True, return_tensors="np"
        )
        input_ids = inputs["input_ids"]
        output = sess.run(["start_logits", "end_logits"], input_feed=dict(inputs))
        responses = []
        for i, row in input_pd.iterrows():
            answer = _get_answer_in_text(output, input_ids[i], i, tokenizer)
            response = {
                "context": row["context"],
                "question": row["question"],
                "answer": answer,
            }
            responses.append(response)
        to_return = json.dumps(
            {
                "predictions": responses
            }
        )
    else:
        data_dict = json.loads(data)
        context, question = data_dict["context"], data_dict["question"]
        inputs = tokenizer(
            question,
            context,
            add_special_tokens=True,
            padding=True,
            return_tensors="np",
        )
        input_ids = inputs["input_ids"][0]
        output = sess.run(["start_logits", "end_logits"], input_feed=dict(inputs))
        answer = _get_answer_in_text(output, input_ids, 0, tokenizer)
        to_return = json.dumps(
            {
                "context": context,
                "question": question,
                "answer": answer
            }
        )
    return to_return
```

Creating Your Custom Inference Model Deployment on DataRobot’s ONNX Base Environment

As the final change, we’ll create this custom model’s deployment on the ONNX base environment of DataRobot MLOps as below:

```python
deployment = deploy_to_datarobot(
    BASE_PATH,
    "ONNX",
    "bert-onnx-questionAnswering",
    "Pretrained BERT model, fine-tuned on SQUAD for question answering",
)
```

When the deployment is ready, we’ll test our custom model with our test input and make sure that we’re getting our questions answered by our pre-trained LLM:

```python
datarobot_predict(test_input, deployment.id)
```

Performance Comparison

Now that we have everything ready, it’s time to compare our previous Tensorflow deployment with the ONNX alternative.

For our benchmarking purposes, we constrained our custom model deployment to only have 4GB of memory from our MLOps settings so that we could compare the Tensorflow and ONNX alternatives under resource constraints.

As can be seen from the results below, our model in ONNX predicts 1.5x faster than its Tensorflow counterpart. And this result comes from just a basic ONNX export (i.e., without any further optimization configurations, such as quantization). 

[Figure: Prediction Duration]

Regarding resource consumption, somewhere after ~100 rows our Tensorflow model deployment starts returning Out of Memory (OOM) errors, meaning that this model would require more than 4GB of memory to process and predict for ~100 rows of input. Our ONNX model deployment, on the other hand, can calculate predictions for up to ~450 rows without throwing an OOM error. For our use case, the fact that the Tensorflow model handles up to 100 rows while its ONNX equivalent handles up to 450 shows that the ONNX model is more resource efficient, because it uses much less memory. 

Start Leveraging ONNX for Your Custom Models

By leveraging the open source ONNX standard and the ONNX Runtime, AI builders can take advantage of the framework interoperability and accelerated inference performance, especially for transformer-based large language models on their preferred execution environment. With its native ONNX support, DataRobot MLOps makes it easy for organizations to gain value from their machine learning deployments with optimized inference speed and throughput. 

In this blog post, we showed how effortless it is to use a large language model for question answering in ONNX as a DataRobot custom model and how much inference performance and resource efficiency you can gain with a simple conversion step. To replicate this workflow, you can find the end-to-end notebook under the DataRobot Community repo, along with many other tutorials to empower AI builders with the capabilities of DataRobot.


1 ONNX Runtime, Model Optimizations

2 Microsoft, Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider

3 Twitter Blog, Speeding up Transformer CPU inference in Google Cloud 

4 Philipp Schmid, Accelerate Sentence Transformers with Hugging Face Optimum

5 Microsoft, Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider

Deep Learning for Decision-Making Under Uncertainty
https://www.datarobot.com/blog/deep-learning-for-decision-making-under-uncertainty/ | Thu, 18 May 2023
Find out why quantile regression might be a simple way to do modeling under uncertainty and how the transformer architecture supports this use case.

One thing we learned from our customers is that they often need more than point predictions to make informed decisions. An example of such point predictions would be a temperature forecast (regression). But what if, besides the expected temperature, we wanted to predict the probability for every temperature? In that case, we would use distributional predictions. But many of the most powerful machine learning models do not produce distributions in their predictions. DataRobot is once more pushing the boundary of what’s possible to provide this important capability to our customers. In this article, we will highlight one simple, yet powerful, approach to modeling under uncertainty: quantile regression.

DataRobot already supports class probabilities for Multiclass prediction. We also offer prediction intervals for time series. Prediction intervals give a range of values for the entire distribution of future observations. They are often applied in areas such as finance and econometrics. Distributional regression goes a step further than prediction intervals. It estimates the distribution of the target variable for every prediction. Another way to model the conditional distribution is quantile regression. As the name suggests, it estimates a selection of quantiles. It is simpler to do than distributional predictions but helps us to estimate the full distribution.

A quick reminder: a quantile splits the values into subsets of a given proportion. For instance, for q=0.5 the quantile is the median and 50% of the data points are below and 50% are above the quantile. For q=0.99 we have the 99th percentile and only 1% of the data is above that line.
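To make quantile regression concrete with a generic, non-DataRobot example, scikit-learn’s gradient boosting can already fit one model per quantile via the pinball loss. The synthetic data below is made up purely for illustration.

```python
# Generic illustration of quantile regression (not the DataRobot implementation):
# fit one gradient boosting model per quantile using the pinball ("quantile") loss.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 0])  # heteroskedastic noise

models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}
preds = {q: m.predict([[5.0]])[0] for q, m in models.items()}
print(preds)  # estimated 0.1 / 0.5 / 0.9 quantiles of y at x = 5
```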

[Picture 1: Multi-quantile regression]
[Picture 2: Multi-quantile regression (zoomed)]
[Picture 3: Quantile errors]

Picture 1 shows the results of a quantile regression using the deepquantiles Python package and the Bishop-S dataset.1 Picture 2 zooms into the upper right corner of the plot from Picture 1. Here the distribution is quite heteroskedastic, but the model successfully avoids quantile crossings. Picture 3 demonstrates that the quantiles, as predicted by the model, separate our random test sample as required.

Deep Learning for Quantile Regression

There are many ways to incorporate uncertainty into predictions. For instance, classical examples for modeling under uncertainty are time series models, such as ARIMA2 and GARCH3 or, more recently, NGBoost.4 But what if you want to use a different model that fits your problem, or perhaps a higher performing model? Deep Learning models can help with modeling under uncertainty in various ways. For instance, a separate deep learning model could learn to predict quantiles based on the predictions of an underlying model. Thereby, we could add quantiles to all kinds of interesting models that do not support this by default. As a first step, my colleague, Peter Prettenhofer, and I explored how well deep learning models predict the quantiles of various target distributions directly, not on top of the predictions of another model.

In the past, practitioners avoided deep learning models for uncertainty modeling. Deep learning models were hard to interpret, sensitive to hyperparameters, and required a lot of data and training time. Now, the transformer architecture5 is in the spotlight and powers successful tools like ChatGPT. A few years ago, the transformer architecture was mostly used to build large language models (LLMs). Now it is clear that it can be used with tabular data too.6

In our research, we adapted two existing solutions for our purposes to compare them with the quantiles we get from NGBoost’s predicted distributions using the percent point function:

  1. A custom multi-quantile regressor, DeepQuantiles, with an architecture that is similar to the multi-quantile regressor from the deepquantiles Python package. This is an example of a classical multi-layer neural network.
  2. The FTTransformer,7 which makes use of the transformer architecture like in state-of-the-art large language models but with a special tokenizer for numerical and categorical data.

Both deep learning models use a tweaked pinball loss function to better cope with the problem of quantile crossings. The original pinball loss function is a standard loss function used for quantile regression. It consists of two parts. Let y be the true target value and ŷ the predicted target value. Then the pinball loss for a given quantile q is (1 – q)(ŷ – y) in case y < ŷ and q(y – ŷ) in case y ≥ ŷ.

[Figure: Pinball loss (q=0.2)]
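Written as code, the loss above is only a few lines. The following NumPy sketch mirrors the definition and uses q=0.2, as in the figure.

```python
# A direct NumPy translation of the pinball (quantile) loss described above.
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
    """Average pinball loss for quantile q: (1 - q)(y_hat - y) if y < y_hat, else q(y - y_hat)."""
    diff = y_true - y_pred
    return float(np.mean(np.where(diff >= 0, q * diff, (q - 1.0) * diff)))

# For q = 0.2, overshooting the true value costs 0.8 per unit, undershooting only 0.2 per unit:
y = np.array([10.0, 10.0])
print(pinball_loss(y, np.array([12.0, 8.0]), q=0.2))  # (1.6 + 0.4) / 2 = 1.0
```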

Comparison

Below you see results for eight publicly available datasets, often used for regression analysis, with numerical and categorical variables and 506 to 11,934 rows.

| ID | # Rows | # Categorical | # Numerical | # Values |
| --- | --- | --- | --- | --- |
| ames-housing | 1,460 | 43 | 37 | 116,800 |
| concrete | 1,030 | 0 | 8 | 8,240 |
| energy | 768 | 0 | 8 | 6,144 |
| housing | 506 | 0 | 13 | 6,578 |
| kin8nm | 8,192 | 0 | 8 | 65,536 |
| naval | 11,934 | 0 | 16 | 190,944 |
| power | 9,568 | 0 | 4 | 38,272 |
| wine | 1,599 | 0 | 11 | 17,589 |

The plots below show the out-of-sample performance for the three models NGBoost, DeepQuantiles, and FTTransformer. We picked the quantiles 0.01, 0.1, 0.3, 0.5, 0.7, 0.9 and 0.99. For every model, we ran a 20-fold cross-validation with 5 repetitions. This means 100 runs per dataset. The dots depict the average predicted quantiles. The error bars are the standard deviations, not the standard errors.

[Figure: out-of-sample quantile performance for NGBoost, DeepQuantiles, and FTTransformer]

Observations

NGBoost is doing a decent job but, apparently, has some problems with certain datasets (ames-housing, energy, naval, power). DeepQuantiles appears to be a bit stronger but also disappoints in a few cases (ames-housing, energy, housing). The FTTransformer gives very good results on average but with a huge variance.

One drawback of NGBoost is that it requires us to specify the type of distribution in advance. We did not make any additional assumptions and only used the default normal distribution. This could be the reason why NGBoost performs rather poorly on some datasets. Given that the performance of DeepQuantiles is quite sensitive to the choice of its hyperparameters, it is not a good alternative to NGBoost. In many situations, an ensemble of FTTransformers could be a good way to do quantile regression. The model is also not that sensitive to the choice of hyperparameters and is trained rather quickly.

Conclusion

We are constantly seeking to implement the learnings from our customers at DataRobot, and modeling under uncertainty is undoubtedly a very important capability that we are exploring. In this research, we saw that quantile regression is a simple way to do modeling under uncertainty and that the transformer architecture appears to be useful for that application. Deep learning even has the potential to enhance DataRobot regression models that currently lack this capability. Stay tuned, as we highlight even more innovations and research happening at DataRobot.


1 Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 2007.

2 Time Series Analysis: Forecasting and Control, Jenkins, Gwilym M., et al. Wiley, 2015.

3 Generalized autoregressive conditional heteroskedasticity, Tim Bollerslev, Journal of Econometrics, vol. 31, no. 3, 1986, pp. 307-327.

4 arXiv, NGBoost: Natural Gradient Boosting for Probabilistic Prediction, Tony Duan, et al. 2019.

5 arXiv, Attention Is All You Need, Ashish Vaswani, et al. June 2017.

6 arXiv, Revisiting Deep Learning Models for Tabular Data, Yury Gorishniy, et al. June 2021.

7 Ibid
