Choosing the Right Vector Embedding Model for Your Generative AI Use Case
https://www.datarobot.com/blog/choosing-the-right-vector-embedding-model-for-your-generative-ai-use-case/
Thu, 07 Mar 2024 15:33:37 +0000

When building a RAG application we often need to choose a vector embedding model, a critical component of many generative AI applications.

In our previous post, we discussed considerations around choosing a vector database for our hypothetical retrieval augmented generation (RAG) use case. But when building a RAG application we often need to make another important decision: choosing a vector embedding model, a critical component of many generative AI applications.

A vector embedding model is responsible for the transformation of unstructured data (text, images, audio, video) into a vector of numbers that capture semantic similarity between data objects. Embedding models are widely used beyond RAG applications, including recommendation systems, search engines, databases, and other data processing systems. 

Understanding their purpose, internals, advantages, and disadvantages is crucial and that’s what we’ll cover today. While we’ll be discussing text embedding models only, models for other types of unstructured data work similarly.

What Is an Embedding Model?

Machine learning models don’t work with text directly, they require numbers as input. Since text is ubiquitous, over time, the ML community developed many solutions that handle the conversion from text to numbers. There are many approaches of varying complexity, but we’ll review just some of them.

A simple example is one-hot encoding: treat the words of a text as categorical variables and map each word to a vector of 0s and a single 1.
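
As a rough sketch (not from the original post), here is what one-hot encoding looks like for a made-up three-word vocabulary:

# Minimal one-hot encoding sketch; the three-word vocabulary is made up for illustration.
vocabulary = ["cat", "dog", "fish"]
index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a vector of 0s with a single 1 at the word's position."""
    vector = [0] * len(vocabulary)
    vector[index[word]] = 1
    return vector

print(one_hot("dog"))  # [0, 1, 0]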


Unfortunately, this embedding approach is not very practical, since it leads to a large number of unique categories and results in unmanageable dimensionality of output vectors in most practical cases. Also, one-hot encoding does not put similar vectors closer to one another in a vector space.

Embedding models were invented to tackle these issues. Just like one-hot encoding, they take text as input and return vectors of numbers as output, but they are more complex because they are trained on supervised tasks, often using a neural network. A supervised task can be, for example, predicting a product review sentiment score. In this case, the resulting embedding model would place reviews of similar sentiment closer to each other in a vector space. The choice of a supervised task is critical to producing relevant embeddings when building an embedding model.

Word embeddings projected onto 2D axes

On the diagram above we can see word embeddings only, but we often need more than that since human language is more complex than just many words put together. Semantics, word order, and other linguistic parameters should all be taken into account, which means we need to take it to the next level: sentence embedding models.

Sentence embeddings associate an input sentence with a vector of numbers, and, as expected, are way more complex internally since they have to capture more complex relationships.


Thanks to progress in deep learning, all state-of-the-art embedding models are created with deep neural nets, since they better capture the complex relationships inherent to human language.

A good embedding model should: 

  • Be fast since often it is just a preprocessing step in a larger application
  • Return vectors of manageable dimensions
  • Return vectors that capture enough information about similarity to be practical

Let’s now quickly look into how most embedding models are organized internally.

Modern Neural Network Architectures

As we just mentioned, all well-performing state-of-the-art embedding models are deep neural networks. 

This is an actively developing field and most top performing models are associated with some novel architecture improvement. Let’s briefly cover two very important architectures: BERT and GPT.

BERT (Bidirectional Encoder Representations from Transformers) was published in 2018 by researchers at Google and described the application of bidirectional training of the transformer, a popular attention-based model, to language modeling. Standard transformers include two separate mechanisms: an encoder for reading text input and a decoder that makes a prediction.

BERT uses an encoder that reads the entire sentence at once, which allows the model to learn the context of a word based on all of its surroundings, left and right, unlike legacy approaches that looked at a text sequence from left to right or from right to left. Before feeding word sequences into BERT, some words are replaced with [MASK] tokens, and the model then attempts to predict the original value of the masked words based on the context provided by the other, non-masked words in the sequence.
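
As an illustration of the masked-word objective (not part of the original post), the Hugging Face transformers library exposes a fill-mask pipeline; the checkpoint name below is one publicly available BERT model:

# Illustrative sketch: predict a masked word with a pretrained BERT checkpoint.
# Requires: pip install transformers torch
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("The customer requested a [MASK] for the overpayment."):
    print(candidate["token_str"], round(candidate["score"], 3))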

Standard BERT does not perform very well on most benchmarks, and BERT models require task-specific fine-tuning. But it is open source, has been around since 2018, and has relatively modest system requirements (it can be trained on a single mid-range GPU). As a result, it became very popular for many text-related tasks. It is fast, customizable, and small. For example, the very popular all-MiniLM models are modified versions of BERT.
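
As a sketch (not a recommendation of a specific checkpoint), a sentence embedding from this model family can be computed with the sentence-transformers library:

# Illustrative sketch: sentence embeddings with a small BERT-derived model.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # compact model with 384-dimensional output
sentences = ["How do I reset my password?", "I forgot my login credentials."]
embeddings = model.encode(sentences)

print(embeddings.shape)                            # (2, 384)
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the two sentences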

GPT (Generative Pre-trained Transformer) by OpenAI is different. Unlike BERT, it is unidirectional, i.e., text is processed in one direction, and it uses the decoder from the transformer architecture, which is suited to predicting the next word in a sequence. These models are slower and produce very high-dimensional embeddings, but they usually have many more parameters, do not require fine-tuning, and are more applicable to many tasks out of the box. GPT is not open source and is available as a paid API.
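
As a hedged example of the hosted-API pattern (the client library and model name below are one possibility, not the only one; check the provider's current documentation):

# Illustrative sketch: requesting embeddings from a hosted API.
# Requires: pip install openai, plus an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",  # example model name; offerings change over time
    input="Choosing the right vector embedding model",
)
vector = response.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding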

Context Length and Training Data

Another important parameter of an embedding model is context length. Context length is the number of tokens a model can remember when working with a text. A longer context window allows the model to understand more complex relationships within a wider body of text. As a result, models can provide outputs of higher quality, e.g. capture semantic similarity better.
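
As a sketch (not from the original post), you can check how much of a model's context window a piece of text consumes by counting tokens; tokenizers differ between models, and the encoding below is just one example:

# Illustrative sketch: counting tokens to compare against a model's context length.
# Requires: pip install tiktoken
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")
text = "A longer context window lets the model relate ideas across a whole document."
tokens = encoder.encode(text)
print(len(tokens))  # compare this count with the embedding model's context length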

To leverage a longer context, training data should include longer pieces of coherent text: books, articles, and so on. However, increasing context window length increases the complexity of a model and increases compute and memory requirements for training. 

There are methods that help manage resource requirements, e.g., approximate attention, but they come at a cost to quality. That's another trade-off that affects quality and costs: larger context lengths capture more complex relationships of human language but require more resources.

Also, as always, the quality of training data is very important for all models. Embedding models are no exception. 

Semantic Search and Information Retrieval

Using embedding models for semantic search is a relatively new approach. For decades, people used other technologies: boolean models, latent semantic indexing (LSI), and various probabilistic models.

Some of these approaches work reasonably well for many existing use cases and are still widely used in the industry. 

One of the most popular traditional probabilistic models is BM25 (BM stands for "best matching"), a search relevance ranking function. It estimates the relevance of a document to a search query and ranks documents based on the occurrence of the query terms in each indexed document. Only recently have embedding models started consistently outperforming it, but BM25 is still used a lot since it is simpler than using embedding models, it has lower compute requirements, and its results are explainable.
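
As an illustration (not from the original post), a BM25 ranking can be produced with the rank_bm25 package:

# Illustrative sketch: ranking documents against a query with BM25.
# Requires: pip install rank-bm25
from rank_bm25 import BM25Okapi

corpus = [
    "vector databases store embeddings",
    "BM25 is a probabilistic ranking function",
    "embedding models capture semantic similarity",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "ranking with BM25".split()
print(bm25.get_scores(query))              # one relevance score per document
print(bm25.get_top_n(query, corpus, n=1))  # best-matching document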

Benchmarks

Not every model type has a comprehensive evaluation approach that helps to choose an existing model. 

Fortunately, text embedding models have common benchmark suites such as:

The article "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models" proposed a reference set of benchmarks and datasets for information retrieval tasks. The original BEIR benchmark consists of a set of 19 datasets and methods for search quality evaluation. Tasks include question answering, fact checking, and entity retrieval. Now anyone who releases a text embedding model for information retrieval tasks can run the benchmark and see how their model ranks against the competition.

The Massive Text Embedding Benchmark (MTEB) incorporates BEIR and other components, covering 58 datasets and 112 languages. The public leaderboard for MTEB results can be found here.

These benchmarks have been run on a lot of existing models and their leaderboards are very useful to make an informed choice about model selection.
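
As a sketch, assuming the mteb Python package and a sentence-transformers model, a single benchmark task can be run locally; the task name below is one example from the suite:

# Illustrative sketch: evaluating an embedding model on one MTEB task.
# Requires: pip install mteb sentence-transformers
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["Banking77Classification"])  # one example task from the suite
results = evaluation.run(model, output_folder="mteb_results")
print(results)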

Using Embedding Models in a Production Environment

Benchmark scores on standard tasks are very important, but they represent only one dimension.

When we use an embedding model for search, we run it twice:

  • When doing offline indexing of available data
  • When embedding a user query for a search request 

There are two important consequences of this. 

The first is that we have to reindex all existing data when we change or upgrade an embedding model. All systems built using embedding models should be designed with upgradability in mind because newer and better models are released all the time and, most of the time, upgrading a model is the easiest way to improve overall system performance. An embedding model is a less stable component of the system infrastructure in this case.

The second consequence of using an embedding model for user queries is that the inference latency becomes very important when the number of users goes up. Model inference takes more time for better-performing models, especially if they require GPU to run: having latency higher than 100ms for a small query is not unheard of for models that have more than 1B parameters. It turns out that smaller, leaner models are still very important in a higher-load production scenario. 
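
A quick latency check is easy to sketch; the model below is just one example of a small embedding model:

# Illustrative sketch: measuring average per-query embedding latency.
import time

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
query = "suspicious refund request on a credit card account"

model.encode(query)  # warm-up call so model loading is not counted
start = time.perf_counter()
for _ in range(100):
    model.encode(query)
latency_ms = (time.perf_counter() - start) / 100 * 1000
print(f"average latency: {latency_ms:.1f} ms per query")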

The tradeoff between quality and latency is real and we should always remember about it when choosing an embedding model.

As we have mentioned above, a good embedding model returns vectors of manageable dimensionality, which affects the performance of many algorithms downstream. Generally, the smaller the model, the shorter the output vector, but it is often still too long for practical use. That's when we need to use dimensionality reduction algorithms such as PCA (principal component analysis), SNE/t-SNE (stochastic neighbor embedding), and UMAP (uniform manifold approximation and projection).

Another place we can use dimensionality reduction is before storing embeddings in a database. Resulting vector embeddings will occupy less space and retrieval speed will be faster, but will come at a price for the quality downstream. Vector databases are often not the primary storage, so embeddings can be regenerated with better precision from the original source data. Their use helps to reduce the output vector length and, as a result, makes the system faster and leaner.
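
As a sketch (the embeddings below are random stand-ins for real document vectors), PCA from scikit-learn can shrink embeddings before they are indexed:

# Illustrative sketch: reducing embedding dimensionality with PCA before storage.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.decomposition import PCA

embeddings = np.random.rand(1000, 384)  # stand-in for real document embeddings
pca = PCA(n_components=128)             # target dimensionality is a quality/cost trade-off
reduced = pca.fit_transform(embeddings)

print(reduced.shape)                        # (1000, 128)
print(pca.explained_variance_ratio_.sum())  # variance retained by the 128 components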

Making the Right Choice

There’s an abundance of factors and trade-offs that should be considered when choosing an embedding model for a use case. The score of a potential model in common benchmarks is important, but we should not forget that it’s the larger models that have a better score. Larger models have higher inference time which can severely limit their use in low latency scenarios as often an embedding model is a pre-processing step in a larger pipeline. Also, larger models require GPUs to run. 

If you intend to use a model in a low-latency scenario, it’s better to focus on latency first and then see which models with acceptable latency have the best-in-class performance. Also, when building a system with an embedding model you should plan for changes since better models are released all the time and often it’s the simplest way to improve the performance of your system.


Anti-Money Laundering (AML) Alert Scoring
https://www.datarobot.com/ai-accelerators/anti-money-laundering-aml-alert-scoring/
Thu, 22 Feb 2024 12:40:53 +0000

Our primary goal with this accelerator is to develop a powerful predictive model that utilizes historical customer and transactional data, enabling us to identify suspicious activities and generate crucial Suspicious Activity Reports (SARs).

The model will assign a suspicious activity score to future alerts, improving the effectiveness and efficiency of an AML compliance program by prioritizing alerts based on their ranking order according to the score.


The following outlines aspects of this use case.

  • Use case type: Anti-money laundering (false positive reduction)
  • Target audience: Data Scientist, Financial Crime Compliance Team
  • Desired outcomes:
    • Identify customer data and transaction activity indicative of a high risk for potential money laundering.
    • Detect anomalous changes in behavior or emerging money laundering patterns at an early stage.
    • Reduce the false positive rate for cases selected for manual review.
  • Metrics/KPIs:
    • Annual alert volume
    • Cost per alert
    • False positive reduction rate
  • Sample dataset

A crucial aspect of an effective AML compliance program involves monitoring transactions to detect suspicious activity. This encompasses various types of transactions, such as deposits, withdrawals, fund transfers, purchases, merchant credits, and payments. Typically, monitoring begins with a rules-based system that scans customer transactions for signs of potential money laundering. When a transaction matches a predefined rule, an alert is generated, and the case is referred to the bank’s internal investigation team for manual review. If the investigators determine that the behavior is indicative of money laundering, a SAR is filed with FinCEN.

However, the aforementioned standard transaction monitoring system has significant drawbacks. Most notably, the system’s rules-based and inflexible nature leads to a high rate of false positives, with as many as 90% of cases being incorrectly flagged as suspicious. This prevalence of false positives hampers investigators’ efficiency as they are required to manually filter out cases erroneously identified by the rules-based system.

Financial institutions’ compliance teams may have hundreds or even thousands of investigators, and the current systems hinder their effectiveness and efficiency in conducting investigations. The cost of reviewing an alert ranges from $30 to $70. For a bank that receives 100,000 alerts per year, this amounts to a substantial sum. By reducing false positives, potential savings of $600,000 to $4.2 million per year can be achieved.

Key takeaways:

  • Strategy/challenge: Facilitate investigators in focusing their attention on cases with the highest risk of money laundering, while minimizing time spent on reviewing false-positive cases. For banks dealing with a high volume of daily transactions, improving the effectiveness and efficiency of investigations ultimately leads to fewer unnoticed instances of money laundering. This enables banks to strengthen their regulatory compliance and reduce the prevalence of financial crimes within their network.
  • Business driver: Enhance the efficiency of AML transaction monitoring and reduce operational costs. By harnessing their capability to dynamically learn patterns in complex data, machine learning models greatly enhance the accuracy of predicting which cases will result in a SAR filing. Machine learning models for anti-money laundering can be integrated into the review process to score and rank new cases.
  • Model solution: Assign a suspicious activity score to each AML alert, thereby improving the efficiency of an AML compliance program. Any case exceeding a predetermined risk threshold is forwarded to investigators for manual review. Cases falling below the threshold can be automatically discarded or subject to a less intensive review. Once machine learning models are deployed in production, they can be continuously retrained using new data to detect novel money laundering behaviors, incorporating insights from investigator feedback. In particular, the model will employ rules that trigger an alert whenever a customer requests a refund of any amount. Small refund requests can be utilized by money launderers to test the refund mechanism or establish a pattern of regular refund requests for their account.

Work with data

The linked synthetic dataset illustrates a credit card company's AML compliance program. Specifically, the model detects the following money-laundering scenarios:

  • Customer spends on the card but overpays their credit card bill and seeks a cash refund for the difference.
  • Customer receives credits from a merchant without offsetting transactions and either spends the money or requests a cash refund from the bank.

The unit of analysis in this dataset is an individual alert, meaning a rule-based engine is in place to produce an alert to detect potentially suspicious activity consistent with the above scenarios.

Problem framing

The target variable for this use case is whether or not the alert resulted in a SAR after manual review by investigators, making this a binary classification problem. The unit of analysis is an individual alert—the model will be built on the alert level—and each alert will receive a score ranging from 0 to 1. The score indicates the probability of the alert being a SAR.

The goal of applying a model to this use case is to lower the false positive rate, which means resources are not spent reviewing cases that are eventually determined to not be suspicious after an investigation.

In this use case, the False Positive Rate of the rules engine on the validation sample (1600 records) is:

The number of non-SAR alerts (SAR=0) divided by the total number of records = 1436/1600 ≈ 90%.
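
As a small sketch of that calculation (the DataFrame below is a stand-in for the 1,600-record validation sample):

# Illustrative sketch: the rules engine's false positive rate on the validation sample,
# i.e., the share of alerts that did not become SARs.
import pandas as pd

validation = pd.DataFrame({"SAR": [0] * 1436 + [1] * 164})  # stand-in for the 1,600 alerts
false_positive_rate = (validation["SAR"] == 0).mean()
print(f"{false_positive_rate:.0%}")  # ~90%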

Data preparation

Consider the following when working with data:

  • Define the scope of analysis: Collect alerts from a specific analytical window to start with; it’s recommended that you use 12–18 months of alerts for model building.
  • Define the target: Depending on the investigation processes, the target definition could be flexible. In this walkthrough, alerts are classified as Level1, Level2, Level3, and Level3-confirmed. These labels indicate at which level of the investigation the alert was closed (i.e., confirmed as a SAR). To create a binary target, treat Level3-confirmed as SAR (denoted by 1) and the remaining levels as non-SAR alerts (denoted by 0).
  • Consolidate information from multiple data sources: Below is a sample entity-relationship diagram indicating the relationship between the data tables used for this use case. 

Some features, such as kyc_risk_score and state of residence, are static information; these can be fetched directly from the reference tables.

For transaction behavior and payment history, the information will be derived from a specific time window prior to the alert generation date. This case uses 90 days as the time window to obtain the dynamic customer behavior, such as nbrPurchases90d, avgTxnSize90d, or totalSpend90d.
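
A pandas sketch of this kind of windowed aggregation is shown below; the alerts and transactions tables and their column names are hypothetical:

# Illustrative sketch: deriving 90-day behavioral features relative to each alert date.
# Table and column names (alerts, transactions, txn_date, amount) are hypothetical.
import pandas as pd

def features_90d(alerts, transactions):
    rows = []
    for alert in alerts.itertuples():
        window = transactions[
            (transactions["customer_id"] == alert.customer_id)
            & (transactions["txn_date"] < alert.alert_date)
            & (transactions["txn_date"] >= alert.alert_date - pd.Timedelta(days=90))
        ]
        rows.append(
            {
                "alert_id": alert.alert_id,
                "nbrPurchases90d": len(window),
                "avgTxnSize90d": window["amount"].mean(),
                "totalSpend90d": window["amount"].sum(),
            }
        )
    return pd.DataFrame(rows)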

Features and sample data

The features in the sample dataset consist of KYC (Know-Your-Customer) information, demographic information, transactional behavior, and free-form text information from the customer service representatives’ notes. To apply this use case in your organization, your dataset should contain, minimally, the following features:

  • Alert ID
  • Binary classification target (SAR/no-SAR, 1/0, True/False, etc.)
  • Date/time of the alert
  • “Know Your Customer” score used at time of account opening
  • Account tenure, in months
  • Total merchant credit in the last 90 days
  • Number of refund requests by the customer in the last 90 days
  • Total refund amount in the last 90 days

Other helpful features to include are:

  • Annual income
  • Credit bureau score
  • Number of credit inquiries in the past year
  • Number of logins to the bank website in the last 90 days
  • Indicator that the customer owns a home
  • Maximum revolving line of credit
  • Number of purchases in the last 90 days
  • Total spend in the last 90 days
  • Number of payments in the last 90 days
  • Number of cash-like payments (e.g., money orders) in last 90 days
  • Total payment amount in last 90 days
  • Number of distinct merchants purchased at in the last 90 days
  • Customer Service Representative notes and codes based on conversations with customer (cumulative)

Below is an example of one row in the training data after it is merged and aggregated (it is broken into multiple lines for easier visualization).

Configure the Python client

The DataRobot API offers a programmatic alternative to the web interface for creating and managing DataRobot projects. It can be accessed through REST or DataRobot’s Python and R clients, supporting Windows, UNIX, and OS X environments. To authenticate with DataRobot’s API, you will need an endpoint and token, as detailed in the documentation. Once you have configured your API credentials, endpoints, and environment, you can leverage the DataRobot API to perform the following actions:

  1. Upload a dataset.
  2. Train a model to learn from the dataset using the Informative Features feature list.
  3. Test prediction outcomes on the model using new data.
  4. Deploy the model.
  5. Predict outcomes using the deployed model and new data.

Import libraries

In [1]:

# NOT required for Notebooks in DataRobot Workbench
# *************************************************
! pip install datarobot --quiet
# Upgrade DR to datarobot-3.2.0b0
# ! pip uninstall datarobot --yes
# ! pip install datarobot --pre

! pip install pandas --quiet
! pip install matplotlib --quiet

import getpass

import datarobot as dr

endpoint = "https://app.eu.datarobot.com/api/v2"
token = getpass.getpass()
dr.Client(endpoint=endpoint, token=token)
# *************************************************

········

Out[1]:

<datarobot.rest.RESTClientObject at 0x7fd37ba9fc40>
In[2]:

import datetime as datetime
import os

import datarobot as dr
import matplotlib.pyplot as plt
import pandas as pd

params = {"axes.titlesize": "8", "xtick.labelsize": "5", "ytick.labelsize": "6"}
plt.rcParams.update(params)

Analyze, clean, and curate data

Preparing data is an iterative process. Even if you have already cleaned and prepped your training data before uploading it, you can further enhance its quality by performing Exploratory Data Analysis (EDA).

In [3]:

# Load the training dataset
df = pd.read_csv(
    "https://s3.amazonaws.com/datarobot-use-case-datasets/DR_Demo_AML_Alert_train.csv",
    encoding="ISO-8859-1",
)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 31 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   ALERT                             10000 non-null  int64  
 1   SAR                               10000 non-null  int64  
 2   kycRiskScore                      10000 non-null  int64  
 3   income                            9800 non-null   float64
 4   tenureMonths                      10000 non-null  int64  
 5   creditScore                       10000 non-null  int64  
 6   state                             10000 non-null  object 
 7   nbrPurchases90d                   10000 non-null  int64  
 8   avgTxnSize90d                     10000 non-null  float64
 9   totalSpend90d                     10000 non-null  float64
 10  csrNotes                          10000 non-null  object 
 11  nbrDistinctMerch90d               10000 non-null  int64  
 12  nbrMerchCredits90d                10000 non-null  int64  
 13  nbrMerchCreditsRndDollarAmt90d    10000 non-null  int64  
 14  totalMerchCred90d                 10000 non-null  float64
 15  nbrMerchCreditsWoOffsettingPurch  10000 non-null  int64  
 16  nbrPayments90d                    10000 non-null  int64  
 17  totalPaymentAmt90d                10000 non-null  float64
 18  overpaymentAmt90d                 10000 non-null  float64
 19  overpaymentInd90d                 10000 non-null  int64  
 20  nbrCustReqRefunds90d              10000 non-null  int64  
 21  indCustReqRefund90d               10000 non-null  int64  
 22  totalRefundsToCust90d             10000 non-null  float64
 23  nbrPaymentsCashLike90d            10000 non-null  int64  
 24  maxRevolveLine                    10000 non-null  int64  
 25  indOwnsHome                       10000 non-null  int64  
 26  nbrInquiries1y                    10000 non-null  int64  
 27  nbrCollections3y                  10000 non-null  int64  
 28  nbrWebLogins90d                   10000 non-null  int64  
 29  nbrPointRed90d                    10000 non-null  int64  
 30  PEP                               10000 non-null  int64  
dtypes: float64(7), int64(22), object(2)
memory usage: 2.4+ MB

The sample data contains the following features:

  1. ALERT: Alert Indicator
  2. SAR: Target variable, SAR Indicator
  3. kycRiskScore: Account relationship (Know Your Customer) score used at time of account opening
  4. income: Annual income
  5. tenureMonths: Account tenure in months
  6. creditScore: Credit bureau score
  7. state: Account billing address state
  8. nbrPurchases90d: Number of purchases in last 90 days
  9. avgTxnSize90d: Average transaction size in last 90 days
  10. totalSpend90d: Total spend in last 90 days
  11. csrNotes: Customer Service Representative notes and codes based on conversations with customer
  12. nbrDistinctMerch90d: Number of distinct merchants purchased at in last 90 days
  13. nbrMerchCredits90d: Number of credits from merchants in last 90 days
  14. nbrMerchCreditsRndDollarAmt90d: Number of credits from merchants in round dollar amounts in last 90 days
  15. totalMerchCred90d: Total merchant credit amount in last 90 days
  16. nbrMerchCreditsWoOffsettingPurch: Number of merchant credits without an offsetting purchase in last 90 days
  17. nbrPayments90d: Number of payments in last 90 days
  18. totalPaymentAmt90d: Total payment amount in last 90 days
  19. overpaymentAmt90d: Total amount overpaid in last 90 days
  20. overpaymentInd90d: Indicator that account was overpaid in last 90 days
  21. nbrCustReqRefunds90d: Number of refund requests by the customer in last 90 days
  22. indCustReqRefund90d: Indicator that customer requested a refund in last 90 days
  23. totalRefundsToCust90d: Total refund amount in last 90 days
  24. nbrPaymentsCashLike90d: Number of cash-like payments (e.g., money orders) in last 90 days
  25. maxRevolveLine: Maximum revolving line of credit
  26. indOwnsHome: Indicator that the customer owns a home
  27. nbrInquiries1y: Number of credit inquiries in the past year
  28. nbrCollections3y: Number of collections in the past 3 years
  29. nbrWebLogins90d: Number of logins to the bank website in the last 90 days
  30. nbrPointRed90d: Number of loyalty point redemptions in the last 90 days
  31. PEP: Politically Exposed Person indicator
In [4]:

# Upload a dataset
ct = datetime.datetime.now()
file_name = f"AML_Alert_train_{int(ct.timestamp())}.csv"
dataset = dr.Dataset.create_from_in_memory_data(data_frame=df, fname=file_name)
dataset
Out [4]:

Dataset(name='AML_Alert_train_1687350171.csv', id='6492eb9c1e1e2e52c305e3ca')

While a dataset is being registered in Workbench, DataRobot also performs EDA1 analysis and profiling for every feature to detect feature types, automatically transform date-type features, and assess feature quality. Once registration is complete, you can view the exploratory data insights uncovered while computing EDA1, as detailed in the documentation.

Based on the exploratory data insights above, you can draw the following quick observations:

  1. The entire population of interest comprises only alerts, which aligns with the problem’s focus.
  2. The false positive alerts (SAR=0) account for approximately 90%, which is typical for AML problems.
  3. Some features, such as PEP, do not offer any useful information as they consist entirely of zeroes or have a single value.
  4. Certain features, like nbrPaymentsCashLike90d, exhibit signs of zero inflation.
  5. There is potential to convert certain numerical features, such as indOwnsHome, into categorical features.

Additionally, DataRobot automatically detects and addresses common data quality issues with minimal or no user intervention. For instance, a binary column is automatically added within a blueprint to flag rows with excess zeros. This allows the model to capture potential patterns related to abnormal values. No further user action is required.

Create and manage experiments

Experiments are the individual “projects” within a Use Case. They allow you to vary data, targets, and modeling settings to find the optimal models to solve your business problem. Within each experiment, you have access to its Leaderboard and model insights, as well as experiment summary information.

In [5]:

# Create a new project based on a dataset
ct = datetime.datetime.now()
project_name = f"Anti Money Laundering Alert Scoring_{int(ct.timestamp())}"
project = dataset.create_project(project_name=project_name)
print(
    f"""Project Details
Project URL: {project.get_uri()}
Project ID: {project.id}
Project Name: {project.project_name}
    """
)
Project Details
Project URL: https://app.eu.datarobot.com/projects/6492ebd2b83ed3cc6ec5bb2e/models
Project ID: 6492ebd2b83ed3cc6ec5bb2e
Project Name: Anti Money Laundering Alert Scoring_1687350226

Start modeling

In [6]:

# Select modeling parameters and start the modeling process
project.analyze_and_model(target="SAR", mode=dr.AUTOPILOT_MODE.QUICK, worker_count="-1")

project.wait_for_autopilot(check_interval=20.0, timeout=86400, verbosity=0)

Evaluate experiments

As you proceed with modeling, Workbench generates a model Leaderboard, a ranked list of models that facilitates quick evaluation. The models on the Leaderboard are ranked based on the selected optimization metrics, such as LogLoss in this case.

Autopilot, DataRobot’s “survival of the fittest” modeling mode, automatically selects the most suitable predictive models for the specified target feature and trains them with increasing sample sizes. Autopilot not only identifies the best-performing models but also recommends a model that excels at predicting the target feature SAR. The model selection process considers a balance of accuracy, metric performance, and model simplicity. For a detailed understanding, please refer to the model recommendation process description in the documentation.

Within the Leaderboard, you can click on a specific model to access visualizations for further exploration, as outlined in the documentation.


Lift Chart

The Lift Chart above shows how effective the model is at separating the SAR and non-SAR alerts. After an alert in the out-of-sample partition gets scored by the trained model, it will be assigned with a risk score that measures the likelihood of the alert being a SAR risk, or becoming a SAR.

In the Lift Chart, alerts are sorted based on the SAR risk, broken down into 10 deciles, and displayed from lowest to the highest. For each decile, DataRobot computes the average predicted SAR risk (blue plus) as well as the average actual SAR event (orange circle) and depicts the two lines together. For the recommended model built for this false positive reduction use case, the SAR rate of the top decile is about 65%, which is a significant lift from the ~10% SAR rate in the training data. The top three deciles capture almost all SARs, which means that the 70% of alerts with very low predicted SAR risk rarely result in a SAR.
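
A rough sketch of the underlying calculation is shown below; it assumes a DataFrame with the model's predicted SAR risk (a hypothetical predicted_risk column) and the actual SAR label:

# Illustrative sketch: a decile-style lift table from scored alerts.
import pandas as pd

def lift_table(scored):
    scored = scored.copy()
    scored["decile"] = pd.qcut(scored["predicted_risk"], q=10, labels=False, duplicates="drop")
    return (
        scored.groupby("decile")
        .agg(avg_predicted=("predicted_risk", "mean"), actual_sar_rate=("SAR", "mean"))
        .sort_index(ascending=False)  # highest-risk decile first
    )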

ROC Curve

Once you have confidence that the model is performing well, select an explicit threshold to make a binary decision based on the continuous SAR risk predicted by DataRobot. To pick up the optimal threshold, there are three important criteria:

  1. The false negative rate has to be as small as possible. False negatives are the alerts that DataRobot determines are not SARs which then turn out to be true SARs. Missing a true SAR is very dangerous and would potentially result in an MRA (matter requiring attention) or regulatory fine. This example takes a conservative approach to have a 0 false negative rate, meaning all true SARs are captured. To achieve this, the threshold has to be low enough to capture all the SARs.
  2. Keep the alert volume as low as possible to reduce enough false positives. In this context, all alerts generated in the past that are not SARs are the de-facto false positives; the machine learning model is likely to assign a lower score to those non-SAR alerts. Therefore, pick a high enough threshold to reduce as many false positive alerts as possible.
  3. Ensure the selected threshold is not only working on the seen data, but also on the unseen data. This is required so that when the model is deployed to the transaction monitoring system for on-going scoring, it can still reduce false positives without missing any SARs.

From experimenting with different choices of thresholds using the cross-validation data (the data used for model training and validation), it seems that 0.03 is the optimal threshold since it satisfies the first two criteria. On one hand, the false negative rate is 0; on the other hand, the alert volume is reduced from 8000 to 2098 (False Positive + True Positive), meaning the number of investigations is reduced by 73% (5902/8000) without missing any SARs.

For the third criterion—setting the threshold to work on unseen alerts—you can quickly validate it in DataRobot. By changing the Data Selection dropdown to Holdout, and applying the same threshold (0.03), the false negative rate remains 0 and the reduction in investigations is still 73% (1464/2000). This proves that the model generalizes well and will perform as expected on unseen data.
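
A sketch of that threshold check, assuming arrays of actual SAR labels and predicted SAR probabilities for a validation or holdout sample:

# Illustrative sketch: false negative count and alert volume at a candidate threshold.
import numpy as np
from sklearn.metrics import confusion_matrix

def review_threshold(y_true, y_prob, threshold=0.03):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"false negatives: {fn}")             # should stay at 0 for this use case
    print(f"alerts to investigate: {fp + tp}")  # false positives + true positives
    print(f"alerts removed from review: {tn + fn}")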


Model insights

DataRobot offers a comprehensive suite of powerful tools and features designed to facilitate the interpretation, explanation, and validation of the factors influencing a model’s predictions. One such tool is Feature Impact, which provides a high-level visualization that identifies the features that have the strongest influence on the model’s decisions. A large impact indicates that removing this feature would significantly deteriorate the model’s performance. On the other hand, features with lower impact may have relatively less importance individually but can still contribute to the overall predictive power of the model.
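
As a sketch, Feature Impact can also be pulled programmatically with the Python client; this assumes model holds a Leaderboard model object, as retrieved later in this notebook:

# Illustrative sketch: retrieving Feature Impact for a Leaderboard model via the API.
# Assumes `model` is a datarobot.Model object (see the cell that fetches the recommended model).
feature_impact = model.get_or_request_feature_impact()
for row in sorted(feature_impact, key=lambda r: r["impactNormalized"], reverse=True)[:10]:
    print(row["featureName"], round(row["impactNormalized"], 3))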

Predict and deploy

Once you identify the model that best learns patterns in your data to predict SARs, DataRobot makes it easy to deploy the model into your alert investigation process. This is a critical step for implementing the use case, as it ensures that predictions are used in the real world to reduce false positives and improve efficiency in the investigation process. The following sections describe activities related to preparing and then deploying a model.

The following applications of the alert-prioritization score from the false positive reduction model both automate and augment the existing rule-based transaction monitoring system.

  • If the FCC (Financial Crime Compliance) team is comfortable with removing the low-risk alerts (very low prioritization score) from the scope of investigation, then the binary threshold selected during the model building stage will be used as the cutoff to remove those no-risk alerts. The investigation team will only investigate alerts above the cutoff, which will still capture all the SARs based on what was learned from the historical data.
  • Often regulatory agencies will consider auto-closure or auto-removal as an aggressive treatment to production alerts. If auto-closing is not the ideal way to use the model output, the alert prioritization score can still be used to triage alerts into different investigation processes, hence improving the operational efficiency.

See the deep dive at the end of this use case for information on decision process considerations.

You can use the following code to return the Recommended for Deployment model to use for model predictions.

In [7]:

model = dr.ModelRecommendation.get(project.id).get_model()
model
Out [7]:

Model('RandomForest Classifier (Gini)')

Compute predictions before deployment

By uploading an external dataset, you can ensure consistent performance in production prior to deployment. This new data will need to have the same transformations applied to it as were applied to the training data.

You can use the UI and follow the five steps of the workflow for testing predictions. When predictions are complete, you can save prediction results to a CSV file.

With the following code, you can obtain more detailed results including predictions, probability of class_1 (positive_probability), probability of class_0 (autogenerated), actual values of the target (SAR), and all features. Furthermore, you can compute Prediction Explanations on this external dataset (which was not part of training data).

In [10]:

# Load an alert dataset for predictions
df_score = pd.read_csv(
    "https://s3.amazonaws.com/datarobot-use-case-datasets/DR_Demo_AML_Alert_pred.csv",
    encoding="ISO-8859-1",
)

# Get the recommended model
model_rec = dr.ModelRecommendation.get(project.id).get_model()
model_rec.set_prediction_threshold(0.03)

# Upload a scoring data set to DataRobot
prediction_dataset = project.upload_dataset(df_score.drop("SAR", axis=1))
predict_job = model_rec.request_predictions(prediction_dataset.id)

# Make predictions
predictions = predict_job.get_result_when_complete()

# Display prediction results
results = pd.concat(
    [predictions.drop("row_id", axis=1), df_score.drop("ALERT", axis=1)], axis=1
)
results.head()
                          0         1         2         3          4
prediction                0         0         0         1          1
positive_probability      0         0         0         0.120918   0.407422
prediction_threshold      0.03      0.03      0.03      0.03       0.03
class_0.0                 1         1         1         0.879082   0.592578
class_1.0                 0         0         0         0.120918   0.407422
SAR                       0         0         0         1          1
kycRiskScore              2         3         3         2          2
income                    54400     100100    59800     41100      52100
tenureMonths              1437053
creditScore               681       702       681       718        704
indCustReqRefund90d       1         1         1         1          1
totalRefundsToCust90d     30.86     65.72     25.34     2828.51    2778.84
nbrPaymentsCashLike90d    0         0         0         2          4
maxRevolveLine            10000     13000     14000     15000      11000
indOwnsHome               0         0         1         1          1
nbrInquiries1y            4         1         4         3          3
nbrCollections3y          0         0         0         0          0
nbrWebLogins90d           8         6         8         6          3
nbrPointRed90d            2         0         1         1          2
PEP                       0         0         0         0          0

Look at the results above. Since this is a binary classification problem:

  • As the positive_probability approaches zero, the row is a stronger candidate for class_0 with prediction value of 0 (the alert is not SAR).
  • As positive_probability approaches one, the outcome is more likely to be of class_1 with prediction value of 1 (the alert is SAR).

From the KDE (Kernel Density Estimate) plot below, you can see that this sample of the data is weighted more strongly toward class_0 (the alert is not SAR); the Probability Density for predictions is close to actuals.

In [11]:

plt_kde = results[["positive_probability", "SAR"]].plot.kde(
    xlim=(0, 1), title="Prediction Distribution"
)
prediction distribution
In [12]:

# Prepare Prediction Explanations
pe_init = dr.PredictionExplanationsInitialization.create(project.id, model_rec.id)
pe_init.wait_for_completion()

Computing Prediction Explanations is a resource-intensive task. You can set a maximum number of explanations per row and also configure prediction value thresholds to speed up the process.

Considering the prediction distribution above, set the threshold_low to 0.2 and threshold_high to 0.5. This will provide Prediction Explanations only for those extreme predictions where positive_probability is lower than 0.2 or higher than 0.5.

In [13]:

# Compute Prediction Explanations with a custom config
number_of_explanations = 3
pe_comput = dr.PredictionExplanations.create(
    project.id,
    model_rec.id,
    prediction_dataset.id,
    max_explanations=number_of_explanations,
    threshold_low=0.2,
    threshold_high=0.5,
)
pe_result = pe_comput.get_result_when_complete()
explanations = pe_result.get_all_as_dataframe().drop("row_id", axis=1).dropna()
display(explanations.head())
                                     0                        1                  2                        3                                            5
prediction                           0                        0                  0                        1                                            0
class_0_label                        0                        0                  0                        0                                            0
class_0_probability                  1                        1                  1                        0.879082                                     0.98379
class_1_label                        1                        1                  1                        1                                            1
class_1_probability                  0                        0                  0                        0.120918                                     0.01621
explanation_0_feature                totalSpend90d            avgTxnSize90d      totalSpend90d            nbrCustReqRefunds90d                         avgTxnSize90d
explanation_0_feature_value          216.25                   14.92              488.88                   4                                            95.55
explanation_0_label                  1                        1                  1                        1                                            1
explanation_0_qualitative_strength
explanation_0_strength               -3.210206                -3.834376          -3.20076                 -1.514812                                    -0.402981
explanation_1_feature                nbrPaymentsCashLike90d   totalSpend90d      nbrPaymentsCashLike90d   csrNotes                                     csrNotes
explanation_1_feature_value          0                        775.84             0                        billing address plastic replace moneyorder   customer call statement moneyorder
explanation_1_label                  1                        1                  1                        1                                            1
explanation_1_qualitative_strength   ++
explanation_1_strength               -2.971257                -3.261914          -3.031098                -0.708436                                    0.390769
explanation_2_feature                csrNotes                 totalMerchCred90d  totalMerchCred90d        avgTxnSize90d                                nbrPurchases90d
explanation_2_feature_value          card replace statement customer call statement80.471715.6196
explanation_2_label                  1                        1                  1                        1                                            1
explanation_2_qualitative_strength
explanation_2_strength               -2.819563                -2.982999          -2.990864                -0.141831                                    -0.329526

The following code lets you see how often various features are showing up as the top explanation for impacting the probability of SAR.

In [14]:

from functools import reduce

# Create a combined histogram of all the explanations
explanations_hist = reduce(
    lambda x, y: x.add(y, fill_value=0),
    (
        explanations["explanation_{}_feature".format(i)].value_counts()
        for i in range(number_of_explanations)
    ),
)

plt_expl = explanations_hist.plot.bar()

Having seen the model’s Feature Impact insight earlier, the high occurrence of totalSpend90d, overpaymentAmt90d, and totalMerchCred90d as Prediction Explanations is not entirely surprising. These were some of the top-ranked features in the impact chart.

Deploy a model and monitor performance

The DataRobot platform offers a wide variety of deployment methods, among which the most direct route is deploying a model from the Leaderboard. When you create a deployment from the Leaderboard, DataRobot automatically creates a model package for the deployed model. You can access the model package at any time in the Model Registry. For more details, see the documentation for deploying from the Leaderboard. The programmatic alternative to create deployments can be implemented by the code below.

DataRobot will continuously monitor the model deployed on the dedicated prediction server. With DataRobot MLOps, the modeling team can monitor and manage the alert prioritization model by tracking the distribution drift of the input features as well as the performance deprecation over time.

In [15]:

pred_serv_id = dr.PredictionServer.list()[0].id
deployment = dr.Deployment.create_from_learning_model(
    model_id=model_rec.id,
    label="Anti Money Laundering Alert Scoring",
    description="Anti Money Laundering Alert Scoring",
    default_prediction_server_id=pred_serv_id,
)
deployment
Out [15]:

Deployment(Anti Money Laundering Alert Scoring)

When you select a deployment from the Deployments Inventory, DataRobot opens to the Overview page for that deployment, which provides a model and environment specific summary that describes the deployment, including the information you supplied when creating the deployment and any model replacement activity.

The Service Health tab tracks metrics about a deployment’s ability to respond to prediction requests quickly and reliably. This helps identify bottlenecks and assess capacity, which is critical to proper provisioning.

The Data Drift tab provides interactive and exportable visualizations that help identify the health of a deployed model over a specified time interval.
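
As a sketch, some of the same service-health information can be retrieved with the Python client (method availability depends on the client version):

# Illustrative sketch: pulling service health metrics for the deployment created above.
service_stats = deployment.get_service_stats()
print(service_stats.metrics)  # e.g., total predictions, execution time, error counts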

Implementation risks

When operationalizing this use case, consider the following, which may impact outcomes and require model re-evaluation:

  • Change in the transactional behavior of the money launderers.
  • Novel information introduced into the transaction and customer records that has not been seen by the machine learning models.

Deep dive: Imbalanced targets

In AML and Transaction Monitoring, the SAR rate is usually very low (1%–5%, depending on the detection scenarios); sometimes it could be even lower than 1% in extremely unproductive scenarios. In machine learning, such a problem is called class imbalance. The question becomes, how can you mitigate the risk of class imbalance and let the machine learn as much as possible from the limited known-suspicious activities?

DataRobot offers different techniques to handle class imbalance problems. Some techniques:

  • Evaluate the model with different metrics. For binary classification (the false positive reduction model here, for example), LogLoss is used as the default metric to rank models on the Leaderboard. Since the rule-based system is often unproductive, which leads to very low SAR rate, it’s reasonable to take a look at a different metric, such as the SAR rate in the top 5% of alerts in the prioritization list. The objective of the model is to assign a higher prioritization score with a high risk alert, so it’s ideal to have a higher rate of SAR in the top tier of the prioritization score. In the example shown in the image below, the SAR rate in the top 5% of prioritization score is more than 70% (original SAR rate is less than 10%), which indicates that the model is very effective in ranking the alert based on the SAR risk.
  • DataRobot also provides flexibility for modelers when tuning hyperparameters, which can also help with the class imbalance problem. In the example below, the Random Forest Classifier is tuned by enabling balance_bootstrap (randomly sample an equal number of SAR and non-SAR alerts for each decision tree in the forest); you can see that the validation score of the new ‘Balanced Random Forest Classifier’ model is slightly better than that of the parent model.
  • You can also use Smart Downsampling (from the Advanced Options tab) to intentionally downsample the majority class (i.e., non-SAR alerts) in order to build faster models with similar accuracy.
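
As a sketch of the Smart Downsampling option applied at project setup time, i.e., in place of the earlier analyze_and_model call (parameter availability may vary by datarobot client version):

# Illustrative sketch: enabling Smart Downsampling when starting Autopilot.
advanced = dr.AdvancedOptions(
    smart_downsampled=True,           # downsample the majority (non-SAR) class
    majority_downsampling_rate=50.0,  # keep roughly half of the non-SAR alerts
)
project.analyze_and_model(
    target="SAR",
    mode=dr.AUTOPILOT_MODE.QUICK,
    advanced_options=advanced,
    worker_count=-1,
)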

Deep Dive: Decision process

A review process typically consists of a deep-dive analysis by investigators. The data related to the case is made available for review so that the investigators can develop a 360-degree view of the customer, including their profile, demographic, and transaction history. Additional data from third-party data providers, and web crawling, can supplement this information to complete the picture.

For transactions that do not get auto-closed or auto-removed, the model can help the compliance team create a more effective and efficient review process by triaging their reviews. The predictions and their explanations also give investigators a more holistic view when assessing cases.

Risk-based alert triage

Based on the prioritization score, the investigation team could take different investigation strategies. For example:

  • No-risk or low-risk alerts can be reviewed on a quarterly basis, instead of monthly. The frequently alerted entities without any SAR risk can then be reviewed once every three months, which will significantly reduce the time of investigation.
  • High-risk alerts with higher prioritization scores can have their investigation fast-tracked to the final stage in the alert escalation path. This will significantly reduce the effort spent on level 1 and level 2 investigation.
  • Medium-risk alerts can use standard investigation process.

Smart alert assignment

For an alert investigation team that is geographically dispersed, the alert prioritization score can be used to assign alerts to different teams in a more effective manner. High-risk alerts can be assigned to the team with the most experienced investigators while low risk alerts can be handled by a less experienced team. This mitigates the risk of missing suspicious activities due to lack of competency with alert investigations.

For both approaches, the definition of high/medium/low risk could be either a set of hard thresholds (for example, High: score>=0.5, Medium: 0.5>score>=0.3, Low: score<0.3), or based on the percentile of the alert scores on a monthly basis (for example, High: above 80th percentile, Medium: between 50th and 80th percentile, Low: below 50th percentile).


Tackling Churn with AI – Before Modelling
https://www.datarobot.com/ai-accelerators/tackling-churn-with-ai-before-modelling/
Wed, 21 Feb 2024 14:57:11 +0000

This accelerator will teach the problem framing and data management steps required before modelling begins. We will use two examples to illustrate concepts: a B2C retail example, and a B2B example based on DataRobot's internal churn model.

Customer retention is central to any successful business and machine learning is frequently proposed as a way of addressing churn. It is tempting to dive right into a churn dataset, but improving outcomes requires correctly framing the problem. Doing so at the start will determine whether the business can take action based on the trained model and whether your hard work is valuable or not.

This blog will teach the problem framing and data management steps required before modelling begins. We will use two examples to illustrate concepts: a B2C retail example and a B2B example based on DataRobot’s internal churn model.

One of the fundamental misconceptions about modelling churn is that a good churn model will reduce churn. Even an excellent model will have no impact on churn by itself. It will just correctly identify at-risk customers. It is the consumers of the model's predictions who take action to retain customers. In fact, if a churn model perfectly predicts which customers will leave, it means the interventions had no impact on customer retention.

Sometimes these interventions can be automated, like triggering an email with a discount. Often it is a person who decides whether and how to intervene. This means that as we build a churn model, we need our end users to trust the model and consider its recommendations in their actions. Keeping our end users in mind is a theme that will be present throughout our blog series as we demonstrate how to build a useful churn model.

Problem Framing

To frame this problem, we need to identify stakeholders, create three business definitions, and then decide on the consumption plan. Stakeholders will help with every section, which is why we need to identify them first.

Identify stakeholders

Because our end users are so critical to the success of the churn modelling project, it is important to identify the right people. To identify them, you can ask: who cares about this? Who is responsible for reducing churn? Who will take action once the model identifies a high-risk customer? Bring these stakeholders in early. They need to trust the results, or else they might ignore them. Their feedback can often provide ideas for feature engineering, improve data quality, and make the model actionable for the business.

Define churn

The next step is to define churn, which your stakeholders can help do. This tends to be segmented by the business model. Note that within a company, it is possible to have multiple business models for different revenue streams, so your definition of churn may need to vary by product or service offering!

For subscription-based business models, this is typically whether a customer renews their subscription. In our B2B example, customers typically have annual subscriptions, so our churn definition uses whether they renew their contract. Revenue could also be a factor in this definition, where any downsell might also be considered churn.

In retail-like business models, where customers make individual purchases (not limited to retail businesses, this applies to other industries too!), this is typically related to whether a customer makes a purchase in some window of time. In our B2C example, churn is defined as a customer who does not make any purchases in the next 3 months. This could be the next 6 months, or the next 30 days, etc. It also could use a revenue threshold, where a customer purchases at least $50 worth of products or services. Ultimately your stakeholders will have the best idea on what definition will be most valuable to the business, which is why it’s important that they sign off on this definition.

Population

We also need to define our population, which will impact who we end up training and making predictions on. Determining the boundaries of this population requires understanding the business goals and listening to stakeholders. There may be a specific group of customers that the business cares about retaining, such as mid-sized and large companies. Maybe there is one product that is particularly important. Or there could be no differentiation, and the business wants to predict on all customers. Ultimately, your stakeholders will make this decision, which is why it’s important to discuss this with them.

In the B2C example, the focus is on retaining customers who had placed an order in the previous 3 months. This was a commonly used definition for various metrics throughout the business. Using a common definition pays dividends down the churn-modelling road because the model will align with existing processes and analyses.

In the B2B example, the population of interest started as customers with a managed cloud subscription. We used this restriction because we had more data available for those customers. Over time, this definition was expanded to include all other customers. This approach allowed us to solve for the easier population first and prove value quickly. Then we could address the population which is harder to predict.

Prediction point

Finally, we need to define our prediction point. This is the time at which we will make predictions about churn. If you plan to operationalize your model (make predictions on new customers), it is important this aligns to the time at which you make predictions in production. Remember that the end goal is to prevent churn, so these prediction points need to be early enough for your stakeholders to intervene and prevent churn. These prediction points should also be spaced out far enough that there is a meaningful chance for churn risk to change. If your customers typically make one purchase a week, then making a new prediction every day is unlikely to add much value beyond a model which predicts once a week. The simpler your model is, the easier it will be to build and consume!

In the B2C example, the model is used to make predictions every month. The prediction point is the first day of each month and on that date the model predicts the probability that each customer in the population of interest (those who had placed an order in the preceding three months) will not place an order in the following three months, and therefore will churn.

In the B2B example, the prediction point is every four weeks, up to 36 weeks prior to the renewal date. This gives one prediction every month for the 9 months before renewal.

Model consumption strategy

Before diving into the data, it is important to have line of sight into how the business will use the churn model. This will impact modeling choices, such as how strictly we need to follow the prediction point and whether we can include features which would be difficult to use in production.

One method of consumption, and a good first objective, is to surface insights in order to reduce churn across the entire customer base. As with most data science projects, it makes sense to begin with exploration. Often a thorough understanding of the problem immediately surfaces potential solutions. A good model will present insights which might, for example, uncover regular churn patterns. Presented to relevant stakeholders, these insights may lead to suggested changes to the product in order to divert customers from those patterns. In this way, model insights can be useful to understanding and addressing churn at an aggregate level. This approach is easier and faster to implement, but likely will have more limited ROI, as it does not provide individual churn predictions for each customer.

Second is operationalizing the model to make new predictions in production. This gives each customer their own churn risk and allows end users to prioritize interventions for those which are more likely to churn. For this to be actionable, concrete and cost-effective churn prevention actions are needed. Can we add customer support to the account? Can we offer a promotion? This is why talking to stakeholders at the beginning is important. It teaches us what interventions are possible to prevent churn.

Data Management

With a firm understanding of the problem, we can begin building our training dataset. The first step is to set our prediction point and sampling strategy.

Prediction point and sampling

The most common mistake at this stage is to accidentally train the model on data from after the prediction point. This leads to look-ahead bias. A model trained on data from after the prediction point will have lower accuracy in production than it did in validation, because it no longer has access to data from the future (relative to the prediction point). This is why the first step is to create the relevant prediction points for each customer. For example, the B2B example uses a prediction point of every 4 weeks leading up to the renewal date, up to 36 weeks (9 months) prior to the renewal date. The SQL code below shows an example of how you can create each row in the dataset using this framing.

In [ ]:

#| code-fold: true
#| code-summary: "Show code"
#| output: false

with weeks as (
    select 
        row_number() over (order by seq4()) * 4 as n
    from table(generator(rowcount => 9))
)
select
    r.opportunity_id,
    r.renewal_week,
    dateadd('week', -weeks.n, r.renewal_week) as pred_point,
    weeks.n as weeks_to_renewal
from renewals as r
cross join weeks

If the dataset is large enough, the training dataset can be reduced to one row per customer. This is the recommended approach, as it will make partitioning easier and ensure each customer is equally weighted in the dataset. In this case, we randomly choose a valid prediction point for each customer. Using the B2B example, we would randomly choose one of the 9 months for each customer.
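As a quick illustration (not part of the original post), here is a minimal pandas sketch of this sampling step, assuming the prediction-point table produced by the query above has been loaded into a DataFrame; the file name and column names are assumptions:

In [ ]:

import pandas as pd

# Assumption: one row per (opportunity_id, pred_point), e.g. the output of the
# cross-join query above exported to a file or loaded via pd.read_sql.
pred_points = pd.read_csv("b2b_prediction_points.csv")  # hypothetical file

# Keep one randomly chosen prediction point per customer so that every
# customer carries equal weight in the training dataset.
one_row_per_customer = (
    pred_points.groupby("opportunity_id", group_keys=False)
    .apply(lambda g: g.sample(n=1, random_state=42))
    .reset_index(drop=True)
)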

In the B2C example, the prediction point is the start of every month. We chose to keep multiple rows per customer, as the dataset was not large enough to develop confident models without them. When using multiple rows per customer, it is important to either use grouped partitioning (grouping on customers such that all rows from one customer are in the same partition) or Out-of-Time Validation. This prevents leakage across the partitions, where a model can learn a specific customer’s behavior.
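For a sense of what grouped partitioning can look like outside of DataRobot, here is a hedged sketch using scikit-learn's GroupShuffleSplit so that all rows for a given customer land on the same side of the split; the DataFrame and column names are assumptions, not the original project's code:

In [ ]:

import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Assumption: df holds one row per (customer_id, pred_point) with a churn target.
df = pd.read_csv("b2c_training_data.csv")  # hypothetical file

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, valid_idx = next(splitter.split(df, groups=df["customer_id"]))
train, valid = df.iloc[train_idx], df.iloc[valid_idx]

# No customer appears in both partitions, so the model cannot simply
# memorize an individual customer's behavior.
assert set(train["customer_id"]).isdisjoint(set(valid["customer_id"]))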

Target creation

Now we can pull in our definition of churn to create the target. Remember to use the definition relative to the prediction point. In the B2C example, the target is whether the customer made any purchases in the next quarter.

In [ ]:

#| code-fold: true
#| code-summary: "Show code"
#| output: false

with customers as (
    select 
        customer_id,
        min(event_date) as first_purchase
    from events
    group by 1
),
customer_months as (
    select
        c.customer_id,
        dc.date_actual as month_start
    from customers as c
    cross join daily_calendar as dc
    where dc.date_actual = dc.first_day_of_month
        and dc.date_actual > c.first_purchase
),
customer_monthly_purchases as (
    select
        c.customer_id,
        c.month_start,
        count(e.id) as monthly_number_of_purchases
    from customer_months as c
    left join events as e on c.customer_id = e.customer_id
        and c.month_start = date_trunc('month', e.event_date)::date
    where c.month_start < current_date - interval '3 months'
    group by 1, 2 
),
base_table as (
    select 
        customer_id, 
        month_start as pred_point,
        sum(monthly_number_of_purchases) over (
            partition by customer_id 
            order by month_start 
            rows between 3 preceding and 1 preceding
        ) as number_of_purchases_last_3_months,
        sum(monthly_number_of_purchases) over (
            partition by customer_id 
            order by month_start 
            rows between current row and 2 following
        ) as number_of_purchases_next_3_months,
        (number_of_purchases_next_3_months = 0)::int as churn
    from customer_monthly_purchases
)
select customer_id, pred_point, churn
from base_table
where number_of_purchases_last_3_months > 0
limit 5;

This code generates the primary dataset with CHURN=1 if the customer did not place an order in the upcoming three months, and 0 if they did. Including PREDICTION_POINT in this primary table is important because the training dataset will often comprise multiple prediction points. This is useful both to increase the size of the training dataset and to help the model account for seasonality. DataRobot feature engineering will also rely on the PREDICTION_POINT field to avoid look-ahead bias.

The B2B model was set up to predict whether a customer would sign a renewal on their renewal date. Again, creating prediction points was necessary to avoid look-ahead bias just like in the B2C example. In this case, though, predictions would be made more frequently and always in reference to that renewal date, e.g. 4 weeks from renewal or 32 weeks out. This way the model could be trained on how different features impact churn probability at different times in the customer lifecycle.

Data sources

It is not always obvious what data will be predictive of churn, so exploring multiple datasets is worthwhile. Data on product and service consumption is important. Some other datasets to consider are purchase history, customer demographic data, customer surveys, and interactions with customer support. In the B2C example, we used data on customer reviews as well as refunds issued.

One way to uncover valuable insights is to include data on actions controllable by the business. If a promotion or a specific marketing campaign turns out to be predictive of churn or retention, that is a quick action item to share with stakeholders. Just ensure these actions were taken before the prediction point, rather than in response to a perceived churn risk.

Listen to your stakeholders about their beliefs on what drives churn or retention and include that data when it is available. This can go a long way towards building their trust in the model. If your model validates their beliefs, it shows evidence that it is learning relevant behavior. On the contrary, if the model refutes one of their beliefs, this can spur a conversation about it. There might be bad data in your dataset, or maybe the feature you created does not accurately represent what they think is a driver. It could also be proof that their belief is wrong, which can foster a deeper understanding of churn risk at the company. These discussions and further data investigation are the key to finding out why.

At the end of the day, start with whatever data is easily accessible and build models with that. Showing value to the business quickly is more important than exploring every dataset possible.

Feature engineering

Merging all of your disparate data into one table may sound daunting. DataRobot's automated feature engineering can help in a number of ways: it accelerates data preparation for churn modelling by joining data from disparate datasets, automatically generating a wide variety of features across those datasets, and removing features that have little or no relation to churn. Crucially, DataRobot also uses time-aware feature engineering to avoid the look-ahead bias described earlier.

If you prefer to build the dataset outside of DataRobot, make sure your joins are aware of the prediction point, not just the customer ID. In the B2B example, we made heavy use of window functions to create features over a specific period of time. For example, we can join a usage table once but create multiple feature derivation windows, such as number of projects created in the last 4 weeks, last 12 weeks, etc. The SQL below demonstrates how to do this.

In [ ]:

#| code-fold: true
#| code-summary: "Show code"
#| output: false

with weekly_usage_data as (
       select
              a.account_id,
              date_trunc('week', c.date_actual)::date as week_start,
              sum(u.projects_created) as projects_created
       from accounts as a
       inner join daily_calendar as c on a.customer_since_date <= c.date_actual
              and current_date >= c.date_actual
       left join usage_data as u on a.account_id = u.account_id
              and c.date_actual = u.activity_date
       group by 1, 2
)
select
       account_id,
       week_start,
       sum(projects_created) over (partition by account_id
                                   order by week_start
                                   rows between 12 preceding and 1 preceding) as projects_created_last_12_weeks,
       sum(projects_created) over (partition by account_id
                                   order by week_start
                                   rows between 4 preceding and 1 preceding) as projects_created_last_4_weeks
from weekly_usage_data

Ultimately you can be as creative as you want. Just make sure your features are interpretable to the business. They will have ownership of making decisions from what the model recommends, so it is important that they understand how the model makes its predictions.

With our problem well-framed and our dataset created, we are in good shape to begin modelling. Look for Part 2 of this three-part series for a discussion of model training and evaluation.

Get Started with Free Trial

Experience new features and capabilities previously only available in our full AI Platform product.

The post Tackling Churn with AI – Before Modelling appeared first on DataRobot AI Platform.

]]>
GenAI: Automating Product Feedback Reports Using Generative AI and DataRobot https://www.datarobot.com/ai-accelerators/genai-automating-product-feedback-reports-using-generative-ai-and-datarobot/ Fri, 16 Feb 2024 16:10:22 +0000 https://www.datarobot.com/?post_type=aiaccelerator&p=53460 This accelerator shows how to use Predictive AI models in tandem with Generative AI models and overcome the limitation of guardrails around automating summarization/segmentation of sentiment text. In a nutshell, it consumes product reviews and ratings and outputs a Design Improvement Report.

The post GenAI: Automating Product Feedback Reports Using Generative AI and DataRobot appeared first on DataRobot AI Platform.

]]>
Going through customer review comments to generate insights for product development teams is a time-intensive and costly affair. This notebook illustrates how to use DataRobot and generative AI to derive critical insights from customer reviews and automatically create improvement reports that help product teams in their development cycles.

DataRobot provides robust Natural Language Processing capabilities. Using DataRobot models instead of plain summarization on customer reviews lets you extract the keywords that are most strongly correlated with review sentiment. Generative AI can then wrap this impactful keyword list in natural-language context for the benefit of end users. The DataRobot AI Platform acts as a guardrail mechanism that traditional text summarization lacks.

Setup

Install required libraries and dependencies

In [ ]:
!pip install "langchain==0.0.244" \
             "openai==0.27.8" \
             "datasets==2.11.0" \
             "fpdf==1.7.2"

Import libraries

In [ ]:

import json
import os
import warnings

import datarobot as dr
from fpdf import FPDF
from langchain.chains import LLMChain
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import BaseOutputParser
import numpy as np
import pandas as pd

warnings.filterwarnings("ignore")

Configuration

Set up the configuration required for a secure connection to the generative AI model. This notebook assumes you have an OpenAI API key, but you can modify it to work with any other hosted LLM, as the process remains the same.

In [ ]:

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
OPENAI_ORGANIZATION = os.environ["OPENAI_ORGANIZATION"]
OPENAI_API_BASE = os.environ["OPENAI_BASE"]
OPENAI_DEPLOYMENT_NAME = os.environ["OPENAI_DEPLOYMENT_NAME"]
OPENAI_API_VERSION = os.environ["OPENAI_API_VERSION"]
OPENAI_API_TYPE = os.environ["OPENAI_API_TYPE"]
In [ ]:

"""with open("/home/notebooks/storage/settings.yaml", 'r') as stream:
    config = yaml.safe_load(stream)
OPENAI_API_KEY = config['OPENAI_API_KEY']
OPENAI_ORGANIZATION = config['OPENAI_ORGANIZATION']
OPENAI_API_BASE = config['OPENAI_BASE']
OPENAI_DEPLOYMENT_NAME = config['OPENAI_DEPLOYMENT_NAME']
OPENAI_API_VERSION = config['OPENAI_API_VERSION']
OPENAI_API_TYPE = config['OPENAI_API_TYPE']"""
Out [ ]:

'with open("/home/notebooks/storage/settings.yaml", \'r\') as stream:\n    config = yaml.safe_load(stream)\nOPENAI_API_KEY = config[\'OPENAI_API_KEY\']\nOPENAI_ORGANIZATION = config[\'OPENAI_ORGANIZATION\']\nOPENAI_API_BASE = config[\'OPENAI_BASE\']\nOPENAI_DEPLOYMENT_NAME = config[\'OPENAI_DEPLOYMENT_NAME\']\nOPENAI_API_VERSION = config[\'OPENAI_API_VERSION\']\nOPENAI_API_TYPE = config[\'OPENAI_API_TYPE\']'

Functions and utilities

The cell below outlines the functions to accomplish the following:

  • Extract high impact review keywords from product reviews using DataRobot.
  • During keyword extraction, implement guardrails for selecting models with higher AUC to make sure keywords are robust and correlated to the review sentiment.
  • Generate product development recommendations for the final report.

LLM Parameters: Read the reference documentation for all Azure OpenAI parameters and how they affect output.

In [ ]:

class JsonOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a Json list."""

    def parse(self, text: str):
        """Parse the output of an LLM call."""
        return json.loads(text)


def get_review_keywords(product_id):
    """Parse the Word Cloud from DataRobot AutoML model and generate the text input for the LLM."""

    keywords = ""
    product = product_subset[product_subset.product_id == product_id]
    product["review_text_full"] = (
        product["review_headline"] + " " + product["review_body"]
    )
    product["review_class"] = np.where(product.star_rating < 3, "bad", "good")
    project = dr.Project.create(
        product[["review_class", "review_text_full"]],
        project_name=product["product_title"].iloc[0],
    )

    """Creates a DataRobot AutoML NLP project with review text"""
    project.analyze_and_model(
        target="review_class",
        mode=dr.enums.AUTOPILOT_MODE.QUICK,
        worker_count=20,
        positive_class="good",
    )
    project.wait_for_autopilot()
    model = project.recommended_model()
    """logic to accept word ngram models and not char ngram models."""
    if max([1 if proc.find("word") != -1 else 0 for proc in model.processes]) == 0:
        models = project.get_models(order_by="-metric")
        for m in models:
            if max([1 if proc.find("word") != -1 else 0 for proc in m.processes]) == 1:
                model = m
                break
    word_cloud = model.get_word_cloud()
    word_cloud = pd.DataFrame(word_cloud.ngrams_per_class()[None])
    word_cloud.sort_values(
        ["coefficient", "frequency"], ascending=[True, False], inplace=True
    )
    # keywords = '; '.join(word_cloud.head(50)['ngram'].tolist())

    """Guardrail to accept higher accuracy models, as it means the wordclouds contain \
    impactful and significant terms only """
    if model.metrics["AUC"]["crossValidation"] > 0.75:
        keywords = "; ".join(word_cloud[word_cloud.coefficient < 0]["ngram"].tolist())
    return keywords

template = f"""
    You are a product designer. A user will pass top keywords from negative customer reviews. \
    Using the keywords list, \
    provide multiple design recommendations based on the keywords to improve the sales of the product.
    Use only top 10 keywords per design recommendation.\
    
    Output Format should be json with fields recommendation_title, recommendation_description, keyword_tags"""

system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt]
)
chain = LLMChain(
    llm=AzureChatOpenAI(
        deployment_name=OPENAI_DEPLOYMENT_NAME,
        openai_api_type=OPENAI_API_TYPE,
        openai_api_base=OPENAI_API_BASE,
        openai_api_version=OPENAI_API_VERSION,
        openai_api_key=OPENAI_API_KEY,
        openai_organization=OPENAI_ORGANIZATION,
        model_name=OPENAI_DEPLOYMENT_NAME,
        temperature=0,
        verbose=True,
    ),
    prompt=chat_prompt,
    output_parser=JsonOutputParser(),
)

Import data

This accelerator uses the publicly available Amazon Reviews dataset in this workflow. This example uses a subset of products from the Home Electronics line. The full public dataset can be found here.

You can also read the individual parquet files:

In [ ]:

product_subset1 = pd.read_parquet(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/amazon_us_reviews-train-00000-of-00002.parquet"
)
product_subset2 = pd.read_parquet(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/amazon_us_reviews-train-00001-of-00002.parquet"
)
product_subset = pd.concat([product_subset1, product_subset2], axis=0)
product_subset.info()
Out [ ]:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 705889 entries, 0 to 31888
Data columns (total 15 columns):
 #   Column             Non-Null Count   Dtype 
---  ------             --------------   ----- 
 0   marketplace        705889 non-null  object
 1   customer_id        705889 non-null  object
 2   review_id          705889 non-null  object
 3   product_id         705889 non-null  object
 4   product_parent     705889 non-null  object
 5   product_title      705889 non-null  object
 6   product_category   705889 non-null  object
 7   star_rating        705889 non-null  int32 
 8   helpful_votes      705889 non-null  int32 
 9   total_votes        705889 non-null  int32 
 10  vine               705889 non-null  int64 
 11  verified_purchase  705889 non-null  int64 
 12  review_headline    705889 non-null  object
 13  review_body        705889 non-null  object
 14  review_date        705889 non-null  object
dtypes: int32(3), int64(2), object(10)
memory usage: 78.1+ MB
In [ ]:

product_subset.head()

Out [ ]:

[Output: the first five rows of product_subset (transposed), showing the columns marketplace, customer_id, review_id, product_id, product_parent, product_title, product_category, star_rating, helpful_votes, total_votes, vine, verified_purchase, review_headline, review_body, and review_date for a sample of Home Entertainment product reviews.]

Report generation loop

This programmatic loop runs through the product list and generates the final report.

In [ ]:
from datetime import datetime
In [ ]:
product_list = ["B000204SWE", "B00EUY59Z8", "B006U1YUZE", "B00752R4PK", "B004OF9XGO"]

pdf = FPDF()
for product_id in product_list:
    print(
        "product id:",
        product_id,
        "started:",
        datetime.now().strftime("%m-%d-%y %H:%M:%S"),
    )
    keywords = get_review_keywords(product_id)
    """ Guardrail to generate report only if there are enough \
        Keywords to provide results"""
    if len(keywords) > 10:
        # report = chain.run(keywords)['recommendations']
        report = chain.run(keywords)
        if type(report) != list:
            report = chain.run(keywords)["recommendations"]
        product_name = product_subset[product_subset.product_id == product_id][
            "product_title"
        ].iloc[0]
        print("Adding to report")
        pdf.add_page()
        pdf.set_font("Arial", "B", 20)
        pdf.multi_cell(w=0, h=10, txt=product_name)
        for reco in report:
            pdf.cell(w=0, h=7, txt="\n", ln=1)
            pdf.set_font("Arial", "B", 14)
            pdf.multi_cell(w=0, h=7, txt=reco["recommendation_title"])
            pdf.set_font("Arial", "", 14)
            pdf.multi_cell(w=0, h=7, txt=reco["recommendation_description"])
            pdf.set_font("Arial", "I", 11)
            pdf.multi_cell(
                w=0, h=5, txt="Review Keywords: " + ", ".join(reco["keyword_tags"])
            )
    print(
        "product id:",
        product_id,
        "completed:",
        datetime.now().strftime("%m-%d-%y %H:%M:%S"),
    )
pdf.output(f"/home/notebooks/storage/product_development_insights.pdf", "F")

Download the report

Download the pdf named “product_development_insights.pdf” at “/home/notebooks/storage/” or from the notebook files tab in the UI Panel.

In [ ]:

from IPython import display

display.Image(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/images/image_report.jpg",
    width=800,
    height=400,
)

Out [ ]:

Output

Conclusion

This accelerator demonstrates how you can use DataRobot and generative AI to identify key patterns in customer reviews and create reports or work items that can be used by product development teams to improve their products and offerings. Using various prompts you can steer the LLM into much more complex outputs like Agile stories, development plans, and more.

Get Started with Free Trial

Experience new features and capabilities previously only available in our full AI Platform product.

The post GenAI: Automating Product Feedback Reports Using Generative AI and DataRobot appeared first on DataRobot AI Platform.

]]>
6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them https://www.datarobot.com/blog/6-reasons-why-generative-ai-initiatives-fail-and-how-to-overcome-them/ Thu, 08 Feb 2024 14:17:53 +0000 https://www.datarobot.com/?post_type=blog&p=53330 There are six common roadblocks to proving business value with generative AI — and we’ll show you how to steer clear of each one.

The post 6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them appeared first on DataRobot AI Platform.

]]>
If you’re an AI leader, you might feel like you’re stuck between a rock and a hard place lately. 

You have to deliver value from generative AI (GenAI) to keep the board happy and stay ahead of the competition. But you also have to stay on top of the growing chaos, as new tools and ecosystems arrive on the market. 

You also have to juggle new GenAI projects, use cases, and enthusiastic users across the organization. Oh, and data security. Your leadership doesn’t want to be the next cautionary tale of good AI gone bad. 

If you’re being asked to prove ROI for GenAI but it feels more like you’re playing Whack-a-Mole, you’re not alone. 

According to Deloitte, proving AI’s business value is the top challenge for AI leaders. Companies across the globe are struggling to move past prototyping to production. So, here’s how to get it done — and what you need to watch out for.  

6 Roadblocks (and Solutions) to Realizing Business Value from GenAI

Roadblock #1. You Set Yourself Up For Vendor Lock-In 

GenAI is moving crazy fast. New innovations — LLMs, vector databases, embedding models — are being created daily. So getting locked into a specific vendor right now doesn’t just risk your ROI a year from now. It could literally hold you back next week.  

Let’s say you’re all in on one LLM provider right now. What if costs rise and you want to switch to a new provider or use different LLMs depending on your specific use cases? If you’re locked in, getting out could eat any cost savings that you’ve generated with your AI initiatives — and then some. 

Solution: Choose a Versatile, Flexible Platform 

Prevention is the best cure. To maximize your freedom and adaptability, choose solutions that make it easy for you to move your entire AI lifecycle, pipeline, data, vector databases, embedding models, and more – from one provider to another. 

For instance, DataRobot gives you full control over your AI strategy — now, and in the future. Our open AI platform lets you maintain total flexibility, so you can use any LLM, vector database, or embedding model – and swap out underlying components as your needs change or the market evolves, without breaking production. We even give our customers access to experiment with common LLMs.

Roadblock #2. Off-the-Grid Generative AI Creates Chaos 

If you thought predictive AI was challenging to control, try GenAI on for size. Your data science team likely acts as a gatekeeper for predictive AI, but anyone can dabble with GenAI — and they will. Where your company might have 15 to 50 predictive models, at scale, you could well have 200+ generative AI models all over the organization at any given time. 

Worse, you might not even know about some of them. “Off-the-grid” GenAI projects tend to escape leadership purview and expose your organization to significant risk. 

While this enthusiastic use of AI might sound like a recipe for greater business value, in fact, the opposite is often true. Without a unifying strategy, GenAI can create soaring costs without delivering meaningful results. 

Solution: Manage All of Your AI Assets in a Unified Platform

Fight back against this AI sprawl by getting all your AI artifacts housed in a single, easy-to-manage platform, regardless of who made them or where they were built. Create a single source of truth and system of record for your AI assets — the way you do, for instance, for your customer data. 

Once you have your AI assets in the same place, then you’ll need to apply an LLMOps mentality: 

  • Create standardized governance and security policies that will apply to every GenAI model. 
  • Establish a process for monitoring key metrics about models and intervening when necessary.
  • Build feedback loops to harness user feedback and continuously improve your GenAI applications. 

DataRobot does this all for you. With our AI Registry, you can organize, deploy, and manage all of your AI assets in the same location – generative and predictive, regardless of where they were built. Think of it as a single source of record for your entire AI landscape – what Salesforce did for your customer interactions, but for AI. 

Roadblock #3. GenAI and Predictive AI Initiatives Aren’t Under the Same Roof

If you’re not integrating your generative and predictive AI models, you’re missing out. The power of these two technologies put together is a massive value driver, and businesses that successfully unite them will be able to realize and prove ROI more efficiently.

Here are just a few examples of what you could be doing if you combined your AI artifacts in a single unified system:  

  • Create a GenAI-based chatbot in Slack so that anyone in the organization can query predictive analytics models with natural language (Think, “Can you tell me how likely this customer is to churn?”). By combining the two types of AI technology, you surface your predictive analytics, bring them into the daily workflow, and make them far more valuable and accessible to the business.
  • Use predictive models to control the way users interact with generative AI applications and reduce risk exposure. For instance, a predictive model could stop your GenAI tool from responding if a user gives it a prompt that has a high probability of returning an error or it could catch if someone’s using the application in a way it wasn’t intended.  
  • Set up a predictive AI model to inform your GenAI responses, and create powerful predictive apps that anyone can use. For example, your non-tech employees could ask natural language queries about sales forecasts for next year’s housing prices, and have a predictive analytics model feeding in accurate data.   
  • Trigger GenAI actions from predictive model results. For instance, if your predictive model predicts a customer is likely to churn, you could set it up to trigger your GenAI tool to draft an email that will go to that customer, or a call script for your sales rep to follow during their next outreach to save the account. 

However, for many companies, this level of business value from AI is impossible because they have predictive and generative AI models siloed in different platforms. 

Solution: Combine your GenAI and Predictive Models 

With a system like DataRobot, you can bring all your GenAI and predictive AI models into one central location, so you can create unique AI applications that combine both technologies. 

Not only that, but from inside the platform, you can set and track your business-critical metrics and monitor the ROI of each deployment to ensure their value, even for models running outside of the DataRobot AI Platform.

Roadblock #4. You Unknowingly Compromise on Governance

For many businesses, the primary purpose of GenAI is to save time — whether that’s reducing the hours spent on customer queries with a chatbot or creating automated summaries of team meetings. 

However, this emphasis on speed often leads to corner-cutting on governance and monitoring. That doesn’t just set you up for reputational risk or future costs (when your brand takes a major hit as the result of a data leak, for instance.) It also means that you can’t measure the cost of or optimize the value you’re getting from your AI models right now. 

Solution: Adopt a Solution to Protect Your Data and Uphold a Robust Governance Framework

To solve this issue, you’ll need to implement a proven AI governance tool ASAP to monitor and control your generative and predictive AI assets. 

A solid AI governance solution and framework should include:

  • Clear roles, so every team member involved in AI production knows who is responsible for what
  • Access control, to limit data access and permissions for changes to models in production at the individual or role level and protect your company’s data
  • Change and audit logs, to ensure legal and regulatory compliance and avoid fines 
  • Model documentation, so you can show that your models work and are fit for purpose
  • A model inventory to govern, manage, and monitor your AI assets, irrespective of deployment or origin

Current best practice: Find an AI governance solution that can prevent data and information leaks by extending LLMs with company data.

The DataRobot platform includes these safeguards built-in, and the vector database builder lets you create specific vector databases for different use cases to better control employee access and make sure the responses are super relevant for each use case, all without leaking confidential information.

Roadblock #5. It’s Tough To Maintain AI Models Over Time

Lack of maintenance is one of the biggest impediments to seeing business results from GenAI, according to the same Deloitte report mentioned earlier. Without excellent upkeep, there’s no way to be confident that your models are performing as intended or delivering accurate responses that’ll help users make sound data-backed business decisions.

In short, building cool generative applications is a great starting point — but if you don’t have a centralized workflow for tracking metrics or continuously improving based on usage data or vector database quality, you’ll do one of two things:

  1. Spend a ton of time managing that infrastructure.
  2. Let your GenAI models decay over time. 

Neither of those options is sustainable (or secure) long-term. Failing to guard against malicious activity or misuse of GenAI solutions will limit the future value of your AI investments almost instantaneously.

Solution: Make It Easy To Monitor Your AI Models

To be valuable, GenAI needs guardrails and steady monitoring. You need the AI tools available so that you can track: 

  • Employee and customer-generated prompts and queries over time to ensure your vector database is complete and up to date
  • Whether your current LLM is (still) the best solution for your AI applications 
  • Your GenAI costs to make sure you’re still seeing a positive ROI
  • When your models need retraining to stay relevant

DataRobot can give you that level of control. It brings all your generative and predictive AI applications and models into the same secure registry, and lets you:  

  • Set up custom performance metrics relevant to specific use cases
  • Understand standard metrics like service health, data drift, and accuracy statistics
  • Schedule monitoring jobs
  • Set custom rules, notifications, and retraining settings.

If you make it easy for your team to maintain your AI, you won't start neglecting maintenance over time. 

Roadblock #6. The Costs are Too High – or Too Hard to Track 

Generative AI can come with some serious sticker shock. Naturally, business leaders feel reluctant to roll it out at a sufficient scale to see meaningful results or to spend heavily without recouping much in terms of business value. 

Keeping GenAI costs under control is a huge challenge, especially if you don’t have real oversight over who is using your AI applications and why they’re using them. 

Solution: Track Your GenAI Costs and Optimize for ROI

You need technology that lets you monitor costs and usage for each AI deployment. With DataRobot, you can track everything from the cost of an error to toxicity scores for your LLMs to your overall LLM costs. You can choose between LLMs depending on your application and optimize for cost-effectiveness. 

That way, you’re never left wondering if you’re wasting money with GenAI — you can prove exactly what you’re using AI for and the business value you’re getting from each application. 

Deliver Measurable AI Value with DataRobot 

Proving business value from GenAI is not an impossible task with the right technology in place. A recent economic analysis by the Enterprise Strategy Group found that DataRobot can provide cost savings of 75% to 80% compared to using existing resources, giving you a 3.5x to 4.6x expected return on investment and accelerating time to initial value from AI by up to 83%. 

DataRobot can help you maximize the ROI from your GenAI assets and: 

  • Mitigate the risk of GenAI data leaks and security breaches 
  • Keep costs under control
  • Bring every single AI project across the organization into the same place
  • Empower you to stay flexible and avoid vendor lock-in 
  • Make it easy to manage and maintain your AI models, regardless of origin or deployment 

If you’re ready for GenAI that’s all value, not all talk, start your free trial today. 

Webinar
Reasons Why Generative AI Initiatives Fail to Deliver Business Value

(and How to Avoid Them)

Watch on-demand

The post 6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them appeared first on DataRobot AI Platform.

]]>
GenAI: Observability Starter for HuggingFace Models https://www.datarobot.com/ai-accelerators/observability-starter-for-huggingface-models/ Fri, 26 Jan 2024 12:57:14 +0000 https://www.datarobot.com/?post_type=aiaccelerator&p=53017 This accelerator shows how users can quickly and seamlessly enable LLMOps or Observability in their HuggingFace-based generative AI solutions without the need of code refactoring.

The post GenAI: Observability Starter for HuggingFace Models appeared first on DataRobot AI Platform.

]]>
This accelerator shows how you can easily enable observability in your HuggingFace-based AI Solutions with the DataRobot LLMOps feature tools. It outlines an example of a byte-sized solution in its current state and then uses DataRobot tools to enable observability almost instantly for the solution.

DataRobot provides tools to enable the observability of external generative models. All the hallmarks of DataRobot MLOps are now available for LLMOps.

Setup

Install the prerequisite libraries

This notebook uses one of the many publicly available LLMs on HuggingFace's hub. HuggingFace's models are available via the popular Transformers library, which provides high-level APIs for the most common language modeling tasks and solutions.

In[1]:

!pip install transformers torch py-readability-metrics nltk
In[2]:

!pip install datarobotx[llm] datarobot-mlops datarobot-mlops-connected-client

Current state

The following cells outline the current state of a simple Google T5 text generation model implementation. Google’s T5 is a text sequence to text sequence model and this accelerator uses a distilled version of T5 available on the Huggingface Hub.

This accelerator uses the pipeline object from the Transformers library to build a text generation example. The pipeline object simplifies model inference by abstracting out most of the low level code. To enable observability on this implementation on your own, you would have to write code to take measurements, enable infrastructure to record all the measurements, and codify rules for interventions. This also introduces a lot of technical debt in the organization.

In[ ]:

from transformers import pipeline

checkpoint = "MBZUAI/LaMini-T5-223M"
model = pipeline("text2text-generation", model=checkpoint)

parameters = {"max_length": 512, "do_sample": True, "temperature": 0.1}


def get_completion(user_input, parameters):
    answer = model(user_input, **parameters)
    return answer


response = get_completion("What is Agile in software development?", parameters)
print(response[0]["generated_text"])
Out[ ]:

Downloading (…)lve/main/config.json:   0%|       | 0.00/1.48k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|       | 0.00/892M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|       | 0.00/142 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|       | 0.00/2.32k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|       | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|       | 0.00/2.20k [00:00<?, ?B/s]

Agile is a software development methodology that emphasizes the use of software development as a continuous process, with the goal of delivering high-quality, efficient, and cost-effective software products to customers. It involves breaking down tasks into smaller, more manageable tasks, and delivering them in short, incremental increments, with the goal of delivering high-quality, reliable, and user-friendly software.

Observability with DataRobot

To enable observability on the above T5 model from Huggingface, you first need to create a deployment in DataRobot. This can be done from the GUI or the API based on your preference.

Connect to DataRobot

In[ ]:

# Initialize the DataRobot Client if you are not running this code outside DataRobot platform.
# import datarobot as dr
# dr.Client(endpoint=ENDPOINT,token=TOKEN)

from utilities import create_external_llm_deployment

deployment_id, model_id = create_external_llm_deployment(checkpoint + " External")
deployment_id
Out[ ]:

[nltk_data] Downloading package punkt to /home/notebooks/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.

Downloading (…)lve/main/config.json:   0%|       | 0.00/811 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|       | 0.00/438M [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|       | 0.00/174 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|       | 0.00/232k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|       | 0.00/112 [00:00<?, ?B/s]

'651298b51e720eee4bfdda27'

Initiate monitoring configuration

The cells below declare and initialize monitoring configuration. The monitoring configuration lets DataRobot understand how to interpret the inputs and outputs of the external model. The pipeline object expects text input and named parameters which are configured in the MonitoringConfig object as seen below.

The inputs_parser function lets you capture and store either the entire input or just the essential parts that you prefer.

In[ ]:

from datarobotx.llm.chains.guard import aguard, MonitoringConfig

monitor_config = MonitoringConfig(
    deployment_id=deployment_id,
    model_id=model_id,
    inputs_parser=lambda prompt, parameters: {**{"prompt": prompt}, **parameters},
    output_parser=lambda x: {"answer": x[0]["generated_text"]},
    target="answer",
)
In[ ]:

@aguard(monitor_config)
async def get_completion(user_input, parameters):
    answer = model(user_input, **parameters)
    return answer


response = await get_completion("What is Agile in software development?", parameters)
print(response[0]["generated_text"])
Out[ ]:

Agile is a software development methodology that emphasizes the use of software development as a continuous process, with the goal of delivering high-quality, efficient, and cost-effective software products to customers. It involves breaking down tasks into smaller, more manageable tasks, and delivering them in short, incremental increments, rather than a fixed-size, standardized schedule.
In[ ]:

response = await get_completion("What is a kulbit maneuver?", parameters)
print(response[0]["generated_text"])
Out[ ]:

A kulbit maneuver is a type of maneuver where a person lands on a surface with a curved or curved surface, and then lands on a surface with a curved or curved surface.

Custom metrics

Observability with DataRobot also supports custom user metrics. The following cells show how you can start capturing toxicity in user prompts and readability in generative model responses. In the cell below, add the custom metrics that you want to record to your deployment. Again, this step can be done using the GUI or the API based on user preference.

  • Toxicity in the user prompt
  • Readability (Flesch Score) of the model response
In[ ]:

from utilities import create_custom_metric

TOX_CUSTOM_METRIC_ID = create_custom_metric(
    deployment_id=deployment_id,
    name="Prompt Toxicity",
    baseline="0.1",
    type="average",
    directionality="lowerIsBetter",
)

READ_CUSTOM_METRIC_ID = create_custom_metric(
    deployment_id=deployment_id,
    name="Response Readability",
    baseline="30",
    type="average",
    directionality="higherIsBetter",
)
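
The helper functions get_flesch_score, get_text_texicity, and submit_custom_metric used in the next cell come from the utilities module bundled with this accelerator and are not shown in the notebook. As a rough, hedged sketch only, a readability value could be computed with the py-readability-metrics package installed earlier; the function name, fallback value, and short-text handling below are assumptions, not the accelerator's actual implementation. Prompt toxicity could similarly be scored with a pre-trained text-classification model from the HuggingFace Hub.

In[ ]:

from readability import Readability


def flesch_reading_ease(text: str) -> float:
    """Approximate Flesch reading-ease score; higher means easier to read."""
    # py-readability-metrics relies on the nltk punkt tokenizer (downloaded
    # earlier in this notebook) and raises an exception for texts shorter
    # than 100 words, so fall back to a neutral score in that case.
    try:
        return Readability(text).flesch().score
    except Exception:
        return 50.0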

Update the Huggingface completion endpoint

Modify the prediction function to add code that calculates the metrics and submits them to the deployment. Now, whenever a prediction is requested from the distilled T5 model, the metrics are calculated and submitted to the deployment, enabling you to monitor and intervene as necessary.

In[ ]:

from utilities import get_flesch_score, get_text_texicity, submit_custom_metric


@aguard(monitor_config)
async def get_completion(user_input, parameters):
    answer = model(user_input, **parameters)
    try:
        submit_custom_metric(
            deployment_id,
            READ_CUSTOM_METRIC_ID,
            get_flesch_score(answer[0]["generated_text"]),
        )
        submit_custom_metric(
            deployment_id, TOX_CUSTOM_METRIC_ID, get_text_texicity(user_input)
        )
    except Exception as e:
        print(e)
        pass
    return answer


response = await get_completion(
    "What is Agile methodology in software development in detail?", parameters
)
print(response[0]["generated_text"])
Out[ ]:

Agile methodology is a software development approach that emphasizes the use of software development as a continuous process, with the goal of delivering software development in a timely and efficient manner. It involves breaking down the project into smaller, manageable tasks, focusing on the most critical aspects of the project, and allowing for flexibility and adaptability to change. Agile is often used in software development to ensure that the software is delivered on time, within budget, and with the right tools and techniques. It also emphasizes the importance of continuous learning and feedback, and the need for continuous improvement and continuous improvement.

Conclusion

Using the DataRobot tools for LLMOps, you can implement observability for HuggingFace-based applications with little friction while avoiding additional technical debt.

Get Started with Free Trial

Experience new features and capabilities previously only available in our full AI Platform product.

The post GenAI: Observability Starter for HuggingFace Models appeared first on DataRobot AI Platform.

]]>
Multi-Cloud Generative AI https://www.datarobot.com/platform/generative-ai/ Tue, 23 Jan 2024 14:38:41 +0000 https://www.datarobot.com/?page_id=49035 The post Multi-Cloud Generative AI appeared first on DataRobot AI Platform.

]]>

The End-to-End Generative AI Platform

Build, govern, and operate enterprise-grade generative AI solutions with confidence.

Start for Free
Adapt as Your Needs Change

Enjoy the freedom to rapidly innovate and adapt with the best-of-breed components of your choice (LLMs, vector databases, embedding models), across cloud environments.

Scale While Maintaining Security and Managing Cost

Safeguard proprietary data by extending your LLMs and monitor cost of your generative AI projects real-time to keep them under control.

Have Confidence in Your Generated Responses

Deploy and maintain safe, high-quality, generative AI applications and solutions in production.

Gain Visibility Across Your Entire AI Landscape

Unify your generative and predictive AI workflows to break down silos with one end-to-end experience.

Use the Best of Breed Components Across Any Cloud

Our API-first integrations let you stay in the driver’s seat for your generative AI initiatives and prevent vendor lock-in. Choose the generative AI components (LLM, vector database, embedding model) that are right for your organization, across cloud environments. Access some of the most common LLMs directly from the DataRobot AI Platform for experimentation – or bring your own model and vector databases.

Generative AI Playground

Build Sophisticated Generative AI Applications in Hours

Rapidly innovate to deploy new generative AI use cases in hours, with an intuitive interface for easy experimentation and comparison in our Playground. A suite of tools like Azure OpenAI-powered Code Assist and generative AI accelerators help you jumpstart your AI projects. With organized spaces for project management, hosted notebooks, and a sandbox for building and prototyping generative AI apps, DataRobot centralizes your workflow so you can focus on creating valuable AI solutions and not managing infrastructure.

Make Every App an AI-Powered App

Easily integrate generative AI into your organization’s existing operations and systems such as Slack, Salesforce, BI tools and more, with just a few lines of code. Quickly build, prototype and customize bespoke generative AI applications in a few clicks with a hosted Streamlit application sandbox that lets you move back and forth between building and previewing, so you can ensure you’re creating the best user experience.

Bring your GenAI Projects to Life

Manage All of Your Generative AI Assets in One Unified Experience

Prevent AI chaos. Unify your entire AI landscape into a single source of truth and system of record with our unified Registry and Console – generative and predictive models, whether built on or off the DataRobot platform. Manage your vector databases, LLMs, and prompt engineering strategies neatly together – no matter who built them or how they were built. Upgrade LLMs and keep track of changes with automated versioning that records all model changes, and lets you easily revert to earlier versions when needed.

Confidently Scale and Continuously Optimize Generative AI 

Securely operate and govern your generative AI assets with enterprise-grade LLMOps capabilities. Protect your business reputation and trust your GenAI assets are going to perform as expected in production. 

Safely extend your LLMs with our Vector Database Builder to ensure your proprietary data stay private. Standardize and scale with universal governance and security policies and controls for all of your generative AI assets regardless of deployment or origin. 

Use custom performance metrics to monitor the metrics that are most important for your organization, like toxicity to ensure your LLM is staying “on-topic.” Implement ‘guard models’ to assess important attributes about your generated responses – from correctness to cost to personal data leaks, and create user feedback loops to continuously improve your generative AI applications.


Keep Your Costs Under Control

Keep the cost of generative AI under control as you scale, so you don’t get surprised with an oversized bill. Cost insight metrics let you easily observe the cost of your generative AI applications in real-time, to empower you with the information you need to make cost-performance trade-offs. Set alerts to notify you if costs exceed a designated threshold,  so you can quickly intervene.

Jumpstart Your Journey with the DataRobot Generative AI Catalyst Program

Break through the generative AI inertia and get your organization going quickly with a targeted program to jumpstart your generative AI journey and build lasting capabilities across your organization. Gain access to the world-class DataRobot generative AI platform and the deep expertise of our applied generative AI experts to accelerate delivery of your generative AI use cases to quickly start driving value.


Global Enterprises Trust DataRobot to Deliver Speed, Impact, and Scale

  • “DataRobot is an indispensable partner helping us maintain our reputation both internally and externally by deploying, monitoring, and governing generative AI responsibly and effectively.”
    Tom Thomas

    Vice President of Data & Analytics, FordDirect

  • “The generative AI space is changing quickly, and the flexibility, safety and security of DataRobot helps us stay on the cutting edge with a HIPAA-compliant environment we trust to uphold critical health data protection standards. We’re harnessing innovation for real-world applications, giving us the ability to transform patient care and improve operations and efficiency with confidence”
    Rosalia Tungaraza

    Ph.D, AVP, Artificial Intelligence, Baptist Health South Florida

  • “The value of having DataRobot as a single platform that pulls all the components together can’t be underestimated.”
    Craig Civil

    Director of Data Science & AI


    Take AI From Vision to Value

    See how a value-driven approach to AI can accelerate time to impact.

    The post Multi-Cloud Generative AI appeared first on DataRobot AI Platform.

    ]]>
    How to Focus on GenAI Outcomes, Not Infrastructure https://www.datarobot.com/blog/how-to-focus-on-genai-outcomes-not-infrastructure/ Tue, 12 Dec 2023 19:30:15 +0000 https://www.datarobot.com/?post_type=blog&p=52562 Incorporating generative AI into your existing systems isn’t just an infrastructure problem. It’s a business strategy problem. Find out how to solve it.

    The post How to Focus on GenAI Outcomes, Not Infrastructure appeared first on DataRobot AI Platform.

    ]]>
    Are you seeing tangible results from your investment in generative AI — or is it starting to feel like an expensive experiment? 

    For many AI leaders and engineers, it’s hard to prove business value, despite all their hard work. In a recent Omdia survey of over 5,000+ global enterprise IT practitioners, only 13% of have fully adopted GenAI technologies.

    To quote Deloitte’s recent study, “The perennial question is: Why is this so hard?” 

    The answer is complex — but vendor lock-in, messy data infrastructure, and abandoned past investments are the top culprits. Deloitte found that at least one in three AI programs fail due to data challenges.

    If your GenAI models are sitting unused (or underused), chances are it hasn’t been successfully integrated into your tech stack. This makes GenAI, for most brands, feel more like an exacerbation of the same challenges they saw with predictive AI than a solution. 

    Any given GenAI project contains a hefty mix of different versions, languages, models, and vector databases. And we all know that cobbling together 17 different AI tools and hoping for the best creates a hot mess infrastructure. It’s complex, slow, hard to use, and risky to govern.

    Without a unified intelligence layer sitting on top of your core infrastructure, you’ll create bigger problems than the ones you’re trying to solve, even if you’re using a hyperscaler.

    That’s why I wrote this article, and that’s why Brent Hinks and I discussed this in depth during a recent webinar.

    Here, I break down six tactics that will help you shift the focus from half-hearted prototyping to real-world value from GenAI.

    6 Tactics That Replace Infrastructure Woes With GenAI Value  

    Incorporating generative AI into your existing systems isn’t just an infrastructure problem; it’s a business strategy problem—one that separates unrealized or broken prototypes from sustainable GenAI outcomes.

    But if you’ve taken the time to invest in a unified intelligence layer, you can avoid unnecessary challenges and work with confidence. Most companies will bump into at least a handful of the obstacles detailed below. Here are my recommendations on how to turn these common pitfalls into growth accelerators: 

    1. Stay Flexible by Avoiding Vendor Lock-In 

    Many companies that want to improve GenAI integration across their tech ecosystem end up in one of two buckets:

    1. They get locked into a relationship with a hyperscaler or single vendor
    2. They haphazardly cobble together various component pieces like vector databases, embedding models, orchestration tools, and more.

    Given how fast generative AI is changing, you don’t want to end up locked into either of these situations. You need to retain your optionality so you can quickly adapt as the tech needs of your business evolve or as the tech market changes. My recommendation? Use a flexible API system. 

    DataRobot can help you integrate with all of the major players, yes, but what’s even better is how we’ve built our platform to be agnostic about your existing tech and fit in where you need us to. Our flexible API provides the functionality and flexibility you need to actually unify your GenAI efforts across the existing tech ecosystem you’ve built.

    2. Build Integration-Agnostic Models 

    In the same vein as avoiding vendor lock-in, don’t build AI models that only integrate with a single application. For instance, let’s say you build an application for Slack, but now you want it to work with Gmail. You might have to rebuild the entire thing. 

    Instead, aim to build models that can integrate with multiple different platforms, so you can be flexible for future use cases. This won’t just save you upfront development time. Platform-agnostic models will also lower your required maintenance time, thanks to fewer custom integrations that need to be managed. 

    With the right intelligence layer in place, you can bring the power of GenAI models to a diverse blend of apps and their users. This lets you maximize the investments you’ve made across your entire ecosystem.  In addition, you’ll also be able to deploy and manage hundreds of GenAI models from one location.

    For example, DataRobot could integrate GenAI models that work smoothly across enterprise apps like Slack, Tableau, Salesforce, and Microsoft Teams. 
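
As a rough illustration of the "build once, integrate anywhere" idea, the sketch below keeps the model call in one place and pushes channel-specific details into thin adapters. The Slack- and email-style payload shapes are hypothetical, chosen only to show the pattern.

```python
# Rough sketch: one model entry point, thin per-channel adapters.
from typing import Protocol


def summarize(text: str) -> str:
    """Stand-in for the deployed GenAI model; called the same way by every channel."""
    return text[:140] + ("..." if len(text) > 140 else "")


class ChannelAdapter(Protocol):
    def extract_text(self, payload: dict) -> str: ...
    def format_reply(self, summary: str) -> dict: ...


class SlackAdapter:
    def extract_text(self, payload: dict) -> str:
        return payload["event"]["text"]            # Slack-style event shape (assumed)

    def format_reply(self, summary: str) -> dict:
        return {"text": summary}                   # message body a bot could post


class EmailAdapter:
    def extract_text(self, payload: dict) -> str:
        return payload["subject"] + "\n" + payload["body"]

    def format_reply(self, summary: str) -> dict:
        return {"subject": "Summary", "body": summary}


def handle(payload: dict, adapter: ChannelAdapter) -> dict:
    # Adding Gmail, Teams, or Salesforce later means writing one more adapter,
    # not rebuilding the model integration.
    return adapter.format_reply(summarize(adapter.extract_text(payload)))
```

The model never learns which channel it is serving, which is exactly what keeps future integrations cheap.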

    3. Bring Generative And Predictive AI into One Unified Experience

    Many companies struggle with generative AI chaos because their generative and predictive models are scattered and siloed. For seamless integration, you need your AI models in a single repository, no matter who built them or where they’re hosted. 

    DataRobot is perfect for this; so much of our product’s value lies in our ability to unify AI intelligence across an organization, especially in partnership with hyperscalers. If you’ve built most of your AI frameworks with a hyperscaler, we’re just the layer you need on top to add rigor and specificity to your initiatives’ governance, monitoring, and observability.

And this isn't limited to generative or predictive models: models built by anyone, on any platform, can be brought into DataRobot for governance and operation.


    4. Build for Ease of Monitoring and Retraining 

    Given the pace of innovation with generative AI over the past year, many of the models I built six months ago are already out of date. But to keep my models relevant, I prioritize retraining, and not just for predictive AI models. GenAI can go stale, too, if the source documents or grounding data are out of date. 

Imagine you have dozens of GenAI models in production. They could be deployed to all kinds of places such as Slack, customer-facing applications, or internal platforms. Sooner or later those models will need a refresh. If you only have one or two, that may not be a huge concern now, but if you already manage a sizable inventory, scaling those deployment updates manually will take a lot of time.

Updates that don't happen through scalable orchestration stall outcomes because of infrastructure complexity. This is especially critical when you start thinking a year or more down the road, since GenAI updates usually require more maintenance than predictive AI.

    DataRobot offers model version control with built-in testing to make sure a deployment will work with new platform versions that launch in the future. If an integration fails, you get an alert to notify you about the failure immediately. It also flags if a new dataset has additional features that aren’t the same as the ones in your currently deployed model. This empowers engineers and builders to be far more proactive about fixing things, rather than finding out a month (or further) down the line that an integration is broken. 

In addition to model control, I use DataRobot to monitor metrics like data drift and groundedness to keep infrastructure costs in check. The simple truth is that if budgets are exceeded, projects get shut down. This can quickly snowball into a situation where whole teams are affected because they can't control costs. DataRobot allows me to track metrics that are relevant to each use case, so I can stay informed on the business KPIs that matter.
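
To give a feel for what a use-case-specific metric can be, here is a small sketch of the kind of per-request numbers worth tracking. The token price and the idea of pushing the result to a dashboard are assumptions for illustration, not DataRobot-specific behavior.

```python
# Sketch: per-request cost and latency figures of the kind you might report
# to a monitoring dashboard as custom metrics.
import time

PRICE_PER_1K_TOKENS = 0.002  # illustrative rate only, not a real price list


def track_request(prompt_tokens: int, completion_tokens: int, started_at: float) -> dict:
    """Return the metrics for one LLM request."""
    latency_s = time.time() - started_at
    cost_usd = (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K_TOKENS
    return {"latency_s": round(latency_s, 3), "cost_usd": round(cost_usd, 6)}


started = time.time()
# ... the LLM call would happen here ...
print(track_request(prompt_tokens=420, completion_tokens=180, started_at=started))
```

Tracking numbers like these per deployment is what makes it possible to spot a use case that is quietly blowing through its budget.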

    5. Stay Aligned With Business Leadership And Your End Users 

    The biggest mistake that I see AI practitioners make is not talking to people around the business enough. You need to bring in stakeholders early and talk to them often. This is not about having one conversation to ask business leadership if they’d be interested in a specific GenAI use case. You need to continuously affirm they still need the use case — and that whatever you’re working on still meets their evolving needs. 

    There are three components here: 

    1. Engage Your AI Users 

It's crucial to secure buy-in from your end-users, not just leadership. Before you start to build a new model, talk to your prospective end-users and gauge their interest level. They're the consumer, and they need to buy into what you're creating, or it won't get used. Hint: Make sure whatever GenAI models you build can easily connect to the processes, solutions, and data infrastructures users already work in.

    Since your end-users are the ones who’ll ultimately decide whether to act on the output from your model, you need to ensure they trust what you’ve built. Before or as part of the rollout, talk to them about what you’ve built, how it works, and most importantly, how it will help them accomplish their goals.

2. Involve Your Business Stakeholders In The Development Process 

    Even after you’ve confirmed initial interest from leadership and end-users, it’s never a good idea to just head off and then come back months later with a finished product. Your stakeholders will almost certainly have a lot of questions and suggested changes. Be collaborative and build time for feedback into your projects. This helps you build an application that solves their need and helps them trust that it works how they want.

3. Articulate Precisely What You're Trying To Achieve 

It's not enough to have a goal like, "We want to integrate X platform with Y platform." I've seen too many customers get hung up on short-term goals like these instead of taking a step back to think about overall goals. DataRobot provides enough flexibility that we may be able to develop a simplified overall architecture rather than fixating on a single point of integration. You need to be specific: "We want this GenAI model that was built in DataRobot to pair with predictive AI and data from Salesforce. And the results need to be pushed into this object in this way." 

    That way, you can all agree on the end goal, and easily define and measure the success of the project. 


    6. Move Beyond Experimentation To Generate Value Early 

    Teams can spend weeks building and deploying GenAI models, but if the process is not organized, all of the usual governance and infrastructure challenges will hamper time-to-value.

There's no value in the experiment itself. Until the model is deployed and generating results (internally or externally), it's just a "fun project" that isn't producing ROI for the business.

    DataRobot can help you operationalize models 83% faster, while saving 80% of the normal costs required. Our Playgrounds feature gives your team the creative space to compare LLM blueprints and determine the best fit. 

    Instead of making end-users wait for a final solution, or letting the competition get a head start, start with a minimum viable product (MVP). 

    Get a basic model into the hands of your end users and explain that this is a work in progress. Invite them to test, tinker, and experiment, then ask them for feedback.

    An MVP offers two vital benefits: 

1. You can confirm that you're moving in the right direction with what you're building.
2. Your end users get value from your generative AI efforts quickly. 

    While you may not provide a perfect user experience with your work-in-progress integration, you’ll find that your end-users will accept a bit of friction in the short term to experience the long-term value.

    Unlock Seamless Generative AI Integration with DataRobot 

If you're struggling to integrate GenAI into your existing tech ecosystem, DataRobot is the solution you need. Instead of a jumble of siloed tools and AI assets, our AI platform can give you a unified AI landscape and spare you serious technical debt and hassle in the future. With DataRobot, you can integrate your AI tools with your existing tech investments and choose from best-of-breed components. We're here to help you: 

    • Avoid vendor lock-in and prevent AI asset sprawl 
    • Build integration-agnostic GenAI models that will stand the test of time
    • Keep your AI models and integrations up to date with alerts and version control
    • Combine your generative and predictive AI models built by anyone, on any platform, to see real business value

    Ready to get more out of your AI with less friction? Get started today with a free 30-day trial or set up a demo with one of our AI experts.

    Demo
    See the DataRobot AI Platform in Action
    Book a demo

    The post How to Focus on GenAI Outcomes, Not Infrastructure appeared first on DataRobot AI Platform.

    Potential Risks of Generative AI According to NAIAC – And How to Mitigate Them https://www.datarobot.com/blog/potential-risks-of-generative-ai-according-to-naiac-and-how-to-mitigate-them/ Tue, 28 Nov 2023 18:02:11 +0000 https://www.datarobot.com/?post_type=blog&p=52318 This blog post covers the risks of AI, highlighting what has been mentioned in the finding and connecting it to the need for organizations to incorporate mitigation processes to address the potential risks and continual monitoring of their GenAI tools.

    The post Potential Risks of Generative AI According to NAIAC – And How to Mitigate Them appeared first on DataRobot AI Platform.

The unprecedented rise of Artificial Intelligence (AI) has brought transformative possibilities across various sectors, from industries and economies to societies at large. However, this technological leap also introduces a set of potential challenges. In its recent public meeting, the National AI Advisory Committee (NAIAC)1, which provides recommendations to the President and the National AI Initiative Office on topics including the current state of U.S. AI competitiveness, the state of science around AI, and AI workforce issues, voted on a finding based on an expert briefing on the potential risks of AI, and more specifically generative AI2. This blog post aims to shed light on these concerns and delineate how DataRobot customers can proactively leverage the platform to mitigate these threats.

    Understanding AI’s Potential Risks 

With the swift rise of AI in the realm of technology, it stands poised to transform sectors, streamline operations, and amplify human potential. Yet these unmatched advances also usher in a myriad of challenges that demand attention. The "Findings on The Potential Future Risks of AI" segments the risks of AI into short-term and long-term risks. Near-term risks, as described in the finding, are risks associated with AI that are well known and of current concern, whether for predictive or generative AI. Long-term risks, on the other hand, are potential risks that may not materialize given the current state of AI technology, or that are not yet well understood, but whose potential impacts we should prepare for. The finding highlights a few categories of AI risk: malicious objectives or unintended consequences, economic and societal risks, and catastrophic risks. 

    Societal

    While Large Language Models (LLMs) are primarily optimized for text prediction tasks, their broader applications don’t adhere to a singular goal. This flexibility allows them to be employed in content creation for marketing, translation, or even in disseminating misinformation on a large scale. In some instances, even when the AI’s objective is well-defined and tailored for a specific purpose, unforeseen negative outcomes can still emerge. In addition, as AI systems evolve in complexity, there’s a growing concern that they might find ways to circumvent the safeguards established to monitor or restrict their behavior. This is especially troubling since, although humans create these safety mechanisms with particular goals in mind, an AI may perceive them differently or pinpoint vulnerabilities.

    Economic

    As AI and automation sweep across various sectors, they promise both opportunities and challenges for employment. While there’s potential for job enhancement and broader accessibility by leveraging generative AI, there’s also a risk of deepening economic disparities. Industries centered around routine activities might face job disruptions, yet AI-driven businesses could unintentionally widen the economic divide. It’s important to highlight that being exposed to AI doesn’t directly equate to job loss, as new job opportunities may emerge and some workers might see improved performance through AI support. However, without strategic measures in place—like monitoring labor trends, offering educational reskilling, and establishing policies like wage insurance—the specter of growing inequality looms, even if productivity soars. But the implications of this shift aren’t merely financial. Ethical and societal issues are taking center stage. Concerns about personal privacy, copyright breaches, and our increasing reliance on these tools are more pronounced than ever. 

    Catastrophic

    The evolving landscape of AI technologies has the potential to reach more advanced levels. Especially, with the adoption of generative AI at scale, there’s growing apprehension about their disruptive potential. These disruptions can endanger democracy, pose national security risks like cyberattacks or bioweapons, and instigate societal unrest, particularly through divisive AI-driven mechanisms on platforms like social media. While there’s debate about AI achieving superhuman prowess and the magnitude of these potential risks, it’s clear that many threats stem from AI’s malicious use, unintentional fallout, or escalating economic and societal concerns.

    Recently, discussion on the catastrophic risks of AI has dominated the conversations on AI risk, especially with regards to generative AI. However, as was put forth by NAIAC, “Arguments about existential risk from AI should not detract from the necessity of addressing existing risks of AI. Nor should arguments about existential risk from AI crowd out the consideration of opportunities that benefit society.”3

    The DataRobot Approach 

The DataRobot AI Platform is an open, end-to-end AI lifecycle platform that streamlines how you build, govern, and operate generative and predictive AI. Designed to unify your entire AI landscape, teams, and workflows, it empowers you to deliver real-world value from your AI initiatives, while giving you the flexibility to evolve and the enterprise control to scale with confidence.

DataRobot serves as a beacon in navigating these challenges. By championing transparent AI models through automated documentation during experimentation and in production, DataRobot enables users to review and audit how AI tools were built and how they perform in production, which fosters trust and promotes responsible engagement. The platform's agility ensures that users can swiftly adapt to the rapidly evolving AI landscape. With an emphasis on training and resource provision, DataRobot ensures users are well-equipped to understand and manage the nuances and risks associated with AI. At its core, the platform prioritizes AI safety, ensuring that responsible AI use is not just encouraged but integral from development to deployment.

    With regards to generative AI, DataRobot has incorporated a trustworthy AI framework in our platform. The chart below highlights the high level view of this framework.

    Trusted AI

    Pillars of this framework, Ethics, Performance, and Operations, have guided us to develop and embed features in the platform that assist users in addressing some of the risks associated with generative AI. Below we delve deeper into each of these components. 

    Ethics

    AI Ethics pertains to how an AI system aligns with the values held by both its users and creators, as well as the real-world consequences of its operation. Within this context, DataRobot stands out as an industry leader by incorporating various features into its platform to address ethical concerns across three key domains: Explainability, Discrimination and harm mitigation, and Privacy preservation.

    DataRobot directly tackles these concerns by offering cutting-edge features that monitor model bias and fairness, apply innovative prediction explanation algorithms, and implement a platform architecture designed to maximize data protection. Additionally, when orchestrating generative AI workflows, DataRobot goes a step further by supporting an ensemble of “guard” models. These guard models play a crucial role in safeguarding generative use cases. They can perform tasks such as topic analysis to ensure that generative models stay on topic, identify and mitigate bias, toxicity, and harm, and detect sensitive data patterns and identifiers that should not be utilized in workflows.

What's particularly noteworthy is that these guard models can be seamlessly integrated into DataRobot's modeling pipelines, providing an extra layer of protection around LLM workflows. This level of protection instills confidence in users and stakeholders regarding the deployment of AI systems. Furthermore, DataRobot's robust governance capabilities enable continuous monitoring, governance, and updates for these guard models over time through an automated workflow. This ensures that ethical considerations remain at the forefront of AI system operations, aligning with the values of all stakeholders involved.
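
As a simplified picture of what guarding an LLM workflow involves, the sketch below chains a topic check and a sensitive-data check around a model call. The keyword and regex rules are naive stand-ins for real guard models, included only to show where such checks sit in the flow.

```python
# Simplified sketch: naive rule-based guards standing in for real guard models.
import re

BLOCKED_TOPICS = {"medical advice", "legal advice"}          # illustrative topics only
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")           # crude sensitive-data example


def on_allowed_topic(prompt: str) -> bool:
    return not any(topic in prompt.lower() for topic in BLOCKED_TOPICS)


def contains_sensitive_data(text: str) -> bool:
    return bool(SSN_PATTERN.search(text))


def guarded_generate(prompt: str, llm) -> str:
    """Run guards before and after the underlying LLM call."""
    if not on_allowed_topic(prompt):
        return "Sorry, that topic is outside what this assistant can help with."
    if contains_sensitive_data(prompt):
        return "Please remove personal identifiers from your request and try again."
    response = llm(prompt)                                   # the actual model call
    if contains_sensitive_data(response):
        return "[response withheld: sensitive data detected in model output]"
    return response


# Example with a dummy callable standing in for the real LLM.
print(guarded_generate("Summarize our Q3 support tickets.", llm=lambda p: f"Summary of: {p}"))
```

In practice each guard would itself be a model (topic classifier, toxicity detector, PII recognizer), but the before-and-after placement is the essential idea.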

    Performance

AI Performance pertains to evaluating how effectively a model accomplishes its intended goal. In the context of an LLM, this could involve tasks such as responding to user queries, summarizing or retrieving key information, translating text, or various other use cases. It is worth noting that many existing LLM deployments often lack real-time assessment of validity, quality, reliability, and cost. DataRobot, however, has the capability to monitor and measure performance across all of these domains.

    DataRobot’s distinctive blend of generative and predictive AI empowers users to create supervised models capable of assessing the correctness of LLMs based on user feedback. This results in the establishment of an LLM correctness score, enabling the evaluation of response effectiveness. Every LLM output is assigned a correctness score, offering users insights into the confidence level of the LLM and allowing for ongoing tracking through the DataRobot LLM Operations (LLMOps) dashboard. By leveraging domain-specific models for performance assessment, organizations can make informed decisions based on precise information. 

    DataRobot’s LLMOps offers comprehensive monitoring options within its dashboard, including speed and cost tracking. Performance metrics such as response and execution times are continuously monitored to ensure timely handling of user queries. Furthermore, the platform supports the use of custom metrics, enabling users to tailor their performance evaluations. For instance, users can define their own metrics or employ established measures like Flesch reading-ease to gauge the quality of LLM responses to inquiries. This functionality facilitates the ongoing assessment and improvement of LLM quality over time.
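
For instance, a readability check like Flesch reading-ease can be computed directly from the response text. The sketch below uses the standard Flesch formula with a crude vowel-group syllable counter; a production metric would lean on a proper library rather than this approximation.

```python
# Sketch: Flesch reading-ease as a custom quality metric for LLM responses.
import re


def _count_syllables(word: str) -> int:
    # Crude heuristic: count groups of vowels; good enough for a sketch.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(_count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


# Higher scores mean easier reading; logging this per response tracks quality over time.
print(round(flesch_reading_ease("The model answered the question clearly and briefly."), 1))
```

Any metric defined this way, readability or otherwise, can be tracked alongside response time and cost so quality trends are visible, not anecdotal.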

    Operations

AI Operations focuses on ensuring the reliability of the system or the environment housing the AI technology. This encompasses not only the reliability of the core system but also the governance, oversight, maintenance, and utilization of that system, all with the overarching goal of ensuring efficient, effective, safe, and secure operations. 

    With over 1 million AI projects operationalized and delivering over 1 trillion predictions, the DataRobot platform has established itself as a robust enterprise foundation capable of supporting and monitoring a diverse array of AI use cases. The platform boasts built-in governance features that streamline development and maintenance processes. Users benefit from custom environments that facilitate the deployment of knowledge bases with pre-installed dependencies, expediting development lifecycles. Critical knowledge base deployment activities are logged meticulously to ensure that key events are captured and stored for reference. DataRobot seamlessly integrates with version control, promoting best practices through continuous integration/continuous deployment (CI/CD) and code maintenance. Approval workflows can be orchestrated to ensure that LLM systems undergo proper approval processes before reaching production. Additionally, notification policies keep users informed about key deployment-related activities.

    Security and safety are paramount considerations. DataRobot employs two-factor authentication and access control mechanisms to ensure that only authorized developers and users can utilize LLMs.

    DataRobot’s LLMOps monitoring extends across various dimensions. Service health metrics track the system’s ability to respond quickly and reliably to prediction requests. Crucial metrics like response time provide essential insights into the LLM’s capacity to address user queries promptly. Furthermore, DataRobot’s customizable metrics capability empowers users to define and monitor their own metrics, ensuring effective operations. These metrics could encompass overall cost, readability, user approval of responses, or any user-defined criteria. DataRobot’s text drift feature enables users to monitor changes in input queries over time, allowing organizations to analyze query changes for insights and intervene if they deviate from the intended use case. As organizational needs evolve, this text drift capability serves as a trigger for new development activities.
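
To illustrate the intuition behind text drift, the sketch below compares two simple proxies, average prompt length and vocabulary overlap, between a baseline window and a recent window of queries. Real drift monitoring works on richer signals such as embedding distributions; these proxies are assumptions chosen for brevity.

```python
# Sketch: two crude text-drift signals between a baseline and a recent window of queries.
def drift_signals(baseline_prompts: list[str], recent_prompts: list[str]) -> dict:
    def avg_len(prompts: list[str]) -> float:
        return sum(len(p.split()) for p in prompts) / max(1, len(prompts))

    def vocab(prompts: list[str]) -> set[str]:
        return {w.lower() for p in prompts for w in p.split()}

    length_shift = abs(avg_len(recent_prompts) - avg_len(baseline_prompts))
    base_vocab, new_vocab = vocab(baseline_prompts), vocab(recent_prompts)
    jaccard = len(base_vocab & new_vocab) / max(1, len(base_vocab | new_vocab))
    return {"length_shift_words": round(length_shift, 1), "vocab_overlap": round(jaccard, 2)}


baseline = ["reset my password", "update billing address", "cancel my order"]
recent = ["explain the new loyalty program", "compare premium plan tiers"]
print(drift_signals(baseline, recent))  # large shifts suggest the use case is changing
```

A sustained shift in signals like these is the trigger to revisit the grounding data, the prompts, or the scope of the deployment itself.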

DataRobot's LLM-agnostic approach offers users the flexibility to select the most suitable LLM based on their privacy requirements and data capture policies. This accommodates partners that enforce enterprise privacy, as well as privately hosted LLMs where data capture is managed by the LLM owners and is not a concern. It also facilitates solutions where network egress can be controlled. Given the diverse range of applications for generative AI, operational requirements may call for different LLMs across environments and tasks, so an LLM-agnostic framework and operations are essential.

    It’s worth highlighting that DataRobot is committed to continually enhancing its platform by incorporating more responsible AI features into the AI lifecycle for the benefit of end users.

    Conclusion 

    While AI is a beacon of potential and transformative benefits, it is essential to remain cognizant of the accompanying risks. Platforms like DataRobot are pivotal in ensuring that the power of AI is harnessed responsibly, driving real-world value, while proactively addressing challenges.

    Demo
    Start Driving Real-World Value From AI Today
    Book a demo

    1 The White House. n.d. “National AI Advisory Committee.” AI.Gov. https://ai.gov/naiac/.

    2 “FINDINGS: The Potential Future Risks of AI.” October 2023. National Artificial Intelligence Advisory Committee (NAIAC). https://ai.gov/wp-content/uploads/2023/11/Findings_The-Potential-Future-Risks-of-AI.pdf.

    3 “STATEMENT: On AI and Existential Risk.” October 2023. National Artificial Intelligence Advisory Committee (NAIAC). https://ai.gov/wp-content/uploads/2023/11/Statement_On-AI-and-Existential-Risk.pdf.

    The post Potential Risks of Generative AI According to NAIAC – And How to Mitigate Them appeared first on DataRobot AI Platform.

    FordDirect Uncovers AI-Powered Insights 75% Faster, Driving Sales and Service https://www.datarobot.com/customers/forddirect/ Mon, 27 Nov 2023 14:00:00 +0000 https://www.datarobot.com/?post_type=casestudy&p=52172 The digital marketing solution provider for Ford Dealers and Lincoln Retailers turned to the DataRobot AI Platform to shorten the time to understand customers and prospects, enabling highly personalized touchpoints. Their highest-scored leads are 18X more likely to buy a vehicle.

    The post FordDirect Uncovers AI-Powered Insights 75% Faster, Driving Sales and Service appeared first on DataRobot AI Platform.

    DataRobot is our AI platform of choice, which gives us the unique ability to identify, communicate, and engage with our consumers through highly personalized touchpoints. We’re doing things like AI-powered recommendations, optimizations, and direct signals that our dealers, retailers, product partners, and Ford’s digital marketing teams rely on.
    Tom Thomas

    Vice President of Data & Analytics, FordDirect

    FordDirect: An Indispensable Marketing Partner

    In the competitive vehicle market, dealers who can anticipate when consumers are ready to buy have an edge.

    For Ford Dealers and Lincoln Retailers in North America, a unique partnership makes that possible. FordDirect, a joint venture between Ford Motor Company and Ford Dealers and Lincoln Retailers, serves as their trusted advisor and digital marketing solution provider.

    “There really is nothing like FordDirect in the automotive industry,” said Tom Thomas, Vice President of Data & Analytics at FordDirect. “Our mission is to be an indispensable partner to the dealers and retailers and drive more sales and service for them and Ford.”

    AI for More Personalized Customer Experiences

    FordDirect launched its Customer Journey Platform with the goal of creating a 360-degree view of the customer journey. The platform captures thousands of customer signals – from web visits, calls, chat interactions, and other touches – across Ford, dealers, retailers, and third parties in near real-time.

    To help make sense of the data, FordDirect relies on the DataRobot AI Platform, which integrates with Microsoft Azure and Databricks.

    “DataRobot is our AI platform of choice. Together with our Customer Journey Platform, we have a unique ability to identify, communicate, and engage with our consumers through highly personalized touchpoints,” Thomas said. “We’re doing things like AI-powered recommendations, optimizations, and direct signals that our dealers, retailers, product partners, and Ford’s digital marketing teams rely on.”

    Marketing and advertising partners tap the insights to create more personalized experiences for current and prospective customers.

    From Start to Implementation 75% Faster

As FordDirect and its data science partner RXA @ OneMagnify tackle each step of the data science process, DataRobot automates the machine learning workflow. The platform helps prepare data, determine features, move models into production, validate them, set governance rules, and monitor and measure models to improve continuously.

    On the front end, DataRobot connects to FordDirect’s data platforms to help prepare the data. Automatic data readiness checks assess data input, saving data scientists time. Then, the platform automatically derives hundreds of features to use.

    “One of the things I love about DataRobot is the ability to actually model against different feature sets,” said Jonathan Prantner, Chief Analytics Officer at RXA @ OneMagnify. “You can model against all the features plus the ones that DataRobot has derived, or use the platform’s predictive powers regarding which features are most important.”

    Then, DataRobot automatically trains potentially hundreds of models, enabling data scientists to zero in on the winning ones.

    Once in production, the platform provides detailed insight into model performance. Data scientists spend less time on maintenance and more time on new projects.

    DataRobot automates every phase. Data scientists can actually focus on data science, turning one data scientist into four. And that’s not a number that we are just throwing out there. Compared to a custom hand-developed model, from data access to implementation, it takes one-fourth the time.
    Jonathan Prantner

    Chief Analytics Officer, RXA @OneMagnify

    Finding Prospects 18X More Likely to Buy

    FordDirect runs several large-scale models for use cases such as forecasting, multi-touch attribution weighting, media mix modeling, customer and dealer/retailer segmentation, natural language processing, and propensity scoring.

    One of its top-performing models identifies customers with a likelihood to purchase within the next 90 days. Using this propensity model, FordDirect found that the highest-scored leads are 18 times more likely to buy a vehicle. In fact, 90% of all buys happen in the top 20% of scored customers – a segment valued at an impressive $6.5 million.*
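
For readers curious how a lift figure like this is derived, here is a small sketch that ranks leads by propensity score and compares the top bucket's conversion rate to the overall rate. The toy data is synthetic, not FordDirect's.

```python
# Sketch: top-bucket lift from propensity scores, using synthetic toy data.
def top_bucket_lift(scores, outcomes, top_fraction=0.2):
    """Conversion rate of the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(bought for _, bought in ranked[:cutoff]) / cutoff
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate if overall_rate else float("inf")


scores = [0.91, 0.84, 0.66, 0.41, 0.33, 0.21, 0.15, 0.09, 0.05, 0.02]
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # 1 = bought a vehicle
print(round(top_bucket_lift(scores, outcomes), 1))  # lift of the top 20% of scored leads
```

The same calculation, run on real scored leads and actual purchase outcomes, is what supports claims like the 18x figure above.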

Moreover, the model runs five times faster than the original, allowing the company to score more customers and service vehicles over time at the dealerships.

    Ultimately, this model and others help FordDirect score records to create journey profiles regarding propensity to buy and service, and other preferences and predictions to determine next-best actions.

    “We continuously feed these individual customer signals into DataRobot’s automated machine learning platform to calculate and refine each customer’s likelihood to purchase or conduct service over the next 90 days,” Thomas said.

    Testing and Monitoring New LLMs with Generative AI

    By adopting DataRobot in combination with the Customer Journey Platform, FordDirect was able to replace legacy technology systems. This move decreased their technology debt by approximately $3 million* – all while improving agility, efficiency, and effectiveness.

    Next, FordDirect looks forward to scaling with generative AI use cases to add additional value, and unifying workflows for both predictive and generative AI. The platform offers the ability to test and try new large language models quickly, and rapidly build, securely operate, and confidently govern the performance of those LLMs in one place.

    It also helps the company safely extend its proprietary data with LLMs and maintain ownership over its intellectual property as it continues to deliver value to Ford, Lincoln, and their dealership networks.

    “The combination of DataRobot and FordDirect has really given our dealers and retailers an advantage in the marketplace in terms of increased sales, service, higher ROI, and stronger customer loyalty,” Thomas said. “What we’re able to do is unprecedented.”

    Demo
    See the DataRobot AI Platform in Action
    Request a demo

    *Figures provided by FordDirect based on their own experience.

    The post FordDirect Uncovers AI-Powered Insights 75% Faster, Driving Sales and Service appeared first on DataRobot AI Platform.
