Generative AI Archives | DataRobot AI Platform
https://www.datarobot.com/blog/category/generative-ai/

Choosing the Right Vector Embedding Model for Your Generative AI Use Case
https://www.datarobot.com/blog/choosing-the-right-vector-embedding-model-for-your-generative-ai-use-case/

In our previous post, we discussed considerations around choosing a vector database for our hypothetical retrieval augmented generation (RAG) use case. But when building a RAG application, we often need to make another important decision: choosing a vector embedding model, a critical component of many generative AI applications.

A vector embedding model is responsible for the transformation of unstructured data (text, images, audio, video) into a vector of numbers that capture semantic similarity between data objects. Embedding models are widely used beyond RAG applications, including recommendation systems, search engines, databases, and other data processing systems. 

Understanding their purpose, internals, advantages, and disadvantages is crucial, and that’s what we’ll cover today. While we’ll be discussing text embedding models only, models for other types of unstructured data work similarly.

What Is an Embedding Model?

Machine learning models don’t work with text directly; they require numbers as input. Since text is ubiquitous, over time, the ML community developed many solutions that handle the conversion from text to numbers. There are many approaches of varying complexity, but we’ll review just some of them.

A simple example is one-hot encoding: treat words of a text as categorical variables and map each word to a vector of 0s and a single 1.
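As a minimal sketch (the vocabulary and words below are purely illustrative), one-hot encoding can be written in a few lines of Python:

```python
# A toy illustration of one-hot encoding: each word becomes a vector of 0s
# with a single 1 at the position of that word in the vocabulary.
vocab = ["cat", "dog", "fish"]  # illustrative vocabulary

def one_hot(word: str) -> list[int]:
    return [1 if word == w else 0 for w in vocab]

print(one_hot("dog"))  # [0, 1, 0]
```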


Unfortunately, this embedding approach is not very practical, since it leads to a large number of unique categories and results in unmanageable dimensionality of output vectors in most practical cases. Also, one-hot encoding does not put similar vectors closer to one another in a vector space.

Embedding models were invented to tackle these issues. Just like one-hot encoding, they take text as input and return vectors of numbers as output, but they are more complex, as they are trained on supervised tasks, often using a neural network. A supervised task can be, for example, predicting a product review sentiment score. In this case, the resulting embedding model would place reviews of similar sentiment closer to each other in a vector space. The choice of a supervised task is critical to producing relevant embeddings when building an embedding model.


Figure: Word embeddings projected onto 2D axes

On the diagram above we can see word embeddings only, but we often need more than that, since human language is more complex than just many words put together. Semantics, word order, and other linguistic parameters should all be taken into account, which means we need to take it to the next level – sentence embedding models.

Sentence embeddings associate an input sentence with a vector of numbers, and, as expected, are way more complex internally since they have to capture more complex relationships.
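For illustration, here is how sentence embeddings might be computed with the open-source sentence-transformers library; the model name and sentences are just examples:

```python
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is a small, widely used sentence embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The weather is lovely today.",
    "It is sunny outside.",
    "I lost my wallet on the train.",
]
embeddings = model.encode(sentences)  # one 384-dimensional vector per sentence

# Semantically similar sentences should have higher cosine similarity
print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high
print(util.cos_sim(embeddings[0], embeddings[2]))  # relatively low
```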


Thanks to progress in deep learning, all state-of-the-art embedding models are created with deep neural nets, since they better capture the complex relationships inherent to human language.

A good embedding model should: 

  • Be fast since often it is just a preprocessing step in a larger application
  • Return vectors of manageable dimensions
  • Return vectors that capture enough information about similarity to be practical

Let’s now quickly look into how most embedding models are organized internally.

Modern Neural Networks Architecture

As we just mentioned, all well-performing state-of-the-art embedding models are deep neural networks. 

This is an actively developing field and most top performing models are associated with some novel architecture improvement. Let’s briefly cover two very important architectures: BERT and GPT.

BERT (Bidirectional Encoder Representations from Transformers) was published in 2018 by researchers at Google and described the application of bidirectional training to the transformer, a popular attention-based architecture, for language modeling. Standard transformers include two separate mechanisms: an encoder for reading text input and a decoder that makes a prediction.

BERT uses an encoder that reads the entire sequence of words at once, which allows the model to learn the context of a word based on all of its surroundings, left and right, unlike legacy approaches that looked at a text sequence from left to right or from right to left. Before feeding word sequences into BERT, some words are replaced with [MASK] tokens, and the model then attempts to predict the original value of the masked words based on the context provided by the other, non-masked words in the sequence.
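A quick way to see this masked-word objective in action is the Hugging Face transformers library (a sketch; the model and sentence are examples):

```python
from transformers import pipeline

# BERT was pre-trained to predict the original value of [MASK] tokens
# from the surrounding, non-masked context.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```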

Standard BERT does not perform very well in most benchmarks, and BERT models require task-specific fine-tuning. But it is open source, has been around since 2018, and has relatively modest system requirements (it can be trained on a single medium-range GPU). As a result, it became very popular for many text-related tasks. It is fast, customizable, and small. For example, the very popular all-MiniLM models are modified versions of BERT.

GPT (Generative Pre-trained Transformer) by OpenAI is different. Unlike BERT, it is unidirectional, i.e., text is processed in one direction, and it uses the decoder from the transformer architecture, which is suitable for predicting the next word in a sequence. These models are slower and produce very high-dimensional embeddings, but they usually have many more parameters, do not require fine-tuning, and are more applicable to many tasks out of the box. GPT is not open source and is available as a paid API.
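For example, embeddings can be requested from OpenAI’s paid API with the openai Python client (a sketch; the model name is just one option, and an OPENAI_API_KEY is assumed to be set):

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

response = client.embeddings.create(
    model="text-embedding-3-small",  # example model; pick one per your needs
    input="Vector embeddings capture semantic similarity.",
)
vector = response.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding
```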

Context Length and Training Data

Another important parameter of an embedding model is context length. Context length is the number of tokens a model can remember when working with a text. A longer context window allows the model to understand more complex relationships within a wider body of text. As a result, models can provide outputs of higher quality, e.g. capture semantic similarity better.

To leverage a longer context, training data should include longer pieces of coherent text: books, articles, and so on. However, increasing the context window length increases model complexity and raises compute and memory requirements for training.

There are methods that help manage resource requirements, e.g., approximate attention, but they do so at a cost to quality. That’s another trade-off between quality and cost: longer context lengths capture more complex relationships in human language, but require more resources.

Also, as always, the quality of training data is very important for all models. Embedding models are no exception. 

Semantic Search and Information Retrieval

Using embedding models for semantic search is a relatively new approach. For decades, people used other technologies: boolean models, latent semantic indexing (LSI), and various probabilistic models.

Some of these approaches work reasonably well for many existing use cases and are still widely used in the industry. 

One of the most popular traditional probabilistic models is BM25 (BM stands for “best matching”), a search relevance ranking function. It is used to estimate the relevance of a document to a search query and ranks documents based on the query terms appearing in each indexed document. Only recently have embedding models started consistently outperforming it, but BM25 is still used a lot since it is simpler than using embedding models, has lower compute requirements, and produces explainable results.
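As an illustration, a BM25 ranking can be computed with the open-source rank-bm25 package (the corpus and query are toy examples):

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "the cat sat on the mat",
    "dogs are loyal companions",
    "cats and dogs can get along",
]
tokenized_corpus = [doc.split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
scores = bm25.get_scores("cat on a mat".split())
print(scores)  # one relevance score per indexed document
```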

Benchmarks

Not every model type has a comprehensive evaluation approach that helps to choose an existing model. 

Fortunately, text embedding models have common benchmark suites such as:

The article “BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models” proposed a reference set of benchmarks and datasets for information retrieval tasks. The original BEIR benchmark consists of 19 datasets and methods for search quality evaluation, covering tasks such as question answering, fact checking, and entity retrieval. Now anyone who releases a text embedding model for information retrieval tasks can run the benchmark and see how their model ranks against the competition.

The Massive Text Embedding Benchmark (MTEB) incorporates BEIR and other components, covering 58 datasets and 112 languages. The public MTEB leaderboard is hosted on Hugging Face.

These benchmarks have been run on a lot of existing models and their leaderboards are very useful to make an informed choice about model selection.

Using Embedding Models in a Production Environment

Benchmark scores on standard tasks are very important, but they represent only one dimension.

When we use an embedding model for search, we run it twice:

  • When doing offline indexing of available data
  • When embedding a user query for a search request 

There are two important consequences of this. 

The first is that we have to reindex all existing data when we change or upgrade an embedding model. All systems built using embedding models should be designed with upgradability in mind because newer and better models are released all the time and, most of the time, upgrading a model is the easiest way to improve overall system performance. An embedding model is a less stable component of the system infrastructure in this case.

The second consequence of using an embedding model for user queries is that inference latency becomes very important as the number of users goes up. Model inference takes more time for better-performing models, especially if they require a GPU to run: latency higher than 100ms for a small query is not unheard of for models with more than 1B parameters. It turns out that smaller, leaner models are still very important in higher-load production scenarios.

The trade-off between quality and latency is real, and we should always keep it in mind when choosing an embedding model.

As we mentioned above, embedding models help manage output vector dimensionality, which affects the performance of many algorithms downstream. Generally, the smaller the model, the shorter the output vectors, but they are often still too long for some downstream applications. That’s when we need to use dimensionality reduction algorithms such as PCA (principal component analysis), t-SNE (t-distributed stochastic neighbor embedding), and UMAP (uniform manifold approximation and projection).
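For instance, PCA via scikit-learn can shrink embedding vectors before downstream use (a sketch with random data standing in for real embeddings):

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data stands in for 1,000 real 768-dimensional embeddings
embeddings = np.random.rand(1000, 768)

pca = PCA(n_components=128)  # project down to 128 dimensions
reduced = pca.fit_transform(embeddings)

print(reduced.shape)                        # (1000, 128)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```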

Another place we can use dimensionality reduction is before storing embeddings in a database. The resulting vector embeddings will occupy less space and retrieval will be faster, but this comes at a price for quality downstream. Vector databases are often not the primary storage, so embeddings can be regenerated with better precision from the original source data. Dimensionality reduction here helps to shorten the stored vectors and, as a result, makes the system faster and leaner.

Making the Right Choice

There’s an abundance of factors and trade-offs to consider when choosing an embedding model for a use case. A potential model’s score on common benchmarks is important, but we should not forget that it’s the larger models that tend to score better. Larger models have higher inference latency, which can severely limit their use in low-latency scenarios, as an embedding model is often a preprocessing step in a larger pipeline. Also, larger models require GPUs to run.

If you intend to use a model in a low-latency scenario, it’s better to focus on latency first and then see which models with acceptable latency have the best-in-class performance. Also, when building a system with an embedding model you should plan for changes since better models are released all the time and often it’s the simplest way to improve the performance of your system.

6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them
https://www.datarobot.com/blog/6-reasons-why-generative-ai-initiatives-fail-and-how-to-overcome-them/

If you’re an AI leader, you might feel like you’re stuck between a rock and a hard place lately. 

You have to deliver value from generative AI (GenAI) to keep the board happy and stay ahead of the competition. But you also have to stay on top of the growing chaos, as new tools and ecosystems arrive on the market. 

You also have to juggle new GenAI projects, use cases, and enthusiastic users across the organization. Oh, and data security. Your leadership doesn’t want to be the next cautionary tale of good AI gone bad. 

If you’re being asked to prove ROI for GenAI but it feels more like you’re playing Whack-a-Mole, you’re not alone. 

According to Deloitte, proving AI’s business value is the top challenge for AI leaders. Companies across the globe are struggling to move past prototyping to production. So, here’s how to get it done — and what you need to watch out for.  

6 Roadblocks (and Solutions) to Realizing Business Value from GenAI

Roadblock #1. You Set Yourself Up For Vendor Lock-In 

GenAI is moving crazy fast. New innovations — LLMs, vector databases, embedding models — are being created daily. So getting locked into a specific vendor right now doesn’t just risk your ROI a year from now. It could literally hold you back next week.  

Let’s say you’re all in on one LLM provider right now. What if costs rise and you want to switch to a new provider or use different LLMs depending on your specific use cases? If you’re locked in, getting out could eat any cost savings that you’ve generated with your AI initiatives — and then some. 

Solution: Choose a Versatile, Flexible Platform 

Prevention is the best cure. To maximize your freedom and adaptability, choose solutions that make it easy for you to move your entire AI lifecycle – pipeline, data, vector databases, embedding models, and more – from one provider to another.

For instance, DataRobot gives you full control over your AI strategy — now, and in the future. Our open AI platform lets you maintain total flexibility, so you can use any LLM, vector database, or embedding model – and swap out underlying components as your needs change or the market evolves, without breaking production. We even give our customers access to experiment with common LLMs.

Roadblock #2. Off-the-Grid Generative AI Creates Chaos 

If you thought predictive AI was challenging to control, try GenAI on for size. Your data science team likely acts as a gatekeeper for predictive AI, but anyone can dabble with GenAI — and they will. Where your company might have 15 to 50 predictive models, at scale, you could well have 200+ generative AI models all over the organization at any given time. 

Worse, you might not even know about some of them. “Off-the-grid” GenAI projects tend to escape leadership purview and expose your organization to significant risk. 

While this enthusiastic use of AI might seem like a recipe for greater business value, the opposite is often true. Without a unifying strategy, GenAI can create soaring costs without delivering meaningful results.

Solution: Manage All of Your AI Assets in a Unified Platform

Fight back against this AI sprawl by getting all your AI artifacts housed in a single, easy-to-manage platform, regardless of who made them or where they were built. Create a single source of truth and system of record for your AI assets — the way you do, for instance, for your customer data. 

Once you have your AI assets in the same place, then you’ll need to apply an LLMOps mentality: 

  • Create standardized governance and security policies that will apply to every GenAI model. 
  • Establish a process for monitoring key metrics about models and intervening when necessary.
  • Build feedback loops to harness user feedback and continuously improve your GenAI applications. 

DataRobot does this all for you. With our AI Registry, you can organize, deploy, and manage all of your AI assets in the same location – generative and predictive, regardless of where they were built. Think of it as a single source of record for your entire AI landscape – what Salesforce did for your customer interactions, but for AI. 

Roadblock #3. GenAI and Predictive AI Initiatives Aren’t Under the Same Roof

If you’re not integrating your generative and predictive AI models, you’re missing out. The power of these two technologies put together is a massive value driver, and businesses that successfully unite them will be able to realize and prove ROI more efficiently.

Here are just a few examples of what you could be doing if you combined your AI artifacts in a single unified system:  

  • Create a GenAI-based chatbot in Slack so that anyone in the organization can query predictive analytics models with natural language (Think, “Can you tell me how likely this customer is to churn?”). By combining the two types of AI technology, you surface your predictive analytics, bring them into the daily workflow, and make them far more valuable and accessible to the business.
  • Use predictive models to control the way users interact with generative AI applications and reduce risk exposure. For instance, a predictive model could stop your GenAI tool from responding if a user gives it a prompt that has a high probability of returning an error or it could catch if someone’s using the application in a way it wasn’t intended.  
  • Set up a predictive AI model to inform your GenAI responses, and create powerful predictive apps that anyone can use. For example, your non-tech employees could ask natural language queries about sales forecasts for next year’s housing prices, and have a predictive analytics model feeding in accurate data.   
  • Trigger GenAI actions from predictive model results. For instance, if your predictive model predicts a customer is likely to churn, you could set it up to trigger your GenAI tool to draft an email that will go to that customer, or a call script for your sales rep to follow during their next outreach to save the account. 

However, for many companies, this level of business value from AI is impossible because they have predictive and generative AI models siloed in different platforms. 

Solution: Combine your GenAI and Predictive Models 

With a system like DataRobot, you can bring all your GenAI and predictive AI models into one central location, so you can create unique AI applications that combine both technologies. 

Not only that, but from inside the platform, you can set and track your business-critical metrics and monitor the ROI of each deployment to ensure their value, even for models running outside of the DataRobot AI Platform.

Roadblock #4. You Unknowingly Compromise on Governance

For many businesses, the primary purpose of GenAI is to save time — whether that’s reducing the hours spent on customer queries with a chatbot or creating automated summaries of team meetings. 

However, this emphasis on speed often leads to corner-cutting on governance and monitoring. That doesn’t just set you up for reputational risk or future costs (when your brand takes a major hit as the result of a data leak, for instance). It also means that you can’t measure the cost of, or optimize the value you’re getting from, your AI models right now.

Solution: Adopt a Solution to Protect Your Data and Uphold a Robust Governance Framework

To solve this issue, you’ll need to implement a proven AI governance tool ASAP to monitor and control your generative and predictive AI assets. 

A solid AI governance solution and framework should include:

  • Clear roles, so every team member involved in AI production knows who is responsible for what
  • Access control, to limit data access and permissions for changes to models in production at the individual or role level and protect your company’s data
  • Change and audit logs, to ensure legal and regulatory compliance and avoid fines 
  • Model documentation, so you can show that your models work and are fit for purpose
  • A model inventory to govern, manage, and monitor your AI assets, irrespective of deployment or origin

Current best practice: Find an AI governance solution that can prevent data and information leaks by extending LLMs with company data.

The DataRobot platform includes these safeguards built-in, and the vector database builder lets you create specific vector databases for different use cases to better control employee access and make sure the responses are super relevant for each use case, all without leaking confidential information.

Roadblock #5. It’s Tough To Maintain AI Models Over Time

Lack of maintenance is one of the biggest impediments to seeing business results from GenAI, according to the same Deloitte report mentioned earlier. Without excellent upkeep, there’s no way to be confident that your models are performing as intended or delivering accurate responses that’ll help users make sound data-backed business decisions.

In short, building cool generative applications is a great starting point — but if you don’t have a centralized workflow for tracking metrics or continuously improving based on usage data or vector database quality, you’ll do one of two things:

  1. Spend a ton of time managing that infrastructure.
  2. Let your GenAI models decay over time. 

Neither of those options is sustainable (or secure) long-term. Failing to guard against malicious activity or misuse of GenAI solutions will limit the future value of your AI investments almost instantaneously.

Solution: Make It Easy To Monitor Your AI Models

To be valuable, GenAI needs guardrails and steady monitoring. You need the AI tools available so that you can track: 

  • Employee and customer-generated prompts and queries over time to ensure your vector database is complete and up to date
  • Whether your current LLM is (still) the best solution for your AI applications 
  • Your GenAI costs to make sure you’re still seeing a positive ROI
  • When your models need retraining to stay relevant

DataRobot can give you that level of control. It brings all your generative and predictive AI applications and models into the same secure registry, and lets you:  

  • Set up custom performance metrics relevant to specific use cases
  • Understand standard metrics like service health, data drift, and accuracy statistics
  • Schedule monitoring jobs
  • Set custom rules, notifications, and retraining settings

If you make it easy for your team to maintain your AI, you won’t start neglecting maintenance over time.

Roadblock #6. The Costs are Too High – or Too Hard to Track 

Generative AI can come with some serious sticker shock. Naturally, business leaders feel reluctant to roll it out at a sufficient scale to see meaningful results or to spend heavily without recouping much in terms of business value. 

Keeping GenAI costs under control is a huge challenge, especially if you don’t have real oversight over who is using your AI applications and why they’re using them. 

Solution: Track Your GenAI Costs and Optimize for ROI

You need technology that lets you monitor costs and usage for each AI deployment. With DataRobot, you can track everything from the cost of an error to toxicity scores for your LLMs to your overall LLM costs. You can choose between LLMs depending on your application and optimize for cost-effectiveness. 

That way, you’re never left wondering if you’re wasting money with GenAI — you can prove exactly what you’re using AI for and the business value you’re getting from each application. 

Deliver Measurable AI Value with DataRobot 

Proving business value from GenAI is not an impossible task with the right technology in place. A recent economic analysis by the Enterprise Strategy Group found that DataRobot can provide cost savings of 75% to 80% compared to using existing resources, giving you a 3.5x to 4.6x expected return on investment and accelerating time to initial value from AI by up to 83%. 

DataRobot can help you maximize the ROI from your GenAI assets and: 

  • Mitigate the risk of GenAI data leaks and security breaches 
  • Keep costs under control
  • Bring every single AI project across the organization into the same place
  • Empower you to stay flexible and avoid vendor lock-in 
  • Make it easy to manage and maintain your AI models, regardless of origin or deployment 

If you’re ready for GenAI that’s all value, not all talk, start your free trial today. 

Choosing the Right Database for Your Generative AI Use Case
https://www.datarobot.com/blog/choosing-the-right-database-for-your-generative-ai-use-case/

Ways of Providing Data to a Model

Many organizations are now exploring the power of generative AI to improve their efficiency and gain new capabilities. In most cases, to fully unlock these powers, AI must have access to the relevant enterprise data. Large Language Models (LLMs) are trained on publicly available data (e.g. Wikipedia articles, books, web index, etc.), which is enough for many general-purpose applications, but there are plenty of others that are highly dependent on private data, especially in enterprise environments.

There are three main ways to provide new data to a model:

  1. Pre-training a model from scratch. This rarely makes sense for most companies because it is very expensive and requires a lot of resources and technical expertise.
  2. Fine-tuning an existing general-purpose LLM. This can reduce the resource requirements compared to pre-training, but still requires significant resources and expertise. Fine-tuning produces specialized models that have better performance in the domain for which they are fine-tuned, but may have worse performance in others.
  3. Retrieval augmented generation (RAG). The idea is to fetch data relevant to a query and include it in the LLM context so that the model can “ground” its outputs in that information. Such relevant data in this context is referred to as “grounding data.” RAG complements generic LLM models, but the amount of information that can be provided is limited by the LLM context window size (the amount of text the LLM can process at once when generating a response).
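As a minimal sketch of the third option (assuming the relevant documents have already been retrieved upstream; the model name and prompt wording are illustrative), grounding an LLM call with the openai Python client might look like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

def answer_with_rag(query: str, retrieved_docs: list[str]) -> str:
    # Concatenate the retrieved "grounding data" into the prompt context
    context = "\n\n".join(retrieved_docs)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; any chat-capable LLM works
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content

print(answer_with_rag("What is our refund policy?",
                      ["Refunds are available within 30 days of purchase."]))
```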

Currently, RAG is the most accessible way to provide new information to an LLM, so let’s focus on this method and dive a little deeper.

Retrieval Augmented Generation 

In general, RAG means using a search or retrieval engine to fetch a relevant set of documents for a specified query. 

For this purpose, we can use many existing systems: a full-text search engine (like Elasticsearch + traditional information retrieval techniques), a general-purpose database with a vector search extension (Postgres with pgvector, Elasticsearch with vector search plugin), or a specialized database that was created specifically for vector search.

Figure: Retrieval augmented generation

In the two latter cases, RAG is similar to semantic search. For a long time, semantic search was a highly specialized and complex domain with exotic query languages and niche databases, and indexing data required extensive preparation and building knowledge graphs. But recent progress in deep learning has dramatically changed the landscape. Modern semantic search applications now depend on embedding models that successfully learn semantic patterns in the presented data. These models take unstructured data (text, audio, or even video) as input and transform it into vectors of numbers of a fixed length, thus turning unstructured data into a numeric form that can be used for calculations. It then becomes possible to calculate the distance between vectors using a chosen distance metric, and the resulting distance will reflect the semantic similarity between vectors and, in turn, between the pieces of original data.

These vectors are indexed by a vector database and, when querying, our query is also transformed into a vector. The database searches for the N closest vectors (according to a chosen distance metric like cosine similarity) to a query vector and returns them.
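Conceptually, this query step is just a nearest neighbor search under the chosen metric. A brute-force version with cosine similarity might look like the sketch below (random vectors stand in for real embeddings); vector databases replace this with fast approximate indices:

```python
import numpy as np

def cosine_top_n(query_vec: np.ndarray, doc_vecs: np.ndarray, n: int = 3):
    # Normalize so that a dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    similarities = d @ q
    top = np.argsort(similarities)[::-1][:n]  # indices of the N closest vectors
    return top, similarities[top]

doc_vecs = np.random.rand(100, 384)  # stand-in for stored document embeddings
query_vec = np.random.rand(384)      # stand-in for the embedded user query
print(cosine_top_n(query_vec, doc_vecs))
```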

A vector database is responsible for these 3 things:

  1. Indexing. The database builds an index of vectors using some built-in algorithm (e.g. locality-sensitive hashing (LSH) or hierarchical navigable small world (HNSW)) to precompute data to speed up querying.
  2. Querying. The database uses a query vector and an index to find the most relevant vectors in a database.
  3. Post-processing. After the result set is formed, sometimes we might want to run an additional step like metadata filtering or re-ranking within the result set to improve the outcome.
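A sketch of these three steps using the open-source Chroma client (the collection name, documents, and metadata are illustrative):

```python
import chromadb

client = chromadb.Client()  # in-memory instance for demonstration
collection = client.create_collection("docs")

# 1. Indexing: documents are embedded (with Chroma's default model) and indexed
collection.add(
    documents=[
        "RAG grounds LLM outputs in retrieved data.",
        "Vector databases index embeddings for fast search.",
    ],
    metadatas=[{"topic": "rag"}, {"topic": "databases"}],
    ids=["doc1", "doc2"],
)

# 2. Querying: the query text is embedded and the nearest vectors are returned
# 3. Post-processing: a metadata filter narrows the result set
results = collection.query(
    query_texts=["How does retrieval augmented generation work?"],
    n_results=1,
    where={"topic": "rag"},
)
print(results["documents"])
```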

The purpose of a vector database is to provide a fast, reliable, and efficient way to store and query data. Retrieval speed and search quality can be influenced by the selection of index type. In addition to the already mentioned LSH and HNSW there are others, each with its own set of strengths and weaknesses. Most databases make the choice for us, but in some, you can choose an index type manually to control the tradeoff between speed and accuracy.

Figure: Vector database

At DataRobot, we believe RAG is here to stay. Fine-tuning can require very sophisticated data preparation to turn raw text into training-ready data, and it’s more of an art than a science to coax LLMs into “learning” new facts through fine-tuning while maintaining their general knowledge and instruction-following behavior.

LLMs are typically very good at applying knowledge supplied in-context, especially when only the most relevant material is provided, so a good retrieval system is crucial.

Note that the choice of the embedding model used for RAG is essential. It is not a part of the database and choosing the correct embedding model for your application is critical for achieving good performance. Additionally, while new and improved models are constantly being released, changing to a new model requires reindexing your entire database.

Evaluating Your Options 

Choosing a database in an enterprise environment is not an easy task. A database is often the heart of your software infrastructure that manages a very important business asset: data.

Generally, when we choose a database we want:

  • Reliable storage
  • Efficient querying 
  • Ability to insert, update, and delete data granularly (CRUD)
  • Ability to set up multiple users with various levels of access (RBAC)
  • Data consistency (predictable behavior when modifying data)
  • Ability to recover from failures
  • Scalability to the size of our data

This list is not exhaustive and might be a bit obvious, but not all new vector databases have these features. Often, it is the availability of enterprise features that determine the final choice between a well-known mature database that provides vector search via extensions and a newer vector-only database. 

Vector-only databases have native support for vector search and can execute queries very fast, but often lack enterprise features and are relatively immature. Keep in mind that it takes years to build complex features and battle-test them, so it’s no surprise that early adopters face outages and data losses. On the other hand, in existing databases that provide vector search through extensions, a vector is not a first-class citizen and query performance can be much worse. 

We will categorize all current databases that provide vector search into the following groups and then discuss them in more detail:

  • Vector search libraries
  • Vector-only databases
  • NoSQL databases with vector search 
  • SQL databases with vector search 
  • Vector search solutions from cloud vendors

Vector search libraries

Vector search libraries like FAISS and ANNOY are not databases – rather, they provide in-memory vector indices, and only limited data persistence options. While these features are not ideal for users requiring a full enterprise database, they have very fast nearest neighbor search and are open source. They offer good support for high-dimensional data and are highly configurable (you can choose the index type and other parameters). 

Overall, they are good for prototyping and integration in simple applications, but they are inappropriate for long-term, multi-user data storage. 
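For example, a minimal FAISS index can be built and queried like this (random vectors stand in for real embeddings):

```python
import faiss
import numpy as np

dim = 384
index = faiss.IndexFlatL2(dim)  # exact (non-approximate) L2 index, in memory

vectors = np.random.rand(1000, dim).astype("float32")
index.add(vectors)  # index the embeddings

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)  # the 5 nearest neighbors
print(ids, distances)
```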

Vector-only databases 

This group includes diverse products like Milvus, Chroma, Pinecone, Weaviate, and others. There are notable differences among them, but all of them are specifically designed to store and retrieve vectors. They are optimized for efficient similarity search with indexing and support high-dimensional data and vector operations natively. 

Most of them are newer and might not have the enterprise features we mentioned above; e.g., some lack CRUD operations, proven failure recovery, RBAC, and so on. For the most part, they can store the raw data, the embedding vector, and a small amount of metadata, but they can’t store other index types or relational data, which means you will have to use another, secondary database and maintain consistency between the two.

Their performance is often unmatched, and they are a good option when working with multimodal data (images, audio, or video).

NoSQL databases with vector search 

Many so-called NoSQL databases recently added vector search to their products, including MongoDB, Redis, Neo4j, and Elasticsearch. They offer good enterprise features, are mature, and have strong communities, but they provide vector search functionality via extensions, which might lead to less-than-ideal performance and a lack of first-class support for vector search. Elasticsearch stands out here, as it is designed for full-text search and already has many traditional information retrieval features that can be used in conjunction with vector search.

NoSQL databases with vector search are a good choice when you are already invested in them and need vector search as an additional, but not very demanding feature.

SQL databases with vector search 

This group is somewhat similar to the previous group, but here we have established players like PostgreSQL and ClickHouse. They offer a wide array of enterprise features, are well-documented, and have strong communities. As for their disadvantages, they are designed for structured data, and scaling them requires specific expertise. 

Their use case is also similar: good choice when you already have them and the expertise to run them in place.

Vector search solutions from cloud vendors

Hyperscalers also offer vector search services. They usually have basic features for vector search (you can choose an embedding model, index type, and other parameters), good interoperability with the rest of the cloud platform, and more flexibility when it comes to cost, especially if you use other services on their platform. However, they vary in maturity and feature sets: Google Cloud vector search uses a fast proprietary index search algorithm called ScaNN and supports metadata filtering, but is not very user-friendly; Azure vector search offers structured search capabilities, but is in a preview phase; and so on.

Vector search entities can be managed using the enterprise features of their platform, like IAM (Identity and Access Management), but they are not that simple to use and are best suited for general cloud usage.

Making the Right Choice 

The main use case of vector databases in this context is to provide relevant information to a model. For your next LLM project, you can choose a database from an existing array of databases that offer vector search capabilities via extensions or from new vector-only databases that offer native vector support and fast querying. 

The choice depends on whether you need enterprise features or high-scale performance, as well as your deployment architecture and desired maturity (research, prototyping, or production). You should also consider which databases are already present in your infrastructure and whether you have multimodal data. In any case, whatever choice you make, it is good to hedge it: treat a new database as an auxiliary storage cache rather than a central point of operations, and abstract your database operations in code to make it easy to adjust to the next iteration of the vector RAG landscape.

How DataRobot Can Help

There are already so many vector database options to choose from. They each have their pros and cons – no one vector database will be right for all of your organization’s generative AI use cases. That is why it’s important to retain optionality and leverage a solution that allows you to customize your generative AI solutions to specific use cases, and adapt as your needs change or the market evolves. 

The DataRobot AI Platform lets you bring your own vector database – whichever is right for the solution you’re building. If you require changes in the future, you can swap out your vector database without breaking your production environment and workflows. 

Open Source AI Models – What the U.S. National AI Advisory Committee Wants You to Know
https://www.datarobot.com/blog/open-source-ai-models-what-the-u-s-national-ai-advisory-committee-wants-you-to-know/

The unprecedented rise of artificial intelligence (AI) has brought transformative possibilities across the board, from industries and economies to societies at large. However, this technological leap also introduces a set of potential challenges. In its recent public meeting, the National AI Advisory Committee (NAIAC)1, which provides recommendations on U.S. AI competitiveness, the science around AI, and the AI workforce to the President and the National AI Initiative Office, voted on a recommendation on ‘Generative AI Away from the Frontier.’2

This recommendation aims to outline the risks of, and proposed approaches for assessing and managing, off-frontier AI models – typically referring to open source models. In summary, the recommendation from the NAIAC provides a roadmap for responsibly navigating the complexities of generative AI. This blog post aims to shed light on this recommendation and delineate how DataRobot customers can proactively leverage the platform to align their AI adoption with this recommendation.

Frontier vs Off-Frontier Models

In the recommendation, the distinction between frontier and off-frontier models of generative AI is based on their accessibility and level of advancement. Frontier models represent the latest and most advanced developments in AI technology. These are complex, high-capability systems typically developed and accessed by leading tech companies, research institutions, or specialized AI labs (such as current state-of-the-art models like GPT-4 and Google Gemini). Due to their complexity and cutting-edge nature, frontier models typically have constrained access – they are not widely available or accessible to the general public.

On the other hand, off-frontier models typically have unconstrained access – they are more widely available and accessible AI systems, often available as open source. They might not achieve the most advanced AI capabilities but are significant due to their broader usage. These models include both proprietary systems and open source AI systems and are used by a wider range of stakeholders, including smaller companies, individual developers, and educational institutions.

This distinction is important for understanding the different levels of risks, governance needs, and regulatory approaches required for various AI systems. While frontier models may need specialized oversight due to their advanced nature, off-frontier models pose a different set of challenges and risks because of their widespread use and accessibility.

What the NAIAC Recommendation Covers

The recommendation on ‘Generative AI Away from the Frontier,’ issued by NAIAC in October 2023, focuses on the governance and risk assessment of generative AI systems. The document provides two key recommendations for the assessment of risks associated with generative AI systems:

For Proprietary Off-Frontier Models: It advises the Biden-Harris administration to encourage companies to extend voluntary commitments3 to include risk-based assessments of off-frontier generative AI systems. This includes independent testing, risk identification, and information sharing about potential risks. This recommendation is particularly aimed at emphasizing the importance of understanding and sharing the information on risks associated with off-frontier models.

For Open Source Off-Frontier Models: For generative AI systems with unconstrained access, such as open-source systems, the National Institute of Standards and Technology (NIST) is charged to collaborate with a diverse range of stakeholders to define appropriate frameworks to mitigate AI risks. This group includes academia, civil society, advocacy organizations, and the industry (where legal and technical feasibility allows). The goal is to develop testing and analysis environments, measurement systems, and tools for testing these AI systems. This collaboration aims to establish appropriate methodologies for identifying critical potential risks associated with these more openly accessible systems.

NAIAC underlines the need to understand the risks posed by widely available, off-frontier generative AI systems, which include both proprietary and open-source systems. These risks range from the acquisition of harmful information to privacy breaches and the generation of harmful content. The recommendation acknowledges the unique challenges in assessing risks in open-source AI systems due to the lack of a fixed target for assessment and limitations on who can test and evaluate the system.

Moreover, it highlights that investigations into these risks require a multi-disciplinary approach, incorporating insights from social sciences, behavioral sciences, and ethics, to support decisions about regulation or governance. While recognizing the challenges, the document also notes the benefits of open-source systems in democratizing access, spurring innovation, and enhancing creative expression.

For proprietary AI systems, the recommendation points out that while companies may understand the risks, this information is often not shared with external stakeholders, including policymakers. This calls for more transparency in the field.

Regulation of Generative AI Models

Recently, discussion of the catastrophic risks of AI has dominated conversations about AI risk, especially with regard to generative AI. This has led to calls to regulate AI in an attempt to promote responsible development and deployment of AI tools. It is worth exploring the regulatory options with regard to generative AI. There are two main areas where policymakers can regulate AI: regulation at the model level and regulation at the use case level.

In predictive AI, generally, the two levels significantly overlap, as narrow AI is built for a specific use case and cannot be generalized to many other use cases. For example, a model developed to identify patients with a high likelihood of readmission can only be used for this particular use case and will require input information similar to what it was trained on. However, a single large language model (LLM), a form of generative AI, can be used in multiple ways to summarize patient charts, generate potential treatment plans, and improve the communication between physicians and patients.

As highlighted in the examples above, unlike predictive AI, the same LLM can be used in a variety of use cases. This distinction is particularly important when considering AI regulation. 

Penalizing AI models at the development level, especially for generative AI models, could hinder innovation and limit the beneficial capabilities of the technology. Nonetheless, it is paramount that the builders of generative AI models, both frontier and off-frontier, adhere to responsible AI development guidelines. 

Instead, the focus should be on the harms of such technology at the use case level, especially at governing the use more effectively. DataRobot can simplify governance by providing capabilities that enable users to evaluate their AI use cases for risks associated with bias and discrimination, toxicity and harm, performance, and cost. These features and tools can help organizations ensure that AI systems are used responsibly and aligned with their existing risk management processes without stifling innovation.

Governance and Risks of Open vs Closed Source Models

Another area that was mentioned in the recommendation, and later included in the executive order recently signed by President Biden4, is the lack of transparency in the model development process. In closed-source systems, the developing organization may investigate and evaluate the risks associated with the developed generative AI models. However, information on potential risks, findings from red teaming exercises, and internal evaluations have generally not been shared publicly.

On the other hand, open-source models are inherently more transparent due to their openly available design, facilitating the easier identification and correction of potential concerns pre-deployment. But extensive research on potential risks and evaluation of these models has not been conducted.

The distinct and differing characteristics of these systems imply that the governance approaches for open-source models should differ from those applied to closed-source models. 

Avoid Reinventing Trust Across Organizations

Given the challenges of adopting AI, there’s a clear need for standardizing the governance process in AI to prevent every organization from having to reinvent these measures. Various organizations, including DataRobot, have come up with their own frameworks for trustworthy AI5. The government can help lead the collaborative effort between the private sector, academia, and civil society to develop standardized approaches to address the concerns and provide robust evaluation processes to ensure the development and deployment of trustworthy AI systems.

The recent executive order on the safe, secure, and trustworthy development and use of AI directs NIST to lead this joint collaborative effort to develop guidelines and evaluation measures to understand and test generative AI models. The White House AI Bill of Rights and the NIST AI Risk Management Framework (RMF) can serve as foundational principles and frameworks for responsible development and deployment of AI.

Capabilities of the DataRobot AI Platform, aligned with the NIST AI RMF, can assist organizations in adopting standardized trust and governance practices. Organizations can leverage these DataRobot tools for more efficient and standardized compliance and risk management for generative and predictive AI.


1 National AI Advisory Committee – AI.gov 

2 RECOMMENDATIONS: Generative AI Away from the Frontier

4 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence | The White House

5 https://www.datarobot.com/trusted-ai-101/

How to Focus on GenAI Outcomes, Not Infrastructure
https://www.datarobot.com/blog/how-to-focus-on-genai-outcomes-not-infrastructure/

Are you seeing tangible results from your investment in generative AI — or is it starting to feel like an expensive experiment? 

For many AI leaders and engineers, it’s hard to prove business value, despite all their hard work. In a recent Omdia survey of more than 5,000 global enterprise IT practitioners, only 13% have fully adopted GenAI technologies.

To quote Deloitte’s recent study, “The perennial question is: Why is this so hard?” 

The answer is complex — but vendor lock-in, messy data infrastructure, and abandoned past investments are the top culprits. Deloitte found that at least one in three AI programs fail due to data challenges.

If your GenAI models are sitting unused (or underused), chances are they haven’t been successfully integrated into your tech stack. This makes GenAI, for most brands, feel more like an exacerbation of the same challenges they saw with predictive AI than a solution.

Any given GenAI project contains a hefty mix of different versions, languages, models, and vector databases. And we all know that cobbling together 17 different AI tools and hoping for the best creates a hot mess infrastructure. It’s complex, slow, hard to use, and risky to govern.

Without a unified intelligence layer sitting on top of your core infrastructure, you’ll create bigger problems than the ones you’re trying to solve, even if you’re using a hyperscaler.

That’s why I wrote this article, and that’s why Brent Hinks and I discussed this in depth during a recent webinar.

Here, I break down six tactics that will help you shift the focus from half-hearted prototyping to real-world value from GenAI.

6 Tactics That Replace Infrastructure Woes With GenAI Value  

Incorporating generative AI into your existing systems isn’t just an infrastructure problem; it’s a business strategy problem—one that separates unrealized or broken prototypes from sustainable GenAI outcomes.

But if you’ve taken the time to invest in a unified intelligence layer, you can avoid unnecessary challenges and work with confidence. Most companies will bump into at least a handful of the obstacles detailed below. Here are my recommendations on how to turn these common pitfalls into growth accelerators: 

1. Stay Flexible by Avoiding Vendor Lock-In 

Many companies that want to improve GenAI integration across their tech ecosystem end up in one of two buckets:

  1. They get locked into a relationship with a hyperscaler or single vendor
  2. They haphazardly cobble together various component pieces like vector databases, embedding models, orchestration tools, and more.

Given how fast generative AI is changing, you don’t want to end up locked into either of these situations. You need to retain your optionality so you can quickly adapt as the tech needs of your business evolve or as the tech market changes. My recommendation? Use a flexible API system. 

DataRobot can help you integrate with all of the major players, yes, but what’s even better is how we’ve built our platform to be agnostic about your existing tech and fit in where you need us to. Our flexible API provides the functionality and flexibility you need to actually unify your GenAI efforts across the existing tech ecosystem you’ve built.

2. Build Integration-Agnostic Models 

In the same vein as avoiding vendor lock-in, don’t build AI models that only integrate with a single application. For instance, let’s say you build an application for Slack, but now you want it to work with Gmail. You might have to rebuild the entire thing. 

Instead, aim to build models that can integrate with multiple different platforms, so you can be flexible for future use cases. This won’t just save you upfront development time. Platform-agnostic models will also lower your required maintenance time, thanks to fewer custom integrations that need to be managed. 

With the right intelligence layer in place, you can bring the power of GenAI models to a diverse blend of apps and their users. This lets you maximize the investments you’ve made across your entire ecosystem. In addition, you’ll be able to deploy and manage hundreds of GenAI models from one location.

For example, DataRobot could integrate GenAI models that work smoothly across enterprise apps like Slack, Tableau, Salesforce, and Microsoft Teams. 

3. Bring Generative And Predictive AI into One Unified Experience

Many companies struggle with generative AI chaos because their generative and predictive models are scattered and siloed. For seamless integration, you need your AI models in a single repository, no matter who built them or where they’re hosted. 

DataRobot is perfect for this; so much of our product’s value lies in our ability to unify AI intelligence across an organization, especially in partnership with hyperscalers. If you’ve built most of your AI frameworks with a hyperscaler, we’re just the layer you need on top to add rigor and specificity to your initiatives’ governance, monitoring, and observability.

And this isn’t limited to generative or predictive models: models built by anyone on any platform can be brought into DataRobot for governance and operation.


4. Build for Ease of Monitoring and Retraining 

Given the pace of innovation with generative AI over the past year, many of the models I built six months ago are already out of date. But to keep my models relevant, I prioritize retraining, and not just for predictive AI models. GenAI can go stale, too, if the source documents or grounding data are out of date. 

Imagine you have dozens of GenAI models in production. They could be deployed to all kinds of places such as Slack, customer-facing applications, or internal platforms. Sooner or later your model will need a refresh. If you only have 1-2 models, it may not be a huge concern now, but if you already have an inventory, it’ll take you a lot of manual time to scale the deployment updates.

Updates that don’t happen through scalable orchestration stall outcomes because of infrastructure complexity. This is especially critical when you start thinking a year or more down the road, since GenAI updates usually require more maintenance than predictive AI.

DataRobot offers model version control with built-in testing to make sure a deployment will work with new platform versions that launch in the future. If an integration fails, you get an alert to notify you about the failure immediately. It also flags if a new dataset has additional features that aren’t the same as the ones in your currently deployed model. This empowers engineers and builders to be far more proactive about fixing things, rather than finding out a month (or further) down the line that an integration is broken. 

In addition to model control, I use DataRobot to monitor metrics like data drift and groundedness to keep infrastructure costs in check. The simple truth is that if budgets are exceeded, projects get shut down. This can quickly snowball into a situation where whole teams are affected because they can’t control costs. DataRobot allows me to track metrics that are relevant to each use case, so I can stay informed on the business KPIs that matter.

5. Stay Aligned With Business Leadership And Your End Users 

The biggest mistake that I see AI practitioners make is not talking to people around the business enough. You need to bring in stakeholders early and talk to them often. This is not about having one conversation to ask business leadership if they’d be interested in a specific GenAI use case. You need to continuously affirm they still need the use case — and that whatever you’re working on still meets their evolving needs. 

There are three components here: 

  1. Engage Your AI Users 

It’s crucial to secure buy-in from your end-users, not just leadership. Before you start to build a new model, talk to your prospective end-users and gauge their interest level. They’re the consumer, and they need to buy into what you’re creating, or it won’t get used. Hint: Make sure whatever GenAI models you build need to easily connect to the processes, solutions, and data infrastructures users are already in.

Since your end-users are the ones who’ll ultimately decide whether to act on the output from your model, you need to ensure they trust what you’ve built. Before or as part of the rollout, talk to them about what you’ve built, how it works, and most importantly, how it will help them accomplish their goals.

  2. Involve Your Business Stakeholders In The Development Process 

Even after you’ve confirmed initial interest from leadership and end-users, it’s never a good idea to just head off and then come back months later with a finished product. Your stakeholders will almost certainly have a lot of questions and suggested changes. Be collaborative and build time for feedback into your projects. This helps you build an application that solves their need and helps them trust that it works how they want.

  3. Articulate Precisely What You're Trying To Achieve 

It’s not enough to have a goal like, “We want to integrate X platform with Y platform.” I’ve seen too many customers get hung up on short-term goals like these instead of taking a step back to think about overall goals. DataRobot provides enough flexibility that we may be able to develop a simplified overall architecture rather than fixating on a single point of integration. You need to be specific: “We want this Gen AI model that was built in DataRobot to pair with predictive AI and data from Salesforce. And the results need to be pushed into this object in this way.” 

That way, you can all agree on the end goal, and easily define and measure the success of the project. 


6. Move Beyond Experimentation To Generate Value Early 

Teams can spend weeks building and deploying GenAI models, but if the process is not organized, all of the usual governance and infrastructure challenges will hamper time-to-value.

There’s no value in the experiment itself—the model needs to generate results (internally or externally). Otherwise, it’s just been a “fun project” that’s not producing ROI for the business. That is until it’s deployed.

DataRobot can help you operationalize models 83% faster, while saving 80% of the normal costs required. Our Playgrounds feature gives your team the creative space to compare LLM blueprints and determine the best fit. 

Instead of making end-users wait for a final solution, or letting the competition get a head start, start with a minimum viable product (MVP). 

Get a basic model into the hands of your end users and explain that this is a work in progress. Invite them to test, tinker, and experiment, then ask them for feedback.

An MVP offers two vital benefits: 

  1. You can confirm that you're moving in the right direction with what you're building.
  2. Your end users get value from your generative AI efforts quickly. 

While you may not provide a perfect user experience with your work-in-progress integration, you’ll find that your end-users will accept a bit of friction in the short term to experience the long-term value.

Unlock Seamless Generative AI Integration with DataRobot 

If you’re struggling to integrate GenAI into your existing tech ecosystem, DataRobot is the solution you need. Instead of a jumble of siloed tools and AI assets, our AI platform could give you a unified AI landscape and save you some serious technical debt and hassle in the future. With DataRobot, you can integrate your AI tools with your existing tech investments, and choose from best-of-breed components. We’re here to help you: 

  • Avoid vendor lock-in and prevent AI asset sprawl 
  • Build integration-agnostic GenAI models that will stand the test of time
  • Keep your AI models and integrations up to date with alerts and version control
  • Combine your generative and predictive AI models built by anyone, on any platform, to see real business value

Ready to get more out of your AI with less friction? Get started today with a free 30-day trial or set up a demo with one of our AI experts.


The post How to Focus on GenAI Outcomes, Not Infrastructure appeared first on DataRobot AI Platform.

Deep Dive into JITR: The PDF Ingesting and Querying Generative AI Tool https://www.datarobot.com/blog/deep-dive-into-jitr-the-pdf-ingesting-and-querying-generative-ai-tool/ Thu, 07 Dec 2023 14:00:00 +0000 https://www.datarobot.com/?post_type=blog&p=52473 Learn how to utilize LLMs to answer user questions based on ingested PDFs at runtime. Accelerate generative AI innovation and real-world value using DataRobot’s GenAI Accelerators.

The post Deep Dive into JITR: The PDF Ingesting and Querying Generative AI Tool appeared first on DataRobot AI Platform.

Motivation

Accessing, understanding, and retrieving information from documents are central to countless processes across various industries. Whether you work in finance or healthcare, run a mom-and-pop carpet store, or study at a university, there are situations where you face a big document that you need to read through to answer questions. Enter JITR, a game-changing tool that ingests PDF files and leverages LLMs (Large Language Models) to answer user queries about the content. Let's explore the magic behind JITR.

What Is JITR?

JITR, which stands for Just In Time Retrieval, is one of the newest tools in DataRobot’s GenAI Accelerator suite designed to process PDF documents, extract their content, and deliver accurate answers to user questions and queries. Imagine having a personal assistant that can read and understand any PDF document and then provide answers to your questions about it instantly. That’s JITR for you.

How Does JITR Work?

Ingesting PDFs: The initial stage involves ingesting a PDF into the JITR system. Here, the tool converts the static content of the PDF into a digital format ingestible by the embedding model. The embedding model converts each sentence in the PDF file into a vector. This process creates a vector database of the input PDF file.

Applying your LLM: Once the content is ingested, the tool calls the LLM. LLMs are state-of-the-art AI models trained on vast amounts of text data. They excel at understanding context, discerning meaning, and generating human-like text. JITR employs these models to understand and index the content of the PDF.

Interactive Querying: Users can then pose questions about the PDF’s content. The LLM fetches the relevant information and presents the answers in a concise and coherent manner.
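
To make the ingestion step concrete, here is a minimal sketch of how sentences become searchable vectors, using the sentence-transformers library and a FAISS index. The model name and example sentences are illustrative choices, not the exact JITR internals (the full implementation appears later in this post).

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# An illustrative general-purpose embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Instant noodles are flash-fried to remove moisture.",
    "Frying creates tiny holes that act as channels for rehydration.",
]

# Each sentence becomes a fixed-length vector (384 dimensions for this model)
vectors = model.encode(sentences)

# Index the vectors for similarity search
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

# Find the sentence most similar to a query
query = model.encode(["How do noodles rehydrate?"])
_, ids = index.search(np.asarray(query, dtype="float32"), k=1)
print(sentences[ids[0][0]])
```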

Benefits of Using JITR

Every organization produces a variety of documents that are generated in one department and consumed by another. Often, retrieval of information for employees and teams can be time-consuming. Utilizing JITR improves employee efficiency by reducing the review time of lengthy PDFs and providing instant and accurate answers to their questions. In addition, JITR can handle any type of PDF content, which enables organizations to embed and utilize it in different workflows without concern for the input document. 

Many organizations may not have resources and expertise in software development to develop tools that utilize LLMs in their workflow. JITR enables teams and departments that are not fluent in Python to convert a PDF file into a vector database as context for an LLM. By simply having an endpoint to send PDF files to, JITR can be integrated into any web application such as Slack (or other messaging tools), or external portals for customers. No knowledge of LLMs, Natural Language Processing (NLP), or vector databases is required.

Real-World Applications

Given its versatility, JITR can be integrated into almost any workflow. Below are some of the applications.

Business Reports: Professionals can swiftly get insights from lengthy reports, contracts, and whitepapers. Similarly, this tool can be integrated into internal processes, enabling employees and teams to interact with internal documents. 

Customer Service: From understanding technical manuals to diving deep into tutorials, JITR can enable customers to interact with manuals and documents related to the products and tools. This can increase customer satisfaction and reduce the number of support tickets and escalations. 

Research and Development: R&D teams can quickly extract relevant and digestible information from complex research papers to implement state-of-the-art technology in their products or internal processes.

Alignment with Guidelines: Many organizations have guidelines that should be followed by employees and teams. JITR enables employees to retrieve relevant information from the guidelines efficiently. 

Legal: JITR can ingest legal documents and contracts and answer questions based on the information provided in the input documents.

How to Build the JITR Bot with DataRobot

The workflow for building a JITR Bot is similar to the workflow for deploying any LLM pipeline using DataRobot. The two main differences are:

  1. Your vector database is defined at runtime
  2. You need logic to handle an encoded PDF

For the latter we can define a simple function that takes an encoding and writes it back to a temporary PDF file within our deployment.

```python
import codecs
import os


def base_64_to_file(b64_string, filename: str = 'temp.PDF', directory_path: str = "./storage/data") -> str:
    """Decode a base64 string into a PDF file."""
    if not os.path.exists(directory_path):
        os.makedirs(directory_path)
    file_path = os.path.join(directory_path, filename)
    with open(file_path, "wb") as f:
        # codecs.decode reverses the base64 encoding applied by the caller
        f.write(codecs.decode(b64_string, "base64"))
    return file_path
```
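
For example, a hypothetical round trip through this helper, starting from a local file, looks like this (the file name is illustrative):

```python
import base64

with open("example.pdf", "rb") as f:
    encoded = base64.b64encode(f.read())

# Writes ./storage/data/temp.PDF and returns the path
file_path = base_64_to_file(encoded)
```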

With this helper function defined, we can go through and make our hooks. Hooks are just a fancy term for functions with specific names. In our case, we just need to define a hook called `load_model` and another hook called `score_unstructured`. In `load_model`, we'll set the embedding model we want to use to find the most relevant chunks of text, as well as the LLM we'll ping with our context-aware prompt.

```python
def load_model(input_dir):
    """Custom model hook for loading our knowledge base."""
    import os

    import datarobot_drum as drum
    from langchain.chat_models import AzureChatOpenAI
    from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

    try:
        # Pull credentials from deployment
        key = drum.RuntimeParameters.get("OPENAI_API_KEY")["apiToken"]
    except ValueError:
        # Pull credentials from environment (when running locally)
        key = os.environ.get('OPENAI_API_KEY', '')

    embedding_function = SentenceTransformerEmbeddings(
        model_name="all-MiniLM-L6-v2",
        cache_folder=os.path.join(input_dir, 'storage/deploy/sentencetransformers'),
    )

    # The remaining OPENAI_* constants are assumed to be defined elsewhere
    # in the deployment code (for example, as module-level configuration)
    llm = AzureChatOpenAI(
        deployment_name=OPENAI_DEPLOYMENT_NAME,
        openai_api_type=OPENAI_API_TYPE,
        openai_api_base=OPENAI_API_BASE,
        openai_api_version=OPENAI_API_VERSION,
        openai_api_key=key,  # use the key pulled above
        openai_organization=OPENAI_ORGANIZATION,
        model_name=OPENAI_DEPLOYMENT_NAME,
        temperature=0,
        verbose=True,
    )

    return llm, embedding_function
```

Ok, so we have our embedding function and our LLM. We also have a way to take an encoding and get back to a PDF. So now we get to the meat of the JITR Bot, where we’ll build our vector store at run time and use it to query the LLM.

```python
def score_unstructured(model, data, query, **kwargs) -> str:
    """Custom model hook for making completions with our knowledge base.

    When requesting predictions from the deployment, pass a dictionary
    with the following keys:
    - 'question' the question to be passed to the retrieval chain
    - 'document' a base64 encoded document to be loaded into the vector database

    datarobot-user-models (DRUM) handles loading the model and calling
    this function with the appropriate parameters.

    Returns:
    --------
    rv : str
        Json dictionary with keys:
            - 'question' user's original question
            - 'answer' the generated answer to the question
    """
    import json
    import os

    from langchain.chains import ConversationalRetrievalChain
    from langchain.document_loaders import PyPDFLoader
    from langchain.vectorstores.base import VectorStoreRetriever
    from langchain.vectorstores.faiss import FAISS

    llm, embedding_function = model
    DIRECTORY = "./storage/data"
    temp_file_name = "temp.PDF"

    data_dict = json.loads(data)

    # Write encoding to file
    base_64_to_file(data_dict['document'].encode(), filename=temp_file_name, directory_path=DIRECTORY)

    # Load up the file
    loader = PyPDFLoader(os.path.join(DIRECTORY, temp_file_name))
    docs = loader.load_and_split()

    # Remove file when done
    os.remove(os.path.join(DIRECTORY, temp_file_name))

    # Create our vector database
    texts = [doc.page_content for doc in docs]
    metadatas = [doc.metadata for doc in docs]
    db = FAISS.from_texts(texts, embedding_function, metadatas=metadatas)

    # Define our chain
    retriever = VectorStoreRetriever(vectorstore=db)
    chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever)

    # Run it
    response = chain(inputs={'question': data_dict['question'], 'chat_history': []})
    return json.dumps({"result": response})
```

With our hooks defined, all that’s left to do is deploy our pipeline so that we have an endpoint people can interact with. To some, the process of creating a secure, monitored and queryable endpoint out of arbitrary Python code may sound intimidating or at least time consuming to set up. Using the drx package, we can deploy our JITR Bot in one function call.

```python
import datarobotx as drx

# `now` is assumed to be a timestamp string defined earlier, used to give
# the deployment a unique name
deployment = drx.deploy(
    "./storage/deploy/",  # Path with embedding model
    name=f"JITR Bot {now}",
    hooks={
        "score_unstructured": score_unstructured,
        "load_model": load_model,
    },
    extra_requirements=["pyPDF"],  # Add a package for parsing PDF files
    environment_id="64c964448dd3f0c07f47d040",  # GenAI Dropin Python environment
)
```

How to Use JITR

Ok, the hard work is over. Now we get to enjoy interacting with our newfound deployment. Through Python, we can again take advantage of the drx package to answer our most pressing questions.

```python
import base64
import io

import requests

# Find a PDF
url = "https://s3.amazonaws.com/datarobot_public_datasets/drx/Instantnoodles.PDF"
resp = requests.get(url).content
encoding = base64.b64encode(io.BytesIO(resp).read())  # encode it

# Interact
response = deployment.predict_unstructured(
    {
        "question": "What does this say about noodle rehydration?",
        "document": encoding.decode(),
    }
)['result']

# Example output:
# {'question': 'What does this say about noodle rehydration?',
#  'chat_history': [],
#  'answer': 'The article mentions that during the frying process, many tiny
#             holes are created due to mass transfer, and they serve as
#             channels for water penetration upon rehydration in hot water.
#             The porous structure created during frying facilitates
#             rehydration.'}
```

But more importantly, we can hit our deployment in any language we want since it’s just an endpoint. Below, I show a screenshot of me interacting with the deployment right through Postman. This means we can integrate our JITR Bot into essentially any application we want by just having the application make an API call.

Integrating JITR Bot into an application - DataRobot
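
If Postman isn't your thing, the same call can be made from any HTTP client. Here is a sketch using Python's requests library, standing in for whatever language your application uses. The URL and headers below are assumptions for illustration only; copy the exact values from your deployment's integration snippet in DataRobot.

```python
import base64
import json

import requests

# Illustrative placeholders - use your deployment's real URL and credentials
API_URL = "https://example.datarobot.com/predApi/v1.0/deployments/<DEPLOYMENT_ID>/predictionsUnstructured"
API_TOKEN = "<YOUR_API_TOKEN>"

with open("Instantnoodles.PDF", "rb") as f:
    document = base64.b64encode(f.read()).decode()

payload = {
    "question": "What does this say about noodle rehydration?",
    "document": document,
}

resp = requests.post(
    API_URL,
    data=json.dumps(payload),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
)
print(resp.json())
```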

Once embedded in an application, using JITR is very easy. For example, in the Slackbot application used at DataRobot internally, users simply upload a PDF with a question to start a conversation related to the document. 

JITR makes it easy for anyone in an organization to start driving real-world value from generative AI, across countless touchpoints in employees’ day-to-day workflows. Check out this video to learn more about JITR. 

Things You Can Do to Make the JITR Bot More Powerful

In the code I showed, we ran through a straightforward implementation of the JITR Bot, which takes an encoded PDF and makes a vector store at runtime in order to answer questions. Since they weren't relevant to the core concept, I opted to leave out a number of bells and whistles we implemented internally with the JITR Bot, such as:

  • Returning context aware prompt and completion tokens
  • Answering questions based on multiple documents
  • Answering multiple questions at once
  • Letting users provide conversation history
  • Using other chains for different types of questions
  • Reporting custom metrics back to the deployment

There’s also no reason why the JITRBot has to only work with PDF files! So long as a document can be encoded and converted back into a string of text, we could build more logic into our `score_unstructured` hook to handle any file type a user provides.

Start Leveraging JITR in Your Workflow

JITR makes it easy to interact with arbitrary PDFs. If you’d like to give it a try, you can follow along with the notebook here.


The post Deep Dive into JITR: The PDF Ingesting and Querying Generative AI Tool appeared first on DataRobot AI Platform.

Potential Risks of Generative AI According to NAIAC – And How to Mitigate Them https://www.datarobot.com/blog/potential-risks-of-generative-ai-according-to-naiac-and-how-to-mitigate-them/ Tue, 28 Nov 2023 18:02:11 +0000 https://www.datarobot.com/?post_type=blog&p=52318 This blog post covers the risks of AI, highlighting what has been mentioned in the finding and connecting it to the need for organizations to incorporate mitigation processes to address the potential risks and continual monitoring of their GenAI tools.

The post Potential Risks of Generative AI According to NAIAC – And How to Mitigate Them appeared first on DataRobot AI Platform.

The unprecedented rise of Artificial Intelligence (AI) has brought transformative possibilities across various sectors, from industries and economies to societies at large. However, this technological leap also introduces a set of potential challenges. In its recent public meeting, the National AI Advisory Committee (NAIAC)[1], which provides recommendations to the President and the National AI Initiative Office on topics including the current state of U.S. AI competitiveness, the state of science around AI, and AI workforce issues, voted on a finding based on expert briefings on the potential risks of AI, and more specifically generative AI[2]. This blog post aims to shed light on these concerns and delineate how DataRobot customers can proactively leverage the platform to mitigate these threats.

Understanding AI’s Potential Risks 

With the swift rise of AI in the realm of technology, it stands poised to transform sectors, streamline operations, and amplify human potential. Yet, these unmatched progressions also usher in a myriad of challenges that demand attention. The "Findings on The Potential Future Risks of AI" segments the risks of AI into short-term and long-term risks. The near-term risks of AI, as described in the finding, are risks associated with AI that are well known and current concerns, whether for predictive or generative AI. The long-term risks of AI, on the other hand, are potential risks that may not materialize given the current state of AI technology, or that are not yet well understood, but whose potential impacts we should prepare for. The finding highlights a few categories of AI risks: malicious objectives or unintended consequences, economic and societal, and catastrophic. 

Societal

While Large Language Models (LLMs) are primarily optimized for text prediction tasks, their broader applications don’t adhere to a singular goal. This flexibility allows them to be employed in content creation for marketing, translation, or even in disseminating misinformation on a large scale. In some instances, even when the AI’s objective is well-defined and tailored for a specific purpose, unforeseen negative outcomes can still emerge. In addition, as AI systems evolve in complexity, there’s a growing concern that they might find ways to circumvent the safeguards established to monitor or restrict their behavior. This is especially troubling since, although humans create these safety mechanisms with particular goals in mind, an AI may perceive them differently or pinpoint vulnerabilities.

Economic

As AI and automation sweep across various sectors, they promise both opportunities and challenges for employment. While there’s potential for job enhancement and broader accessibility by leveraging generative AI, there’s also a risk of deepening economic disparities. Industries centered around routine activities might face job disruptions, yet AI-driven businesses could unintentionally widen the economic divide. It’s important to highlight that being exposed to AI doesn’t directly equate to job loss, as new job opportunities may emerge and some workers might see improved performance through AI support. However, without strategic measures in place—like monitoring labor trends, offering educational reskilling, and establishing policies like wage insurance—the specter of growing inequality looms, even if productivity soars. But the implications of this shift aren’t merely financial. Ethical and societal issues are taking center stage. Concerns about personal privacy, copyright breaches, and our increasing reliance on these tools are more pronounced than ever. 

Catastrophic

The evolving landscape of AI technologies has the potential to reach more advanced levels. Especially, with the adoption of generative AI at scale, there’s growing apprehension about their disruptive potential. These disruptions can endanger democracy, pose national security risks like cyberattacks or bioweapons, and instigate societal unrest, particularly through divisive AI-driven mechanisms on platforms like social media. While there’s debate about AI achieving superhuman prowess and the magnitude of these potential risks, it’s clear that many threats stem from AI’s malicious use, unintentional fallout, or escalating economic and societal concerns.

Recently, discussion of the catastrophic risks of AI has dominated the conversations on AI risk, especially with regards to generative AI. However, as was put forth by NAIAC, "Arguments about existential risk from AI should not detract from the necessity of addressing existing risks of AI. Nor should arguments about existential risk from AI crowd out the consideration of opportunities that benefit society."[3]

The DataRobot Approach 

The DataRobot AI Platform is an open, end-to-end AI lifecycle platform that streamlines how you build, govern, and operate generative and predictive AI. Designed to unify your entire AI landscape, teams, and workflows, it empowers you to deliver real-world value from your AI initiatives, while giving you the flexibility to evolve and the enterprise control to scale with confidence.

DataRobot serves as a beacon in navigating these challenges. By championing transparent AI models through automated documentation during the experimentation and in production, DataRobot enables users to review and audit the building process of AI tools and its performance in production, which fosters trust and promotes responsible engagement. The platform’s agility ensures that users can swiftly adapt to the rapidly evolving AI landscape. With an emphasis on training and resource provision, DataRobot ensures users are well-equipped to understand and manage the nuances and risks associated with AI. At its core, the platform prioritizes AI safety, ensuring that responsible AI use is not just encouraged but integral from development to deployment.

With regards to generative AI, DataRobot has incorporated a trustworthy AI framework in our platform. The chart below highlights the high level view of this framework.

Trusted AI

Pillars of this framework, Ethics, Performance, and Operations, have guided us to develop and embed features in the platform that assist users in addressing some of the risks associated with generative AI. Below we delve deeper into each of these components. 

Ethics

AI Ethics pertains to how an AI system aligns with the values held by both its users and creators, as well as the real-world consequences of its operation. Within this context, DataRobot stands out as an industry leader by incorporating various features into its platform to address ethical concerns across three key domains: Explainability, Discrimination and harm mitigation, and Privacy preservation.

DataRobot directly tackles these concerns by offering cutting-edge features that monitor model bias and fairness, apply innovative prediction explanation algorithms, and implement a platform architecture designed to maximize data protection. Additionally, when orchestrating generative AI workflows, DataRobot goes a step further by supporting an ensemble of “guard” models. These guard models play a crucial role in safeguarding generative use cases. They can perform tasks such as topic analysis to ensure that generative models stay on topic, identify and mitigate bias, toxicity, and harm, and detect sensitive data patterns and identifiers that should not be utilized in workflows.

What’s particularly noteworthy is that these guard models can be seamlessly integrated into DataRobot’s modeling pipelines, providing an extra layer of protection around Language Model (LLM) workflows. This level of protection instills confidence in users and stakeholders regarding the deployment of AI systems. Furthermore, DataRobot’s robust governance capabilities enable continuous monitoring, governance, and updates for these guard models over time through an automated workflow. This ensures that ethical considerations remain at the forefront of AI system operations, aligning with the values of all stakeholders involved.

Performance

AI Performance pertains to evaluating how effectively a model accomplishes its intended goal. In the context of an LLM, this could involve tasks such as responding to user queries, summarizing or retrieving key information, translating text, or various other use cases. It is worth noting that many existing LLM deployments often lack real-time assessment of validity, quality, reliability, and cost. DataRobot, however, has the capability to monitor and measure performance across all of these domains.

DataRobot’s distinctive blend of generative and predictive AI empowers users to create supervised models capable of assessing the correctness of LLMs based on user feedback. This results in the establishment of an LLM correctness score, enabling the evaluation of response effectiveness. Every LLM output is assigned a correctness score, offering users insights into the confidence level of the LLM and allowing for ongoing tracking through the DataRobot LLM Operations (LLMOps) dashboard. By leveraging domain-specific models for performance assessment, organizations can make informed decisions based on precise information. 

DataRobot’s LLMOps offers comprehensive monitoring options within its dashboard, including speed and cost tracking. Performance metrics such as response and execution times are continuously monitored to ensure timely handling of user queries. Furthermore, the platform supports the use of custom metrics, enabling users to tailor their performance evaluations. For instance, users can define their own metrics or employ established measures like Flesch reading-ease to gauge the quality of LLM responses to inquiries. This functionality facilitates the ongoing assessment and improvement of LLM quality over time.

Operations

AI Operations focuses on ensuring the reliability of the system or the environment housing the AI technology. This encompasses not only the reliability of the core system but also the governance, oversight, maintenance, and utilization of that system, all with the overarching goal of ensuring efficient, effective, safe, and secure operations. 

With over 1 million AI projects operationalized and delivering over 1 trillion predictions, the DataRobot platform has established itself as a robust enterprise foundation capable of supporting and monitoring a diverse array of AI use cases. The platform boasts built-in governance features that streamline development and maintenance processes. Users benefit from custom environments that facilitate the deployment of knowledge bases with pre-installed dependencies, expediting development lifecycles. Critical knowledge base deployment activities are logged meticulously to ensure that key events are captured and stored for reference. DataRobot seamlessly integrates with version control, promoting best practices through continuous integration/continuous deployment (CI/CD) and code maintenance. Approval workflows can be orchestrated to ensure that LLM systems undergo proper approval processes before reaching production. Additionally, notification policies keep users informed about key deployment-related activities.

Security and safety are paramount considerations. DataRobot employs two-factor authentication and access control mechanisms to ensure that only authorized developers and users can utilize LLMs.

DataRobot’s LLMOps monitoring extends across various dimensions. Service health metrics track the system’s ability to respond quickly and reliably to prediction requests. Crucial metrics like response time provide essential insights into the LLM’s capacity to address user queries promptly. Furthermore, DataRobot’s customizable metrics capability empowers users to define and monitor their own metrics, ensuring effective operations. These metrics could encompass overall cost, readability, user approval of responses, or any user-defined criteria. DataRobot’s text drift feature enables users to monitor changes in input queries over time, allowing organizations to analyze query changes for insights and intervene if they deviate from the intended use case. As organizational needs evolve, this text drift capability serves as a trigger for new development activities.

DataRobot’s LLM-agnostic approach offers users the flexibility to select the most suitable LLM based on their privacy requirements and data capture policies. This accommodates partners, which enforce enterprise privacy, as well as privately hosted LLMs where data capture is not a concern and is managed by the LLM owners. Additionally, it facilitates solutions where network egress can be controlled. Given the diverse range of applications for generative AI, operational requirements may necessitate various LLMs for different environments and tasks. Thus, an LLM-agnostic framework and operations are essential.

It’s worth highlighting that DataRobot is committed to continually enhancing its platform by incorporating more responsible AI features into the AI lifecycle for the benefit of end users.

Conclusion 

While AI is a beacon of potential and transformative benefits, it is essential to remain cognizant of the accompanying risks. Platforms like DataRobot are pivotal in ensuring that the power of AI is harnessed responsibly, driving real-world value, while proactively addressing challenges.


[1] The White House. n.d. "National AI Advisory Committee." AI.Gov. https://ai.gov/naiac/.

[2] "FINDINGS: The Potential Future Risks of AI." October 2023. National Artificial Intelligence Advisory Committee (NAIAC). https://ai.gov/wp-content/uploads/2023/11/Findings_The-Potential-Future-Risks-of-AI.pdf.

[3] "STATEMENT: On AI and Existential Risk." October 2023. National Artificial Intelligence Advisory Committee (NAIAC). https://ai.gov/wp-content/uploads/2023/11/Statement_On-AI-and-Existential-Risk.pdf.

The post Potential Risks of Generative AI According to NAIAC – And How to Mitigate Them appeared first on DataRobot AI Platform.

The Power of a Flexible and Diverse Generative AI Strategy https://www.datarobot.com/blog/the-power-of-a-flexible-and-diverse-generative-ai-strategy/ Wed, 22 Nov 2023 16:45:00 +0000 https://www.datarobot.com/?post_type=blog&p=52264 Build resilience and enable speed to evolve your organization’s generative AI strategy so that you can adapt as the market evolves. Read now.

The post The Power of a Flexible and Diverse Generative AI Strategy appeared first on DataRobot AI Platform.

Since launching our generative AI platform offering just a few short months ago, we’ve seen, heard, and experienced intense and accelerated AI innovation, with remarkable breakthroughs. As a long-time machine learning advocate and industry leader, I’ve witnessed many such breakthroughs, perfectly represented by the steady excitement around ChatGPT, released almost a year ago. 

And just as ecosystems thrive with biological diversity, the AI ecosystem benefits from multiple providers. Interoperability and system flexibility have always been key to mitigating risk – so that organizations can adapt and continue to deliver value. But the unprecedented speed of evolution with generative AI has made optionality a critical capability. 

The market is changing so rapidly that there are no sure bets – today or in the near future. This is a statement that we’ve heard echoed by our customers and one of the core philosophies that underpinned many of the innovative new generative AI capabilities announced in our recent Fall Launch

Relying too heavily upon any one AI provider could pose a risk as rates of innovation are disrupted. Already, there are more than 180 different open source LLMs. The technology is evolving much faster than teams can apply it.


DataRobot’s philosophy has been that organizations need to build flexibility into their generative AI strategy based on performance, robustness, costs, and adequacy for the specific LLM task being deployed. 

As with all technologies, many LLMs come with trade-offs or are more tailored to specific tasks. Some LLMs may excel at particular natural language operations like text summarization, provide more diverse text generation, or even be cheaper to operate. As a result, many LLMs can be best-in-class in different but useful ways. A tech stack that provides flexibility to select or blend these offerings ensures organizations maximize AI value in a cost-efficient manner.

DataRobot operates as an open, unified intelligence layer that lets organizations compare and select the generative AI components that are right for them. This interoperability leads to better generative AI outputs, improves operational continuity, and decreases single-provider dependencies. 

With such a strategy, operational processes remain unaffected if, say, a provider is experiencing internal disruption. Plus, costs can be managed more efficiently by enabling organizations to make cost-performance tradeoffs around their LLMs.

During our Fall Launch, we announced our new multi-provider LLM Playground. The first-of-its-kind visual interface provides you with built-in access to Google Cloud Vertex AI, Azure OpenAI, and Amazon Bedrock models to easily compare and experiment with different generative AI ‘recipes.’ You can use any of the built-in LLMs in our playground or bring your own. Access to these LLMs is available out-of-the-box during experimentation, so there are no additional steps needed to start building GenAI solutions in DataRobot. 

DataRobot Multi-Provider LLM Playground

With our new LLM Playground, we’ve made it easy to try, test, and compare different GenAI “recipes” in terms of style/tone, cost, and relevance. We’ve made it easy to evaluate any combination of foundational model, vector database, chunking strategy, and prompting strategy. You can do this whether you prefer to build with the platform UI or using a notebook. Having the LLM playground makes it easy for you to flip back and forth from code to visualizing your experiments side by side. 

Easily test different prompting and chunking strategies, and vector databases

With DataRobot, you can also hot-swap underlying components (like LLMs) without breaking production, if your organization’s needs change or the market evolves. This not only lets you calibrate your generative AI solutions to your exact requirements, but also ensures you maintain technical autonomy with all of the best of breed components right at your fingertips. 

You can see below exactly how easy it is to compare different generative AI ‘recipes’ with our LLM Playground.

Once you’ve selected the right ’recipe’ for you, you can quickly and easily move it, your vector database, and prompting strategies into production. Once in production, you get full end-to-end generative AI lineage, monitoring, and reporting. 

With DataRobot’s generative AI offering, organizations can easily choose the right tools for the job, safely extend their internal data to LLMs, while also measuring outputs for toxicity, truthfulness, and cost among other KPIs. We like to say, “we’re not building LLMs, we’re solving the confidence problem for generative AI.” 

The generative AI ecosystem is complex – and changing every day. At DataRobot, we ensure that you have a flexible and resilient approach. Think of it as an insurance policy and a safeguard against stagnation in an ever-evolving technological landscape, ensuring both data scientists' agility and CIOs' peace of mind. Because the reality is that an organization's strategy shouldn't be constrained to a single provider's world view, rate of innovation, or internal turmoil. It's about building resilience and speed to evolve your organization's generative AI strategy so that you can adapt as the market evolves – which it can quickly do! 

You can learn more about how else we’re solving the ‘confidence problem’ by watching our Fall Launch event on-demand.


The post The Power of a Flexible and Diverse Generative AI Strategy appeared first on DataRobot AI Platform.

Closing the Generative AI Confidence Gap with DataRobot https://www.datarobot.com/blog/closing-the-generative-ai-confidence-gap-with-datarobot/ Thu, 16 Nov 2023 16:47:05 +0000 https://www.datarobot.com/?post_type=blog&p=52222 Learn how the DataRobot AI Platform empowers practitioners to rapidly experiment, maintain oversight, and operationalize high-quality generative AI solutions.

The post Closing the Generative AI Confidence Gap with DataRobot appeared first on DataRobot AI Platform.

Generative AI holds immense promise – but only if and when you can feel confident about putting it into production. After our summer release, we clearly heard and saw that many of you are struggling to build, deploy, manage, and operationalize models responsibly due to a lack of transparency and governance. This is what we have identified as the confidence gap, a roadblock for most organizations and end users on their path to harnessing the power of generative AI solutions.

But we don’t shy away from the hard problems with AI. Which is why our new Fall Launch addresses the confidence gap head-on, empowering enterprises to deploy generative AI. The new capabilities allow you to operate AI with correctness and control, govern with full transparency and oversight, and build rapidly with flexibility, with the assurance you need to feel confident putting these solutions into practice. Our robust platform empowers practitioners to rapidly experiment, maintain oversight, and operationalize high-quality generative AI solutions.

Clear insights and model performance alerts ensure high-quality responses, making it possible to operate with correctness and control and to reliably get your generative AI solutions into production. With the new capabilities of the DataRobot AI Platform, you can now continuously monitor performance to ensure real-time observability of deployed models through our unified AI Console. Custom alerts and metrics identify issues proactively, increasing the overall trust in your generative AI solution. Features like Generative AI Guard models score every output for completeness, relevance, and confidence. Coupled with human feedback loops, this ensures that the model's outputs stay on track over time. When ongoing monitoring surfaces anomalies, our platform enables immediate intervention to address problems before any downstream impact occurs, maintaining operational control.

Unified AI Console

As generative AI expands, cross-functional coordination becomes imperative but increasingly challenging. DataRobot helps you govern with full transparency and oversight by enabling and facilitating greater collaboration. The unified AI Registry catalogs all models and projects from across your organization in one place, enabling greater coordination, model lineage transparency, and, thus, better overall governance. 

The Workbench centralizes in-flight projects so nothing falls through the cracks. With holistic visibility, DataRobot allows seamless collaboration across data teams, developers, IT, and business users. Granular analytics around generative prediction spend also facilitates financially responsible innovation by providing continuous cost visibility. With robust visibility into model portfolios and spending, DataRobot empowers leaders to govern generative AI in an informed, measured manner.

Unified AI Registry

Experimentation and optionality are crucial to riding the generative AI innovation wave. They help organizations stay ahead of the curve and competitive, mitigate vendor risk, and customize solutions for unique use cases.

By allowing organizations to build rapidly with optionality, DataRobot empowers fearless innovation both now and into the future with robust visual experimentation capabilities and support for leading models. 

Our Multi-Provider LLM Playground is a first-of-its-kind visual comparison interface with out-of-the-box access to external LLM services, including Google PaLM, Azure OpenAI, and Amazon Bedrock, as well as the option to bring your own custom models.

The Playground allows you to easily compare and experiment with different generative AI ‘recipes’ that may include any combination of foundation models, vector databases, and prompting strategies tailored to your needs. And with the freedom to continuously adopt cutting-edge advances as they emerge, you can deliver impactful models at unmatched speeds without being locked into any single technology ecosystem. 

To further boost the velocity of your generative AI experiments, our AI Accelerators with expert-designed templates allow you to kickstart generative AI projects and dramatically shorten time to value. 

Multi-Provider LLM Playground

We help organizations continuously accelerate generative AI development and augment their ecosystem with turnkey building blocks and seamless integrations. Our library of expert-designed Generative AI Accelerators helps you kickstart development by packaging proven reusable code snippets. 

These Accelerators can help you extend foundational models with proprietary data for security, build a RAG application, add custom metrics, monitor models, or embed your generative AI solution into a communications app. These readymade templates enable rapid time-to-value. 

We also complement your existing tech stack by allowing you to easily leverage existing enterprise messaging tools like Slack and Microsoft Teams to host your generative AI solutions and facilitate user adoption. Integrations with Databricks and BigQuery reduce data wrangling time. With domain expertise encoded into reusable accelerators and ecosystem interoperability, DataRobot is the fastest path to generative AI impact. Our robust library of prebuilt capabilities and complementary ecosystem integrations empower enterprises to jumpstart delivery and maximize results.

The generative AI opportunity is immense, but realizing it requires the right platform. DataRobot helps you operate AI with correctness and control, govern with full transparency and oversight, and build rapidly with flexibility to quickly put any generative AI solution into production. Our robust support for state-of-the-art foundation models empowers you to deliver high-impact solutions today while retaining the flexibility to innovate boldly into the future.

Getting Started

Experience generative AI success for yourself – start a free trial today to build, operationalize, and govern generative models with confidence with DataRobot.

Our experts are also available for 1:1 tailored demonstrations showing how DataRobot can empower your specific AI initiatives. Book a demo for a deep dive into how our new Generative AI offerings can help you propel ahead.

The generative journey is just the beginning. We look forward to partnering with you to maximize results and uncover new opportunities. Let’s realize the full potential of generative AI together.


The post Closing the Generative AI Confidence Gap with DataRobot appeared first on DataRobot AI Platform.

Design and Monitor Custom Metrics for Generative AI Use Cases in DataRobot AI Platform https://www.datarobot.com/blog/design-and-monitor-custom-metrics-for-generative-ai-use-cases-in-datarobot-ai-platform/ Tue, 07 Nov 2023 14:00:00 +0000 https://www.datarobot.com/?post_type=blog&p=51244 Define and monitor custom, use case-specific performance metrics for generative AI use cases with DataRobot AI Platform. Learn more.

The post Design and Monitor Custom Metrics for Generative AI Use Cases in DataRobot AI Platform appeared first on DataRobot AI Platform.

CIOs and other technology leaders have come to realize that generative AI (GenAI) use cases require careful monitoring – there are inherent risks with these applications, and strong observability capabilities help to mitigate them. They've also realized that the same data science accuracy metrics commonly used for predictive use cases, while useful, are not completely sufficient for LLMOps. 

When it comes to monitoring LLM outputs, response correctness remains important, but now organizations also need to worry about metrics related to toxicity, readability, personally identifiable information (PII) leaks, incomplete information, and most importantly, LLM costs. While all these metrics are new and important for specific use cases, quantifying the unknown LLM costs is typically the one that comes up first in our customer discussions.

This article shares a generalizable approach to defining and monitoring custom, use case-specific performance metrics for generative AI deployments that are monitored with DataRobot AI Production. 

Remember that models do not need to be built with DataRobot to use the extensive governance and monitoring functionality. Also remember that DataRobot offers many deployment metrics out-of-the-box in the categories of Service Health, Data Drift, Accuracy and Fairness. The present discussion is about adding your own user-defined Custom Metrics to a monitored deployment.

Custom Metrics in DataRobot

To illustrate this feature, we’re using a logistics-industry example published on DataRobot Community Github that you can replicate on your own with a DataRobot license or with a free trial account. If you choose to get hands-on, also watch the video below and review the documentation on Custom Metrics.

Monitoring Metrics for Generative AI Use Cases

While DataRobot offers you the flexibility to define any custom metric, the structure that follows will help you narrow your metrics down to a manageable set that still provides broad visibility. If you define one or two metrics in each of the categories below, you'll be able to monitor cost, end-user experience, LLM misbehaviors, and value creation. Let's dive into each in further detail. 

Total Cost of Ownership

Metrics in this category monitor the expense of operating the generative AI solution. In the case of self-hosted LLMs, this would be the direct compute costs incurred. When using externally-hosted LLMs this would be a function of the cost of each API call. 

Defining your custom cost metric for an external LLM will require knowledge of the pricing model. As of this writing the Azure OpenAI pricing page lists the price for using GPT-3.5-Turbo 4K as $0.0015 per 1000 tokens in the prompt, plus $0.002 per 1000 tokens in the response. The following get_gpt_3_5_cost function calculates the price per prediction when using these hard-coded prices and token counts for the prompt and response calculated with the help of Tiktoken.

import tiktoken
encoding = tiktoken.get_encoding("cl100k_base")

def get_gpt_token_count(text):
    return len(encoding.encode(text))

def get_gpt_3_5_cost(
    prompt, response, prompt_token_cost=0.0015 / 1000, response_token_cost=0.002 / 1000
):
    return (
        get_gpt_token_count(prompt) * prompt_token_cost
        + get_gpt_token_count(response) * response_token_cost
    )
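
As a quick sanity check, here is a hypothetical prompt and response pair run through the function; the texts are illustrative.

prompt = "Summarize the status of container ABC123 in two sentences."
response = (
    "Container ABC123 cleared customs this morning. "
    "It is scheduled for delivery on Tuesday."
)

print(f"${get_gpt_3_5_cost(prompt, response):.6f}")  # a fraction of a cent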

User Experience

Metrics in this category monitor the quality of the responses from the perspective of the intended end user. Quality will vary based on the use case and the user. You might want a chatbot for a paralegal researcher to produce long answers written formally with lots of details. However, a chatbot for answering basic questions about the dashboard lights in your car should answer plainly without using unfamiliar automotive terms. 

Two starter metrics for user experience are response length and readability. You already saw above how to capture the generated response length and how it relates to cost. There are many options for readability metrics. All of them are based on some combination of average word length, average number of syllables in words, and average sentence length. Flesch reading-ease is one such readability metric with broad adoption. On a scale of 0 to 100, higher scores indicate that the text is easier to read. Here is an easy way to calculate the readability of the generative response with the help of the textstat package.

import textstat

def get_response_readability(response):
    return textstat.flesch_reading_ease(response)

Safety and Regulatory Metrics

This category contains metrics to monitor generative AI solutions for content that might be offensive (Safety) or violate the law (Regulatory). The right metrics to represent this category will vary greatly by use case and by the regulations that apply to your industry or your location.

It is important to note that metrics in this category apply to the prompts submitted by users and the responses generated by large language models. You might wish to monitor prompts for abusive and toxic language, overt bias, prompt-injection hacks, or PII leaks. You might wish to monitor generative responses for toxicity and bias as well, plus hallucinations and polarity.

Monitoring response polarity is useful for ensuring that the solution isn’t generating text with a consistent negative outlook. In the linked example which deals with proactive emails to inform customers of shipment status, the polarity of the generated email is checked before it is shown to the end user. If the email is extremely negative, it is over-written with a message that instructs the customer to contact customer support for an update on their shipment. Here is one way to define a Polarity metric with the help of the TextBlob package.

import numpy as np
from textblob import TextBlob

def get_response_polarity(response):
    blob = TextBlob(response)
    return np.mean([sentence.sentiment.polarity for sentence in blob.sentences])
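
A minimal sketch of that override logic might look like the following; the threshold of -0.5 and the fallback wording are assumptions for illustration.

FALLBACK_MESSAGE = (
    "Please contact customer support for the latest update on your shipment."
)

def guard_negative_response(response, threshold=-0.5):
    """Replace an extremely negative generated email with a safe fallback."""
    if get_response_polarity(response) < threshold:
        return FALLBACK_MESSAGE
    return response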

Business Value

CIOs are under increasing pressure to demonstrate clear business value from generative AI solutions. In an ideal world, the ROI, and how to calculate it, is a consideration in approving the use case to be built. But, in the current rush to experiment with generative AI, that has not always been the case. Adding business value metrics to a GenAI solution that was built as a proof-of-concept can help secure long-term funding for it and for the next use case.


The metrics in this category are entirely use-case dependent. To illustrate this, consider how to measure the business value of the sample use case dealing with proactive notifications to customers about the status of their shipments. 

One way to measure the value is to consider the average typing speed of a customer support agent who, in the absence of the generative solution, would type out a custom email from scratch. Ignoring the time required to research the status of the customer's shipment, the value of the avoided typing time, at 150 words per minute and $20 per hour, can be computed as follows.

def get_productivity(response):
    # Treat tokens as a rough proxy for words: minutes saved = tokens / 150,
    # valued at $20 per hour ($20 / 60 per minute)
    return get_gpt_token_count(response) * 20 / (150 * 60)

More likely the real business impact will be in reduced calls to the contact center and higher customer satisfaction. Let’s stipulate that this business has experienced a 30% decline in call volume since implementing the generative AI solution. In that case the real savings associated with each email proactively sent can be calculated as follows. 

def get_savings(CONTAINER_NUMBER):
    # CONTAINER_NUMBER is unused here but kept to match the deployment's
    # calling convention
    prob = 0.3  # 30% of proactive emails avert a contact-center call
    email_cost = 0.05  # cost of sending one proactive email, in dollars
    call_cost = 4.00  # cost of handling one contact-center call, in dollars
    return prob * (call_cost - email_cost)

Create and Submit Custom Metrics in DataRobot

Create Custom Metric

Once you have definitions and names for your custom metrics, adding them to a deployment is straightforward. You can add metrics to the Custom Metrics tab of a Deployment using the +Add Custom Metric button in the UI or with code. For both routes, you'll need to supply the information shown in the dialogue box below.

Custom Metrics Menu

Submit Custom Metric

There are several options for submitting custom metrics to a deployment which are covered in detail in the support documentation. Depending on how you define the metrics, you might know the values immediately or there may be a delay and you’ll need to associate them with the deployment at a later date.

It is best practice to conjoin the submission of metric details with the LLM prediction to avoid missing any information. In this screenshot below, which is an excerpt from a larger function, you see llm.predict() in the first row. Next you see the Polarity test and the override logic. Finally, you see the submission of the metrics to the deployment. 

Put another way, there is no way for a user to use this generative solution without having the metrics recorded. Each call to the LLM and its response is fully monitored.

Submitting Custom Metrics
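
Schematically, the pattern looks like the sketch below. The submit_custom_metric helper and the metric names are hypothetical placeholders for the submission calls described in the documentation; the point is that the prediction, the guard, and the metric submissions all happen inside one function.

def answer_with_monitoring(prompt):
    """Hypothetical wrapper: predict, apply guards, then record metrics."""
    response = llm.predict(prompt)  # 1. call the LLM
    response = guard_negative_response(response)  # 2. polarity override

    # 3. record metrics - submit_custom_metric is a placeholder for the
    # deployment submission API covered in the documentation
    submit_custom_metric("llm_cost", get_gpt_3_5_cost(prompt, response))
    submit_custom_metric("readability", get_response_readability(response))
    submit_custom_metric("response_polarity", get_response_polarity(response))
    return response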

DataRobot for Generative AI

We hope this deep dive into metrics for Generative AI gives you a better understanding of how to use the DataRobot AI Platform for operating and governing your generative AI use cases. While this article focused narrowly on monitoring metrics, the DataRobot AI Platform can help you with simplifying the entire AI lifecycle – to build, operate, and govern enterprise-grade generative AI solutions, safely and reliably.

Enjoy the freedom to work with all the best tools and techniques, across cloud environments, all in one place. Break down silos and prevent new ones with one consistent experience. Deploy and maintain safe, high-quality generative AI applications and solutions in production.


The post Design and Monitor Custom Metrics for Generative AI Use Cases in DataRobot AI Platform appeared first on DataRobot AI Platform.
