Choosing the Right Vector Embedding Model for Your Generative AI Use Case

In our previous post, we discussed considerations around choosing a vector database for our hypothetical retrieval augmented generation (RAG) use case. But when building a RAG application we often need to make another important decision: choosing a vector embedding model, a critical component of many generative AI applications.

A vector embedding model is responsible for the transformation of unstructured data (text, images, audio, video) into a vector of numbers that capture semantic similarity between data objects. Embedding models are widely used beyond RAG applications, including recommendation systems, search engines, databases, and other data processing systems. 

Understanding their purpose, internals, advantages, and disadvantages is crucial and that’s what we’ll cover today. While we’ll be discussing text embedding models only, models for other types of unstructured data work similarly.

What Is an Embedding Model?

Machine learning models don’t work with text directly; they require numbers as input. Since text is ubiquitous, the ML community has, over time, developed many solutions that handle the conversion from text to numbers. There are many approaches of varying complexity, but we’ll review just some of them.

A simple example is one-hot encoding: treat the words of a text as categorical variables and map each word to a vector of 0s and a single 1.
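
To make this concrete, here is a minimal sketch of one-hot encoding in plain Python; the tiny vocabulary is made up purely for illustration:

```python
# Toy one-hot encoding: each word in the vocabulary maps to a vector of 0s
# with a single 1 at that word's index.
vocabulary = ["cat", "dog", "fish"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word: str) -> list[int]:
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("dog"))  # [0, 1, 0]
```

With a real corpus the vocabulary can easily reach tens or hundreds of thousands of words, which is exactly the dimensionality problem described next.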


Unfortunately, this embedding approach is not very practical, since it leads to a large number of unique categories and results in unmanageable dimensionality of output vectors in most practical cases. Also, one-hot encoding does not put similar vectors closer to one another in a vector space.

Embedding models were invented to tackle these issues. Just like one-hot encoding, they take text as input and return vectors of numbers as output, but they are more complex, as they are trained on supervised tasks, often using a neural network. A supervised task can be, for example, predicting the sentiment score of a product review. In this case, the resulting embedding model would place reviews of similar sentiment closer to each other in a vector space. The choice of supervised task is critical to producing relevant embeddings when building an embedding model.


Word embeddings projected onto 2D axes

The diagram above shows word embeddings only, but we often need more than that, since human language is more complex than just many words put together. Semantics, word order, and other linguistic parameters should all be taken into account, which means we need to take it to the next level – sentence embedding models.

Sentence embeddings associate an input sentence with a vector of numbers and, as expected, are far more complex internally, since they have to capture more complex relationships.
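
As an illustration, here is a minimal sketch of producing sentence embeddings with the open-source sentence-transformers library; the model name (the all-MiniLM variant mentioned later in this post) and the example sentences are only illustrative, and any sentence embedding model could be swapped in:

```python
# Minimal sketch: turn sentences into fixed-length vectors and compare them.
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small BERT-derived model

sentences = [
    "The delivery arrived on time and the packaging was intact.",
    "Shipping was quick and nothing was damaged.",
    "The battery drains far too quickly.",
]
embeddings = model.encode(sentences)  # one 384-dimensional vector per sentence

# Semantically similar sentences end up closer together in the vector space.
print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # noticeably lower similarity
```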


Thanks to progress in deep learning, all state-of-the-art embedding models are created with deep neural nets, since they better capture the complex relationships inherent to human language.

A good embedding model should: 

  • Be fast, since it is often just a preprocessing step in a larger application
  • Return vectors of manageable dimensions
  • Return vectors that capture enough information about similarity to be practical

Let’s now quickly look into how most embedding models are organized internally.

Modern Neural Network Architectures

As we just mentioned, all well-performing state-of-the-art embedding models are deep neural networks. 

This is an actively developing field, and most top-performing models are associated with some novel architecture improvement. Let’s briefly cover two very important architectures: BERT and GPT.

BERT (Bidirectional Encoder Representations from Transformers) was published in 2018 by researchers at Google and described the application of bidirectional training of the transformer, a popular attention-based architecture, to language modeling. Standard transformers include two separate mechanisms: an encoder for reading text input and a decoder that makes a prediction.

BERT uses an encoder that reads the entire sequence of words at once, which allows the model to learn the context of a word from all of its surroundings, left and right, unlike legacy approaches that read a text sequence only left to right or right to left. Before word sequences are fed into BERT, some words are replaced with [MASK] tokens, and the model then attempts to predict the original value of the masked words based on the context provided by the other, non-masked words in the sequence.
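
To make the masked-word objective concrete, here is a minimal sketch using the Hugging Face transformers fill-mask pipeline with a standard BERT checkpoint; the model name and sentence are illustrative:

```python
# Minimal sketch of BERT's masked-language-modeling objective.
# pip install transformers
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token using both left and right context.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```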

Standard BERT does not perform very well in most benchmarks, and BERT models require task-specific fine-tuning. But it is open source, has been around since 2018, and has relatively modest system requirements (it can be trained on a single mid-range GPU). As a result, it became very popular for many text-related tasks. It is fast, customizable, and small. For example, the very popular all-MiniLM model is a modified version of BERT.

GPT (Generative Pre-Trained Transformer) by OpenAI is different. Unlike BERT, it is unidirectional, i.e. text is processed in one direction, and it uses the decoder from the transformer architecture, which is well suited to predicting the next word in a sequence. These models are slower and produce very high-dimensional embeddings, but they usually have many more parameters, do not require fine-tuning, and are more applicable to many tasks out of the box. GPT is not open source and is available as a paid API.

Context Length and Training Data

Another important parameter of an embedding model is context length: the number of tokens a model can take into account when working with a text. A longer context window allows the model to understand more complex relationships within a wider body of text. As a result, models can provide outputs of higher quality, e.g. capture semantic similarity better.
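
As a quick illustration that context length is counted in tokens rather than words or characters, here is a minimal sketch using OpenAI’s tiktoken tokenizer; this is an assumption of convenience, since every model family ships its own tokenizer and its own context limit:

```python
# Minimal sketch: tokens, not words, are what count against a context window.
# pip install tiktoken
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
text = "Context length is the number of tokens a model can take into account."
tokens = encoding.encode(text)

print(len(text.split()), "words")  # word count
print(len(tokens), "tokens")       # token count is usually somewhat higher
```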

To leverage a longer context, training data should include longer pieces of coherent text: books, articles, and so on. However, increasing the context window length increases the complexity of a model and raises the compute and memory requirements for training.

There are methods that help manage resource requirements, e.g. approximate attention, but they come at a cost to quality. That’s another trade-off between quality and cost: larger context lengths capture more complex relationships in human language, but require more resources.

Also, as always, the quality of training data is very important for all models. Embedding models are no exception. 

Semantic Search and Information Retrieval

Using embedding models for semantic search is a relatively new approach. For decades, people used other technologies: boolean models, latent semantic indexing (LSI), and various probabilistic models.

Some of these approaches work reasonably well for many existing use cases and are still widely used in the industry. 

One of the most popular traditional probabilistic models is BM25 (“BM” stands for “best matching”), a search relevance ranking function. It estimates the relevance of a document to a search query and ranks documents based on the query terms appearing in each indexed document. Only recently have embedding models started consistently outperforming it, but BM25 is still used a lot, since it is simpler than using embedding models, it has lower compute requirements, and its results are explainable.
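
For comparison, here is a minimal sketch of BM25 ranking using the open-source rank_bm25 package; the corpus and query are illustrative, and whitespace tokenization is a deliberate simplification:

```python
# Minimal BM25 ranking sketch.
# pip install rank-bm25
from rank_bm25 import BM25Okapi

corpus = [
    "how to choose a vector embedding model",
    "benchmarking information retrieval systems",
    "a recipe for sourdough bread",
]
tokenized_corpus = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "choosing an embedding model".split()
print(bm25.get_scores(query))              # a relevance score per document
print(bm25.get_top_n(query, corpus, n=1))  # the best-matching document
```

Note that BM25 matches exact terms, so “choosing” does not match “choose” here; the first document still ranks highest because “embedding” and “model” overlap. Real deployments typically add stemming or other normalization.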

Benchmarks

Not every type of model has a comprehensive evaluation approach that makes it easy to choose among existing models.

Fortunately, text embedding models have common benchmark suites such as:

The article “BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models” proposed a reference set of benchmarks and datasets for information retrieval tasks. The original BEIR benchmark consists of 19 datasets and methods for search quality evaluation; tasks include question answering, fact-checking, and entity retrieval. Now anyone who releases a text embedding model for information retrieval tasks can run the benchmark and see how their model ranks against the competition.

The Massive Text Embedding Benchmark (MTEB) incorporates BEIR along with other components, covering 58 datasets and 112 languages. A public leaderboard of MTEB results is maintained on Hugging Face.

These benchmarks have been run on a lot of existing models and their leaderboards are very useful to make an informed choice about model selection.

Using Embedding Models in a Production Environment

Benchmark scores on standard tasks are very important, but they represent only one dimension.

When we use an embedding model for search, we run it twice:

  • When doing offline indexing of available data
  • When embedding a user query for a search request 

There are two important consequences of this. 

The first is that we have to reindex all existing data when we change or upgrade an embedding model. All systems built using embedding models should be designed with upgradability in mind because newer and better models are released all the time and, most of the time, upgrading a model is the easiest way to improve overall system performance. An embedding model is a less stable component of the system infrastructure in this case.

The second consequence of using an embedding model for user queries is that inference latency becomes very important as the number of users goes up. Model inference takes more time for better-performing models, especially if they require a GPU to run: latency higher than 100ms for a small query is not unheard of for models with more than 1B parameters. It turns out that smaller, leaner models are still very important in higher-load production scenarios.

The trade-off between quality and latency is real, and we should always keep it in mind when choosing an embedding model.

As we mentioned above, embedding models help manage output vector dimensionality, which affects the performance of many algorithms downstream. Generally, the smaller the model, the shorter the output vector, but it is often still too long to be practical, even for smaller models. That’s when we may need to use dimensionality reduction algorithms such as PCA (principal component analysis), SNE / t-SNE (stochastic neighbor embedding), and UMAP (uniform manifold approximation and projection).
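
As an illustration, here is a minimal sketch of shrinking embedding vectors with scikit-learn’s PCA; the dimensions and random data are placeholders for real embeddings, and the reduction trades some quality for speed and storage:

```python
# Minimal sketch: reduce embedding dimensionality with PCA.
# pip install scikit-learn numpy
import numpy as np
from sklearn.decomposition import PCA

embeddings = np.random.rand(1000, 768)  # stand-in for 1,000 real 768-dim embeddings

pca = PCA(n_components=128)             # keep 128 dimensions
reduced = pca.fit_transform(embeddings)

print(reduced.shape)                         # (1000, 128)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```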

Another place we can use dimensionality reduction is before storing embeddings in a database. The resulting vectors will occupy less space and retrieval will be faster, but this comes at a price for quality downstream. Vector databases are often not the primary storage, so embeddings can be regenerated with better precision from the original source data. Dimensionality reduction shortens the output vectors and, as a result, makes the system faster and leaner.

Making the Right Choice

There’s an abundance of factors and trade-offs to consider when choosing an embedding model for a use case. The score of a potential model in common benchmarks is important, but we should not forget that it’s usually the larger models that score better. Larger models have higher inference times, which can severely limit their use in low-latency scenarios, since an embedding model is often a pre-processing step in a larger pipeline. Also, larger models require GPUs to run.

If you intend to use a model in a low-latency scenario, it’s better to focus on latency first and then see which models with acceptable latency have the best-in-class performance. Also, when building a system with an embedding model you should plan for changes since better models are released all the time and often it’s the simplest way to improve the performance of your system.


Reflecting on the Richness of Black Art

As Black History Month comes to a close, it’s essential to take a moment to reflect on this year’s theme, “Black Art – The Infusion of African, Caribbean, and Black American Lived Experiences.” 

Our employee business resource community, BEACON, embarked on a purposeful journey to celebrate and amplify the voices of Black artists. This theme served as the guiding light for BEACON’s initiatives throughout the month, creating a unique and dynamic space for exploration, education, and celebration.

Personally, the exploration of Black Art during this month has been a journey of discovery and connection. The infusion of African, Caribbean, and Black American lived experiences in art has provided a lens through which to appreciate the depth of cultural narratives, the resilience of communities, and the celebration of identity.

It’s a reminder that art is not merely an expression; it is a testament to the collective experiences that shape our perceptions of the world.

With BEACON, the celebration of Black History Month took on a unique and dynamic form. This year, we chose to amplify the voices of Black employees at DataRobot through a dedicated podcast focused on the theme of Black Art. The episode delved into the intricate stories behind various art forms, exploring how they serve as vessels for cultural preservation, activism, and personal expression. 

Celebrating Black History Month

In producing the podcast episode, we witnessed firsthand the power of storytelling in fostering understanding and unity. The narratives shared by our guests illuminated the often-overlooked aspects of Black art, showcasing its ability to challenge stereotypes, inspire change, and foster a sense of pride within the community.

As we bid farewell to Black History Month, the echoes of the conversations, stories, and artistic expressions linger. The theme of Black Art has left an indelible mark on our collective consciousness, catalyzing continued exploration, dialogue, and celebration of the rich tapestry that makes up the Black experience. The journey does not end here; it’s a continuous exploration of the interconnectedness of our stories and the profound impact of Black Art on shaping a more inclusive and understanding world.

6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them

If you’re an AI leader, you might feel like you’re stuck between a rock and a hard place lately. 

You have to deliver value from generative AI (GenAI) to keep the board happy and stay ahead of the competition. But you also have to stay on top of the growing chaos, as new tools and ecosystems arrive on the market. 

You also have to juggle new GenAI projects, use cases, and enthusiastic users across the organization. Oh, and data security. Your leadership doesn’t want to be the next cautionary tale of good AI gone bad. 

If you’re being asked to prove ROI for GenAI but it feels more like you’re playing Whack-a-Mole, you’re not alone. 

According to Deloitte, proving AI’s business value is the top challenge for AI leaders. Companies across the globe are struggling to move past prototyping to production. So, here’s how to get it done — and what you need to watch out for.  

6 Roadblocks (and Solutions) to Realizing Business Value from GenAI

Roadblock #1. You Set Yourself Up For Vendor Lock-In 

GenAI is moving crazy fast. New innovations — LLMs, vector databases, embedding models — are being created daily. So getting locked into a specific vendor right now doesn’t just risk your ROI a year from now. It could literally hold you back next week.  

Let’s say you’re all in on one LLM provider right now. What if costs rise and you want to switch to a new provider or use different LLMs depending on your specific use cases? If you’re locked in, getting out could eat any cost savings that you’ve generated with your AI initiatives — and then some. 

Solution: Choose a Versatile, Flexible Platform 

Prevention is the best cure. To maximize your freedom and adaptability, choose solutions that make it easy for you to move your entire AI lifecycle (pipeline, data, vector databases, embedding models, and more) from one provider to another.

For instance, DataRobot gives you full control over your AI strategy — now, and in the future. Our open AI platform lets you maintain total flexibility, so you can use any LLM, vector database, or embedding model – and swap out underlying components as your needs change or the market evolves, without breaking production. We even give our customers access to experiment with common LLMs, too.

Roadblock #2. Off-the-Grid Generative AI Creates Chaos 

If you thought predictive AI was challenging to control, try GenAI on for size. Your data science team likely acts as a gatekeeper for predictive AI, but anyone can dabble with GenAI — and they will. Where your company might have 15 to 50 predictive models, at scale, you could well have 200+ generative AI models all over the organization at any given time. 

Worse, you might not even know about some of them. “Off-the-grid” GenAI projects tend to escape leadership purview and expose your organization to significant risk. 

While this enthusiastic use of AI might sound like a recipe for greater business value, the opposite is often true. Without a unifying strategy, GenAI can create soaring costs without delivering meaningful results.

Solution: Manage All of Your AI Assets in a Unified Platform

Fight back against this AI sprawl by getting all your AI artifacts housed in a single, easy-to-manage platform, regardless of who made them or where they were built. Create a single source of truth and system of record for your AI assets — the way you do, for instance, for your customer data. 

Once you have your AI assets in the same place, then you’ll need to apply an LLMOps mentality: 

  • Create standardized governance and security policies that will apply to every GenAI model. 
  • Establish a process for monitoring key metrics about models and intervening when necessary.
  • Build feedback loops to harness user feedback and continuously improve your GenAI applications. 

DataRobot does this all for you. With our AI Registry, you can organize, deploy, and manage all of your AI assets in the same location – generative and predictive, regardless of where they were built. Think of it as a single source of record for your entire AI landscape – what Salesforce did for your customer interactions, but for AI. 

Roadblock #3. GenAI and Predictive AI Initiatives Aren’t Under the Same Roof

If you’re not integrating your generative and predictive AI models, you’re missing out. The power of these two technologies put together is a massive value driver, and businesses that successfully unite them will be able to realize and prove ROI more efficiently.

Here are just a few examples of what you could be doing if you combined your AI artifacts in a single unified system:  

  • Create a GenAI-based chatbot in Slack so that anyone in the organization can query predictive analytics models with natural language (Think, “Can you tell me how likely this customer is to churn?”). By combining the two types of AI technology, you surface your predictive analytics, bring them into the daily workflow, and make them far more valuable and accessible to the business.
  • Use predictive models to control the way users interact with generative AI applications and reduce risk exposure. For instance, a predictive model could stop your GenAI tool from responding if a user gives it a prompt that has a high probability of returning an error, or it could catch someone using the application in a way it wasn’t intended (a minimal sketch of this pattern follows this list).
  • Set up a predictive AI model to inform your GenAI responses, and create powerful predictive apps that anyone can use. For example, your non-tech employees could ask natural language queries about sales forecasts for next year’s housing prices, and have a predictive analytics model feeding in accurate data.   
  • Trigger GenAI actions from predictive model results. For instance, if your predictive model predicts a customer is likely to churn, you could set it up to trigger your GenAI tool to draft an email that will go to that customer, or a call script for your sales rep to follow during their next outreach to save the account. 
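
Here is a minimal sketch of the guardrail pattern from the second bullet above; the two helper functions are hypothetical stand-ins for your own predictive deployment and LLM call, not DataRobot APIs:

```python
# Hypothetical sketch: a predictive guardrail model screens each prompt before
# it reaches the generative model. Replace the stand-in helpers with real calls.
RISK_THRESHOLD = 0.8

def score_prompt_risk(prompt: str) -> float:
    """Stand-in for a predictive model that scores prompt risk from 0.0 to 1.0."""
    return 0.9 if "off-topic" in prompt else 0.1

def generate_answer(prompt: str) -> str:
    """Stand-in for a call to a generative model."""
    return f"LLM answer to: {prompt}"

def answer_with_guardrail(prompt: str) -> str:
    if score_prompt_risk(prompt) > RISK_THRESHOLD:
        return "Sorry, that request falls outside the intended use of this assistant."
    return generate_answer(prompt)

print(answer_with_guardrail("Summarize churn risk for customer 123"))
```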

However, for many companies, this level of business value from AI is impossible because they have predictive and generative AI models siloed in different platforms. 

Solution: Combine your GenAI and Predictive Models 

With a system like DataRobot, you can bring all your GenAI and predictive AI models into one central location, so you can create unique AI applications that combine both technologies. 

Not only that, but from inside the platform, you can set and track your business-critical metrics and monitor the ROI of each deployment to ensure its value, even for models running outside of the DataRobot AI Platform.

Roadblock #4. You Unknowingly Compromise on Governance

For many businesses, the primary purpose of GenAI is to save time — whether that’s reducing the hours spent on customer queries with a chatbot or creating automated summaries of team meetings. 

However, this emphasis on speed often leads to corner-cutting on governance and monitoring. That doesn’t just set you up for reputational risk or future costs (when your brand takes a major hit as the result of a data leak, for instance). It also means that you can’t measure the cost of, or optimize the value you’re getting from, your AI models right now.

Solution: Adopt a Solution to Protect Your Data and Uphold a Robust Governance Framework

To solve this issue, you’ll need to implement a proven AI governance tool ASAP to monitor and control your generative and predictive AI assets. 

A solid AI governance solution and framework should include:

  • Clear roles, so every team member involved in AI production knows who is responsible for what
  • Access control, to limit data access and permissions for changes to models in production at the individual or role level and protect your company’s data
  • Change and audit logs, to ensure legal and regulatory compliance and avoid fines 
  • Model documentation, so you can show that your models work and are fit for purpose
  • A model inventory to govern, manage, and monitor your AI assets, irrespective of deployment or origin

Current best practice: Find an AI governance solution that can prevent data and information leaks by extending LLMs with company data.

The DataRobot platform includes these safeguards built-in, and the vector database builder lets you create specific vector databases for different use cases to better control employee access and make sure the responses are super relevant for each use case, all without leaking confidential information.

Roadblock #5. It’s Tough To Maintain AI Models Over Time

Lack of maintenance is one of the biggest impediments to seeing business results from GenAI, according to the same Deloitte report mentioned earlier. Without excellent upkeep, there’s no way to be confident that your models are performing as intended or delivering accurate responses that’ll help users make sound data-backed business decisions.

In short, building cool generative applications is a great starting point — but if you don’t have a centralized workflow for tracking metrics or continuously improving based on usage data or vector database quality, you’ll do one of two things:

  1. Spend a ton of time managing that infrastructure.
  2. Let your GenAI models decay over time. 

Neither of those options is sustainable (or secure) long-term. Failing to guard against malicious activity or misuse of GenAI solutions will limit the future value of your AI investments almost instantaneously.

Solution: Make It Easy To Monitor Your AI Models

To be valuable, GenAI needs guardrails and steady monitoring. You need the AI tools available so that you can track: 

  • Employee and customer-generated prompts and queries over time to ensure your vector database is complete and up to date
  • Whether your current LLM is (still) the best solution for your AI applications 
  • Your GenAI costs to make sure you’re still seeing a positive ROI
  • When your models need retraining to stay relevant

DataRobot can give you that level of control. It brings all your generative and predictive AI applications and models into the same secure registry, and lets you:  

  • Set up custom performance metrics relevant to specific use cases
  • Understand standard metrics like service health, data drift, and accuracy statistics
  • Schedule monitoring jobs
  • Set custom rules, notifications, and retraining settings

If you make it easy for your team to maintain your AI, you won’t start neglecting maintenance over time.

Roadblock #6. The Costs are Too High – or Too Hard to Track 

Generative AI can come with some serious sticker shock. Naturally, business leaders feel reluctant to roll it out at a sufficient scale to see meaningful results or to spend heavily without recouping much in terms of business value. 

Keeping GenAI costs under control is a huge challenge, especially if you don’t have real oversight over who is using your AI applications and why they’re using them. 

Solution: Track Your GenAI Costs and Optimize for ROI

You need technology that lets you monitor costs and usage for each AI deployment. With DataRobot, you can track everything from the cost of an error to toxicity scores for your LLMs to your overall LLM costs. You can choose between LLMs depending on your application and optimize for cost-effectiveness. 

That way, you’re never left wondering if you’re wasting money with GenAI — you can prove exactly what you’re using AI for and the business value you’re getting from each application. 

Deliver Measurable AI Value with DataRobot 

Proving business value from GenAI is not an impossible task with the right technology in place. A recent economic analysis by the Enterprise Strategy Group found that DataRobot can provide cost savings of 75% to 80% compared to using existing resources, giving you a 3.5x to 4.6x expected return on investment and accelerating time to initial value from AI by up to 83%. 

DataRobot can help you maximize the ROI from your GenAI assets and: 

  • Mitigate the risk of GenAI data leaks and security breaches 
  • Keep costs under control
  • Bring every single AI project across the organization into the same place
  • Empower you to stay flexible and avoid vendor lock-in 
  • Make it easy to manage and maintain your AI models, regardless of origin or deployment 

If you’re ready for GenAI that’s all value, not all talk, start your free trial today. 


Beyond Differences, Embracing the Journey: A New Year’s Resolution for a Better Tomorrow

As we bid farewell to the old year and welcome the promise of a new beginning, there’s no better time to reflect on our collective journey toward a more inclusive and equitable future. The dawn of a new year presents us with the opportunity to set intentions that go beyond personal aspirations—this year, let’s make it a collective commitment to foster Diversity, Equity, Inclusion, and Belonging (DEIB) in every aspect of our lives.

In this blog, we’ll explore the power of our DEIB communities as one transformative force, delve into the importance of creating spaces that embrace everyone’s unique narratives, and share practical resolutions to help us all weave the principles of diversity, equity, inclusion, and belonging into the fabric of our daily lives. As we step into the unknown of this new year, let’s embark on a journey together—one that champions unity, celebrates differences, and shapes a brighter, more inclusive tomorrow for all.

In 2023, the Belong Community Leaders came together to create a space for meaningful discussions on relevant topics called Beyond Differences. This series of internal events is hosted by the Belong communities and open to ALL. The goal is to be Better Together by driving conversation, initiating action, accelerating progress, and facilitating impact.

We had two roundtables discussing cultural differences and diverse networking, and one guest speaker presenting on the topic of cultural competency. As we look ahead to 2024, we plan to continue strengthening our communities, fostering real connection, and facilitating empathy and understanding by looking beyond our differences.

Like last year, we asked our communities what their goals and resolutions are for the coming year.  

ACTnow! advocates for the diverse needs of all Asian, Asian American, and Pacific Islander teammates through educational, cultural, and social activities. Leader Alex Shoop (he/him).

This year, we celebrate the year of the dragon which symbolizes fortune, resilience, and strength. On behalf of the ACTnow! community, we wish you a happy 2024! We will continue cultivating a space to celebrate Asian traditions, share cultural learnings, and spotlight our diverse employees.

ADAPT provides education and allyship to advocate for and empower our teammates with disabilities to ensure an inclusive work environment. Leader Chad Harness (he/him).

To enhance the inclusivity and engagement within our ADAPT community, we will initiate weekly discussions centered around “AI and Machine Learning for All.” Our goal is to stimulate active participation, with the aim of increasing the weekly engagement and maintaining active involvement throughout the year. We will collaborate with DataRobot’s AI experts to curate relevant content and insights tailored to inclusivity in AI, fostering a more vibrant and interactive support group.

In order to strengthen advocacy and support within our ADAPT community, we will establish a dedicated space where members can confidentially share concerns or insights related to accessibility and inclusivity. Our goal is to provide timely response or acknowledgment to these insights and concerns.

To further enhance mental health support within our ADAPT community, we will expand the existing resources, offering tech-specific insights and coping strategies.

BEACON aims to advance a diverse, inclusive, and equitable community that fosters a culture of belonging for Black teammates both current and future. Leader Lelia Colley (she/her).

In 2023, BEACON ended the year with a focus on community and connection by kicking off our Pages of Inclusion Book Club. These sessions allowed for intentional thought collaboration and helped us become familiar with Black authors and refreshing concepts.

When it comes to our focus for 2024, we aim to create a more inclusive, supportive, and empowering environment for Black employees and allies at DataRobot by establishing initiatives to support their mental health and well-being, including mental health resources and open discussions, and by:

  • Fostering a supportive network among Black employees and allies through regular networking events, cross-functional collaborations, and knowledge-sharing forums.
  • Implementing programs to recognize and celebrate the achievements and contributions of Black employees and allies within the organization. Establishing safe spaces for open dialogues where BEACON members can share experiences, concerns, and ideas to foster a more supportive community.
  • Promoting allyship and inclusivity by providing resources and training to DataRobot employees, encouraging active support of marginalized communities.
  • Actively participating in recruitment efforts by collaborating with the People team to identify strategies for attracting and retaining Black talent.

LATTITUD is dedicated to connecting the Latin/Hispanic community in a supportive and uplifting environment while creating space to share ideas, struggles, resources, and celebrate our diverse cultures and accomplishments. Leader Lisa Aguilar (she/her).

PrideBots provides an open, safe, inclusive community where members can connect on common interests or backgrounds and celebrate all sexes, gender identities, gender expressions, and orientations. Leader Em Radkowski (she/her).

PrideBots resolve to lean further into affirming the identities of LGBTQIA+ and questioning team members. We commit to acknowledging and raising awareness about the diverse lived experiences of LGBTQIA+ individuals, recognizing the impact of factors such as race, religion, ethnicity, age, ability status, social class, and other social characteristics.

As part of this commitment, our goal is to advise the Executive Leadership Team (ELT) on opportunities and challenges related to LGBTQIA+ team members. We will advocate for inclusive policies and practices within the organization. Additionally, our community aims to review our health benefits and advise our organization to ensure that all individuals, regardless of their identity, can benefit from comprehensive healthcare services. This goal aligns with our broader commitment to creating a workplace that values diversity and prioritizes the well-being of all employees.

DataRobot Veterans brings together those who have served in all branches of the military for ongoing resources, support, and networking. Leader Robert Newsom (he/him).

2024 will be the year when the DataRobot Veterans community launches, starting with an effort to increase channel membership and a poll to identify which components our veterans belong to. We want to give presentations on topics of interest to the community, such as the PACT Act and any company policies regarding mandatory service and recall to active duty.

Women @ DR seeks to create, promote, and expand an inclusive culture that connects, educates, and advances the needs, professional goals, and aspirations of our community of female-identifying members and allies. Leader Teresa Gearin (she/her).

Women@DR is committed to laying a solid foundation for a future mentorship program in 2024, while simultaneously enhancing the overall experience of women within the organization. Our focus is on connecting the community, building allyship, and shining a light on gaps in equity and inclusion. This groundwork will contribute to a more supportive and inclusive workplace for women at every stage of their careers. 

Promoting Diversity, Equity, Inclusion, and Belonging (DEIB)

Promoting Diversity, Equity, Inclusion, and Belonging (DEIB) is a collective effort to foster a culture that values and respects differences, for all individuals, at all levels. Here are a few ways you can join this effort in small everyday actions. 

Educate Yourself: Take the initiative to educate yourself on issues related to diversity, equity, and inclusion. Read books, articles, and attend workshops to broaden your understanding.

Listen Actively: Listen to the experiences and perspectives of people from different backgrounds without judgment. Actively seek to understand the challenges faced by others and be empathetic.

Challenge Stereotypes: Speak up when you encounter stereotypes or biased statements. Challenge misconceptions and promote a more accurate understanding of diverse groups.

Use Inclusive Language: Be mindful of the language you use and strive to use inclusive terminology. Avoid making assumptions about people based on stereotypes or preconceived notions.

Amplify Others: Amplify the voices of those who may be marginalized or underrepresented. Give credit where it is due and acknowledge the contributions of others.

Advocate for Inclusive Policies: Support and advocate for workplace policies that promote diversity, equity, and inclusion. Encourage your organization to adopt practices that create a more inclusive environment.

Call Out Microaggressions: Address microaggressions when you witness them, even if they are subtle. Help create an environment where people feel safe and respected.

Engage in Allyship: Be an ally to individuals from marginalized groups by actively supporting and standing up for them. Use your privilege to advocate for equal opportunities.

Embrace Lifelong Learning: Recognize that promoting DEIB is an ongoing process of learning and unlearning. Stay open to new ideas and be willing to adapt your perspectives based on new information.

Belong @ DataRobot is committed to continuing our journey in Diversity, Equity, Inclusion, and Belonging in 2024 by presenting opportunities for the people of DataRobot to actively participate in community, events, education, conversations, and self-reflection. We wish everyone a gentle and prosperous 2024.


Choosing the Right Database for Your Generative AI Use Case

Ways of Providing Data to a Model

Many organizations are now exploring the power of generative AI to improve their efficiency and gain new capabilities. In most cases, to fully unlock these powers, AI must have access to the relevant enterprise data. Large Language Models (LLMs) are trained on publicly available data (e.g. Wikipedia articles, books, web index, etc.), which is enough for many general-purpose applications, but there are plenty of others that are highly dependent on private data, especially in enterprise environments.

There are three main ways to provide new data to a model:

  1. Pre-training a model from scratch. This rarely makes sense for most companies because it is very expensive and requires a lot of resources and technical expertise.
  2. Fine-tuning an existing general-purpose LLM. This can reduce the resource requirements compared to pre-training, but still requires significant resources and expertise. Fine-tuning produces specialized models that perform better in the domain they are fine-tuned for but may perform worse in others.
  3. Retrieval augmented generation (RAG). The idea is to fetch data relevant to a query and include it in the LLM context so that the model can “ground” its outputs in that information. Relevant data used this way is referred to as “grounding data”. RAG complements generic LLMs, but the amount of information that can be provided is limited by the LLM context window size (the amount of text the LLM can process at once when generating a response).

Currently, RAG is the most accessible way to provide new information to an LLM, so let’s focus on this method and dive a little deeper.

Retrieval Augmented Generation 

In general, RAG means using a search or retrieval engine to fetch a relevant set of documents for a specified query. 

For this purpose, we can use many existing systems: a full-text search engine (like Elasticsearch + traditional information retrieval techniques), a general-purpose database with a vector search extension (Postgres with pgvector, Elasticsearch with vector search plugin), or a specialized database that was created specifically for vector search.


In the latter two cases, RAG is similar to semantic search. For a long time, semantic search was a highly specialized and complex domain with exotic query languages and niche databases. Indexing data required extensive preparation and building knowledge graphs, but recent progress in deep learning has dramatically changed the landscape. Modern semantic search applications now depend on embedding models that successfully learn semantic patterns in the presented data. These models take unstructured data (text, audio, or even video) as input and transform it into vectors of numbers of a fixed length, thus turning unstructured data into a numeric form that can be used for calculations. It then becomes possible to calculate the distance between vectors using a chosen distance metric, and the resulting distance will reflect the semantic similarity between the vectors and, in turn, between the original pieces of data.

These vectors are indexed by a vector database and, at query time, the query is also transformed into a vector. The database searches for the N closest vectors to the query vector (according to a chosen distance metric like cosine similarity) and returns them.
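
Here is a minimal sketch of that query flow: embed the documents, index the vectors, then embed the query and retrieve its nearest neighbors. FAISS and the all-MiniLM model are illustrative choices here, not recommendations; any embedding model and vector index could play these roles:

```python
# Minimal sketch of semantic retrieval: index document vectors, search by query vector.
# pip install sentence-transformers faiss-cpu
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
documents = [
    "Our return policy allows refunds within 30 days.",
    "Standard shipping takes 3 to 5 business days.",
    "Contact support via in-app chat for billing issues.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vectors.shape[1])  # inner product == cosine on normalized vectors
index.add(doc_vectors)

query_vector = model.encode(["How long does delivery take?"], normalize_embeddings=True)
scores, ids = index.search(query_vector, 2)      # the two closest documents
print([documents[i] for i in ids[0]])
```

In a RAG application, the retrieved documents would then be inserted into the LLM prompt as grounding data.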

A vector database is responsible for these 3 things:

  1. Indexing. The database builds an index of vectors using some built-in algorithm (e.g. locality-sensitive hashing (LSH) or hierarchical navigable small world (HNSW)) to precompute data to speed up querying.
  2. Querying. The database uses a query vector and an index to find the most relevant vectors in a database.
  3. Post-processing. After the result set is formed, sometimes we might want to run an additional step like metadata filtering or re-ranking within the result set to improve the outcome.

The purpose of a vector database is to provide a fast, reliable, and efficient way to store and query data. Retrieval speed and search quality can be influenced by the selection of index type. In addition to the already mentioned LSH and HNSW, there are others, each with its own set of strengths and weaknesses. Most databases make the choice for us, but in some, you can choose an index type manually to control the trade-off between speed and accuracy.
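
As a small illustration of that trade-off, here is a minimal sketch comparing an exact index with an approximate HNSW index in FAISS; the random data and parameter values are for illustration only:

```python
# Minimal sketch: exact vs. approximate (HNSW) indexing in FAISS.
# pip install faiss-cpu numpy
import faiss
import numpy as np

dim = 384
vectors = np.random.rand(10_000, dim).astype("float32")
query = np.random.rand(1, dim).astype("float32")

exact = faiss.IndexFlatL2(dim)         # exact search: perfect recall, slower at scale
exact.add(vectors)

approx = faiss.IndexHNSWFlat(dim, 32)  # HNSW graph: approximate, much faster at scale
approx.hnsw.efSearch = 64              # higher values improve recall at extra cost
approx.add(vectors)

print(exact.search(query, 5)[1])       # ids of the 5 true nearest neighbors
print(approx.search(query, 5)[1])      # usually, but not always, the same ids
```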


At DataRobot, we believe RAG is here to stay. Fine-tuning can require very sophisticated data preparation to turn raw text into training-ready data, and it’s more of an art than a science to coax LLMs into “learning” new facts through fine-tuning while maintaining their general knowledge and instruction-following behavior.

LLMs are typically very good at applying knowledge supplied in-context, especially when only the most relevant material is provided, so a good retrieval system is crucial.

Note that the choice of the embedding model used for RAG is essential. It is not a part of the database and choosing the correct embedding model for your application is critical for achieving good performance. Additionally, while new and improved models are constantly being released, changing to a new model requires reindexing your entire database.

Evaluating Your Options 

Choosing a database in an enterprise environment is not an easy task. A database is often the heart of your software infrastructure that manages a very important business asset: data.

Generally, when we choose a database we want:

  • Reliable storage
  • Efficient querying 
  • Ability to insert, update, and delete data granularly (CRUD)
  • Ability to set up multiple users with various levels of access (RBAC)
  • Data consistency (predictable behavior when modifying data)
  • Ability to recover from failures
  • Scalability to the size of our data

This list is not exhaustive and might be a bit obvious, but not all new vector databases have these features. Often, it is the availability of enterprise features that determines the final choice between a well-known, mature database that provides vector search via extensions and a newer vector-only database.

Vector-only databases have native support for vector search and can execute queries very fast, but often lack enterprise features and are relatively immature. Keep in mind that it takes years to build complex features and battle-test them, so it’s no surprise that early adopters face outages and data losses. On the other hand, in existing databases that provide vector search through extensions, a vector is not a first-class citizen and query performance can be much worse. 

We will categorize all current databases that provide vector search into the following groups and then discuss them in more detail:

  • Vector search libraries
  • Vector-only databases
  • NoSQL databases with vector search 
  • SQL databases with vector search 
  • Vector search solutions from cloud vendors

Vector search libraries

Vector search libraries like FAISS and ANNOY are not databases – rather, they provide in-memory vector indices and only limited data persistence options. While these limitations make them unsuitable for users requiring a full enterprise database, they offer very fast nearest neighbor search and are open source. They offer good support for high-dimensional data and are highly configurable (you can choose the index type and other parameters).

Overall, they are good for prototyping and integration in simple applications, but they are inappropriate for long-term, multi-user data storage. 
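
As a small illustration of those persistence limits, here is a minimal sketch with FAISS: the index lives in memory and can be serialized to a single file, but there are no multi-user or transactional guarantees (the data and file name are illustrative):

```python
# Minimal sketch: a vector search library index saved and reloaded from disk.
# pip install faiss-cpu numpy
import faiss
import numpy as np

vectors = np.random.rand(100, 384).astype("float32")
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

faiss.write_index(index, "vectors.faiss")    # persist the in-memory index to a file
restored = faiss.read_index("vectors.faiss")
print(restored.ntotal)                       # 100 vectors restored
```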

Vector-only databases 

This group includes diverse products like Milvus, Chroma, Pinecone, Weaviate, and others. There are notable differences among them, but all of them are specifically designed to store and retrieve vectors. They are optimized for efficient similarity search with indexing and support high-dimensional data and vector operations natively. 

Most of them are newer and might not have the enterprise features we mentioned above; for example, some lack CRUD operations, proven failure recovery, RBAC, and so on. For the most part, they can store the raw data, the embedding vector, and a small amount of metadata, but they can’t store other index types or relational data, which means you will have to use another, secondary database and maintain consistency between them.

Their performance is often unmatched, and they are a good option when working with multimodal data (images, audio, or video).

NoSQL databases with vector search 

Many so-called NoSQL databases recently added vector search to their products, including MongoDB, Redis, Neo4j, and Elasticsearch. They offer good enterprise features, are mature, and have strong communities, but they provide vector search functionality via extensions, which might lead to less-than-ideal performance and a lack of first-class support for vector search. Elasticsearch stands out here as it is designed for full-text search and already has many traditional information retrieval features that can be used in conjunction with vector search.

NoSQL databases with vector search are a good choice when you are already invested in them and need vector search as an additional, but not very demanding feature.

SQL databases with vector search 

This group is somewhat similar to the previous group, but here we have established players like PostgreSQL and ClickHouse. They offer a wide array of enterprise features, are well-documented, and have strong communities. As for their disadvantages, they are designed for structured data, and scaling them requires specific expertise. 

Their use case is also similar: good choice when you already have them and the expertise to run them in place.
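
As one example of the extension approach, here is a minimal sketch of vector similarity search in PostgreSQL with the pgvector extension, driven from Python; it assumes a running PostgreSQL server with pgvector installed, and the connection details and tiny 3-dimensional vectors are purely illustrative (real embeddings would have hundreds of dimensions):

```python
# Minimal sketch: pgvector similarity search from Python.
# pip install psycopg2-binary   (server side: install the pgvector extension)
import psycopg2

conn = psycopg2.connect("dbname=app user=app password=secret host=localhost")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS docs ("
    "  id bigserial PRIMARY KEY, body text, embedding vector(3));"
)
cur.execute(
    "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector);",
    ("refund policy", "[0.11, 0.02, 0.87]"),
)
conn.commit()

# <=> is pgvector's cosine distance operator; smaller distance means more similar.
cur.execute(
    "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5;",
    ("[0.10, 0.05, 0.90]",),
)
print(cur.fetchall())
```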

Vector search solutions from cloud vendors

Hyperscalers also offer vector search services. They usually have basic features for vector search (you can choose an embedding model, index type, and other parameters), good interoperability within the rest of the cloud platform, and more flexibility when it comes to cost, especially if you use other services on their platform. However, they have different levels of maturity and different feature sets: Google Cloud’s vector search uses a fast proprietary index search algorithm called ScaNN and supports metadata filtering, but is not very user-friendly; Azure Vector Search offers structured search capabilities but is in a preview phase; and so on.

Vector search entities can be managed using the enterprise features of their platform, like IAM (Identity and Access Management), but they are not that simple to use, and they are geared toward general cloud usage.

Making the Right Choice 

The main use case of vector databases in this context is to provide relevant information to a model. For your next LLM project, you can choose a database from an existing array of databases that offer vector search capabilities via extensions or from new vector-only databases that offer native vector support and fast querying. 

The choice depends on whether you need enterprise features or high-scale performance, as well as your deployment architecture and desired maturity (research, prototyping, or production). You should also consider which databases are already present in your infrastructure and whether you have multimodal data. In any case, whatever choice you make, it is good to hedge it: treat a new database as an auxiliary storage cache rather than a central point of operations, and abstract your database operations in code to make it easy to adjust to the next iteration of the vector RAG landscape.

How DataRobot Can Help

There are already so many vector database options to choose from. They each have their pros and cons – no one vector database will be right for all of your organization’s generative AI use cases. That is why it’s important to retain optionality and leverage a solution that allows you to customize your generative AI solutions to specific use cases, and adapt as your needs change or the market evolves. 

The DataRobot AI Platform lets you bring your own vector database – whichever is right for the solution you’re building. If you require changes in the future, you can swap out your vector database without breaking your production environment and workflows. 


Open Source AI Models – What the U.S. National AI Advisory Committee Wants You to Know

The unprecedented rise of artificial intelligence (AI) has brought transformative possibilities across the board, from industries and economies to societies at large. However, this technological leap also introduces a set of potential challenges. In its recent public meeting, the National AI Advisory Committee (NAIAC)1, which provides recommendations on U.S. AI competitiveness, the science around AI, and the AI workforce to the President and the National AI Initiative Office, voted on a recommendation on ‘Generative AI Away from the Frontier.’2

This recommendation aims to outline the risks of off-frontier AI models – typically referring to open source models – and to propose approaches for assessing and managing them. In summary, the recommendation from the NAIAC provides a roadmap for responsibly navigating the complexities of generative AI. This blog post aims to shed light on this recommendation and delineate how DataRobot customers can proactively leverage the platform to align their AI adoption with this recommendation.

Frontier vs Off-Frontier Models

In the recommendation, the distinction between frontier and off-frontier models of generative AI is based on their accessibility and level of advancement. Frontier models represent the latest and most advanced developments in AI technology. These are complex, high-capability systems typically developed and accessed by leading tech companies, research institutions, or specialized AI labs (such as current state-of-the-art models like GPT-4 and Google Gemini). Due to their complexity and cutting-edge nature, frontier models typically have constrained access – they are not widely available or accessible to the general public.

On the other hand, off-frontier models typically have unconstrained access – they are more widely available and accessible AI systems, often available as open source. They might not achieve the most advanced AI capabilities but are significant due to their broader usage. These models include both proprietary systems and open source AI systems and are used by a wider range of stakeholders, including smaller companies, individual developers, and educational institutions.

This distinction is important for understanding the different levels of risks, governance needs, and regulatory approaches required for various AI systems. While frontier models may need specialized oversight due to their advanced nature, off-frontier models pose a different set of challenges and risks because of their widespread use and accessibility.

What the NAIAC Recommendation Covers

The recommendation on ‘Generative AI Away from the Frontier,’ issued by NAIAC in October 2023, focuses on the governance and risk assessment of generative AI systems. The document provides two key recommendations for the assessment of risks associated with generative AI systems:

For Proprietary Off-Frontier Models: It advises the Biden-Harris administration to encourage companies to extend voluntary commitments3 to include risk-based assessments of off-frontier generative AI systems. This includes independent testing, risk identification, and information sharing about potential risks. This recommendation is particularly aimed at emphasizing the importance of understanding and sharing the information on risks associated with off-frontier models.

For Open Source Off-Frontier Models: For generative AI systems with unconstrained access, such as open source systems, the National Institute of Standards and Technology (NIST) is charged with collaborating with a diverse range of stakeholders to define appropriate frameworks to mitigate AI risks. This group includes academia, civil society, advocacy organizations, and the industry (where legal and technical feasibility allows). The goal is to develop testing and analysis environments, measurement systems, and tools for testing these AI systems. This collaboration aims to establish appropriate methodologies for identifying critical potential risks associated with these more openly accessible systems.

NAIAC underlines the need to understand the risks posed by widely available, off-frontier generative AI systems, which include both proprietary and open-source systems. These risks range from the acquisition of harmful information to privacy breaches and the generation of harmful content. The recommendation acknowledges the unique challenges in assessing risks in open-source AI systems due to the lack of a fixed target for assessment and limitations on who can test and evaluate the system.

Moreover, it highlights that investigations into these risks require a multi-disciplinary approach, incorporating insights from social sciences, behavioral sciences, and ethics, to support decisions about regulation or governance. While recognizing the challenges, the document also notes the benefits of open-source systems in democratizing access, spurring innovation, and enhancing creative expression.

For proprietary AI systems, the recommendation points out that while companies may understand the risks, this information is often not shared with external stakeholders, including policymakers. This calls for more transparency in the field.

Regulation of Generative AI Models

Recently, discussion of the catastrophic risks of AI has dominated conversations on AI risk, especially with regard to generative AI. This has led to calls to regulate AI in an attempt to promote responsible development and deployment of AI tools. It is worth exploring the regulatory options with regard to generative AI. There are two main areas where policymakers can regulate AI: regulation at the model level and regulation at the use case level.

In predictive AI, generally, the two levels significantly overlap, as narrow AI is built for a specific use case and cannot be generalized to many other use cases. For example, a model that was developed to identify patients with a high likelihood of readmission can only be used for this particular use case and will require input information similar to what it was trained on. However, a single large language model (LLM), a form of generative AI, can be used in multiple ways to summarize patient charts, generate potential treatment plans, and improve communication between physicians and patients.

As highlighted in the examples above, unlike predictive AI, the same LLM can be used in a variety of use cases. This distinction is particularly important when considering AI regulation. 

Penalizing AI models at the development level, especially for generative AI models, could hinder innovation and limit the beneficial capabilities of the technology. Nonetheless, it is paramount that the builders of generative AI models, both frontier and off-frontier, adhere to responsible AI development guidelines. 

Instead, the focus should be on the harms of such technology at the use case level, and especially on governing its use more effectively. DataRobot can simplify governance by providing capabilities that enable users to evaluate their AI use cases for risks associated with bias and discrimination, toxicity and harm, performance, and cost. These features and tools can help organizations ensure that AI systems are used responsibly and aligned with their existing risk management processes without stifling innovation.

Governance and Risks of Open vs Closed Source Models

Another area mentioned in the recommendation, and later included in the executive order recently signed by President Biden4, is the lack of transparency in the model development process. In closed-source systems, the developing organization may investigate and evaluate the risks associated with the developed generative AI models. However, information on potential risks, findings from red teaming, and internal evaluations has generally not been shared publicly.

On the other hand, open-source models are inherently more transparent due to their openly available design, facilitating the easier identification and correction of potential concerns pre-deployment. However, extensive research on the potential risks and evaluation of these models has not yet been conducted.

The distinct and differing characteristics of these systems imply that the governance approaches for open-source models should differ from those applied to closed-source models. 

Avoid Reinventing Trust Across Organizations

Given the challenges of adopting AI, there’s a clear need for standardizing the governance process in AI to prevent every organization from having to reinvent these measures. Various organizations, including DataRobot, have come up with their own frameworks for Trustworthy AI5. The government can help lead the collaborative effort between the private sector, academia, and civil society to develop standardized approaches to address the concerns and provide robust evaluation processes to ensure development and deployment of trustworthy AI systems. The recent executive order on the safe, secure, and trustworthy development and use of AI directs NIST to lead this collaborative effort to develop guidelines and evaluation measures to understand and test generative AI models. The White House AI Bill of Rights and the NIST AI Risk Management Framework (RMF) can serve as foundational principles and frameworks for responsible development and deployment of AI. Capabilities of the DataRobot AI Platform, aligned with the NIST AI RMF, can assist organizations in adopting standardized trust and governance practices. Organizations can leverage these DataRobot tools for more efficient and standardized compliance and risk management for generative and predictive AI.


1 National AI Advisory Committee – AI.gov 

2 RECOMMENDATIONS: Generative AI Away from the Frontier

4 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence | The White House

5 https://www.datarobot.com/trusted-ai-101/

The post Open Source AI Models – What the U.S. National AI Advisory Committee Wants You to Know appeared first on DataRobot AI Platform.

]]>
2023: A Year of Innovation and Impact https://www.datarobot.com/blog/2023-a-year-of-innovation-and-impact/ Thu, 21 Dec 2023 16:37:18 +0000 https://www.datarobot.com/?post_type=blog&p=52683 Explore the 2023 highlights: discover the DataRobot advances in Generative AI and the expansion of our global partner ecosystem.

The post 2023: A Year of Innovation and Impact appeared first on DataRobot AI Platform.

]]>
2023 brings a remarkable year to a close.

Generative AI made AI a household name and ushered in one of the most exciting and fast-paced periods of technological advancement in recent history.

And we’ve been busy at DataRobot bringing to market the newest cutting-edge innovations with generative AI to accelerate impact for our customers around the world. 

This year, we “capped a comeback” and delivered three value-packed launches to significantly advance the DataRobot AI Platform. We unveiled new enterprise-grade generative AI functionality to close the confidence gap and accelerate adoption, including generative AI application cost and performance monitoring, a unified observability console and registry for governance, a multi-provider comparison playground, and other enhancements to deliver greater transparency and governance.

We also strengthened our partner ecosystem to provide maximum optionality and flexibility for our customers, announcing new capabilities and solutions with SAP, Microsoft, Google Cloud, and AWS. 

Finally, we celebrated industry recognition for our market-leading platform, including most recently being named a Leader in the inaugural IDC MarketScape: Worldwide AI Governance Platforms 2023 Vendor Assessment. Of the 10 AI/ML platform vendors evaluated, DataRobot was one of just four leaders, with the report citing strengths including extensive experience in regulated industries and continuous innovation.

A Leader like DataRobot sets the bar for AI at a time when both expertise in governance and rapid innovation is paramount – particularly for organizations with the most stringent regulatory requirements like banking and healthcare. DataRobot understands the unique challenges and requirements businesses face today, with an AI platform designed to avoid silos and vendor lock-in, and the tools that help customers accelerate AI deployments and ROI.
Ritu Jyoti

Group Vice President, Worldwide Artificial Intelligence and Automation Research Practice, IDC

Why DataRobot?

DataRobot is the fastest path to AI value, empowering organizations to accelerate AI from idea to impact. 

With over a decade at the forefront of AI innovation, we know what it takes to make a real difference – to your bottom line, to your business vision, and to the world around us. 

Organizations across industries and geographies like CVS Health, Freddie Mac, Aflac, Warner Bros., BMW, and the U.S. Army trust DataRobot to help solve their biggest challenges with AI, leveraging generative and predictive capabilities today while providing the flexibility to adapt to the innovations of tomorrow. 

We’re incredibly proud of the work we do to help organizations across industries to deliver real-world value from AI solutions. 

We use DataRobot because it just works. We can rapidly build new AI use cases and have the optionality we need, especially with generative AI, that otherwise would not be possible.
Frederique De Letter

Senior Director Business Insights & Analytics, Keller Williams

 

What’s next for DataRobot in 2024?

Next year, we are laser-focused on continuing to push the boundaries of innovation, listening to our customers and building what is needed most to deliver value for your organizations. 

Thank you to our customers and partners for the privilege of working together and delivering more and more impact with AI, every day.

Thank you to Robots around the world for always pushing to be better than yesterday. Here’s to 2024!


The post 2023: A Year of Innovation and Impact appeared first on DataRobot AI Platform.

]]>
How to Focus on GenAI Outcomes, Not Infrastructure https://www.datarobot.com/blog/how-to-focus-on-genai-outcomes-not-infrastructure/ Tue, 12 Dec 2023 19:30:15 +0000 https://www.datarobot.com/?post_type=blog&p=52562 Incorporating generative AI into your existing systems isn’t just an infrastructure problem. It’s a business strategy problem. Find out how to solve it.

The post How to Focus on GenAI Outcomes, Not Infrastructure appeared first on DataRobot AI Platform.

]]>
Are you seeing tangible results from your investment in generative AI — or is it starting to feel like an expensive experiment? 

For many AI leaders and engineers, it’s hard to prove business value, despite all their hard work. In a recent Omdia survey of more than 5,000 global enterprise IT practitioners, only 13% have fully adopted GenAI technologies.

To quote Deloitte’s recent study, “The perennial question is: Why is this so hard?” 

The answer is complex — but vendor lock-in, messy data infrastructure, and abandoned past investments are the top culprits. Deloitte found that at least one in three AI programs fail due to data challenges.

If your GenAI models are sitting unused (or underused), chances are they haven’t been successfully integrated into your tech stack. This makes GenAI, for most brands, feel more like an exacerbation of the same challenges they saw with predictive AI than a solution.

Any given GenAI project contains a hefty mix of different versions, languages, models, and vector databases. And we all know that cobbling together 17 different AI tools and hoping for the best creates a hot mess infrastructure. It’s complex, slow, hard to use, and risky to govern.

Without a unified intelligence layer sitting on top of your core infrastructure, you’ll create bigger problems than the ones you’re trying to solve, even if you’re using a hyperscaler.

That’s why I wrote this article, and why Brent Hinks and I discussed this in depth during a recent webinar.

Here, I break down six tactics that will help you shift the focus from half-hearted prototyping to real-world value from GenAI.

6 Tactics That Replace Infrastructure Woes With GenAI Value  

Incorporating generative AI into your existing systems isn’t just an infrastructure problem; it’s a business strategy problem—one that separates unrealized or broken prototypes from sustainable GenAI outcomes.

But if you’ve taken the time to invest in a unified intelligence layer, you can avoid unnecessary challenges and work with confidence. Most companies will bump into at least a handful of the obstacles detailed below. Here are my recommendations on how to turn these common pitfalls into growth accelerators: 

1. Stay Flexible by Avoiding Vendor Lock-In 

Many companies that want to improve GenAI integration across their tech ecosystem end up in one of two buckets:

  1. They get locked into a relationship with a hyperscaler or single vendor
  2. They haphazardly cobble together various component pieces like vector databases, embedding models, orchestration tools, and more.

Given how fast generative AI is changing, you don’t want to end up locked into either of these situations. You need to retain your optionality so you can quickly adapt as the tech needs of your business evolve or as the tech market changes. My recommendation? Use a flexible API system. 

DataRobot can help you integrate with all of the major players, yes, but what’s even better is how we’ve built our platform to be agnostic about your existing tech and fit in where you need us to. Our flexible API provides the functionality and flexibility you need to actually unify your GenAI efforts across the existing tech ecosystem you’ve built.

2. Build Integration-Agnostic Models 

In the same vein as avoiding vendor lock-in, don’t build AI models that only integrate with a single application. For instance, let’s say you build an application for Slack, but now you want it to work with Gmail. You might have to rebuild the entire thing. 

Instead, aim to build models that can integrate with multiple different platforms, so you can be flexible for future use cases. This won’t just save you upfront development time. Platform-agnostic models will also lower your required maintenance time, thanks to fewer custom integrations that need to be managed. 

With the right intelligence layer in place, you can bring the power of GenAI models to a diverse blend of apps and their users. This lets you maximize the investments you’ve made across your entire ecosystem. In addition, you’ll be able to deploy and manage hundreds of GenAI models from one location.

For example, DataRobot could integrate GenAI models that work smoothly across enterprise apps like Slack, Tableau, Salesforce, and Microsoft Teams. 

3. Bring Generative And Predictive AI into One Unified Experience

Many companies struggle with generative AI chaos because their generative and predictive models are scattered and siloed. For seamless integration, you need your AI models in a single repository, no matter who built them or where they’re hosted. 

DataRobot is perfect for this; so much of our product’s value lies in our ability to unify AI intelligence across an organization, especially in partnership with hyperscalers. If you’ve built most of your AI frameworks with a hyperscaler, we’re just the layer you need on top to add rigor and specificity to your initiatives’ governance, monitoring, and observability.

And this isn’t just for generative or predictive models: models built by anyone, on any platform, can be brought in for governance and operation right in DataRobot.


4. Build for Ease of Monitoring and Retraining 

Given the pace of innovation with generative AI over the past year, many of the models I built six months ago are already out of date. But to keep my models relevant, I prioritize retraining, and not just for predictive AI models. GenAI can go stale, too, if the source documents or grounding data are out of date. 

Imagine you have dozens of GenAI models in production. They could be deployed to all kinds of places such as Slack, customer-facing applications, or internal platforms. Sooner or later your model will need a refresh. If you only have 1-2 models, it may not be a huge concern now, but if you already have an inventory, it’ll take you a lot of manual time to scale the deployment updates.

Updates that don’t happen through scalable orchestration stall outcomes because of infrastructure complexity. This is especially critical when you start thinking a year or more down the road, since GenAI updates usually require more maintenance than predictive AI.

DataRobot offers model version control with built-in testing to make sure a deployment will work with new platform versions that launch in the future. If an integration fails, you get an alert to notify you about the failure immediately. It also flags if a new dataset has additional features that aren’t the same as the ones in your currently deployed model. This empowers engineers and builders to be far more proactive about fixing things, rather than finding out a month (or further) down the line that an integration is broken. 

In addition to model control, I use DataRobot to monitor metrics like data drift and groundedness to keep infrastructure costs in check. The simple truth is that if budgets are exceeded, projects get shut down. This can quickly snowball into a situation where whole teams are affected because they can’t control costs. DataRobot allows me to track metrics that are relevant to each use case, so I can stay informed on the business KPIs that matter.

5. Stay Aligned With Business Leadership And Your End Users 

The biggest mistake that I see AI practitioners make is not talking to people around the business enough. You need to bring in stakeholders early and talk to them often. This is not about having one conversation to ask business leadership if they’d be interested in a specific GenAI use case. You need to continuously affirm they still need the use case — and that whatever you’re working on still meets their evolving needs. 

There are three components here: 

  1. Engage Your AI Users 

It’s crucial to secure buy-in from your end-users, not just leadership. Before you start to build a new model, talk to your prospective end-users and gauge their interest level. They’re the consumer, and they need to buy into what you’re creating, or it won’t get used. Hint: Make sure whatever GenAI models you build can easily connect to the processes, solutions, and data infrastructures users are already in.

Since your end-users are the ones who’ll ultimately decide whether to act on the output from your model, you need to ensure they trust what you’ve built. Before or as part of the rollout, talk to them about what you’ve built, how it works, and most importantly, how it will help them accomplish their goals.

  2. Involve Your Business Stakeholders In The Development Process 

Even after you’ve confirmed initial interest from leadership and end-users, it’s never a good idea to just head off and then come back months later with a finished product. Your stakeholders will almost certainly have a lot of questions and suggested changes. Be collaborative and build time for feedback into your projects. This helps you build an application that solves their need and helps them trust that it works how they want.

  3. Articulate Precisely What You’re Trying To Achieve 

It’s not enough to have a goal like, “We want to integrate X platform with Y platform.” I’ve seen too many customers get hung up on short-term goals like these instead of taking a step back to think about overall goals. DataRobot provides enough flexibility that we may be able to develop a simplified overall architecture rather than fixating on a single point of integration. You need to be specific: “We want this GenAI model that was built in DataRobot to pair with predictive AI and data from Salesforce. And the results need to be pushed into this object in this way.” 

That way, you can all agree on the end goal, and easily define and measure the success of the project. 


6. Move Beyond Experimentation To Generate Value Early 

Teams can spend weeks building and deploying GenAI models, but if the process is not organized, all of the usual governance and infrastructure challenges will hamper time-to-value.

There’s no value in the experiment itself—the model needs to generate results (internally or externally). Otherwise, it’s just a “fun project” that isn’t producing ROI for the business until it’s deployed.

DataRobot can help you operationalize models 83% faster, while saving 80% of the normal costs required. Our Playgrounds feature gives your team the creative space to compare LLM blueprints and determine the best fit. 

Instead of making end-users wait for a final solution, or letting the competition get a head start, start with a minimum viable product (MVP). 

Get a basic model into the hands of your end users and explain that this is a work in progress. Invite them to test, tinker, and experiment, then ask them for feedback.

An MVP offers two vital benefits: 

  1. You can confirm that you’re moving in the right direction with what you’re building.
  2. Your end users get value from your generative AI efforts quickly. 

While you may not provide a perfect user experience with your work-in-progress integration, you’ll find that your end-users will accept a bit of friction in the short term to experience the long-term value.

Unlock Seamless Generative AI Integration with DataRobot 

If you’re struggling to integrate GenAI into your existing tech ecosystem, DataRobot is the solution you need. Instead of a jumble of siloed tools and AI assets, our AI platform could give you a unified AI landscape and save you some serious technical debt and hassle in the future. With DataRobot, you can integrate your AI tools with your existing tech investments, and choose from best-of-breed components. We’re here to help you: 

  • Avoid vendor lock-in and prevent AI asset sprawl 
  • Build integration-agnostic GenAI models that will stand the test of time
  • Keep your AI models and integrations up to date with alerts and version control
  • Combine your generative and predictive AI models built by anyone, on any platform, to see real business value

Ready to get more out of your AI with less friction? Get started today with a free 30-day trial or set up a demo with one of our AI experts.


The post How to Focus on GenAI Outcomes, Not Infrastructure appeared first on DataRobot AI Platform.

]]>
Deep Dive into JITR: The PDF Ingesting and Querying Generative AI Tool https://www.datarobot.com/blog/deep-dive-into-jitr-the-pdf-ingesting-and-querying-generative-ai-tool/ Thu, 07 Dec 2023 14:00:00 +0000 https://www.datarobot.com/?post_type=blog&p=52473 Learn how to utilize LLMs to answer user questions based on ingested PDFs at runtime. Accelerate generative AI innovation and real-world value using DataRobot’s GenAI Accelerators.

The post Deep Dive into JITR: The PDF Ingesting and Querying Generative AI Tool appeared first on DataRobot AI Platform.

]]>
Motivation

Accessing, understanding, and retrieving information from documents are central to countless processes across various industries. Whether working in finance, healthcare, at a mom and pop carpet store, or as a student at a university, there are situations where you see a big document that you need to read through to answer questions. Enter JITR, a game-changing tool that ingests PDF files and leverages LLMs (Large Language Models) to answer user queries about the content. Let’s explore the magic behind JITR.

What Is JITR?

JITR, which stands for Just In Time Retrieval, is one of the newest tools in DataRobot’s GenAI Accelerator suite designed to process PDF documents, extract their content, and deliver accurate answers to user questions and queries. Imagine having a personal assistant that can read and understand any PDF document and then provide answers to your questions about it instantly. That’s JITR for you.

How Does JITR Work?

Ingesting PDFs: The initial stage involves ingesting a PDF into the JITR system. Here, the tool converts the static content of the PDF into a digital format ingestible by the embedding model. The embedding model converts each sentence in the PDF file into a vector. This process creates a vector database of the input PDF file.

Applying your LLM: Once the content is ingested, the tool calls the LLM. LLMs are state-of-the-art AI models trained on vast amounts of text data. They excel at understanding context, discerning meaning, and generating human-like text. JITR employs these models to understand and index the content of the PDF.

Interactive Querying: Users can then pose questions about the PDF’s content. The LLM fetches the relevant information and presents the answers in a concise and coherent manner.
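
To make these steps concrete, here is a minimal sketch of the same flow using the LangChain components the accelerator builds on (PyPDFLoader, SentenceTransformer embeddings, FAISS, and a retrieval chain). The file path, Azure deployment name, and question are illustrative placeholders; the production-ready hooks appear later in this post.

```python
# Minimal sketch of the JITR flow -- placeholder paths and model names, not the production hooks.
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores.faiss import FAISS

# 1. Ingest: split the PDF into chunks and embed each chunk into a vector store
docs = PyPDFLoader("example.pdf").load_and_split()  # placeholder file
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = FAISS.from_documents(docs, embeddings)

# 2. Apply your LLM: wire the vector store and a chat model into a retrieval chain
llm = AzureChatOpenAI(deployment_name="my-gpt-deployment", temperature=0)  # assumes Azure OpenAI credentials in the environment
chain = ConversationalRetrievalChain.from_llm(llm, retriever=db.as_retriever())

# 3. Interactive querying: ask a question grounded in the document
result = chain({"question": "What is this document about?", "chat_history": []})
print(result["answer"])
```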

Benefits of Using JITR

Every organization produces a variety of documents that are generated in one department and consumed by another. Often, retrieval of information for employees and teams can be time-consuming. Utilizing JITR improves employee efficiency by reducing the review time of lengthy PDFs and providing instant and accurate answers to their questions. In addition, JITR can handle any type of PDF content, which enables organizations to embed and utilize it in different workflows without concern for the input document.

Many organizations may not have resources and expertise in software development to develop tools that utilize LLMs in their workflow. JITR enables teams and departments that are not fluent in Python to convert a PDF file into a vector database as context for an LLM. By simply having an endpoint to send PDF files to, JITR can be integrated into any web application such as Slack (or other messaging tools), or external portals for customers. No knowledge of LLMs, Natural Language Processing (NLP), or vector databases is required.

Real-World Applications

Given its versatility, JITR can be integrated into almost any workflow. Below are some of the applications.

Business Reports: Professionals can swiftly get insights from lengthy reports, contracts, and whitepapers. Similarly, this tool can be integrated into internal processes, enabling employees and teams to interact with internal documents.

Customer Service: From understanding technical manuals to diving deep into tutorials, JITR can enable customers to interact with manuals and documents related to the products and tools. This can increase customer satisfaction and reduce the number of support tickets and escalations. 

Research and Development: R&D teams can quickly extract relevant and digestible information from complex research papers to implement state-of-the-art technology in the product or internal processes.

Alignment with Guidelines: Many organizations have guidelines that should be followed by employees and teams. JITR enables employees to retrieve relevant information from the guidelines efficiently. 

Legal: JITR can ingest legal documents and contracts and answer questions based on the information provided in the input documents.

How to Build the JITR Bot with DataRobot

The workflow for building a JITR Bot is similar to the workflow for deploying any LLM pipeline using DataRobot. The two main differences are:

  1. Your vector database is defined at runtime
  2. You need logic to handle an encoded PDF

For the latter we can define a simple function that takes an encoding and writes it back to a temporary PDF file within our deployment.

```python
def base_64_to_file(b64_string, filename: str = 'temp.PDF', directory_path: str = "./storage/data") -> str:
    """Decode a base64 string into a PDF file and return the file path."""
    import codecs
    import os

    if not os.path.exists(directory_path):
        os.makedirs(directory_path)
    file_path = os.path.join(directory_path, filename)
    with open(file_path, "wb") as f:
        f.write(codecs.decode(b64_string, "base64"))
    return file_path
```

With this helper function defined we can go through and make our hooks. Hooks are just a fancy phrase for functions with a specific name. In our case, we just need to define a hook called `load_model` and another hook called `score_unstructured`.  In `load_model`, we’ll set the embedding model we want to use to find the most relevant chunks of text as well as the LLM we’ll ping with our context aware prompt.

```python
def load_model(input_dir):
    """Custom model hook for loading our knowledge base."""
    import os

    import datarobot_drum as drum
    from langchain.chat_models import AzureChatOpenAI
    from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

    try:
        # Pull credentials from the deployment's runtime parameters
        key = drum.RuntimeParameters.get("OPENAI_API_KEY")["apiToken"]
    except ValueError:
        # Pull credentials from the environment (when running locally)
        key = os.environ.get('OPENAI_API_KEY', '')

    # Embedding model used to find the most relevant chunks of text
    embedding_function = SentenceTransformerEmbeddings(
        model_name="all-MiniLM-L6-v2",
        cache_folder=os.path.join(input_dir, 'storage/deploy/sentencetransformers'),
    )

    # The OPENAI_* constants are assumed to be defined elsewhere in the deployment code
    llm = AzureChatOpenAI(
        deployment_name=OPENAI_DEPLOYMENT_NAME,
        openai_api_type=OPENAI_API_TYPE,
        openai_api_base=OPENAI_API_BASE,
        openai_api_version=OPENAI_API_VERSION,
        openai_api_key=key,  # use the credential retrieved above
        openai_organization=OPENAI_ORGANIZATION,
        model_name=OPENAI_DEPLOYMENT_NAME,
        temperature=0,
        verbose=True,
    )

    return llm, embedding_function
```

Ok, so we have our embedding function and our LLM. We also have a way to take an encoding and get back to a PDF. So now we get to the meat of the JITR Bot, where we’ll build our vector store at run time and use it to query the LLM.

```python
def score_unstructured(model, data, query, **kwargs) -> str:
    """Custom model hook for making completions with our knowledge base.

    When requesting predictions from the deployment, pass a dictionary
    with the following keys:
    - 'question' the question to be passed to the retrieval chain
    - 'document' a base64 encoded document to be loaded into the vector database

    datarobot-user-models (DRUM) handles loading the model and calling
    this function with the appropriate parameters.

    Returns:
    --------
    rv : str
        Json dictionary with a 'result' key containing:
            - 'question' user's original question
            - 'answer' the generated answer to the question
    """
    import json
    import os

    from langchain.chains import ConversationalRetrievalChain
    from langchain.document_loaders import PyPDFLoader
    from langchain.vectorstores.base import VectorStoreRetriever
    from langchain.vectorstores.faiss import FAISS

    llm, embedding_function = model
    DIRECTORY = "./storage/data"
    temp_file_name = "temp.PDF"
    data_dict = json.loads(data)

    # Write encoding to a temporary PDF file
    base_64_to_file(data_dict['document'].encode(), filename=temp_file_name, directory_path=DIRECTORY)

    # Load up the file and split it into chunks
    loader = PyPDFLoader(os.path.join(DIRECTORY, temp_file_name))
    docs = loader.load_and_split()

    # Remove file when done
    os.remove(os.path.join(DIRECTORY, temp_file_name))

    # Create our vector database at runtime
    texts = [doc.page_content for doc in docs]
    metadatas = [doc.metadata for doc in docs]
    db = FAISS.from_texts(texts, embedding_function, metadatas=metadatas)

    # Define our chain
    retriever = VectorStoreRetriever(vectorstore=db)
    chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever)

    # Run it
    response = chain(inputs={'question': data_dict['question'], 'chat_history': []})
    return json.dumps({"result": response})
```

With our hooks defined, all that’s left to do is deploy our pipeline so that we have an endpoint people can interact with. To some, the process of creating a secure, monitored and queryable endpoint out of arbitrary Python code may sound intimidating or at least time consuming to set up. Using the drx package, we can deploy our JITR Bot in one function call.

```python
from datetime import datetime

import datarobotx as drx

# `now` is assumed here to be a simple timestamp used to name the deployment
now = datetime.now().strftime("%Y-%m-%d %H:%M")

deployment = drx.deploy(
    "./storage/deploy/",  # Path with embedding model
    name=f"JITR Bot {now}",
    hooks={
        "score_unstructured": score_unstructured,
        "load_model": load_model,
    },
    extra_requirements=["pyPDF"],  # Add a package for parsing PDF files
    environment_id="64c964448dd3f0c07f47d040",  # GenAI Drop-in Python environment
)
```

How to Use JITR

Ok, the hard work is over. Now we get to enjoy interacting with our newfound deployment. Through Python, we can again take advantage of the drx package to answer our most pressing questions.

```python
import base64
import io

import requests

# Find a PDF
url = "https://s3.amazonaws.com/datarobot_public_datasets/drx/Instantnoodles.PDF"
resp = requests.get(url).content
encoding = base64.b64encode(io.BytesIO(resp).read())  # encode it

# Interact
response = deployment.predict_unstructured(
    {
        "question": "What does this say about noodle rehydration?",
        "document": encoding.decode(),
    }
)['result']

# Example response:
# {'question': 'What does this say about noodle rehydration?',
#  'chat_history': [],
#  'answer': 'The article mentions that during the frying process, many tiny holes are created
#             due to mass transfer, and they serve as channels for water penetration upon
#             rehydration in hot water. The porous structure created during frying facilitates
#             rehydration.'}
```

But more importantly, we can hit our deployment in any language we want since it’s just an endpoint. Below, I show a screenshot of me interacting with the deployment right through Postman. This means we can integrate our JITR Bot into essentially any application we want by just having the application make an API call.
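
For example, here is a rough sketch of calling the deployment with a plain HTTP client from Python. The endpoint URL, deployment ID, API token, and headers below are placeholders rather than the exact values for your instance; check the deployment’s Predictions tab in DataRobot for the real endpoint and any additional required headers.

```python
# Illustrative only: the URL, deployment ID, token, and headers are placeholders.
import base64
import json

import requests

with open("contract.pdf", "rb") as f:  # any local PDF
    encoded_doc = base64.b64encode(f.read()).decode()

payload = json.dumps({
    "question": "What is the termination clause?",
    "document": encoded_doc,
})

resp = requests.post(
    "https://example.datarobot.com/predApi/v1.0/deployments/<DEPLOYMENT_ID>/predictionsUnstructured",
    headers={
        "Authorization": "Bearer <API_TOKEN>",  # your instance may also require a DataRobot-Key header
        "Content-Type": "application/json",
    },
    data=payload,
)

# The score_unstructured hook above wraps its output under a 'result' key
print(resp.json()["result"]["answer"])
```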

(Screenshot: interacting with the JITR Bot deployment through Postman)

Once embedded in an application, using JITR is very easy. For example, in the Slackbot application used at DataRobot internally, users simply upload a PDF with a question to start a conversation related to the document. 

JITR makes it easy for anyone in an organization to start driving real-world value from generative AI, across countless touchpoints in employees’ day-to-day workflows. Check out this video to learn more about JITR. 

Things You Can Do to Make the JITR Bot More Powerful

In the code I showed, we ran through a straightforward implementation of the JITR Bot, which takes an encoded PDF and makes a vector store at runtime in order to answer questions. Since they weren’t relevant to the core concept, I opted to leave out a number of bells and whistles we implemented internally with the JITR Bot, such as:

  • Returning context aware prompt and completion tokens
  • Answering questions based on multiple documents
  • Answering multiple questions at once
  • Letting users provide conversation history
  • Using other chains for different types of questions
  • Reporting custom metrics back to the deployment

There’s also no reason why the JITR Bot has to work only with PDF files! So long as a document can be encoded and converted back into a string of text, we could build more logic into our `score_unstructured` hook to handle any file type a user provides.
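
As a rough illustration of that idea, the hook could branch on the incoming file type and pick a matching document loader. The `file_type` field and the loader mapping below are hypothetical and not part of the accelerator; they simply show where that extra logic would live.

```python
# Hypothetical extension: choose a LangChain loader based on a 'file_type' field in the request.
from langchain.document_loaders import Docx2txtLoader, PyPDFLoader, TextLoader

def load_any_document(file_path: str, file_type: str):
    """Return split documents for a handful of common file types."""
    loaders = {
        "pdf": PyPDFLoader,
        "docx": Docx2txtLoader,
        "txt": TextLoader,
    }
    if file_type not in loaders:
        raise ValueError(f"Unsupported file type: {file_type}")
    return loaders[file_type](file_path).load_and_split()
```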

Start Leveraging JITR in Your Workflow

JITR makes it easy to interact with arbitrary PDFs. If you’d like to give it a try, you can follow along with the notebook here.


The post Deep Dive into JITR: The PDF Ingesting and Querying Generative AI Tool appeared first on DataRobot AI Platform.

]]>
BELONG @ DataRobot: Two Years of Being Better Together https://www.datarobot.com/blog/belong-datarobot-two-years-of-being-better-together/ Wed, 29 Nov 2023 15:26:54 +0000 https://www.datarobot.com/?post_type=blog&p=52366 As we celebrate our two-year anniversary, it's a wonderful time to reflect on the strides we’ve made along the way, to help people at DataRobot feel like they Belong.

The post BELONG @ DataRobot: Two Years of Being Better Together appeared first on DataRobot AI Platform.

]]>
It’s been two years since we officially launched Belong @ DataRobot. With the goal of making everyone feel valued and heard, Belong is committed to ensuring that all employees are confident and respected for their unique abilities, secure in the knowledge that they belong at DataRobot. And, as we celebrate our two-year anniversary, it’s a wonderful time to reflect on the strides we’ve made along the way to help people feel like they Belong.

I’d be lying if I said the last year was easy. Like most companies today, we’ve gone through so many changes as an organization and as people. We’ve said goodbye to many colleagues, and welcomed many new ones. We’ve seen the market for AI explode and have been leveraging evolving technologies and evolving our product every day to continue to provide value for our customers. And through it all, we have been there for each other – in our monthly meetings, round tables, one-on-ones. Personally, I worked through finalizing a divorce and found our Women @ DR group to be a real, honest space to share my experience and be heard without judgment or mark on the role that I carry. I know our other employee resource groups have done the same for many others, continuing to hold a safe space to share experiences and bring connection, through all the changes, ups and downs. We established these groups in our first year and they continue to evolve and educate our team members every day:

  • ACTnow! stands for Asians Coming Together Now and cultivates an inclusive space to advocate for the diverse needs of all Asian, Asian American, and Pacific Islander employees at DataRobot through educational, cultural, and social activities.
  • ADAPT stands for Abled and Disabled Advocating and Partnering Together and provides education and allyship to advocate for and empower DataRobot employees with disabilities to ensure an inclusive work environment.
  • BEACON stands for Black Employees and Allies Changing Outlooks Now and aims to advance a diverse, inclusive, and equitable community that fosters a culture of belonging for Black employees, both current and future. 
  • LATTITUD stands for Latinx, Hispanics and Allies in Tech Together Influencing Technology Inclusion and Uniting for Diversity and is dedicated to connecting the Latin/Hispanic community in a supportive and uplifting environment while creating space to share ideas, struggles, resources, and celebrate our diverse cultures and accomplishments. 
  • PrideBots’ vision is to provide an open, safe, inclusive community where DataRobot employees can connect on common interests or backgrounds and bring our collective voices together to drive innovation, create opportunities, inspire each other, and celebrate all sexes, gender identities, gender expressions, and orientations. We welcome all members of the LGBTQIA+ community as well as allies. 
  • The Veterans group brings together those who have served in all branches of the military for ongoing resources, support, and networking. 
  • Women @ DR seeks to create, promote, and expand an inclusive culture that connects, educates, and advances the needs, professional goals, and aspirations of our community of female-identifying members and allies.

As the year continued, we realized that we wanted to bring people together across our resource groups, so we started our ‘Beyond Differences’ Round Tables. In December, we are also kicking off a quarterly ‘Beyond Differences’ Speaker Series, starting with the subject of cultural competence.  

Further, each month, in our Belong Newsletter and Functional All Hands meetings, we share tips and tricks on what it means to be inclusive – we’ve spotlighted subjects such as ‘how to hold inclusive meetings’. And we have book clubs, organized by employees who are interested in learning more about each other’s challenges and discussing how to make a bigger impact on this world.

As we look ahead to the future and at the past to help us shape a better future, I’m always reminded of this quote from an adaptation of an article written by Dafina-Lazarus Stewart.  It speaks to the continuous work and the consistent questions that must be asked and answered to bring awareness to Diversity, Equity, Inclusion, and Belonging:

– Diversity Asks: Who is in the room?

– Equity Asks: Who is trying to get in the room but can’t? And what are the barriers? 

– Inclusion Asks: Have everyone’s ideas been heard, respected, and understood? 

– Belonging Asks: Is everyone feeling valued through positive connections with others and able to bring their authentic selves to work?

While we always have room to grow and be better than yesterday, Belong @ DataRobot celebrates the strides we have made over the past two years. As we continue to receive messages that our Belong community has been a safe space that allows people to be themselves during their low moments, we know that our work is making an impact. Belong is a journey of discovery, authenticity, connection, and, most importantly, people. 


The post BELONG @ DataRobot: Two Years of Being Better Together appeared first on DataRobot AI Platform.

]]>