AI & ML Expertise Archives | DataRobot AI Platform
https://www.datarobot.com/blog/category/ai-ml-expertise/

Value-Driven AI: Applying Lessons Learned from Predictive AI to Generative AI
https://www.datarobot.com/blog/value-driven-ai-applying-lessons-learned-from-predictive-ai-to-generative-ai/
Thu, 05 Oct 2023 15:27:56 +0000
Learn about the shift from predictive to generative AI. Gain insights, overcome challenges, and harness the power of AI transformation.

If we look back five years, most enterprises were just getting started with machine learning and predictive AI, trying to figure out which projects they should choose. This is a question that is still incredibly important, but the AI landscape has now evolved dramatically, as have the questions enterprises are working to answer. 

Most organizations find that their first use cases are harder than anticipated. And the questions just keep piling up. Should they go after the moonshot projects or focus on steady streams of incremental value, or some mix of both? How do you scale? What do you do next? 

Generative models – ChatGPT being the most impactful – have completely changed the AI scene and forced organizations to ask entirely new questions. The big one is: which hard-earned lessons about getting value from predictive AI do we apply to generative AI?

Top Dos and Don’ts of Getting Value with Predictive AI

Companies that generate value from predictive AI tend to be aggressive about delivering those first use cases. 

Some Dos they follow are: 

  • Choosing the right projects and qualifying those projects holistically. It’s easy to fall into the trap of spending too much time on the technical feasibility of projects, but the successful teams are ones that also think about getting appropriate sponsorship and buy-in from multiple levels of their organization.
  • Involving the right mix of stakeholders early. The most successful teams have business users who are invested in the outcome and even asking for more AI projects. 
  • Fanning the flames. Celebrate your successes to inspire, overcome inertia, and create urgency. This is where executive sponsorship comes in very handy. It helps you to lay the groundwork for more ambitious projects. 

Some of the Don’ts we notice with our clients are: 

  • Starting with your hardest and highest-value problem. This introduces a lot of risk, so we advise against it. 
  • Deferring modeling until the data is perfect. This mindset can result in perpetually deferring value unnecessarily. 
  • Focusing on perfecting your organizational design, your operating model, and strategy, which can make it very hard to scale your AI projects. 

What New Technical Challenges May Arise with Generative AI?

  • Increased computational requirements. Generative AI models require high-performance compute and specialized hardware to train and run. Companies will either need to own this hardware or use the cloud. 
  • Model evaluation. By nature, generative AI models create new content. Predictive models use very clear metrics, like accuracy or AUC. Generative AI requires more subjective and complex evaluation metrics that are harder to implement. 

Systematically evaluating these models, rather than having a human review every output, means determining which metrics are fair to apply across all of these models, and that is a harder task than evaluating predictive models. Getting started with generative AI models can be easy, but getting them to generate meaningfully good outputs is harder. (A minimal evaluation sketch follows this list.) 

  • Ethical AI. Companies need to make sure generative AI outputs are mature, responsible, and not harmful to society or their organizations. 
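
To make the idea of systematic evaluation concrete, here is a minimal sketch of an evaluation harness. The word-overlap F1 metric and the stand-in generate_fn are illustrative assumptions only; a real team would substitute whatever model it is testing and whatever metrics it has agreed are fair.

```python
from collections import Counter

def overlap_f1(generated: str, reference: str) -> float:
    """Word-overlap F1 between a generated answer and a reference answer."""
    gen, ref = generated.lower().split(), reference.lower().split()
    if not gen or not ref:
        return 0.0
    common = sum((Counter(gen) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(gen), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(generate_fn, eval_set):
    """Run the model over every (prompt, reference) pair and average the metric."""
    scores = [overlap_f1(generate_fn(prompt), reference) for prompt, reference in eval_set]
    return sum(scores) / len(scores)

# Stand-in "model": in practice generate_fn would call your LLM of choice.
eval_set = [("Summarize our refund policy", "refunds are issued within 30 days of purchase")]
print(evaluate(lambda prompt: "we issue refunds within 30 days of purchase", eval_set))
```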

What are Some of the Primary Differentiators and Challenges with Generative AI? 

  • Getting started with the right problems. Organizations that go after the wrong problem will struggle to get to value quickly. Focusing on productivity instead of cost benefits, for example, is a much more successful endeavor. Moving too slowly is also an issue. 
  • The last mile of generative AI use cases is different from predictive AI. With predictive AI, we spend a lot of time on the consumption mechanism, such as dashboards and stakeholder feedback loops. Because the outputs of generative AI come in the form of human language, and human language is inherently interactive, getting to these value propositions should be faster. 
  • The data will be different. The nature of data-related challenges will be different. Generative AI models are better at working with messy and multimodal data, so we may spend a little less time preparing and transforming our data. 

What Will Be the Biggest Change for Data Scientists with Generative AI? 

  • Change in skillset. We need to understand how these generative AI models work. How do they generate output? What are their shortcomings? What are the prompting strategies we might use? It’s a new paradigm that we all need to learn more about. 
  • Increased computational requirements. If you want to host these models yourself, you will need to work with more complex hardware, which may be another skill requirement for the team. 
  • Model output evaluation. We’ll want to experiment with different types of models using different strategies and learn which combinations work best. This means trying different prompting or data chunking strategies and model embeddings. We will want to run different kinds of experiments and evaluate them efficiently and systematically. Which combination gets us to the best result? 
  • Monitoring. Because these models can raise ethical and legal concerns, they will need closer monitoring. There must be systems in place to monitor them more rigorously. 
  • New user experience. Maybe we will want to have humans in the loop and think of what new user experiences we want to incorporate into the modeling workflow. Who will be the main personas involved in building generative AI solutions? How does this contrast with predictive AI? 

When it comes to the differences organizations will face, the people won't change too much with generative AI. We still need people who understand the nuances of models and can research new technologies. Machine learning engineers, data engineers, domain experts, and AI ethics experts will all still be necessary to the success of generative AI. To learn more about what you can expect from generative AI, which use cases to start with, and what our other predictions are, watch our webinar, Value-Driven AI: Applying Lessons Learned from Predictive AI to Generative AI.

Webinar
Value-Driven AI: Applying Lessons Learned from Predictive AI to Generative AI
Watch on-demand

Harnessing Synthetic Data for Model Training
https://www.datarobot.com/blog/harnessing-synthetic-data-for-model-training/
Thu, 28 Sep 2023 13:15:30 +0000
Synthetic data can replace or augment existing data and be used for training ML models, mitigating bias, and protecting sensitive or regulated data. Learn more.

It is no secret that high-performing ML models need large volumes of quality training data. Without that data, there is hardly a way for an organization to leverage AI, reflect on its own operations, become more efficient, and make better-informed decisions. Becoming a data-driven (and especially an AI-driven) company is known to be difficult. 

28% of companies that adopt AI cite lack of access to data as a reason behind failed deployments. – KDNuggets

Additionally, there are issues with errors and biases within existing data. These are somewhat easier to mitigate with various processing techniques, but they still affect the availability of trustworthy training data. That is a serious problem, yet the lack of training data is a much harder one, and solving it may require many initiatives depending on an organization's maturity level.

Besides data availability and bias, there is another aspect that is very important to mention: data privacy. Both companies and individuals are consistently choosing to prevent the data they own from being used for model training by third parties. The lack of transparency and legislation around this topic is well known and has already become a catalyst for lawmaking across the globe.

However, in the broad landscape of data-oriented technologies, there is one that aims to solve the above-mentioned problems from a somewhat unexpected angle: synthetic data. Synthetic data is produced by simulating various models and scenarios, or by applying sampling techniques to existing data sources, to create new data that is not drawn directly from the real world.

Synthetic data can replace or augment existing data and be used for training ML models, mitigating bias, and protecting sensitive or regulated data. It is cheap and can be produced on demand in large quantities according to specified statistics.

Synthetic datasets keep the statistical properties of the original source data: the techniques that generate them learn a joint distribution, which can also be customized if necessary. As a result, synthetic datasets are similar to their real sources but don't contain any sensitive information. This is especially useful in highly regulated industries such as banking and healthcare, where it can take months for an employee to get access to sensitive data because of strict internal procedures. Using synthetic data in this environment for testing, training AI models, detecting fraud, and other purposes simplifies the workflow and reduces the time required for development.

All this also applies to training large language models, since they are trained mostly on public data (e.g., OpenAI's ChatGPT was trained on Wikipedia, parts of the web index, and other public datasets). We think synthetic data will be a real differentiator going forward, since there are limits, both physical and legal, on the public data available for training models, and human-created data is expensive, especially when it requires experts. 

Producing Synthetic Data

There are various methods of producing synthetic data. They can be subdivided into roughly 3 major categories, each with its advantages and disadvantages:

  • Stochastic process modeling. Stochastic models are relatively simple to build and don't require a lot of computing resources, and because the modeling focuses on statistical distributions rather than real records, the generated row-level data contains no sensitive information. The simplest example is generating a column of numbers from statistical parameters such as minimum, maximum, and average values, assuming the output follows a known distribution (e.g., uniform or Gaussian); a minimal sketch follows this list.
  • Rule-based data generation. Rule-based systems improve statistical modeling by including data that is generated according to rules defined by humans. Rules can be of various complexity, but high-quality data requires complex rules and tuning by human experts which limits the scalability of the method.
  • Deep learning generative models. By applying deep learning generative models, it is possible to train a model with real data and use that model to generate synthetic data. Deep learning models are able to capture more complex relationships and joint distributions of datasets, but at a higher complexity and compute costs. 
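
As a concrete illustration of the first approach, here is a minimal sketch of stochastic generation for a single numeric column, assuming the only statistics you want to preserve are a mean, a standard deviation, and a min/max range. The column name and parameter values are hypothetical.

```python
import numpy as np

def synthesize_column(mean, std, minimum, maximum, n_rows, seed=0):
    """Draw a synthetic numeric column from a Gaussian with the given summary
    statistics, clipped to the observed range. No real rows are ever used."""
    rng = np.random.default_rng(seed)
    return np.clip(rng.normal(loc=mean, scale=std, size=n_rows), minimum, maximum)

# Hypothetical "annual_income" column that matches summary statistics only.
income = synthesize_column(mean=52_000, std=18_000, minimum=0, maximum=250_000, n_rows=10_000)
print(round(income.mean()), income.min(), income.max())
```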

Also, it is worth mentioning that current LLMs can themselves be used to generate synthetic data. This does not require extensive setup and can be very useful at a smaller scale (or when done on a per-request basis), since it can produce both structured and unstructured data, but at larger scales it may be more expensive than specialized methods. Let's not forget that even state-of-the-art models are prone to hallucinations, so the statistical properties of synthetic data that comes from an LLM should be checked before it is used in scenarios where distribution matters.

Model validation is an interesting illustration of how the use of synthetic data requires a change in the approach to ML model training.

Figure: Model validation with synthetic data

In traditional data modeling, we have a dataset (D) that is a set of observations drawn from some unknown real-world process (P) that we want to model. We divide that dataset into a training subset (T), a validation subset (V) and a holdout (H) and use it to train a model and estimate its accuracy. 

To do synthetic data modeling, we synthesize a distribution P' from our initial dataset and sample it to get the synthetic dataset (D'). We subdivide the synthetic dataset into a training subset (T'), a validation subset (V'), and a holdout (H'), just as we subdivided the real dataset. We want distribution P' to be as close to P as is practical, since we want the accuracy of a model trained on synthetic data to be as close as possible to the accuracy of a model trained on real data (while, of course, all synthetic data guarantees still hold). 

When possible, synthetic data modeling should also use the validation (V) and holdout (H) data from the original source data (D) for model evaluation to ensure that the model trained on synthetic data (T’) performs well on real-world data.
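
A minimal sketch of that workflow, using scikit-learn and a noisy copy of the training data as a stand-in for a real synthetic generator: the model is fit on synthetic rows but judged on the real holdout.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Real data D, split into a training subset T and a real holdout H.
X, y = make_classification(n_samples=5_000, n_features=10, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

# Stand-in for the synthetic dataset D': a perturbed copy of T drawn from roughly
# the same distribution (a real generator would replace these two lines).
rng = np.random.default_rng(0)
X_synth, y_synth = X_train + rng.normal(scale=0.1, size=X_train.shape), y_train

# Train on the synthetic T', but evaluate on the REAL holdout H.
model = RandomForestClassifier(random_state=0).fit(X_synth, y_synth)
print("AUC on real holdout:", roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1]))
```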

So, a good synthetic data solution should allow us to model the joint distribution P(X, Y) of features and target as accurately as possible while upholding all privacy guarantees.

Although the wider use of synthetic data for model training requires changing and improving existing approaches, in our opinion it is a promising technology for addressing current problems with data ownership and privacy. Its proper use will lead to more accurate models that improve and automate decision making while significantly reducing the risks associated with the use of private data.

Free trial
Experience the DataRobot AI Platform

Less Friction, More AI. Get Started Today With a Free 30-Day Trial.

Sign Up for Free

Your AI Infrastructure: Getting It Right
https://www.datarobot.com/blog/your-ai-infrastructure-getting-it-right/
Thu, 13 Jul 2023 14:10:37 +0000
Unlocking AI Success: Building Effective Infrastructure for Rapid Experimentation, Reliable Production, and Adaptability. Discover the key elements, common mistakes, and metrics for high-functioning AI infrastructure.

Take a step back and look at your AI infrastructure. Can you say confidently that you are set up for AI success? And when you hear about generative AI, are your organization and your infrastructure ready to weather the winds of change? 

In our on-demand webinar, Building Effective AI Infrastructure, three of our technical experts lead a discussion to answer your most pressing questions about your infrastructure. What makes an AI infrastructure successful? What common mistakes do organizations make when building their infrastructure? What metrics should you use to measure success? 

AI Infrastructure Means Including All the Things  

AI infrastructure is not just about one solution, and you can’t simply set up a network and be done with it. Rather, it should include all the systems and processes that cover the entire end-to-end lifecycle of AI projects. This means having the ability to experiment with new use cases, prepare datasets and features, and train models and deploy them into production, as well as monitoring the performance and accuracy of models. With these moving parts in place, you will lay the foundation for success. 

How Do You Build Effective Infrastructure? 

Building effective infrastructure is a balancing act consisting of three main elements: rapid experimentation, reliable productionization, and adaptability in an evolving ecosystem. 

Experimentation

When it comes to rapid experimentation of models, time is the key element. You want to be able to move quickly, and you want your growth to be organic. You also want to make data access easy for the key people on your team. Once you understand the business impact you’re looking for, you can work out your data access policy. 

To avoid slowing down production and making costly mistakes, it’s very important to separate experimentation from production. This allows you to iterate much faster without interrupting production operations. You should also ask several central questions: Is this a valid use case? Has every step been documented? Is it ready for production? 

Keep in mind that some tools are better than others and can save time and money. Look for repeatability in experimentation to ensure the integrity of your model development process. 

Production

Machine learning models in production assume that the data used for inference is similar to the data they were trained on. You should expect this assumption to be violated, whether because the data changes, external conditions shift, or upstream software systems are modified. You can protect your production pipeline with monitoring capabilities such as data drift, model drift, and accuracy tracking. 
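
One common way to quantify data drift is the population stability index (PSI). The sketch below is a generic illustration, not a description of any particular product's implementation; the 0.2 alert threshold is a widely used rule of thumb rather than a universal standard.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between the training-time distribution of a feature and a production sample."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0] = min(cuts[0], np.min(actual))        # widen outer edges so every
    cuts[-1] = max(cuts[-1], np.max(actual))      # production value lands in a bin
    e_pct = np.histogram(expected, cuts)[0] / len(expected)
    a_pct = np.histogram(actual, cuts)[0] / len(actual)
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)  # avoid log(0)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

training_feature = np.random.normal(0.0, 1.0, 10_000)    # what the model saw
production_feature = np.random.normal(0.5, 1.2, 2_000)   # what it is seeing now
psi = population_stability_index(training_feature, production_feature)
print(f"PSI = {psi:.3f}", "-> investigate drift" if psi > 0.2 else "-> looks stable")
```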

Collaboration across your organization is also essential to realizing value at production scale, so you should invest in tools and technologies that help facilitate that cross-functional collaboration. Rather than data scientists just throwing a bunch of code over the fence to ML engineers, make sure everyone understands the business goal you’re trying to achieve. Then when things change—as they inevitably do—you can rely on this collaboration to carry your AI project forward and move your use case into production much more quickly. 

Adaptability

Things change. The world changes, data goes out of date quickly, and models start to drift. When this happens, you’ll need to adapt quickly. One way to do that is not to wait for perfection during the experimentation stage. Too many teams wait until they get a model to perfection before putting it into production, but this process can lock them up for a year or longer. If it’s taking you a year to get your models to production, that’s too long. If you focus on getting “good enough” models in less than three months, you’ll be a much more nimble operation. 

Focus on the use case. Think through the ROI you want to achieve, which will help you determine where to make more targeted investments. Also, by focusing on small use cases and iterating on them quickly, you can build your infrastructure so that your experimentation-to-production process is repeatable. 

Every time you introduce a new technology, you should do a post-mortem and ask, what slowed us down? This will help you assess your infrastructure and unlock greater efficiencies. 

Want to Learn More?

Listen to our on-demand webinar to find out more tips and tricks from our data science experts about building the most effective AI infrastructure. 

On-demand webinar
Building Effective AI Infrastructure
Watch now

How AI Helps Address Customer and Employee Churn
https://www.datarobot.com/blog/how-ai-helps-address-customer-and-employee-churn/
Thu, 08 Jun 2023 13:28:10 +0000
Learn how AI can help businesses reduce customer and employee churn with granular insights and targeted intervention tactics. Explore DataRobot AI Platform.

Even though churn is recognized as one of the most persistent business problems, most organizations haven’t yet developed mitigation approaches or tried AI-driven solutions. In today’s data-driven world, traditional approaches to churn mitigation don’t work: consumer and employee behavioral patterns change too rapidly and apply to smaller and smaller cohorts. AI can help businesses deliver granular consumer and employee insights and drive highly targeted churn intervention tactics. 

Successful churn prevention methods can have a significant impact on the bottom line, as well as the cost of doing business. The cost of acquiring new customers can be up to 6-7x higher than retaining existing ones.1 As for employees, with the Great Resignation seemingly continuing, employee retention is still an important operational imperative, as costs to replace seasoned and well-adjusted employees might be too high for some businesses.2   

In this environment, it's important that businesses address churn in two ways. First, gain a true understanding of both the churn rate and its causal patterns. Second, implement AI to discover the insights and techniques that will help create a solution to lower customer and employee churn.

How AI Can Deliver Granular Churn Insights

Churn analysis typically involves using a set of statistical approaches to identify customers or employees likely to churn and applying appropriate interventions to mitigate this risk. However, because interventions are traditionally applied at a high level to entire groups, they are often not specific enough to be effective for the individuals in those groups. 

These interventions can also be expensive (or simply inappropriate) when delivered in large quantities. For example, blanket discount offers wouldn’t always work for customers about to cancel their subscription. Some of them might be interested in more specific offers, like bundles or additional features or maybe even specific content.

Voluntary employee turnover alone costs the U.S. economy a trillion dollars a year.3
Gallup

This lack of detail and visibility is why many organizations are turning to AI, as it helps organizations move away from general approaches and create granular intervention tactics appropriate for smaller groups or even individuals. Machine learning and AI enable organizations to work through incredibly large datasets at high speed, delivering deep analysis of data in all of its various forms to find the factors that predict churn and highlight people at risk.
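
As a small illustration of how a model surfaces those factors, here is a sketch that fits a churn classifier, ranks the features that drive churn, and scores individuals by risk. The employee columns and values are hypothetical, and the model choice is just one reasonable option.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative employee data; every column name and value is hypothetical.
df = pd.DataFrame({
    "commute_minutes": [15, 70, 10, 55, 80, 20, 65, 30],
    "tenure_years":    [4, 1, 7, 2, 1, 5, 2, 3],
    "overtime_hours":  [2, 12, 0, 9, 15, 3, 10, 4],
    "left_company":    [0, 1, 0, 1, 1, 0, 1, 0],   # churn label
})
X, y = df.drop(columns="left_company"), df["left_company"]

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Which factors predict churn overall...
print(dict(zip(X.columns, model.feature_importances_.round(2))))

# ...and which individuals look at risk right now.
df["churn_risk"] = model.predict_proba(X)[:, 1]
print(df.sort_values("churn_risk", ascending=False).head(3))
```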

A good churn prevention solution isn’t just built on predictive models, though. You also need to have clear prevention plans for when an individual is determined to be at-risk – and it’s incredibly important to get feedback from business stakeholders on the features and patterns your model can act on, and the mitigations it can realistically offer. For example, if commute time is identified as a risk factor for employees, can you offer remote working to any employee or only those in specific locations?

Improve Churn Mitigation with the DataRobot AI Platform

Churn prevention is a popular use case among DataRobot customers across industries. For example, D&G, one of the leading insurance providers in the UK, uses DataRobot for pricing optimization to determine the price point where customers are most likely to be happy with the warranty coverage they receive and renew their policies. There are many other churn-focused use cases, like media subscription renewal forecasting or clinical trial churn predictions.

Whether you rely on expert advice for specific churn use cases or develop your own models from scratch, enterprises that address churn with the DataRobot AI Platform see multiple benefits:

  • Achieve higher machine learning model accuracy. The only way to judge the performance of a predictive model is to assess the cumulative lift – the improvement in the precision of your interventions (a minimal lift calculation follows this list). To do this, you need to a) establish a clear baseline, and b) be able to clearly understand the improvement you're seeing. And while it sounds obvious, not all tools make this easy. With DataRobot, you have access to out-of-the-box evaluation techniques for each model, like Lift Charts and ROC curve graphs, which enable you to validate the model's effectiveness and see how it performs.
  • Improve engagement from business stakeholders. Involving business stakeholders or domain experts is critical to developing a resilient and reliable churn prevention solution. The DataRobot AI Platform offers a highly intuitive, graphic-led way to engage the teams that will make your churn prevention strategy a success. 
  • Understand the impact of your data with feature impact graphs, which rank all the churn features appearing in the model, and make it easy for you and your experts to identify if they are valid, or if they are artificially influencing the predictive capability of the model. Tweaking this enables greater accuracy.
  • Achieve granularity of insights with prediction explanations, which show you the reasons why the model has suggested someone is at-risk, enabling you to compare it to the information you have from outside the model. For instance, if an employee’s job role has a high prediction rating, does HR already know of issues within that team?
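
Here is the minimal lift calculation referenced in the first bullet: the churn rate among the people the model ranks as highest risk, divided by the overall churn rate. A lift of 3 means the top-ranked group churns at three times the base rate. The example labels and scores are toy values.

```python
import numpy as np

def cumulative_lift(y_true, y_score, top_fraction=0.1):
    """Churn rate among the top-scored fraction, relative to the overall churn rate."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    k = max(1, int(len(y_true) * top_fraction))
    top_idx = np.argsort(y_score)[::-1][:k]     # highest predicted risk first
    return y_true[top_idx].mean() / y_true.mean()

# Toy example: 1 = churned.
y_true  = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_score = [0.9, 0.2, 0.8, 0.3, 0.1, 0.2, 0.7, 0.4, 0.1, 0.3]
print(cumulative_lift(y_true, y_score, top_fraction=0.3))
```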

Start Developing Churn Predictions with AI

Although churn is an inevitable part of running a business, DataRobot helps organizations create strategies that can quickly and effectively transform churn mitigation.

DataRobot provides you with the tools necessary to create a deeper understanding of churn factors that can lead to a robust plan for combating it. You’ll be able to validate predictive models before you deploy them, and use DataRobot features to keep stakeholders in the loop.

Learn more about how DataRobot is helping organizations.

Ebook
Mitigating Churn with AI

A Guide to Better Customer and Employee Retention

Download Now

 1 American Express, Retaining Customers vs. Acquiring Customers

 2 Computer World, The Great Resignation isn’t over yet

 3 Gallup, This Fixable Problem Costs U.S. Businesses $1 Trillion

Getting Value Out of Generative AI
https://www.datarobot.com/blog/getting-value-out-of-generative-ai/
Wed, 10 May 2023 13:00:00 +0000
Many companies are experiencing mounting pressure to have a generative AI strategy, but most are not equipped to meaningfully put generative AI to work. For AI leaders, there are deeper questions you need to ask as you consider your path with generative AI.

This article was originally published on LinkedIn.

I've been fortunate to witness up close some of the most groundbreaking innovations in the AI space over the past decade, alongside some of the brightest pioneers in this field, including Google, AWS, IBM Research and — of course — DataRobot. Having served as DataRobot's CEO for nearly a year, I have never been more excited about the opportunities ahead to work with customers to create market-moving business impact with generative AI. 

DataRobot was built in 2012 to help businesses apply the most advanced AI solution at the time – machine learning – to derive business value. We've been helping businesses turn moments of AI disruption into value ever since. Over the past ten years we have helped thousands of customers, including many in the Fortune 50, make AI a core expertise for their businesses. We intend to do the same with generative AI. 

Putting Generative AI to Work

When it comes to generative AI, the possibilities are practically limitless. Conversational customer service experiences will more efficiently connect businesses with their customers. Generative AI-powered recommendations for underlying causes will help medical professionals diagnose diseases faster. Quicker, more efficient analysis of customer data and financial information will enable insurers to support risk assessment as regulatory requirements shift. 

Many companies are experiencing mounting pressure to have a generative AI strategy, but most are not equipped to meaningfully put generative AI to work. For AI leaders, there are deeper questions you need to ask as you consider your path with generative AI:

  • AI Readiness – Where are you on the AI maturity curve? Do you have internal (generative) AI expertise? 
  • Use Cases – What are your internal and customer-facing GenAI use cases? Do you have a framework to prioritize them based on ROI, feasibility, time frame, etc.? 
  • Technology Choices – Are you going to use hosted pre-trained models, open source models, or build your own? How will you fine-tune and customize models for specific use cases?
  • Production Risks – What reputational and compliance risks might you be exposed to? How will you monitor the models for hallucination, topic drift, etc.? How will you ensure that model output is explainable and auditable? 
  • Change Management – How will you handle changes to your end-user applications and business processes? How will GenAI impact your workforce – size, skills, culture? 

Considering these elements will help you to develop a generative AI strategy that you can quickly put to work to start uncovering value. The key is to get started now, as there will be material costs to doing nothing. While AI alone won’t replace people or companies, AI-savvy people and companies will.

Let’s Talk

We’ve started rolling out DataRobot’s capabilities and I personally am on the road speaking with customers about how they need generative AI to work to be value driven. I’m so excited to work with our customers to realize what’s possible.

Getting Value Out of Generative AI

Now is a critical moment to determine how you can put generative AI to work to create tangible business value

Learn More

How Much Data Is Enough for AI?
https://www.datarobot.com/blog/how-much-data-is-enough-for-ai/
Thu, 04 May 2023 13:26:00 +0000
Discover the challenges and benefits of big data in AI, downsampling, and smart sampling techniques to reduce data size without losing accuracy.

The recent advancements in large language models (LLMs) are fascinating. We’ve already seen LLMs taking exams from law1, medical2 and business schools3, as well as writing scientific articles, blog posts, finding bugs in code, and composing poems. For some, especially for those who do not closely watch the field, it might seem like magic. However, what’s really magical is how these technologies are able to excite and inspire people with wonder and curiosity about the future.

One of the most important contributions to these advances is the availability of large amounts of data together with technologies for storing and processing it. This enables companies to leverage these technologies to perform complex analytics, use the data for model training, and deliver a real competitive advantage over the startups and companies entering their field. 

However, even with the clear benefits, big data technologies bring challenges that aren’t always obvious when you set off on your journey to extract value from the data using AI.

Data storage is comparatively cheap these days. Competition between cloud providers has driven the cost of data storage down while making data more accessible to distributed computing systems. But the technologies for storing ever-increasing amounts of data have not reduced the workload required to maintain and improve data quality. According to research, 96% of companies encounter data quality problems and 33% of failed projects stall due to problems in the data, while only 4% of companies haven't experienced any training data problems, and the situation is unlikely to change much in the near future.4

Scaling Down Datasets to Scale Down the Problem

In real-world applications, full datasets are rarely used in their entirety. In some cases, the amount of processed data for an application is smaller than the total data size, because it’s only the very recent data that matters; in others, data needs to be aggregated before processing and raw data is not needed anymore.

When DataRobot was helping HHS with COVID-19 vaccine trials during the pandemic by providing forecasts of socioeconomic parameters, we collected more than 200 datasets with a total volume of over 10TB, but daily predictions required just a fraction of this. These smaller dataset sizes allowed us to perform faster data analysis, where turnaround time was critical for decision-making. This let us avoid distributed systems that would have been costly to use and would have required more resources to maintain.

Figure: Visualization of the COVID project's data warehouse. A vertex is a dataset and an edge is a column; the relative size of a vertex corresponds to dataset size, and color corresponds to column data type. The average working dataset size is approximately 10 MB. Simpler tools enabled us to start collecting and maintaining data faster.

Downsampling is also an effective technique that helps to reduce data size without losing accuracy in many cases, especially for complex analysis that cannot easily be pushed down to the data source. Sometimes (especially when a dataset is not materialized) it is just wasteful to run detection on an entire column of data, and it makes sense to intelligently sample a column and run a detection algorithm on the sample. Part of our goal at DataRobot is to enable best practices that not only get the best results but do so in the most efficient way. Not all sampling is the same, however. DataRobot allows you to do smart sampling, which automatically retains rare samples and produces the most representative sample possible. With smart sampling, DataRobot intentionally changes the proportion of different classes, either to balance classes, as in classification problems, or to remove frequently repeated values, as in zero-inflated regression. 
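
The idea behind that kind of class-aware downsampling can be sketched in a few lines. This is a simplified illustration, not DataRobot's implementation: keep every row of the rare class and only a fraction of the dominant one. The column names and fractions are hypothetical.

```python
import pandas as pd

def balanced_downsample(df, label_col, majority_frac=0.1, seed=0):
    """Keep every rare positive row, but only a fraction of the dominant class."""
    counts = df[label_col].value_counts()
    majority, minority = counts.idxmax(), counts.idxmin()
    kept_majority = df[df[label_col] == majority].sample(frac=majority_frac, random_state=seed)
    kept_minority = df[df[label_col] == minority]          # rare rows are always retained
    return pd.concat([kept_majority, kept_minority]).sample(frac=1, random_state=seed)

# Toy example: a 1% fraud rate becomes roughly 9% after downsampling non-fraud rows.
df = pd.DataFrame({"amount": range(10_000),
                   "is_fraud": [1 if i % 100 == 0 else 0 for i in range(10_000)]})
sample = balanced_downsample(df, "is_fraud")
print(len(sample), sample["is_fraud"].mean().round(3))
```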

We should not forget about the progress of hardware in recent years. A single machine today can process more data, faster, thanks to improvements in RAM, CPUs, and SSDs, which reduces the need for distributed data processing systems. That means lower complexity, lower maintenance costs, and a simpler, more accessible software stack that allows us to iterate and get value faster. Our COVID-19 decision intelligence platform was built without established big data approaches despite having sufficiently large data sizes, and not using them allowed our data scientists to use familiar tools and get results quicker.

Additionally, collecting and storing customer data may be seen as a liability from a legal perspective. There are regulations in place (GDPR and others), plus the risks of breaches and data leaks. Some companies even choose not to store raw data at all, instead using techniques such as differential privacy5 and storing only aggregated data. In this case, there is a guarantee that individual contributions to the aggregates are protected, and further processing downstream does not significantly affect accuracy. At DataRobot we use this approach when we need to aggregate potentially sensitive data before ingestion on the customer side, and also to anonymize a search index while building a recommendation system internally. 
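
A heavily simplified sketch of the underlying idea, the Laplace mechanism applied to a single count: only the noisy aggregate leaves the trusted scope. The epsilon value and IDs are purely illustrative, not a recommendation or a description of DataRobot's internal setup.

```python
import numpy as np

def private_count(values, epsilon=1.0, seed=0):
    """Release a count with Laplace noise calibrated to sensitivity 1
    (one person can change a count by at most 1)."""
    rng = np.random.default_rng(seed)
    return len(values) + rng.laplace(loc=0.0, scale=1.0 / epsilon)

churned_customers = ["c_014", "c_201", "c_305", "c_412"]     # raw IDs never leave this scope
print(round(private_count(churned_customers, epsilon=0.5)))  # only the noisy aggregate is stored
```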

Size Doesn’t Always Matter

While having large datasets and a mature infrastructure to process and leverage them can be a major benefit, it’s not always required to unlock value with AI. In fact, large datasets can slow down the AI lifecycle and are not required if proven ML techniques, in combination with the right hardware, are applied in the process. In this context, it’s important that organizations understand the qualitative parameters of the data that they possess, since a modern AI stack can handle the lack of quantity but is never going to be equipped to handle the lack of quality in that data. 

Demo
Get a Personalized Demonstration of the DataRobot AI Platform
Request a demo

1 Illinois Institute of Technology, GPT-4 Passes the Bar Exam

2 MedPage Today, AI Passes U.S. Medical Licensing Exam

3 CNN, ChatGPT Passes Exams from Law and Business Schools 

4 Dimensional Research, Artificial Intelligence and Machine Learning Projects Are Obstructed by Data Issues

5 Wikipedia, Differential Privacy

Realize the Value of AI at Production Scale with DataRobot 9.0
https://www.datarobot.com/blog/realize-the-value-of-ai-at-production-scale-with-datarobot-9-0/
Thu, 20 Apr 2023 12:46:31 +0000
DataRobot 9.0 helps organizations scale the use of AI to create value enterprise-wide. Discover how it simplifies ML production, automates deployment, and manages model drift to maintain business value.

Organizations want to scale the use of AI to create value enterprise-wide. This might mean deploying hundreds of ML models to support use cases across the whole company. 

At the same time, the world is constantly changing, which impacts the performance and accuracy of your business critical deployments. Now imagine the impact of these changes on those hundreds of deployed models.

Maintaining multiple models in production and knowing which ones need attention to retain their accuracy, impact, and value in the long term is no easy task. This is why Production is such a vital—and challenging—part of the ML lifecycle.

The DataRobot AI Platform can help. It’s a complete AI lifecycle platform that is collaborative and easy to implement, gets you to value faster, and helps you to easily maintain that value over time.

DataRobot Production capabilities provide a single system record for all your AI artifacts, helping you to manage all production models, no matter who built them, how they were built, or where they are hosted. The Platform unifies your fractured infrastructure, helping you think clearly about your entire model inventory.

With DataRobot 9.0, we are doing even more, by helping you:

  • Clearly calculate and track ML impact and easily communicate ROI.
  • Make deployment automation easier with our new GitHub Actions for CI/CD.
  • Quickly identify and deal with data drift to maintain the value of business critical deployments.

Let’s explore how each of these can help you realize the value of AI at production scale.

Track ML Impact and Value with Customized Metrics for Your Organization

For a long time, the DataRobot AI Platform provided metrics like accuracy tracking. With custom metrics, we have extended this to value-based tracking.

Most organizations struggle to quantify and track the value of their AI projects. Our new custom inference metrics feature, unique to DataRobot, shifts the focus from high-level summary statistics to what matters most for your business.

With DataRobot 9.0, you can embed your own analytics, and from a single place, track traditional metrics like accuracy, as well as all your custom KPIs tied to a model. This gives you a constant, multidimensional view of that model’s impact on your business. 

As soon as any KPI falls below an acceptable threshold, an automated notification will be sent, and you can take appropriate action, such as retraining or replacing a model with a better performing challenger. You’ll continue to drive top- and bottom-line impact and improve the value of DataRobot investments across your organization.

Figure: DataRobot deployment metrics
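
Conceptually, a value-based custom metric is just a business calculation computed from predictions and checked against a threshold. The sketch below is a generic illustration rather than the DataRobot API; the cost figure, threshold, and notification hook are hypothetical.

```python
def cost_of_false_negatives(y_true, y_pred, cost_per_miss=250.0):
    """A business-facing KPI: dollars lost to churners the model failed to flag."""
    misses = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return misses * cost_per_miss

def check_and_alert(metric_value, threshold, notify):
    """Send a notification when the KPI crosses its acceptable threshold."""
    if metric_value > threshold:
        notify(f"Custom KPI breached: ${metric_value:,.0f} exceeds ${threshold:,.0f}")

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]      # two churners missed
kpi = cost_of_false_negatives(y_true, y_pred)
check_and_alert(kpi, threshold=300.0, notify=print)   # in practice: email, Slack, pager, ...
```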

Make Deployment Automation Easier with GitHub Actions for CI/CD

Deploying models and calculating their value is one hurdle, but you also need to maintain that value in production. Our new GitHub Marketplace Action for CI/CD makes sure that you continuously sustain the value you initially created.

Whenever you update your models, Production can automatically trigger, build, test, and deploy a new model iteration into DataRobot, straight from your favorite command line, IDE, or Git tool. This means you can make deployment a completely self-service layer inside your business, and automatically update models fast, without sacrificing control, governance, or observability.

For example, imagine you were tracking business KPIs like cost of error and regulatory fines via custom metrics. If those metrics started to trend in the wrong direction, you could easily replace your model using the GitHub actions CI/CD workflow. This would save you time, ensure lineage and governance of your deployments, and help maintain the business value you expect from your models.

Figure: GitHub Actions for CI/CD in the DataRobot AI Platform
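
To illustrate the kind of check a CI/CD pipeline might run before swapping models, here is a generic promotion gate: score the champion and challenger on the same validation data and fail the job unless the challenger is at least as good. The models and data are toy stand-ins, and this is not the DataRobot or GitHub Actions API.

```python
import sys
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-ins: in a real pipeline, champion and challenger would be loaded
# from a model registry and scored on a shared, held-out validation set.
X, y = make_classification(n_samples=2_000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
champion = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
challenger = LogisticRegression(C=0.5, max_iter=1_000).fit(X_tr, y_tr)

champ_auc = roc_auc_score(y_val, champion.predict_proba(X_val)[:, 1])
chall_auc = roc_auc_score(y_val, challenger.predict_proba(X_val)[:, 1])
print(f"champion AUC={champ_auc:.3f}, challenger AUC={chall_auc:.3f}")

# Fail the CI job (non-zero exit) unless the challenger is at least as good,
# so the automated deployment step only ever promotes an improvement.
sys.exit(0 if chall_auc >= champ_auc else 1)
```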

Produce Better Models with Expanded and Robust Drift Management Capabilities

DataRobot has always offered deep drift management capabilities for ML deployments – no matter where or how they were built, or where they are deployed. This helps data scientists visualize, analyze, and share feedback about model drift. Through understanding how models are drifting – and being alerted in a timely manner when a model should be retrained – your organization can better respond to changes in market conditions.

In DataRobot 9.0, we’re taking things further with an expanded suite of drift management capabilities. 

New visualizations help you quickly investigate the context and severity of drift. With a few clicks, you can view drift across multiple features (including text features), compare time periods, and more. The speed and depth at which you can analyze drift means you can take appropriate action before your business is impacted.

A new drift over time feature helps you estimate the severity of drift for each time bucket, while being mindful of any shifts in prediction volumes that should be taken into account. The new Drill Down tab provides a heat map for data drift across your features for each timestep, so you can detect correlated changes across multiple features in one view.


These new features meet the needs of customers who require deeper yet quick analysis and problem solving, so they can manage AI in a rapidly changing world and volatile economy that causes models to drift. They help you identify the scale and depth of investigations to produce better models for retraining, which means you get model check-ins done quickly and can get back to building models.

Accelerate Your Path to Value with DataRobot 9.0

Building a process for a few models is relatively easy. Running a fleet of models in production is a very different prospect. That's why DataRobot 9.0 continues to make it simpler, more seamless, and easier for you and your teams to operate and deliver on the value of AI at production scale.

To find out more and see these new DataRobot 9.0 features in action, watch our Generate and Maintain Value of AI at Scale session.

Video
Generate and Maintain Value of AI at Scale
Watch now

Accelerate Machine Learning Experimentation Through Seamless Collaboration
https://www.datarobot.com/blog/accelerate-machine-learning-experimentation-through-seamless-collaboration/
Thu, 06 Apr 2023 12:00:00 +0000
Find out how your organization can now deliver value-driven AI insights quicker and easier than ever with the latest features from DataRobot.

To get the most value from artificial intelligence (AI) and machine learning (ML) in your organization, the work needs to move beyond data science teams – everyone needs to be able to participate fully, including business and engineering stakeholders. 

Code-first and intuitive no-code/low code experiences are important in enabling everyone within an organization to collaborate more effectively, regardless of the complexity of the use case. When teams have a way to work together, it amplifies productivity and makes it easier to iterate and experiment.

We’re creating more collaborative experimentation experiences with the latest DataRobot release.

Create Frictionless Collaboration for Experimentation

With the revitalized DataRobot Workbench experience, you can solve one of the biggest collaboration challenges – organizing ML projects across teams. Now, it’s quick and easy to bring together the resources and ML assets related to a particular business problem in a single location that everyone can access.

Accessed via the new Workbench, Use Case folders organize all the projects and assets related to a business challenge (for example customer retention, demand forecasting, or predicting manufacturing defects) in one place. Team members no longer need to zip and exchange files and assets by email, which means it’s quicker, easier, and less risky to share data. It’s easier to compare different models from different projects, too.

With all the assets relating to a business use case in one place, your data scientists will be able to move faster between data preparation and modeling, and deliver quicker iterations. 

Use Cases also enable more opportunities for collaboration – people invited to join the project can instantly access everything they need to familiarize themselves with the project and start contributing more quickly. Additionally, if people leave the team or the organization entirely, the Use Cases functionality allows you to maintain the project's continuity, because all of their previous work is stored in the same location, accessible to other existing or new team members.

Figure: DataRobot Workbench

Integrate Data Prep with the Rest of Your ML Flow

Data preparation is arguably one of the most tedious yet essential steps in an ML project. It’s usually done outside of your core AI tool, and can require complex and time-consuming integrations to get the prepared data into production. With the new DataRobot Data Preparation tool, you can now analyze and transform structured data directly from popular cloud data warehouses and lakes, starting with Snowflake (one of the most popular data sources for DataRobot customers) today in a natural, seamless workflow.

With Data Prep, your data teams can browse and preview any registered dataset before deciding to add it for wrangling. With support for Snowflake push-down, you can maintain data governance, security, and compliance, and leverage the scale of your cloud data warehouse. 

Frictionless integration with Data Prep in Use Cases ensures your data is seamlessly delivered into your ML workflows. It means faster iterations and continuous experimentation loops – increasing productivity to help you gain even greater value from your cloud data warehouse investment.

Figure: DataRobot Data Preparation

Reduce the Time and Effort of Managing Notebooks

Notebooks are a core part of any AI solution for code-first data scientists, giving you the freedom to use your preferred libraries to build custom ML models in a code-first workflow. 

DataRobot Notebooks make it easy to organize, share and manage notebooks as part of your workflow. There’s no more need to keep track of notebooks on your desktop, look for convenient ways to share them, or worry about losing projects when a colleague leaves the team. These capabilities significantly reduce the time you spend managing infrastructure so you can spend more on data science experimentation. 

With Notebooks you can create a code-first experience, using the DataRobot APIs or any other data science framework, and maximize productivity with advanced features such as frequently used code snippets that remove the need to write boilerplate code every time you start a project, as well as pre-installed dependencies and versioning. The result? A significant increase in speed, security, and efficiency across your ML projects.

Notebooks are stored within Use Cases and you can get started in seconds with pre-built templates. It’s a fully managed solution so there’s no ongoing infrastructure to manage – the AI platform uses adjustable compute resources to help you quickly scale when needed.

Figure: DataRobot Notebooks

The Benefits in Action

Polaris is a DataRobot customer that previously adopted the solution to speed up and scale ML projects and improve productivity. Having tried the new DataRobot experience, the Polaris data team has seen a significant acceleration in its speed of experimentation, and found it easier to consistently iterate, find insights, and collaborate.

“DataRobot really helps us to focus on the things we want to focus on, like feature engineering, and then allows us to commoditize the things that we can commoditize and let DataRobot pick the best model for that dataset. But it’s great that we can both have data scientists that are code-based and data scientists that might be on lower code base, working together and collaborating, as opposed to a low-code data scientist feeling like there’s a hurdle that they have to get over and can’t, say, work on the same projects with the code-first data scientists. Because of that, we end up getting to a much richer solution, because they’re working together, as opposed to working in silos.”
Luke Bunge

Data Science Product Manager, Polaris Inc.

Ready to Find Out More?

Artificial intelligence (AI) is no longer simply a vision, it’s a technology that can deliver significant real-world impact with tangible value. With the latest features from DataRobot, you’ll find it quicker and easier than ever to deliver value-driven AI insights across your organization.

You can hear more about the new DataRobot experience in this walkthrough delivered by Jillian Schwiep, Director of Product Management at DataRobot.

Accelerate Experimentation

Collaborate more easily and rapidly speed up experimentation to increase the success of your AI projects

Watch now

How MLOps Enables Machine Learning Production at Scale
https://www.datarobot.com/blog/how-mlops-enables-machine-learning-production-at-scale/
Thu, 23 Mar 2023 13:00:00 +0000
Through adopting MLOps practices and tools, organizations can drastically change how they approach the entire ML lifecycle and deliver tangible benefits. Read more.

AI adoption remains top-of-mind for organizations. Although companies are keen to gain competitive advantage by leveraging AI to more rapidly bring innovations to market, they are often unable to see end results as quickly as they’d like.

Difficulties faced when moving models into production include cost and a lack of automation – cited by over 55% of respondents to a recent IDC study.1 The complexity of building expertise, managing multiple tools and platforms across the ML pipeline, and staying on top of an ever-expanding repository of production models are noted as further obstacles.

In a challenging economy, agility, speed, and efficiency are vital. Companies need reliable AI predictions that meet business goals so they can make informed decisions and quickly respond to change. This is why businesses are increasingly investing in machine learning operations (MLOps): IDC predicts that by 2024, 60% of enterprises will have operationalized their ML workflows using MLOps.2

What Is MLOps and How Does It Help?

MLOps combines the people, processes, best practices, and technologies needed to automate the deployment, monitoring, and management of ML models in production. Through adopting MLOps practices and tools, organizations can drastically change how they approach the entire ML lifecycle and deliver tangible benefits. 

The benefits of adopting MLOps tools and processes include:

  • Faster time to value, and more rapid feature roll-out, through better planning and deployment practices;
  • Better risk mitigation for production models through ongoing monitoring, governance, and refresh for underperforming models;
  • Accelerated delivery through improved collaboration for multi-functional teams usually involved in the ML lifecycle, such as data scientists, data engineers, and IT; 
  • Scalable AI strategies that can support dozens or even hundreds of production models. 

Should You Build or Buy an MLOps Platform?

There are key considerations when looking into MLOps. Understand how your organization works with ML – and where it should head. Identify needs regarding building, deploying, monitoring, and governing your ML models on a holistic basis.

IDC recommends treating models as source code to improve collaboration, model reuse, and tracking. Ask further questions to help your organization plan to improve efficiency and agility when working with ML models. How would it cope with scale and managing additional models? How can you best avoid duplicating effort when managing ML models across departments with different needs, and deliver more value? 

Ebook
Building vs. Buying a Machine Learning Management Platform

Working with a vendor will be beneficial. Use a cost-benefit analysis to explore ROI and risk. Doing nothing or moving too slowly could rapidly and negatively impact your business. By contrast, injecting pace into your ML efforts can future-proof your organization and keep it ahead of the competition.

You’ll find opportunities and cost trade-offs – and clear advantages in purchasing an MLOps solution. These might include:

  • more rapidly generating business returns
  • better leveraging learnings
  • reduced need for specialized personnel
  • elastic inferences for cost management
  • automatic scale across your organization
  • efficient model operations from a central management system

How Is DataRobot MLOps Uniquely Positioned to Take on ML Challenges?

When you work with an established and trusted software provider, it’s important to choose one that will save you time and money, and help you efficiently and effectively deal with the many challenges that come with establishing AI projects or accelerating AI adoption. With DataRobot MLOps, you get a center of excellence for your production AI – a single place to deploy, manage and govern models in production, regardless of how they were created or when and where they were deployed.

This full suite of ML lifecycle capabilities delivers model testing, deployment and serving, performance monitoring and granular model-level insights, approval workflows, and a higher level of confidence for decisions informed by models. Data science teams can then better address challenges associated with the ML lifecycle. 

Although it’s packed with features, DataRobot MLOps is also easy to use. Among its many highlights are:

  • A single pane of glass management console that consolidates reporting, with easily digestible charts, workflow review, and quality metrics;
  • Custom AI project governance policies, giving you complete control over access, review, and approval workflows across your organization;
  • Automating much of the ML development process, including monitoring, production diagnostics, and deployment, to improve the performance of existing models;
  • Running your models anywhere, through DataRobot MLOps being able to deploy your model to a production environment of choice;
  • The industry leading DataRobot AutoML, which builds and tests challenger models – and alerts you and provides insights when one outperforms the champion;
  • A humility feature, which lets you configure rules so that models recognize, in real time, when they are making uncertain predictions;
  • Detailed and user-defined insights, which let you, for example, compare drift across two scoring segments of a model, for any time period, to gain the context required to efficiently make critical decisions that keep models relevant in a fast-changing world.

MLOps is a necessity to remain competitive in today’s challenging economic environment. DataRobot MLOps helps you more rapidly take advantage of the fantastic opportunities ML brings, and efficiently and effectively manage the lifecycle of production models holistically across your entire enterprise.

1 Source: “IDC MarketScape: Worldwide Machine Learning Operations Platforms 2022 Vendor Assessment,” doc #US48325822, December 2022

2 Source: “IDC FutureScape: Worldwide Artificial Intelligence and Automation 2022 Predictions,” IDC#US48298421, October 2021

The post How MLOps Enables Machine Learning Production at Scale appeared first on DataRobot AI Platform.

Driving AI Success by Engaging a Cross-Functional Team https://www.datarobot.com/blog/driving-ai-success-by-engaging-a-cross-functional-team/ Wed, 15 Feb 2023 13:15:01 +0000 https://www.datarobot.com/?post_type=blog&p=43224 Enterprises see the most success when AI projects involve cross-functional teams. For true impact, AI projects should involve data scientists, plus line of business owners and IT teams. Read more.

The post Driving AI Success by Engaging a Cross-Functional Team appeared first on DataRobot AI Platform.

By 2025, according to Gartner, chief data officers (CDOs) who establish value stream-based collaboration will significantly outperform their peers in driving cross-functional collaboration and value creation.1 In order to drive this kind of AI success, you need a cross-functional team engaged in the process, invested in outcomes, and feeling a sense of responsibility along the entire lifecycle. 

You can build your AI team with people from across your organization, including: 

  • AI leaders who are responsible for AI/ML strategy and the roadmap within an organization; 
  • AI builders who are responsible for AI strategy implementation and seek to address business problems using machine learning; 
  • Business executives who look to solve business problems and drive revenue or reduce costs with AI; 
  • and IT leaders who are focused on the technology infrastructure of an organization, including the data and analytics infrastructure.  

Quite a few complex use cases, such as price forecasting, might require blending tabular data, images, location data, and unstructured text. When you have messy data coming from all over the place, you need a powerful AI platform in order to move forward and implement your AI.  

In addition, it’s essential that models comply with regulations and treat customers fairly, making it more important than ever to monitor models in production. It is possible to manage the end-to-end AI lifecycle in one solution. The DataRobot AI Platform makes it possible to engage your cross-functional team to deliver successful AI outcomes, no matter how complex your inputs. 

The cost of real estate has been a rollercoaster ride in this challenging macroeconomic climate. In this example, we take a deep dive into how real estate companies can effectively use AI to automate their investment strategies. 

We also look at how collaboration is built into the core of the DataRobot AI Platform, so your entire team can work together from business use case to model deployment. Let’s walk through an example use case that showcases the effective use of AI to automate strategic decisions and highlights the collaboration capabilities the platform enables.

Improving Productivity with Increased Collaboration

We start by exploring a dataset from the DataRobot AI catalog. The AI catalog fosters collaboration by providing a system of record for datasets: users can publish and share datasets with colleagues, tag them, and manage their lineage throughout the entire project. In essence, the AI catalog lets you crowdsource datasets that are highly relevant to your business, using existing assets to build the models that will be most useful to you. 

The AI catalog encourages a culture of collaboration and of sharing the data assets that benefit your organization, leading to big gains in productivity, new data sources, and a collaborative environment for enterprise AI. 

You can also manage access control and sharing permissions to these datasets, in case you are dealing with sensitive data that should be accessible only to a limited number of stakeholders. 
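
For teams that prefer to work programmatically, datasets can also be registered in the catalog through the DataRobot Python client. The snippet below is a minimal sketch: the endpoint, token, and file name are placeholders, and exact method availability depends on your client version.

import datarobot as dr

# Connect to your DataRobot instance (placeholder credentials).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Upload a local file so it becomes a governed, shareable asset in the AI catalog.
dataset = dr.Dataset.create_from_file(file_path="ontario_home_sales.csv")
print(dataset.id, dataset.name)

# Sharing permissions and access control are then managed on the catalog item,
# so sensitive data stays restricted to the right stakeholders.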

Estimating Asset Value Using the DataRobot AI Platform

According to the Federal Housing Finance Agency, the U.S. house price index rose by 19.17% year over year in 2021, a large increase from the prior year’s 6.92% growth and the highest annual growth on record. 

In such a hot market, how can teams leverage AI to ensure that they are assessing the right values in their respective markets? The demo highlights unique and differentiated capabilities that empower all users – from analysts to data scientists, and even the person at the end of the journey who just needs an instant price estimate. 

In our demonstration, we utilized a real estate dataset from Ontario which included past sales records of properties. Our objective was to create a machine learning model that could accurately predict the selling price of a single-family home. 

When considering a property, we take into account several factors such as its location, size (square footage), and the number of bedrooms and bathrooms. We also analyze unstructured information, such as which amenities come with the property, for example a sauna or light fixtures, and review the accompanying photographs. By analyzing all of this information, we aim to gain insights and determine an estimated selling price for a new property.
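
To make the unit of analysis concrete, here is a hypothetical slice of such a dataset in pandas, mixing structured fields, unstructured amenity text, and a reference to a photo. Column names and values are invented for illustration.

import pandas as pd

listings = pd.DataFrame(
    {
        "listing_id": [101, 102],
        "city": ["Ottawa", "Toronto"],
        "latitude": [45.4215, 43.6532],
        "longitude": [-75.6972, -79.3832],
        "square_feet": [1850, 1420],
        "bedrooms": [3, 2],
        "bathrooms": [2.5, 2.0],
        "amenities_text": [
            "finished basement, sauna, upgraded light fixtures",
            "open-concept kitchen, walk-out deck",
        ],
        "photo_path": ["img/101_front.jpg", "img/102_front.jpg"],
        "sold_price": [612_000, 739_000],  # training target
        "sold_date": pd.to_datetime(["2021-03-14", "2021-06-02"]),
    }
)
print(listings.dtypes)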


The real estate market changes over time, so it’s important that our model learns from past data and is tested on a later time frame. DataRobot helps you automate this backtesting by setting up Out-of-Time Validation, which forces your model to learn from records before a certain date and then validate against data that comes after that cut-off point. 
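
Conceptually, the backtest boils down to splitting on a date rather than at random. A minimal pandas sketch, assuming a hypothetical sales file with a sold_date column:

import pandas as pd

def out_of_time_split(df, date_col, cutoff):
    # Train on records before the cutoff, validate on everything after it,
    # so the model is always tested on a later time frame, as in production.
    cutoff_ts = pd.Timestamp(cutoff)
    return df[df[date_col] < cutoff_ts], df[df[date_col] >= cutoff_ts]

sales = pd.read_csv("ontario_home_sales.csv", parse_dates=["sold_date"])  # hypothetical file
train, valid = out_of_time_split(sales, date_col="sold_date", cutoff="2021-07-01")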

When working with location-oriented data like houses in a neighborhood, a capability that really helps within DataRobot is Automated Geospatial Feature Engineering that converts latitude and longitude into points on the map. These points drive a feature engineering process that clusters nearby homes together and calculates many values such as the average selling price in that location.
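
DataRobot handles this automatically; the scikit-learn sketch below only illustrates the underlying idea of clustering listings by coordinates and deriving location-level features. The file and column names are assumptions.

import pandas as pd
from sklearn.cluster import KMeans

sales = pd.read_csv("ontario_home_sales.csv")  # hypothetical file with latitude/longitude

# Group listings into local pockets based purely on coordinates.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
sales["geo_cluster"] = kmeans.fit_predict(sales[["latitude", "longitude"]])

# Derive location-level features, e.g. the average selling price in each pocket.
# (In practice, compute this from historical sales only, to avoid target leakage.)
sales["cluster_avg_price"] = sales.groupby("geo_cluster")["sold_price"].transform("mean")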

Automated Feature Discovery is another differentiator that will have an impact in this use case. It allows us to easily combine data from other sources and summarize it at the unit of analysis of our project. In this example, we have census data at the neighborhood and city level which DataRobot will incorporate into our project at the property level. Also, it will automatically compute moving aggregations, such as the average price by neighborhood for the last week, month, and three months. These data preparation tasks are otherwise time consuming, so having DataRobot’s automation here is a huge time saver.

Automated Feature Discovery - DataRobot
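
Again, DataRobot generates these features automatically; the pandas sketch below only illustrates the kind of join-and-aggregate work happening behind the scenes, with hypothetical files and column names.

import pandas as pd

sales = pd.read_csv("ontario_home_sales.csv", parse_dates=["sold_date"])
census = pd.read_csv("neighborhood_census.csv")  # one row per neighborhood

# 1) Bring secondary data down to the unit of analysis (the property).
sales = sales.merge(census, on="neighborhood", how="left")

# 2) Moving aggregations, e.g. the 90-day rolling median sold price per city.
#    closed="left" keeps each record out of its own window, avoiding target leakage.
sales = sales.sort_values(["city", "sold_date"])
sales["city_median_price_90d"] = sales.groupby("city", group_keys=False).apply(
    lambda g: g.rolling("90D", on="sold_date", closed="left")["sold_price"].median()
)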

After setting up your project, you can get started. Hit the Start button, and DataRobot will begin exploring vast combinations of feature engineering steps and machine learning models. Automated feature engineering reveals many insights by creating new features from existing ones, helping you get more creative with your experimentation.

Start button - DataRobot

As we run the model, we see that taking the 90-day median of the sold price at the city level was a useful predictor. DataRobot does a great job of explaining exactly how it got to this feature. It joins the primary data with the city-level dataset and calculates the moving 90-day median. 

Delivering Explainable and Transparent Models with DataRobot

Explainability is a key differentiator in DataRobot that allows for smoother collaboration on your team. DataRobot also provides several tools for understanding the behavior of the model and gaining insight into why predictions are generated as they are. Feature Lineage, Feature Effects, Prediction Explanations, and SHAP (SHapley Additive exPlanations) allow for a comprehensive examination of the model’s underlying logic and decision-making processes. These tools provide valuable information on the relationships between features and predictions, enabling data scientists to make informed decisions when fine-tuning and improving their models. 
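
For readers who want a feel for what SHAP-style explanations look like in code, here is a minimal sketch using the open-source shap library on a synthetic stand-in for the housing data. It is analogous in spirit to DataRobot’s Prediction Explanations, not a description of its implementation.

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Tiny synthetic stand-in for the housing data (values are invented).
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "square_feet": rng.integers(800, 4000, 500),
        "bedrooms": rng.integers(1, 6, 500),
        "bathrooms": rng.integers(1, 4, 500),
    }
)
y = 200 * X["square_feet"] + 15_000 * X["bathrooms"] + rng.normal(0, 20_000, 500)

model = GradientBoostingRegressor().fit(X, y)

# SHAP values show how much each feature pushed an individual estimate
# above or below the baseline prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.head(5))
print(shap_values)  # one row of contributions per explained listing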


DataRobot provides a leaderboard showing results from different experiments, including a diverse range of algorithms, preprocessing, and feature engineering. The algorithm blueprint, including all steps taken, can be viewed for each item on the leaderboard. This allows data scientists to easily compare approaches and choose the best model for their needs.

In each blueprint, users can make custom modifications via drag and drop or code, to test their own ideas, aided by DataRobot’s safety guardrails. As experiments progress, DataRobot provides insights through its use of location features. It highlights the areas where predictions were accurate and those where the model struggled. This information helps data scientists understand where improvements can be made by identifying mistakes and incorporating additional data.

After training a model, it is important to assess its fairness. DataRobot offers the ability to evaluate bias by conducting a bias and fairness analysis. By incorporating census data, such as language or unemployment information, DataRobot can determine if certain neighborhoods are unfairly treated compared to others. The analysis may uncover attributes that improve accuracy but negatively impact fairness. To address this issue, DataRobot provides the ability to manage bias by placing greater emphasis on underrepresented features, improving fairness and enhancing the trustworthiness of the AI model.

Bias and fairness analysis - DataRobot
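
As a simple illustration of the kind of check such an analysis performs, the sketch below compares prediction errors across two hypothetical neighborhood segments; a persistent one-sided gap for one segment is the sort of signal a bias and fairness analysis surfaces. The data and segment labels are invented.

import pandas as pd

results = pd.DataFrame(
    {
        "segment": ["A", "A", "B", "B", "B"],  # e.g. a census-derived language group
        "actual_price": [500_000, 620_000, 480_000, 455_000, 510_000],
        "predicted_price": [515_000, 600_000, 430_000, 400_000, 470_000],
    }
)

# Systematic under-prediction for one segment would warrant bias mitigation.
results["error"] = results["predicted_price"] - results["actual_price"]
print(results.groupby("segment")["error"].agg(["mean", "median"]))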

DataRobot makes it simple to take your model live. With just one click, your model can be containerized and made accessible through an API endpoint. The MLOps command center gives you a bird’s-eye view of your model, monitoring key metrics like accuracy and data drift. The Accuracy tab specifically shows how the model’s accuracy has changed since deployment, helping you keep track of its performance in the real world.

Model deployments and accuracy - DataRobot
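
Once deployed, the model is typically scored over REST. The sketch below is a generic illustration using the requests library; the URL, headers, and payload format are placeholders, so use the integration snippet DataRobot generates for your specific deployment.

import requests

API_URL = "https://example.datarobot.com/predApi/v1.0/deployments/DEPLOYMENT_ID/predictions"  # placeholder
HEADERS = {
    "Authorization": "Bearer YOUR_API_TOKEN",  # placeholder credential
    "Content-Type": "application/json",
}

payload = [{"square_feet": 1850, "bedrooms": 3, "bathrooms": 2.5, "city": "Ottawa"}]

response = requests.post(API_URL, headers=HEADERS, json=payload)
response.raise_for_status()
print(response.json())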

The Data Drift tab displays a scatter plot of the model’s input features, offering a real-time glimpse into the data the model is using to make predictions, such as the type of flooring, proximity to schools, or the exterior of the home. This illustration shows that the model is encountering home exterior types that were not part of its training data, which can lead to unexpected outcomes and decreased accuracy. Alerts like this serve as a reminder to retrain the model, an action that can easily be automated within DataRobot.
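
The unseen-category problem in particular is easy to reason about in code. A minimal sketch of the kind of check a drift monitor performs, with invented exterior types:

import pandas as pd

def unseen_categories(train, scoring):
    # Categories seen at scoring time that never appeared in training.
    return set(scoring.dropna().unique()) - set(train.dropna().unique())

train_exterior = pd.Series(["brick", "siding", "stucco", "brick"])
recent_exterior = pd.Series(["brick", "stone veneer", "siding", "log"])

new_levels = unseen_categories(train_exterior, recent_exterior)
if new_levels:
    # In production this would raise a drift alert and typically trigger retraining.
    print(f"Unseen exterior types at scoring time: {sorted(new_levels)}")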

After retraining the model, DataRobot will replace the outdated model with the updated version. Additionally, you can add the newly retrained model as a challenger, allowing you to compare the performance of both models across various metrics. Once you’ve evaluated their relative strengths and weaknesses, you can designate the new model as the champion.

Data Drift - DataRobot

Finally, you can generate an application that serves as the front-end for the model, allowing users to input variables and get predictions. These business applications can be shared with anyone, enhancing their ability to make informed real-world decisions.
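
DataRobot’s no-code apps provide this front end out of the box; for comparison, here is a minimal sketch of the same idea built with the open-source Streamlit library. The estimate_price function is a hypothetical stand-in for a call to the deployment’s prediction API.

import streamlit as st

def estimate_price(square_feet, bedrooms, bathrooms):
    # Placeholder: in practice this would call the deployed model's prediction API.
    return 150_000 + 200 * square_feet + 25_000 * bedrooms + 15_000 * bathrooms

st.title("Home Price Estimator")
square_feet = st.number_input("Square feet", min_value=300, max_value=10_000, value=1_500)
bedrooms = st.number_input("Bedrooms", min_value=1, max_value=10, value=3)
bathrooms = st.number_input("Bathrooms", min_value=1.0, max_value=6.0, value=2.0, step=0.5)

if st.button("Estimate price"):
    st.write(f"Estimated selling price: ${estimate_price(square_feet, bedrooms, bathrooms):,.0f}")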

DataRobot Gives Your Team End-to-End Automation, Accuracy, and Fairness

The DataRobot AI Platform empowers your team with features and capabilities that solve some of the most pressing problems teams face when implementing AI. The platform allows your team to clean up data, make adjustments, run experiments, gain insights, ensure fairness, and deploy the model to end users, optionally without writing a line of code. DataRobot can also connect different types of data, including geographic and time series data. 

With DataRobot Automated Feature Engineering, your team can streamline the process of blending external datasets and save time by consolidating and preparing data for model building. This simplifies the model building process, helping you get better results faster.

With DataRobot MLOps, you can deploy, monitor, and manage your production model with ease. Teams can also build AI apps without writing code and collaborate within a single system of record, setting up user permissions and governance. This simplifies the AI development process, freeing up data scientists to focus on more strategic tasks.

Leading enterprises worldwide rely on DataRobot to deliver successful AI projects, managed by cross-functional teams including data scientists, IT infrastructure specialists, and business units. Effective teamwork and clear communication are key to ensuring a smooth, seamless, and successful process.


1 Source: Gartner, “How to Overcome the Top 6 Roadblocks to D&A Leader Success,” Jorgen Heizenberg, Carlie Idoine, May 4, 2022

The post Driving AI Success by Engaging a Cross-Functional Team appeared first on DataRobot AI Platform.
