The post Classify Content Topics into Appropriate Categories appeared first on DataRobot AI Platform.
For media companies, poor content management has a detrimental impact on user experience. Users expect a website or app to lead them intuitively to the content they are searching for, and misclassified content prevents them from finding it. In addition, it has become increasingly difficult for the human eye to spot key trends across the tremendous amount of content generated every day. Both of these challenges currently require large teams of human annotators or analysts, a solution that is unsustainable in the long run as content volume keeps growing.
AI helps you improve your user experience by classifying your content under the appropriate topics. AI can bucket content into up to 99 distinct categories, which can be based on predefined categories or topics your publishers have identified as key segments. Unlike existing solutions, AI is both fast and accurate: it can classify content in a fraction of the time, with one US firm reporting 95% accuracy. Text mining also allows you to discover the top trends or topics that exist not only in your content, but also in the conversations you have with users both in-app and on social media. This gives you a granular understanding of the content that customers like and dislike, and offers insight into ideas you should add to your content pipeline based on what’s trending.
Less Friction, More AI. Get Started Today With a Free 30-Day Trial.
Score Incoming Job Applicants
An organization’s personnel are key to its success, but the right people can be hard to find. SHRM (the Society for Human Resource Management) estimated in 2017 that, across applicants, referrals, and agencies, there are 100 applicants for every hire.
Recruiters dealing with high numbers of applicants are forced to process them extremely quickly—the notorious “six seconds” per resume rule—rather than spending time going deeper with the best candidates and crafting a compelling value proposition for each one. Automated screenings can also give applicants results more quickly and dramatically speed up the hiring process.
With AI, organizations can identify candidates who have the right background and credentials to be successful in the role. The explainable insights from AI models (e.g., the relative importance of education vs job experience for new entry-level hires) can provide valuable guidance for recruiters and hiring managers. Prediction Explanations (e.g., individual callouts highlighting what makes someone a particularly strong candidate) could also be used to inform a new hire’s onboarding process to proactively address any relative weaknesses.
Finally and most importantly, models are explainable, consistent, and can be documented to ensure compliance with regulatory guidelines and ensure fairness to applicants.
IMPORTANT: Many countries have laws in place to protect employees from discrimination with regard to hiring or employment decisions. Beyond that, fairness is the right thing to do. It is incredibly important that you work closely and proactively with your organization’s HR, Legal, and Compliance teams to ensure that the models you build will pass legal and ethical scrutiny before they are put into production. (View the business implementation tab to learn more about this use case and Trusted AI.)
How would I measure ROI for my use case?
Most HR departments track average cost per hire. This metric combines both internal and external factors (recruiter time, agency fees, recruiting bonuses, employee travel, etc.) and represents the total amount of money and time spent to bring a new hire into the organization.
SHRM’s 2016 Human Capital Report benchmarked average cost per hire at $4,179. (This will vary by industry and job role, e.g., entry level roles will be lower.) If a machine learning algorithm can reduce the total candidate pool by 30% at the beginning of the hiring process, that can save recruiters time and dramatically reduce cost per hire. A 10% reduction in total cost per hire by reducing the demands on recruiters’ time would equate to over $400 saved per hire brought into the organization.
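As a quick sanity check, the savings math above can be worked out directly; the annual hiring volume below is a hypothetical figure you would replace with your own.

```python
# Back-of-the-envelope ROI estimate using the SHRM benchmark cited above.
avg_cost_per_hire = 4179.00  # SHRM 2016 benchmark (USD)
reduction_rate = 0.10        # assumed 10% reduction in cost per hire
annual_hires = 250           # hypothetical annual hiring volume

savings_per_hire = avg_cost_per_hire * reduction_rate
annual_savings = savings_per_hire * annual_hires

print(f"Savings per hire: ${savings_per_hire:,.2f}")  # → Savings per hire: $417.90
print(f"Annual savings: ${annual_savings:,.2f}")      # → Annual savings: $104,475.00
```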
A typical target variable for this use case is to predict whether an applicant will pass a recruiter screen, which is a binary classification problem. This prediction is usually a preliminary review done by the recruiter before passing an applicant to the hiring manager for consideration.
However, defining a target can become complex and will need to be adapted to your process as it depends on the data your organization may or may not have. While many organizations ultimately want to predict a hire decision or even on-the-job performance, there may be data limitations based on how many people were actually hired into that role.
The target, i.e., the “end result” you are trying to predict, will define what features are included in the model; if the goal is to predict which new applicants will get passed by a recruiter to a hiring manager, then we cannot use hiring manager feedback in a model because, in practice, that feedback won’t be available yet. If the target is instead a hiring decision, then the model will do best when hiring manager feedback is included. The model should be trained on the available data at the time the decision is made.
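To make the framing concrete, here is a minimal scikit-learn sketch of the binary “pass recruiter screen” problem. The column names and toy data are hypothetical stand-ins for an ATS export; in DataRobot the feature processing shown here is handled automatically.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy application data; column names are illustrative, not a required schema.
df = pd.DataFrame({
    "application_source": ["Referral", "Job board", "Agency", "Job board"] * 25,
    "highest_degree": ["Bachelors", "High school", "Masters", "Bachelors"] * 25,
    "resume_text": ["sql python analyst", "cashier retail", "ml research", "sales crm"] * 25,
    "pass_screen": [1, 0, 1, 0] * 25,
})

X, y = df.drop(columns="pass_screen"), df["pass_screen"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Pipeline([
    ("features", ColumnTransformer([
        ("cats", OneHotEncoder(handle_unknown="ignore"),
         ["application_source", "highest_degree"]),
        ("text", TfidfVectorizer(), "resume_text"),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Scores are probabilities of passing the recruiter screen.
proba = model.predict_proba(X_test)[:, 1]
```

Note that no hiring-manager feedback appears among the features, matching the rule above: train only on what is available when the recruiter decision is made.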
These are some of the recommended features needed to build the model; you can add or remove features based on the nature of the data available and the requirements of the model you are trying to build.
These datasets usually come from Greenhouse or a similar ATS (Applicant Tracking System). For jobs in which candidates don’t generally provide a resume, any kind of completed job application can be used provided it’s in machine-readable format (e.g., avoid scanned PDFs).
| Feature Name | Data Type | Description | Data Source | Example |
|---|---|---|---|---|
| Pass_Screen | Binary (target) | Whether the applicant passes the recruiter screen for a given role | ATS | True |
| Application Source | Categorical | Source of the application | ATS | Employee referral |
| Highest degree attained | Categorical | Highest educational credential | ATS | 2-year college degree |
| Previous employers | Text | List of previous employers | ATS | Billy Jo’s Pizza |
| Educational studies | Text or Categorical | Dropdown or user-entered text describing the field of study | ATS | Business Management |
| Resume | Text | Raw resume text (if available) | ATS (may need to be converted from PDF) | |
| Questions asked on a job page | Numeric or Categorical | “How many years of experience do you have working directly with customers?” | ATS | |
| Job description | Text | Description of the position being hired for | Job postings | |
To prepare the data, applicant data from the ATS is converted to a machine-readable format as needed (e.g., text fields are extracted from PDF documents). Each row of the training data represents an application rather than an applicant, since applicants may apply to different positions or to the same position multiple times. Any external data sources are considered and added as new features.
For an applicant scoring model to be accurate, it should be specific. Similar roles can be grouped together, but fundamentally different roles should be trained with different models. This is where automation and iteration are helpful. For instance, a model trained on hires within a specific geography might reveal more concrete insights (e.g., a certain university is a good feeder school for new Analysts) than a national model.
We should also be careful to exclude people from our training data who “failed” the recruiter screen but were actually qualified. Recruiters may decide not to interview applicants for a variety of reasons unrelated to their qualifications, including because the candidates themselves expressed that they weren’t interested. This data can usually be found in the Applicant Tracking System (ATS).
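For illustration, the row-per-application framing and the exclusion of unrelated “failures” might look like the following pandas sketch; the column names and the withdrawal flag are hypothetical stand-ins for whatever your ATS exports.

```python
import pandas as pd

# Hypothetical ATS extract: one row per application, not per applicant.
apps = pd.DataFrame({
    "applicant_id": [1, 1, 2, 3, 3],
    "role_id": ["A", "B", "A", "A", "A"],
    "recruiter_decision": ["pass", "fail", "fail", "withdrew", "pass"],
})

# Exclude applications that "failed" for reasons unrelated to qualifications,
# such as candidates who withdrew before the screen took place.
train = apps[apps["recruiter_decision"] != "withdrew"].copy()
train["pass_screen"] = (train["recruiter_decision"] == "pass").astype(int)

print(train["pass_screen"].tolist())  # → [1, 0, 0, 1]
```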
DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.
While we will jump straight to deploying the model, you can take a look here to see how DataRobot works from start to finish and to understand the data science methodologies embedded in its automation.
A few key modeling decisions for this use case:
After you finalize a model, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the methods by which predictions will ultimately be used for decision making.
Automation | Augmentation | Blend
There are many ways to implement a hiring model in practice. Some organizations use a hiring model as a pass/fail screening tool to cut down on the number of applications that recruiters are required to read and review. This has the advantage of giving candidates an answer more quickly.
Other organizations use the score as a way to stack-rank applicants, allowing recruiters to focus on the most promising candidates first. The most sophisticated is a blended approach: set a relatively low pass/fail barrier so that the “automatic no” values are removed from the pipeline up front. From there, provide recruiters the scores and the Prediction Explanations to help them make better decisions faster.
All new applicants to a role should be scored on a batch basis (e.g., one batch request per hour). Predictions and Prediction Explanations should be returned and saved in the database underlying the Applicant Tracking System.
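An hourly batch job along these lines can be sketched as follows. The table name, schema, and the stand-in model are all hypothetical; in production the scores would come from the deployed screening model and land in the database behind the ATS.

```python
import sqlite3

import pandas as pd
from sklearn.dummy import DummyClassifier

# Stand-in for the deployed screening model (any estimator with predict_proba).
model = DummyClassifier(strategy="prior").fit([[0], [1]], [0, 1])

def score_batch(model, new_apps, db_path="ats.db"):
    """Score one hourly batch and persist results for the ATS to display."""
    scored = new_apps.copy()
    scored["pass_probability"] = model.predict_proba(new_apps[["feature"]])[:, 1]
    with sqlite3.connect(db_path) as conn:
        scored.to_sql("application_scores", conn, if_exists="append", index=False)
    return scored

batch = pd.DataFrame({"application_id": [101, 102], "feature": [0, 1]})
scored = score_batch(model, batch, db_path=":memory:")
```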
Here are some examples of decisions you can take using the predictions generated from the model.
Models should be retrained when data drift tracking shows significant deviations between the scoring and training data. In addition, if there are significant changes to the role (e.g., a change in the requirements of a position), the model will need to be refreshed. In that case, teams may have to manually rescore historical applicants against the updated requirements before model retraining can occur.
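One common way to quantify “significant deviation” is the Population Stability Index (PSI). DataRobot tracks drift automatically, but a standalone check might look like the sketch below; the 0.2 threshold is a conventional rule of thumb, and the score distributions are simulated.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between training and scoring distributions."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    # Clip new values into the training range so nothing falls outside the bins.
    a_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), edges)[0] / len(actual)
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.40, 0.10, 5000)  # score distribution at training time
new_scores = rng.normal(0.55, 0.10, 5000)    # shifted scoring-time distribution

drift = psi(train_scores, new_scores)
retrain = drift > 0.2  # PSI above ~0.2 is commonly read as significant drift
```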
Finally, think carefully about how to evaluate model accuracy. If a model imposes a pass/fail requirement but failing applicants are never evaluated by the recruiters, then we will track False Positives (applicants predicted to pass who did not) but not False Negatives (applicants rejected who would have passed). In a blended scenario (stack-ranking + scores), the model is directly influencing the recruiters’ decision making, which would tend to make the model seem more accurate than it is.
The best way to evaluate accuracy is to have recruiters score a certain number of applicants independently and evaluate the model accuracy based on those cases.
In addition to traditional risk analysis, AI Trust is essential for this use case.
Bias & Fairness: HR decision makers need to be aware of the risks that come with automating decision making within HR. Specifically, models trained on historically biased hiring practices can learn and reflect those same biases. It is incredibly important to make sure that your organization involves the right decision makers and content experts when building models to ensure that they remain fair.
However, there is also opportunity here. Upturn (a think tank focused on justice in technology) published guidelines for ethical AI in hiring. They note both the risks and opportunities of using AI in this space, suggesting that “with more deliberation, transparency, and oversight, some new hiring technologies might be poised to help improve on [our current ethical] baseline.”
The key, they argue, is explainability. Machine learning in this space must be transparent, documented, and explainable. Using the suite of explainability tools in DataRobot, non-data scientist HR teams can understand:
This is particularly important for free text fields, where it is essential to understand and actively control what words and phrases the model is allowed to learn from. Importantly, these models are not “black boxes;” rather, they are fully transparent and controlled by the organization.
In addition to explainability, bias testing should be part of model evaluation. One bias test that may be appropriate is statistical parity. With statistical parity, your goal is to measure if different demographic groups have an equal probability of achieving a favorable outcome. In this case, that would mean testing whether protected groups (e.g., race, gender, ethnicity) pass the recruiting screen at equivalent rates. In US law, the four-fifths rule is generally used to determine if a personnel process demonstrates adverse impact. A selection rate for a group less than four-fifths (80%) of the rate for another comparable group is evidence of adverse impact.
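The four-fifths rule itself is simple to compute. The sketch below uses hypothetical pass counts; in practice you would compute selection rates per protected group from your model’s actual screening decisions.

```python
def adverse_impact_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of group A's selection rate to group B's (the higher-rate group)."""
    return (selected_a / total_a) / (selected_b / total_b)

# Hypothetical: group A passes the recruiter screen 30% of the time, group B 50%.
ratio = adverse_impact_ratio(30, 100, 50, 100)
flagged = ratio < 0.8  # below four-fifths -> evidence of adverse impact

print(f"Impact ratio: {ratio:.2f}; adverse impact flagged: {flagged}")
# → Impact ratio: 0.60; adverse impact flagged: True
```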
Note: Leaders interested in ethics and bias should also consider attending DataRobot’s course on Ethical AI, which teaches executives how to identify ethical issues within machine learning and develop an ethics policy for AI.
Predict Policy Churn For New Customers
Insurers across all policy types and geographies have at least one thing in common: they operate in highly competitive markets. Whether or not insurers rely on agents, customer acquisition remains costly when it comes to sourcing prospects and turning them into paying members. A competitive market and complex sales environment translate into high customer acquisition costs: P&C insurers pay up to 15 percent in commissions on the first year of premiums sold, and for life insurers this can exceed 100 percent. While these costs are justified when insurers develop long-term relationships with customers, they lose out if a member churns within the first year, because the cost of acquisition then exceeds the member’s lifetime value. Alongside churn at renewal, first-year churn is a large expense for any insurer.
AI helps you ensure the long-term profitability of incoming members by predicting in advance whether a prospect will churn within the first 12 months of a policy. This allows your underwriters to thoroughly review the quality of the prospects being generated by agents. Insurance companies are deploying AI into the field, where underwriters and agents can prioritize the prospects they invest their time in, focusing first on those with the highest value and the lowest risk of churning prematurely. Based on the data available on each prospect, AI also surfaces the top reasons why a prospect is likely to churn or be retained, equipping them to personalize their approach with every prospect.
Improve Patient Satisfaction Scores
To operate, Federally Qualified Health Centers (FQHCs) rely on funding through programs such as Medicare and Medicaid. They are required to provide medical services on a sliding scale and are often governed by a board that includes patients. The success of an FQHC is measured in part by the satisfaction of patients who have received care; this feedback plays a direct role in a center’s ability to receive funding and continue serving the local community.
In addition to understanding how hospital operations have led to poor satisfaction historically, it’s also important to have the ability to flag potential risks in real time. This can provide insight into which patients are potentially not having a great experience and can be used by hospital administration to intervene, talk to the patient, and rectify any issues. Increased patient satisfaction is a positive outcome for the community that the FQHC serves, and ensures that the provider can continue to operate to its fullest.
With AI, hospital administrators can, in real time, understand which patients are likely to leave the hospital with a bad experience. Knowing which patients may be unsatisfied with their care, coupled with the primary reasons why, provides useful information to hospital administrators who can seek out patients, discuss the quality of their stay, and take steps to improve satisfaction. AI informs hospital administrators with the information they need to increase patient satisfaction scores: an outcome that serves the community and ensures that the provider can continue to operate.
There are two primary parts involved in this particular solution:
How would I measure ROI for my use case?
The value of increased patient satisfaction comes in two primary ways. First, patient satisfaction is a key indicator of how well an FQHC is meeting the healthcare needs of the community it serves, so taking steps to increase satisfaction is a positive outcome for the surrounding community. Additionally, a one percent increase in satisfaction scores can substantially impact the funding that an FQHC receives, allowing it to expand services, hire more clinicians, and continue providing quality care for those who need it most.
The target variable for this use case should be aligned to the survey output by which the provider is measured. In this case, we’ll consider the Press Ganey patient satisfaction survey, which buckets responses into three categories: top, middle, and bottom box, with top box being the best and bottom box being the worst. The target will therefore be whether the patient’s survey response was in the top, middle, or bottom box. Furthermore, this means that we can frame this problem in two ways:
Additionally, it might be useful to also develop an initial model that predicts the likelihood of a response to a survey request. In this case, the target variable can be whether or not patients responded to surveys historically.
Key features that are important in predicting patient satisfaction are listed below. They encompass information about the patient, their stay, diagnosis, and the interaction with clinicians.
Beyond the feature categories listed above, we suggest incorporating additional data your organization may collect that could be relevant to patient satisfaction. As you will see later, DataRobot is able to quickly differentiate important vs. unimportant features.
Many of these features are generally stored across proprietary data sources available in an EMR system: Patient Data, Diagnosis Data, Clinician notes, Admissions Data. Examples of EMR systems are Epic and Cerner.
| Feature Name | Data Type | Description | Data Source | Example |
|---|---|---|---|---|
| Response | Multiclass (target) | Top, middle, or bottom box, encoded as 3, 2, and 1, respectively | Provided by survey vendor | 3 |
| Age | Numeric | Patient age | Patient Data | 40 |
| Clinical Notes | Text | Notes from nurse or doctor | Clinical Data | |
| Number of past visits | Numeric | Count of prior visits within a specific period; could also include all prior visits | Patient Data | 10 |
| Distance | Numeric | Distance in miles between home and provider location | Patient Data | 20 |
| Diagnosis Code | Text (alphanumeric) | Code indicating the diagnosis upon arrival | Clinical Data | E10.9 |
DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.
Take a look here to see how to use DataRobot from start to finish and to understand the data science methodologies embedded in its automation.
There are a couple of key modeling decisions for this use case:
After you find the model that best learns patterns in your data to predict patient satisfaction, DataRobot makes it easy to deploy it into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization, and how those stakeholders will make decisions with the predictions to impact the overall process.
Automation | Augmentation | Blend
In practice, the output of these models can be consumed by a team focused on ensuring care satisfaction during a patient’s stay. A daily report or dashboard of patients and their predicted satisfaction scores can provide a guide for senior hospital administration and an understanding, in real time, of which patients are potentially dissatisfied with their care. Furthermore, leveraging Prediction Explanations from DataRobot can be a useful way to understand the primary drivers of the dissatisfaction and allow the decision maker to address those issues directly.
All newly admitted patients should be scored on a daily batch basis. Predictions and Prediction Explanations should be returned and saved in a database that can be integrated into a BI dashboard or tool available to hospital administrators. This allows them to continuously monitor which patients are likely to be dissatisfied with their stay.
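As a sketch of what this daily feed might look like, the snippet below flags patients whose most likely survey outcome is the bottom box; the class probabilities and column names are hypothetical placeholders for the model’s multiclass output.

```python
import pandas as pd

# Hypothetical daily scoring output: per-patient probabilities for the
# bottom/middle/top box classes (1, 2, and 3).
scores = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "p_bottom": [0.70, 0.10, 0.25],
    "p_middle": [0.20, 0.30, 0.50],
    "p_top": [0.10, 0.60, 0.25],
})

# Flag patients whose most likely outcome is the bottom box so that
# administrators can intervene while the patient is still in their care.
scores["predicted_box"] = scores[["p_bottom", "p_middle", "p_top"]].idxmax(axis=1)
flagged = scores.loc[scores["predicted_box"] == "p_bottom", "patient_id"].tolist()

print(flagged)  # → [1]
```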
Decision Executors
Hospital administrators focused on patient experience. They typically sit within the organization of a provider’s or hospital’s Chief Patient Experience Officer.
Decision Managers
The Chief Patient Experience Officer is ultimately responsible for executing the strategy to improve patient experience and for the use of the model output.
Decision Authors
Analytics professionals and data scientists with a strong understanding of patient data and processes are best positioned to develop these models, as well as to develop a meaningful representation of the output in the form of dashboards or reports. Data engineers and IT support are needed to ensure that stakeholders receive timely, reliable predictions, since these will require daily action from hospital administration.
Models should be retrained when data drift tracking shows significant deviations between the scoring and training data.
Predict Optimal Marketing Attribution
Consumer touch points are now spread across social media, search engines, print advertisements, email, podcasts, and more. Since consumers are exposed to thousands of messages every week, it has become essential for companies to stand out amid the noise by investing in the touch points that resonate most with their customer base. But greater exposure has also come with rising costs per click. The Harvard Business Review estimates that global spending on media ballooned to $2.1 trillion in 2019, reflecting the difficulty companies face in understanding which touch points matter and which don’t. Because customer acquisition is multidimensional, with consumers led to a brand not by one touch point but by many, the aggregate analyses companies run today to estimate future uplift fall short of delivering on individualized consumer preferences. The array of marketing touch points is too complex for marketers to optimize through manual analysis.
AI enables your marketers to optimize marketing attribution by discovering which combination of touch points will lead to the highest number of conversions. Unlike A/B testing, where companies experiment with different combinations and collect lagging data, AI saves marketers time by predicting in advance which combination of investments across 50+ touch points will generate the highest lift in responses. Using advanced algorithms, AI learns from the data of your past campaigns to discover underlying patterns that suggest the outcomes you’ll see in the future, based on similarities and differences in the combinations of your marketing attribution. AI allows you to develop a unique marketing mix tailored to the purchasing decisions of your customers. You will not only increase your conversions but also do so more efficiently, with as few resources as possible. Greater exposure does not always require greater marketing costs, but it does require greater personalization and better utilization of the marketing you already deploy.
Predict Airline Customer Complaints
The airline industry, like many others, faces tremendous threats from customer dissatisfaction. In the first 9 months of 2014 alone, the Department of Transportation’s Aviation Consumer Protection Division received over 12,000 complaints, including flight delays, overbooking, mishandled baggage, and poor customer service. Furthermore, given regulations in certain geographies, such as EU261, which mandates compensation for predefined service failures, dissatisfaction can be very costly. Whether it is a polite email notifying the customer about the repatriation of lost luggage, a proactive call to apologize for a recently delayed flight, or financial compensation for a cancellation, there are actions an airline can take to delight the customer.
AI can help empower your airline by predicting customer complaints and their severity. Your organization can make these predictions by using past complaint data to determine when a complaint is likely. Understanding when and why a complaint may occur will allow your organization to underpin a more effective service recovery strategy. The most common use cases include forecasting complaint volumes to inform call center staffing, predicting each customer’s propensity to complain, recommending the best service recovery solution, and, where financial compensation is involved, recommending the amount of compensation. Much of this benefit can be measurably demonstrated through A/B tests by operationalizing these insights and switching from a reactive to a proactive service recovery approach. The service recovery paradox demonstrates that, with the right targeting methodology, customers can actually be more delighted by proactive service recovery than if the service failure had never occurred.
Predict Suicide Warning Signs
The CDC reports that suicide is the 10th leading cause of death in the United States. For those who served in the military, the impact is even more tragic: suicide is the second leading cause of death, and, as shared by the US Department of Veterans Affairs, an average of 16.8 veterans die by suicide every day. While a suicide may sometimes appear to have arisen out of nowhere, 90% of people who die by suicide had a diagnosable mental health condition that could have been treated. In particular, veterans, who are at higher risk of suicide than the general population, may show signs of post-traumatic stress disorder from their time in deployment. Unfortunately, while efforts by government and healthcare institutions to create safe spaces where suicidal individuals can be treated have had some impact on reducing suicides, they are limited by their reactive nature: by the time individuals seek care, their mental health has often deteriorated for long periods without the help they need.
AI can provide a supplementary assessment that helps prevent suicides and save lives by predicting ahead of time who is at risk. In exploring how AI can help the Federal Government reduce the number of warfighters who die by suicide, early results have shown that AI can predict the likelihood that a warfighter is at risk of suicide with an overall accuracy of 74%. AI also revealed the reasons behind the risk: thirty-five percent of warfighters who had consumed an anxiolytic prescription in the previous six months attempted or died by suicide. Because AI produces explanations for its predictions, government and healthcare workers can understand how to reduce the likelihood of suicide for each individual at risk, providing personalized mitigation approaches depending on the individual’s background and risk drivers.
Predict Which Patients Will Admit
The transition to value-based reimbursements compels providers to understand how they can improve patient health outcomes while reducing the cost of delivery. Among these strategies, providers are educating their patients on the value of ambulatory clinics and home care in an effort to reduce the volume of avoidable hospital and emergency department admissions, which are tied to higher costs and disruptions in care. Fortunately, care coordination programs that support patients with prior admissions and comorbidities through home care have been shown to reduce reliance on inpatient admission. That said, not all patients are enrolled in these programs, and complex challenges remain in identifying which patients are likely to be admitted for acute care treatment.
AI empowers your care managers by predicting in advance which patients are likely to be admitted. Unlike existing evaluations of admission risk that are based on limited factors, AI can evaluate admission risk with much higher accuracy by finding hidden patterns across outpatient, inpatient, emergency department, and care management data. Based on each patient’s medical history and interactions, AI reveals which factors contribute to their risk of admission, giving care managers an understanding of which intervention strategies to apply depending on each patient’s conditions. Furthermore, AI maximizes each care manager’s impact by letting them triage patients by probability of admission; focusing on those at the highest risk makes the best use of a care manager’s limited resources. Once care managers have identified which patients are at high risk of admission, intervention strategies include enrolling them into programs that improve care coordination, home care, transportation, and medication adherence.
Predict Whether a Parts Shortage Will Occur
A critical task in any supply chain network is preventing parts shortages, especially last-minute ones. Parts shortages not only lead to underutilized machines and transportation, but also cause a domino effect of late deliveries through the entire network. In addition, discrepancies between the forecasted and actual number of parts arriving on time prevent supply chain managers from optimizing their materials plans.
Parts shortages are often caused by delays in shipments. To mitigate the impact delays have on their supply chain, manufacturers adopt approaches such as holding excess inventory, optimizing product designs for more standardization, and moving away from single-sourcing strategies. However, most of these approaches add unnecessary costs for parts, storage, and logistics.
In many cases, late shipments persist until supply chain managers can evaluate the root cause and then implement short-term and long-term adjustments that prevent them from recurring. Unfortunately, supply chain managers have been unable to efficiently analyze the historical data available in MRP systems because of the time and resources required.
AI helps supply chain managers reduce parts shortages by predicting the occurrence of late shipments, giving them time to intervene. By learning from past cases of late shipments and their associated features, AI applies these patterns to future shipments to predict the likelihood that those shipments will also be delayed. Unlike complex MRP systems, AI provides supply chain managers with the statistical reasons behind each late shipment in an intuitive but scientific way. For example, when AI notifies supply chain managers of a late shipment, it will also explain why, offering reasons such as the shipment’s vendor, mode of transportation, or country.
Then, using this information, supply chain managers can apply both short-term and long-term solutions to prevent late shipments. In the short term, a delay can be prevented by adjusting the shipment's transportation mode or delivery route based on its unique characteristics. In the long term, supply chain managers can conduct aggregated root-cause analyses to discover and solve the systematic causes of delays. They can use this information to make strategic decisions, such as choosing vendors located in more accessible geographies or reorganizing their shipment schedules and quantities.
How would I measure ROI for my use case?
The ROI for implementing this solution can be estimated from the costs it reduces: the excess inventory held to buffer against shortages, the expedited logistics required to recover from them, and the downstream cost of late deliveries across the network.
For illustrative purposes, we use a sample dataset provided by the President's Emergency Plan for AIDS Relief (PEPFAR), which is publicly available on Kaggle. This dataset provides supply chain health commodity shipment and pricing data. Specifically, the dataset identifies Antiretroviral (ARV) and HIV lab shipments to supported countries. In addition, the dataset provides the commodity pricing and associated supply chain expenses necessary to move the commodities to other countries for use. We use this dataset to represent how a manufacturing or logistics company can leverage AI models to improve their decision making.
The target variable for this use case is whether or not the shipment would be delayed (Binary; True or False, 1 or 0, etc.). This choice in target (Late_delivery) makes this a binary classification problem. The distribution of the target variable is imbalanced, with 11.4% being 1 (late delivery) and 88.6% being 0 (on time delivery). (See here for more information about imbalanced data in machine learning.)
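To make the imbalance concrete, here is a minimal sketch of checking the target distribution before modeling. The 1,000-row sample below is invented to stand in for the real dataset's ~11.4% positive rate:

```python
from collections import Counter

# Illustrative labels standing in for the Late_delivery target column;
# the real dataset has ~10,320 rows with roughly 11.4% positives.
labels = [1] * 114 + [0] * 886  # 1 = late delivery, 0 = on-time delivery

counts = Counter(labels)
positive_rate = counts[1] / len(labels)  # 0.114
```

A skew like this is why ranking metrics (AUC) and probability-sensitive metrics (LogLoss) are preferred here over plain accuracy, which a model could game by always predicting "on time."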
The features below represent some of the factors that are important in predicting delays. The feature list encompasses all of the information in each purchase order sent to the vendor, which would eventually be used to make predictions of delays when new purchase orders are raised.
Beyond the features listed below, we suggest incorporating any additional data your organization may collect that could be relevant to delays. As you will see later, DataRobot is able to quickly differentiate important/unimportant features.
These features are generally stored across proprietary data sources available in the ERP systems of the organization.
| Feature Name | Data Type | Description | Data Source | Example |
|---|---|---|---|---|
| Supplier name | Categorical | Name of the vendor shipping the delivery | Purchase order | Ranbaxy, Sun Pharma |
| Part description | Text | Details of the part/item being shipped | Purchase order | 30mg HIV test kit, 600mg Lamivudine capsules |
| Order quantity | Numeric | The quantity of the item ordered | Purchase order | 1000, 300 |
| Line item value | Numeric | The unit price of the line item ordered | Purchase order | 0.39, 1.33 |
| Scheduled delivery date | Date | The date on which the order is scheduled to be delivered | Purchase order | 2-Jun-06 |
| Delivery recorded date | Date | The date on which the order was actually delivered | ERP system | 2-Dec-06 |
| Manufacturing site | Categorical | The vendor site where manufacturing was done, since the same vendor can ship parts from different sites | Invoice | Sun Pharma, India |
| Product Group | Categorical | The category of the product ordered | Purchase order | HRDT, ARV |
| Mode of delivery | Categorical | The mode of transport for part delivery | Invoice | Air, Truck |
| Late Delivery | Target (Binary) | Whether the delivery was late or on time | ERP system, purchase order | 0 or 1 |
The dataset contains historical information on procurement transactions. Each row in the dataset is an individual order whose delivery needs to be predicted. Every order has a scheduled delivery date and an actual delivery date, and the difference between these was used to define the target variable (Late_delivery): if the actual delivery date surpassed the scheduled date, the target variable has a value of 1, otherwise 0. Overall, the dataset contains about 10,320 rows and 26 features, including the target variable.
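The target derivation described above takes only a few lines. The order dates below are illustrative, not taken from the PEPFAR data:

```python
from datetime import date

# Hypothetical order records: (scheduled delivery date, actual delivery date).
orders = [
    (date(2006, 6, 2), date(2006, 12, 2)),  # delivered late
    (date(2006, 6, 2), date(2006, 6, 1)),   # delivered early
    (date(2006, 6, 2), date(2006, 6, 2)),   # delivered exactly on schedule
]

# Late_delivery = 1 when the actual date surpasses the scheduled date, else 0.
late_delivery = [1 if actual > scheduled else 0 for scheduled, actual in orders]
```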
DataRobot Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.
While we will jump straight to interpreting the model results, you can take a look here to see how DataRobot works from start to finish, and to understand the data science methodologies embedded in its automation.
One thing to highlight: since we are dealing with an imbalanced dataset, DataRobot automatically recommends LogLoss as the optimization metric for identifying the most accurate model, as it is an error metric that heavily penalizes confident but wrong predictions.
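To see why LogLoss suits this problem, here is a from-scratch sketch of the metric (a simplified version of what modeling libraries implement), showing that a confident wrong prediction is penalized far more heavily than an uncertain one:

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A shipment that was late (y=1), scored two ways:
confident_wrong = log_loss([1], [0.01])  # model was almost sure it would be on time
uncertain = log_loss([1], [0.5])         # model was unsure
```

Here `confident_wrong` is roughly 4.6 while `uncertain` is roughly 0.69, so a model optimized for LogLoss is pushed toward well-calibrated probabilities rather than overconfident guesses.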
For this dataset, DataRobot found the most accurate model to be an eXtreme Gradient Boosted Trees Classifier with unsupervised learning features, built on the open-source XGBoost library.
To give transparency on how the model works, DataRobot provides both global and local levels of model explanations. In broad terms, the model can be understood by looking at the Feature Impact graph, which shows the relative importance of the features in the dataset in relation to the selected target variable. The technique adopted by DataRobot to build this plot is called Permutation Importance.
As you can see, the model identified Pack Price, Country, Vendor, Vendor INCO Term, and Line item Insurance as some of the most critical factors affecting delays in the parts shipments.
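Permutation importance itself is straightforward to sketch: shuffle one column, re-score the model, and measure the drop in performance. The toy model and features below are invented purely to illustrate the technique, not drawn from this dataset:

```python
import random

# Toy "model": predicts late delivery purely from feature 0 (e.g. a vendor risk flag).
def predict(row):
    return row[0]

X = [[1, 0], [0, 1], [1, 1], [0, 0]] * 25  # 100 rows, two features
y = [row[0] for row in X]                   # labels depend only on feature 0

def accuracy(X, y):
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, col, seed=0):
    """Drop in accuracy after shuffling one column; a larger drop = more important."""
    rng = random.Random(seed)
    shuffled = [row[:] for row in X]
    values = [row[col] for row in shuffled]
    rng.shuffle(values)
    for row, v in zip(shuffled, values):
        row[col] = v
    return accuracy(X, y) - accuracy(shuffled, y)

imp0 = permutation_importance(X, y, 0)  # large drop: feature 0 drives predictions
imp1 = permutation_importance(X, y, 1)  # zero: feature 1 is ignored by the model
```

DataRobot's Feature Impact applies this same shuffle-and-rescore idea to the trained model, which is why it works for any model type.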
Moving to the local view of explainability, DataRobot also provides Prediction Explanations that enable you to understand the top 10 key drivers for each prediction generated. This offers you the granularity you need to tailor your actions to the unique characteristics behind each part shortage.
For example, if a particular country is a top reason for a shipment delay, such as Nigeria or South Africa, you can take actions by reaching out to vendors in these countries and closely monitoring the shipment delivery across these routes.
Similarly, if there are certain vendors that are amongst the top reasons for delays, you can reach out to these vendors upfront and take corrective actions to avoid any delayed shipments which would affect the supply chain network. These insights help businesses make data-driven decisions to improve the supply chain process by incorporating new rules or alternative procurement sources.
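As a simplified, hypothetical analogue of per-row Prediction Explanations, consider a linear model where each feature's contribution to one order's score is simply its coefficient times the row's value. The feature names and coefficients below are invented; DataRobot's actual explanations use a more general algorithm:

```python
# Illustrative per-order driver ranking for a linear model.
feature_names = ["pack_price", "vendor_risk", "air_freight"]
coefficients = [0.8, 1.5, -0.4]

row = [2.0, 1.0, 1.0]  # one incoming purchase order, standardized feature values

contributions = {
    name: coef * value
    for name, coef, value in zip(feature_names, coefficients, row)
}

# Rank drivers by absolute contribution, strongest first.
top_drivers = sorted(contributions, key=lambda n: abs(contributions[n]), reverse=True)
```

The same ranked-drivers idea is what lets an analyst see, for a single shipment, whether the country, the vendor, or the transport mode is pushing its delay risk up.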
For text variables such as Part description (included in the dataset), we can look at Word Clouds to discover the words or phrases that are highly associated with delayed shipments. Text features are generally the most challenging and time-consuming to model, but DataRobot automatically fits each text column as an individual classifier, preprocessing it with NLP techniques (tf-idf, n-grams, etc.). In this case, we can see that items described as nevirapine 10 mg are more likely to be delayed than other items.
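A minimal from-scratch sketch of the tf-idf weighting mentioned above (the part descriptions are illustrative) shows why a term like "mg", which appears in every description, carries no signal, while a rarer term like "nevirapine" does:

```python
import math
from collections import Counter

# Tiny tf-idf sketch over illustrative part descriptions.
docs = [
    "nevirapine 10 mg oral suspension",
    "lamivudine 600 mg capsules",
    "hiv test kit 30 mg",
]

def tf_idf(docs):
    """Term frequency * inverse document frequency, per document."""
    n = len(docs)
    tokenized = [d.split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({
            t: (count / len(doc)) * math.log(n / df[t])
            for t, count in tf.items()
        })
    return scores

scores = tf_idf(docs)  # "mg" scores 0.0 everywhere; "nevirapine" scores > 0
```

Production pipelines add n-grams, smoothing, and normalization on top of this core idea.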
To evaluate the performance of the model, DataRobot by default ran five-fold cross-validation, and the resulting AUC score (for the ROC curve) was around 0.82. Since the AUC score on the holdout set (unseen data) was also around 0.82, we can be reassured that the model generalizes well and is not overfitting. We look at AUC because it measures how well the model ranks orders by their predicted probability of delay, rather than evaluating predictions at a single cutoff. The Lift Chart below shows how the predicted values (blue line) compare to the actual values (red line) when the data is sorted by predicted values. We see that the model slightly under-predicts for the orders most likely to be delayed, but overall it performs well. Furthermore, depending on the problem being solved, you can review the confusion matrix for the selected model and, if required, adjust the prediction threshold to optimize for precision and recall.
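The threshold adjustment mentioned above can be illustrated with a small sketch; the labels and probabilities are invented. Lowering the threshold catches more late shipments (higher recall) at the cost of more false alarms (lower precision):

```python
# Confusion matrix at a chosen probability threshold (illustrative data).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.2, 0.3, 0.8, 0.1, 0.05]

def confusion(y_true, y_prob, threshold):
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    tn = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p < threshold)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion(y_true, y_prob, 0.5)
precision = tp / (tp + fp)  # of flagged shipments, how many were truly late
recall = tp / (tp + fn)     # of truly late shipments, how many were flagged
```

In a shortage-prevention setting, missing a late shipment is usually costlier than a false alarm, which argues for tuning the threshold toward higher recall.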
After the right model has been chosen, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization, and how these stakeholders will make decisions using the predictions to impact the overall process.
Automation | Augmentation | Blend
The predictions from this use case augment the decisions of supply chain managers, helping them foresee upcoming delays in logistics. The model acts as an intelligent assistant that, combined with the managers' own judgment, helps improve the entire supply chain network.
The model can be deployed using the DataRobot Prediction API, a REST endpoint that returns predictions in near real time as scoring data from new orders is received.
Once the model has been deployed (in whatever way the organization decides), the predictions can be consumed in several ways. For example, a front-end application that serves as the supply chain's reporting tool can send new scoring data as input to the model, which then returns predictions and Prediction Explanations in real time.
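A hedged sketch of what such a scoring request might contain follows. The field names and endpoint URL are placeholders for illustration, not DataRobot's actual schema:

```python
import json

# Hypothetical new purchase order to score; field names are illustrative.
new_order = {
    "supplier_name": "Sun Pharma",
    "order_quantity": 1000,
    "mode_of_delivery": "Air",
    "scheduled_delivery_date": "2023-06-02",
}

# Prediction endpoints typically accept a JSON batch of rows.
payload = json.dumps([new_order])

# A client would then POST the payload to the deployment, e.g. (placeholder URL):
# requests.post("https://example.com/deployments/<deployment-id>/predictions",
#               data=payload, headers={"Content-Type": "application/json"})
```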
The predictions and Prediction Explanations would be used by supply chain managers or logistic analysts to help them understand the critical factors or bottlenecks in the supply chain.
Decision Executors
Decision executors are the supply chain managers and procurement teams who are empowered with the information they need to ensure that the supply chain network is free from bottlenecks. These personnel have strong relationships with vendors and the ability to take corrective action using the model’s predictions.
Decision Managers
Decision managers are the executive stakeholders such as the Head of Vendor Development who manage large scale partnerships with key vendors. Based on the overall results, these stakeholders can perform quarterly reviews of the health of their vendor relationships to make strategic decisions on long-term investments and business partnerships.
Decision Authors
Decision authors are the business analysts or data scientists who would build this decision environment. These analysts could be the engineers/analysts from the supply chain, engineering, or vendor development teams in the organization who usually work in collaboration with the supply chain managers and their teams.
Based on the predictions and Prediction Explanations that identify potential bottlenecks, managers and executive stakeholders would reach out to and collaborate with the appropriate vendor teams in the supply chain network, guided by data-driven insights. These decisions could be short- or long-term, depending on the severity of the shortages' impact on the business.
One of the most critical components of implementing an AI solution is the ability to track the model's performance for data drift and accuracy. With DataRobot MLOps, you can deploy, monitor, and manage all models across the organization through a centralized platform. Tracking model health is essential to proper model lifecycle management, much like product lifecycle management.
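One common drift check is the Population Stability Index (PSI), which compares a feature's distribution at training time against what the model sees in production. The proportions below are illustrative:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions (as proportions)."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0
    )

# Share of shipments per delivery mode (Air, Truck, Sea) at training time
# vs. in production; numbers are invented. A PSI above ~0.2 is a common
# rule-of-thumb flag for significant drift.
training = [0.6, 0.3, 0.1]
production = [0.3, 0.5, 0.2]

drift_score = psi(training, production)  # well above 0.2: investigate
```

When a feature drifts this much, the predictions that depend on it should be treated with caution until the model is retrained on recent data.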
One of the major risks in implementing this solution in the real world is adoption at the ground level. Having strong and transparent relationships with vendors is also critical in taking corrective action. The risk is that vendors may not be ready to adopt a data-driven strategy and trust the model results.
The post Predict Whether a Parts Shortage Will Occur appeared first on DataRobot AI Platform.
The post Reduce Avoidable Returns appeared first on DataRobot AI Platform.
Product returns are goods that retailers or consumers send back to manufacturers for reasons that are either preventable or non-preventable; both are collectively bucketed under a manufacturer's total warranty returns (TWR). While returns are often an afterthought, they reduce the average manufacturer's profitability by 3.8%, a significant cost item that incrementally chips away at profit margins. Manufacturers often lack the information needed to deeply understand the volume of expected returns and why returns occur. While there are numerous solutions manufacturers can use to learn about the past, a wide gap remains in their ability to generate forward-looking insights that can reduce the financial impact of returns.
AI analyzes the historical data you collect on product returns to learn patterns that help it predict which products are likely to be returned in the future. With advancements in interpretability, AI also offers your supply chain managers the top reasons why each individual product is likely to be returned. Using these insights, your supply chain managers can conduct a root cause analysis to prevent avoidable returns and iterate on their products or manufacturing processes. For products at high risk of being returned, supply chain managers can conduct a cost-benefit analysis on whether they will incur a net loss from the shipping required to deliver the product to, and back from, the customer. Product returns may also reveal insights into other challenges, such as identifying quality defects. Predicting returns helps not only at the product level; it also enables financial analysts to embed forecasted returns into their cash flow projections, ensuring that your organization is prepared for worst-case scenarios.
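The cost-benefit analysis described above can be sketched with invented numbers: weigh the expected shipping loss from a likely return against the margin earned when the sale sticks:

```python
# Illustrative figures only; a real analysis would pull these per SKU and order.
p_return = 0.35           # model-predicted probability the product is returned
outbound_shipping = 12.0  # cost to ship the product to the customer
return_shipping = 15.0    # cost to ship it back
unit_margin = 20.0        # profit earned if the sale sticks

expected_loss = p_return * (outbound_shipping + return_shipping)
expected_profit = (1 - p_return) * unit_margin - expected_loss
```

If `expected_profit` goes negative at high return probabilities, fulfilling the order as-is loses money in expectation, and an intervention (a different carrier, proactive outreach, or declining expedited shipping) may be warranted.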
The post Reduce Avoidable Returns appeared first on DataRobot AI Platform.