You’re sold. You know automated machine learning will provide amazing benefits for your organization. Now you need to convince others with “proof” that it works. Where do you start? Essentially, you need to motivate and excite stakeholders with examples of positive bottom-line impact. Based on my real-world experience, here are five tips on how to select the right pilot project for automated machine learning, one that provides results that get everyone excited.
TIP #1 – Actionable Results
Effective pilot projects should provide new insights that help you deliver quick, actionable wins – with actionable being the key to your success. Focus on solving the right problems for stakeholders by knowing your audience:
- Who needs convincing of the value of a model?
- What do they consider success, and how will they measure it?
- What functions or processes do they care about improving?
- What information drives performance in areas of accountability?
- How, when, and where will your organization use a model for decision making?
- What action(s) will your organization take based on the model?
Start looking for specific problems to solve with the end goal in mind and plan how you will proactively answer the inevitable “So what?” or “Now what?” questions.
TIP #2 – Limit Pilot Project Scope
Rather than attempt to solve enormous problems right away, use pilot projects to rapidly experiment and build prototypes, and then expand project scope as you gain expertise. It is crucial to reduce the scope of predictions rather than dataset sizes. Most business problems involve a process with numerous steps. Think about the steps in a business process flow and pick one to optimize. If you have worked in data warehousing or analytics, this approach should sound familiar. Start small, then incrementally add to it over time.
For example, your stakeholders might want to predict churn, which is a popular use case. Some churn is preventable, while other types are uncontrollable. Your pilot project should focus on areas or steps in a business process that stakeholders can feasibly control. For example, you could focus your pilot project on improving churn prediction for a specific segment and region rather than trying to take on the whole customer base. This approach will also help you isolate proof points versus floundering in a sea of endless possibilities.
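As a minimal sketch of this kind of scoping, assume customer records with hypothetical `segment`, `region`, and `churned` fields. The field names and values below are illustrative only and do not come from any real system. The point is simply that the pilot population is narrowed before any modeling begins.

```python
# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"id": 1, "segment": "small_business", "region": "Midwest", "churned": True},
    {"id": 2, "segment": "enterprise", "region": "Midwest", "churned": False},
    {"id": 3, "segment": "small_business", "region": "Northeast", "churned": False},
    {"id": 4, "segment": "small_business", "region": "Midwest", "churned": False},
]


def pilot_scope(records, segment, region):
    """Keep only the customers that fall inside the pilot's segment and region."""
    return [r for r in records if r["segment"] == segment and r["region"] == region]


pilot_customers = pilot_scope(customers, "small_business", "Midwest")
pilot_ids = [r["id"] for r in pilot_customers]
print(pilot_ids)
```

Every row inside the chosen segment and region is kept, so the prediction scope shrinks without sampling away data within it.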
TIP #3 – Select a Metric to Better Understand
Pick a performance metric that everyone understands, at a clear level of analytical granularity, such as “number of customers retained annually.” Do not attempt to prove your machine learning pilot works with data science measures such as the ROC curve, which most people find hard to interpret. You need to translate data science language into business language that stakeholders will comprehend.
For a successful pilot project, you want to choose a metric that offers decision-making granularity. Granularity refers to a unit of analysis. A unit might be a sales opportunity, customer, or transaction. For successful pilot projects, or any machine learning project, granularity is vital for creating a model your business can use. Are current decisions based on the behavior of a single customer or visit, or on the aggregate behavior of several transactions or visits over time?
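To make the idea of granularity concrete, here is a small sketch that rolls transaction-level rows up to one row per customer. It assumes hypothetical `customer_id` and `amount` fields. Which level is right depends on whether decisions are made per transaction or per customer.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: one row per purchase.
transactions = [
    {"customer_id": "C1", "amount": 40.0},
    {"customer_id": "C2", "amount": 15.0},
    {"customer_id": "C1", "amount": 60.0},
    {"customer_id": "C3", "amount": 25.0},
    {"customer_id": "C1", "amount": 20.0},
]


def to_customer_level(rows):
    """Roll transaction rows up to one summary row per customer."""
    summary = defaultdict(lambda: {"n_transactions": 0, "total_amount": 0.0})
    for row in rows:
        entry = summary[row["customer_id"]]
        entry["n_transactions"] += 1
        entry["total_amount"] += row["amount"]
    return dict(summary)


customer_rows = to_customer_level(transactions)
for customer_id, stats in sorted(customer_rows.items()):
    print(customer_id, stats)
```

The same source data yields five rows at transaction granularity and three at customer granularity, so the unit of analysis should be fixed before modeling starts.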
TIP #4 – Make Sure Enough Data is Available
You don’t need perfect data or petabytes of data for a pilot project. You can easily start modeling with a subset of data. Fundamentally, you do need to make sure you have strong input variables for predicting your chosen metric. A strong input variable takes different values across rows. If a variable’s value never changes, it carries no information that algorithms can use to make predictions.
Machine learning identifies patterns between input variables and an outcome through variable value changes. For example, if you have a variable “Discount” that contains the exact same value “0” on all rows of data, it should not be included in your input dataset. If “Discount” contains fluctuating values across rows such as “0”, “10”, “15”, “25”, “30”, then you should include it.
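The check described above can be sketched in a few lines of plain Python. This is a hedged illustration, not a description of how any particular product screens variables. It assumes every row is a dictionary with the same keys and hashable values, and the `Quantity` column is a hypothetical addition for contrast.

```python
def constant_columns(rows):
    """Return the names of columns whose value never changes across rows."""
    if not rows:
        return []
    return [
        name
        for name in rows[0]
        if len({row[name] for row in rows}) <= 1
    ]


# "Discount" is 0 on every row, so it carries no signal; "Quantity" varies.
rows = [
    {"Discount": 0, "Quantity": 1},
    {"Discount": 0, "Quantity": 3},
    {"Discount": 0, "Quantity": 2},
]

dropped = constant_columns(rows)
usable = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
print(dropped)
print(usable)
```

If `Discount` instead held values such as 0, 10, 15, 25, and 30 across rows, it would not be flagged and would remain in the input dataset.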
Verify what data is already available and what data might be missing. For example, your retail cash register system might contain sold product information and customer demographics, but it might be missing relevant retail location traffic counts and weather information that significantly influence overall retail store performance metrics. You can build a base machine learning model with your existing data and always add more data to it in future iterations.
To determine minimum data set sizes, consider the dimensionality and pattern complexity of your data [1]. Here are three simple guidelines.
- For small models with a few input features, 10 to 20 records per variable value may be sufficient.
- For medium models with over 20 input features, consider collecting 100 records per variable value.
- For large models with over 100 input features, you will need a minimum of 10,000 records in the data.
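The guidelines above can be turned into a rough pre-flight check. The sketch below rests on two assumptions that are not in the guidelines themselves: “a few” is treated as 20 or fewer features, and “records per variable value” is read as the number of rows in which each distinct value of a variable appears. Both function names are hypothetical.

```python
from collections import Counter


def sizing_guideline(n_features):
    """Map a feature count to the rule-of-thumb tier described above.

    Returns (tier, min_records_per_value, min_total_records). An entry is
    None when the guideline does not state it. Treating "a few" as 20 or
    fewer features is an assumption.
    """
    if n_features > 100:
        return ("large", None, 10_000)
    if n_features > 20:
        return ("medium", 100, None)
    return ("small", 10, None)  # lower end of the 10-to-20 range


def sparse_values(values, min_per_value):
    """List the distinct values that appear in fewer than min_per_value records."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n < min_per_value)


tier, per_value, total = sizing_guideline(8)

# Hypothetical "Discount" column: 12 rows at 0, 11 rows at 10, 3 rows at 25.
discounts = [0] * 12 + [10] * 11 + [25] * 3
too_rare = sparse_values(discounts, per_value)
print(tier, per_value, too_rare)
```

Values flagged as too rare are candidates for collecting more data or for grouping with neighboring values. This is a rule of thumb, not a guarantee of model quality.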
If you’d like more information, please skim through my white paper “Data Preparation for Automated Machine Learning,” which covers data collection, sizing, and other data preparation guidance for machine learning projects in more detail.
TIP #5 – Include Subject Matter Expert Participation
Do not undermine the potential success of your machine learning pilot. On your very first machine learning project, get help from an experienced data science professional who deeply understands the capabilities of DataRobot. You will gain priceless knowledge from them throughout the process, helping you avoid common mistakes and ensure success on future projects.
Always engage business area, domain, or subject matter experts on machine learning projects. This advice is true for your pilot project and all future projects. These experts are best qualified to help correctly frame problems to solve, break down complex issues, question model findings, and validate results in proper context.
I’ll never forget a project that I worked on with a well-known insurance company. Data warehouse technical professionals refused to involve representatives from the line of business until a complete model had been built. Well, guess what? The results were absurd: the model predicted 100% churn of New York sales reps. Model input data was missing key regional-level legislative attributes that were not collected in the data warehouse. Line-of-business subject matter experts recognized that material omission immediately. Machine learning projects are collaborative, not isolated efforts.
Recommended Resources
If you thought my top five tips for selecting a pilot project were helpful, you are likely to appreciate these on-demand webinars: “How to Avoid Building Bad Models” and “Data Preparation Essentials for Automated Machine Learning.”
About the Author
Jen Underwood, founder of Impact Analytix, LLC, has over 20 years of experience in “hands-on” development of data warehouses, reporting and advanced analytics solutions. Jen is honored to be an IBM Analytics Insider, SAS contributor, former Tableau Zen Master, and active analytics community member. In the past, Jen has held worldwide product management roles at Microsoft and served as a technical lead for system implementation firms. Jen has a Bachelor of Business Administration – Marketing, Cum Laude from the University of Wisconsin, Milwaukee and a post-graduate certificate in Computer Science – Data Mining from the University of California, San Diego.
1. Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst by Dean Abbott
About the author
Jen Underwood
Senior Director of Product Marketing, DataRobot
Jen Underwood, Sr. Director Product Marketing, has held worldwide analytics product management roles at Microsoft and served as a technical lead for system implementation firms. She has experience launching new products and turning around failed projects in enterprise data warehousing, reporting, and advanced analytics. Today she designs products and helps analytics professionals learn how to solve complex problems with machine learning in the emerging citizen data science segment. Jen has a Bachelor of Business Administration – Marketing, Cum Laude from the University of Wisconsin, Milwaukee and a post-graduate certificate in Computer Science – Data Mining from the University of California, San Diego.