True Story About Eureqa
The idea of computers helping to make scientific discoveries is incredibly fascinating to me. My advisor at Cornell (and advisor on Eureqa), Hod Lipson, once joked to me “maybe all the laws of physics are elegant and beautiful because that’s all that human minds can ever hope to understand.” Along with Hod and another brilliant educator, Steven Strogatz, I appeared on a RadioLab episode called the “Limits of Science” that explored this idea further, talking about the fundamental limitations we may have as a species to understand the world around us. Personally, I think there are definitely horizons in math, biology, and physics that we may never reach… but maybe AI can change that.
Ten years ago, this same month, I finished my PhD and started working on Eureqa full time. I remember that my peers recommended staying in academia, but I thought it would be a distraction: all I wanted to do was work on Eureqa and try to create AI that helps scientists discover new laws of physics in raw data.
At the time, I had a small following of people interested in using Eureqa to derive mathematical formulas and models. Incredibly, around 30,000 people downloaded and used the first version of Eureqa during that first year. Even though I’ve branched out to work on other AI problems at DataRobot (notably in time series and COVID-19 vaccine trials), interest in Eureqa has continued and even grown. Searching online, I found 1,330 publications that mention Eureqa in their analysis, and 1,637 citations of the Science paper that inspired Eureqa.
So What is Eureqa?
Traditionally, science has often advanced by having brilliant researchers propose competing hypotheses to explain experimental data and then design experiments to determine which is correct. What if we could replicate this competition of ideas in simulation? This was the primary inspiration for Eureqa’s algorithm. I was invited to be on the Discovery Network’s show “Through the Wormhole with Morgan Freeman,” which I thought had a great animated introduction to how Eureqa works. It shows how Eureqa generates hypotheses as mathematical expressions and then refines them to be as elegant, predictive, and precise as possible given the experimental data.
This search for mathematical formulas makes Eureqa different from other machine learning algorithms. Instead of fitting a predefined model structure, such as a linear model, neural network activation functions, or successive decision tree splits, Eureqa generates novel, specialized mathematical formulas, much like the ones a physicist would write to describe the laws of nature.
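To give a feel for what "generating and refining formulas" means, here is a deliberately tiny toy sketch of symbolic regression by evolutionary search. It is not Eureqa's algorithm: the operator set (+, -, *), the mutation scheme, the depth cap `MAX_DEPTH`, the population size, and the hidden target formula `x*x + x` are all assumptions chosen for illustration.

```python
import math
import random

# Toy symbolic regression: evolve formulas in one input variable x.
# A tree is ("x",), ("const", c), or (op, left, right).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
MAX_DEPTH = 6  # assumed cap that keeps formulas small


def random_tree(rng, depth=3):
    """Grow a random formula; a branch ends in a leaf with probability 0.3 (always at depth 0)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.6:
            return ("x",)
        return ("const", round(rng.uniform(-2, 2), 2))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def depth(tree):
    """Number of operator levels in the tree (leaves have depth 0)."""
    if tree[0] in ("x", "const"):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def evaluate(tree, x):
    """Compute the value of a formula tree at input x."""
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def fitness(tree, xs, ys):
    """Mean squared error; formulas deeper than MAX_DEPTH are rejected outright."""
    if depth(tree) > MAX_DEPTH:
        return math.inf
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mutate(rng, tree):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if tree[0] in ("x", "const") or rng.random() < 0.3:
        return random_tree(rng, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left), right)
    return (op, left, mutate(rng, right))


def to_str(tree):
    """Render a tree as an infix formula string."""
    if tree[0] == "x":
        return "x"
    if tree[0] == "const":
        return str(tree[1])
    return f"({to_str(tree[1])} {tree[0]} {to_str(tree[2])})"


def search(xs, ys, pop_size=200, generations=50, seed=0):
    """Each generation, keep the more accurate half and refill with mutated survivors."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, xs, ys))
        survivors = population[: pop_size // 2]
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda t: fitness(t, xs, ys))
    return best, fitness(best, xs, ys)


xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + x for x in xs]  # the hidden "law" the search tries to recover
best, err = search(xs, ys)
print(to_str(best), err)
```

Keeping the better half and mutating it is about the simplest selection rule possible; a more serious implementation would likely add crossover between formulas, numeric fitting of constants, and an explicit complexity penalty.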
What Makes Eureqa Models Different from Other Models
One of the main benefits of Eureqa models is that they can derive new, highly predictive features (e.g., complex combinations of lags) and use those features to generalize more accurately to new data. Eureqa models also minimize complexity, for example by using only the simplest set of features required, which makes them more resilient to changes in the data.
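One common way to make "minimize complexity" concrete in symbolic regression is to keep only the formulas on the accuracy/complexity Pareto front, where each kept formula must be more accurate than every simpler one. The sketch below assumes that idea; the candidate formulas and their hand-assigned complexity scores are made up for illustration and are not Eureqa's own complexity metric.

```python
from typing import Callable, NamedTuple


class Candidate(NamedTuple):
    formula: str
    complexity: int  # hand-assigned size of the formula (illustrative, not Eureqa's metric)
    predict: Callable[[float], float]


def mse(candidate, xs, ys):
    """Mean squared error of one candidate formula on the data."""
    return sum((candidate.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def pareto_front(candidates, xs, ys):
    """Walk from simplest to most complex, keeping each formula that lowers the best error so far."""
    scored = sorted((c.complexity, mse(c, xs, ys), c.formula) for c in candidates)
    front, best_error = [], float("inf")
    for complexity, error, formula in scored:
        if error < best_error:
            front.append((formula, complexity, error))
            best_error = error
    return front


xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + x for x in xs]
candidates = [
    Candidate("1.5", 1, lambda x: 1.5),
    Candidate("x", 1, lambda x: x),
    Candidate("x*x", 3, lambda x: x * x),
    Candidate("x*x + x", 5, lambda x: x * x + x),
    Candidate("x*x + x + 0*x", 9, lambda x: x * x + x + 0 * x),
]
for formula, complexity, error in pareto_front(candidates, xs, ys):
    print(f"complexity {complexity:>2}  mse {error:.4f}  {formula}")
```

On this data the front keeps `1.5`, `x*x`, and `x*x + x`, while the bloated `x*x + x + 0*x` is dropped: it is more complex without being any more accurate.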
Searching for free-form formulas in this way is known as symbolic regression, which has had a surge of attention over the past several years. A couple of the efforts most exciting to me were AI Feynman, from Max Tegmark and team, which solved a large number of problems in physics, and Hamiltonian Neural Networks, which learned conservation laws from image pixel data. I’m excited by these efforts and by how they might help us improve Eureqa further as well.
Why Eureqa is a Big Deal for DataRobot Customers
There are a number of significant advantages of using Eureqa models that we think our customers and prospects will be particularly excited about:
- They return human-readable, interpretable analytic expressions that subject matter experts can easily review. They are also very easy to deploy.
- They are incredibly good at feature selection because they are forced to reduce complexity during the model-building process. For example, if the data had 20 different columns available to predict the target variable, the search for a simple expression would produce an expression that uses only the strongest predictors.
- They work really well on small datasets, so they are very popular with scientific researchers who gather data from physical experiments that don’t produce massive amounts of data. (In such situations, traditional supervised machine learning models may be unable to learn.)
- They provide an easy way to incorporate domain knowledge. If you know the underlying relationship in the system you’re modeling, you can give Eureqa a hint, e.g., the formula for heat transfer or how house prices work in a particular neighborhood. Eureqa can use that known relationship as a building block or a starting point and build machine learning corrections from there.
You can find a Jupyter notebook and dataset with a Eureqa example in the DataRobot Community GitHub.
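To make the domain-knowledge point concrete, here is a rough sketch of the idea of starting from a known relationship and searching only for a correction to it. Everything here is an assumption for illustration: the "known" formula `2*x`, the hidden extra `0.5*x*x` effect in the data, the small menu of correction terms, and the single-coefficient least-squares fit. Eureqa searches over full expressions, and this is not how DataRobot configures it.

```python
import math

# Hypothetical setup: a domain expert knows y is roughly 2*x, but the data
# also contain an effect the expert's formula misses (here 0.5*x*x).
xs = [i / 4 for i in range(-8, 9)]
ys = [2 * x + 0.5 * x * x for x in xs]


def known_model(x):
    """The relationship supplied as a starting point (assumed for illustration)."""
    return 2 * x


# Candidate building blocks the correction may use (an assumed, tiny menu).
CORRECTION_TERMS = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
}


def fit_correction(xs, ys):
    """Fit residual = a * g(x) for each term g by least squares; return the best fit."""
    residuals = [y - known_model(x) for x, y in zip(xs, ys)]
    best = None
    for name, g in CORRECTION_TERMS.items():
        gs = [g(x) for x in xs]
        # Closed-form least-squares coefficient for a single term through the origin.
        a = sum(gi * r for gi, r in zip(gs, residuals)) / sum(gi * gi for gi in gs)
        err = sum((r - a * gi) ** 2 for gi, r in zip(gs, residuals)) / len(xs)
        if best is None or err < best[2]:
            best = (name, a, err)
    return best


name, a, err = fit_correction(xs, ys)
print(f"y ~= 2*x + {a:.3f}*{name}  (mse {err:.2e})")
```

Because the known part is fixed, the search only has to explain what the expert's formula leaves over, which is a much smaller problem than rediscovering the whole relationship from scratch.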
Eureqa Enhancements in 7.1 Release
Eureqa is now generally available in the DataRobot platform for all users. Eureqa blueprints run automatically for certain project types and can otherwise be run from the model repository. Eureqa blueprints also have dedicated options in the DataRobot UI for advanced tuning and for setting target expressions, as well as special interfaces for reviewing and selecting among the Eureqa models found.
Want to Know More about Eureqa?
If you are interested in trying Eureqa, it is part of the core algorithms supported in DataRobot and available in the new 7.1 release. You can also find a walkthrough guide on our community page.
References
- Schmidt, M., Lipson, H. “Distilling Free-Form Natural Laws from Experimental Data,” Science, 3 Apr 2009: Vol. 324, Issue 5923, pp. 81-85.
- Udrescu, S.-M., Tegmark, M. “AI Feynman: A physics-inspired method for symbolic regression,” Science Advances, 15 Apr 2020: Vol. 6, No. 16.
- “Through the Wormhole with Morgan Freeman,” Discovery Network (video clip).
- “Limits,” NPR: RadioLab (audio clip).
- Greydanus, S., Dzamba, M., Yosinski, J. “Hamiltonian Neural Networks,” Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
About the author
Michael Schmidt
Chief Technology Officer
Michael Schmidt serves as Chief Technology Officer of DataRobot, where he is responsible for pioneering the next frontier of the company’s cutting-edge technology. Schmidt joined DataRobot in 2017 following the company’s acquisition of Nutonian, a machine learning company he founded and led, and has been instrumental in successful product launches, including Automated Time Series. Schmidt earned his PhD from Cornell University, where his research focused on automated machine learning, artificial intelligence, and applied math. He lives in Washington, DC.