Data Collection
What is Data Collection?
As a society, we’re generating data at an unprecedented rate (see big data). These data can be numeric (temperature, loan amount, customer retention rate), categorical (gender, color, highest degree earned), or even free text (think doctor’s notes or opinion surveys). Data collection is the process of gathering and measuring information from many different sources. To use the data we collect to develop practical artificial intelligence (AI) and machine learning solutions, we must collect and store them in a way that makes sense for the business problem at hand.
Why is Data Collection Important?
Collecting data allows you to capture a record of past events so that you can use data analysis to find recurring patterns. From those patterns, you build predictive models using machine learning algorithms that look for trends and predict future changes.
Predictive models are only as good as the data from which they are built, so good data collection practices are crucial to developing high-performing models. The data need to be error-free (garbage in, garbage out) and contain relevant information for the task at hand. For example, a loan default model would not benefit from tiger population sizes but could benefit from gas prices over time.
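To make "garbage in, garbage out" concrete, here is a minimal sketch of how collected records might be screened before modeling. It covers the three kinds of data described above (numeric, categorical, free text). The record fields, the list of allowed categories, and the function names are all hypothetical and chosen only for illustration; a real project would define its own checks based on the business problem.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LoanRecord:
    """One collected row. All field names are hypothetical examples."""
    loan_amount: Optional[float]   # numeric
    highest_degree: Optional[str]  # categorical
    officer_notes: Optional[str]   # free text (optional field)


# Assumed set of valid categories; a real dataset would define its own.
ALLOWED_DEGREES = {"high school", "bachelor", "master", "doctorate"}


def find_problems(record: LoanRecord) -> List[str]:
    """Return human-readable data quality problems for one record."""
    problems = []

    # Numeric field: must be present and positive.
    if record.loan_amount is None:
        problems.append("loan_amount is missing")
    elif record.loan_amount <= 0:
        problems.append("loan_amount must be positive")

    # Categorical field: must be present and one of the known categories.
    if record.highest_degree is None:
        problems.append("highest_degree is missing")
    elif record.highest_degree.strip().lower() not in ALLOWED_DEGREES:
        problems.append("highest_degree is not a recognized category")

    # Free-text field: may be absent, but must not be blank if provided.
    if record.officer_notes is not None and not record.officer_notes.strip():
        problems.append("officer_notes is empty text")

    return problems


def split_clean_and_rejected(
    records: List[LoanRecord],
) -> Tuple[List[LoanRecord], List[Tuple[LoanRecord, List[str]]]]:
    """Separate usable records from those that need review."""
    clean, rejected = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected
```

Keeping the rejected records together with the reasons they failed, rather than silently dropping them, makes it easier to trace errors back to the collection step that produced them.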
Data Collection + DataRobot
DataRobot partners with several organizations that assist in collecting, storing, and transforming data to make it ready for predictive modeling. Once you’ve collected and prepared the appropriate data for your specific business problem, you can import it into the DataRobot AI Platform no matter where you’ve stored it. DataRobot then automatically creates new features and builds and evaluates hundreds of machine learning models, which you can immediately deploy into production.