Machine Learning: process of turning data into actionable knowledge for task support and decision making
Define problem → Model data → Evaluate → Deploy
- Basic Math for computational data analysis
- Unsupervised learning (EDA: exploratory data analysis) for data exploration
- Supervised learning for predictive analysis
- X (n × d) is our dataset: a matrix with n rows and d columns
- Y → labels or target values
- Rows: instances (samples, data points)
- Columns: dimensions (in machine learning, we can have millions of dimensions), also called features, attributes, or variables
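To make the row/column layout concrete, here is a minimal sketch in plain Python. The numbers and the class names `"a"` and `"b"` are made up for illustration, and `column` is a hypothetical helper, not part of any library.

```python
# Toy dataset: n = 4 instances (rows), d = 3 features (columns).
# All values below are invented for illustration.
X = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
]

# Y holds one label per row of X (hypothetical class names).
Y = ["a", "a", "b", "b"]

n = len(X)     # number of instances (rows)
d = len(X[0])  # number of dimensions / features (columns)


def column(matrix, j):
    """Return feature j across all instances (hypothetical helper)."""
    return [row[j] for row in matrix]


third_feature = column(X, 2)
```

Each row of `X` is one instance, and `column(X, j)` reads one feature across all instances.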
Entire dataset = Training dataset
These matrices (X and Y) are the raw data → we create an ML model using this training dataset
- supervised learning: a model trained on X and Y predicts the output label of a new (unseen) test data point
- unsupervised learning: we have X but no Y, so we cluster the data points into buckets based on similarity
- semi-supervised learning: unsupervised learning can be used to improve supervised learning
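As a rough illustration of the supervised case, the sketch below predicts the label of a test point from a labeled training set. It uses a 1-nearest-neighbor rule, which is an assumption chosen for brevity and not a method named in these notes. The data, labels, and function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def predict_1nn(X_train, Y_train, x_test):
    """Return the label of the training instance closest to x_test."""
    nearest = min(
        range(len(X_train)),
        key=lambda i: euclidean(X_train[i], x_test),
    )
    return Y_train[nearest]


# Training dataset: X (4 x 2) with one label per row (invented values).
X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]]
Y_train = ["low", "low", "high", "high"]

# A new test point whose label is unknown.
label = predict_1nn(X_train, Y_train, [4.8, 5.1])
```

In the unsupervised case there would be no `Y_train` at all, only `X_train`, and the goal would be to group its rows by similarity.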
Unsupervised Learning:
Clustering Analysis
K-means
Gaussian mixture model
Hierarchical clustering
Density-based clustering
Evaluation of clustering algorithms
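As one example from the clustering list above, K-means can be sketched in a few lines of plain Python. This is a minimal version of the standard assign-then-update loop (Lloyd's algorithm) under stated assumptions: the first k points are used as initial centroids to keep the run deterministic, a fixed iteration count replaces a convergence check, and the data is invented.

```python
def kmeans(points, k, iterations=10):
    """Minimal k-means sketch.

    Assumption: the first k points serve as initial centroids; practical
    implementations usually pick them at random or with a smarter scheme.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum(
                    (pi - ci) ** 2 for pi, ci in zip(p, centroids[c])
                ),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Unlabeled data: only X, no Y (invented values).
points = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.9], [0.9, 1.1]]
centroids, assignments = kmeans(points, k=2)
```

Here `assignments[i]` is the bucket index of `points[i]`, and `centroids` holds the mean of each bucket.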
Dimension Reduction: reduce the number of dimensions in the dataset, because collecting data with many features is costly