Intro to Machine Learning

Define problem → Model data → Evaluate → Deploy

X (n × d) is our dataset: n rows, d columns
Y → labels or target values, one per row of X
Rows: instances (samples, data points)
Columns: dimensions (in machine learning, we can have millions of dimensions), also called features, attributes, variables
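
As a small illustration of the notation above, here is a sketch of a dataset with n = 4 instances and d = 3 features, stored as plain Python lists. The values, the labels, and the variable names are invented for illustration and do not come from the notes.

```python
# Toy dataset: n = 4 instances (rows), d = 3 features (columns).
# All values are made up for illustration.
X = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]
# Y holds one label (target value) per row of X.
Y = [0, 0, 1, 1]

n = len(X)     # number of instances (rows)
d = len(X[0])  # number of features (columns)

first_instance = X[0]                   # one row = one data point
second_feature = [row[1] for row in X]  # one column = one feature

print(n, d)
```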

Entire dataset = Training dataset

These matrices (X and Y) are the raw data → we create an ML model using this training dataset

supervised learning: we have both X and Y; the model is built from them, and given a random test data point it figures out the output label
unsupervised learning: we have X but no Y, so we cluster the data points into different buckets based on similarity
semi-supervised learning: possible to use unsupervised learning to improve supervised learning
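
To make the supervised case concrete, here is a minimal sketch that predicts the label of a new test point from a labeled training set. It uses a 1-nearest-neighbor rule purely as a stand-in model; the notes do not name a specific model, and the function name `predict_1nn` and the toy data are assumptions for illustration.

```python
import math

def predict_1nn(X_train, Y_train, x_new):
    """Return the label of the training point closest to x_new."""
    best_index = min(
        range(len(X_train)),
        key=lambda i: math.dist(X_train[i], x_new),
    )
    return Y_train[best_index]

# Toy training set (values invented): n = 4 rows, d = 2 columns.
X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]]
Y_train = ["a", "a", "b", "b"]

# A new test point with no label; the model supplies one.
x_test = [4.8, 5.1]
print(predict_1nn(X_train, Y_train, x_test))
```

In the unsupervised case the same X would be available but Y_train would not, so there would be no labels to look up.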

Unsupervised Learning:

Clustering Analysis

K-means

Gaussian mixture model

Hierarchical clustering

Density-based clustering

Evaluation of clustering algorithms
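
As a rough sketch of the first and last items in the clustering outline above, the code below runs K-means (Lloyd's algorithm) on a toy dataset and scores the result with the within-cluster sum of squares. That score is one common internal measure; the notes do not say which evaluation measures are covered. The function names, the fixed initial centroids, and the data are assumptions for illustration.

```python
import math

def kmeans(X, initial_centroids, n_iters=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(X)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))
            for x in X
        ]
        # Update step: each centroid moves to the mean of its points.
        for k in range(len(centroids)):
            members = [x for x, a in zip(X, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

def within_cluster_ss(X, assignments, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(x, centroids[a]) ** 2 for x, a in zip(X, assignments))

# Toy data (values invented): two visually separate groups, no labels.
X = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]

# Fixed starting centroids keep the sketch deterministic;
# practical implementations usually pick them randomly.
assignments, centroids = kmeans(X, initial_centroids=[X[0], X[3]])
print(assignments)
print(within_cluster_ss(X, assignments, centroids))
```

A lower within-cluster sum of squares means tighter clusters, but it always decreases as the number of clusters grows, so it is usually compared across runs with the same number of clusters.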

Dimension Reduction: reduce the number of dimensions of the dataset, b/c collecting data with many features is costly
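
One way to picture dimension reduction is the sketch below, which projects 2-dimensional points onto a single direction using principal component analysis (PCA). PCA is used here only as an illustrative choice, since the notes do not name a specific method. The direction is estimated by power iteration on the covariance matrix; the function names and the toy data are assumptions for illustration.

```python
import math

def top_principal_direction(X, n_iters=100):
    """Estimate the first principal component by power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(n_iters):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centered

def project(centered, v):
    """Reduce each d-dimensional row to a single coordinate along v."""
    return [sum(a * b for a, b in zip(row, v)) for row in centered]

# Toy data (values invented): points lying close to the line y = 2x.
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]

v, centered = top_principal_direction(X)
Z = project(centered, v)  # n x 1 representation of the n x 2 data
print(v)
print(Z)
```

Note that PCA still needs all d original features as input. Methods that keep only a subset of the original features are the ones that cut collection cost directly.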