
# Clustering

Clustering is used to find groups of similar users to target with ads. It is also used in genomic research.

# K-means clustering

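K is a hyperparameter: the number of clusters. To choose it, plot distortion (the sum of squared distances from each point to its nearest centroid) against k in an elbow plot. Pick the smallest k after which adding more clusters no longer reduces distortion by much.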
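The elbow procedure can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`kmeans`, `farthest_point_init`, `elbow_curve`) are made up for this example, the data is synthetic (three well-separated blobs), and a deterministic farthest-point initialization is swapped in for the usual random initialization so that the output is reproducible.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def farthest_point_init(points, k):
    # Assumption for this sketch: deterministic init instead of random init.
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, distortion)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    distortion = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, distortion

def elbow_curve(points, k_values):
    return {k: kmeans(points, k)[1] for k in k_values}

# Synthetic data: three Gaussian blobs, 50 points each.
rng = random.Random(42)
blob_centers = [(0, 0), (10, 0), (5, 9)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in blob_centers
    for _ in range(50)
]

curve = elbow_curve(points, range(1, 7))
for k, distortion in curve.items():
    print(k, round(distortion, 1))
```

On data like this, distortion should drop sharply up to k = 3 and only slightly afterwards, which is the elbow. The pairs in `curve` can be passed to any plotting library to draw the actual plot.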

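K-means is vulnerable to the curse of dimensionality: as the number of dimensions grows, Euclidean distances between points become less informative. Reducing the dimensionality with PCA before clustering helps.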
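One way to see the problem is to measure how much farther the farthest point is than the nearest one as the dimension grows. The sketch below assumes uniformly random data in the unit hypercube; `distance_contrast` is a name made up for this example.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the farthest to the nearest distance from one query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return max(distances) / min(distances)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```

As the dimension increases the ratio shrinks toward 1, so "nearest centroid" carries less and less information. That is the motivation for projecting onto a few principal components first.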

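Given enough iterations, K-means always converges: the centroids stop moving from one iteration to the next. The result is a local optimum, not necessarily the global one, so it depends on the initial centroids.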
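The convergence claim follows from a short argument, sketched here with notation that is an assumption of this note rather than something fixed elsewhere: $x_i$ are the data points, $\mu_j$ the centroids, $c_i$ the index of the cluster that point $i$ is assigned to, and $J$ the distortion.

```latex
% Distortion for n points and k centroids
J(c, \mu) = \sum_{i=1}^{n} \lVert x_i - \mu_{c_i} \rVert^2

% Assignment step (centroids fixed): no term of the sum can increase
c_i \leftarrow \arg\min_{j \in \{1, \dots, k\}} \lVert x_i - \mu_j \rVert^2

% Update step (assignments fixed): the mean minimizes the summed squared distance
\mu_j \leftarrow \frac{1}{\lvert \{ i : c_i = j \} \rvert} \sum_{i \,:\, c_i = j} x_i
```

Neither step can increase $J$, $J$ is bounded below by 0, and there are only finitely many ways to assign $n$ points to $k$ clusters, so the algorithm must stop after a finite number of iterations. Nothing in the argument says the stopping point is the best possible assignment.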

# Hierarchical clustering

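The choice of linkage type (how the distance between two clusters is measured, for example single, complete or average linkage) influences how clusters are formed.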

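Start with every point in its own cluster. Measure the distance between clusters and merge the closest pair if that distance is shorter than a threshold. Repeat until no pair is close enough.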
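The merge loop and the effect of the linkage choice can be sketched as follows. This is a naive illustration under stated assumptions: `agglomerative`, `cluster_distance` and `LINKAGES` are names made up for this example, the points are toy 1-D values, and the loop recomputes every pairwise distance on each pass, which is fine for a handful of points but slow for real data.

```python
import math
from itertools import combinations

# Each linkage reduces the list of point-to-point distances between
# two clusters to a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,                                   # closest pair of points
    "complete": max,                                 # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),         # mean over all pairs
}

def cluster_distance(a, b, linkage):
    pairwise = [math.dist(p, q) for p in a for q in b]
    return LINKAGES[linkage](pairwise)

def agglomerative(points, threshold, linkage="single"):
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        (i, j), best = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j], linkage))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best >= threshold:  # merge only if shorter than the threshold
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Toy 1-D data: a chain of four nearby points and a separate pair.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (10.0,), (11.4,)]

single = agglomerative(points, threshold=1.5, linkage="single")
complete = agglomerative(points, threshold=1.5, linkage="complete")
print(len(single), single)
print(len(complete), complete)
```

With the same threshold, single linkage chains the four nearby points into one cluster, while complete linkage stops earlier because it judges clusters by their farthest members.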