# The BetaML.Clustering Module

BetaML.Clustering (Module)
Clustering module (WIP)

(Hard) Clustering algorithms

Provides hard clustering methods using K-Means and K-Medoids. See also the GMM module for GMM-based soft clustering, and for missing-value imputation, collaborative filtering and recommendation systems that use clustering methods as a backend.

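The module provides the following functions. Use ?[function] to access their full signature and detailed documentation:

• initRepresentatives(X,K;initStrategy,Z₀): initialisation strategies for K-Means and K-Medoids
• kmeans(X,K;dist,initStrategy,Z₀): the classical K-Means algorithm
• kmedoids(X,K;dist,initStrategy,Z₀): the K-Medoids algorithm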


## Detailed API

BetaML.Clustering.initRepresentatives (Method)

initRepresentatives(X,K;initStrategy,Z₀)

Initialise the representatives for a K-Means or K-Medoids algorithm.

Parameters:

• X: an (N x D) data matrix to cluster
• K: Number of clusters wanted
• initStrategy: How to select the initial representative vectors:
  • random: randomly in the X space
  • grid: using a grid approach [default]
  • shuffle: selecting randomly among the available points
  • given: using the set of initial representatives provided in the Z₀ parameter
• Z₀: Provided (K x D) matrix of initial representatives (used only with the given initStrategy) [default: nothing]
• rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
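To make the four strategies concrete, here is a minimal, self-contained Julia sketch. The function `init_reps_sketch` and its keyword names are hypothetical and not part of BetaML. In particular, the grid scheme shown (K evenly spaced bin centres along each dimension of the bounding box of X) is an assumption about what "a grid approach" means; BetaML's actual implementation may differ.

```julia
using Random

# Hypothetical helper (NOT the BetaML API): a sketch of the four initialisation
# strategies for an (N x D) data matrix X, returning a (K x D) matrix.
function init_reps_sketch(X::AbstractMatrix, K::Integer;
                          strategy::String = "grid", Z0 = nothing,
                          rng::AbstractRNG = Random.default_rng())
    N, D = size(X)
    minX = vec(minimum(X, dims = 1))   # per-dimension lower bound of X
    maxX = vec(maximum(X, dims = 1))   # per-dimension upper bound of X
    if strategy == "random"
        # uniform draw inside the bounding box of X (not necessarily a data point)
        return [minX[d] + rand(rng) * (maxX[d] - minX[d]) for k in 1:K, d in 1:D]
    elseif strategy == "grid"
        # ASSUMED scheme: the K bin centres of each dimension's [min, max] range
        return [minX[d] + (maxX[d] - minX[d]) * (k - 0.5) / K for k in 1:K, d in 1:D]
    elseif strategy == "shuffle"
        # K distinct rows of X picked at random (requires K <= N)
        return X[randperm(rng, N)[1:K], :]
    elseif strategy == "given"
        Z0 === nothing && error("the given strategy requires Z0")
        size(Z0, 1) == K || error("Z0 must have K rows")
        return Matrix{Float64}(Z0)
    else
        error("unknown strategy: $strategy")
    end
end
```

Under the assumed grid scheme, calling `init_reps_sketch` on the seven points of the example below with K = 2 gives the rows (1.65, 16.0) and (2.95, 32.0), i.e. the centres of the two halves of each dimension's range, whereas shuffle always returns actual data points.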

Returns:

• A (K x D) matrix of initial representatives

Example:

julia> Z₀ = initRepresentatives([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.6 38],2,initStrategy="given",Z₀=[1.7 15; 3.6 40])
BetaML.Clustering.kmeans (Method)

kmeans(X,K;dist,initStrategy,Z₀)

Run the K-Means algorithm to identify K clusters of X (by default using the Euclidean distance).

Parameters:

• X: an (N x D) data matrix to cluster
• K: Number of clusters wanted
• dist: Function to use as distance (see notes) [default: Euclidean distance]
• initStrategy: How to select the initial representative vectors:
  • random: randomly in the X space
  • grid: using a grid approach [default]
  • shuffle: selecting randomly among the available points
  • given: using the set of initial representatives provided in the Z₀ parameter
• Z₀: Provided (K x D) matrix of initial representatives (used only with the given initStrategy) [default: nothing]
• rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Returns:

• A tuple of two items: a vector of size N with the id of the cluster assigned to each point, and the (K x D) matrix of representatives
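The K-Means procedure itself can be sketched with the textbook Lloyd iteration: assign every point to its closest representative under dist, move each representative to the mean of its points, and stop when the assignments no longer change. The `kmeans_sketch` function below is hypothetical; it takes the initial representatives directly (as with the given strategy) and returns the same kind of (ids, representatives) tuple. Its convergence test and its handling of empty clusters (the representative is simply left in place) are assumptions, not a description of BetaML's code.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch (NOT BetaML's code): Lloyd-style K-means with a pluggable
# distance. X is (N x D); Z0 is a (K x D) matrix of initial representatives.
function kmeans_sketch(X::AbstractMatrix, Z0::AbstractMatrix;
                       dist = (x, y) -> norm(x - y), maxiter::Int = 100)
    N, K = size(X, 1), size(Z0, 1)
    Z = Matrix{Float64}(Z0)
    cIdx = zeros(Int, N)
    for _ in 1:maxiter
        # assignment step: each point goes to its closest representative
        newIdx = [argmin([dist(X[n, :], Z[k, :]) for k in 1:K]) for n in 1:N]
        newIdx == cIdx && break              # converged: assignments unchanged
        cIdx = newIdx
        # update step: each representative moves to the mean of its points
        for k in 1:K
            members = findall(==(k), cIdx)
            isempty(members) && continue     # empty cluster: keep old representative
            Z[k, :] = vec(mean(X[members, :], dims = 1))
        end
    end
    return cIdx, Z
end
```

Starting from points 1, 5 and 8 of the example below as representatives, this sketch converges after two assignment passes to the ids [1, 1, 1, 1, 2, 2, 2, 3, 3]. Note that in this sketch a custom dist only affects the assignment step, since the update step always takes the mean, which is one reason K-Medoids is often preferred with non-Euclidean distances.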

Notes:

• Some returned clusters may be empty
• The dist parameter can be:
  • Any user-defined function accepting two vectors and returning a scalar
  • An anonymous function with the same characteristics (e.g. dist = (x,y) -> norm(x-y)^2)
  • One of the predefined distances: l1_distance, l2_distance, l2²_distance, cosine_distance
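For reference, the standard definitions of the four named distances are sketched below as plain Julia functions. The `_sketch` names are hypothetical stand-ins, and it is assumed (not confirmed here) that BetaML's l1_distance, l2_distance, l2²_distance and cosine_distance follow these textbook formulas; the cosine variant shown is one minus the cosine similarity. Each has the two-vectors-in, scalar-out shape that dist expects, as does the anonymous squared-norm example in the list above.

```julia
using LinearAlgebra

# Hypothetical stand-ins using textbook definitions (BetaML's may differ).
l1_sketch(x, y)   = sum(abs.(x .- y))                     # Manhattan distance
l2_sketch(x, y)   = norm(x .- y)                          # Euclidean distance
l2sq_sketch(x, y) = sum((x .- y) .^ 2)                    # squared Euclidean distance
cos_sketch(x, y)  = 1 - dot(x, y) / (norm(x) * norm(y))   # 1 - cosine similarity

x = [1.0, 2.0]
y = [4.0, 6.0]
println(l1_sketch(x, y))     # 7.0
println(l2sq_sketch(x, y))   # 25.0
```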

Example:

julia> (clIdx,Z) = kmeans([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4],3)
BetaML.Clustering.kmedoids (Method)

kmedoids(X,K;dist,initStrategy,Z₀)

Run the K-Medoids algorithm to identify K clusters of X using the distance function dist.

Parameters:

• X: an (N x D) data matrix to cluster
• K: Number of clusters wanted
• dist: Function to use as distance (see notes) [default: Euclidean distance]
• initStrategy: How to select the initial representative vectors:
  • random: randomly in the X space
  • grid: using a grid approach
  • shuffle: selecting randomly among the available points [default]
  • given: using the set of initial representatives provided in the Z₀ parameter
• Z₀: Provided (K x D) matrix of initial representatives (used only with the given initStrategy) [default: nothing]
• rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Returns:

• A tuple of two items: a vector of size N with the id of the cluster assigned to each point, and the (K x D) matrix of representatives
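K-Medoids replaces the mean with a medoid: each representative is an actual data point, namely the cluster member with the smallest total distance (under dist) to the other members, so the same distance is used consistently in both steps. The sketch below implements this "alternating" (Voronoi-iteration) variant; `kmedoids_sketch` is hypothetical, and whether BetaML uses this variant or a swap-based one such as PAM is not stated here.

```julia
using LinearAlgebra

# Hypothetical sketch (NOT BetaML's code): alternating K-medoids.
# X is (N x D); Z0 is a (K x D) matrix of initial representatives.
function kmedoids_sketch(X::AbstractMatrix, Z0::AbstractMatrix;
                         dist = (x, y) -> norm(x - y), maxiter::Int = 100)
    N, K = size(X, 1), size(Z0, 1)
    Z = Matrix{Float64}(Z0)
    cIdx = zeros(Int, N)
    for _ in 1:maxiter
        # assignment step: each point goes to its closest representative
        newIdx = [argmin([dist(X[n, :], Z[k, :]) for k in 1:K]) for n in 1:N]
        newIdx == cIdx && break              # converged: assignments unchanged
        cIdx = newIdx
        for k in 1:K
            members = findall(==(k), cIdx)
            isempty(members) && continue     # empty cluster: keep old representative
            # medoid: member with the smallest total distance to all members
            costs = [sum(dist(X[i, :], X[j, :]) for j in members) for i in members]
            Z[k, :] = X[members[argmin(costs)], :]
        end
    end
    return cIdx, Z
end
```

Starting from points 1, 5 and 8 of the example below, this sketch returns the ids [1, 1, 1, 1, 2, 2, 2, 3, 3] and the medoids (1.5, 10.8), (3.3, 38.0) and (5.1, -2.3), each of which is one of the input points.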

Notes:

• Some returned clusters may be empty
• The dist parameter can be:
  • Any user-defined function accepting two vectors and returning a scalar
  • An anonymous function with the same characteristics (e.g. dist = (x,y) -> norm(x-y)^2)
  • One of the predefined distances: l1_distance, l2_distance, l2²_distance, cosine_distance
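Empty clusters arise in the assignment step: if no point is closer to a given representative than to all the others, that cluster receives no members. This can happen, for example, with the random or grid strategies, whose initial representatives need not lie near the data. The small illustration below uses a hypothetical `assign` helper and made-up representatives, one placed far from every point; it also shows a user-defined L1 distance passed as an anonymous dist function.

```julia
using LinearAlgebra

# Hypothetical helper: one assignment step, returning for each row of X the
# index of the closest row of Z under the distance `dist`.
assign(X, Z; dist = (x, y) -> norm(x - y)) =
    [argmin([dist(X[n, :], Z[k, :]) for k in 1:size(Z, 1)]) for n in 1:size(X, 1)]

X = [1.0 10.5; 1.5 10.8; 3.2 40.0; 3.6 32.0]
Z = [1.0 10.0; 3.0 35.0; 100.0 100.0]   # third representative is far from all points

cIdx  = assign(X, Z)                                 # Euclidean distance
sizes = [count(==(k), cIdx) for k in 1:size(Z, 1)]  # number of points per cluster
println(sizes)                                       # prints [2, 2, 0]: cluster 3 is empty

# A user-defined L1 distance passed as an anonymous function
cIdxL1 = assign(X, Z; dist = (x, y) -> sum(abs.(x .- y)))
```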

Example:

julia> (clIdx,Z) = kmedoids([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4],3,initStrategy="grid")