Bugl — Daily coding puzzles and drills

Clusters are collections of similar data

Clustering is a type of unsupervised learning

The Correlation Coefficient describes the strength of a relationship.

Clusters

Clusters are collections of data points grouped by similarity. Data points that lie close together in a graph can often be classified into the same cluster. In the graph below we can distinguish 3 different clusters:

Identifying Clusters

Clusters can hold a lot of valuable information, but clusters come in all sorts of shapes, so how can we recognize them?

The two main methods are:

Using Visualization

Using a Clustering Algorithm

Clustering

Clustering is a type of Unsupervised Learning.

Clustering is trying to:

Collect similar data in groups

Collect dissimilar data in other groups

Clustering Methods

Density Method

Hierarchical Method

Partitioning Method

Grid-based Method

The Density Method treats points in a dense region as more similar to one another, and as different from points in a lower-density region. The density method has good accuracy. It also has the ability to merge clusters. Two common algorithms are DBSCAN and OPTICS.
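To make the density idea concrete, here is a minimal sketch in the spirit of DBSCAN, written with only the Python standard library. The function name dbscan and the parameters eps (neighborhood radius) and min_pts (minimum neighbors for a dense point) are illustrative choices, not taken from any particular library, and the neighbor search is a simple brute-force scan.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1  # point i is dense enough to start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise next to a dense point joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is dense too, so keep growing the cluster
    return labels
```

Calling dbscan(points, eps=1.5, min_pts=3) on two tight groups of four points plus one far-away point should give the two groups separate ids and mark the lone point as -1 (noise).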

The Hierarchical Method forms the clusters in a tree-type structure. New clusters are formed using previously formed clusters. Two common algorithms are CURE and BIRCH.
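The idea of building new clusters from previously formed ones can be sketched with plain bottom-up (agglomerative) merging. This is not CURE or BIRCH; it is a simpler single-linkage variant chosen for illustration, and the names agglomerative, n_clusters and merges are assumptions made for this example.

```python
from math import dist

def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []  # record of each merge, i.e. the tree built bottom-up

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two old clusters with the newly formed one.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges
```

The merges list records which clusters were joined at each step, which is the information a tree diagram of the hierarchy would show.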

The Grid-based Method formulates the data into a finite number of cells that form a grid-like structure. Two common algorithms are CLIQUE and STING.
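As a rough illustration of the grid idea, the sketch below assigns 2D points to square cells and reports which cells are dense. It is a simplification, not CLIQUE or STING; the names grid_cells, dense_cells, cell_size and min_count are hypothetical and exist only for this example.

```python
from collections import Counter

def grid_cells(points, cell_size):
    """Count how many points fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        # Floor division maps a coordinate to the index of its cell.
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

def dense_cells(points, cell_size, min_count):
    """Return the set of cells that hold at least min_count points."""
    return {cell for cell, n in grid_cells(points, cell_size).items() if n >= min_count}
```

Because the work is done per cell rather than per pair of points, the cost depends mostly on the number of cells, which is one reason grid-based methods are attractive for large data sets.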

The Partitioning Method partitions the objects into k clusters and each partition forms one cluster. One common algorithm is CLARANS.
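Partitioning into k clusters can be sketched with k-means, a different and simpler partitioning algorithm than CLARANS. The version below is a minimal sketch: it assumes the first k points are acceptable starting centers and runs a fixed number of iterations. The names kmeans and iterations are illustrative.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Partition points into k clusters around iteratively updated centers."""
    centers = list(points[:k])  # assumption: the first k points seed the centers
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return clusters, centers
```

In practice the result depends on the starting centers, so real implementations usually pick them more carefully or run the algorithm several times.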
