# Clustering in Machine Learning

## What is Clustering?

- Clustering is the process of dividing a population or set of data points into groups such that **data points in the same group are more similar to each other** than to data points in other groups.
- Clustering is **important because it reveals the natural grouping within existing unlabeled data**.
- It is used to find useful patterns, features, and commonalities in a sample.
- **Common applications include:** statistical data analysis, social network analysis, etc.

### Example:

- Amazon uses clustering in its recommendation system to suggest products based on a user's past searches.
- Netflix also uses this technique to recommend movies and web series based on a user's viewing history.

## Types of Clusters

- **Partitioning Clustering (e.g. K-means):** Separates data into distinct groups based on their centroids.
- **Density-Based Clustering (e.g. DBSCAN):** Identifies clusters by recognizing areas of high data density and separating them from sparser regions.
- **Distribution Model-Based Clustering (e.g. Expectation-Maximization with GMM):** Divides data based on the likelihood of belonging to a specific distribution, often assuming shapes like Gaussian curves.
- **Fuzzy Clustering (e.g. Fuzzy C-means):** Soft method where a data point can belong to multiple clusters with varying degrees of membership.
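
To make the difference between these families more concrete, here is a small sketch using scikit-learn on synthetic data. The dataset, the `eps` and `min_samples` values, and the variable names are assumptions chosen for illustration. Fuzzy C-means is left out; the Gaussian mixture probabilities illustrate a similar idea of soft membership.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data: three blobs (illustrative only).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# Partitioning: every point gets exactly one cluster label.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Density-based: the label -1 marks points treated as noise.
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Distribution-based: each row holds membership probabilities that sum to 1.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
gmm_probs = gmm.predict_proba(X)

print("K-means cluster sizes:", np.bincount(kmeans_labels))
print("DBSCAN labels found:", sorted(set(dbscan_labels)))
print("GMM probabilities for the first point:", gmm_probs[0].round(3))
```

K-means and DBSCAN return one hard label per point, while the mixture model returns a probability for every cluster.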

### K-means Clustering

- K-means is a **clustering algorithm** that partitions an unlabeled dataset into different groups.
- Here K defines the number of groups to create: if K = 2 there will be two groups, if K = 3 there will be three groups, and so on.
- The value of K is specified by the user.
- It is a center-based algorithm: each cluster is associated with a center, called the centroid.
- The main goal of the algorithm is to minimize the sum of squared distances between the data points and the centroids of their assigned clusters.
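
The quantity K-means tries to minimize can be computed by hand. The sketch below, which reuses the toy points from this article's K-means example, sums the squared distance from each point to its assigned centroid and compares the result with scikit-learn's `inertia_` attribute. The `n_init` and `random_state` values are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy points from the article's K-means example.
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
X = np.array(list(zip(x, y)), dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Centroid assigned to each point, looked up through its cluster label.
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared Euclidean distances from each point to its centroid.
wcss = np.sum((X - assigned_centroids) ** 2)

print("manual objective:", wcss)
print("kmeans.inertia_: ", kmeans.inertia_)
```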

**The k-means algorithm repeatedly performs two tasks:**

- It determines the best positions for the K center points (centroids) through an iterative process.
- It assigns each data point to its nearest centroid. The data points closest to a given centroid form a cluster.

As a result, each cluster contains data points that are similar to one another and is separated from the other clusters.

### K-means algorithm

- Step 1: Choose the number K to decide how many clusters to create.
- Step 2: Select K random points as the initial centroids. (They do not have to be points from the input dataset.)
- Step 3: Assign each data point to its nearest centroid, which forms the K clusters.
- Step 4: Compute the mean of the points in each cluster and move that cluster's centroid to it.
- Step 5: Repeat step 3, that is, reassign each data point to its new nearest centroid.
- Step 6: If any data point changed cluster, go back to step 4; otherwise, finish.
- Step 7: The model is ready.

```
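import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Toy 2-D data: ten (x, y) points
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

# Plot the raw, unlabeled points
plt.scatter(x, y)
plt.show()

# Pair the coordinates into a list of (x, y) samples
data = list(zip(x, y))
print(data)

# Fit K-means with K = 2; n_init and random_state make the run reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(data)

# Color each point by its assigned cluster label
plt.scatter(x, y, c=kmeans.labels_)
plt.show()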
```
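
The steps listed in the K-means algorithm section can also be written out by hand. The following is a minimal NumPy sketch, not how scikit-learn implements it: the function name `kmeans_sketch` is hypothetical, the initial centroids are drawn from the data points (one possible choice for step 2), and an empty cluster simply keeps its old centroid.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Minimal k-means loop; the name and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 3 and 5: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 6: stop when no point changed cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its points
        # (a centroid with no points is left where it is).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
X = np.array(list(zip(x, y)), dtype=float)

labels, centroids = kmeans_sketch(X, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, a different `seed` may converge to a different grouping, which is one reason library implementations usually run several initializations.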

## Hierarchical Clustering

- **Hierarchical clustering, also known as hierarchical cluster analysis or HCA**, is another unsupervised machine learning algorithm used to group **unlabeled datasets into clusters**.
- In this algorithm, the clusters are organized into a tree-shaped hierarchy **called a dendrogram**.
- The results of K-means and hierarchical clustering may sometimes be similar, but the two algorithms work differently.
- An advantage of **hierarchical clustering** is that there is no need to **predetermine the number of clusters**, as there is in the K-means algorithm.

### Methods of Hierarchical clustering

There are two methods of hierarchical clustering:

- **Agglomerative:** A bottom-up approach in which the algorithm first treats each data point as its own group and keeps merging groups until only one remains.
- **Divisive:** The opposite of the agglomerative algorithm: a top-down approach that starts with all data points in one group and splits it repeatedly.

### Agglomerative Hierarchical clustering

- Step 1: Treat each data point as its own group. If there are N points, there are N groups.
- Step 2: Take the two closest points or groups and merge them into a single group. There are now N-1 groups.
- Step 3: Again, take the two closest groups and merge them into a single group. There are now N-2 groups.
- Step 4: Repeat step 3 until only one group remains.
- Step 5: Once all the groups have been merged into one large group, build the dendrogram and cut it to divide the data into groups as the problem requires.
- How the closeness of two groups is measured is central to hierarchical clustering.
- Distances between individual points are typically measured with a metric such as the Euclidean distance, and there are several ways to turn them into a distance between two groups; the choice determines the merging rule.
- These measures are called linkage methods.

```
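import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D data: ten (x, y) points
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

# Plot the raw, unlabeled points
plt.scatter(x, y)
plt.show()

data = list(zip(x, y))

# Ward linkage always uses Euclidean distance, so no distance argument is passed.
# (The older `affinity` keyword was replaced by `metric` in recent scikit-learn releases.)
hierarchical_cluster = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = hierarchical_cluster.fit_predict(data)

# Color each point by its assigned cluster label
plt.scatter(x, y, c=labels)
plt.show()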
```
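
The merge sequence and the dendrogram mentioned above can be inspected with SciPy. This is a sketch on the same toy points; the variable names are assumptions. Each row of the linkage table records one merge (the two groups joined, their distance, and the size of the new group), and `fcluster` cuts the tree into a chosen number of groups.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
X = np.array(list(zip(x, y)), dtype=float)

# N points produce N-1 merges; each row is [group_a, group_b, distance, size].
Z = linkage(X, method="ward")
print(Z)

# Cut the tree so that at most two groups remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# Leaf order of the dendrogram, computed without drawing it.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])

# Different linkage methods can merge the groups at different distances.
for method in ["single", "complete", "average", "ward"]:
    print(method, "final merge distance:", linkage(X, method=method)[-1, 2])
```

To draw the tree, drop `no_plot=True` and call `matplotlib.pyplot.show()` afterwards.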

### Kohonen Self-Organizing Maps

- The concept of the self-organizing map (SOM) was first proposed by Kohonen, so it is also called a Kohonen map.
- It is **an unsupervised neural network that learns a low-dimensional, discretized representation** of the **input space of the training samples**. This representation is called a map.
- Because it projects high-dimensional data onto a lower-dimensional map, a SOM is also a method of **dimensionality reduction** that reduces the complexity of the problem.
- A SOM is made up of two layers: the input layer and the output layer.
- The main advantage of using a SOM is that the resulting map makes the data easy to read and understand.
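
The text does not spell out how a SOM is trained, so the following is only a sketch of one commonly used update rule, under stated assumptions: a small rectangular output grid, a Gaussian neighborhood, and linearly decaying learning rate and radius. The function names `train_som` and `map_to_grid` and all default values are hypothetical.

```python
import numpy as np

def train_som(X, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a tiny SOM; all names and defaults here are illustrative."""
    rng = np.random.default_rng(seed)
    # One weight vector per output node, initialized from random data points.
    weights = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    # Fixed (row, col) position of every output node on the 2-D grid.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for epoch in range(epochs):
        # Learning rate and neighborhood radius shrink over time.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 1e-3
        for sample in X[rng.permutation(len(X))]:
            # Best matching unit: the node whose weights are closest to the sample.
            bmu = np.argmin(np.linalg.norm(weights - sample, axis=1))
            # Gaussian neighborhood measured on the grid, centered at the BMU.
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbors toward the sample.
            weights += lr * influence[:, None] * (sample - weights)
    return weights, grid

def map_to_grid(X, weights, grid):
    """Return the grid coordinates of the best matching unit for each sample."""
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return grid[dists.argmin(axis=1)]

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))            # 60 synthetic samples with 4 features
weights, grid = train_som(X)
coords = map_to_grid(X, weights, grid)  # each sample reduced to a 2-D grid position
print(coords[:5])
```

Here the input layer corresponds to the four features of each sample and the output layer to the nine grid nodes, so every 4-dimensional sample ends up at a 2-dimensional map position.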

### Feature selection and Dimensionality reduction

- The number of input variables or features of a dataset is called its **dimensionality**.
- A large number of input features often makes a predictive modeling task more difficult **to model, a problem often called the curse of dimensionality**.
- High-dimensional data can also lead to overfitting, where the model fits the training data too closely and does not generalize well to new data.
- Therefore it is often desirable to reduce the number of input features.
- Reducing the number of input features reduces the dimensionality of the feature space, hence the name "dimensionality reduction".

- Dimensionality reduction is a data preparation/preprocessing technique applied to data before modeling.
- It can be done after data cleaning and data scaling and before training the prediction model.
- There are two main methods of dimensionality reduction: **feature selection and feature extraction.**

### Feature selection

- Feature selection picks a subset of the original features that are relevant to the problem at hand.
- The goal is to reduce the dimensionality of the dataset while preserving the most important features.
- There are many feature selection methods, including filter methods, wrapper methods, and embedded methods.
- Filter methods rank features according to their relationship with the target variable.
- Wrapper methods use the performance of a model as the criterion for selecting features.
- Embedded methods perform feature selection as part of the model training process itself.
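
The three families of feature selection can be tried side by side with scikit-learn. The sketch below uses the bundled Iris dataset; the choice of scoring function, models, and the number of features to keep are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Filter: score each feature against the target with an ANOVA F-test.
filter_sel = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# Wrapper: repeatedly fit a model and drop the weakest feature.
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)

# Embedded: feature importances come out of training the model itself.
embedded_sel = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0)
).fit(X, y)

for name, selector in [("filter", filter_sel),
                       ("wrapper", wrapper_sel),
                       ("embedded", embedded_sel)]:
    # get_support() returns a boolean mask over the original features.
    print(name, selector.get_support())

X_reduced = filter_sel.transform(X)
print(X_reduced.shape)
```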

### Feature Extraction

- Feature extraction creates new features by combining or transforming the original features.
- The aim is to create a set of features that captures the essence of the original data in a lower-dimensional space.
- There are many methods for feature extraction, **including principal component analysis (PCA) and linear discriminant analysis (LDA).**
- Note: Feature selection and feature extraction are two methods used to reduce the number of features.

### Principal Component Analysis

- Principal component analysis (PCA) is an unsupervised learning algorithm used for dimensionality reduction in machine learning.
- It converts the original features into a smaller set of new features.
- The method was introduced by Karl Pearson.
- It works on the condition that when data in a high-dimensional space is mapped to a lower-dimensional space, the variance of the data in the lower-dimensional space should be as large as possible.
- It is a statistical technique that uses an orthogonal transformation to convert observations of correlated features into a set of linearly uncorrelated features.
- These new features are called principal components.
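
As a quick illustration, here is a sketch that reduces the four Iris features to two with scikit-learn's `PCA`. Standardizing the features first and keeping two components are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)             # 150 samples, 4 features
X_scaled = StandardScaler().fit_transform(X)  # put features on a common scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)            # 150 samples, 2 new features

print(X_2d.shape)
# Fraction of the total variance captured by each new feature.
print(pca.explained_variance_ratio_)
```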

#### Some terms used in the PCA algorithm

- **Dimension:** The number of features or variables in the given data. More simply, it is the number of columns in the dataset.
- **Correlation:** Shows how strongly two variables are related to each other, for example whether one variable changes when the other changes. Correlation ranges from -1 to +1.
- **Orthogonal:** Means that the variables are uncorrelated with each other, so the correlation between any pair of them is zero.
- **Eigenvector:** Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v.
- **Covariance Matrix:** A matrix containing the covariance between each pair of variables.
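
These terms fit together in a few lines of NumPy. The sketch below uses synthetic data, and all variable names are assumptions. It builds the covariance matrix, takes its eigenvectors, checks the eigenvector property, and confirms that the projected features are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 3 features, with features 0 and 1 correlated.
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)),
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

Xc = X - X.mean(axis=0)                 # center each feature
cov = np.cov(Xc, rowvar=False)          # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh is meant for symmetric matrices

# Sort eigenpairs from largest to smallest eigenvalue (variance captured).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvector property: cov @ v is a scalar multiple of v.
v, lam = eigvecs[:, 0], eigvals[0]
print(np.allclose(cov @ v, lam * v))

# Project onto the top two eigenvectors; the new features are uncorrelated.
Z = Xc @ eigvecs[:, :2]
print(np.corrcoef(Z, rowvar=False).round(6))
```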