Supervised Learning Algorithms in ML (Machine Learning)

Types of Supervised Learning Algorithms

  • Decision Trees
  • Tree Pruning
  • Rule-based Classification
  • Naïve Bayes
    • Gaussian Naive Bayes
    • Multinomial Naive Bayes
    • Bernoulli Naive Bayes
  • Bayesian Network
  • Support Vector Machines (SVM)
  • k-Nearest Neighbors (k-NN)
  • Ensemble Learning
    • Bagging
    • Boosting
    • Stacking
  • Random Forest Algorithm

1. Decision Trees

  • Decision Trees are a non-linear model used for both classification and regression.
  • They work by recursively splitting the data based on feature values to create a tree-like structure.
  • Each internal node represents a decision based on a feature, and each leaf node represents the output class or regression value.
  • Splits are typically chosen using metrics like Gini impurity or information gain.
  • Example: Consider a dataset of emails labeled as spam or not spam.
  • Decision Trees can be used to classify emails based on features like the presence of certain keywords.
  • The tree might learn that if the word "discount" is present and the email sender is not in the contact list, it is likely spam.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your dataset)
X, y = your_features, your_labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit Decision Tree
dt_classifier = DecisionTreeClassifier()
dt_classifier.fit(X_train, y_train)

# Make predictions
y_pred = dt_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Accuracy: {accuracy}")

2. Tree Pruning

  • Tree pruning is a technique used to optimize Decision Trees and prevent overfitting.
  • It involves removing branches that do not provide significant predictive power.
  • Overfitting occurs when the model learns noise in the training data and performs poorly on new, unseen data.
  • Pruning is often done by setting a maximum depth for the tree or by using algorithms that identify and remove unnecessary branches.

Types of Pruning

Pre-Pruning (Early Stopping)

  • Pre-pruning stops the growth of the decision tree before it becomes too complex.
  • It sets conditions during the tree-building process to decide when to stop adding new branches.
  • Decisions are made during tree construction, which makes it less flexible: branches are cut off prematurely.
  • Computational cost is potentially lower because fewer nodes are considered.

Post-Pruning

  • Post-pruning, also known as cost-complexity pruning or just pruning, involves growing the tree without constraints and then removing branches that do not contribute significantly to the model's performance.
  • It is more flexible, since decisions are made after the tree is fully grown and can be adjusted.
  • Computational cost may be higher, as it involves assessing and modifying a fully grown tree.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your dataset)
X, y = your_features, your_labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit a pre-pruned Decision Tree (max_depth caps tree growth)
pruned_dt_classifier = DecisionTreeClassifier(max_depth=3)
pruned_dt_classifier.fit(X_train, y_train)

# Make predictions
y_pred_pruned = pruned_dt_classifier.predict(X_test)

# Evaluate accuracy
accuracy_pruned = accuracy_score(y_test, y_pred_pruned)
print(f"Pruned Decision Tree Accuracy: {accuracy_pruned}")
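The example above uses max_depth, which is a form of pre-pruning. For post-pruning, scikit-learn supports minimal cost-complexity pruning through the ccp_alpha parameter. A minimal sketch, reusing the placeholder X_train/y_train split from above (the alpha chosen here is arbitrary; in practice you would select it by validation):

from sklearn.tree import DecisionTreeClassifier

# Compute the pruning path from a fully grown tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha values prune more branches; pick one from the path
# (an arbitrary mid-range value here; choose via validation in practice)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post_pruned_dt = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
post_pruned_dt.fit(X_train, y_train)
print(f"Post-pruned tree depth: {post_pruned_dt.get_depth()}")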

3. Rule-based Classification

  • Rule-based classification involves using a set of if-else statements to make decisions.
  • Each rule consists of conditions based on input features, leading to a specific class or outcome.
  • Rules are derived from analyzing the relationships between input features and the target variable.
  • It's an interpretable approach but may struggle with complex relationships in data.
  • Example: In a rule-based system for loan approval, a rule might be: "If income > $50,000 and credit score > 700, approve the loan; otherwise, deny the loan."
# Example rule-based classification using if-else statements

def rule_based_classifier(features):
    if features[0] > 5 and features[1] < 10:
        return 'Class A'
    else:
        return 'Class B'

# Sample data (replace with your dataset)
sample_data = [6, 8]

# Make a prediction using the rule-based classifier
prediction = rule_based_classifier(sample_data)
print(f"Rule-based Classification Prediction: {prediction}")

4. Naïve Bayes

  • Naïve Bayes is a probabilistic classification algorithm derived from Bayes' theorem.
  • Naïve Bayes is versatile, suitable for both binary (two-class) and multiclass classification problems.
  • Known for its simplicity, efficiency, and effectiveness in dealing with high-dimensional data.
  • It assumes that features are conditionally independent given the class.
  • The algorithm calculates the probability of a class given the input features and selects the class with the highest probability.
  • Despite its "naïve" assumption, Naïve Bayes often performs well and is computationally efficient.

Example: Medical Diagnosis

Suppose we want to predict whether a person has a particular medical condition based on two symptoms: high fever and persistent cough. The sketch below follows the same placeholder pattern as the earlier examples.
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your dataset)
X, y = your_features, your_labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit Naïve Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Make predictions
y_pred_nb = nb_classifier.predict(X_test)

# Evaluate accuracy
accuracy_nb = accuracy_score(y_test, y_pred_nb)
print(f"Naïve Bayes Accuracy: {accuracy_nb}")

Types of Naïve Bayes

Gaussian Naive Bayes

Gaussian Naive Bayes is used when the features (attributes or variables) in the dataset are continuous and have a Gaussian (normal) distribution.

Multinomial Naive Bayes

  • Multinomial Naive Bayes is suitable when the features represent the frequency of occurrences of different events, with each feature a count (a non-negative integer).
  • It is commonly used in text classification tasks, as in the sketch below.
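A hedged sketch of the typical text-classification use, with a hypothetical two-sentence corpus and word counts as features:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; word counts are the features
texts = ["win a free prize now", "meeting agenda for monday"]
labels = ["spam", "ham"]

vectorizer = CountVectorizer()  # produces non-negative integer counts
X_counts = vectorizer.fit_transform(texts)

mnb = MultinomialNB()
mnb.fit(X_counts, labels)
print(mnb.predict(vectorizer.transform(["free prize today"])))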

Bernoulli Naive Bayes

  • Bernoulli Naive Bayes is used when the features are binary (0 or 1), representing the presence or absence of a particular characteristic.
  • Example: Bernoulli Naive Bayes can classify emails as spam or not based on the binary occurrence of keywords, as in the sketch below.
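A companion sketch, assuming the same hypothetical toy corpus as above, but with binary presence/absence features instead of counts:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

texts = ["win a free prize now", "meeting agenda for monday"]
labels = ["spam", "ham"]

# binary=True records word presence/absence (0 or 1) rather than counts
vectorizer = CountVectorizer(binary=True)
X_binary = vectorizer.fit_transform(texts)

bnb = BernoulliNB()
bnb.fit(X_binary, labels)
print(bnb.predict(vectorizer.transform(["free prize"])))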

5. Bayesian Network

  • A Bayesian Network is a graphical model that represents probabilistic relationships among a set of variables.
  • Nodes in the graph represent variables, and edges represent dependencies.
  • The network is built based on conditional dependencies between variables.
  • Inference involves using observed evidence to update probabilities of other variables in the network.
from pgmpy.models import BayesianModel  # renamed BayesianNetwork in newer pgmpy versions
from pgmpy.estimators import ParameterEstimator
from pgmpy.inference import VariableElimination

# Define the structure of the Bayesian Network: A -> B <- C
model = BayesianModel([('A', 'B'), ('C', 'B')])

# Sample data (replace with your dataset; needs columns 'A', 'B', 'C')
data = your_data

# Fit the model to the data (estimates the conditional probability tables)
model.fit(data)

# Inspect the observed state counts for variable B
pe = ParameterEstimator(model, data)
print(pe.state_counts('B'))

# Perform inference: P(B | A=1)
infer = VariableElimination(model)
result = infer.query(variables=['B'], evidence={'A': 1})
print(result)

6. Support Vector Machines (SVM)

(For intuition)
Imagine you have a bunch of dots on a piece of paper, and these dots can be of two types, say red dots and blue dots.
Now you want to draw a line (not necessarily straight) that creates the largest possible gap between the red dots and the blue dots.
  • SVM is like finding the best line that separates different groups of dots. This line is called a hyperplane.
  • Support Vector Machines are linear classifiers that find the optimal hyperplane to separate classes.
  • They work well in high-dimensional spaces.
  • SVM aims to find the hyperplane that maximizes the margin between classes.
  • The kernel trick allows SVM to handle non-linear decision boundaries by transforming input features into a higher-dimensional space.
  • SVM can be used for image classification, text categorization, etc.
Example: For classifying handwritten digits, SVM might find the optimal hyperplane that best separates different digits in a high-dimensional space. A minimal code sketch follows.
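Following the pattern of the earlier examples, a minimal scikit-learn sketch (your_features and your_labels are placeholders, as before):

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your dataset)
X, y = your_features, your_labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit an SVM classifier (linear kernel; see the kernel discussion below)
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)

# Make predictions
y_pred_svm = svm_classifier.predict(X_test)

# Evaluate accuracy
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"SVM Accuracy: {accuracy_svm}")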

Types of SVM

  • Linear SVM
  • Non-linear SVM

Linear SVM

  • Linear SVM aims to find a straight line (hyperplane) that best separates the classes in the input feature space.
  • Example: Classifying emails as spam or not spam based on features like word frequencies.

Non-linear SVM

  • Non-linear SVM handles complex relationships by transforming input features into a higher-dimensional space, allowing for non-linear decision boundaries.
  • Example: Classifying handwritten digits based on pixel values, where a simple straight line wouldn't be enough; see the sketch below.
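To make the kernel trick concrete, a hedged sketch comparing a linear and an RBF kernel on scikit-learn's make_moons dataset, used here as a stand-in for any data no straight line can separate:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: no straight line separates the classes well
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The RBF kernel implicitly maps points into a higher-dimensional space
linear_svm = SVC(kernel='linear').fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf').fit(X_train, y_train)

print(f"Linear kernel accuracy: {linear_svm.score(X_test, y_test)}")
print(f"RBF kernel accuracy: {rbf_svm.score(X_test, y_test)}")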

7. k-Nearest Neighbors (k-NN)

  • k-Nearest Neighbors is a simple supervised learning algorithm that classifies data points based on the majority class of their k nearest neighbors.
  • k-NN (k-Nearest Neighbors) does not make any assumptions about the underlying data distribution.
  • It is also called a lazy learner algorithm because it does not learn from the training set immediately.
  • Example: In a k-NN classifier for predicting movie genres, the algorithm classifies a movie based on the genres of its k nearest neighbors in a feature space built from attributes such as viewer ratings.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your dataset)
X, y = your_features, your_labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit k-NN classifier
knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X_train, y_train)

# Make predictions
y_pred_knn = knn_classifier.predict(X_test)

# Evaluate accuracy
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print(f"k-NN Accuracy: {accuracy_knn}")

8. Ensemble Learning

  • Ensemble Learning combines multiple models to improve overall performance and robustness.
  • Voting classifiers combine predictions from multiple models to make a final prediction.
  • Bagging (used in Random Forest) trains multiple instances of the same model on different subsets of the data to reduce overfitting.
  • Example: Stock market prediction ensemble combines models (e.g., Decision Trees, SVMs) for increased accuracy and robustness.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your dataset)
X, y = your_features, your_labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create individual classifiers
logreg = LogisticRegression()
svm = SVC()
rf = RandomForestClassifier()

# Create an ensemble using a Voting Classifier
ensemble_classifier = VotingClassifier(estimators=[('logreg', logreg), ('svm', svm), ('rf', rf)])

# Fit ensemble classifier
ensemble_classifier.fit(X_train, y_train)

# Make predictions
y_pred_ensemble = ensemble_classifier.predict(X_test)

# Evaluate accuracy
accuracy_ensemble = accuracy_score(y_test, y_pred_ensemble)
print(f"Ensemble Accuracy: {accuracy_ensemble}")

Types of Ensemble Methods

  1. Bagging
  2. Boosting
  3. Stacking

Bagging (Bootstrap Aggregating)

  • Bagging involves training multiple instances of the same base model on different subsets of the training data, generated through random sampling with replacement (bootstrap samples).
  • Example: Random Forest; a minimal sketch follows.
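A minimal sketch using scikit-learn's BaggingClassifier with decision trees as the base model, following the placeholder-data pattern of the earlier examples:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Sample data (replace with your dataset)
X, y = your_features, your_labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each of the 10 trees is trained on a bootstrap sample of the training data
# (the parameter is named base_estimator in scikit-learn versions before 1.2)
bagging_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10, random_state=42)
bagging_classifier.fit(X_train, y_train)
print(f"Bagging Accuracy: {bagging_classifier.score(X_test, y_test)}")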

Boosting

  • Boosting focuses on sequentially training multiple weak learners, giving more weight to instances that the previous models misclassified.
  • This iterative process aims to correct errors and improve overall model accuracy.
  • Example: AdaBoost (Adaptive Boosting), sketched below.
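A minimal AdaBoost sketch in the same pattern; by default, scikit-learn's AdaBoostClassifier boosts shallow decision trees (stumps):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Sample data (replace with your dataset)
X, y = your_features, your_labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each new weak learner upweights the samples the previous ones misclassified
ada_classifier = AdaBoostClassifier(n_estimators=50, random_state=42)
ada_classifier.fit(X_train, y_train)
print(f"AdaBoost Accuracy: {ada_classifier.score(X_test, y_test)}")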

Stacking

  • Stacking involves training multiple diverse models, referred to as base models, and then combining their predictions using a meta-model.
  • The meta-model is trained on the outputs of the base models, allowing for a higher-level understanding of the data.
  • Example: a stacked ensemble, as sketched below.
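A minimal sketch using scikit-learn's StackingClassifier, with an SVM and a random forest as base models and logistic regression as the meta-model:

from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Sample data (replace with your dataset)
X, y = your_features, your_labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base models make predictions; the meta-model learns how to combine them
stacking_classifier = StackingClassifier(
    estimators=[('svm', SVC()), ('rf', RandomForestClassifier())],
    final_estimator=LogisticRegression()
)
stacking_classifier.fit(X_train, y_train)
print(f"Stacking Accuracy: {stacking_classifier.score(X_test, y_test)}")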

9. Random Forest Algorithm

  • Random Forest is an ensemble learning algorithm that combines the predictions of multiple decision trees to enhance overall performance and reduce the risk of overfitting.
  • It operates through bagging (Bootstrap Aggregating) and introduces an additional layer of randomness by considering a random subset of features at each split, making it robust and accurate in various applications.
  • It is often quick to train relative to other ensemble methods, since its trees can be built independently.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your dataset)
X, y = your_features, your_labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit Random Forest classifier
rf_classifier = RandomForestClassifier()
rf_classifier.fit(X_train, y_train)

# Make predictions
y_pred_rf = rf_classifier.predict(X_test)

# Evaluate accuracy
accuracy_rf = accuracy_score(y_test, y_pred_rf)
print(f"Random Forest Accuracy: {accuracy_rf}")