Artificial Neural Networks in Machine Learning

What are Artificial Neural Networks?

  • The term "Artificial Neural Network" comes from replicating the structure of the human brain, which is composed of biological neural networks.
  • Artificial Neural Networks (ANNs) are computational models inspired by the human brain's structure and functioning.
  • They consist of interconnected nodes, or artificial neurons, organized into layers.
  • ANNs are used for tasks such as pattern recognition, classification, and regression.

Hebbian Network (Hebb Net)

  • Hebbian learning is a principle stating that if two neurons are activated simultaneously, the connection between them strengthens.
  • Hebbian Networks, or Hebb Nets, implement this learning rule.
  • The Hebbian rule was the first neural-network learning rule, proposed by Donald Hebb in 1949 as a learning algorithm for unsupervised neural networks.
  • It is used for pattern classification. It is a single-layer neural network, i.e. it has one input layer and one output layer.

Learning Rule:

"Hebb's Rule" - "Cells that fire together wire together."

Example:

In a Hebb Net, if neuron A and neuron B frequently fire together, the connection between them strengthens.
Over time, this reinforces associations between activated neurons.
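
A minimal sketch of the Hebbian update (weights grow when input and output are active together), written here in Python with NumPy; the AND-pattern data and variable names are purely illustrative, not from the original notes.

  import numpy as np

  def hebbian_update(weights, x, y, lr=1.0):
      # Hebb's rule: strengthen weights when input x and output y fire together.
      return weights + lr * x * y

  # Toy example: learn the AND pattern with bipolar (+1/-1) inputs and targets.
  inputs = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
  targets = np.array([1, -1, -1, -1])
  w = np.zeros(2)
  b = 0.0
  for x, y in zip(inputs, targets):
      w = hebbian_update(w, x, y)
      b = b + y                     # bias is updated with the target alone
  print(w, b)                       # learned weights reflect the association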

Perceptron

  • The perceptron serves as a fundamental component of an Artificial Neural Network.
  • A Perceptron is the simplest form of a neural network.
  • It takes multiple binary inputs, applies weights to them, sums the results, and passes the weighted sum through a threshold (step) function to produce a binary output.
  • The Perceptron is considered a single-layer neural network consisting of four main components: inputs, weights and bias, a weighted-sum (net input) function, and an activation function.

Example:

  • In a Perceptron for cancer prediction, inputs could represent factors like tumor size and age.
  • The network learns to predict cancer (output: 1) or no cancer (output: 0) based on weighted features.
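
A minimal Perceptron sketch in Python with NumPy; the feature values below (tumor size, age) are made-up placeholders for illustration, not a real cancer dataset.

  import numpy as np

  def step(z):
      return 1 if z >= 0 else 0

  # Illustrative data: [tumor size (cm), age (decades)] -> 1 = cancer, 0 = no cancer.
  X = np.array([[4.5, 6.8], [1.0, 3.2], [3.9, 7.1], [0.8, 2.5]])
  y = np.array([1, 0, 1, 0])

  w = np.zeros(X.shape[1])
  b = 0.0
  lr = 0.1
  for _ in range(10):                                 # a few passes over the data
      for xi, target in zip(X, y):
          error = target - step(np.dot(w, xi) + b)    # perceptron update rule
          w += lr * error * xi
          b += lr * error
  print([step(np.dot(w, xi) + b) for xi in X])        # predictions after training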

Adaline (Adaptive Linear Neuron)

  • Adaline is an improvement over the Perceptron: it uses a continuous (linear) activation of the net input, so it can learn from the magnitude of the error rather than only from misclassifications.
  • It was created by Professor Bernard Widrow and his student Ted Hoff at Stanford University in 1960.
  • It employs gradient descent to minimize the cost function, adjusting weights iteratively.

Components:

  • Inputs and Weights: Similar to a Perceptron.
  • Summation Function: Calculates the weighted sum of inputs.
  • Activation Function: Uses the weighted sum to produce the output.

Example:

In stock market prediction, Adaline could learn to predict stock prices by adjusting weights based on historical data and minimizing prediction errors.
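
A minimal Adaline sketch, assuming batch gradient descent on the mean squared error; the data is synthetic and only meant to show the learning rule.

  import numpy as np

  # Synthetic data: y is roughly a linear function of the inputs plus noise.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 2))
  y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

  w = np.zeros(2)
  b = 0.0
  lr = 0.05
  for _ in range(200):
      output = X @ w + b                  # linear (continuous) activation
      errors = y - output
      w += lr * X.T @ errors / len(y)     # gradient descent on the MSE
      b += lr * errors.mean()
  print(w, b)                             # weights approach [3, -2]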

Multilayer Neural Network

  • A Multilayer Perceptron, also known as an Artificial Neural Network, consists of more than one perceptron grouped together to form a multi-layer neural network.
  • A Multilayer Neural Network consists of multiple layers of neurons, including input, hidden, and output layers.
  • Feedforward Neural Networks (FNN), also known as Multi-layer Perceptrons (MLP), are artificial neural networks designed to process information in a forward direction, from input to output.
  • A multi-layer perceptron model is effective for tackling complex non-linear problems.
  • The presence of hidden layers allows these networks to learn complex relationships and patterns.

Components:

  • Input Layer: Receives input features.
  • Hidden Layers: Process information, capturing intricate patterns.
  • Output Layer: Produces final predictions.

Example:

In natural language processing, a Multilayer Neural Network could learn to understand the sentiment of text by processing word embeddings in the input layer and capturing nuanced relationships in hidden layers.
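
A minimal sketch of one forward pass through a small multilayer network in NumPy; the layer sizes and random weights are arbitrary placeholders, chosen only to show how information flows from input to hidden to output layer.

  import numpy as np

  def relu(z):
      return np.maximum(0, z)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  rng = np.random.default_rng(42)
  x = rng.normal(size=4)                              # input layer: 4 features
  W1 = rng.normal(size=(4, 8)); b1 = np.zeros(8)      # hidden layer: 8 neurons
  W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)      # output layer: 1 neuron

  hidden = relu(x @ W1 + b1)          # hidden layer captures intermediate patterns
  output = sigmoid(hidden @ W2 + b2)  # output layer produces the prediction
  print(output)                       # a probability-like score in (0, 1)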

Architecture

The architecture of a neural network refers to its structure, including the number of layers, the number of neurons in each layer, and the connections between them.

Input Layer

  • Role: Receives Features.
  • The input layer is the first layer of the neural network, receiving the raw features or input data.
  • Each neuron in this layer represents a specific feature.

Hidden Layers

  • Role: Process Information.
  • Hidden layers come between the input and output layers and are where the neural network learns complex patterns and relationships within the data.
  • Each neuron in a hidden layer processes information based on learned weights and activation functions.

Output Layer

  • Role: Produces Predictions.
  • The output layer produces the final predictions or outcomes based on the processed information from the hidden layers.
  • The number of neurons in this layer depends on the nature of the task (e.g., binary classification, multi-class classification, regression).

Connections (Weights)

  • Role: Determine Signal Strength.
  • Connections between neurons are represented by weights.
  • These weights determine the strength of the signal between neurons.
  • During training, the neural network adjusts these weights to minimize the difference between predicted and actual outputs.

Example:

In a recommendation system, the architecture could involve an input layer for user preferences, multiple hidden layers capturing complex preferences, and an output layer suggesting relevant items.
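
To make the layer structure concrete, a sketch using scikit-learn's MLPClassifier: hidden_layer_sizes controls the hidden layers, while input and output sizes are inferred from the data. The toy "preference" data is made up for illustration.

  import numpy as np
  from sklearn.neural_network import MLPClassifier

  # Toy data: 6 preference features per user, binary label (relevant item or not).
  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 6))                   # input layer: 6 neurons (one per feature)
  y = (X[:, 0] + X[:, 1] > 0).astype(int)         # output: one unit for the binary label

  model = MLPClassifier(hidden_layer_sizes=(16, 8),   # two hidden layers: 16 and 8 neurons
                        max_iter=1000, random_state=0)
  model.fit(X, y)
  print([w.shape for w in model.coefs_])          # weight matrices between consecutive layers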

Activation Functions

  • Activation functions bring non-linearity to neural networks, allowing them to learn complex patterns and relationships in data.
  • An activation function acts as a gate, checking if an incoming value surpasses a specific threshold.
  • Activation functions are also known as transfer functions.
  • These functions determine the output of a neuron or a layer and play a crucial role in the learning process.
  • Sigmoid Function: Squashes input values between 0 and 1.
  • Example: Used in the output layer of binary classification models.
  • Hyperbolic Tangent (tanh): Similar to the sigmoid but squashes values between -1 and 1.
  • Example: Often used in hidden layers of neural networks.
  • Rectified Linear Unit (ReLU): Outputs the input for positive values; zero for negative values.
  • Example: Widely used in hidden layers due to simplicity and effectiveness.

Example:

In a neural network for image recognition, ReLU activation functions in hidden layers help the model learn intricate patterns in pixel values.
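
Minimal NumPy definitions of the three activation functions described above; the sample input vector is arbitrary and only shows how each function reshapes values.

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

  def tanh(z):
      return np.tanh(z)                 # squashes values into (-1, 1)

  def relu(z):
      return np.maximum(0, z)           # passes positives through, zeroes out negatives

  z = np.array([-2.0, 0.0, 2.0])
  print(sigmoid(z), tanh(z), relu(z))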

Loss Function

The loss function measures the difference between the predicted output and the actual target.
The goal during training is to minimize this loss. Various tasks may necessitate the use of different loss functions.

Types of Loss Functions

  • Mean Squared Error (MSE): Calculates the average squared difference between predicted and actual values.
  • Example: Regression tasks where precise numeric predictions are crucial.
  • Binary Cross-Entropy: Measures the error in binary classification.
  • Example: Used when the output is binary, such as spam or not-spam classification.
  • Categorical Cross-Entropy: Extends binary cross-entropy to multi-class classification.
  • Example: Classifying images into multiple categories.

Example:

In a sentiment analysis model, binary cross-entropy loss is suitable as it penalizes deviations from correct sentiment labels.
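
Minimal NumPy sketches of the three loss functions above; a small epsilon guards against log(0), and the sample arrays are illustrative only.

  import numpy as np

  def mse(y_true, y_pred):
      # Average squared difference between targets and predictions.
      return np.mean((y_true - y_pred) ** 2)

  def binary_cross_entropy(y_true, y_pred, eps=1e-12):
      y_pred = np.clip(y_pred, eps, 1 - eps)
      return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

  def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
      # y_true is one-hot, y_pred holds per-class probabilities for each sample.
      return -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1))

  print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))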

Hyperparameters

  • Hyperparameters are configuration settings external to the model that impact its learning process.
  • These settings are not learned from the data but need to be set beforehand.
  • random_state: Sets the seed of the random number generator used for weight and bias initialization, making results reproducible.
  • Learning Rate: Determines the step size during gradient descent.
  • Batch Size: Specifies the number of training samples used in one iteration.
  • Activation Function: The activation used in the hidden layers, e.g. identity, logistic, tanh, or relu.
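
How these hyperparameters appear in practice, sketched with scikit-learn's MLPClassifier; the specific values below are arbitrary examples, not recommendations.

  from sklearn.neural_network import MLPClassifier

  model = MLPClassifier(
      hidden_layer_sizes=(32,),   # one hidden layer with 32 neurons
      activation="relu",          # activation function for the hidden layer
      learning_rate_init=0.001,   # learning rate: step size for gradient descent
      batch_size=64,              # number of samples used per training iteration
      random_state=42,            # seed for weight and bias initialization
      max_iter=300,
  )
  # model.fit(X_train, y_train) would then train it on prepared data.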

Gradient Descent

  • Gradient Descent is a fundamental optimization algorithm used in machine learning and deep learning to iteratively train models by minimizing the error or cost function.
  • It operates by adjusting the model's parameters, often represented as weights, in the direction that reduces the error.
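
A minimal gradient-descent sketch that minimizes the simple cost f(w) = (w - 3)^2, whose gradient is 2(w - 3); the function is chosen purely for illustration.

  def gradient(w):
      return 2 * (w - 3)          # derivative of (w - 3)^2

  w = 0.0
  lr = 0.1
  for _ in range(100):
      w -= lr * gradient(w)       # step in the direction that reduces the cost
  print(w)                        # approaches 3, the minimizer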

Key Concepts

Stochastic Gradient Descent (SGD)

  • SGD is a variant of gradient descent in which parameters are updated using a single training sample (or, in mini-batch SGD, a small subset of the data) rather than the entire dataset.
  • This approach reduces computational requirements and is particularly useful for large datasets.
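
A minimal mini-batch SGD sketch for a linear model; the synthetic data, batch size, and learning rate are arbitrary and only illustrate that each update uses a small subset of the data.

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.normal(size=(1000, 3))
  y = X @ np.array([1.5, -0.5, 2.0]) + rng.normal(scale=0.1, size=1000)

  w = np.zeros(3)
  lr, batch_size = 0.05, 32
  for _ in range(20):                               # epochs
      idx = rng.permutation(len(y))
      for start in range(0, len(y), batch_size):
          batch = idx[start:start + batch_size]     # update from a mini-batch only
          errors = y[batch] - X[batch] @ w
          w += lr * X[batch].T @ errors / len(batch)
  print(w)                                          # approaches [1.5, -0.5, 2.0]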

Gradient Ascent

  • While gradient descent minimizes the cost function, gradient ascent maximizes a function.
  • It's used in scenarios where the goal is to find the maximum, such as maximizing likelihood in probabilistic models.
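
A minimal sketch mirroring the gradient-descent example: to maximize f(w) = -(w - 3)^2, step with the gradient instead of against it.

  def gradient(w):
      return -2 * (w - 3)         # derivative of -(w - 3)^2

  w = 0.0
  lr = 0.1
  for _ in range(100):
      w += lr * gradient(w)       # ascend: move with the gradient, not against it
  print(w)                        # approaches 3, the maximizer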

Conclusion

  • Artificial Neural Networks (ANNs) mimic the brain's structure for tasks like pattern recognition.
  • They evolve from basic components like Perceptrons to more sophisticated Multilayer Neural Networks with input, hidden, and output layers.