Parallel Processing Architecture, Approaches, and Laws

What is Parallel Processing?

  • Parallel processing involves the simultaneous execution of multiple tasks to achieve faster and more efficient computation.
  • Various hardware architectures have been developed to support parallel processing, each with its strengths and limitations.

Shared Memory Architecture

  • In shared memory architecture, multiple processors share a common memory space.
  • This allows for easy communication between processors as they can directly access shared data.
  • However, managing concurrent access to shared memory requires careful synchronization to avoid conflicts and ensure data consistency.
  • Example: Symmetric Multiprocessing (SMP) systems, where multiple processors share a common pool of RAM (a minimal sketch follows this list).
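
A minimal shared-memory sketch in Python (the thread count, the `worker` function, and the counter are illustrative, not taken from any particular system): several threads update one counter that lives in a single address space, and a lock provides the careful synchronization the bullets above call for.

```python
import threading

counter = 0              # shared data: every thread sees the same memory
lock = threading.Lock()  # guards concurrent access to the counter

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:       # one thread at a time, so no updates are lost
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 40000; without the lock, updates could be lost
```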

Distributed Memory Architecture

  • In distributed memory architecture, each processor has its own local memory.
  • Processors communicate by exchanging messages, and data must be explicitly transferred between processors.
  • While this architecture can scale well, efficient communication becomes crucial for optimal performance.
  • Example: Cluster computing, where individual nodes have their own memory and communicate via a network (a message-passing sketch follows this list).
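
A toy message-passing sketch (the `node` function and the data are hypothetical, and real clusters typically use a library such as MPI): each process has its own memory, so data must be shipped explicitly over a channel.

```python
from multiprocessing import Pipe, Process

def node(conn):
    data = conn.recv()    # nothing is shared: the data arrives as a message
    conn.send(sum(data))  # send a partial result back
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=node, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3, 4])  # explicit transfer to the other "node"
    print(parent_conn.recv())       # 10
    p.join()
```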

SIMD and MIMD Architectures

  • SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, Multiple Data) architectures define how instructions are executed across processors.
  • In SIMD, all processors execute the same instruction simultaneously on different data.
  • In MIMD, each processor can execute a different set of instructions independently.
  • Example: SIMD is used in graphics processing units (GPUs), where a single instruction is applied to multiple pixels simultaneously.
  • MIMD is exemplified by modern multi-core processors; the sketch below contrasts the two styles.
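
A rough contrast of the two styles in Python, under the assumption that NumPy's vectorized arithmetic stands in for SIMD hardware and separate processes stand in for MIMD cores (`brighten` and `checksum` are made-up workloads):

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

# SIMD-flavoured: one instruction (add 5) applied to many data elements.
pixels = np.array([10, 20, 30, 40])
brighter = pixels + 5  # -> [15 25 35 45], computed elementwise

# MIMD-flavoured: independent workers running different instruction streams.
def brighten(x):
    return x + 5

def checksum(xs):
    return sum(xs)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        a = pool.submit(brighten, 21)         # one core runs brighten...
        b = pool.submit(checksum, [1, 2, 3])  # ...another runs checksum
        print(brighter, a.result(), b.result())
```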

What is Parallelization?

Parallelization is the technique of dividing a task into smaller, independent parts that can be executed simultaneously.

Approaches to Parallel Processing

Task Parallelism

  • Task parallelism involves dividing a program into smaller tasks that can be executed concurrently.
  • Each task is independent, and their execution does not affect each other.
  • This approach is well-suited for applications with naturally occurring parallelism.
  • Example: Parallelizing a video encoding program by processing different frames simultaneously (see the sketch below).
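
A task-parallel sketch loosely modelled on the video example (`encode_frame` is a hypothetical stand-in for real per-frame work): each frame is an independent task, so a process pool can run them concurrently.

```python
from concurrent.futures import ProcessPoolExecutor

def encode_frame(frame_id):
    # placeholder for CPU-heavy, independent per-frame work
    return f"frame {frame_id} encoded"

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for result in pool.map(encode_frame, range(8)):
            print(result)
```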

Data Parallelism

  • Data parallelism involves distributing the data across multiple processors, and each processor performs the same operation on its assigned portion of the data.
  • This approach is effective when the same operation needs to be applied to a large dataset.
  • Example: Parallelizing matrix multiplication by distributing rows or columns across different processors (see the sketch below).
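
A data-parallel sketch of the matrix example (pure-Python arithmetic, with illustrative names like `row_times_B`): every worker applies the same operation, one row of A times the whole of B, to its own slice of the data.

```python
from multiprocessing import Pool

B = [[1, 0], [0, 1]]  # read-only operand, available to every worker

def row_times_B(row):
    # multiply a single row of A by each column of B
    return [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]

if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    with Pool() as pool:
        C = pool.map(row_times_B, A)  # rows of A distributed across workers
    print(C)  # [[1, 2], [3, 4], [5, 6]] since B is the identity matrix
```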

Laws of Caution in Parallel Processing

Amdahl's Law

  • Amdahl's Law acts like a speed limit for parallelization: even with many processors working together, the overall speedup of a program is limited by the part that cannot be split into parallel tasks.
  • Example: Imagine a task where 90% can be done in parallel but 10% must be done sequentially. According to Amdahl's Law, even with infinite processors, the maximum speedup is 10 times, because that 10% sequential part sets the upper limit.
  • In other words, imagine a job you can divide into parts. Some parts can be done at the same time by different workers, but one part has to be done by a single person alone. No matter how many workers you have, the total speedup of the job is limited by that one-person part (a worked version of the formula follows).
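
The usual statement of the law is speedup(n) = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the number of processors. A quick computation with the 90%/10% figures from the example confirms the 10x ceiling:

```python
def amdahl_speedup(p, n):
    # p: fraction of the work that parallelizes; n: processor count
    return 1.0 / ((1.0 - p) + p / n)

for n in (1, 2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
# 1 -> 1.0, 2 -> 1.82, 4 -> 3.08, 16 -> 6.4, 1024 -> 9.91
# The values creep toward, but never pass, 1 / (1 - 0.9) = 10.
```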

Gustafson's Law

  • Gustafson's Law is like a positive twist on Amdahl's Law.
  • It says that as the size of the problem (or data) increases, the impact of the sequential part becomes less significant.
  • In simpler terms, when you have more work to do, parallelization can have a bigger effect on overall performance.
  • Example: Think of big data analytics. Even if a fixed part must be done sequentially, parallelization becomes more effective as the dataset grows, so the overall speedup on a large dataset can be substantial (a worked version of the formula follows).
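
Gustafson's Law is usually written as S(n) = n - (n - 1) * s, where s is the sequential fraction of the scaled workload. A quick computation (using s = 0.1 to mirror the Amdahl example) shows near-linear growth instead of a hard ceiling:

```python
def gustafson_speedup(s, n):
    # s: sequential fraction of the scaled workload; n: processor count
    return n - (n - 1) * s

for n in (1, 2, 4, 16, 1024):
    print(n, round(gustafson_speedup(0.1, n), 2))
# 1 -> 1.0, 2 -> 1.9, 4 -> 3.7, 16 -> 14.5, 1024 -> 921.7
```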

Moore's Law Limitations

  • Moore's Law is the observation that the number of transistors (the tiny switches on a computer chip) doubles roughly every two years, making computers more powerful.
  • But, as we get closer to the physical limits of technology, it becomes harder to keep doubling the power.
  • Consider your smartphone. In the past, new models used to have significantly more powerful chips every couple of years.
  • But now, it's getting challenging to keep doubling the number of transistors on a chip because we're reaching the limits of how small we can make them.

Conclusion

Understanding hardware architectures, programming approaches, and the scaling laws above is essential for effective parallel programming.