For roughly the past fifty years, programmers everywhere could rely on Moore's law to deliver ever faster computers year after year. They therefore had little incentive to optimize their programs for performance.

Unfortunately, that trend could not last forever. As Herb Sutter noted in his well-known article ``The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software'', chip makers are slowly running into the hard physical limits on die size and power dissipation of their chips.

To keep offering faster computers, chip designers have changed direction: instead of faster individual cores, they put more of them on each chip. Hence the dual-core, quad-core, and other multicore processors that equip most new computers. The trend is pushed even further in Graphics Processing Units, whose many cores are designed for massive parallelism.

For programmers eager to exploit the full potential of current computers, this new design spells a complete shift in the way they develop programs. Writing parallel programs is now the key. In this course, we will review the fundamentals of parallel programming; we will then consider several frameworks for implementing parallel programs: MPI for distributed-memory programming; OpenMP, the C++11 Standard Thread Library, and Intel TBB for shared-memory programming; and OpenCL and CUDA for GPGPU programming.
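As a first taste of what shared-memory parallelism looks like, here is a minimal sketch using the C++11 Standard Thread Library mentioned above. It sums a large array by giving each thread its own slice and its own partial result; the work-splitting scheme and all names are purely illustrative, not part of any particular framework's API.

\begin{verbatim}
// Minimal shared-memory example with C++11 std::thread (illustrative only).
#include <algorithm>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1000000;
    std::vector<double> data(n, 1.0);

    // Use one thread per hardware core (at least one if the query fails).
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(nthreads, 0.0);  // one slot per thread: no locking needed
    std::vector<std::thread> workers;

    // Each thread sums its own contiguous slice of the data.
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.push_back(std::thread([&, t]() {
            const std::size_t begin = t * n / nthreads;
            const std::size_t end   = (t + 1) * n / nthreads;
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        }));
    }
    for (auto& w : workers) w.join();            // wait for all threads to finish

    const double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::cout << "sum = " << total << std::endl; // expect 1000000
    return 0;
}
\end{verbatim}

Even this tiny example already raises the questions the course will address in depth: how to divide the work, how to avoid data races, and when the overhead of creating and joining threads pays off.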