[Coursera] Heterogeneous Parallel Programming
Wen-mei W. Hwu (University of Illinois)

Folder: coursera-heterogeneous-parallel-programming (87 files)
assignments/README.txt 0.24kB
lectures/week1/Heterogeneous Parallel Programming 0.0 1.1 Course Overview.mp4 127.30MB
lectures/week1/Heterogeneous Parallel Programming 0.1 1.2 Introduction to Heterogeneous Parallel Computing.mp4 77.82MB
lectures/week1/Heterogeneous Parallel Programming 0.2 1.3 Portability and Scalability in Heterogeneous Parallel Computing.mp4 34.90MB
lectures/week1/Heterogeneous Parallel Programming 0.3 1.4 Introduction to CUDA Data Parallelism and Threads.mp4 128.64MB
lectures/week1/Heterogeneous Parallel Programming 0.4 1.5 Introduction to CUDA Memory Allocation and Data Movement API.mp4 118.09MB
lectures/week1/Heterogeneous Parallel Programming 0.5 1.6 Introduction to CUDA Kernel-Based SPMD Parallel Programming.mp4 111.87MB
lectures/week1/Heterogeneous Parallel Programming 0.6 1.7 Kernel-based Parallel Programming Multidimensional Kernel Configuration.mp4 94.58MB
lectures/week1/Heterogeneous Parallel Programming 0.7 1.8 Kernel-based Parallel Programming Basic Matrix-Matrix Multiplication.mp4 98.36MB
lectures/week2/Heterogeneous Parallel Programming 1.0 2.1 Kernel-based Parallel Programming - Thread Scheduling.mp4 117.27MB
lectures/week2/Heterogeneous Parallel Programming 1.1 2.2 Control Divergence.mp4 86.63MB
lectures/week2/Heterogeneous Parallel Programming 1.2 2.3 Memory Model and Locality -- CUDA Memories.mp4 129.66MB
lectures/week2/Heterogeneous Parallel Programming 1.3 2.4 Tiled Parallel Algorithms.mp4 112.01MB
lectures/week2/Heterogeneous Parallel Programming 1.4 2.5 Tiled Matrix Multiplication.mp4 124.82MB
lectures/week2/Heterogeneous Parallel Programming 1.5 2.6 Tiled Matrix Multiplication Kernel.mp4 178.77MB
lectures/week2/Heterogeneous Parallel Programming 1.6 2.7 Handling Boundary Conditions in Tiling.mp4 82.63MB
lectures/week2/Heterogeneous Parallel Programming 1.7 2.8 A Tiled Kernel for Arbitrary Matrix Dimensions.mp4 99.34MB
lectures/week3/Heterogeneous Parallel Programming 2.0 3.1 Performance Considerations - DRAM Bandwidth.mp4 126.75MB
lectures/week3/Heterogeneous Parallel Programming 2.1 3.2 Performance Considerations - Memory Coalescing in CUDA.mp4 88.12MB
lectures/week3/Heterogeneous Parallel Programming 2.2 3.3 Parallel Computation Patterns - Convolution.mp4 77.53MB
lectures/week3/Heterogeneous Parallel Programming 2.3 3.4 Parallel Computation Patterns - Tiled Convolution.mp4 95.96MB
lectures/week3/Heterogeneous Parallel Programming 2.4 3.5 Parallel Computation Patterns - 2D Tiled Convolution Kernel.mp4 95.46MB
lectures/week3/Heterogeneous Parallel Programming 2.5 3.6 Parallel Computation Patterns - Data Reuse in Tiled Convolution.mp4 124.21MB
lectures/week4/Heterogeneous Parallel Programming 3.0 4.1 Parallel Computation Patterns - Reduction.mp4 132.82MB
lectures/week4/Heterogeneous Parallel Programming 3.1 4.2 Parallel Computation Patterns - A Basic Reduction Kernel.mp4 101.41MB
lectures/week4/Heterogeneous Parallel Programming 3.2 4.3 Parallel Computation Patterns - A Better Reduction Kernel.mp4 77.66MB
lectures/week4/Heterogeneous Parallel Programming 3.3 4.4 Parallel Computation Patterns - Scan (Prefix Sum).mp4 121.50MB
lectures/week4/Heterogeneous Parallel Programming 3.4 4.5 Parallel Computation Patterns - A Work-Inefficient Scan Kernel.mp4 127.69MB
lectures/week4/Heterogeneous Parallel Programming 3.5 4.6 Parallel Computation Patterns - A Work-Efficient Parallel Scan Kernel.mp4 130.22MB
lectures/week4/Heterogeneous Parallel Programming 3.6 4.7 Parallel Computation Patterns - More on Parallel Scan.mp4 133.34MB
lectures/week5/Heterogeneous Parallel Programming 4.0 5.1 Parallel Computation Patterns - Histogramming.mp4 61.59MB
lectures/week5/Heterogeneous Parallel Programming 4.1 5.2 Parallel Computation Patterns - Atomic Operations.mp4 61.04MB
lectures/week5/Heterogeneous Parallel Programming 4.2 5.3 Parallel Computation Patterns - Atomic Operations in CUDA.mp4 87.74MB
lectures/week5/Heterogeneous Parallel Programming 4.3 5.4 Parallel Computation Patters - Atomic Operations Performance.mp4 75.40MB
lectures/week5/Heterogeneous Parallel Programming 4.4 5.5 Parallel Computation Patterns - A Privatized Histogram Kernel.mp4 62.09MB
lectures/week6/Heterogeneous Parallel Programming 5.0 6.1 Efficient Host-Device Data Transfer - Pinned Host Memory.mp4 123.32MB
lectures/week6/Heterogeneous Parallel Programming 5.1 6.2 Efficient Host-Device Data Transfer - Task Parallelism in CUDA.mp4 118.72MB
lectures/week6/Heterogeneous Parallel Programming 5.2 6.3 Efficient Host-Device Data Transfer - Overlapping Data Transfer with Computation.mp4 139.29MB
lectures/week7/Heterogeneous Parallel Programming 6.0 7.1 Related Programming Models - OpenCL Data Parallelism Model.mp4 88.21MB
lectures/week7/Heterogeneous Parallel Programming 6.1 7.2 Related Programming Models - OpenCL Device Architecture.mp4 60.51MB
lectures/week7/Heterogeneous Parallel Programming 6.2 7.3 Related Programming Models - OpenCL Host Code Part 1.mp4 144.19MB
lectures/week7/Heterogeneous Parallel Programming 6.3 7.4 Related Programming Models - OpenCL Host Code (Cont.).mp4 82.65MB
lectures/week7/Heterogeneous Parallel Programming 6.4 7.5 Related Programming Models - OpenACC.mp4 101.61MB
lectures/week7/Heterogeneous Parallel Programming 6.5 7.6 Related Programming Models - OpenACC Details.mp4 95.53MB
lectures/week8/Heterogeneous Parallel Programming 7.0 8.1 Related Parallel Models - C++ AMP.mp4 81.71MB
lectures/week8/Heterogeneous Parallel Programming 7.1 8.2 Related Parallel Models - C++ AMP Advance Concepts.mp4 113.78MB
lectures/week8/Heterogeneous Parallel Programming 7.2 8.3 Related Parallel Models - Introduction to Heterogeneous Supercomputing and MPI.mp4 131.64MB
lectures/week8/Heterogeneous Parallel Programming 7.3 8.4 Conclusions and Future Directions.mp4 120.62MB
resources/Coursera_files/204.js 6.33kB
(remaining files omitted from this listing)
Type: Course

title= {[Coursera] Heterogeneous Parallel Programming},
keywords= {},
journal= {},
author= {Wen-mei W. Hwu (University of Illinois)},
year= {2015},
url= {},
license= {},
abstract= {This course introduces concepts, languages, techniques, and patterns for programming heterogeneous, massively parallel processors. Its contents and structure have been significantly revised based on the experience gained from its initial offering in 2012. It covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, and parallel algorithm patterns.

All computing systems, from mobile devices to supercomputers, are becoming heterogeneous, massively parallel computers in pursuit of higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries that ease the use of these systems, their effective and confident use will always require knowledge of low-level programming. This course is designed to teach students the essence of low-level programming interfaces and how to use them to achieve application goals. CUDA C, with its good balance between user control and verbosity, serves as the teaching vehicle for the first half of the course. Students then extend their learning to closely related programming interfaces such as OpenCL, OpenACC, and C++ AMP.

The course is unique in that it is application oriented, introducing only the underlying computer science and computer engineering knowledge necessary for understanding. It covers data-parallel execution models, memory models for managing locality, tiling techniques for reducing bandwidth consumption, parallel algorithm patterns, overlapping computation with communication, and a variety of heterogeneous parallel programming interfaces. The concepts learned in this course form a strong foundation for learning other types of parallel programming systems.},

superseded= {},
terms= {}
