[Coursera] Heterogeneous Parallel Programming
Wen-mei W. Hwu (University of Illinois)

coursera-heterogeneous-parallel-programming (87 files)
assignments/README.txt 0.24kB
lectures/week1/Heterogeneous Parallel Programming 0.0 1.1 Course Overview.mp4 127.30MB
lectures/week1/Heterogeneous Parallel Programming 0.1 1.2 Introduction to Heterogeneous Parallel Computing.mp4 77.82MB
lectures/week1/Heterogeneous Parallel Programming 0.2 1.3 Portability and Scalability in Heterogeneous Parallel Computing.mp4 34.90MB
lectures/week1/Heterogeneous Parallel Programming 0.3 1.4 Introduction to CUDA Data Parallelism and Threads.mp4 128.64MB
lectures/week1/Heterogeneous Parallel Programming 0.4 1.5 Introduction to CUDA Memory Allocation and Data Movement API.mp4 118.09MB
lectures/week1/Heterogeneous Parallel Programming 0.5 1.6 Introduction to CUDA Kernel-Based SPMD Parallel Programming.mp4 111.87MB
lectures/week1/Heterogeneous Parallel Programming 0.6 1.7 Kernel-based Parallel Programming Multidimensional Kernel Configuration.mp4 94.58MB
lectures/week1/Heterogeneous Parallel Programming 0.7 1.8 Kernel-based Parallel Programming Basic Matrix-Matrix Multiplication.mp4 98.36MB
lectures/week2/Heterogeneous Parallel Programming 1.0 2.1 Kernel-based Parallel Programming - Thread Scheduling.mp4 117.27MB
lectures/week2/Heterogeneous Parallel Programming 1.1 2.2 Control Divergence.mp4 86.63MB
lectures/week2/Heterogeneous Parallel Programming 1.2 2.3 Memory Model and Locality -- CUDA Memories.mp4 129.66MB
lectures/week2/Heterogeneous Parallel Programming 1.3 2.4 Tiled Parallel Algorithms.mp4 112.01MB
lectures/week2/Heterogeneous Parallel Programming 1.4 2.5 Tiled Matrix Multiplication.mp4 124.82MB
lectures/week2/Heterogeneous Parallel Programming 1.5 2.6 Tiled Matrix Multiplication Kernel.mp4 178.77MB
lectures/week2/Heterogeneous Parallel Programming 1.6 2.7 Handling Boundary Conditions in Tiling.mp4 82.63MB
lectures/week2/Heterogeneous Parallel Programming 1.7 2.8 A Tiled Kernel for Arbitrary Matrix Dimensions.mp4 99.34MB
lectures/week3/Heterogeneous Parallel Programming 2.0 3.1 Performance Considerations - DRAM Bandwidth.mp4 126.75MB
lectures/week3/Heterogeneous Parallel Programming 2.1 3.2 Performance Considerations - Memory Coalescing in CUDA.mp4 88.12MB
lectures/week3/Heterogeneous Parallel Programming 2.2 3.3 Parallel Computation Patterns - Convolution.mp4 77.53MB
lectures/week3/Heterogeneous Parallel Programming 2.3 3.4 Parallel Computation Patterns - Tiled Convolution.mp4 95.96MB
lectures/week3/Heterogeneous Parallel Programming 2.4 3.5 Parallel Computation Patterns - 2D Tiled Convolution Kernel.mp4 95.46MB
lectures/week3/Heterogeneous Parallel Programming 2.5 3.6 Parallel Computation Patterns - Data Reuse in Tiled Convolution.mp4 124.21MB
lectures/week4/Heterogeneous Parallel Programming 3.0 4.1 Parallel Computation Patterns - Reduction.mp4 132.82MB
lectures/week4/Heterogeneous Parallel Programming 3.1 4.2 Parallel Computation Patterns - A Basic Reduction Kernel.mp4 101.41MB
lectures/week4/Heterogeneous Parallel Programming 3.2 4.3 Parallel Computation Patterns - A Better Reduction Kernel.mp4 77.66MB
lectures/week4/Heterogeneous Parallel Programming 3.3 4.4 Parallel Computation Patterns - Scan (Prefix Sum).mp4 121.50MB
lectures/week4/Heterogeneous Parallel Programming 3.4 4.5 Parallel Computation Patterns - A Work-Inefficient Scan Kernel.mp4 127.69MB
lectures/week4/Heterogeneous Parallel Programming 3.5 4.6 Parallel Computation Patterns - A Work-Efficient Parallel Scan Kernel.mp4 130.22MB
lectures/week4/Heterogeneous Parallel Programming 3.6 4.7 Parallel Computation Patterns - More on Parallel Scan.mp4 133.34MB
lectures/week5/Heterogeneous Parallel Programming 4.0 5.1 Parallel Computation Patterns - Histogramming.mp4 61.59MB
lectures/week5/Heterogeneous Parallel Programming 4.1 5.2 Parallel Computation Patterns - Atomic Operations.mp4 61.04MB
lectures/week5/Heterogeneous Parallel Programming 4.2 5.3 Parallel Computation Patterns - Atomic Operations in CUDA.mp4 87.74MB
lectures/week5/Heterogeneous Parallel Programming 4.3 5.4 Parallel Computation Patters - Atomic Operations Performance.mp4 75.40MB
lectures/week5/Heterogeneous Parallel Programming 4.4 5.5 Parallel Computation Patterns - A Privatized Histogram Kernel.mp4 62.09MB
lectures/week6/Heterogeneous Parallel Programming 5.0 6.1 Efficient Host-Device Data Transfer - Pinned Host Memory.mp4 123.32MB
lectures/week6/Heterogeneous Parallel Programming 5.1 6.2 Efficient Host-Device Data Transfer - Task Parallelism in CUDA.mp4 118.72MB
lectures/week6/Heterogeneous Parallel Programming 5.2 6.3 Efficient Host-Device Data Transfer - Overlapping Data Transfer with Computation.mp4 139.29MB
lectures/week7/Heterogeneous Parallel Programming 6.0 7.1 Related Programming Models - OpenCL Data Parallelism Model.mp4 88.21MB
lectures/week7/Heterogeneous Parallel Programming 6.1 7.2 Related Programming Models - OpenCL Device Architecture.mp4 60.51MB
lectures/week7/Heterogeneous Parallel Programming 6.2 7.3 Related Programming Models - OpenCL Host Code Part 1.mp4 144.19MB
lectures/week7/Heterogeneous Parallel Programming 6.3 7.4 Related Programming Models - OpenCL Host Code (Cont.).mp4 82.65MB
lectures/week7/Heterogeneous Parallel Programming 6.4 7.5 Related Programming Models - OpenACC.mp4 101.61MB
lectures/week7/Heterogeneous Parallel Programming 6.5 7.6 Related Programming Models - OpenACC Details.mp4 95.53MB
lectures/week8/Heterogeneous Parallel Programming 7.0 8.1 Related Parallel Models - C++ AMP.mp4 81.71MB
lectures/week8/Heterogeneous Parallel Programming 7.1 8.2 Related Parallel Models - C++ AMP Advance Concepts.mp4 113.78MB
lectures/week8/Heterogeneous Parallel Programming 7.2 8.3 Related Parallel Models - Introduction to Heterogeneous Supercomputing and MPI.mp4 131.64MB
lectures/week8/Heterogeneous Parallel Programming 7.3 8.4 Conclusions and Future Directions.mp4 120.62MB
resources/Coursera_files/204.js 6.33kB
resources/Coursera_files/400.js 7.79kB
resources/Coursera_files/assessApi.js 0.45kB
resources/Coursera_files/backbone.hascollections.js 1.30kB
resources/Coursera_files/course.css 0.17kB
resources/Coursera_files/flexjoinLastChanceModal.html.js 3.42kB
resources/Coursera_files/ga.js 43.08kB
resources/Coursera_files/header(1).js 0.09kB
resources/Coursera_files/header.html.js 28.87kB
resources/Coursera_files/header.js 2.12kB
resources/Coursera_files/jquery.v1-7.js 134.93kB
resources/Coursera_files/LearnerStoriesCollection.js 0.52kB
resources/Coursera_files/LearnerStoryModel.js 0.18kB
resources/Coursera_files/loadOrRefreshMathJax.js 0.04kB
resources/Coursera_files/logo 29.86kB
resources/Coursera_files/MathJax.js 50.41kB
resources/Coursera_files/path.js 0.23kB
resources/Coursera_files/QuestionCollection.js 0.44kB
resources/Coursera_files/QuestionModel.js 1.34kB
resources/Coursera_files/readme.js 4.94kB
resources/Coursera_files/require.v2-1-1.js 22.78kB
resources/Coursera_files/routes.js 387.22kB
resources/Coursera_files/sessionModel(1).js 0.49kB
resources/Coursera_files/sessionModel.js 2.82kB
resources/Coursera_files/sidebar(1).js 0.09kB
resources/Coursera_files/sidebar.html.js 9.23kB
resources/Coursera_files/sidebar.js 4.17kB
resources/Coursera_files/signature_track.js 5.07kB
resources/Coursera_files/signatureTrackLastChanceModal.html.js 4.41kB
resources/Coursera_files/spark.main.css 283.29kB
resources/Coursera_files/sparkSurveyQuestionsSessionModel.js 0.68kB
resources/Coursera_files/student-page(1).js 0.01kB
resources/Coursera_files/student-page.html.js 0.90kB
resources/Coursera_files/student-page.js 2.26kB
resources/Coursera_files/textbook_wiki.js 0.29kB
resources/Coursera_files/underscore.extend.js 0.93kB
resources/Coursera_files/university_logo 4.16kB
resources/Coursera_files/util.js 24.49kB
resources/intro_to_C.html 41.48kB
Type: Course

title= {[Coursera] Heterogeneous Parallel Programming},
keywords= {},
journal= {},
author= {Wen-mei W. Hwu (University of Illinois)},
year= {2015},
url= {},
license= {},
abstract= {This course introduces concepts, languages, techniques, and patterns for programming heterogeneous, massively parallel processors. Its contents and structure have been significantly revised based on the experience gained from its initial offering in 2012. It covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, and parallel algorithm patterns.

All computing systems, from mobile devices to supercomputers, are becoming heterogeneous, massively parallel computers for higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries to ease the use of these systems, effective and confident use of them will always require knowledge of low-level programming on these systems. This course is designed for students to learn the essence of low-level programming interfaces and how to use these interfaces to achieve application goals. CUDA C, with its good balance between user control and verbosity, will serve as the teaching vehicle for the first half of the course. Students will then extend their learning into closely related programming interfaces such as OpenCL, OpenACC, and C++ AMP.

The course is unique in that it is application-oriented and introduces only the underlying computer science and computer engineering knowledge needed for understanding. It covers data-parallel execution models, memory models for managing locality, tiling techniques for reducing bandwidth consumption, parallel algorithm patterns, overlapping computation with communication, and a variety of heterogeneous parallel programming interfaces. The concepts learned in this course form a strong foundation for learning other types of parallel programming systems.},

superseded= {},
terms= {}
