[Coursera] Heterogeneous Parallel Programming
Wen-mei W. Hwu (University of Illinois)



coursera-heterogeneous-parallel-programming (87 files)
assignments/README.txt  0.24kB
lectures/week1/Heterogeneous Parallel Programming 0.0 1.1 Course Overview.mp4  127.30MB
lectures/week1/Heterogeneous Parallel Programming 0.1 1.2 Introduction to Heterogeneous Parallel Computing.mp4  77.82MB
lectures/week1/Heterogeneous Parallel Programming 0.2 1.3 Portability and Scalability in Heterogeneous Parallel Computing.mp4  34.90MB
lectures/week1/Heterogeneous Parallel Programming 0.3 1.4 Introduction to CUDA Data Parallelism and Threads.mp4  128.64MB
lectures/week1/Heterogeneous Parallel Programming 0.4 1.5 Introduction to CUDA Memory Allocation and Data Movement API.mp4  118.09MB
lectures/week1/Heterogeneous Parallel Programming 0.5 1.6 Introduction to CUDA Kernel-Based SPMD Parallel Programming.mp4  111.87MB
lectures/week1/Heterogeneous Parallel Programming 0.6 1.7 Kernel-based Parallel Programming Multidimensional Kernel Configuration.mp4  94.58MB
lectures/week1/Heterogeneous Parallel Programming 0.7 1.8 Kernel-based Parallel Programming Basic Matrix-Matrix Multiplication.mp4  98.36MB
lectures/week2/Heterogeneous Parallel Programming 1.0 2.1 Kernel-based Parallel Programming - Thread Scheduling.mp4  117.27MB
lectures/week2/Heterogeneous Parallel Programming 1.1 2.2 Control Divergence.mp4  86.63MB
lectures/week2/Heterogeneous Parallel Programming 1.2 2.3 Memory Model and Locality -- CUDA Memories.mp4  129.66MB
lectures/week2/Heterogeneous Parallel Programming 1.3 2.4 Tiled Parallel Algorithms.mp4  112.01MB
lectures/week2/Heterogeneous Parallel Programming 1.4 2.5 Tiled Matrix Multiplication.mp4  124.82MB
lectures/week2/Heterogeneous Parallel Programming 1.5 2.6 Tiled Matrix Multiplication Kernel.mp4  178.77MB
lectures/week2/Heterogeneous Parallel Programming 1.6 2.7 Handling Boundary Conditions in Tiling.mp4  82.63MB
lectures/week2/Heterogeneous Parallel Programming 1.7 2.8 A Tiled Kernel for Arbitrary Matrix Dimensions.mp4  99.34MB
lectures/week3/Heterogeneous Parallel Programming 2.0 3.1 Performance Considerations - DRAM Bandwidth.mp4  126.75MB
lectures/week3/Heterogeneous Parallel Programming 2.1 3.2 Performance Considerations - Memory Coalescing in CUDA.mp4  88.12MB
lectures/week3/Heterogeneous Parallel Programming 2.2 3.3 Parallel Computation Patterns - Convolution.mp4  77.53MB
lectures/week3/Heterogeneous Parallel Programming 2.3 3.4 Parallel Computation Patterns - Tiled Convolution.mp4  95.96MB
lectures/week3/Heterogeneous Parallel Programming 2.4 3.5 Parallel Computation Patterns - 2D Tiled Convolution Kernel.mp4  95.46MB
lectures/week3/Heterogeneous Parallel Programming 2.5 3.6 Parallel Computation Patterns - Data Reuse in Tiled Convolution.mp4  124.21MB
lectures/week4/Heterogeneous Parallel Programming 3.0 4.1 Parallel Computation Patterns - Reduction.mp4  132.82MB
lectures/week4/Heterogeneous Parallel Programming 3.1 4.2 Parallel Computation Patterns - A Basic Reduction Kernel.mp4  101.41MB
lectures/week4/Heterogeneous Parallel Programming 3.2 4.3 Parallel Computation Patterns - A Better Reduction Kernel.mp4  77.66MB
lectures/week4/Heterogeneous Parallel Programming 3.3 4.4 Parallel Computation Patterns - Scan (Prefix Sum).mp4  121.50MB
lectures/week4/Heterogeneous Parallel Programming 3.4 4.5 Parallel Computation Patterns - A Work-Inefficient Scan Kernel.mp4  127.69MB
lectures/week4/Heterogeneous Parallel Programming 3.5 4.6 Parallel Computation Patterns - A Work-Efficient Parallel Scan Kernel.mp4  130.22MB
lectures/week4/Heterogeneous Parallel Programming 3.6 4.7 Parallel Computation Patterns - More on Parallel Scan.mp4  133.34MB
lectures/week5/Heterogeneous Parallel Programming 4.0 5.1 Parallel Computation Patterns - Histogramming.mp4  61.59MB
lectures/week5/Heterogeneous Parallel Programming 4.1 5.2 Parallel Computation Patterns - Atomic Operations.mp4  61.04MB
lectures/week5/Heterogeneous Parallel Programming 4.2 5.3 Parallel Computation Patterns - Atomic Operations in CUDA.mp4  87.74MB
lectures/week5/Heterogeneous Parallel Programming 4.3 5.4 Parallel Computation Patters - Atomic Operations Performance.mp4  75.40MB
lectures/week5/Heterogeneous Parallel Programming 4.4 5.5 Parallel Computation Patterns - A Privatized Histogram Kernel.mp4  62.09MB
lectures/week6/Heterogeneous Parallel Programming 5.0 6.1 Efficient Host-Device Data Transfer - Pinned Host Memory.mp4  123.32MB
lectures/week6/Heterogeneous Parallel Programming 5.1 6.2 Efficient Host-Device Data Transfer - Task Parallelism in CUDA.mp4  118.72MB
lectures/week6/Heterogeneous Parallel Programming 5.2 6.3 Efficient Host-Device Data Transfer - Overlapping Data Transfer with Computation.mp4  139.29MB
lectures/week7/Heterogeneous Parallel Programming 6.0 7.1 Related Programming Models - OpenCL Data Parallelism Model.mp4  88.21MB
lectures/week7/Heterogeneous Parallel Programming 6.1 7.2 Related Programming Models - OpenCL Device Architecture.mp4  60.51MB
lectures/week7/Heterogeneous Parallel Programming 6.2 7.3 Related Programming Models - OpenCL Host Code Part 1.mp4  144.19MB
lectures/week7/Heterogeneous Parallel Programming 6.3 7.4 Related Programming Models - OpenCL Host Code (Cont.).mp4  82.65MB
lectures/week7/Heterogeneous Parallel Programming 6.4 7.5 Related Programming Models - OpenACC.mp4  101.61MB
lectures/week7/Heterogeneous Parallel Programming 6.5 7.6 Related Programming Models - OpenACC Details.mp4  95.53MB
lectures/week8/Heterogeneous Parallel Programming 7.0 8.1 Related Parallel Models - C++ AMP.mp4  81.71MB
lectures/week8/Heterogeneous Parallel Programming 7.1 8.2 Related Parallel Models - C++ AMP Advance Concepts.mp4  113.78MB
lectures/week8/Heterogeneous Parallel Programming 7.2 8.3 Related Parallel Models - Introduction to Heterogeneous Supercomputing and MPI.mp4  131.64MB
lectures/week8/Heterogeneous Parallel Programming 7.3 8.4 Conclusions and Future Directions.mp4  120.62MB
resources/Coursera_files/204.js  6.33kB
resources/Coursera_files/400.js  7.79kB
resources/Coursera_files/assessApi.js  0.45kB
resources/Coursera_files/backbone.hascollections.js  1.30kB
resources/Coursera_files/course.css  0.17kB
resources/Coursera_files/flexjoinLastChanceModal.html.js  3.42kB
resources/Coursera_files/ga.js  43.08kB
resources/Coursera_files/header(1).js  0.09kB
resources/Coursera_files/header.html.js  28.87kB
resources/Coursera_files/header.js  2.12kB
resources/Coursera_files/jquery.v1-7.js  134.93kB
resources/Coursera_files/LearnerStoriesCollection.js  0.52kB
resources/Coursera_files/LearnerStoryModel.js  0.18kB
resources/Coursera_files/loadOrRefreshMathJax.js  0.04kB
resources/Coursera_files/logo  29.86kB
resources/Coursera_files/MathJax.js  50.41kB
resources/Coursera_files/path.js  0.23kB
resources/Coursera_files/QuestionCollection.js  0.44kB
resources/Coursera_files/QuestionModel.js  1.34kB
resources/Coursera_files/readme.js  4.94kB
resources/Coursera_files/require.v2-1-1.js  22.78kB
resources/Coursera_files/routes.js  387.22kB
resources/Coursera_files/sessionModel(1).js  0.49kB
resources/Coursera_files/sessionModel.js  2.82kB
resources/Coursera_files/sidebar(1).js  0.09kB
resources/Coursera_files/sidebar.html.js  9.23kB
resources/Coursera_files/sidebar.js  4.17kB
resources/Coursera_files/signature_track.js  5.07kB
resources/Coursera_files/signatureTrackLastChanceModal.html.js  4.41kB
resources/Coursera_files/spark.main.css  283.29kB
resources/Coursera_files/sparkSurveyQuestionsSessionModel.js  0.68kB
resources/Coursera_files/student-page(1).js  0.01kB
resources/Coursera_files/student-page.html.js  0.90kB
resources/Coursera_files/student-page.js  2.26kB
resources/Coursera_files/textbook_wiki.js  0.29kB
resources/Coursera_files/underscore.extend.js  0.93kB
resources/Coursera_files/university_logo  4.16kB
resources/Coursera_files/util.js  24.49kB
resources/intro_to_C.html  41.48kB
Type: Course
Tags:

Bibtex:
@article{,
title= {[Coursera] Heterogeneous Parallel Programming},
keywords= {},
journal= {},
author= {Wen-mei W. Hwu (University of Illinois)},
year= {2015},
url= {},
license= {},
abstract= {This course introduces concepts, languages, techniques, and patterns for programming heterogeneous, massively parallel processors. Its contents and structure have been significantly revised based on the experience gained from its initial offering in 2012. It covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, and parallel algorithm patterns.

All computing systems, from mobile devices to supercomputers, are becoming heterogeneous, massively parallel computers for higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries to ease the use of these systems, effective and confident use of them will always require knowledge of low-level programming on these systems. This course is designed for students to learn the essence of low-level programming interfaces and how to use these interfaces to achieve application goals. CUDA C, with its good balance between user control and verbosity, will serve as the teaching vehicle for the first half of the course. Students will then extend their learning into closely related programming interfaces such as OpenCL, OpenACC, and C++ AMP.

The course is unique in that it is application oriented and only introduces the necessary underlying computer science and computer engineering knowledge for understanding. It covers the concept of data parallel execution models, memory models for managing locality, tiling techniques for reducing bandwidth consumption, parallel algorithm patterns, overlapping computation with communication, and a variety of heterogeneous parallel programming interfaces. The concepts learned in this course form a strong foundation for learning other types of parallel programming systems.

},
superseded= {},
terms= {}
}