coursera-heterogeneous-parallel-programming (87 files)
assignments/README.txt | 0.24kB |
lectures/week1/Heterogeneous Parallel Programming 0.0 1.1 Course Overview.mp4 | 127.30MB |
lectures/week1/Heterogeneous Parallel Programming 0.1 1.2 Introduction to Heterogeneous Parallel Computing.mp4 | 77.82MB |
lectures/week1/Heterogeneous Parallel Programming 0.2 1.3 Portability and Scalability in Heterogeneous Parallel Computing.mp4 | 34.90MB |
lectures/week1/Heterogeneous Parallel Programming 0.3 1.4 Introduction to CUDA Data Parallelism and Threads.mp4 | 128.64MB |
lectures/week1/Heterogeneous Parallel Programming 0.4 1.5 Introduction to CUDA Memory Allocation and Data Movement API.mp4 | 118.09MB |
lectures/week1/Heterogeneous Parallel Programming 0.5 1.6 Introduction to CUDA Kernel-Based SPMD Parallel Programming.mp4 | 111.87MB |
lectures/week1/Heterogeneous Parallel Programming 0.6 1.7 Kernel-based Parallel Programming Multidimensional Kernel Configuration.mp4 | 94.58MB |
lectures/week1/Heterogeneous Parallel Programming 0.7 1.8 Kernel-based Parallel Programming Basic Matrix-Matrix Multiplication.mp4 | 98.36MB |
lectures/week2/Heterogeneous Parallel Programming 1.0 2.1 Kernel-based Parallel Programming - Thread Scheduling.mp4 | 117.27MB |
lectures/week2/Heterogeneous Parallel Programming 1.1 2.2 Control Divergence.mp4 | 86.63MB |
lectures/week2/Heterogeneous Parallel Programming 1.2 2.3 Memory Model and Locality -- CUDA Memories.mp4 | 129.66MB |
lectures/week2/Heterogeneous Parallel Programming 1.3 2.4 Tiled Parallel Algorithms.mp4 | 112.01MB |
lectures/week2/Heterogeneous Parallel Programming 1.4 2.5 Tiled Matrix Multiplication.mp4 | 124.82MB |
lectures/week2/Heterogeneous Parallel Programming 1.5 2.6 Tiled Matrix Multiplication Kernel.mp4 | 178.77MB |
lectures/week2/Heterogeneous Parallel Programming 1.6 2.7 Handling Boundary Conditions in Tiling.mp4 | 82.63MB |
lectures/week2/Heterogeneous Parallel Programming 1.7 2.8 A Tiled Kernel for Arbitrary Matrix Dimensions.mp4 | 99.34MB |
lectures/week3/Heterogeneous Parallel Programming 2.0 3.1 Performance Considerations - DRAM Bandwidth.mp4 | 126.75MB |
lectures/week3/Heterogeneous Parallel Programming 2.1 3.2 Performance Considerations - Memory Coalescing in CUDA.mp4 | 88.12MB |
lectures/week3/Heterogeneous Parallel Programming 2.2 3.3 Parallel Computation Patterns - Convolution.mp4 | 77.53MB |
lectures/week3/Heterogeneous Parallel Programming 2.3 3.4 Parallel Computation Patterns - Tiled Convolution.mp4 | 95.96MB |
lectures/week3/Heterogeneous Parallel Programming 2.4 3.5 Parallel Computation Patterns - 2D Tiled Convolution Kernel.mp4 | 95.46MB |
lectures/week3/Heterogeneous Parallel Programming 2.5 3.6 Parallel Computation Patterns - Data Reuse in Tiled Convolution.mp4 | 124.21MB |
lectures/week4/Heterogeneous Parallel Programming 3.0 4.1 Parallel Computation Patterns - Reduction.mp4 | 132.82MB |
lectures/week4/Heterogeneous Parallel Programming 3.1 4.2 Parallel Computation Patterns - A Basic Reduction Kernel.mp4 | 101.41MB |
lectures/week4/Heterogeneous Parallel Programming 3.2 4.3 Parallel Computation Patterns - A Better Reduction Kernel.mp4 | 77.66MB |
lectures/week4/Heterogeneous Parallel Programming 3.3 4.4 Parallel Computation Patterns - Scan (Prefix Sum).mp4 | 121.50MB |
lectures/week4/Heterogeneous Parallel Programming 3.4 4.5 Parallel Computation Patterns - A Work-Inefficient Scan Kernel.mp4 | 127.69MB |
lectures/week4/Heterogeneous Parallel Programming 3.5 4.6 Parallel Computation Patterns - A Work-Efficient Parallel Scan Kernel.mp4 | 130.22MB |
lectures/week4/Heterogeneous Parallel Programming 3.6 4.7 Parallel Computation Patterns - More on Parallel Scan.mp4 | 133.34MB |
lectures/week5/Heterogeneous Parallel Programming 4.0 5.1 Parallel Computation Patterns - Histogramming.mp4 | 61.59MB |
lectures/week5/Heterogeneous Parallel Programming 4.1 5.2 Parallel Computation Patterns - Atomic Operations.mp4 | 61.04MB |
lectures/week5/Heterogeneous Parallel Programming 4.2 5.3 Parallel Computation Patterns - Atomic Operations in CUDA.mp4 | 87.74MB |
lectures/week5/Heterogeneous Parallel Programming 4.3 5.4 Parallel Computation Patters - Atomic Operations Performance.mp4 | 75.40MB |
lectures/week5/Heterogeneous Parallel Programming 4.4 5.5 Parallel Computation Patterns - A Privatized Histogram Kernel.mp4 | 62.09MB |
lectures/week6/Heterogeneous Parallel Programming 5.0 6.1 Efficient Host-Device Data Transfer - Pinned Host Memory.mp4 | 123.32MB |
lectures/week6/Heterogeneous Parallel Programming 5.1 6.2 Efficient Host-Device Data Transfer - Task Parallelism in CUDA.mp4 | 118.72MB |
lectures/week6/Heterogeneous Parallel Programming 5.2 6.3 Efficient Host-Device Data Transfer - Overlapping Data Transfer with Computation.mp4 | 139.29MB |
lectures/week7/Heterogeneous Parallel Programming 6.0 7.1 Related Programming Models - OpenCL Data Parallelism Model.mp4 | 88.21MB |
lectures/week7/Heterogeneous Parallel Programming 6.1 7.2 Related Programming Models - OpenCL Device Architecture.mp4 | 60.51MB |
lectures/week7/Heterogeneous Parallel Programming 6.2 7.3 Related Programming Models - OpenCL Host Code Part 1.mp4 | 144.19MB |
lectures/week7/Heterogeneous Parallel Programming 6.3 7.4 Related Programming Models - OpenCL Host Code (Cont.).mp4 | 82.65MB |
lectures/week7/Heterogeneous Parallel Programming 6.4 7.5 Related Programming Models - OpenACC.mp4 | 101.61MB |
lectures/week7/Heterogeneous Parallel Programming 6.5 7.6 Related Programming Models - OpenACC Details.mp4 | 95.53MB |
lectures/week8/Heterogeneous Parallel Programming 7.0 8.1 Related Parallel Models - C++ AMP.mp4 | 81.71MB |
lectures/week8/Heterogeneous Parallel Programming 7.1 8.2 Related Parallel Models - C++ AMP Advance Concepts.mp4 | 113.78MB |
lectures/week8/Heterogeneous Parallel Programming 7.2 8.3 Related Parallel Models - Introduction to Heterogeneous Supercomputing and MPI.mp4 | 131.64MB |
lectures/week8/Heterogeneous Parallel Programming 7.3 8.4 Conclusions and Future Directions.mp4 | 120.62MB |
resources/Coursera_files/204.js | 6.33kB |
Type: Course
Tags:
Bibtex:
@article{,
  title = {[Coursera] Heterogeneous Parallel Programming},
  keywords = {},
  journal = {},
  author = {Wen-mei W. Hwu (University of Illinois)},
  year = {2015},
  url = {},
  license = {},
  abstract = {This course introduces concepts, languages, techniques, and patterns for programming heterogeneous, massively parallel processors. Its contents and structure have been significantly revised based on the experience gained from its initial offering in 2012. It covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, and parallel algorithm patterns. All computing systems, from mobile devices to supercomputers, are becoming heterogeneous, massively parallel computers in pursuit of higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries to ease the use of these systems, their effective and confident use will always require knowledge of low-level programming. This course is designed for students to learn the essence of low-level programming interfaces and how to use them to achieve application goals. CUDA C, with its good balance between user control and verbosity, serves as the teaching vehicle for the first half of the course. Students then extend their learning into closely related programming interfaces such as OpenCL, OpenACC, and C++ AMP. The course is unique in that it is application oriented and introduces only the computer science and computer engineering knowledge necessary for understanding. It covers data-parallel execution models, memory models for managing locality, tiling techniques for reducing bandwidth consumption, parallel algorithm patterns, overlapping computation with communication, and a variety of heterogeneous parallel programming interfaces. The concepts learned in this course form a strong foundation for learning other types of parallel programming systems.},
  superseded = {},
  terms = {}
}
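For readers previewing the material: the week 1 lectures (CUDA data parallelism and threads, the memory allocation and data movement API, kernel-based SPMD programming) build on exactly the pattern sketched below, a data-parallel kernel launched with one thread per output element plus host-side cudaMalloc/cudaMemcpy calls. This is a minimal illustrative sketch, not code from the course; the kernel name vecAdd, the vector length, and the block size of 256 are arbitrary choices.

// Minimal CUDA C sketch: SPMD vector addition with explicit host/device
// allocation and data movement. Compile with: nvcc vecadd.cu -o vecadd
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one output element (data-parallel SPMD model).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard against the partial last block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;           // illustrative problem size
    size_t bytes = n * sizeof(float);

    // Host buffers.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device buffers and host-to-device transfers.
    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // One thread per element, rounded up to whole blocks.
    int block = 256;
    int grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(dA, dB, dC, n);

    // Copy the result back and spot-check it.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);    // expect 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}

The boundary guard (i < n) is the same technique the week 2 lectures generalize when handling boundary conditions in tiling, and the later tiled-kernel lectures refine this pattern by staging reused data in shared memory to reduce DRAM bandwidth consumption.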