
GPU Programming in MATLAB
Ebook · 524 pages · 4 hours


About this ebook

GPU programming in MATLAB is intended for scientists, engineers, or students who develop or maintain applications in MATLAB and would like to accelerate their codes using GPU programming without losing the many benefits of MATLAB. The book starts with coverage of the Parallel Computing Toolbox and other MATLAB toolboxes for GPU computing, which allow applications to be ported straightforwardly onto GPUs without extensive knowledge of GPU programming. The next part covers built-in, GPU-enabled features of MATLAB, including options to leverage GPUs across multicore or different computer systems. Finally, advanced material includes CUDA code in MATLAB and optimizing existing GPU applications. Throughout the book, examples and source codes illustrate every concept so that readers can immediately apply them to their own development.

  • Provides in-depth, comprehensive coverage of GPUs with MATLAB, including the Parallel Computing Toolbox and built-in GPU-enabled features of other MATLAB toolboxes
  • Explains how to accelerate computationally heavy applications in MATLAB without the need to rewrite them in another language
  • Presents case studies illustrating key concepts across multiple fields
  • Includes source code, sample datasets, and lecture slides
Language: English
Release date: Aug 25, 2016
ISBN: 9780128051337
Author

Nikolaos Ploskas

Nikolaos Ploskas is a Postdoctoral Researcher at the Department of Chemical Engineering, Carnegie Mellon University, USA. He received his Bachelor of Science degree, Master’s degree, and Ph.D. in Computer Systems from the Department of Applied Informatics of the University of Macedonia, Greece. His primary research interests are in operations research, mathematical programming, linear programming, parallel programming, GPU programming, and decision support systems. Dr. Ploskas has participated in several international and national research projects. He is the author or co-author of more than 40 publications, including articles in high-impact journals, book chapters, and conference papers. He has also served as a reviewer for many scientific journals. He received an honorary award from HELORS (Hellenic Operations Research Society) for the best doctoral dissertation in operations research (2014).


Book preview

GPU Programming in MATLAB - Nikolaos Ploskas


Preface

Nikolaos Ploskas; Nikolaos Samaras

MATLAB is a high-level language for technical computing. It is widely used as a rapid prototyping tool in many scientific areas. Many researchers and companies use MATLAB to solve computationally intensive problems and run their codes faster. MATLAB provides the Parallel Computing Toolbox that allows users to solve their computationally intensive problems using multicore processors, computer clusters, and GPUs.

With the advances made in hardware, GPUs have gained a lot of popularity in the past decade and have been widely applied to computationally intensive applications. There are currently two major models for programming on GPUs: CUDA and OpenCL. CUDA is the more mature and stable of the two. In order to access the CUDA architecture, a programmer can write code in C/C++ using CUDA C or in Fortran using PGI’s CUDA Fortran, among other options.

This book, however, takes another approach. This book is intended for students, scientists, and engineers who develop or maintain applications in MATLAB and would like to accelerate their codes using GPU programming without losing the many benefits that MATLAB offers. The readers of this book likely have some or a lot of experience with MATLAB coding, but they are not familiar with parallel architectures.

The main aim of this book is to help readers implement their MATLAB applications on GPUs in order to take advantage of their hardware and accelerate their codes. This book includes examples for every concept that is introduced in order to help its readers apply the knowledge to their applications. We preferred to follow a tutorial rather than a case study approach when writing this book because MATLAB’s users have different backgrounds. Hence, the examples presented in this book aim to focus the interest of the readers on the techniques used to implement an application on a GPU and not on a specific application domain. The examples provided are common problems in many scientific areas such as image processing, signal processing, optimization, communications systems, statistics, etc.

MATLAB’s documentation for GPU computing is very helpful, but the information is not available in one location and important implementation issues on GPU programming are not discussed thoroughly. Various functions and toolboxes have been created since MATLAB introduced GPU support in 2010, so information is scattered. The aim of this book is to fill this gap. In addition, we provide many real-world examples in various scientific areas in order to demonstrate MATLAB’s GPU capabilities. Readers with some experience of CUDA C/C++ programming will also be able to obtain more advanced knowledge by utilizing CUDA C/C++ code in MATLAB or by profiling and optimizing their GPU applications.

The main emphasis of this book is on two fronts:

• The features that MATLAB inherently provides for GPU programming (a brief illustrative sketch follows this list). This material is divided into three parts:

1. GPU-enabled MATLAB built-in functions that require the existence of the Parallel Computing Toolbox.

2. Element-wise operations for GPUs that do not require the existence of the Parallel Computing Toolbox.

3. GPU-enabled MATLAB functions found in several toolboxes other than Parallel Computing Toolbox, including Communications System Toolbox, Image Processing Toolbox, Neural Network Toolbox, Phased Array System Toolbox, Signal Processing Toolbox, and Statistics and Machine Learning Toolbox.

• Linking MATLAB with CUDA C/C++ codes either when MATLAB cannot execute an existing piece of code on GPUs or when the user wants to use highly optimized CUDA-accelerated libraries.
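
As a brief, hypothetical illustration of the first front (this is a sketch of the general gpuArray workflow from the Parallel Computing Toolbox, not an example taken from the book), many built-in functions run on the GPU simply by being given gpuArray inputs:

% Requires the Parallel Computing Toolbox and a supported NVIDIA GPU.
A = rand(4000, 'gpuArray');    % random matrix allocated directly on the GPU
B = gpuArray(rand(4000));      % created on the CPU, then transferred to the GPU

C = A * B;                     % GPU-enabled built-in: matrix multiplication
F = fft(C);                    % GPU-enabled built-in: fast Fourier transform

result = gather(F);            % copy the result back to host (CPU) memory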

The main target groups of this book are:

• Undergraduate and postgraduate students who take a course on GPU programming and want to use MATLAB to exploit the parallelism in their applications.

• Scientists who develop or maintain applications in MATLAB and would like to accelerate their codes using GPUs without losing the many benefits that MATLAB offers.

• Engineers who want to accelerate their computationally intensive applications in MATLAB without the need to rewrite them in another language, such as CUDA C/C++ or CUDA Fortran.

We are thankful to MathWorks for providing us an academic license for MATLAB through their MathWorks Book Program. We also thank NVIDIA for hardware donations in the context of the NVIDIA Academic Partnership program. Special thanks to Ioannis Athanasiadis for providing ideas and implementing examples for GPU-enabled functions of MATLAB toolboxes. Finally, we thank our families for their love and support over many years.

Chapter 1

Introduction

Abstract

This chapter introduces some key features of parallel programming and GPU programming on CUDA-capable GPUs. Furthermore, some real-world examples that can be accelerated through GPUs are presented. After reading this chapter, you should be able to:

• understand the key concepts of parallel programming.

• understand the key concepts of GPU programming.

• describe the architecture of a CUDA-capable GPU.

• list those applications where parallel and GPU programming can be used.


Keywords

Parallel programming; GPU programming; CUDA; Grid; Block; Thread

Chapter Objectives

This chapter introduces some key features of parallel programming and GPU programming on CUDA-capable GPUs. Furthermore, some real-world examples that can be accelerated through GPUs are presented. After reading this chapter, you should be able to:

• understand the key concepts of parallel programming.

• understand the key concepts of GPU programming.

• describe the architecture of a CUDA-capable GPU.

• list those applications where parallel and GPU programming can be used.

1.1 Parallel Programming

1.1.1 Introduction to Parallel Computing

Software that has been written for serial computation is executed on a single processor. Hence, the problem is broken into a discrete series of instructions, and those instructions are executed sequentially, one after another (Fig. 1.1).

Fig. 1.1 Serial computing overview.

For example, consider that we want to calculate the dot product of two column vectors.

The following serial program in MATLAB will calculate the dot product (inner product) of these vectors:
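
The book’s listing is not reproduced in this preview; a minimal serial sketch, assuming two small example column vectors A and B, could look like the following:

A = [1; 2; 3; 4];              % example column vectors (illustrative values)
B = [5; 6; 7; 8];

result = 0;
for i = 1:numel(A)             % one multiplication and one addition per step
    result = result + A(i) * B(i);
end
disp(result)                   % 1*5 + 2*6 + 3*7 + 4*8 = 70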

The instructions will be executed sequentially on a single processor (Fig. 1.2).

Fig. 1.2 Serial computing example.

On the other hand, software that has been written for parallel computation is executed on multiple compute resources. Hence, the problem is broken into discrete parts that can be solved concurrently, and the parts are further broken into series of instructions. The instructions from each part are executed simultaneously on different compute resources. Of course, the instructions of a single part are executed sequentially, one after another, on a specific compute resource (Fig. 1.3).

Fig. 1.3 Parallel computing overview.

Considering that we want to calculate the dot product over a large number of vectors, we can calculate the dot product of two vectors on each compute resource (Fig. 1.4).

Fig. 1.4 Parallel computing example.
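
As a hypothetical MATLAB sketch of this idea (the data sizes and variable names are illustrative assumptions, not code from the book), a parfor loop from the Parallel Computing Toolbox distributes the independent dot products across the available workers:

n = 1000;                         % number of vector pairs (illustrative)
len = 10000;                      % length of each vector (illustrative)
A = rand(len, n);                 % each column of A and B is one vector
B = rand(len, n);

results = zeros(1, n);
parfor k = 1:n                    % iterations are independent, so they can
    results(k) = dot(A(:, k), B(:, k));   % run concurrently on different workers
end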

The compute resources can be typically either cores (on a single computer with one processor) or processors (either on a single computer or on different computers connected by a network).

Parallel computing is mainly used for the following reasons:

• Reduce the execution time of a program: The use of more compute resources to execute a program will possibly reduce the execution time of this program, and sometimes this reduction in execution time also offers cost savings. The need to reduce the execution time is more critical in real-time applications, where a program must complete its execution within a specific time period.

• Perform more tasks concurrently: A single compute resource can execute only one specific task at a time. The use of multiple compute resources will increase the number of tasks that can be executed concurrently.

• Solve larger problems: The use of more compute resources makes it possible to solve large problems that cannot be solved on a single computer with a given limited memory.

However, there exist problems that either cannot be parallelized at all or are very difficult to parallelize. Let’s look more closely at two examples to understand whether a given problem can actually be parallelized:

• Example 1 (Parallelizable problem): Count the number of occurrences of a specific word in many large texts. This problem can be solved in parallel. Each compute resource will find the number of occurrences in some of those texts, and the results will be combined to determine the final number of occurrences in all texts. These types of problems, where there exists little or no dependency between the parallel tasks, are called embarrassingly parallel tasks or perfectly parallel tasks (a sketch of this word-count approach follows this list).

• Example 2 (Non-parallelizable problem): Calculate the first n Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, 13, …) using the formula f(n) = f(n − 1) + f(n − 2), where f(0) = 0 and f(1) = 1. This problem cannot be parallelized because the calculation of the Fibonacci numbers includes dependent calculations. The calculation of f(n) depends on the calculation of f(n − 1) and f(n − 2); these three values cannot be calculated independently.
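
A hypothetical sketch of the word-count task in Example 1 (the file names and the target word are assumptions for illustration): each iteration processes one text independently, and the partial counts are combined at the end.

files = {'text1.txt', 'text2.txt', 'text3.txt'};   % hypothetical text files
word = 'gpu';                                       % hypothetical target word

counts = zeros(1, numel(files));
parfor k = 1:numel(files)
    text = lower(fileread(files{k}));               % read one text per worker
    counts(k) = numel(strfind(text, word));         % simple substring count
end

total = sum(counts);                                % combine the partial results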

Some tips before you start to parallelize your serial program are the following:

• Determine whether or not your problem can be parallelized. Try to find parts of your program that can be run independently with no or little communication between the parallel tasks. If you find that there are no independent tasks or the communication cost between the parallel tasks is very high, then consider finding another algorithm to solve your problem. For example, the calculation of the first n Fibonacci numbers using the formula f(n) = f(n − 1) + f(n − 2) cannot be parallelized, as already mentioned. However, we can use Binet’s formula to calculate the nth Fibonacci number directly. Using this formula, we can easily parallelize the calculation of the first n Fibonacci numbers (see the sketch after this list).

• Find the hotspots in your program. Try to use profilers or performance analysis tools to identify the parts of your program that perform the most time-consuming tasks. Then, focus on parallelizing those tasks.

• Identify possible bottlenecks that cause parallelizable tasks to halt.

• Use third-party highly optimized parallel libraries when possible.
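
As a hypothetical illustration of the Binet’s formula tip above (a sketch, not code from the book), each Fibonacci number can be computed independently of the others, so the loop parallelizes trivially:

n = 20;                            % how many Fibonacci numbers to compute
phi = (1 + sqrt(5)) / 2;           % golden ratio
psi = (1 - sqrt(5)) / 2;

fib = zeros(1, n);
parfor k = 0:n-1                   % f(0), f(1), ..., f(n-1) are independent
    fib(k + 1) = round((phi^k - psi^k) / sqrt(5));   % Binet's formula
end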

1.1.2 Classification of Parallel Computers

According to Flynn’s taxonomy [1], multiprocessor computer architectures can be classified along two independent dimensions: (i) instruction stream (I) and (ii) data stream (D). Each of these dimensions has one of two possible states: (i) single (S) and (ii) multiple (M). Hence, there are four possible types of computers: SISD, SIMD, MISD, and MIMD.
