Overview
Introduction to OpenCL
Design Goals of OpenCL
OpenCL Architecture
OpenCL Framework
Support for a wide diversity of applications, from embedded and mobile software through consumer applications to HPC solutions
Rapid deployment in the market: designed to run on the current and latest generations of GPU hardware
What is OpenCL:
OpenCL (Open Computing Language) is an open royalty-free standard for general purpose parallel programming across CPUs, GPUs and other processors, giving software developers portable and efficient access to the power of these heterogeneous processing platforms.
OpenCL stands for Open Computing Language. It is supported by Apple and several other vendors, and is developed by the Khronos Group, the consortium behind OpenGL. It is a cross-platform parallel computing API with a C-like language for heterogeneous computing devices.
A single OpenCL kernel will likely not achieve peak performance on all device types
OpenCL exposes CPUs, GPUs, and other accelerators as devices. Each device contains one or more compute units, i.e. cores, SMs, etc.
A kernel is the code for a work item. It is executed on OpenCL devices and is similar to a C function or a CUDA kernel. Kernels can be data-parallel or task-parallel.
The host program is executed on the host. It manages a collection of compute kernels and internal functions, analogous to a dynamic library.
An OpenCL program contains one or more kernels and any supporting routines that run on a target device.
An OpenCL kernel is the basic unit of parallel code that can be executed on a target device
OpenCL Kernels:
Kernel Execution:
The kernel body is instantiated once for each work item. An OpenCL work item is equivalent to a CUDA thread, and each work item gets a unique index.
E.g.:
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *result)
    {
        int id = get_global_id(0);
        result[id] = a[id] + b[id];
    }
The host program invokes a kernel over an index space called an NDRange. An NDRange (N-Dimensional Range) can be a 1D, 2D, or 3D space.
A single kernel instance at a point in the index space is called a work-item. Work-items have unique global IDs from the index space (CUDA: thread IDs). Work-items are further grouped into work-groups. Each work-group has a unique work-group ID (CUDA: block IDs), and each work-item has a unique local ID within its work-group.
For a 2D NDRange: total number of work-items = Gx * Gy, and size of each work-group = Sx * Sy. The global ID can be computed from the work-group ID and the local ID.
Kernels run over a global index range (the NDRange), broken up into work-groups and work-items. Work-items executing within the same work-group can synchronize with each other using barriers or memory fences. Work-items in different work-groups cannot synchronize with each other, except by launching a new kernel.
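As a sketch of within-group synchronization, the following OpenCL C kernel (the kernel name and logic are illustrative, not from the original) loads one element per work item into local memory; the barrier guarantees every item's store has completed before any item reads a neighbour's slot.

```c
/* OpenCL C kernel (device code, compiled at runtime), not host C.
   Requires the host to pass a __local buffer of get_local_size(0) floats. */
__kernel void rotate_within_group(__global const float *in,
                                  __global float *out,
                                  __local float *tile)
{
    int lid = get_local_id(0);
    int gid = get_global_id(0);
    int n   = get_local_size(0);

    tile[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);   /* synchronizes this work-group only */

    /* Safe: the neighbour's store is visible after the barrier */
    out[gid] = tile[(lid + 1) % n];
}
```

Note that the barrier has no effect across work-groups; a reduction over the whole NDRange still needs a second kernel launch.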
Static compilation: the code is compiled from source to machine code at a specific point before execution.
Dynamic compilation (also known as runtime compilation) has two steps. Step 1: the code is compiled to an Intermediate Representation (IR), which is usually the assembly of a virtual machine. Step 2: the IR is compiled to machine code for execution; this step is much shorter. In dynamic compilation, step 1 is usually done once and the IR is stored. The app loads the IR and performs step 2 during the app's runtime.
OpenCL Objects:
Setup
    Devices: GPU, CPU, Cell/B.E.
    Contexts: collection of devices
    Queues: submit work to the device
Memory
    Buffers: blocks of memory
    Images: 2D or 3D formatted images
Execution
    Programs: collections of kernels
    Kernels: argument/execution instances
Synchronization/profiling
    Events
OpenCL Framework:
The OpenCL framework allows applications to use a host and one or more OpenCL devices as a single heterogeneous parallel computer system. The framework contains the following components:
OpenCL Platform Layer: allows the host program to discover OpenCL devices and their capabilities, and to create contexts.
OpenCL Runtime: allows the host program to manipulate contexts once they have been created.
OpenCL Compiler: creates program executables that contain OpenCL kernels. The OpenCL C programming language implemented by the compiler supports a subset of the ISO C99 language with extensions for parallelism.
OpenCL Context
A context contains one or more devices. OpenCL memory objects are associated with a context, not with a specific device. clCreateBuffer() is the main data-object allocation function; it is an error if an allocation is too large for any device in the context. Each device needs its own work queue(s), and memory transfers are associated with a command queue (and thus with a specific device).
Host Code:
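A minimal host-side sketch for the vadd kernel shown earlier, assuming an OpenCL 1.x platform; error checking is omitted for brevity, and in real code every cl* call's status should be checked. It must be linked against an OpenCL runtime (e.g. -lOpenCL), so it will not run without a driver and device installed.

```c
#include <CL/cl.h>
#include <stdio.h>

#define N 1024

int main(void) {
    float a[N], b[N], result[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    const char *src =
        "__kernel void vadd(__global const float *a,"
        "                   __global const float *b,"
        "                   __global float *result) {"
        "    int id = get_global_id(0);"
        "    result[id] = a[id] + b[id];"
        "}";

    /* Platform layer: discover a device and create a context */
    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Runtime: buffers belong to the context, not to one device */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof b, b, NULL);
    cl_mem dr = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof result,
                               NULL, NULL);

    /* Compiler: build the kernel source for this device at runtime */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dr, &dr);

    /* Launch over a 1D NDRange of N work items and read back */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dr, CL_TRUE, 0, sizeof result, result,
                        0, NULL, NULL);

    printf("result[10] = %f\n", result[10]);  /* 10 + 20 = 30 */
    return 0;
}
```

Leaving the local work-group size NULL in clEnqueueNDRangeKernel lets the implementation pick one, which is the simplest portable choice for a first example.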
Future of OpenCL:
The future lies with OpenCL, as it is an open standard not restricted to one vendor or specific hardware. Another reason is that AMD is going to release a new processor called Fusion. AMD Fusion is a new approach to processor design and software development, delivering powerful CPU and GPU capabilities for HD, 3D, and data-intensive workloads in a single-die processor called an APU. APUs combine high-performance serial and parallel processing cores with other special-purpose hardware accelerators, enabling breakthroughs in visual computing, security, performance-per-watt, and device form factor. This processor is a natural fit for OpenCL, which does not care what type of processor is available, as long as it can be used.
Conclusion:
OpenCL should attract HPC programmers because it is a long-term strategy for GPUs and other accelerators. It may be a complicated language for short applications, but it is very useful for more complex ones. There are some restrictions in OpenCL, but they do not affect the language's reliability. There will be other implementations of OpenCL in higher-level languages that will be easier for ordinary programmers. In the end, you might find OpenCL very difficult, but once you master it, you will be a master of parallel computing. There are already some requests for OpenCL programmers from UK companies.
Thank You