cf4ocl (C Framework for OpenCL)  v2.1.0
Object-oriented framework for developing and benchmarking OpenCL projects in C/C++
 All Data Structures Files Functions Variables Typedefs Enumerations Enumerator Macros Groups Pages
User guide

cf4ocl user guide.

The C Framework for OpenCL, cf4ocl, is a cross-platform pure C object-oriented framework for accelerating the development of OpenCL projects in C, with support for C++.

This user guide is organized as follows. First, in section Using the library, the basics of cf4ocl are introduced, followed by a more thorough discussion on how to use wrapper and other library modules. The utilities provided with cf4ocl are presented in section Using the utilities. Advanced topics, such as the wrapper architecure, are discussed in section Advanced.

Using the library

Basics

Library organization

The cf4ocl library offers an object-oriented interface to the OpenCL API using wrapper classes and methods (or structs and functions, in C terms), grouped into modules of the same name, as shown in the following table:

cf4ocl module cf4ocl wrapper class Wrapped OpenCL type
Platform module CCLPlatform* cl_platform_id
Device module CCLDevice* cl_device_id
Context module CCLContext* cl_context
Queue module CCLQueue* cl_command_queue
Program module CCLProgram* cl_program
Kernel module CCLKernel* cl_kernel
Event module CCLEvent* cl_event
MemObj module CCLMemObj* cl_mem
Buffer module CCLBuffer* cl_mem
Image module CCLImage* cl_mem
Sampler module CCLSampler* cl_sampler

Some of the provided methods directly wrap OpenCL functions (e.g. ccl_buffer_enqueue_copy()), while others perform a number of OpenCL operations in one function call (e.g. ccl_kernel_set_args_and_enqueue_ndrange()). The wrapper classes are organized in a hierarchical inheritance tree.

Additional modules are also available:

cf4ocl module Description
Device selector module Automatically select devices using filters.
Device query module Helpers for querying device information, mainly used by the ccl_devinfo program.
Errors module Convert OpenCL error codes into human-readable strings.
Platforms module Management of the OpencL platforms available in the system.
Profiler module Simple, convenient and thorough profiling of OpenCL events.

The new/destroy rule

The cf4ocl constructors and destructors have new and destroy in their name, respectively. In cf4ocl, the new/destroy rule states the following:

For each invoked constructor, the respective destructor must also be invoked.

This might seem obvious, but in many instances several objects are obtained using other (non-constructor) methods during the course of a program. These objects are automatically released and should not be destroyed by client code.

For example, it is possible to get a kernel belonging to a program using the ccl_program_get_kernel() function:

CCLKernel* krnl;
prg = ccl_program_new_from_source_file(ctx, "myprog.cl", NULL);
krnl = ccl_program_get_kernel(prg, "someKernel", NULL);

The returned kernel wrapper object will be freed when the program is destroyed; as such, there is no need to free it. Destroying the program will suffice:

Getting info about OpenCL objects

The ccl_<class>_get_info_<scalar|array>() macros can be used to get information about OpenCL objects. Use the array version when the expected return value is a pointer or array, or the scalar version otherwise (e.g. when the expected return value is primitive or scalar type).

For example, to get the name and the number of compute cores on a device:

CCLDevice* dev;
char* name;
cl_uint n_cores;
name = ccl_device_get_info_array(dev, CL_DEVICE_NAME, char*, NULL);
n_cores = ccl_device_get_info_scalar(dev, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint, NULL);

The ccl_<class>_get_info() macros serve more specific scenarios, and are likely to be used less often. These macros return a CCLWrapperInfo* object, which contains two public fields:

  • value - A pointer to the requested value as returned by the OpenCL clGet*Info() functions.
  • size - The size in bytes of the value pointed to by the value field.

To use the value, a cast should be performed on the value field to convert it to the required type (which is what the ccl_<class>_get_info_<scalar|array>() macros automatically do).

The values and objects returned by these macros are automatically released when the respective wrapper object is destroyed and should never be directly freed by client code.

Error handling

Error-reporting cf4ocl functions provide two methods for client-side error handling:

  1. The return value.
  2. CCLErr-based error reporting.

The first method consists of analysing the return value of a function. Error-throwing functions which return a pointer will return NULL if an error occurs. The remaining error-reporting functions return CL_FALSE if an error occurs (or CL_TRUE otherwise). Client code can check for errors by looking for NULL or CL_FALSE return values, depending on the function. This error handling method does not provide additional information about the reported error. For example:

prg = ccl_program_new_from_source_file(ctx, "program.cl", NULL);
if (!prg) {
fprintf(stderr, "An error ocurred");
exit(-1);
}

The second method is more flexible. A CCLErr object is initialized to NULL, and a pointer to it is passed as the last argument to the function being called. If the CCLErr object is still NULL after the function call, no error has occurred. Otherwise, an error occurred and it is possible to get a user-friendly error message:

CCLErr* err = NULL;
prg = ccl_program_new_from_source_file(ctx, "program.cl", &err);
if (err) {
fprintf(stderr, "%s", err->message);
exit(-1);
}

An error domain and error code are also available in the CCLErr object. The domain indicates the module or library in which the error was generated, while the code indicates the specific error that occurred. Three kinds of domain can be returned by error-reporting cf4ocl functions, each of them associated with distinct error codes:

Domain Codes Description
CCL_ERROR ccl_error_code enum Error in cf4ocl not related with external libraries
CCL_OCL_ERROR cl.h Error in OpenCL function calls
A GLib domain GLib-module dependent Error in GLib function call (file open/save, etc)

For example, it is possible for client code to act on different OpenCL errors:

CCLBuffer* buf;
CCLErr* err = NULL;
buf = ccl_buffer_new(ctx, flags, size, host_ptr, &err);
if (err) {
if (err->domain == CCL_OCL_ERROR) {
/* Check if it's OpenCL error. */
switch (err->code) {
/* Do different things depending on OpenCL error code. */
case CL_INVALID_VALUE:
/* Handle invalid values */
case CL_INVALID_BUFFER_SIZE:
/* Handle invalid buffer sizes */
case CL_INVALID_HOST_PTR:
/* Handle invalid host pointer */
/* Handle other OpenCL errors */
}
} else {
/* Handle other errors */
}
}

Finally, if client code wants to continue execution after an error was caught, it is mandatory to use the ccl_err_clear() function to free the error object and reset its value to NULL. Not doing so is a bug, especially if more error-reporting functions are to be called moving forward. For example:

CCLError* err = NULL;
prg = ccl_program_new_from_source_file(ctx, "program.cl", &err);
if (err) {
/* Print the error message, but don't terminate program. */
fprintf(stderr, "%s", err->message);
}

Even if the program terminates due to an error, the ccl_err_clear() function can be still be called to destroy the error object, avoiding memory leaks to be reported by tools such as Valgrind.

The internals of CCLErr-based error handling are discussed in further detail in section The GLib and OpenCL dependencies.

Wrapper modules

Each OpenCL class is associated with a cf4ocl module. At the most basic level, each module offers a wrapper class and functions which wrap or map their OpenCL equivalents. This, in itself, already simplifies working with OpenCL in C, because the cf4ocl wrapper classes internally manage related objects, which may be created during the course of a program. As such, client code just needs to follow the new/destroy rule for directly created objects, thus not having to worry with memory allocation/deallocation of intermediate objects.

In most cases, however, each cf4ocl module also provides methods for other common and not-so-common OpenCL host code patterns, allowing the programmer to avoid their verbosity and focus on OpenCL device code.

All cf4ocl wrapper classes extend the CCLWrapper* abstract wrapper class. The properties and methods of this class, which are concerned with reference counts, wrapping/unwrapping of OpenCL objects and getting object information, are essentially of internal use by other cf4ocl classes. This functionality is also available to client code which requires a more advanced integration with cf4ocl, as explained in the advanced section.

Several OpenCL objects, namely cl_platform_id, cl_context and cl_program, have a direct relationship with a set of cl_device_id objects. In order to map this relationship, cf4ocl provides the CCLDevContainer* class, which is an intermediate class between the CCLWrapper* parent class and the CCLPlatform*, CCLContext* and CCLProgram* wrappers. The CCLDevContainer* class implements functionality for managing a set of CCLDevice* wrapper instances. This functionality is exposed to client code through concrete wrapper methods. For example, the CCLContext* class provides the ccl_context_get_all_devices(), ccl_context_get_device() and ccl_context_get_num_devices() methods for this purpose.

Platform module

The platform wrapper module provides functionality for simple handling of OpenCL platform objects. Platform wrappers can be obtained in two ways:

  1. From a given CCLDevice* device wrapper, using the ccl_platform_new_from_device() function (in which case, the object must be destroyed with ccl_platform_destroy(), following the cf4ocl new/destroy rule).
  2. From a CCLPlatforms* object (which contains the list of platforms in the system), using the ccl_platforms_get() function. In this case the platform wrapper is automatically destroyed when the CCLPlatforms* object is freed; as such, client code should not directly destroy the platform wrapper. See the platforms module for more details.

The provided functions allow to get the device wrappers associated with the given platform wrapper, as well as to obtain the supported OpenCL version of the underlying OpenCL platform object.

Information about platform objects can be fetched using the platform info macros:

However, because the platform info return type is always char*, cf4ocl provides an additional helper macro for platform wrappers, ccl_platform_get_info_string(), which is simpler to use.

The CCLPlatform* class extends the CCLDevContainer* class; as such, it provides methods for handling a list of devices associated with the platform:

Usage example:

CCLPlatform* platf;
CCLDevice* dev;
char* platf_name;
cl_uint platf_ver;
platf = ccl_platform_new_from_device(dev, NULL);
platf_name =
ccl_platform_get_info_string(platf, CL_PLATFORM_NAME, NULL);
platf_ver = ccl_platform_get_opencl_version(platf, NULL);
printf("Platform name is %s\n: ", platf_name);
printf("Platform version is %f\n: ", platf_ver / 100.0f);

Device module

The device wrapper module provides functionality for simple handling of OpenCL device objects. In most cases, device wrapper objects should not be directly instanced by client code. They are usually fetched from device container objects such as CCLPlatform* or CCLContext* instances, or created as sub-devices using the ccl_device_create_subdevices() function. In either case, when the parent object is destroyed, the associated devices (or sub-devices) are also destroyed. As such, and in accordance with the cf4ocl new/destroy rule, the ccl_device_destroy() destructor function will rarely be used.

Information about device objects can be fetched using the device info macros:

Example: getting the first device in a context

CCLDevice* dev;
dev = ccl_context_get_device(ctx, 0, NULL);

Context module

The context wrapper module provides functionality for simple handling of OpenCL context objects. Context wrappers can be created using three different approaches:

  1. From a list of CCLDevice* device wrappers, using the ccl_context_new_from_devices_full() function or the ccl_context_new_from_devices() macro.
  2. From a list of CCLDevSelFilters* device filters, using the ccl_context_new_from_filters_full() function or the ccl_context_new_from_filters() macro. This is a very flexible mechanism, which is explained in detail in the device selection module section.
  3. Using one of the several convenience constructors, which contain predefined filters, such as ccl_context_new_gpu(), ccl_context_new_any() or ccl_context_new_from_menu().

Instantiation and destruction of context wrappers follows the cf4ocl new/destroy rule; as such, context wrapper objects must be released with the ccl_context_destroy() function.

Information about context objects can be fetched using the context info macros:

The CCLContext* class extends the CCLDevContainer* class; as such, it provides methods for handling a list of devices associated with the context:

Example: using all devices in a platform

CCLPlatform* platf;
const* CCLDevice* devs;
cl_uint num_devs;
devs = ccl_platform_get_all_devices(platf, NULL);
num_devs = ccl_platform_get_num_devices(platf, NULL);
ctx = ccl_context_new_from_devices(num_devs, devs, NULL);

Example: select device from menu

Command queue module

The command queue wrapper module provides functionality for simple handling of OpenCL command queue objects. Queue wrappers can be instantiated with the ccl_queue_new() and ccl_queue_new_full() constructors. While both constructors can be used with any OpenCL version, the later is targeted for OpenCL >= 2.0, exposing features such as on-device queues to client code. If OpenCL >= 2.0 features are requested for platforms which do not support them, a warning will be logged and the queue will be created without the unsupported features.

Instantiation and destruction of queue wrappers follows the cf4ocl new/destroy rule; as such, queues should be freed with the ccl_queue_destroy() destructor.

Queue wrappers created with the CL_QUEUE_PROFILING_ENABLE property can be automatically profiled with the profiler module.

Information about queue objects can be fetched using the info macros:

Example:

/* Wrappers for OpenCL objects. */
CCLQueue* queue;
/* Error handling object (must be initialized to NULL). */
CCLErr* err = NULL;

/* Create a command queue. */
queue = ccl_queue_new(ctx, dev, 0, &err);

/* Release wrappers. */

Memory object module

The memory object wrapper module provides functionality for simple handling of generic OpenCL memory objects. All the functions in this module are direct wrappers of the respective OpenCL memory object functions, with the exception of ccl_memobj_get_opencl_version(), which returns the OpenCL version of the platform associated with the memory object.

For specific buffer and image handling, see the buffer wrapper and image wrapper modules.

Information about memory objects can be fetched using the memory object info macros:

Buffer module

The buffer wrapper module provides functionality for simple handling of OpenCL buffer objects. All the functions in this module are direct wrappers of the respective OpenCL buffer functions, except for the ccl_buffer_new_from_region() function. This function wraps clCreateSubBuffer() but assumes that the sub-buffer will represent a specific region in the original buffer (which is the only sub-buffer type, up to OpenCL 2.1).

Buffer wrapper objects can be directly passed as kernel arguments to functions such as ccl_kernel_set_args_and_enqueue_ndrange() or ccl_kernel_set_args_v().

Information about buffer objects can be fetched using the info macros from the memory object module:

Instantiation and destruction of buffer wrappers follows the cf4ocl new/destroy rule.

Example:

CCLBuffer* buf;
cl_float host_data[BSIZE];
size_t buf_size = BSIZE * sizeof(cl_float);
context, CL_MEM_READ_WRITE, buf_size, NULL, NULL);
ccl_buffer_enqueue_write(queue, buf, CL_TRUE, 0, buf_size,
host_data, NULL, NULL);
ccl_buffer_enqueue_read(queue, buf, CL_TRUE, 0, buf_size,
host_data, NULL, NULL);

Image module

The image wrapper module provides functionality for simple handling of OpenCL image objects. All the functions in this module are direct wrappers of the respective OpenCL image functions. The ccl_image_new() constructor accepts a variable list of arguments which describe the image to be created. There is also the ccl_image_new_v() constructor, which accepts the image description arguments given in a CCLImageDesc* object. Both constructors will automatically use the old style clCreateImage2D()/clCreateImage3D() OpenCL functions if the OpenCL version of the underlying platform is less or equal than 1.1, or the new clCreateImage() constructor otherwise. Instantiation and destruction of image wrappers follows the cf4ocl new/destroy rule; as such, images should be freed with the ccl_image_destroy() destructor.

Image wrapper objects can be directly passed as kernel arguments to functions such as ccl_program_enqueue_kernel() or ccl_kernel_set_arg().

Information about image objects can be fetched using the image info macros:

If the information to be fetched is relative to the memory object parent class (e.g. CL_MEM_TYPE or CL_MEM_FLAGS), then the memory object module info macros should be used instead:

Example:

CCLQueue* queue;
CCLImage* img_in;
CCLImage* img_out;
cl_uchar4 host_img[IMG_X * IMG_Y];
cl_image_format image_format = { CL_RGBA, CL_UNSIGNED_INT8 };
size_t origin[3] = { 0, 0, 0 };
size_t region[3] = { IMG_X, IMG_Y, 1 };
img_in = ccl_image_new(ctx, CL_MEM_READ_ONLY, &image_format, NULL, &err,
"image_type", (cl_mem_object_type) CL_MEM_OBJECT_IMAGE2D,
"image_width", (size_t) IMG_X,
"image_height", (size_t) IMG_Y,
NULL);
img_out = ccl_image_new(ctx, CL_MEM_WRITE_ONLY, &image_format, NULL, &err,
"image_type", (cl_mem_object_type) CL_MEM_OBJECT_IMAGE2D,
"image_width", (size_t) IMG_X,
"image_height", (size_t) IMG_Y,
NULL);
ccl_image_enqueue_write(queue, img_in, CL_TRUE, origin, region,
0, 0, host_img, NULL, NULL);

enqueue some image processing kernel...

ccl_image_enqueue_read(queue, img_out, CL_TRUE, origin, region,
0, 0, host_img, NULL, NULL);

Sampler module

The sampler wrapper module provides functionality for simple handling of OpenCL sampler objects. Sampler wrapper objects can be instanced with two constructors:

The former follows the constructor format in OpenCL 1.0-1.2, where three main sampler properties are directly given as constructor arguments, namely the normalized coordinates flag, the addressing mode and the filter mode. The later uses the OpenCL 2.0 constructor format, in which sampler properties are given in a zero-terminated array of cl_sampler_properties. Both formats can be used without concern for the underlying platform's OpenCL version, because cf4ocl will automatically select the most adequate OpenCL constructor.

Sampler wrapper objects should be freed with the ccl_sampler_destroy() function, in accordance with the cf4ocl new/destroy rule.

Sampler wrapper objects can be directly passed as kernel arguments to functions such as ccl_kernel_set_args_and_enqueue_ndrange() or ccl_kernel_set_args().

Information about sampler objects can be fetched using the sampler info macros:

Example:

CCLContext* ctx;
CCLSampler* smplr;
CCLErr* err = NULL;

smplr = ccl_sampler_new(ctx, CL_FALSE, CL_ADDRESS_CLAMP_TO_EDGE,
CL_FILTER_NEAREST, &err);

Program module

The program wrapper module provides functionality for simple handling of OpenCL program objects. OpenCL program objects can be created from source code, from binary data or from built-in kernels using the clCreateProgramWithSource(), clCreateProgramWithBinary() or clCreateProgramWithBuiltInKernels(), respectively (the later requires OpenCL >= 1.2). cf4ocl provides a set of CCLProgram* program wrapper constructors which not only map the native OpenCL program constructors, but also extend some of their functionality.

For creating programs from source code, cf4ocl provides the following constructors:

Program constructors which use binary data follow the same pattern as their source code counterparts:

The ccl_program_new_from_built_in_kernels() constructor directly wraps the native OpenCL clCreateProgramWithBuiltInKernels() function, allowing to create programs from built-in kernels. This method is only available for platforms which support OpenCL version 1.2 or higher.

Like most cf4ocl wrapper objects, program wrapper objects follow the new/destroy rule, and should be released with the ccl_program_destroy() destructor.

The ccl_program_build() and ccl_program_build_full() methods allow to build a program executable from the program source or binary. While the later directly maps the native clBuildProgram() OpenCL function, the former provides a simpler interface which will be useful in many situations.

Compilation and linking (which require OpenCL >= 1.2) are provided by the ccl_program_compile() and ccl_program_link() functions.

Information about program objects can be obtained using the program module info macros:

However, program binaries cannot be retrieved using these macros. Consequently, cf4ocl provides a specific and straightforward API for handling them:

Program build information can be obtained using a specific set of info macros:

For simple programs and kernels, the program wrapper module offers three functions, which can be used after a program is built:

  • ccl_program_get_kernel() - Get the kernel wrapper object for the given program kernel function.
  • ccl_program_enqueue_kernel() - Enqueues a program kernel function for execution on a device, accepting kernel arguments as NULL-terminated variable list of parameters.
  • ccl_program_enqueue_kernel_v() - Enqueues a program kernel function for execution on a device, accepting kernel arguments as NULL-terminated array of parameters.

Program wrapper objects only keep one kernel wrapper instance per kernel function; as such, for a given kernel function, these methods will always use the same kernel wrapper instance (and consequently, the same OpenCL kernel object). While this will work for single-threaded host code, it will fail if the same kernel wrapper is invoked from different threads. In such cases, use the kernel wrapper module API for handling kernel wrapper objects.

The CCLProgram* class extends the CCLDevContainer* class; as such, it provides methods for handling a list of devices associated with the program:

Example:

Kernel code:

__kernel void sum(__global const uint *a, __global const uint *b,
__global uint *c, uint d, uint buf_size) {
/* Get global ID. */
uint gid = get_global_id(0);
/* Only perform sum if this workitem is within the size of the
* vector. */
if (gid < buf_size)
c[gid] = a[gid] + b[gid] + d;
}

Host code:

int main(int argc, char** argv) {

/* Wrappers. */
CCLContext* ctx = NULL;
CCLProgram* prg = NULL;
CCLDevice* dev = NULL;
CCLQueue* queue = NULL;
CCLKernel* krnl = NULL;
CCLBuffer* a_dev;
CCLBuffer* b_dev;
CCLBuffer* c_dev;
CCLEvent* evt_exec;

/* Global and local worksizes. */
size_t gws = 0;
size_t lws = 0;

/* Error reporting object. */
CCLErr* err = NULL;

/* Create a new program from kernel source. */
prg = ccl_program_new_from_source(ctx, KERNEL_SRC, &err);

/* Build program. */
ccl_program_build(prg, NULL, &err);

evt_exec = ccl_program_enqueue_kernel(prg, KERNEL_NAME, queue, 1,
NULL, &gws, &lws, &ewl, &err,
/* Kernel arguments. */
a_dev, b_dev, c_dev,
ccl_arg_priv(d_host, cl_uint), ccl_arg_priv(buf_n, cl_uint),
NULL);

/* Destroy wrappers. */

}

Kernel module

The kernel wrapper module provides functionality for simple handling of OpenCL kernel objects. Kernel wrappers can be obtained using two approaches:

  1. Using the ccl_program_get_kernel() function. This function always returns the same kernel wrapper object (with the same underlying OpenCL kernel object) associated with a program. The returned object is automatically freed when the program wrapper object is destroyed; as such, client code should not call ccl_kernel_destroy().
  2. Using the ccl_kernel_new() constructor. The created kernel wrapper should be released with the ccl_kernel_destroy() function, in accordance with the cf4ocl new/destroy rule.

While the first approach might be more convenient, it will not work properly if the same kernel function is to be handled and executed by different threads. In these cases, use the second approach to create distinct kernel wrapper instances (wrapping distinct OpenCL kernel objects) for the same kernel function, one for each thread.

This module offers several functions which simplify kernel execution. For example, the ccl_kernel_set_args_and_enqueue_ndrange() function can set all kernel arguments and execute the kernel in one call.

Information about kernel objects can be fetched using the kernel info macros:

Six additional macros are provided for getting kernel workgroup info and kernel argument info (the later are only available from OpenCL 1.2 onwards). These work in the same way as the regular info macros:

Example: getting a kernel wrapper from a program wrapper

CCLKernel* krnl;
krnl = ccl_program_get_kernel(prg, "some_kernel", NULL);

Example: creating a kernel wrapper

CCLKernel* krnl;
krnl = ccl_kernel_new(prg, "some_kernel", NULL);

Kernel arguments module

This module defines the CCLArg* class which wraps kernel arguments. Several functions in the kernel wrapper module, such as ccl_kernel_set_args() or ccl_kernel_set_args_and_enqueue_ndrange(), accept kernel arguments as parameters. CCLBuffer*, CCLImage* and CCLSampler* objects can be directly passed as global kernel arguments to these functions. However, local and private kernel arguments need to be passed using the macros provided in this module, namely ccl_arg_local() and ccl_arg_priv(), respectively.

The ccl_arg_skip constant can be passed to methods which accept a variable list of ordered arguments in order to skip a specific argument.

Example:

Kernel code:

__kernel void my_kernel(
__global int* g, __local int *l, __private float p) {
}

Host code:

#define LOC_SIZE 16
const cl_float pi=3.1415;
CCLKernel* krnl;
CCLBuffer* buf;
krnl = ccl_program_get_kernel(prg, "my_kernel", NULL);
ccl_kernel_set_args(krnl, buf, ccl_arg_local(LOC_SIZE, cl_int),
ccl_arg_priv(pi, cl_float), NULL);
Note
The ccl_arg_local() and ccl_arg_priv() macros invoke the ccl_arg_new() function, which returns a new CCLArg* object. CCLArg* objects are destroyed when the kernel to which they were passed is released. For further control of argument instantiation, client code can use the ccl_arg_full() macro instead of the ccl_arg_new() function in order to respect the new/destroy rule.
Attention
A CCLArg* object can only be passed once to a kernel. One way to guarantee this is to use the macros directly when setting the kernel arguments, as shown in the example above.

Event module

The event wrapper module provides functionality for simple handling of OpenCL event objects. Typically, event wrappers will not be directly created by client code. They are returned by event producing functions (such as ccl_image_enqueue_write(), which wraps the clEnqueueWriteImage() OpenCL function). As such, and in accordance with the new/destroy rule, regular event wrappers objects should not be destroyed by client code. They are automatically released when the command queue wrapper where the event took place is destroyed. User events (OpenCL >= 1.1), created with the ccl_user_event_new() constructor, are the only exception. These are special events which allow applications to enqueue commands that wait on user-controlled occurrences before the command is executed by the device. These events should be destroyed with ccl_event_destroy().

The event wait list section provides additional information on how to use events to synchronize the execution of OpenCL commands. Events are also used by the profiler module, although indirectly via CCLQueue* wrappers, to profile and benchmark applications.

Information about event objects can be fetched using the respective info macros:

Three additional macros are provided for getting event profiling info. These work in the same way as the regular info macros:

Event wait lists module

This module provides simple management of event wait lists. Client code must initialize CCLEventWaitList variables to NULL, and can reuse them between ccl_*_enqueue_*() function calls. No allocation and deallocation of events and event wait lists is required if populated event wait lists are consumed by ccl_*_enqueue_*() functions; otherwise, unused non-empty event wait lists should be freed with the ccl_event_wait_list_clear() function.

Example 1:

CCLEvent *evt1, *evt2, *evt3;
CCLEventWaitList evt_wait_lst = NULL;
evt1 = ccl_buffer_enqueue_write(cq, a_dev, CL_FALSE, 0, size, a_host, NULL, NULL);
evt2 = ccl_buffer_enqueue_write(cq, b_dev, CL_FALSE, 0, size, b_host, NULL, NULL);
ccl_event_wait_list_add(&evt_wait_lst, evt1, evt2, NULL);
evt3 = ccl_kernel_enqueue_ndrange(krnl, cq, dim, offset, gws, lws, &evt_wait_lst, NULL);
ccl_event_wait_list_add(&evt_wait_lst, evt3, NULL);
ccl_buffer_enqueue_read(cq, c_dev, CL_TRUE, 0, size, c_host, &evt_wait_lst, NULL);

Example 2:

CCLEvent *evt = NULL;
CCLEventWaitList evt_wait_lst = NULL;
evt = ccl_buffer_enqueue_write(cq1, dev, CL_FALSE, 0, size, a_host, NULL, NULL);
ccl_kernel_enqueue_ndrange(krnl, cq2, dim, offset, gws, lws,
ccl_ewl(&evt_wait_lst, evt, NULL), NULL);

Other modules

Device selector module

This module offers a mechanism for selecting OpenCL devices, mainly for context creation, although its functionality can be used for other purposes. The ccl_context_new_from_filters_full() context wrapper constructor (and the ccl_context_new_from_filters() macro) accepts a CCLDevSelFilters object containing a set of filters. These filters define which devices can be used in the context. Instances of CCLDevSelFilters must be initialized to NULL:

/* cf4ocl objects. */
CCLContext* ctx;
CCLDevSelFilters filters = NULL;

Filters can then be added to the CCLDevSelFilters object with the ccl_devsel_add_dep_filter() and ccl_devsel_add_indep_filter() functions, which add dependent or independent filters, respectively. Filters are processed in the order in which they are added. The next example shows how to add an independent and a dependent filter to the CCLDevSelFilters object:

/* Add independent filter which accepts CPU devices. */
/* Add "same platform" dependent filter. This filter should always be added
* (usually in last position) for context creation, because all devices in a
* context must belong to the same platform. */

At this stage, the most common use for the CCLDevSelFilters object is to pass it to a context wrapper constructor:

/* Create context wrapper, which must have at least one device. */
ctx = ccl_context_new_from_filters(&filters, &err);

/* Free context. */

However, the CCLDevSelFilters object can also be used for explicit device selection:

/* Perform device selection. */
devices = ccl_devsel_select(&filters, &err);

The CCLDevSelDevices object represented by devices is just a GLib pointer array, so we have direct access to the list of device wrappers and its size. For example, let us list the filtered devices by name:

/* List selected devices. */
if (devices->len > 0) {
printf("%d devices were accepted by the filters:\n", devices->len);
for (unsigned int i = 0; i < devices->len; ++i) {
devices->pdata[i], CL_DEVICE_NAME, char*, &err);

printf("\t%d - %s\n", i + 1, dev_name);
} /* For */
} /* If */
/* Free array object containing device wrappers. */

CCLDevSelFilters objects are automatically freed and reset to NULL when passed to context wrapper constructors or to the ccl_devsel_select() function.

See the complete example here.

The device selector module provides two additional helper functions which return and print a list of all OpenCL devices available in the system, respectively:

Finally, the device selector module also offers the ccl_devsel_devices_new() function, which returns a CCLDevSelDevices object containing device wrappers for all OpenCL devices in the system. This array should not be modified directly, and when no longer required, should be freed with ccl_devsel_devices_destroy().

Device query module

This module facilitates the querying of OpenCL devices. It primarily supports the ccl_devinfo utility, but may also be of use to client code.

Errors module

This module offers a function to convert OpenCL error codes into human-readable strings. It is widely used by other cf4ocl modules, but may also be useful to client code which directly uses OpenCL functions.

Example:

cl_int status;
cl_event event;
status = clWaitForEvents(1, &event);
if (status != CL_SUCCESS) {
fprintf(stderr, "OpenCL error %d: %s", status, ccl_err(status));
}

Platforms module

The platforms module provides functionality for managing the OpenCL platforms available in the system. The ccl_platforms_new() function returns a list of platforms available in the system. The ccl_platforms_count() can be used to get the number of platforms in the list, while the ccl_platforms_get() will return the $i^{th}$ platform.

Example:

/* cf4ocl objects. */
CCLPlatforms* platf_list;
CCLPlatform* platf;

/* Other variables. */
cl_uint num_platfs;
CCLErr* err = NULL;

/* Get all platforms in system. */
platf_list = ccl_platforms_new(&err);

/* Get number of platforms in system. */
num_platfs = ccl_platforms_count(platf_list);

/* Cycle through platforms. */
for (cl_uint i = 0; i < num_platfs; ++i) {
/* Get current platform. */
platf = ccl_platforms_get(platf_list, i);

} /* Cycle platforms. */
/* Release platform set, which will release the underlying
* platform wrappers, device wrappers and the requested info. */
ccl_platforms_destroy(platf_list);

Profiler module

The profiler module provides classes and methods for profiling wrapped OpenCL events and queues.

Warning
The functions in this module are not thread-safe.

The profiling module offers two methods for obtaining information about the performed computations:

  1. Detailed profiling of OpenCL events using the ccl_prof_add_queue() function.
  2. Simple (and optional) timming of the performed computations using the ccl_prof_start() and ccl_prof_stop() functions. If these function are used, the measured time will be taken into account by the ccl_prof_*_summary() functions.

In order to use the first method, the CL_QUEUE_PROFILING_ENABLE property should be specified when creating command queue wrappers with ccl_queue_new() or ccl_queue_new_full().

After all the computations and memory transfers take place, the utilized queue wrappers are passed to the profiler using the ccl_prof_add_queue() function. The ccl_prof_calc() function can then be called to perform the required analysis.

At this stage, different types of profiling information become available, and can be iterated over:

  1. Aggregate event information: absolute and relative durations of all events with same name, represented by the CCLProfAgg* class. If an event name is not set during the course of the computation, the aggregation is performed by event type, i.e., by events which represent the same command. A sequence of CCLProfAgg* objects can be iterated over using the ccl_prof_iter_agg_init() and ccl_prof_iter_agg_next() functions. A specific aggregate event can be obtained by name using the ccl_prof_get_agg() function.
  2. Non-aggregate event information: event-specific information, represented by the CCLProfInfo* class, such as event name (or type, if no name is given), the queue the event is associated with, and submit, queue, start and end instants. A sequence of CCLProfInfo* objects can be iterated over using the ccl_prof_iter_info_init() and ccl_prof_iter_info_next() functions.
  3. Event instants: specific start and end event instants, represented by the CCLProfInst* class. A sequence of CCLProfInst* objects can be iterated over using the ccl_prof_iter_inst_init() and ccl_prof_iter_inst_next() functions.
  4. Event overlaps: information about event overlaps, represented by the CCLProfOverlap* class. Event overlaps can only occur when more than one queue is used on the same device. A sequence of CCLProfOverlap* objects can be iterated over using the ccl_prof_iter_overlap_init() and ccl_prof_iter_overlap_next() functions.

While this information can be subject to different types of examination by client code, the profiler module also offers some functionality which allows for a more immediate interpretation of results:

  1. A summary of the profiling analysis can be obtained or printed with the ccl_prof_get_summary() or ccl_prof_print_summary() functions, respectively.
  2. An exported list of CCLProfInfo* data, namely queue name, start instant, end instant and event name, sorted by start instant, can be opened by the plot events script to plot a Gantt-like chart of the performed computation. Such list can be exported with the ccl_prof_export_info() or ccl_prof_export_info_file() functions, using the default export options.

Example: Conway's game of life using double-buffered images (complete example)

/* Wrappers for OpenCL objects. */
CCLQueue* queue_exec;
CCLQueue* queue_comm;
CCLProf* prof;
/* Error handling object (must be NULL). */
CCLErr* err = NULL;

/* Create command queues. */
queue_exec = ccl_queue_new(ctx, dev, CL_QUEUE_PROFILING_ENABLE, &err);
queue_comm = ccl_queue_new(ctx, dev, CL_QUEUE_PROFILING_ENABLE, &err);

/* Start profiling. */
prof = ccl_prof_new();
/* Write initial state. */
ccl_image_enqueue_write(img1, queue_comm, CL_TRUE,
origin, region, 0, 0, input_image, NULL, &err);

/* Run CA_ITERS iterations of the CA. */
for (cl_uint i = 0; i < CA_ITERS; ++i) {
/* Read result of last iteration. On first run it is the initial
* state. */
evt_comm = ccl_image_enqueue_read(img1, queue_comm, CL_FALSE,
origin, region, 0, 0, output_images[i], NULL, &err);

/* Execute iteration. */
krnl, queue_exec, 2, NULL, gws, lws, NULL, &err,
img1, img2, NULL);

}
/* Read result of last iteration. */
ccl_image_enqueue_read(img1, queue_comm, CL_TRUE,
origin, region, 0, 0, output_images[CA_ITERS], &ewl, &err);

/* Stop profiling timer and add queues for analysis. */
ccl_prof_add_queue(prof, "Comms", queue_comm);
ccl_prof_add_queue(prof, "Exec", queue_exec);

/* Process profiling info. */
ccl_prof_calc(prof, &err);

/* Print profiling info. */
/* Save profiling info. */
ccl_prof_export_info_file(prof, "prof.tsv", &err);

/* Release wrappers. */
ccl_queue_destroy(queue_comm);
ccl_queue_destroy(queue_exec);

/* Destroy profiler. */

The output of ccl_prof_print_summary() will be something like:

   Aggregate times by event  :
     ------------------------------------------------------------------
     | Event name                     | Rel. time (%) | Abs. time (s) |
     ------------------------------------------------------------------
     | NDRANGE_KERNEL                 |       97.2742 |    3.7468e-02 |
     | READ_IMAGE                     |        2.6747 |    1.0303e-03 |
     | WRITE_IMAGE                    |        0.0511 |    1.9690e-05 |
     ------------------------------------------------------------------
                                      |         Total |    3.8518e-02 |
                                      ---------------------------------
   Event overlaps            :
     ------------------------------------------------------------------
     | Event 1                | Event2                 | Overlap (s)  |
     ------------------------------------------------------------------
     | READ_IMAGE             | NDRANGE_KERNEL         |   1.3618e-04 |
     ------------------------------------------------------------------
                              |                  Total |   1.3618e-04 |
                              -----------------------------------------
   Tot. of all events (eff.) : 3.838198e-02s
   Total ellapsed time       : 4.295200e-02s
   Time spent in device      : 89.36%
   Time spent in host        : 10.64%

Instead of the default command-based event names such as NDRANGE_KERNEL, specific names can be set with the ccl_event_set_name() function. This allows to: (a) separate the aggregation of events of the same type (e.g., differentiate between the execution of two different kernels); and, (b) aggregate events of different types (e.g., aggregate reads and writes into a single "comms" event).

The ccl_plot_events script can be used to plot a Gantt-like chart of the events which took place in the queues. Running the following command...

$ python ccl_plot_events.py prof.tsv

...will produce the following image:

gantt_ca.png

Bundled utilities

cf4ocl is bundled with the following utilities:

  • ccl_devinfo - Utility to query OpenCL platforms and devices.
  • ccl_c - Utility for offline compilation and linking of OpenCL kernels.
  • ccl_plot_events - Plots a Gantt-like chart of OpenCL events using the profiling info exported using the profiler module.

Advanced

Wrapper architecture

The wrapper classes, which wrap OpenCL types, are implemented using an object-oriented (OO) approach in C. While C does not directly provide OO constructs, it is possible to implement features such as inheritance, polymorphism or encapsulation. Using this approach, cf4ocl is able to offer a clean and logical class system, while being available in a form (C) which can be directly or indirectly invoked from other programming languages.

Each cf4ocl wrapper class is defined by a source (.c) file and a header (.h) file. The former contains the private class properties and the method implementations, while the later defines its public API. The class body is implemented in the source file as a C struct; the header file provides an opaque pointer to it, which is the public side of the class from a client code perspective. Inheritance is implemented by including a member representing the parent class struct in the body of the child class struct. This requires the sharing of parent class implementations. In order to keep these opaque, the respective struct is defined in "private" header files which are not included in the public API. This way, instances of the child class can be cast to its parent type when required. The child class struct effectively extends the parent class struct. An example of this approach can be shown with the definitions of the abstract CCLWrapper* class and of the concrete CCLEvent* class, which extends CCLWrapper*:

In _ccl_abstract_wrapper.h (not part of public API):

/* Base class for all OpenCL wrappers. */
struct ccl_wrapper {
/* The class or type of wrapped OpenCL object. */
CCLClass class;
/* The wrapped OpenCL object. */
void* cl_object;
/* Information about the wrapped OpenCL object. */
/* Reference count. */
int ref_count;
};

In ccl_common.h:

/* Event wrapper class type declaration. */
typedef struct ccl_event CCLEvent;

In ccl_event_wrapper.c:

/* Event wrapper class, extends CCLWrapper */
struct ccl_event {
/* Parent wrapper object. */
CCLWrapper base;
/* Event name, for profiling purposes only. */
const char* name;
};

Methods are implemented as functions which accept the object on which they operate as the first parameter. When useful, function-like macros are also used as class methods, such as the case of the info macros. Polymorphism is not used, as the so called "abstract" methods are just functions which provide common operations to concrete methods, named differently for each concrete class. For example, the ccl_dev_container_get_device() abstract method is called by the ccl_context_get_device(), ccl_platform_get_device() and ccl_program_get_device() concrete methods, for which it provides common functionality.

The cf4ocl class hierarchy is shown in the following inheritance diagram:

dot_inline_dotgraph_1.png

The CCLWrapper base class

The CCLWrapper* base class is responsible for common functionality of wrapper objects, namely:

  • Wrapping/unwrapping of OpenCL objects and maintaining a one-to-one relationship between wrapped objects and wrapper objects
  • Low-level memory management (allocation and deallocation)
  • Reference counting
  • Information handling (i.e., handling of data returned by the several clGet*Info() OpenCL functions)

Wrapper constructors create the OpenCL object to be wrapped, but delegate memory allocation to the special ccl_<class>_new_wrap() functions. These accept the OpenCL object, and in turn call the ccl_wrapper_new() function, passing it not only the object, but also the size in bytes of the wrapper to be created. The ccl_wrapper_new() function allocates memory for the wrapper (initializing this memory to zero), and keeps the OpenCL object (wrapping it) in the created wrapper instance. For example, the ccl_kernel_new() creates the cl_kernel object with the clCreateKernel() OpenCL function, but then relies on the ccl_kernel_new_wrap() function (and thus, on ccl_wrapper_new()) for allocation and initialization of the new CCLKernel* wrapper object memory.

The destruction of wrapper objects and respective memory deallocation is performed in a similar fashion. Each wrapper class has its own ccl_<class>_destroy() method, but this method delegates actual object release to the "abstract" ccl_wrapper_unref() function. This function accepts the wrapper to be destroyed, its size in bytes, and two function pointers: the first, with prototype defined by ccl_wrapper_release_fields(), is a wrapper specific function for releasing internal wrapper objects, which the super class has no knowledge of; the second is the OpenCL object destructor function, with prototype defined by ccl_wrapper_release_cl_object(). Continuing on the kernel example, the ccl_kernel_destroy() method delegates kernel wrapper destruction to ccl_wrapper_unref(), passing it the kernel wrapper object, its size (i.e. sizeof( CCLKernel )), the "private" (static in C) ccl_kernel_release_fields() function for destroying kernel internal objects, and the clReleaseKernel() OpenCL kernel destructor function.

As such, all cf4ocl wrapper objects use a common memory allocation and deallocation strategy, implemented in the CCLWrapper* super class.

The ccl_<class>_new_wrap() special constructors respect the new/destroy rule. Wrappers created with their special constructor must be released with the respective ccl_<class>_destroy() function. This allows client code to create OpenCL objects directly with OpenCL functions, and then wrap the objects to take advantage of cf4ocl functionality and features. The OpenCL object can be retrieved from its wrapper at all times with the respective ccl_<class>_unwrap method.

If ccl_<class>_new_wrap() functions are passed an OpenCL object which is already wrapped, a new wrapper will not be created. Instead, the existing wrapper is returned, with its reference count increased by 1. Thus, there is always a one-to-one relationship between wrapped OpenCL objects and their respective wrappers. In reality, the ccl_<class>_destroy() functions decreases the reference count of the respective wrapper, only destroying it if the reference count reaches zero. Client code can increase and decrease the reference count of a wrapper object using the associated ccl_<class>_ref() and ccl_<class>_unref() macros. The ccl_<class>_ref() macros call the ccl_wrapper_ref() function, casting the wrapper to its base class (CCLWrapper*), while the ccl_<class>_unref() macros are just aliases for the respective ccl_<class>_destroy() functions.

The CCLWrapper* class maintains a static hash table which associates OpenCL objects (keys) to cf4ocl wrappers (values). Access to this table is thread-safe and performed by the ccl_wrapper_new() and ccl_wrapper_unref() functions.

The management of OpenCL object information is also handled by the CCLWrapper* class. The ccl_wrapper_get_info() method accepts two wrapper objects, the first being the object to query; the second is an auxiliary object required by some lower-level OpenCL info functions, such clGetKernelWorkGroupInfo(), which requires a device object besides the kernel object. ccl_wrapper_get_info() also requires the specification of the type of query to perform via the CCLInfo enumeration. ccl_wrapper_get_info() returns a CCLWrapperInfo* object, which contains two public properties: the queried value and its size. To be useful, the value must be cast to the correct type. The ccl_wrapper_get_info_value() and ccl_wrapper_get_info_size() methods call ccl_wrapper_get_info(), but directly return the value and size of the CCLWrapper* object, respectively.

The requested information is kept in the information table of the respective wrapper object. When the wrapper object is destroyed, all the information objects are also released. As such, client code does not need to worry about freeing objects returned by the ccl_wrapper_get_info*() methods. These also accept a use_cache boolean argument, which if true, causes the methods to first search for the information in the wrappers information table, in case it has already been requested; if not, they proceed with the query as normal.

Client code will commonly use the info macros of each wrapper in order to fetch information about the underlying OpenCL object. These macros expand into the ccl_wrapper_get_info*() methods, automatically casting objects and values to the appropriate type, selecting the correct clGet*Info() function for the object being queried. The cache is never used by the info macros.

The CCLDevContainer class

The intermediate CCLDevContainer* class provides functionality for managing a set of CCLDevice* wrapper instances, abstracting code common to the CCLPlatform*, CCLContext* and CCLProgram* classes, all of which internally keep a set of devices. The CCLDevContainer* class contains three "abstract" methods for accessing the associated set of CCLDevice* wrappers, namely:

  • ccl_dev_container_get_all_devices() : get all CCLDevice* wrappers in device container object.
  • ccl_dev_container_get_device() : get CCLDevice* wrapper at given index.
  • ccl_dev_container_get_num_devices() : return number of devices in device container object.

Concrete wrapper implementations rely on this functionality, which is exposed to client code via specific methods, e.g. in the case of CCLProgram* objects, these are ccl_program_get_all_devices(), ccl_program_get_device() and ccl_program_get_num_devices(), respectively.

The CCLMemObj class

The relationship between the CCLMemObj* class and the CCLBuffer* and CCLImage* classes follows that of the respective OpenCL types. In other words, both OpenCL images and buffers are memory objects with common functionality, and cf4ocl directly maps this relationship with the respective wrappers.

The GLib and OpenCL dependencies

cf4ocl relies heavily on its two dependencies: GLib and OpenCL. In previous versions of cf4ocl no special care was taken to shield client code from these dependencies, and in some cases client applications were required to link against them during the build process. However, developers may not wish to tackle additional libraries, and keeping cf4ocl self-contained promotes a cleaner build process. As such, since version 2.1.0, client applications are not required to link against either GLib or OpenCL, except if they specifically wish to use the functionality provided by these libraries.

One feature of GLib previously exposed to cf4ocl client code was its error reporting framework, since cf4ocl uses it for internal and external error handling. From version 2.1.0 onwards, this framework is no longer directly exposed to client code. Two backward-compatible changes were implemented:

Newly written client code should use the CCLErr class and the ccl_err_clear() function in order to avoid a direct dependency on GLib.

A second aspect of GLib indirectly exposed to client code is the use of its pointer arrays as the underlying type for the CCLDevSelDevices, CCLDevSelFilters and CCLEventWaitList classes. The last two are automatically freed in typical client code usage, but there can be situations in which CCLDevSelDevices objects may have to be explicitly released. This is can be accomplished with the ccl_devsel_devices_destroy() function, which is a wrapper for GLib's g_ptr_array_free() function. Thus, client code never needs to directly or explicitly manage GLib pointer arrays.

Finally, several cf4ocl functions, such as ccl_devsel_get_device_strings(), return a NULL-terminated array of strings. Previously, these arrays were released using the g_strfreev() GLib function. From version 2.1.0 onwards, cf4ocl provides the ccl_strv_clear() for this purpose. In practice this function is just a wrapper for g_strfreev(). Thus, this change does not break compatibility with existing client code.

Log messages

cf4ocl internally uses the GLib message logging framework to log messages and warnings. cf4ocl log output is handled by GLib's default log handler, which outputs warnings and messages to stderr. Debug messages are also shown if the G_MESSAGES_DEBUG environment variable is set to cf4ocl2. If client code wishes to redirect this output, it can do so by specifying another log function for the cf4ocl2 log domain with g_log_set_handler(). For example:

/* Log function which outputs messages to a stream specified in user_data. */
void my_log_function(const gchar *log_domain, GLogLevelFlags log_level,
const gchar *message, gpointer user_data) {
g_fprintf((FILE*) user_data, "[%s](%d)>%s\n",
log_domain, log_level, message);
}
FILE* my_file;
/* Add log handler for all messages from cf4ocl. */
g_log_set_handler("cf4ocl2", G_LOG_LEVEL_MASK | G_LOG_FLAG_FATAL | G_LOG_FLAG_RECURSION,
my_log_function, my_file);

This requires the client application to be linked against GLib..