There was a time when virtually all applications were executed by just one type of device: the CPU. That time has ended. Since the OpenCL 1.0 specification was published at the end of 2008, with the first implementations shipping in 2009, developers have had a standard way of offloading application execution to other types of devices or, in some cases, of using CPUs in new ways to execute software.
This doesn’t mean, of course, that every application should use OpenCL. CPUs remain the primary number crunchers for most standard applications.
However, when used for the right purpose and deployed in the right way, OpenCL can be a game changer. For suitable workloads, it delivers levels of speed and efficiency that were out of reach in earlier eras of computing.
With this in mind, you may find yourself wondering when, exactly, you should use OpenCL. What are the main advantages and use cases for OpenCL, and what are examples of situations where OpenCL won’t provide a major benefit?
This article explores those questions. It begins with an overview of what OpenCL is and which implementations are available. It then walks through the criteria for deciding when OpenCL is a good fit, along with prime use cases, and concludes with a few notable examples of OpenCL being used on a large scale today.
What is OpenCL?
A basic definition of OpenCL: It’s a framework for writing software that can be executed on many different types of computational devices.
Ask a developer on the street to tell you what OpenCL means, and you may get a response about writing applications that are executed using GPUs instead of CPUs.
In fact, OpenCL is about more than just offloading execution to GPUs. OpenCL makes it possible to write and deploy applications that use multiple types of devices for processing. These include not just GPUs, but also CPUs (which, again, can be used effectively with OpenCL in certain scenarios), digital signal processors (DSPs) and other types of hardware accelerators.
For this reason, a proper definition of OpenCL is: a framework for creating applications that can be executed across heterogeneous computational units. In this context, heterogeneous refers to a diverse set of hardware components that are capable of performing computation. It includes much more than just GPUs.
Also important to keep in mind is that OpenCL itself is merely a specification. There are multiple implementations of OpenCL from both hardware and software vendors. Generally speaking, an application that uses OpenCL runs on any conformant implementation that supports the OpenCL version it targets, although some implementations include vendor-specific extensions that others do not support.
When to use OpenCL
OpenCL is very widely supported on a variety of software platforms and hardware devices. Its kernel language, OpenCL C, is derived from C (specifically C99), so it is easy to learn for the many programmers who already know C, which remains one of the most widely used programming languages. All of this means that you can use OpenCL in a wide variety of situations.
The real question for developers, however, is whether you should use OpenCL. It provides far more benefit in some use cases than in others, and in a poorly suited one the overhead of moving data to a device can leave an application no faster, or even slower, than before.
Here are criteria to consider when deciding whether to use OpenCL.
Is your target hardware supported?
This is the first and most obvious question to ask. You need to make sure that the devices where your OpenCL software will be deployed support OpenCL.
Chances are fairly good that they will. Most modern GPUs and CPUs that are designed for standard consumer devices support OpenCL. Support can be less thorough when you are dealing with specialized types of devices. But either way, information about OpenCL support is typically easy to find within the specifications of whichever devices you are working with.
Do your target software environments support it?
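The second obvious question to ask is whether the software environments you are targeting support OpenCL. Here, too, the mainstream platforms (including Windows, macOS, Android, Linux and FreeBSD) can run OpenCL code, although support usually comes from the hardware vendor’s drivers rather than from the operating system itself. Note also that Apple deprecated OpenCL starting with macOS 10.14, which makes it a weaker long-term target on that platform.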
If you are building OpenCL software for a different type of environment, or for one where you can’t count on OpenCL drivers being available by default (as on Android phones whose vendor chooses not to ship OpenCL), you may need to provide OpenCL drivers along with your software or direct users to install them. In that case, consider how difficult the drivers will be for end users to install and how you can simplify the installation process for them.
Can your code use parallelization?
The main reason OpenCL can execute applications more quickly than a standard, CPU-based environment is that it lets you spread work across many processing elements at once. Offloading execution to devices like GPUs is powerful not because each core inside a GPU is faster than a CPU core, but because GPUs typically contain many more cores than a standard CPU. They enable faster execution because they can run many more threads of work at the same time.
To take effective advantage of the greater number of cores that OpenCL gives you access to, your code must support parallelization: the work has to be divisible into many pieces that can run at the same time without waiting on one another.
If you’re writing an application from scratch, designing it for a parallel architecture is usually quite possible. But if you’re trying to add OpenCL code to an existing application, implementing the code in a way that takes full advantage of the parallelization support of the devices you are targeting could be more challenging.
Will your data bottleneck between devices?
A common obstacle to taking advantage of OpenCL is bottlenecks. They occur when you can’t transfer data quickly enough between the devices that execute OpenCL code and the “host” of your system, which is typically a standard CPU. The bus that connects the two (often PCI Express, in the case of a discrete GPU) offers far less bandwidth than the device’s own memory, so data transfers can take longer than the computation they feed.
There are things you can do to mitigate this challenge, such as asynchronous data transfers, which reduce the impact of bottlenecks by overlapping data movement with computation instead of leaving the device idle while it waits for data. You can also avoid the limitation by running OpenCL code on a device that doesn’t need its data transferred at all. This is one common reason for running OpenCL on a CPU rather than a discrete GPU: the CPU already works directly in system memory, so no copy across a bus is required.
But if the bus that you need to work with is just too slow, workarounds will only get you so far. The point: Before deciding to use OpenCL, review the specifications of the system you are working with to make sure that they won’t leave you bottlenecked. In addition, consider the type of application you’re writing, how often you will need to move data between the host and its devices, and how much computation each transfer pays for.
Do you know C?
As noted above, OpenCL syntax is based on that of C, so if you know C (or C++, for that matter), writing OpenCL code should not be particularly hard.
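Keep in mind, too, that not all of your application needs to be written in OpenCL in order to take advantage of it. Only the compute-heavy kernels are written in OpenCL C; the rest of the application calls the OpenCL host API, which is a C API with bindings for many other languages (PyOpenCL for Python, for example). That lets you use OpenCL alongside most popular programming languages.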
OpenCL use cases
What do real-world use cases for OpenCL look like? Here are some examples:
- Optical character recognition (OCR). OCR is a compute-intensive task. OpenCL can significantly improve the performance of OCR applications, provided that they take advantage of parallelization.
- Image recognition. Identifying patterns or objects in images for purposes such as facial recognition is another compute-intensive task for which OpenCL can come in handy.
- Bitcoin mining. If you know much about Bitcoin, you know that mining it requires immense computing power, so much that mining with standard CPUs is not practical today. For that reason, a variety of OpenCL Bitcoin-mining applications have been created, although specialized ASIC hardware has since made GPU mining of Bitcoin largely uneconomical as well.
- Image resizing. Resizing images is another compute-intensive operation that can benefit from OpenCL, because each output pixel can be computed independently of the others.