template <int Dims>
void constrain_split(
handler& cgh,
const range<Dims>& constraint)
Description
Constrains the granularity at which a task's global range can be split into chunks. In some situations, an output buffer access is only guaranteed to write to non-overlapping subranges if the task is split in a certain way. For example, when computing the row-wise sum of a 2D matrix into a 1D vector, a split constraint is required to ensure that each element of the vector is written by exactly one chunk. A second use case is performance optimization, for example when creating many small chunks would result in hardware under-utilization and excessive data transfers. Because ND-range parallel_for kernels are already constrained to split at group-size granularity, an additional constraint results in an effective constraint of LCM(group size, constraint), the least common multiple of the two. The constraint (or effective constraint) must evenly divide the global range. This function has no effect when called for a task without a user-provided global range.
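The arithmetic behind the effective constraint and the divisibility requirement can be sketched in standalone C++. This is an illustration only, not the library's implementation: the names `extent`, `effective_constraint`, and `divides_evenly` are hypothetical, and the sketch assumes that the LCM is taken independently in each dimension.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical stand-in for a Dims-dimensional range (not the library's range type).
template <std::size_t Dims>
using extent = std::array<std::size_t, Dims>;

// Assumption: the effective constraint is the per-dimension least common
// multiple of the ND-range group size and the user-provided constraint.
template <std::size_t Dims>
extent<Dims> effective_constraint(const extent<Dims>& group_size, const extent<Dims>& constraint) {
    extent<Dims> result{};
    for (std::size_t d = 0; d < Dims; ++d) {
        // Zero-sized granularities are not meaningful in this sketch.
        assert(group_size[d] > 0 && constraint[d] > 0);
        result[d] = std::lcm(group_size[d], constraint[d]);
    }
    return result;
}

// Returns true if the granularity evenly divides the global range in every dimension.
template <std::size_t Dims>
bool divides_evenly(const extent<Dims>& granularity, const extent<Dims>& global_range) {
    for (std::size_t d = 0; d < Dims; ++d) {
        if (granularity[d] == 0 || global_range[d] % granularity[d] != 0) return false;
    }
    return true;
}
```

Under these assumptions, a group size of {8, 16} combined with a constraint of {1, 64} gives an effective constraint of {8, 64}. A global range of {1024, 256} satisfies the divisibility requirement, whereas {1000, 100} does not, because 100 is not a multiple of 64.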
Template Parameters
- int Dims: Dimensionality of the constraint range.
Parameters
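- handler& cgh: Command group handler of the task whose split is constrained.
- const range<Dims>& constraint: Granularity at which the task's global range may be split; it (or the effective constraint) must evenly divide the global range.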