enum class out_of_order_engine::target

Description

Identifies a category of execution resources that may be distinguished further by a device id or lane id.

Enumerators

| Name | Value | Comment |
| --- | --- | --- |
| immediate | 0 | Execution can begin immediately, no queueing takes place in the backend and no lane is assigned. Used for low-overhead instructions that do not profit from additional concurrency such as horizons, as well as for instructions where asynchronicity is managed outside the backend (p2p transfers through communicator and receive_arbiter). |
| alloc_queue | 1 | The instruction shall be inserted to the backend's (singular) host allocation queue. No lane is assigned. Since at least CUDA serializes the slow alloc / free operations anyway to update page tables globally, the added concurrency from multiple thread queues would not increase throughput. The separation between alloc_queue and host_queues further enforces a host round-trip between every alloc_instruction and its first successor, which we use to inform the executor of the newly allocated pointer for the purpose of accessor hydration. |
| host_queue | 2 | The instruction shall be submitted to a backend thread queue identified by the lane id. Used for host tasks and host-to-host copies. |
| device_queue | 3 | The instruction shall be submitted to a backend in-order device queue identified by the lane id. Used for device kernels and host-to-device / device-to-device / device-to-host copies. |
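
The mapping from instruction category to target described by the enumerators can be sketched as follows. This is a minimal, standalone illustration, not the engine's actual code: the `target` enum mirrors the documented names and values, while `instruction_kind`, `select_target` and `needs_lane` are hypothetical names introduced only for this example.

```cpp
// Standalone sketch. Only the enumerator names and values of `target` come from
// the documentation; everything else is a hypothetical illustration.
enum class target {
    immediate = 0,    // runs at once, no backend queueing, no lane
    alloc_queue = 1,  // the single host allocation queue, no lane
    host_queue = 2,   // backend thread queue selected by lane id
    device_queue = 3, // backend in-order device queue selected by lane id
};

// Hypothetical instruction categories, named after the uses listed per enumerator.
enum class instruction_kind {
    horizon,
    p2p_transfer,
    alloc,
    host_task,
    host_to_host_copy,
    device_kernel,
    device_copy, // host-to-device, device-to-device or device-to-host
};

// Assumed dispatch rule derived from the enumerator comments.
constexpr target select_target(instruction_kind kind) {
    switch (kind) {
    case instruction_kind::horizon:
    case instruction_kind::p2p_transfer: return target::immediate;
    case instruction_kind::alloc: return target::alloc_queue;
    case instruction_kind::host_task:
    case instruction_kind::host_to_host_copy: return target::host_queue;
    case instruction_kind::device_kernel:
    case instruction_kind::device_copy: return target::device_queue;
    }
    return target::immediate; // unreachable for valid enumerators
}

// Only the two lane-addressed queue targets are identified by a lane id.
constexpr bool needs_lane(target t) {
    return t == target::host_queue || t == target::device_queue;
}

static_assert(select_target(instruction_kind::device_copy) == target::device_queue, "copies involving a device use the device queue");
static_assert(!needs_lane(target::alloc_queue), "the allocation queue is singular and has no lane");
```

Keeping the selection in a `constexpr` function lets the documented rules be checked at compile time, as the two `static_assert` lines do.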