enum class out_of_order_engine::target
Description
Identifies a category of execution resources, which may be further distinguished by a device id or lane id.
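The sketch below shows one way a target could be paired with the device and lane ids that refine it. The enumerator values follow the table below. The wrapping `struct out_of_order_engine` is only a stand-in so the qualified name matches. The `assignment` struct, the `device_id` / `lane_id` aliases and the `is_well_formed` helper are hypothetical illustrations, not the engine's actual API. The rule that only `device_queue` carries a device id is an assumption drawn from the enumerator comments.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical id types for illustration only.
using device_id = std::size_t;
using lane_id = std::size_t;

// Stand-in for the real engine class, so that out_of_order_engine::target resolves.
struct out_of_order_engine {
    // Enumerator values as listed in the table.
    enum class target {
        immediate = 0,
        alloc_queue = 1,
        host_queue = 2,
        device_queue = 3,
    };

    // Hypothetical pairing of a target with the ids that further distinguish it.
    struct assignment {
        target tgt;
        std::optional<device_id> device; // assumed: only set for device_queue
        std::optional<lane_id> lane;     // set for host_queue and device_queue
    };
};

// Checks that an assignment carries exactly the ids its target implies.
bool is_well_formed(const out_of_order_engine::assignment& a) {
    using target = out_of_order_engine::target;
    switch(a.tgt) {
    case target::immediate:
    case target::alloc_queue: return !a.device && !a.lane;              // no lane assigned
    case target::host_queue: return !a.device && a.lane.has_value();    // thread queue chosen by lane
    case target::device_queue: return a.device.has_value() && a.lane.has_value(); // in-order queue on a device
    }
    return false; // unreachable for valid enumerators
}
```

A caller could guard its bookkeeping with `assert(is_well_formed(a))` before submitting an instruction.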
Enumerators
Name | Value | Comment |
---|---|---|
immediate | 0 | Execution can begin immediately; no queueing takes place in the backend and no lane is assigned. Used for low-overhead instructions that do not benefit from additional concurrency, such as horizons, as well as for instructions whose asynchronicity is managed outside the backend (p2p transfers through communicator and receive_arbiter). |
alloc_queue | 1 | The instruction shall be inserted into the backend's (singular) host allocation queue. No lane is assigned. Since at least CUDA serializes the slow alloc / free operations anyway in order to update page tables globally, the added concurrency from multiple thread queues would not increase throughput. The separation between alloc_queue and host_queue further enforces a host round-trip between every alloc_instruction and its first successor, which we use to inform the executor of the newly allocated pointer for the purpose of accessor hydration. |
host_queue | 2 | The instruction shall be submitted to a backend thread queue identified by the lane id. Used for host tasks and host-to-host copies. |
device_queue | 3 | The instruction shall be submitted to a backend in-order device queue identified by the lane id. Used for device kernels and host-to-device / device-to-device / device-to-host copies. |
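The "Used for" notes in the table amount to a mapping from instruction kind to target. The following is a minimal sketch of that mapping, assuming a simplified instruction model: `instruction_kind`, `memory_kind`, `instruction` and `choose_target` are hypothetical names invented for this example. Routing `free` to `alloc_queue` is an assumption based on the alloc / free rationale given for that enumerator.

```cpp
#include <cassert>

// Enumerator values as listed in the table.
enum class target { immediate = 0, alloc_queue = 1, host_queue = 2, device_queue = 3 };

// Hypothetical instruction categories, named only for this sketch.
enum class instruction_kind { horizon, p2p_send, p2p_receive, alloc, free, host_task, device_kernel, copy };

// Hypothetical location of a copy's source or destination.
enum class memory_kind { host, device };

struct instruction {
    instruction_kind kind;
    memory_kind copy_source = memory_kind::host; // only read for copies
    memory_kind copy_dest = memory_kind::host;   // only read for copies
};

// Picks a target following the table's "Used for ..." notes.
target choose_target(const instruction& instr) {
    switch(instr.kind) {
    case instruction_kind::horizon:     // low overhead, no benefit from concurrency
    case instruction_kind::p2p_send:    // asynchronicity handled by communicator
    case instruction_kind::p2p_receive: // asynchronicity handled by receive_arbiter
        return target::immediate;
    case instruction_kind::alloc:
    case instruction_kind::free:        // assumed to share the single allocation queue
        return target::alloc_queue;
    case instruction_kind::host_task:
        return target::host_queue;
    case instruction_kind::device_kernel:
        return target::device_queue;
    case instruction_kind::copy:
        // Host-to-host copies run on a host thread queue; any copy touching device memory
        // (host-to-device, device-to-device, device-to-host) goes to an in-order device queue.
        return instr.copy_source == memory_kind::host && instr.copy_dest == memory_kind::host
                   ? target::host_queue
                   : target::device_queue;
    }
    return target::immediate; // unreachable for valid enumerators
}
```

Lane selection for `host_queue` and `device_queue` is deliberately left out. The sketch only shows how the category is chosen before a lane id is assigned.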