vllm.v1.kv_offload.tiering.base
Abstract interfaces and data types for the secondary tiering layer.
JobMetadata dataclass
Metadata for an in-flight async transfer job.
Source code in vllm/v1/kv_offload/tiering/base.py
JobResult dataclass
SecondaryTierManager
Bases: ABC
Abstract interface for managing a single non-primary offloading tier.
Secondary tiers cannot directly access GPU memory. All data transfers must go through the CPU (primary) tier:
- Store: GPU → CPU (primary) → secondary (cascade)
- Load: secondary → CPU (primary) → GPU (promotion)
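A toy model can make the two-hop path concrete. This is a sketch, not vLLM code: gpu, primary, secondary, store(), load() and the 4-byte block size are hypothetical stand-ins. It shows that the secondary tier only reads from or writes to the primary (CPU) buffer, while the GPU hop is handled outside it.

```python
# Toy model of the tiering data path (hypothetical names, not the vLLM API).
BLOCK = 4  # bytes per block in this toy model

gpu = bytearray(b"AAAABBBB")    # stands in for GPU KV memory (2 blocks)
primary = bytearray(2 * BLOCK)  # CPU (primary) tier buffer (2 slots)
secondary: dict = {}            # secondary tier: key -> block bytes


def _span(idx: int) -> slice:
    """Byte range of block/slot idx."""
    return slice(idx * BLOCK, (idx + 1) * BLOCK)


def store(key: str, gpu_block: int, cpu_slot: int) -> None:
    """Store path: GPU -> CPU (primary) -> secondary (cascade)."""
    primary[_span(cpu_slot)] = gpu[_span(gpu_block)]   # hop 1: outside the secondary tier
    secondary[key] = bytes(primary[_span(cpu_slot)])   # hop 2: secondary reads primary only


def load(key: str, cpu_slot: int, gpu_block: int) -> None:
    """Load path: secondary -> CPU (primary) -> GPU (promotion)."""
    primary[_span(cpu_slot)] = secondary[key]          # hop 1: secondary writes primary only
    gpu[_span(gpu_block)] = primary[_span(cpu_slot)]   # hop 2: outside the secondary tier


store("b1", gpu_block=1, cpu_slot=0)
gpu[_span(1)] = bytes(BLOCK)   # the GPU block is later freed and overwritten
load("b1", cpu_slot=1, gpu_block=1)
print(bytes(gpu))              # b'AAAABBBB'
```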
IMPORTANT: All methods run in the Scheduler process and must be lightweight and non-blocking. submit_load() and submit_store() submit async jobs; get_finished() polls for completion.
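The submit/poll contract can be sketched with a thread pool. ToyTier and its methods are hypothetical and the signatures are simplified (a job ID and raw bytes instead of JobMetadata); the sketch only illustrates that submitting returns immediately and that each get_finished() call is a cheap, non-blocking poll.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyTier:
    """Hypothetical tier: submit_* only schedules work; a worker thread copies."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: dict = {}  # job_id -> Future

    def submit_store(self, job_id: int, data: bytes) -> None:
        # Returns immediately; the slow copy runs on the worker thread.
        self._pending[job_id] = self._pool.submit(self._slow_copy, data)

    def get_finished(self) -> list:
        # Non-blocking poll: collect only futures that are already done.
        done = [jid for jid, fut in self._pending.items() if fut.done()]
        for jid in done:
            del self._pending[jid]
        return done

    @staticmethod
    def _slow_copy(data: bytes) -> bytes:
        time.sleep(0.05)  # stands in for disk or network I/O
        return bytes(data)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


tier = ToyTier()
t0 = time.perf_counter()
tier.submit_store(1, b"kv")
submit_cost = time.perf_counter() - t0  # small: no copy happened on this thread

results = []
deadline = time.monotonic() + 2.0
while not results and time.monotonic() < deadline:
    results = tier.get_finished()       # one poll per scheduler step
    time.sleep(0.01)                    # stands in for other scheduler work
tier.shutdown()
```

A single worker thread keeps the sketch simple; a real tier might use more workers, an asynchronous I/O library, or a separate process.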
__init__
__init__(
offloading_spec: OffloadingSpec,
primary_kv_view: memoryview,
tier_type: str,
) -> None
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
offloading_spec | OffloadingSpec | Offloading configuration. | required |
primary_kv_view | memoryview | Memoryview of the primary tier's CPU KV cache. | required |
tier_type | str | Tier type identifier, set by SecondaryTierFactory from the registered tier type. | required |
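A tier receives primary_kv_view as a memoryview, so it can address individual primary-tier slots without copying the whole cache. The sketch below assumes a hypothetical layout of fixed-size, contiguous blocks (NUM_BLOCKS, BLOCK_BYTES and slot_view() are illustrative only); the real layout of the primary CPU KV cache is not described on this page.

```python
# Hypothetical layout: one contiguous buffer of fixed-size blocks.
NUM_BLOCKS = 4
BLOCK_BYTES = 8

backing = bytearray(NUM_BLOCKS * BLOCK_BYTES)  # stands in for the primary CPU KV cache
primary_kv_view = memoryview(backing)          # what a tier would receive in __init__


def slot_view(view: memoryview, block_id: int) -> memoryview:
    """Zero-copy view of one primary-tier slot (hypothetical helper)."""
    start = block_id * BLOCK_BYTES
    return view[start:start + BLOCK_BYTES]


# Load direction: write a block into slot 2 through the view.
slot_view(primary_kv_view, 2)[:] = b"LOADED!!"
# Store direction: read slot 2 back out; only this slot is copied.
payload = bytes(slot_view(primary_kv_view, 2))
print(payload)                        # b'LOADED!!'
print(backing[16:24] == b"LOADED!!")  # True
```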
get_finished abstractmethod
get_finished() -> Iterable[JobResult]
Return all jobs (loads and stores) that completed since the last call.
The framework uses these results to release resources and finalize transfers.
Returns:
| Type | Description |
|---|---|
Iterable[JobResult] | Iterable of JobResult objects for jobs finished since the last call. |
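One way to get the "since the last call" semantics is to have worker threads push results into a thread-safe queue that get_finished() drains without blocking. The JobResult fields below (job_id, success) and the CompletionQueue class are assumptions for illustration; this page does not document JobResult's fields.

```python
import queue
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class JobResult:  # hypothetical fields; the real JobResult may differ
    job_id: int
    success: bool


class CompletionQueue:
    """Worker threads report results; the scheduler drains them."""

    def __init__(self) -> None:
        self._q = queue.SimpleQueue()  # thread-safe, unbounded FIFO

    def report(self, result: JobResult) -> None:
        # Called from a worker thread when a transfer finishes.
        self._q.put(result)

    def get_finished(self) -> Iterable[JobResult]:
        # Drain without blocking; each result is returned exactly once.
        finished: List[JobResult] = []
        while True:
            try:
                finished.append(self._q.get_nowait())
            except queue.Empty:
                return finished


cq = CompletionQueue()
cq.report(JobResult(job_id=7, success=True))
cq.report(JobResult(job_id=8, success=False))
first = list(cq.get_finished())   # both results
second = list(cq.get_finished())  # empty: nothing finished since the last call
```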
lookup abstractmethod
lookup(
key: OffloadKey, req_context: ReqContext
) -> bool | None
Check whether a block exists in this secondary tier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
key | OffloadKey | Offload key to look up. | required |
req_context | ReqContext | Per-request context (e.g. kv_transfer_params). | required |
Returns:
| Type | Description |
|---|---|
bool | None | True if the block is present and ready, False if not found, or None if the block is being transferred (retry later). |
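The three-valued return is easy to mishandle because None and False are both falsy. This sketch uses a hypothetical ToyIndex that tracks per-key state (the real lookup() also receives a ReqContext, omitted here) and a caller that tests for None explicitly.

```python
from enum import Enum, auto
from typing import Dict, Optional


class State(Enum):  # hypothetical per-key state kept by a tier
    STORING = auto()  # store in flight; data not yet safe to read
    READY = auto()


class ToyIndex:
    def __init__(self) -> None:
        self.state: Dict[str, State] = {}

    def lookup(self, key: str) -> Optional[bool]:
        st = self.state.get(key)
        if st is None:
            return False  # not found
        if st is State.STORING:
            return None   # being transferred: retry later
        return True       # present and ready


def classify(result: Optional[bool]) -> str:
    # None and False are both falsy, so test for None explicitly.
    if result is None:
        return "retry"
    return "hit" if result else "miss"


idx = ToyIndex()
idx.state["a"] = State.READY
idx.state["b"] = State.STORING
print([classify(idx.lookup(k)) for k in ("a", "b", "c")])  # ['hit', 'retry', 'miss']
```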
shutdown
submit_load abstractmethod
submit_load(job_metadata: JobMetadata) -> None
Submit an async job to load blocks from this secondary tier to the primary tier.
This method must be lightweight and non-blocking: mark blocks as in-flight and submit the transfer, but do NOT perform the data copy on the calling thread.
Preconditions (guaranteed by the framework):
- job_metadata.block_ids are allocated primary-tier slots ready to receive data.
The implementation must copy data from this tier into the primary-tier slots identified by block_ids.
Report completion via get_finished().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
job_metadata | JobMetadata | Job metadata including job_id, keys, and block_ids identifying the primary-tier slots to write into. | required |
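A minimal sketch of a non-blocking submit_load(), assuming a hypothetical tier that keeps its blocks in an in-memory dict and a primary buffer of fixed 4-byte blocks. JobMetadata here is a stand-in with only the fields named above (job_id, keys, block_ids); ToyLoadTier, data and in_flight are illustrative names, and error handling is omitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Set

BLOCK_BYTES = 4  # hypothetical fixed block size


@dataclass
class JobMetadata:  # hypothetical stand-in with the fields named above
    job_id: int
    keys: List[str]
    block_ids: List[int]


class ToyLoadTier:
    def __init__(self, primary_kv_view: memoryview) -> None:
        self._primary = primary_kv_view
        self.data: Dict[str, bytes] = {}   # this tier's blocks, by key
        self.in_flight: Set[str] = set()   # touched only on the scheduler thread
        self._job_keys: Dict[int, List[str]] = {}
        self._done: List[int] = []         # shared with the worker, guarded by _lock
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit_load(self, job: JobMetadata) -> None:
        # Lightweight: mark keys in-flight, then hand the copy to a worker.
        self.in_flight.update(job.keys)
        self._job_keys[job.job_id] = list(job.keys)
        self._pool.submit(self._copy_into_primary, job)

    def _copy_into_primary(self, job: JobMetadata) -> None:
        # Runs off the scheduler thread: fill each primary-tier slot.
        # (No error handling: a failed copy would never be reported here.)
        for key, block_id in zip(job.keys, job.block_ids):
            start = block_id * BLOCK_BYTES
            self._primary[start:start + BLOCK_BYTES] = self.data[key]
        with self._lock:
            self._done.append(job.job_id)

    def get_finished(self) -> List[int]:
        with self._lock:
            done, self._done = self._done, []
        for job_id in done:
            self.in_flight.difference_update(self._job_keys.pop(job_id))
        return done

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


buf = bytearray(3 * BLOCK_BYTES)
tier = ToyLoadTier(memoryview(buf))
tier.data.update({"k1": b"1111", "k2": b"2222"})
tier.submit_load(JobMetadata(job_id=1, keys=["k1", "k2"], block_ids=[2, 0]))
tier.shutdown()             # waits for the worker; demo only
done = tier.get_finished()
print(done)                 # [1]
print(bytes(buf))           # b'2222\x00\x00\x00\x001111'
```

The demo calls shutdown() before polling only to make the output deterministic; inside the scheduler, get_finished() would be polled on later steps instead of waiting.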
submit_store abstractmethod
submit_store(job_metadata: JobMetadata) -> None
Submit an async job to store blocks from the primary tier to this secondary tier.
This method must be lightweight and non-blocking: allocate metadata and submit the transfer, but do NOT perform the data copy on the calling thread.
Preconditions (guaranteed by the framework):
- job_metadata.block_ids are valid primary-tier slots, pinned (ref-counted) for the duration of the transfer.
The implementation is responsible for:
- Filtering out blocks already present in this tier
- Evicting blocks if capacity is needed
- Allocating space in this tier
- Submitting the async transfer (read from primary via block_ids)
Report completion via get_finished().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
job_metadata | JobMetadata | Job metadata including job_id, keys, and block_ids identifying the primary-tier slots to read from. | required |
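The four responsibilities can be sketched with an LRU-ordered dict. This is an illustration under assumptions: ToyStoreTier, its capacity field and the submit_copy hook (which stands in for whatever actually launches the async transfer) are hypothetical, and unlike the documented submit_store() -> None, the toy returns the evicted keys so the behavior is visible.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class JobMetadata:  # hypothetical stand-in
    job_id: int
    keys: List[str]
    block_ids: List[int]


@dataclass
class ToyStoreTier:
    capacity: int                                              # max blocks held by this tier
    submit_copy: Callable[[int, List[str], List[int]], None]  # hypothetical async copy hook
    lru: "OrderedDict[str, None]" = field(default_factory=OrderedDict)

    def submit_store(self, job: JobMetadata) -> List[str]:
        # 1. Filter: keep only blocks not already in this tier. Present ones
        #    are refreshed so they are evicted last.
        for key in job.keys:
            if key in self.lru:
                self.lru.move_to_end(key)
        todo = [(k, b) for k, b in zip(job.keys, job.block_ids) if k not in self.lru]
        todo = todo[: self.capacity]  # cannot hold more than capacity anyway
        # 2. Evict least-recently-used blocks until the new ones fit.
        #    (A real tier must skip blocks with transfers still in flight.)
        evicted = []
        while len(self.lru) + len(todo) > self.capacity:
            key, _ = self.lru.popitem(last=False)
            evicted.append(key)
        # 3. Allocate: reserve an entry per new key.
        for key, _ in todo:
            self.lru[key] = None
        # 4. Submit the copy from the primary slots; no data is moved here.
        if todo:
            self.submit_copy(job.job_id, [k for k, _ in todo], [b for _, b in todo])
        return evicted


calls = []
tier = ToyStoreTier(capacity=2, submit_copy=lambda jid, keys, bids: calls.append((jid, keys, bids)))
tier.submit_store(JobMetadata(1, ["a", "b"], [0, 1]))
evicted = tier.submit_store(JobMetadata(2, ["a", "c"], [0, 2]))
print(evicted)         # ['b']
print(list(tier.lru))  # ['a', 'c']
```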
touch
touch(
keys: Collection[OffloadKey], req_context: ReqContext
)
Mark blocks as recently used for the eviction policy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
keys | Collection[OffloadKey] | Offload keys to mark as recently used. | required |
req_context | ReqContext | Per-request context. | required |
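An LRU eviction policy is one common way to use touch(). The sketch below is a hypothetical recency index built on collections.OrderedDict; the real touch() also receives a ReqContext, and this page does not say which eviction policy a given tier uses.

```python
from collections import OrderedDict
from typing import Collection, List


class LRUIndex:
    """Hypothetical recency index a tier could keep for eviction."""

    def __init__(self) -> None:
        self._lru: "OrderedDict[str, None]" = OrderedDict()

    def add(self, key: str) -> None:
        self._lru[key] = None  # new keys start as most recently used

    def touch(self, keys: Collection[str]) -> None:
        # O(1) per key; keys this tier does not hold are ignored.
        for key in keys:
            if key in self._lru:
                self._lru.move_to_end(key)

    def eviction_order(self) -> List[str]:
        return list(self._lru)  # least recently used first


idx = LRUIndex()
for k in ("a", "b", "c"):
    idx.add(k)
idx.touch(["a", "zz"])
print(idx.eviction_order())  # ['b', 'c', 'a']
```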