vllm.model_executor.kernels.mhc.triton
_hc_head_triton
_hc_head_triton(
    hs_flat: Tensor,
    fn: Tensor,
    hc_scale: Tensor,
    hc_base: Tensor,
    out: Tensor,
    hidden_size: int,
    rms_eps: float,
    hc_eps: float,
    hc_mult: int,
) -> None
Fills the pre-allocated out tensor, of shape (T, H), in place with the hc_head result. Nothing is returned.
Source code in vllm/model_executor/kernels/mhc/triton.py
_rmsnorm_nw_kernel
Weight-free RMSNorm Triton kernel: out = x * rsqrt(mean(x², -1) + eps).
Source code in vllm/model_executor/kernels/mhc/triton.py
rmsnorm_nw
Weight-free RMSNorm over the last dimension.
Treats x as [num_rows, D] where num_rows = product(shape[:-1]). Returns a contiguous tensor with the same shape and dtype as x.
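The arithmetic described above can be sketched in dependency-free Python. This is only an illustrative reference for the formula and the row flattening, not the Triton kernel or the library's code: the name rmsnorm_nw_reference, the flat row-major list input, and the default eps value are all assumptions made for the example.

```python
import math


def rmsnorm_nw_reference(flat, shape, eps=1e-6):
    """Weight-free RMSNorm over the last dimension of a row-major buffer.

    Hypothetical helper for illustration; `flat` holds the tensor values in
    row-major order and `shape` is the tensor shape. `eps` default is assumed.
    """
    d = shape[-1]
    # All leading dimensions collapse into rows: num_rows = product(shape[:-1]).
    num_rows = math.prod(shape[:-1])
    assert len(flat) == num_rows * d, "buffer size must match shape"

    out = []
    for r in range(num_rows):
        row = flat[r * d:(r + 1) * d]
        mean_sq = sum(v * v for v in row) / d      # mean(x², -1)
        inv_rms = 1.0 / math.sqrt(mean_sq + eps)   # rsqrt(mean + eps)
        out.extend(v * inv_rms for v in row)       # x * rsqrt(...)
    return out


# A (2, 1, 2) tensor is treated as 2 rows of width 2.
y = rmsnorm_nw_reference([3.0, 4.0, 1.0, 1.0], (2, 1, 2), eps=0.0)
```

Each row is normalized independently, and no learned weight is applied afterwards, which is what distinguishes this variant from a standard RMSNorm layer.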