vllm.config.quantization
QuantSpec
Quantization spec for one layer kind (linear or MoE).
None on either side means the method class falls back to its own default (typically inherited from the checkpoint, or unquantized for online quantization).
Source code in vllm/config/quantization.py
QuantizationConfigArgs
User-facing quantization configuration.
See docs/features/quantization/online.md for the schema and shorthand string forms accepted on linear and moe.
resolve_quantization_config
resolve_quantization_config(
    quantization: str | None,
    quantization_config: dict[str, Any] | QuantizationConfigArgs | None,
) -> QuantizationConfigArgs | None
Resolve --quantization shorthand and --quantization-config into a QuantizationConfigArgs.
quantization is a CLI shorthand that desugars into a base config via _ONLINE_SHORTHANDS. quantization_config is a dict or pre-built args object. When both are given, fields explicitly set in quantization_config take precedence over the shorthand.