Renderer APIs¶

Our renderer API is designed to disaggregate the render phase(preprocessing) and enable a token-in / token-out API server.

GPU-less deployment of frontend: Allow preprocessing (tokenization, MM input processing) and postprocessing (detokenization, tool call parsing, reasoning parsing) to run without GPU.
Disaggregated tokenization: Support use cases such as llm-d, Dynamo, and custom frontends that need to leverage vLLM's preprocessing logic without running the full inference engine.
Tokens-in / tokens-out engine: Make the engine a pure token-in / token-out service, decoupled from request preprocessing.

API Reference¶

Completions Render API (/v1/completions/render)
- Render completion requests
Chat Completions Render API (/v1/chat/completions/render)
- Render chat completions