Renderer APIs¶
Our renderer API is designed to disaggregate the render phase(preprocessing) and enable a token-in / token-out API server.
- GPU-less deployment of frontend: Allow preprocessing (tokenization, MM input processing) and postprocessing (detokenization, tool call parsing, reasoning parsing) to run without GPU.
- Disaggregated tokenization: Support use cases such as llm-d, Dynamo, and custom frontends that need to leverage vLLM's preprocessing logic without running the full inference engine.
- Tokens-in / tokens-out engine: Make the engine a pure token-in / token-out service, decoupled from request preprocessing.
API Reference¶
- Completions Render API (
/v1/completions/render)- Render completion requests
- Chat Completions Render API (
/v1/chat/completions/render)- Render chat completions