XPENG unveils X-Mind framework to advance predictive AI for autonomous driving

Gasgoo · June 30, 2026

XPENG has introduced its new X-Mind technology framework, an AI architecture designed to enhance predictive reasoning and decision-making in autonomous vehicles. Using an embedded predictive world model and a Visual Chain-of-Thought, the system lets vehicles simulate future traffic scenarios and evaluate outcomes before taking action. This shift from reactive to proactive AI is a step toward safer, more human-like driving in production-ready autonomous systems.

XPENG’s General Intelligence Center head, Liu Xianming, detailed the X-Mind framework at the CVPR 2026 Workshop, emphasizing three core capabilities: proactive reasoning, controllable generation, and long-horizon temporal prediction. Unlike traditional perception-to-action pipelines that react only to immediate sensor data, X-Mind enables an in-vehicle AI agent to perform complex spatiotemporal reasoning. This approach addresses the limitations of current AI methods, such as text-based reasoning’s inability to grasp geometric relationships and the computational inefficiency of image-based future generation.
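The difference between reacting to the current observation and simulating outcomes before acting can be illustrated with a toy car-following example. This is only a sketch and not XPENG's method: the `State` class, the one-dimensional kinematics in `step`, both policies, and every threshold are hypothetical, and the lead vehicle's deceleration is assumed to be supplied by some predictive model.

```python
from dataclasses import dataclass


@dataclass
class State:
    gap: float         # metres between ego and lead vehicle
    ego_speed: float   # m/s
    lead_speed: float  # m/s


def step(state, ego_accel, lead_accel, dt=1.0):
    """Toy world model (assumption): simple kinematics for one time step."""
    ego_speed = max(0.0, state.ego_speed + ego_accel * dt)
    lead_speed = max(0.0, state.lead_speed + lead_accel * dt)
    gap = state.gap + (lead_speed - ego_speed) * dt
    return State(gap, ego_speed, lead_speed)


def reactive_policy(state, safe_gap=10.0):
    """Looks only at the current gap: brakes once it is already too small."""
    return -3.0 if state.gap < safe_gap else 0.0


def proactive_policy(state, lead_accel, horizon=5, safe_gap=10.0,
                     candidates=(-3.0, -1.5, 0.0)):
    """Rolls each candidate action forward and returns the gentlest safe one."""
    for accel in sorted(candidates, key=abs):  # try the gentlest action first
        s = state
        safe = True
        for _ in range(horizon):
            s = step(s, accel, lead_accel)
            if s.gap < safe_gap:
                safe = False
                break
        if safe:
            return accel
    return min(candidates)  # no candidate stays safe: brake as hard as possible


# The lead vehicle is predicted to decelerate at 2 m/s^2 (hypothetical input).
now = State(gap=30.0, ego_speed=20.0, lead_speed=20.0)
reactive_action = reactive_policy(now)
proactive_action = proactive_policy(now, lead_accel=-2.0)
print(reactive_action, proactive_action)
```

In this scenario the reactive policy sees a comfortable 30 m gap and does nothing, while the rollout-based policy starts braking gently because its simulated future shows the gap closing within the horizon.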

The framework introduces a Visual Chain-of-Thought (Visual CoT) that allows the model to carry out explicit reasoning before generating control actions. To make this practical for real-time deployment, XPENG uses thought sketches: compact visual representations that combine bird’s-eye-view layouts with abstract driving priors such as lane boundaries, obstacles, and traffic signal status. Using a deep compression autoencoder (DC-AE), the system compresses 12 future frames of predictive data into just 96 tokens, significantly reducing the computational load compared with processing high-resolution images or 3D reconstructions.
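To put the reported figures in perspective, the short calculation below works out the average token budget per frame. Only the 12 frames and 96 tokens come from the report. The comparison baseline (a 256×256 image split into 16×16 patches, one token per patch) is a hypothetical assumption chosen for illustration, not a number XPENG has published.

```python
# Figures reported in the article.
FUTURE_FRAMES = 12
TOTAL_TOKENS = 96

# Average budget per predicted frame (the report does not say how
# tokens are actually distributed across frames).
tokens_per_frame = TOTAL_TOKENS / FUTURE_FRAMES

# Hypothetical baseline: patch tokenization of a square image.
IMAGE_SIDE = 256   # pixels (assumption)
PATCH_SIDE = 16    # pixels (assumption)
patch_tokens_per_frame = (IMAGE_SIDE // PATCH_SIDE) ** 2
baseline_tokens = patch_tokens_per_frame * FUTURE_FRAMES

reduction_factor = baseline_tokens / TOTAL_TOKENS

print(f"Average tokens per frame: {tokens_per_frame:.0f}")
print(f"Hypothetical patch baseline: {baseline_tokens} tokens")
print(f"Reduction under these assumptions: {reduction_factor:.0f}x")
```

The 8-tokens-per-frame average follows directly from the reported numbers; the reduction factor depends entirely on the assumed baseline and would change with a different image size or patch size.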

X-Mind serves as a reasoning layer for XPENG’s vision-language-action (VLA) models, working alongside the previously announced X-Foresight framework. While X-Foresight focuses on predicting future multi-view scenes, X-Mind provides the cognitive inference needed to make the model’s decision-making process transparent and interpretable. To further improve predictive efficiency, XPENG also developed a Recursive Block Diffusion (RBD) mechanism, which performs the generative process internally across different layers of the driving model. Together, X-Mind and X-Foresight aim to advance the VLA architecture toward a general-purpose physical AI capable of understanding real-world physics and anticipating future events.

Summary generated by RabbitReport AI from public reporting. The full article and original reporting belong to Gasgoo.