Is "end-to-end" the optimal solution for autonomous driving?

  Author: Yang Zhongyang

  Recently, "end-to-end" has become the hottest topic in the auto industry. The benchmark set by Tesla’s end-to-end FSD V12 (Full Self-Driving), together with rumors that it will enter China, has pushed carmakers such as the "Wei Xiaoli" trio (NIO, Xpeng and Li Auto) and suppliers such as Huawei and Horizon to pivot toward end-to-end autonomous driving and step up their investment in it.

  The term "end-to-end" (E2E) comes from deep learning. It means that a single AI model takes raw data as input and directly outputs the final result. Applied to autonomous driving, it means that one model converts the information collected by sensors such as cameras, millimeter-wave radar and lidar into concrete control commands such as the steering angle, the accelerator pedal position and the braking force, so that the vehicle can drive itself. In the words of He Xiaopeng, the founder of Xpeng Motors, the result is "silky smooth" and feels more like "a human driver at the wheel".
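
  As a rough illustration of the "one model, raw data in, control commands out" idea described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `SensorFrame`, `Controls` and `ToyEndToEndModel` are hypothetical, and a single linear layer with hand-picked weights stands in for a trained neural network. It does not reflect how Tesla or any other company actually builds its system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensorFrame:
    """Raw inputs for one time step (hypothetical and heavily simplified)."""
    camera: List[float]  # flattened pixel intensities
    radar: List[float]   # millimeter-wave radar range readings, in meters
    lidar: List[float]   # lidar range readings, in meters


@dataclass
class Controls:
    steering_angle: float  # radians; the sign convention is an assumption
    throttle: float        # 0.0 (released) to 1.0 (fully pressed)
    brake: float           # 0.0 (released) to 1.0 (full braking)


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


class ToyEndToEndModel:
    """Stand-in for a trained network: one linear layer with fixed weights."""

    def __init__(self, weights: List[List[float]]):
        # Three rows (steering, throttle, brake), one weight per input value.
        self.weights = weights

    def __call__(self, frame: SensorFrame) -> Controls:
        # All sensor data is concatenated into one input vector; there is
        # no separate perception, planning or control stage.
        x = frame.camera + frame.radar + frame.lidar
        steer, throttle, brake = (
            sum(w * v for w, v in zip(row, x)) for row in self.weights
        )
        return Controls(
            steering_angle=clamp(steer, -0.5, 0.5),
            throttle=clamp(throttle, 0.0, 1.0),
            brake=clamp(brake, 0.0, 1.0),
        )


# Usage with made-up numbers: two "pixels", one radar and one lidar reading.
model = ToyEndToEndModel(weights=[
    [0.1, -0.1, 0.0, 0.0],    # steering depends on the camera values
    [0.0, 0.0, 0.01, 0.01],   # throttle grows with the free distance ahead
    [0.0, 0.0, 0.0, 0.0],     # brake is unused in this toy example
])
frame = SensorFrame(camera=[0.2, 0.4], radar=[30.0], lidar=[28.0])
controls = model(frame)
```

  In a real system the weights would be learned from driving data rather than written by hand, which is exactly why data volume and computing power matter so much for this approach.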

  Until now, most autonomous driving systems on the market have used a traditional modular architecture, a hybrid of human-written rules and machine intelligence: perception relies on neural networks, while planning and control use algorithms hand-designed by engineers. The advantage is a clear division of labor, which makes it easy to locate and fix defects within a module. The drawback is that such systems perform well on relatively simple driving tasks but hit an obvious ceiling on complex ones. Even urban advanced driver-assistance features billed as "far ahead" of the competition still feel mechanical, and they can stall when merging onto an expressway or crossing a large intersection.
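
  The modular division of labor described above can be sketched in Python as three separate stages. This is only a toy under stated assumptions: the function names (`perceive`, `plan_target_speed`, `control`), the 20-meter gap threshold and the proportional gain are all hypothetical, and `perceive` merely stands in for a neural-network perception module.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Output of the perception stage (hypothetical fields)."""
    lead_vehicle_gap_m: float  # distance to the vehicle ahead, in meters
    speed_limit_mps: float     # detected speed limit, in meters per second


def perceive(gap_reading_m: float, detected_limit_mps: float) -> Scene:
    # Stand-in for a neural network that turns sensor data into a scene.
    return Scene(gap_reading_m, detected_limit_mps)


def plan_target_speed(scene: Scene, min_gap_m: float = 20.0) -> float:
    # Hand-written planning rule: scale the target speed down linearly
    # when the gap to the lead vehicle falls below min_gap_m.
    if scene.lead_vehicle_gap_m < min_gap_m:
        return scene.speed_limit_mps * scene.lead_vehicle_gap_m / min_gap_m
    return scene.speed_limit_mps


def control(current_speed_mps: float, target_speed_mps: float,
            gain: float = 0.1) -> dict:
    # Hand-written proportional controller: a positive speed error maps
    # to throttle, a negative one to brake, both limited to [0, 1].
    command = gain * (target_speed_mps - current_speed_mps)
    command = max(-1.0, min(1.0, command))
    return {"throttle": max(0.0, command), "brake": max(0.0, -command)}


# Usage with made-up numbers: a 10 m gap, a 20 m/s limit, driving at 15 m/s.
scene = perceive(gap_reading_m=10.0, detected_limit_mps=20.0)
target = plan_target_speed(scene)
command = control(current_speed_mps=15.0, target_speed_mps=target)
```

  Each stage can be inspected and fixed on its own, which is the debugging advantage of the modular approach. The cost is that every rule such as the gap threshold has to be written and tuned by hand.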

  The core challenge of autonomous driving is handling an endless stream of edge cases, and the cost and time required to solve this unbounded long-tail problem with limited manpower are incalculable. A shift toward data-driven, model-based approaches has therefore become inevitable. Even so, end-to-end is a demanding craft that takes painstaking refinement.

  On the one hand, end-to-end needs to be "fed" massive amounts of high-quality training data. Large language models can be trained on vast quantities of text crawled from the Internet, but the video data needed for end-to-end intelligent driving is extremely costly and difficult to obtain. Take Tesla as an example. Its FSD has so far accumulated more than 20 million video clips of human driving, and data collection at this scale alone costs 5 billion to 8 billion yuan.

  On the other hand, end-to-end needs the support of powerful computing power. Autonomous driving involves technologies and solutions such as lidar, image perception and V2X vehicle-road coordination. Powerful computing power not only enables real-time processing of massive data and reduces data transmission latency, but also better supports full-scenario applications such as smart cities, smart transportation and high-level autonomous driving. However, the growth in computing power at domestic companies such as Huawei’s automotive BU, Baidu’s Jiyue, NIO, Li Auto, Geely, Great Wall and Xpeng is currently facing major bottlenecks.

  The problem is that constraints on computing power and data significantly affect algorithm development. UniAD, an end-to-end autonomous driving model proposed by Chinese academic researchers, won the Best Paper Award at CVPR 2023 and gives domestic companies a direction to follow. However, UniAD was developed under open-loop evaluation and with a small dataset, so it still needs engineering adaptation and large-scale data training.

  In addition, end-to-end amplifies both the upper and the lower limits of an autonomous driving system: the best-case performance gets better, but the worst case can get worse. Because an end-to-end system is a neural network black box, gaining a higher upper limit means giving up some of the interpretability of the traditional modular approach. How to retain interpretability in the system, and how to encode rules that must never be broken, such as not running a red light, into the neural network so that end-to-end can be deployed and evolve safely, will also be an important topic for planning and control engineers.
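
  Encoding hard rules inside the network itself remains an open problem. A simpler alternative, shown here only as a hedged sketch and not as something the article or any vendor prescribes, is to leave the black-box model untouched and check its output against an explicit rule. The function `enforce_red_light_rule`, the `Command` type and the 4 m/s² comfortable deceleration are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Command:
    throttle: float  # 0.0 to 1.0
    brake: float     # 0.0 to 1.0


def enforce_red_light_rule(proposed: Command, light_is_red: bool,
                           distance_to_stop_line_m: float, speed_mps: float,
                           max_decel_mps2: float = 4.0) -> Command:
    """Override the model's proposed command if it would run a red light."""
    # The rule only applies while the light is red and the stop line is
    # still ahead; otherwise the model's command passes through unchanged.
    if not light_is_red or distance_to_stop_line_m <= 0.0:
        return proposed
    # Constant deceleration needed to stop exactly at the line: v^2 / (2d).
    required_decel = speed_mps ** 2 / (2.0 * distance_to_stop_line_m)
    required_brake = min(1.0, required_decel / max_decel_mps2)
    # Cut the throttle and brake at least as hard as the rule requires.
    return Command(throttle=0.0, brake=max(proposed.brake, required_brake))


# Usage with made-up numbers: the model wants to accelerate toward a red
# light that is 25 m away while the vehicle travels at 10 m/s.
model_output = Command(throttle=0.3, brake=0.0)
safe = enforce_red_light_rule(model_output, light_is_red=True,
                              distance_to_stop_line_m=25.0, speed_mps=10.0)
```

  A check like this keeps the rule readable and auditable, but it sits outside the network, so it does not answer the harder question of making the learned model itself respect the rule.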

  There are two routes up Mount Everest: the north slope in Tibet, China, and the south slope in Nepal. Whichever slope you climb, you eventually reach the same summit. The current development of autonomous driving is similar. Although it is still hard to say whether end-to-end is the optimal or final solution for autonomous driving, this does not stop companies from innovating and exploring. After all, end-to-end can handle edge cases better than traditional modular methods, and it offers a more efficient approach that reduces the dependence on hand-written code. Along this path, autonomous driving may well advance to a higher stage. (Yang Zhongyang)

[Editor in charge: Jin Lingbing]