
Smart data storage key in autonomous vehicle training

Autonomous vehicles are far from production readiness, but the challenge of handling all that data is already weighing heavily on developers, writes Freddie Holmes

Machine learning, often referred to as artificial intelligence (AI), is widely expected to reshape the automotive industry, finding use in anything from dealerships and servicing to digital assistants and autonomous driving.

Machine learning is not new—it has been around for decades—but as vehicles become increasingly digital, AI has found itself at the forefront of most major automakers’ marketing schemes. It is not only a buzzword, however; significant investments are being made to back up the apparent interest.

In March 2019, ZF announced that a new tech centre for AI and cyber security would be established in Saarbrücken, Germany. Continental plans to add 400 AI experts to its ranks by 2021. The supplier has also partnered with the German Research Centre for Artificial Intelligence (DFKI), the largest non-profit AI-research organisation.

Automakers have also made their interests clear. PSA Group has formed a dedicated AI lab with Inria, the French National Institute for computer science and applied mathematics; Volkswagen Group has highlighted AI as ‘a key competitive factor’; and Mercedes-Benz Cars believes AI is a ‘central’ topic to the future of vehicle development, production and services. Ford’s investment of US$1bn in Argo AI, staggered over five years, further underlines the apparent thirst for machine learning expertise.

But sourcing the necessary expertise in AI is only one part of the challenge; developers must also be able to put vast quantities of data to use at speed and scale.

Argo AI Self-Driving Car
The automotive industry has taken a keen interest in artificial intelligence

AI is trained by exposing it to vast amounts of data, and autonomous vehicles are a particularly challenging application. So-called supercomputers are being embedded within these vehicles to perform the complex calculations required to control the vehicle electronically. Part of this involves the fusion of various sensor and connectivity technologies: cameras, LiDAR, infrared and GPS, for example. Today’s AV test prototypes typically house between four and six cameras and one to five LiDAR sensors, which continuously produce complex data sets.

“Each one of those sensors pulls off a great deal of data that has to be analysed in real time,” explained Keith Rieken, Solutions Manager, Artificial Intelligence at Pure Storage. Speaking during a recent Automotive World webinar, Rieken explained that current estimates suggest a Level 5 autonomous vehicle (one that requires no human control at any point) generates between one and 20 terabytes of data per hour. “Those are estimates of course, because no one has developed a Level 5 vehicle,” he added. “But this is a challenge because in order to train these neural networks, the amount of data required is even larger.”

The environment around the vehicle and any immediate or potential hazards need to be modelled, and in a myriad of conditions. The scale of the data sets is “almost innumerable,” he noted. The brain of the autonomous vehicle, a computer system known as a neural network, must be exposed to as much of that data as possible in order to familiarise itself with those conditions. It is not intelligence, per se, but more an ability to calculate the ideal reaction to a given scenario: slowing to avoid colliding with a pedestrian who steps into the road without looking, for example, across different lighting and weather conditions, speeds and locations on the road.

“That’s the challenge we see today—how do we sufficiently train those networks in order to achieve full autonomy?” said Rieken. Typical validation programmes for AV neural networks require a minimum of 20 petabytes of training data, a figure that is growing rapidly, he advised. This is why a company like Pure Storage—traditionally known as an enterprise flash storage provider—is finding traction within the automotive space.
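To put those figures in perspective, the back-of-envelope arithmetic can be sketched in a few lines of Python. This is purely illustrative: it takes the one to 20 terabytes per hour estimate and the 20 petabyte minimum quoted by Rieken, assumes decimal units (1 PB = 1,000 TB) and assumes, for simplicity, that training data is raw recorded vehicle data. The function names are hypothetical and do not come from Pure Storage.

```python
# Illustrative arithmetic only; function names are hypothetical.
TB_PER_PB = 1000        # assumption: decimal units, 1 PB = 1,000 TB
GB_PER_TB = 1000        # assumption: decimal units, 1 TB = 1,000 GB
SECONDS_PER_HOUR = 3600


def sustained_rate_gb_per_s(tb_per_hour: float) -> float:
    """Convert a per-vehicle data rate from TB/hour to GB/second."""
    return tb_per_hour * GB_PER_TB / SECONDS_PER_HOUR


def vehicle_hours_for(target_pb: float, tb_per_hour: float) -> float:
    """Hours of recording needed for one vehicle to produce target_pb."""
    return target_pb * TB_PER_PB / tb_per_hour


if __name__ == "__main__":
    # Low and high ends of the per-vehicle estimate quoted in the article.
    for rate in (1, 20):
        print(
            f"{rate} TB/h is about {sustained_rate_gb_per_s(rate):.2f} GB/s; "
            f"20 PB takes {vehicle_hours_for(20, rate):,.0f} vehicle-hours"
        )
```

Under these assumptions, the quoted 20 petabyte minimum corresponds to somewhere between 1,000 and 20,000 hours of recording for a single vehicle, depending on which end of the estimate applies.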

Modern AV sensors are able to create far richer data sets (Source: Velodyne)

The data requirements of an AV dwarf the automotive industry’s requirements of the past, and data scientists are required in order to make that data useful. “We spend about 70% of our development time managing, labelling and processing that data, and transforming it into something we can use just to train the neural networks,” explained Rieken.

In working with Zenuity, a joint venture born of the Autoliv-Volvo partnership, Pure Storage has been able to optimise how training data is handled. Initially, a group of data scientists would work from individual laptops to develop their models, which were based on ‘very limited’ training sets. At this point, validation was not being carried out at all.

However, as projects grew and validation needed to be carried out across larger data sets (storage capacity was doubling every 18 months, according to Pure Storage), Zenuity’s teams had to move into a server environment, which allowed for shared access to these data sets. Zenuity has since progressed to Pure Storage’s FlashBlade architecture, which consolidates groups of compartmentalised data sets, known as ‘data silos’, into easily accessible hubs.
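The effect of capacity doubling every 18 months can be illustrated with a short calculation. The sketch below assumes smooth exponential growth and a hypothetical starting capacity of one petabyte; neither the starting figure nor the function name comes from Pure Storage or Zenuity.

```python
# Illustrative projection only; the starting capacity is hypothetical.
def projected_capacity(initial_pb: float, months: float,
                       doubling_months: float = 18.0) -> float:
    """Capacity after `months`, assuming it doubles every `doubling_months`."""
    return initial_pb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start_pb = 1.0  # assumption: hypothetical starting point of 1 PB
    for months in (18, 36, 54, 72):
        capacity = projected_capacity(start_pb, months)
        print(f"after {months} months: {capacity:.0f} PB")
```

At that rate, storage needs grow 16-fold over six years, which helps explain why working from individual laptops quickly became untenable.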

As if developing an autonomous vehicle were not challenging enough from a hardware and software perspective, the seemingly simple task of handling all of the associated data will only make things harder. Adopting suitable storage solutions is deemed crucial to cutting development time in a safe manner. “The internet and all of the new sensor technologies have enabled more access to data,” concluded Rieken. “The challenge now is how to store and manage that data, and present it to the compute in an efficient way to rapidly iterate and innovate AI-based systems.”
