The Reward Principle – Reinforcement learning as function development 4.0

Thanks to reinforcement learning, neural networks can intervene autonomously and with foresight, supporting controllers in maintaining setpoints even under disturbances. IAV brings the methodology of “reinforcement learning” to automotive development and has applied it in projects such as boost pressure control – with the aim of making the concept of neural networks fit for series production.

Artificial Intelligence (AI) that plays Atari games independently and successfully – such stories regularly make the rounds in the media. This is made possible by reinforcement learning: a software agent independently learns a strategy through the principle of reward. It is a bit like conditioning: the right decision is rewarded – in the world of AI, given positive feedback – and is therefore sought again in the future. In this way, the AI builds up experience and improves its performance by trial and error. This holds enormous potential, especially for development in the automotive sector. “At IAV, we see reinforcement learning as a key component of future function development,” says Dr. Christian Kruschel, Manager Data Science. “It can be used to solve problems for which there has not yet been a satisfactory answer.”
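The reward principle described above can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. This is an illustrative toy (a five-state line world, not an automotive problem): the agent is rewarded only for reaching the goal state, and by trial and error it learns to prefer the action that leads there.

```python
import random

# Toy environment: states 0..4 on a line, goal at state 4.
# Actions: 0 = left, 1 = right. Reward +1 only when the goal is reached.
GOAL = 4

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

# Tabular Q-learning: the agent improves by trial and error,
# reinforcing actions that led to reward.
q = [[0.0, 0.0] for _ in range(GOAL + 1)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2
random.seed(0)

for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = random.randrange(2) if random.random() < epsilon else max((0, 1), key=lambda x: q[s][x])
        nxt, r, done = step(s, a)
        # Reward principle: positive feedback raises the value of the chosen action.
        q[s][a] += alpha * (r + gamma * max(q[nxt]) - q[s][a])
        s = nxt

# After training, "right" should dominate in every non-goal state.
policy = [max((0, 1), key=lambda x: q[s][x]) for s in range(GOAL)]
print(policy)
```

After training, the learned policy chooses “right” everywhere, purely because that choice was rewarded.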

Neural networks to supplement existing controllers

IAV has already applied the method successfully for customers, supplementing existing controllers with neural networks in external and internal projects and significantly improving performance. In one project, for example, a neural network trained with reinforcement learning optimized boost pressure control, ensuring that the desired setpoints were reached quickly and without overshoot. The result is not only visible to the developer; the driver can feel it in the behavior of their car. “Especially in dynamic situations where the controllers in use perform poorly, neural networks can contribute a supplementary corrective variable,” says Dr. Dennis Schmidt, Data Scientist at IAV. Through reinforcement learning, they learn how the controller output needs to be amplified or attenuated to achieve the optimum both at the current point in time and, with foresight, in the future. “The concept of supplementing controllers in critical situations is not new – but current models often lack the flexibility to react adequately to complex, dynamic situations.”
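The supplementary-variable idea can be sketched as follows. Everything here is illustrative and not IAV's implementation: a conventional PI controller computes its output, and a hypothetical stand-in for a trained policy adds a correction that attenuates the action when the measured value is already rushing toward the setpoint, to curb overshoot.

```python
class PIController:
    """Conventional PI controller acting on the current deviation."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, setpoint, actual):
        error = setpoint - actual
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

def rl_correction(setpoint, actual, rate_of_change):
    """Hypothetical stand-in for a trained policy network: damps the control
    action when the actual value is already moving fast toward the setpoint,
    to avoid overshoot. A real policy would be learned, not hand-coded."""
    error = setpoint - actual
    return -0.5 * rate_of_change if error * rate_of_change > 0 else 0.0

pid = PIController(kp=0.8, ki=0.2, dt=0.01)
u_base = pid.update(setpoint=1.5, actual=1.2)         # conventional controller
u_corr = rl_correction(1.5, 1.2, rate_of_change=2.0)  # learned supplement
u_total = u_base + u_corr                             # attenuated total action
print(u_base, u_corr, u_total)
```

The key design point is additive: the conventional controller remains in charge, and the learned term only amplifies or attenuates its output.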

«With reinforcement learning, problems for which there has not yet been a satisfactory answer can be solved.»

Dr. Christian Kruschel — Team Leader Data Science at IAV

Active instead of reactive

Systems trained by reinforcement learning have a great advantage: they recognize that an error could occur in the future and actively intervene to prevent it. “Many of the controllers in use, on the other hand, can only react to the control deviation between the actual value and the setpoint and thus only readjust,” says Schmidt. Using neural networks alone as controllers, however, is still a long way off. “As long as the quality criteria that apply to conventional controllers cannot be guaranteed for neural networks, we will not rely on this procedure alone,” says Kruschel. “That would not be compatible with our high quality standards.” In addition, he says, the choice of methodology has to be weighed up anew for each system. “A neural network is only one of many possibilities, even if it is currently trending,” says Kruschel. In principle, the approach can be transferred to similar use cases.
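The active-versus-reactive distinction can be illustrated with two toy control laws (the gains and the forecast value are invented for illustration): a reactive law acts only on the current deviation, while a predictive one also counteracts a forecast deviation before it materializes.

```python
def reactive_action(setpoint, actual, kp=1.0):
    # Acts only once a deviation already exists.
    return kp * (setpoint - actual)

def predictive_action(setpoint, actual, predicted_actual, kp=1.0, kf=0.5):
    # Also counteracts the deviation that is forecast for the next step.
    current_error = setpoint - actual
    future_error = setpoint - predicted_actual
    return kp * current_error + kf * future_error

# No deviation yet, but the forecast says the value is about to droop:
print(reactive_action(2.0, 2.0))         # prints 0.0 -> reactive: does nothing
print(predictive_action(2.0, 2.0, 1.8))  # small positive pre-emptive action
```

The reactive law stays silent until the error has already appeared; the predictive law intervenes early, which is exactly the advantage the text ascribes to reinforcement-trained systems.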

Fit for the series

Thanks to a safeguarding strategy, IAV makes neural networks fit for use in series production – even though, unlike conventional methods, their decision-making processes are not transparent and are therefore difficult to safeguard. “Put simply, we cannot predict how they will react in unknown situations,” explains Kruschel. To solve this problem, IAV and its research partners have developed a concept called the Safety Supervisor specifically for ECU-related applications. It is a monitoring system to which the neural network reports its calculated results. The Safety Supervisor then decides whether it can trust a result or must switch to a substitute system to be on the safe side.
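The supervisor pattern might be sketched as follows; the specific checks and names are assumptions, as the source does not detail the Safety Supervisor's internals. The network reports a result, and the monitor either passes it through or falls back to a conventional substitute controller.

```python
def fallback_controller(setpoint, actual):
    # Simple conventional substitute: proportional control.
    return 0.5 * (setpoint - actual)

def safety_supervisor(nn_output, setpoint, actual, limits=(-1.0, 1.0)):
    """Decide whether the network's result can be trusted (illustrative
    plausibility checks: reject NaN/inf and out-of-range commands)."""
    lo, hi = limits
    trusted = nn_output == nn_output and lo <= nn_output <= hi  # NaN != NaN
    if trusted:
        return nn_output, "nn"
    return fallback_controller(setpoint, actual), "fallback"

print(safety_supervisor(0.3, 1.0, 0.8))  # plausible result -> "nn" path
print(safety_supervisor(7.5, 1.0, 0.8))  # implausible -> substitute system
```

The neural network never actuates anything directly; the deterministic monitor sits between it and the plant, which is what makes the opaque component safeguardable.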

«We can process data quickly, train a neural network efficiently on our high-performance cluster and bring it into the ECU just as quickly.»

Dr. Christian Kruschel — Team Leader Data Science at IAV

An optimized data processing pipeline

Another challenge: the neural network is developed and trained on a high-performance cluster, whose computing resources far exceed those available on a control unit. Not only does memory limit the size of the neural network; the execution time on the control unit must also be under one millisecond. The solution is called Neural Network Compression: it reduces the size of a neural network so that it requires fewer resources while still delivering the same performance. “We can process data quickly, train a neural network efficiently on our high-performance cluster and bring it into the ECU just as quickly,” says Kruschel. “Our entire data processing pipeline is optimized.” IAV uses a fully automated workflow for this – and also relies on its domain knowledge. “We combine comprehensive know-how in the automotive sector with the latest methods of artificial intelligence, develop our methods in-house and bring the solutions safely to production maturity – in short: at IAV, we can offer everything from a single source.”
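One common compression technique is magnitude pruning (an illustrative choice; the article does not specify which methods IAV's Neural Network Compression uses): the smallest weights are zeroed, shrinking the effective model toward ECU memory and timing budgets while keeping the dominant weights intact.

```python
def prune_by_magnitude(weights, keep_ratio=0.5):
    """Keep only the largest-magnitude fraction of weights; zero the rest.
    Zeroed weights can be stored sparsely and skipped at inference time."""
    flat = sorted((abs(w) for w in weights), reverse=True)
    k = max(1, int(len(flat) * keep_ratio))
    threshold = flat[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

# Toy weight vector of a single layer (invented values).
w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.08]
pruned = prune_by_magnitude(w, keep_ratio=0.5)
print(pruned)  # half the weights survive, the small ones become 0.0
```

In practice pruning is typically followed by fine-tuning to recover accuracy, so the smaller network can still deliver the same performance, as the text describes.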

The article was published in automotion 03/2020, the automotive engineering magazine of IAV.
