Finding the Path to Value in Engineering AI – Why We Need to Rethink Benchmarking

By Pierre Baqué, CEO & Co-Founder, Neural Concept

AI: Everywhere but in the Bottom Line?

Recent headlines echo reports that AI is failing to deliver the economic impact many had anticipated. Executives often tell me: AI is everywhere, except in my profit forecast. This situation is not unprecedented; it mirrors, on a shorter timescale, the early stages of other technological revolutions, such as the advent of the electric motor or even the digital era. As economist David Levine reminded us 15 years ago, "The productivity gains of general-purpose technologies depend less on the technology itself than on the organizational innovations it enables."

Most transformative technologies look disappointing in terms of short-term growth. Yet, looking back, we know that companies that failed to embrace digital transformation disappeared, while the most valuable firms of today are those that adapted and evolved. One key reason for this short-term disappointment is that organizations often chase low-hanging fruit, trying to replace isolated components of old technologies with new ones, when what is truly needed is a deeper, systemic transformation.

AI Physics Surrogates in Engineering: A Telling Example

In this article, I will illustrate this point with an example I know well: AI in engineering, and the temptation to treat one of its building blocks, 3D surrogates for physics, as a simple substitute for physics solvers.

Why Benchmarks Mislead

As Neural Concept is currently publishing flattering surrogate benchmark results on the popular DriveAerNet++ dataset, I have the chance to set the record straight: these results are only a small part of the story, and you should not stop there. Simulation engineers, in particular, need to unlearn some ingrained reflexes when working with surrogate models.

Simulation engineers are used to evaluating solvers, measuring accuracy, and validating them. So, when 3D surrogate models first appeared—Neural Concept was the pioneer here—engineers naturally applied the same habits. Before considering practical applications, they would define simplified problems and test accuracy. Now, they continue this pattern with benchmark comparisons.

This made sense in classical simulation, where stability and reliability largely depended on the type of physical phenomenon and the quality of the solver’s predictions (linear, non-linear, turbulence, Reynolds number, etc.). In that context, conclusions from toy datasets could often be generalized, assuming only that larger problems would be “a bit more complex.”

In data-driven approaches, these assumptions no longer hold.

Thinking “backwards” from production value

Neural Concept’s models are generally robust because they have been battle-tested more than any others, as the DriveAerNet++ benchmark shows, but I can never assume that one model configuration will outperform another in a specific setup. What truly matters is the distribution of the data, how it is pre-processed and post-processed, how it supports decision-making, and, most importantly, how it integrates into a production workflow. Production value comes from tools that designers can use directly, from workflows that enable meaningful generative design, and from removing bottlenecks in the design chain.

We need to step beyond our simulation reflexes and leave our comfort zone. A surrogate model is only one element in the broader toolbox of modern, data-driven engineering. It cannot be separated from the generative models that propose geometries, the interfaces through which users interact, or the ways results are post-processed. Surrogates do not deserve scrutiny in isolation, examined under a microscope before any real use case has even been defined.

The right way to judge a surrogate is to recreate the actual conditions of use: identify the training data, its source and frequency, and the types of insights or decisions that will depend on the surrogate. If you think of it only as a replacement for simulation, you will be disappointed. If instead you explore how AI, surrogates, generative models, and even “vibe-coding” can accelerate your workflow, you will save millions.
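The gap between benchmark-style and production-style evaluation can be sketched with a deliberately tiny toy model. Everything below is illustrative: the "physics" is a made-up 1-D function, the surrogate is a polynomial fit standing in for a real 3D model, and the "production" designs are simply drawn from a shifted distribution, mimicking a generative workflow that explores geometries outside the benchmark envelope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D "physics": a drag-like response to a design parameter x.
def ground_truth(x):
    return np.sin(x) + 0.1 * x**2

# Train a simple polynomial surrogate on a narrow, benchmark-like
# distribution of designs (x in [-1, 1]).
x_train = rng.uniform(-1.0, 1.0, 200)
y_train = ground_truth(x_train)
surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=5))

def mean_abs_error(x):
    """Mean absolute error of the surrogate against the ground truth."""
    return float(np.mean(np.abs(surrogate(x) - ground_truth(x))))

# Held-out test set drawn from the SAME distribution as training:
# this is what a benchmark score typically measures.
err_benchmark = mean_abs_error(rng.uniform(-1.0, 1.0, 1000))

# "Production" designs drawn from a shifted distribution, e.g. geometries
# a generative model proposes beyond the benchmark envelope.
err_production = mean_abs_error(rng.uniform(2.0, 3.0, 1000))

print(f"benchmark-style error:  {err_benchmark:.4f}")
print(f"production-style error: {err_production:.4f}")
```

The fit looks excellent on the held-out benchmark split and degrades sharply on the shifted one: same model, same training data, very different answers depending on which distribution you evaluate against. That is the core reason a benchmark score alone cannot tell you what a surrogate is worth in a specific workflow.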

Proof from the Field

The life of a CEO in our field is full of doubts. But I have one certainty: whenever a skilled engineering team, applying an AI-first mindset, tackles a concrete business challenge, the gains are massive. In the energy sector, we have seen a major turbine manufacturer achieve in 2 months what had previously taken 3 years of R&D. At an automotive supplier, we have seen development efforts that once required 10 people reduced to 2. Conversely, whenever a team evaluates surrogates as if they were solvers, the result is only confusion and frustration.

This change requires more than a shift in mindset; it demands organizational transformation. The typical separation between “simulation” and “design” departments is a legacy of a time when running a single simulation was a scientific achievement. That era is mostly behind us. With multi-physics, generative design, and AI, reorganization is key. The analytical, scientific mindset of simulation engineers must move closer to, or even into, design. Independent simulation teams may still make sense when companies need to develop fundamental physical models, but this is increasingly rare.

Within each engineering team, until this reorganization occurs, or at least the intent exists, we are repeating a well-known pattern: installing electric motors in steam-powered factories, or printing every document typed on a computer. Unlike those previous transformations, however, the journey will be much faster this time, as all the key ingredients are available and already broadly accessible at scale.