Interview with Prof. Stephan Günnemann on the technology behind AI
Artificial intelligence (AI) is being used in more and more areas and has become a buzzword. But at what point do we speak of AI and no longer of automation or machine learning? In this interview, Stephan Günnemann, professor of Data Analytics and Machine Learning at the Technical University of Munich (TUM), explains the key technological aspects of AI and the significance of the latest developments. At the same time, he assesses Germany's progress in the technological development of AI compared to other countries.
How can we distinguish AI from concepts such as automation and machine learning?
To begin with, the problem is that intelligence is difficult to define. The public generally sees AI in the context of human-like intelligence. Examples of this are robots that can sense their surroundings, grasp objects or perform similar tasks. But that is not the only type of application where AI plays a role.
Other key areas include the development of drugs for personalized medicine or the design of new materials. But since no human-like abilities are at play in those applications, it would be more appropriate to describe these methods as machine learning. The decisive factor is that the behavior of the methods is learned from data, in contrast to conventional automation, in which the functionality is programmed from the start.
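To make that contrast concrete, here is a minimal Python sketch with invented sensor readings and fault labels: it places a hand-programmed rule, as in conventional automation, next to a rule whose threshold is learned from the data.

```python
import numpy as np

# Invented data: sensor readings and whether the corresponding part was faulty.
rng = np.random.default_rng(42)
readings = rng.normal(loc=5.0, scale=1.0, size=500)
faulty = readings > 6.2              # hidden ground truth, used only for labeling

# Conventional automation: the behavior is programmed from the start.
def programmed_check(value):
    return value > 6.0               # threshold chosen by an engineer

# Machine learning in its simplest form: the behavior is learned from data.
# Here "learning" just means picking the threshold that best matches the labels.
candidates = np.linspace(readings.min(), readings.max(), 200)
accuracy = [np.mean((readings > t) == faulty) for t in candidates]
learned_threshold = candidates[int(np.argmax(accuracy))]

def learned_check(value):
    return value > learned_threshold

print(programmed_check(6.5), learned_check(6.5), round(learned_threshold, 2))
```

The same distinction carries over to real systems: in the learned case the rule comes from the data rather than from a specification written in advance.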
How does an AI algorithm work from a technological standpoint?
Every algorithm starts with the data that are collected. These data can originate from all kinds of systems, for example social networks or sensors in technical equipment. The machine learns a highly complex function that helps it to recognize patterns in these data, for example to classify certain objects or to make predictions about the future. These functions are generally so complex that they are beyond the grasp of an individual person.
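As a rough illustration of what learning such a function can look like, the sketch below (assuming scikit-learn is available, with made-up measurements and labels) fits a classifier that maps inputs to predictions; the learned mapping is spread across hundreds of decision trees rather than written down as rules.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data: each row could stand for sensor measurements from a machine,
# the label for whether a failure followed shortly afterwards.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = ((X[:, 0] * X[:, 3] - X[:, 5]) > 0).astype(int)

# The classifier learns a complex mapping from measurements to predictions.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:800], y[:800])

# It can then classify previously unseen inputs.
print("accuracy on held-out data:", model.score(X[800:], y[800:]))
```

The fitted model also illustrates the last point: its behavior is encoded in thousands of learned split rules, which is why no individual person can simply read off what the function does.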
Wouldn’t it be important to know how AI arrives at certain decisions?
I have to answer that with another question: Isn’t a human being, as an alternative to AI, the perfect black box? We’re often unable to fully explain how we reach our own decisions. With AI, in my opinion, one does not always need an explanation. Explainable really means understandable for a person. And at this point, I’m a bit skeptical. Does a person always need to be able to grasp what an algorithm is doing? For some tasks, a person simply lacks the necessary expertise.
Can you think of an example?
Suppose, for example, that an algorithm automatically creates certain drugs: is it really crucial to know how it arrived at them? Or is it only important to know that we now have a drug with certain characteristics? In this case, why do I need to be able to retrace the machine's decision-making process? But in clinical practice the situation is different. Here it is important to know how the algorithm arrives at a certain diagnosis. That is an application area where I can imagine people and machines working together in the future.
Key application areas for AI can be found not only in medicine and material design, but also in the creation of media content. The latest trend is text-to-video generation. How far have we come with the development of AI at this point? Are we still just plodding along or are we en route to general AI?
Of course the latest results in generative AI are very impressive. However, it is still too soon to speak of true intelligence. In particular, skills such as deductive reasoning and planning are not yet fully covered by these methods. This has also sparked a major debate about whether true intelligence can even be realized with the methods used to date.
What technological methods are used to turn texts into video?
In terms of the basic principles, this does not differ much from text-to-image methods. At present, it primarily involves the use of diffusion models. Naturally, enormous volumes of data are used here, too. The models attempt to generate images from pure noise. Using a neural network, they learn to predict the original image from a noisy version. The text is then used as additional input to the neural network, steering it in the right direction so that it generates different images or videos depending on the input. Of course, many other components also play a role in achieving the excellent results we see in practice.
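A heavily simplified sketch of one training step of a text-conditioned diffusion model is shown below; the toy denoiser, the flattened stand-in "images", and the random text embeddings are placeholders, and real systems add many further components such as noise schedules, attention-based architectures, and latent spaces.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Toy network that predicts the added noise, conditioned on a text embedding."""
    def __init__(self, img_dim=64, text_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + text_dim, 128), nn.ReLU(),
            nn.Linear(128, img_dim),
        )

    def forward(self, noisy_img, text_emb):
        return self.net(torch.cat([noisy_img, text_emb], dim=-1))

images = torch.randn(8, 64)      # stand-in for a batch of flattened training images
text_emb = torch.randn(8, 16)    # stand-in for encoded text prompts
alpha = 0.7                      # how much of the original signal survives this step

noise = torch.randn_like(images)
noisy = alpha**0.5 * images + (1 - alpha)**0.5 * noise   # forward process: add noise

model = ToyDenoiser()
pred_noise = model(noisy, text_emb)                      # text-conditioned prediction
loss = nn.functional.mse_loss(pred_noise, noise)         # learn to undo the noise
loss.backward()
```

At generation time the process runs in the other direction: starting from pure noise, the network refines the sample step by step, with the text embedding steering each refinement toward the prompt.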
How do Germany and Europe compare at the international level when it comes to the development of AI technology?
In research, we are in excellent shape. But when it comes to transferring research into real-world applications, both Germany and Europe have some catching up to do. And then there is the issue of startups. That has a lot to do with our investment culture, where there is less focus on venture capital for very young, high-risk ventures, which is exactly what many AI startups are. The Americans, for example, are ahead of us in that regard. But TUM does a lot to promote startups.
There is also the crucial question of the fields in which German and European companies are using AI. And here it soon becomes clear that Germany and Europe are especially strong in areas where the reliability of AI plays an important role, for example autonomous driving or medicine. Unlike with language models such as ChatGPT, in those fields it is important that the decisions made by AI are truly correct. That means a lot more development is needed before such technologies can be used in practice.