
Professor Wit, what does understanding the world through data science mean?
"Data science combines statistical methods, computational techniques and domain-specific knowledge to identify relationships within data. As a result, it allows us to make informed decisions that go beyond intuition, anecdotes or isolated facts. Our goal is to filter out background noise, search for correlations, and understand why certain phenomena occur. By doing so, we can gain a better understanding of social behaviour and economic trends, and even make more accurate predictions about future developments. It is important to note that data science does not replace decision-makers; rather, it supports the decision-making process and helps them make better choices."
For example?
"We are collaborating with a group of environmental experts from Germany, Austria and England to study the spread of invasive species in Switzerland and across Europe. Our research spans approximately 140 years, and we have detailed yearly data on the arrival of various species, including when they entered specific countries. Our goal is to understand how human activities affect the spread of these species, in the hope of preventing certain invasive species from entering Switzerland. This work once led to a funny mix-up..."
Would you like to tell us more?
"I was once invited to speak about invasive species at a high school in Switzerland. To my surprise, a large number of students attended the talk. After a while, I realised that when the students talked about 'aliens', they meant extraterrestrials, whereas in ecology invasive species are often called alien species. So we shifted the discussion to hypothetical alien invasions. We showed how data science can be used to analyse the available data on UFO sightings, and in the end we concluded that there is no evidence that we have been invaded by extraterrestrials. It was an interesting experience."
Many people think data science is mainly about marketing, advertising, social media and so on. So there is more to it than that?
"Data science is certainly used in these fields, but it also touches everyday life. In healthcare, for instance, carefully designed clinical trials and thorough biostatistical analysis improve diagnoses and treatment plans. In finance, predictive analytics supports risk assessment and fraud detection, among other applications."
Where does the transition from statistics to data science lie? Is it a matter of data volume, or is there something more specific?
"The amount of data can be small or huge; that matters relatively little. What has changed significantly over the years is the automation of the analysis process. Tasks that took months twenty years ago can now often be finished in a week, a day, or even a few hours."
Without going into technical details, how do you analyse the data you collect and build a model to interpret it?
"When we look at data, we have to ask what mechanism generated it. If we treat these figures merely as numbers, we miss the bigger picture. We need to take a step back and analyse the process that created the data in order to interpret it properly. In this respect, data science is fundamentally different from artificial intelligence (AI); they are almost opposites. While AI processes vast amounts of data without understanding the context or implications, data science focuses on interpreting even small datasets, building models that explain the information and, where possible, make predictions."
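To make the idea of a data-generating mechanism concrete, here is a minimal Python sketch under purely hypothetical assumptions; it is an illustration, not Professor Wit's actual method. Suppose patient arrivals at an emergency room follow a Poisson process, so the gaps between arrivals are exponentially distributed. Once that mechanism is assumed, a single parameter (the arrival rate) explains the data: it can be estimated from a modest sample and then used for prediction. The rate of 4 patients per hour, the sample size and the function names are invented for illustration.

```python
import math
import random


def simulate_gaps(rate_per_hour: float, n: int, seed: int = 0) -> list[float]:
    """Simulate n inter-arrival times (in hours) from a hypothetical Poisson process."""
    rng = random.Random(seed)
    # In a Poisson process, the gaps between arrivals are exponential with mean 1/rate.
    return [rng.expovariate(rate_per_hour) for _ in range(n)]


def estimate_rate(gaps: list[float]) -> float:
    """Maximum-likelihood estimate of the arrival rate: arrivals divided by elapsed time."""
    return len(gaps) / sum(gaps)


def prob_no_arrival(rate_per_hour: float, hours: float) -> float:
    """Probability, under the assumed model, that no patient arrives in the next `hours`."""
    return math.exp(-rate_per_hour * hours)


# Assumed "true" mechanism: 4 arrivals per hour on average (illustrative only).
gaps = simulate_gaps(rate_per_hour=4.0, n=2000)
lam = estimate_rate(gaps)
print(f"estimated rate: {lam:.2f} patients/hour")
print(f"P(no arrival in next 30 min): {prob_no_arrival(lam, 0.5):.3f}")
```

The model stays small because the assumed mechanism does most of the work. If real arrivals were not Poisson, for example with strong peaks at certain times of day, the same numbers would call for a different model.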
So can artificial intelligence simply be ignored?
"I did not say that. We are talking about two different things. AI has access to billions of images of cats and dogs, so it doesn’t need to understand the specific characteristics that distinguish a dog from a cat. If, however, I start from a statistical model in which I define the parameters I want to consider, I am far less dependent on such enormous amounts of data. Although we have 140 years of data on invasive species, that period is still limited. This is why it is crucial to consider not only the numbers but also the physical processes that generated them. For example, when we work with a hospital to analyse patient arrivals in the emergency room and optimise the time it takes to treat them, we must consider various timing factors: when patients arrive, how long they wait, and so on. Only by focusing on these aspects can we process the data effectively and arrive at a meaningful solution."
So do you use AI in your own work or not?
"Of course, but as a technical tool. For example, in a national project with a colleague from the Faculty of Economics, I analysed the global innovation process of the last 50 years through patents and the way each patent is related to others. We have 15 million patents with 120 million interconnections: to analyse them and try to answer our questions, we need techniques that can handle 120 million numbers, and AI is one of them."
Does data science ever make mistakes?
"Of course data science makes mistakes: it is a human activity, and every human activity is prone to error. Together with colleagues from abroad, I co-authored a paper on data science’s response to the COVID-19 pandemic. We presented our findings on 10 April this year at the Royal Statistical Society in London. While the paper is somewhat critical, it aims to highlight the mistakes we made, which may have affected public health policies. At the same time, it offers important lessons that we can apply in future crises."
Which ones?
"We moved a little too quickly to invasive measures such as lockdowns. This is why we stressed the importance of targeted sampling and validated models, effective risk communication, and a focus on long-term public health strategies rather than short-term goals."
Do you also use the supercomputer at the Swiss National Supercomputing Centre (CSCS) in Cornaredo for your work?
"We use it a lot because, as I said, we often need a very high level of data processing. Besides us lecturers, my students also use it remotely almost every day. At USI, we have an excellent Master’s degree in Data Science and high-performance computing that we could not run without it."
What will data science look like in the future?
"Although there have been significant technical advances, the core concepts have largely remained the same: finding, understanding and interpreting data. This process is still fundamentally an intellectual activity. We can use computers for assistance, but ultimately it is our minds that connect the numbers to the processes that generated them."
It all sounds almost too good. But nearly everything has a downside, a grey area that needs to be watched and kept under control. Where does the danger lie in data science?
"When dealing with data, it always comes down to ethics, fairness and transparency. Key questions include: Where does this data come from? Who collected it? Who processed it, and how? For instance, if we were to analyse the university population at the beginning of the 20th century, we would find very few women, and one might wrongly conclude that women were not suited to intellectual activities, which is clearly false: the numbers reflect who was allowed to study, not who was capable of it. Interpreting data responsibly is crucial, as it helps identify and mitigate social biases and prejudices. Our responsibility goes beyond mere analysis; it includes critical thinking and ethics to ensure that our conclusions are accurate and fair."

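The university example is a case of selection bias: the data describe who was admitted, not who was able. A minimal Python sketch with entirely hypothetical numbers (not historical data) can illustrate the effect. Ability is drawn from the same distribution for both groups and everyone faces the same ability threshold, but women are given a much lower, invented probability of access. The function name, the access probabilities and the threshold are all assumptions made for illustration.

```python
import random


def simulate_enrolment(n: int, seed: int = 1) -> dict[str, list[float]]:
    """Return the abilities of enrolled students per group in a hypothetical population."""
    rng = random.Random(seed)
    # Hypothetical access probabilities: social barriers, not ability, differ by group.
    access = {"men": 0.10, "women": 0.01}
    enrolled: dict[str, list[float]] = {"men": [], "women": []}
    for _ in range(n):
        group = rng.choice(["men", "women"])
        ability = rng.gauss(100, 15)  # identical ability distribution for both groups
        # Enrolment requires social access AND passing the same ability threshold.
        if rng.random() < access[group] and ability > 110:
            enrolled[group].append(ability)
    return enrolled


enrolled = simulate_enrolment(200_000)
n_men, n_women = len(enrolled["men"]), len(enrolled["women"])
share_women = n_women / (n_men + n_women)
mean_men = sum(enrolled["men"]) / n_men
mean_women = sum(enrolled["women"]) / n_women
print(f"share of women among students: {share_women:.1%}")
print(f"mean ability of enrolled students, men: {mean_men:.1f}, women: {mean_women:.1f}")
```

Reading only the enrolment counts would suggest a difference in aptitude; asking how the data were generated, and who was allowed to enrol, removes that conclusion.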


