Data-Journalism: Hope for a future in a troubled profession - Interview with Mirko Lorenz

The present interview is a little digression from "The Life of a Computational Linguist" series, but since data journalism is connected to digital humanities, we decided to present this new movement here.
We interviewed Mirko Lorenz, who is an information architect, journalist and trainer. He studied History and Economics at the University of Cologne, then worked for various new media companies. He currently works with Deutsche Welle and, among other things, organised the European Journalism Centre's conference on data journalism. You can find him on Twitter as @mirkolorenz.

Számítógépes nyelvészet: Most of us (computational linguists and like-minded folks) know data journalism from the writings of Django co-creator Adrian Holovaty, but we're just scratching the surface of the concept. It has something to do with data, and it is related to journalism, but what exactly is data journalism?

Mirko Lorenz: You are right that Holovaty pointed out the value of data and delivered a wake-up call. But look at when "A fundamental way how newspapers have to change" was written - that was four years ago. How many media organisations have really acted by now? Not many.

My definition goes like this: Data-driven journalism is a workflow in which we first find data, filter it, visualize it and THEN start to tell comprehensive stories based on the patterns and truths that were hidden before. Think of the subprime crisis in the US: What would have happened if the media had watched this space based on the data? I think that politicians, management and other big institutions would increasingly benefit from more data analysis, and that data journalism will play a role in that.

Why is there such an interest in data-driven journalism right now?

To some extent the current interest in data-driven journalism is hope: the hope of a profession to regain capacity and a future. If we as journalists were able to "read" the data that surrounds us, we could do a better job. The technique itself is not that new - in his talk at our conference in Amsterdam, Simon Rogers cited Florence Nightingale: She not only helped wounded soldiers in the Crimean War, she also collected information and drew a very compelling diagram. From that it became obvious that more soldiers died because of the unsanitary conditions than from fighting. This changed modern medicine profoundly.

Your opinion: How can this help journalism?

One thing is obvious by now: In the old days journalism was financed because some technologies, like the printing press or a satellite transponder, were the only ways to connect to millions of readers or viewers. This is why it was interesting for advertising. This model is now broken. Working with data is now possible, because there are so many tools. This makes this area an opportunity, possibly even leading to a new "printing press" for journalism. Only that in the future we would not sell access to audiences, but trust. Trust, in the form of information that is believable, easy to find and easy to understand, is now the scarce resource. Additionally, we have to find new service packages, for example making it much, much easier for people to find the really relevant bits of information.

It seems data journalists are trying to respond to the 'data boom', and they are experimenting with the tools of data science. As Holovaty noted, "Newspapers need to stop the story-centric world-view" by aggregating data and building mash-ups on the available data sets, but how can we break the rules of linear storytelling?

Linear storytelling is here to stay, at least I think so. Just because the Internet exists, we as humans have not changed that much. In effect we are all still hunting and gathering, only that we do not search for food, but for information that might help us.

On the other hand, the linear storytelling model is already broken by the overlay of community discussions that we now see. A good share of our communication is moving away from confined products like books or newspapers to platforms. Twitter, Facebook and other social platforms make it possible to engage in constant discussion with people we either know or find interesting. And through these networks the news arrives in the form of links. Jeff Jarvis has a presentation on Slideshare that he used to teach journalism students, where he describes the change as a shift from product (newspaper) to process (the creating, curating and collecting of content by journalists). So, this is not linear at all. I think the future will mix this process with good stories, if things go well.

How does this new movement change journalism? Does it affect the traditional journalism curriculum?

This is indeed a huge discussion right now. I think that in the future we will need writers who have one additional area of knowledge: They should at least not be scared of looking at data, they might be very proficient in visualising information, or they should be able to produce a good multimedia piece based on data.

At the conference in Amsterdam, this was a recurring theme: Journalists do not need to become programmers, but they should understand how to use software and the available tools. Aron Pilhofer of the New York Times describes it more like cooking with several ingredients: You start with something, cut it, cook it and transform it into a good meal.

There is no really good training manual so far, but we are working on it. If data journalism proves to be a lasting trend, there will be a quick shift of resources to the "new data media". I don't know how long that will take, but it seems logical to me that we are heading in a direction like that.

The most notable (and best-known) examples of data journalism are the Guardian and NY Times APIs, but these are just stepping stones towards building mash-ups on the data. Is there any difference between using open data (gathered from governmental agencies or other places) and data journalism?

The movement towards open data is another force in all this. From society's point of view I think that data should be open, especially the data that has been gathered by governments. My personal hope is that in the future politicians will base more of their decisions on good data models.

Journalism is said to be the "Fourth Estate", but gov 2.0 and other movements (open data, open government, etc.) are changing the picture, since all of them are trying to give control back to citizens by providing them with information directly, so that they don't need newspapers to gather and explain the data. How can journalism, especially data journalism, respond to this change?

You are right that many interactions are more direct. But that was the case even before: To interact with a certain office I don't need a journalist. On the other hand, someone has to do the filtering, just as someone did the gathering of news before. And someone has to write a good story and have the time to do the research for it. That is plain and simple work, and we will need journalists to do it.

What are the essential parts of a data journalist's toolbox?

I am in line with others like Paul Bradshaw or Aron Pilhofer here: Always, always, always remember that you are a journalist in the first place and that finding a story is your main assignment. When it comes to data, the first question is not which tool I need, but what the question is. See how Hans Rosling first had a question and only then developed an innovation like Gapminder.

The tools are all there: Visualization, data integration, HD cameras, websites that can potentially be seen around the world. So, again, it's how we use these, not the technology that we should focus on.

If a youngster (or anyone else) got excited about data journalism, what kind of (educational) route would you advise her/him?

ML: Don't be trapped by the tools - you will never become a really great musician just by buying the newest keyboard. Look for the story, because society needs storytellers and there is a market for that, in journalism and other professions.

If you are interested in this, explore the topic. To get a good introduction to the state of the art, I would recommend looking at three sources at the moment:

First, watch the talk by David McCandless that is available via TED.com.

Secondly, watch the documentary "Journalism in the Age of Data", produced by Geoff McGhee.

And then, you might be interested in our documentation of the Amsterdam data-driven journalism conference, a PDF we named "Data-driven journalism: What is there to learn". The download is free.

Mirko Lorenz Homepage (http://www.mirkolorenz.com)

