Newspapers that write history
And how a new South African entity can help us do so
When the economist Johannes Norling and I recently wrote an article about the Spanish flu in the United States, he was in America and I was here in Stellenbosch.1 We divided the work neatly. Johannes handled the survey and quantification; I read the newspapers. That meant working through dozens of American regional papers to see how people’s spending changed during the pandemic and the subsequent period of quarantines. A generation ago, it would have been impossible to do my part from the southern tip of Africa. I would have had to go to the US, live in archives, scroll through reels of microfilm and rely on a generous research grant. Instead, I could now do the work with a laptop and a decent internet connection from the comfort of my office.
The big shift is not just that historical newspapers are available online; it is that they have been turned into structured datasets that social scientists can actually analyse. A scan of a century-old page is one thing. When a computer also knows exactly where the headline ends and the article begins, can distinguish between an editorial and an advert, and even identify which scraps of text are just photo captions, we are in a different league.
In the United States, projects like American Stories take millions of pages from the Library of Congress and train algorithms to do exactly this.2 They slice a newspaper page into meaningful blocks, run top-quality text recognition over it, and from that build a meticulously tagged dataset of articles, headlines and adverts. A sister project, Newswire, reconstructs the telegraphed “wire stories” that local newspapers once relied on. In both cases a jumble of old news is turned into an economist’s dream – data.
Once newspapers become data, familiar questions can be tackled with much finer tools. During the 1918 flu pandemic, local papers recorded the type and timing of restriction measures. These snippets (mask mandates, church closures) from hundreds of towns can now be searched, dated and linked to local mortality. Instead of telling a story of reckless versus cautious cities, we can estimate the actual effects of each policy.
We can apply the same logic to other important issues: tracing news of racial unrest, fears of unemployment, word of new technology, across time and space.
Adverts are a particularly rich source of data. Economic historians have long used them to reconstruct prices in places and periods where no official statistics exist. Property adverts reveal housing costs, while job adverts show shifting skill demands, disappearing occupations and changing wages, including across the gender divide. In principle, you can even infer inflation from grocery adverts by tracing the price of bread or sugar over decades. There are, of course, pitfalls – advertised prices do not always reflect what people actually paid. The point remains: where earlier generations had access to a handful of data points, we now have thousands.
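To make the bread example concrete, here is a minimal sketch of how advertised prices could be turned into a simple price index. The adverts, prices and the function name `price_index` are all invented for illustration; a real study would work from thousands of digitised adverts and would have to deal with changing loaf sizes, currencies and the gap between advertised and paid prices.

```python
from statistics import median

# Hypothetical toy data: (year, item, advertised price in pence).
ADVERTS = [
    (1900, "bread", 4.0), (1900, "bread", 5.0), (1900, "bread", 6.0),
    (1910, "bread", 6.0), (1910, "bread", 6.0),
    (1920, "bread", 9.0), (1920, "bread", 11.0),
]

def price_index(adverts, item, base_year):
    """Return {year: index}, where the base year's median price equals 100."""
    prices_by_year = {}
    for year, name, price in adverts:
        if name == item:
            prices_by_year.setdefault(year, []).append(price)
    # The median is less sensitive to one-off promotional prices than the mean.
    medians = {year: median(prices) for year, prices in sorted(prices_by_year.items())}
    base = medians[base_year]
    return {year: 100 * m / base for year, m in medians.items()}

index = price_index(ADVERTS, "bread", base_year=1900)
for year, value in index.items():
    print(year, round(value, 1))
```

With this made-up data, the median advertised price of bread doubles between 1900 and 1920, so the index moves from 100 to 200.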
With greater ambition and better technology, we can move from counting things to studying identity. Political scientists Fabio Ellger, Caterina Chiopris and Daniel Ziblatt, for example, built a dataset of German parliamentary speeches.3 Tracking how nineteenth-century politicians used the term das Volk, they examine which adjectives sit closest to “the people” and which groups are included or excluded. Over time the meaning shifts. Sometimes it refers to all citizens; sometimes only to ethnic Germans; sometimes to a party’s supporters.
Before, it would take a sharp historian to notice this shift. Now we can measure it over decades. The question “who are the people?” remains philosophical, yes, but it also acquires an empirical edge.
The same logic is beginning to change how we understand our own history. Jonathan Schoots shows how late nineteenth-century isiXhosa papers such as Isigidimi samaXhosa and Imvo Zabantsundu birthed African nationalism.4 These popular papers – produced by Xhosa-speaking, mission-educated intellectuals and printed under colonial supervision – spoke in several registers at once, reflecting notions of modernity and progress, kingdom and kinship.
By tracing where and how terms like isizwe (nation, people) or uhlanga (race, lineage, tribe) appear, Jonathan can show how an African nationalist project was woven from very local threads.
Over coffee at a LEAP conference in November, Jonathan, Caterina and I began to wonder how far this idea can go. One interesting claim is that nationalist language only gathers momentum once some form of democracy becomes thinkable: Afrikaner nationalism with representative government in the 1850s and 1860s, Black nationalism in the Cape after representative government and the inclusion of African areas in the 1870s and 1880s.
With digitised English, Dutch/Afrikaans and isiXhosa newspapers, alongside parliamentary debates, we can now test this. We can count when terms like “our people”, isizwe, “native race” or “Afrikaner nation” appear in the press, and ask whether their rising frequency coincides with new political rights. What used to be clever speculation can now, carefully framed, become a question that we put to the data.
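The counting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four "articles" below are invented, the function name `term_rates` is hypothetical, and a real analysis would need to handle spelling variants, OCR errors and the very different sizes of each language's press.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: (year, article text). Not real newspaper content.
ARTICLES = [
    (1865, "Our people demand a voice in the new parliament."),
    (1865, "Shipping news from Table Bay."),
    (1882, "The isizwe must speak; our people will be heard. Our people are ready."),
    (1882, "Wool prices rose again this week."),
]

TERMS = ["our people", "isizwe", "native race", "afrikaner nation"]

def term_rates(articles, terms):
    """Return {year: {term: occurrences per 1,000 words printed that year}}."""
    counts = defaultdict(Counter)
    words = Counter()
    for year, text in articles:
        lowered = text.lower()
        words[year] += len(re.findall(r"\w+", lowered))
        for term in terms:
            # Whole-phrase match, so "people" alone does not count as "our people".
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[year][term] += len(re.findall(pattern, lowered))
    return {
        year: {term: 1000 * counts[year][term] / words[year] for term in terms}
        for year in sorted(words)
    }

rates = term_rates(ARTICLES, TERMS)
for year, by_term in rates.items():
    print(year, {term: round(rate, 1) for term, rate in by_term.items()})
```

Dividing by the number of words printed each year matters: the press grew enormously over the period, so raw counts would rise even if the language of nationhood did not become any more common.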
Newspapers also help to illuminate a more uncomfortable topic: the blurred line between public duty and private gain. For his PhD, business historian Munashe Chideya studied private joint-stock companies in the Cape Colony, finding that at least 22 politicians and seven civil servants invested in eight private property companies between 1897 and 1902.5 Together they provided almost a quarter of these firms’ capital.
These companies did not simply buy land and wait for values to rise. They needed legislation to drive their growth: railways, harbour rights, water concessions. The bills were introduced and defended by the very men who stood to benefit directly from rising property prices.
For new work I am doing with Munashe and Ed Kerby, newspapers are indispensable. Prospectuses and glossy advertorials in papers like the Cape Times promised fortunes in new suburbs like Milnerton, or in holiday resorts at Saldanha Bay. Alongside these ran reports on parliamentary debates, commentary and editorials. Critics accused colleagues of self-enrichment. Defenders claimed their actions served the public interest.
Without these press reports, the pattern is invisible. Thanks to them, we can follow the intertwined interests of public authority and private speculation. It looks uncomfortably like an early version of what we today call state capture.
The point is that historical newspapers are not just sources of colourful anecdotes. They are society’s mirror: a place to decide who ‘we’ are and what happens next. Once these texts are digitised at scale, they allow economists to reconstruct prices and market shocks, sociologists to trace identities and attitudes, and political scientists to connect concepts such as language, institutions and behaviour. With newspapers, we make sense not only of what people did, but of what they believed they were doing.
Which brings us back to contemporary South Africa. On a warm late-November evening in Stellenbosch, a new digital museum for Afrikaans cultural heritage was launched. Nuuseum – a blend of “nuus” (news) and “museum” – aims to gather Afrikaans newspapers, magazines, photos, films, radio and television material into a single searchable archive. Around one and a half million pages have already been digitised, with more added every day. The goal is not only to preserve the past, but to make it usable. Nuuseum will give any student the ability to instantly summon a century of Afrikaans public debate, whether they are in Upington or Utrecht.
At the launch, I argued that this could be just as important for South African social sciences as the big American newspaper datasets have been. With the right tools, Nuuseum could provide the raw material for hundreds of theses and dozens of books.
This dream will only be realised if the archive is genuinely broad and representative. Nuuseum can only include newspapers and their content where permission has been granted. If media companies keep their historical collections locked behind walls, we will skew the digital memory of Afrikaans towards papers that cooperate and risk erasing traditions, regions or communities from the searchable record.
For researchers, that would be a serious constraint. For a language community still arguing over who Afrikaans belongs to, the consequences are even more troubling. We can only ask the big questions if we can see the whole linguistic and media landscape.
This is not the moment to pull up the digital drawbridges around private collections. It is the moment to open the gates and allow future generations to read, to count, to think and to reason with the many voices that have shaped us. Old news only really gets a new jacket when all of us are allowed to wear it.
This is an edited and translated version of my monthly column, Agterstories, on Litnet. To support more writing like this, consider becoming a paid member. The image was created using Midjourney v7.
1. Fourie, J. and Norling, J., 2025. Household Spending during the 1918 Influenza Pandemic. Journal of Interdisciplinary History, 55(3), pp.369-413.
2. Dell, M., Carlson, J., Bryan, T., Silcock, E., Arora, A., Shen, Z., D’Amico-Wong, L., Le, Q., Querubin, P. and Heldring, L., 2023. American Stories: A large-scale structured text dataset of historical US newspapers. Advances in Neural Information Processing Systems, 36, pp.80744-80772.
3. See https://www.caterinachiopris.com/research-publications
4. Schoots, J., 2024. The multivocality of the nation: political imagination and transformation in the emergence of African Nationalism. Theory and Society, 53(6), pp.1357-1387.
5. Chideya, M.T., 2024. Private joint-stock companies and government relations in the Cape Colony, 1892-1902 (Doctoral dissertation, Stellenbosch University).







Another exceptionally useful article by my favourite economist. 🤗 Johan, it would of course be hugely interesting and informative if a student could compare the brilliant database you referred to with the podcasts in Afrikaans, which in my view appear to have proclaimed, and increasingly still proclaim, a different narrative from the one the earlier mainstream print media did. Could the content of the podcasts also be captured, unpacked and repackaged like that of the written media, so that it can be analysed and interpreted in a similar way? I would think that AI could do it, but I am "asking a friend" (with considerably more knowledge), as they recommend on the quiz shows... 😊