An interdisciplinary team of CMU researchers uses machine learning and statistical tools to determine origins of a 375-year-old document
Though John Milton’s "Areopagitica" - one of the most significant documents in the history of the freedom of the press - was first published 375 years ago, the printer of the pamphlet has - until now - remained unknown.
An interdisciplinary team of literary scholars, statisticians and computer scientists from Carnegie Mellon University has attributed the Nov. 23, 1644, printing of "Areopagitica" to the London printers Matthew Simmons and Thomas Paine, with the possible involvement of Gregory Dexter. The results of the research will be available in the Spring 2020 issue of Milton Studies.
"It’s tremendous to celebrate Areopagitica’s 375th birthday by learning something new about such a foundational document," said Christopher Warren, associate professor of English in CMU’s Dietrich College of Humanities and Social Sciences and senior author on the paper.
For fear of persecution and punishment, printers in Britain from 1473 to 1800 declined to attach their names to about a quarter of known books and pamphlets, leaving the origin of many historical texts unidentified.
"In Milton’s time, printers could be jailed and even executed for printing controversial material. While Milton’s printers joined him in rejecting the notion that ideas had to be licensed before they could be printed, they also needed the protection of anonymity," Warren said. "The reason we haven’t known who printed ’Areopagitica’ is directly tied to the reason Milton had to write it. Those of us who benefit from press freedoms and freedom of speech sometimes forget the risks early printers took in producing controversial materials."
Many of today’s principles relating to freedom of speech and expression are based on "Areopagitica."
"’Areopagitica’ proposes that we need freedom of the press because truth isn’t something that a priest or politician can know in advance and say ’That’s not true, you can’t print it,’" Warren said. "Instead, the truth is something that develops communally through a range of voices shared and expressed, and, to me, that’s a foundational idea at the core of the modern commitment against censorship."
The team used computer vision, historical optical character recognition (OCR) and old-fashioned historical sleuthing to identify the printers. Like a fingerprint, damaged pieces of metal type create unique stamps. Since typesets belonged to specific printers, impressions of damaged type can help identify a book’s printers. By using statistical approaches to group many similar letters together, the team was able to compare type impressions more efficiently than prior methods have allowed.
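The grouping-and-comparison idea described above can be illustrated with a minimal sketch. This is not the team's software: it assumes each letter impression has already been cut out as a tiny equal-sized black-and-white bitmap and labeled by OCR, and the function names, the bitmap format and the threshold are all hypothetical choices made for illustration. The sketch groups impressions by letter, averages each group into a "typical" shape, and flags impressions that deviate strongly from that shape as candidate damaged type.

```python
from collections import defaultdict


def group_by_label(glyphs):
    """Group (label, book, bitmap) records by their OCR letter label."""
    groups = defaultdict(list)
    for label, book, bitmap in glyphs:
        groups[label].append((book, bitmap))
    return groups


def mean_template(bitmaps):
    """Pixel-wise average of equally sized bitmaps (lists of rows of 0/1)."""
    n = len(bitmaps)
    rows, cols = len(bitmaps[0]), len(bitmaps[0][0])
    return [[sum(b[r][c] for b in bitmaps) / n for c in range(cols)]
            for r in range(rows)]


def distance(a, b):
    """Sum of absolute pixel differences between two bitmaps."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def distinctive_glyphs(glyphs, threshold):
    """Return records whose shape deviates from their letter's average
    shape by more than `threshold` (a hypothetical, hand-picked cutoff)."""
    flagged = []
    for label, members in group_by_label(glyphs).items():
        template = mean_template([bitmap for _, bitmap in members])
        for book, bitmap in members:
            if distance(bitmap, template) > threshold:
                flagged.append((label, book, bitmap))
    return flagged


# Toy data: four clean impressions of one letter and two that share the
# same missing corner, as a damaged piece of type might leave.
CLEAN = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
DAMAGED = [[1, 1, 1], [1, 0, 1], [0, 0, 1]]
glyphs = [("o", "book A", CLEAN), ("o", "book A", CLEAN),
          ("o", "book B", CLEAN), ("o", "book B", CLEAN),
          ("o", "book X", DAMAGED), ("o", "book Y", DAMAGED)]

flagged = distinctive_glyphs(glyphs, threshold=1.0)
for label, book, _ in flagged:
    print(label, book)
```

In this toy run the two flagged impressions have identical damage, which is the kind of evidence that would suggest the same piece of type was used in both books. Real page images would need far more robust alignment and comparison than this pixel count.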
"There’s a nice parallel here. The printing press was an influential technology of that time. We’re now using machine learning - an influential technology of our time - to reverse that printing process, disassembling books into their constituent characters and matching them to identify and track individual metal stamps across books, printers and time," said Max G’Sell, assistant professor in the Department of Statistics & Data Science and co-author on the paper.
The team examined distinctive and damaged type pieces from 100 pamphlets from the 1640s. Through this, they found that Simmons and Paine were responsible for printing not only "Areopagitica," but also several other foundational tracts on liberty of conscience, including Roger Williams’ "The Bloudy Tenent of Persecution," William Walwyn’s "The Compassionate Samaritane," and Henry Robinson’s "Liberty of Conscience" and "John the Baptist."
Christopher Warren, associate professor of English, and Max G’Sell, assistant professor in the Department of Statistics & Data Science, are part of the "Print and Probability: A Statistical Approach to Analysis of Clandestine Publication" team, which has received support from an A.W. Mellon digital humanities seed grant, the National Science Foundation and the Pittsburgh Supercomputing Center.
"This project shows just how much computer scientists, statisticians and humanists can learn by working together," said Shruti Rijhwani, a Ph.D. student in CMU’s Language Technologies Institute, who developed the project’s software along with G’Sell and Taylor Berg-Kirkpatrick, a former faculty member in CMU’s School of Computer Science and current assistant professor in the Department of Computer Science and Engineering at the University of California, San Diego.
The paper, "Damaged Type and ’Areopagitica’s’ Clandestine Printers," is an early finding in a larger project to recontextualize the printing of thousands of anonymously printed books in early modern Britain. By leveraging the processing power of computers and tools from statistics and machine learning, the researchers say they are able to approach longstanding questions at a new scale. Their broader research project, "Print and Probability: A Statistical Approach to Analysis of Clandestine Publication," has received past and current support from an A.W. Mellon digital humanities seed grant, the National Science Foundation and the Pittsburgh Supercomputing Center.
The "Print and Probability" team includes Warren, G’Sell, Berg-Kirkpatrick and Rijhwani as well as CMU’s Pierce Williams, Kartik Goyal, Matt Lincoln, Dan Evans, Nikolai Vogler, Ciaran Evans and Kishore Venkatswammy.
"Valuable previous scholarship guided our investigation, and then it turned out that using OCR to identify and group letters for comparison put us in an excellent position to amass and assess evidence quite rapidly," said Williams, a Ph.D. student in the Department of English’s Literary and Cultural Studies Program and co-author on the paper.
The team and their collaborators took photos to support the paper’s conclusions with the help of Harvard University’s Houghton Library, Columbia University’s Union Theological Seminary Burke Library, the Huntington Library, the Folger Shakespeare Library, the Harry Ransom Center at the University of Texas at Austin, the University of Michigan Libraries, the Library of Congress, and the New York Public Library. They were aided in this effort by Bernadette Cay, William Clayton, David Como, Will Fithian, Kyle Grazier, Lucas Janson, Miles Lopes, Anjali Mazumder, John Overholt, Aaron Pratt, Kristina Straub, and Sara Walters.