A new article appearing in the peer-reviewed Digital Scholarship in the Humanities: An Application of a Profile-Based Method for Authorship Verification: Investigating the Authenticity of Pliny the Younger’s Letter to Trajan Concerning the Christians. Author: Enrico Tuccinardi.
Book 10 of Pliny the Younger’s letters consists of his correspondence with the emperor Trajan while Pliny was governor of Bithynia/Pontus. Letters 96 and 97 are the famous exchange over the question of what to do about the Christians. Apart from the controversial references in Josephus, these letters are the earliest evidence for Christianity found outside Christian sources.
The article and its bibliography have introduced me to recent developments in various techniques of authorship attribution. I’ll explain some of the details later, but to begin with I’ll set out an overview of how Tuccinardi compared the style of Pliny’s famous letter with that of the rest of his letters to Trajan.
First, though, here is the abstract:
Pliny the Younger’s letter to Trajan regarding the Christians is a crucial subject for the studies on early Christianity. A serious quarrel among scholars concerning its genuineness arose between the end of the 19th century and the beginning of the 20th; per contra, Plinian authorship has not been seriously questioned in the last few decades. After analysing various kinds of internal and external evidence in favour of and against the authenticity of the letter, a modern stylometric method is applied in order to examine whether internal linguistic evidence allows one to definitely settle the debate. The findings of this analysis tend to contradict received opinion among modern scholars, affirming the authenticity of Pliny’s letter, and suggest instead the presence of large amounts of interpolation inside the text of the letter, since its stylistic behaviour appears highly different from that of the rest of Book X.
I’ve read some of those early debates and the article by Sherwin-White that seems to have settled the argument in favour of the authenticity of Pliny’s letter 10.96, and although a few doubts have never completely vanished, I have decided it wisest to accept the letter as genuine, at least for the sake of argument, pending any new evidence that might surface.
In brief, Tuccinardi’s stylometric analysis shows that Pliny’s letter about the Christians is as stylistically different from the remainder of Pliny’s letters to Trajan as are the letters of Cicero and Seneca.
The letters of Pliny in Book 10 were isolated from Trajan’s and other correspondence. Pliny’s letter 96 (called the PT — the Plinian Testimonium) was separated from the others. The text of the remainder of Pliny’s letters was divided up into fifteen sections about the same length as the PT. That is, about 3000 characters each.
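The splitting step can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_chunks`, the rule of cutting at the next whitespace, and the merging of a short final piece are my own assumptions, not details taken from the article.

```python
def split_into_chunks(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into consecutive pieces of roughly chunk_size characters.

    Assumption (not from the article): each cut is pushed forward to the
    next whitespace so that no word is split in two.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Move the cut forward until it falls on whitespace.
        while end < len(text) and not text[end].isspace():
            end += 1
        chunks.append(text[start:end].strip())
        start = end
    # Assumption: a final piece shorter than half the target is merged
    # into the previous chunk rather than kept as a tiny sample.
    if len(chunks) > 1 and len(chunks[-1]) < chunk_size // 2:
        last = chunks.pop()
        chunks[-1] = chunks[-1] + " " + last
    return [c for c in chunks if c]
```

Called on the combined text of Pliny’s Book 10 letters (minus the PT) with the default `chunk_size`, this would give sections of about 3000 characters each; the exact number of sections depends on the text supplied.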
Each of these fifteen fragments was then compared, in turn, with the remainder of the Pliny set. Finally, the PT was compared in the same way to see whether it was as close to, or as different from, the main body as each of the other fifteen sections. If you prefer visuals to words, here is Tuccinardi’s diagram to clarify the process. Lk is the text of known authorship, and Lu is the disputed (or proxy-disputed) text of unknown authorship:
The method is based on the idea that every author has a stylistic “fingerprint” that is subconscious, so that readers do not normally recognize it and authors cannot hide or copy it. Of course there is another level at which authors do consciously work on their style and sometimes even imitate others, but what the stylometric analysis identifies here is a unique authorial profile that works at a deeper, unselfconscious level.
According to this family of measures, a text is viewed as a mere sequence of characters. That way, various character-level measures can be defined, including alphabetic characters count, digit characters count, uppercase and lowercase characters count, letter frequencies, punctuation marks count, and so on. (de Vel et al., 2001; Zheng et al., 2006). This type of information is easily available for any natural language and corpus, and it has been proven to be quite useful to quantify the writing style (Grieve, 2007).
A more elaborate, although still computationally simplistic, approach is to extract frequencies of n-grams on the character level. For instance, the character 4-grams of the beginning of this paragraph would be: |A_mo|, |_mor|, |more|, |ore_|, |re_e|, and so on. This approach is able to capture nuances of style, including lexical information (e.g., |_in_|, |text|), hints of contextual information (e.g., |in_t|), use of punctuation and capitalization, and so on. . .
Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society of Information Science and Technology, 60(3): 538–56.
The above example uses a four-character unit, or 4-gram. The size of the character units can vary from one to ten or fifteen (perhaps for those long German compound words).
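For anyone who wants to see the extraction step concretely, here is a minimal Python sketch. The function name `char_ngrams` is my own, and spaces are replaced with underscores only to match the notation in the quotation.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return every overlapping run of n characters in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Spaces are shown as underscores, as in the quoted passage.
sample = "A more elaborate".replace(" ", "_")
print(char_ngrams(sample, 4)[:5])
# ['A_mo', '_mor', 'more', 'ore_', 're_e']
```

Changing the second argument gives units of any other size, so `char_ngrams(sample, 5)` produces 5-grams in the same way.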
All of the n-grams are tabulated and listed from the most frequently occurring to the least. The most frequently occurring ones are the author’s fingerprint or profile. How many of the most frequently found units one chooses to use to define the author profile can vary.
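A profile of this kind can be sketched as follows. The name `build_profile` and the choice to store relative frequencies (each count divided by the total number of n-grams in the text) are my assumptions for illustration; the article may normalize differently.

```python
from collections import Counter


def build_profile(text: str, n: int, size: int) -> dict[str, float]:
    """Map the `size` most frequent character n-grams to relative frequencies."""
    # Count every overlapping n-character unit in the text.
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    # most_common() lists n-grams from most to least frequent;
    # keeping only the first `size` of them gives the profile.
    return {gram: count / total for gram, count in counts.most_common(size)}
```

For example, `build_profile("abababab", 2, 1)` keeps only the single most frequent 2-gram, `"ab"`, which accounts for four of the seven 2-grams in that string.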
So how does one decide what size of n-gram and what size of profile to use?
Tuccinardi compared similarly sized letters by Cicero and Seneca with Pliny’s Book 10 letters (minus the PT) and found that these were most clearly differentiated from Pliny’s style by the use of “five-gram” units and a profile made up of the 500 most frequent five-character units.
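To make the comparison step concrete, here is a rough Python sketch of scoring each section against the rest of the corpus with 5-grams and a profile size of 500. My summary does not say which dissimilarity function the article uses, so the sketch assumes the relative-difference measure from Keselj et al. (2003), listed in the bibliography below. All function names are hypothetical.

```python
from collections import Counter


def build_profile(text: str, n: int = 5, size: int = 500) -> dict[str, float]:
    """Relative frequencies of the `size` most frequent character n-grams."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.most_common(size)}


def dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Assumed measure: squared relative frequency differences, summed over
    every n-gram that appears in either profile. 0.0 means identical."""
    total = 0.0
    for gram in p1.keys() | p2.keys():
        f1 = p1.get(gram, 0.0)
        f2 = p2.get(gram, 0.0)
        # f1 + f2 is never zero: the n-gram is in at least one profile.
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


def leave_one_out_scores(chunks: list[str], n: int = 5, size: int = 500) -> list[float]:
    """Score each section against a profile built from all the other sections."""
    scores = []
    for i, held_out in enumerate(chunks):
        rest = " ".join(chunks[:i] + chunks[i + 1:])
        scores.append(dissimilarity(build_profile(held_out, n, size),
                                    build_profile(rest, n, size)))
    return scores


def score_disputed(disputed: str, chunks: list[str], n: int = 5, size: int = 500) -> float:
    """Score the disputed text against a profile built from all known sections."""
    known = " ".join(chunks)
    return dissimilarity(build_profile(disputed, n, size),
                         build_profile(known, n, size))
```

The scores from `leave_one_out_scores` show how far a genuine section typically sits from the rest of the corpus. The value from `score_disputed` can then be set beside them to see whether the disputed text falls inside or outside that range.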
He then applied the same measures in his comparison of the PT with the rest of Pliny’s corpus. Here are the results:
For readers using devices that cut off the right side of the above diagram here is the key you are missing:
The triangular shapes represent each of the fifteen subsections of Pliny’s letters in Book 10. Clearly the PT is as much an outlier as the letters of Cicero, and it differs from Pliny’s letters in the same direction as the letters of Seneca do.
If this sort of testing is new to you, you are probably wondering about its validity.
The most convincing proof that a computer-assisted stylometry really works is provided by several controlled attribution tests (Burrows 2002; Hoover 2004a, 2004b; Juola 2006; Juola and Baayen 2005; Jockers et al. 2008, 2010; Eder 2010; Rybicki and Eder 2011; Smith and Aldridge 2011, etc.). The general conception of such a controlled benchmark is to collect a corpus of texts written by known authors only, and then to perform a series of blind tests for authorship. Leaving the technical details aside, the way of testing is simple: the more samples are “guessed” correctly (in terms of being linked to their actual authors), the more accurate a given methodology.
Eder, M. (2011). Style-markers in authorship attribution: a cross-language study of the authorial fingerprint. Studies in Polish Linguistics, 6: 99–114.
Character n-grams are able to catch nuances of style including lexical, syntactical, and morphological information. (Tuccinardi, 2016)
Some references in Tuccinardi’s bibliography I found helpful:
- Brocardo, M. L., Traore, I., Saad, S., and Woungang, I. (2013). Authorship Verification for Short Messages using Stylometry. Proceedings of the IEEE International Conference on Computer, Information and Telecommunication Systems, Piraeus-Athens, Greece, May 2013.
- Brocardo, M. L., Traore, I., and Woungang, I. (2014). Authorship verification of e-mail and tweet messages applied for continuous authentication. Journal of Computer and System Sciences, 81: 1429–40.
- Chen, X., Hao, P., Chandramouli, R., and Subbalakshmi, K. P. (2011). Authorship Similarity Detection from E-mail Messages. Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition, New York, NY, August–September 2011.
- Eder, M. (2011). Style-markers in authorship attribution: a cross-language study of the authorial fingerprint. Studies in Polish Linguistics, 6: 99–114.
- Frantzeskou, G., Stamatatos, E., Gritzalis, S., and Katsikas, S. (2006). Source Code Author Identification Based on N-gram Author Profiles. Proceedings of the 3rd IFIP Conference on Artificial Intelligence Applications and Innovations (AIAI), Athens, Greece, June 2006.
- Keselj, V., Peng, F., Cercone, N., and Thomas, C. (2003). N-gram based Author Profiles for Authorship Attribution. Proceedings of the Pacific Association for Computational Linguistics, Halifax (Canada), August 2003.
- Koppel, M. and Winter, Y. (2011). Determining if two documents are by the same author. Journal of the American Society for Information Science and Technology, 65(1): 178–87.
- Potha, N. and Stamatatos, E. (2014). A Profile-Based Method for Authorship Verification. Proceedings of the 8th Conference on Artificial Intelligence: Methods and Applications, Ioannina, Greece, May 2014.
- Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society of Information Science and Technology, 60(3): 538–56.
- Stamatatos, E., Fakotakis, N., and Kokkinakis, G. (2000). Automatic text categorization in terms of genre and author. Computational Linguistics, 26(4): 471–95.
Pliny the Younger, Book 10, Letter #96.
Pliny to the Emperor Trajan:
It is my practice, my lord, to refer to you all matters concerning which I am in doubt. For who can better give guidance to my hesitation or inform my ignorance? I have never participated in trials of Christians. I therefore do not know what offenses it is the practice to punish or investigate, and to what extent. And I have been not a little hesitant as to whether there should be any distinction on account of age or no difference between the very young and the more mature; whether pardon is to be granted for repentance, or, if a man has once been a Christian, it does him no good to have ceased to be one; whether the name itself, even without offenses, or only the offenses associated with the name are to be punished.
Meanwhile, in the case of those who were denounced to me as Christians, I have observed the following procedure: I interrogated these as to whether they were Christians; those who confessed I interrogated a second and a third time, threatening them with punishment; those who persisted I ordered executed. For I had no doubt that, whatever the nature of their creed, stubbornness and inflexible obstinacy surely deserve to be punished. There were others possessed of the same folly; but because they were Roman citizens, I signed an order for them to be transferred to Rome.
Soon accusations spread, as usually happens, because of the proceedings going on, and several incidents occurred. An anonymous document was published containing the names of many persons. Those who denied that they were or had been Christians, when they invoked the gods in words dictated by me, offered prayer with incense and wine to your image, which I had ordered to be brought for this purpose together with statues of the gods, and moreover cursed Christ–none of which those who are really Christians, it is said, can be forced to do–these I thought should be discharged. Others named by the informer declared that they were Christians, but then denied it, asserting that they had been but had ceased to be, some three years before, others many years, some as much as twenty-five years. They all worshipped your image and the statues of the gods, and cursed Christ.
They asserted, however, that the sum and substance of their fault or error had been that they were accustomed to meet on a fixed day before dawn and sing responsively a hymn to Christ as to a god, and to bind themselves by oath, not to some crime, but not to commit fraud, theft, or adultery, not falsify their trust, nor to refuse to return a trust when called upon to do so. When this was over, it was their custom to depart and to assemble again to partake of food–but ordinary and innocent food. Even this, they affirmed, they had ceased to do after my edict by which, in accordance with your instructions, I had forbidden political associations. Accordingly, I judged it all the more necessary to find out what the truth was by torturing two female slaves who were called deaconesses. But I discovered nothing else but depraved, excessive superstition.
I therefore postponed the investigation and hastened to consult you. For the matter seemed to me to warrant consulting you, especially because of the number involved. For many persons of every age, every rank, and also of both sexes are and will be endangered. For the contagion of this superstition has spread not only to the cities but also to the villages and farms. But it seems possible to check and cure it. It is certainly quite clear that the temples, which had been almost deserted, have begun to be frequented, that the established religious rites, long neglected, are being resumed, and that from everywhere sacrificial animals are coming, for which until now very few purchasers could be found. Hence it is easy to imagine what a multitude of people can be reformed if an opportunity for repentance is afforded.