So goes the Yahoo News headline of an Associated Press article by Matti Friedman: http://news.yahoo.com/israeli-algorithm-sheds-light-bible-163128454.html
It’s a frustrating article. One looks in vain for details of who developed the software, under what funding program they did so, and any other associations of the developers and funders. Yet one does read that a certain Michael Segal of the Hebrew University’s Bible Department was NOT involved in the project! I suppose the “Israeli” label in a headline about the Bible does have the power to attract attention among many Bible believers.
We read of the program:
The program, part of a sub-field of artificial intelligence studies known as authorship attribution, has a range of potential applications — from helping law enforcement to developing new computer programs for writers. But the Bible provided a tempting test case for the algorithm’s creators.
How could the Bible possibly provide a “test case” for law enforcement applications? Is a scholarly construct of criteria for priestly and yahwist identifiers in OT texts to be used as a verification tool to determine real-life criminal guilt?
The (software) Code’s Secret
Over the past decade, computer programs have increasingly been assisting Bible scholars in searching and comparing texts, but the novelty of the new software seems to be in its ability to take criteria developed by scholars and apply them through a technological tool more powerful in many respects than the human mind, Segal said.
Finally the article is beginning to make some sense. The software is nothing more, as one would expect, than a tool for “faster than human” application of scholarly constructed criteria (e.g. how often a certain word is used for “God”) to large quantities of text.
One wonders if a more informative article headline would have read something like:
Forensic tool has potential to shed no new light on biblical studies, but sure can save a lot of time in doing them
Then maybe one could add to the story that there may be a few curious anomalies between the human and machine conclusions, and that anyone who concludes the humans will accordingly see the machine results as enlightenment might need their head scanned.
And thanks to the blog reader who alerted me to this news story.
Neil Godfrey
I guess the law enforcement application would be to test whether your comments on Twitter, where you sent out pictures of your weiner, really sound like the way you write. Did they use the same vocabulary you normally use? The same style? If not, then Anthony Weiner must have hacked your Twitter account.
I can see this approach as being useful for pointing out where criteria have been misapplied, or where the criteria might need refinement. Your attitude is overly dismissive, I think.
Nothing wrong with writing programs to do the tedious work in super-fast time. What I can see happening with this sort of thing as suggested by the article is that certain perspectives will be able to claim “computer analysis verification”, when it needs to be more widely understood that the scholarly criteria they rely upon ought to be challenged. I am sure scholars themselves know this. But I would prefer they are more careful with the public perceptions they have some power to influence.
I am a fan of your blog, generally speaking, but I do not understand your reaction to this item.
You say ‘One looks in vain for details of who developed the software, under what funding program they did so, and any other associations of the developers and funders.’
The software was developed by a team of four. Each of the four is identified in the article by name and affiliation. As for the funding program, I do not understand why you would expect this information or why it would be important.
A quick Google search finds the underlying paper, “Unsupervised Decomposition of a Document into Authorial Components,” in co-author Navot Akiva’s section of the Bar-Ilan University web site. Akiva’s publications page is here:
http://u.cs.biu.ac.il/~akivan/Research.htm
The direct link to the paper is here:
http://www.cs.biu.ac.il/~akivan/papers/Koppel-Akiva_ACL2011.pdf
The only substantive statements I see in the article that are not in the paper are the statement about Genesis 1, and the statement about the division of Isaiah.
Google searches (and Akiva’s publications page above) show that main authors Moshe Koppel and Navot Akiva have been involved in several other computerized textual analysis projects. For example, Koppel developed an algorithm that attempts to determine a writer’s gender. You can try out a version of the algorithm at “Gender Genie” here:
http://bookblog.net/gender/genie.php
You say ‘How could the Bible possibly provide a “test case” for law enforcement applications?’
The Bible seems to provide excellent test cases for authorship attribution, which is what the algorithm tries to do, and authorship attribution is very relevant in many legal cases. See, for example, the Wikipedia article on Donald Wayne Foster (mentioned in the article):
‘Donald Wayne Foster… is known for his work dealing with various issues of Shakespearean authorship through textual analysis. He has also applied these techniques in attempting to uncover mysterious authors of some high-profile contemporary texts. As several of these were in the context of criminal investigations, Foster has sometimes been labeled a “forensic linguist”.’
(I do not mean this as an endorsement of Foster or his techniques.)
What the algorithm attempts is this: Assume that a given text consists of segments (of undetermined number and length) by exactly 2 authors. Assume that each segment is written by a single author. (The algorithm uses no assumptions or knowledge about the authors except what it can glean from the composite text itself; it simply assumes that different authors will tend to make different word choices, especially choices from sets of synonyms.) Find the segment boundaries, and assign each segment to 1 of 2 groups, such that all the segments in a group are by the same author.
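To make that description concrete, here is a rough toy sketch in Python. It is not the authors' code and not the method from the paper: it assumes the text has already been cut into fixed chunks (so it skips the boundary-finding step), it uses a hand-supplied list of synonym sets, and it groups chunks with a simple two-centre clustering on cosine similarity. The names `featurize`, `split_two_authors`, the sample chunks, and the sample synonym sets are all made up for illustration.

```python
from itertools import combinations
from math import sqrt


def featurize(chunk, synonym_sets):
    """One binary feature per synonym: 1.0 if the chunk uses that word."""
    words = set(chunk.lower().split())
    return [1.0 if w in words else 0.0 for group in synonym_sets for w in group]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def split_two_authors(chunks, synonym_sets, rounds=10):
    """Assign each chunk to group 0 or 1 by its synonym choices (toy version)."""
    if len(chunks) < 2:
        raise ValueError("need at least two chunks to split")
    vecs = [featurize(c, synonym_sets) for c in chunks]
    # Seed the two groups with the least similar pair of chunks.
    i, j = min(combinations(range(len(vecs)), 2),
               key=lambda p: cosine(vecs[p[0]], vecs[p[1]]))
    centers = [vecs[i], vecs[j]]
    labels = None
    for _ in range(rounds):
        # Each chunk joins whichever centre it resembles more.
        new_labels = [0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
                      for v in vecs]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(vecs, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    return labels


# Hypothetical data: one "author" prefers big/begin/quick, the other large/start/fast.
synonym_sets = [("big", "large"), ("begin", "start"), ("quick", "fast")]
chunks = [
    "the big dog will begin a quick run",
    "a large cat will start a fast walk",
    "big storms begin with quick winds",
    "large rivers start as fast streams",
]
labels = split_two_authors(chunks, synonym_sets)
print(labels)
```

On this invented data the first and third chunks land in one group and the second and fourth in the other, purely because of which synonym each chunk prefers. Real texts are far noisier, and finding the segment boundaries is a further problem that this toy version does not attempt.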
The algorithm does not include ‘a scholarly construct of criteria for priestly and yahwist identifiers.’ This is not implied in the article, it is definitely contradicted by the paper, and it would be quite useless for the analysis of the works of the prophets as described.
I think you were misled by this part of the article: ‘When the new software was run on the Pentateuch, it found the same division, separating the “priestly” and “non-priestly.” It matched up with the traditional academic division at a rate of 90 percent… [However, the] first chapter of Genesis… is usually thought to have been written by the “priestly” author, but the software indicated it was not.’ I read this as saying the algorithm (without any pre-determined identifiers) divided the Pentateuch into 2 sets, which turned out to mostly correspond with the priestly and non-priestly division scholars have favored; one exception being Genesis 1, which was unexpectedly grouped with the writings considered non-priestly.
The main test case, if accurately reported, seems most impressive. They munged (their word) together random-length segments from the books of Ezekiel and Jeremiah. They then applied their algorithm, which separated the differently-authored segments with near-perfect accuracy.
As they say in the paper, ‘Our main result is that given artificial books constructed by randomly “munging” together actual biblical books, we are able to separate out authorial components with extremely high accuracy, even when the components are thematically similar. Moreover, our automated methods recapitulate many of the results of extensive manual research in authorial analysis of biblical literature.’
The assumption that different authors tend to make different word choices, especially choices from sets of synonyms, seems reasonable to me. Does it seem unreasonable to you, and if so why? Regardless of how reasonable or unreasonable it might seem, it would appear to be correct; otherwise, why do the tests work?
I do not understand why you consider this useless or worthless.
You are persuading me that I have been too hasty in my dismissal. Thanks for the input.
I’m female, and I tried “The Gender Genie” three times —
(Note: The genie works best on texts of more than 500 words.)
First try:
Words: 575
Female Score: 609
Male Score: 658
The Gender Genie thinks the author of this passage is: male!
Second try:
Words: 914
Female Score: 1204
Male Score: 1531
The Gender Genie thinks the author of this passage is: male!
Third try:
Words: 554
Female Score: 678
Male Score: 833
The Gender Genie thinks the author of this passage is: male!
Sorry, Genie. Wrong. Bummer.
I must be an anomaly, or the Genie isn’t totally reliable.