How to Detect Lying Words (…. 76% of the time)

by Neil Godfrey

Yesterday I found myself watching videos of the testimonies of Dr Christine Blasey Ford and Brett Kavanaugh. I went out of my way to seek those videos out after seeing totally opposite interpretations of each in my RSS feeds. One side said that Kavanaugh, for example, just oozed sincerity and honesty and had deservedly secured his appointment on the Supreme Court; the other side claimed that his presentation was viscerally insincere and false and that he had lost any chance of being honestly appointed a Supreme Court judge. Wow, now that’s polarization!

James Pennebaker

After comparing the two for myself I remembered an interesting book that addresses the language of lying and honesty (among other functions of language):

Pennebaker, James W. 2013. The Secret Life of Pronouns: What Our Words Say About Us. New York: Bloomsbury Press.

In one study Pennebaker describes, some participants related accounts of real traumas in their past and others related imaginary ones. The language of the two groups was compared. Accounts of real traumas were associated with:

  • More words, bigger words, more numbers, more details. If you have experienced a real trauma in your life it is easy for you to describe what happened. You can describe the details of the experience without having to do much thinking. Some of the details include information about time, space, and movement.
  • Fewer emotion and cognitive words. If you have lived through a trauma, your emotional state is obvious. For example, if your father died, most people don’t then say “and I was really sad.” It is implicit in the experience. However, people who haven’t experienced the death think to themselves, “Well, if my father died I would feel very sad so I should mention that in my essay.” The person who has had a trauma in the past already has a reasonable story to explain it. The person who is inventing the story must do more thinking—and use more cognitive words to explain it.
  • Fewer verbs. There are a number of different types of verbs that can serve different functions in language. When a person uses more verbs it generally tells us that they are referring to more active and dynamic events. For a person who has had a trauma in the past, much of it is over. If you are writing about an imaginary trauma, you are living it as you tell about it. In addition, imaginary traumas cause people to ask themselves, “What would have happened? How would I have felt?” Discrepancy verbs such as would, should, could, and ought were used at particularly high rates in the imaginary traumas.
  • More self-references: I-words. Recall that I-words signal that people are paying attention to themselves—their feelings, their pain, themselves as social objects. By the same token, the use of first-person singular pronouns implies a sense of ownership. Not surprisingly, people writing about their own traumatic experiences were more acutely aware of their feelings and, at the same time, embraced their traumas as their own.

(Pennebaker, p. 143)

Pennebaker does not claim that those indicators are foolproof lie detectors, but the follow-up analysis found that they accurately classified 74% of the narratives.
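To make the idea of counting these markers concrete, here is a rough Python sketch. It is not Pennebaker's software or his dictionaries: the word lists, the function names and the choice of "per 100 words" rates are all my own assumptions for illustration, and the tiny vocabularies would need to be far larger for any real use.

```python
import re

# Toy word lists, assumed for illustration only (not Pennebaker's dictionaries).
I_WORDS = {"i", "me", "my", "mine", "myself"}
DISCREPANCY_VERBS = {"would", "should", "could", "ought"}
EMOTION_WORDS = {"sad", "happy", "angry", "afraid", "upset", "hurt"}

WORD_RE = re.compile(r"[a-z]+(?:['’][a-z]+)?")   # words, contractions kept whole
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")       # 1988, 100,000, 3.5 ...


def tokenize(text):
    """Lowercase the text and return its words."""
    return WORD_RE.findall(text.lower())


def stem_of(word):
    """Reduce a contraction to its first part: "wouldn’t" -> "would", "i’m" -> "i"."""
    if word.endswith(("n't", "n’t")):
        return word[:-3]
    return re.split(r"['’]", word)[0]


def marker_rates(text):
    """Count a few of the markers listed above for one piece of text."""
    words = tokenize(text)
    total = len(words)
    stems = [stem_of(w) for w in words]

    def per_hundred(vocabulary):
        # Rate of vocabulary hits per 100 words; 0.0 for empty text.
        if total == 0:
            return 0.0
        return 100.0 * sum(s in vocabulary for s in stems) / total

    return {
        "word_count": total,
        # Characters per word, apostrophes included.
        "avg_word_length": sum(len(w) for w in words) / total if total else 0.0,
        "number_count": len(NUMBER_RE.findall(text)),
        "i_words_per_100": per_hundred(I_WORDS),
        "discrepancy_per_100": per_hundred(DISCREPANCY_VERBS),
        "emotion_per_100": per_hundred(EMOTION_WORDS),
    }


if __name__ == "__main__":
    for sample in ("No, I didn’t take your dollar.",
                   "I wouldn’t have taken it."):
        print(sample, marker_rates(sample))
```

Counts like these are only raw features. Turning them into a "real or imaginary" verdict would need a classifier trained on labelled texts, which this sketch does not attempt.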

Other tests included both a mock-crime setup, in which participants were required to protest their innocence under cross-examination, and a far more laborious examination of court transcripts of witness testimonies. A strong finding in the latter was the preponderance of I-words among those who were exonerated, compared with substantially higher use of third-person pronouns among those who were not. The third-person pronouns were indicators of attempts to shift the blame away from oneself and on to others. The honest accounts again used “bigger words, described events in greater detail, and evidenced more complex thinking.” Computer analysis of the language again correctly classified 76% of the cases.

Recall the lies used to justify the invasion of Iraq. Pennebaker zeroes in on one particular response by Vice President Cheney to a reporter’s (Wolf Blitzer) question. The highlighted words were identified by the Center for Public Integrity as truthful.

What we said, Wolf, if you go back and look at the record is, the issue’s not inspectors. The issue is that he has chemical weapons and he’s used them. The issue is that he’s developing and has biological weapons. The issue is that he’s pursuing nuclear weapons. It’s the weapons of mass destruction and what he’s already done with them. There’s a devastating story in this week’s New Yorker magazine on his use of chemical weapons against the Kurds of northern Iraq back in 1988; may have hit as many as 200 separate towns and villages. Killed upwards of 100,000 people, according to the article, if it’s to be believed.

This is a man of great evil, as the president said. And he is actively pursuing nuclear weapons at this time, and we think that’s cause for concern for us and for everybody in the region. And I found during the course of my travels that it is indeed a problem of great concern for our friends out there as well too.

Computer text analyses comparing the truthful with the deceptive statements resulted in findings comparable to the others reported. Truthful statements were associated with higher rates of I-words. In addition, they were more nuanced, focused on more detail, and tended to be associated with fewer emotions. The above quotation is a good example of these differences. In the truthful section, Cheney is more detailed in his information, uses more complex sentences, and uses I-words.

(Pennebaker, p. 159)

Other Common Deception Markers

Passive constructions: “Mistakes were made.” In a delightful book on misinformation, Mistakes Were Made (But Not By Me), Carol Tavris and Elliot Aronson examine how people frequently avoid responsibility through ingenious linguistic maneuvers. For example, historians are in general agreement that Secretary of State Henry Kissinger frequently deceived the American people about the direction and scope of the Vietnam War during the 1970s. Years later, in an interview, Tavris and Aronson quote Kissinger as saying, “Mistakes were quite possibly made by the administrations in which I served.” Note his wording. Obviously, Kissinger didn’t make any mistakes. Rather, someone probably did.

Avoiding answering a question. In the mock-crime experiment where students were asked to “steal” a dollar, we asked each person point-blank: “Did you steal the dollar that was in the book?” People who actually did take the money said things such as:

I don’t believe in stealing. I have a problem with it. I did it once a long time ago; I was … younger. I really didn’t like the feeling of knowing they’re going to catch me. I just, you know, especially you said for a dollar? I wouldn’t have taken it.

Why would I? I would never even think to look in the book to look for a dollar. I was just writing in my journal for my freshman seminar.

It really offends me that you would accuse me of something like that. I would never do something like that.

The most common response of people who were telling the truth was “No, I didn’t take your dollar.” Unlike the liars, the truth-tellers answered the question directly without any embellishment. As these examples attest, when someone doesn’t directly answer your question, there is a good chance they are hiding something no matter how earnest they may sound.

Let me be clear about that: performatives. Linguists and philosophers have long been intrigued by a language device called a performative. Performatives are statements about statements. In the statement “I promise you that I did not steal the money,” the phrase “I promise you” is a performative. It is simply claiming “I say to you” or “I am uttering the following words to you.” What is interesting about performative statements is that they cannot be assessed on their truthfulness. In the sentence starting with “I promise you,” the claim “I did not steal the money” is not directly asserted. The truth of the phrase is that the speaker is merely saying that he or she promises that they didn’t steal the money. It’s a fine distinction but one used surprisingly frequently.

Toward the end of his term, President Bill Clinton was being hounded by the press concerning rumors of sexual misconduct with a White House aide, Monica Lewinsky. In a January 26, 1998, press conference, Clinton announced:

I’m going to say this again: I did not have sexual relations with that woman, Miss Lewinsky.

A naïve human being would think that the president did not have sex with Lewinsky. Actually, the statement he said is true: “I’m going to say this again …” In fact, it is technically correct. He was saying it again. OK, so he later admitted that he had had sexual relations with “that woman” but in the press conference, he was not officially lying.

One of the great baseball pitchers of all time, Roger Clemens, was accused by former teammates of taking performance-enhancing drugs during his baseball career. In a press conference several months before later admitting that he had, in fact, taken drugs, Clemens said:

I want to state clearly and without qualification: I did not take steroids, human growth hormone or any other banned substances at any time in my baseball career or, in fact, my entire life …

There again, you can see that Mr. Clemens was technically honest. He did, in fact, state clearly and without qualification. What he stated was a lie but it was truthfully a statement.

(Pennebaker, pp. 166f)
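A performative preface of the kind Pennebaker describes is easy enough to flag mechanically. The following Python sketch is only an illustration: the list of phrases is my own guess drawn from the examples quoted above, the name `find_performatives` is hypothetical, and a match shows only that the speaker wrapped a claim in a statement about stating it, not that the claim is false.

```python
import re

# A handful of performative prefaces, assumed for illustration;
# the list is drawn from the examples quoted above and is far from complete.
PERFORMATIVE_PATTERN = re.compile(
    r"\b(?:"
    r"i promise(?: you)?"
    r"|i(?:['’]m| am) going to say"
    r"|i want to (?:state|say)"
    r"|let me be clear"
    r")\b",
    re.IGNORECASE,
)


def find_performatives(text):
    """Return the performative prefaces found in text, in order of appearance."""
    return [match.group(0) for match in PERFORMATIVE_PATTERN.finditer(text)]


if __name__ == "__main__":
    statements = [
        "I’m going to say this again: I did not have sexual relations with that woman, Miss Lewinsky.",
        "I want to state clearly and without qualification: I did not take steroids.",
        "No, I didn’t take your dollar.",
    ]
    for statement in statements:
        print(find_performatives(statement), "<-", statement)
```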

There you have it. You need never be fooled by a liar again, at least around 76% of the time if you are as clever as a computer.

Now, let’s have another look at those Ford and Kavanaugh testimonies . . . .


11 thoughts on “How to Detect Lying Words (…. 76% of the time)”

  1. Apparently Ford can’t remember when or where it happened. Is this credible? Perhaps she has a false memory.
    “A false memory is a psychological phenomenon where a person recalls something that did not happen. There is a growing body of evidence that false memories are created whenever memories are recalled…The syndrome takes effect because the person believes the influential memory to be true.” https://en.wikipedia.org/wiki/False_memory

    1. What a silly thing to say; all memories are fragmentary, and the older they are the more fragmentary they get, real or false! That’s not an indicator either way. And more memories are real than false (or what would be the point of them).

      That memories are recreated every time they’re recalled is how memory is implemented in the brain. That implementation makes more false memories than we like to think (because we’d like to think our memories are infallible), but it’s still designed to preserve an accurate-enough representation of the past more than the alternative. Otherwise, again, what would it have evolved *for*? (We can argue that there is some evolutionary advantage in having memories that are more self-serving than real, but to be useful you’d not just need the memories, you’d need to convince others the memories are real – which would be impossible if memories weren’t mostly real, so even there reality constrains the evolution of memory.)

    1. As Pennebaker wrote,

      As these examples attest, when someone doesn’t directly answer your question, there is a good chance they are hiding something no matter how earnest they may sound.

      I sometimes think politicians must do a special training course in how to avoid answering reporters’ questions.

    2. This is a complicated inference problem, regardless of your politics. I am unsure that low-level “scoring” heuristics are helpful for comparing witnesses’ honesty in this quagmire.

      The very first “pink band” (marking what Vox designates as an unresponsive answer) which I could visually resolve to click on it in Kavanaugh’s compressed transcript:

      Q: [Accusers have asked the FBI to investigate their claims] … why aren’t you also asking the FBI to investigate these claims?

      A: Senator, I’ll do whatever the committee wants. I wanted a hearing the day after the allegation came up… [non-responsive testimony about what happened since then].

      The witness never says “Here’s why I’m not asking the FBI.” “Pink” is therefore the correct designation for scoring purposes. Substantively, however, the witness is responsive. The committee determines procedure in this matter; the witness had already testified that he petitioned the committee to investigate the earliest claim in timely fashion (what “wanted a hearing” means).

      All of that occurs in a context where the questioner (Senator Feinstein) is on record that she’d known of the first claim throughout the proceedings. She was in a position to cooperate with the FBI, but the committee (of which she is a member) does in fact properly determine procedure, including the extent to which the FBI is involved.

      This is hardball politics, of which both Kavanaugh and Feinstein are seasoned professional players (but Dr Ford isn’t). Determining the truth of Dr Ford’s accusation simply is not the only purpose in asking the question, nor the only dimension of merit applicable to the answer.

      Pink and blue, however accurately applied, just won’t cut it, IMO.

      1. I disagree. First, “wanted a hearing” is completely different from “called for an investigation”. To illustrate, this was a hearing but I’d hardly call it an investigation, both because there was no effort to comprehensively interview all relevant witnesses, and because the questions themselves (when they were questions) were haphazard, not consistently truth-seeking, and with little of the follow-up that truth-seeking would require.

        Second, I disagree that the answer was “substantively responsive”. His answer *can be interpreted* as responsive, but not in a way that stakes a specific position that the questioner could follow up on. For example, talking about how the committee decides procedure ignores the fact that he can ask the committee to do things as a person involved (like some accusers apparently did according to the question, and like the committee itself has to officially ask the White House in the first place, as it apparently doesn’t have the power to directly ask the FBI itself). It sounds like that’s what the question actually asked (what with the reference to the accusers, who also don’t decide committee procedure), but did he misunderstand it, and answer a different question like “will you use your authority over the FBI to launch an investigation”? Did *I* misunderstand the question? Did he understand the question, and basically answer “no, I won’t call for an FBI investigation, but also won’t protest if the committee calls for one”? Did he purposefully choose to confuse the issue to avoid “being caught saying” “I won’t call for an FBI investigation”? Did he sincerely misunderstand the question, or state his answer in a confusing way?

        It’s hard to tell which of these is the case, and it’s hard to formulate a follow-up question that would clarify, given the “he misunderstood the question” and “I might have misunderstood his answer because it was confusing” hypotheses demand different kinds of follow-ups.

        I’ve been in such conversations, where I got replies that weren’t straight answers to my question, but that I could plausibly interpret as containing or implying said answers, but that answer seemed to contradict other things the person had said, and the reply could also be plausibly interpreted as a misunderstanding of my question because it did qualify as a straight answer to a different, related question, but also maybe the person involved is deliberately trying to avoid answering my question because they think I want to use the actual answer against them, but maybe they’re misunderstanding my intentions and if I explained where I’m going with this they’d be happier answering?

        It’s extremely hard to deal with such situations, and when it happens I try very hard to ask clear questions and demand straight answers, even when it seems redundant, because it’s the only way to tell all those scenarios apart, and if you don’t you easily fall into long acrimonious debates with both sides arguing past each other. And doing that I often find I was the one misreading things in the first place!

        That’s why straight answers are important, and why being “substantively responsive” is no substitute for it.

        And even if you disagree with their metric, there is the fact they applied it to both Ford and Kavanaugh – do you find they did so inconsistently? If not, are you arguing that the difference found between the two doesn’t mean what they claim, and if so how?

        1. @Caravelle: Thank you for your follow-on comment.

          While the verb _to want_ has many possible meanings, its meaning was unambiguous when spoken to a member of the committee that received the communication being referred to, which had already been described to the committee shortly before. Judge Kavanaugh’s opening (sworn) statement included the claim,

          “The day after the allegation appeared, I told this committee that I wanted a hearing as soon as possible to clear my name. I demanded a hearing for the very next day.”

          source: https://www.nytimes.com/2018/09/26/us/politics/read-brett-kavanaughs-complete-opening-statement.html

          On another matter, a reply from which the information sought may be confidently inferred is substantively responsive to a query. “Why aren’t you asking the FBI for something?” “Because I’ve already asked you for something related.” Omitting the word _because_ doesn’t change the substance of the reply, although we seem to agree that for syntactical scoring purposes, its absence was justly “penalized.”

          I have no objection to Vox’ metric, nor how they applied it, nor do I deny that the outcome supports what people generally (including me) seem to agree upon anyway. Dr Ford’s tone and demeanor greatly contrasted with Judge Kavanaugh’s.

          I’m unsure precisely what Vox or James Chapman are claiming follows from that widely acknowledged contrast. Given the number of dimensions along which Dr Ford’s situation differs from Judge Kavanaugh’s, I won’t hazard a guess about which specific differences are causally related to the observed contrast.

          “What I’m arguing” is the first sentence of my comment: This is a complicated inference problem, regardless of your politics.

          1. The difference I was highlighting between “wanted a hearing” and “called for an investigation” wasn’t between “want” and “called for”, but between “hearing” and “investigation”. As I said, I think this very hearing illustrated the difference between the two things. It is true Judge Kavanaugh argued elsewhere that they were the same thing for the purposes of discussion (all the times he brought up how the FBI would ask the exact same questions the senators at the hearing would), but I don’t think he is correct on that at all. Which is another illustration of how his non-straight answer makes it hard to find something solid to follow up on – you’d have to clarify whether there is a disagreement on semantics, or whether it’s worth going into another completely separate argument over whether an FBI investigation and a congressional hearing actually are equivalent for the purposes of the two parties, and whether that depends on the purposes of the two parties…

            > “Because I’ve already asked you for something related.”

            It isn’t the lack of the word “because” that is penalized there IMO, it’s that even with the “because” it would be a non-sequitur. If you add a “because”, then you also need to add an explanation for why asking for the “something related” justifies not asking for the original thing. This could be because the person considers the two equivalent, which leads to the question of how they think they’re equivalent, whether this is something both sides can agree on or whether it’s another point of substantive disagreement, or if the person is trying to pass them off as equivalent to avoid answering the question, or if they’re misunderstanding the question (or the assumptions behind it) in such a way that they think the “something related” is more important to the questioner than it is…

            > I’m unsure precisely what Vox or James Chapman are claiming follows from that widely acknowledged contrast.

            OK; I haven’t actually read the source article so I don’t know either 🙂 (so, you know, take the things I said higher up about “what penalized what sentence” with a grain of salt; I assumed we were talking in generics but maybe you’re referring to a part of the article where they made their analysis explicit and it might not match with anything I’ve said!)

  2. The thing with the Kavanaugh testimony is that he made plenty of statements that were false, extremely misleading or completely implausible on their face. Also, he consistently went on tangents and attacked the questioners instead of giving straight answers, sometimes not answering a question at all. You don’t need subtle sentiment analysis when the words themselves tell the story.

  3. Since Kavanaugh is a judge you’d think he wouldn’t be silly enough to lie. His professional life has been spent sniffing out lies and so knows the tactics that liars use. Surely he wouldn’t risk it. That said, I wasn’t impressed by the manner of his testimony . . . “The lady doth protest too much, methinks” . . . hope I’m wrong.

