Month: April 30, 2018

Linguistics as engineering.

I’ve never liked Chomsky, despite never having read anything by him. His ideas are so prevalent in linguistics, at least in American universities, that you don’t really have to read his work to be exposed to them. However, it’s important to me to have a good idea of the context within which ideas have been proposed and developed, so I finally read Syntactic Structures (Chomsky, 1957/2002), which I think encapsulates everything I dislike about Chomsky and the sort of theoretical linguistics that his ideas have led to.

First of all, though, let me say that I do not think Syntactic Structures is a worthless book. Even though I disagree with much of what Chomsky wrote, he did pose some interesting questions, and even that alone gives it value. For instance, Chomsky argued that grammars should be developed using nothing but formal means, disregarding semantics completely (pp. 93-94). There are several reasons why I don’t think this is correct, which I won’t get into here, as my point is simply that this is an interesting question to consider.

What I don’t like about Chomsky and the sort of theoretical linguistics that he spawned is the near-complete disregard for empirical evidence. Theoretical linguistics has relied almost entirely on intuitions for its “data”, often the intuitions of linguists themselves, not of informants. Despite Syntactic Structures often being credited as a foundational work for cognitive science, it never once suggests that linguists use experimentation to validate their theories, as those in other scientific fields dealing with cognition, such as psychologists and neuroscientists, would do.

There are two things in Syntactic Structures that I think have given linguists cover to approach their “science” this way:

  1. Chomsky argued that grammars have nothing to do with synthesis or analysis (p. 48)
  2. Chomsky argued that the goal of linguistic theory is to develop an evaluation procedure (pp. 50-52)

By synthesis and analysis, Chomsky meant how humans produce language and how they understand language, respectively. He didn’t think that grammars address these questions, which is patently bizarre. What exactly do grammars describe if not one or both of these things? It seems that one is instead engineering how a grammar could work for some imagined artificial being, in which case we don’t need to consider empirical evidence generated by observing or experimenting on real human beings.

As for the evaluation procedure, Chomsky meant that developing a linguistic theory that could tell us if a given grammar is the correct grammar for a given language is too hard, and developing a linguistic theory that could generate a grammar from a corpus is even harder, so we’re better off developing a linguistic theory that simply tells us if one grammar is better than another for describing a given language. And what is the criterion? Simplicity.

The problem with focusing on an evaluation procedure, though, is that this downplays the importance of empirical evidence once again. There’s no need to test human beings to figure out if they employ transformations, for instance; we just need to show that transformations simplify the grammar more than some other proposal would, that other proposal also having been developed without any regard for testing if it actually represents what happens in the heads of human beings.

Ultimately, the direction that Chomsky set out for linguistics in Syntactic Structures seemed to be about how best to engineer an efficient grammar, not how to understand how humans do language. If Chomskyan linguistics actually does explain what humans do, that result is purely accidental, as there’s nothing about how it’s done that would be able to establish that connection.

Unsurprisingly, what the results of Chomsky’s approach to linguistics seem most useful for is developing speech synthesis and speech analysis software, i.e. engineering. There’s no need for AIs to do language in the same way that humans do language; they simply have to work. And I’m very much happy that they do. I use Google Assistant all the time, and I can’t wait to be able to speak to my house like the crew of the USS Enterprise speaks to their spaceship.

However, as far as advancing linguistics as a science, I think Chomsky’s approach, as set out in Syntactic Structures, has led to a monumental waste of time and resources. Numerous very intelligent and creative linguists have now spent some 60 years essentially playing a puzzle game that has not shed any light whatsoever on how exactly humans do language, and I don’t think it’s going too far to say that Chomsky’s ideas, combined with his enormous influence in the field, are to blame.


Chomsky, N. (2002). Syntactic Structures (2nd ed.). Berlin; New York: Mouton de Gruyter. (Original work published 1957)

How conflating terminology helps racists validate their racism.

Somewhat related to a recent post of mine, I came across this troubling article in the NY Times by David Reich, a Harvard geneticist who seems to regularly be described as “eminent”, in which he argues that “it is simply no longer possible to ignore average genetic differences among ‘races.’” He seems to have positive intentions (he even begins the article by acknowledging that race is a social construct), and I have no doubt that his knowledge of genetics is light-years beyond my own non-existent knowledge of that subject, but despite his intentions and knowledge in that field, he seems not to have consulted with social scientists at all. The crux of the issue is that he conflates “race” with “population”. Indeed, immediately after acknowledging that race is a social construct, he states the following:

The orthodoxy goes further, holding that we should be anxious about any research into genetic differences among populations.

He seems to be using the two terms as synonyms, or at the very least, he’s being careless enough with his use of the two that it appears that he’s using them as synonyms. I seriously doubt that there are any respected geneticists who would argue that genetic differences among populations do not exist, but that’s not at all the same as making an argument about whether genetic differences between races exist.

There are already two good responses to the article, one in BuzzFeed, co-signed by some 67 scientists, and another by sociologist Ann Morning, who also co-signed the BuzzFeed article. These do a pretty good job of explaining the problem with Reich’s article — although I think the BuzzFeed article would have been better if they had not attempted to comment on genetic findings as much — so I just want to talk about Reich’s example from his own research supposedly showing how race can be used productively to study genetics. Here’s the relevant quote from his article:

To get a sense of what modern genetic research into average biological differences across populations looks like, consider an example from my own work. Beginning around 2003, I began exploring whether the population mixture that has occurred in the last few hundred years in the Americas could be leveraged to find risk factors for prostate cancer, a disease that occurs 1.7 times more often in self-identified African-Americans than in self-identified European-Americans. This disparity had not been possible to explain based on dietary and environmental differences, suggesting that genetic factors might play a role.

Self-identified African-Americans turn out to derive, on average, about 80 percent of their genetic ancestry from enslaved Africans brought to America between the 16th and 19th centuries. My colleagues and I searched, in 1,597 African-American men with prostate cancer, for locations in the genome where the fraction of genes contributed by West African ancestors was larger than it was elsewhere in the genome. In 2006, we found exactly what we were looking for: a location in the genome with about 2.8 percent more African ancestry than the average.

When we looked in more detail, we found that this region contained at least seven independent risk factors for prostate cancer, all more common in West Africans. Our findings could fully account for the higher rate of prostate cancer in African-Americans than in European-Americans. We could conclude this because African-Americans who happen to have entirely European ancestry in this small section of their genomes had about the same risk for prostate cancer as random Europeans.

Reich offers this as an example of how using race as a variable can be fruitful, but I think what he really does is undermine his own argument. What he’s ultimately talking about here is not African-Americans, but people with a section of their genome matching that which was commonly found in people who lived in West Africa. This appears to be the population that’s relevant to his study, yet he insists on talking about his results in terms of a race instead, repeatedly referring to African-Americans, a culturally diverse group that’s too often treated as monolithic and who don’t even necessarily have this ancestry, a fact that Reich admits in this very passage.

The use of the label African-American in his explanation serves no explanatory purpose and in fact is not even very precise. What it does do is make it easy for racists to claim that some Harvard geneticist has validated their racism, and confuse laymen who aren’t versed in the subtle terminological distinctions used for referring to groups of people, distinctions that Reich himself doesn’t even seem to be versed in. He repeatedly describes these subjects as “self-identified”, which I assume he does in order to take responsibility for using the label out of his own hands, but as I explained in my previous post, this strategy offers no protection at all for people who would be hurt by the stereotypes that are generated when using social variables like race.

Indeed, my admittedly unscientific survey of Twitter has led me to what appear to be three types of reactions to the piece: 1) social scientists pointing out how irresponsible the article is, 2) geneticists mocking “soft scientists” and/or praising the article as a fantastically delicate treatment of a difficult topic, and 3) blatant, hardcore racists using the article as validation for their racism. (3) should be troubling enough to those in (2) to convince them to go talk to those in (1) about how to better deal with the social side of their research.

© 2024 Josh McNeill