It’s been a long time since I’ve written a post here, but I promise, there’s a good reason: I was finishing up my master’s thesis. However, now that it’s submitted, I can talk a bit about what I did.¹

Because I used social network analysis to detect communities in the study, there was little motivation to classify subjects by social variables like ethnic group, race, or religion. In fact, I couldn’t have done so even if I had wanted to, because I assembled the corpus from tweets sent by some 200k people. Ultimately, the only social variable I used was the number of the community to which each subject belonged.

The advantage of this situation is that I completely avoided imposing stereotypes on the subjects or minimizing the differences between their identities, because I never classified them with people from elsewhere. A typical example of the problem in sociolinguistics is the variable of race. Some celebrated studies, like Labov’s (1966) and Wolfram’s (1969), classified their subjects according to race, so that one ends up identifying some as African-American, for example. Even if these subjects neither live together nor interact, they inevitably end up being viewed as constituting a single group. From there, these groups’ diverse identities are minimized.

This problem has already been recognized in sociolinguistics, and several solutions have been proposed, mainly the implementation of the concept of communities of practice and more reliance on self-identification. For example, Bucholtz (1999) studied a group whose members she identified according to an activity: being a member of a club. Unfortunately, she applied a label to the members of this club; she called them “nerds”. This name links them to nerds from elsewhere, regardless of the differences between this group and other groups of nerds. Simply implementing the concept of communities of practice did not let her avoid minimizing the identity of the group that she studied. Likewise, Eckert (2000) relied on her subjects’ self-identification as either “jock” or “burnout”, but one ends up with the same problem: even if the subjects self-identify, they can choose labels that link them to distant groups. Jocks surely exist elsewhere, but these other jocks can be exceptionally different from the jocks in Eckert’s study. So, one cannot avoid minimizing identities simply by relying on self-identification, either.

In my thesis, I identified communities simply with ID numbers, so I never classified the subjects with other groups to which they didn’t belong. The fact that I used social network analysis to automatically detect these communities made it easier to avoid applying labels to the subjects that could minimize their identities, but this is possible in any study, even if the researcher employs classic social variables. In the same way that one anonymizes the identities of individuals, one can anonymize the identities of the groups under study. Why is it necessary to know that the races in a study are “black” and “white” or that the religions are “Jewish” and “Catholic”? If a researcher is interested in the way that their subjects navigate stereotypes that are relevant to their lives, that’s one thing, but most variationist studies don’t take up this question, so most studies can do more to protect marginalized people.
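
To make the idea of number-only communities concrete, here is a minimal Python sketch. It is not the method from my thesis: it uses connected components of an interaction graph as a simple stand-in for a real community detection algorithm, and the user names and edges are made up for illustration. The point is only that the output attaches an integer to each subject and nothing else.

```python
from collections import defaultdict, deque


def assign_community_ids(edges):
    """Map each user to an integer community ID.

    Assumption: connected components stand in for a real community
    detection algorithm. The IDs carry no social label of any kind.
    """
    # Build an undirected adjacency map from the interaction pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    community_of = {}
    next_id = 0
    # Visit users in sorted order so the numbering is reproducible.
    for start in sorted(neighbors):
        if start in community_of:
            continue
        # Breadth-first search labels everyone reachable from `start`.
        community_of[start] = next_id
        queue = deque([start])
        while queue:
            user = queue.popleft()
            for other in neighbors[user]:
                if other not in community_of:
                    community_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return community_of


# Hypothetical interaction data (e.g. who mentions whom).
edges = [("u1", "u2"), ("u2", "u3"), ("u4", "u5")]
communities = assign_community_ids(edges)
```

Any later analysis can then treat the community number as the only social variable, in the same way that an anonymized participant code stands in for a person’s name.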


1. For those who don’t know the topic of my thesis, I analyzed the use of the linguistic variable (lol), made up of lol, mdr, etc., on Twitter.


Bucholtz, M. (1999). “Why Be Normal?”: Language and Identity Practices in a Community of Nerd Girls. Language in Society, 28(2), 203–223. https://doi.org/10.1017/s0047404599002043

Eckert, P. (2000). Linguistic Variation as Social Practice: The Linguistic Construction of Identity in Belten High. Malden, MA: Blackwell Publishers, Inc.

Labov, W. (2006). The Social Stratification of English in New York City (2nd ed.). Cambridge, England: Cambridge University Press. (Originally published in 1966)

Wolfram, W. (1969). A Sociolinguistic Description of Detroit Negro Speech. Washington, D.C.: Center for Applied Linguistics.