Evading algorithmic detection: How “algospeak” became the newest version of linguistic subterfuge

By Roger J. Kreuz, Associate Dean and Professor of Psychology, University of Memphis

A linguistic arms race is raging online, and it is not clear who is winning.

On one side are social networks like Facebook, Instagram and TikTok. These sites have become better and better at identifying and removing language and content that violate their community standards.

Social media users are on the other side, and they’ve come up with coded terminology designed to evade algorithmic detection. These expressions are collectively referred to as “algospeak.”

New terms like these are just the latest development in the history of linguistic concealment. Typically, such codes have been employed by small groups. Given the reach of social media, however, algospeak has the potential to more broadly influence everyday language.

An online standoff

Due to the sheer volume of posted content, social media platforms use algorithms to automatically flag and remove problematic material. The goal is to reduce the spread of misinformation as well as to block content considered offensive or inappropriate.

Yet many people have legitimate reasons for wanting to discuss sensitive topics online.

Victims of sexual assault, for example, may find it therapeutic to discuss their experiences with others. And those who struggle with thoughts of self-harm or suicide can benefit from online communities that provide support. But algorithms may identify and remove such content as a violation of a site’s terms of service.

Those who repeatedly run afoul of a site’s policies may find that their posts have been downranked or made less visible – a process called shadow banning. Continued violations can lead to a temporary or permanent suspension. To get past content filters, social media users substitute coded language for the banned terms.

References to sex, for example, might be replaced by an innocuous word like “mascara.” “Unalive” has become an agreed-upon way to refer to death or suicide. “Accountant” takes the place of sex worker. “Corn” stands in for porn. “Leg booty” is LGBTQ.
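How such a swap slips past a filter can be illustrated with a toy example. The sketch below assumes the simplest possible moderation rule, a list of banned words matched against each word of a post. The blocklist and function names are hypothetical, real platforms do not publish their rules, and their systems are far more sophisticated than this.

```python
import re

# Hypothetical blocklist for illustration only; not any platform's actual list.
BLOCKED_TERMS = {"porn", "suicide"}

# Two of the algospeak substitutions described above.
ALGOSPEAK = {"porn": "corn", "suicide": "unalive"}


def is_flagged(post: str) -> bool:
    """Return True if any whole word in the post appears on the blocklist."""
    words = re.findall(r"[a-z']+", post.lower())
    return any(word in BLOCKED_TERMS for word in words)


def to_algospeak(post: str) -> str:
    """Replace each blocked word with its coded stand-in; leave other words alone."""
    def swap(match):
        word = match.group(0)
        return ALGOSPEAK.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", swap, post)


original = "a support group for suicide survivors"
coded = to_algospeak(original)
print(is_flagged(original), is_flagged(coded))
```

Under these assumptions the original post is flagged and the coded one is not. The filter can only catch up by adding “unalive” to its list, at which point users need a new term.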

A history of hidden language

Although circumventing content filters is a relatively new phenomenon, the use of coded terms to conceal one’s meaning is not.

For example, the 19th-century Russian satirist Mikhail Saltykov-Shchedrin made use of “Aesopian,” or allegorical, language. He and others used it to circumvent censorship in Tsarist Russia. The forbidden term “revolution,” for instance, would be replaced with a phrase like “the big job.”

Many subcultures have developed their own private codes that are only really understood by in-group members. These are referred to by a variety of names, such as argot, cant or slang.

Polari was a private language used by gay men in early 20th-century Britain, at a time when public sentiment against homosexuality was running high. “Rough trade,” for example, referred to a working-class sex partner.

Rhyming slang has also been employed to obscure one’s meaning from outsiders. A term like “telephone,” for example, can be replaced by a rhyming equivalent, such as “dog and bone,” and then shortened to “dog.” In this way, a member of a gang could publicly ask another member to call them, even in the presence of the police.

Cockney rhyming slang, which emerged in 19th-century London, is perhaps the best-known example, although there are several others.

Leetspeak evolved in the 1980s, as intrepid internet pioneers ventured online to use bulletin board systems. Some of the workarounds they created to evade moderation are still being used today on sites like TikTok.

This form of linguistic subterfuge typically involves using numbers and symbols as stand-ins for letters. “3” resembles a backwards capital E, “1” looks like a lowercase l, “$” can take the place of the letter s, and so on. The term “leet” itself is often written as “1337.”
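The substitutions just described amount to a simple character-for-character mapping, which can be sketched in a few lines of Python. The mapping below uses only the three stand-ins named above, plus “7” for t, which is inferred from the spelling “1337.” The names LEET_MAP and to_leet are made up for this illustration, and actual leetspeak is much looser and more inventive than any fixed table.

```python
# Character substitutions taken from the examples above.
# "7" for t is an inference from the spelling "1337", not a rule stated outright.
LEET_MAP = str.maketrans({
    "e": "3", "E": "3",
    "l": "1",
    "s": "$", "S": "$",
    "t": "7", "T": "7",
})


def to_leet(text: str) -> str:
    """Swap each mapped letter for its look-alike digit or symbol."""
    return text.translate(LEET_MAP)


print(to_leet("leet"))  # prints 1337
```

A filter that looks for exact spellings will miss the disguised word unless it first reverses the mapping, and reversing it is unreliable because “1” or “3” may also be a genuine digit.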

Although it’s most commonly used when writing about sex, algospeak has also proven useful in other contexts. For example, it was employed last year in Iran by those protesting the government’s crackdown on dissent. Creative misspellings like “Ir@n” were pressed into service to evade censorship.

Concealment breeds miscommunication

About a decade ago, when emoji became a popular way of augmenting text messages, a new means of circumventing content moderation was born.

As I describe in my recently published book on miscommunication, fruits and vegetables that vaguely resemble parts of the human anatomy were employed to get around policies prohibiting sexual content.

As a result, the humble eggplant and peach emoji took on distinctly new meanings in the online world. And in 2019, both Facebook and Instagram took steps to block their use as sexual stand-ins.

The various social media platforms seem to be caught up in an escalating feud with their users. The sites may block certain terms, but this leads to new algospeak equivalents springing up to take their place.

Different sites have different rules that ban different terms, and what is considered acceptable and what is not is constantly changing. Keeping up can be a challenge.

In January, the actress Julia Fox made a seemingly insensitive observation regarding a post mentioning “mascara” on TikTok.

Fox was apparently unaware that the term was being used as a stand-in for sexual assault. She was called out for the remark, and the backlash compelled her to issue an apology.

As this linguistic tug of war continues, such misunderstandings seem likely to become more common. And at least some algospeak terms will inevitably spill over into vocabulary used offline.

After all, coded language survives because it is useful. Such terms can, for example, function as dog whistles to taunt one’s political opponents.