How to un-censor dirty Facebook posts

Let's teach a machine how to un-censor text, using dirty Facebook posts as an example. We'll keep things safe for work.

Censoring text is straightforward. Reading through a message, we compare each word to a dictionary of "bad words." If a word is bad, we mask it: "Darn you" becomes "D*** you."
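For the curious, here is a minimal Python sketch of that dictionary lookup. The `BAD_WORDS` set and the `censor` function are hypothetical stand-ins. The sketch assumes the bleep style shown above: keep the first letter and replace the rest with asterisks.

```python
import re

# Hypothetical, safe-for-work dictionary of "bad words".
BAD_WORDS = {"darn", "heck", "dumb"}


def censor(text, bad_words=BAD_WORDS):
    """Keep the first letter of each bad word and replace the rest with asterisks."""

    def mask(match):
        word = match.group(0)
        if word.lower() in bad_words:
            return word[0] + "*" * (len(word) - 1)
        return word

    # \w+ matches each run of letters, digits, or underscores, so spaces and
    # punctuation pass through unchanged.
    return re.sub(r"\w+", mask, text)


print(censor("Darn you"))  # D*** you
```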

Un-censoring is a trickier problem: it means translating self-censored writing (words containing asterisks) back into regular swears. Why bother?

Most of all, it's a fun puzzle. I'll try to describe the concepts in a visual way that makes sense to everyone. If you're familiar with algorithms, maybe skip ahead to the code.

Beyond fun, there are practical reasons to build an algorithm for un-censoring:

  • Baddies: Malicious users can self-censor to get around filters, for example Transfer me the cash via W*estern U***ion on a platform where the phrase Western Union is banned.
  • Bullies: If someone writes You're an i**** instead of idiot on YouTube, they're still being mean-spirited.

When you use the first letter, you're making me pronounce it in my head. That's what saying a word is. Why don't you take responsibility for the s****y words you want to use?

- Louis C.K. (paraphrased)

That said, how do we un-censor text?

Build a swear machine in 4 steps

This is a friendly explanation of the algorithm for readers who may not have a programming background. I'll leave out all the code and math, but feel free to skip ahead if that's why you're here.

1. Dirty training data

Censoring text is a simple matter of replacing dirty words with bleeped-out ones. Un-censoring is more complicated because we have to guess how people use profanity. Consider this example:

This fictional post may be a little self-deprecating, but hey, it's tame considering what strangers have said about me on the Internet.

What does Jane Doe mean when she writes "what a ****** idea"? That really depends on how Jane uses expletives. In other words, when Jane has sworn in similar contexts in the past, what word did she choose?

To answer this question, we'll assemble a block of text containing posts in which Jane has sworn. Let's call it the book of dirty.
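A rough Python sketch of assembling the book of dirty follows. It assumes Jane's posts are available as a list of strings and that a hypothetical `SWEARS` set defines what counts as swearing. Both are illustrative, as are the sample posts.

```python
import re

# Hypothetical, safe-for-work swear list.
SWEARS = {"darn", "heck", "dumb"}


def words(text):
    """Lowercase the text and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def book_of_dirty(posts, swears=SWEARS):
    """Join every post containing at least one uncensored swear into one block of text."""
    dirty_posts = [post for post in posts if any(w in swears for w in words(post))]
    return "\n".join(dirty_posts)


# Made-up posts for illustration.
jane_posts = [
    "What a dumb movie.",
    "Lovely weather today!",
    "Darn, I missed the bus again.",
]
print(book_of_dirty(jane_posts))
```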

2. Finding the bleeps

You can close the book of dirty for now. We'll come back to it in a second. Let's turn our attention to the censored input, the stuff we need to un-censor.

Let's walk through the input word by word until we encounter a word containing asterisks. For simplicity's sake, assume any word with asterisks is a swear. When you encounter a censored word, step back to the word just before it. In "what a ****** idea," the word before the bleep is "a".

We need to predict what word Jane typically says after this word.
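Here's one way to sketch Step 2 in Python. The `find_bleeps` helper is hypothetical. It treats any word containing an asterisk as a bleep (the same simplifying assumption as above) and pairs it with the word before it.

```python
import re
from typing import List, Optional, Tuple


def find_bleeps(text: str) -> List[Tuple[Optional[str], str]]:
    """Return (previous_word, censored_word) pairs for every word containing '*'.

    previous_word is lowercased, or None if the censored word opens the text.
    """
    # Runs of word characters, apostrophes, and asterisks count as words.
    tokens = re.findall(r"[\w'*]+", text)
    pairs = []
    for i, token in enumerate(tokens):
        if "*" in token:  # simplifying assumption: any word with asterisks is a swear
            previous = tokens[i - 1].lower() if i > 0 else None
            pairs.append((previous, token))
    return pairs


print(find_bleeps("Honestly, what a ****** idea."))  # [('a', '******')]
```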

3. Predicting the next word

This is the trickiest step, but if you've made it this far, hang in there; I'll try to run through it without putting anyone to sleep.

Open the proverbial book of dirty we created in Step 1: the block of text containing all of Jane's uncensored posts. For each word in the document (let's call it w), list every word that comes right after w and count how often each one follows.

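In case that definition sounds confusing, here's an example. Take w = "a". Going through the book of dirty, we note every word that appears right after "a" and tally how many times each one shows up.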
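In code, this table is a count of word pairs, often called bigrams. Below is a minimal Python sketch. The `next_word_counts` function is hypothetical, and the book of dirty is made up.

```python
import re
from collections import Counter, defaultdict


def next_word_counts(book):
    """Map each word w to a Counter of the words that directly follow w."""
    tokens = re.findall(r"[a-z0-9']+", book.lower())
    followers = defaultdict(Counter)
    # zip pairs each token with the next one: (t0, t1), (t1, t2), ...
    for current, following in zip(tokens, tokens[1:]):
        followers[current][following] += 1
    return dict(followers)


# A made-up book of dirty.
book = "What a dumb plan. What a darn mess. Such a dumb idea, a nice try."
counts = next_word_counts(book)
print(counts["a"].most_common())  # [('dumb', 2), ('darn', 1), ('nice', 1)]
```

This sketch strips punctuation, so pairs can span sentence boundaries ("plan" followed by "what"). Splitting the book into sentences first would avoid that.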

4. Swapping out the bleeps

At this point, we know which words are likely to follow any particular word in a person's writing. All that's left is to look up the word before each bleep and choose the most common follower that is also a swear.

In the example above, we would choose dumb because it's the swear that most often follows "a" in Jane Doe's writing, turning "what a ****** idea" into "what a dumb idea."

It works! I've never been so excited about an insult!
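Step 4 can be sketched in Python as a lookup in the Step 3 table. The counts below are made up for illustration, and `SWEARS`, `guess_swear`, and `swap_bleep` are hypothetical names. Note how "nice", the most common follower overall, is skipped because it isn't a swear.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical, safe-for-work swear list.
SWEARS = {"darn", "heck", "dumb"}

# Made-up next-word counts from Jane's book of dirty.
followers = {
    "a": Counter({"nice": 3, "dumb": 2, "darn": 1}),
    "what": Counter({"a": 4, "the": 2}),
}


def guess_swear(previous: str, table: Dict[str, Counter], swears=SWEARS) -> Optional[str]:
    """Return the swear that most often follows `previous`, or None if none does."""
    for word, _count in table.get(previous.lower(), Counter()).most_common():
        if word in swears:  # skip common non-swears such as "nice"
            return word
    return None


def swap_bleep(text: str, censored: str, previous: str, table: Dict[str, Counter]) -> str:
    """Replace the first occurrence of `censored` with the best guess, if there is one."""
    guess = guess_swear(previous, table)
    return text.replace(censored, guess, 1) if guess else text


print(swap_bleep("what a ****** idea", "******", "a", followers))  # what a dumb idea
```

If no swear has ever followed the previous word, this sketch leaves the bleep alone rather than guessing.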

Summary & Code

Un-censoring text is a fun, tricky problem. If you're new to coding, I hope I piqued your interest in algorithms. For further reading, I recommend Cracking the Coding Interview, which provides a great intro even if you're not looking for a coding job. And here's a repository with code for all the concepts described in this post.

Note: The featured image is a reference to the Boston Dynamics Robot Parody.