Autocorrect: The System
Table of Contents
The Posts In the Series
Edit Distance
In this series of posts we'll implement models that correct words that are 1 and 2 edit distances away.
- We say two words are n edit distance away from each other when we need n edits to change one word into another.
An edit could consist of one of the following options:
- Delete (remove a letter): ‘hat’ => ‘at, ha, ht’
- Switch (swap 2 adjacent letters): ‘eta’ => ‘eat, tea,…’
- Replace (change 1 letter to another): ‘jat’ => ‘hat, rat, cat, mat, …’
- Insert (add a letter): ‘te’ => ‘the, ten, ate, …’
We'll be using the four methods above to implement an Auto-correct system by computing the probabilities that a certain word is correct given an input word.
This auto-correct system was first created by Peter Norvig in 2007 in this article.
The goal of our spell check model is to compute the following probability:
\[ P(c|w) = \frac{P(w|c)\times P(c)}{P(w)} \tag{Equation 1} \]
The equation above is Bayes Rule, and it is saying that the probability of a word being correct \(P(c|w)\) is equal to the probability of having a certain word w, given that it is correct \(P(w|c)\), multiplied by the probability of being correct in general \(P(C)\) divided by the probability of that word w appearing \(P(w)\) in general.