Autocorrect: Finding Candidates Using Edits

Beginning

Middle

Our Data

This is the word we are going to autocorrect.

word = "dearz"

Splits

This finds all the splits in our word.

splits = [(word[:index], word[index:]) for index in range(len(word) + 1)]
for split in splits:
    print(split)
('', 'dearz')
('d', 'earz')
('de', 'arz')
('dea', 'rz')
('dear', 'z')
('dearz', '')

Delete

Starting with the splits, delete the first letter of the second element of each tuple. This means that each letter gets deleted exactly once.

deletes = [left + right[1:] for left, right in splits if right]
for deleted in deletes:
    print(f" - {deleted}")
- earz
- darz
- derz
- deaz
- dear

These are now the candidates to use for the correction.

Filter Out

Since not all the candidates are real words you need to filter out all but the real candidates using a pre-defined vocabulary.

vocabulary = ['dean','deer','dear','fries','and','coke']
candidates = set(vocabulary).intersection(set(deletes))
print(candidates)
{'dear'}

End

This doesn't demonstrate all the edits (nor the use of edit distance) but the remaining types of edits are done in a similar manner to these two.