Autocorrect: Finding Candidates Using Edits

Cloistered Monkey

Beginning

This is the word we are going to autocorrect.

word = "dearz"

This finds all the splits in our word.

splits = [(word[:index], word[index:]) for index in range(len(word) + 1)]
for split in splits:
    print(split)

('', 'dearz')
('d', 'earz')
('de', 'arz')
('dea', 'rz')
('dear', 'z')
('dearz', '')

Starting with the splits, delete the first letter of the second element of each tuple. This means that each letter gets deleted exactly once.

deletes = [left + right[1:] for left, right in splits if right]
for deleted in deletes:
    print(f" - {deleted}")

- earz
- darz
- derz
- deaz
- dear

These are now the candidates to use for the correction.

Since not all the candidates are real words you need to filter out all but the real candidates using a pre-defined vocabulary.

vocabulary = ['dean','deer','dear','fries','and','coke']
candidates = set(vocabulary).intersection(set(deletes))
print(candidates)

{'dear'}

This doesn't demonstrate all the edits (nor the use of edit distance) but the remaining types of edits are done in a similar manner to these two.