You will be doing quite a few lookups of words against a fixed dictionary. Therefore you need to prepare your dictionary. Logically, you can quickly eliminate candidates that are "just too different".

For instance, the words car and dissimilar may share a suffix, but they're obviously not misspellings of each other. Now why is that so obvious to us humans? For starters, the length is entirely different. That's an immediate disqualification (but with one exception - below). So, your dictionary should be sorted by word length. Match your input word with words of similar length. For short words that means +/- 1 character; longer words should have a higher margin (exactly how well can your demographic spell?).

Once you've restricted yourself to candidate words of similar length, you'd want to strip out words that are entirely dissimilar. By this I mean that they use entirely different letters. This is easiest to compare if you sort the letters in a word alphabetically. You'll do this in preprocessing for your dictionary, and for each input word. The reason is that it's cheap to determine the (size of the) difference of two sorted sets. car and rack have a difference of size 1 (the k). car and hat differ in 2 letters on each side (c and r versus h and t), for a total difference of 4. This narrows down your set of candidates even further.

Note that for longer words, you can bail out early when you've found too many differences. dissimilar and biography have a total difference of 13, but considering the lengths (10 and 9 letters) you can probably bail out once you've found 5 differences.

This leaves you with a set of candidate words that use almost the same letters, and also are almost the same length.
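The filtering described above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names (signature, sorted_difference, build_index, candidates) and the default margins (length_margin=1, max_diff=2) are made up for the example and would need tuning for a real dictionary. The difference is counted as the total number of unmatched letters on both sides, with repeated letters counted separately.

```python
from collections import defaultdict


def signature(word):
    """Return the letters of a word sorted alphabetically (computed once per word)."""
    return "".join(sorted(word.lower()))


def sorted_difference(a, b, limit=None):
    """Count the letters that appear in one sorted signature but not in the other.

    Walks both signatures in a single merge-style pass. If `limit` is given,
    returns None as soon as the count exceeds it (the early bail-out).
    """
    i = j = diff = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        # Skip the smaller letter; it has no partner on the other side.
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
        diff += 1
        if limit is not None and diff > limit:
            return None
    # Whatever is left over on either side is unmatched as well.
    diff += (len(a) - i) + (len(b) - j)
    if limit is not None and diff > limit:
        return None
    return diff


def build_index(words):
    """Preprocess the dictionary: group (signature, word) pairs by word length."""
    index = defaultdict(list)
    for word in words:
        index[len(word)].append((signature(word), word))
    return index


def candidates(word, index, length_margin=1, max_diff=2):
    """Return dictionary words of similar length that use almost the same letters.

    length_margin and max_diff are hypothetical tuning knobs, not fixed values.
    """
    sig = signature(word)
    found = []
    for n in range(len(word) - length_margin, len(word) + length_margin + 1):
        for cand_sig, cand in index.get(n, ()):
            if sorted_difference(sig, cand_sig, max_diff) is not None:
                found.append(cand)
    return found


if __name__ == "__main__":
    index = build_index(["car", "rack", "hat", "cat", "dissimilar", "biography"])
    print(candidates("car", index))
    print(sorted_difference(signature("dissimilar"), signature("biography")))
```

Note that this only produces a candidate set. A more expensive comparison (an edit distance, for example) would still have to rank the survivors, since words such as car and arc use exactly the same letters.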