Due Date -- 11:59pm on Wed, Feb. 27, 2013
In this assignment you will implement a simple spell checker.
1. Implementation (55 pts)
A skeleton version of the code is provided here: skeleton.m. Please fill in the missing parts with your implementation.
- Read in a dictionary of common English words from words.txt (5 pts).
- Read in an email with misspellings from email.txt (5 pts).
- For each word in the email, check whether the word is in the
dictionary. Print out misspelled words. (5 pts)
- For each misspelled word, find the 20 most similar words from
the dictionary, where similarity is measured as the number of letters in common
(i.e. intersect) divided by the maximum of the lengths of the misspelled word and
the dictionary word. Ask the user to select one of your suggestions or type in a
new spelling. (30 pts)
- Correct the misspelled words in the email (5 pts).
- Print out a spelling corrected version of the email into a file called email_correct.txt (5 pts).
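As a worked illustration of the similarity measure described above, here is a minimal sketch. The example words are hypothetical and are not taken from words.txt or email.txt. Note that intersect returns each shared letter only once, so repeated letters are not counted twice.

```matlab
% Hypothetical example words (assumptions, not from the assignment files).
misspelled = 'speling';
candidates = {'spelling', 'sapling', 'dog'};

scores = zeros(1, numel(candidates));
for k = 1:numel(candidates)
    w = candidates{k};
    common = intersect(misspelled, w);   % unique letters shared by both words
    scores(k) = length(common) / max(length(misspelled), length(w));
end

% Rank the candidates from most to least similar.
[sortedScores, order] = sort(scores, 'descend');
for k = 1:numel(order)
    fprintf('%d. %s (%.3f)\n', k, candidates{order(k)}, sortedScores(k));
end
```

For 'speling' and 'spelling' the shared letters are e, g, i, l, n, p, s (7 letters) and the longer word has 8 letters, so the similarity is 7/8 = 0.875.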
2. Try it out (15 pts)
Try out your code on some of your own files containing misspellings. Describe
where the spell checking algorithm worked and where it didn't (when it could
and couldn't find good corrections) and why you think it failed on some words.
3. Enhancements (30 pts)
The proposed spell checking algorithm is very naive. Propose extensions to this
algorithm that would improve the quality of the spell checker. There is no
single right answer here. You may brainstorm your own ideas or do some research
online to see what current spell checkers do. Note, you don't need to
implement anything for this portion of the assignment, just describe how you
would enhance the spell checker.
To turn in your assignment, email your commented code, a readme,
and a PDF describing your results and extensions to email@example.com.
Make sure to explain where the algorithm worked, where it didn't, and why.
Useful MATLAB functions
fopen, fclose, fgetl, ischar, isempty, lower, find, strcmp, intersect, max, length, sort, input, str2num, fprintf.
Remember: help functionName (e.g. help fgetl) prints usage instructions and often includes example usage.
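Several of the functions listed above combine into a common line-by-line read loop. The sketch below assumes a file with one word per line; the temporary file and its contents are hypothetical stand-ins created only so the example runs on its own, and the variable names are illustrative.

```matlab
% Create a small throwaway word list (hypothetical stand-in for a real file).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Apple\nbanana\n\nCherry\n');
fclose(fid);

% Read the file one line at a time into a cell array of lowercase words.
words = {};
fid = fopen(fname, 'r');
textLine = fgetl(fid);
while ischar(textLine)              % fgetl returns -1 (not a char) at end of file
    if ~isempty(textLine)           % skip blank lines
        words{end+1} = lower(textLine);
    end
    textLine = fgetl(fid);
end
fclose(fid);
delete(fname);

% Membership test: strcmp compares a string against every cell entry.
inDictionary = any(strcmp('banana', words));
```

Lowercasing on read means later comparisons only need to lowercase the word being checked.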
Implement some of the extensions you proposed in part 3 of the assignment.