HW1: Simple Spell-Checker
CSE/ISE 364: Advanced Multimedia

Due Date -- 11:59pm on Wed, Feb. 27, 2013

Overview

In this assignment you will implement a simple spell checker.

1. Implementation (55 pts)

  1. Read in a dictionary of common English words from words.txt (5 pts).
  2. Read in an email with misspellings from email.txt (5 pts).
  3. For each word in the email, check whether the word is in the dictionary. Print out the misspelled words (5 pts).
  4. For each misspelled word, find the 20 most similar words in the dictionary, where similarity is measured as the number of letters the two words have in common (i.e., their intersect) divided by the larger of the two lengths (misspelled word and dictionary word). Ask the user to select one of your suggestions or type in a new spelling (30 pts).
  5. Correct the misspelled words in the email (5 pts).
  6. Write the spelling-corrected version of the email to a file called email_correct.txt (5 pts).
A skeleton version of the code is provided here: skeleton.m. Please fill in the missing parts with your implementation.
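
To make the similarity measure in step 4 concrete, here is a minimal sketch of one way to score and rank candidates. It is not the contents of skeleton.m: the variable names (similarity, misspelled, dict) and the four-word dictionary are made up for illustration, and it assumes that "letters in common" means the distinct letters returned by intersect, so a repeated letter counts once.

```matlab
% Similarity from step 4: shared letters divided by the longer word's length.
% Assumption: intersect returns each shared letter once, so repeats count once.
similarity = @(w, d) length(intersect(lower(w), lower(d))) / max(length(w), length(d));

misspelled = 'helo';                        % example misspelled word
dict = {'hello', 'help', 'hole', 'cat'};    % tiny stand-in for words.txt

% Score every dictionary word against the misspelled word.
scores = zeros(1, length(dict));
for i = 1:length(dict)
    scores(i) = similarity(misspelled, dict{i});
end

% Sort from most to least similar and list at most 20 suggestions.
[sortedScores, order] = sort(scores, 'descend');
numSuggestions = min(20, length(dict));
for k = 1:numSuggestions
    fprintf('%d. %s (%.2f)\n', k, dict{order(k)}, sortedScores(k));
end
```

With this toy dictionary, 'hole' scores 1.00 and 'hello' scores 0.80, because the measure ignores letter order and repeated letters. Cases like this are worth looking for when you evaluate your own implementation.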

2. Try it out (15 pts)

Try out your code on some of your own files containing misspellings. Describe where the spell-checking algorithm worked and where it didn't (when it could and couldn't find good corrections), and explain why you think it failed on some words.

3. Enhancements (30 pts)

The proposed spell-checking algorithm is very naive. Propose extensions to this algorithm that would improve the quality of the spell checker. There is no single right answer here. You may brainstorm your own ideas or do some research online to see what current spell checkers do. Note that you don't need to implement anything for this portion of the assignment; just describe how you would enhance the spell checker.

Deliverables

To turn in your assignment, email your commented code, a readme, and a PDF describing your results and extensions to cseise364@gmail.com. Make sure to explain where the algorithm worked, where it didn't, and why.

Useful Matlab functions

fopen, fclose, fgetl, ischar, isempty, lower, find, strcmp, intersect, max, length, sort, input, str2num, fprintf.

Remember: help functionName (e.g. help fgetl) will provide usage instructions and often example usage cases.
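
As a rough illustration of how several of these functions fit together, the sketch below reads a text file line by line and checks whether a word is present. It assumes one word per line, which may not match the actual layout of words.txt, and the file name example_words.txt and the variable names are hypothetical.

```matlab
% Write a tiny example file so the sketch runs on its own (hypothetical name).
fid = fopen('example_words.txt', 'w');
fprintf(fid, 'Apple\nbanana\nCherry\n');
fclose(fid);

% Read the file back one line at a time, assuming one word per line.
fid = fopen('example_words.txt', 'r');
words = {};
textLine = fgetl(fid);
while ischar(textLine)                  % fgetl returns -1 (not a char) at end of file
    if ~isempty(textLine)
        words{end+1} = lower(textLine); % store in lowercase for case-insensitive lookup
    end
    textLine = fgetl(fid);
end
fclose(fid);
delete('example_words.txt');            % remove the temporary example file

% Look a word up: strcmp compares against every cell, find returns matching indices.
isKnown = ~isempty(find(strcmp(words, 'banana')));
fprintf('Read %d words; "banana" found: %d\n', length(words), isKnown);
```

The ischar test is what ends the loop: fgetl returns the number -1 rather than a string once the end of the file is reached.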

Extra Credit

Implement some of the extensions you proposed in part 3 of the assignment.