My Computer Is Smarter Than Your Computer
The project was based on the concept of creating a program that accurately guesses the author of a passage or work. To do this, the program has to be able to train itself on works by several [known] authors. Assuming that you've trained the program on a good variety of passages for each author, you can give the program a text file (without telling it the author) and see if it guesses the correct author. While this may sound like the computer simply has to read and remember, it doesn't, since it should be able to make guesses about passages it hasn't seen. It does this by collecting statistics related to each author's writing style (e.g. use of commas, average syllables per word, typical paragraph length, etc.) and calculating a probability for each author. In the end, the author with the highest probability is the one that is guessed. For example, if I train the program on Huck Finn (Twain) and Pride and Prejudice (Austen), it should be able to figure out that Twain wrote Tom Sawyer, having never seen it before.
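To make the idea concrete, here's a minimal Java sketch of that train-then-guess loop. It is not the actual project code. The three features (commas per word, average letters per word, average words per sentence), the Gaussian scoring with equal priors for every author, the smoothing constant, and the names `AuthorGuesser`, `features`, `train`, `logScore`, and `guess` are all assumptions picked for illustration, and the training passages are made-up placeholder sentences rather than real Twain or Austen.

```java
import java.util.*;

// Hypothetical sketch: a tiny Gaussian naive Bayes classifier over style statistics.
class AuthorGuesser {
    static final int NUM_FEATURES = 3;
    static final double SMOOTHING = 0.01; // assumed constant; keeps variances above zero

    // Per-author mean and variance of each feature, learned in train().
    private final Map<String, double[]> means = new HashMap<>();
    private final Map<String, double[]> variances = new HashMap<>();

    // Style statistics: commas per word, letters per word, words per sentence.
    static double[] features(String text) {
        String[] words = text.trim().split("\\s+");
        double n = words.length;
        long commas = text.chars().filter(c -> c == ',').count();
        long ends = text.chars().filter(c -> c == '.' || c == '!' || c == '?').count();
        double letters = 0;
        for (String w : words) letters += w.replaceAll("[^A-Za-z]", "").length();
        return new double[] { commas / n, letters / n, n / Math.max(1, ends) };
    }

    // Learn the mean and (smoothed) variance of each feature for one author.
    void train(String author, List<String> passages) {
        double[] mean = new double[NUM_FEATURES];
        double[] var = new double[NUM_FEATURES];
        List<double[]> rows = new ArrayList<>();
        for (String p : passages) rows.add(features(p));
        for (double[] r : rows)
            for (int i = 0; i < NUM_FEATURES; i++) mean[i] += r[i] / rows.size();
        for (double[] r : rows)
            for (int i = 0; i < NUM_FEATURES; i++)
                var[i] += Math.pow(r[i] - mean[i], 2) / rows.size();
        for (int i = 0; i < NUM_FEATURES; i++) var[i] += SMOOTHING;
        means.put(author, mean);
        variances.put(author, var);
    }

    // Log-likelihood of the text's features under the author's Gaussians.
    double logScore(String author, String text) {
        double[] x = features(text);
        double[] m = means.get(author);
        double[] v = variances.get(author);
        double score = 0;
        for (int i = 0; i < NUM_FEATURES; i++) {
            score += -0.5 * Math.log(2 * Math.PI * v[i])
                   - Math.pow(x[i] - m[i], 2) / (2 * v[i]);
        }
        return score;
    }

    // Guess the trained author with the highest score (equal priors assumed).
    String guess(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String author : means.keySet()) {
            double s = logScore(author, text);
            if (s > bestScore) { bestScore = s; best = author; }
        }
        return best;
    }

    public static void main(String[] args) {
        AuthorGuesser g = new AuthorGuesser();
        // Placeholder passages: one terse "author", one comma-heavy "author".
        g.train("authorA", Arrays.asList(
            "We ran fast. The dog barked. It was hot.",
            "He sat down. The sun set. I went home."));
        g.train("authorB", Arrays.asList(
            "However, considering the circumstances, which were unusual, she remained entirely composed, and nobody objected.",
            "Naturally, after some reflection, the gentleman, though reluctant, accepted the invitation, and everyone approved."));
        System.out.println(g.guess(
            "Indeed, despite the weather, which was dreadful, they continued walking, and nobody complained."));
    }
}
```

Because the per-author statistics live in maps keyed by author name, nothing here is tied to two authors: calling `train` again with a third name adds a third candidate. A real version would need much longer passages and more features before the guesses meant anything.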
As the Twain and Austen example suggests, my program only had to be tested on Mark Twain and Jane Austen, although graduate students had to get their programs to work with a larger variety of writers. I'm happy to report that my program has a 96% success rate, correctly guessing every work by Jane Austen and only having trouble with Twain's A Tramp Abroad. Yay - go me! While the program was only tested with Twain and Austen, I designed most of my Java method calls so that the design can be adapted to work with any sample of authors. So - should I have time over break, I might play around with it to see if I can get it to work with any number of random authors...man, I'm such a geek...