OCR: first results
I managed to get the character recognition to work with a very simple learning step. It consists of clicking on each kind of letter and saying what it stands for.
[Image 6.png](/blog/wp-content/2006/04/Image%206.png "Image 6.png"){.imagelink}
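The learning step above boils down to building a dictionary of labelled glyphs and matching new glyphs against it. Here is a minimal sketch in Python. It assumes each letter has already been cut out and scaled to a fixed-size black-and-white bitmap, and it uses nearest-neighbour matching on Hamming distance, which is an illustrative choice and not necessarily what the program does. `Glyph`, `GlyphDictionary`, `learn` and `recognize` are hypothetical names:

```python
from typing import List, Tuple

# Hypothetical representation: a glyph is a fixed-size binary bitmap,
# flattened row by row into a tuple of 0/1 pixels.
Glyph = Tuple[int, ...]


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two same-sized glyphs."""
    return sum(x != y for x, y in zip(a, b))


class GlyphDictionary:
    """Labelled example glyphs, filled in by the user during learning."""

    def __init__(self) -> None:
        self.examples: List[Tuple[Glyph, str]] = []

    def learn(self, glyph: Glyph, char: str) -> None:
        # The user clicked this glyph and typed the character it stands for.
        self.examples.append((glyph, char))

    def recognize(self, glyph: Glyph) -> str:
        # Return the label of the closest learnt example (nearest neighbour).
        if not self.examples:
            raise ValueError("nothing learnt yet")
        _, char = min(self.examples, key=lambda ex: hamming(ex[0], glyph))
        return char


if __name__ == "__main__":
    d = GlyphDictionary()
    d.learn((0, 1, 0, 1, 1, 1, 0, 1, 0), "+")  # 3x3 plus sign
    d.learn((1, 0, 1, 0, 1, 0, 1, 0, 1), "x")  # 3x3 cross
    print(d.recognize((0, 1, 0, 1, 1, 1, 0, 0, 0)))  # prints +
```

A linear scan over the examples is fine for a few dozen hand-labelled letters; a precomputed dictionary covering many fonts would probably want a faster index.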
Next steps:
- automate the learning: draw the letters of every font onto images and learn them. The resulting dictionary will be precomputed and shipped with the application.
- recognize words: build a histogram of the gaps between contiguous horizontal segments. After smoothing, it should show two peaks: one at small widths, coming from the spaces between letters of the same word, and another, hopefully at much larger widths, coming from the spaces between words. Breaking the stream of segments into words then merely consists of testing whether the gap between two segments falls in this second peak of the histogram.
- build an English and a French dictionary (from public-domain word lists) and feed them to my fuzzy string lookup table.
- find a good algorithm to combine the likelihoods of the individual letters with those of the words.
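The word-splitting idea from the list above can be sketched like this in Python. It assumes each text line has already been cut into horizontal segments, given as hypothetical `(x_start, x_end)` pairs sorted left to right. It smooths the gap histogram with a small moving average, takes the first peak as letter spacing and the highest point after the following valley as word spacing, and puts the cut halfway between the two peaks. The names `gap_threshold` and `split_words`, the smoothing radius and the midpoint rule are illustrative choices, not the program's actual code:

```python
from typing import List, Tuple

# Hypothetical input: horizontal segments (x_start, x_end) of one text line,
# sorted left to right, x_end exclusive, so next.start - prev.end is the gap.
Segment = Tuple[int, int]


def gap_threshold(gaps: List[int], radius: int = 1) -> float:
    """Gap width separating the letter-gap peak from the word-gap peak."""
    size = max(gaps) + 1
    hist = [0] * size
    for g in gaps:
        hist[g] += 1
    # Smooth the histogram with a centred moving average.
    smooth = []
    for i in range(size):
        window = hist[max(0, i - radius):i + radius + 1]
        smooth.append(sum(window) / len(window))
    # Climb to the first peak (assumed: gaps between letters of a word)...
    i = 0
    while i + 1 < size and smooth[i + 1] >= smooth[i]:
        i += 1
    first_peak = i
    # ...then walk down into the valley that follows it.
    while i + 1 < size and smooth[i + 1] <= smooth[i]:
        i += 1
    if i == size - 1:
        return float(max(gaps))  # only one peak: no word breaks on this line
    # Second peak (assumed: gaps between words) = highest point past the valley.
    second_peak = max(range(i, size), key=lambda k: smooth[k])
    # Each gap is assigned to whichever peak it is closer to.
    return (first_peak + second_peak) / 2


def split_words(segments: List[Segment], radius: int = 1) -> List[List[Segment]]:
    if len(segments) < 2:
        return [list(segments)] if segments else []
    gaps = [max(0, b[0] - a[1]) for a, b in zip(segments, segments[1:])]
    threshold = gap_threshold(gaps, radius)
    words = [[segments[0]]]
    for gap, seg in zip(gaps, segments[1:]):
        if gap > threshold:
            words.append([seg])  # gap falls in the word-gap peak: new word
        else:
            words[-1].append(seg)
    return words


if __name__ == "__main__":
    # Three words of 3, 4 and 2 letters; letter gaps of 1-2 px, word gaps of 8-9 px.
    line = [(0, 5), (6, 11), (13, 18), (26, 31), (32, 37),
            (39, 44), (45, 50), (59, 64), (65, 70)]
    print([len(w) for w in split_words(line)])  # prints [3, 4, 2]
```

The midpoint rule simply assigns each gap to the nearer of the two peaks; a two-class threshold such as Otsu's method would be a sturdier alternative on noisy scans.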
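The post doesn't describe how the fuzzy string lookup table works, so this sketch swaps in a plain Levenshtein (edit-distance) search over a word list, just to show the kind of query the English and French dictionaries would answer. Words are bucketed by length, since two words whose lengths differ by more than k cannot be within k edits of each other. `FuzzyLexicon`, `lookup` and `max_dist` are hypothetical names:

```python
from typing import Dict, Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


class FuzzyLexicon:
    """Word list answering 'which words are within k edits of this string?'."""

    def __init__(self, words: Iterable[str]) -> None:
        # Bucket by length so a query only scans lengths that can match.
        self.by_length: Dict[int, List[str]] = {}
        for w in words:
            w = w.lower()
            self.by_length.setdefault(len(w), []).append(w)

    def lookup(self, query: str, max_dist: int = 1) -> List[Tuple[int, str]]:
        q = query.lower()
        hits = []
        for n in range(len(q) - max_dist, len(q) + max_dist + 1):
            for w in self.by_length.get(n, []):
                d = levenshtein(q, w)
                if d <= max_dist:
                    hits.append((d, w))
        return sorted(hits)  # closest first, then alphabetical


if __name__ == "__main__":
    lex = FuzzyLexicon(["hello", "help", "world", "word"])
    print(lex.lookup("wor1d"))  # prints [(1, 'word'), (1, 'world')]
```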
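One simple way to combine letter and word likelihoods, assuming the recognizer's letter scores can be treated as independent probabilities: score every dictionary word of the right length by the sum of the log-probabilities of its letters plus the log of a word prior (for instance its relative frequency), and keep the best. This naive noisy-channel style combination is picked for illustration and is not the algorithm the program uses; `best_word`, the word priors and the `floor` value are hypothetical:

```python
import math
from typing import Dict, List, Optional, Tuple


def best_word(letter_probs: List[Dict[str, float]],
              lexicon: Dict[str, float],
              floor: float = 1e-4) -> Optional[Tuple[str, float]]:
    """Return the word maximising log P(word) + sum_i log P(letter_i), with its score.

    letter_probs[i] maps candidate characters at position i to the
    recognizer's probability for them; lexicon maps words to a prior.
    Characters the recognizer did not propose get a small floor probability.
    """
    best: Optional[Tuple[str, float]] = None
    for word, prior in lexicon.items():
        if len(word) != len(letter_probs):
            continue  # this simple version ignores split or merged letters
        score = math.log(prior)
        for probs, ch in zip(letter_probs, word):
            score += math.log(probs.get(ch, floor))
        if best is None or score > best[1]:
            best = (word, score)
    return best


if __name__ == "__main__":
    # The recognizer cannot tell 'c' from 'e' in the first position,
    # so the word prior breaks the tie.
    ambiguous = [{"c": 0.5, "e": 0.5}, {"a": 0.9}, {"t": 0.9}]
    print(best_word(ambiguous, {"cat": 0.01, "eat": 0.02})[0])  # prints eat
```

Working in log space keeps long words from underflowing, and the floor stops a single unexpected letter from ruling a word out entirely.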
That’s it, and tomorrow morning is going to be really, really harsh!
Tomorrow I will try to post a downloadable demo program…