OCR: Smaller circular projections are ok...
Previously, I was using very detailed circular projections (hundreds of directions), which led to false matches because I had to keep the error threshold high…
Using very small projections (20-30 different projection angles) requires a very low reject threshold, but provides amazingly good results.
(all the “h” were correctly gathered)
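To make the idea concrete, here is a minimal sketch of one possible reading of a "circular projection". It is an assumption for illustration, not my actual code: for each of a small number of directions, the ink pixels of a glyph are projected onto that direction and the extent of the projection is recorded. Two glyphs are gathered together when their signatures differ by less than a low reject threshold. The names `circular_projection`, `signature_distance`, `same_glyph` and the default threshold of `0.05` are all hypothetical.

```python
import math

def circular_projection(glyph, n_angles=24):
    """Signature of a binary glyph (list of rows of 0/1).

    Assumed definition: for each direction, the extent (max - min) of the
    ink pixels projected onto that direction, scaled so the largest is 1.
    """
    ink = [(x, y) for y, row in enumerate(glyph) for x, v in enumerate(row) if v]
    if not ink:
        return [0.0] * n_angles
    extents = []
    for k in range(n_angles):
        # Directions cover half a turn; the other half gives the same extents.
        theta = math.pi * k / n_angles
        dx, dy = math.cos(theta), math.sin(theta)
        proj = [x * dx + y * dy for x, y in ink]
        extents.append(max(proj) - min(proj))
    largest = max(extents)
    # Normalising makes the signature independent of glyph size.
    return [e / largest for e in extents] if largest > 0 else extents

def signature_distance(a, b):
    """Mean absolute difference between two signatures of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def same_glyph(a, b, reject_threshold=0.05, n_angles=24):
    """True when the two glyphs are close enough to be gathered together."""
    sig_a = circular_projection(a, n_angles)
    sig_b = circular_projection(b, n_angles)
    return signature_distance(sig_a, sig_b) <= reject_threshold

vertical_bar = [[1], [1], [1]]
horizontal_bar = [[1, 1, 1]]
print(same_glyph(vertical_bar, vertical_bar))
print(same_glyph(vertical_bar, horizontal_bar))
```

With only a couple of dozen values per glyph, small rendering differences average out, which may be why a strict threshold becomes usable.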
My main concern is now segmenting the page into lines and characters. In many cases, the method I use right now fails to separate letters that are stuck together.
I need to be able to detect words, and to make it possible to request incremental resegmentation of a given segment.
I’d like to connect my existing code to some new higher-level code that would be vocabulary- and grammar-aware, so that the likelihood of a word match, for words containing an unmatched segment/character, would trigger resegmentation of that segment/character.
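As a rough sketch of how the vocabulary side of this could work (nothing here exists yet, and it ignores grammar entirely): a word is a list of recognised characters with `None` marking an unmatched segment, and each vocabulary word that fits around the gap suggests how many characters the gap should be split into. The function name `resegmentation_requests` and the `max_chars` limit are hypothetical.

```python
import re

def resegmentation_requests(segments, vocabulary, max_chars=3):
    """Suggest how unmatched segments should be split.

    segments: recognised characters, with None for an unmatched segment.
    Returns {segment index: set of piece counts suggested by the vocabulary}.
    Only gaps that would need more than one character produce a request.
    """
    # Each unmatched segment may stand for 1..max_chars stuck-together letters.
    pattern = "".join(
        "(.{1,%d})" % max_chars if s is None else re.escape(s)
        for s in segments
    )
    unmatched = [i for i, s in enumerate(segments) if s is None]
    requests = {}
    for word in vocabulary:
        match = re.fullmatch(pattern, word)
        if match:
            for idx, filler in zip(unmatched, match.groups()):
                if len(filler) > 1:
                    requests.setdefault(idx, set()).add(len(filler))
    return requests

# "t?e" could be "the" or "tie" (no split needed) or "tree" (split in two).
print(resegmentation_requests(["t", None, "e"], {"the", "tree", "tie", "cat"}))
```

The segmenter would then be asked to try cutting segment 1 into two pieces and to rerun the matching on the result.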
Stay tuned…