Document analysis and classification in digital form is now an active area of research. Old documents such as magazines, novels, study material, business cards, and postcards can be converted into machine-readable text with optical character recognition (OCR) systems. Segmenting and identifying handwritten or typed characters is difficult for a variety of reasons, and words must be segmented precisely, otherwise the recognition rate decreases. This work presents a methodology that performs line, word, and character segmentation. For line segmentation, a customized horizontal histogram approach has been designed. Words are separated using the regionprops and bounding box method, which provides accurate segmentation. Characters are segmented using vertical projection, with upper and lower modifiers separated using the cut line concept. Lines and words are segmented with good precision and accuracy in typed documents, but in handwritten documents the method also splits separated or broken characters off as extraneous words. Further study is needed to address the many issues that were not identified or solved during this work.
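The projection-based steps named in the abstract can be illustrated with a minimal sketch. It assumes a binarized page stored as a NumPy array in which ink pixels are 1 and background pixels are 0. The function names (find_runs, segment_lines, segment_words, segment_characters) and the min_gap parameter are hypothetical and chosen only for illustration. The sketch uses a plain horizontal histogram rather than the customized one the work describes, it substitutes a gap-width rule on the vertical projection for the regionprops and bounding box step, and it does not attempt the cut line handling of upper and lower modifiers.

```python
import numpy as np


def find_runs(profile, min_gap=1):
    """Return (start, end) pairs, end exclusive, of nonzero runs in a 1-D profile.

    Runs separated by fewer than min_gap zero entries are merged into one run.
    """
    runs = []
    start = None  # index where the current run began, or None if outside a run
    gap = 0       # number of consecutive zero entries seen inside the current run
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The gap is wide enough: close the run at its last nonzero entry.
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The profile ended inside a run; drop any trailing zeros from it.
        runs.append((start, len(profile) - gap))
    return runs


def segment_lines(binary_page):
    """Split a page into text lines using the horizontal histogram (row sums)."""
    return find_runs(binary_page.sum(axis=1))


def segment_words(line_image, min_gap=3):
    """Split a line into words: column gaps of at least min_gap pixels separate words."""
    return find_runs(line_image.sum(axis=0), min_gap=min_gap)


def segment_characters(word_image):
    """Split a word into characters at every empty column of the vertical projection."""
    return find_runs(word_image.sum(axis=0), min_gap=1)
```

Each function returns index ranges rather than cropped images, so a caller can slice the page as binary_page[top:bottom, left:right] at each level. The min_gap value for words would need tuning per document, and a rule of this kind is one place where broken handwritten characters could be mistaken for separate words, as the abstract reports.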
CNN
machine learning
₹12000 (INR)
IEEE 2019