Feature Extraction Based Document Image Processing For OCR
Abstract
Image processing plays a vital role in document image processing systems. For large-scale digitization, various methods are available to produce an electronic version of a paper document, and scanning is one of the most suitable. Optical scanning captures the paper document as a raw image, which is then passed to an optical character recognition (OCR) system. Since a computer cannot directly interpret written documents, they must be converted into electronic documents so that they can be easily processed. OCR converts written text documents into e-documents. In this paper we determine the threshold value of a scanned document image using a global thresholding method based on Otsu's algorithm. Based on the threshold values obtained from the different methods, we can judge the quality of a scanned document image and hence improve it.
Keywords: Scanned documents, OCR, Thresholding, Document image processing.
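To make the global thresholding step concrete, the following is a minimal sketch of Otsu's algorithm, which selects the gray level that maximizes the between-class variance of the image histogram. It is an illustration only and not the implementation used in this paper: the function names `otsu_threshold` and `binarize`, the assumption of 8-bit gray levels supplied as a flat sequence of integers, and the convention that pixels above the threshold map to 1 are all assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat iterable of integers in [0, levels - 1] (assumed
    input format). Pixels <= t form one class, pixels > t the other.
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    if total == 0:
        raise ValueError("image contains no pixels")

    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0          # number of pixels in the lower class
    sum0 = 0        # sum of gray levels in the lower class
    best_t = 0
    best_var = -1.0

    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on

        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between_var = w0 * w1 * (mu0 - mu1) ** 2

        if between_var > best_var:
            best_var = between_var
            best_t = t

    return best_t


def binarize(pixels, t):
    """Map each pixel to 1 if it is above the threshold t, else 0."""
    return [1 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Synthetic example: dark "ink" pixels and bright "paper" pixels.
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("binary sample:", binarize([10, 200], t))
```

Because a single threshold is computed for the whole image, this approach is best suited to scans with reasonably uniform illumination; whether the text ends up as 0 or 1 depends on the scan polarity and can be inverted as needed.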