If your firm regularly scans paper documents and drawings for archival purposes and you would like to make these scanned documents “searchable” from within Newforma Project Center, there are a variety of solutions that use OCR (optical character recognition) technology to do this.

If you have paper drawings that have not yet been scanned, you should try to use a scanner that comes bundled with software that performs the OCR processing during scanning. The scanned documents will most likely be saved as “hybrid” PDF files that contain the original scanned image overlaid on a hidden but selectable and searchable text layer. While the resulting PDF files are searchable from Newforma Project Center, your results may vary depending on the capabilities of the OCR software and the quality of the scanned document.

With recent advancements in OCR software, accuracy is now more a function of the quality of the original paper document. OCR tends to produce great results with crisp scans of documents and drawings that originated from a word processing or CAD application. OCR performs less well on tattered, faded, grayscale, faxed, or handwritten documents.

If you already have a large archive of scanned raster files, or your scanner’s software lacks effective OCR processing capabilities, there are a variety of software applications that provide OCR processing on existing raster files. We have tested recent versions of Adobe Acrobat, which now have OCR capabilities built in. They let you convert any raster-based PDF to a searchable hybrid PDF.
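Before running OCR on an existing archive, it can help to see how many scanned files there are and in which formats. The following is a minimal Python sketch for that inventory step only; it does not perform OCR and is not part of Newforma Project Center or Acrobat. The extension list and the "archive" folder name are assumptions for illustration and should be adjusted to whatever your OCR tool accepts.

```python
from collections import Counter
from pathlib import Path

# Assumed set of scan extensions; adjust to the formats your OCR tool supports.
SCAN_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".bmp"}


def find_scans(archive_root):
    """Return a sorted list of files under archive_root with a scan extension."""
    root = Path(archive_root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SCAN_EXTENSIONS
    )


def count_by_extension(paths):
    """Count files per lowercase extension, e.g. how many .tif versus .pdf."""
    return Counter(p.suffix.lower() for p in paths)


if __name__ == "__main__":
    # "archive" is a hypothetical folder name used only for this example.
    scans = find_scans("archive")
    for ext, n in sorted(count_by_extension(scans).items()):
        print(f"{ext}: {n} file(s)")
```

The resulting list can then be used to decide which folders or formats to hand to a batch OCR process.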
If you need to convert a collection of files, you can use Acrobat’s Batch Processing functionality to build sequences that will OCR-process a selection of supported file formats (PDF, TIF, JPG, BMP, etc.). An added benefit of OCR processing scanned image files is that the resulting PDF is often smaller than the source raster file, in spite of the added text intelligence, because of the image compression that comes along with the OCR process.

http://www.newformant.com/index.php/50/