CONTENTS

Title Page
Copyright Page
Preface

1 Introduction to DCRS
  1.1 What Is DCRS?
  1.2 Hardware and Software Requirements for DCRS
  1.3 OCR Concepts and Process
    1.3.1 Formats of Visual Data
    1.3.2 OCR Process
  1.4 How DCRS Works
    1.4.1 DCRS Objects
      1.4.1.1 Image frames
      1.4.1.2 Regions of Interest (ROIs)
    1.4.2 DCRS Process
    1.4.3 DCRS Services
      1.4.3.1 Page Segmentation Services
      1.4.3.2 Text Recognition Services
      1.4.3.3 Structure Access Services
      1.4.3.4 Text Export Services
      1.4.3.5 Postprocessing Services

2 DCRS Process
  2.1 Segmenting Regions
    2.1.1 Listing Regions
    2.1.2 Copying and Deleting Regions
  2.2 Recognizing Text
    2.2.1 Feature Extraction
    2.2.2 Listing Words
    2.2.3 Types of Recognition Errors
  2.3 Exporting Text
  2.4 Deleting Structures

3 Guidelines for Optimizing Recognition
  3.1 Checking the Quality of the Document and the Scanning Process
    3.1.1 Checking the Quality of the Document
  3.2 Checking the Quality of the Scanning Process
  3.3 Specifying a Language
  3.4 Examples of Text Processed by DCRS

4 DCRS Routines
  IrsDeleteBuffer
  IrsDeleteRegion
  IrsDeleteStruct
  IrsExportASCII
  IrsExportDDIF
  IrsExportPS
  IrsGetRegionList
  IrsGetWordList
  IrsRecognizeText
  IrsSegmentRegion

A Condition Values and Error Messages
B Example Program in VAX C (RECOGNIZE_TEXT_C.C)

Glossary

FIGURES

1-1 DCRS Relationship to DECimage Application Services for VMS
1-2 Example of a Scanned Business Document
1-3 DCRS Process
1-4 DCRS Services
2-1 Sequence of Routines in the DCRS Process
2-2 Regions of a Segmentation Structure
2-3 Spacing of Text
2-4 Fonts Returned by DCRS
2-5 Similarly-Shaped Characters
2-6 Columnized Text
2-7 Exported ASCII Text
3-1 Scanned Image with Errors
3-2 Highly-Stylized Font
3-3 Measuring Font Size
3-4 Enlarged Text From Low-Quality Document Before Recognition
3-5 Text From Low-Quality Document After Recognition
3-6 Enlarged Text From a High-Quality Document Before Recognition
3-7 Text From High-Quality Document After Recognition

TABLES

1-1 Page Segmentation Services Routine
1-2 Text Recognition Services Routine
1-3 Structure Access Services Routines
1-4 Text Export Services Routines
1-5 Postprocessing Services Routines
2-1 Returned Fonts
3-1 Point Size and Scan Resolution Combinations
4-1 Headings in the Routine Template
4-2 DECimage Character Recognition Services Routines
4-3 Export PostScript Flag
4-4 Fields in the Region List Structure
4-5 Font Style Values
4-6 Fields in the Word List Structure
4-7 Word Type Values
4-8 Font Info Values
4-9 Character Set Values
4-10 Resolution Values
4-11 Segment Region Flag