CONTENTS

Title Page
Copyright Page
Preface

1 Overview of DECimage Character Recognition Software
  1.1 What Is DECimage Character Recognition Software?
  1.2 Hardware and Software Requirements for DECimage Character Recognition Software
  1.3 OCR Concepts and Process
    1.3.1 Formats of Visual Data
    1.3.2 OCR Process
  1.4 How DECimage Character Recognition Software Works
    1.4.1 DECimage Character Recognition Software Objects
      1.4.1.1 Image frames
      1.4.1.2 Regions of Interest (ROIs)
    1.4.2 DECimage Character Recognition Software Process
    1.4.3 DECimage Character Recognition Software Services
      1.4.3.1 Page Segmentation Services
      1.4.3.2 Text Recognition Services
      1.4.3.3 Dictionary Support Services
      1.4.3.4 Structure Access Services
      1.4.3.5 Text Export Services
      1.4.3.6 Postprocessing Services

2 DECimage Character Recognition Software Process
  2.1 Segmenting an Image into Regions
    2.1.1 Listing Regions
    2.1.2 Copying and Deleting Regions
  2.2 Recognizing Text
    2.2.1 Feature Extraction
    2.2.2 Creating a Dictionary Context
    2.2.3 Specifying a Language
    2.2.4 Listing Words
    2.2.5 Types of Recognition Errors
  2.3 Exporting Output
  2.4 Deleting Structures

3 Guidelines for Optimizing Text Recognition
  3.1 Checking Image Quality
    3.1.1 Original Document Quality
    3.1.2 Image Input Quality
    3.1.3 Using the Features of DECimage Character Recognition Software
  3.2 Examples of Text Processed by DECimage Character Recognition Software

4 DECimage Character Recognition Software Routines
  IrsCreateDictContext
  IrsDeleteBuffer
  IrsDeleteRegion
  IrsDeleteStruct
  IrsExportASCII
  IrsExportDDIF
  IrsExportPS
  IrsGetRegionList
  IrsGetWordList
  IrsRecognizeText
  IrsSegmentRegion

A Condition Values and Error Messages

B More About Optimizing Recognition
  B.1 Font Typeface
  B.2 Color of Text and Background
  B.3 Dot Matrix Text
  B.4 Text from Fax Printers

C Example Program in VAX C (RECOGNIZE_TEXT_C.C)

Glossary

FIGURES

1-1 DECimage Character Recognition Software Relationship to DECimage Application Services for VMS
1-2 Creating an Image
1-3 Example of Image Data
1-4 Segmentation Regions
1-5 Recognized Text
1-6 DECimage Character Recognition Software Process
1-7 DECimage Character Recognition Software Services
2-1 Sequence of Routines in the DECimage Character Recognition Software Process
2-2 Spacing of Text
2-3 Fonts Returned by DECimage Character Recognition Software
2-4 Similarly-Shaped Characters
3-1 Scanned Image with Errors
3-2 Measuring Font Size
3-3 Comparison of 10-Point Font Size in Different Fonts
3-4 Placement of Original
3-5 Enlarged Text from Low-Quality Image Before Text Recognition
3-6 Text from Low-Quality Image After Text Recognition
3-7 Text from Low-Quality Image After Recognition Using a Dictionary Context
3-8 Enlarged Text from High-Quality Image Before Text Recognition
3-9 Text from High-Quality Image After Text Recognition
4-1 ISO Latin-1 Character Set
B-1 Highly-Stylized Font
B-2 Dot Matrix Characters

TABLES

1-1 Page Segmentation Services Routine
1-2 Text Recognition Services Routine
1-3 Dictionary Support Services Routine
1-4 Structure Access Services Routines
1-5 Text Export Services Routines
1-6 Postprocessing Services Routines
2-1 Returned Fonts
2-2 Recognition Errors
3-1 Most Desirable Characteristics for Text Recognition
3-2 Least Desirable Characteristics for Text Recognition
3-3 Font Size and Resolution Combinations
4-1 Headings in the Routine Template
4-2 DECimage Character Recognition Software Routines
4-3 Export ASCII Flag
4-4 Export DDIF Flag
4-5 Export PostScript Flag
4-6 Fields in the Region List Structure
4-7 Font Style Values
4-8 Fields in the Word List Structure
4-9 Word Type Values
4-10 Font Info Values
4-11 Character Set Values
4-12 Recognize Text Flag
4-13 Resolution Values
4-14 Segment Region Flag