CONTENTS
Title Page
Copyright Page
Preface

1 Introduction to VMScluster System Management
1.1 Overview of VMScluster Systems
1.1.1 Background
1.1.2 Definition
1.1.3 Uses
1.1.4 Benefits
1.2 Hardware Components
1.2.1 Introduction
1.2.2 Computers
1.2.3 Physical Interconnects
1.2.4 Storage Devices
1.3 Software Components
1.3.1 Introduction
1.3.2 VMScluster Software Functions
1.4 Communications
1.4.1 Introduction
1.4.2 System Communications
1.4.3 Application Communications
1.4.4 Cluster Alias
1.5 System Management
1.5.1 Introduction
1.5.2 Ease of Management
1.5.3 Tools and Utilities
1.5.4 Other Configuration Aids

2 VMScluster Concepts
2.1 VMScluster Design and Implementation
2.1.1 Introduction
2.1.2 Port Layer
2.1.3 SCS Layer
2.1.4 System Applications (SYSAP) Layer
2.1.5 Other Layered Components
2.2 VMScluster Software Functions
2.2.1 Overview
2.2.2 Functions
2.3 Ensuring the Integrity of Cluster Membership
2.3.1 Overview
2.3.2 Connection Manager
2.3.3 Cluster Partitioning
2.4 The Quorum Algorithm
2.4.1 Definition
2.4.2 System Parameters
2.4.3 Calculating Cluster Votes
2.4.4 Example
2.4.5 Quorum Disk
2.4.6 Quorum Disk Watcher
2.4.7 Rules for Specifying Quorum
2.5 State Transitions
2.5.1 Overview
2.5.2 Adding a Member
2.5.3 Losing a Member
2.6 VMScluster Membership
2.6.1 Overview
2.6.2 Cluster Group Number
2.6.3 Cluster Password
2.6.4 Location
2.6.5 Example
2.7 Synchronizing Cluster Functions
2.7.1 Distributed Lock Manager
2.7.2 Functions
2.7.3 System Management of the Lock Manager
2.8 Resource Sharing
2.8.1 Distributed File System
2.8.2 Use with RMS
2.9 Disk Availability
2.9.1 MSCP Server
2.9.2 Device Serving
2.9.3 Enabling the MSCP Server
2.10 Tape Availability
2.10.1 TMSCP Server
2.10.2 Enabling the TMSCP Server
2.11 Queue Availability
2.11.1 Overview
2.11.2 Controlling Queues

3 VMScluster Interconnect Configurations
3.1 Overview
3.1.1 Introduction
3.1.2 In This Chapter
3.2 VMScluster Systems Interconnected by CI
3.2.1 Introduction
3.2.2 Design
3.2.3 Availability
3.2.4 Example
3.2.5 Star Couplers
3.2.6 Configuring Multiple CI Adapters
3.3 VMScluster Systems Interconnected by DSSI
3.3.1 Introduction
3.3.2 Design
3.3.3 Availability
3.3.4 Guidelines
3.3.5 Example
3.3.6 Configuring Multiple DSSI Adapters
3.4 VMScluster Systems Interconnected by LANs
3.4.1 Introduction
3.4.2 Design
3.4.3 Availability
3.4.4 Group Numbers and Passwords
3.4.5 Servers
3.4.6 Satellites
3.4.7 Satellite Booting
3.4.8 Examples
3.4.9 Examples
3.5 Mixed-Interconnect VMScluster Systems
3.5.1 Introduction
3.5.2 Availability
3.5.3 Examples
3.6 Configuring Multiple LAN Adapters
3.6.1 Overview
3.6.2 System Characteristics
3.6.3 System Requirements
3.6.4 Guidelines
3.7 Configuring Highly Available LANs
3.7.1 Guidelines
3.7.2 Selecting MOP Servers
3.7.3 Configuring Two LAN Segments
3.7.4 Configuring Three LAN Segments
3.8 Allowing for LAN Bridge Failover
3.8.1 Overview
3.8.2 Guidelines
3.8.3 System Parameters
3.8.4 Failover Process

4 The VMScluster Operating Environment
4.1 Preparing the Operating Environment
4.1.1 Introduction
4.2 Installing the OpenVMS Operating System
4.2.1 Introduction
4.2.2 System Disks
4.2.3 Where to Install
4.2.4 Information Required
4.3 Installing Software Licenses
4.3.1 Introduction
4.3.2 Guidelines
4.4 Installing Layered Products
4.4.1 Introduction
4.4.2 Procedure
4.5 Configuring and Starting the DECnet Software
4.5.1 Introduction
4.5.2 Configuring DECnet
4.5.3 Starting DECnet
4.5.4 What is the Cluster Alias?
4.5.5 Enabling Alias Operations

5 Preparing a Shared Environment
5.1 Providing Shared Resources
5.1.1 Overview
5.1.2 Shareable Resources
5.1.3 Local Resources
5.1.4 Sample Configuration
5.2 VMScluster Environments
5.2.1 Types
5.3 Directory Structure on Common System Disks
5.3.1 Overview
5.3.2 Directory Roots
5.3.3 Directory Structure
5.3.4 Search Order
5.4 Coordinating Startup Command Procedures
5.4.1 Overview
5.4.2 OpenVMS Startup Procedures
5.4.3 Building Startup Procedures
5.4.4 Combining Existing Procedures
5.4.5 Using Multiple Startup Procedures
5.5 Providing VMScluster System Security
5.5.1 Overview
5.5.2 Security Checks
5.5.3 Security Files
5.6 Files Relevant to VMScluster Security
5.6.1 Security Files
5.7 Network Security
5.7.1 Approaches
5.7.2 Mechanisms
5.8 Coordinating System Files
5.8.1 Guidelines
5.8.2 Procedure
5.8.3 Additional Files
5.9 System Time on the Cluster
5.9.1 Introduction
5.9.2 Setting System Time

6 Cluster Storage Devices
6.1 Introduction
6.1.1 Cluster Accessible Devices
6.1.2 Data File Sharing
6.1.3 Access Methods
6.1.4 Examples
6.1.5 Specifying a Preferred Path
6.2 Specifying Allocation Classes
6.2.1 Introduction
6.2.2 Purpose
6.2.3 Naming Conventions
6.2.4 Syntax
6.2.5 Default Value
6.2.6 Rules for Specifying Values
6.2.7 Assigning Values on Computers
6.2.8 Assigning Values on HSC Subsystems
6.2.9 Assigning Values on HSJ Subsystems
6.2.10 Assigning Values on HSD Subsystems
6.2.11 Assigning Values on DSSI ISEs
6.3 Sample Configurations
6.3.1 DSA Example
6.3.2 Mixed Interconnect Example
6.4 VAX 6000 Tapes
6.4.1 Avoiding Duplicate Names
6.4.2 Specifying a Tape Allocation Class
6.4.3 Ensuring a Unique Access Path
6.5 Served Disks and Tapes
6.5.1 MSCP and TMSCP Servers
6.5.2 Enabling Servers
6.6 MSCP Load Balancing
6.6.1 Calculating Load Ratings
6.6.2 Balancing I/O Load
6.6.3 Load Balance Ratings (VAX Only)
6.7 Managing Shared Disks
6.7.1 Mounting Shared Disks
6.7.2 Examples
6.7.3 Configuring Cluster Disks
6.7.4 Rebuilding Cluster Disks
6.7.5 Rebuilding System Disks
6.8 Shadowing Disks Across a VMScluster
6.8.1 Introduction
6.8.2 Purpose
6.8.3 Shadow Sets
6.8.4 I/O Capabilities
6.8.5 Supported Devices
6.8.6 Other Devices
6.8.7 Mounting
6.8.8 Distributing Shadowed Disks

7 Setting Up and Managing Cluster Queues
7.1 Introduction
7.1.1 Overview of Queue Management
7.1.2 In This Chapter
7.1.3 Controlling Queue Availability
7.2 Starting a Queue Manager and Creating the Queue Database
7.2.1 Starting a Manager
7.3 Starting Additional Queue Managers
7.3.1 Overview
7.3.2 Command Format
7.3.3 Database Files
7.4 Stopping the Queuing System
7.4.1 Introduction
7.4.2 Command
7.5 Moving Queue Database Files
7.5.1 Introduction
7.5.2 Location Guidelines
7.6 Setting Up Printer Queues
7.6.1 Before You Begin
7.6.2 Creating a Queue
7.6.3 Command Format
7.6.4 Ensuring Queue Availability
7.6.5 Examples
7.7 Setting Up Clusterwide Generic Printer Queues
7.7.1 Overview
7.7.2 Sample Configuration
7.7.3 Command Example
7.8 Setting Up Execution Batch Queues
7.8.1 Introduction
7.8.2 Before You Begin
7.8.3 Batch Command Format
7.8.4 Autostart Command Format
7.8.5 Examples
7.9 Setting Up Clusterwide Generic Batch Queues
7.9.1 Overview
7.9.2 Sample Configuration
7.10 Starting Local Batch Queues
7.10.1 Overview
7.10.2 Startup Command Procedure
7.11 Using a Common Command Procedure
7.11.1 Introduction
7.11.2 Command Procedure
7.11.3 Examples
7.11.4 Example
7.12 Disabling Autostart During Shutdown
7.12.1 Failover
7.12.2 Options

8 Configuring a VMScluster System
8.1 The CLUSTER_CONFIG.COM Procedure
8.1.1 Overview
8.1.2 Before Configuring the System
8.1.3 Data Requested
8.1.4 Invoking the Procedure
8.2 Adding Computers
8.2.1 Preparation
8.2.2 Controlling Conversational Bootstrap Operations
8.2.3 Common AUTOGEN Parameter Files
8.2.4 Examples
8.2.5 Adding a Quorum Disk
8.3 Removing Computers
8.3.1 Preparation
8.3.2 Example
8.3.3 Removing a Quorum Disk
8.4 Changing Computer Characteristics
8.4.1 Preparation
8.4.2 Examples
8.5 Creating a Duplicate System Disk
8.5.1 Preparation
8.5.2 Example
8.6 Postconfiguration Tasks
8.6.1 Updating Parameter Files
8.6.2 Shutting Down the Cluster
8.6.3 Shutting Down a Single Node
8.6.4 Updating Network Data
8.6.5 Altering Satellite Local Disk Labels
8.6.6 Changing Allocation Class Values
8.6.7 Rebooting
8.6.8 Rebooting Satellites Configured with OpenVMS on a Local Disk
8.7 Running AUTOGEN with Feedback
8.7.1 Overview
8.7.2 Advantages
8.7.3 Initial Values
8.7.4 Obtaining Reasonable Feedback
8.7.5 Creating a Command File to Run AUTOGEN

9 Building Large VMScluster Systems
9.1 What is a Large VMScluster System?
9.1.1 Overview
9.1.2 Setting Up the Cluster
9.2 General Booting Considerations
9.2.1 Concurrent Booting Activity
9.2.2 Minimizing Boot Time
9.3 Booting Satellites
9.4 Configuring and Booting Satellite Nodes
9.4.1 Preparation
9.4.2 Booting from a Single LAN Adapter
9.4.3 Alternate Adapter Booting
9.4.4 Booting from Multiple LAN Adapters (AXP Only)
9.4.5 Changing the LAN Address in the DECnet Database
9.4.6 Configuring MOP Service
9.4.7 Controlling Satellite Booting
9.5 System-Disk Throughput
9.5.1 Overview
9.5.2 Avoiding Disk Rebuilds
9.5.3 Offloading Work
9.5.4 Configuring Multiple System Disks
9.6 Conserving System Disk Space
9.6.1 Overview
9.6.2 Techniques
9.7 System Parameters
9.7.1 Overview
9.7.2 The SCSCONNCNT Parameter
9.7.3 The SCSBUFFCNT Parameter (VAX Only)
9.7.4 The SCSRESPCNT Parameter
9.8 Minimize Network Instability
9.9 DECnet Cluster Alias

10 Maintaining a VMScluster System
10.1 Overview
10.1.1 Ongoing Management
10.1.2 Backing Up Data and Files
10.1.3 Updating the OpenVMS Operating System
10.1.4 Rolling Upgrades
10.1.5 LAN Network Failure Analysis
10.2 Recording Configuration Data
10.2.1 Overview
10.2.2 Record Information
10.2.3 Satellite Network Data
10.3 Cross-Architecture Satellite Booting
10.3.1 Description
10.3.2 Sample Configurations
10.3.3 Usage Notes
10.3.4 Configuring DECnet
10.4 Controlling OPCOM Messages
10.4.1 Introduction
10.4.2 Overriding OPCOM Defaults
10.4.3 Example
10.5 Shutting Down a Cluster
10.5.1 Overview
10.5.2 Removing a Computer
10.5.3 Cluster Shutdown
10.5.4 Rebooting
10.5.5 Saving AUTOGEN Feedback
10.6 Dump Files
10.6.1 Controlling Size and Creation
10.6.2 Sharing Files
10.7 Maintaining the Integrity of VMScluster Membership
10.7.1 Overview
10.7.2 Cluster Group Data
10.7.3 Example
10.8 Adjusting Maximum Packet Size for FDDI Configurations
10.8.1 System Parameter Settings
10.8.2 Increasing Packet Size
10.8.3 Reducing Packet Size
10.8.4 Editing Parameter Files
10.9 Determining Process Quotas
10.9.1 Introduction
10.9.2 Quota Values
10.9.3 PQL Parameters
10.9.4 Examples
10.10 Restoring Cluster Quorum
10.10.1 Overview
10.10.2 Restoring Votes
10.10.3 Reducing Cluster Quorum Value
10.11 Cluster Performance
10.11.1 Overview
10.11.2 Using the SHOW Commands
10.11.3 Using the Monitor Utility
10.11.4 Using DECamds
10.11.5 Monitoring LAN Activity

A Cluster System Parameters
A.1 Parameters Used in VMScluster Systems
A.1.1 Overview
A.1.2 Values for AXP and VAX Computers

B Building Common Files
B.1 Guidelines
B.1.1 Overview
B.1.2 Building a Common SYSUAF.DAT File
B.1.3 Merging RIGHTSLIST.DAT Files

C Cluster Troubleshooting
C.1 Diagnosing Computer Failures
C.1.1 Overview
C.1.2 Preliminary Checklist
C.1.3 Sequence of Booting Events
C.2 Computer on the CI Fails to Boot
C.3 Satellite Fails to Boot
C.3.1 Displaying Connection Messages
C.3.2 General VMScluster Satellite-Boot Troubleshooting
C.3.3 MOP Server Troubleshooting
C.3.4 Disk Server Troubleshooting
C.3.5 Satellite Booting Troubleshooting
C.3.6 AXP Booting Messages (AXP Only)
C.4 Computer Fails to Join the Cluster
C.4.1 Verifying VMScluster Software Load
C.4.2 Verifying Boot Disk and Root
C.4.3 Verifying SCSNODE and SCSSYSTEMID Parameters
C.4.4 Verifying Cluster Security Information
C.5 Startup Procedures Fail to Complete
C.6 Diagnosing LAN Component Failures
C.7 Diagnosing Cluster Hangs
C.7.1 Overview
C.7.2 Cluster Quorum is Lost
C.7.3 Inaccessible Cluster Resource
C.8 Diagnosing CLUEXIT Bugchecks
C.8.1 What is a Bugcheck?
C.8.2 Conditions Causing Bugchecks
C.9 Diagnosing Port Communication Problems
C.9.1 Port Polling
C.9.2 LAN Communications
C.9.3 System Communications Services (SCS) Connections
C.10 Port Failures
C.10.1 Hierarchy of Communication Paths
C.10.2 Where Failures Occur
C.10.3 Verifying CI Port Functions
C.10.4 Verifying Virtual Circuits
C.10.5 Verifying CI Cable Connections
C.10.6 Diagnosing CI Cabling Problems
C.10.7 Repairing CI Cables
C.10.8 Verifying LAN Connections
C.11 Analyzing Error-Log Entries for Port Devices
C.11.1 Overview
C.11.2 Examine the Error Log
C.11.3 Formats
C.11.4 CI Device-Attention Entries
C.11.5 Error Recovery
C.11.6 LAN Device-Attention Entries
C.11.7 Logged Message Entries
C.11.8 Error-Log Entry Descriptions
C.12 OPA0 Error-Message Logging and Broadcasting
C.12.1 Methods
C.12.2 OPA0 Error Messages
C.12.3 CI Port Recovery

D Sample Programs for LAN Control
D.1 Overview
D.1.1 Introduction
D.1.2 Purpose of Programs
D.2 Starting the NISCA Protocol
D.2.1 Build the Program
D.2.2 Start the Protocol
D.3 Stopping the NISCA Protocol
D.3.1 Build the Program
D.3.2 Stop the Protocol
D.3.3 Verify Successful Execution
D.4 Analyzing Network Failures
D.4.1 Overview
D.4.2 Failure Analysis
D.4.3 How It Works
D.5 Using the Network Failure Analysis Program
D.5.1 Create a Network Diagram
D.5.2 Edit the Source File
D.5.3 Assemble and Link the Program
D.5.4 Modify Startup Files
D.5.5 Execute the Program
D.5.6 Modify MODPARAMS.DAT
D.5.7 Test the Program
D.5.8 Display Suspect Components

E Subroutines for LAN Control
E.1 Overview
E.1.1 Introduction
E.1.2 Purpose of the Subroutines
E.2 Starting the NISCA Protocol
E.2.1 Subroutine Syntax
E.2.2 Status
E.2.3 Error Messages
E.3 Stopping the NISCA Protocol
E.3.1 Subroutine Syntax
E.3.2 Status
E.3.3 Error Messages
E.4 Creating a Representation of a Network Component
E.4.1 Subroutine Syntax
E.4.2 Status
E.4.3 Error Messages
E.5 Creating a Network Component List
E.5.1 Subroutine Syntax
E.5.2 Status
E.5.3 Error Messages
E.6 Starting Network Component Failure Analysis
E.6.1 Subroutine Syntax
E.6.2 Status
E.6.3 Error Messages
E.7 Stopping Network Component Failure Analysis
E.7.1 Subroutine Syntax
E.7.2 Status
E.7.3 Error Messages

F Troubleshooting the NISCA Protocol
F.1 Overview
F.2 How NISCA Fits into the SCA
F.2.1 SCA Protocols
F.2.2 Paths Used for Communication
F.2.3 PEDRIVER
F.3 Addressing LAN Communication Problems
F.3.1 Symptoms
F.3.2 Traffic Control
F.3.3 Preliminary Network Diagnosis
F.3.4 Tracing Intermittent Errors
F.3.5 Checking System Parameters
F.3.6 Channel Timeouts
F.4 Using SDA to Monitor LAN Communications
F.4.1 Isolating Problem Areas
F.4.2 SDA SHOW PORT Command
F.4.3 Monitoring Virtual Circuits
F.4.4 Monitoring PEDRIVER Buses
F.4.5 Monitoring LAN Adapters
F.5 Troubleshooting NISCA Communications
F.5.1 Areas of Trouble
F.6 Channel Formation
F.6.1 How Channels Are Formed
F.6.2 Techniques for Troubleshooting
F.7 Retransmission Problems
F.7.1 Why Retransmissions Occur
F.7.2 Techniques for Troubleshooting
F.8 Understanding NISCA Datagrams
F.8.1 Packet Format
F.8.2 LAN Headers
F.8.3 Ethernet Header
F.8.4 FDDI Header
F.8.5 Datagram Exchange (DX) Header
F.8.6 Channel Control (CC) Header
F.8.7 Transport (TR) Header
F.9 Using a LAN Protocol Analysis Program
F.9.1 Overview
F.9.2 Single or Multiple LAN Segments
F.9.3 Multiple LAN Segments
F.10 Data Isolation Techniques
F.10.1 Overview
F.10.2 All VMScluster Traffic
F.10.3 Specific VMScluster Traffic
F.10.4 Virtual Circuit (Node-to-Node) Traffic
F.10.5 Channel (LAN Adapter-to-LAN Adapter) Traffic
F.10.6 Channel Control Traffic
F.10.7 Transport Data
F.11 Setting Up an HP 4972A LAN Protocol Analyzer
F.11.1 Introduction
F.11.2 Analyzing Channel Formation Problems
F.11.3 Analyzing Retransmission Problems
F.12 Filters
F.12.1 Capturing All LAN Retransmissions for a Specific VMScluster
F.12.2 Capturing All LAN Packets for a Specific VMScluster
F.12.3 Setting Up the Distributed Enable Filter
F.12.4 Setting Up the Distributed Trigger Filter
F.13 Messages
F.13.1 Overview
F.13.2 Distributed Enable Message
F.13.3 Distributed Trigger Message
F.14 Programs That Capture Retransmission Errors
F.14.1 Starter Program
F.14.2 Partner Program
F.14.3 Scribe Program

G PEDRIVER Congestion Control and Channel Selection
G.1 PEDRIVER Congestion Control
G.1.1 Network Congestion
G.1.2 Congestion Caused by Retransmission
G.1.3 HELLO Multicast Datagrams
G.2 Transmit Channel Selection
G.2.1 Channel Selection
G.2.2 Preferred Channel
G.2.3 Restrictions