TITLE: [OpenVMS] Cookbook of Performance Slowdown, VAX and Alpha, V6.0 and Up

PRODUCT:    DIGITAL OpenVMS VAX, Versions 6.0 and above
            DIGITAL OpenVMS Alpha, Versions 6.1 and above

COMPONENT:  Performance

SOURCE:     Compaq Computer Corporation

  Note:
    For information on performance tuning on OpenVMS VAX V5.n
    systems, reference another article in the OPSYS database
    with the query:

         V5.n Cookbook Performance Slowdown

    This article is extremely long.  We recommend producing a
    hardcopy for reading.


BACKGROUND:

Occasionally OpenVMS systems may experience performance slowdowns.
The performance degradation is typically due to requirements placed
on one or more of a system's main resources, i.e., CPU, memory, and
the I/O subsystem.  This article describes basic techniques to help
determine the cause of the performance degradation.


PREPARATION:

The following guidelines will help ensure the best possible outcome of
performance analysis and tuning:

- Collect Baseline Performance Metrics

  The most efficient technique for determining if a resource
  limitation exists is by comparing performance metrics which were
  collected during a period of normal system activity to current
  system conditions.

  Unfortunately, you may currently be experiencing performance
  problems and haven't collected any baseline performance metrics.
  If so, alleviate the current problems and plan to collect this
  vital information.  This information will assist in avoiding future
  occurrences by providing a useful resource for monitoring performance.

  Collect the baseline performance metrics during time periods when
  normal resource consumption is at its peak.  If you're unsure of the
  times, evaluate the system's performance for a few days.  After the
  metrics are collected, ensure their continued validity with monthly
  comparisons to the current environment.

  The MONITOR utility may be used for evaluating the operating
  environment and collecting baseline metrics.
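
  As a sketch of how a baseline might be recorded, the first command
  below runs MONITOR without a live display and writes all classes to
  a recording file; the second plays the file back as a summary.  The
  file names, the 60 second interval, the ending time, and the classes
  chosen for playback are illustrative assumptions.

    Example:
      $ MONITOR/NODISPLAY/INTERVAL=60/ENDING=17:00:00 -
               /RECORD=SYS$MANAGER:BASELINE.DAT ALL_CLASSES
      $ MONITOR/NODISPLAY/INPUT=SYS$MANAGER:BASELINE.DAT -
               /SUMMARY=SYS$MANAGER:BASELINE.SUM PAGE,IO,STATES,MODES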

- Run AUTOGEN

  AUTOGEN calculates the initial values for key SYSGEN parameters on
  your particular system.  It can solve performance problems caused
  by the manual modification of parameters, typically done without
  regard for the complex interrelationships between certain parameters.

  AUTOGEN is coded to account for the relationship between system
  resources and their parameters.  This ability, together with its use
  of performance metrics, makes it the best tool to use for parameter
  modification.

  Run AUTOGEN whenever the system's configuration is modified.  This is
  particularly important after a significant change to the user load,
  the installation of new applications, or the addition of memory.
  Run the procedure after 1 to 2 days of a normal system load following
  the change.  This allows for the collection of more feedback data.
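
  One cautious way to do this, sketched here, is to run AUTOGEN in
  feedback mode through the TESTFILES phase, review the report it
  produces, and only then apply the new values.  The choice of phases
  is an illustrative assumption; parameters that are not dynamic take
  effect at the next reboot.

    Example:
      $ @SYS$UPDATE:AUTOGEN SAVPARAMS TESTFILES FEEDBACK
      $ TYPE/PAGE SYS$SYSTEM:AGEN$PARAMS.REPORT
      $ @SYS$UPDATE:AUTOGEN GENPARAMS SETPARAMS FEEDBACK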

- Plan Recovery Procedures and Notify Users

  Tuning an OpenVMS system may require parameter modification, the
  reallocation of user quotas, or the reconfiguration of hardware
  devices.  Some changes may not produce the desired effect and may
  further degrade system performance.  This is not meant to dissuade
  anyone from the analysis and tuning of their OpenVMS system.  It's
  meant to heighten awareness regarding the complexity of system
  tuning and to avoid complications leading to unexpected downtime.

  The following will help ensure success:

     * Always ensure that any vital data has been saved.
     * Notify users of the planned performance analysis and the
       possible impact on the system.
     * Before applying any of this article's solutions, ensure the
       analysis and possible effects on your system are understood.
     * Because any change to a system's parameter file has the potential
       for causing a system hang, you should review the procedures to
       force crash and reboot your system.

       Reference Articles:
         There are many articles in the OPENVMS database which
         contain procedures to force crash and reboot your system.
         These articles may be found using the following search
         string:

                   FORCE_CRASH


STEPS TO INVESTIGATE SYSTEM PERFORMANCE (table of contents):

   1) RAISE YOUR PRIORITY SO YOU CAN LOOK FOR THE CAUSE OF THE PROBLEM
   2) CHECK FOR CONSOLE ERROR MESSAGES RELATING TO PAGEFILE SPACE
   3) CHECK FOR CONSOLE ERROR MESSAGES RELATING TO POOL PROBLEMS
   4) CHECK FOR CONSOLE ERROR MESSAGES RELATING TO AUDIT SERVER ALARMS
   5) CHECK FOR PROCESSES IN SYSTEM RESOURCE WAIT STATES (RWxxx)
   6) CHECK TO SEE IF A PROCESS IS USING A MAJORITY OF THE CPU TIME
   7) CHECK FOR A MEMORY SHORTAGE
   8) CHECK FOR PAGEFILE SPACE GETTING FULL
   9) CHECK FOR HEAVY PAGE FAULTING
  10) CHECK FOR NONPAGED POOL EXPANSION
  11) CHECK FOR HEAVY DIRECT I/O
  12) CHECK FOR SWAPPING
  13) CHECK FOR PAGED POOL DEPLETION
  14) CHECK FOR AUTOMATIC WORKING SET DECREMENTING
  15) CHECK FOR HEAVY BUFFERED I/O
  16) CHECK PROCESSOR MODES
  17) CHECK DECnet ACTIVITY


 1) RAISE YOUR PRIORITY SO YOU CAN LOOK FOR THE CAUSE OF THE PROBLEM

    At the DCL prompt, issue the 'SET PROCESS/PRIORITY=15' command to
    raise your process priority.  This should help improve the system's
    response to your commands, unless the system is hung or a real-time
    process is taking all the CPU time.  This command requires the
    ALTPRI privilege and, depending on the exact nature of the
    slowdown, may take several minutes to execute.

    If you have real-time processes on your system, you may need to
    elevate your priority above them to get timely response.  If this
    is the case, the real-time processes may be in a compute loop
    blocking other lower priority processes from executing.


 2) CHECK FOR CONSOLE ERROR MESSAGES RELATING TO PAGEFILE SPACE

    Typical errors to look for are:

        SYSTEM-W-PAGEFRAG, Pagefile badly fragmented, system continuing
        SYSTEM-W-PAGECRIT, Pagefile space critical, system trying to
                           continue

    These errors indicate that your pagefile is too small to support the
    current activity on the system.

    Reference Article:
      [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                Messages

    (Reference the discussion in "8) CHECK FOR PAGEFILE...")


 3) CHECK FOR CONSOLE ERROR MESSAGES RELATING TO POOL PROBLEMS

        SYSTEM-W-POOLEXPF, Pool expansion failure

    This error indicates that OpenVMS failed to allocate free memory to
    extend non-paged dynamic memory because the free page list contained
    fewer than (64k + MPW_LOLIMIT + FREELIM) pages on OpenVMS VAX, or
    pagelets on OpenVMS Alpha.  Possible causes are:

       1. Insufficient physical memory.
       2. An application or layered product is over allocating non-paged
          dynamic memory and causing expansion.

    (Reference the discussion in "10) CHECK FOR... POOL EXPANSION")


 4) CHECK FOR CONSOLE ERROR MESSAGES RELATING TO AUDIT SERVER ALARMS

      AUDSRV-W-RESCRITICAL, security auditing resources exhausted
                            on journal SECURITY
      AUDSRV-I-RESINFO, resource information n blocks needed, n blocks
                        available

    These errors indicate that the AUDIT_SERVER process has detected
    that free space on the disk containing the audit server log file
    has fallen below the WARNING threshold.  This will be the disk
    defined by the logical VMS$AUDIT_SERVER.

    If this is not addressed immediately and free space falls below
    the defined ACTION threshold, processes will be suspended.

    To correct this problem, log into an account that has OPER as an
    authorized default privilege and delete unnecessary files from
    the disk until the free space exceeds the WARNING threshold.  The
    following command will display the WARNING and ACTION thresholds
    for the disk:

        $ SHOW AUDIT/JOURNAL

    For more information, refer to the "OpenVMS Guide To System
    Security", (AA-Q2HLA-TE), section 9.

    Reference Article:
      [OpenVMS] Methods That Can Be Used To Recover Free Space On The
                System Disk


 5) CHECK FOR PROCESSES IN SYSTEM RESOURCE WAIT STATES (RWxxx)

    Use the DCL command "SHOW SYSTEM" to check process states.  These
    are displayed in the "STATE" column.  Look for any processes in a
    wait state other than HIB, LEF, COM, CUR, or CEF.

      Note:
        It is a normal function of the scheduler to place processes
        in differing wait states for one reason or another.  It's
        when processes become hung in a particular state that you
        begin to see an impact on system resources and a degradation
        in performance.  Therefore, multiple samples of the "SHOW
        SYSTEM" display must be analyzed to determine if a problem
        exists.
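
    One way to gather such samples, sketched below as a short command
    procedure, is to write SHOW SYSTEM output to a new file version at
    a fixed interval and then search all versions for the states of
    interest.  The file name, the 10 second interval, and the sample
    count of 6 are illustrative assumptions.

      Example:
        $ COUNT = 0
        $ LOOP:
        $   SHOW SYSTEM/OUTPUT=SYS$LOGIN:SHOWSYS.LIS
        $   COUNT = COUNT + 1
        $   IF COUNT .GE. 6 THEN GOTO DONE
        $   WAIT 00:00:10
        $   GOTO LOOP
        $ DONE:
        $   SEARCH SYS$LOGIN:SHOWSYS.LIS;* "RW","MUTEX","PFW","COLPG"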

    Resource Wait states which may indicate a problem are:

    RWMPB
    RWMPE
      Waiting for the Modified Page Writer to write the Modified Page
      list (MPL) to the pagefile.  These states usually indicate that
      the pagefile is getting full.

      Reference Article:
        [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                  Messages

    RWNPG
    RWPAG
      "Nonpaged Dynamic Memory full" and "Paged Dynamic Memory full".
      These states indicate that the system's pool areas are insufficient
      and that pool expansion has occurred.

      Reference Articles:
        [OpenVMS] How to Troubleshoot a Process in RWNPG
        [OpenVMS] How to Troubleshoot a Process in RWPAG

    PFW
      Page Fault Wait.  The process must wait for a page to be read
      in from disk.  This state usually occurs when a disk containing
      a pagefile, or installed image files, is overloaded with I/O
      requests or is encountering errors.

      (Reference the discussion in "11) CHECK FOR HEAVY DIRECT I/O")

    FPG
      Free page wait.  A process is waiting for free memory to be
      placed on the free list.  If processes stay in this state, use
      the following questions to investigate why the Free List has
      become depleted:

        1. Has a pagefile become full, i.e., processes in RWMPx states,
           and has the Modified List taken up the rest of free memory?
           (Use the SHOW MEMORY command.)

        2. Is a single process taking up most of memory?  (Use the SHOW
           SYSTEM command and check the "Ph.Mem" column.)

        3. Is SWAPPER able to trim/swap processes to free up memory,
           i.e., do processes have more than WSDEFAULT of memory?
           (Compare the last column, "Ph.Mem", of the SHOW SYSTEM
            display to UAF or PQL_xxx values.)

        4. Has the swap/pagefile become full?  (Use the SHOW MEMORY
           command.)

        5. Does SHOW MEMORY, SHOW ERROR, or ANALYZE/ERROR show any
           memory or disk problems?

      Reference Article:
        [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                  Messages

    RWMBX
      Mailbox full.  This process cannot continue writing to a mailbox
      because the mailbox is full.

      Reference Articles:
        [OpenVMS] How to Troubleshoot a Process in RWMBX
        [OpenVMS] How to Read Mailbox, MBAn:, Data Without Removing
                  the Messages

    RWCSV
      Waiting for the CLUSTER_SERVER process.

      The normal state of the CLUSTER_SERVER process is HIB.  Most
      often processes are hung in this state because the CLUSTER_SERVER
      process is not in HIB, or doesn't exist.  You should check all
      nodes in a cluster to identify which CLUSTER_SERVER process is
      hung or missing.

      Reference Article:
        [OpenVMS] How to Troubleshoot a Process in RWCSV State

    RWSCS
      Distributed Lock Manager wait, waiting to coordinate lock activity
      on a cluster.  Processes in a cluster which make heavy use of
      either locks, or the file system, will often be seen in this state.

      Seeing processes in this state does not always mean that the
      system has a performance problem.  This state may be seen for
      short intervals while processes wait on a resource to become
      available.  Once available, the process will continue.

    MUTEX
      Mutual Exclusion Semaphore.  This state indicates that a process
      has requested exclusive access to a Mutual Exclusion Semaphore
      (MUTEX) that can not be granted, or indicates a process resource
      limitation.

      Reference Article:
        [OpenVMS] How to Troubleshoot a Process in MUTEX State

    RWAST
      The RWAST is a general purpose 'Resource Wait' state.  It indicates
      that the wait is expected to be satisfied by the delivery and/or
      enqueuing of an AST to the process.

      Processes can be in RWAST and have NO impact on the other processes
      in the system.  Unless you know the process has a resource that
      is blocking other processes from executing, it is usually best to
      look for other performance related problems before focusing on a
      process found in RWAST.

      Reference Articles:
        [OpenVMS] How To Troubleshoot A Process In RWAST On VAX or Alpha
        [OpenVMS] How To Troubleshoot a Hung Process

    SUSP
      Indicates that the process was suspended.

      - Processes could be put in this state by the AUDIT_SERVER process
        if disk space on the volume containing its log files reaches the
        ACTION threshold.

        (Reference the discussion in "4) CHECK...AUDIT SERVER ALARMS")

      - A process would also enter a SUSP state if it were the target
        of a SET PROCESS/SUSPEND command.  The command SET PROCESS/RESUME
        should clear this type of suspension.

        Note:
          Some processes, such as ALL-IN-1, use SUSP as their normal
          state when not executing.

    COLPG
      Collided Page Wait.  Several processes incur simultaneous page
      faults on the same shared page.  The first process that faults
      this page will be placed into PFW.  The second and succeeding
      processes enter the COLPG state.

      If several processes are in COLPG, this may indicate that a
      shareable image, pagefile, or pageable section file is on a
      disk that is saturated with I/O.  This could cause a delay in
      the resolution of the initial page fault.

      (Reference the discussion in "11) CHECK FOR HEAVY DIRECT I/O",
       and for the PFW state)


 6) CHECK TO SEE IF A PROCESS IS USING A MAJORITY OF THE CPU TIME

    Use the DCL command "MONITOR PROCESS/TOPCPU" to determine if one
    or more processes are consuming the majority of CPU resources.

    A typical scenario is for one or more compute-intensive processes,
    often BATCH jobs, to block interactive users from being selected
    to run.

    If the "MONITOR PROCESS/TOPCPU" display shows that the processes
    listed are constantly changing, consider adjusting the display
    interval.
    The default interval for the MONITOR utility is 3 seconds.
    Decreasing this interval to 1 second may expose other issues
    contributing to CPU consumption and performance degradation,
    e.g., a noisy line generating many login failures.

      Example:
        Issuing the following command may show a continual flood of
        new process IDs (PIDs), with no associated process names.

           $ MONITOR PROCESS/TOPCPU/INTERVAL=1

    If this is the case, the ACCOUNTING utility may be useful in
    checking for login failures and identifying the device(s)
    generating the noise.
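
      Example:
        The following command, offered as a sketch, lists today's
        login failure records in full so that the terminal or remote
        node involved can be identified.  The /SINCE value is an
        illustrative assumption.

           $ ACCOUNTING/TYPE=LOGFAIL/SINCE=TODAY/FULL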

    Another scenario is for less frequently scheduled, compute-intensive
    processes to periodically consume large portions of CPU resources.
    This can be determined by extending the MONITOR utility's interval,
    which reduces the effects of localized spikes in measuring CPU
    usage.

      Example:
        The following command may reveal processes which aren't normally
        seen using the 3 second default interval:

            $ MONITOR PROCESS/TOPCPU/INTERVAL=30

    If the top CPU processes have low priorities, they are not "hogging"
    the system.  These processes are merely taking available CPU time
    because no process at a higher priority is computable (COM).

      Note:
        Use the DCL command SHOW SYSTEM to determine the priority of
        processes.  Typical priorities for interactive processes
        fluctuate between 4 and 9.

    If CPU time seems evenly distributed between the top CPU users, then
    the top CPU users may all be compute-intensive and blocking other
    processes.

    The CPU resource may simply be overloaded.  This may be revealed
    using the "MONITOR SYSTEM" command to determine the overall CPU
    consumption, in relationship to the number of COM processes.  If
    the CPU is near 100% busy (top left), check to see how many
    processes are in the COM state (top right).  If there are more
    than 6 computable processes, with fairly even consumption of the
    CPU, then the CPU may be saturated by the current work load.

    Some possible workarounds for CPU saturation caused by work load
    are:

      1. Shift the work load to a time when the CPU is less saturated.

      2. Acquire a more powerful CPU, or add another CPU.

      3. Share the work load with another processor.

    Examples and workarounds for some of the most common CPU consumption
    issues:

      - A BATCH job, or several jobs, running CPU intensive programs,
        at priorities equal to or greater than interactive processes,
        can easily consume all CPU resources.

        Limit the impact of batch jobs by doing the following:

          a. Initialize the queues with a lower JOB_LIMIT.  JOB_LIMIT
             specifies the number of concurrent BATCH jobs from a queue.
             Restricting the JOB_LIMIT during prime work hours could
             improve interactive performance.

          b. Initialize the queues with a lower BASE_PRIORITY.  Due to
             automatic priority boosting in OpenVMS, it's suggested that
             BASE_PRIORITY be set to 1 or 2.
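
        As a sketch, both limits can also be applied to an existing
        queue with SET QUEUE and then verified.  The queue name
        SYS$BATCH and the values shown are illustrative assumptions.

          Example:
             $ SET QUEUE/JOB_LIMIT=2/BASE_PRIORITY=2 SYS$BATCH
             $ SHOW QUEUE/FULL SYS$BATCH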

     - If SWAPPER is the largest consumer of CPU resources, the system
       may be short on memory and SWAPPER is trying to reclaim it.

       (Reference the discussions in sections 7, 8, 10, and 12.)

     - If OPCOM is one of the top CPU consumers, determine if security
       options are activated.  If so, you may want to reconsider
       activating security options, or restrict the number of options
       activated.

     - If the system is compute bound with a mixture of interactive,
       I/O bound, and compute bound processes, the SYSGEN parameter
       QUANTUM may be reduced to improve the response time for the
       interactive users.

       QUANTUM determines the time slice for non-real-time processes,
       i.e., processes with a base priority between 0 and 15.  The
       default value is 20, which represents 200 millisecond time
       slices.

       By reducing QUANTUM you reduce the time a compute-bound process
       uses the CPU without interruption.  Conversely, the larger you
       set QUANTUM, the longer a compute bound process can use the CPU
       without interruption.

         Note:
           Reducing QUANTUM will increase scheduling overhead because
           processes have to be rescheduled more often.

       QUANTUM is a dynamic parameter, i.e., it can be adjusted on the
       active system.  If this parameter is modified and yields no
       noticeable effect, it should be reset to its original value.

       If modifying the parameter causes a noticeable change, then do
       the following to ensure it's maintained across boots:

         1. Use SYSGEN to update the parameter file.

               $ MCR SYSGEN
               SYSGEN> SET QUANTUM <value>
               SYSGEN> WRITE CURRENT
               SYSGEN> EXIT

         2. Include the new value in SYS$SYSTEM:MODPARAMS.DAT to
            preserve the change when AUTOGEN is run.  Adding a note
            explaining the change is also suggested.
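
            A minimal sketch of such an entry follows.  The value 10
            and the comment text are illustrative assumptions.

               QUANTUM = 10    ! Lowered from the default of 20 to
                               ! improve interactive response time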

       Noticeable improvement in response time for interactive processes,
       perceived to be related to the modification of QUANTUM, may be
       subjective, since system activity is constantly changing.  The
       change may, or may not, benefit the system in the long run.
       Also, larger systems are more likely to benefit from lowering
       QUANTUM.

         Note:
           The SYSGEN parameter PRIORITY_OFFSET specifies the priority
           difference required before a process can preempt the current
           process.


 7) CHECK FOR A MEMORY SHORTAGE

    Memory is the most controllable resource on the system.  Because
    of this, it's the most dynamic and has the potential for being
    the most mismanaged.

    To determine if memory is a possible cause of performance
    degradation, use the DCL command MONITOR PAGE and observe the
    Free Page List (FPL) dynamics.

    If the FPL meets one of the following conditions, you may have
    cause for concern:

      1. Fluctuates above and below twice the value of the SYSGEN
         parameter FREEGOAL.

      2. Is constantly below twice the value of the SYSGEN parameter
         FREEGOAL.

      3. Is constantly below the value of the SYSGEN parameter FREEGOAL.

      4. Is constantly below the value of the SYSGEN parameter FREELIM.

    The conditions are listed in ascending order of severity, 4 being
    the most critical.

      Note:
        Conditions 3 & 4 may also be accompanied by outswapped processes.
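
    As a sketch, the thresholds used in the conditions above can be
    read from the active system with SYSGEN and then compared with the
    "Free List Size" reported by MONITOR PAGE.

      Example:
         $ MCR SYSGEN
         SYSGEN> SHOW FREEGOAL
         SYSGEN> SHOW FREELIM
         SYSGEN> EXIT
         $ MONITOR PAGE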

    If one of the above conditions exists, a search should be made to
    determine the cause of the memory consumption.

       - Use the DCL command SHOW SYSTEM to determine if any single
         user or process is consuming large amounts of memory.

         The display can also show if more users than expected are on
         the system.

       - Use the DCL commands SHOW MEMORY and SHOW MEMORY/POOL/FULL to
         determine if the system has acquired a larger than normal
         share of memory.

     Reference Articles:
       [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                 Messages
       [OpenVMS] Details On How Proactive Memory Reclamation Works


 8) CHECK FOR PAGEFILE SPACE GETTING FULL

    Use the DCL command SHOW MEMORY/FILES to determine pagefile use.
    Typically the pagefiles on a busy system should be approximately
    50% free.  However, on large memory systems with pagefile sizes
    nearing 1 Meg, 25-30% free may suffice.  Observation and experience
    are the best tools to determine if the 25-30% metric is suited for
    your environment.

    If the pagefile becomes overallocated, the system slows down and
    may hang.  Processes may be seen in RWMPB or RWMPE states.

    Reference Articles:
      [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                Messages
      [OpenVMS] How To Determine PAGEFILE and SWAPFILE Usage From SDA

    If a system is operating with insufficient pagefile space, then
    pagefile space must be increased.  Depending on the extent of the
    problem the system may have to be rebooted.

    (Reference the discussion in "2) CHECK FOR CONSOLE ERROR MESSAGES
     RELATING TO PAGEFILE SPACE")

    Use the SYSGEN command CREATE to extend an existing pagefile, or
    create a new one.

      Example:
        The following command will extend a pagefile if the <filespec>
        already exists, and the value provided with the /SIZE qualifier
        is larger than the existing file.  If the <filespec> doesn't
        exist, a new file will be created.

             SYSGEN> CREATE <filespec>/SIZE=<n>
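
        As a sketch, a newly created secondary pagefile is not used
        until it is installed.  The device, directory, file name, and
        size below are hypothetical.  To have the file installed at
        each boot, the INSTALL command is typically also added to
        SYS$MANAGER:SYPAGSWPFILES.COM.

             $ MCR SYSGEN
             SYSGEN> CREATE DKA100:[SYSEXE]PAGEFILE2.SYS/SIZE=200000
             SYSGEN> INSTALL DKA100:[SYSEXE]PAGEFILE2.SYS/PAGEFILE
             SYSGEN> EXIT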

    The following should be considered in determining if a pagefile
    should be extended, or a new one created:

      -  If a pagefile is on the system disk which also contains a
         dumpfile and disk space is limited, the pagefile and dumpfile
         can be combined, or both can be relocated off of the system
         disk.

         Reference Articles:
           [OpenVMS] Managing Dumpfiles on VAX & Alpha Systems or
                     Clusters
           [OpenVMS] Enabling Dumpfiles Off the System Disk (DOSD)

      -  If a pagefile is on a disk which incurs heavy I/O, move the
         pagefile to a less active drive.

         Reference Article:
           [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                     Messages (see "CAUSE 4")

      -  If the pagefile is also being used for swapping, increase the
         size of the swapfile, or create a new one.

           Note:
             Use the same command to create or extend a swapfile as
             you would a pagefile (see above).

      -  To decrease demand on pagefiles, increase the working set for
         processes with high page fault rates.

         Reference Article:
           [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                     Messages (see "CAUSE 2")

    TEMPORARY FIX TO ALLEVIATE LOW PAGEFILE SPACE

    If the following conditions are true:

      *  The system has lots of free memory available.

      *  The system cannot be currently rebooted.

      *  A secondary pagefile cannot be created due to lack of disk
         space or PAGFILCNT is too small to allow another pagefile
         to be installed.

      *  The primary pagefile needs only a "little" more pagefile space
         to keep everyone going to completion.

      *  All non-essential users have been logged off the system to
         recover their pagefile space.

    If all of these conditions are true, increase the threshold for
    the Modified Page List, which determines when processes are placed
    in an RWMPx state.

      Example:
       Issue the following commands:

         $ SHOW MEMORY                   !See available Free Memory
         $ RUN SYS$SYSTEM:SYSGEN
         SYSGEN> SHOW MPW_WAITLIMIT      !Get current waitlimit.
         SYSGEN> SET MPW_WAITLIMIT 1500  !Set it to 2 or 3 times
                                         !the current value, but
                                         !less than free memory
                                         !available.
         SYSGEN> WRITE ACTIVE            !Set the value on live system.
         SYSGEN> EXIT

    If the Modified Page List continues to grow, increase MPW_WAITLIMIT
    again.

    If processes enter the RWMPx state, increase MPW_LOWAITLIMIT to the
    value of MPW_WAITLIMIT.  This may allow processes to complete
    normally.

        CAUTION:
          This workaround is intended only as a *TEMPORARY FIX* until
          the pagefile space can be increased.  If it fails, the system
          may hang.  If a reboot is required, force a crash so the dump
          can be examined.

    Reference Articles:
      [OpenVMS] How To Create Secondary Page/Swap Files and Remove
                Primaries
      [OpenVMS] How To Move Satellite Page/Swap Files From System Disk
                To Local Disk
      [OpenVMS] VIRTUALPAGECNT, PGFLQUO, PAGEFILE:  How Are They Related?
      [OpenVMS] Monitoring Page And Swap File Usage From A Command
                Procedure
      [OpenVMS] What Is The Maximum Size For An Installed Pagefile?
      [OpenVMS] How To Determine PAGEFILE and SWAPFILE Usage From SDA

      There are also many articles in the OPENVMS database containing
      procedures to force crash and reboot your system.  These articles
      may be found using the following search string:

                   FORCE_CRASH


 9) CHECK FOR HEAVY PAGE FAULTING

    Page faulting occurs when a process references a page which does
    not exist in its working set, i.e., an invalid page.  When this
    occurs the process must wait for Memory Management to resolve the
    page fault by validating the referenced page.

      Note:
        To better understand working set dynamics, see the first three
        questions in the "Detailed Analysis" section, of the following
        article:

          [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                    Messages

    Due to the way OpenVMS implements working set management, the page
    fault may be resolved from one of 4 places:

         Free Page List (FPL)
         Modified Page List (MPL)
         Pagefile
         Disk file

    If the page fault is resolved from the FPL or MPL (memory resident
    caches), it's considered a "Soft" fault.  If resolved from the
    Pagefile or disk file, it's considered a "Hard" fault.  Hard faults
    are more costly than Soft faults in time and consumption of
    resources, because an I/O is required for their resolution.

    Resolving page faults uses CPU power.  Resolving a higher than
    normal number of page faults has a negative impact on system
    performance.  If this is further aggravated by "Hard" faults, the
    performance impact could be significant on a poorly tuned system,
    or one with insufficient memory.

    Use the MONITOR SYSTEM/ALL command to show the overall "Page Fault
    Rate".

      Example:
        $ MONITOR SYSTEM/ALL
                          OpenVMS Monitor Utility
                             SYSTEM STATISTICS
                                  CUR        AVE        MIN        MAX
        Interrupt Stack          8.85       8.82       5.79      10.90
        MP Synchronization       5.24       4.84       3.96       6.05
        Kernel Mode             21.14      21.89      18.84      25.16
        Executive Mode           0.00       0.09       0.00       0.49
        Supervisor Mode          0.00       0.02       0.00       0.16
        User Mode                3.27       3.22       1.82       4.46
        Compatibility Mode       0.00       0.00       0.00       0.00
        Idle Time              361.31     360.71     355.15     364.46
        Process Count          274.00     274.00     274.00     274.00
        Page Fault Rate          0.00       1.41       0.00       5.61
        Page Read I/O Rate       0.00       0.02       0.00       0.16
        Free List Size       19122.00   19123.42   19122.00   19124.00
        Modified List Size    1508.00    1508.14    1508.00    1508.00
        Direct I/O Rate          0.00       0.63       0.00       2.12
        Buffered I/O Rate        0.81       1.74       0.33       3.60

    The "Page Read I/O Rate" is the simplest indicator of Hard faults.
    To determine if there's excessive Hard faulting, calculate the
    percentage of Hard faults in relationship to the "Page Fault Rate".

    On OpenVMS VAX, the acceptable percentage of Hard faults is 10%;
    on OpenVMS Alpha, it is 3-5%.

         Hard_Fault% = ((Page Read I/O Rate)/(Page Fault Rate)) x 100
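
      Example:
        Using the AVE column of the MONITOR display above as
        illustrative input:

           Hard_Fault% = (0.02 / 1.41) x 100 = approximately 1.4%

        This is below both thresholds, so excessive Hard faulting is
        not indicated for that sample.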

    To determine if a system's poor performance is due to excessive page
    faulting, compare the overall "Page Fault Rate" to the percentage
    of "Kernel Mode" in the MONITOR display.  If "Kernel Mode" accounts
    for 25-40% of the CPU, and the "Page Fault Rate" is greater than
    (CPU_FACTOR x 100), then page faulting may need to be reduced.

      Notes:
        - A CPU_FACTOR table has been included at the end of this
          article.

        - Some database and office application products can cause
          higher percentages of "Kernel Mode".

        - Some Alpha platforms incorporating VLM can withstand page
          fault rates far beyond the calculated thresholds in the
          paragraph above.

    If excessive page faulting is indicated, investigate the following:

    - PROCESS WORKING SETS

      Increase the WSQUOTA for those processes which frequently appear
      in the MONITOR PROCESS/TOPFAULT display, and decrease the WSQUOTA
      for less active processes.

    - PROACTIVE MEMORY RECLAMATION

      Set the MMG_CTLFLAGS parameter to enable memory reclamation.

    - THE SYSTEM WORKING SET IS TOO SMALL

      Use the MONITOR PAGE command and check the "System Fault Rate"
      to determine if the system's fault rate is greater than 3 per
      second, on average.  If so, use AUTOGEN to increase the parameter
      SYSMWCNT.

    - THE SIZE OF PHYSICAL MEMORY MAY BE INSUFFICIENT

      If the current workload is considered normal for the system, then
      physical memory may have to be increased to accommodate users.
      This can also cause an increase in the Hard fault rate.

    - APPLICATION DESIGN

      Some application designs induce heavy page faulting.  In
      particular, AI and CAD applications.  If these types of
      applications exist, they may be the cause of the high page
      fault rate.

    - THE VIOC MAY BE TOO LARGE

      Use the SHOW MEMORY/CACHE/FULL command to determine if the size
      of VIOC is inhibiting working set growth.

      Reference Articles:
        [OpenVMS] How to Interpret Info From SHOW MEMORY/CACHE/FULL
                  on Alpha
        [OpenVMS] How to Interpret Info From SHOW MEMORY/CACHE/FULL
                  on VAX
        [OpenVMS] Memory Reclamation From The Virtual I/O Cache (VIOC)

    - THE MODIFIED PAGE LIST MAY BE TOO SMALL

      Look at the bottom of the MONITOR PAGE display and observe the
      Modified List Size.  If it oscillates, then the MPL may be too
      small.  A small MPL, or its associated parameters not being set
      correctly, will cause frequent flushing to disk, increasing the
      Hard fault rate.

      Reference Article:
        [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
                  Messages


10) CHECK FOR NONPAGED POOL EXPANSION

    Nonpaged pool contains data structures which must be memory resident.
    These structures are used by components running in system context,
    e.g., UCB's, IRP's, etc., or are shared by multiple processes.  The
    pool consists of 80 fixed-length lists and 1 variable-length list.
    The 80 fixed-length lists range in size from 64 to 5120 bytes, in
    increments of 64 bytes.  A population history for the 80 fixed-length
    lists is maintained in SYS$SYSTEM:LISTPREPOP.DAT.  This file attempts
    to repopulate the fixed-length lists during boot.

    In OpenVMS VAX versions prior to v6.0, nonpaged pool consisted of a
    variable-length list and 3 fixed-length lists, i.e., SRP's, IRP's,
    and LRP's, each initialized and controlled via their own set of
    SYSGEN parameters. With constant vigilance and multiple adjustments,
    these lists could be tuned for peak performance in relationship
    to the consumption of memory and CPU resources.

    OpenVMS VAX, v6.0 and OpenVMS Alpha, introduced "Adaptive Pool
    Management", simplifying the system management overhead required
    for its maintenance.

    Reference Article:
      [OpenVMS] Adaptive Pool Management: A Description of Nonpaged
                Dynamic Memory

    OpenVMS Alpha, v7.1, introduced the following SYSGEN parameters
    which enhance the algorithms for reclaiming space from the lookaside
    lists:

         NPAG_GENTLE
         NPAG_AGGRESSIVE
         NPAG_INTERVAL

    These parameters can be used to control the "Free Blocks on
    Lookasides" and are defined in detail in the "OpenVMS Version
    7.1 Release Notes".

    However, 2 facts remain the same in relationship to nonpaged pool
    and system performance:

      1. If the pool has to expand, system performance will be affected
         during the expansion epoch.

      2. Excess pool expansion, due to either overall demand or excessive
         code allocation, can consume the memory resource to the point
         where performance is degraded.  The resource consumption can be
         severe enough to cause system hangs or crashes.

         (Reference the discussion in "3) CHECK FOR CONSOLE ERROR
          MESSAGES RELATING TO POOL PROBLEMS".)

    To determine if pool expansion is affecting, or has affected system
    performance use the DCL command SHOW MEMORY/POOL/FULL.

    If the "Current Size (bytes)" is greater than the "Initial Size
    (NPAGEDYN)", pool expansion has occurred and performance was
    affected.  However, it's important to note that SOME POOL EXPANSION
    IS ACCEPTABLE and not always indicative of a poorly tuned pool.
    The expansion may be from a singular demand for resources in order
    to test new code, or a temporary shift in workload.  It's also
    important to note that PERFORMANCE WAS ONLY AFFECTED DURING THE
    EXPANSION EPOCH.  If no further expansion is observed over a period
    of time, then it's safe to assume that system performance is no
    longer being affected.

    There's cause to monitor pool expansion if it's greater than 10-15%.
    Determine if it was uncharacteristic expansion by monitoring the
    size after it's been reinitialized by a reboot (Wait for a scheduled
    reboot.  A 10-15% expansion in nonpaged pool doesn't justify a
    reboot.).  If it expands again by 10-15%, then AUTOGEN should be
    run to tune nonpaged pool.
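
    The expansion percentage can be worked out from the two sizes shown
    by SHOW MEMORY/POOL/FULL.  The Python sketch below is illustrative:
    the function names and byte counts are hypothetical, and using the
    lower end of the 10-15% range as the default limit is an assumption.

```python
def pool_expansion_percent(initial_size_bytes, current_size_bytes):
    """Return growth of nonpaged pool relative to its initial size."""
    if initial_size_bytes <= 0:
        raise ValueError("initial size must be positive")
    growth = current_size_bytes - initial_size_bytes
    return 100.0 * growth / initial_size_bytes


def expansion_warrants_monitoring(initial_size_bytes, current_size_bytes,
                                  limit_percent=10.0):
    """True when expansion exceeds the chosen limit (10% is an assumption)."""
    percent = pool_expansion_percent(initial_size_bytes, current_size_bytes)
    return percent > limit_percent


if __name__ == "__main__":
    # Hypothetical sizes: 2,000,000 bytes initially, 2,300,000 bytes now.
    print(pool_expansion_percent(2000000, 2300000))
    print(expansion_warrants_monitoring(2000000, 2300000))
```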

      Note:
        Ensure that the file SYS$SYSTEM:MODPARAMS.DAT does not contain
        hardcoded entries for NPAGEDYN or NPAGEVIR.  MIN_ values are
        acceptable.

    To preclude possible problems in the population history file for
    nonpaged pool, delete SYS$SYSTEM:LISTPREPOP.DAT after AUTOGEN has
    run just prior to the system reboot.

    Reference Articles:
      [OpenVMS] NPAGEDYN Fragmented as a Result of LISTPREPOP.DAT
      [OpenVMS] Excessive NPAGEDYN Expansion on OpenVMS VAX
      [OpenVMS] V6.n Pool Expansion Problem - Free Packets On Lookaside
                Lists

    If nonpaged pool is constantly expanding, use the System Dump
    Analyzer (SDA) in an attempt to determine what types of data
    structures are consuming the space.

      Example:
        $ ANALYZE/SYSTEM
        SDA> SET OUTPUT <filename>   ! Direct output to a data file
        SDA> SHOW POOL/SUMMARY       ! Write information to the file
        SDA> SET OUTPUT TT:          ! Redirect output to screen
        SDA> EXIT

      Note:
        Symbols appearing in this listing are defined in another database
        article.

             [OpenVMS] What Do SDA's SHOW POOL/SUMMARY Symbols Mean?

    Once it has been determined what structures are occupying all the
    space, any structures which are taking up too much space can be
    investigated.  For example, the WCB (Window Control Block) structure
    increases as files get more fragmented.  If this structure is taking
    up too much pool space, you may need to compress your disk by doing
    an IMAGE BACKUP and RESTORE.


11) CHECK FOR HEAVY DIRECT I/O

    Direct I/O performance is one of the main contributors to the
    overall efficiency of your system.  Direct I/O is typically
    attributed to disk and tape devices and is deemed "Direct I/O"
    because of the direct transfer of data between the user's buffer
    and the device, i.e., little or no non-paged pool buffering, or
    CPU intervention is required.

    In evaluating system performance, disk I/O will be the main focus.

    A poorly managed I/O subsystem can degrade the use of both the CPU
    and memory resource.  If disk data is allowed to become fragmented,
    CPU power is consumed compensating for the fragmented I/O, which
    robs users of CPU power.  Bottlenecks in the I/O subsystem also
    waste CPU power as users wait for I/O completion before they can
    become computable.  These same bottlenecks may also inhibit the
    efficient management of memory.

    Traditional methods for analyzing direct I/O performance center
    around monitoring the number and types of I/O to a given device.
    Given the wide variation of acceptable I/O rates for devices,
    e.g., from 20-25 for the RA81 to 140-144 for the RRD44, a
    simpler measurement would be to determine if I/O's must wait
    to be processed.

    The command MONITOR DISK/ITEM=QUEUE can be used to determine if
    disk I/O to a specific device is affecting the system's performance.
    This display shows I/O's that the device couldn't handle because
    it was processing prior requests.

    The following should be investigated if any disk has an "AVE" value
    of 1 or higher:

     a. Use the command MONITOR PROCESS/TOPDIO to determine which
        processes are generating the most direct I/O.  If they're
        accessing the disk in question, attempt to relocate their
        files to a less active device.

     b. Use the SHOW DEVICE/FILES <devname> command to determine which
        files are being accessed.

        - If the device contains system files, i.e., page, swap, or
          log files, relocate them to a less active device.

          Note:
            If activity to a page, swap, or log file is excessive,
            address that limitation prior to correcting any I/O
            limitations.

        - If the device contains application data, the methods used for
          accessing this data may require analysis.

    Other areas impacting disk I/O performance are fragmentation and
    caching.

    1. Use the MONITOR IO command and check the SPLIT TRANSFER RATE.
       Split transfers are due to file fragmentation and the inability
       to complete the I/O from a single file extent.  Additional I/O's
       are required to account for the fragmentation.  The acceptable
       Split Transfer Rate is 5.

       Alleviate fragmentation with regularly scheduled backup and
       restore operations.

         Note:
           SCSI disk class drivers may limit the size of the disk
           transfer.  Large split I/O rates on SCSI based disks used
           for paging, or containing applications which do large
           transfers, may not be due to file fragmentation.

   2. Use the MONITOR FCP command to determine the Window Turn Rate.
      Window turns occur when more retrieval pointers for an active
      open file need to be read.  This may also be indicative of file
      fragmentation.  A Window Turn Rate in excess of 6 should be
      investigated.

      - If file fragmentation is a problem, compress the disks with
        an IMAGE BACKUP and RESTORE.

          Note:
            Open files are ignored by Disk compression utilities and
            may never get compressed.  This may contribute to a
            fragmentation problem.

     - Contiguous files larger than 458745 blocks, on a disk with the
       default number of mapping pointers per window, 7, may also cause
       a high Window Turn Rate.

       This can be prevented by increasing the number of mapping
       pointers in the window.  These can be increased to the size
       of the file in blocks, divided by 65000, on a system-wide
       basis, disk-by-disk, or for specific files.

       System-Wide:
         Set the SYSGEN parameter ACP_WINDOW to the number of mapping
         pointers desired per window.  This change should be added to
         MODPARAMS.DAT and AUTOGEN run to adjust the size of non-paged
         pool.

       Disk-by-Disk:
         Include the /WINDOW=n qualifier to the MOUNT command of the
         disk.  This can be added to the file used for mounting the
         disk at startup, or dynamically with the SET VOLUME/WINDOW=n
         command.

       Specific Files:
         Set FAB$B_RTV to the desired number of mapping pointers when
         opening the file for access (see the "VMS Record Management
         Services Reference Manual" for more).
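
       As a rough illustration, the pointer count suggested above can
       be computed in Python.  The function name and the sample file
       size are hypothetical, and rounding up to a whole pointer is an
       assumption; the divisor of 65000 is the one given in the text.

```python
BLOCKS_PER_POINTER_ESTIMATE = 65000  # divisor given in the text


def mapping_pointers_needed(file_size_blocks):
    """Estimate mapping pointers for a contiguous file of the given size."""
    # Integer ceiling division, so a partial pointer counts as a whole one.
    pointers = -(-file_size_blocks // BLOCKS_PER_POINTER_ESTIMATE)
    return max(1, pointers)


if __name__ == "__main__":
    # Hypothetical contiguous file of 1,000,000 blocks.
    print(mapping_pointers_needed(1000000))
    # The 458745-block figure in the text equals 7 pointers x 65535 blocks.
    print(7 * 65535)
```

       For the sample file this gives 16 pointers, which could then be
       applied with whichever of the three methods above suits the
       scope of the change.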

   3. Use the MONITOR FILE_SYSTEM_CACHE command to determine the
      effectiveness of the file system cache.  If the caches are
      ineffective, delays occur whenever the disk is accessed for
      more information.  The cache is considered ineffective if
      the "Hit %" is lower than 80, with an "Attempt Rate" greater
      than, or equal to 1 per second (using the "AVE" column).

        Note:
          Ignore any cache which averages under 1 attempt per second.
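
      The test described above can be sketched in Python as follows.
      The function name, the dictionary layout, and all of the sample
      figures are hypothetical; only the 80% and 1-per-second limits
      are taken from the text.

```python
def ineffective_caches(cache_stats):
    """Return names of caches that fail the Hit % test in the text.

    cache_stats maps a cache name to (hit_percent, attempt_rate), both
    read from the AVE column.
    """
    flagged = []
    for name, (hit_percent, attempt_rate) in cache_stats.items():
        if attempt_rate < 1.0:
            continue  # caches averaging under 1 attempt per second are ignored
        if hit_percent < 80.0:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical AVE figures for three caches.
    sample = {
        "File Hdr": (62.0, 4.1),   # busy and below 80%: flagged
        "Cache A": (95.0, 3.2),    # busy but effective
        "Cache B": (10.0, 0.2),    # poor hit rate but too idle to matter
    }
    print(ineffective_caches(sample))
```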

      If one or more of the caches is deemed ineffective, increase
      the related ACP_cache SYSGEN parameter.

        Example:
          In order to increase the effectiveness of the "File Hdr"
          cache, increase the SYSGEN parameter ACP_HDRCACHE.  See
          the "Guide to OpenVMS Performance Management", Section 5,
          for a complete SYSGEN parameter to ACP cache association.

      Add any parameter changes to MODPARAMS.DAT and run AUTOGEN.

      It's also possible that file system caches failed to allocate
      at system startup.  This can be determined by issuing the DCL
      command SHOW DEVICE/FULL <disk-name> and checking the field
      "Maximum buffers in FCP Cache".  This field shows the number of
      buffers currently allocated for the cache.  If this value is 14,
      there was insufficient PAGEDYN to allocate the entire cache.
      If this has occurred the SYSGEN parameter PAGEDYN should be
      increased, even if there's an abundance of free paged dynamic
      memory.

        Note:
          Current values for paged dynamic memory may be viewed with
          the DCL command SHOW MEMORY/POOL/FULL.

   4. Use the command SHOW MEMORY/CACHE/FULL to determine if Virtual
      I/O Cache can be enabled, or enhanced, in an attempt to reduce
      the I/O rates on frequently accessed files.

      Reference Articles:
        [OpenVMS] How to Interpret Info From SHOW MEMORY/CACHE/FULL
                  on Alpha
        [OpenVMS] How to Interpret Info From SHOW MEMORY/CACHE/FULL
                  on VAX

   5. Systems in a cluster sharing disks increase the possibility of
      I/O bottlenecks.  To determine I/O rates for disks available
      cluster wide, use the MONITOR CLUSTER command.  This display
      sums the I/O from all cluster nodes to provide an overall I/O
      rate for the disks.

        Note:
          The disk farm on most clusters is large and the MONITOR
          CLUSTER display only shows the 6 disks with the highest
          I/O rate.  To see the I/O rate for all drives in a cluster
          use the following commands to collect MONITOR data and
          create an ASCII file:

              $ MONITOR/NODISPLAY/RECORD=<FILEA> CLUSTER
              $ MONITOR/NODISPLAY/INPUT=<FILEA>/SUMMARY=<FILEB> CLUSTER

          Be aware that using the /RECORD qualifier can produce large
          files.  Statistics gathered on a 7 node cluster, with 200+
          disks, for 10 minutes, can produce files in excess of 6000
          blocks.  See the "OpenVMS System Management Utilities
          Reference Manual", section 15, for information regarding the
          MONITOR utility.

      Consider the following to distribute the I/O load and improve
      throughput for drives with high I/O rates:

      - Distribute system and user files across several disks and
        HSC requestors.

      - Create bound volume sets, which help distribute the I/O load
        across multiple disks.

      - If the operations to a disk are mostly reads, Volume Shadowing
        may improve performance.

      - Multiple nodes sharing the same system disk can reduce cluster
        performance if the system disk becomes saturated.

      - Move page and swap files to non-system disks to reduce the I/O
        load.

        Reference Article:
          [OpenVMS] How To Move Satellite Page/Swap Files From System
                    Disk To Local Disk


12) CHECK FOR SWAPPING

    Use the DCL command "SHOW SYSTEM" or "SHOW MEMORY/SLOTS" to
    determine if processes have been outswapped.

      Example:
        $ SHOW MEMORY/SLOTS
               System Memory Resources on 31-JUL-1997 11:29:08.37
        Slot Usage (slots):   Total   Free  Resident  Swapped
          Process Entry Slots    70     45        24        1 <-+
          Balance Set Slots      63     41        22        0   |
                                                                |
          Note:                                                 |
            Both commands show an outswapped process.  ---------+
                                                                |
        $ SHOW SYSTEM                                           |
            OpenVMS V6.2  on a node  31-JUL-1997 11:31:14.60    |
        Pid    Process Name State Pri  I/O       CPU            |
        00000081 SWAPPER      HIB    16    0   0 00:00:15.04    |
                   .                 .            .             |
                   .                 .            .             |
        0000008F EVL          HIB     6   49   0 00:00:00.48    |
        00000090 REMACP       HIBO    8    --swapped out--  <---+

    Processes are outswapped by OpenVMS for 1 of 3 reasons:

      1. Insufficient memory for the current workload.
      2. Insufficient balance set slots for the current workload.
      3. Proactive memory management determines that processes
         need to be outswapped.

    Swapping is a function of the Memory Management piece of
    OpenVMS, which attempts to ensure the efficient use of a
    system's physical memory in relationship to the requirements
    placed on it by the users, applications, and system code.
    The thresholds used to define this relationship are driven
    by the requirements of the operating system and the demands
    of the user community.

    When the current demands on memory resources can no longer
    be managed within memory's physical boundaries, or user
    defined thresholds are reached, Memory Management will begin
    to outswap processes.

    Swapping processes out of physical memory isn't necessarily
    bad.  It should be investigated if the "AVE" value of the "Inswap
    Rate" in the MONITOR IO display is greater than 1.  Outswapped
    processes are treated like any other process and will be
    inswapped when they become computable.  However, once a
    process has been outswapped, it will remain outswapped until
    it has work to do and becomes computable.  The process will
    remain outswapped, even if there's an abundance of free
    memory and balance slots.

    Reference Article:
      [OpenVMS] Why Processes Are Swapped Out Of Physical Memory


13) CHECK FOR PAGED POOL DEPLETION

    Paged pool contains data structures used by multiple processes but
    not required to be permanently memory-resident.  It contains the
    shareable logical name table, XQP caches, access control list
    elements, and global section descriptors to name but a few.

    Performance impacts caused by the depletion of paged pool are the
    degradation in I/O processing due to reduced XQP caches, or the
    inability of users to login because logical name tables can't be
    created.

    To determine if the depletion of paged pool is affecting system
    performance, use the command SHOW MEMORY/POOL/FULL.

    If "Free Space (bytes)" is less than 20% of "Current Size (PAGEDYN)",
    or the "Size of Largest Block" is less than 20K bytes, then the
    consumption of paged pool needs to be analyzed.
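
    The two conditions can be expressed as a short Python sketch.  The
    function name and the byte counts are hypothetical, and reading
    "20K bytes" as 20 x 1024 bytes is an assumption.

```python
LARGEST_BLOCK_LIMIT_BYTES = 20 * 1024  # "20K bytes", assumed to mean 20 x 1024


def paged_pool_needs_analysis(free_bytes, current_size_bytes,
                              largest_block_bytes):
    """Apply the free-space and largest-block tests from the text."""
    # Free space below 20% of the current size (free * 5 < size).
    low_free_space = free_bytes * 5 < current_size_bytes
    # Largest contiguous block below the 20K limit.
    small_largest_block = largest_block_bytes < LARGEST_BLOCK_LIMIT_BYTES
    return low_free_space or small_largest_block


if __name__ == "__main__":
    # Hypothetical values: 15% free, with a comfortable largest block.
    print(paged_pool_needs_analysis(150000, 1000000, 60000))
    # Hypothetical values: 40% free, but the largest block is only 10,000.
    print(paged_pool_needs_analysis(400000, 1000000, 10000))
```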

    Reference Article:
      [OpenVMS] Insufficient Dynamic Memory Errors (INSFMEM)

    Use AUTOGEN with FEEDBACK to modify the PAGEDYN parameter.  Ensure
    that AUTOGEN's input file, SYS$SYSTEM:MODPARAMS.DAT, doesn't contain
    an absolute definition for PAGEDYN.  A MIN_ value based on the
    following may be included:

       MIN_PAGEDYN = CUR_PAGEDYN+((CUR_PAGEDYN/10)*2)
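
    The formula adds roughly 20% to the current value.  A worked
    example in Python follows; the function name and the sample PAGEDYN
    value are hypothetical, and the use of integer division is an
    assumption.

```python
def min_pagedyn(cur_pagedyn):
    """Evaluate MIN_PAGEDYN = CUR_PAGEDYN+((CUR_PAGEDYN/10)*2)."""
    # Integer division is assumed, so any remainder of the /10 is dropped.
    return cur_pagedyn + ((cur_pagedyn // 10) * 2)


if __name__ == "__main__":
    # Hypothetical current PAGEDYN of 1,000,000 bytes.
    print(min_pagedyn(1000000))
```

    For the sample value the result is 1200000, which would be entered
    in MODPARAMS.DAT as the MIN_PAGEDYN value.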

    If paged pool is constantly being depleted, use the System Dump
    Analyzer (SDA) in an attempt to determine what types of data
    structures are consuming the space.

      Example:
         $ ANALYZE/SYSTEM
         SDA> SET OUTPUT <filename>   ! Direct output to a data file.
         SDA> SHOW POOL/SUMMARY       ! Write information to the file
         SDA> SET OUTPUT TT:          ! Redirect output to screen
         SDA> EXIT

      Note:
        Symbols appearing in this listing are defined in another
        database article.

          [OpenVMS] What Do SDA's SHOW POOL/SUMMARY Symbols Mean?

    Once it has been determined what structures are occupying all the
    space, any structures which are taking up too much space can be
    investigated.


14) CHECK FOR AUTOMATIC WORKING SET DECREMENTING

    To determine if Automatic Working Set Decrementing is enabled on
    your system, check the SYSGEN parameters 'PFRATL' and 'WSDEC'.
    If both parameters are set to non-zero values, Automatic Working
    Set Decrementing is turned on.

    Automatic Working Set Decrementing removes working set pages from
    processes that are Page Faulting less than PFRATL.  This often
    induces more Page Faulting and can degrade system performance.
    For example, if several SHOW SYSTEM displays indicate that working
    sets are oscillating (e.g., 'Ph.Mem' keeps rising and falling for
    many processes), but there is always plenty of memory on the Free
    List, then Automatic Working Set Decrementing could be slowing down
    your system.

    Automatic Working Set Decrementing should be turned OFF unless you
    know that your system benefits from it.  If you have Automatic
    Working Set Decrementing turned ON because it was once beneficial,
    you might want to reevaluate its current benefit by testing the
    system for a period of time with it turned OFF.  For most systems,
    this feature decreases system performance rather than increases it.

    To turn OFF Automatic Working Set Decrementing, the SYSGEN parameter
    'PFRATL' should be set to 0.  This is a dynamic parameter and can be
    changed on a live system.  If changing PFRATL improves performance,
    be sure to change this parameter in the SYSGEN CURRENT database so
    the new value will be retained after the next reboot.  You should
    also add this parameter to SYS$SYSTEM:MODPARAMS.DAT so it will be
    retained by AUTOGEN.

    For more information on Automatic Working Set Decrementing, refer
    to the "Guide to VMS Performance Management", Sections 2.2.1,
    4.2.1.7, 5.2.7 and 5.2.8.

      Note:
        Some layered products automatically set the parameters 'PFRATL'
        and 'WSDEC' in the file SYS$SYSTEM:MODPARAMS.DAT during
        installation.


15) CHECK FOR HEAVY BUFFERED I/O

    Buffered I/O involves a data transfer between user space and a
    system space buffer in nonpaged pool.  A higher than normal
    buffered I/O rate will consume nonpaged pool and CPU resources.

    Buffered I/O activity is typically attributed to:

         - Communications devices
         - Line printers
         - Graphic devices
         - Instrumentation monitors
         - Terminal Emulation Devices
         - Mailbox I/O
         - Office applications

    To determine the system's overall buffered I/O rate, issue the command
    "MONITOR IO".

    The best way to determine if the system has a "HEAVY" buffered I/O
    rate is by comparing the current rate with those experienced when
    the system's performance wasn't in question. You can also compare
    the current rate with (CPU_FACTOR x 100).

      Note:
        A CPU_FACTOR table has been included at the end of this article.

    The MONITOR PROCESS/TOPBIO command can be used to display
    users with the highest buffered I/O rates.


16) CHECK PROCESSOR MODES

    The time a CPU spends in a given mode (Kernel, Super, Exec, or
    User), or on the interrupt stack, may be indicative of the overall
    performance health of your system.  Use the MONITOR MODE command
    to determine if your processor is spending excessive time in any
    one mode.  Use the "AVE" column for a basic assessment of the time
    spent in a given mode.

    The numbers given below are typical numbers for general timesharing,
    engineering/scientific, and commercial (I/O, Database, or ALL-IN-1
    intensive) environments.

                              Kernel  Super  Exec  User
      Engineering/Scientific:   10%-    1%     3%   80%+
         General Timesharing:   25%     3%     8%   50%+
                  Commercial:   38%+    1%    10%   40%-

    However, the best way to determine the CPU mode statistics for your
    environment is by comparing the current rates with those experienced
    when the system's performance was at an acceptable level.

    Following are examples and basic explanations for CPU modes in a
    typical timesharing environment:

    *  INTERRUPT STACK

       The time the processor spends responding to interrupts. If it
       averages more than 15%, it could indicate possible high device
       interrupts, often from character interrupt devices such as DZ11s.
       If it is consistently very high (over 70%) it could indicate a
       faulty device continually interrupting the processor.

       Other things to look for with high interrupt stack time include
       SCS cluster traffic, DLOCK (Distributed Lock Management) and
       LAVC disk serving.

         Note:
           Often in NI and MI clusters, high interrupt stack time is
           directly proportional to the amount of disk serving on a
           given node.  Keeping I/O intensive applications on nodes
           with direct access to the application's disks is often the
           best solution.

    *  MP SYNCHRONIZATION

       The time a multiprocessing system spends waiting to acquire
       spinlocks.  OpenVMS uses spinlocks to synchronize access to data
       structures shared by the CPUs in a Symmetric Multiprocessing
       (SMP) environment.

       Reference Article:
         [OpenVMS] Excessive Amounts Of "MP Synchronization" Time On
                   SMP System

    *  KERNEL MODE

       The time the processor spends in system functions such as
       system services, handling page faults, file processing, DECnet
       processing, handling local lock management and setting up
       physical I/O to devices.

       File fragmentation can also cause high KERNEL mode time because
       I/O operations must be split into multiple requests to access
       the fragmented pieces of the file. This is most noticeable on
       slower processors.

       A KERNEL mode time over 25% indicates a possible problem and
       should be investigated.

         Note:
           Some applications such as ALL-IN-1 often cause KERNEL mode
           times to be greater than 25%, because their services execute
           in that mode.  These cases are not a cause for concern.

       INTERRUPT STACK time and KERNEL mode time together should
       average under 40% of the total CPU time.

         CAUTION:
           Running BACKUP during prime time can greatly degrade overall
           system performance.  BACKUP can take over 50% of the CPU
           time, mostly in KERNEL mode.

    *  EXECUTIVE MODE

       The time the processor spends in RMS activity, often directly
       related to file design.

    *  SUPERVISOR MODE

       The time the processor spends in DCL functions.  This is usually
       a small amount of time, unless you are heavily using DCL command
       procedures, such as a DCL menu interface.

       If SUPERVISOR mode is over 15%, you might consider rewriting some
       of your command procedures as programs instead.

         Note:
           Some third-party applications run in SUPERVISOR mode.  In
           these cases, elevated SUPERVISOR mode is normal.  For example,
           earlier versions of the ORACLE [R] product run in SUPERVISOR
           mode (5-Apr-1991).

    *  USER MODE

       The time spent running user programs.  This should be as high
       as possible.

    *  IDLE TIME

       The time spent in the NULL process; "unused" CPU time, when no
       process is ready to use the CPU.
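
    The numeric guidelines given for the individual modes can be
    gathered into one check, sketched here in Python.  The function
    name and the sample percentages are hypothetical; the limits are
    the ones quoted in the mode descriptions above, and the noted
    exceptions (ALL-IN-1, BACKUP, applications running in SUPERVISOR
    mode) still need to be considered by the reader.

```python
def assess_modes(interrupt_percent, kernel_percent, supervisor_percent):
    """Return a list of guideline violations for AVE mode percentages."""
    findings = []
    if interrupt_percent > 15:
        findings.append("interrupt stack above 15%")
    if kernel_percent > 25:
        findings.append("kernel mode above 25%")
    if interrupt_percent + kernel_percent >= 40:
        findings.append("interrupt stack plus kernel mode not under 40%")
    if supervisor_percent > 15:
        findings.append("supervisor mode above 15%")
    return findings


if __name__ == "__main__":
    # Hypothetical AVE column values from a MONITOR MODE display.
    for finding in assess_modes(18, 30, 1):
        print(finding)
```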


17) CHECK DECnet ACTIVITY

    Use the MONITOR DECnet command to examine packet traffic. Add up
    the following from the "AVE" column:

      a. Arriving Local Packet Rate
      b. Departing Local Packet Rate
      c. Arriving Trans Packet Rate

    If packet traffic is greater than the results from the formula
    (CPU_FACTOR x 100), there's a possibility that 30% of the CPU is
    being used for communications.

      Note:
        A table containing the CPU_FACTOR for most CPU configurations
        is included at the end of this article.
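
    The sum and comparison can be sketched in Python as follows.  The
    function names and packet rates are hypothetical; the three rates
    and the (CPU_FACTOR x 100) threshold are the ones named above.

```python
def decnet_packet_rate(arriving_local, departing_local, arriving_transit):
    """Add the three AVE packet rates listed in the text."""
    return arriving_local + departing_local + arriving_transit


def decnet_may_use_30_percent_cpu(total_packet_rate, cpu_factor):
    """True when packet traffic exceeds CPU_FACTOR x 100."""
    return total_packet_rate > cpu_factor * 100


if __name__ == "__main__":
    # Hypothetical rates on a system with a CPU_FACTOR of 5.
    total = decnet_packet_rate(400, 350, 200)
    print(total)
    print(decnet_may_use_30_percent_cpu(total, 5))
```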

    The number of packets observed in the MONITOR display is entirely
    dependent on the operating environment of the system:

    - Heavy DECnet traffic should be expected on systems operating as
      a "server" in a client-server environment.

    - Configuring a system for routing increases packet traffic. The
      packet traffic can be decreased using a router, e.g., DECNIS or
      RouteAbout.

    - DECnet buffer configurations can affect packet traffic. By default
      DECnet configures the receive buffer size to be the maximum that
      the device can handle.  However, these values can be modified and
      may be affecting packet traffic.

      Reference Article:
        The following article is in the DECNET-VMS database.

          [DECnet-VAX] Setting DECnet Executor Pipeline Quota, with Case
                       Studies

    - Communication errors may be impacting performance.  Use the
      following command to check for "Response timeouts":

         NCP> SHOW EXECUTOR COUNTERS

      If these errors are being logged it indicates that data was
      transmitted but not acknowledged.  When this occurs the system
      must retransmit the data, which consumes CPU power.  Initial
      indications will typically be complaints of slow network response.

      If "Response timeouts" are being logged and network response time
      is affected, troubleshoot the network to determine the cause of
      the lost packets.

    If it's believed that network activity is impacting the system's
    performance, contact network support.


SUMMARY:

For a more detailed analysis of your system's performance use the
"OpenVMS Performance Management" manual, or a performance analysis
application like "POLYCENTER Performance Advisor for OpenVMS".

The following article may also be of assistance in the future:

   [OpenVMS] Tuning an OpenVMS System After Adding/Removing Memory


CPU_FACTOR TABLES:

The CPU_FACTOR variable is an unofficial, gross estimation of a system's
resource potential.  The variable is only meant for use in this article.

  NO BENCHMARK STUDY WAS CONDUCTED TO DETERMINE THEIR VALUES!

The CPU_FACTOR tables contain values for a majority of the single CPU
systems running OpenVMS.  To determine the CPU_FACTOR for single CPU
configurations, use the numeric portion of the output from the following
command in the tables below, e.g., a "DEC 7000 Model 610" is in the
Alpha systems table as 7000-610:

    $ WRITE SYS$OUTPUT F$GETSYI("HW_NAME")

For multi-CPU configurations, use the output from the above command
and search the tables for the single CPU version of your configuration.

  Example:
    The "7000-610" is the single CPU version for a "DEC 7000 Model 640".

Use this CPU_FACTOR in the following command to determine the
CPU_FACTOR for the multi processor.  Replace the "FAC" variable with
the CPU_FACTOR from the table and "CPU" with the number of CPUs in the
multi processor:

     $ CPU_FACTOR = FAC+((((FAC/10)/2)*16)*(CPU-1))
     $ SHOW SYMBOL CPU_FACTOR

   Example:
     Determining the CPU_FACTOR for a "DEC 7000 Model 640", a 4 CPU
     multi processor with a single CPU_FACTOR of 132.

        $ CPU_FACTOR = 132+((((132/10)/2)*16)*(4-1))
        $ SHOW SYMBOL CPU_FACTOR
        CPU_FACTOR = 420
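
The example result depends on DCL integer arithmetic, which drops the
fractional part of each division.  The Python sketch below reproduces
the same steps with integer division; the function name is
hypothetical.

```python
def multi_cpu_factor(single_cpu_factor, cpu_count):
    """Evaluate FAC+((((FAC/10)/2)*16)*(CPU-1)) with integer division."""
    fac = single_cpu_factor
    # Each // mirrors the truncation DCL applies to integer division.
    per_extra_cpu = ((fac // 10) // 2) * 16
    return fac + per_extra_cpu * (cpu_count - 1)


if __name__ == "__main__":
    # The example from the text: single CPU_FACTOR of 132, 4 CPUs.
    print(multi_cpu_factor(132, 4))
```

For the values in the example, 132/10 truncates to 13 and 13/2 to 6, so
each additional CPU contributes 96 and the result is 420.  Evaluating
the same formula with real-number arithmetic would give 448.8 instead.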


                      VAX Systems:
     +------------+-------+  +------------+-------+
     | HW_        |CPU_   |  | HW_        |CPU_   |
     |     Name   | FACTOR|  |     Name   | FACTOR|
     +------------+-------+  +------------+-------+
     | 3100-M76   |     5 |  | 3100/SPX   |     3 |
     | 3100       |     2 |  | 3100-30/40 |     2 |
     | 3100-80    |     8 |  | 3100-85    |    15 |
     | 3100-90    |    19 |  | 3100-95    |    27 |
     | 3400       |     2 |  | 3600       |     2 |
     | 3900       |     3 |  | 4000 VLC   |     5 |
     | 4000-30    |     1 |  | 4000-60    |    10 |
     | 4000-90    |    19 |  | 4000-90A   |    24 |
     | 4000-100   |    19 |  | 4000-105A  |    27 |
     | 4000-200   |     4 |  | 4000-300   |     6 |
     | 4000-400   |    13 |  | 4000-500   |    19 |
     | 4000-600   |    25 |  | 4000-700A  |    33 |
     | 4000-705A  |    37 |  | 6000-610   |    23 |
     | 6000-210   |     2 |  | 6000-310   |     3 |
     | 6000-410   |     5 |  | 6000-510   |    11 |
     | 7000-600   |    25 |  | 7000-610   |    25 |
     | 7000-700   |    43 |  | 7000-710   |    43 |
     | 7000-810   |    52 |  | 8250       |     1 |
     | 8500       |     3 |  | 8530       |     3 |
     | 8550       |     4 |  | 8600       |     3 |
     | 8650       |     4 |  | 8700       |     4 |
     | 8800       |     4 |  | 8810       |     4 |
     | 9000-110   |    32 |  | 9000-210   |    30 |
     | 9000-310   |    32 |  |10000-610   |    25 |
     +------------+-------+  +------------+-------+

                   Alpha Systems:
     +------------+-------+  +------------+-------+
     | HW_        |CPU_   |  | HW_        |CPU_   |
     |     Name   | FACTOR|  |     Name   | FACTOR|
     +------------+-------+  +------------+-------+
     |  200 4/100 |    54 |  |  200  4/133|    74 |
     |  200 4/166 |   116 |  |  200  4/233|   113 |
     |  200 4/266 |   198 |  |  200  4/300|   215 |
     |  250 4/133 |    74 |  |  250  4/166|   116 |
     |  250 4/200 |   131 |  |  250  4/233|   131 |
     |  250 4/266 |   198 |  |  250  4/300|   215 |
     |  255 4/133 |    74 |  |  255  4/166|   116 |
     |  255 4/200 |   131 |  |  255  4/233|   180 |
     |  255 4/266 |   198 |  |  255  4/300|   215 |
     |  400 4/166 |   116 |  |  400  4/233|   161 |
     |  400 4/300 |   215 |  |  500  5/266|   329 |
     |  500 5/300 |   319 |  |  500  5/333|   389 |
     |  600 5/333 |   412 |  |  600  5/300|   337 |
     |  600 5/266 |   329 |  | 1000  4/200|   135 |
     | 1000 4/233 |   165 |  | 1000A 4/266|   197 |
     | 1000 5/333 |   389 |  | 2000  4/200|   131 |
     | 2000 4/275 |   202 |  | 2000  5/250|   277 |
     | 2000 4/233 |   177 |  | 2000  5/300|   319 |
     | 2000-300   |    80 |  | 2000-500   |    81 |
     | 2100 4/200 |   131 |  | 2100  4/233|   177 |
     | 2100 5/300 |   319 |  | 2100  4/275|   202 |
     | 2100 5/250 |   277 |  | 3000-300   |    66 |
     | 3000-300L  |    45 |  | 3000-300LX |    68 |
     | 3000-300X  |    90 |  | 3000-400   |    74 |
     | 3000-500X  |   110 |  | 3000-500   |    84 |
     | 3000-600   |   114 |  | 3000-700   |   162 |
     | 3000-800   |   138 |  | 3000-900   |   200 |
     | 4000-610   |    94 |  | 4000-710   |   122 |
     | 7000-610   |   132 |  | 7000-710   |   200 |
     | 8200 5/300 |   341 |  | 8200 5/350 |   432 |
     | 8400 5/300 |   341 |  | 8400 5/350 |   432 |
     |10000-610   |   132 |  |            |       |
     +------------+-------+  +------------+-------+


RELATED ARTICLES:

The following articles have been referenced throughout this document,
the majority of which may be found in the OPENVMS or DECNET-VMS
database.  The titles displayed represent the title text at the time
this article was written and may change without notice.

  [OpenVMS] Reasons for RWMPB/RWMPE States and PAGEFRAG/PAGECRIT
            Messages
  [OpenVMS] How to Troubleshoot a Process in RWNPG
  [OpenVMS] How to Troubleshoot a Process in RWPAG
  [OpenVMS] How to Troubleshoot a Process in RWMBX
  [OpenVMS] How to Troubleshoot a Process in RWCSV State
  [OpenVMS] How to Troubleshoot a Process in MUTEX State
  [OpenVMS] How To Troubleshoot A Process In RWAST On VAX or Alpha
  [OpenVMS] How To Troubleshoot a Hung Process
  [OpenVMS] Why Processes Are Swapped Out Of Physical Memory
  [OpenVMS] How To Determine PAGEFILE and SWAPFILE Usage From SDA
  [OpenVMS] Managing Dumpfiles on VAX & Alpha Systems or Clusters
  [OpenVMS] Enabling Dumpfiles Off the System Disk (DOSD)
  [OpenVMS] How To Create Secondary Page/Swap Files and Remove
            Primaries
  [OpenVMS] How To Move Satellite Page/Swap Files From System Disk To
            Local Disk
  [OpenVMS] VIRTUALPAGECNT, PGFLQUO, PAGEFILE:  How Are They Related?
  [OpenVMS] Monitoring Page And Swap File Usage From A Command
            Procedure
  [OpenVMS] What Is The Maximum Size For An Installed Pagefile?
  [OpenVMS] Adaptive Pool Management: A Description of Nonpaged Dynamic
            Memory
  [OpenVMS] NPAGEDYN Fragmented as a Result of LISTPREPOP.DAT
  [OpenVMS] Excessive NPAGEDYN Expansion on OpenVMS VAX
  [OpenVMS] V6.n Pool Expansion Problem - Free Packets On Lookaside
            Lists
  [OpenVMS] Methods That Can Be Used To Recover Free Space On The
            System Disk
  [OpenVMS] How to Read Mailbox, MBAn:, Data Without Removing the
            Messages
  [OpenVMS] What Do SDA's SHOW POOL/SUMMARY Symbols Mean?
  [OpenVMS] Insufficient Dynamic Memory Errors (INSFMEM)
  [OpenVMS] Details On How Proactive Memory Reclamation Works
  [OpenVMS] How to Interpret Info From SHOW MEMORY/CACHE/FULL on Alpha
  [OpenVMS] How to Interpret Info From SHOW MEMORY/CACHE/FULL on VAX
  [OpenVMS] Excessive Amounts Of "MP Synchronization" Time On SMP
            System
  [OpenVMS] Memory Reclamation From The Virtual I/O Cache (VIOC)
  [DECnet-VAX] Setting DECnet Executor Pipeline Quota, with Case
               Studies
\
\
\^ 0093AD55-AED9C840-1C03C5            ! Excessive "MP Synchronization"
\^ CHAMP_SRC941004004748               ! SHOW MEMORY/CACHE/FULL VAX
\^ CHAMP_SRC950314006630               ! SHOW MEMORY/CACHE/FULL Alpha
\^ 0096BFEE-A02B1CA0-1C0096            ! Proactive Memory Reclamation
\^ 00945720-52DC0CE0-1C0181            ! INSFMEM Errors
\^ 0903A27-99D45720-0801E7             ! SDA's POOL Symbol Meaning
\^ CHAMP_SRC940308008176               ! Memory Reclamation VIOC
\^ 0092C1BE-47974A00-1C0069            ! How to Read Mailbox
\^ 00930EB4-FB827740-1C01E7            ! Recovering System Disk Space
\^ 0092E439-7B137FA0-1C01E7            ! Tuning After Adding Memory
\^ 0093B3C6-90E3A140-1C01E7            ! VIRTUALPAGECNT PGFLQUO PAGEFILE
\^ 00968F6D-A41BF020-1C0186            ! Managing Dumpfiles
\^ 00991660-11C190A0-1C0096            ! Enabling DOSD
\^ CHAMP_SRC940122000268               ! NPAGEDYN Fragmented LISTPREPOP.DAT
\^ CHAMP_SRC940302008138               ! NPAGEDYN Expansion VAX
\^ 0097DE0E-FA99C940-1C0186            ! V6.n Pool Expansion Problem
\^ CHAMP_SRC931013001093               ! RWMPB/RWMPE PAGEFRAG/PAGECRIT
\^ 00982B25-CD907340-1C0186            ! Troubleshoot RWNPG
\^ 00905F41-5E6932C0-0801E7            ! Troubleshoot RWPAG
\^ 009913AC-9FA936E0-1C0186            ! Troubleshoot RWMBX
\^ 0095E662-43AECB40-1C02A1            ! Troubleshoot RWCSV
\^ 0098351C-C6948BC0-1C0096            ! Troubleshoot MUTEX
\^ 0094A663-57DAB060-1C0069            ! Troubleshoot RWAST
\^ 00948BB0-0F102B80-1C0069            ! Troubleshoot Hung Process
\^ CHAMP_SRC931013003699               ! Why Swapped Out
\^ 00924724-D81F9920-1C0069            ! PAGEFILE SWAPFILE SDA
\^ 0090746C-5EF5EE00-0801E7            ! Monitoring Page Swap Usage
\^ CHAMP_SRC930928003418               ! Maximum Size Pagefile
\^ CHAMP_SRC940802005584               ! Adaptive Pool Management
\^ 0093BF7C-BF52F000-1C009F            ! Create Secondary Page/Swap Files
\^ 0091A227-EF8A7D20-1C01E7            ! Move Satellite Page/Swap Files
\^ 009BD243-C877FC20-1C0186            ! Unique id of this article


REFERENCES:

"OpenVMS Performance Management", (AA-R237A-TE)

"Volume Shadowing for OpenVMS", (AA-PVXMD-TE)

"OpenVMS Guide to System Security", (AA-Q2HLC-TE)

"VMS Internals and Data Structures, V5.2", (EY-C171E-DP-ECG)

"OpenVMS AXP Internals and Data Structures, V1.5", (EY-Q770E-DP)

"OpenVMS Alpha System Dump Analyzer Utility Manual", (AA-PV6UC-TE)

"OpenVMS VAX System Dump Analyzer Utility Manual", (AA-PV6TB-TE)

"OpenVMS Version 7.1 Release Notes", (AA-QSBTB-TE)

"OpenVMS System Management Utilities Reference Manual: A-L",
 (AA-PV5PD-TK)

"OpenVMS System Management Utilities Reference Manual: M-Z",
 (AA-PV5QD-TK)

"OpenVMS System Manager's Manual: Tuning, Monitoring, and Complex
 Systems", (AA-PV5ND-TK)

