<HTML><HEAD><TITLE>SGI Embedded Support Partner - Help</TITLE></HEAD>
|
|
<BODY BGCOLOR="#ffffcc">
|
|
<P> </P>
|
|
<font face="Arial,Helvetica">
|
|
<A NAME="readme_first"></A>
|
|
<b>Embedded Support Partner ASCII console usage notes</b>
|
|
<p>The SGI Embedded Support Partner ASCII console provides access to
the SGI Embedded Support Partner facilities for users running
cursor-addressable, character-cell display devices (for example, vt100
terminals, vt100 emulators, or any other "curses-oriented" displays).
|
|
|
|
<p>To operate the Embedded Support Partner user interface from such a
display device, you must use the Lynx Web browser. The Lynx executable
is expected to be installed in the /usr/local/bin directory. Refer to
the Lynx documentation for installation instructions.
|
|
|
|
<p>Because a graphics-based Web browser (Netscape) and a text-based
Web browser (Lynx) are used quite differently, users who have no
previous experience with Lynx are strongly advised to read the Lynx
documentation on general usage and on intradocument and interdocument
navigation.
|
|
|
|
<p>Because the user interface is dynamic, it is essential to ensure
that the HTML pages displayed by Lynx are current and have not been
loaded from the Lynx cache.

<p>To ensure this, follow a few simple rules:
|
|
|
|
<ol><li>Use "x" (NOCACHE on the keymap) to activate links on a page.
This ensures that the linked page is loaded from the server and not
from the cache.

<li>If you used "Backspace" or "Delete" (HISTORY on the keymap) to
display the history of visited pages, you must use "x" (NOCACHE) to
return to the page that you selected.

<li>If you used the PREV_DOC key to return to the previous document and
you need to refresh that page, press "Backspace" and then "x".
|
|
</ol>
|
|
|
|
<p><b>Tip.</b> Press "k" to display the current key assignments.
|
|
|
|
<A NAME="sysinfo"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > Introduction</b>
|
|
<P>The <b>SYSTEM INFORMATION</b> category provides information about the
|
|
system on which the Single System Manager is running.</P>
|
|
<P>Use the commands in this category to display the following types of
|
|
system information:</P>
|
|
<UL>
|
|
<LI>Hardware configuration for a specific date and time
|
|
<LI>Software configuration and version information for a specific date and time
|
|
<LI>System changes between a range of dates
|
|
<LI>Part changes for a specific hardware component
|
|
<LI>Events that have occurred on the system
|
|
<LI>Actions that the SGI Embedded Support Partner has performed
|
|
<LI>Availability information for a specified range of dates
|
|
</UL>
|
|
All reports in this category display general system information:
|
|
<UL>
|
|
<LI>System name
|
|
<LI>System identification number
|
|
<LI>System serial number
|
|
<LI>IP type
|
|
<LI>System IP address
|
|
</UL>
|
|
|
|
<A NAME="sysinfo_hardware"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > Hardware</b>
|
|
<P>Use this command to display the hardware configuration of the system
that existed at a specific time on a specific date.</P>
|
|
<p>Hardware configuration information is available for the following systems:</p>
|
|
<ul>
|
|
<li>IP19 - Challenge/Onyx
|
|
<li>IP21 - Power Challenge/Power Onyx
|
|
<Li>IP25 - Power Challenge 10000/Power Onyx 10000
|
|
<li>IP27 - Origin2000/Onyx2
|
|
<li>IP29 - Origin200
|
|
<li>IP30 - Octane
|
|
<li>IP32 - O2
|
|
</ul>
|
|
<P>If you are interested in hardware information for a specific date/time, enter the desired date/time in the
|
|
appropriate field.</p>
|
|
<p>You must select a database that corresponds to the date that you specified.</p>
|
|
<P>The information is displayed in a hierarchical manner.
|
|
If information is not available or not applicable, "N/A" is displayed.</P>
|
|
<P>The first column of the report table can include the following symbols: "[+]" or "[-]".
Selecting the "[+]" symbol expands the table to display the subcomponents that compose the selected component.
Selecting the "[-]" symbol collapses the subcomponent display.</P>
|
|
The other columns of the table contain the following information:
|
|
<pre>NAME           The name of the component
LOCATION       The location of the component
PART_NUMBER    The part number of the component
SERIAL_NUMBER  The serial number of the component
REVISION       The revision level of the component</pre>
|
|
<A NAME="sysinfo_software"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > Software</b>
|
|
<P>Use this command to display the software configuration of the system and version information
|
|
that existed at a specific time on a specific date.</P>
|
|
<p>If you are interested in software information for a specific date/time, enter the desired date/time in
|
|
the appropriate field.
|
|
Otherwise, the latest available information will be displayed.</p>
|
|
<p>You must select the database that corresponds to the date that you specified.</p>
|
|
|
|
<p>This report lists the software that was installed on the system at the time you specify.
|
|
The installed software is listed 10 items per page.
The ">" symbol lists the next 10 pages, and
">>" goes to the last page.
The "<" symbol lists the previous 10 pages, and "<<" returns to the first page.
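<p>The 10-items-per-page layout can be illustrated with a short Python sketch. The function names and the sample package list are hypothetical and are not part of SGI Embedded Support Partner; the sketch only shows how a list is split into pages of 10 items.</p>

```python
import math

PAGE_SIZE = 10  # the report lists 10 items per page


def page_of(items, page_number, page_size=PAGE_SIZE):
    """Return the items shown on a 1-based page number."""
    start = (page_number - 1) * page_size
    return items[start:start + page_size]


def page_count(items, page_size=PAGE_SIZE):
    """Return the number of pages needed to show every item (at least 1)."""
    return max(1, math.ceil(len(items) / page_size))


# 23 hypothetical software entries, named pkg01 through pkg23
packages = [f"pkg{i:02d}" for i in range(1, 24)]
first_page = page_of(packages, 1)
last_page = page_of(packages, page_count(packages))
```

<p>With 23 entries, the sketch produces three pages, and the last page holds the remaining three entries.</p>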
|
|
<p>The report table provides the following information:
|
|
<pre>NAME          The name of the software
VERSION       The version number of the software
INSTALL_DATE  The date on which the software was installed
DESCRIPTION   A description of the software
|
|
</pre>
|
|
<A NAME="sysinfo_system_changes"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > System Changes</b>
|
|
<P>Use this command to view any system changes that occurred within the range of dates that you specify.</P>
|
|
<p>If you do not specify a date, all system configuration changes are displayed.
|
|
<p>You must select the database that corresponds to the dates that you specified.</p>
|
|
<p>System change information can be collected from only one database at a time.</p>
|
|
<P>The SGI Embedded Support Partner tracks the following types of system changes:</P>
|
|
<ul>
|
|
<li>Software changes
|
|
<li>Hardware changes
|
|
<li>System changes
|
|
</ul>
|
|
<P>The software table describes all software changes that occurred during the period
|
|
of time that you specified. The table provides the following information:
|
|
<pre>NAME            The name of the software
VERSION         The version number of the software
INSTALL_DATE    The date on which the software was installed
DEINSTALL_DATE  The date on which the software was deinstalled
DESCRIPTION     A description of the software
|
|
</pre>
|
|
<P>The hardware table describes all hardware changes that occurred during the period of time
|
|
that you specified. The table provides the following information:
|
|
<pre>NAME            The name of the part
LOCATION        The location of the part
PART_NUMBER     The part number of the part
SERIAL_NUMBER   The serial number of the part
REVISION        The revision level of the part
INSTALL_TIME    The date on which the component was installed
DEINSTALL_TIME  The date on which the component was deinstalled
|
|
</pre>
|
|
<P>The system changes table describes all system changes (for example, hostname, IP address change, and so on)
|
|
that occurred during the period of time that you specified.
|
|
|
|
<A NAME="sysinfo_part_changes"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > Part Changes</b>
|
|
<P>Use this command to view the transaction history of a part.</p>
|
|
<P>You must enter the component serial number. (If
necessary, use the <a href="#sysinfo_hardware">SYSTEM INFORMATION >
Hardware</a> command to locate a serial number.)</P>
|
|
<p>You must choose a database to view the history of
|
|
the component whose serial number you entered above.</P>
|
|
<p>The report table lists
|
|
the name of the component,
|
|
the module number in which the component was installed,
|
|
the part number of the component,
|
|
the serial number of the module,
|
|
the revision number of the part,
|
|
and the slot number in which the component was installed.</P>
|
|
<A NAME="sysinfo_events"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > Events Registered</b>
|
|
<P>Use this command to view information about events that SGI Embedded
|
|
Support Partner has registered.</P>
|
|
<P>Enter a range of dates for the events that you want to view.
|
|
Then, choose the type of event information that you want to view.
|
|
The following options are available:</P>
|
|
<pre>
|
|
All System Events
|
|
Specific System Event
|
|
System Events by Class
|
|
</pre>
|
|
<A NAME="screens_all_sys_events1"></A>
|
|
<p><b>All System Events</b>
|
|
<P>The report table provides the following information about events that were registered
|
|
within the selected range of dates:</P>
|
|
<pre>
|
|
Event Class             The class in which the event belongs
                        (for example, Availability)

Event Description       A brief description of the event

Event ID                The unique identification number assigned
                        to this event. You can use this number to
                        find this event via SYSLOG

First Occurrence        The date and time that the event first occurred

Last Occurrence         The date and time that the event last occurred.
                        If Number of Occurrences is 1, the time value of
                        the First Occurrence and the time value of the
                        Last Occurrence will be identical

Number of Occurrences   The number of times that the event occurred.
                        This number corresponds to the number of events
                        that must occur before registration begins.
                        By default, this number is 1.
|
|
</pre>
|
|
<A NAME="screens_specific_sys_event1"></A>
|
|
<p><b>Specific System Event</b>
|
|
<P>Use this report to track a specific event that is associated with an actual or suspected
|
|
system problem. Choose an event class from the list that appears.</P>
|
|
<A NAME="screens_specific_sys_event2"></A>
|
|
<P>Use this page to specify the event that you want to view. Choose the
|
|
event from the list of events in the class that you have already specified.</P>
|
|
<A NAME="screens_specific_sys_event3"></A>
|
|
<P>The report table provides the following information about the event registrations between the
|
|
selected range of dates:</P>
|
|
<pre>
|
|
First Occurrence        The date and time that the event first occurred

Last Occurrence         The date and time that the event last occurred.
                        If Number of Occurrences is 1, the time value of
                        the First Occurrence and the time value of the
                        Last Occurrence will be identical.

Number of Events        The number of times that the event occurred.
                        This number corresponds to the number of events
                        that must occur before registration begins.
                        By default, this number is 1.
|
|
</pre>
|
|
<A NAME="screens_sys_event_class"></A>
|
|
<P><b>System Events by Class</b>
|
|
<P>Use this report when you need information about events that are associated with a specific
|
|
class. For example, use Memory class to track various memory
|
|
events. Choose the appropriate class for the event that you want to view. </P>
|
|
<A NAME="screens_sys_event_class2"></A>
|
|
<P>The report table provides the following information about events that were registered between the
|
|
selected range of dates: </P>
|
|
<pre>
|
|
Event Description       A brief description of the event

Event ID                The unique identification number
                        assigned to this event

First Event Occurrence  The date and time that the event first occurred

Last Event Occurrence   The date and time that the event last occurred.
                        If Number of Occurrences is 1, the time value of
                        the First Occurrence and the time value of the
                        Last Occurrence will be identical

Number of Events        The number of times that the event occurred.
                        This number corresponds to the number of events
                        that must occur before registration begins.
                        By default, this number is 1.
|
|
</pre>
|
|
<A NAME="sysinfo_actions"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > Actions Taken</b>
|
|
<P>Use this command to display information about actions that have been performed
|
|
by SGI Embedded Support Partner.</P>
|
|
<p>Specify the range of dates for which you want to report actions taken.
|
|
If you do not enter a date, this option defaults to the current date.</p>
|
|
<p>You must choose one of the two available types of reports:</p>
|
|
<pre>
|
|
All Actions Taken
|
|
Actions Taken for a Specific Event
|
|
</pre>
|
|
<a name="all_actions_taken"></a>
|
|
<p><b>All Actions Taken</b></p>
|
|
<P>This option displays the actions that the SGI Embedded
|
|
Support Partner performed within the range of dates that you specified.
|
|
The report table provides the following information about actions that were taken for all events
|
|
between the selected range of dates:</P>
|
|
<pre>
|
|
Event Class             The class in which the event belongs
                        (for example, Availability)

Event Description       A brief description of the event

Event ID                The unique identification number
                        assigned to this event

Action Description      A brief description of the action

Action Taken            The action that SGI Embedded Support Partner
                        performed in response to the event

Time of Action          The date and time that SGI Embedded Support Partner
                        performed the action
|
|
</pre>
|
|
<a name="specific_action_taken"></a>
|
|
<p><b>Actions Taken for a Specific Event</b></p>
|
|
<p>Use this option when you want to view actions taken for specific events.
|
|
Choose an event class that contains the event that you want to select.</p>
|
|
<a name="specific_action_taken1"></a>
|
|
<p>From the list of events, choose the event that you want to research.</p>
|
|
<P>The report table provides the following information about actions that were taken for the
|
|
specified event between the selected range of dates:</P>
|
|
<pre>
|
|
Action Description      A brief description of the action

Action Taken            The action that the SGI Embedded Support Partner
                        performed in response to the event

Time of Action          The date and time that SGI Embedded Support Partner
                        performed the action
|
|
</pre>
|
|
|
|
<a name="report_diags_result"></a>
|
|
<p><hr>
|
|
<b>Diagnostics Results</b>
|
|
<p>This command displays the results of the diagnostics that you run on the system.
|
|
<p>You must specify the range of dates for which you want to view diagnostics results.
|
|
<p>The top portion of the diagnostic report contains the information that pertains
|
|
to the system from which you requested the report.
|
|
<p>The diagnostics results table provides the following information for all diagnostics that were run
|
|
on the system during the period of time that you specified:
|
|
<p>
|
|
<pre>
|
|
Diagnostic Name         Contains the name of the diagnostic.
                        In cases where multiple tests run as a group under
                        one program (for example, under SVP), the total
                        number of tests is indicated in parentheses next
                        to the name of the diagnostic:

                          SVP (86)  means that 86 tests ran under
                                    the SVP program.

Diagnostic Status       Diagnostic status can be PASS, FAIL, or COMPLETE.

                          PASS      indicates that the diagnostic completed
                                    successfully

                          FAIL      indicates that a failure occurred

                          COMPLETE  indicates that multiple tests ran, and
                                    one or more of them failed and others
                                    completed successfully

Diagnostic Result Time  The time when the diagnostic test completed.
                        When multiple tests run under one program, the
                        Diagnostic Result Time indicates the time when
                        the entire program completed.
|
|
</pre>
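<p>The relationship between individual test outcomes and the reported status can be sketched in Python. This is only an illustration: the function names are hypothetical, and the mapping of an all-failed group to FAIL is an assumption, because the help text defines COMPLETE only for the mixed case.</p>

```python
def group_status(test_results):
    """Summarize a group of tests; each entry is True (passed) or False (failed)."""
    if not test_results:
        raise ValueError("at least one test result is required")
    if all(test_results):
        return "PASS"      # every test completed successfully
    if not any(test_results):
        return "FAIL"      # assumption: a group with no passing test reports FAIL
    return "COMPLETE"      # some tests failed and others completed successfully


def display_name(program, test_results):
    """Append the test count in parentheses when a program ran several tests."""
    count = len(test_results)
    return f"{program} ({count})" if count > 1 else program


# Hypothetical run of 86 tests under SVP, one of which failed
svp_results = [True] * 85 + [False]
svp_line = (display_name("SVP", svp_results), group_status(svp_results))
```

<p>For the hypothetical run above, the sketch yields the name SVP (86) with the status COMPLETE.</p>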
|
|
<p>
|
|
<A NAME="sysinfo_availability"></A>
|
|
<hr width=100%>
|
|
<b>SYSTEM INFORMATION > Availability</b>
|
|
<P>This command displays system availability statistics. The upper portion of this page displays the total availability percentage
|
|
and the mean time between interrupts (MTBI) in minutes.</P>
|
|
<P>You must specify the range of dates and type of availability information that you want to view.
Two types of availability information are currently available:
|
|
<pre>
|
|
Overall Availability
|
|
Availability Events List
|
|
</pre>
|
|
<A NAME="screen_over_avail"></a>
|
|
<p><b>Overall Availability</b>
|
|
<p>The <b>Overall Availability</b> report aggregates the availability events for the given system.
Events are grouped as either "Unscheduled" or "Service Action" (controlled shutdown) events and are
further classified by category within these two groups. For each category, the overall availability report includes
the count of events in that category, the total downtime (in minutes), the MTBI (mean time between interrupts, in
minutes), and the availability as a percentage. MTBI and availability per category are computed for events within the
category as applied to the entire time period of the report. Count, total downtime, MTBI, and availability are also
displayed for the two groups, as well as for the final total of all the events.</p>
|
|
<p>The report also includes the average, least, and most uptimes and downtimes, the logging start time,
and the duration of system uptime since the last boot.</p>
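<p>The per-category figures can be illustrated with a short Python sketch. The formulas are assumptions, not the documented availmon calculation: MTBI is taken as the report period divided by the event count, and availability as the share of the period that was not downtime. The function name and sample values are hypothetical.</p>

```python
def availability_stats(period_minutes, downtimes_minutes):
    """Return (count, total_downtime, mtbi_minutes, availability_percent).

    period_minutes     length of the whole report period, in minutes
    downtimes_minutes  downtime of each event in one category, in minutes
    """
    count = len(downtimes_minutes)
    total_downtime = sum(downtimes_minutes)
    # Assumed formula: mean time between interrupts over the whole period
    mtbi = period_minutes / count if count else None
    # Assumed formula: percentage of the period during which the system was up
    availability = 100.0 * (period_minutes - total_downtime) / period_minutes
    return count, total_downtime, mtbi, availability


# Hypothetical category: three events during a 7-day (10080-minute) report period
count, total_downtime, mtbi, availability = availability_stats(10080, [30, 90, 132])
```

<p>Under these assumptions, the three events give 252 minutes of downtime, an MTBI of 3360 minutes, and an availability of 97.5 percent.</p>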
|
|
<P>The <b>Overall Availability</b> table summarizes the overall availability of the system:</P>
|
|
<UL>
|
|
<LI>Information about service actions (number, downtime, MTBI, and availability percentage)
|
|
<LI>Average uptime
|
|
<LI>Least uptime
|
|
<LI>Most uptime
|
|
<LI>Average downtime
|
|
<LI>Least downtime
|
|
<LI>Most downtime
|
|
<LI>The time at which availability monitoring was started
|
|
<LI>The time the last boot occurred
|
|
<LI>The amount of time that the system has been up
|
|
</UL>
|
|
|
|
<P>Use the <b>Event Availability Information</b> link at the bottom of the page
|
|
to access information about the individual availability events that the system has registered.</P>
|
|
|
|
<p><b>Event Availability Information</b></p>
|
|
<p>The events list displays the following fields: the Start Time (when the system was previously booted), the Incident Time
(when the event occurred), the uptime and downtime in minutes, and a very brief description of the event type or cause of the
event. The <b>Summary</b> displays the event information in more detail, including a complete event type description.</p>
|
|
<a name="event_summary"></a>
|
|
<P>The report provides a summary of an event that includes the following information:</P>
|
|
<UL>
|
|
<LI>The hostname of the system
|
|
<LI>The reason for the shutdown
|
|
<LI>The time that the system was initially started
|
|
<LI>The time that the incident occurred
|
|
<LI>The time that the system was restarted after the incident occurred
|
|
<LI>The amount of time that the system was up before the incident occurred
|
|
<LI>The amount of time that the system was down because of the incident
|
|
</UL>
|
|
<p>If a system panic occurs, this report also includes a brief summary of why the system panicked.</p>
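<p>The uptime and downtime values in the summary follow from the three timestamps listed above. The Python sketch below shows that arithmetic; the function name and the sample dates are hypothetical.</p>

```python
from datetime import datetime


def event_durations(start, incident, restart):
    """Return (uptime_minutes, downtime_minutes) for one availability event."""
    uptime = (incident - start).total_seconds() / 60      # up before the incident
    downtime = (restart - incident).total_seconds() / 60  # down because of it
    return uptime, downtime


# Hypothetical event: booted, failed one day later, restarted 45 minutes after that
uptime, downtime = event_durations(
    datetime(1999, 3, 1, 8, 0),
    datetime(1999, 3, 2, 8, 0),
    datetime(1999, 3, 2, 8, 45),
)
```

<p>For this sample event, the uptime is 1440 minutes and the downtime is 45 minutes.</p>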
|
|
|
|
<A NAME="setup_intro"></A>
|
|
<hr width=100%>
|
|
<p><b>Setup > Introduction</b></p>
|
|
<p>Embedded Support Partner is a configuration-driven system. From this
section, you can set up SGI Embedded Support Partner to suit your specific
needs. On the left is a menu of items organized in groups, each of which
belongs to a specific component of SGI Embedded Support Partner. A brief
description of these components is given below. Context-sensitive help is
also available for all applicable menu items; to view it, select the 'Help'
button in the top right-hand corner of the menu item. You can always view
the current settings by selecting the 'View Current Setup' item for any of
the components.<br>
|
|
<ul>
|
|
<li>
|
|
<b>Global Setup</b> allows you to set up permissions for other systems to connect
remotely to SGI Embedded Support Partner. To do this, select the Server
item. Global Configuration lets you modify the behavior of the SGI Embedded
Support Partner Event Manager.<br> </li>
|
|
|
|
<li>
|
|
<b>Events Setup</b> lets you add new events, update existing events, and delete
any unwanted events. You can also associate an event with an action
from a list of available actions. Choose Actions Setup to
add custom actions.<br> </li>
|
|
|
|
<li>
|
|
In <b>Actions Setup</b>, you can create new actions, update existing actions,
or delete any unwanted actions.<br> </li>
|
|
|
|
<li>
|
|
<a href="#setup_notification">Paging Setup</a> lets you configure the QuickPage application to suit your
needs. Note that for Paging Setup to take effect,
you must run chkconfig quickpage on.<br> </li>
|
|
|
|
<li>
|
|
<a href="#setup_availmon">Availability Monitoring</a> Setup lets you configure the availmon software that is available
on your system. You can set the availmon parameters and
set up the mailing lists that availmon notifies of any system interrupts.<br> </li>
|
|
|
|
<li>
|
|
<b>Performance Monitoring</b> Setup lets you enable or disable performance monitoring
|
|
metrics.<br> </li>
|
|
</ul>
|
|
<p>Use caution when you change any of the settings. If you
are in doubt, read the help carefully before you commit any changes.
You can also refer to the SGI Embedded Support Partner User Guide for
more information.
|
|
|
|
<A NAME="setup_global_web_access_cfg"></A>
|
|
<p><b>SETUP > Global > Server</b>
|
|
<p>This command configures the Web server that SGI Embedded Support Partner uses.
|
|
Use this command to perform the following functions:</P>
|
|
<UL>
|
|
<LI>Display the current server port, version number, and identification information
|
|
<LI>Specify access privileges to the system via IP addresses
|
|
<LI>Change the username of the current Web server user
|
|
<LI>Change the password of the current Web server user
|
|
</UL>
|
|
|
|
<P>The upper portion of this page displays the following information:</P>
|
|
<pre>
|
|
Server Identification   The name of the Web server software in use

Server Version          The version level of the Web server
                        software and its installation date

Server Port             The Web server connection port in use
|
|
</pre>
|
|
|
|
<P>The lower portion of this page displays the following selectable options:</P>
|
|
<pre>
|
|
Server Access Permissions   Enables or restricts access
                            by external systems

Name & Password Change      Enables you to change the current
                            username and password
|
|
</pre>
|
|
|
|
<A NAME="server_access_option"></A>
|
|
<p><b>Server Access Permissions</b>
|
|
<P>Use this page to specify which systems can access the SGI Embedded
|
|
Support Partner Web server. Any change that you make to the server access
|
|
list takes effect immediately.</P>
|
|
<p>You can specify an exact IP address or an IP address mask that uses wildcards, for example,
197.23.14.5, 135.*.*.5, or *.*.*.*.</p>
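<p>How a wildcard mask selects addresses can be sketched in Python. The function below is a hypothetical illustration of octet-by-octet matching, not the access check that the SGI Embedded Support Partner Web server actually performs.</p>

```python
def ip_matches(pattern, address):
    """Return True if a dotted-quad address matches a pattern whose octets may be '*'."""
    pattern_parts = pattern.split(".")
    address_parts = address.split(".")
    if len(pattern_parts) != 4 or len(address_parts) != 4:
        return False  # both values must have exactly four octets
    # An octet matches when the pattern octet is '*' or is identical to the address octet
    return all(p == "*" or p == a for p, a in zip(pattern_parts, address_parts))


exact = ip_matches("197.23.14.5", "197.23.14.5")
masked = ip_matches("135.*.*.5", "135.20.7.5")
rejected = ip_matches("135.*.*.5", "135.20.7.6")
anyone = ip_matches("*.*.*.*", "10.0.0.1")
```

<p>In this sketch, the mask 135.*.*.5 accepts any address that begins with 135 and ends with 5, and *.*.*.* accepts every address.</p>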
|
|
|
|
<A NAME="user_name_change_option"></A>
|
|
<b>User Name and Password Change</b>
|
|
<P>Use this page to change a current username or password that enables access to
|
|
SGI Embedded Support Partner. Any change that you make to a username or
|
|
password takes effect immediately.</P>
|
|
<p>The username and password must each contain between 1 and 128 characters.
|
|
Characters like "*", "&", and ":" are not allowed in the username
|
|
and password strings.</p>
|
|
<p>The default username <b>administrator</b> and the default password
|
|
<b>partner</b> must be changed immediately after installation.</p>
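<p>The length and character rules for usernames and passwords can be expressed as a small Python check. This is a sketch under stated assumptions: the function name is hypothetical, and the forbidden set contains only the three characters named in this help text, although the real list may be longer.</p>

```python
# Only the characters named in the help text; the actual list may include others
FORBIDDEN_CHARACTERS = set("*&:")


def is_valid_credential(value):
    """Return True if value has 1 to 128 characters and no forbidden character."""
    length_ok = len(value) in range(1, 129)
    characters_ok = FORBIDDEN_CHARACTERS.isdisjoint(value)
    return length_ok and characters_ok


accepted = is_valid_credential("operator1")
rejected_for_colon = is_valid_credential("root:admin")
rejected_for_length = is_valid_credential("x" * 129)
```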
|
|
|
|
<A NAME="setup_global_event_cfg"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Global > Global Configuration</b>
|
|
<P>An <I>event</I> is a happening or an occurrence that takes place on the
|
|
system that SGI Embedded Support Partner is monitoring. A few examples of
|
|
events follow: parity errors, disk full, nonmaskable interrupts (NMI), and
|
|
even activities of the SGI Embedded Support Partner itself.</P>
|
|
<P>Use this page if you want to reset the following parameters for all events on the system.
|
|
<UL>
|
|
<LI>The <b>Log events</b> parameter enables or disables global
event logging. Select the <b>Log events</b> checkbox to log events in the SGI
Embedded Support Partner database. Deselect <b>Log events</b> if you
do not want to log events in the SGI Embedded Support Partner database.
You can disable event logging if you are not interested in the history of
events on the system.<p>
|
|
|
|
<LI>The <b>Throttle events</b> parameter enables or disables event
registration requirements for all events. Select the <b>Throttle events</b> checkbox to require that a
specific number of events (a threshold) must occur before the event is registered in the
SGI Embedded Support Partner database. Deselect the <b>Throttle events</b> checkbox to register
every event in the SGI Embedded Support Partner database. Enable event throttling
if you are not interested in every event of a particular type, but only in cases
where the event occurs a specified number of times.<p>
|
|
<LI>The <b>Act on event</b> parameter enables or disables the capability of SGI Embedded Support
Partner to respond to events. Select the <b>Act on event</b> checkbox to specify that the
SGI Embedded Support Partner should respond to all events.
Deselect the <b>Act on event</b> checkbox to specify that the SGI Embedded Support Partner should
not respond to any events.
|
|
<p>Note: Refer to the SETUP > Events and the SETUP > Actions menus for
|
|
additional information about events and actions.
|
|
</UL>
|
|
<p>Note: The Global Configuration settings override individual event settings.</p>
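<p>The effect of event throttling can be illustrated with a Python sketch. The class below is hypothetical and assumes that the occurrence counter resets after each registration; the help text does not state how the Event Manager counts occurrences internally.</p>

```python
class EventThrottle:
    """Register an event only after a threshold number of occurrences."""

    def __init__(self, threshold=1, throttle_enabled=True):
        self.threshold = threshold                # occurrences required per registration
        self.throttle_enabled = throttle_enabled  # models the global Throttle events checkbox
        self.pending = 0                          # occurrences since the last registration

    def occur(self):
        """Record one occurrence; return True if it causes a registration."""
        if not self.throttle_enabled:
            return True  # throttling off: every event is registered
        self.pending += 1
        if self.pending >= self.threshold:
            self.pending = 0  # assumption: counting restarts after a registration
            return True
        return False


throttled = EventThrottle(threshold=3)
registrations = [throttled.occur() for _ in range(6)]
unthrottled = EventThrottle(threshold=3, throttle_enabled=False)
```

<p>With a threshold of 3, only the third and sixth occurrences are registered in this sketch. With throttling disabled, every occurrence is registered regardless of the threshold.</p>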
|
|
|
|
<A NAME="setup_events_viewcurr"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Events > View Current Setup</b>
|
|
<P>Because the number of events can be extensive, events are divided into sets called <I>classes</I>. This scheme simplifies
|
|
the management of events, enables more efficient use of displays, and facilitates navigation within
|
|
the program.</P>
|
|
<p>The following options are available:</p>
|
|
<ul>
|
|
<li>View Event
|
|
<li>View Event List
|
|
<li>View Classes
|
|
</ul>
|
|
<a name="setup_view_event"></a>
|
|
<p><b>View Event</b>
|
|
<p>Use this option to determine the current setting of an individual event. This option allows you to view:
|
|
<ul>
|
|
<li>Event class ID and class name
|
|
<li>Event ID and event description
|
|
<li>Event registration
|
|
<Li>Number of events that must occur per registration
|
|
<li>Actions for the specified event
|
|
</ul>
|
|
<a name="setup_view_event_list"></a>
|
|
<p><b>View Event List</b></p>
|
|
<p>Use this option when you want to obtain a list of all events compatible with the SGI Embedded Support Partner.
|
|
The report allows you to view:</p>
|
|
<ul>
|
|
<li>Class Name
|
|
<li>Event Description
|
|
</ul>
|
|
<a name="setup_view_classes"></a>
|
|
<p><b>View Classes</b></p>
|
|
<p>Use this option when you want to view all classes available on the system. The report allows you to view:</p>
|
|
<ul>
|
|
<LI>Class ID
|
|
<LI>Class Name
|
|
</ul>
|
|
|
|
<A NAME="screens_setup_events_update"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Events > Update</b>
|
|
<p>Use this command to update (change the settings for) an existing event.
Only one event at a time can be updated from the ASCII console.
|
|
|
|
<A NAME="screens_setup_events_update_settings"></A>
|
|
<p><b>SETUP > Events > Update > Change Settings</b>
|
|
|
|
<p>1. Set the checkmark to enable the registration of the chosen event with SGI Embedded Support Partner.
Remove the checkmark to disable the registration of the chosen event with SGI Embedded Support Partner.

<p>2. Enter the number of events that must occur before registration begins.

<p>3. Select the <b>Accept</b> button to apply your changes.

<p>4. Select the <b>Change Action Settings</b> link to change the action(s) that will be taken upon the
occurrence of the chosen event.

<p>5. Select the <b>Return to Update > Select Event page</b> link to select another event.
|
|
|
|
<A NAME="setup_events_update_event_actions"></A>
|
|
<p><b>SETUP > Events > Update Actions</b></p>
|
|
<P>An <b>event/action assignment</b> defines the action that the SGI Embedded
|
|
Support Partner performs when it registers a specific event. An <b>event/action</b>
|
|
is a cause-and-effect relationship between an event and an ensuing action.
|
|
Use this command to modify an event/action assignment; that is, to replace,
|
|
add, or delete event/action assignments.</P>
|
|
|
|
<p>To update an event/action relationship:

<p>1. Select the event for which you want to update the action assignment.</P>
<p>2. Select the <b>Change Action Settings</b> link on the <b>SETUP > Events > Update > Change Settings</b> page.
The list of actions that are currently available is displayed.</p>
<p>3. Select the actions that you want to assign to the chosen event.</p>
<p>4. Select the <b>Accept</b> button to assign the selected actions.</p>
<p>5. Select the <b>Return to Update Event page</b> link to return to the <b>SETUP > Events > Update > Change Settings</b> page.</p>
|
|
|
|
<A NAME="setup_events_add"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Events > Add</b>
|
|
<P>Use this command to add new events for the SGI Embedded Support
|
|
Partner to monitor.
|
|
<pre>
|
|
To add a new event:

1. Use the provided listbox to specify the existing class
   to which you want to add the new event

   OR

   Set the checkmark if you want to create a new class for this event,
   and enter the new class name in the next input field.

   <b>Note</b>. The checkmark must be removed in order to add
   the new event to an existing class.

2. Enter a name for the new event

3. Enter a description of the event to be shown in the interface

4. Set the checkmark to enable the registration of this event with
   SGI Embedded Support Partner

5. Enter the number of events that must occur before registration begins

6. Press the <b>Accept</b> button to add the new event

   OR

   Press the <b>Clear</b> button to clear the fields and start from the beginning.
|
|
</pre>
|
|
<a name="setup_events_delete"></a>
|
|
<hr width=100%>
|
|
<p><b>SETUP > Events > Delete Custom Events</b></p>
|
|
<p>Use this command to delete custom event(s) from the SGI Embedded Support Partner.
All records and information associated with these events are also deleted.
Empty classes are deleted automatically.
|
|
|
|
<pre>
|
|
To select the event(s) to delete:

Press the <b>Show all custom events</b> button
to display the list of all custom events

OR

Choose the event class and
press the <b>Show custom events for selected class</b> button
to display the list of all custom events for the selected class.
|
|
</pre>
|
|
<a name="setup_events_delete1"></a>
|
|
<pre>
|
|
Set checkmarks for the event(s) that you want to delete.
|
|
|
|
Press <b>Delete Selected Events</b> button.
|
|
</pre>
|
|
|
|
<A NAME="setup_actions_viewcurr"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Actions > View Current Setup</b>
|
|
<P>Use this command to view the current configuration of actions. The following options are available:</P>
|
|
<pre>
|
|
View Action Setup             Displays the configuration information
                              for a specific action

View Available Actions List   Displays a table of all actions
                              that are currently available
|
|
</pre>
|
|
<A NAME="screens_action_setup"></A>
|
|
<p><b>View Action Setup</b></p>
|
|
<p>Choose the action whose information you want to view.</p>
|
|
<p>This option allows you to view the following action information:</p>
|
|
<ul>
|
|
<li>Action command string - an exact command that will execute
|
|
<li>Action description - simple description of the action
|
|
<li>User account that executes the action, such as nobody or guest (default = nobody)
|
|
<li>Action timeout (default = 600 seconds)
|
|
<li>Number of times the event must be registered before an action will be taken (default = 1)
|
|
<li>Retry times (default = 0)
|
|
</ul>
|
|
|
|
<A NAME="screens_avail_action"></A>
|
|
<p><b>View Available Actions List</b></p>
|
|
<P>This report displays all actions that are currently available.
|
|
The table includes the following information:</P>
|
|
<ul>
|
|
<li>Action order number
|
|
<li>Action command string - an exact command that will execute
|
|
<li>Action description - simple description of the action
|
|
</ul>
|
|
|
|
<A NAME="setup_actions_update"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Actions > Update</b>
|
|
<P>Use this command to update an existing action.</p>
|
|
<p>Select an action that you want to update. You can modify all of the action parameters, except the action description:</p>
|
|
<pre>
Actual action command string      Specifies the command that the action
                                  executes.

A username to execute the action  Specifies the user account that the SGI
as (Default = nobody)             Embedded Support Partner uses to execute
                                  the command.

Action timeout                    Specifies the time period for which the
                                  action can run without being killed.
                                  The value that you specify must be a
                                  multiple of 5. (Default = 600 seconds)

The number of times that          Specifies how many times the event must be
the event must be registered      registered before the SGI Embedded Support
before an action will be taken    Partner performs this action.

The number of retry times         Specifies the number of times that the SGI
                                  Embedded Support Partner attempts to
                                  execute the action before it stops.
                                  The value cannot exceed 23; however, it is
                                  not recommended to set it greater than 4.
</pre>
|
|
<a name="setup_action_example"></a>
|
|
<pre>
For example:

action to run is                  <b>diagnostic</b>

username to execute an action     <b>nobody</b>

action timeout                    <b>3600</b>

the number of times that          <b>5</b>
the event must be registered
before an action will be taken

the number of retry times         <b>2</b>
</pre>
|
|
<p>This diagnostic will run after the event is registered in the SGI Embedded Support Partner database 5 times.
|
|
It will be executed with nobody privileges. If the diagnostic is still running after an hour (3600 seconds),
|
|
it will be killed and restarted a second time (retry times = 2).
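<p>The limits described for these parameters can be sketched as a small check. This is only an illustration: the
<b>ActionSettings</b> class and the <b>check_settings</b> function are hypothetical names, not part of the SGI
Embedded Support Partner, and the messages are invented for the example.</p>
<pre>
```python
from dataclasses import dataclass

MAX_RETRIES = 23          # upper limit stated in the parameter table
RECOMMENDED_RETRIES = 4   # larger values are discouraged
TIMEOUT_STEP = 5          # the timeout must be a multiple of 5


@dataclass
class ActionSettings:
    """Hypothetical container for the fields on the Update page."""
    command: str
    user: str = "nobody"
    timeout_seconds: int = 600
    event_threshold: int = 1
    retries: int = 0


def check_settings(settings):
    """Return (errors, warnings) as two lists of strings."""
    errors, warnings = [], []
    if settings.timeout_seconds % TIMEOUT_STEP != 0:
        errors.append("timeout must be a multiple of 5")
    if settings.retries > MAX_RETRIES:
        errors.append("retry times cannot exceed 23")
    elif settings.retries > RECOMMENDED_RETRIES:
        warnings.append("retry times greater than 4 is not recommended")
    return errors, warnings


# The values from the example above pass both checks.
example = ActionSettings("diagnostic", "nobody", 3600, 5, 2)
print(check_settings(example))
```
</pre>
<p>With the example values (timeout 3600, retry times 2), both lists come back empty.</p>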
|
|
|
|
<A NAME="setup_actions_add"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Actions > Add</b>
|
|
<P>Use this command to add a new action. The following options are available:</P>
|
|
<pre>
Action description                Provides a description of the action.
                                  Example: page to John Doe

Action command string             Specifies the exact action command
                                  to execute.
                                  Example: /usr/bin/espnotify -p 1234567

Username to execute the action    Specifies the user account that the SGI
as (default = nobody)             Embedded Support Partner uses to execute
                                  the command. (Default = nobody)

Action timeout                    Specifies the time period for which the
                                  action can run without being killed.
                                  The value that you specify must be a
                                  multiple of 5. (Default = 600 seconds)

The number of times an event      Specifies how many times the event must
must be registered before an      be registered before the SGI Embedded
action will be taken              Support Partner performs this action.

The number of retry times         Specifies the number of times that
                                  the SGI Embedded Support Partner attempts
                                  to execute the action before it stops.
                                  The value cannot exceed 23; however,
                                  it is not recommended that you set it
                                  greater than 4.
</pre>
|
|
<pre>
For example:

action to run is                  <b>diagnostic</b>

username to execute an action     <b>nobody</b>

action timeout                    <b>3600</b>

the number of times that          <b>5</b>
the event must be registered
before an action will be taken

the number of retry times         <b>2</b>
</pre>
|
|
<p>This diagnostic will run after the event is registered in the SGI Embedded Support Partner database 5 times.
|
|
It will be executed with nobody privileges. If the diagnostic is still running after an hour (3600 seconds),
|
|
it will be killed and restarted a second time (retry times = 2).
|
|
|
|
<P>Examples of notification options:</p>
|
|
<ul>
|
|
<li>/usr/bin/espnotify -E your_email@sgi.com,your_email2@sgi.com -e email_subject (email notification)
|
|
<li>/usr/bin/espnotify -p pager_id (pager notification)
|
|
<li>/usr/bin/espnotify -A message_string (display message on the console)
|
|
<li>/usr/bin/espnotify -D your_system_name:0.0 -c %D (graphical pop-up window with an event data in it)
|
|
</ul>
|
|
|
|
<p>For more information regarding notification options, refer to the <b>espnotify</b> man page.</p>
|
|
<p>The following list includes the accepted user format strings and any action-specific options:</p>
|
|
<ul>
|
|
<li>%C = event class
|
|
<li>%T = event type
|
|
<li>%D = event data (this is the data received along with the event.)
|
|
<li>%H = host name from which event originated
|
|
<li>%S = event time stamp, time the event occurred (in seconds since Jan 1 1970)
|
|
<li>%F = forwarder hostname (in case of DSM.)
|
|
<li>%I = sys id
|
|
<li>%t = time string (current)
|
|
<li>%s = seconds since Jan 1 1970 (current)
|
|
<li>%m = current minute of the hour 0-59 (current)
|
|
<li>%M = current month of the year 0-11 (current)
|
|
<li>%h = current hour of the day 0-23 (current)
|
|
<li>%y = current year (current)
|
|
<li>%d = day of the month (current)
|
|
</ul>
|
|
<p>For example: /usr/bin/espnotify -D system_name.sgi.com:0.0 -c %D</p>
|
|
<p>This displays a window on the machine system_name.sgi.com. The window contains data that is significant to the event.</p>
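<p>To make the substitution idea concrete, the following is a rough sketch of how format strings such as %D or %H
could be expanded in a command string. It does not reproduce how the SGI Embedded Support Partner performs the
expansion internally; the function name <b>expand_format</b> and the keys of the <b>event</b> dictionary are
assumptions made for this example, and only some of the codes from the list are handled.</p>
<pre>
```python
import re
import time


def expand_format(template, event, now=None):
    """Replace %X codes in template with event or clock values."""
    now = time.time() if now is None else now
    clock = time.localtime(now)
    values = {
        "C": event.get("class", ""),      # event class
        "T": event.get("type", ""),       # event type
        "D": event.get("data", ""),       # event data
        "H": event.get("host", ""),       # originating host name
        "S": event.get("timestamp", ""),  # event time stamp
        "s": int(now),                    # current seconds since Jan 1 1970
        "m": clock.tm_min,                # current minute, 0-59
        "M": clock.tm_mon - 1,            # current month, 0-11
        "h": clock.tm_hour,               # current hour, 0-23
        "d": clock.tm_mday,               # current day of the month
    }

    def substitute(match):
        code = match.group(1)
        # Codes that this sketch does not handle are left untouched.
        return str(values.get(code, match.group(0)))

    return re.sub(r"%([A-Za-z])", substitute, template)


event = {"class": 4000, "type": 200105, "data": "disk full", "host": "host1"}
print(expand_format("/usr/bin/espnotify -c %D", event))
```
</pre>
<p>A real implementation would also need to quote the substituted values before handing the command to a shell;
that step is left out here.</p>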
|
|
|
|
<A NAME="setup_actions_delete"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Actions > Delete</b>
|
|
<P>Use this command to delete an action. Choose an action that you want to delete.</p>
|
|
<p>Note: The action will be deleted from the SGI Embedded Support Partner database. If the action is
assigned to any events, a list of all affected events is displayed, and you can choose to cancel or proceed with the deletion.
Press the <b>Yes</b> button to delete the action and remove it from all events to which
it is assigned. To cancel the operation, return to the previous page.
|
|
|
|
<A NAME="setup_notification"></a>
|
|
<hr width=100%>
|
|
<b>SETUP > Paging</b>
|
|
<p>Use the <b>espnotify</b> action to deliver a text or numeric message to a pager by specifying the appropriate
command line options. For more information about espnotify, use the <b>man espnotify</b> command.</p>
<p>Paging must be configured before it can work properly. The SGI Embedded Support Partner provides a user interface for setting the required
configuration parameters. All of the parameters are written to the <b>/etc/qpage.cf</b> file.</p>
|
|
<p><b>Paging</b> requires that a modem be connected to the system to dial the paging service provider
|
|
to deliver a page. The Modem/Admin section enables modem configuration. The Service section enables configuration of the parameters
|
|
of the Paging Service Provider(s). Because the service provider normally identifies each individual pager by means of a pager ID
|
|
(which does not have to be the pager Touch-tone number), a pager ID must be provided in order to deliver the page. The Pager section
|
|
enables you to configure different pagers that are associated with the Service.</p>
|
|
<A NAME="setup_notification_viewcurr"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Paging > View Current Setup</b>
|
|
<P>Use this command to display the current values of the paging parameters
|
|
and the following types of information:</P>
|
|
<UL>
|
|
<LI><b>espnotify</b> Administration Variables
|
|
<LI>Modem Setup Parameters
|
|
<LI>Services Setup Parameters
|
|
<LI>Pager Setup Parameters
|
|
</UL>
|
|
<A NAME="setup_notification_modem"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Paging > Modem/admin</b>
|
|
<P>You can configure the following <b>Modem setup</b> parameters:</P>
|
|
<p><b>Modem name</b><br>
|
|
Specifies a unique name that the SGI Embedded
|
|
Support Partner uses to identify a modem. Entering an existing modem name updates the settings for that modem.
|
|
No spaces are allowed.
|
|
|
|
<p><b>Modem device</b><br>
|
|
Specifies the device to which the modem is connected (for example, <b>/dev/ttya</b>)
|
|
|
|
<p><b>Modem initialization command</b><br>
|
|
Specifies the command that the SGI Embedded Support Partner should use to initialize the modem
|
|
before dialing the Service Provider. These initialization commands are modem specific and are available
|
|
in your modem manual. For example, many paging services require that error correction be turned off on your modem.
|
|
For some modems, this can be done by including &amp;A0&amp;K0&amp;M0 in the modem initialization command.
|
|
|
|
<P>You can configure the following <b>Administration Setup</b> parameters:</P>
|
|
<pre>
Administrator's e-mail address    Specifies the e-mail address of
                                  the person to contact if <b>Paging</b>
                                  fails to deliver a page

The time interval for retrying    Specifies the amount of time that
                                  <b>espnotify</b> should wait between retries
</pre>
|
|
<A NAME="setup_notification_service"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Paging > Service</b>
|
|
<P>Use this command to set up information about a paging service.</P>
|
|
<P>You can configure the following parameters:</P>
|
|
|
|
<p><b>Service name</b><br>
|
|
Specifies the unique name that the SGI Embedded Support Partner uses to identify a paging service provider.
Entering an existing service name updates the settings for that service. No spaces are allowed.
|
|
|
|
<p><b>Device</b><br>
|
|
Specifies the device (modem name) that the SGI Embedded Support Partner should use to dial the service provider.
|
|
Use <a href="#setup_notification_modem">SETUP > Paging > Modem/Admin</a> to set up any modems.
|
|
|
|
<p><b>Maximum number of retries</b><br>
|
|
Specifies the maximum number of times the SGI Embedded Support Partner should attempt to access this service
|
|
before it quits trying.
|
|
|
|
<p><b>Maximum length of the message</b><br>
|
|
Specifies the maximum number of characters that can be sent using this service. This depends on your service provider.
|
|
|
|
<p><b>Phone number of the paging service</b><br>
|
|
Specifies the IXO/TAP telephone number of the Service Provider. Do not confuse your pager's Touch-tone telephone
|
|
number with the service provider's IXO/TAP telephone number. They are never the same.<p>The telephone number
should contain at least 7 digits and should not include any spaces, "-", or other symbols.
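<p>The rule for the telephone number (at least 7 digits, nothing but digits) can be expressed as a one-line check.
The function name <b>is_valid_service_number</b> is hypothetical and is used here only to illustrate the rule.</p>
<pre>
```python
def is_valid_service_number(number):
    """True if number has at least 7 characters and all are ASCII digits."""
    return number.isascii() and number.isdigit() and len(number) >= 7


for candidate in ("18005551234", "555-1234", "555 1234", "123456"):
    print(candidate, is_valid_service_number(candidate))
```
</pre>
<p>Only the first candidate passes; the others contain a symbol, contain a space, or are too short.</p>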
|
|
|
|
<A NAME="setup_notification_pager"></A>
|
|
<hr width=100%>
|
|
<b>SETUP > Paging > Pager</b>
|
|
<P>Use this command to set up a specific pager.</P>
|
|
<P>You can configure the following parameters:</P>
|
|
<pre>
Pager Name      Specifies a unique name to identify this pager.

Pager ID        Specifies the ID that is used by your paging
                service provider to identify the pager.
                The ID is not necessarily the touch-tone
                phone number that you dial to access the pager.
                Contact your service provider to get
                this information.

Service Name    Specifies the paging service (service name) to which
                <b>espnotify</b> should deliver the page for this pager.
                Use <a href="#setup_notification_service">SETUP > Paging > Service</a>
                to set up any paging services that you want to use.
</pre>
|
|
|
|
<A NAME="setup_availmon"></a>
|
|
<hr width=100%>
|
|
<b>SETUP > Availability Monitoring</b>
|
|
<p>Availability Monitoring is a set of tools that collectively monitor and report system availability
and the diagnosis of system crashes. Availability monitoring tools gather information from diagnostic
|
|
programs like ICRASH, FRU Analyzer, SYSLOG and identify the cause of system shutdowns. The system configuration
|
|
information comes from configmon, hinv and versions. Availability monitoring tools can report data to various locations
|
|
based on the Availability MailList setting.
|
|
|
|
<A NAME="setup_availmon_viewcurr"></a>
|
|
<hr width=100%>
|
|
<b>SETUP > Availability Monitoring > View Current Setup</b>
|
|
<P>Use this command to view the current values of the availability monitor parameters. It displays the
|
|
following information:</P>
|
|
<UL>
|
|
<LI>General availability monitoring parameters
|
|
<LI>Availability monitoring e-mail list parameters
|
|
</UL>
|
|
<A NAME="setup_availmon_configuration"></a>
|
|
<hr width=100%>
|
|
<b>SETUP > Availability Monitoring > Configuration</b>
|
|
<P>Use this command to set up the <font face="Courier"><TT>availability monitor</TT></font> component of
|
|
the SGI Embedded Support Partner.</P>
|
|
<P>You can configure the following parameters:</P>
|
|
<pre>
|
|
<b>Automatic e-mail distribution</b> (Enable or Disable)
|
|
Specifies whether <b>availability monitor</b> should
|
|
automatically distribute reports by e-mail.
|
|
|
|
<b>Display of shutdown reason</b> (Enable or Disable)
|
|
Specifies whether <b>availability monitor</b> should
|
|
display the reason for a shutdown
|
|
|
|
<b>Include HINV information into e-mail</b> (Yes or No)
|
|
Specifies whether <b>availability monitor</b> should
|
|
include HINV information in the diagnostic e-mail
|
|
messages that it generates.
|
|
|
|
<b>Capturing of important system messages</b> (Enable or Disable)
|
|
Specifies whether <b>availability monitor</b> should
|
|
capture important system messages.
|
|
|
|
<b>Start uptime daemon</b> (Yes or No)
|
|
Specifies whether <b>availability monitor</b> should
|
|
start the uptime daemon
|
|
|
|
<b>Number of days between status updates</b> (0 - 300, default = 60)
|
|
<b>Availability monitor</b>, using eventmond, periodically sends a status
|
|
report if the system is up for an extended period of time. This value
|
|
specifies the number of days after which a status report should be sent.
|
|
|
|
<b>Interval in seconds between uptime checks</b> (User specified)
|
|
Specifies the number of seconds that <b>event monitor</b>
|
|
should wait before it performs an uptime check on the system.
|
|
(default = 300 seconds)
|
|
</pre>
|
|
<A NAME="setup_availmon_email"></a>
|
|
<hr width=100%>
|
|
<b>SETUP > Availability Monitoring > e-mail List</b>
|
|
<P>Use this command to set up the e-mail lists for availability information reports.</P>
|
|
<P>You can set up e-mail lists for the following reports:</P>
|
|
<UL>
|
|
<LI>Availability report in text format
|
|
<LI>Availability report in compressed format
|
|
<LI>Availability report in compressed format (encrypted)
|
|
<LI>Diagnostic report in text format
|
|
<LI>Diagnostic report in compressed format
|
|
<LI>Diagnostic report in compressed format (encrypted)
|
|
<li>Pager report in concise text form
|
|
</UL>
|
|
<P>The availability report contains computed system availability metrics.</P>
|
|
<P>The diagnostic report includes all of the availability report data and
|
|
diagnostic data for troubleshooting.</P>
|
|
|
|
<a name="setup_performance_viewcurr"></a>
|
|
<hr width=100%>
|
|
<b>SETUP > Performance Monitoring > View Current Setup</b>
|
|
<p>All performance rules can be enabled or disabled through the user interface. Use this command to display the status of the performance rules.</p>
|
|
<p>The report table displays the following information:</p>
|
|
<ul>
|
|
<li>PMIE Rule Description
|
|
<li>PMIE Rule
|
|
<li>Status (enabled/disabled)
|
|
</ul>
|
|
|
|
<a name="setup_performance_config"></a>
|
|
<hr width=100%>
|
|
<b>SETUP > Performance Monitoring > Configuration</b>
|
|
<p>There is a set of rules available to set up for performance monitoring.</p>
|
|
<p>The table below provides a short description for each rule:</p>
|
|
|
|
<p><pre> <b>cpu.context_switch</b> High aggregate context switch rate</pre>
|
|
|
|
Average number of context switches per CPU per second exceeded threshold over the past sample interval.
|
|
|
|
<p><pre> <b>cpu.excess_fpe</b> Possible high floating point exception rate</pre>
|
|
|
|
This predicate attempts to detect processes generating very large
|
|
numbers of floating point exceptions (FPEs). Characteristic of
|
|
this situation is heavy system time coupled with low system call
|
|
rates (exceptions are delivered through the kernel to the process,
|
|
taking some system time, but no system call is serviced on the
|
|
application's behalf).
|
|
|
|
<p><pre> <b>cpu.load_average</b> High 1-minute load average</pre>
|
|
|
|
The current 1-minute load average is higher than the larger of
|
|
min_load and ( per_cpu_load times the number of CPUs ).
|
|
The load average measures the number of processes that are running,
|
|
runnable or soon to be runnable (i.e. in short term sleep).
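<p>The comparison described for cpu.load_average can be written out as follows. This is a sketch of the stated
condition only, not the rule's actual source; the function name and the default values for <b>min_load</b> and
<b>per_cpu_load</b> are placeholders chosen for the example.</p>
<pre>
```python
def load_average_exceeded(load_1min, cpu_count, min_load=4.0, per_cpu_load=2.0):
    """True if the 1-minute load average is above the effective limit."""
    # The limit is the larger of min_load and per_cpu_load times the CPU count.
    limit = max(min_load, per_cpu_load * cpu_count)
    return load_1min > limit


print(load_average_exceeded(5.0, cpu_count=1))  # limit is 4.0
print(load_average_exceeded(5.0, cpu_count=4))  # limit is 8.0
```
</pre>
<p>On small systems min_load dominates the limit; on larger systems the limit scales with the number of CPUs.</p>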
|
|
|
|
<p><pre> <b>cpu.low_util</b> Low average processor utilization</pre>
|
|
|
|
The average processor utilization over all CPUs was below threshold
|
|
percent during the last sample interval.
|
|
This rule is effectively the opposite of cpu.util and is disabled by
|
|
default - it is only useful in specialized environments where, for
|
|
example, processing is batch oriented and low processor utilization
|
|
is indicative of poor use of system resources. In such a situation
|
|
the cpu.low_util rule should be enabled, and cpu.util disabled.
|
|
|
|
<p><pre> <b>cpu.syscall</b> High aggregate system call rate</pre>
|
|
|
|
Average number of system calls per CPU per second exceeded
|
|
threshold over the past sample interval.
|
|
|
|
|
|
<p><pre> <b>cpu.system</b> Busy executing in system mode</pre>
|
|
Over the last sample interval, the average utilization per CPU was
|
|
busy percent or more, and the ratio of system time to busy time
|
|
exceeded threshold percent.
|
|
|
|
<p><pre> <b>cpu.util</b> High average processor utilization</pre>
|
|
|
|
The average processor utilization over all CPUs exceeded threshold
|
|
percent during the last sample interval.
|
|
|
|
|
|
<p><pre> <b>craylink.node_cb_errs</b> CrayLink checkbit errors on Origin node</pre>
|
|
|
|
For some Origin 2000 node, at least one checkbit error was
|
|
observed on the node (CrayLink) interface and/or the I/O interface in the last sample interval. Use the command
|
|
<br><center>$ pminfo -f hinv.map.node</center> to discover the abbreviated PCP names of the installed nodes and
|
|
their corresponding full names in the <b>/hw</b> file system.
|
|
|
|
<p><pre> <b>craylink.router_cb_errs</b> CrayLink checkbit errors on Origin router</pre>
|
|
|
|
For some CrayLink router port, at least one checkbit error was
|
|
observed in the last sample interval. Use the command
|
|
<br><center>$ pminfo -f hinv.map.routerport</center>
|
|
to discover the abbreviated PCP names of the installed router ports
|
|
and their corresponding full names in the <b>/hw</b> file system.
|
|
|
|
|
|
<p><pre> <b>filesys.buffer_cache</b> Low buffer cache read hit ratio</pre>
|
|
|
|
Some filesystem read activity (at least min_lread Kbytes per
|
|
second of logical reads), and the read hit ratio in the buffer
|
|
cache is below threshold percent. Note: It is possible for the read hit ratio to be negative
(more physical reads than logical reads) - this can be as a result of:
|
|
<ul><li>XLV striped volumes, where blocks span stripe boundaries
|
|
<li>very large files, where the disk controller has to read
|
|
blocks indirectly (multiple block reads to find a single
|
|
data block result)
|
|
<li>file system read-ahead pre-fetching blocks which are not
|
|
subsequently read
|
|
</ul>
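<p>A short calculation shows how the read hit ratio can become negative. The formula below is the conventional
definition of a buffer cache read hit ratio and is assumed here rather than taken from the rule itself; the
function names and the default values for <b>min_lread</b> and <b>threshold</b> are placeholders.</p>
<pre>
```python
def read_hit_ratio(logical_kb, physical_kb):
    """Percentage of logical reads that did not need a physical read."""
    return 100.0 * (logical_kb - physical_kb) / logical_kb


def buffer_cache_low(logical_kb, physical_kb, min_lread=512.0, threshold=80.0):
    """True if there is enough read activity and the hit ratio is low."""
    if logical_kb == 0:
        return False  # no read activity, nothing to evaluate
    enough_activity = logical_kb >= min_lread
    return enough_activity and threshold > read_hit_ratio(logical_kb, physical_kb)


print(read_hit_ratio(1000, 100))  # most reads served from the cache
print(read_hit_ratio(100, 150))   # more physical than logical reads
```
</pre>
<p>The second call returns a negative percentage because 150 Kbytes were read physically to satisfy 100 Kbytes of logical reads.</p>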
|
|
|
|
<p><pre> <b>filesys.dnlc_miss</b> High directory name cache miss rate</pre>
|
|
|
|
With at least min_lookup directory name cache (DNLC) lookups per
|
|
second being performed, threshold percent of lookups result in
|
|
cache misses.
|
|
|
|
|
|
<p><pre> <b>filesys.filling</b> File system is filling up</pre>
|
|
Filesystem is at least threshold percent full and the used space
|
|
is growing at a rate that would see the file system full within
|
|
lead_time.
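<p>The two conditions of filesys.filling (fullness and projected time until full) can be sketched from two samples of
used space. This is an illustration under assumptions: the function name, the linear growth estimate, and the default
values for <b>threshold_pct</b> and <b>lead_time_sec</b> are not taken from the rule itself.</p>
<pre>
```python
def filesystem_filling(capacity_kb, used_before_kb, used_now_kb, interval_sec,
                       threshold_pct=95.0, lead_time_sec=900.0):
    """True if the file system is nearly full and would fill within lead time."""
    percent_full = 100.0 * used_now_kb / capacity_kb
    growth = (used_now_kb - used_before_kb) / interval_sec  # Kbytes per second
    if percent_full >= threshold_pct and growth > 0:
        seconds_until_full = (capacity_kb - used_now_kb) / growth
        return lead_time_sec >= seconds_until_full
    return False


# 96 percent full and growing by 10 Kbytes per minute: full in 240 seconds.
print(filesystem_filling(1000, 950, 960, interval_sec=60))
```
</pre>
<p>A file system that is nearly full but not growing does not satisfy the condition, because it would never reach capacity at the observed rate.</p>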
|
|
|
|
|
|
<p><pre> <b>memory.exhausted</b> Severe demand for real memory</pre>
|
|
|
|
The system is swapping modified pages out of main memory to the
|
|
swap partitions, and has been doing this at the rate of at least
|
|
threshold pages swapped out per second for at least pct of the last
|
|
10 samples, i.e., sustained page-out activity.
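<p>The "pct of the last 10 samples" condition can be illustrated with a short function. The name
<b>memory_exhausted</b> and the default values for <b>threshold</b> and <b>pct</b> are placeholders for this
sketch, which only mirrors the wording of the description above.</p>
<pre>
```python
def memory_exhausted(swap_out_rates, threshold=10.0, pct=90.0):
    """True if enough of the last 10 samples reach the swap-out threshold."""
    recent = swap_out_rates[-10:]
    if not recent:
        return False
    over = sum(1 for rate in recent if rate >= threshold)
    return 100.0 * over / len(recent) >= pct


print(memory_exhausted([20.0] * 9 + [0.0]))      # 9 of 10 samples over
print(memory_exhausted([20.0] * 8 + [0.0] * 2))  # 8 of 10 samples over
```
</pre>
<p>Requiring a percentage of recent samples, rather than a single sample, is what distinguishes sustained page-out activity from a brief spike.</p>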
|
|
|
|
|
|
<p><pre> <b>memory.swap_low</b> Low free swap space</pre>
|
|
|
|
There is only threshold percent swap space remaining - the system
|
|
may soon run out of virtual memory. Reduce the number and size of
|
|
the running programs or add more swap(1) space before it completely
|
|
runs out.
|
|
|
|
<p><pre> <b>network.buffers</b> Serious demand for network buffers</pre>
|
|
|
|
During the last sample interval the rate at which processes tried to
|
|
acquire network buffers (mbufs) and either failed or were stalled
|
|
waiting for a buffer to be freed is greater than threshold times per
|
|
second.
|
|
|
|
<p><pre> <b>network.tcp_drop_connects</b> High ratio of TCP connections dropped</pre>
|
|
|
|
There is some TCP connection activity (at least min_close
|
|
connections closed per minute) and the ratio of TCP dropped
|
|
connections to all closed connections exceeds threshold percent
|
|
during the last sample interval. High drop rates indicate either
|
|
network congestion (check the packet retransmission rate) or an
|
|
application like a Web browser that is prone to terminating TCP
|
|
connections prematurely, perhaps due to sluggish response or user
|
|
impatience.
|
|
|
|
|
|
<p><pre> <b>network.tcp_retransmit</b> High number of TCP packet retransmissions</pre>
|
|
|
|
There is some network output activity (at least 100 TCP packets per
|
|
second) and the average ratio of retransmitted TCP packets to output
|
|
TCP packets exceeds threshold percent during the last sample
|
|
interval. High retransmission rates are suggestive of network congestion, or
|
|
long latency between the end-points of the TCP connections.
|
|
|
|
<p><pre> <b>per_cpu.context_switch</b> High per CPU context switch rate</pre>
|
|
|
|
The number of context switches per second for at least one CPU
|
|
exceeded threshold over the past sample interval. This rule applies only to multi-processor systems; for
single-processor systems, refer to the cpu.context_switch rule. For Origin 200 and Origin 2000 systems, use the command
|
|
<br><center>$ pminfo -f hinv.map.cpu</center> to discover the abbreviated PCP names of the installed CPUs and
|
|
their corresponding full names in the <font face="Courier"><tt>/hw</tt></font> file system.
|
|
|
|
<p><pre> <b>per_cpu.many_util</b> High number of saturated processors</pre>
|
|
|
|
The processor utilization for at least pct percent of the CPUs
|
|
exceeded threshold percent during the last sample interval. Only applies to multi-processor systems having more than min_cpu_count
|
|
processors - for single-processor systems refer to the cpu.util rule, for multi-processor systems with less than min_cpu_count
|
|
processors refer to the per_cpu.some_util rule.
|
|
|
|
<p><pre> <b>per_cpu.some_util</b> High per CPU processor utilization</pre>
|
|
|
|
The processor utilization for at least one CPU exceeded threshold
|
|
percent during the last sample interval. Only applies to multi-processor systems with fewer than max_cpu_count processors -
for single-processor systems refer to the cpu.util rule, and for multi-processor systems with more than max_cpu_count processors
refer to the per_cpu.many_util rule. For Origin 200 and Origin 2000 systems, use the command
|
|
<br><center>$ pminfo -f hinv.map.cpu</center>to discover the abbreviated PCP names of the installed CPUs and
|
|
their corresponding full names in the <font face="Courier"><tt>/hw</tt></font> file system.
|
|
|
|
<p><pre> <b>per_cpu.syscall</b> High per CPU system call rate</pre>
|
|
|
|
The number of system calls per second for at least one CPU
|
|
exceeded threshold over the past sample interval. This rule applies only to multi-processor systems; for
single-processor systems, refer to the cpu.syscall rule. For Origin 200 and Origin 2000 systems, use the command
|
|
<br><center>$ pminfo -f hinv.map.cpu</center>to discover the abbreviated PCP names of the installed CPUs and
|
|
their corresponding full names in the <font face="Courier"><tt>/hw</tt></font> file system.
|
|
|
|
|
|
<p><pre> <b>per_cpu.system</b> Some CPU busy executing in system mode</pre>
|
|
|
|
Over the last sample interval, at least one CPU was active for
|
|
busy percent or more, and the ratio of system time to busy time exceeded threshold percent. This rule applies only to multi-processor
systems; for single-processor systems, refer to the cpu.system rule. For Origin 200 and Origin 2000 systems, use the command
|
|
<br><center>$ pminfo -f hinv.map.cpu</center>to discover the abbreviated PCP names of the installed CPUs and
|
|
their corresponding full names in the <b>/hw</b> file system.
|
|
|
|
<p><pre> <b>per_disk.util</b> High per spindle disk utilization</pre>
|
|
|
|
For at least one spindle, disk utilization exceeded threshold percent during the last sample interval.
|
|
|
|
<p><pre> <b>per_netif.collisions</b> High collision rate in packet sends</pre>
|
|
More than threshold percent of the packets being sent across an
|
|
interface are causing a collision, and packets are being sent across the interface at packet_rate packets per second.
|
|
Ethernet interfaces expect a certain number of packet collisions, but a high ratio of collisions to packet sends is indicative of a
|
|
saturated network.
|
|
|
|
|
|
<p><pre> <b>per_netif.errors</b> High network interface error rate</pre>
|
|
For at least one network interface, the error rate exceeded threshold errors per second during the last sample interval.
|
|
|
|
|
|
<p><pre> <b>per_netif.packets</b> High network interface packet transfers</pre>
|
|
|
|
For at least one network interface, the average rate of packet
|
|
transfers (in and/or out) exceeded the threshold during the last sample interval.
|
|
This rule is disabled by default because the per_netif.util rule is more generally useful as it takes into consideration each
|
|
network interface's reported bandwidth. However, there are some situations in which this value is zero, in which case an absolute
|
|
threshold-based rule like this one will make more sense (for this reason it should typically be applied to some network interfaces,
|
|
but not others - use the "interfaces" variable to filter this).
|
|
|
|
|
|
<p><pre> <b>per_netif.util</b> High network interface utilization</pre>
|
|
For at least one network interface, the average transfer rate (in
|
|
and/or out) exceeded threshold percent of the peak bandwidth of the
|
|
interface during the last sample interval.
|
|
|
|
|
|
<p><pre> <b>rpc.bad_network</b> RPC network transmission failure</pre>
|
|
More than threshold percent of sent client remote procedure call
|
|
(RPC) packets are timing out before the server responds and the
|
|
number of timeouts is significantly more than the number of duplicate
|
|
packets being received (indicating lost packets).
|
|
The network file system (NFS) utilizes the RPC protocol for its
|
|
client-server communication needs. This high failure rate when sending
|
|
RPC packets may be due to faulty network hardware or inappropriately
|
|
sized NFS packets (packets possibly too large).
|
|
|
|
<p><pre> <b>rpc.slow_response</b> RPC server response is slow</pre>
|
|
|
|
More than threshold percent of sent client remote procedure call
|
|
(RPC) packets are timing out before the server responds and the
|
|
number of timeouts is roughly equivalent to the number of duplicate
|
|
packets being received.
|
|
The network file system (NFS) utilizes the RPC protocol for its
|
|
client-server communication needs. This high timeout rate when
|
|
sending RPC packets may be because the NFS server is processing
|
|
duplicate requests from the clients which were sent after the
|
|
original requests timed out.
|
|
|
|
|
|
<p><pre> <b>espping.response</b> System Group Manager slow service response</pre>
|
|
|
|
A service being monitored by the SGI Embedded Support Partner Group
|
|
Manager has taken more than threshold milliseconds to complete, during the last sample interval. The hosts parameter specifies
|
|
hosts running the espping PMDA, not hosts being monitored by this PMDA. The latter are encoded in the "instances" for each
|
|
espping PMDA metric - run<br><center>$ pminfo -f espping.cmd</center>to list the instances and values for the espping.cmd metric.
|
|
|
|
|
|
<p><pre> <b>espping.status</b> System Group Manager service probe failure</pre>
|
|
|
|
A service being monitored by the SGI Embedded Support Partner Group
|
|
Manager has either failed, or not responded within a timeout period (as defined by espping.control.timeout) during the last sample
|
|
interval. The hosts parameter specifies hosts running the espping PMDA, not hosts being monitored by this PMDA. The latter are
|
|
encoded in the "instances" for each espping PMDA metric - run<br><center>$ pminfo -f espping.cmd</center>to list the instances
|
|
and values for the espping.cmd metric.
|
|
|
|
<A NAME="archive_database"></A>
|
|
<hr width=100%>
|
|
<p><b>Archive Database</b></p>
|
|
<P>Use the <b>Archive Database</b> command to delete a previously archived
|
|
database or to get instructions for archiving.</P>
|
|
<UL>
|
|
<LI>The <font face="Courier"><TT>Archive</TT></font> database option conserves disk space by compressing the
|
|
current database. The SGI Embedded Support Partner can continue to read the compressed data. To ensure data
|
|
integrity, you must execute the command <font face="Courier"><TT>esparchive</TT></font> from a command line.
|
|
All Embedded Support Partner daemons are shut down during this operation and will be automatically restarted
|
|
when archiving is completed. Archiving is possible only when the size of the current database is at least
|
|
10 megabytes; the compression mechanism will not work if you try to archive a smaller database.<p>
|
|
<LI>The <font face="Courier"><TT>Delete Database</TT></font> option deletes an archived database that you no
|
|
longer need.
|
|
</UL>
|
|
<p><HR NOSHADE SIZE="3">
|
|
</BODY>
|
|
</HTML>
|