Free performance monitoring for VMware™ and IBM Power Systems™

Alerting

LPAR2RRD has a built-in alerting feature. It is not yet implemented for VMware, only for IBM Power Systems.
You can define alarms based on performance data for any CPU pool or LPAR in your environment.

Metrics

object       metric               value           data source
ALL servers  Paging               MB/sec          OS agent
server       CPU                  CPU core or %   HMC
CPU POOL     CPU                  CPU core or %   HMC
LPAR         CPU                  CPU core or %   HMC
LPAR         CPU OS               %               OS agent
LPAR         Memory used          %               OS agent
LPAR         Paging               MB/sec          OS agent
LPAR         Paging allocation    %               OS agent
LPAR         LAN                  MB/sec          OS agent
LPAR         SAN                  MB/sec          OS agent
LPAR         SAN                  IOPS            OS agent
LPAR         SAN response time    milliseconds    OS agent

Alerting possibilities
  • Email. You can put an email address directly on each directive, use email groups, or rely on the default email address.
  • Nagios support. You can configure Nagios to pick up alarms from LPAR2RRD via the standard NRPE module.
    LPAR2RRD Nagios plug-in installation
  • External alerting via a shell script. Each alert can invoke a user-defined script with given parameters, which you can use for your own integrations.
  • SNMP trap. Implemented since version 4.96; follow the SNMP trap instructions to configure it.
  • Alert plug-ins for other monitoring tools can be developed on demand, especially for customers under a support contract.
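
As an illustration of the external script option, here is a minimal sketch of a handler that an alert could invoke. The parameters that LPAR2RRD passes are not listed in this section, so the sketch assumes nothing about their order or meaning and simply logs whatever it receives. The function name log_alert and the ALERT_LOG variable are hypothetical, not LPAR2RRD settings.

```shell
#!/bin/sh
# Hypothetical external alert handler (not shipped with LPAR2RRD).
# Nothing is assumed about the meaning of the arguments: each one is
# written to a log file, in brackets, behind a timestamp.

log_alert() {
    # ALERT_LOG is an illustrative variable with an illustrative default.
    log_file="${ALERT_LOG:-/tmp/lpar2rrd_alert.log}"
    printf '%s' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$log_file"
    for arg in "$@"; do
        printf ' [%s]' "$arg" >> "$log_file"
    done
    printf '\n' >> "$log_file"
}

# When saved as a standalone script, finish with:
#   log_alert "$@"
```

Logging the raw arguments first is a simple way to learn what an alert actually passes before building a real integration on top of it.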

Configuration v5.00+

Since version 5.00, all configuration is managed through the GUI ➡ IBM Power ➡ Alerting
The upgrade to 5.00 converts all previously defined alerts into the new format; do not use the procedure below in 5.00+.

Configuration in older versions

Alerting thresholds
  • CPU core
    This example shows a rule for the CPU pool of server p795 that issues an alert when CPU pool utilization exceeds 10 cores or drops below 1 core.
    POOL:p795:all_pools:10:1:::
    
  • Percentage of the maximum CPU utilization that the CPU pool or LPAR can reach. For a CPU pool, the maximum is the number of cores in the pool; for an LPAR, it is the number of logical (virtual) CPUs.
    This is supported since LPAR2RRD version 4.80.
    This example shows a rule for the CPU pool of server p795 that issues an alert when CPU pool utilization exceeds 80% of the maximum or drops below 5%.
    POOL:p795:all_pools:80%:5%:::
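
To make the two threshold styles concrete, the following sketch evaluates a rule line against a current utilization value. It is an illustration only, not LPAR2RRD's own logic: the function name check_cpu_rule is hypothetical, it assumes that fields 4 and 5 of the rule hold the upper and lower limits (as in the two examples above), and the caller has to supply the maximum (pool cores or logical CPUs) used for percentage limits.

```shell
# check_cpu_rule RULE CURRENT_CORES MAX_CORES
# Prints "high", "low" or "ok". Hypothetical helper for illustration.
check_cpu_rule() {
    rule=$1 current=$2 max_cores=$3
    # Assumption: field 4 is the upper limit, field 5 the lower limit.
    upper=$(printf '%s\n' "$rule" | cut -d: -f4)
    lower=$(printf '%s\n' "$rule" | cut -d: -f5)
    awk -v cur="$current" -v max="$max_cores" -v up="$upper" -v lo="$lower" '
        # Convert a limit to cores: "80%" means 80 percent of max.
        function to_cores(limit) {
            if (limit ~ /%$/) { sub(/%$/, "", limit); return max * limit / 100 }
            return limit + 0
        }
        BEGIN {
            if (up != "" && cur > to_cores(up)) { print "high" }
            else if (lo != "" && cur < to_cores(lo)) { print "low" }
            else { print "ok" }
        }'
}

# Example calls for a pool with 16 cores:
#   check_cpu_rule 'POOL:p795:all_pools:10:1:::' 12 16
#   check_cpu_rule 'POOL:p795:all_pools:80%:5%:::' 10 16
```

With a 16-core pool, the 80% limit corresponds to 12.8 cores and the 5% limit to 0.8 cores, so the same utilization can trigger one rule style and not the other.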
    

The following is configurable:
  • CPU maximum and minimum for issuing alerts, in CPU cores or as a percentage (place a % sign after the value)
  • CPU peak time: the alert is issued when the average CPU utilization over the given time is above the limit
  • Email groups: you can create different email groups and direct alarms to them
  • CPU warning, as a percentage of the CPU critical alarm
  • Alert retention: the time between repeated alerts for the same issue
  • Create the configuration file
    (the upgrade process creates the configuration file automatically, so you might be able to skip this step)
    $ cd /home/lpar2rrd/lpar2rrd
    $ ./scripts/update_cfg_alert.sh
    
    This creates the configuration file etc/alert.cfg
  • Edit ./etc/alert.cfg and configure the alerts
  • Add the following entry to crontab:
    0,10,20,30,40,50 * * * * /home/lpar2rrd/lpar2rrd/load_alert.sh > /home/lpar2rrd/lpar2rrd/load_alert.out 2>&1
    
  • Check that email delivery works from the server hosting LPAR2RRD
    Replace your_addr\@lpar2rrd.com with your email address and keep the "\" before "@":
    perl -le 'print "To: your_addr\@lpar2rrd.com\nSubject: LPAR2RRD test\n\nJust a test\n\n"'|/usr/sbin/sendmail -t
    
  • When you want to refresh the list of servers/pools/LPARs in alert.cfg, just run the script again:
    $ ./scripts/update_cfg_alert.sh
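
After working through the steps above, it can be useful to confirm that the crontab entry is really in place. The following is a minimal sketch; the function name has_alert_cron is hypothetical, and it only checks that some uncommented crontab line mentions load_alert.sh, not that the schedule or path is correct.

```shell
# has_alert_cron: reads crontab text on stdin and succeeds if at least
# one uncommented line runs load_alert.sh. Hypothetical helper.
has_alert_cron() {
    grep -v '^[[:space:]]*#' | grep 'load_alert\.sh' > /dev/null
}

# Typical use, run as the lpar2rrd user:
#   crontab -l | has_alert_cron && echo "alerting is scheduled"
```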

Note: If hundreds of LPARs or CPU pools are configured for alerting, it might affect the performance of the HMCs.
After each big change in the alert configuration, run ./load_alert.sh from the command line to find out the typical run time (it is printed at the end).
The run time should not come close to the interval at which the script is scheduled in crontab (typically 10 minutes; we do not recommend less).
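
The run-time check described in the note can be sketched as a small helper that compares a measured duration with the cron interval. The function name runtime_ok is hypothetical, and the default limit of 50 percent of the interval is an arbitrary safety margin chosen for illustration, not an LPAR2RRD recommendation.

```shell
# runtime_ok DURATION_SEC [INTERVAL_SEC] [MAX_PERCENT]
# Succeeds when the run used no more than MAX_PERCENT of the interval.
# Hypothetical helper; the 50 percent default is an assumption.
runtime_ok() {
    duration=$1
    interval=${2:-600}      # 10 minutes, matching the crontab entry
    max_percent=${3:-50}
    [ $((duration * 100)) -le $((interval * max_percent)) ]
}

# Typical use from the LPAR2RRD home directory:
#   start=$(date +%s); ./load_alert.sh; end=$(date +%s)
#   runtime_ok $((end - start)) || echo "load_alert.sh runs too long for a 10-minute schedule"
```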

Paging/Swapping alerting

    Once you have installed and configured the OS agents, you can configure alerting for paging activity.
    If you do not use alerting yet, you first need to configure it in general:
    Follow the Alerting install instructions

    Then edit etc/alert.cfg:
    $ vi etc/alert.cfg
    
    #SWAP:server:lpar name:swapping in kB/sec::peak time in min:alert repeat time in min:email group
    #========================================================================================================================
    SWAP:.*:.*:10:::lpar2rrd@lpar2rrd.com
    
    The example above raises an alert for every server and LPAR if paging goes above 10 kB per second as a 10-minute average.
    Alerts will be sent to the email address lpar2rrd@lpar2rrd.com
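
Because the limit applies to an average rather than to single samples, a short paging spike does not necessarily raise an alert. The sketch below illustrates that idea with a hypothetical helper, paging_alert, which averages a list of paging samples (kB/sec) and compares the mean with the limit. How LPAR2RRD itself samples and averages the data is not described here, so treat this purely as an illustration.

```shell
# paging_alert LIMIT_KB SAMPLE...
# Prints "alert" when the mean of the samples is above the limit,
# otherwise "ok". Hypothetical helper for illustration.
paging_alert() {
    limit=$1
    shift
    printf '%s\n' "$@" | awk -v limit="$limit" '
        { sum += $1; n++ }
        END {
            if (n > 0 && sum / n > limit) { print "alert" } else { print "ok" }
        }'
}

# Example calls with a limit of 10 kB/sec:
#   paging_alert 10 4 8 12    # mean is 8, one sample above the limit
#   paging_alert 10 4 8 30    # mean is 14
```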