Alerting
LPAR2RRD has a built-in alerting feature. It is not implemented for VMware yet, only for IBM Power Systems.
You can define alarms based on performance data for servers, CPU pools and LPARs in your environment:
|Object||Metric||Unit||Data source|
|ALL servers||Paging||MB/sec||OS agent|
|server||CPU||CPU cores or %||HMC|
|CPU POOL||CPU||CPU cores or %||HMC|
|LPAR||CPU||CPU cores or %||HMC|
|LPAR||CPU OS||%||OS agent|
|LPAR||Memory used||%||OS agent|
|LPAR||Paging allocation||%||OS agent|
|LPAR||SAN response time||milliseconds||OS agent|
Supported alerting methods:
- Emailing. You can set a direct email address on each alert directive, use email groups, or use the default email address.
- Nagios. You can configure Nagios to pick up alarms from LPAR2RRD via the standard NRPE module; see LPAR2RRD Nagios plug-in installation.
- External alerting via an external shell script. Each alert can invoke a defined script with given parameters, which you can use for your integration needs.
- SNMP trap. Implemented since version 4.96; see the SNMP trap configuration instructions.
- Alert plug-ins for other monitoring tools can be developed on demand, especially for customers under a support contract.
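A minimal sketch of an external alert script. How LPAR2RRD passes the alert details (number and order of arguments) is not described here, so this hypothetical script makes no assumption about them and simply logs a timestamp plus every argument it receives. The function name log_alert, the variable ALERT_LOG and its default path /tmp/lpar2rrd_alert.log are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical external alert handler for LPAR2RRD (sketch only).
# The argument layout used by LPAR2RRD is NOT assumed: all arguments are logged verbatim.

# Log file location (assumption); override by exporting ALERT_LOG.
ALERT_LOG="${ALERT_LOG:-/tmp/lpar2rrd_alert.log}"

log_alert() {
  # One line per alert: timestamp, argument count, then each argument in brackets.
  {
    printf '%s argc=%d' "$(date '+%Y-%m-%d %H:%M:%S')" "$#"
    for arg in "$@"; do
      printf ' [%s]' "$arg"
    done
    printf '\n'
  } >> "$ALERT_LOG"
  # Add your integration logic here (ticketing system, chat webhook, ...).
}

log_alert "$@"
```

Once you see in the log which arguments arrive for a real alert, the logging can be replaced by the actual integration.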
Configuration v5.00+
Since version 5.00, all configuration is managed through the GUI: IBM Power ➡ Alerting.
Upgrading to 5.00 converts all previously defined alerts into the new format; do not use the procedure below in 5.00+.
Configuration in older versions
Alerting thresholds
The example shows a rule for the CPU pool of server p795 that issues an alert when CPU pool utilization exceeds 10 cores or drops below 1 core.
Thresholds can also be given as a percentage of the maximum CPU utilization the CPU pool or LPAR can reach. For a CPU pool, the maximum is the number of cores in the pool; for an LPAR, it is the number of logical (virtual) CPUs.
This is supported since LPAR2RRD version 4.80.
The example shows a rule for the CPU pool of server p795 that issues an alert when CPU pool utilization exceeds 80% of maximum utilization or drops below 5%.
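Because a percentage threshold is relative to the maximum the pool or LPAR can reach, it corresponds to max × percent / 100 cores. A small sketch of that arithmetic; the function name pct_to_cores and the 16-core pool size are hypothetical.

```shell
#!/bin/sh
# Convert a percentage threshold into CPU cores.
# max_cpu: cores in the CPU pool, or logical (virtual) CPUs of the LPAR.
pct_to_cores() {
  max_cpu=$1
  pct=$2
  awk -v m="$max_cpu" -v p="$pct" 'BEGIN { printf "%.2f\n", m * p / 100 }'
}

# Hypothetical 16-core pool with the 80% / 5% thresholds from the example:
pct_to_cores 16 80   # 12.80
pct_to_cores 16 5    # 0.80
```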
The following is configurable:
- CPU maximum and minimum for issuing alerts, in CPU cores or in percent (append a % sign to the value)
- Peak time: the period over which the average CPU utilization must be above the limit before an alert is issued
- Email groups: you can create different email groups and direct alarms to them
- CPU warning level, as a percentage of the CPU critical alarm threshold
- Alert retention: the time between repeated alerts for the same issue
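A sketch of how the warning level and alert retention could interact, assuming the warning threshold is critical × warning% / 100 and that a repeated alert for the same issue is held back until the retention time has passed. The function names alert_level and retention_elapsed and the numbers are hypothetical; this is an illustration, not LPAR2RRD's own code.

```shell
#!/bin/sh
# Classify a measured value against a critical threshold and a warning
# level expressed as a percentage of that critical threshold.
alert_level() {
  value=$1; critical=$2; warn_pct=$3
  awk -v v="$value" -v c="$critical" -v w="$warn_pct" 'BEGIN {
    if (v > c)                 print "CRITICAL"
    else if (v > c * w / 100)  print "WARNING"
    else                       print "OK"
  }'
}

# Succeed (status 0) when a new alert may be sent, i.e. the retention time
# (in minutes) has elapsed since the last alert for the same issue.
retention_elapsed() {
  last_epoch=$1; now_epoch=$2; retention_min=$3
  [ $((now_epoch - last_epoch)) -ge $((retention_min * 60)) ]
}

# Critical above 10 cores, warning at 80% of it (above 8 cores):
alert_level 9 10 80                             # WARNING
retention_elapsed 0 3600 60 && echo "send alert"  # send alert
```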
- Create the configuration file
(the upgrade process creates the configuration file automatically, so you may skip this step)
$ cd /home/lpar2rrd/lpar2rrd
$ ./scripts/update_cfg_alert.sh
It creates this configuration file: etc/alert.cfg
- Edit ./etc/alert.cfg and configure the alerts
- Add the following line to crontab:
0,10,20,30,40,50 * * * * /home/lpar2rrd/lpar2rrd/load_alert.sh > /home/lpar2rrd/lpar2rrd/load_alert.out 2>&1
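If you prefer not to edit the crontab interactively, a filter like the one below appends the load_alert.sh entry only when it is not already present. A sketch: the function name add_alert_cron is hypothetical, and the entry is the crontab line shown above.

```shell
#!/bin/sh
# Crontab entry from the instructions: run load_alert.sh every 10 minutes.
ALERT_CRON='0,10,20,30,40,50 * * * * /home/lpar2rrd/lpar2rrd/load_alert.sh > /home/lpar2rrd/lpar2rrd/load_alert.out 2>&1'

# Read a crontab on stdin and print it with ALERT_CRON appended,
# unless a line calling load_alert.sh is already there.
add_alert_cron() {
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -q 'load_alert\.sh'; then
    printf '%s\n' "$ALERT_CRON"
  fi
}
```

Run it as the user that owns the LPAR2RRD installation:
crontab -l 2>/dev/null | add_alert_cron | crontab -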
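- Check whether email works from the server hosting LPAR2RRD.
Replace your_addr\@lpar2rrd.com with your email address, keeping the "\" before "@" (in a Perl double-quoted string, "@lpar2rrd" would otherwise be interpolated as an array):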
perl -le 'print "To: your_addr\@lpar2rrd.com\nSubject: LPAR2RRD test\n\nJust a test\n\n"'|/usr/sbin/sendmail -t
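The same test message can also be built with the shell's printf. Because the address is passed as an argument, nothing is interpolated and no "\" is needed. A sketch; the function name mail_test_message is hypothetical, and /usr/sbin/sendmail is the same path as in the Perl command.

```shell
#!/bin/sh
# Build the same test message as the Perl one-liner.
# The address is a plain argument, so "@" needs no escaping.
mail_test_message() {
  printf 'To: %s\nSubject: LPAR2RRD test\n\nJust a test\n\n' "$1"
}
```

Usage:
mail_test_message your_addr@lpar2rrd.com | /usr/sbin/sendmail -t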
To refresh the list of servers/pools/LPARs in alert.cfg, just run ./scripts/update_cfg_alert.sh again.
Note: if hundreds of LPARs or CPU pools are configured for alerting, it might affect the performance of the HMCs.
After each major change to the alert configuration, run ./load_alert.sh from the command line to find out its typical run time (it is printed at the end).
The run time should stay well below the crontab scheduling interval (typically 10 minutes; we do not recommend less).
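A rough way to compare the run time with the schedule is to time load_alert.sh and check it against the 10-minute (600 s) crontab interval. A sketch; the function name check_alert_runtime and the 50% safety margin are assumptions, not LPAR2RRD recommendations.

```shell
#!/bin/sh
# Run a command, measure its wall-clock duration in whole seconds, and
# fail when it uses more than half of the scheduling interval (assumed margin).
# Usage: check_alert_runtime INTERVAL_SEC COMMAND [ARG...]
check_alert_runtime() {
  interval_sec=$1; shift
  start=$(date +%s)
  "$@"
  elapsed=$(( $(date +%s) - start ))
  echo "run time: ${elapsed}s of ${interval_sec}s interval"
  if [ "$elapsed" -gt $((interval_sec / 2)) ]; then
    echo "WARNING: run time exceeds 50% of the interval" >&2
    return 1
  fi
  return 0
}
```

Usage:
cd /home/lpar2rrd/lpar2rrd && check_alert_runtime 600 ./load_alert.sh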
Once you have installed and configured the OS agents, you can configure alerting for paging activity.
If you do not use alerting yet, you first need to configure it in general:
Follow the Alerting install instructions.
Then edit etc/alert.cfg:
$ vi etc/alert.cfg
#SWAP:server:lpar name:swapping in kB/sec::peak time in min:alert repeat time in min:email group
#========================================================================================================================
SWAP:.*:.*:10:::firstname.lastname@example.org
The above example alerts for every server and LPAR if paging goes above 10 kB per second in a 10-minute average.
Alerts will be sent to firstname.lastname@example.org.
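To illustrate the averaging: the rule fires when the mean paging rate over the peak-time window is above the limit, so a single short spike is not enough on its own. A sketch with made-up sample values standing in for readings in kB/sec over a 10-minute window; the function name avg_above_limit is hypothetical.

```shell
#!/bin/sh
# Succeed (status 0) if the average of the samples is above the limit.
# Usage: avg_above_limit LIMIT SAMPLE...
avg_above_limit() {
  limit=$1; shift
  printf '%s\n' "$@" | awk -v lim="$limit" '
    { sum += $1; n++ }
    END { exit !(n > 0 && sum / n > lim) }'
}

# Ten made-up samples in kB/sec: one spike of 40, but an average of only 6.5.
if avg_above_limit 10 2 3 4 40 5 3 2 2 2 2; then
  echo "alert"
else
  echo "no alert"   # printed here: 6.5 is not above 10
fi
```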