System Monitoring

In mission-critical environments, the health of each system is important to sustaining the overall functionality and dependability of the enterprise. This includes the health of the network, local and remote servers, and the applications running on each server. It is critical to know where maintenance is needed, which servers are stressed by load or limited resources, and the internal status of each server: CPU utilization, memory swapping and paging, and disk status. Monitoring all of these conditions for every asset can be overwhelming without visual displays for reviewing the health of the overall system and of its individual components. Primate addresses this by combining the extensive library of data-retrieval interfaces in its GridGuardian product with the visualization techniques of its BlackBoard product.

GridGuardian generates reports, and BlackBoard provides real-time displays of current status and conditions. Depending on the conditions it detects, GridGuardian applies rules that can automatically notify support personnel, launch external programs, execute JavaScript code, and/or stream sound files through the server’s sound card, ensuring that appropriate action is taken based on the health of the systems.
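The condition-and-action pattern described above can be illustrated with a small sketch. It assumes each server reports a snapshot of named metrics and that a rule pairs a condition on that snapshot with one or more actions. All names here (`Rule`, `evaluate`, `notify_support`, `cpu_pct`, `pages_swapped_per_s`) and the thresholds are hypothetical; this is not GridGuardian’s rule syntax or API, only a general illustration of rule-driven notification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this is NOT GridGuardian's rule syntax or API.
Metrics = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[Metrics], bool]  # predicate over the latest metric snapshot
    actions: List[Callable[[str, Metrics], None]] = field(default_factory=list)


def evaluate(rules: List[Rule], metrics: Metrics) -> List[str]:
    """Run every rule against one snapshot; return the names of the rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(metrics):
            fired.append(rule.name)
            for action in rule.actions:
                action(rule.name, metrics)
    return fired


# Stand-in action that only records a message; a real system might instead
# page support staff, launch an external program, or play a sound file.
notifications: List[str] = []


def notify_support(rule_name: str, metrics: Metrics) -> None:
    notifications.append(f"ALERT {rule_name}: {metrics}")


# Example thresholds (assumed values, for illustration only).
rules = [
    Rule("cpu-high", lambda m: m.get("cpu_pct", 0.0) > 90.0, [notify_support]),
    Rule("swapping", lambda m: m.get("pages_swapped_per_s", 0.0) > 100.0, [notify_support]),
]

if __name__ == "__main__":
    print(evaluate(rules, {"cpu_pct": 95.0, "pages_swapped_per_s": 3.0}))
```

Keeping conditions and actions separate means the same notification action can be attached to many rules, and new responses can be added without changing the conditions that trigger them.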

Although GridGuardian and BlackBoard run as standalone products, both are delivered with an application programming interface (API) so that administrators and programmers can easily extend their capabilities and enable bi-directional communication with other information technology systems.

Many tools and applications exist to monitor the health of a system; however, few (if any) monitor at the application level across systems from multiple suppliers.

For example, the North American Electric Reliability Corporation’s (NERC’s) investigation into the August 14, 2003 blackout identified several actual and potential problems, affecting all North American utilities, that related to the failure of an application on a computer system. Operations and support personnel were not sufficiently aware that the computer system had failed. NERC asked utilities to “establish an automated method to alert power system operators and technical support personnel when power system status indications are not current, or that alarms are not being received or annunciated.” Today, NERC readiness audits examine how each electric utility handles this level of monitoring. GridGuardian is an excellent independent, unbiased solution that complements existing supervisory control and data acquisition (SCADA) platforms in determining the health of all applications.
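NERC’s request to alert staff “when power system status indications are not current” amounts to a staleness check: compare the time each status point was last updated against an allowed maximum age. The sketch below assumes a mapping from point names to last-update timestamps; the function name `find_stale_points`, the point names, and the 30-second limit are hypothetical and do not represent how GridGuardian performs this check.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def find_stale_points(
    last_update: Dict[str, datetime], now: datetime, max_age: timedelta
) -> List[str]:
    """Return the (sorted) names of points whose latest update is older than max_age."""
    return sorted(name for name, ts in last_update.items() if now - ts > max_age)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Hypothetical status points and their last-update times.
    last_update = {
        "breaker_status": now - timedelta(seconds=5),
        "line_flow_mw": now - timedelta(minutes=3),
    }
    for point in find_stale_points(last_update, now, timedelta(seconds=30)):
        print(f"ALERT: {point} has not updated in over 30 s")
```

Under the same assumptions, the second half of NERC’s request (alarms not being received) could be approached the same way by treating a periodic heartbeat from the alarm-processing application as one more point whose age is checked.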