I’m currently working on the F5 BIG-IP SCOM
Management Pack. Comtrade, the company I work for, already has a BIG-IQ SCOM Management Pack, which was very well received by F5
product management and by the customers involved in the Beta Program. Encouraged by that
and by market demand, we decided to build a separate BIG-IP SCOM Management
Pack as well. An Early Access version will be released in the coming weeks.
Let’s start this blog series by describing the
main problems I’m tackling.
The purpose of a BIG-IP device is to provide optimal
access to applications (such as Microsoft Exchange, Lync and SharePoint), from both
the user’s and the application host’s point of view. So we can say that the application is
king. Let’s make sure users can access these applications at any time!
F5 BIG-IP, as a device, can become unavailable
or start behaving unexpectedly. This can happen for a number of device-related
reasons: sustained high CPU usage, no free storage, expired certificates,
an expired license, or interfaces being down. In these cases, all application
services configured on that BIG-IP may become unavailable, experience
performance issues, or show unexpected side effects.
It is possible to configure BIG-IPs to send
email notifications when certain events occur on the system, but doing so is by
no means trivial, and the resulting information is difficult to use for reporting and
SLA dashboards. It would be better (as in less work to
configure, to check health, and to identify the device having a problem and the problem’s root
cause) if, when an incident is about to happen (e.g. storage almost full,
a certificate or license due to expire, CPU running at high usage
for prolonged periods), we were warned proactively, without specific per-device
configuration, about the applications that are in danger. This
warning would, of course, come with enough information about the threat, and
this information would be collected at a single point.
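To make the idea of a proactive warning more concrete, here is a minimal Python sketch of the kind of check I have in mind. It is only an illustration, not how the Management Pack is implemented: the `DeviceSnapshot` type, the `proactive_warnings` function, and the threshold values (90% storage, 85% CPU, 30 days before expiry) are all assumptions made up for this example, and the metrics are presumed to have been collected already by some other means.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DeviceSnapshot:
    """Hypothetical set of metrics already collected from one device."""
    name: str
    storage_used_pct: float   # percentage of storage in use
    cpu_avg_pct: float        # average CPU usage over the observation window
    cert_expiry: date         # earliest certificate expiry date
    license_expiry: date      # license expiry date


def proactive_warnings(snap, today, storage_limit=90.0, cpu_limit=85.0,
                       expiry_days=30):
    """Return warnings for conditions that are about to become incidents.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    warnings = []
    if snap.storage_used_pct >= storage_limit:
        warnings.append(f"{snap.name}: storage {snap.storage_used_pct:.0f}% used")
    if snap.cpu_avg_pct >= cpu_limit:
        warnings.append(f"{snap.name}: CPU averaging {snap.cpu_avg_pct:.0f}%")
    for label, expiry in (("certificate", snap.cert_expiry),
                          ("license", snap.license_expiry)):
        days_left = (expiry - today).days
        if days_left < 0:
            warnings.append(f"{snap.name}: {label} has expired")
        elif days_left <= expiry_days:
            warnings.append(f"{snap.name}: {label} expires in {days_left} days")
    return warnings
```

Running the same function over the snapshots of every device and gathering the results in one list is the "single point" mentioned above: the same thresholds apply everywhere, with no per-device configuration.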
There may be situations when a device becomes
unavailable (e.g. stuck during reboot, disconnected from the network, etc.). In those
cases, in order to optimize your applications’ availability, it’s important to
be aware of the issue, with all the relevant details, as soon as possible.
If something might go wrong, we want to know
about it before it becomes an issue. How does this sound? Do you encounter these
problems? What do you think about:
- Monitoring BIG-IP device availability?
- Making sure applications are working?
- Navigating multiple devices in order to understand where application issues are?