Icinga in action – Monitoring at Audi
Monitoring en masse at Audi, Germany. Icinga monitors more than 10,000 hosts and 50,000 services at the automotive giant.
The production process should run just as smoothly as a ride in an Audi. Coordinating components flowing along conveyor belts ‘just-in-time’ to meet their corresponding parts is a complicated dance. That’s where ICINGA has been able to assist, ensuring the right chassis meets the right engine to form its custom-manufactured whole. Multiply this process across hundreds of conveyor belts at 7 international production sites, and you have a distributed monitoring system to the tune of 10,000+ hosts and over 50,000 services.
Audi was in search of an infinitely scalable and flexible monitoring system, which could easily be distributed for high availability.
In Audi’s state-of-the-art manufacturing facilities, components with unique IDs move between multiple assembly lines to meet their counterparts ‘just-in-time’. Synchronizing these alongside the robot arms and human hands that create the customized cars is a wonder of millisecond-accurate coordination. Monitoring it all is no walk in the park.
The project required replacing the proprietary Tivoli system and monitoring 10,000+ hosts and over 50,000 services distributed over 4 sites. Not only was Icinga implemented in a failover setup, but LConf was also born to tackle the configuration of the mammoth monitoring environment.
With high availability and extensibility in mind, ICINGA was implemented as a master with three slave satellite clusters in geographically disparate locations. The master collated passive check results from the slave clusters which monitored Audi’s production sites in Germany (Ingolstadt, Neckarsulm). These results were then graphically displayed in ICINGA Web. With this design, the entire monitoring system could be easily managed by just two Audi administrators via the central master instance.
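To make the passive check flow concrete, here is a minimal sketch of how a satellite-side result could be formatted for a central instance. It uses the Nagios/Icinga 1.x external command `PROCESS_SERVICE_CHECK_RESULT`; the host name, service name and plugin output are invented for illustration, and the case study does not say which transport carried results from the satellites to the master, so delivery is left out.

```python
import time

# Plugin return codes used by Nagios/Icinga 1.x
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result_line(host, service, state, output, timestamp=None):
    """Format one passive service check result as an external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    # A newline would end the command early, so flatten the plugin output.
    clean_output = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{STATES[state]};{clean_output}"
    )


# Hypothetical host and service names, fixed timestamp for a reproducible line
line = passive_result_line(
    "conveyor-plc-01", "belt-speed", "WARNING",
    "speed 0.8 m/s below threshold", timestamp=1300000000,
)
print(line)
```

In a real setup a line like this would be handed to the master by whatever mechanism the installation uses; the sketch only shows the shape of the data the master collates.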
Each satellite consisted of two HP DL380 servers with 8 GB RAM and two quad-core processors in a high availability cluster, which could juggle the load of thousands of services between them. Should one server fail, the other would automatically take over. The same applied to the satellites themselves: hosts monitored by one satellite could be assigned to another with a few mouse clicks.
To enable Audi’s Control Center to continue using their familiar infrastructure manager CA Spectrum, ICINGA was integrated to forward all alerts to the existing system. To improve their speed and accuracy, the alerts were enriched with additional information, stored as custom host and service variables in ICINGA Web. Thanks to the easily customizable views and Cronks in ICINGA, many hosts and services could be viewed at a glance, from many perspectives. Whether grouped as business processes, middleware components, production sites or assembly lines, views were restricted only by the user’s rights. Because authorization could be configured down to individual hosts and services, views were even more user-tailored.
DRAG ‘N’ DROP MASS CONFIGURATION
Configuring the 10,000 or so hosts and 50,000 services from so many locations required a special solution. With the help of ICINGA professionals, LConf was born. The LDAP directory tree, coupled with a Perl script, provided a user-friendly back end for configuring ICINGA objects. While LDAP offered a structured, graphical overview of the entire IT environment, LConf automatically generated executable ICINGA configurations with a few clicks. Where services reappeared throughout a cluster, they could simply be attributed to all hosts by moving them higher up the tree with a drag ‘n’ drop. LConf saved weeks of tedious configuration work and would come in handy for future changes. As with all good open source innovations, the tool that Audi inspired was released to the community.
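The tree inheritance idea can be illustrated with a short sketch. This is not LConf's code (LConf is a Perl script over LDAP) and does not use its schema; the `Node` class, the node names and the `check_<service>` command names are all hypothetical. It only shows how a service attached higher up a tree can be expanded into one service object per host below it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a hypothetical configuration tree (not LConf's LDAP schema)."""
    name: str
    is_host: bool = False
    services: list = field(default_factory=list)  # services attached at this level
    children: list = field(default_factory=list)


def collect(node, inherited=()):
    """Yield (host, service) pairs; services set higher up apply to all hosts below."""
    effective = list(inherited) + node.services
    if node.is_host:
        for service in effective:
            yield node.name, service
    for child in node.children:
        yield from collect(child, effective)


def render(host, service):
    """Render a simplified Icinga 1.x-style service definition.

    A real definition needs further directives (or a template via `use`);
    the check_command name here is an assumption for illustration.
    """
    return (
        "define service {\n"
        f"    host_name            {host}\n"
        f"    service_description  {service}\n"
        f"    check_command        check_{service}\n"
        "}\n"
    )


tree = Node("Ingolstadt", services=["ping"], children=[
    Node("line-1", services=["belt_speed"], children=[
        Node("plc-01", is_host=True),
        Node("plc-02", is_host=True, services=["temperature"]),
    ]),
])

pairs = list(collect(tree))
config = "".join(render(host, service) for host, service in pairs)
print(config)
```

Moving `belt_speed` from `line-1` up to `Ingolstadt` in this sketch would attach it to every host at the site, which is the effect the drag ‘n’ drop achieves in the LDAP tree.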