Synopses & Reviews
Written by Ganglia designers and maintainers, this book shows you how to collect and visualize metrics from clusters, grids, and cloud infrastructures at any scale. Want to track CPU utilization from 50,000 hosts every ten seconds? Ganglia is just the tool you need, once you know how its main components work together. This hands-on book helps experienced system administrators take advantage of Ganglia 3.x.
Learn how to extend the base set of metrics you collect, fetch current values, see aggregate views of metrics, and observe time-series trends in your data. You'll also examine real-world case studies of Ganglia installs that feature challenging monitoring requirements.
- Determine whether Ganglia is a good fit for your environment
- Learn how Ganglia's gmond and gmetad daemons build a metric collection overlay
- Plan for scalability early in your Ganglia deployment, with valuable tips and advice
- Take data visualization to a new level with gweb, Ganglia's web frontend
- Write plugins to extend gmond's metric-collection capability
- Troubleshoot issues you may encounter with a Ganglia installation
- Integrate Ganglia with the sFlow and Nagios monitoring systems
Contributors include: Robert Alexander, Jeff Buchbinder, Frederiko Costa, Alex Dean, Dave Josephsen, Peter Phaal, and Daniel Pocock. Case study writers include: John Allspaw, Ramon Bastiaans, Adam Compton, Andrew Dibble, and Jonah Horowitz.
With Ganglia, you can monitor the performance of several deployment scenarios, but this tool's strength can also be a weakness if you don't know how all its pieces work together. This book shows you how to configure Ganglia to monitor clusters, grids, or cloud infrastructures at very large scales, on the order of thousands of machines.
Experienced users will get up to speed on the latest Ganglia release (3.x), including several recent features, such as sFlow support and Ganglia's new web frontend. You'll learn how to extend the base set of metrics you collect, fetch current values, see aggregate views of metrics, and look at time-series trends in your data.
About the Author
Matt Massie open-sourced Ganglia in 2000 while working as a Staff Researcher at the University of California, Berkeley. He designed Ganglia to monitor a shared computational grid of clusters distributed across the United States for scientific research. In 2010, he contributed a chapter on cluster monitoring for the O'Reilly book "Web Operations: Keeping the Data On Time" by John Allspaw and Jesse Robbins. Matt is currently a software engineer at Cloudera focused on Apache Hadoop enterprise management and monitoring.
Bernard Li is a High Performance Computing (HPC) Systems Engineer at Lawrence Berkeley National Laboratory. He is currently one of the maintainers of the Ganglia project. He has been involved with HPC since 2003 and has worked on Open Source projects such as OSCAR, SystemImager and Warewulf.
Vladimir Vuksan (Broadcom) has worked in technical operations, systems engineering, and software development for over 15 years. Prior to Broadcom, he worked at Mocospace, Rave Mobile Safety, Demandware, and the University of New Mexico, implementing high-availability solutions and building tools that make infrastructure easier to manage and run.
Table of Contents
Preface; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us;
Chapter 1: Introducing Ganglia; 1.1 It's a Problem of Scale; 1.2 Hosts ARE the Monitoring System; 1.3 Redundancy Breeds Organization; 1.4 Is Ganglia Right for You?; 1.5 gmond: Big Bang in a Few Bytes; 1.6 gmetad: Bringing It All Together; 1.7 gweb: Next-Generation Data Analysis; 1.8 But Wait! That's Not All!;
Chapter 2: Installing and Configuring Ganglia; 2.1 Installing Ganglia; 2.2 Configuring Ganglia; 2.3 Postinstallation;
Chapter 3: Scalability; 3.1 Who Should Be Concerned About Scalability?; 3.2 gmond and Ganglia Cluster Scalability; 3.3 gmetad Storage Planning and Scalability;
Chapter 4: The Ganglia Web Interface; 4.1 Navigating the Ganglia Web Interface; 4.2 The gweb Search Tab; 4.3 The gweb Views Tab; 4.4 The gweb Aggregated Graphs Tab; 4.5 The gweb Compare Hosts Tab; 4.6 The gweb Events Tab; 4.7 The gweb Automatic Rotation Tab; 4.8 The gweb Mobile Tab; 4.9 Custom Composite Graphs; 4.10 Other Features; 4.11 Authentication and Authorization;
Chapter 5: Managing and Extending Metrics; 5.1 gmond: Metric Gathering Agent; 5.2 Base Metrics; 5.3 Extended Metrics; 5.4 Extending gmond with Modules; 5.5 Extending gmond with gmetric; 5.6 How to Choose Between C/C++, Python, and gmetric; 5.7 XDR Protocol; 5.8 Java and gmetric4j; 5.9 Real World: GPU Monitoring with the NVML Module;
Chapter 6: Troubleshooting Ganglia; 6.1 Overview; 6.2 Useful Resources; 6.3 Monitoring the Monitoring System; 6.4 General Troubleshooting Mechanisms and Tools; 6.5 Common Deployment Issues; 6.6 Typical Problems and Troubleshooting Procedures;
Chapter 7: Ganglia and Nagios; 7.1 Sending Nagios Data to Ganglia; 7.2 Monitoring Ganglia Metrics with Nagios; 7.3 Displaying Ganglia Data in the Nagios UI; 7.4 Monitoring Ganglia with Nagios;
Chapter 8: Ganglia and sFlow; 8.1 Architecture; 8.2 Standard sFlow Metrics; 8.3 Configuring gmond to Receive sFlow; 8.4 Host sFlow Agent; 8.5 Troubleshooting; 8.6 Using Ganglia with Other sFlow Tools;
Chapter 9: Ganglia Case Studies; 9.1 Tagged, Inc.; 9.2 SARA; 9.3 Reuters Financial Software; 9.4 Lumicall (Mobile VoIP on Android); 9.5 Wait, How Many Metrics? Monitoring at Quantcast; 9.6 Many Tools in the Toolbox: Monitoring at Etsy;
Advanced Metric Configuration and Debugging; Module Metric Definitions; Advanced Metrics Aggregation and You; rrdcached; Debugging with gmond-debug;
Ganglia and Hadoop/HBase; Introducing Hadoop and HBase; Configuring Hadoop and HBase to Publish Metrics to Ganglia;
Colophon;