graphiteutils: input and output for Graphite

What is it?

This is the project home page for my collection of utilities for working with Graphite.

Status: BETA(-). The utilities are under-documented and have no significant negative testing, but they are currently running in a production environment.

There are two primary groupings of graphite utilities:

  • xymon: a Xymon script that sets Xymon column status based on Graphite data
  • probes: system data collection tools for Linux that feed Graphite

(Additionally, there is a SysV init script to run the Graphite servers.)

How to download

These tools do not yet have a distribution package. You can get them from GitHub at https://github.com/dmsasser/graphiteutils

Bugs, patches and suggestions

Please use GitHub to report bugs or suggest improvements.

Patches are accepted through GitHub or by email, in Git format, to dewey@deweysasser.com.

Xymon Adapter

One of the tools is an easy-to-use Xymon script. Run it periodically (Xymon's tasks.d directory is an ideal way to do this). On each run it reads its configuration file (~xymon/server/etc/graphite.cfg), processes the corresponding data from Graphite, and sets Xymon column colors based on the results, including links to the graphs that show why each status was set.
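
The configuration format and the adapter's internal logic are not documented here, so the following is only a hypothetical Python sketch of the general idea, not the adapter's code. It takes the latest value from a Graphite render API response (`/render?target=...&format=json`, which returns a list of `{"target": ..., "datapoints": [[value, timestamp], ...]}` objects), maps it to a Xymon color using warn/alarm thresholds (the threshold scheme and all function names are assumptions), and builds a Xymon `status` message with a link to the graph.

```python
import json
from urllib.parse import urlencode


def latest_value(render_json, target):
    """Return the newest non-null value for `target` from a Graphite
    render API response (format=json), or None if there is no data."""
    for series in json.loads(render_json):
        if series["target"] == target:
            for value, _timestamp in reversed(series["datapoints"]):
                if value is not None:
                    return value
    return None


def color_for(value, warn, alarm):
    """Map a value to a Xymon color (hypothetical warn/alarm thresholds)."""
    if value is None:
        return "clear"  # Xymon's color for "no data"
    if value >= alarm:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"


def graph_url(graphite_base, target):
    """Link to the Graphite graph that explains the status."""
    return f"{graphite_base}/render?{urlencode({'target': target})}"


def status_message(host, column, color, text):
    """Build a Xymon 'status' message; Xymon expects dots in the
    hostname to be written as commas."""
    return f"status {host.replace('.', ',')}.{column} {color} {text}"
```

For example, a load average of 2.5 with thresholds of 2.0 and 4.0 produces `status web1,example,com.load yellow ...`, which could then be handed to the `xymon` client to deliver.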

Probes

There are plenty of probes out there already, so why did I write my own?

Several reasons:

  • maximize ease of deployment and management (and minimize run-time dependencies)
  • allow fine-grained control over the data collected
  • perform no collection-side data processing

Graphite can quickly overwhelm a server's ability to absorb data: 700 machines × 300 metrics every minute is a lot of IOPS on the hard drives. (Specifically, about 6,500 metrics/minute results in around 177 writes/second on one of my Graphite servers.) Many of the existing Graphite probes either collected everything under the sun or only the basics.
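
The figures above can be worked through as follows. The final step is a rough extrapolation that assumes disk writes scale linearly with metric volume; that assumption is mine, not a measurement.

```python
# Figures taken from the text above.
MACHINES = 700
METRICS_PER_MACHINE = 300         # reported once per minute
OBSERVED_METRICS_PER_MIN = 6500   # measured on one Graphite server
OBSERVED_WRITES_PER_SEC = 177     # disk writes/second at that rate

fleet_metrics_per_min = MACHINES * METRICS_PER_MACHINE  # 210,000
fleet_metrics_per_sec = fleet_metrics_per_min / 60      # 3,500

# Assumption: writes/second grow linearly with metrics/minute.
writes_per_sec_per_metric_min = OBSERVED_WRITES_PER_SEC / OBSERVED_METRICS_PER_MIN
projected_writes_per_sec = fleet_metrics_per_min * writes_per_sec_per_metric_min
```

Under that assumption, collecting all 300 metrics from all 700 machines would mean roughly 5,700 writes/second, which is why controlling what gets collected matters.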

Advantages of this set of probes:

  • they depend only on a standard Perl installation
  • they collect "enough but not too much" data about machine health
  • there is some level of granular control over what is collected/reported to allow you to manage load on a graphite server
  • they are simple (cron-friendly) to run and (except for the ping time probe) do not require process management.
  • they are designed for trivial management.
  • most of these probes do NOT have to run as root

Disadvantages:

  • they currently send data directly to a Graphite server and cannot use statsd. (statsd support is planned and needed for the future.)
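
For reference, sending data directly to Graphite means writing lines of the form `metric.path value timestamp` to Carbon's plaintext listener (TCP port 2003 by default). The Python sketch below only illustrates that protocol; it is not the probes' Perl code, and the metric names in the usage example are hypothetical.

```python
import socket
import time


def format_metrics(prefix, values, timestamp=None):
    """Render a dict of metric values as Carbon plaintext lines,
    all stamped with the same Unix timestamp."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "".join(
        f"{prefix}.{name} {value} {ts}\n" for name, value in sorted(values.items())
    )


def send_to_carbon(payload, host, port=2003):
    """Send already-formatted plaintext lines to a Carbon server."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload.encode("ascii"))
```

For example, `format_metrics("servers.web1", {"load.1min": 0.5}, 1700000000)` yields `servers.web1.load.1min 0.5 1700000000` followed by a newline.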

Some people (I'm looking at you, Jim) might also count the Perl implementation as a disadvantage. Such a discussion is beyond the scope of this page.

Probes can be added or removed simply by adding them to, or removing them from, a directory.
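
A directory-driven runner in the spirit of graphite-run-probes might look like the hypothetical Python sketch below. It assumes that each probe is an executable file that prints metric lines to stdout; the function name, timeout, and conventions are illustrative, not taken from the actual script.

```python
import os
import subprocess


def run_probes(probe_dir, timeout=50):
    """Run every executable file in probe_dir, in name order, and return
    their combined stdout. Hypothetical: assumes probes print metric lines."""
    output = []
    for name in sorted(os.listdir(probe_dir)):
        if name.startswith("."):
            continue  # ignore hidden files and editor droppings
        path = os.path.join(probe_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # only executable files count as probes
        try:
            result = subprocess.run(
                [path], capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            continue  # a hung probe should not block the whole cron run
        if result.returncode == 0:
            output.append(result.stdout)
    return "".join(output)
```

With this layout, enabling a probe is just copying (or symlinking) it into the directory and making it executable; disabling it is removing it.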

List of probes:

  • graphite-probe-linux: Report general Linux stats such as system, disk, network and RAM load.
  • graphite-log: processes logs in various formats (syslog, Apache, fail2ban) to collect metrics. It is similar to Etsy's Logster, except that it meets my needs a bit better.
  • graphite-probe-sensors: report the values from Linux "sensors" output
  • graphite-probe-smartmon: report some drive statistics given by smartmontools
  • graphite-exim: basic statistics about the Exim mail server
  • graphite-probe-disk-latency: reports stats similar to iostat, but does NOT use the sysstat package
  • graphite-probe-nfs-server: basic information about NFS server stats
  • graphite-probe-vmware-server: collects per-VM CPU and IO consumption from VMware Server and Workstation on Linux hosts. WARNING: stats gathered at this level should be used for relative comparisons; they do NOT sum to the actual system load.

Other programs:

  • graphite-run-probes: A master script which manages and runs the above probes
  • graphite-ping: the only probe that runs continuously; it uses a long-running ping to measure ping times and packet loss.
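
To illustrate how a long-running ping can yield both round-trip times and packet loss, the hypothetical Python sketch below parses Linux ping reply lines (`... icmp_seq=N ttl=T time=X ms`) and infers loss from gaps in the sequence numbers. It is not graphite-ping's implementation; the function name and the gap-based loss calculation are assumptions.

```python
import re

# Matches Linux ping reply lines such as:
# "64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.045 ms"
REPLY = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def summarize(lines):
    """Summarize a batch of ping output lines. Loss is inferred from
    gaps in icmp_seq, so packets lost after the last reply go unseen."""
    seqs, times = [], []
    for line in lines:
        m = REPLY.search(line)
        if m:
            seqs.append(int(m.group(1)))
            times.append(float(m.group(2)))
    if not seqs:
        return None
    expected = max(seqs) - min(seqs) + 1
    received = len(set(seqs))
    return {
        "avg_ms": sum(times) / len(times),
        "max_ms": max(times),
        "loss_pct": 100.0 * (expected - received) / expected,
    }
```

A continuous probe would read lines from a ping subprocess, call something like `summarize` once per reporting interval, and send the results to Graphite.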
