Client: True Blue
Contract to hire
GC or US Citizen only
Look for local first.
Monitoring Engineer
Join the monitoring team in developing and maintaining our enterprise-grade platform. Dig deep into open-source tools both stalwart and cutting-edge, such as Nagios, Thruk, collectd, Graphite and Grafana. Provide full visibility into every corner of our environment, and help the rest of the organization make the best use of your efforts.
The monitoring engineer's essential duties include:
- Maintaining and upgrading a distributed architecture of multiple Nagios and Graphite backends tied together with Thruk and Grafana frontends
- A near-pathological aversion to monitoring noise, and a willingness to stamp it out wherever it rears its nasty little head
- An understanding of the monitoring ecosystem as a production-equivalent service, and a commitment to treat its uptime and day-to-day operations accordingly
- Working closely with development and support groups to determine what needs to be monitored at what levels, and assisting them in eventually taking ownership of the monitors themselves
- Building new scripts and tools to act as Nagios plugins or collectd feeder scripts, using Perl, Python or Ruby
- Building new dashboards and alert boards in Thruk, Grafana and other tools, and training other groups to build their own
Required skills include:
- 5-7 years' experience as a Unix systems administrator
- Solid understanding of Nagios: fully able to define new services, commands, etc.
- Familiarity with tools used for gathering and presenting performance metrics, such as Graphite, collectd or Cacti. Experience with Grafana is a big plus.
- Solid scripting abilities with Perl, Python or Ruby, ideally including experience writing Nagios plugins in any of those languages
- At a minimum, the ability to troubleshoot Perl scripts
- General understanding of essential network communications and protocols. Ability to troubleshoot TCP or UDP connections or parse web pages with command-line tools.
- General understanding of SNMP: able to perform an snmpwalk against a network or storage device and understand the results, and know the difference between an SNMP get request and a trap.