Monitoring your servers with free tools is easy

Written by Christophe Limpalair on 11/10/2015

"Scaling isn't just performance, monitoring is also a big piece of it" - Steve Corona

UPDATE: added DataDog.

In case you missed my interview about scaling with Steve Corona, Steve says that monitoring your stack is incredibly important.

If something doesn't behave properly, you need to be able to pinpoint exactly where, why, and when. Even when everything is running fine, you can find services that use more resources than they need, and fixing those can help your server stay up longer.

To help you navigate the scary seas of monitoring, I set out on a mission to test a few tools and report back.

(I’m sticking to the free ones for now)

DataDog

DataDog is a cloud monitoring service which has been mentioned a few times on the show. I wanted to check it out and see what the fuss was all about.

Turns out that I like it quite a bit. Since it's extremely easy to set up and it has a free tier, I figured it would make for a great addition to this article.

Let's walk through the process of setting it up to track server performance, and then I'll touch on monitoring other parts of your infrastructure (like Nginx, Redis, your app, etc.).

All you really have to do to get started is:

  1. Sign up
  2. Copy/paste a command that they give you, depending on your platform
  3. Configure your dashboard to your liking
  4. Connect other services if you want

It doesn't get any easier than that. Let's do it.

After signing up, you'll be taken to a page that gives you installation instructions.



If you're running Ubuntu, your command will look something like this:

DD_API_KEY=random_key_here bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/dd-agent/master/packaging/datadog-agent/source/install_agent.sh)"



Wait a few moments, and you will start receiving metrics in your dashboard.

Now, you can go in and create custom dashboards or integrate other services to monitor. Let's create a custom dashboard first.

If you're not already there, go to your main app control panel. Then click on Dashboards and Create Dashboard.

You can have multiple dashboards to track different aspects of your system, or different systems altogether. We're going to stick to one dashboard right now. Go ahead and add a few graphs and play around with it.



If you're not sure what some of the metrics are, like system.io.w_s, check out this explanation from one of DataDog's Software Engineers.

You can also have multiple metrics in one graph. For example, in the picture above, I have the amount of free and used memory in the same graph.


There are also different graphs to choose from. Seems like Timeseries is the best choice for showing one or more metrics over time. The others are better for showing metrics aggregated across many tags (like hosts). Read more here.

Reporting more services to DataDog

Now that we are monitoring our system(s), let's add the services running on them. I'm talking about things like MySQL, Redis, or even your application.

In your control panel, click on Integrations. As you can see, they have quite a few available to choose from.

Let's start with Redis, because it's super easy ;).

If you click on it, it will actually give you all of these instructions, but I'll write them here for reference.

You need to edit the Redis configuration for the DataDog agent. Configuration files for integrations are located in /etc/dd-agent/conf.d as mentioned here.

I just made a copy of the sample and renamed it to redisdb.yaml, with the following settings:

init_config:

instances:
  - host: localhost
    port: 6379


I just showed the minimum you need to make it work, but you could also configure tags and other things.
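
If you do want tags, here's a rough sketch of what that could look like. It's a small shell function that writes the config with a heredoc. The function name and the tag values (env:staging and role:cache) are placeholders I made up, so swap in whatever makes sense for your setup.

```shell
# Sketch: write a Redis check config for the DataDog agent, including tags.
# The tag values below are made up for illustration.
write_redis_check_config() {
  local target="$1"   # path of the YAML file to write

  # Quoted 'EOF' keeps the heredoc literal (no variable expansion).
  cat > "$target" <<'EOF'
init_config:

instances:
  - host: localhost
    port: 6379
    tags:
      - env:staging
      - role:cache
EOF
}

# Usage (write to a temp file first, then copy it into place as root):
#   write_redis_check_config /tmp/redisdb.yaml
#   sudo cp /tmp/redisdb.yaml /etc/dd-agent/conf.d/redisdb.yaml
```

Writing to a temp file first means you can look the result over before it touches the agent's config directory.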



You should now restart the agent:
sudo /etc/init.d/datadog-agent restart


If you execute the info command:
sudo /etc/init.d/datadog-agent info


You should see this:
Checks
======
[...]
redisdb
-------
- instance #0 [OK]
- Collected 8 metrics & 0 events
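
If you'd rather not eyeball that output every time, a small filter along these lines could do the check for you. This is only a sketch: the function name is made up, and it assumes the output keeps the layout shown above (a redisdb line, followed by an instance line with its status in brackets).

```shell
# Sketch: read the agent's info output on stdin and succeed only if the
# first instance line after the "redisdb" heading is marked [OK].
redis_check_ok() {
  awk '
    # The check name sits on a line by itself.
    /^[[:space:]]*redisdb[[:space:]]*$/ { in_redis = 1; next }

    # The first instance line after the heading decides the result.
    in_redis && /instance #[0-9]+/ {
      if ($0 ~ /\[OK\]/) found = 1
      exit
    }

    END { exit(found ? 0 : 1) }
  '
}

# Usage:
#   sudo /etc/init.d/datadog-agent info | redis_check_ok && echo "redisdb looks healthy"
```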


Now click on Install Integration back in your DataDog control panel.

Tada! You now have access to the following metrics:
redis.aof.buffer_length
redis.aof.last_rewrite_time
redis.aof.rewrite
redis.aof.size
redis.clients.biggest_input_buf
redis.clients.blocked
redis.clients.longest_output_list
redis.cpu.sys
redis.cpu.sys_children
redis.cpu.user
redis.cpu.user_children
redis.keys.evicted
redis.keys.expired
redis.mem.fragmentation_ratio
redis.mem.lua
redis.mem.peak
redis.mem.rss
redis.mem.used
redis.net.clients
redis.net.rejected
redis.net.slaves
redis.perf.latest_fork_usec
redis.pubsub.channels
redis.pubsub.patterns
redis.rdb.bgsave
redis.rdb.changes_since_last
redis.rdb.last_bgsave_time
redis.replication.last_io_seconds_ago
redis.replication.sync
redis.replication.sync_left_bytes
redis.stats.keyspace_hits
redis.stats.keyspace_misses


And there should also be a pre-populated dashboard. Of course, you can tweak it to your liking using the steps we covered earlier.



Want to add more services? Do the same thing we just did. Other integrations may not be quite as easy, but in my experience there has been ample instruction on how to set things up.
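
For your own application, the agent can also accept custom metrics over DogStatsD, which listens on UDP port 8125 by default. Each metric is a small text datagram in the form name:value|type, optionally followed by |#tags. Here's a minimal sketch of building one in shell. The function name, metric names, and tag are made up, and the nc flags in the usage comment vary between netcat versions, so treat that part as an assumption.

```shell
# Sketch: build a DogStatsD datagram.
#   $1 = metric name, $2 = value, $3 = type (c = counter, g = gauge)
#   $4 = optional comma-separated tags
dogstatsd_datagram() {
  local name="$1" value="$2" type="$3" tags="${4:-}"

  if [ -n "$tags" ]; then
    printf '%s:%s|%s|#%s' "$name" "$value" "$type" "$tags"
  else
    printf '%s:%s|%s' "$name" "$value" "$type"
  fi
}

# Usage (sends one counter increment to the local agent):
#   dogstatsd_datagram myapp.signups 1 c env:staging | nc -u -w1 127.0.0.1 8125
```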

That's it for DataDog. Enjoy!

NewRelic APM

NewRelic APM (Application Performance Monitoring) helps you understand what’s going on with your application and network for every request. You can see response times, throughput, and data transfer size to name a few. If requests are taking longer than you’d like, you can tell if it’s your application or your network slowing it down.

Getting started is quick and simple.

The first step after registering your account and selecting NewRelic APM is to select which language your application is written in. As you can see, this should cover most of your web apps.



I’ve only tried the PHP installation so far, but it is straightforward. There was only one small issue with permissions, which I’ll show you how to avoid.

You choose what distribution you have (Debian, Ubuntu, etc.). I’ll use Debian to illustrate:

Add your key with:

wget -O - https://download.newrelic.com/548C16BF.gpg | sudo apt-key add -


This part isn’t included in the NewRelic guide, but it didn’t work for me unless I created newrelic.list before I ran the next command. It would give me a file not found error.

sudo touch /etc/apt/sources.list.d/newrelic.list


Now we can add the repository: (this is where the permission issues happened)

sudo sh -c 'echo "deb http://apt.newrelic.com/debian/ newrelic non-free" > /etc/apt/sources.list.d/newrelic.list'


Since it wouldn’t run the > in sudo, I had to do this:

sudo -i
echo "deb http://apt.newrelic.com/debian/ newrelic non-free" >> /etc/apt/sources.list.d/newrelic.list
exit


The repository is now added in newrelic.list, which you can verify with a text editor.
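
If you'd rather not open a root shell for this, a common alternative is to pipe the line through sudo tee. With a plain "sudo echo ... > file", the redirect is performed by your own shell before sudo even runs, which is why it gets denied. tee opens the file itself, so it writes with sudo's privileges. Here's a minimal sketch; the function name is mine, and it skips sudo when you can already write to the file.

```shell
# Sketch: append a line to a file, escalating with sudo only when needed.
append_line_privileged() {
  local line="$1" file="$2"

  # Nothing to do if the exact line is already in the file.
  if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
    return 0
  fi

  if [ -w "$file" ] || { [ ! -e "$file" ] && [ -w "$(dirname "$file")" ]; }; then
    # We can already write here, so no sudo is needed.
    printf '%s\n' "$line" | tee -a "$file" > /dev/null
  else
    # tee opens the file itself, so it is tee (not our shell) that needs root.
    printf '%s\n' "$line" | sudo tee -a "$file" > /dev/null
  fi
}

# Usage:
#   append_line_privileged \
#     "deb http://apt.newrelic.com/debian/ newrelic non-free" \
#     /etc/apt/sources.list.d/newrelic.list
```

Because it checks for the line first, running it twice won't leave a duplicate entry the way a repeated >> would.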

Time to install and configure:

sudo apt-get update
sudo apt-get install newrelic-php5
sudo newrelic-install install


At this point it will ask you to input a license key (which is the first point in the instructions on their website).

All that’s left to do is restart your PHP services.
If you have Nginx and PHP-FPM:

sudo service nginx reload
sudo service php5-fpm restart


Your data will start displaying on the same page after a few minutes. If not, try refreshing.

[Image: NewRelic APM main dashboard]

The first dashboard that pops up contains graphs that show you response times and a lot of other things.

NewRelic Servers

See which processes use the most memory or CPU and get alerts for health issues. It also tracks disk I/O and networking. I really like this service, and at the time of writing it’s free for any number of servers.

It supports the following platforms, which should cover most setups.

[Image: NewRelic server supported platforms]

If you followed the APM installation above, then this one is even easier to install!
All you have to run is:

sudo apt-get install newrelic-sysmond


This one does not prompt you for a license, but you can easily set it with:

sudo nrsysmond-config --set license_key=yourlicensekeyhere


Then start it with:

sudo /etc/init.d/newrelic-sysmond start


Logs are saved in:

/var/log/newrelic/nrsysmond.log
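
If no data shows up, the log is the first place I'd look. Here's a quick sketch for pulling the most recent error lines out of it. Matching on the word "error" is my assumption about how problems get logged, so adjust the pattern if your log looks different. The function name is made up too.

```shell
# Sketch: print the last N lines (default 5) of a log that mention "error",
# ignoring case.
recent_log_errors() {
  local log_file="$1" count="${2:-5}"
  grep -i 'error' "$log_file" | tail -n "$count"
}

# Usage:
#   recent_log_errors /var/log/newrelic/nrsysmond.log 10
```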


It will start receiving data after a few minutes and the main dashboard looks like this:

[Image: NewRelic Server dashboard]

What I really like about it is how it breaks every process down by CPU and memory consumption:

[Image: NewRelic Server processes dashboard]

This is incredibly useful for finding resource hogs and figuring out why your app is crashing, or just for making your stack more efficient so it doesn't go down in the first place!
You can also see network and disk activity, both of which are very useful.

I’ll keep playing around with NewRelic, but so far I am impressed.

What do you use?

Do you already use a monitoring service that I haven't listed here? I'd like to hear about it!