Monitoring Docker Containers with AppDynamics

As more organizations break down their monolithic applications into dozens of microservices, monitoring containers is more important, and complicated, than ever. Fortunately, AppDynamics has built Docker Monitoring to give you end-to-end monitoring for all your Docker containers. This allows you to immediately know when you have containers that are causing resource contention issues. This can also help you solve infrastructure-related issues with your containers dynamically. You can see below how AppD gives an overview of your running containers and details into each of them:

The list of running containers

Container Insights

This information is invaluable, but the AppDynamics documentation can make it confusing to understand how to set up container monitoring. Adding to the challenge, the AppDynamics-provided images are, as of this writing, not up to date with the current agents. Evolving Solutions recognized this problem and has addressed it by automatically building new Docker images whenever new AppDynamics Machine Agents are released. If you would like to use our images, you can find our repo here. Otherwise, if you’d like to build your own image of the agent, read on.

In this blog, I’m going to walk you through the process of creating your own Docker Image to run the AppDynamics Machine Agent. When you run the image as a sidecar to your Docker containers, it will provide:

  • Infrastructure Information for your containers, such as
    • Container metadata
    • Tags
      • Name-Value pairs derived from Docker/Kubernetes
      • AWS tags where applicable
    • Infrastructure health insights, such as CPU, memory, network, and disk usage

Prerequisites

  1. Container monitoring requires a Server Visibility license, version 4.3.3 or later, for both the Controller and the Machine Agent.
  2. AppDynamics recommends that you use Docker CE/EE 17.03 or Docker Engine 1.13 with this product.
  3. Container Monitoring is not supported on Docker for Windows or Docker for Mac.
  4. Server Visibility must be enabled – shown in the sample Dockerfile below.
  5. Docker Visibility must be enabled – shown in the sample Dockerfile below.

Creating the Dockerfile

  1. Download the Machine Agent installer for Linux with bundled JRE from https://download.appdynamics.com/download/, unzip it, and then re-zip it as machine-agent.zip.

This step is important: sometimes the zip straight from the download does not extract with the expected folder structure. Do this on the machine where the Docker image will be run.
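
If it helps, a minimal shell sketch of that unzip/re-zip step follows. The downloaded bundle name is an assumption (it varies by agent version), and the goal is for the archive to contain a top-level machine-agent/ directory, matching the MACHINE_AGENT_HOME path used in the Dockerfile below:

# Hypothetical bundle name; substitute the file you actually downloaded
unzip machineagent-bundle-64bit-linux-*.zip -d machine-agent
zip -r machine-agent.zip machine-agent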

  2. Create a directory named MachineAgent on the machine where you will run the Docker instance, e.g. /Users/<username>/Docker/DockerVisibility/MachineAgent (or any directory of your choosing).
  3. Copy machine-agent.zip to this location.
  4. Create a new file called Dockerfile with the following code and give it 744 permissions:
# Sample Dockerfile for the AppDynamics Standalone Machine Agent
FROM ubuntu:14.04
# Install required packages
RUN apt-get update && apt-get upgrade -y && apt-get install -y unzip && apt-get clean
# Install AppDynamics Machine Agent
ENV APPDYNAMICS_HOME /opt/appdynamics
ADD machine-agent.zip /tmp/
RUN mkdir -p ${APPDYNAMICS_HOME} && unzip -oq /tmp/machine-agent.zip -d ${APPDYNAMICS_HOME} && rm /tmp/machine-agent.zip
# Setup MACHINE_AGENT_HOME
ENV MACHINE_AGENT_HOME /opt/appdynamics/machine-agent
#Comment this section only if you are using docker-compose to build/run the machine-agent container
ENV MA_PROPERTIES "-Dappdynamics.controller.hostName=<<ControllerHost>> -Dappdynamics.controller.port=<<ControllerPort>> -Dappdynamics.controller.ssl.enabled=false -Dappdynamics.agent.accountName=<<accountname>> -Dappdynamics.agent.accountAccessKey=<<accesskey>> -Dappdynamics.sim.enabled=true -Dappdynamics.docker.enabled=true -Dappdynamics.docker.container.containerIdAsHostId.enabled=true"
# Include start script to configure and start MA at runtime
ADD start-appdynamics ${APPDYNAMICS_HOME}
RUN chmod 744 ${APPDYNAMICS_HOME}/start-appdynamics
# Configure and Run AppDynamics Machine Agent
CMD "${APPDYNAMICS_HOME}/start-appdynamics"

Depending on how you build and run the machine-agent container (i.e. via docker-compose or docker build/docker run), you’ll need to comment or uncomment the relevant portions of the Dockerfile and the start script. The start script sets the AppDynamics-specific environment variables needed by the Machine Agent and executes the machineagent.jar file.

5. This Dockerfile will:

    • Use an ubuntu:14.04 base image (you can use any base image you want)
    • Install the unzip package
    • Copy machine-agent.zip to the /tmp directory of the image
    • Extract the Machine Agent artifacts to /opt/appdynamics/machine-agent/
    • Clean up the /tmp directory
    • Copy the Machine Agent startup script, start-appdynamics, into the /opt/appdynamics/ directory
    • Run the script at container startup

Note: the MA_PROPERTIES environment variable in the Dockerfile above contains placeholders for our own controller parameters. Replace them with your own controller information.

Creating the Docker Start Script

  1. Create another file called start-appdynamics in the same MachineAgent folder with the following:
#!/bin/bash
# Sample Docker start script for the AppDynamics Standalone Machine Agent
# In this example, APPD_* environment variables are passed to the container at runtime
# Uncomment all the lines in the below section when you are using docker-compose to build and run machine-agent container
#MA_PROPERTIES="-Dappdynamics.controller.hostName=${APPD_HOST}"
#MA_PROPERTIES+=" -Dappdynamics.controller.port=${APPD_PORT}"
#MA_PROPERTIES+=" -Dappdynamics.agent.accountName=${APPD_ACCOUNT_NAME}"
#MA_PROPERTIES+=" -Dappdynamics.agent.accountAccessKey=${APPD_ACCESS_KEY}"
#MA_PROPERTIES+=" -Dappdynamics.controller.ssl.enabled=${APPD_SSL_ENABLED}"
#MA_PROPERTIES+=" -Dappdynamics.sim.enabled=${APPD_SIM_ENABLED}"
#MA_PROPERTIES+=" -Dappdynamics.docker.enabled=${APPD_DOCKER_ENABLED}"
#MA_PROPERTIES+=" -Dappdynamics.docker.container.containerIdAsHostId.enabled=${APPD_CONTAINERID_AS_HOSTID_ENABLED}"
# Start Machine Agent
${MACHINE_AGENT_HOME}/jre/bin/java ${MA_PROPERTIES} -jar ${MACHINE_AGENT_HOME}/machineagent.jar
  2. Give the file appropriate permissions so it is executable (e.g. 777)

Creating the Docker Build Script

Create a script called build-docker.sh in the same MachineAgent folder with the following:

docker build -t appdynamics/docker-machine-agent:latest .

Note: This file also needs appropriate read/execute permissions. If you use docker-compose, this script is not needed.

Creating the Docker Run Script

Create a script called run-docker.sh in the same MachineAgent folder with the following:

docker run --rm -it -v /:/hostroot:ro -v /var/run/docker.sock:/var/run/docker.sock appdynamics/docker-machine-agent

Note: Give this file appropriate read/execute permissions. Again, if docker-compose is used, this script is not needed.

Build and Run the Image

To build the image, run ./build-docker.sh; then, to run the Docker image, run ./run-docker.sh.
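
As a side note, because MA_PROPERTIES is set with ENV in the Dockerfile, it can also be overridden at runtime with docker run -e instead of rebuilding the image. A hypothetical example (all controller values are placeholders):

docker run --rm -it \
  -v /:/hostroot:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e MA_PROPERTIES="-Dappdynamics.controller.hostName=<<ControllerHost>> -Dappdynamics.controller.port=<<ControllerPort>> -Dappdynamics.agent.accountName=<<accountname>> -Dappdynamics.agent.accountAccessKey=<<accesskey>> -Dappdynamics.sim.enabled=true -Dappdynamics.docker.enabled=true -Dappdynamics.docker.container.containerIdAsHostId.enabled=true" \
  appdynamics/docker-machine-agent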

Docker-Compose

If you wish to use docker-compose, create a file called docker-compose.yml in the same MachineAgent directory with the following code:

version: '3'
services:
  docker-machine-agent:
    build: .
    container_name: docker-machine-agent
    image: appdynamics/docker-machine-agent
    environment:
      - APPD_HOST=<<CONTROLLER HOST>>
      - APPD_PORT=<<CONTROLLER PORT>>
      - APPD_ACCOUNT_NAME=<<CONTROLLER ACCOUNT>>
      - APPD_ACCESS_KEY=<<CONTROLLER ACCESS KEY>>
      - APPD_SSL_ENABLED=false
      - APPD_SIM_ENABLED=true
      - APPD_DOCKER_ENABLED=true
      - APPD_CONTAINERID_AS_HOSTID_ENABLED=true
    volumes:
      - /:/hostroot:ro
      - /var/run/docker.sock:/var/run/docker.sock

Remember to comment out the MA_PROPERTIES line in the Dockerfile and uncomment the corresponding lines in start-appdynamics, then use the commands docker-compose build and docker-compose run to build and run respectively (see the example below).
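
For reference, a minimal build-and-run sequence, assuming the docker-machine-agent service name defined in the compose file above:

docker-compose build
docker-compose run docker-machine-agent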

Automation

Would you like to learn how Evolving Solutions used the steps above to automate the build and deploy of newly released AppDynamics Docker agents? Look out for our upcoming blog posts!

Getting started with AppDynamics

If you don’t have an AppDynamics account, you can start your own free trial here.

After you’ve created your account, you can visit AppDynamics University to view hundreds of bite-size videos covering all things AppD.

Evolving Solutions Author:
Steven Colliton
Delivery Consultant
Enterprise Monitoring & Analytics
steven.c@evolvingsol.com

Dynatrace Metrics Ingest

Today we’re going to be talking about some exciting new functionality that was recently added to Dynatrace. We’ve talked about Dynatrace in this blog before, but for those who may not be familiar, Dynatrace is an all-in-one software intelligence platform and a leader in the Gartner magic quadrant for APM. Dynatrace has always been a frontrunner in understanding application performance and their AI and automation help tackle many challenges that would require countless hours of manpower.

Most of the data captured in Dynatrace, up until this point, was gathered by the Dynatrace OneAgent or by Dynatrace Extensions, which pull data from APIs. This meant that if a metric wasn’t native to Dynatrace, it couldn’t be consumed by the Dynatrace platform. But:

  • What if you want to keep track of a certain file’s size on a disk?
  • What if you have an important InfluxDB you want to monitor?
  • What if you want to know the number of currently running Ansible deployments, or the failed ones?

This blog will cover:

  1. A high-level overview of the new “Metrics Ingestion Methods”
  2. A “Cheat Sheet” for selecting which method is best for you
  3. A “Brief Review on Ingestion Methods”
  4. Shortcomings of this Early Adopter release, and what we hope to see in the future
  5. An example – the “Ansible Tower Overview Dashboard”

New Metrics Ingestion Methods

Historically, teams could write OneAgent plugins, but those required development effort and knowledge of Python. Now that Dynatrace has released the new Metric Ingestion, any custom metric can be sent to the AI-powered Dynatrace platform more easily than ever. There are four main ways to achieve this:

  • The Dynatrace StatsD implementation
  • The Metrics API v2 / OneAgent REST API
  • The Telegraf Dynatrace output
  • Scripting languages (shell), via the dynatrace_ingest script

Dynatrace has already published technical blogs about how to send metrics with each of these methods, so this blog will instead focus on the pros and cons of each, along with a cheat sheet on which path is likely best depending on your business use case.

Cheat Sheet

When deciding which route to take, follow this cheat sheet:

  • Is Telegraf already installed and gathering metrics? Use the Dynatrace Telegraf Plugin
    • Or, does Telegraf have a built-in input plugin for the technology that requires monitoring? Telegraf may still be the best route because capturing the metrics will be effortless.
  • Is something already scraping metrics in StatsD format? Use the StatsD Implementation
  • If none of the above, the best route is likely to use the Metrics API v2 / OneAgent REST API.

Brief Review on Ingestion Methods

Since Dynatrace has already written about each method, except Telegraf, those details won’t be duplicated in this blog. Instead, here’s a quick overview on each Ingestion Method:

  • Dynatrace StatsD Implementation – If there’s an app that’s already emitting StatsD-formatted metrics, this implementation would be the most direct. The OneAgents listen on port 18125 for StatsD metrics sent via UDP. Dynatrace has enhanced the StatsD protocol to support dimensions (for tagging, filtering). The StatsD format is not as sleek as the new Dynatrace Metrics Syntax, so this path is not recommended unless StatsD is already present.
  • Metrics API v2 (OneAgent REST API) – There is an API endpoint listening for metrics in the Dynatrace Metrics Syntax (if you happen to be familiar with the Influx Line Protocol, it’s almost identical); a minimal example is sketched after this list.
  • Dynatrace Telegraf Output – The latest releases of Telegraf now include a dedicated Dynatrace output, which makes sending metrics to Dynatrace extremely easy when Telegraf is installed. Telegraf can either push metrics to the local OneAgent or out to the Dynatrace cluster.
    • If Telegraf is not yet installed, it still may be the easiest route forward if Telegraf natively supports a technology that needs to be monitored. The list of Telegraf “inputs” can be found here. Installing Telegraf is quite easy, and the Telegraf configuration is detailed well in the Dynatrace documentation.
  • Scripting Languages (Shell) – If code has to be written to output Dynatrace Metrics Syntax or StatsD metrics, the code can be slightly simplified by using the dynatrace_ingest script provided with each OneAgent. This script can be invoked instead of writing networking code to push the metrics; metrics can simply be piped into the executable (also shown in the sketch below).
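
To make these options concrete, here is a minimal shell sketch of the three push-style approaches. The metric name and dimension are hypothetical, the ports (14499 for the local Metrics API, 18125 for StatsD) are the documented OneAgent defaults, and the dynatrace_ingest path is an assumption that varies by install, so treat this as a sketch rather than a drop-in script:

#!/bin/bash
# 1) Local OneAgent Metrics API, using the Dynatrace Metrics Syntax
#    (the local metric API may need to be enabled on the OneAgent)
curl -s -X POST http://localhost:14499/metrics/ingest \
  -H "Content-Type: text/plain" \
  --data-binary "custom.demo.queue_depth,queue=orders 42"

# 2) Dynatrace StatsD listener (UDP port 18125), dimension appended as a tag
echo -n "custom.demo.queue_depth:42|g|#queue:orders" > /dev/udp/localhost/18125

# 3) Pipe Dynatrace Metrics Syntax into the dynatrace_ingest helper
#    (path is an assumption -- adjust to your OneAgent install location)
echo "custom.demo.queue_depth,queue=orders 42" | \
  /opt/dynatrace/oneagent/agent/tools/dynatrace_ingest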

These ingestion methods allow Dynatrace to contend with open monitoring platforms, but they’re not without their own faults. Before moving to the example use case and dashboard, let’s discuss the most important caveats we discovered in Metric Ingestion.

Early Adopter Shortcomings

While evaluating this new functionality, a couple of missing features surfaced. Highlighted below are the most challenging issues we faced, each with a proposed solution to remedy the shortcoming.

No Query Language Functions

Problem – The largest shortcoming of the Custom Charts Explorer is the limited set of aggregation options it presents.

Example Use Case –

  • If an ingested metric is a COUNT over time, its value can become astronomically large. For a COUNT type of metric, a user may want to see the overall count, but likely the delta is more important.
  • Another example is if there’s a metric which needs arithmetic applied to it – say the value of a query needs to be multiplied by 10 or divided by 100 – it’s not possible.
  • And another example is when the difference between two different queries needs to be calculated (CPU Used – CPU System = CPU not used by OS) – it’s also not possible.

The workaround here is to modify the metrics before they’re sent to Dynatrace, but that’s not practical for a lot of use cases.
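
As a rough illustration of that workaround, a wrapper script can convert a cumulative count into a per-interval delta before pushing it; the metric name, state file, and dynatrace_ingest path here are all hypothetical:

#!/bin/bash
# Convert a cumulative counter into a delta before ingestion.
# $1 is the current cumulative value from whatever system exposes the count.
STATE_FILE=/var/tmp/demo_counter.last
CURRENT=$1
PREVIOUS=$(cat "${STATE_FILE}" 2>/dev/null || echo "${CURRENT}")
echo "${CURRENT}" > "${STATE_FILE}"
DELTA=$((CURRENT - PREVIOUS))
echo "custom.demo.jobs_completed.delta ${DELTA}" | \
  /opt/dynatrace/oneagent/agent/tools/dynatrace_ingest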

Proposed Solution – Add mathematical operators and query functions. For example, Grafana has dozens built into its product that make data manipulation at query time very easy.

Incomplete Metrics List in Explorer

Problem – The list of metrics presented in the Custom Charts Explorer is not complete, which can be misleading.

Example use case – If a user searches for “awx” they will find up to 100 metrics with a matching name. If that user scrolls through the list, exploring the new metrics, they may believe the 100 metrics were the only ones available, leading to confusion.

Proposed Solution – The list of metrics should indicate whether the list is complete.

New Metrics Registration is Slow

Problem – It can take up to 5 minutes for a new metric to be registered and become queryable in Dynatrace.

Example use case – If you are very familiar with this new Metrics Ingestion, you can send metrics and assume they will properly register. But, when new users are testing out the functionality and developing their workflows, this delay can become a real headache.

Proposed Solution – As soon as a metric has been sent, it should be registered and then shown in the Metrics Explorer. Even if the data itself hasn’t been stored, the metric name should still be queryable near instantaneously.

Although these gaps in functionality are annoying at this time, the new Metrics Ingestion still allows for insightful 3rd-party dashboards to be made.

Example – Ansible Tower Overview Dashboard

At Evolving Solutions, we’re a Red Hat Apex partner and we use a lot of Ansible. If you haven’t seen it yet, Ansible Tower is a very extensible solution for managing your deployment and configuration pipelines. I wanted to try to gather metrics from Ansible Tower’s Metrics API so I could track how many jobs were running and completed.

I wrote two applications that read from the local Ansible Tower Metrics API and scrape those metrics. One of the apps prints the output to stdout, while the other pushes metrics via UDP to the StatsD listening port. The one that writes to stdout can be used as a Telegraf exec input or piped into the dynatrace_ingest script.
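
As a rough, simplified sketch of that approach (not the actual applications), the following pulls one Prometheus-format metric from Tower’s /api/v2/metrics/ endpoint and pipes it to Dynatrace. The metric name awx_running_jobs_total, the token variable, the tower dimension, and the dynatrace_ingest path are assumptions to adapt for your environment:

#!/bin/bash
# Scrape one metric from the Ansible Tower metrics endpoint and push it
# to Dynatrace in Metrics Syntax via the dynatrace_ingest helper.
RUNNING=$(curl -sk -H "Authorization: Bearer ${TOWER_TOKEN}" \
  https://localhost/api/v2/metrics/ | \
  awk '/^awx_running_jobs_total/ {print $2}')
echo "custom.awx.running_jobs,tower=tower01 ${RUNNING}" | \
  /opt/dynatrace/oneagent/agent/tools/dynatrace_ingest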

With the data sent to Dynatrace, I made an example dashboard of how these metrics could be used. In the dashboard, I leveraged

  • Dynatrace (Agent-gathered) Metrics:
    • Host Health Overview
    • Host Metrics
    • Automatically Detected Problems
  • Ansible Tower Metrics (through the Telegraf metrics ingest):
    • Overall Job Status & Status Over Time (Successful vs Failed vs Cancelled jobs)
    • Tower Job Capacity, number of Executing Jobs, and the number of Pending Jobs
    • Ansible Tower Usage Stats (User count, Organizations count, Workflow count)

As you can see, sending these extra Ansible Tower Metrics to Dynatrace allows us to build a detailed overview of the Ansible Tower platform. With new features like open Metrics Ingestion, Dynatrace is continuing to differentiate itself and disrupt the APM market.

Ansible Tower monitoring is a great use case, but it’s only one of an endless number of use cases – do you have any systems you’d like deeper monitoring into with Dynatrace? Reach out to us at Evolving Solutions and we can help you gain complete visibility of your critical systems.

(21/01/11) – A previous version of this blog said that metrics could not have dimensions added or removed after they’ve been set. After speaking with Dynatrace Product Management, it was discovered that this is not true, and instead an obscure edge case was encountered. If you encounter hiccups with the new Metrics Ingestion, click the “Contact Us” button below.

Evolving Solutions Author:
Brett Barrett
Senior Delivery Consultant
Enterprise Monitoring & Analytics
brett.b@evolvingsol.com

Tackling Common Mainframe Challenges

Today, we’re going to talk about the mainframe. Yes, that mainframe which hosts more transactions daily than Google; that mainframe which is used by 70% of Fortune 500 companies; that mainframe which recently saw a 69% rise in sales over the previous quarter. Over the decades, analysts have predicted the mainframe would go away, particularly in recent years with the ever-expanding public cloud, but it’s hard to outclass the performance and reliability of the mainframe, especially with advancements that let a single IBM Z system process 19 billion transactions per day.

Since the mainframe seems here to stay and it’s a critical IT component, we need to make sure it’s appropriately monitored. If you Google “Mainframe Monitoring Tools”, you’ll find a plethora of bespoke mainframe tools. Most of these tools are great at showcasing what’s happening inside the mainframe…but that’s it. Yes, of course we need to know what’s happening in the mainframe, but when leadership asks, “why are we having a performance issue?”, siloed tools don’t provide the necessary context to understand where the problem lies. So, what can provide this critical context across multiple tools and technologies?

The Dynatrace Software Intelligence Platform was built to provide intelligent insights into the performance and availability of your entire application ecosystem. Dynatrace has been an industry leader in the Gartner APM quadrant ever since the quadrant was created and it’s trusted by 72 of the Fortune 100. Maybe more pertinent to this blog, Dynatrace is built with native support for IBM mainframe, in addition to dozens of other commonly-used enterprise technologies. With this native support for mainframe, Dynatrace is able to solve some common mainframe headaches, which we’ll discuss below.

End-to-End Visibility

As mentioned earlier, tailored mainframe tools allow us to understand what’s happening in the mainframe, but not necessarily how or why. Dynatrace automatically discovers the distributed applications, services and transactions that interact with your mainframe and provides automatic fault domain isolation (FDI) across your complete application delivery chain.

In the screenshot above, we can see the end-to-end flow from an application server (Tomcat), into a queue (IBM MQ), and then into the mainframe, where the message was picked up and processed by “CICS on ET01”. With this service-to-service data provided automatically, out of the box, understanding your application’s dependencies and breaking points has never been easier.

Quicker Root Cause Analysis (RCA)

With this end-to-end visibility, RCA time is severely reduced. Do you need to know if it was one user having an issue, or one set of app servers, or an API gateway, or the mainframe? Dynatrace can pinpoint performance issues with automated Problem Cards to give you rapid insight into what’s happening in your environment.

When there are multiple fault points in your applications, Dynatrace doesn’t create alert storms for each auto-baselined metric. Instead, Dynatrace’s DAVIS AI correlates those events, with context, to deliver a single Problem, representing all related impacted entities. An example of a Problem Card is displayed below:

In the screenshot above, there are a couple key takeaways:

  1. In the maroon square, Dynatrace tells you the business impact. When multiple problems arise, you can prioritize by addressing first the Problem with the largest impact.
  2. In the blue square, Dynatrace provides the underlying root cause(s). Yes, we can also see there were other impacted services, but at the end of the day, long garbage-collection times caused the slow response times.
    1. This is a critical example for mainframe teams. Yes, the mainframe is not the root cause in this case, but that’s great to know! Now, we don’t have to bring in the mainframe team, or even the database team. We can go straight to the frontend app developers and start talking garbage-collection strategies.
  3. Finally, Dynatrace is watching your entire environment with intuitive insights into how your systems interact. Because this problem was so large, over a billion dependencies (green square) were analyzed before a root cause was presented. There is simply no way this could be done manually.

Optimize and Reduce Mainframe Workloads

The IBM mainframe uses a consumption-based licensing model, where cost is related to the amount of work executed (MSUs, or “million service units”). As more and more applications are built that rely on the mainframe, the number of MIPS required increases. Tools that focus only on what happens inside the mainframe can tell you 10,000 queries were made, but not why. Because Dynatrace provides end-to-end visibility from your end user, through your hybrid cloud, and into the mainframe, it can tell you exactly where those queries came from. These insights are critical to identifying potential optimization candidates, and can help you tackle the MIPS death from a thousand paper cuts.

In the screenshot below, you can see (in green) that 575 messages were read off IBM MQ, but that caused 77,577 interactions on the mainframe! Likely, there is room for significant optimization here.

Yes, those requests may have executed quickly, but maybe they could have been optimized so that only 10 mainframe calls per message were needed, or even 5, as opposed to the ~135 observed. Without Dynatrace, it is an intense exercise for mainframe admins to track down all of the new types of queries being sent to them.

In Closing

With Dynatrace, all of your teams can share a single pane of glass to visualize where performance degradations and errors are introduced across the entire application delivery chain. With its native instrumentation of IBM mainframe, Dynatrace provides world-class insights into what’s calling your mainframe, and how that’s executing inside the system.

Now that we’ve discussed the common mainframe headaches of end-to-end visibility, root cause analysis, and workload optimization, it’s time to conclude this high-level blog. Hopefully this blog has given you insight into common use cases where Dynatrace provides immense value to mainframe teams, application developers, and business owners. Soon, we’ll be following up with a more technical Dynatrace walkthrough to show you exactly how to get to this data.

Until then, if you have any questions or comments, feel free to reach out to us at ema@evolvingsol.com. We’d love to chat and learn the successes and challenges you have with monitoring your mainframe environments.

Evolving Solutions Author:
Brett Barrett
Senior Delivery Consultant
Enterprise Monitoring & Analytics
brett.b@evolvingsol.com