Dynatrace Metrics Ingest

Today we’re going to talk about some exciting new functionality that was recently added to Dynatrace. We’ve talked about Dynatrace on this blog before, but for those who may not be familiar, Dynatrace is an all-in-one software intelligence platform and a Leader in the Gartner Magic Quadrant for APM. Dynatrace has always been a frontrunner in understanding application performance, and its AI and automation help tackle challenges that would otherwise require countless hours of manual work.

Until this point, most of the data captured in Dynatrace was gathered by the Dynatrace OneAgent or by Dynatrace Extensions, which pull data from APIs. This meant that if the metrics weren’t native to Dynatrace, they couldn’t be consumed by the Dynatrace platform. But,

  • What if you want to keep track of a certain file’s size on a disk?
  • What if you have an important InfluxDB you want to monitor?
  • What if you want to know the number of currently running Ansible deployments, or the failed ones?

This blog will cover:

  1. A high-level overview of the “New Metrics Ingestion Methods”
  2. A “Cheat Sheet” for selecting which method is best for you
  3. A “Brief Review on Ingestion Methods”
  4. Shortcomings of this Early Adopter release, and what we hope to see in the future
  5. An example – “Ansible Tower Overview Dashboard”

New Metrics Ingestion Methods

Historically, teams could write OneAgent plugins, but those required development effort and knowledge of Python. Now that Dynatrace has released the new Metrics Ingestion, any custom metric can be sent to the AI-powered Dynatrace platform more easily than ever. There are four main ways to achieve this:

  • Dynatrace StatsD Implementation
  • Metrics API v2 (OneAgent REST API)
  • Dynatrace Telegraf Output
  • Scripting Languages (Shell) via the dynatrace_ingest script

Dynatrace has already written technical blogs about how to send metrics with each of these methods, so this blog will focus on the pros and cons of each, along with a cheat sheet for choosing which path is likely best for your business use case.

Cheat Sheet

When deciding which route to take, follow this cheat sheet:

  • Is Telegraf already installed and gathering metrics? Use the Dynatrace Telegraf Plugin
    • Or, does Telegraf have a built-in input plugin for the technology that requires monitoring? Telegraf may still be the best route because capturing the metrics will be effortless.
  • Is something already scraping metrics in StatsD format? Use the StatsD Implementation
  • If none of the above, the best route is likely to use the Metrics API v2 / OneAgent REST API.

Brief Review on Ingestion Methods

Since Dynatrace has already written about each method except Telegraf, those details won’t be duplicated in this blog. Instead, here’s a quick overview of each ingestion method:

  • Dynatrace StatsD Implementation – If there’s an app that’s already emitting StatsD-formatted metrics, this implementation would be the most direct. The OneAgents listen on port 18125 for StatsD metrics sent via UDP. Dynatrace has enhanced the StatsD protocol to support dimensions (for tagging, filtering). The StatsD format is not as sleek as the new Dynatrace Metrics Syntax, so this path is not recommended unless StatsD is already present.
  • Metrics API v2 (OneAgent REST API) – There is an API endpoint listening for metrics in the Dynatrace Metrics Syntax (if you’re familiar with InfluxDB’s Line Protocol, it’s almost identical); a sketch of both wire formats follows this list.
  • Dynatrace Telegraf Output – The latest releases of Telegraf now include a dedicated Dynatrace output, which makes sending metrics to Dynatrace extremely easy when Telegraf is installed. Telegraf can either push metrics to the local OneAgent or out to the Dynatrace cluster.
    • If Telegraf is not yet installed, it may still be the easiest route forward if Telegraf natively supports a technology that needs to be monitored. The list of Telegraf input plugins is available in the Telegraf documentation. Installing Telegraf is quite easy, and the Telegraf configuration is detailed well in the Dynatrace documentation.
  • Scripting Languages (Shell) – If code has to be written to output Dynatrace Metrics Syntax or StatsD metrics, it can be simplified by using the dynatrace_ingest script shipped with each OneAgent. Rather than writing networking code to push the metrics, you can simply pipe them into this executable.
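
To make the two wire formats concrete, below is a minimal Python sketch of both push options. It assumes a OneAgent is running locally with its metric ingestion endpoint on the default port 14499 and the StatsD listener on port 18125 (mentioned above); the metric name and dimension are invented for illustration, and the exact syntax accepted can vary by version, so treat this as a sketch and confirm against the Dynatrace documentation.

```python
import socket
import urllib.request

# --- Option 1: Metrics API v2 via the local OneAgent ingest endpoint ---
# One metric per line in the Dynatrace Metrics Syntax (very close to Influx Line Protocol):
#   <metric.key>,<dim1>=<val1>,... <value>
# Port 14499 is the documented OneAgent default at the time of writing; adjust if yours differs.
line = "custom.awx.jobs.running,tower=tower01.example.com 42"
req = urllib.request.Request(
    url="http://localhost:14499/metrics/ingest",
    data=line.encode("utf-8"),
    headers={"Content-Type": "text/plain; charset=utf-8"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())  # prints the HTTP status and the ingest response body

# --- Option 2: Dynatrace-flavored StatsD via UDP on port 18125 ---
# Dimensions ride along as tags after the '#' marker, which is the extension
# Dynatrace added to the StatsD protocol for tagging and filtering.
statsd_line = "custom.awx.jobs.running:42|g|#tower:tower01.example.com"
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.sendto(statsd_line.encode("utf-8"), ("localhost", 18125))
```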

These ingestion methods allow Dynatrace to contend with open monitoring platforms, but they’re not without their faults. Before moving on to the example use case and dashboard, let’s discuss the most important caveats we discovered with Metrics Ingestion.

Early Adopter Shortcomings

While evaluating this new functionality, a couple of missing features surfaced. Highlighted below are the most challenging issues we faced, each with a proposed solution to remedy the shortcoming.

No Query Language Functions

Problem – The largest shortcoming is the limited set of aggregation options presented in the Custom Charts Explorer.

Example Use Case –

  • If an ingested metric is a COUNT over time, its value can become astronomically large. For a COUNT type of metric, a user may want to see the overall count, but likely the delta is more important.
  • Another example is if there’s a metric which needs arithmetic applied to it – say the value of a query needs to be multiplied by 10 or divided by 100 – it’s not possible.
  • And another example is when the difference between two different queries needs to be calculated (CPU Used – CPU System = CPU not used by OS) – it’s also not possible.

The workaround here is to modify the metrics before they’re sent to Dynatrace, but that’s not practical for a lot of use cases.
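
For counters, that pre-processing can be as small as remembering the last value and sending the difference. The sketch below shows one way to do it, assuming Python on the sending side; the metric name, counter value, and state-file location are placeholders.

```python
import json
import os

STATE_FILE = "/tmp/awx_jobs_total.state"  # hypothetical location for the last observed value

def counter_to_delta(current_total: float) -> float:
    """Return the increase since the previous run, persisting state between runs."""
    previous = 0.0
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as fh:
            previous = json.load(fh).get("last", 0.0)
    with open(STATE_FILE, "w") as fh:
        json.dump({"last": current_total}, fh)
    # A counter reset (e.g., after a restart) would make the delta negative; clamp it to 0.
    return max(current_total - previous, 0.0)

# 'custom.awx.jobs.delta' is a made-up metric key; the value 15234.0 stands in for a scraped counter.
delta = counter_to_delta(15234.0)
print(f"custom.awx.jobs.delta,tower=tower01.example.com {delta}")  # pipe this to dynatrace_ingest
```

The same idea covers the arithmetic examples above: multiply, divide, or subtract on the sender before the value ever reaches Dynatrace.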

Proposed Solution – Add mathematical operators and query functions. For example, Grafana has dozens built into its product that make data manipulation at query time very easy.

Incomplete Metrics List in Explorer

Problem – The list of metrics presented in the Custom Charts Explorer is not complete, which can be misleading.

Example use case – If a user searches for “awx” they will find up to 100 metrics with a matching name. If that user scrolls through the list, exploring the new metrics, they may believe the 100 metrics were the only ones available, leading to confusion.

Proposed Solution – The list of metrics should indicate whether the list is complete.

New Metrics Registration is Slow

Problem – It can take up to five minutes for a new metric to be registered and queryable in Dynatrace.

Example use case – If you are very familiar with this new Metrics Ingestion, you can send metrics and assume they will properly register. But, when new users are testing out the functionality and developing their workflows, this delay can become a real headache.

Proposed Solution – As soon as a metric has been sent, it should be registered and then shown in the Metrics Explorer. Even if the data itself hasn’t been stored, the metric name should still be queryable near instantaneously.

Although these gaps in functionality are annoying at this time, the new Metrics Ingestion still allows for insightful 3rd-party dashboards to be made.

Example – Ansible Tower Overview Dashboard

At Evolving Solutions, we’re a Red Hat Apex partner and we use a lot of Ansible. If you haven’t seen it yet, Ansible Tower is a very extensible solution for managing your deployment and configuration pipelines. I wanted to try to gather metrics from Ansible Tower’s Metrics API so I could track how many jobs were running and completed.

I wrote two applications which read from the local Ansible Tower Metrics API and scrape those metrics. One of the apps prints the output to stdout, while the other pushes metrics via UDP to the StatsD listening port. The one which writes to stdout can be used as a Telegraf exec input or piped into the dynatrace_ingest script; a sketch of that approach follows.
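
The stdout variant boils down to something like the following sketch. It assumes Ansible Tower exposes Prometheus-format metrics at /api/v2/metrics/ (true for recent Tower releases, but verify on yours); the host, credentials, and metric filtering are placeholders, and the real application does more error handling and carries metric labels over as Dynatrace dimensions.

```python
import base64
import urllib.request

TOWER_URL = "https://tower.example.com/api/v2/metrics/"   # placeholder host
TOWER_USER, TOWER_PASS = "metrics_user", "changeme"       # placeholder credentials

def scrape_tower_metrics() -> str:
    """Fetch the Prometheus-format metrics text from the Tower Metrics API."""
    token = base64.b64encode(f"{TOWER_USER}:{TOWER_PASS}".encode()).decode()
    req = urllib.request.Request(TOWER_URL, headers={
        "Authorization": f"Basic {token}",
        "Accept": "text/plain",
    })
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

def to_dynatrace_lines(prom_text: str) -> list:
    """Convert simple 'name value' Prometheus samples into Dynatrace Metrics Syntax."""
    lines = []
    for raw in prom_text.splitlines():
        if raw.startswith("#") or not raw.strip():
            continue  # skip HELP/TYPE comments and blank lines
        parts = raw.rsplit(" ", 1)
        if len(parts) != 2:
            continue
        name, value = parts
        if "{" in name:
            continue  # keep only unlabeled samples to keep the sketch short
        lines.append(f"custom.{name.replace('_', '.')} {value}")
    return lines

if __name__ == "__main__":
    for line in to_dynatrace_lines(scrape_tower_metrics()):
        print(line)  # suitable for a Telegraf exec input or for piping into dynatrace_ingest
```

From there it can be wired up as a Telegraf exec input, or piped straight into the ingest helper, e.g. something like `python tower_metrics.py | dynatrace_ingest` (the script name is hypothetical and the exact path to dynatrace_ingest depends on the OneAgent install).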

With the data sent to Dynatrace, I built an example dashboard showing how these metrics could be used. In the dashboard, I leveraged:

  • Dynatrace (Agent-gathered) Metrics:
    • Host Health Overview
    • Host Metrics
    • Automatically Detected Problems
  • Ansible Tower Metrics (through the Telegraf metrics ingest):
    • Overall Job Status & Status Over Time (Successful vs Failed vs Cancelled jobs)
    • Tower Job Capacity, number of Executing Jobs, and the number of Pending Jobs
    • Ansible Tower Usage Stats (User count, Organizations count, Workflow count)

As you can see, sending these extra Ansible Tower Metrics to Dynatrace allows us to build a detailed overview of the Ansible Tower platform. With new features like open Metrics Ingestion, Dynatrace is continuing to differentiate itself and disrupt the APM market.

Ansible Tower monitoring is a great use case, but it’s only one of an endless number of use cases – do you have any systems you’d like deeper visibility into with Dynatrace? Reach out to us at Evolving Solutions and we can help you gain complete visibility into your critical systems.

(21/01/11) – A previous version of this blog said that metrics could not have dimensions added or removed after they’d been set. After speaking with Dynatrace Product Management, we learned that this is not true; we had simply hit an obscure edge case. If you encounter hiccups with the new Metrics Ingestion, click the “Contact Us” button below.

Evolving Solutions Author:
Brett Barrett
Senior Delivery Consultant
Enterprise Monitoring & Analytics
brett.b@evolvingsol.com

Key Takeaways from IBM Think 2019

By Bo Gebbie, Vice President, Sales, Services and Marketing

During the week of February 11, IBM hosted IBM Think 2019 in San Francisco, CA. In addition to Think, the annual IBM PartnerWorld Conference was held at the same time. Hosting these events together brought almost 30,000 attendees to San Francisco for the week.

Evolving Solutions had several team members and clients attend the event.  It was an outstanding opportunity to learn about IBM’s 2019 strategic imperatives, new solution announcements and new channel program enhancements.  Additionally, the event afforded us the opportunity to network with hundreds of clients and IBMers from across the country.

As I reflect back on Think, there were several key takeaways for me.

  • Hybrid Cloud is where it’s at. According to the IBM Institute for Business Value, clients are already using between two and 15 clouds. These environments might be on-premises or in the public cloud. As the number of cloud silos continues to grow, clients need an easier way to connect their applications, no matter where the data resides. During Think, the IBM Cloud Integration Platform was announced. This platform is “designed to securely connect applications, software and services from any vendor regardless of whether those systems are on-premises, in a public cloud or a private cloud,” according to IBM.
  • Data. Data. Data. According to IBM CEO Ginni Rometty, “only 20% of the world’s data is searchable.” Applications such as Office 365, CRM, expense management, etc. were the “easy apps” to get into the cloud. The other 80% of data is critical to an organization’s overall success, and those core systems will be the focus of the next era. How do clients utilize that data to gather meaningful business insights and make real-time decisions? IBM is placing big bets on “the next 80%” to help grow its solution portfolio.
  • Artificial Intelligence will transform how we all work. Many people still think of AI as something you would see in a Sci-Fi movie.  But, according to Rometty and several executives we spoke with, AI will be embedded in applications or layered on top of them in the future.  AI will enable new or improved business processes.  AI will enable every single person to do their job in a more efficient or data-led way.  AI will not replace jobs; rather it will augment how we do work.  I like the theory, and look forward to seeing how this plays out over time.
  • Red Hat is key to IBM’s future. The IBM executives could not say a ton about the Red Hat acquisition, since the deal has not closed yet. But, every single IBMer was excited for what Red Hat can bring to IBM in terms of new client opportunity, innovating the product stack and keeping IBM relevant to the next generation of buyers.
  • Holy moly, the Moscone Center is huge. One of my non-business learnings is that the Moscone Center in downtown San Francisco is enormous.  It is no small feat to fit tens of thousands of people into one place!

Over the next few months, we will have Evolving Solutions team members at several of our other strategic partners’ premier conferences.  We will share our insights from those events on our blog.

I am looking forward to returning to San Fran next May for Think 2020.  Hope to see you there!

Bo Gebbie joined Evolving Solutions in January, 2017 as the Vice President of Sales, Services and Marketing. In this role, he is responsible for the sales, pre-sales services, post-sales consulting and marketing pillars within the organization. He is a member of the Executive and Management Teams, and represents the organization externally on vendor and partner Advisory Councils.

Before joining Evolving Solutions, Bo was an IBM executive, serving last as Vice President of Server Solution Sales, North America. In that role, he had sales and profit responsibility for a $3B portfolio across Power Systems, z Systems and the Server Linux brands. Prior to that, Bo was the Business Unit Executive, Power Systems Sales, North America.

Like what you read? Follow Bo on LinkedIn.