Using Synthetic Transaction Monitoring for an Outside-in View of App Health

March 26, 2026 · 9 min read · Testing Guide

James Baldassari

October 15, 2020

We love metrics. The founders of mabl also founded Stackdriver, which was primarily a SaaS solution to handle metrics, dashboards, and alerting at scale. Many of mabl’s engineers have managed operations for large-scale, cloud-based applications. We rely on metric-based alerts to help us understand the health of our infrastructure and systems, and even wake us up in the middle of the night when there are problems. But for all our love for those measurements, we’ve come to accept the following facts:

“Good” metrics are insufficient to prove that our application is healthy from a user perspective.
“Bad” metrics are insufficient to prove that our application is unhealthy from a user perspective.

So, how do we demonstrate that an application is (or isn’t) healthy from a user perspective? With synthetic transaction monitoring (a.k.a. “Synthetics,” “Proactive Monitoring,” “Active Monitoring,” “Testing in Production”), which is defined as:

“... a monitoring technique that is done by using an emulation or scripted recordings of transactions. Behavioral scripts (or paths) are created to simulate an action or path that a customer or end-user would take on a site, application or other software (or even hardware). Those paths are then continuously monitored at specified intervals for performance, such as: functionality, availability, and response time measures.”
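As a rough illustration of the definition above, a scripted path can be modeled as an ordered list of named steps that are run at an interval and timed. This is only a minimal sketch: `StepResult`, `run_journey`, and `journey_passed` are hypothetical names invented here, and a real synthetic test (in mabl or elsewhere) would drive a browser or API client instead of plain Python callables.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    error: str = ""


def run_journey(steps: List[Tuple[str, Callable[[], None]]]) -> List[StepResult]:
    """Run each scripted step in order, timing it; stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()  # a step signals failure by raising an exception
            results.append(StepResult(name, True, time.perf_counter() - start))
        except Exception as exc:
            elapsed = time.perf_counter() - start
            results.append(StepResult(name, False, elapsed, str(exc)))
            break  # later steps depend on earlier ones, so do not continue
    return results


def journey_passed(results: List[StepResult]) -> bool:
    """A journey passes only if at least one step ran and every step succeeded."""
    return bool(results) and all(r.ok for r in results)
```

A scheduler would call `run_journey` at a fixed interval and record both the pass/fail outcome (functionality and availability) and the per-step timings (response time).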

At mabl, we combine synthetics (delivered by mabl!) and metrics (delivered by Stackdriver!) to provide a comprehensive, real-time perspective into the health of our application. Here’s a quick overview of what we get from each.

Detecting user-impacting issues quickly

We use mabl tests (referred to as “synthetics” hereafter) to validate all of our key user journeys continuously in production and to alert us (via OpsGenie) upon failure. This detects issues faster than relying on metrics because our metrics vary based on day, volume, and other factors, which compels us to wait for several “abnormal” samples before alerting. Synthetics, on the other hand, are controlled; they behave the same way with the same results on every run, so we only need a single failure before alerting.

Let’s look at 5XX response codes on a relatively low-volume but critical part of the application (where we did not have any alertable incidents) over the preceding month:

How would you craft an alert that finds an issue quickly? Wherever I set the threshold, given the spikiness in the rate, I’d be inclined to require two or more abnormal samples, which would be a mistake, because an incident could look like this:

Notice that the spikes are several times higher than the spikes in the first graph. The first abnormal measurement was at 12:12 AM, but it wasn’t for another four hours that we had two consecutive measurements that you would classify as abnormal.

An effective test could have caught the issue 4 hours sooner than the metrics, because we would have tested the specific scenario that triggered the first error in a controlled and reliable way.
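The delay introduced by a “wait for consecutive abnormal samples” rule can be sketched in a few lines. The function name `first_alert_index` and the sample series below are hypothetical and are not the data behind the graphs; the point is only that requiring a streak of abnormal samples postpones the alert whenever abnormal readings are interleaved with normal ones.

```python
from typing import Optional, Sequence


def first_alert_index(abnormal: Sequence[bool], required_consecutive: int) -> Optional[int]:
    """Return the index of the sample at which an alert would fire, or None.

    `abnormal[i]` is True when sample i is outside the alert threshold.
    """
    streak = 0
    for i, flag in enumerate(abnormal):
        streak = streak + 1 if flag else 0  # any normal sample resets the streak
        if streak >= required_consecutive:
            return i
    return None


# Hypothetical spiky series: abnormal readings separated by normal ones.
samples = [False, True, False, False, True, False, True, True]

print(first_alert_index(samples, 1))  # 1: a single-failure rule fires at the first abnormal sample
print(first_alert_index(samples, 2))  # 7: a two-in-a-row rule waits until the last sample
```

A controlled synthetic test can safely use the single-failure rule, while a noisy metric usually cannot without producing spurious alerts.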

Detecting system or infrastructure issues quickly

While synthetics help us identify user experience issues quickly, metrics provide us with better visibility into the health of our systems and infrastructure. Using monitoring tools such as Google Ops and Datadog, you get real-time insight into your entire stack, whether you’re primarily concerned with capacity, uptime, utilization, errors, or latency. You can configure alerts to notify you when metric values or trends are abnormal, and you can automate some remediation actions (such as automatically replacing a cluster node that appears unhealthy).

Metric trending is also useful because, unlike synthetics, metrics can be predictive. For example, we can use metrics to alert us when systems are approaching their quota, whereas synthetics will continue to appear “healthy” until the moment when that quota is exhausted and the user is impacted.
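One simple way to make a metric predictive is to extrapolate its recent growth toward the quota. The sketch below assumes evenly spaced samples and roughly linear growth. `hours_until_quota` is a hypothetical helper, not a feature of any particular monitoring product, and real tools typically offer more robust forecasting.

```python
from typing import Optional, Sequence


def hours_until_quota(
    usage: Sequence[float], quota: float, hours_between_samples: float = 1.0
) -> Optional[float]:
    """Estimate hours until `usage` reaches `quota` from its average growth rate.

    Returns None when there are too few samples or usage is not growing.
    """
    if len(usage) < 2:
        return None
    elapsed_hours = (len(usage) - 1) * hours_between_samples
    rate = (usage[-1] - usage[0]) / elapsed_hours  # units of usage per hour
    if rate <= 0:
        return None
    return max(0.0, (quota - usage[-1]) / rate)


# Hypothetical hourly usage samples against a quota of 1000 units.
remaining = hours_until_quota([700, 720, 740, 760], quota=1000)
print(remaining)  # 12.0
```

An alert could then fire when the estimate drops below some lead time, well before a synthetic test would see any user-facing failure.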

Detecting application issues quickly

Aggregate application metrics and synthetics both play important roles in detecting application-level issues. Metrics from traditional Application Performance Management (APM) tools such as New Relic and AppDynamics are invaluable for detecting changes in aggregate latency and errors at the application level. It is critical, however, to pair these “inside-out” metrics with “outside-in” signals.

Real User Monitoring (RUM) features add aggregated information about the health of the application as experienced by all end users. This is valuable for evaluating trends but less useful for real-time alerting given the downsides of aggregate metrics noted above. Synthetics can add the real-time “outside-in” view. While it could take considerable time to observe changes in aggregate metrics after a code deployment, for example, immediate triggers for synthetics can yield insightful results.

Diagnosing root cause

Understanding why an alert fired is as important as having reliable alerts. The first action an on-call engineer will take, whether paged mid-workday or at 3 AM on a weekend, is to determine why the alert fired, so they can guide the corrective action.

Typical metric alerts only tell you what was in error, such as high API latency, low disk space, or an unavailable server. The on-call engineer must still use their skills and knowledge of the system to combine the alert with additional data to determine the cause.

Conversely, alerts based on synthetics retain rich test detail captured during the failed execution. For example, with mabl tests, an engineer can simply click through the alert to the result to immediately review screenshots, HAR network logs, and browser traces, reducing the necessary mental hops. Further, while monitoring metric alerts is a “wait and see if it resolves” approach, synthetics can be retried on demand to confirm or clear an error, such as with mabl’s “Rerun” button on Slack alerts.

Limiting false positives

Since alerts will sometimes be erroneous, it’s important to understand how an alert can be wrong. In statistical testing these error cases are called false positives and false negatives, or in plain English, reporting an outage when there is no outage, or reporting proper system operation when there is an outage.

Due to the breadth of samples available to a metric-based alert, simply triggering off a single out-of-range sample, like an API 500 status code, will lead to many false positives from the perspective of an application user, especially if your UI doesn’t even use the affected API endpoint. Conversely, since a test-based alert indicates that a critical user flow, like processing a checkout, cannot be completed, a single failed test can strongly indicate a problem with the system that is likely to be impacting the user experience. Overall, metric-based alerts are more prone to false positives than test-based alerts due to the coupling of test-based alerts to real-world user flows.

Limiting false negatives

The case of false negatives is the opposite of the relationship discussed above. The higher the sampling frequency feeding an alert, the lower the risk of missing a sample that indicates an error state. Since metrics are often produced on the order of multiple samples per minute, or even many per second, a metric-based alert can react quickly to a state change in the monitored application. Additionally, it is possible for a metric to consume all interactions with a system, such as every API call made.

Because test-based metrics require running a full user-emulating test to make a measurement, test-based alerts have a granularity of 1-15 minutes. This lower sampling rate means that meaningful impact to the monitored application can go unnoticed, falling between measurements, which increases the likelihood of a false negative: reporting that a troubled system is in a good state.
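The effect of the test interval on false negatives can be estimated with a simplified model. Assume, purely for illustration, that a test is an instantaneous check that always fails during an outage, and that the outage starts at a uniformly random moment relative to the test schedule. Under those assumptions an outage shorter than the interval is sampled with probability equal to its duration divided by the interval. The function name `miss_probability` is hypothetical.

```python
def miss_probability(outage_minutes: float, interval_minutes: float) -> float:
    """Probability that no scheduled test falls inside the outage window.

    Simplified model: instantaneous tests at a fixed interval, outage start
    uniformly distributed relative to the schedule.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    if outage_minutes < 0:
        raise ValueError("outage_minutes must not be negative")
    return max(0.0, 1.0 - outage_minutes / interval_minutes)


# A hypothetical 3-minute outage under the two ends of the 1-15 minute range.
print(miss_probability(3, 15))  # roughly 0.8: most such outages fall between tests
print(miss_probability(3, 1))   # 0.0: at least one test always lands inside the outage
```

Real tests take time to run and may fail for only part of an outage, so this is a lower-fidelity estimate than measurements from your own system would give.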

How do you balance these opposing benefits of metric and test alerts for application monitoring? At mabl we combine them to produce an accurate picture of mabl’s system health, monitoring both our core REST APIs using metrics and our critical user flows using a rich mabl test suite.

Limiting up-front investment

The up-front investment required for monitoring depends significantly on two factors: the architecture of the thing being monitored and the scale of metrics collection and analysis. Infrastructure deployed to a major cloud provider will have access to integrated cloud-based monitoring and alerting services such as AWS CloudWatch or Google Cloud Monitoring. However, teams with infrastructure deployed on-prem will likely have to implement their own scalable and reliable metrics collection system. Regardless of where the infrastructure is deployed, it is often necessary to instrument or modify application code to generate the metrics that are indicative of application health.

The startup cost for synthetics can be quite low in comparison. Custom testing solutions can be built using a small cluster of machines that run tests on fixed schedules, on demand, or in response to events such as those created by CI/CD systems. SaaS testing solutions such as mabl can further reduce the initial time investment by managing the test infrastructure and scheduling, with the added benefit of easier test creation and maintenance, robust reporting, and rich diagnostic data as noted above.

Limiting cost of additional signals/alarms

The incremental cost of adding metrics is usually quite low once the monitoring system is in place. Code modifications may be necessary to add or modify metrics, but these changes require little effort.

Adding a new test often requires a larger time investment than adding a new metric. Whether the test is written in code using a framework like Selenium or with a codeless testing solution like mabl, the test author must take the time to ensure that the test is robust enough to trust its results. Although it may take more time to implement a new test compared to adding a new metric, tests can usually be developed by a greater number of people in a given organization -- particularly if you use a low-code solution like mabl.

Limiting ongoing maintenance

The cost of ongoing maintenance for metrics varies based on whether a team is managing their own system for collecting and analyzing metrics or using a cloud-based service. Teams who manage their own metrics systems are likely to experience high infrastructure maintenance overhead, especially as the volume of metrics being tracked -- and thus the infrastructure to collect, analyze, and store these metrics -- scales. Cloud-based metrics do not require significant maintenance. In either case, there is, of course, some overhead associated with configuring alerts, dashboards, and so forth.

The overhead associated with synthetics depends on how often the application is undergoing significant changes and how resilient the tests are to change. Writing tests that will still function correctly as the application changes is not trivial, but many teams are able to achieve this result with sufficient planning and coordination between test authors and application developers. AI-enabled software testing tools like mabl can further reduce the amount of effort required by automatically updating tests as the application evolves.

So, should your team use synthetics or traditional metrics to monitor application health?

Generally, if you’re delivering a customer-facing application, you need synthetics as well as infrastructure, system, and application metrics to maintain a complete view of the health of your service. If you’re just getting started, we believe that it’s best to start with the key performance indicators for your business, and work backwards from there. If you have uptime targets as part of your KPIs, you’re likely to want synthetics to establish that uptime. If you have application-specific KPIs (such as transactions or sign-ups), you’re probably going to want to define counters for those via custom instrumentation within the application or via integration with your preferred logging service. In any case, starting with the KPIs will provide a forcing function so that you measure what matters.
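As one possible way to tie synthetics to an uptime KPI, the pass/fail history of a scheduled test can be turned into an availability percentage and compared with the target. This is a minimal sketch with hypothetical names (`availability_percent`, `meets_target`) and made-up run data; how uptime is actually defined and measured is a decision for your own team.

```python
from typing import Sequence


def availability_percent(run_passed: Sequence[bool]) -> float:
    """Percentage of synthetic runs that passed over the reporting window."""
    if not run_passed:
        raise ValueError("at least one run is required")
    return 100.0 * sum(run_passed) / len(run_passed)


def meets_target(run_passed: Sequence[bool], target_percent: float) -> bool:
    """True when measured availability is at or above the uptime target."""
    return availability_percent(run_passed) >= target_percent


# Hypothetical window: 100 scheduled runs with a single failure.
runs = [True] * 99 + [False]
print(availability_percent(runs))  # 99.0
print(meets_target(runs, 99.9))    # False
```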

Want to see for yourself how you can test and validate your key user journeys with mabl? Sign up for a free trial today!
