Using Sauce Breakpoints to Find and Fix Flakey Tests


February 02, 2026 · 7 min read · Testing Guide



Posted September 14, 2012


Spoiler alert: If you read this article, you'll be one of the first to learn about a previously unpublicized feature from Sauce. It's like an Easter egg!

It probably comes as no surprise that at Sauce we write a lot of Selenium tests. Our website needs good test coverage, just like our customers' apps. We have a build that runs all of these tests (and many more unit tests besides) after every batch of commits. If tests fail in our build, it stays "red" until someone commits a fix. During that time, we can't deploy the new code, and it's our custom not to even push more commits on top while the build is red, so the problem can be diagnosed and fixed without complicating matters. In other words, it's a big deal when the build breaks, because it can interfere with other developers' workflows.

That's one of the reasons we pull out our hair and curse when we encounter flakes in our build. A flake occurs when a test that normally passes, or passes under normal conditions, fails non-deterministically (i.e., under seemingly random conditions). If we run the build again, that same test might pass, leaving us without much information about what went wrong. Is something wrong in the code? Is something wrong in our build infrastructure? It leaves us uncertain whether we might actually have a problem with that functionality in production, too: if it's failing 1 out of every 1,000 times in the build, is it affecting 0.1% of our customers?

On a recent Flakey Friday (a Friday dedicated to tracking down and eliminating flakiness in tests), we caught a test behaving strangely, failing about one out of every ten runs. The test looked like this:
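One rough way to put a number on "failing 1 out of every 1,000 times" is to rerun a test many times and count failures. The sketch below assumes a test is any zero-argument callable that raises on failure; `estimate_failure_rate` and `make_flaky_test` are hypothetical helpers, and the simulated test merely stands in for a real Selenium test.

```python
import random
from typing import Callable


def estimate_failure_rate(test: Callable[[], None], runs: int) -> float:
    """Run `test` repeatedly and return the fraction of runs that raised."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except Exception:
            # Any exception, including AssertionError, counts as a failed run.
            failures += 1
    return failures / runs


def make_flaky_test(failure_probability: float, seed: int) -> Callable[[], None]:
    """Build a hypothetical stand-in test that fails with the given probability."""
    rng = random.Random(seed)  # seeded so results are reproducible

    def flaky_test() -> None:
        if rng.random() < failure_probability:
            raise AssertionError("simulated flake")

    return flaky_test


if __name__ == "__main__":
    rate = estimate_failure_rate(make_flaky_test(0.001, seed=42), runs=100_000)
    print(f"observed failure rate: {rate:.3%}")
```

Note how many runs a rare flake demands: at a true rate of 0.1%, even 100,000 runs yield only about 100 failures, so a handful of reruns says almost nothing about whether a test is flaky.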

```python
def test_can_publish_and_back(self):
    self.login(self.user)
    self.open_job(self.test_job['_id'])
    self.find_element_by_link_text("make public").click()
    self._check_public()
    self.find_element_by_link_text("make private").click()
    self._check_private()
```

This is one test for our job* detail page. The setUp used for this test class handles creating a new random user and a new random job. The test logs the user in and goes to the page for this job. It then clicks a link designed to make the job "public" (i.e., viewable by anyone on the web), checks both the database and the site to make sure the AJAX-powered toggle did its trick, and finally makes sure we can toggle the job back to "private" in the same way. This is a straightforward test, built using Selenium testing best practices (creating a fresh random user object and a fresh random job, and using spin asserts to avoid race conditions), but every so often it would fail because Selenium would check that the link text changed after the click, something that, in these cases, didn't happen. Likewise, the job was not marked with the appropriate status in the database.

How do you diagnose and fix a problem that only happens, on average, 10% of the time in the build? Well, the first thing we tried was reproducing the behavior manually. Unfortunately, no matter how many times we performed the test actions ourselves in a browser, we could never see the failure. Since we couldn't reproduce the bug, all we had were several hypotheses about website load in our test environment, or JavaScript issues that prevented the AJAX call from taking place. We were essentially looking at a long, hard road of guesswork.

At that point we decided to make use of Sauce Breakpoints to try to catch the bug in the wild. I've written previously about how you can use Breakpoints to debug JavaScript errors in tests you are writing. That particular technique wouldn't have helped us here, because we couldn't reliably reproduce the failure. What we needed was a way to run so many instances of this test that we were likely to observe a failure, and then to enter Breakpoint mode on just the tests that failed.
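The post credits spin asserts with avoiding race conditions but doesn't show one. As a minimal sketch, assuming a plain polling loop with a timeout (the name `spin_assert` and its defaults are illustrative, not a Sauce or Selenium API), a spin assert retries a check until it passes or time runs out:

```python
import time
from typing import Callable


def spin_assert(condition: Callable[[], bool], timeout: float = 10.0,
                interval: float = 0.25,
                message: str = "condition never became true") -> None:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if condition():
                return
        except Exception:
            # Treat transient errors (e.g. an element not rendered yet) as "not yet".
            pass
        if time.monotonic() >= deadline:
            raise AssertionError(message)
        time.sleep(interval)
```

In a test like the one above, `condition` might be a lambda that looks for the "make private" link after clicking "make public". A spin assert absorbs slow page updates; it cannot help when the underlying request never succeeds, which is why the check still timed out in the failing runs.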
The first step was taken care of in a rather brute-force way: I simply created 14 new versions of the same test, like so:


```python
def test_can_publish_and_back2(self):
    self.test_can_publish_and_back()

...

def test_can_publish_and_back15(self):
    self.test_can_publish_and_back()
```
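Copy-pasting works fine for a one-off hunt. A hypothetical alternative is to generate the copies in a loop at import time, so the count becomes a single number to change. The class and the `add_copies` helper below are placeholders built on the standard library's `unittest`; the assumption is that a Nose-based runner collects the generated methods the same way as hand-written ones, since they are ordinary class attributes whose names start with `test`.

```python
import unittest


class PublishToggleTests(unittest.TestCase):
    """Placeholder test class; the real one logs in and drives a browser."""

    def test_can_publish_and_back(self):
        pass  # stands in for the Selenium steps shown earlier


def add_copies(cls, name: str, count: int) -> None:
    """Attach methods name2 .. name{count + 1}, each calling the original test."""
    original = getattr(cls, name)
    for i in range(2, count + 2):
        def duplicate(self):
            original(self)  # same test body under a new name for the runner

        duplicate.__name__ = f"{name}{i}"
        setattr(cls, duplicate.__name__, duplicate)


# 14 copies plus the original gives 15 tests, matching the manual approach.
add_copies(PublishToggleTests, "test_can_publish_and_back", 14)
```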

This way, I could run our custom version of the Nose test runner and have it pick up all and only the tests I was interested in, using a wildcard match: `nose test_can_publish_and_back*`. Then, I made use of a feature we have not yet publicized: programmatic Sauce Breakpoints. This works by sending a special Selenium command that the Sauce Cloud understands to mean that you want the job breakpointed. For both Selenium RC and WebDriver, the special command is `sauce:break`. For Selenium RC, this command is sent as the `context` parameter for `setContext`. For Selenium WebDriver, it is passed as the `script` value of the `execute` command. Luckily, the Python WebDriver API implements these commands, so all I had to do was add `sauce:break` to our main test class's `tearDown` function:

```python
def tearDown(self):
    if not self.passed:
        self.collect_web_traceback()
        if self.break_on_fail:
            # Ask the Sauce Cloud to breakpoint this job.
            self.driver.execute_script("sauce:break")
    self.report_pass_fail()
    if self.stop_on_teardown:
        self.driver.quit()
```

Essentially, our `tearDown` logic here says, "If the test didn't pass, get a traceback, and breakpoint the test if I've set `self.break_on_fail`. Then, report the status to Sauce, and close the WebDriver session." With all of these modifications in hand, I was able to run the broken test multiple times in parallel like so: `nose --processes=15 test_can_publish_and_back*`

Then, all I had to do was go to my Sauce Labs tests page and watch to see which tests turned up as breakpointed. I could navigate to the detail page for a breakpointed test and use the dev tools in Chrome to analyze what was happening. In the case of this flake, I noticed the problem was that the AJAX request was not successful: it was getting a 401 response from our test server. This meant that the CSRF protection for the AJAX POST was messed up somehow. After a lot of website backend debugging, we were able to determine that, under load, new CSRF tokens sometimes took longer to save to our persistent data store than it took the site to send them back in its response. That made the browser's next (valid) request look invalid to the server, causing it to respond with a 401. Luckily, upgrading our backend code and making our session save synchronous took care of the problem.

The details I have shared about our particular flake are not crucial to the big story here. What is important is that we had a kind of flake that was nearly impossible to pin down without a tool like Sauce Breakpoints. It allowed me (within the space of one parallel Sauce test run) to observe the bug in its natural habitat and get into the dev tools of the problem session, where we found the first clue on the trail that eventually led to squashing the bug. We hope this strategy can also be useful to others who aren't tolerant of inscrutable flakes in their builds.
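Why were fifteen parallel copies enough? The post puts the flake at roughly one run in ten. If each run fails independently with probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. Independence is an assumption: a load-dependent bug like this one may fail more often when many copies hit the same test server at once. A small sketch of the arithmetic (the function names are illustrative):

```python
import math


def chance_of_seeing_failure(failure_rate: float, runs: int) -> float:
    """Probability of at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of independent runs giving at least `confidence` chance of a failure."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


print(f"{chance_of_seeing_failure(0.10, 15):.3f}")  # → 0.794
print(runs_needed(0.10, 0.95))  # → 29
```

So fifteen runs gave roughly a four-in-five chance of catching at least one breakpointed failure, and about 29 runs would push that past 95%. The same formula shows why a 1-in-1,000 flake is much harder: reaching 95% takes close to 3,000 runs.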
Let us know if you can think of any other testing practices that can be augmented by Breakpoints!

Addendum: Selenium RC

The example code for programmatic Sauce Breakpoints above is for Selenium 2, a.k.a. WebDriver. Breakpoints also work for Selenium 1 (a.k.a. Selenium RC) tests, but the code is different. Here is our `tearDown` function for Selenium 1 tests, which illustrates the use of the `set_context` function:

```python
def tearDown(self):
    if not self.passed:
        self.collect_web_traceback()
        if self.break_on_fail:
            # Selenium RC sends the breakpoint command via setContext.
            self.selenium.set_context("sauce:break")
    self.report_pass_fail()
    if self.stop_on_teardown:
        self.selenium.stop()
```
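Since the WebDriver and RC versions of `tearDown` differ only in how they send `sauce:break`, one hypothetical refactor is a single helper that picks the right call for whichever session object it receives. The helper name `sauce_break` is illustrative; `execute_script` is the Python WebDriver method used above, and `set_context` is the Selenium RC Python client method shown in the addendum.

```python
def sauce_break(session) -> None:
    """Ask Sauce to breakpoint the current job (hypothetical helper).

    WebDriver sessions send "sauce:break" as the script of an execute command;
    Selenium RC sessions send it as the context of setContext.
    """
    if hasattr(session, "execute_script"):
        session.execute_script("sauce:break")  # Selenium 2 / WebDriver
    elif hasattr(session, "set_context"):
        session.set_context("sauce:break")  # Selenium 1 / RC Python client
    else:
        raise TypeError(f"don't know how to breakpoint {type(session).__name__}")
```

Checking for the methods themselves (duck typing) keeps the helper from importing either Selenium client just to inspect types, and the WebDriver branch is tried first so a driver that happens to expose both methods still uses `execute_script`.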

* At Sauce, we call an individual test run by a customer in our infrastructure a "job".