Flaky Test Detection

Identify unreliable tests by running them multiple times and analyzing consistency. (v6.3.0)


The Problem

Flaky tests sometimes pass and sometimes fail without code changes. They:

  • Waste time investigating false failures
  • Erode trust in the test suite
  • Slow CI/CD pipelines with retries
  • Hide real bugs among noise

The Solution

CBrowser's flaky detection:

  1. Runs each test multiple times (default: 5)
  2. Tracks pass/fail results per run
  3. Calculates a flakiness score
  4. Identifies which specific steps are unreliable
  5. Provides actionable recommendations
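Steps 1 and 2 amount to a loop that runs the same test repeatedly and tallies the outcomes. The sketch below is a minimal illustration, not CBrowser's implementation: `runRepeatedly` and `RunTally` are hypothetical names, and the test is modeled as a synchronous function that throws on failure, whereas real browser tests are asynchronous.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not CBrowser's API.
interface RunTally {
  runs: number;
  passCount: number;
  failCount: number;
  errors: string[]; // one message per failed run
}

// Run `test` the given number of times and record each outcome.
// A run passes if `test` returns normally and fails if it throws.
function runRepeatedly(test: () => void, runs: number = 5): RunTally {
  const tally: RunTally = { runs, passCount: 0, failCount: 0, errors: [] };
  for (let i = 0; i < runs; i++) {
    try {
      test();
      tally.passCount++;
    } catch (err) {
      tally.failCount++;
      tally.errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  return tally;
}

// Deterministic stand-in for a flaky test: fails on every second call.
let calls = 0;
const everyOtherFails = (): void => {
  calls++;
  if (calls % 2 === 0) throw new Error("Element not found");
};

const tally = runRepeatedly(everyOtherFails, 5);
console.log(`${tally.passCount}/${tally.runs} passed`); // 3/5 passed
```

Keeping the error message for each failed run makes it possible to tell a consistent failure (same message every time) from an intermittent one.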

Quick Start

# Run tests 5 times (default)
npx cbrowser flaky-check tests.txt

# Run 10 times for more accuracy
npx cbrowser flaky-check tests.txt --runs 10

# Set custom threshold (flag tests >25% flaky)
npx cbrowser flaky-check tests.txt --threshold 25

# Save report to file
npx cbrowser flaky-check tests.txt --output report.json

Understanding Flakiness Score

Score     Meaning                      Classification
0%        All runs had same result     stable_pass or stable_fail
1-39%     Occasional inconsistency     mostly_pass or mostly_fail
40-60%    Highly unpredictable         flaky
61-99%    Occasional consistency       mostly_pass or mostly_fail
100%      Maximum flakiness (50/50)    flaky

The score represents how unpredictable the test is:

  • 0% = Always same result (stable)
  • 100% = Perfect 50/50 split (maximally flaky)
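One simple formula that satisfies both endpoints is 1 - |2p - 1|, where p is the pass rate. The sketch below uses that formula purely as an assumption: `flakinessScore` is a hypothetical helper, and CBrowser's exact calculation is not documented here and may differ (this formula gives 80% for a 3/5 result, matching the sample report, but 40% for 4/5, where the sample report shows 32%).

```typescript
// Illustrative only: an assumed formula, not taken from CBrowser's source.
// Maps a pass rate p to 1 - |2p - 1|, expressed as a whole percentage.
function flakinessScore(passCount: number, totalRuns: number): number {
  if (totalRuns <= 0) throw new RangeError("totalRuns must be positive");
  const passRate = passCount / totalRuns;
  return Math.round((1 - Math.abs(2 * passRate - 1)) * 100);
}

console.log(flakinessScore(5, 5));  // 0   (always passes)
console.log(flakinessScore(0, 5));  // 0   (always fails)
console.log(flakinessScore(5, 10)); // 100 (50/50 split)
console.log(flakinessScore(3, 5));  // 80
```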

Example Output

🔍 FLAKY TEST DETECTION REPORT
══════════════════════════════════════════════════════════════

📋 Suite: E2E Tests
   Runs per test: 5
   Total duration: 127.3s

──────────────────────────────────────────────────────────────
TEST RESULTS
──────────────────────────────────────────────────────────────

✅ STABLE_PASS (5/5 passed, flakiness: 0%)
   Login Flow
   └─ Avg duration: 2.1s (±0.1s)

✅ STABLE_PASS (5/5 passed, flakiness: 0%)
   Homepage Load
   └─ Avg duration: 1.3s (±0.05s)

⚠️  FLAKY (3/5 passed, flakiness: 80%)
   Search Functionality
   └─ Avg duration: 3.5s (±1.2s)
   └─ Flaky steps:
      • wait for "Loading" to disappear (60% flaky)
      • verify page contains "results" (40% flaky)

❌ STABLE_FAIL (0/5 passed, flakiness: 0%)
   Checkout Flow
   └─ Avg duration: 0.8s (±0.02s)
   └─ Consistent error: Element not found: "Add to Cart"

⚠️  MOSTLY_PASS (4/5 passed, flakiness: 32%)
   Profile Update
   └─ Avg duration: 4.2s (±0.8s)
   └─ Flaky steps:
      • click "Save Changes" (20% flaky)

══════════════════════════════════════════════════════════════
📊 SUMMARY
══════════════════════════════════════════════════════════════

✅ Overall Flakiness: 22.4%

   Stable Pass:  2 tests
   Stable Fail:  1 test
   Flaky:        1 test
   Mostly Pass:  1 test

⚠️  Most flaky test: Search Functionality (80%)
⚠️  Most flaky step: wait for "Loading" to disappear (60%)

══════════════════════════════════════════════════════════════
💡 RECOMMENDATIONS
══════════════════════════════════════════════════════════════

• Search Functionality:
  - Replace "wait for Loading" with explicit timeout
  - Use more specific selector for results verification

• Profile Update:
  - Add wait before clicking "Save Changes"
  - Check if button is disabled during save

• Checkout Flow (stable fail):
  - This test consistently fails - fix the selector, not flakiness

Per-Step Analysis

CBrowser doesn't just tell you a test is flaky; it identifies exactly which steps are unreliable:

TEST: Search Functionality
  Step Analysis:
  ┌─────────────────────────────────────┬───────┬───────┬──────────┐
  │ Step                                │ Pass  │ Fail  │ Flaky    │
  ├─────────────────────────────────────┼───────┼───────┼──────────┤
  │ go to https://example.com/search    │ 5     │ 0     │ 0%       │
  │ type "query" in search box          │ 5     │ 0     │ 0%       │
  │ click search button                 │ 5     │ 0     │ 0%       │
  │ wait for "Loading" to disappear     │ 2     │ 3     │ 60% ⚠️   │
  │ verify page contains "results"      │ 3     │ 2     │ 40% ⚠️   │
  └─────────────────────────────────────┴───────┴───────┴──────────┘

This tells you exactly where to focus your debugging efforts.
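In the table above, each step's Flaky percentage equals its failures divided by its recorded runs (3 of 5 is 60%, 2 of 5 is 40%). Assuming that rule holds in general, a per-step summary could be built as in the sketch below. `StepResult`, `StepRow`, and `summarizeSteps` are hypothetical names chosen for illustration; CBrowser's real result types may differ.

```typescript
// Hypothetical shapes for illustration; not CBrowser's actual types.
interface StepResult {
  instruction: string;
  passed: boolean;
}

interface StepRow {
  instruction: string;
  pass: number;
  fail: number;
  flakyPct: number; // failures as a percentage of recorded executions
}

// Collapse per-run step results into one row per step, in first-seen order.
function summarizeSteps(runs: StepResult[][]): StepRow[] {
  const rows: StepRow[] = [];
  // Prototype-free object so any instruction text is a safe key.
  const indexByInstruction: Record<string, number> = Object.create(null);
  for (const run of runs) {
    for (const step of run) {
      let idx = indexByInstruction[step.instruction];
      if (idx === undefined) {
        idx = rows.length;
        indexByInstruction[step.instruction] = idx;
        rows.push({ instruction: step.instruction, pass: 0, fail: 0, flakyPct: 0 });
      }
      if (step.passed) rows[idx].pass++;
      else rows[idx].fail++;
    }
  }
  for (const row of rows) {
    row.flakyPct = Math.round((row.fail / (row.pass + row.fail)) * 100);
  }
  return rows;
}

// Five recorded runs of a two-step test; the second step fails in runs 2 and 4.
const recordedRuns: StepResult[][] = [0, 1, 2, 3, 4].map((i) => [
  { instruction: "click search button", passed: true },
  { instruction: 'verify page contains "results"', passed: i % 2 === 0 },
]);

for (const row of summarizeSteps(recordedRuns)) {
  console.log(`${row.instruction}: ${row.pass} pass, ${row.fail} fail, ${row.flakyPct}% flaky`);
}
```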


Duration Variance

Flaky tests often have inconsistent timing. CBrowser tracks this:

Duration Analysis:
  Average: 3.5s
  Minimum: 1.8s
  Maximum: 6.2s
  Variance: ±1.2s (34%)  ← High variance = timing issue

High duration variance often indicates:

  • Race conditions
  • Network timing issues
  • Animation/transition problems
  • Resource loading inconsistencies
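In the sample analysis, 34% is roughly 1.2s divided by the 3.5s average, which suggests the percentage is the spread relative to the mean. The sketch below computes the same kind of summary under that assumption, using the population standard deviation as the spread. `durationStats` is a hypothetical helper, and whether CBrowser uses this exact statistic is not stated here.

```typescript
// Hypothetical helper for illustration; not part of CBrowser's API.
interface DurationStats {
  average: number;
  minimum: number;
  maximum: number;
  stdDev: number;      // spread, in the same unit as the inputs
  relativePct: number; // stdDev as a percentage of the average
}

function durationStats(durations: number[]): DurationStats {
  if (durations.length === 0) throw new RangeError("need at least one duration");
  const average = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  const squaredError = durations.reduce((sum, d) => sum + (d - average) * (d - average), 0);
  const stdDev = Math.sqrt(squaredError / durations.length); // population form
  return {
    average,
    minimum: Math.min(...durations),
    maximum: Math.max(...durations),
    stdDev,
    relativePct: average === 0 ? 0 : (stdDev / average) * 100,
  };
}

// Eight run durations in seconds.
const stats = durationStats([2, 4, 4, 4, 5, 5, 7, 9]);
console.log(
  `${stats.average.toFixed(1)}s ±${stats.stdDev.toFixed(1)}s (${Math.round(stats.relativePct)}%)`
); // 5.0s ±2.0s (40%)
```

A relative spread is easier to compare across tests than an absolute one: ±1s matters far more on a 2s test than on a 60s test.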

API Usage

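import { readFileSync } from 'node:fs';
import {
  parseNLTestSuite,
  detectFlakyTests,
  formatFlakyTestReport
} from 'cbrowser';

// Load the natural-language test file and parse it into a suite
const testContent = readFileSync('tests.txt', 'utf8');
const suite = parseNLTestSuite(testContent, "My Tests");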

// Run flaky detection
const result = await detectFlakyTests(suite, {
  runs: 10,
  flakinessThreshold: 20,
  delayBetweenRuns: 500,
  headless: true,
});

// Show report
console.log(formatFlakyTestReport(result));

// Access detailed analysis
for (const test of result.testAnalyses) {
  console.log(`${test.testName}: ${test.classification}`);
  console.log(`  Flakiness: ${test.flakinessScore}%`);
  console.log(`  Passed: ${test.passCount}/${test.totalRuns}`);

  // Check step-level flakiness
  for (const step of test.stepAnalysis) {
    if (step.isFlaky) {
      console.log(`  ⚠️ Flaky step: ${step.instruction}`);
      console.log(`     ${step.flakinessScore}% flaky`);
    }
  }
}

// Get summary statistics
console.log(`Overall flakiness: ${result.summary.overallFlakinessScore}%`);
console.log(`Flaky tests: ${result.summary.flakyTests}`);
console.log(`Most flaky: ${result.summary.mostFlakyTest}`);

Options

Option           Default   Description
--runs           5         Number of times to run each test
--threshold      20        Flakiness % to flag as problematic
--delay          500       Milliseconds between runs
--output         -         Save JSON report to file
--no-headless    false     Show browser during runs

Fixing Flaky Tests

Common causes and solutions:

Timing Issues

Problem: The test runs faster than the page loads

# Before (flaky)
click search
verify page contains "results"

# After (stable)
click search
wait for "Loading" to disappear
verify page contains "results"

Animation Issues

Problem: The element is still moving during an animation

# Before (flaky)
click "Menu"
click "Settings"

# After (stable)
click "Menu"
wait 0.5 seconds
click "Settings"

Dynamic Content

Problem: Content changes between runs

# Before (flaky)
verify page contains "10 results"

# After (stable)
verify page contains "results"

Network Timing

Problem: API responses vary in speed

# Before (flaky)
go to /dashboard
verify page contains "Welcome"

# After (stable)
go to /dashboard
wait for "Welcome" to appear
verify page contains "Welcome"

CI/CD Integration

Run flaky detection in your pipeline:

# GitHub Actions
- name: Check for flaky tests
  run: |
    npx cbrowser flaky-check tests.txt --runs 5 --threshold 20 --output flaky-report.json

- name: Fail if flaky tests found
  run: |
    FLAKY=$(jq '.summary.flakyTests' flaky-report.json)
    if [ "$FLAKY" -gt "0" ]; then
      echo "Found $FLAKY flaky tests!"
      exit 1
    fi

Best Practices

  1. Run on merge requests - Catch new flakiness before it lands
  2. Use 10+ runs for accuracy - 5 runs can miss intermittent issues
  3. Fix or quarantine - Don't let flaky tests pollute results
  4. Track over time - Monitor flakiness trends
  5. Combine with repair - Use AI repair to fix identified issues

Copyright: (c) 2026 Alexa Eden.

License: MIT License
