Identify unreliable tests by running them multiple times and analyzing consistency. (v6.3.0)

The Problem

Flaky tests sometimes pass and sometimes fail without code changes. They:

Waste time investigating false failures
Erode trust in the test suite
Slow CI/CD pipelines with retries
Hide real bugs among noise

The Solution

CBrowser's flaky detection:

Runs each test multiple times (default: 5)
Tracks pass/fail results per run
Calculates a flakiness score
Identifies which specific steps are unreliable
Provides actionable recommendations
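The repeat-and-record loop described above can be sketched as follows. This is a minimal illustration, not CBrowser's implementation. `runTestRepeatedly`, `RunRecord`, and the `runOnce` callback are hypothetical names. The default of 5 runs and the pause between runs mirror the documented `--runs` and `--delay` options.

```typescript
// Hypothetical record of one run: whether it passed and how long it took.
interface RunRecord {
  passed: boolean;
  durationMs: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run a single test `runs` times, pausing `delayMs` between runs, and record
// each outcome. `runOnce` resolves to true on pass; a thrown error counts as a failure.
async function runTestRepeatedly(
  runOnce: () => Promise<boolean>,
  runs = 5,
  delayMs = 500,
): Promise<RunRecord[]> {
  const records: RunRecord[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    let passed: boolean;
    try {
      passed = await runOnce();
    } catch {
      passed = false;
    }
    records.push({ passed, durationMs: Date.now() - start });
    // No pause needed after the final run.
    if (i < runs - 1) await sleep(delayMs);
  }
  return records;
}
```

The pass/fail records and durations collected this way are the raw input for the score, the step analysis, and the duration statistics.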

Quick Start

# Run tests 5 times (default)
npx cbrowser flaky-check tests.txt

# Run 10 times for more accuracy
npx cbrowser flaky-check tests.txt --runs 10

# Set custom threshold (flag tests >25% flaky)
npx cbrowser flaky-check tests.txt --threshold 25

# Save report to file
npx cbrowser flaky-check tests.txt --output report.json

Understanding Flakiness Score

Each test is classified by its pass rate across runs:

Pass rate	Meaning	Classification
0%	Every run failed	`stable_fail`
1-39%	Mostly fails, occasionally passes	`mostly_fail`
40-60%	Highly unpredictable	`flaky`
61-99%	Mostly passes, occasionally fails	`mostly_pass`
100%	Every run passed	`stable_pass`

The flakiness score reported alongside each classification measures how unpredictable the test is:

0% = Always the same result (stable pass or stable fail)
100% = Perfect 50/50 split (maximally flaky)
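A sketch of the classification and score follows. The classifier works from the pass rate, which is what the example report's labels imply: 3/5 passes is `flaky` and 4/5 is `mostly_pass`. The score formula is an assumption, chosen only to hit the documented endpoints of 0% for a stable test and 100% for a 50/50 split. It takes twice the share of the rarer outcome. CBrowser's exact computation may differ. This formula gives 80% for 3/5 passes, matching the report, but 40% for 4/5, where the report shows 32%.

```typescript
type Classification =
  | "stable_pass"
  | "mostly_pass"
  | "flaky"
  | "mostly_fail"
  | "stable_fail";

// Classify a test by its pass rate across runs, using the percentage bands above.
function classify(passCount: number, totalRuns: number): Classification {
  const passRate = (passCount * 100) / totalRuns;
  if (passRate === 100) return "stable_pass";
  if (passRate === 0) return "stable_fail";
  if (passRate >= 40 && passRate <= 60) return "flaky";
  return passRate > 60 ? "mostly_pass" : "mostly_fail";
}

// Assumed score: 0 when every run agrees, 100 for a perfect 50/50 split.
// Computed as twice the share of whichever outcome (pass or fail) was rarer.
function flakinessScore(passCount: number, totalRuns: number): number {
  const minority = Math.min(passCount, totalRuns - passCount);
  return (minority * 200) / totalRuns;
}
```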

Example Output

🔍 FLAKY TEST DETECTION REPORT
══════════════════════════════════════════════════════════════

📋 Suite: E2E Tests
   Runs per test: 5
   Total duration: 127.3s

──────────────────────────────────────────────────────────────
TEST RESULTS
──────────────────────────────────────────────────────────────

✅ STABLE_PASS (5/5 passed, flakiness: 0%)
   Login Flow
   └─ Avg duration: 2.1s (±0.1s)

✅ STABLE_PASS (5/5 passed, flakiness: 0%)
   Homepage Load
   └─ Avg duration: 1.3s (±0.05s)

⚠️  FLAKY (3/5 passed, flakiness: 80%)
   Search Functionality
   └─ Avg duration: 3.5s (±1.2s)
   └─ Flaky steps:
      • wait for "Loading" to disappear (60% flaky)
      • verify page contains "results" (40% flaky)

❌ STABLE_FAIL (0/5 passed, flakiness: 0%)
   Checkout Flow
   └─ Avg duration: 0.8s (±0.02s)
   └─ Consistent error: Element not found: "Add to Cart"

⚠️  MOSTLY_PASS (4/5 passed, flakiness: 32%)
   Profile Update
   └─ Avg duration: 4.2s (±0.8s)
   └─ Flaky steps:
      • click "Save Changes" (20% flaky)

══════════════════════════════════════════════════════════════
📊 SUMMARY
══════════════════════════════════════════════════════════════

✅ Overall Flakiness: 22.4%

   Stable Pass:  2 tests
   Stable Fail:  1 test
   Flaky:        1 test
   Mostly Pass:  1 test

⚠️  Most flaky test: Search Functionality (80%)
⚠️  Most flaky step: wait for "Loading" to disappear (60%)

══════════════════════════════════════════════════════════════
💡 RECOMMENDATIONS
══════════════════════════════════════════════════════════════

• Search Functionality:
  - Replace "wait for Loading" with explicit timeout
  - Use more specific selector for results verification

• Profile Update:
  - Add wait before clicking "Save Changes"
  - Check if button is disabled during save

• Checkout Flow (stable fail):
  - This test consistently fails - fix the selector, not flakiness
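The summary block can be derived from the per-test results. In this example, the overall flakiness of 22.4% equals the unweighted mean of the five per-test scores: (0 + 0 + 80 + 0 + 32) / 5. The sketch below therefore assumes a simple mean. `TestAnalysis`, `Summary`, and `summarize` are hypothetical, although the field names follow the API usage example.

```typescript
// Hypothetical per-test result; field names mirror the API example.
interface TestAnalysis {
  testName: string;
  classification: string;
  flakinessScore: number;
}

interface Summary {
  overallFlakinessScore: number;
  countsByClassification: Record<string, number>;
  mostFlakyTest?: string;
}

function summarize(tests: TestAnalysis[]): Summary {
  const countsByClassification: Record<string, number> = {};
  let total = 0;
  let mostFlaky: TestAnalysis | undefined;
  for (const t of tests) {
    countsByClassification[t.classification] =
      (countsByClassification[t.classification] ?? 0) + 1;
    total += t.flakinessScore;
    // Only tests with a non-zero score can be "most flaky".
    if (t.flakinessScore > 0 && (!mostFlaky || t.flakinessScore > mostFlaky.flakinessScore)) {
      mostFlaky = t;
    }
  }
  return {
    // Assumed: unweighted mean of per-test scores.
    overallFlakinessScore: tests.length ? total / tests.length : 0,
    countsByClassification,
    mostFlakyTest: mostFlaky?.testName,
  };
}
```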

Per-Step Analysis

CBrowser doesn't just tell you that a test is flaky; it identifies exactly which steps are unreliable:

TEST: Search Functionality
  Step Analysis:
  ┌─────────────────────────────────────┬───────┬───────┬──────────┐
  │ Step                                │ Pass  │ Fail  │ Flaky    │
  ├─────────────────────────────────────┼───────┼───────┼──────────┤
  │ go to https://example.com/search    │ 5     │ 0     │ 0%       │
  │ type "query" in search box          │ 5     │ 0     │ 0%       │
  │ click search button                 │ 5     │ 0     │ 0%       │
  │ wait for "Loading" to disappear     │ 2     │ 3     │ 60% ⚠️   │
  │ verify page contains "results"      │ 3     │ 2     │ 40% ⚠️   │
  └─────────────────────────────────────┴───────┴───────┴──────────┘

This tells you exactly where to focus your debugging efforts.
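A per-step table like this can be built by counting how many runs each step passed and failed. In the example table, the Flaky column equals the step's failure share: 3 failures out of 5 gives 60%. The sketch below uses that calculation. `StepResult`, `analyzeSteps`, and the rule that flags any step with mixed outcomes are assumptions.

```typescript
// Hypothetical outcome of one step in one run.
interface StepResult {
  instruction: string;
  passed: boolean;
}

interface StepStats {
  instruction: string;
  passCount: number;
  failCount: number;
  flakinessScore: number; // failure share, as a percentage
  isFlaky: boolean; // the step both passed and failed across runs
}

// `runs` holds one array of step results per run of the same test.
function analyzeSteps(runs: StepResult[][]): StepStats[] {
  // Map keeps insertion order, so steps come out in the order first seen.
  const byStep = new Map<string, { pass: number; fail: number }>();
  for (const run of runs) {
    for (const step of run) {
      const counts = byStep.get(step.instruction) ?? { pass: 0, fail: 0 };
      if (step.passed) counts.pass++;
      else counts.fail++;
      byStep.set(step.instruction, counts);
    }
  }
  return Array.from(byStep.entries()).map(([instruction, { pass, fail }]) => ({
    instruction,
    passCount: pass,
    failCount: fail,
    flakinessScore: Math.round((fail * 100) / (pass + fail)),
    isFlaky: pass > 0 && fail > 0,
  }));
}
```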

Duration Variance

Flaky tests often have inconsistent timing. CBrowser tracks this:

Duration Analysis:
  Average: 3.5s
  Minimum: 1.8s
  Maximum: 6.2s
  Variance: ±1.2s (34%)  ← High variance = timing issue

High duration variance often indicates:

Race conditions
Network timing issues
Animation/transition problems
Resource loading inconsistencies
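The duration figures can be computed from the per-run timings. In the example, ±1.2s on a 3.5s average is about 34%. This suggests that the ± value is a standard deviation and the percentage is that deviation relative to the mean, also called the coefficient of variation. The sketch below assumes that reading and uses the population standard deviation. The 30% `highVariancePct` cut-off is a hypothetical choice, not a documented CBrowser setting.

```typescript
interface DurationStats {
  average: number;
  minimum: number;
  maximum: number;
  stdDev: number; // population standard deviation, same unit as the input
  relativePct: number; // stdDev as a percentage of the average
  highVariance: boolean;
}

// `highVariancePct` is a hypothetical threshold for flagging timing problems.
function durationStats(durations: number[], highVariancePct = 30): DurationStats {
  if (durations.length === 0) throw new Error("no durations recorded");
  const n = durations.length;
  const average = durations.reduce((a, b) => a + b, 0) / n;
  const variance = durations.reduce((acc, d) => acc + (d - average) ** 2, 0) / n;
  const stdDev = Math.sqrt(variance);
  const relativePct = average === 0 ? 0 : (stdDev / average) * 100;
  return {
    average,
    minimum: Math.min(...durations),
    maximum: Math.max(...durations),
    stdDev,
    relativePct,
    highVariance: relativePct > highVariancePct,
  };
}
```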

API Usage

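import { readFileSync } from 'fs';
import {
  parseNLTestSuite,
  detectFlakyTests,
  formatFlakyTestReport
} from 'cbrowser';

// Load and parse the natural-language test file
const testContent = readFileSync('tests.txt', 'utf8');
const suite = parseNLTestSuite(testContent, "My Tests");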

// Run flaky detection
const result = await detectFlakyTests(suite, {
  runs: 10,
  flakinessThreshold: 20,
  delayBetweenRuns: 500,
  headless: true,
});

// Show report
console.log(formatFlakyTestReport(result));

// Access detailed analysis
for (const test of result.testAnalyses) {
  console.log(`${test.testName}: ${test.classification}`);
  console.log(`  Flakiness: ${test.flakinessScore}%`);
  console.log(`  Passed: ${test.passCount}/${test.totalRuns}`);

  // Check step-level flakiness
  for (const step of test.stepAnalysis) {
    if (step.isFlaky) {
      console.log(`  ⚠️ Flaky step: ${step.instruction}`);
      console.log(`     ${step.flakinessScore}% flaky`);
    }
  }
}

// Get summary statistics
console.log(`Overall flakiness: ${result.summary.overallFlakinessScore}%`);
console.log(`Flaky tests: ${result.summary.flakyTests}`);
console.log(`Most flaky: ${result.summary.mostFlakyTest}`);

Options

Option	Default	Description
`--runs`	5	Number of times to run each test
`--threshold`	20	Flakiness score (%) above which a test is flagged as problematic
`--delay`	500	Delay between runs, in milliseconds
`--output`	-	Save the JSON report to this file
`--no-headless`	false	Show the browser during runs (runs are headless by default)
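The `--threshold` option, which corresponds to `flakinessThreshold` in the API, decides which tests get flagged. Below is a minimal sketch of that filtering step. It assumes "flagged" means strictly above the threshold, as the Quick Start comment `>25%` suggests. `flagFlaky` and `ScoredTest` are hypothetical and reuse the `testName` and `flakinessScore` fields from the API example.

```typescript
interface ScoredTest {
  testName: string;
  flakinessScore: number; // percentage, 0-100
}

// Return the names of tests scoring strictly above `threshold`, most flaky first.
// The default of 20 mirrors the documented --threshold default.
function flagFlaky(tests: ScoredTest[], threshold = 20): string[] {
  return tests
    .filter((t) => t.flakinessScore > threshold)
    .sort((a, b) => b.flakinessScore - a.flakinessScore)
    .map((t) => t.testName);
}
```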

Fixing Flaky Tests

Common causes and solutions:

Timing Issues

Problem: Test runs faster than page loads

# Before (flaky)
click search
verify page contains "results"

# After (stable)
click search
wait for "Loading" to disappear
verify page contains "results"

Animation Issues

Problem: Element moving during animation

# Before (flaky)
click "Menu"
click "Settings"

# After (stable)
click "Menu"
wait 0.5 seconds
click "Settings"

Dynamic Content

Problem: Content changes between runs

# Before (flaky)
verify page contains "10 results"

# After (stable)
verify page contains "results"

Network Timing

Problem: API responses vary in speed

# Before (flaky)
go to /dashboard
verify page contains "Welcome"

# After (stable)
go to /dashboard
wait for "Welcome" to appear
verify page contains "Welcome"
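Steps such as `wait for "Welcome" to appear` check a condition repeatedly until it holds instead of assuming fixed timing. The usual pattern is a polling loop with a timeout. The sketch below is a generic illustration, not CBrowser's internals. `waitFor`, its 5 s timeout, and its 100 ms polling interval are assumptions.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll `condition` every `intervalMs` until it returns true or `timeoutMs` elapses.
// Resolves to true if the condition was met in time, false on timeout.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return true;
    if (Date.now() >= deadline) return false;
    await sleep(intervalMs);
  }
}
```

A deadline-bounded poll waits only as long as the page actually needs, up to the limit, so slow runs still pass and fast runs finish quickly.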

CI/CD Integration

Run flaky detection in your pipeline:

# GitHub Actions
- name: Check for flaky tests
  run: |
    npx cbrowser flaky-check tests.txt --runs 5 --threshold 20 --output flaky-report.json

- name: Fail if flaky tests found
  run: |
    FLAKY=$(jq '.summary.flakyTests' flaky-report.json)
    if [ "$FLAKY" -gt 0 ]; then
      echo "Found $FLAKY flaky tests!"
      exit 1
    fi

Best Practices

Run on merge requests - Catch new flakiness before it lands
Use 10+ runs for accuracy - 5 runs can miss intermittent issues
Fix or quarantine - Don't let flaky tests pollute results
Track over time - Monitor flakiness trends
Combine with repair - Use AI repair to fix identified issues
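The advice to use 10+ runs follows from simple probability. Suppose a test fails independently with probability f on each run. Then the chance that n runs show no failure at all is (1 - f)^n. The sketch below assumes independent runs, which real flakiness does not always satisfy. `missProbability` and `runsNeeded` are illustrative helpers, not part of CBrowser.

```typescript
// Probability that `runs` independent runs all pass when each run fails with
// probability `failureRate` (0-1), i.e. the flakiness goes unnoticed.
function missProbability(failureRate: number, runs: number): number {
  return (1 - failureRate) ** runs;
}

// Smallest number of runs that observes at least one failure with the
// requested confidence (e.g. 0.95).
function runsNeeded(failureRate: number, confidence: number): number {
  if (failureRate <= 0) return Infinity; // a test that never fails is never caught
  if (failureRate >= 1) return 1; // fails every time, so one run suffices
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
}
```

Take a test that fails 10% of the time. It slips through 5 runs with probability 0.9^5 ≈ 59% and through 10 runs with probability ≈ 35%. Catching it with 95% confidence takes 29 runs.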

AI Test Repair - Automatically fix flaky tests
Natural Language Tests - The test format this analyzes
CLI Reference - All flaky-check options

Copyright: (c) 2026 Alexa Eden.

License: MIT License

Contact: [email protected]

Flaky Test Detection