Flaky Test Detection
Identify unreliable tests by running them multiple times and analyzing consistency. (v6.3.0)
The Problem
Flaky tests sometimes pass and sometimes fail without code changes. They:
- Waste time investigating false failures
- Erode trust in the test suite
- Slow CI/CD pipelines with retries
- Hide real bugs among noise
The Solution
CBrowser's flaky detection:
- Runs each test multiple times (default: 5)
- Tracks pass/fail results per run
- Calculates a flakiness score
- Identifies which specific steps are unreliable
- Provides actionable recommendations
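The first two steps amount to a simple loop. The sketch below shows one way to write it in TypeScript. `runRepeatedly` and `RunRecord` are hypothetical names, not CBrowser's API, and the defaults (5 runs, 500 ms between runs) only mirror the CLI defaults listed under Options.

```typescript
// Hypothetical harness, not CBrowser's implementation: run one test several
// times and record whether each run passed and how long it took.
interface RunRecord {
  passed: boolean;
  durationMs: number;
}

async function runRepeatedly(
  test: () => Promise<void>, // a run counts as failed if this rejects
  runs = 5,
  delayBetweenRunsMs = 500,
): Promise<RunRecord[]> {
  const records: RunRecord[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    let passed = true;
    try {
      await test();
    } catch {
      passed = false;
    }
    records.push({ passed, durationMs: Date.now() - start });
    // Pause between runs, but not after the last one.
    if (i < runs - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayBetweenRunsMs));
    }
  }
  return records;
}
```

The pass/fail records feed the flakiness score, and the durations feed the timing analysis described later on this page.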
Quick Start
```bash
# Run tests 5 times (default)
npx cbrowser flaky-check tests.txt

# Run 10 times for more accuracy
npx cbrowser flaky-check tests.txt --runs 10

# Set a custom threshold (flag tests more than 25% flaky)
npx cbrowser flaky-check tests.txt --threshold 25

# Save report to file
npx cbrowser flaky-check tests.txt --output report.json
```
Understanding Flakiness Score
| Pass rate | Meaning | Classification |
|---|---|---|
| 100% or 0% | All runs had the same result (flakiness score 0%) | stable_pass or stable_fail |
| 61-99% | Passes most of the time, with occasional failures | mostly_pass |
| 40-60% | Highly unpredictable (an exact 50/50 split scores 100%) | flaky |
| 1-39% | Fails most of the time, with occasional passes | mostly_fail |
The score represents how unpredictable the test is:
- 0% = Always same result (stable)
- 100% = Perfect 50/50 split (maximally flaky)
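A score with exactly these endpoints is 100 × (1 - |2p - 1|), where p is the fraction of runs that passed. Treat this as an assumption: it reproduces the 80% shown for 3/5 passes in the example report below, but it would give 40% rather than the 32% shown there for 4/5, so CBrowser's real formula may differ. The classification bands in the sketch are also assumed: they read the 40-60% band as a pass rate, which is how the example's FLAKY (3/5) and MOSTLY_PASS (4/5) labels line up. `flakinessScore` and `classify` are hypothetical names.

```typescript
// Hypothetical scoring and classification (not CBrowser's source).
type Classification =
  | "stable_pass"
  | "stable_fail"
  | "mostly_pass"
  | "mostly_fail"
  | "flaky";

// Assumed formula: 0 when every run agrees, 100 at an exact 50/50 split.
function flakinessScore(passCount: number, totalRuns: number): number {
  if (totalRuns <= 0) throw new Error("totalRuns must be positive");
  const passRate = passCount / totalRuns;
  return Math.round(100 * (1 - Math.abs(2 * passRate - 1)));
}

// Assumed bands, read as pass-rate percentages: 40-60% passing is "flaky".
function classify(passCount: number, totalRuns: number): Classification {
  if (totalRuns <= 0) throw new Error("totalRuns must be positive");
  const passPct = (100 * passCount) / totalRuns;
  if (passPct === 100) return "stable_pass";
  if (passPct === 0) return "stable_fail";
  if (passPct >= 40 && passPct <= 60) return "flaky";
  return passPct > 60 ? "mostly_pass" : "mostly_fail";
}

console.log(flakinessScore(3, 5), classify(3, 5)); // 80 flaky
console.log(flakinessScore(5, 5), classify(5, 5)); // 0 stable_pass
```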
Example Output
```
FLAKY TEST DETECTION REPORT
══════════════════════════════════════════════════════════════
Suite: E2E Tests
Runs per test: 5
Total duration: 127.3s
══════════════════════════════════════════════════════════════
TEST RESULTS
══════════════════════════════════════════════════════════════
✅ STABLE_PASS (5/5 passed, flakiness: 0%)
   Login Flow
   └─ Avg duration: 2.1s (±0.1s)

✅ STABLE_PASS (5/5 passed, flakiness: 0%)
   Homepage Load
   └─ Avg duration: 1.3s (±0.05s)

⚠️ FLAKY (3/5 passed, flakiness: 80%)
   Search Functionality
   ├─ Avg duration: 3.5s (±1.2s)
   └─ Flaky steps:
      • wait for "Loading" to disappear (60% flaky)
      • verify page contains "results" (40% flaky)

❌ STABLE_FAIL (0/5 passed, flakiness: 0%)
   Checkout Flow
   ├─ Avg duration: 0.8s (±0.02s)
   └─ Consistent error: Element not found: "Add to Cart"

⚠️ MOSTLY_PASS (4/5 passed, flakiness: 32%)
   Profile Update
   ├─ Avg duration: 4.2s (±0.8s)
   └─ Flaky steps:
      • click "Save Changes" (20% flaky)
══════════════════════════════════════════════════════════════
SUMMARY
══════════════════════════════════════════════════════════════
✅ Overall Flakiness: 22.4%
   Stable Pass: 2 tests
   Stable Fail: 1 test
   Flaky: 1 test
   Mostly Pass: 1 test

⚠️ Most flaky test: Search Functionality (80%)
⚠️ Most flaky step: wait for "Loading" to disappear (60%)
══════════════════════════════════════════════════════════════
RECOMMENDATIONS
══════════════════════════════════════════════════════════════
• Search Functionality:
  - Replace "wait for Loading" with explicit timeout
  - Use more specific selector for results verification
• Profile Update:
  - Add wait before clicking "Save Changes"
  - Check if button is disabled during save
• Checkout Flow (stable fail):
  - This test consistently fails - fix the selector, not flakiness
```
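The Overall Flakiness figure in this example is the plain mean of the five per-test scores: (0 + 0 + 80 + 0 + 32) / 5 = 22.4. Assuming that is how the summary is built, it can be sketched as below. `summarize` and `countsByClassification` are hypothetical; `testName`, `classification`, `flakinessScore`, `overallFlakinessScore` and `mostFlakyTest` reuse the field names from the API Usage section.

```typescript
// Hypothetical summary step: aggregate per-test results into the figures
// shown in the SUMMARY block. Not CBrowser's implementation.
interface TestAnalysis {
  testName: string;
  classification: string;
  flakinessScore: number; // percent, 0-100
}

interface Summary {
  overallFlakinessScore: number;
  countsByClassification: Record<string, number>;
  mostFlakyTest: string | undefined;
}

function summarize(analyses: TestAnalysis[]): Summary {
  const counts: Record<string, number> = {};
  let total = 0;
  let mostFlaky: TestAnalysis | undefined;
  for (const a of analyses) {
    counts[a.classification] = (counts[a.classification] ?? 0) + 1;
    total += a.flakinessScore;
    // Only tests with a non-zero score can be "most flaky"; first one wins ties.
    if (a.flakinessScore > 0 && (!mostFlaky || a.flakinessScore > mostFlaky.flakinessScore)) {
      mostFlaky = a;
    }
  }
  return {
    overallFlakinessScore: analyses.length > 0 ? total / analyses.length : 0,
    countsByClassification: counts,
    mostFlakyTest: mostFlaky?.testName,
  };
}

// The five tests from the example report above.
const example = summarize([
  { testName: "Login Flow", classification: "stable_pass", flakinessScore: 0 },
  { testName: "Homepage Load", classification: "stable_pass", flakinessScore: 0 },
  { testName: "Search Functionality", classification: "flaky", flakinessScore: 80 },
  { testName: "Checkout Flow", classification: "stable_fail", flakinessScore: 0 },
  { testName: "Profile Update", classification: "mostly_pass", flakinessScore: 32 },
]);
console.log(example.overallFlakinessScore); // 22.4
```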
Per-Step Analysis
CBrowser doesn't just tell you a test is flaky; it identifies exactly which steps are unreliable:
```
TEST: Search Functionality
Step Analysis:
┌─────────────────────────────────────┬───────┬───────┬──────────┐
│ Step                                │ Pass  │ Fail  │ Flaky    │
├─────────────────────────────────────┼───────┼───────┼──────────┤
│ go to https://example.com/search    │   5   │   0   │ 0%       │
│ type "query" in search box          │   5   │   0   │ 0%       │
│ click search button                 │   5   │   0   │ 0%       │
│ wait for "Loading" to disappear     │   2   │   3   │ 60% ⚠️   │
│ verify page contains "results"      │   3   │   2   │ 40% ⚠️   │
└─────────────────────────────────────┴───────┴───────┴──────────┘
```
This tells you exactly where to focus your debugging efforts.
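Per-step counts like these can be aggregated from individual runs. The sketch below assumes each run reports an outcome for every step it executed, and that a step's flakiness is the share of runs in which it failed, counted only when the step has both passes and failures. That rule reproduces the 60% and 40% figures in the table above, but `analyzeSteps`, `StepOutcome` and `StepStats` are hypothetical, not CBrowser's API.

```typescript
// Hypothetical per-step aggregation (not CBrowser's code).
type StepOutcome = { instruction: string; passed: boolean };

interface StepStats {
  instruction: string;
  pass: number;
  fail: number;
  flakinessScore: number; // assumed: failure share, only when results are mixed
  isFlaky: boolean;
}

function analyzeSteps(runs: StepOutcome[][]): StepStats[] {
  // Map keeps insertion order, so steps come out in the order first seen.
  const byStep = new Map<string, { pass: number; fail: number }>();
  for (const run of runs) {
    for (const { instruction, passed } of run) {
      const counts = byStep.get(instruction) ?? { pass: 0, fail: 0 };
      if (passed) counts.pass++;
      else counts.fail++;
      byStep.set(instruction, counts);
    }
  }
  return Array.from(byStep.entries()).map(([instruction, { pass, fail }]) => {
    // A step that always passes or always fails is consistent, not flaky.
    const mixed = pass > 0 && fail > 0;
    return {
      instruction,
      pass,
      fail,
      flakinessScore: mixed ? Math.round((100 * fail) / (pass + fail)) : 0,
      isFlaky: mixed,
    };
  });
}
```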
Duration Variance
Flaky tests often have inconsistent timing. CBrowser tracks this:
```
Duration Analysis:
  Average:  3.5s
  Minimum:  1.8s
  Maximum:  6.2s
  Variance: ±1.2s (34%)  ← High variance = timing issue
```
High duration variance often indicates:
- Race conditions
- Network timing issues
- Animation/transition problems
- Resource loading inconsistencies
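The report does not state how the ± figure is computed. The sketch below assumes it is a standard deviation and expresses it as a percentage of the mean, which matches the example (1.2 / 3.5 ≈ 34%). `durationStats` and `hasHighVariance` are hypothetical helpers, and the 25% cut-off is an arbitrary example value, not a CBrowser default.

```typescript
// Hypothetical duration summary (not CBrowser's implementation).
interface DurationStats {
  meanMs: number;
  minMs: number;
  maxMs: number;
  stdDevMs: number; // population standard deviation
  relativeSpreadPct: number; // stdDev as a percentage of the mean
}

function durationStats(durationsMs: number[]): DurationStats {
  if (durationsMs.length === 0) throw new Error("no durations");
  const n = durationsMs.length;
  const mean = durationsMs.reduce((sum, d) => sum + d, 0) / n;
  const variance = durationsMs.reduce((sum, d) => sum + (d - mean) ** 2, 0) / n;
  const stdDev = Math.sqrt(variance);
  return {
    meanMs: mean,
    minMs: Math.min(...durationsMs),
    maxMs: Math.max(...durationsMs),
    stdDevMs: stdDev,
    relativeSpreadPct: mean > 0 ? (100 * stdDev) / mean : 0,
  };
}

// Flag a test whose spread exceeds the given share of its mean duration.
function hasHighVariance(stats: DurationStats, maxRelativePct = 25): boolean {
  return stats.relativeSpreadPct > maxRelativePct;
}
```

Using a relative spread rather than an absolute one keeps a 0.1s jitter on a 1s test and a 1s jitter on a 10s test on the same scale.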
API Usage
```typescript
import { readFileSync } from 'node:fs';
import {
  parseNLTestSuite,
  detectFlakyTests,
  formatFlakyTestReport
} from 'cbrowser';

// Parse test content (read here from a natural-language test file)
const testContent = readFileSync('tests.txt', 'utf8');
const suite = parseNLTestSuite(testContent, "My Tests");

// Run flaky detection (top-level await requires an ES module)
const result = await detectFlakyTests(suite, {
  runs: 10,
  flakinessThreshold: 20,
  delayBetweenRuns: 500,
  headless: true,
});

// Show report
console.log(formatFlakyTestReport(result));

// Access detailed analysis
for (const test of result.testAnalyses) {
  console.log(`${test.testName}: ${test.classification}`);
  console.log(`  Flakiness: ${test.flakinessScore}%`);
  console.log(`  Passed: ${test.passCount}/${test.totalRuns}`);

  // Check step-level flakiness
  for (const step of test.stepAnalysis) {
    if (step.isFlaky) {
      console.log(`  ⚠️ Flaky step: ${step.instruction}`);
      console.log(`     ${step.flakinessScore}% flaky`);
    }
  }
}

// Get summary statistics
console.log(`Overall flakiness: ${result.summary.overallFlakinessScore}%`);
console.log(`Flaky tests: ${result.summary.flakyTests}`);
console.log(`Most flaky: ${result.summary.mostFlakyTest}`);
```
Options
| Option | Default | Description |
|---|---|---|
| --runs | 5 | Number of times to run each test |
| --threshold | 20 | Flakiness % at which a test is flagged as problematic |
| --delay | 500 | Milliseconds between runs |
| --output | - | Save JSON report to file |
| --no-headless | false | Show browser during runs |
Fixing Flaky Tests
Common causes and solutions:
Timing Issues
Problem: Test runs faster than page loads
```
# Before (flaky)
click search
verify page contains "results"

# After (stable)
click search
wait for "Loading" to disappear
verify page contains "results"
```
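An explicit wait such as `wait for "Loading" to disappear` is commonly implemented by polling a condition until it holds or a timeout expires, instead of sleeping for a fixed time. The sketch below illustrates that idea; `waitFor` and its default timeout and poll interval are hypothetical and not CBrowser's implementation.

```typescript
// Hypothetical polling helper (not CBrowser's implementation): resolve as
// soon as `condition` is true, or reject once `timeoutMs` has elapsed.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // One last check so a condition that became true at the deadline still counts.
  if (await condition()) return;
  throw new Error(`Condition not met within ${timeoutMs} ms`);
}

// Usage: a simulated "Loading" indicator that disappears after about 300 ms.
let loading = true;
setTimeout(() => { loading = false; }, 300);
waitFor(() => !loading, 2000).then(() => console.log("Loading gone"));
```

A fixed sleep is either too short on a slow run or wasted time on a fast one; polling with a deadline adapts to how long each run actually takes.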
Animation Issues
Problem: Element moving during animation
```
# Before (flaky)
click "Menu"
click "Settings"

# After (stable)
click "Menu"
wait 0.5 seconds
click "Settings"
```
Dynamic Content
Problem: Content changes between runs
```
# Before (flaky)
verify page contains "10 results"

# After (stable)
verify page contains "results"
```
Network Timing
Problem: API responses vary in speed
```
# Before (flaky)
go to /dashboard
verify page contains "Welcome"

# After (stable)
go to /dashboard
wait for "Welcome" to appear
verify page contains "Welcome"
```
CI/CD Integration
Run flaky detection in your pipeline:
```yaml
# GitHub Actions
- name: Check for flaky tests
  run: |
    npx cbrowser flaky-check tests.txt --runs 5 --threshold 20 --output flaky-report.json

- name: Fail if flaky tests found
  run: |
    FLAKY=$(jq '.summary.flakyTests' flaky-report.json)
    if [ "$FLAKY" -gt 0 ]; then
      echo "Found $FLAKY flaky tests!"
      exit 1
    fi
```
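To gate on per-test scores rather than on the flaky-test count, a small script can read the JSON report directly. This sketch assumes the file written by `--output` has the same `testAnalyses`, `testName` and `flakinessScore` fields as the object returned by `detectFlakyTests()`; check that against a real report before relying on it. `FlakyReport`, `testsOverThreshold` and `main` are hypothetical, and the strict `>` comparison follows the "flag tests >25% flaky" wording in Quick Start.

```typescript
// Hypothetical CI gate (not part of CBrowser): read the JSON report and exit
// non-zero if any test's flakiness score is above the threshold.
import { readFileSync } from "node:fs";

// Assumed report shape, mirroring the fields used in the API example.
interface FlakyReport {
  testAnalyses: { testName: string; flakinessScore: number }[];
}

function testsOverThreshold(report: FlakyReport, thresholdPct: number): string[] {
  return report.testAnalyses
    .filter((t) => t.flakinessScore > thresholdPct)
    .map((t) => `${t.testName} (${t.flakinessScore}%)`);
}

function main(path = "flaky-report.json", thresholdPct = 20): void {
  const report = JSON.parse(readFileSync(path, "utf8")) as FlakyReport;
  const offenders = testsOverThreshold(report, thresholdPct);
  if (offenders.length > 0) {
    console.error(`Tests above ${thresholdPct}% flakiness:\n  ${offenders.join("\n  ")}`);
    process.exit(1);
  }
  console.log("No tests above the flakiness threshold.");
}

// main(); // call this from the script your CI step runs
```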
Best Practices
- Run on merge requests - Catch new flakiness before it lands
- Use 10+ runs for accuracy - 5 runs can miss intermittent issues
- Fix or quarantine - Don't let flaky tests pollute results
- Track over time - Monitor flakiness trends
- Combine with repair - Use AI repair to fix identified issues
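The advice to prefer 10+ runs follows from basic probability. Assuming each run fails independently with probability p, a flaky test passes all n runs, and so looks stable, with probability (1 - p)^n. For a test that fails 10% of the time, that is about 59% with 5 runs and 35% with 10 runs. The helpers below (hypothetical names) compute this, plus the number of runs needed to see at least one failure with a chosen confidence.

```typescript
// Chance that a test failing independently with probability p per run
// passes all n runs, so its flakiness goes unnoticed: (1 - p)^n.
function missProbability(p: number, runs: number): number {
  return (1 - p) ** runs;
}

// Smallest n with (1 - p)^n <= 1 - confidence, i.e. at least one failure
// is observed with the given confidence.
function runsNeeded(p: number, confidence = 0.95): number {
  if (p <= 0 || p >= 1) throw new Error("p must be in (0, 1)");
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

console.log(missProbability(0.1, 5).toFixed(2));  // 0.59
console.log(missProbability(0.1, 10).toFixed(2)); // 0.35
console.log(runsNeeded(0.1));                     // 29
```

Real flakiness is often not independent across runs (for example, a slow shared environment affects several runs in a row), so treat these numbers as a lower bound on how many runs you need.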
Related
- AI Test Repair - Automatically fix flaky tests
- Natural Language Tests - The test format this analyzes
- CLI Reference - All flaky-check options
Copyright: (c) 2026 Alexa Eden.
License: MIT License