Step 10 of 10

Run reliability drills and harden the flow

Stress test the workflow with edge cases and failure scenarios to find and fix weaknesses before the system runs on real emails.

Why this matters

A workflow that works 80% of the time is not a production workflow; it is a liability. The 20% of cases it mishandles will happen at the worst possible time: during high email volume, when the team is not watching, or on the most sensitive messages. Reliability drills systematically surface those failure modes so you can fix them before they cause real problems.

Build instructions

Design your test matrix

Each test case should have a defined input, an expected outcome, and a pass/fail result you record.

Step 1
Test 1: Clear inquiry email, no history. Expected: auto-sent, intent=inquiry, confidence > 0.80.
Step 2
Test 2: Billing question disguised as a general question ('I have a question about my account balance'). Expected: escalated, risk_flag=true.
Step 3
Test 3: Email with 1000+ words of quoted thread history. Expected: normalization strips history, classifier sees only new content.
Step 4
Test 4: Email in a language other than English. Expected: escalated as unknown or handled gracefully without fabricated content.
Step 5
Test 5: Empty email body (only subject line). Expected: fallback text triggers classification as unknown, escalated.
Step 6
Test 6: Automated bounce notification (e.g. from mailer-daemon). Expected: pre-filter blocks it before any processing.
Step 7
Test 7: Follow-up from an existing conversation thread. Expected: correctly classified as follow-up, routed per policy.
Step 8
Test 8: Simulated API failure (temporarily disconnect OpenAI). Expected: Zap logs the failure, does not send any email, marks the row as failed.
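To make "defined input, expected outcome, pass/fail" concrete, the matrix can be written down as data before you run anything. The sketch below is only an illustration: the class DrillCase, the function passes, and the field names action and email_sent are hypothetical (intent, risk_flag, and confidence come from the test descriptions above), and only five of the eight tests are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrillCase:
    """One row of the test matrix (hypothetical structure)."""
    number: int
    description: str
    expected: dict                      # field name -> value the run must produce
    min_confidence: Optional[float] = None  # confidence must be strictly above this


# Expected values mirror the test descriptions; field names are assumptions.
DRILL_MATRIX = [
    DrillCase(1, "Clear inquiry email, no history",
              {"action": "auto_sent", "intent": "inquiry"}, min_confidence=0.80),
    DrillCase(2, "Billing question disguised as a general question",
              {"action": "escalated", "risk_flag": True}),
    DrillCase(5, "Empty email body (only subject line)",
              {"action": "escalated", "intent": "unknown"}),
    DrillCase(6, "Automated bounce notification",
              {"action": "blocked"}),
    DrillCase(8, "Simulated API failure",
              {"action": "failed", "email_sent": False}),
]


def passes(case: DrillCase, actual: dict) -> bool:
    """Return True only if every expected field matches the observed run."""
    for key, wanted in case.expected.items():
        if actual.get(key) != wanted:
            return False
    if case.min_confidence is not None:
        # Test 1 requires confidence > 0.80, so equality counts as a fail.
        if actual.get("confidence", 0.0) <= case.min_confidence:
            return False
    return True
```

The actual dictionary would be filled in by hand from the Run Log sheet and the Zap run history; nothing here talks to Zapier.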

Run each test case and record results

Step 1
For each test case, send the email through your trigger, wait for the Zap to run, and check the outcome in four places: the Run Log sheet, the escalation mailbox (if applicable), the auto-send mailbox (if applicable), and the Zap run history.
Step 2
Create a simple test results table: Test Case | Expected Outcome | Actual Outcome | Pass or Fail | Notes.
Step 3
Do not move on until you have a result for every test case. A skipped test is a risk you are choosing to accept.
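If you prefer to keep the results table as a file rather than a sheet, the sketch below writes the five columns named above to CSV and lists any test that still has no recorded result. The function names results_to_csv and missing_tests are hypothetical, and it assumes the Test Case column holds the test number and Pass or Fail holds the literal text "Pass" or "Fail".

```python
import csv
import io

COLUMNS = ["Test Case", "Expected Outcome", "Actual Outcome", "Pass or Fail", "Notes"]


def results_to_csv(results):
    """Render the results table as CSV text with the five columns above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in results:
        writer.writerow(row)  # columns missing from a row are left blank
    return buffer.getvalue()


def missing_tests(results, total_tests=8):
    """Return the test numbers that do not yet have a Pass or Fail recorded."""
    recorded = {
        row["Test Case"]
        for row in results
        if row.get("Pass or Fail") in ("Pass", "Fail")
    }
    return [n for n in range(1, total_tests + 1) if n not in recorded]
```

An empty list from missing_tests is one way to confirm that no test was skipped before you move on.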

Fix and re-test failures

Step 1
Rank failures by risk, highest first: emails that were auto-sent when they should not have been, then emails that were escalated when they could have been auto-sent, then runs that were logged incorrectly.
Step 2
Fix the highest-risk failure first, then re-run only that specific test case to confirm the fix. Do not re-run all tests after each fix; that takes too long. Run the full matrix once more after all fixes are applied.
Step 3
Document every fix in a 'Hardening Log' section of your Config sheet: what failed, why it failed, what you changed.
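The risk ranking above can be expressed as a simple sort so the fix order is never a judgment call in the moment. This is a minimal sketch; the labels wrong_auto_send, unneeded_escalation, and logging_error, and the kind field, are hypothetical names for the three failure types described in this section.

```python
# Lower number = higher risk = fix first (order taken from the ranking above).
RISK_RANK = {
    "wrong_auto_send": 0,       # auto-sent when it should not have been
    "unneeded_escalation": 1,   # escalated when it could have been auto-sent
    "logging_error": 2,         # outcome was right but logged incorrectly
}


def rank_failures(failures):
    """Return failures ordered from highest to lowest risk.

    sorted() is stable, so failures of the same kind keep their original order.
    """
    return sorted(failures, key=lambda failure: RISK_RANK[failure["kind"]])


failures = [
    {"test": 7, "kind": "logging_error"},
    {"test": 4, "kind": "unneeded_escalation"},
    {"test": 2, "kind": "wrong_auto_send"},
]
fix_order = [failure["test"] for failure in rank_failures(failures)]
```

With the sample data, fix_order starts with Test 2, the wrong auto-send, which is the one to fix and re-test first.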

Common mistakes

Only running happy-path tests. The happy path almost always works. Edge cases are where systems fail. At least half your test cases should be edge cases or failure scenarios.
Fixing failures without re-testing. A fix that solves one test case can break another. Always re-run at least the affected test after each fix.

Pro tips

Run the full test matrix again two weeks after go-live using real emails that came through the system. Compare the real outcomes to your expected outcomes. The differences are your next improvement targets.

Before you continue

All eight test cases have a recorded result, and at least six of the eight pass. Every failure in which an email was auto-sent when it should have been blocked is resolved and re-tested. Your Hardening Log documents every fix made.
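The checklist above can also be read as a single go/no-go gate. The sketch below is one possible reading, not a required tool: ready_to_continue and the fields test, passed, and wrong_auto_send are hypothetical, it assumes one row per test holding the latest result, and it does not check the Hardening Log, which still needs a manual review.

```python
def ready_to_continue(results, total_tests=8, min_passes=6):
    """Apply the checklist: all tests recorded, enough passes, no unsafe auto-sends."""
    recorded = {row["test"] for row in results}
    if len(recorded) < total_tests:
        return False  # at least one test has no recorded result

    pass_count = sum(1 for row in results if row["passed"])
    if pass_count < min_passes:
        return False  # fewer than six of eight pass

    # Any still-failing case that auto-sent when it should have been blocked
    # is a hard stop, regardless of how many other tests pass.
    for row in results:
        if not row["passed"] and row.get("wrong_auto_send", False):
            return False
    return True
```

The design choice here is that the pass count alone is never enough: a single unresolved wrong auto-send blocks the step even at seven of eight.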

Step result

You have a tested, documented, hardened workflow that you have verified handles edge cases safely. It is ready for production use.