Design your test matrix
Each test case should have a defined input, an expected outcome, and a pass/fail result you record.
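One way to keep those three pieces together is a small record per test, written out to a CSV you can paste into a spreadsheet. This is a minimal sketch only: the `TestRecord` class, its field names, and the `write_records` helper are hypothetical, and the observed value is a made-up placeholder rather than a real measurement.

```python
import csv
import io
from dataclasses import dataclass, asdict

FIELDS = ["test_id", "input_summary", "expected", "observed", "result"]


@dataclass
class TestRecord:
    """One row of the test matrix (hypothetical layout)."""
    test_id: int
    input_summary: str
    expected: str
    observed: str = ""
    result: str = "not run"  # "pass", "fail", or "not run"


def write_records(records, stream):
    """Write the records as CSV rows with a header line."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))


record = TestRecord(
    test_id=1,
    input_summary="Clear inquiry email, no history",
    expected="auto-sent, intent=inquiry, confidence > 0.80",
)
# Filled in by hand after running the test; placeholder value for illustration.
record.observed = "auto-sent, intent=inquiry, confidence=0.91"
record.result = "pass"

buffer = io.StringIO()
write_records([record], buffer)
print(buffer.getvalue())
```

Keeping `result` as an explicit column, rather than inferring it later, makes it obvious which tests were never run.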
Step 1
Test 1: Clear inquiry email, no history. Expected: auto-sent, intent=inquiry, confidence > 0.80.
Step 2
Test 2: Billing question disguised as a general question ('I have a question about my account balance'). Expected: escalated, risk_flag=true.
Step 3
Test 3: Email with 1000+ words of quoted thread history. Expected: normalization strips history, classifier sees only new content.
Step 4
Test 4: Email in a language other than English. Expected: escalated as unknown or handled gracefully without fabricated content.
Step 5
Test 5: Empty email body (subject line only). Expected: fallback text triggers classification as unknown, and the email is escalated.
Step 6
Test 6: Automated bounce notification (e.g. from mailer-daemon). Expected: pre-filter blocks it before any processing.
Step 7
Test 7: Follow-up from an existing conversation thread. Expected: correctly classified as follow-up, routed per policy.
Step 8
Test 8: Simulated API failure (temporarily disconnect OpenAI). Expected: the Zap logs the failure, does not send any email, and marks the row as failed.
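Several of these expected outcomes are concrete enough to check mechanically once you have copied the observed values out of your run history. The sketch below shows one possible way to do that. Every field name in it (`action`, `intent`, `confidence`, `risk_flag`, `emails_sent`, `row_status`, `failure_logged`) and every value such as `"auto_sent"` or `"blocked"` is an assumption made for illustration; substitute whatever your own Zap actually records. Tests 3, 4, and 7 are left out because their outcomes need a human judgment.

```python
# Hypothetical pass criteria for the tests whose outcomes are machine-checkable.
# Each function receives a dict of observed values for one test run.
EXPECTATIONS = {
    1: lambda o: (o.get("action") == "auto_sent"
                  and o.get("intent") == "inquiry"
                  and o.get("confidence", 0.0) > 0.80),
    2: lambda o: o.get("action") == "escalated" and o.get("risk_flag") is True,
    5: lambda o: o.get("intent") == "unknown" and o.get("action") == "escalated",
    6: lambda o: o.get("action") == "blocked",
    8: lambda o: (o.get("failure_logged") is True
                  and o.get("emails_sent") == 0
                  and o.get("row_status") == "failed"),
}


def evaluate(test_id, observed):
    """Return "pass" or "fail" for one test, or "manual" if no rule exists."""
    check = EXPECTATIONS.get(test_id)
    if check is None:
        return "manual"
    return "pass" if check(observed) else "fail"


# Example with invented observations (not real results):
print(evaluate(1, {"action": "auto_sent", "intent": "inquiry", "confidence": 0.9}))
print(evaluate(8, {"failure_logged": True, "emails_sent": 1, "row_status": "failed"}))
print(evaluate(3, {}))
```

Note that Test 1 uses a strict comparison, so a confidence of exactly 0.80 counts as a failure, matching the "> 0.80" wording above.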