
Step 5 of 10

Build safety guardrails

Add hard stops that prevent the system from auto-sending replies for high-risk or low-confidence classifications, ensuring human review happens before anything sensitive is sent.

Why this matters

Confidence scores and intent categories are probabilistic signals: they are right most of the time, not all of the time. Safety guardrails turn those probabilities into hard rules. When the system is uncertain, or when an email touches a sensitive topic, the guardrails route it to human review instead of auto-sending a reply. This is what makes the system trustworthy enough to run unattended.

Build instructions

Add a Paths step with two branches

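  1. Click '+' after the classification parse step and search for 'Paths by Zapier'. This creates a branching structure in which each path has its own conditions.

  2. Name Path A 'Auto-send, safe and confident' and Path B 'Escalate, blocked by policy'.

  3. In Path A's conditions, require ALL of these to be true: Risk Flag = 'false' AND Intent is not 'unknown' AND Confidence ≥ your threshold (e.g., 0.80).

  4. Path B must catch everything else. Paths evaluates each path's conditions independently, so one path does not automatically act as the "else" of another. Either set Path B up as a fallback path, if your Paths step offers that option, or give it the exact inverse of Path A's rule, matching if ANY of these is true: Risk Flag is not 'false' OR Intent is 'unknown' OR Confidence is below your threshold. Whichever you choose, confirm that no email can skip both paths.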
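The Path A rule reduces to a single boolean check, and Path B is its complement. The sketch below mirrors that logic in plain JavaScript so the routing can be reasoned about or unit-tested outside Zapier. It assumes the parse step emits the string fields `risk_flag`, `intent`, and `confidence` (the names used by the Code step in Path B), and 0.80 is only the example threshold; `shouldAutoSend` and `routePath` are hypothetical helper names.

```javascript
// Sketch of the Path A / Path B decision outside Zapier.
// Assumes string fields, as Zapier passes mapped values:
// risk_flag ('true' / 'false'), intent (e.g. 'inquiry', 'unknown'),
// confidence (e.g. '0.92').
function shouldAutoSend(fields, threshold = 0.80) {
  const confidence = parseFloat(fields.confidence);
  return (
    fields.risk_flag === 'false' && // must be explicitly marked safe
    fields.intent !== 'unknown' &&  // must have a recognized intent
    confidence >= threshold         // NaN (missing value) fails this comparison
  );
}

// Path B is the complement: anything that does not pass goes to review.
function routePath(fields, threshold = 0.80) {
  return shouldAutoSend(fields, threshold) ? 'A' : 'B';
}

// Usage: the three test scenarios for this step.
console.log(routePath({ risk_flag: 'true', intent: 'billing', confidence: '0.95' }));  // B
console.log(routePath({ risk_flag: 'false', intent: 'inquiry', confidence: '0.60' })); // B
console.log(routePath({ risk_flag: 'false', intent: 'inquiry', confidence: '0.92' })); // A
```

Testing `risk_flag === 'false'` rather than `!== 'true'` makes the check fail closed: a missing or misspelled flag sends the email to review instead of auto-sending it.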

Log the block reason in Path B

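  1. In Path B, before any escalation action, add a Code by Zapier step (Run JavaScript) that builds a block reason string. Map risk_flag, intent, and confidence from the classification parse step into its Input Data.

  2. Use the logic below. Input Data values arrive as strings, so compare risk_flag with the string 'true' and parse confidence before comparing it. Checking risk_flag first ensures a hard stop is always logged as a hard stop, and the unknown-intent check covers the third Path A condition:

        if (inputData.risk_flag === 'true') {
          return { reason: 'Hard stop: ' + inputData.intent + ' intent triggers mandatory review.' };
        }
        if (inputData.intent === 'unknown') {
          return { reason: 'Unknown intent: classifier could not assign a category.' };
        }
        // A missing confidence parses to NaN; !(NaN >= 0.80) is true, so it is logged here too.
        if (!(parseFloat(inputData.confidence) >= 0.80)) {
          return { reason: 'Low confidence: ' + inputData.confidence + ' below threshold of 0.80.' };
        }
        return { reason: 'Unknown block condition.' };

  3. Write this reason to a log column in your Google Sheet. Every blocked email should have a documented reason; that record is what lets you tune the system over time.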
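To make the log useful for tuning, each row should carry more than the reason alone. The sketch below shows one possible row shape; the column names (`logged_at`, `message_id`, `intent`, `confidence`, `block_reason`) and the `buildLogRow` helper are hypothetical, so match them to your own sheet's header row and map the same fields in the Google Sheets row-creation action.

```javascript
// Sketch: one log row per blocked email. Column names are hypothetical;
// align them with the header row of your Google Sheet.
function buildLogRow(email, reason, now = new Date()) {
  return {
    logged_at: now.toISOString(),         // when the block happened (UTC)
    message_id: email.message_id || '',   // lets you find the original email
    intent: email.intent || 'unknown',    // category assigned by the classifier
    confidence: email.confidence || '',   // raw score, kept for threshold tuning
    block_reason: reason,                 // reason string from the Code step
  };
}

const row = buildLogRow(
  { message_id: 'abc123', intent: 'billing', confidence: '0.95' },
  'Hard stop: billing intent triggers mandatory review.',
  new Date('2024-01-15T09:30:00Z'),
);
console.log(row.logged_at); // 2024-01-15T09:30:00.000Z
```

Passing `now` as a parameter keeps the function deterministic for testing; in a live Zap the default current time is what you want.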

Test each guardrail condition

  1. Test the risk_flag block: send a test email about billing. Confirm it routes to Path B even if the simulated confidence is 0.95.

  2. Test the confidence block: for one test run, manually set the mock confidence to 0.60 in your classification parse step. The override must sit upstream of the Paths step, or routing never sees it. Confirm the email routes to Path B, then remove the override.

  3. Test the happy path: send a clear inquiry email and confirm it routes to Path A with a confidence at or above your threshold.
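For the confidence test, one way to avoid forgetting a hand-edited value is a single test switch at the end of the classification parse step, assuming that step is itself a Code step. `TEST_CONFIDENCE_OVERRIDE` and `applyTestOverride` are hypothetical names for illustration, not part of Zapier.

```javascript
// Sketch: a test-only confidence override for the classification parse step.
// TEST_CONFIDENCE_OVERRIDE is hypothetical; set it back to null after testing.
const TEST_CONFIDENCE_OVERRIDE = null; // e.g. '0.60' for the low-confidence test

function applyTestOverride(parsed, override = TEST_CONFIDENCE_OVERRIDE) {
  if (override === null) return parsed; // normal runs: output unchanged
  // Copy rather than mutate, and tag the run so test rows are easy to filter.
  return { ...parsed, confidence: String(override), test_override: 'true' };
}

console.log(applyTestOverride({ intent: 'inquiry', confidence: '0.93' }, '0.60').confidence); // 0.60
```

In the parse step you would return `applyTestOverride(parsedResult)` instead of `parsedResult`; the `test_override` flag also shows up in any logged row, so test traffic is never mistaken for real escalations.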

Common mistakes

  • Checking only risk_flag and ignoring confidence. A low-confidence inquiry classification should not auto-send: the model might be wrong about the category.
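  • Putting the confidence check before the risk_flag check in the block-reason Code step. The email still lands in Path B either way, but a hard-stop email that also has low confidence gets logged as a confidence problem. Risk flags are absolute and not negotiable even at high confidence, so always check risk_flag first.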
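The sketch below isolates the ordering mistake. The two hypothetical functions do no routing; they only choose the logged reason, and they differ solely in the order of their checks (reason strings are shortened for brevity).

```javascript
// Sketch: the same email, two check orders, two different logged reasons.
function reasonRiskFirst(d, threshold = 0.80) {
  if (d.risk_flag === 'true') return 'Hard stop: ' + d.intent;
  if (!(parseFloat(d.confidence) >= threshold)) return 'Low confidence: ' + d.confidence;
  return 'Other';
}

function reasonConfidenceFirst(d, threshold = 0.80) {
  if (!(parseFloat(d.confidence) >= threshold)) return 'Low confidence: ' + d.confidence;
  if (d.risk_flag === 'true') return 'Hard stop: ' + d.intent;
  return 'Other';
}

// A billing email (hard-stop intent) that the model was also unsure about.
const email = { risk_flag: 'true', intent: 'billing', confidence: '0.55' };
console.log(reasonRiskFirst(email));       // Hard stop: billing
console.log(reasonConfidenceFirst(email)); // Low confidence: 0.55
```

With the confidence check first, this email disappears from any count of hard stops in the log, which is exactly the signal you need when reviewing sensitive categories.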

Pro tips

  • Log escalated emails in a dedicated tab so you can review them weekly and identify whether any categories need their thresholds adjusted.
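For the weekly review, a per-intent tally of block reasons is a simple starting point. This sketch assumes the escalation tab has `intent` and `block_reason` columns (hypothetical names) and that reasons start with the 'Low confidence' and 'Hard stop' prefixes used by the Path B Code step. How you read the counts stays a judgment call: many low-confidence blocks for one intent may point to a threshold that is too strict for that category, or to a classifier that needs better examples.

```javascript
// Sketch: summarize a week of escalation-log rows per intent.
// Column names (intent, block_reason) are hypothetical.
function summarizeEscalations(rows) {
  const byIntent = {};
  for (const row of rows) {
    const key = row.intent || 'unknown';
    const reason = row.block_reason || '';
    if (!byIntent[key]) byIntent[key] = { total: 0, lowConfidence: 0, hardStop: 0 };
    byIntent[key].total += 1;
    if (reason.startsWith('Low confidence')) byIntent[key].lowConfidence += 1;
    if (reason.startsWith('Hard stop')) byIntent[key].hardStop += 1;
  }
  return byIntent;
}

const summary = summarizeEscalations([
  { intent: 'inquiry', block_reason: 'Low confidence: 0.72 below threshold of 0.80.' },
  { intent: 'inquiry', block_reason: 'Low confidence: 0.78 below threshold of 0.80.' },
  { intent: 'billing', block_reason: 'Hard stop: billing intent triggers mandatory review.' },
]);
console.log(summary.inquiry); // { total: 2, lowConfidence: 2, hardStop: 0 }
```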

Before you continue

All three test cases route correctly: hard-stop intent → Path B, low confidence → Path B, safe and confident → Path A. In your tests, no hard-stop email ever reached the drafting step.

Step result

The system blocks all high-risk and uncertain emails before any reply is generated, ensuring human review is the fallback for everything the system is not confident about.