Step 3 of 10

Normalize incoming email content

Clean the raw email body so that the intent classification and reply drafting steps work on consistent, high-quality input.

Why this matters

Email bodies are messy. They contain quoted reply history ('On Monday, John wrote...'), signature blocks, promotional footers, HTML artifacts, and tracking pixels converted to text. If you send this raw content to the AI classifier, the previous thread hijacks the classification: the model classifies the quoted reply instead of the actual new message.

Build instructions

Use a Formatter step for basic cleanup

  1. Step 1

    Add a Formatter by Zapier step. Choose action 'Text' → 'Remove HTML Tags'. Pass the email body as the input, using the full Body field from the trigger rather than Snippet. This strips HTML formatting.

  2. Step 2

    Add another Formatter step: 'Text' → 'Trim Whitespace'. This removes leading and trailing whitespace; blank lines inside the message are left in place.

  3. Step 3

    Add a third Formatter step: 'Text' → 'Truncate'. Set a max length of 1000 characters. This limits how much content goes to the AI, focusing it on the start of the message where the intent is usually stated.
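For testing outside Zapier, the three Formatter steps above can be approximated in plain JavaScript. This is a rough sketch, not Zapier's implementation: the function name basicCleanup and the tag-stripping regular expression are assumptions made for illustration, and the regex does not decode HTML entities such as &nbsp;.

```javascript
// Rough local stand-in for the three Formatter steps (hypothetical helper,
// not how Zapier implements them).
function basicCleanup(rawBody, maxLength = 1000) {
  const text = String(rawBody ?? '')
    .replace(/<[^>]*>/g, '') // naive HTML tag removal
    .trim();                 // drop leading/trailing whitespace
  // Keep only the first maxLength characters.
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}

console.log(basicCleanup('<p>Hi team,</p>\n<p>Can you resend the invoice?</p>\n\n'));
```

Running a few real email bodies through a helper like this is a quick way to see what the classifier would receive before building the Zap steps.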

Strip quoted reply history with a Code step

For more reliable normalization, use Zapier's Code step to extract only the newest message content.

  1. Step 1

    Add a 'Code by Zapier' step. Choose language 'JavaScript'.

  2. Step 2

    In the Input Data section, create a variable called 'body' and map it to the email body field.

  3. Step 3

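    In the code field, paste this script:

        // Split into lines; handle both \n and \r\n line endings.
        const lines = (inputData.body || '').split(/\r?\n/);
        // Index of the 'On ... wrote:' header, or -1 if there is none.
        const cutoff = lines.findIndex(l => l.startsWith('On ') && l.includes('wrote:'));
        // Keep everything above the header, including when it is the first line.
        const clean = cutoff === -1 ? lines : lines.slice(0, cutoff);
        // Drop remaining '>' quoted lines and return the result.
        return { normalized: clean.filter(l => !l.startsWith('>')).join('\n').trim() };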

  4. Step 4

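    The script removes lines starting with '>' (quoted text) and cuts the body off at the 'On [date] [name] wrote:' header common in Gmail threads. It matches only when the whole header sits on one line, so a header that wraps onto a second line is not caught.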
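If your test emails show reply headers that wrap onto two lines, or signature blocks leaking through, the same idea can be extended. The sketch below is one possible variant under two assumed heuristics that are not part of the original script: it treats a line starting with 'On ' as a reply header when 'wrote:' ends the following line, and it stops at the conventional '-- ' signature delimiter. The function name stripQuotedHistory is hypothetical.

```javascript
// Sketch only: quoted-history stripping with two extra, assumed heuristics.
function stripQuotedHistory(body) {
  const lines = String(body ?? '').split(/\r?\n/);
  const kept = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const next = lines[i + 1] ?? '';
    // Header on one line, or wrapped so that "wrote:" ends the next line.
    const isReplyHeader =
      line.startsWith('On ') &&
      (line.includes('wrote:') || next.trimEnd().endsWith('wrote:'));
    // "-- " is a common plain-text signature delimiter (assumed to be present).
    const isSignatureStart = line === '-- ' || line === '--';
    if (isReplyHeader || isSignatureStart) break; // everything below is dropped
    if (!line.startsWith('>')) kept.push(line);   // skip quoted lines
  }
  return kept.join('\n').trim();
}

const sample = [
  'Thanks, Tuesday works for me.',
  '',
  'On Mon, Jan 1, 2024 at 10:00 AM John Doe <john@example.com>',
  'wrote:',
  '> Does Tuesday work?',
].join('\n');
console.log(stripQuotedHistory(sample)); // prints: Thanks, Tuesday works for me.
```

In a Code by Zapier step, the function would be followed by `return { normalized: stripQuotedHistory(inputData.body) };`. Both heuristics can misfire, for example on a new message that itself begins with 'On ', so check the output against real emails before relying on it.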

Add a fallback for empty content

  1. Step 1

    After normalization, add a Formatter step: 'Text' → 'Default Value'. Set the Input to the normalized text output. Set the Default Value to '[No message body detected. Classify as Unknown intent.]'.

  2. Step 2

    This ensures the classification step always receives a non-empty string. The fallback text tells the AI to classify the email as Unknown, which routes it to manual review, the safest outcome for empty content.
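If normalization already happens in a Code step, the same fallback can be applied there instead of in a separate Formatter step. A minimal sketch, assuming a hypothetical helper named withFallback; treating whitespace-only text as empty is a design choice made here, not something the Formatter step is known to do.

```javascript
// Fallback text taken from the Formatter instructions above.
const FALLBACK = '[No message body detected. Classify as Unknown intent.]';

// Hypothetical helper: return the normalized text, or the fallback when
// the text is missing, empty, or whitespace-only.
function withFallback(normalized) {
  const text = typeof normalized === 'string' ? normalized.trim() : '';
  return text.length > 0 ? text : FALLBACK;
}

console.log(withFallback('   '));               // prints the fallback text
console.log(withFallback('Please call me back.')); // prints: Please call me back.
```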

Common mistakes

  • Passing the raw email body to the classifier without stripping quoted history. The classifier will often classify the most recent quoted message instead of the new one, producing completely wrong intent assignments.
  • Using only the 'Snippet' field from the trigger instead of the full body. Snippets are truncated to ~200 characters and may cut off mid-sentence. Use the full Body field and truncate it yourself after normalization.

Pro tips

  • Log the normalized text to a Google Sheet column during testing. Compare the raw input to the normalized output for 10 real emails. This shows you exactly what the classifier is seeing.

Before you continue

Run the normalization steps with three test emails: a fresh email with no history, a reply with a quoted thread, and an HTML-formatted email. Confirm the normalized output for each is clean plain text containing only the newest message content.
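The three-email check can also be rehearsed locally before testing in Zapier. The sketch below is illustrative only: normalize is a simplified stand-in that combines naive tag removal with the quoted-history logic, the sample emails are invented, and checkNormalized is a hypothetical helper that flags leftovers.

```javascript
// Simplified stand-in for the normalization steps (assumption, not Zapier's behavior).
function normalize(body) {
  const lines = String(body ?? '').replace(/<[^>]*>/g, '').split(/\r?\n/);
  const cutoff = lines.findIndex(l => l.startsWith('On ') && l.includes('wrote:'));
  const kept = cutoff === -1 ? lines : lines.slice(0, cutoff);
  return kept.filter(l => !l.startsWith('>')).join('\n').trim();
}

// Returns a list of problems found in a normalized body; empty means clean.
function checkNormalized(output) {
  const problems = [];
  if (output.length === 0) problems.push('empty output');
  if (/<[^>]+>/.test(output)) problems.push('HTML tag left');
  if (/^>/m.test(output)) problems.push('quoted line left');
  if (/^On .*wrote:/m.test(output)) problems.push('reply header left');
  return problems;
}

// Invented sample emails covering the three cases.
const testEmails = {
  fresh: 'Hello,\nPlease cancel my subscription.',
  reply: 'Yes, go ahead.\n\nOn Mon, Jan 1, 2024 John wrote:\n> Should I proceed?',
  html: '<div><b>Hello</b>,</div>\n<div>Where is my order?</div>',
};

for (const [name, raw] of Object.entries(testEmails)) {
  const problems = checkNormalized(normalize(raw));
  console.log(name, problems.length === 0 ? 'OK' : problems.join('; '));
}
```

Replacing the invented samples with bodies copied from real emails gives a closer picture of how the Zap will behave.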

Step result

The classifier and drafting steps receive a clean, consistent input that represents only the current email's content.