Building an Agent-First Email Assistant with Cloudflare Durable Objects

Exploring agentic architectures using Cloudflare Workers and Durable Objects

January 16, 2026

The source code is available at meleksomai/os, and you can try interacting with the AI Assistant by sending an email to hello@somai.me.

As I argued in my previous post, I believe the future of software lies in agent-first systems. During the holidays, I finally had time to explore what it would mean to build software around LLMs from first principles.1 Around the same time, I found myself confronting a backlog of emails requiring the usual tedious work of triaging, prioritizing, responding, and scheduling.

This led me to build an AI assistant that operates autonomously over email on my behalf, a textbook example of an agentic application. Despite its persistent dysfunction, email remains the most resilient protocol for asynchronous human (and soon machine) communication. It is a well-defined protocol with clear inputs (incoming messages) and outputs (responses, scheduling actions), making it an ideal testbed. The assistant must navigate the complexities of human communication, context retention, and task execution, all while adhering to the constraints of email as a medium.

This project would serve dual purposes: reducing my email burden while also exploring the design space of agent-first systems. It also felt like an acceptable risk, since I wasn't relying on this system for anything mission-critical, yet it would be launched into the wild and exposed to the real world. I remain concerned about potential jailbreaking or hallucinations from the AI assistant, and I will explore some of these challenges later in the post. But seeing how the system would fail and what edge cases would emerge when deployed in the wild is part of the experiment, after all.

Design Principles

Starting from first principles required defining clear requirements that would guide the architecture and implementation of this AI Assistant. My goal is to build an AI Assistant that can manage emails autonomously — providing context-aware responses, scheduling meetings, and handling routine inquiries without human intervention. For this to work, the AI Assistant must:

  • intercept and understand emails sent to a specific email inbox (e.g., hello@somai.me). The AI Assistant should be able to parse the email content, extract relevant information, and understand the context of the conversation.
  • maintain context per contact to adjust its behavior safely. The assistant remembers prior interactions with each individual contact to inform future responses. Technically, this means maintaining separate state and memory for each contact, with no cross-contamination of context between contacts.
  • connect to tools and services to perform actions on my behalf, such as scheduling meetings, sending follow-up emails, or retrieving information from external sources. The Model Context Protocol (MCP) is useful for connecting LLMs with external tools and services.
  • learn from my personal feedback and improve over time. The assistant should adapt its responses based on my personal feedback, refining its understanding of preferences and communication styles. This requires a feedback loop where I can provide corrections or suggestions to the assistant, and it can incorporate that feedback into its future behavior.
  • be secure and private since email often contains sensitive information. The assistant must ensure that all data is handled securely and without exposing sensitive information to unauthorized parties.

Platform: Cloudflare Workers + Durable Objects

Cloudflare has emerged as a compelling alternative to traditional cloud providers, offering AWS-level primitives with a developer experience closer to Vercel's. Cloudflare has a solid serverless platform with Cloudflare Workers, Cloudflare Email Routing, Cloudflare KV, and most recently Cloudflare Durable Objects.2

Cloudflare Durable Objects

The Durable Objects service is the most significant shift in serverless computing since AWS Lambda. Rather than thinking about scale in terms of microservices and distributed systems, Durable Objects let you think about scale in terms of stateful instances. Each instance is a self-contained unit of compute with its own memory, storage, and lifecycle. The scaling unit is no longer a stateless function but a stateful object that can maintain its own context over time. This is well-suited to agent-first systems, where each agent can be a Durable Object instance with its own state and behavior. Durable Objects also add scheduling, which gives each instance its own lifecycle; WebSockets for streaming and long-running connections (ideal for agents); and single-threaded execution, which enormously simplifies how we think about concurrency and race conditions.3

Agents require infrastructure designed for continuity, not ephemerality.

In my experience, building agent-first systems on traditional serverless platforms is fraught with complexity. Serverless assumes ephemerality: functions execute, return results, and disappear. State, if needed, lives elsewhere, in a database or external storage. This works well for request-response architectures but breaks down when the same paradigm is applied to agentic computing. And it is only going to get worse as agents become more sophisticated and stateful.4

Architecture

The architecture is simple: an email is received by Cloudflare Email Routing and forwarded to a Cloudflare Worker, which routes it to a specific agent instance (a Durable Object). The agent processes the email, runs LLMs, updates its state, and optionally sends a reply.

Architecture diagram

I will focus on the most interesting parts of the architecture. You can check out the source code on meleksomai/os.

Cloudflare Worker

The Cloudflare Worker is the entry point for handling incoming emails. It uses the Agents SDK helper function routeAgentEmail to route emails to the appropriate agent instance.

cloudflare worker logic
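
Stripped to its essentials, the Worker looks roughly like this (a simplified sketch, assuming the Agents SDK's routeAgentEmail helper and its address-based resolver, with Env carrying the agent binding):

```ts
import { routeAgentEmail, createAddressBasedEmailResolver } from "agents";

export default {
  // Cloudflare Email Routing invokes this handler for every message
  // delivered to the routed address (e.g., hello@somai.me).
  async email(message: ForwardableEmailMessage, env: Env, ctx: ExecutionContext) {
    await routeAgentEmail(message, env, {
      // Built-in resolver: picks the agent instance from the address.
      // We will swap this for a custom thread-based resolver below.
      resolver: createAddressBasedEmailResolver("EmailAgent"),
    });
  },
};
```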

When an email is received, the routing logic should extract the sender's email address and use it to determine the correct agent instance. If an instance for that contact does not exist, a new one is created. However, email threads complicate things. If I have an ongoing conversation with someone (someone@example.com) and I reply to their email, I want the reply to be routed to the same agent instance that is handling my conversation with that person, regardless of the sender. Email headers solve this problem: email defines a set of headers that identify the thread, the most relevant being Message-ID, In-Reply-To, and References.5 Email clients use these headers to group messages into threads and stack conversations.

Below is a simple function that extracts the root thread ID from the email headers using the References and In-Reply-To headers.

./resolver.ts
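
The actual resolver lives in the repo; a simplified sketch of the idea looks something like this (the helper name is illustrative, and we assume the single-parent convention from the RFC):

```ts
// Extracts a stable thread identifier from standard email headers.
// Per RFC 5322, References lists ancestor Message-IDs oldest-first,
// so its first entry is the root of the thread.
export function resolveThreadId(headers: Headers): string | null {
  const references = headers.get("References");
  if (references) {
    // Message-IDs look like <abc@example.com>; take the first (oldest) one.
    const root = references.match(/<[^>]+>/);
    if (root) return root[0];
  }

  // Fall back to the direct parent when References is absent.
  const inReplyTo = headers.get("In-Reply-To");
  if (inReplyTo) return inReplyTo.trim();

  // No threading headers: treat the message's own ID as a new thread root.
  return headers.get("Message-ID")?.trim() ?? null;
}
```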

Thread-based routing enables another powerful pattern: I can reply to an email thread from my personal address with only my AI Assistant (hello@somai.me) as the recipient. That way, I can have a private side conversation with the assistant about the ongoing thread, without exposing its content to the other participants, and use it to adjust the assistant's behavior or provide additional context.

Cloudflare provides some email routing resolvers out of the box in its Agents SDK, such as createCatchAllEmailResolver, createAddressBasedEmailResolver, and createHeaderBasedEmailResolver. However, none of them fit my use case: I needed custom routing logic based on both the email thread and the contact.

Custom Routing Resolver

Since we are able to extract the thread ID from the email headers, we can use that to maintain a mapping between thread IDs and agent instance IDs. I created a custom email resolver createThreadBasedEmailResolver that can maintain this mapping using a Cloudflare KV store. When a new email is received, the routing logic checks for the sender's email address and the thread identifiers in the email headers. If a match is found in the KV store, the email is routed to the corresponding agent instance. If no match is found, a new agent instance is created, and the thread identifier is stored in the KV store.

resolver.ts
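
A condensed sketch of the resolver (the exact resolver contract depends on the Agents SDK version; here I assume it is an async function that returns the agent binding name and instance id):

```ts
import { resolveThreadId } from "./resolver";

// Maps an incoming email to an agent instance through a KV-backed index:
// thread:<root-message-id> -> agent id, and contact:<address> -> agent id.
export function createThreadBasedEmailResolver(
  kv: KVNamespace,
  agentName = "EmailAgent"
) {
  return async (email: ForwardableEmailMessage) => {
    const threadId = resolveThreadId(email.headers);
    const sender = email.from;

    // Prefer an existing mapping for this thread, then for this contact.
    const existing =
      (threadId && (await kv.get(`thread:${threadId}`))) ||
      (await kv.get(`contact:${sender}`));
    if (existing) return { agentName, agentId: existing };

    // First contact: create a fresh agent id and persist both mappings.
    const agentId = crypto.randomUUID();
    await kv.put(`contact:${sender}`, agentId);
    if (threadId) await kv.put(`thread:${threadId}`, agentId);
    return { agentName, agentId };
  };
}
```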

Cloudflare KV Store

The KV store maintains the mapping between thread IDs and agent instance IDs. It is attached to the Worker that handles email routing through the wrangler.jsonc configuration file.

wrangler.jsonc
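
The relevant binding is a few lines of configuration (the namespace name and id below are placeholders):

```jsonc
{
  // ...rest of the Worker configuration...
  "kv_namespaces": [
    {
      // Binding the Worker uses to read/write the thread -> agent mapping.
      "binding": "THREADS",
      "id": "<kv-namespace-id>"
    }
  ]
}
```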

The only missing piece is updating the routing logic to use this custom resolver.

cloudflare worker with custom resolver
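
This is the same entry point as before, now passing the custom resolver (assuming the THREADS KV binding from the configuration above):

```ts
import { routeAgentEmail } from "agents";
import { createThreadBasedEmailResolver } from "./resolver";

export default {
  async email(message: ForwardableEmailMessage, env: Env, ctx: ExecutionContext) {
    await routeAgentEmail(message, env, {
      // env.THREADS is the KV namespace holding thread -> agent mappings.
      resolver: createThreadBasedEmailResolver(env.THREADS),
    });
  },
};
```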

AI Agent

Now that we have the routing logic in place, we can focus on the AI Agent itself. Cloudflare Agents SDK provides a base Agent class that we can extend to implement our logic. It is basically a wrapper around Durable Objects that provides useful abstractions for building agentic systems.

Memory

Each Durable Object instance has its own memory: a simple SQLite database. This makes memory management straightforward: each agent instance stores its own state in its own database. The memory schema stays simple because we do not have to worry about multi-tenancy or cross-contamination of context between contacts.

For my use case, the memory schema is pretty straightforward. Each agent instance maintains a list of messages (emails) exchanged with the contact, a context string that summarizes the conversation, and a summary of the contact's preferences and behavior.

memory.ts
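
In TypeScript, the state looks roughly like this (field names are illustrative, not the exact schema in the repo):

```ts
// One record per email exchanged on the thread handled by this instance.
export interface StoredMessage {
  id: string;                         // Message-ID header
  from: string;
  subject: string;
  body: string;
  direction: "inbound" | "outbound";
  receivedAt: string;                 // ISO timestamp
}

// The full memory of a single agent instance (one contact / thread).
export interface AgentMemory {
  messages: StoredMessage[];
  context: string;      // rolling summary of the conversation
  preferences: string;  // what the agent has learned about the contact
}
```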

When initializing the agent instance, we set up the initial memory state. We also define methods to update the memory state as the agent processes incoming emails and generates responses that should be persisted.

agent.ts
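
With the Agents SDK, this boils down to an initialState plus a few setState helpers (a sketch; Env is the generated Worker environment type, and the import paths are illustrative):

```ts
import { Agent } from "agents";
import type { AgentMemory, StoredMessage } from "./memory";

export class EmailAgent extends Agent<Env, AgentMemory> {
  // Persisted by the SDK in the Durable Object's own SQLite storage;
  // this is the value a fresh instance starts with.
  initialState: AgentMemory = {
    messages: [],
    context: "",
    preferences: "",
  };

  // Append a processed email and keep the rolling context up to date.
  recordMessage(message: StoredMessage, updatedContext?: string) {
    this.setState({
      ...this.state,
      messages: [...this.state.messages, message],
      context: updatedContext ?? this.state.context,
    });
  }
}
```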

Routing Logic

At the entry point, the agent receives an email and determines whether it is from me (the owner) or from an external contact. The agent then routes the email to the appropriate workflow: either the owner workflow or the external contact workflow. This separation isolates the logic for handling emails from me versus emails from others.

We could call this defensive programming for AI agents, and that is exactly what it is. Separating the logic for owner emails from external-contact emails with deterministic routing makes the system less prone to jailbreaking and unintended behavior.

snippet ./agent.ts
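
The branch itself is plain, boring TypeScript, which is exactly the point (a sketch; the handler name and AgentEmail type follow the Agents SDK, and OWNER_EMAIL is a placeholder for my personal address):

```ts
import { Agent } from "agents";
import type { AgentEmail } from "agents";
import type { AgentMemory } from "./memory";

const OWNER_EMAIL = "me@example.com"; // placeholder, set via configuration

export class EmailAgent extends Agent<Env, AgentMemory> {
  // Entry point for emails routed to this instance
  // (memory helpers from the previous sketch omitted).
  async onEmail(email: AgentEmail) {
    const sender = email.from.toLowerCase();

    // Deterministic branch: the owner gets the flexible Loop Agent,
    // everyone else goes through the constrained review workflow.
    if (sender === OWNER_EMAIL) {
      await this.handleOwnerEmail(email);
    } else {
      await this.handleContactEmail(email);
    }
  }

  private async handleOwnerEmail(email: AgentEmail) {
    // Invokes the Loop Agent sketched in the next sections.
  }

  private async handleContactEmail(email: AgentEmail) {
    // Invokes the Workflow Agent sketched in the next sections.
  }
}
```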

Workflow vs Loop Agents

To process emails, I use two distinct design patterns.

  • Loop Agents: The first is using a Loop Agent for handling emails from me (the owner). The Loop Agent pattern is useful for orchestrating dynamic workflows where the agent needs to iterate over a set of tasks until a certain condition is met. The Loop Agent has total freedom over the steps that it can execute from the list of tools it has access to and can decide when to stop based on the context and feedback it receives. The Loop Agent is therefore the most flexible and powerful pattern for building complex workflows. However, it is also less deterministic and more prone to unintended behavior if not carefully designed.

  • Workflow Agents: The second is a more structured, step-by-step workflow, adapted from Anthropic's guide on building effective agents, which emphasizes defining clear steps for the agent to follow. The Workflow Agent is more deterministic and easier to reason about since each step is explicitly defined. However, it is also less flexible and may not handle dynamic workflows as effectively as the Loop Agent.

The two patterns are interchangeable: each implements the same AgentExecutor interface and is invoked the same way, as shown below. Which one to use depends on the specific use case and the requirements of the workflow.

./agent/workflows/agent.ts
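
The shared contract is small (a sketch; the field names are illustrative):

```ts
import type { AgentMemory } from "../memory";

// Minimal shape of an email handed to an executor.
export interface IncomingEmail {
  from: string;
  subject: string;
  body: string;
}

export interface AgentExecutorResult {
  reply?: string;       // draft or final reply body, if any
  memory: AgentMemory;  // updated memory to persist
}

// Both the Workflow Agent and the Loop Agent implement this interface,
// so the email agent can invoke either one without caring which it is.
export interface AgentExecutor {
  run(email: IncomingEmail, memory: AgentMemory): Promise<AgentExecutorResult>;
}
```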

Workflow Agents For Reviewing Incoming Emails

For incoming emails from external contacts, I used a Workflow Agent that follows a defined set of steps. Some of the steps are AI-powered (e.g., classify email, generate draft), while others are deterministic (e.g., send email, update context).

Architecture diagram

You should think of LLMs as stochastic functions that perform specific tasks (e.g., classification, generation) rather than as monolithic agents that try to do everything. This modular approach allows for better control over the agent's behavior and reduces the risk of unintended actions. Since each tool invocation has a deterministic interface (typed input and output), we can compose LLM-powered tools into larger workflows with reasonably predictable behavior.6

./agent/workflows/reply-contact-workflow.ts
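
A condensed sketch of the workflow (classifyEmail, draftReply, and sendEmail stand in for the tools described in the next section; the real file has more steps and error handling):

```ts
import type { AgentExecutor, AgentExecutorResult, IncomingEmail } from "./agent";
import type { AgentMemory } from "../memory";
import { classifyEmail } from "../tools/classify-email-tool";
import { draftReply } from "../tools/draft-email-tool";   // hypothetical helper
import { sendEmail } from "../tools/send-email-tool";     // hypothetical helper

// A fixed, step-by-step pipeline: AI-powered steps (classify, draft)
// sandwiched between deterministic ones (send, persist context).
export class ReplyContactWorkflow implements AgentExecutor {
  async run(email: IncomingEmail, memory: AgentMemory): Promise<AgentExecutorResult> {
    // 1. AI step: classify intent and risk with a structured output.
    const classification = await classifyEmail(email, memory);

    // 2. High-risk or sensitive emails are parked for owner approval.
    if (classification.requiresApproval) {
      return { memory: appendNote(memory, "Awaiting owner approval") };
    }

    // 3. AI step: generate a context-aware draft reply.
    const draft = await draftReply(email, memory, classification);

    // 4. Deterministic step: send the reply.
    await sendEmail({ to: email.from, subject: `Re: ${email.subject}`, body: draft });

    // 5. Deterministic step: fold the exchange back into memory.
    return {
      reply: draft,
      memory: appendNote(memory, `Replied (${classification.intents.join(", ")})`),
    };
  }
}

// Tiny helper to keep the rolling context string up to date.
function appendNote(memory: AgentMemory, note: string): AgentMemory {
  return { ...memory, context: `${memory.context}\n${note}`.trim() };
}
```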

LLM-Powered Loop Agents For Owner Emails

For emails from me (the owner), I used a Loop Agent that can handle more dynamic workflows. The Loop Agent can iterate over a set of tasks until a certain condition is met. This is useful for handling emails that may require multiple steps or iterations to resolve. Rather than defining a fixed workflow, the Loop Agent can decide what actions to take based on the context and feedback it receives. It uses an LLM to determine the next action to take, allowing for more flexibility and adaptability. You can think of this as a more free-form agent that can handle complex interactions.

The Loop Agent is implemented with the Vercel AI SDK. It defines a set of possible actions (tools) that it can invoke, and the LLM decides which action to take based on the current context.

Agent Loop diagram

The logic is therefore handled by writing the system prompt that guides the LLM on how to behave and what actions to take. You can think of it as using plain English to program the agent's behavior. The system prompt defines the agent's role, available actions, decision framework, and important guidelines to follow.

./agent/workflows/owner-loop-agent.ts
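
A condensed sketch using the AI SDK's multi-step tool calling (the model provider and model name are assumptions, the tool bodies are stubbed, and older AI SDK versions use maxSteps instead of stopWhen):

```ts
import { generateText, stepCountIs, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import type { AgentExecutor, AgentExecutorResult, IncomingEmail } from "./agent";
import type { AgentMemory } from "../memory";

// The system prompt is where the behavior is "programmed" in plain English.
const SYSTEM_PROMPT = `You are my email assistant. Use the available tools to
handle the owner's request, and stop once the request is resolved.`;

export class OwnerLoopAgent implements AgentExecutor {
  async run(email: IncomingEmail, memory: AgentMemory): Promise<AgentExecutorResult> {
    const result = await generateText({
      model: openai("gpt-4o"),
      system: SYSTEM_PROMPT,
      prompt: `Context so far:\n${memory.context}\n\nNew email from the owner:\n${email.body}`,
      // The LLM picks among these tools on each iteration of the loop.
      tools: {
        sendEmail: tool({
          description: "Send an email on the owner's behalf",
          inputSchema: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
          // Stub: the real tool delegates to the Resend-backed sender below.
          execute: async ({ to, subject }) => ({ sent: true, to, subject }),
        }),
        updateContext: tool({
          description: "Store a fact or preference the owner shared",
          inputSchema: z.object({ note: z.string() }),
          execute: async ({ note }) => ({ stored: note }),
        }),
      },
      // Bound the loop: the agent stops when done, or after 8 tool-calling steps.
      stopWhen: stepCountIs(8),
    });

    return { reply: result.text, memory };
  }
}
```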

Tools

In both the Workflow and Loop Agents, tools are used to perform specific tasks. You can think of tools as functions that the agent can invoke to perform actions. They can be deterministic functions (e.g., send email, update context) or AI-powered functions (e.g., classify email, generate draft). Each tool has a defined input and output interface, allowing the agent to invoke them as needed. The Vercel AI SDK provides a powerful and easy-to-use framework for building tools that agents can call.

The AI Agent uses a set of tools to perform specific tasks. These tools include:

  • Email Classification Tool: Classifies incoming emails into categories (e.g., inquiry, complaint, follow-up) to determine the appropriate response strategy.
  • Email Drafting Tool: Generates draft responses based on the email content and context.
  • Context Update Tool: Updates the agent's memory based on new information from incoming emails.
  • Email Sending Tool: Sends emails on behalf of the agent.

Cloudflare Email Routing currently does not support sending multiple emails directly from Durable Objects, so I used Resend as an external email-sending service. This is a temporary workaround until Cloudflare adds support for sending emails directly from Durable Objects.
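
As a concrete example of a deterministic tool, here is roughly what the sending tool looks like with the AI SDK's tool helper and the Resend SDK (the factory name and schema are illustrative):

```ts
import { tool } from "ai";
import { z } from "zod";
import { Resend } from "resend";

// Deterministic tool: no LLM involved, just a typed wrapper around Resend.
export function createSendEmailTool(apiKey: string, from: string) {
  const resend = new Resend(apiKey);

  return tool({
    description: "Send an email from the assistant's address",
    inputSchema: z.object({
      to: z.string().email(),
      subject: z.string(),
      body: z.string(),
    }),
    execute: async ({ to, subject, body }) => {
      const { data, error } = await resend.emails.send({
        from, // e.g. "Assistant <hello@somai.me>"
        to,
        subject,
        text: body,
      });
      if (error) return { sent: false, error: error.message };
      return { sent: true, id: data?.id };
    },
  });
}
```

Both the Workflow and Loop Agents can register this tool alongside the AI-powered ones, since they all share the same input/output contract.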

Email Classification Tool

I want to walk through one of the AI-powered tools in detail: the email classification tool. This tool is responsible for classifying incoming emails into categories to determine the appropriate response strategy.

Although the classification is handled by an LLM, the tool forces the output to conform to a strict schema using Zod validation. This ensures that the agent receives structured, predictable data it can use to make decisions. LLMs excel at generating structured data, and Zod validates at runtime that the output conforms to the expected schema. One trick I use is to include the Zod schema definition in the system prompt so that the LLM knows exactly what structure to follow.

There are three main parts to the email classification tool:

  • Input Schema (lines 6-10): Defines the expected input structure for the tool. In this case, it expects the current agent memory state, which includes messages and context.
  • Output Schema (lines 12-31): Defines the expected output structure for the tool. The output includes the classified intents, risk level, recommended action, whether approval is required, and comments.
  • Execution Logic (lines 58-65): The core logic of the tool that uses an LLM to classify the email based on the input state. It constructs a prompt that includes the email content, historical messages, and context, then invokes the LLM to generate the classification.
./agent/tools/classify-email-tool.ts
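
The full file is in the repo (the line references above point at it); a condensed sketch of the same idea, using the AI SDK's generateObject to enforce the Zod schema, looks like this (model choice and category names are assumptions):

```ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import type { AgentMemory } from "../memory";

// Output schema mirroring the fields described above: intents, risk level,
// recommended action, approval flag, and free-text comments.
export const classificationSchema = z.object({
  intents: z.array(z.enum(["inquiry", "complaint", "follow-up", "scheduling", "other"])),
  riskLevel: z.enum(["low", "medium", "high"]),
  recommendedAction: z.enum(["reply", "ignore", "escalate"]),
  requiresApproval: z.boolean(),
  comments: z.string(),
});

export type EmailClassification = z.infer<typeof classificationSchema>;

export async function classifyEmail(
  email: { subject: string; body: string },
  memory: AgentMemory
): Promise<EmailClassification> {
  // generateObject validates the model output against the Zod schema at runtime,
  // so downstream steps always receive well-formed, predictable data.
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: classificationSchema,
    system: "Classify the incoming email for an autonomous email assistant.",
    prompt: `Conversation context:\n${memory.context}\n\nSubject: ${email.subject}\n\n${email.body}`,
  });
  return object;
}
```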

Summary

In this post, I walked through the architecture and implementation of my personal AI Assistant using Cloudflare Workers and Durable Objects. The key components include custom email routing based on email threads, per-contact memory backed by each Durable Object's SQLite storage, and distinct agent patterns (Workflow and Loop Agents) for handling different types of emails. The use of tools, both deterministic and AI-powered, allows for modular and composable agent behavior.

We are still in the early days of AI Agents, and there is much to explore and improve. I think the combination of Cloudflare's serverless platform and the Agents SDK provides a powerful foundation for building scalable and efficient AI-powered applications. However, the tooling still feels a bit confusing, and the frameworks are evolving rapidly. I started this project as an experiment, but it took several iterations to get to a prototype that is architecturally sound and functionally useful.

In terms of safety and security, I took several precautions to ensure that the agent behaves responsibly. The separation of owner and external contact workflows helps prevent unintended actions. The use of strict schemas for tool outputs ensures that the agent receives structured and predictable data. Additionally, the routing logic based on email threads helps maintain context and continuity in conversations.

Send me an email at hello@somai.me and you will be routed to the agent. It is early, but it already works.

Footnotes

  1. By building "from first principles," I mean starting not from existing software patterns or human workflows, but from the fundamental capabilities and constraints of LLMs themselves. Most AI products today attempt to fit language models into interfaces designed for humans—dashboards, buttons, forms. The first-principles approach inverts this: what would software look like if we designed it around what an agent can naturally do? In the case of email, this meant asking: given that LLMs are stateless, context-dependent, and excel at language understanding, what architecture would allow an agent to operate autonomously within a protocol that was never designed for machines? The answer required rethinking memory, state, and interaction patterns from the ground up—not retrofitting AI into an inbox UI.

  2. Vercel vs AWS vs Cloudflare: Vercel is an amazing platform. It has historically been great for web apps. The recent fluid compute, AI SDK, and Workflow are very powerful and have tons of community support. However, Vercel is tightly coupled to Next.js and the web framework lifecycle. For this project, it felt unnatural for an agent-first system. I really like the craft the team at Vercel put into building all the products and services, but I did not want to build on top of a web app. AWS is a powerful hyper-scaler. It has become, though, an onerous platform to start with. I certainly do not want to set up VPCs, configure Bedrock, manage IAM, connect my CI with CloudFormation, and orchestrate a mountain of infrastructure pieces just to get a prototype running. Moreover, their foray into AI with AWS Bedrock and its sub-services like AgentCore does not seem to resonate with the new community of developers. For instance, I find the development experience on AWS Lambda to be quite challenging compared to Vercel. AWS's new AI services do not have the same "care" and dynamism that AWS used to put into building things like DynamoDB or S3 back in the day.

  3. I am excited but also skeptical. I am curious to see how Cloudflare will execute on this vision over time. There are still many open questions around Durable Objects, and I have yet to see successful businesses and ideas built on top of them.

  4. We already see the emergence of new infrastructure constructs and primitives: sandboxed execution environments that allow agents to invoke tools safely, dedicated agent VMs that maintain state between invocations, RAG systems that ground LLM reasoning in retrieved knowledge, and orchestration layers (LangGraph, Temporal, AWS Step Functions) that coordinate multi-step agentic workflows. All of these point to a future where agents require continuity rather than ephemerality. Traditional serverless platforms are not designed for this paradigm shift.

  5. For more information about email threading and headers, see the RFC 5322bis draft (draft-ietf-emailcore-rfc5322bis-12). Note that we assume In-Reply-To always references a single parent, so we can walk backwards through the References field to find the parent of each message listed there. This approach does not handle replies with multiple parents (which the RFC discourages). https://datatracker.ietf.org/doc/html/draft-ietf-emailcore-rfc5322bis-12#name-identification-fields

  6. This is similar to the concept of tool use in agentic systems where LLMs are used as components that can be orchestrated by higher-level logic. By breaking down the agent's behavior into discrete steps, we can better manage complexity and ensure that the agent behaves as intended. This might seem confusing and it is. But as we build more sophisticated agentic systems, we will need to adopt such modular and composable architectures to manage the complexity of agent behavior.