Learn how to deploy orchestrated AI agent teams to Cloudflare Workers for low-latency, always-on customer-facing workflows. Complete technical guide.
When you're running production AI agents, latency kills user experience. A customer-facing workflow that waits 500ms for a response to travel to a distant data center feels broken. Edge deployment changes the equation entirely.
Cloudflare Workers give you compute infrastructure distributed across 300+ cities globally. Your agents run closer to your users, cutting response times from hundreds of milliseconds to tens. This matters especially for always-on AI agents handling real-time customer interactions, background automation, or time-sensitive data processing.
But edge deployment isn't just about speed. It's about building headless companies: organizations where autonomous agent teams handle operations without human intervention. When you combine edge runtimes with proper orchestration, you get the foundation for scaling without adding headcount. That's the core promise: deploy agent teams that work 24/7, with zero infrastructure overhead, and pay only for what you use.
This tutorial walks you through deploying orchestrated agent teams to Cloudflare Workers. We'll cover the architecture, the practical setup, integration patterns, and how to monitor what's actually happening in production. By the end, you'll understand how to run customer-facing agent workflows at the edge.
Edge computing means running code on servers geographically closer to your users rather than in centralized data centers. Cloudflare Workers is one of the most accessible edge platforms available. It's a serverless runtime that executes your code on Cloudflare's global network within milliseconds of user requests.
For AI agents, this topology creates specific advantages and constraints. Traditional agent architectures, like those you might build with frameworks such as CrewAI or LangGraph, often assume a persistent server somewhere. They maintain state, manage long-running tasks, and expect reliable network connectivity back to a central location.
Edge runtimes flip this model. Cloudflare Workers runs your code in stateless functions. Each invocation is independent. There's no persistent memory between requests. This forces you to think differently about how agents maintain context, coordinate across multiple tasks, and persist state.
The solution is proper orchestration. Instead of a single agent running on a server, you deploy agent teams where each agent is a discrete, stateless function. A coordinator (often itself an agent) routes work to the right agent, aggregates results, and manages the overall workflow. This is where platforms like Padiso shine-they handle the orchestration layer so you don't have to rebuild it.
When you use an orchestration platform alongside edge runtimes, you get:
This architecture is particularly powerful for founders and operators building lean, agent-operated companies where automation replaces headcount.
Before you deploy agents, you need a working Cloudflare Workers setup. This means installing the right tools, understanding the project structure, and knowing how to test locally.
Installation and Project Setup
Start by installing Wrangler, Cloudflare's command-line tool for building and deploying Workers:
npm install -g wrangler
Once installed, create a new Workers project:
wrangler init my-agent-worker
cd my-agent-worker
Wrangler scaffolds a basic project structure. You'll see a src/index.ts file (or .js if you prefer JavaScript), a wrangler.toml configuration file, and a package.json for dependencies.
The wrangler.toml file is your project's configuration. It defines your Worker's name, the account where it deploys, environment variables, and bindings to other Cloudflare services. Here's a minimal example:
name = "agent-orchestration-worker"
main = "src/index.ts"
compatibility_date = "2024-01-01"
[env.production]
name = "agent-orchestration-worker-prod"
route = "https://agents.example.com/*"
The compatibility_date tells Cloudflare which version of the Workers runtime to use. Always use a recent date to access current features.
Understanding the Worker Lifecycle
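A Cloudflare Worker is a request handler. When an HTTP request hits your Worker URL, Cloudflare invokes your code, which processes the request and returns a response. Workers cap CPU time per invocation rather than total elapsed time: at the time of writing the limit is 10 ms on the Free plan and 30 seconds by default on paid plans (the older Bundled and Unbound usage models had different limits). Time spent waiting on network calls, such as an LLM API request, does not count as CPU time. Check Cloudflare's limits documentation for current values.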
This constraint matters for agents. Long-running tasks must either complete within the timeout or be offloaded to background processing. For customer-facing workflows, you typically want fast responses anyway, so this aligns with good design.
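One way to offload work, sketched below, is the waitUntil method on the Worker's execution context, which keeps the invocation alive for a limited time after the response has been sent. The BackgroundContext interface is a structural stand-in for the real ExecutionContext, and replyAndDefer and QuickReply are hypothetical names used only for illustration.

```typescript
// Structural stand-in for the Workers ExecutionContext; only waitUntil is used here.
interface BackgroundContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical plain-object reply, so the sketch does not depend on Response.
interface QuickReply {
  status: number;
  body: string;
}

// Answer immediately and hand slow follow-up work (logging, notifications)
// to the runtime instead of awaiting it on the response path.
function replyAndDefer(
  ctx: BackgroundContext,
  answer: string,
  followUp: () => Promise<void>
): QuickReply {
  // waitUntil does not delay the return below; errors are caught so a
  // failed follow-up never surfaces as an unhandled rejection.
  ctx.waitUntil(
    followUp().catch(err => console.error('follow-up failed', err))
  );
  return { status: 200, body: JSON.stringify({ answer }) };
}
```

In a real Worker you would pass the ctx argument of fetch and wrap the returned body in a Response. Work that needs more time than waitUntil allows is better suited to a queue or scheduled job.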
Here's a minimal Worker that responds to requests:
export default {
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
const url = new URL(request.url);
if (url.pathname === '/agent/query') {
return handleAgentQuery(request, env);
}
return new Response('Not Found', { status: 404 });
},
};
async function handleAgentQuery(request: Request, env: Env): Promise<Response> {
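// request.json() is untyped, so validate the shape before using it
const body = (await request.json()) as { query?: unknown };
if (typeof body.query !== 'string') {
return new Response('Missing "query" field', { status: 400 });
}
const query = body.query;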
// Agent logic goes here
const result = await processWithAgent(query, env);
return new Response(JSON.stringify(result), {
headers: { 'Content-Type': 'application/json' },
});
}
Test locally with wrangler dev, which starts a local server mimicking Cloudflare's environment. This is critical: testing against production immediately is slow and expensive.
Now that your environment is set up, you need to structure how agents actually run. Edge agents differ from traditional server-based agents because they're stateless and must complete quickly.
Agent as a Function
Think of each agent as a pure function: it takes input, performs reasoning or computation, and returns output. No side effects, no persistent state within the agent itself. State lives outside the agent-in a database, cache, or external service.
Here's a simple example of an agent handler:
interface AgentRequest {
query: string;
context?: Record<string, unknown>;
agentId: string;
}
interface AgentResponse {
result: string;
reasoning: string;
executionTime: number;
}
async function runAgent(
request: AgentRequest,
env: Env
): Promise<AgentResponse> {
const startTime = Date.now();
// Fetch agent configuration from KV (Cloudflare's key-value store)
const agentConfig = await env.AGENT_KV.get(`agent:${request.agentId}`);
if (!agentConfig) {
throw new Error(`Agent ${request.agentId} not found`);
}
const config = JSON.parse(agentConfig);
// Call the AI model (e.g., OpenAI, Claude via API)
const aiResponse = await callAIModel({
query: request.query,
systemPrompt: config.systemPrompt,
context: request.context,
}, env);
const executionTime = Date.now() - startTime;
return {
result: aiResponse.content,
reasoning: aiResponse.reasoning || '',
executionTime,
};
}
This agent accepts a query, retrieves its configuration from Cloudflare KV (a distributed key-value store), calls an AI model, and returns the result. The entire operation is stateless: the agent doesn't maintain memory between invocations.
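The handler above parses the stored configuration with a bare JSON.parse, so a malformed or incomplete entry only fails later, inside the model call. A minimal validation sketch follows; the AgentConfig shape is an assumption (the article only shows that systemPrompt is read from it) and parseAgentConfig is a hypothetical helper.

```typescript
// Assumed config shape: only `systemPrompt` is known from the handler above.
interface AgentConfig {
  systemPrompt: string;
}

// Turn the raw KV value into a validated config, or throw a descriptive error.
function parseAgentConfig(raw: string | null, agentId: string): AgentConfig {
  if (raw === null) {
    throw new Error(`Agent ${agentId} not found`);
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(`Agent ${agentId} has a config that is not valid JSON`);
  }
  // Optional chaining covers null and primitive JSON values such as 42.
  const prompt = (parsed as { systemPrompt?: unknown } | null)?.systemPrompt;
  if (typeof prompt !== 'string') {
    throw new Error(`Agent ${agentId} config is missing a string systemPrompt`);
  }
  return { systemPrompt: prompt };
}
```

Failing early with the agent ID in the message makes a bad KV entry much easier to trace than a generic model API error.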
Integrating with AI Model APIs
Most agents need to call an LLM (Large Language Model). You can call external APIs (OpenAI, Anthropic, etc.) directly from your Worker:
async function callAIModel(
input: { query: string; systemPrompt: string; context?: unknown },
env: Env
): Promise<{ content: string; reasoning: string }> {
const apiKey = env.OPENAI_API_KEY;
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${apiKey}`,
},
body: JSON.stringify({
model: 'gpt-4',
messages: [
{ role: 'system', content: input.systemPrompt },
{ role: 'user', content: input.query },
],
temperature: 0.7,
}),
});
if (!response.ok) {
throw new Error(`AI API error: ${response.statusText}`);
}
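// response.json() is untyped; describe only the fields read here
const data = (await response.json()) as {
choices: Array<{ message: { content: string } }>;
};
const content = data.choices[0].message.content;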
return {
content,
reasoning: '', // Some models return reasoning separately
};
}
Store your API keys as encrypted secrets (set with wrangler secret put or in the dashboard) rather than as plain-text variables in wrangler.toml. Never hardcode secrets.
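The examples in this guide pass an Env object around without defining it. The sketch below shows one possible shape, inferred from the bindings the snippets use, plus a hypothetical missingBindings helper that reports unset values. KVLike is a structural stand-in; a real project would more likely use the types from @cloudflare/workers-types, and the D1 binding is omitted here for brevity.

```typescript
// Structural stand-in for a KV namespace binding (assumption for this sketch).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Binding names taken from the snippets in this guide.
interface Env {
  AGENT_KV: KVLike;
  OPENAI_API_KEY: string;
  CRM_BASE_URL: string;
  CRM_API_KEY: string;
  SLACK_WEBHOOK_URL: string;
  LOGGING_ENDPOINT: string;
}

const REQUIRED_BINDINGS: ReadonlyArray<keyof Env> = [
  'AGENT_KV',
  'OPENAI_API_KEY',
  'CRM_BASE_URL',
  'CRM_API_KEY',
  'SLACK_WEBHOOK_URL',
  'LOGGING_ENDPOINT',
];

// Return the names of bindings that are undefined or empty strings.
function missingBindings(env: Partial<Env>): string[] {
  return REQUIRED_BINDINGS.filter(
    name => env[name] === undefined || env[name] === ''
  );
}
```

Calling missingBindings(env) at the top of the fetch handler and returning a 500 with the missing names is one way to surface a misconfigured environment before any agent runs.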
Handling Multiple Agents and Routing
Most real workflows involve multiple agents, each specialized for different tasks. You need a router that directs work to the right agent:
async function routeToAgent(
query: string,
context: Record<string, unknown>,
env: Env
): Promise<AgentResponse> {
// Classify the query to determine which agent should handle it
const classification = await classifyQuery(query, env);
let agentId: string;
switch (classification.type) {
case 'customer-support':
agentId = 'support-agent';
break;
case 'data-analysis':
agentId = 'analytics-agent';
break;
case 'content-generation':
agentId = 'writer-agent';
break;
default:
agentId = 'general-agent';
}
const agentRequest: AgentRequest = {
query,
context,
agentId,
};
return runAgent(agentRequest, env);
}
async function classifyQuery(
query: string,
env: Env
): Promise<{ type: string; confidence: number }> {
// Simple classification using a lightweight model or keyword matching
// In production, you might use a dedicated classification service
if (query.includes('help') || query.includes('support')) {
return { type: 'customer-support', confidence: 0.9 };
}
if (query.includes('analyze') || query.includes('data')) {
return { type: 'data-analysis', confidence: 0.85 };
}
if (query.includes('write') || query.includes('generate')) {
return { type: 'content-generation', confidence: 0.8 };
}
return { type: 'general', confidence: 0.5 };
}
This router classifies incoming queries and dispatches them to specialized agents. Each agent is optimized for its domain, improving both speed and quality.
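The keyword checks above are case-sensitive and match substrings, so "HELP" is missed and "unhelpful" is routed to support. A table-driven variant that lowercases the query and matches whole words is sketched below; the rule table mirrors the keywords in the article, while the names QueryType, RULES, and classifyByKeyword are illustrative.

```typescript
type QueryType =
  | 'customer-support'
  | 'data-analysis'
  | 'content-generation'
  | 'general';

interface Rule {
  type: QueryType;
  keywords: string[];
  confidence: number;
}

// Rules are checked in order; the first match wins.
const RULES: Rule[] = [
  { type: 'customer-support', keywords: ['help', 'support'], confidence: 0.9 },
  { type: 'data-analysis', keywords: ['analyze', 'data'], confidence: 0.85 },
  { type: 'content-generation', keywords: ['write', 'generate'], confidence: 0.8 },
];

function classifyByKeyword(query: string): { type: QueryType; confidence: number } {
  // Lowercase, then split on anything that is not a letter or digit.
  const words = new Set(
    query.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
  );
  for (const rule of RULES) {
    if (rule.keywords.some(keyword => words.has(keyword))) {
      return { type: rule.type, confidence: rule.confidence };
    }
  }
  return { type: 'general', confidence: 0.5 };
}
```

Whole-word matching trades recall for precision: "analyzing" no longer matches "analyze", so a production classifier would likely add stemming or use a model.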
Edge agents are stateless by design, but they often need to maintain context across multiple interactions. This requires external state storage.
Using Cloudflare KV for State
Cloudflare KV is a globally distributed key-value store. It's perfect for caching agent configurations, conversation histories, and user context:
interface ConversationState {
userId: string;
messages: Array<{ role: string; content: string }>;
metadata: Record<string, unknown>;
lastUpdated: number;
}
async function loadConversationState(
userId: string,
env: Env
): Promise<ConversationState | null> {
const key = `conversation:${userId}`;
const stored = await env.AGENT_KV.get(key);
if (!stored) {
return null;
}
return JSON.parse(stored);
}
async function saveConversationState(
state: ConversationState,
env: Env
): Promise<void> {
const key = `conversation:${state.userId}`;
const ttl = 86400 * 7; // 7 days
await env.AGENT_KV.put(key, JSON.stringify(state), {
expirationTtl: ttl,
});
}
async function handleAgentWithMemory(
userId: string,
query: string,
env: Env
): Promise<AgentResponse> {
// Load existing conversation
let state = await loadConversationState(userId, env);
if (!state) {
state = {
userId,
messages: [],
metadata: {},
lastUpdated: Date.now(),
};
}
// Add user message to history
state.messages.push({ role: 'user', content: query });
// Call agent with full conversation history
const response = await runAgent(
{
query,
context: { messages: state.messages, metadata: state.metadata },
agentId: 'conversation-agent',
},
env
);
// Add agent response to history
state.messages.push({ role: 'assistant', content: response.result });
// Save updated state
await saveConversationState(state, env);
return response;
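}
KV replicates globally and caches values close to where they are read, so reads of frequently accessed keys are fast. Writes are eventually consistent: Cloudflare documents that a change can take up to 60 seconds or more to become visible in other locations. That is fine for agent configuration, but a conversation whose consecutive requests land in different locations can briefly read stale history, and concurrent writes to the same key overwrite each other.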
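Note that the callAIModel helper shown earlier sends only the system prompt and the current query, so the history stored in context never reaches the model. A sketch of how the stored messages could be turned into the messages array follows. It assumes, as in handleAgentWithMemory, that the latest user message is already the last entry in the history; buildMessages and the maxHistory cap are hypothetical.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Build the message list for a chat completion call from stored history.
// `history` is expected to already end with the current user message.
function buildMessages(
  systemPrompt: string,
  history: Array<{ role: string; content: string }>,
  maxHistory = 20 // illustrative cap to bound prompt size
): ChatMessage[] {
  // Keep only roles the chat API understands in a conversation.
  const turns = history.filter(
    (m): m is ChatMessage => m.role === 'user' || m.role === 'assistant'
  );
  // slice(-0) would return everything, so handle a zero cap explicitly.
  const recent = maxHistory > 0 ? turns.slice(-maxHistory) : [];
  return [{ role: 'system', content: systemPrompt }, ...recent];
}
```

The result can replace the two-element messages array in the request body, which gives the model the conversation so far while keeping the prompt bounded.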
Using D1 for Structured Data
For more complex data (user profiles, transaction logs, agent execution records), use Cloudflare D1, a SQLite database at the edge:
async function logAgentExecution(
agentId: string,
query: string,
result: string,
executionTime: number,
env: Env
): Promise<void> {
const db = env.DB; // D1 binding
await db.prepare(`
INSERT INTO agent_executions (agent_id, query, result, execution_time, created_at)
VALUES (?, ?, ?, ?, ?)
`).bind(agentId, query, result, executionTime, new Date().toISOString()).run();
}
async function getAgentMetrics(
agentId: string,
env: Env
): Promise<{ avgExecutionTime: number; totalExecutions: number }> {
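const db = env.DB;
// created_at is stored as an ISO 8601 string, so compare it with an ISO cutoff;
// SQLite's datetime('now', ...) returns a different text format
const cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();
const result = await db.prepare(`
SELECT
COUNT(*) as total_executions,
AVG(execution_time) as avg_execution_time
FROM agent_executions
WHERE agent_id = ? AND created_at > ?
`).bind(agentId, cutoff).first();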
return {
avgExecutionTime: result?.avg_execution_time || 0,
totalExecutions: result?.total_executions || 0,
};
}
D1 is ideal for analytics, audit trails, and structured data that agents need to query or update.
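The queries above assume an agent_executions table that the article never defines. One plausible schema is sketched below as a TypeScript constant; the column types, the id column, and the index are assumptions chosen to match the INSERT and SELECT statements, and schemaStatements is a hypothetical helper for applying the statements one at a time.

```typescript
// Assumed schema matching the columns used by logAgentExecution and getAgentMetrics.
const AGENT_EXECUTIONS_SCHEMA = `
CREATE TABLE IF NOT EXISTS agent_executions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  agent_id TEXT NOT NULL,
  query TEXT NOT NULL,
  result TEXT NOT NULL,
  execution_time INTEGER NOT NULL, -- milliseconds
  created_at TEXT NOT NULL -- ISO 8601 timestamp
);
CREATE INDEX IF NOT EXISTS idx_agent_executions_agent_time
  ON agent_executions (agent_id, created_at);
`;

// Split the schema into individual statements (naive: assumes no ';' inside them).
function schemaStatements(schema: string): string[] {
  return schema
    .split(';')
    .map(statement => statement.trim())
    .filter(statement => statement.length > 0);
}
```

The composite index on agent_id and created_at supports the per-agent, time-windowed metrics query. Apply the statements with whatever schema or migration workflow you already use for D1.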
Single agents are useful, but real business value comes from coordinated agent teams. One agent might gather data, another analyzes it, a third formats results for presentation. Orchestration ties these together.
Workflow Patterns
Common patterns include:
Here's a sequential workflow:
interface WorkflowStep {
agentId: string;
input: unknown;
retries?: number;
}
interface WorkflowExecution {
steps: WorkflowStep[];
initialInput: unknown;
}
async function executeWorkflow(
workflow: WorkflowExecution,
env: Env
): Promise<unknown> {
let currentOutput = workflow.initialInput;
for (const step of workflow.steps) {
const retries = step.retries || 3;
let lastError: Error | null = null;
for (let attempt = 0; attempt < retries; attempt++) {
try {
const response = await runAgent(
{
query: JSON.stringify(currentOutput),
agentId: step.agentId,
},
env
);
currentOutput = response.result;
lastError = null; // Clear the error from any earlier failed attempt
break; // Success, move to next step
} catch (error) {
lastError = error as Error;
if (attempt < retries - 1) {
// Wait before retrying
await new Promise(resolve => setTimeout(resolve, 1000 * (attempt + 1)));
}
}
}
if (lastError) {
throw new Error(`Workflow failed at step ${step.agentId}: ${lastError.message}`);
}
}
return currentOutput;
}
For parallel workflows, use Promise.all():
async function executeParallelAgents(
agentIds: string[],
input: unknown,
env: Env
): Promise<Record<string, unknown>> {
const promises = agentIds.map(agentId =>
runAgent(
{ query: JSON.stringify(input), agentId },
env
).then(response => ({ [agentId]: response.result }))
);
const results = await Promise.all(promises);
return Object.assign({}, ...results);
}
For more complex orchestration (conditional routing, dynamic workflows, multi-step feedback loops), consider using a dedicated orchestration platform. Padiso's agent orchestration platform handles exactly this: it manages multi-agent workflows, state transitions, error handling, and monitoring without requiring you to rebuild orchestration logic.
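Promise.all rejects as soon as any one agent fails, which discards the results of the agents that succeeded. When partial results are still useful, Promise.allSettled is an alternative, sketched here. The run callback stands in for a call such as runAgent, and the names Settled, ParallelOutcome, summarizeSettled, and runAllTolerant are illustrative.

```typescript
// Local mirror of the result shape produced by Promise.allSettled.
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

interface ParallelOutcome {
  results: Record<string, string>;
  failures: Record<string, string>;
}

// Pair each settled result with its agent ID (same order as the input).
function summarizeSettled(
  agentIds: string[],
  settled: Array<Settled<string>>
): ParallelOutcome {
  const outcome: ParallelOutcome = { results: {}, failures: {} };
  settled.forEach((entry, index) => {
    const agentId = agentIds[index];
    if (entry.status === 'fulfilled') {
      outcome.results[agentId] = entry.value;
    } else {
      const reason = entry.reason;
      outcome.failures[agentId] =
        reason instanceof Error ? reason.message : String(reason);
    }
  });
  return outcome;
}

// Run every agent and collect successes and failures separately.
async function runAllTolerant(
  agentIds: string[],
  run: (agentId: string) => Promise<string>
): Promise<ParallelOutcome> {
  const settled = await Promise.allSettled(agentIds.map(id => run(id)));
  return summarizeSettled(agentIds, settled);
}
```

The caller can then decide whether the failures are fatal or whether the partial results are enough to answer the user.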
Agents rarely operate in isolation. They need to fetch data, trigger actions, and integrate with your existing systems.
API Integration Patterns
Cloudflare Workers can call any external API. Here's a pattern for safely integrating third-party services:
interface IntegrationConfig {
baseUrl: string;
apiKey: string;
timeout: number;
}
async function callIntegration(
config: IntegrationConfig,
endpoint: string,
method: string,
body?: unknown
): Promise<unknown> {
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), config.timeout);
try {
const response = await fetch(`${config.baseUrl}${endpoint}`, {
method,
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${config.apiKey}`,
},
body: body ? JSON.stringify(body) : undefined,
signal: controller.signal,
});
if (!response.ok) {
throw new Error(`Integration error: ${response.status}`);
}
return response.json();
} finally {
clearTimeout(timeoutId);
}
}
// Example: Agent that fetches customer data
async function customerDataAgent(
customerId: string,
env: Env
): Promise<AgentResponse> {
const crmConfig: IntegrationConfig = {
baseUrl: env.CRM_BASE_URL,
apiKey: env.CRM_API_KEY,
timeout: 5000,
};
const customerData = await callIntegration(
crmConfig,
`/customers/${customerId}`,
'GET'
);
return {
result: JSON.stringify(customerData),
reasoning: 'Fetched customer data from CRM',
executionTime: 0,
};
}
Webhook Triggers and Callbacks
Agents can trigger external actions via webhooks. This enables workflows like "Agent analyzes data, then triggers a Slack notification":
async function notifySlack(
channel: string,
message: string,
env: Env
): Promise<void> {
await fetch(env.SLACK_WEBHOOK_URL, {
method: 'POST',
body: JSON.stringify({
channel,
text: message,
}),
});
}
async function agentWithNotification(
query: string,
env: Env
): Promise<AgentResponse> {
const response = await runAgent(
{ query, agentId: 'analysis-agent' },
env
);
// Notify team of important results
if (response.result.includes('CRITICAL')) {
await notifySlack('#alerts', `Critical finding: ${response.result}`, env);
}
return response;
}
When agents run in production, you need visibility into what's actually happening. Edge deployments make this trickier because you don't have a central server to log to.
Structured Logging
Cloudflare Workers integrates with Cloudflare's Logpush service, which sends logs to your preferred analytics platform. But even without that, you can log to external services:
interface ExecutionLog {
timestamp: string;
agentId: string;
query: string;
result: string;
executionTime: number;
status: 'success' | 'error';
error?: string;
}
async function logExecution(
log: ExecutionLog,
env: Env
): Promise<void> {
// Send to external logging service
await fetch(env.LOGGING_ENDPOINT, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(log),
});
// Also store in D1 for local querying
const db = env.DB;
await db.prepare(`
INSERT INTO execution_logs (agent_id, query, result, execution_time, status, error, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
`).bind(
log.agentId,
log.query,
log.result,
log.executionTime,
log.status,
log.error || null,
log.timestamp
).run();
}
Metrics and Performance Tracking
Track key metrics to understand agent performance:
interface AgentMetrics {
agentId: string;
successRate: number;
avgExecutionTime: number;
errorCount: number;
lastExecuted: string;
}
async function recordMetric(
agentId: string,
executionTime: number,
success: boolean,
env: Env
): Promise<void> {
const key = `metrics:${agentId}:${new Date().toISOString().split('T')[0]}`;
let metrics = await env.AGENT_KV.get(key);
let data = metrics ? JSON.parse(metrics) : {
totalExecutions: 0,
successCount: 0,
totalTime: 0,
};
data.totalExecutions += 1;
if (success) data.successCount += 1;
data.totalTime += executionTime;
await env.AGENT_KV.put(key, JSON.stringify(data));
}
async function getMetrics(
agentId: string,
env: Env
): Promise<AgentMetrics> {
const today = new Date().toISOString().split('T')[0];
const key = `metrics:${agentId}:${today}`;
const stored = await env.AGENT_KV.get(key);
const data = stored ? JSON.parse(stored) : {
totalExecutions: 0,
successCount: 0,
totalTime: 0,
};
return {
agentId,
successRate: data.totalExecutions > 0 ? data.successCount / data.totalExecutions : 0,
avgExecutionTime: data.totalExecutions > 0 ? data.totalTime / data.totalExecutions : 0,
errorCount: data.totalExecutions - data.successCount,
lastExecuted: new Date().toISOString(),
};
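}
Use these metrics to alert on anomalies: sudden increases in error rates, execution time spikes, or agents going silent. Because the KV read-modify-write in recordMetric is not atomic, concurrent requests can drop increments, so treat the counts as approximate.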
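A simple threshold check over the values returned by getMetrics might look like the sketch below. The threshold fields and the findAnomalies name are hypothetical. The check relies on getMetrics reporting a success rate of 0 with an error count of 0 when nothing has run, which is how it distinguishes a silent agent from one that is failing.

```typescript
// Subset of the AgentMetrics fields used by this check.
interface MetricsSnapshot {
  agentId: string;
  successRate: number; // fraction between 0 and 1
  avgExecutionTime: number; // milliseconds
  errorCount: number;
}

// Illustrative thresholds; tune them per agent.
interface AlertThresholds {
  minSuccessRate: number;
  maxAvgExecutionTimeMs: number;
}

function findAnomalies(m: MetricsSnapshot, t: AlertThresholds): string[] {
  // No successes and no errors means nothing ran in this window.
  if (m.successRate === 0 && m.errorCount === 0) {
    return [`${m.agentId}: no executions recorded (agent may be silent)`];
  }
  const alerts: string[] = [];
  if (m.successRate < t.minSuccessRate) {
    alerts.push(
      `${m.agentId}: success rate ${(m.successRate * 100).toFixed(1)}% is below threshold`
    );
  }
  if (m.avgExecutionTime > t.maxAvgExecutionTimeMs) {
    alerts.push(
      `${m.agentId}: average execution time ${Math.round(m.avgExecutionTime)} ms is above threshold`
    );
  }
  return alerts;
}
```

The returned strings can be forwarded to a notification helper such as notifySlack, for example from a scheduled Worker.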
Moving from local development to production requires careful attention to configuration, secrets, and deployment strategy.
Environment Configuration
Use separate environments for development, staging, and production. Define them in wrangler.toml:
[env.development]
name = "agent-worker-dev"
route = "https://dev.example.com/*"
[env.staging]
name = "agent-worker-staging"
route = "https://staging.example.com/*"
[env.production]
name = "agent-worker-prod"
route = "https://agents.example.com/*"
Deploy to each environment separately:
wrangler deploy --env development
wrangler deploy --env staging
wrangler deploy --env production
Secrets Management
Store sensitive values (API keys, database credentials) as secrets:
wrangler secret put OPENAI_API_KEY --env production
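wrangler secret put DATABASE_URL --env production
Secrets are encrypted and kept out of your source code and wrangler.toml. They are still available to your Worker as plain strings on env, so avoid writing them to logs.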
Rate Limiting and Abuse Prevention
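Edge agents can be abused. Implement rate limiting. The KV-backed counter here is a coarse guard rather than a strict limit, because KV reads and writes are not atomic and are eventually consistent: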
async function checkRateLimit(
userId: string,
env: Env
): Promise<boolean> {
const key = `ratelimit:${userId}`;
const current = await env.AGENT_KV.get(key);
const count = current ? parseInt(current, 10) : 0;
if (count >= 100) { // Roughly 100 requests per hour; the TTL restarts on every write
return false;
}
await env.AGENT_KV.put(key, String(count + 1), {
expirationTtl: 3600,
});
return true;
}
async function handleAgentRequest(
request: Request,
env: Env
): Promise<Response> {
const userId = request.headers.get('X-User-ID') || 'anonymous';
if (!(await checkRateLimit(userId, env))) {
return new Response('Rate limit exceeded', { status: 429 });
}
// Process request
const body = await request.json();
const response = await routeToAgent(body.query, {}, env);
return new Response(JSON.stringify(response), {
headers: { 'Content-Type': 'application/json' },
});
}
Error Handling and Resilience
Production agents fail. Plan for it:
async function executeWithFallback(
primaryAgentId: string,
fallbackAgentId: string,
query: string,
env: Env
): Promise<AgentResponse> {
try {
return await runAgent(
{ query, agentId: primaryAgentId },
env
);
} catch (error) {
console.error(`Primary agent failed: ${error}`);
try {
return await runAgent(
{ query, agentId: fallbackAgentId },
env
);
} catch (fallbackError) {
console.error(`Fallback agent also failed: ${fallbackError}`);
return {
result: 'Service temporarily unavailable. Please try again.',
reasoning: 'Both primary and fallback agents failed',
executionTime: 0,
};
}
}
}
While Cloudflare Workers provides the runtime, managing complex multi-agent workflows at scale requires orchestration. This is where platforms like Padiso add value.
Padiso handles:
You can deploy Cloudflare Workers as part of a broader Padiso orchestration setup. Workers handle the edge execution layer, while Padiso manages coordination, state, and monitoring.
To integrate Cloudflare Workers with Padiso, expose your Workers as HTTP endpoints and register them with Padiso. Padiso then routes work to your Workers, manages state between invocations, and provides observability across your entire agent system.
This architecture is particularly powerful for founders building headless companies. Your agents run at the edge for low latency, but they're coordinated through Padiso, giving you the operational visibility and control needed to run a business entirely on autonomous agents.
As you move toward production, keep these principles in mind:
Start Simple, Scale Gradually
Begin with a single agent handling one task. Get that working reliably, then add complexity. Cloudflare Workers scales automatically, but your orchestration logic shouldn't be more complex than it needs to be.
Monitor Everything
You can't fix what you can't see. Log execution details, track metrics, and set up alerts for anomalies. This is especially important for always-on agents where problems might go unnoticed for hours.
Design for Failure
Assume agents will fail. Build retries, fallbacks, and graceful degradation into your workflows. A failed agent should return a sensible error message, not break the entire workflow.
Optimize for Latency
Edge deployment is about speed. Keep agent logic lean. Offload heavy computation to background jobs. Cache results aggressively. Every millisecond matters for customer-facing workflows.
Use Proper Abstractions
Don't hardcode agent logic into your Workers. Create reusable agent handlers, workflow patterns, and integration utilities. This makes your code maintainable and testable.
Plan for State
Stateless execution is great for scalability, but agents need context. Design your state storage strategy early. KV for caching, D1 for structured data, and external services for specialized needs.
Deploying agents to the edge changes how you think about automation. Instead of centralized servers running your business logic, you get distributed, always-on agents running closer to your users and data.
Cloudflare Workers provides the infrastructure. You've learned how to build agents as stateless functions, manage state externally, orchestrate multi-agent workflows, integrate with external services, and monitor everything in production.
But Workers alone isn't enough for complex agent systems. As your agent team grows (handling multiple workflows, managing state across invocations, coordinating with external services), you need orchestration. Explore Padiso's agent orchestration platform to see how it complements Workers-based deployments, providing the coordination layer that turns agent teams into reliable, observable business infrastructure.
The future of business operations is agents running 24/7, making decisions, integrating systems, and scaling without human intervention. Edge deployment is how you make that future real, with low latency, high availability, and zero infrastructure overhead.
Start building today. Deploy your first agent to Cloudflare Workers, monitor it, learn what works, and scale from there. The tools are simple. The possibilities are limitless.