Agentic Workflows, Agents, and Multi-Agent Systems

Start simple

Most teams building with LLMs jump straight to agents. I did too. After building a multi-agent orchestrator and pushing it to production, here's what I've learned: the right pattern depends on the problem, and agents are rarely the right first choice.

Anthropic’s engineering team says the same thing: start simple, add complexity only when you have to.

Compare and contrast

Pattern Breakdown

Workflow

You define the path. The LLM is one step in it.

Every step is defined in code before anything runs. You decide the path; the LLM executes it. Input goes in, output comes out, the same way every time. Prompt chaining, routing, and parallelization are all workflow patterns from Anthropic's "Building Effective Agents" guide.

In practice: A customer signs up. Your system verifies their email, calls an LLM to generate a personalized welcome message based on their profile, creates the account, and sends the email. Every customer hits the same steps in the same order. If the email verification fails, you know exactly where it broke. You can tell your team exactly how much each signup costs in API calls because the path never changes. This is where most production AI should live.
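The signup flow above can be sketched as a plain workflow, where the LLM is just one deterministic step in a fixed path. `call_llm`, `verify_email`, and the other helpers are hypothetical stand-ins, not a real API:

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"Welcome aboard, {prompt.split()[-1]}!"

def verify_email(email: str) -> bool:
    # Placeholder check; a real system would send a verification link.
    return "@" in email

def signup_workflow(email: str, name: str) -> dict:
    # Fixed path: every customer hits the same steps in the same order.
    if not verify_email(email):
        # If verification fails, you know exactly where it broke.
        raise ValueError(f"verification failed for {email}")
    message = call_llm(f"Write a one-line welcome for {name}")  # the LLM is one step in the path
    account = {"email": email, "name": name}
    return {"account": account, "welcome": message}

result = signup_workflow("ada@example.com", "Ada")
```

Because the path never changes, the cost per signup is one LLM call, every time, which is what makes workflow spend predictable.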

Agentic Workflow

You define the structure. The LLM decides when to retry.

A structured flow, but the AI has room to make decisions inside it. Plan, execute, reflect, and loop back if the result isn’t good enough. The structure keeps it on track. The AI gives it flexibility.

The LLM might call tools during execution, but you decide which tools are available at each step.

In practice: You’re building a code review system. The AI reads a pull request, plans which files to focus on, runs linting and security checks, then reflects on the results. If it finds issues, it loops back: re-analyzes the flagged code, generates a more specific review, and checks again. The overall flow is defined (read, check, review) but the AI decides how deep to go and whether to loop. If the first pass is clean, it moves on. If it finds a SQL injection risk, it digs in. This is the sweet spot for teams that have outgrown pure workflows but don’t need full agent autonomy.
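One way to sketch that loop: the read/check/review structure is fixed in code, and the model only decides whether another pass is needed. `run_checks` and `reflect` are illustrative stubs standing in for real tooling and an LLM call:

```python
def run_checks(code: str) -> list[str]:
    # Hypothetical lint/security pass; flags a naive f-string SQL pattern.
    return ["possible SQL injection"] if 'f"SELECT' in code else []

def reflect(issues: list[str]) -> bool:
    # Stand-in for the LLM judging whether another pass is worth it.
    return bool(issues)

def review(code: str, max_loops: int = 3) -> list[str]:
    findings: list[str] = []
    for _ in range(max_loops):       # the structure caps the loop
        issues = run_checks(code)    # execute the defined check step
        findings.extend(issues)
        if not reflect(issues):      # the AI decides whether to dig deeper
            break
        code = code.replace('f"SELECT', '"SELECT')  # illustrative remediation step
    return findings
```

Note the `max_loops` cap: the flexibility lives inside a boundary you wrote, which is what keeps an agentic workflow from becoming an unbounded agent.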

Agent

You give it tools and a system prompt. The LLM decides which tool to use.

The model decides what to do next. You give it a set of tools and guardrails, not a path. It chooses which tool to call, reads the result, and decides the next step on its own. This is the key difference from an agentic workflow: instead of you defining when tools get called, the LLM picks from the full toolbox on every turn.

In practice: A customer asks your support agent “I was charged twice for my last order.” The agent doesn’t follow a script. It decides to look up the customer’s account, finds the order, checks the payment history, sees two charges, determines one is a duplicate, and initiates a refund. If the customer had asked a different question, the agent would have taken a completely different path.

The thing that separates good agents from bad ones is tool design. If your “lookup order” tool returns clean, structured data, the agent makes good decisions. If it returns a messy blob, the agent gets confused and picks the wrong next step. A poorly described tool will send your agent down the wrong path before it even starts reasoning. Spend your time making tools obvious to use, not picking between agent frameworks.
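Concretely, "obvious to use" often comes down to the tool definition itself. A hypothetical `lookup_order` tool, written in the JSON-schema style most tool-calling APIs accept, with a precise description and a structured return instead of a text blob:

```python
# Hypothetical tool definition; field names follow common tool-calling conventions.
lookup_order_tool = {
    "name": "lookup_order",
    "description": (
        "Look up a single order by its ID. Returns structured fields "
        "(status, list of charges) rather than raw text."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Order ID, e.g. 'A-123'"},
        },
        "required": ["order_id"],
    },
}

def lookup_order(order_id: str) -> dict:
    # Clean, structured output the model can reason over directly.
    return {"order_id": order_id, "status": "paid", "charges": [49.99, 49.99]}
```

With two identical charges visible as a structured list, deciding "one of these is a duplicate" is a short step; buried in a paragraph of log text, it's a coin flip.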

Multi-Agent (MAS)

Multiple agents, each with their own prompt and tools.

Multiple specialized agents working together, each owning a domain. An orchestrator delegates work, agents execute in their area of expertise, and results converge into a final output.

The key thing to know: multi-agent works best when the problem breaks into parallel strands that don’t depend on each other. Tasks where each piece can be researched or processed independently are a natural fit. Tasks that are tightly interdependent, where every step relies on the previous one, are better served by a single agent or an agentic workflow.

The cost is real. Anthropic’s own multi-agent research system performed significantly better than a single agent, but consumed roughly 15x the tokens. That tradeoff is worth it when the quality gain matters, like deep research or high-stakes analysis. It’s not worth it for tasks where a single well-prompted agent gets you 90% of the way there.

In practice: A company needs to process a large contract. One agent can’t do this well because the tasks require different reasoning. A document extraction agent pulls out key terms, dates, and obligations. A compliance agent checks those terms against regulatory requirements. A risk assessment agent flags problematic clauses based on historical litigation data. Each agent has different tools, different system prompts, and different expertise. The extraction agent needs OCR and parsing tools. The compliance agent needs access to regulatory databases. The risk agent needs case law. Cramming all of that into one prompt would dilute every capability. The orchestrator coordinates the handoffs and assembles the final report.

Each subagent needs a clear objective, defined boundaries, and an expected output format. Without that clarity, agents duplicate each other’s work or leave gaps. The orchestrator’s job isn’t just delegation. It’s memory management, preventing token overflow, and knowing when to stop.
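The orchestration itself can stay simple. A sketch of the contract example, with each subagent stubbed as a function that has a clear objective and a fixed output format, and an orchestrator that delegates and assembles; every name and return value here is hypothetical:

```python
def extraction_agent(contract: str) -> dict:
    # Objective: pull key terms and dates. Output format: a fixed dict.
    return {"terms": ["net-30 payment"], "dates": ["2025-01-01"]}

def compliance_agent(terms: list[str]) -> dict:
    # Objective: check terms against regulatory requirements.
    return {"violations": []}

def risk_agent(terms: list[str]) -> dict:
    # Objective: flag problematic clauses.
    return {"flagged_clauses": ["net-30 payment"]}

def orchestrate(contract: str) -> dict:
    extracted = extraction_agent(contract)             # delegate extraction first
    # Compliance and risk don't depend on each other: independent strands
    # that could run in parallel.
    compliance = compliance_agent(extracted["terms"])
    risk = risk_agent(extracted["terms"])
    return {**extracted, **compliance, **risk}         # assemble the final report

report = orchestrate("...contract text...")
```

The fixed output formats are what let the orchestrator merge results without gaps or duplicated work; in a real system it would also trim what each subagent sees to keep token usage in check.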

How to decide

Always pick the simplest pattern that meets your needs. The question is: can you write the control flow yourself, or does the problem require the model to figure it out? Sometimes the model needs to discover which tools to use based on what it finds along the way. You can’t hardcode every path in advance. That’s when you need more autonomy.

References