🛠️ From Tasks to Workflow
When teams think about “agents,” they often imagine something autonomous, intelligent, and complex. But the most effective systems we’ve seen start much simpler — with predictable, well-structured workflows.
Workflows give you reliability and control. They allow you to define the exact steps your system should take, when to call an LLM, and when to trust it. Think of them as blueprints for decision-making, not freeform chatbots.
🧱 Composable Patterns That Work
Through customer projects and internal builds, we’ve seen a handful of patterns show up again and again. These aren’t just design ideas — they’re battle-tested ways to reduce cost, improve performance, and gain visibility.
1. Prompt Chaining
Break a task into clear sequential steps:
- Generate → Check → Improve
- Summarize → Extract → Format
Each step can be validated independently. This trades latency for control — and improves reliability.
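As a rough sketch, a Generate → Check → Improve chain might look like the following. The `call_llm` helper, the prompts, and the grading task are our own illustration; here it wraps the OpenAI Python SDK, but any model client works.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; swap in your own model client

def call_llm(prompt: str, model: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def grade_check_improve(answer: str, rubric: str) -> str:
    # Step 1: generate initial feedback from the rubric
    draft = call_llm(f"Grade this answer against the rubric.\nRubric: {rubric}\nAnswer: {answer}")

    # Step 2: check the draft independently; a cheap place to catch problems early
    review = call_llm(
        "Does this feedback match the rubric and stay constructive? "
        f"Reply OK or list the problems.\nFeedback: {draft}"
    )

    # Step 3: improve only when the check flags issues
    if review.strip().upper() != "OK":
        draft = call_llm(
            f"Rewrite the feedback to fix these problems.\nProblems: {review}\nFeedback: {draft}"
        )
    return draft
```

Because each step is a separate call, you can log, evaluate, and swap models per step instead of debugging one opaque prompt.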
2. Routing
Classify the input, then direct it to the right subflow or model:
- FAQ? → cheap model
- Refund request? → call refund API
- Complex issue? → escalate to agent
Routing lets you build specialized logic and optimize for speed + cost.
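A minimal routing sketch under the same assumptions. The classification labels, model names, and the refund and escalation helpers are placeholders we made up for illustration, not anything prescribed by Autogen:

```python
def call_llm(prompt: str, model: str = "gpt-4") -> str:
    """Same placeholder as in the chaining sketch; wraps whichever model client you use."""
    raise NotImplementedError

def start_refund_flow(message: str) -> str:
    raise NotImplementedError  # hypothetical wrapper around your refund API

def escalate_to_human(message: str) -> str:
    raise NotImplementedError  # hypothetical hand-off to a human support queue

def route(message: str) -> str:
    # A small, cheap model is usually enough for the classification step
    label = call_llm(
        "Classify this support message as FAQ, REFUND or COMPLEX. Reply with one word.\n"
        f"Message: {message}",
        model="gpt-4o-mini",
    ).strip().upper()

    if label == "FAQ":
        # High-volume, low-risk path stays on the cheap model
        return call_llm(f"Answer briefly using the FAQ knowledge base:\n{message}", model="gpt-4o-mini")
    if label == "REFUND":
        return start_refund_flow(message)
    return escalate_to_human(message)
```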
3. Parallelization
Split the work:
- Run multiple LLMs in parallel on different subtasks (sectioning)
- Run the same prompt multiple times and combine the output (voting)
This improves speed, confidence, and robustness.
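Both variants fit in a few lines. In this sketch, threads stand in for whatever concurrency your client supports, and `call_llm` is the same placeholder as before:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Placeholder for your model client, as in the chaining sketch."""
    raise NotImplementedError

# Sectioning: different subtasks fan out to parallel calls, then results are merged.
def grade_sections(answer: str, rubric_sections: list[str]) -> list[str]:
    prompts = [f"Grade this answer on '{section}' only:\n{answer}" for section in rubric_sections]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call_llm, prompts))

# Voting: the same prompt runs several times and the majority answer wins.
def vote_pass_fail(answer: str, runs: int = 5) -> str:
    prompt = f"Does this answer meet the rubric? Reply PASS or FAIL.\n{answer}"
    with ThreadPoolExecutor() as pool:
        votes = [v.strip().upper() for v in pool.map(call_llm, [prompt] * runs)]
    return Counter(votes).most_common(1)[0][0]
```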
4. Evaluator-Optimizer Loops
One agent proposes, another reviews:
- “Write a draft” → “Check for tone and accuracy”
- “Suggest changes” → “Evaluate if they’re better”
This is useful for creative tasks and continuous improvement cycles.
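A sketch of the loop with a capped number of rounds so it always terminates; the prompts and the APPROVED convention are illustrative:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for your model client, as in the chaining sketch."""
    raise NotImplementedError

def draft_with_review(task: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Write a draft for: {task}")
    for _ in range(max_rounds):
        # Evaluator: a second pass judges the draft against explicit criteria
        critique = call_llm(
            "Check this draft for tone and accuracy. "
            f"Reply APPROVED or list concrete fixes.\nDraft: {draft}"
        )
        if critique.strip().upper().startswith("APPROVED"):
            break
        # Optimizer: the proposer revises using the evaluator's feedback
        draft = call_llm(f"Revise the draft to address this feedback.\nFeedback: {critique}\nDraft: {draft}")
    return draft
```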
5. Orchestrator–Workers
An LLM breaks down a problem and delegates subtasks:
- Plan → Delegate → Aggregate
This is useful when the shape of the work changes dynamically.
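A sketch of the Plan → Delegate → Aggregate shape, assuming the orchestrator can be asked to return a JSON list of subtasks (all helpers are placeholders):

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for your model client, as in the chaining sketch."""
    raise NotImplementedError

def orchestrate(problem: str) -> str:
    # Plan: the orchestrator decides the subtasks at run time
    plan = json.loads(call_llm(
        "Break this problem into independent subtasks. "
        f"Return only a JSON list of strings.\nProblem: {problem}"
    ))  # in production, validate this output rather than trusting json.loads blindly

    # Delegate: each subtask goes to a worker call (these could run in parallel)
    results = [call_llm(f"Solve this subtask:\n{subtask}") for subtask in plan]

    # Aggregate: a final call merges the worker outputs into one answer
    return call_llm("Combine these partial results into one coherent answer:\n" + "\n".join(results))
```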
🧠 Designing for the 80/20 of Information Workflows
In most enterprises, 80% of the value comes from 20% of the workflows — the predictable, repetitive tasks that follow a known structure.
For these, we recommend building deterministic workflows using Autogen’s DiGraphGroupChat. They’re easier to test, debug, and optimize.
For the remaining 20% — the fuzzy, complex, or one-off tasks — use Autogen Teams for more flexible, agent-led execution.
This lets you:
- Optimize core tasks with low-latency, high-confidence chains
- Handle exceptions or novel queries with adaptive agents
- Keep evaluation, cost, and control where they matter most
🔁 How We Implement This at Aegis
We use Autogen under the hood — with two core abstractions:
- DiGraphGroupChat: for predictable workflows with defined structure (used in chaining, routing, parallelization, etc.)
- Autogen Teams: for more open-ended agents with autonomy and planning
You can start with DiGraph for reliability, then move to Teams when you need flexibility.
These are both configurable via YAML or JSON, meaning you can define workflows declaratively — and evaluate them consistently.
📄 Example: DiGraph Config for a Grading Workflow
```json
{
  "type": "digraph",
  "nodes": [
    {"name": "marker", "agent": "auto_marker"},
    {"name": "evaluator", "agent": "grade_evaluator"},
    {"name": "finaliser", "agent": "final_check"}
  ],
  "edges": [
    ["marker", "evaluator"],
    ["evaluator", "finaliser"]
  ]
}
```
This defines a structured three-step process:
- `auto_marker` gives an initial grade
- `grade_evaluator` checks tone, rubric alignment, etc.
- `final_check` applies final filters, formatting, and schema checks
This graph can be run, tested, and versioned — with prompt and tool logic externalized in config.
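To make the "run" part concrete, here is a deliberately naive runner for a linear graph like the one above. This is not Autogen's API; the `AGENTS` registry and its identity stand-ins are only there to show how little glue a declarative config needs:

```python
import json

# Illustrative only: a tiny runner that walks the config above in edge order.
# The AGENTS registry and the lambdas are stand-ins, not Autogen's actual API.
AGENTS = {
    "auto_marker": lambda text: text,      # would produce the initial grade
    "grade_evaluator": lambda text: text,  # would check tone and rubric alignment
    "final_check": lambda text: text,      # would apply schema and formatting checks
}

def run_linear_digraph(config_path: str, submission: str) -> str:
    with open(config_path) as f:
        cfg = json.load(f)
    name_to_agent = {node["name"]: AGENTS[node["agent"]] for node in cfg["nodes"]}

    # For a simple chain, the edges already spell out the execution order
    order = [cfg["edges"][0][0]] + [dst for _, dst in cfg["edges"]]

    output = submission
    for node_name in order:
        output = name_to_agent[node_name](output)
    return output
```

Because the graph is just data, the same runner can execute any chain-shaped config you version alongside your prompts.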
For less structured tasks (like debugging unexpected student answers or handling appeals), we switch to Autogen Teams, where agents reason more freely within defined safety limits.
👥 Example: Using Autogen Teams for Open-Ended Tasks
Let’s say your grading workflow fails — the evaluator detects that an answer is borderline or mismatched with rubric expectations. This is where an Autogen Team can take over.
Instead of just re-prompting, you spawn a structured team to reason through:
- Was the rubric misinterpreted?
- Should this be escalated to a human?
- Can we enrich the feedback with citations or examples?
A sample team config might look like:
```json
{
  "type": "team",
  "agents": [
    {"name": "marker", "llm_config": {"model": "gpt-4", "system_message": "Grade based on rubric"}},
    {"name": "moderator", "llm_config": {"model": "gpt-4", "system_message": "Resolve grading conflicts and make final call"}},
    {"name": "explainer", "llm_config": {"model": "gpt-4", "system_message": "Summarize key strengths and improvement areas for the student"}}
  ],
  "coordination": {
    "entry_point": "marker",
    "strategy": "discuss_until_resolved",
    "termination": "moderator_decision"
  }
}
```
This gives your system flexibility when things get fuzzy — without bloating the core workflow.
🏫 LMS Example
🔄 LMS Evaluation Pipeline: Step-by-Step
1. Student submits answer: triggered via the frontend or API; the structured input is validated and passed to the workflow engine.
2. `auto_marker` assigns an initial grade: based on rubric alignment, using a config-driven prompt.
3. `grade_evaluator` checks the response: evaluates tone, justification, and alignment with expected feedback norms.
4. Failure triggers fallback logic: a confidence threshold or rubric mismatch activates a rerun, possibly with a clarification (see the sketch after this list).
5. Optional Autogen Team invocation: when the result is ambiguous, a structured Autogen Team takes over for deliberation and a final call.
6. `final_check` normalizes and formats the result: applies schema validation and optionally appends rationale or scoring metadata.
7. Log and return: results are saved for evaluation metrics (e.g. F1, feedback helpfulness) and returned to the LMS.
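The fallback step hinges on a confidence check before anything escalates. A sketch of that decision logic follows; the threshold value, the result fields, and both runner functions are hypothetical names for your own workflow and team entry points:

```python
CONFIDENCE_THRESHOLD = 0.75  # illustrative value; tune it against your own eval data

def run_grading_workflow(answer: str, rubric: str, clarification: str = "") -> dict:
    raise NotImplementedError  # hypothetical entry point for the DiGraph workflow

def run_grading_team(answer: str, rubric: str) -> dict:
    raise NotImplementedError  # hypothetical entry point for the Autogen Team

def grade_with_fallback(answer: str, rubric: str) -> dict:
    # Structured path first: cheap, predictable, easy to evaluate
    result = run_grading_workflow(answer, rubric)
    if result["confidence"] >= CONFIDENCE_THRESHOLD and result["rubric_match"]:
        return result

    # One clarified rerun keeps most borderline cases in the cheap path
    retry = run_grading_workflow(answer, rubric, clarification="Grade strictly against the rubric sections.")
    if retry["confidence"] >= CONFIDENCE_THRESHOLD:
        return retry

    # Still ambiguous: hand over to the open-ended team for deliberation
    return run_grading_team(answer, rubric)
```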
Let’s say your LMS has an evaluation pipeline:
- A student submits an answer
- The agent grades it, then runs an LLM-based evaluator to verify tone and coverage
- If it fails, it re-prompts with an improvement instruction
- Finally, a lightweight scoring model double-checks consistency
That’s not “an agent.” That’s a workflow. And it’s exactly how you should build your first real system.
Build workflows. Validate outputs. Keep it simple until complexity is earned.
Next: How to give your agents access to data — without leaking everything.