
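What Is an AI Agent?

From Chatbots to Agents

A traditional chatbot can only generate a reply from its input. An AI Agent, by contrast, can perceive its environment, make plans, take actions, and adjust its strategy based on feedback.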

graph LR
    subgraph Traditional Chatbot
        A1[Input] --> A2[LLM] --> A3[Output]
    end
    subgraph AI Agent
        B1[Perceive] --> B2[Think]
        B2 --> B3[Plan]
        B3 --> B4[Act]
        B4 --> B5[Observe]
        B5 --> B2
    end
    style A2 fill:#ffcdd2,stroke:#c62828,stroke-width:2px
    style B2 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style B3 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B4 fill:#fff3e0,stroke:#f57c00,stroke-width:2px

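Key differences:

Dimension	Chatbot	AI Agent
Interaction	Single-turn Q&A	Autonomous multi-step execution
Tool use	None	Calls APIs, writes code, manipulates files
Memory	None or short-term	Working memory + long-term memory
Decision-making	Reactive	Proactive planning
Environment perception	Text input only	Multiple input sources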

Core Components of an Agent

A typical AI Agent is built from four core components:

graph TB
    A[AI Agent] --> B[Brain - LLM]
    A --> C[Tools]
    A --> D[Memory]
    A --> E[Planning]
    B --> B1[Understand the task]
    B --> B2[Reason and decide]
    B --> B3[Generate replies]
    C --> C1[Search engine]
    C --> C2[Code execution]
    C --> C3[File operations]
    C --> C4[API calls]
    D --> D1[Short-term memory - context]
    D --> D2[Long-term memory - vector store]
    E --> E1[Task decomposition]
    E --> E2[Step ordering]
    E --> E3[Retry on failure]
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
    style B fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style C fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style D fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style E fill:#ffcdd2,stroke:#c62828,stroke-width:2px
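The memory component in the diagram splits into short-term memory (the context) and long-term memory (a vector store). As a rough sketch of that split, the hypothetical `AgentMemory` class below keeps the most recent messages in a bounded deque and retrieves stored facts by cosine similarity. It assumes a toy word-count embedding in place of a real embedding model and vector database, so treat it as an illustration of the idea rather than a production design.

```python
import math
from collections import Counter, deque


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag-of-words counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AgentMemory:
    """Hypothetical memory component: short-term window + long-term store."""

    def __init__(self, window: int = 4):
        self.short_term = deque(maxlen=window)  # recent messages (the context)
        self.long_term = []                     # (embedding, text) pairs

    def add_message(self, role: str, content: str) -> None:
        """Append to short-term memory; the oldest message is dropped when full."""
        self.short_term.append({"role": role, "content": content})

    def remember(self, fact: str) -> None:
        """Store a fact in long-term memory together with its embedding."""
        self.long_term.append((embed(fact), fact))

    def recall(self, query: str, k: int = 1) -> list:
        """Return the k stored facts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


memory = AgentMemory(window=2)
memory.remember("the user prefers answers in English")
memory.remember("the project deadline is Friday")
for i in range(3):
    memory.add_message("user", f"message {i}")

print(len(memory.short_term))                  # only the last 2 messages remain
print(memory.recall("when is the deadline"))
```

In a real Agent, the recalled facts would typically be prepended to the prompt alongside the short-term messages before each LLM call.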

The Agent Decision Loop

Almost every Agent follows one core loop: perceive → think → act → observe.

"""
最简单的 Agent 循环实现
"""
from openai import OpenAI
import json
class SimpleAgent:
"""最小可行 Agent"""
def __init__(self):
self.client = OpenAI()
self.tools = {}
self.max_iterations = 10
def register_tool(self, name: str, func, description: str):
"""注册工具"""
self.tools[name] = {
"function": func,
"schema": {
"type": "function",
"function": {
"name": name,
"description": description,
"parameters": {
"type": "object",
"properties": {
"input": {"type": "string", "description": "工具输入"}
},
"required": ["input"],
},
},
},
}
def run(self, task: str) -> str:
"""执行任务"""
messages = [
{"role": "system", "content": "你是一个能使用工具的AI助手。根据需要调用工具来完成任务。"},
{"role": "user", "content": task},
]
tool_schemas = [t["schema"] for t in self.tools.values()]
for i in range(self.max_iterations):
print(f"\n--- 迭代 {i + 1} ---")
# 1. 思考：让 LLM 决定下一步
response = self.client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tool_schemas if tool_schemas else None,
)
message = response.choices[0].message
# 2. 判断：是否需要调用工具
if not message.tool_calls:
print(f"最终回答: {message.content[:100]}...")
return message.content
# 3. 行动：执行工具调用
messages.append(message)
for tool_call in message.tool_calls:
func_name = tool_call.function.name
args = json.loads(tool_call.function.arguments)
print(f"调用工具: {func_name}({args})")
# 执行工具
if func_name in self.tools:
result = self.tools[func_name]["function"](**args)
else:
result = f"错误: 未知工具 {func_name}"
print(f"工具返回: {str(result)[:100]}")
# 4. 观察：将结果反馈给 LLM
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": str(result),
})
return "达到最大迭代次数，任务未完成。"
# ==================
# 使用示例
# ==================
def search_web(input: str) -> str:
"""模拟搜索"""
return f"搜索结果: '{input}' 是一个重要的 AI 技术概念..."
def calculator(input: str) -> str:
"""计算器"""
try:
return str(eval(input))
except Exception:
return "计算错误"
agent = SimpleAgent()
agent.register_tool("search", search_web, "搜索互联网获取信息")
agent.register_tool("calculator", calculator, "数学计算")
result = agent.run("请搜索 AI Agent 的定义，然后计算 2024 + 2 等于多少")
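The `calculator` tool above passes model-generated text straight to `eval`, which can execute arbitrary Python. One possible alternative, sketched here under the assumption that the tool only needs basic arithmetic, parses the expression with the standard `ast` module and evaluates only whitelisted operators. The name `safe_calculator` is hypothetical; it keeps the same `input: str -> str` shape so that it could be registered in place of `calculator`.

```python
import ast
import operator

# Whitelisted operators (assumption: basic arithmetic is all the tool needs)
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _evaluate(node):
    """Recursively evaluate an AST node, rejecting anything not whitelisted."""
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is a subclass of int, so exclude it)
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError("unsupported constant")
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(input: str) -> str:
    """Evaluate an arithmetic expression without calling eval()."""
    try:
        tree = ast.parse(input, mode="eval")
        return str(_evaluate(tree.body))
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError):
        return "Calculation error"


print(safe_calculator("2024 + 2"))
print(safe_calculator("__import__('os').getcwd()"))  # rejected: function calls are not whitelisted
```

Function calls, attribute access, and names are all rejected because they never match a whitelisted node type, so the tool returns the error string instead of running the code.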

Agent Categories

graph TB
    A[Agent Categories] --> B[By architecture]
    A --> C[By capability]
    A --> D[By scenario]
    B --> B1[ReAct Agent]
    B --> B2[Plan-and-Execute]
    B --> B3[Reflexion Agent]
    B --> B4[Multi-Agent]
    C --> C1[Tool-using Agent]
    C --> C2[Conversational Agent]
    C --> C3[Autonomous Agent]
    D --> D1[Coding assistant]
    D --> D2[Data analysis]
    D --> D3[Customer service bot]
    D --> D4[Automated workflows]
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px

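Architecture	Characteristics	Best suited for
ReAct	Alternates reasoning and acting	General-purpose tasks
Plan-and-Execute	Plans first, then executes	Complex multi-step tasks
Reflexion	Improves through self-reflection	Tasks that demand high-quality output
Multi-Agent	Multiple Agents collaborating	Large, complex systems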
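
To make the Plan-and-Execute row concrete, here is a minimal sketch of the pattern: the plan is produced once up front, then each step runs in order with a bounded retry on failure. The `make_plan` function stands in for an LLM planner and simply returns a hard-coded plan. It, `execute_step`, `plan_and_execute`, and the two toy tools are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (tool name, tool input)


def make_plan(task: str) -> List[Step]:
    """Stand-in for an LLM planner (assumption): returns a fixed plan."""
    return [("search", "AI Agent definition"), ("calculator", "2024 + 2")]


def execute_step(step: Step, tools: Dict[str, Callable[[str], str]], max_retries: int = 2) -> str:
    """Run one step, retrying when the tool raises an exception."""
    name, tool_input = step
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return tools[name](tool_input)
        except Exception as exc:
            last_error = exc
    return f"failed after {max_retries + 1} attempts: {last_error}"


def plan_and_execute(task: str, tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Plan once, then execute every step in order."""
    return [execute_step(step, tools) for step in make_plan(task)]


attempts = {"search": 0}


def flaky_search(query: str) -> str:
    """Toy tool that fails on its first call to exercise the retry path."""
    attempts["search"] += 1
    if attempts["search"] == 1:
        raise TimeoutError("simulated timeout")
    return f"results for '{query}'"


def add_tool(expression: str) -> str:
    """Toy tool that adds two integers written as 'a + b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


results = plan_and_execute(
    "Search for the definition of an AI Agent, then compute 2024 + 2",
    {"search": flaky_search, "calculator": add_tool},
)
print(results)
```

A ReAct Agent, by contrast, decides each next step only after seeing the previous observation, which is how the `SimpleAgent.run` loop earlier in this chapter behaves.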

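Chapter Summary

AI Agent = LLM + tools + memory + planning
The core of an Agent is the perceive-think-act-observe loop
The LLM is the Agent's brain, but tools and memory are just as critical
Different architecture patterns suit different scenarios

Next chapter: a closer look at core architectures such as ReAct and Plan-and-Execute.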