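Advanced Reasoning Techniques: ToT, ReAct, and More

Chain-of-Thought (CoT) reasons step by step along a single line. Many complex problems, however, require exploring several paths, interacting with external tools, or reflecting on and correcting earlier output. This section introduces several advanced reasoning techniques that go beyond CoT.

Technique Overview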

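graph TB
    A[Advanced reasoning techniques] --> B["Tree of Thoughts<br/>tree-structured reasoning"]
    A --> C["ReAct<br/>reasoning + acting"]
    A --> D["Reflexion<br/>self-reflection"]
    A --> E["Least-to-Most<br/>simple to complex"]
    B --> B1["Explore multiple reasoning paths<br/>and select the best one"]
    C --> C1["Alternate reasoning with tool calls<br/>to obtain real information"]
    D --> D1["Reflect on errors after execution<br/>and improve the next attempt"]
    E --> E1["Decompose into subproblems<br/>and solve them step by step"]
    style A fill:#ede7f6,stroke:#5e35b1,stroke-width:3px
    style B fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style C fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style D fill:#c8e6c9,stroke:#43a047,stroke-width:2px
    style E fill:#fce4ec,stroke:#c2185b,stroke-width:2px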

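Tree of Thoughts (ToT)

Proposed by Yao et al., 2023. The core idea is to think the way a chess player does: explore several possibilities, evaluate each path, and choose the best one.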

CoT vs. ToT

graph TB
    subgraph "Chain-of-Thought (linear)"
        A1[Problem] --> A2[Step 1] --> A3[Step 2] --> A4[Step 3] --> A5[Answer]
    end
    subgraph "Tree of Thoughts (tree)"
        B1[Problem] --> B2[Idea A]
        B1 --> B3[Idea B]
        B1 --> B4[Idea C]
        B2 --> B5[A-1]
        B2 --> B6[A-2]
        B3 --> B7[B-1 ✓]
        B3 --> B8[B-2]
        B4 --> B9[C-1]
        B7 --> B10["Best answer ✓"]
    end
    style A1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style A5 fill:#c8e6c9,stroke:#43a047,stroke-width:2px
    style B1 fill:#ede7f6,stroke:#5e35b1,stroke-width:3px
    style B7 fill:#a5d6a7,stroke:#2e7d32,stroke-width:2px
    style B10 fill:#a5d6a7,stroke:#2e7d32,stroke-width:3px
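
The tree-shaped half of the diagram can be read as a search procedure: propose several next steps, score them, and keep only the most promising branches. The following is a minimal sketch of that loop on a toy arithmetic task, not the authors' implementation. The names `tree_search`, `propose`, `score`, and `is_goal` are hypothetical; in an LLM setting, `propose` and `score` would each be backed by a model call.

```python
from typing import Callable, List, Tuple

State = Tuple[int, List[str]]  # (current value, operations applied so far)


def tree_search(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_goal: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 6,
) -> State:
    """Breadth-first search that keeps the `beam_width` best states per level."""
    frontier = [root]
    for _ in range(max_depth):
        # Expand every surviving state into its candidate next steps.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Evaluate the candidates and prune to the most promising ones.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        for state in frontier:
            if is_goal(state):
                return state
    # No goal state found: fall back to the best state still on the frontier.
    return max(frontier, key=score)


# Toy task (an assumption for illustration): reach 10 from 1 using "+1" and "*2".
TARGET = 10


def propose(state: State) -> List[State]:
    value, path = state
    return [(value + 1, path + ["+1"]), (value * 2, path + ["*2"])]


def score(state: State) -> float:
    return -abs(TARGET - state[0])  # closer to the target is better


def is_goal(state: State) -> bool:
    return state[0] == TARGET


best_value, best_path = tree_search((1, []), propose, score, is_goal)
print(best_value, best_path)
```

Widening `beam_width` explores more branches per level at the cost of more evaluations, which is the main trade-off ToT introduces over a single linear chain.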

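Implementation

A simplified, single-level form of ToT can be implemented with carefully designed prompts, without a dedicated framework: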

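from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable


def tree_of_thoughts(problem: str) -> str:
    """
    Solve a problem with a Tree of Thoughts strategy.

    Three stages:
    1. Generate several candidate approaches
    2. Evaluate the feasibility of each approach
    3. Develop the best approach into an answer
    """
    # Stage 1: generate candidate approaches
    step1_prompt = f"""
For the problem below, generate 3 different approaches to solving it.
For each approach:
- Summarize the core method (2-3 sentences)
- List the key assumptions
- Give an initial feasibility rating (high/medium/low)

Problem: {problem}

Use the following output format:
Approach A: [method name]
Core method: ...
Key assumptions: ...
Initial feasibility: ...
Approach B: [method name]
Core method: ...
Key assumptions: ...
Initial feasibility: ...
Approach C: [method name]
Core method: ...
Key assumptions: ...
Initial feasibility: ...
"""
    step1_response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.7,  # some randomness encourages diverse approaches
        messages=[{"role": "user", "content": step1_prompt}]
    )
    candidates = step1_response.choices[0].message.content

    # Stage 2: in-depth evaluation
    step2_prompt = f"""
Below are 3 candidate approaches to a problem.
Acting as a rigorous reviewer, evaluate each approach in depth.

Problem: {problem}

Candidate approaches:
{candidates}

Evaluation criteria (1-10 points each):
1. Correctness: is the method logically sound?
2. Completeness: does it account for all key factors?
3. Feasibility: can it actually be carried out?
4. Efficiency: are the time and resource costs reasonable?

Give a total score for each approach and select the best one.
"""
    step2_response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": step2_prompt}]
    )
    evaluation = step2_response.choices[0].message.content

    # Stage 3: develop the best approach.
    # The candidates are passed along with the evaluation so the model can see
    # the full description of the approach it is asked to develop.
    step3_prompt = f"""
Based on the evaluation below, reason in depth along the highest-scoring approach and give a complete solution.

Problem: {problem}

Candidate approaches:
{candidates}

Evaluation:
{evaluation}

Provide:
1. Detailed execution steps
2. The reasoning behind each step
3. The final answer
4. A verification of the answer
"""
    step3_response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": step3_prompt}]
    )
    return step3_response.choices[0].message.content


# Usage example: suited to complex decision problems
result = tree_of_thoughts(
    "Design a real-time recommendation system architecture that handles 10,000 requests per second, "
    "with the tech stack limited to Python + Redis + PostgreSQL "
    "and a budget of at most 30,000 yuan per month."
)
print(result)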

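When to Use ToT

Good fit for ToT	Poor fit for ToT
Design problems with multiple viable solutions	Simple calculations with a single correct answer
Strategic planning and decision-making	Pattern-based tasks such as translation and summarization
Complex system design	Latency-sensitive scenarios
Exploring creative options	Very tight budgets (high token consumption)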
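
The token cost noted in the table grows with the size of the tree. As a rough back-of-the-envelope sketch, assume one model call per expanded state to propose `branching` candidates and one call per candidate to score it, with `beam_width` states kept per level. Both the cost model and the function name `estimate_llm_calls` are assumptions made for illustration; real implementations often batch or merge these calls.

```python
def estimate_llm_calls(branching: int, beam_width: int, depth: int) -> int:
    """Estimate model calls for a beam-pruned tree search.

    Assumes one proposal call per expanded state and one scoring call
    per generated candidate (an illustrative cost model, not a measurement).
    """
    if min(branching, beam_width, depth) < 1:
        raise ValueError("branching, beam_width and depth must be at least 1")
    total = 0
    frontier_size = 1  # the search starts from the problem statement alone
    for _ in range(depth):
        candidates = frontier_size * branching
        total += frontier_size  # proposal calls
        total += candidates     # scoring calls
        frontier_size = min(beam_width, candidates)
    return total


single_level = estimate_llm_calls(branching=3, beam_width=1, depth=1)
three_levels = estimate_llm_calls(branching=3, beam_width=2, depth=3)
print(single_level, three_levels)
```

A plain CoT prompt is a single call, so even a shallow tree multiplies the cost several times over under these assumptions.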

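ReAct (Reasoning + Acting)

Proposed by Yao et al., 2022. The core idea is to have the model alternate between reasoning and acting, so that real information obtained from external sources supports the reasoning.

It is one of the core techniques behind AI agents.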

graph TB
    A[User question] --> B["Thought 1<br/>What information is needed?"]
    B --> C["Action 1<br/>Search / calculate / query"]
    C --> D["Observation 1<br/>The result obtained"]
    D --> E["Thought 2<br/>Is the information sufficient?"]
    E --> F{Information sufficient?}
    F -->|No| G["Action 2<br/>Gather more information"]
    G --> H[Observation 2]
    H --> I[Thought 3]
    F -->|Yes| J["Final Answer<br/>Give the final answer"]
    style A fill:#ede7f6,stroke:#5e35b1,stroke-width:3px
    style B fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style C fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style D fill:#c8e6c9,stroke:#43a047,stroke-width:2px
    style J fill:#a5d6a7,stroke:#2e7d32,stroke-width:3px
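
The cycle in the diagram can be driven by a small control loop: ask the model for the next step, run the tool it names, append the result as an observation, and stop at a final answer. The sketch below assumes the model writes actions as `Action: tool(argument)` and finishes with `Final Answer: ...`. The names `run_react` and `scripted_model` are hypothetical, and the scripted replies stand in for real model calls so that the example runs offline.

```python
import re
from typing import Callable, Dict, List

ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE | re.DOTALL)


def run_react(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate model steps and tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # a Thought plus an Action, or a Final Answer
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        name = action.group(1)
        argument = action.group(2).strip().strip("\"'")
        tool = tools.get(name)
        observation = tool(argument) if tool else f"unknown tool: {name}"
        transcript += f"Observation: {observation}\n"
    return "No final answer within the step limit."


def scripted_model(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for an LLM: returns canned replies in order."""
    remaining = iter(replies)
    return lambda transcript: next(remaining)


tool_calls: List[str] = []
GLOSSARY = {"ReAct": "a prompting pattern that interleaves reasoning and acting"}


def lookup(term: str) -> str:
    tool_calls.append(term)  # recorded so the call can be inspected afterwards
    return GLOSSARY.get(term, "not found")


answer = run_react(
    "What is ReAct?",
    scripted_model([
        'Thought: I need the definition of the term.\nAction: lookup("ReAct")',
        "Thought: I have enough information.\n"
        "Final Answer: ReAct interleaves reasoning steps with tool calls.",
    ]),
    tools={"lookup": lookup},
)
print(answer)
```

The `max_steps` limit plays the same role as an iteration cap in an agent framework: it bounds cost when the model never reaches a final answer.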

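ReAct Prompt Template

# Fill in the question with react_prompt.format(user_question=...)
react_prompt = """
You are an AI assistant that can use tools. When answering a question, alternate between "thinking" and "acting".

Available tools:
1. search(query) - search for information
2. calculate(expression) - perform a mathematical calculation
3. lookup(term) - look up the definition of a term

Required format:
Thought: your reasoning
Action: tool_name(argument)
Observation: [result returned by the tool]
... (repeat until you have enough information)
Thought: combine all the information and reach a conclusion
Final Answer: the final answer
---
Question: {user_question}
Thought: """

# Example transcript in this format
example = """
Question: Between the Tesla Model 3 and the BYD Seal, which offers better value for money in the Chinese market in 2025?
Thought: I need to compare the prices and core specifications of the two cars. I will look up each price first.
Action: search("Tesla Model 3 2025 price in China")
Observation: The 2025 Tesla Model 3 starts at about 245,900 yuan in China (rear-wheel-drive version)
Thought: Now I will look up the price of the BYD Seal.
Action: search("BYD Seal 2025 price")
Observation: The 2025 BYD Seal starts at about 176,800 yuan (EV version)
Thought: The price gap is clear. I still need to compare key specifications such as range and space before judging value for money.
Action: search("Tesla Model 3 vs BYD Seal 2025 specification comparison")
Observation: Model 3 range 606 km, Seal range 550 km. The Seal has a longer wheelbase (2920 mm vs 2875 mm)...
Thought: I now have enough information for a comparison.
On price, the Seal is about 70,000 yuan cheaper. On range, the Model 3 is slightly ahead (+56 km).
On space, the Seal is better. Overall, the BYD Seal offers better value for money.
Final Answer: In terms of value for money, the BYD Seal has the advantage...
"""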

Implementing a ReAct Agent with LangChain

from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain import hub


# Define the tools
def web_search(query: str) -> str:
    """Search the web for up-to-date information."""
    # In a real project, call a search API here (e.g. SerpAPI or Tavily).
    # This is a simplified placeholder.
    return f"Results for '{query}': ..."


def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # eval() runs arbitrary code and the input comes from the model:
        # production code should use a safe expression evaluator instead.
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"


tools = [
    Tool(name="Search", func=web_search, description="Search the web for information"),
    Tool(name="Calculator", func=calculator, description="Perform mathematical calculations"),
]

# Create the ReAct agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
react_prompt = hub.pull("hwchase17/react")  # standard ReAct prompt template
agent = create_react_agent(llm, tools, react_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,      # print the reasoning trace
    max_iterations=5   # cap the number of Thought/Action rounds
)

# Run
result = agent_executor.invoke({
    "input": "If I invest 200,000 yuan in a CSI 300 index fund and assume a historical annualized return of 8%, roughly how much will I have after 10 years?"
})
print(result["output"])
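
Because the calculator tool receives text written by the model, evaluating it with `eval` is risky. One possible replacement, sketched here under the assumption that only basic arithmetic is needed, walks the parsed expression tree and allows numeric literals and a fixed set of operators. The names `safe_eval` and `_BINARY_OPS` are hypothetical, and the exponent cap of 1000 is an arbitrary choice for this sketch.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate a whitelisted arithmetic expression tree."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _evaluate(node.left), _evaluate(node.right)
        # Guard against expressions such as 9 ** 9 ** 9 that would hang.
        if isinstance(node.op, ast.Pow) and abs(right) > 1000:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_eval(expression: str) -> float:
    """Evaluate basic arithmetic without executing arbitrary code."""
    return _evaluate(ast.parse(expression, mode="eval"))


def calculator(expression: str) -> str:
    """Tool wrapper: returns the result or an error message as text."""
    try:
        return str(safe_eval(expression))
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError) as exc:
        return f"Calculation error: {exc}"


# The compound-interest question from the example: 200,000 at 8% for 10 years.
print(calculator("200000 * 1.08 ** 10"))
print(calculator("__import__('os').getcwd()"))  # rejected, not executed
```

Function calls, attribute access, and names are all rejected, so the agent can only ever do arithmetic through this tool.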

Reflexion (Self-Reflection)

Proposed by Shinn et al., 2023. The core idea is a loop: execute, reflect on the errors, revise, and execute again.

graph TB
    A[Task] --> B[First attempt]
    B --> C[Result]
    C --> D{Check result}
    D -->|Correct| E[Output final answer]
    D -->|Incorrect| F["Self-reflection<br/>analyze the cause of the error"]
    F --> G[Make an improvement plan]
    G --> H[Revise and re-execute]
    H --> C
    style A fill:#ede7f6,stroke:#5e35b1,stroke-width:3px
    style B fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style D fill:#ffe0b2,stroke:#e64a19,stroke-width:2px
    style E fill:#a5d6a7,stroke:#2e7d32,stroke-width:3px
    style F fill:#ffcdd2,stroke:#c62828,stroke-width:2px
    style G fill:#fff9c4,stroke:#f9a825,stroke-width:2px

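Reflexion Prompt Template

import re

from openai import OpenAI

client = OpenAI()


def reflexion_solve(problem: str, max_attempts: int = 3) -> str:
    """
    Solve a problem with a Reflexion strategy: attempt, reflect, revise.

    Args:
        problem: Problem description
        max_attempts: Maximum number of attempts

    Returns:
        The final answer
    """
    history = []
    solution = ""
    for attempt in range(1, max_attempts + 1):
        # Step 1: attempt a solution
        if attempt == 1:
            solve_prompt = f"""
Solve the following problem and show your detailed reasoning.
Problem: {problem}
"""
        else:
            past_attempts = "\n".join(history)
            solve_prompt = f"""
Solve the following problem and show your detailed reasoning.
Problem: {problem}

Previous attempts and reflections:
{past_attempts}

Using these reflections, avoid the same mistakes and solve the problem again.
"""
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0,
            messages=[{"role": "user", "content": solve_prompt}]
        )
        solution = response.choices[0].message.content

        # Step 2: self-reflection
        reflect_prompt = f"""
Acting as a reviewer, check whether the solution to the following problem is correct.
Problem: {problem}
Solution:
{solution}

Check:
1. Is the reasoning sound? Are there skipped steps or wrong assumptions?
2. Are the calculations correct?
3. Is the final answer reasonable?
4. Are any key factors missing?

If you find an error:
- Point out exactly where it is
- Explain why it is wrong
- Suggest how to fix it

Output format:
Verdict: CORRECT or INCORRECT
Analysis: ...
Suggested fix: ...
"""
        reflection = client.chat.completions.create(
            model="gpt-4o",
            temperature=0,
            messages=[{"role": "user", "content": reflect_prompt}]
        )
        feedback = reflection.choices[0].message.content

        # Read only the verdict line: a plain substring test would also match
        # "CORRECT" inside "INCORRECT" or anywhere in the analysis text.
        verdict = re.search(r"Verdict:\s*(CORRECT|INCORRECT)", feedback)
        if verdict and verdict.group(1) == "CORRECT":
            print(f"Attempt {attempt} passed verification!")
            return solution
        history.append(f"--- Attempt {attempt} ---\nSolution: {solution}\nReflection: {feedback}")
        print(f"Attempt {attempt} found problems; moving to the next revision round...")
    return solution  # return the result of the last attempt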
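
The function above asks the model to judge its own answer. When an external check such as a test suite is available, the same loop can use that signal in place of self-judgment. The sketch below shows the control flow only: `reflexion_loop`, `scripted_attempt`, `run_tests`, and `reflect` are hypothetical names, and the scripted attempt stands in for a model call that would receive the reflections in its prompt.

```python
from typing import Callable, List, Tuple

Candidate = Callable[[int], int]


def reflexion_loop(
    attempt: Callable[[List[str]], Candidate],
    evaluate: Callable[[Candidate], Tuple[bool, str]],
    reflect: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[Candidate, List[str]]:
    """Attempt, evaluate, store a reflection, and retry until the check passes."""
    memory: List[str] = []
    candidate = attempt(memory)
    for _ in range(max_attempts):
        passed, feedback = evaluate(candidate)
        if passed:
            break
        memory.append(reflect(feedback))  # carried into the next attempt
        candidate = attempt(memory)
    return candidate, memory


def scripted_attempt(memory: List[str]) -> Candidate:
    # Stand-in for a model call: the first draft ignores negative inputs,
    # and a corrected draft is returned once a reflection is in memory.
    if not memory:
        return lambda x: x
    return lambda x: -x if x < 0 else x


TEST_CASES = [(3, 3), (-3, 3), (0, 0)]  # (input, expected absolute value)


def run_tests(candidate: Candidate) -> Tuple[bool, str]:
    for value, expected in TEST_CASES:
        got = candidate(value)
        if got != expected:
            return False, f"input {value}: expected {expected}, got {got}"
    return True, "all tests passed"


def reflect(feedback: str) -> str:
    return f"Previous attempt failed ({feedback}); handle this case explicitly."


solution, reflections = reflexion_loop(scripted_attempt, run_tests, reflect)
print(solution(-3), reflections)
```

Keeping the reflections as short text notes, rather than replaying whole transcripts, keeps the retry prompt small as attempts accumulate.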

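Least-to-Most (Simple-to-Complex Decomposition)

Proposed by Zhou et al., 2022. The core idea is to first decompose a complex problem into subproblems, then solve them in order, starting from the simplest.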

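from openai import OpenAI

client = OpenAI()


def least_to_most(problem: str) -> str:
    """
    Apply a Least-to-Most strategy: decompose, then solve step by step.
    """
    # Stage 1: decompose the problem
    decompose_prompt = f"""
Decompose the following complex problem into a series of subproblems ordered from simple to complex.
The answer to each subproblem should help answer the more complex ones that follow.

Complex problem: {problem}

Output the list of subproblems in the following format (simplest first):
Subproblem 1: ...
Subproblem 2: ...
Subproblem 3: ...
...
"""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": decompose_prompt}]
    )
    subproblems = response.choices[0].message.content

    # Stage 2: solve step by step (all subproblems in a single call)
    solve_prompt = f"""
Answer the following subproblems one by one. After solving each one, use its answer to help solve the next.

Subproblems:
{subproblems}

Answer the subproblems in order, giving a complete answer to each before moving on.
Finally, combine the answers to all subproblems into a complete answer to the original problem.

Original problem: {problem}
"""
    final_response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": solve_prompt}]
    )
    return final_response.choices[0].message.content


# Usage example
result = least_to_most(
    "Design a complete e-commerce recommendation system, covering the full workflow of "
    "data collection, feature engineering, model selection, A/B testing, and production deployment."
)
print(result)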
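
The function above answers every subproblem in one model call. A variant closer to the "solve in order" idea issues one call per subproblem and feeds each earlier question and answer into the next prompt. This is a minimal sketch of that chaining; `solve_sequentially` and the `ask` callable are hypothetical, and `fake_ask` replaces the model so the example runs offline.

```python
from typing import Callable, List, Tuple


def solve_sequentially(
    subproblems: List[str],
    ask: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Answer subproblems in order, passing earlier Q/A pairs as context."""
    solved: List[Tuple[str, str]] = []
    for subproblem in subproblems:
        # Everything solved so far becomes context for the next question.
        context = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in solved)
        prompt = f"{context}Q: {subproblem}\nA:"
        solved.append((subproblem, ask(prompt).strip()))
    return solved


prompts_seen: List[str] = []


def fake_ask(prompt: str) -> str:
    # Stand-in for a model call: records the prompt and returns a numbered reply.
    prompts_seen.append(prompt)
    return f"answer {len(prompts_seen)}"


solved = solve_sequentially(
    [
        "What data should be collected?",
        "Which features can be built from that data?",
        "Which model fits those features?",
    ],
    fake_ask,
)
print(solved[-1])
print(prompts_seen[-1])
```

The cost is one call per subproblem instead of one call overall, in exchange for prompts in which every later question sees the settled answers to the earlier ones.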

Technique Comparison

graph LR
    subgraph "Selection guide"
        A{Task type?}
        A -->|Mathematical reasoning| B["CoT<br/>linear reasoning"]
        A -->|Comparing multiple options| C["ToT<br/>tree exploration"]
        A -->|Needs external information| D["ReAct<br/>reasoning + acting"]
        A -->|Needs verification and revision| E["Reflexion<br/>self-reflection"]
        A -->|Complex but decomposable| F["Least-to-Most<br/>step-by-step decomposition"]
    end
    style A fill:#ede7f6,stroke:#5e35b1,stroke-width:3px
    style B fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style C fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style D fill:#c8e6c9,stroke:#43a047,stroke-width:2px
    style E fill:#ffcdd2,stroke:#c62828,stroke-width:2px
    style F fill:#fce4ec,stroke:#c2185b,stroke-width:2px

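Technique	Core principle	Best suited for	Token cost	Implementation complexity
CoT	Linear step-by-step reasoning	Math and logical reasoning	⭐⭐	⭐
Self-Consistency	Multiple CoT runs + voting	High-reliability requirements	⭐⭐⭐⭐	⭐⭐
ToT	Multi-path exploration + evaluation	Solution design, creative tasks	⭐⭐⭐⭐⭐	⭐⭐⭐
ReAct	Reasoning + tool calls	Tasks that need up-to-date information	⭐⭐⭐	⭐⭐⭐
Reflexion	Execute → reflect → revise	Programming, complex calculations	⭐⭐⭐⭐	⭐⭐⭐
Least-to-Most	Decompose → solve step by step	Large, complex problems	⭐⭐⭐	⭐⭐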
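
The Self-Consistency row combines several CoT runs by voting. The voting step can be sketched as follows, assuming the final answer has already been extracted from each run as a short string. `majority_vote` is a hypothetical helper, and the hard-coded list stands in for answers sampled from the model at a non-zero temperature.

```python
from collections import Counter
from typing import List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer after light normalization."""
    if not answers:
        raise ValueError("at least one answer is required")
    normalized = [answer.strip().lower() for answer in answers]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner


# Stand-ins for final answers extracted from five independent CoT runs.
sampled_answers = ["42", "42 ", "41", "42", "40"]
final_answer = majority_vote(sampled_answers)
print(final_answer)
```

Each extra sample is a full CoT run, which is why the table rates the token cost of Self-Consistency well above plain CoT.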

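Hands-on Exercises

Exercise 1: ToT Solution Design

Using the Tree of Thoughts method, generate 3 candidate solutions for the following problem and select the best one:

"Help a five-person startup team choose a tech stack for building an online education platform (offering video courses, online exams, and a learning community)."

Exercise 2: Simulating a ReAct Flow

Manually simulate a ReAct flow (alternating Thought → Action → Observation) to answer the following question:

"Compare the FastAPI and Django ecosystems in 2025. Which would you recommend for building the backend of an AI application?"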

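Key Takeaways

✅ Tree of Thoughts explores multiple reasoning paths, evaluates them, and selects the best, which suits solution-design problems
✅ ReAct has the model alternate between reasoning and calling tools, and is a core technique behind AI agents
✅ Reflexion improves answer quality through self-reflection and revision
✅ Least-to-Most solves a complex problem by decomposing it into progressively harder subproblems
✅ Each technique has its own suitable scenarios and cost profile, so choose according to actual needs
✅ These techniques can be combined, for example ReAct + CoT or ToT + Self-Consistency

Next: Role Setting and System Prompts 🚀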