[Translated] From Scratch: Building an AI Agent in Python
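This article is a translation of Leonie Monigatti's "Building an AI agent from scratch in Python".
How to implement a single AI agent with an LLM API, without relying on a framework.
Frameworks for building AI agents keep multiplying: CrewAI, LangGraph, the OpenAI Agents SDK, and more. Faced with so many options, developers often don't know where to start. Anthropic has suggested that before relying on complex framework abstractions, it is best to call the LLM API directly and understand the fundamentals underneath.
This tutorial follows that advice. We will use an LLM API directly to implement an AI agent from scratch in Python, which gives you a close look at how an agent works internally. We start with a single agent, the building block for more complex agentic workflows and multi-agent systems.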
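Implementing the Core Components of an AI Agent by Hand
We will implement an Agent() class step by step. It contains the four core components of an agent:
LLM & Instructions: the agent's "brain", responsible for reasoning and decision-making while following specific behavioral guidelines.
Memory: the conversation history (short-term memory), which lets the agent understand the current context.
Tools: external functions or APIs the agent can call.
Agent Loop: the control loop that ties the components above together.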
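Component 1: LLM and Instructions
At the core of an agent is an LLM with tool-calling capabilities (such as Anthropic's Claude 4 Sonnet, OpenAI's GPT-4o, or Google's Gemini 2.5 Pro).
This tutorial uses Anthropic's API as the example, but the logic carries over easily to other models.
Before using the Anthropic API, you need an ANTHROPIC_API_KEY. Sign up for an Anthropic account and generate a key under "API Keys" in the dashboard. Then store the key in an environment variable, a .env file, or Google Colab's Secrets, depending on your development environment.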
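As a small aid for the setup step above, the following sketch fails early with a readable message when the key is missing, instead of surfacing later as an authentication error. The helper name `get_api_key` is hypothetical and not part of the original tutorial; it only reads the environment variable named in the text.

```python
import os


def get_api_key(env_var="ANTHROPIC_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the tutorial itself calls
    os.getenv("ANTHROPIC_API_KEY") directly.
    """
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or add it to a .env file."
        )
    return key
```

With such a helper, the `os.getenv("ANTHROPIC_API_KEY")` call used when constructing the client could be swapped for `get_api_key()`.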
First, install and import the required libraries:
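%%capture
%pip install -U anthropic python-dotenv

# Run the following in a separate notebook cell
import anthropic
import os
from dotenv import load_dotenv
from google.colab import userdata  # Colab only; remove this import elsewhere

load_dotenv()  # Load ANTHROPIC_API_KEY from a .env file, if one exists
print(anthropic.__version__)
# 0.69.0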
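Now we implement a simple Agent class with the following parts:
Initialization: sets up the LLM client and configures a system prompt containing instructions for how the agent should behave. (You could also make this a parameter passed to the Agent, but for simplicity we use a fixed prompt.)
chat method: handles user messages by sending them to the LLM API and returning the response.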
class Agent:
    """A simple AI agent that can answer questions"""

    def __init__(self):
        self.client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.model = "claude-sonnet-4-20250514"
        self.system_message = "You are a helpful assistant that breaks down problems into steps and solves them systematically."

    def chat(self, message):
        """Process a user message and return a response"""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=self.system_message,
            messages=[
                {"role": "user", "content": message}
            ],
            temperature=0.1,
        )
        return response
At this point the agent can only handle single-turn questions. Let's test it:
agent = Agent()
response = agent.chat("I have 4 apples. How many do you have?")
print(response.content[0].text)
I don't have any apples - as an AI, I don't have a physical form, so I can't possess physical objects like apples. Only you have apples in this scenario (4 of them).
Is there something you'd like to do with this information, like a math problem involving your apples?
Now ask a second question:
response = agent.chat("I ate 1 apple. How many are left?")
print(response.content[0].text)
I don't have enough information to answer how many apples are left. To solve this, I would need to know:
**What I need:**
- How many apples you started with
**The calculation would be:**
Starting number of apples - 1 apple eaten = Apples remaining
Could you tell me how many apples you had before eating one?
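As you can see, the agent has no information from the first message. This is why we need to give the agent the conversation history.
Component 2: Memory (Conversation Context)
An agent's memory can take many forms, such as short-term and long-term memory, and memory management is a complex topic in its own right. For this tutorial, let's keep it simple and start with a basic short-term memory implementation.
Short-term memory gives the agent access to the conversation history so it can understand the current interaction. In its simplest form, short-term memory is just a list of past messages between the user and the assistant. (Note that as the conversation history grows, you will run into the limits of the **context window** and will need a more sophisticated solution.)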
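To make the context-window caveat concrete, here is a minimal sketch of one common mitigation: a sliding window that keeps only the most recent messages. It is not part of the original tutorial; `trim_history` and `max_messages` are hypothetical names, and the sketch counts messages rather than tokens. It also ignores the pairing between tool calls and tool results, which would need extra care once tools are involved.

```python
def trim_history(messages, max_messages=20):
    """Return the most recent messages, starting on a user turn.

    Hypothetical helper: a sliding window over the message list.
    """
    if max_messages <= 0:
        return []
    trimmed = list(messages[-max_messages:])
    # Drop leading assistant messages so the history begins with a user turn.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed


history = [
    {"role": "user", "content": "I have 4 apples."},
    {"role": "assistant", "content": "Noted."},
    {"role": "user", "content": "I ate 1 apple."},
    {"role": "assistant", "content": "You have 3 left."},
]
print(len(trim_history(history, max_messages=3)))  # 2
```

The obvious trade-off is that anything outside the window is forgotten, so in the apple example the agent would lose the starting count once the first message is trimmed.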
We implement short-term memory by adding a messages attribute, in which we store both of the following:
User input: as {"role": "user", "content": message}
Response: as {"role": "assistant", "content": response.content}
class Agent:
    """A simple AI agent that can answer questions in a multi-turn conversation"""

    def __init__(self):
        self.client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.model = "claude-sonnet-4-20250514"
        self.system_message = "You are a helpful assistant that breaks down problems into steps and solves them systematically."
        self.messages = []

    def chat(self, message):
        """Process a user message and return a response"""
        # Store user input in short-term memory
        self.messages.append({"role": "user", "content": message})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=self.system_message,
            messages=self.messages,
            temperature=0.1,
        )
        # Store assistant's response in short-term memory
        self.messages.append({"role": "assistant", "content": response.content})
        return response
Now let's test the agent again with the earlier conversation.
agent = Agent()
response = agent.chat("I have 4 apples. How many do you have?")
print(response.content[0].text)
response = agent.chat("I ate 1 apple. How many are left?")
print(response.content[0].text)
I don't have any apples - as an AI, I don't have a physical form and can't possess physical objects like apples. You have 4 apples, and I have 0 apples.
Is there something you'd like to do with your 4 apples, like a math problem or recipe suggestion?
Let me solve this step by step:
**Step 1:** Identify the starting amount
- You started with 4 apples
**Step 2:** Identify what was consumed
- You ate 1 apple
**Step 3:** Calculate the remaining amount
- Apples left = Starting amount - Apples eaten
- Apples left = 4 - 1 = 3
**Answer:** You have 3 apples left.
As you can see, the agent can now hold a conversation and refer back to earlier information.
But what happens if you give the agent a slightly harder math problem?
agent = Agent()
response = agent.chat("What is 157.09 * 493.89?")
print(response.content[0].text)
I'll solve this step by step using the standard multiplication algorithm.
157.09 × 493.89
First, let me multiply 157.09 by each digit of 493.89:
**Step 1:** 157.09 × 9 (ones place)
157.09 × 9 = 1,413.81
**Step 2:** 157.09 × 80 (tens place)
157.09 × 8 = 1,256.72
1,256.72 × 10 = 12,567.2
**Step 3:** 157.09 × 300 (hundreds place)
157.09 × 3 = 471.27
471.27 × 100 = 47,127
**Step 4:** 157.09 × 90,000 (ten-thousands place)
157.09 × 9 = 1,413.81
1,413.81 × 10,000 = 14,138,100
**Step 5:** 157.09 × 400,000 (hundred-thousands place)
157.09 × 4 = 628.36
628.36 × 100,000 = 62,836,000
**Step 6:** Add all partial products:
1,413.81
12,567.2
47,127
14,138,100
62,836,000
-----------
77,035,208.01
Therefore, **157.09 × 493.89 = 77,035.2081**
The agent's answer sounds entirely plausible, but if you check it, you will find that even an LLM as capable as Claude 4 Sonnet still makes arithmetic mistakes without the help of tools.
157.09 * 493.89
# 77585.1801
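The check above uses binary floating point. For an exact confirmation, the same product can be computed with Python's standard `decimal` module, which also shows how far off the model's claimed answer of 77,035.2081 is. This is an added sanity check, not part of the original tutorial.

```python
from decimal import Decimal

# Exact decimal arithmetic: no binary floating-point rounding involved.
exact = Decimal("157.09") * Decimal("493.89")
print(exact)  # 77585.1801

# The value the model reported in its final line.
model_answer = Decimal("77035.2081")
print(exact - model_answer)  # 549.9720
```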
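Component 3: Tool Use
To extend the agent's capabilities, you can give it tools, which range from simple functions to calls to external APIs. In this tutorial we implement a simple CalculatorTool class for handling math problems.
Although the details of tool use differ between providers, two key components are always needed:
Function implementation: the actual code that carries out the tool's logic, such as performing a calculation or making an API call.
Tool schema: a structured description of the tool. The description matters because it tells the LLM what the tool does, when to use it, and which parameters it needs.
This tutorial follows Anthropic's documentation on tool use. If you are following along with a different LLM API, I recommend consulting that provider's documentation on tool use.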
class CalculatorTool():
    """A tool for performing mathematical calculations"""

    def get_schema(self):
        return {
            "name": "calculator",
            "description": "Performs basic mathematical calculations, use also for simple additions",
            "input_schema": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Mathematical expression to evaluate (e.g., '2+2', '10*5')"
                    }
                },
                "required": ["expression"]
            }
        }

    def execute(self, expression):
        """
        Evaluate mathematical expressions.
        WARNING: This tutorial uses eval() for simplicity but it is not recommended for production use.

        Args:
            expression (str): The mathematical expression to evaluate

        Returns:
            dict: {"result": value} on success, or {"error": message} on failure
        """
        try:
            result = eval(expression)
            return {"result": result}
        except Exception:  # A bare except would also swallow KeyboardInterrupt and SystemExit
            return {"error": "Invalid mathematical expression"}
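Since the docstring warns against `eval()` in production, here is one possible alternative, sketched under the assumption that only plain arithmetic is needed. It parses the expression with the standard `ast` module and evaluates only numeric literals and a whitelist of operators. The name `safe_eval` is hypothetical, and exponentiation is left out on purpose because expressions like `9**9**9` can consume excessive time and memory.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression):
    """Evaluate a basic arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("10*5"))  # 50
```

Inside `CalculatorTool.execute`, `eval(expression)` could then be replaced by `safe_eval(expression)`; the existing `except Exception` branch would cover both the `ValueError` raised here and the `SyntaxError` from malformed input.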
Note that this tutorial implements only a single tool. In production code, you would typically use an abstract base class to ensure a consistent interface across all tools.
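As a rough illustration of that idea, the sketch below defines a base class with the two methods the calculator tool exposes, `get_schema` and `execute`. The names `BaseTool` and `EchoTool` are hypothetical and do not appear in the original tutorial.

```python
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Hypothetical common interface that every tool must implement."""

    @abstractmethod
    def get_schema(self) -> dict:
        """Return the tool schema that is sent to the LLM."""

    @abstractmethod
    def execute(self, **kwargs) -> dict:
        """Run the tool and return a JSON-serializable result."""


class EchoTool(BaseTool):
    """Toy tool used only to show the interface."""

    def get_schema(self) -> dict:
        return {
            "name": "echo",
            "description": "Returns the given text unchanged",
            "input_schema": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        }

    def execute(self, text) -> dict:
        return {"result": text}


print(EchoTool().execute(text="hello"))  # {'result': 'hello'}
```

A subclass that forgets one of the two methods cannot be instantiated, so an incomplete tool fails at construction time rather than in the middle of a conversation.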
Let's check that the calculator works.
calculator_tool = CalculatorTool()
calculator_tool.execute("157.09 * 493.89")
# {'result': 77585.1801}
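Now that we have a CalculatorTool, let's add tool use to the agent in three steps:
Add tools and tool_map attributes to store the available tools.
Add a private _get_tool_schemas() method to extract the tools' schemas.
Add tool handling to the messages.create call inside chat, so that tool calls can be detected and processed.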
class Agent:
    """A simple AI agent that can use tools to answer questions in a multi-turn conversation"""

    def __init__(self, tools):
        self.client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.model = "claude-sonnet-4-20250514"
        self.system_message = "You are a helpful assistant that breaks down problems into steps and solves them systematically."
        self.messages = []
        self.tools = tools
        self.tool_map = {tool.get_schema()["name"]: tool for tool in tools}

    def _get_tool_schemas(self):
        """Get tool schemas for all registered tools"""
        return [tool.get_schema() for tool in self.tools]

    def chat(self, message):
        """Process a user message and return a response"""
        # Store user input in short-term memory
        self.messages.append({"role": "user", "content": message})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=self.system_message,
            tools=self._get_tool_schemas() if self.tools else None,
            messages=self.messages,
            temperature=0.1,
        )
        # Store assistant's response in short-term memory
        self.messages.append({"role": "assistant", "content": response.content})
        return response
Let's try it out.
calculator_tool = CalculatorTool()
agent = Agent(tools=[calculator_tool])
response = agent.chat("What is 157.09 * 493.89?")
for block in response:
    print(block)
('id', 'msg_01BzC2FerKEr8rC1wGfaMiNK')
('content', [TextBlock(citations=None, text="I'll calculate 157.09 * 493.89 for you.", type='text'), ToolUseBlock(id='toolu_017NhVhd5wYWdEw7fFRPHyXL', input={'expression': '157.09 * 493.89'}, name='calculator', type='tool_use')])
('model', 'claude-sonnet-4-20250514')
('role', 'assistant')
('stop_reason', 'tool_use')
('stop_sequence', None)
('type', 'message')
('usage', Usage(cache_creation=CacheCreation(ephemeral_1h_input_tokens=0, ephemeral_5m_input_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=433, output_tokens=77, server_tool_use=None, service_tier='standard'))
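As you can see in the response, the agent replies "I'll calculate 157.09 * 493.89 for you." but does not evaluate the expression itself. Instead it stops, with a stop_reason of tool_use. This means the agent is waiting for the user (in practice, our calling code) to execute the tool and return the tool's result.
The agent has signaled that it needs help executing a tool and is now waiting. This is where the final component, the loop, comes in.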
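Component 4: Agent Loop
You may have heard the saying: "Agents are models using tools in a loop." Without the loop, an agent can only handle a single request at a time and cannot keep interacting over multiple turns on its own.
I really like this pseudocode from Anthropic's Barry Zhang, which illustrates that an agent is essentially an LLM making decisions in a loop: it observes the result, then decides what to do next.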
env = Environment()
tools = Tools(env)
system_prompt = "Goals, constraints, and how to act"

while True:
    action = llm.run(system_prompt + env.state)
    env.state = tools.run(action)
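For this simple agent implementation, the overall flow is:
- The user sends a message to the agent.
- The agent decides it needs a tool and returns a response with stop_reason = tool_use, including a tool_use block with the tool name and arguments. In effect it is saying: "I'll stop here; please run this tool with these arguments."
- The user executes the tool and returns the result to the agent in the next message.
- The agent continues and gives its final reply.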
import json

def run_agent(user_input, max_turns=10):
    calculator_tool = CalculatorTool()
    agent = Agent(tools=[calculator_tool])
    i = 0
    while i < max_turns:  # It's safer to use max_turns rather than while True
        i += 1
        print(f"\nIteration {i}:")
        print(f"User input: {user_input}")
        response = agent.chat(user_input)
        # Assumes the first content block is text, as in the runs shown in this tutorial
        print(f"Agent output: {response.content[0].text}")
        # Handle tool use if present
        if response.stop_reason == "tool_use":
            # Process all tool uses in the response
            tool_results = []
            for content_block in response.content:
                if content_block.type == "tool_use":
                    tool_name = content_block.name
                    tool_input = content_block.input
                    print(f"Using tool {tool_name} with input {tool_input}")
                    # Execute the tool
                    tool = agent.tool_map[tool_name]
                    tool_result = tool.execute(**tool_input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": content_block.id,
                        "content": json.dumps(tool_result)
                    })
                    print(f"Tool result: {tool_result}")
            # Add tool results to conversation
            user_input = tool_results
        else:
            return response.content[0].text
    return None  # No final answer within max_turns
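The loop can also be exercised without network access or an API key by replaying canned responses. The sketch below is an illustration only: `ScriptedAgent`, `run_loop`, and `add_only_calculator` are hypothetical stand-ins, the response objects are plain `SimpleNamespace` instances that mimic just the fields the loop reads (`stop_reason`, `content`, `type`, `text`, `id`, `name`, `input`), and the tool map holds plain functions rather than tool objects.

```python
import json
from types import SimpleNamespace


def text_block(text):
    return SimpleNamespace(type="text", text=text)


def tool_use_block(block_id, name, tool_input):
    return SimpleNamespace(type="tool_use", id=block_id, name=name, input=tool_input)


class ScriptedAgent:
    """Stand-in for Agent: replays canned responses instead of calling an API."""

    def __init__(self, scripted_responses, tool_map):
        self._responses = iter(scripted_responses)
        self.tool_map = tool_map
        self.messages = []

    def chat(self, message):
        self.messages.append({"role": "user", "content": message})
        response = next(self._responses)
        self.messages.append({"role": "assistant", "content": response.content})
        return response


def run_loop(agent, user_input, max_turns=10):
    """Same control flow as run_agent, minus the printing."""
    for _ in range(max_turns):
        response = agent.chat(user_input)
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        # Run every requested tool and send the results back as the next input.
        user_input = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(agent.tool_map[block.name](**block.input)),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
    return None


def add_only_calculator(expression):
    """Toy tool that only understands 'a + b' with integers."""
    left, right = expression.split("+")
    return {"result": int(left) + int(right)}


script = [
    SimpleNamespace(
        stop_reason="tool_use",
        content=[
            text_block("Let me calculate."),
            tool_use_block("toolu_example", "calculator", {"expression": "20 + 30"}),
        ],
    ),
    SimpleNamespace(stop_reason="end_turn", content=[text_block("Your mother is 50 years old.")]),
]
agent = ScriptedAgent(script, {"calculator": add_only_calculator})
answer = run_loop(agent, "My mother is 30 years older than me and I am 20. How old is she?")
print(answer)  # Your mother is 50 years old.
```

After the run, `agent.messages` holds four entries: the user question, the assistant's tool request, the tool result sent back as a user message, and the final answer.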
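Testing the AI Agent
Let's verify the agent's behavior with a few example test cases.
Test 1: General Question (No Tool Use)
This test shows the agent answering a simple general question without calling any external tool.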
response = run_agent("I have 4 apples. How many do you have?")
Iteration 1:
User input: I have 4 apples. How many do you have?
Agent output: I don't have any apples since I'm an AI assistant - I don't have a physical form or possessions. But I can help you with calculations involving your 4 apples if you need!
Is there something specific you'd like to calculate or figure out with your 4 apples?
Test 2: Tool Use
This test shows the agent recognizing that it needs a tool to complete a specific task, and calling CalculatorTool to get the correct result.
response = run_agent("What is 157.09 * 493.89?")
Iteration 1:
User input: What is 157.09 * 493.89?
Agent output: I'll calculate 157.09 * 493.89 for you.
Using tool calculator with input {'expression': '157.09 * 493.89'}
Tool result: {'result': 77585.1801}
Iteration 2:
User input: [{'type': 'tool_result', 'tool_use_id': 'toolu_01FC9yLWt2Cf6a8zLGhj7ZJz', 'content': '{"result": 77585.1801}'}]
Agent output: The result of 157.09 * 493.89 is **77,585.1801**.
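Test 3: Step-by-Step Tool Use
This test shows the agent breaking a more complex problem into smaller steps and using CalculatorTool several times within one conversation to reach the correct answer.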
response = run_agent("If my brother is 32 years younger than my mother and my mother is 30 years older than me and I am 20, how old is my brother?")
Iteration 1:
User input: If my brother is 32 years younger than my mother and my mother is 30 years older than me and I am 20, how old is my brother?
Agent output: I'll solve this step by step using the given information.
Given:
- You are 20 years old
- Your mother is 30 years older than you
- Your brother is 32 years younger than your mother
Let me calculate your mother's age first:
Using tool calculator with input {'expression': '20 + 30'}
Tool result: {'result': 50}
Iteration 2:
User input: [{'type': 'tool_result', 'tool_use_id': 'toolu_01WPMQRzCi4roua9vQ7qXeCR', 'content': '{"result": 50}'}]
Agent output: So your mother is 50 years old.
Now I'll calculate your brother's age:
Using tool calculator with input {'expression': '50 - 32'}
Tool result: {'result': 18}
Iteration 3:
User input: [{'type': 'tool_result', 'tool_use_id': 'toolu_01UL7n7a85XJUn7Tgk8kiHhX', 'content': '{"result": 18}'}]
Agent output: Your brother is 18 years old.
To summarize:
- You: 20 years old
- Your mother: 50 years old (30 years older than you)
- Your brother: 18 years old (32 years younger than your mother)
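Summary
This tutorial showed how to implement a minimal working AI agent from scratch using only an LLM API, without relying on any framework.
Hopefully this has shown you how AI agents work under the hood, and what people mean when they say:
"Agents are models using tools in a loop."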