# Cloud Deployment

Deploy your LLM application to a cloud service so that more people can access it.
## Deployment Options Compared
| Option | Strengths | Weaknesses | Best for |
|---|---|---|---|
| Hugging Face Spaces | Easy to use | Limited configuration | Prototypes and demos |
| Streamlit Cloud | One-click deployment | Traffic limits | Simple apps |
| AWS Lambda | Pay per use | Cold starts | API services |
| Google Cloud Run | Containerized | Requires Docker | Microservices |
| Azure AI | Enterprise-grade | Steep learning curve | Enterprise apps |
## Hugging Face Spaces

### Create a Space

```
# 1. Log in at https://huggingface.co/spaces
# 2. Create a new Space
# 3. Choose the SDK: Streamlit
# 4. Name the Space: my-llm-app
```
### Prepare the Code

Create `app.py`:
```python
import os

import streamlit as st
from langchain_openai import ChatOpenAI

st.title("🤖 LLM App")

# Environment variable
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    st.error("Please set the OPENAI_API_KEY environment variable")
    st.stop()

# Initialize the model (cached across reruns)
@st.cache_resource
def get_llm():
    return ChatOpenAI(api_key=api_key)

llm = get_llm()

# User input
prompt = st.text_area("Enter a prompt:")
if st.button("Generate"):
    if prompt:
        with st.spinner("Generating..."):
            response = llm.invoke(prompt)
            st.markdown(response.content)
```
### Create `requirements.txt`

```
streamlit>=1.29.0
langchain>=0.1.0
langchain-openai>=0.0.5
```
### Configure Secrets

In the Space's Settings → Secrets, add:
- `OPENAI_API_KEY`: your API key
### Deploy

```bash
# Clone the Space repository
git clone https://huggingface.co/spaces/yourname/my-llm-app
cd my-llm-app

# Add your files
git add .
git commit -m "Initial commit"
git push

# Wait for the automatic build to finish
```

Then visit: https://huggingface.co/spaces/yourname/my-llm-app
## Streamlit Cloud

### Prepare the Code

Same as above: reuse the Streamlit app from the previous section.
### Deploy

```bash
# 1. Push the code to GitHub
git remote add origin https://github.com/yourname/llm-app.git
git push -u origin main

# 2. Go to https://share.streamlit.io
# 3. Connect your GitHub account
# 4. Select the repository and the main file
# 5. Click Deploy
```
### Configure Environment Variables

In the deployment settings, add:
- `OPENAI_API_KEY`
## Docker Deployment

### Create a Dockerfile

```dockerfile
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Expose the Streamlit port
EXPOSE 8501

# Start the app
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```
### Create `docker-compose.yml`

```yaml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./data:/app/data
```
### Build and Run

```bash
# Build the image
docker-compose build

# Run the container in the background
docker-compose up -d

# Then open http://localhost:8501
```
## AWS Deployment

### API Gateway + Lambda

#### 1. Create the Lambda Function
```python
import json
import os

from langchain_openai import ChatOpenAI

def lambda_handler(event, context):
    # Parse the request
    body = json.loads(event['body'])
    query = body.get('query', '')

    # Initialize the LLM
    llm = ChatOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    # Generate a response
    response = llm.invoke(query)

    # Return the result
    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'
        },
        'body': json.dumps({
            'answer': response.content
        })
    }
```
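Before deploying, the handler logic can be exercised locally by feeding it a hand-built API Gateway proxy event and stubbing out the LLM call. A minimal sketch, where `FakeLLM` and `local_handler` are hypothetical stand-ins (the real handler above uses `ChatOpenAI` and needs an API key):

```python
import json

class FakeLLM:
    """Hypothetical stand-in for ChatOpenAI so the handler runs offline."""
    class _Resp:
        def __init__(self, content):
            self.content = content

    def invoke(self, query):
        return self._Resp(f"echo: {query}")

def local_handler(event, context, llm=FakeLLM()):
    """Same parsing/response logic as lambda_handler, with the LLM injected."""
    body = json.loads(event["body"])
    query = body.get("query", "")
    response = llm.invoke(query)
    return {
        "statusCode": 200,
        "body": json.dumps({"answer": response.content}),
    }

# Simulate the event API Gateway would deliver
event = {"body": json.dumps({"query": "hi"})}
result = local_handler(event, None)
print(result["statusCode"])                   # 200
print(json.loads(result["body"])["answer"])   # echo: hi
```

Injecting the LLM as a parameter keeps the request-parsing logic testable without AWS credentials or network access.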
#### 2. Deploy the Lambda

```bash
# Using AWS SAM (the Serverless Framework also works)
pip install awscli
pip install aws-sam-cli

# Initialize a project
sam init

# Build
sam build

# Deploy
sam deploy
```
### ECS (Elastic Container Service)

#### 1. Create a Task Definition
```json
{
  "family": "llm-app",
  "containerDefinitions": [
    {
      "name": "llm-app",
      "image": "your-docker-image",
      "memory": 2048,
      "cpu": 1024,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8501,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "OPENAI_API_KEY",
          "value": "your-api-key"
        }
      ]
    }
  ]
}
```
#### 2. Create the Service

```bash
aws ecs create-service \
  --cluster my-cluster \
  --service-name llm-app \
  --task-definition llm-app \
  --desired-count 1 \
  --launch-type FARGATE
```
## Google Cloud Run

### 1. Build the Image

```bash
# Build with Cloud Build
gcloud builds submit --tag gcr.io/PROJECT_ID/llm-app

# Or build and push with Docker
docker build -t gcr.io/PROJECT_ID/llm-app .
docker push gcr.io/PROJECT_ID/llm-app
```
### 2. Deploy

```bash
gcloud run deploy llm-app \
  --image gcr.io/PROJECT_ID/llm-app \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars OPENAI_API_KEY=your-api-key
```
### 3. Access

```bash
# Get the service URL
gcloud run services describe llm-app \
  --platform managed \
  --region us-central1 \
  --format 'value(status.url)'
```
## Azure Deployment

### Azure App Service

```bash
# Create a resource group
az group create --name myResourceGroup --location eastus

# Create an App Service plan
az appservice plan create \
  --name myPlan \
  --resource-group myResourceGroup \
  --sku B1 \
  --is-linux

# Create the web app
az webapp create \
  --name my-llm-app \
  --resource-group myResourceGroup \
  --plan myPlan \
  --runtime "PYTHON|3.11"

# Configure environment variables
az webapp config appsettings set \
  --resource-group myResourceGroup \
  --name my-llm-app \
  --settings OPENAI_API_KEY=your-api-key

# Deploy via git
git remote add azure https://your-name@my-llm-app.scm.azurewebsites.net:443/my-llm-app.git
git push azure main
```
## FastAPI + Docker

### Create the FastAPI App

`main.py`:
```python
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

app = FastAPI()

# Initialize the LLM
llm = ChatOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

class QueryRequest(BaseModel):
    query: str

@app.post("/generate")
async def generate(request: QueryRequest):
    """Generation endpoint."""
    try:
        response = llm.invoke(request.query)
        return {"answer": response.content}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    """Health check."""
    return {"status": "healthy"}
```
### Create a Dockerfile

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
### Deploy to Any Platform

```bash
# Build the image
docker build -t llm-api .

# Run the container
docker run -d -p 8000:8000 \
  -e OPENAI_API_KEY=your-api-key \
  llm-api
```
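Once the container is running, the API can be called from any HTTP client. A small sketch using only the Python standard library; the `/generate` path and `localhost:8000` base URL match the FastAPI app above, and the `build_generate_request` helper is just an illustrative name:

```python
import json
import urllib.request

def build_generate_request(query, base_url="http://localhost:8000"):
    """Build a POST request for the /generate endpoint (no network I/O yet)."""
    data = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("Hello")
print(req.full_url)      # http://localhost:8000/generate
print(req.get_method())  # POST

# Against a running container:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["answer"])
```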
## Performance Optimization

### 1. Use a CDN

Serve static assets through a CDN such as Cloudflare or AWS CloudFront, so cached files are delivered from edge locations instead of hitting your app server.
### 2. Load Balancing

```yaml
# docker-compose.yml
# Note: the `deploy` section takes effect in Docker Swarm mode (docker stack deploy)
version: '3.8'
services:
  app:
    image: llm-api
    deploy:
      replicas: 3  # run three instances
    ports:
      - "8000:8000"
```
### 3. Auto Scaling

```bash
# AWS Auto Scaling (substitute your own launch template ID)
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name llm-app-asg \
  --launch-template LaunchTemplateId=your-launch-template-id \
  --min-size 1 \
  --max-size 10
```
## Monitoring and Logging

### 1. Log Collection
```python
import logging
from logging.handlers import RotatingFileHandler

# Configure logging: rotate files at 10 MB, keep 5 backups, also log to stdout
logging.basicConfig(
    level=logging.INFO,
    handlers=[
        RotatingFileHandler('app.log', maxBytes=10*1024*1024, backupCount=5),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

@app.post("/generate")
async def generate(request: QueryRequest):
    logger.info(f"Received request: {request.query}")
    # ...
```
### 2. Metrics
```python
from fastapi import Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    Counter,
    Histogram,
    generate_latest,
)

# Metrics
request_counter = Counter('requests_total', 'Total requests')
request_duration = Histogram('request_duration_seconds', 'Request duration')

@app.middleware("http")
async def add_metrics(request, call_next):
    request_counter.inc()
    with request_duration.time():
        response = await call_next(request)
    return response

@app.get("/metrics")
async def metrics():
    # generate_latest() returns bytes; wrap it with the Prometheus content type
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
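The middleware above just counts each request and times how long it takes. The same bookkeeping can be sketched without the `prometheus_client` dependency to make the mechanics visible; `SimpleCounter`, `SimpleTimer`, and `handle_with_metrics` are illustrative stand-ins, not part of any library:

```python
import time

class SimpleCounter:
    """Illustrative stand-in for prometheus_client.Counter."""
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1

class SimpleTimer:
    """Illustrative stand-in for prometheus_client.Histogram."""
    def __init__(self):
        self.samples = []
    def observe(self, seconds):
        self.samples.append(seconds)

request_counter = SimpleCounter()
request_duration = SimpleTimer()

def handle_with_metrics(handler, *args):
    """Wrap a handler call: count it and record its duration."""
    request_counter.inc()
    start = time.perf_counter()
    try:
        return handler(*args)
    finally:
        request_duration.observe(time.perf_counter() - start)

result = handle_with_metrics(lambda x: x.upper(), "ok")
print(result, request_counter.value)  # OK 1
```

The `try`/`finally` mirrors what the real middleware guarantees: the duration is recorded even if the handler raises.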
## Key Takeaways

- ✅ Hugging Face Spaces is the simplest option, ideal for prototypes
- ✅ Streamlit Cloud offers one-click deployment for Streamlit apps
- ✅ Docker containers make cross-platform deployment straightforward
- ✅ AWS Lambda is pay-per-use, a good fit for API services
- ✅ Google Cloud Run is fully managed, minimizing operations work
- ✅ Monitoring and logging are essential in production
**Next step:** learn about monitoring and maintenance 📊