With the explosive growth of large language models such as ChatGPT, Claude, ERNIE Bot, and iFLYTEK Spark, more and more apps are integrating AI capabilities. Whether for intelligent customer service, content generation, coding assistance, or personalized recommendations, LLMs can deliver a step change in user experience.
This article takes a deep dive into how to integrate the mainstream LLM services into a mobile app cleanly, covering architecture design, API integration, streaming response handling, and security.
1. Choosing an Integration Architecture
Architecture Comparison
There are generally three architecture patterns for integrating an LLM into an app:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Architecture Comparison │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Option 1: Direct client connection │
│ ┌──────────┐ ┌──────────────┐ │
│ │ App │ ──────→ │ LLM API │ │
│ └──────────┘ │ (OpenAI etc.) │ │
│ └──────────────┘ │
│ Pros: low latency, simple to implement │
│ Cons: API key exposure risk, no centralized control │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ Option 2: Server-side proxy (recommended) │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ App │ ──→ │ Your backend │ ──→ │ LLM API │ │
│ └──────────┘ └──────────────┘ └──────────────┘ │
│ Pros: keys stay on the server; enables rate limiting/auditing/caching; │
│ easy multi-model switching │
│ Cons: a backend to maintain; one extra network hop │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ Option 3: On-device model │
│ ┌──────────┐ ┌──────────────┐ │
│ │ App │ ──→ │ Local LLM │ │
│ └──────────┘ │ (Core ML etc.) │ │
│ └──────────────┘ │
│ Pros: no network dependency, privacy-preserving, no API costs │
│ Cons: limited model capability, consumes device resources │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Recommended Architecture: Server-Side Proxy
For production, the server-side proxy architecture is strongly recommended:
┌─────────────────────────────────────────────────────────────────┐
│ Full Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ │
│ │ iOS │──┐ │
│ │ App │ │ ┌────────────────────────────────┐ │
│ └─────────┘ │ │ API Gateway │ │
│ ├─────→│ • Authentication (JWT/OAuth) │ │
│ ┌─────────┐ │ │ • Rate limiting │ │
│ │ Android │──┤ │ • Request logging │ │
│ │ App │ │ └────────────┬───────────────────┘ │
│ └─────────┘ │ │ │
│ │ ▼ │
│ ┌─────────┐ │ ┌────────────────────────────────┐ │
│ │ Web │──┘ │ LLM Service Layer │ │
│ │ Client │ │ • Prompt template management │ │
│ └─────────┘ │ • Multi-model routing │ │
│ │ • Response caching │ │
│ │ • Content moderation │ │
│ └────────────┬───────────────────┘ │
│ │ │
│ ┌─────────────────────┼─────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ OpenAI │ │ Claude │ │ Spark │ │
│ │ API │ │ API │ │ API │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
2. Comparing the Mainstream LLM Provider APIs
Basic Comparison
| Provider | Model | Context Window | Pricing (input/output) | Notes |
|---|---|---|---|---|
| OpenAI | GPT-4o | 128K | $2.5/$10 per M tokens | Strongest overall |
| OpenAI | GPT-4o-mini | 128K | $0.15/$0.6 per M tokens | Best value |
| Anthropic | Claude 3.5 Sonnet | 200K | $3/$15 per M tokens | Strong at long documents and code |
| Anthropic | Claude 3 Haiku | 200K | $0.25/$1.25 per M tokens | Fast responses |
| iFLYTEK | Spark 4.0 Ultra | 128K | ¥0.14/¥0.14 per 1K tokens | Optimized for Chinese |
| Baidu | ERNIE Bot 4.0 | 128K | ¥0.12/¥0.12 per 1K tokens | Domestic (China) alternative |
| Alibaba | Qwen Max | 32K | ¥0.02/¥0.06 per 1K tokens | Price advantage |
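To put these rates in perspective, here is a rough back-of-the-envelope cost calculation for a single GPT-4o request; the request sizes are made-up illustrative numbers, not measurements:

// Rough cost of one GPT-4o request at $2.5 (input) / $10 (output) per 1M tokens.
let inputTokens = 2_000.0 // prompt + conversation history (hypothetical)
let outputTokens = 500.0 // model reply (hypothetical)
let costUSD = inputTokens / 1_000_000 * 2.5 + outputTokens / 1_000_000 * 10.0
print(String(format: "≈ $%.4f per request", costUSD)) // ≈ $0.0100

At about one cent per request, costs are negligible for a demo but add up quickly at scale, which is one more reason to route traffic through your own server where it can be metered.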
API Style Comparison
Although the APIs are functionally similar, the request formats and response structures differ:
// OpenAI style (the de facto standard; Claude is close, but takes system as a top-level field)
{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"stream": true
}
// iFLYTEK Spark style
{
"header": {"app_id": "xxx"},
"parameter": {"chat": {"domain": "4.0Ultra", "temperature": 0.5}},
"payload": {"message": {"text": [{"role": "user", "content": "Hello!"}]}}
}
3. iOS Client Integration in Practice
3.1 Network Layer Abstraction
First, define a unified message model and a service protocol:
// MARK: - Message Model
enum MessageRole: String, Codable {
case system
case user
case assistant
}
struct ChatMessage: Codable, Identifiable, Equatable { // Equatable: needed by RequestDebouncer in 7.2
let id: UUID
let role: MessageRole
var content: String
let timestamp: Date
init(role: MessageRole, content: String) {
self.id = UUID()
self.role = role
self.content = content
self.timestamp = Date()
}
}
// MARK: - LLM Service Protocol
protocol LLMServiceProtocol {
/// Non-streaming request
func sendMessage(_ messages: [ChatMessage]) async throws -> String
/// Streaming request
func streamMessage(_ messages: [ChatMessage]) -> AsyncThrowingStream<String, Error>
}
3.2 OpenAI API Integration
import Foundation
class OpenAIService: LLMServiceProtocol {
private let apiKey: String
private let baseURL: URL
private let model: String
private let session: URLSession
init(
apiKey: String,
baseURL: URL = URL(string: "https://api.openai.com/v1")!,
model: String = "gpt-4o"
) {
self.apiKey = apiKey
self.baseURL = baseURL
self.model = model
let config = URLSessionConfiguration.default
config.timeoutIntervalForRequest = 60
config.timeoutIntervalForResource = 300
self.session = URLSession(configuration: config)
}
// MARK: - Non-Streaming Request
func sendMessage(_ messages: [ChatMessage]) async throws -> String {
let request = try buildRequest(messages: messages, stream: false)
let (data, response) = try await session.data(for: request)
guard let httpResponse = response as? HTTPURLResponse else {
throw LLMError.invalidResponse
}
guard httpResponse.statusCode == 200 else {
let errorInfo = try? JSONDecoder().decode(OpenAIErrorResponse.self, from: data)
throw LLMError.apiError(
code: httpResponse.statusCode,
message: errorInfo?.error.message ?? "Unknown error"
)
}
let result = try JSONDecoder().decode(OpenAIChatResponse.self, from: data)
return result.choices.first?.message.content ?? ""
}
// MARK: - Streaming Request (SSE)
func streamMessage(_ messages: [ChatMessage]) -> AsyncThrowingStream<String, Error> {
AsyncThrowingStream { continuation in
Task {
do {
let request = try buildRequest(messages: messages, stream: true)
let (bytes, response) = try await session.bytes(for: request)
guard let httpResponse = response as? HTTPURLResponse,
httpResponse.statusCode == 200 else {
throw LLMError.invalidResponse
}
// Parse the SSE stream
for try await line in bytes.lines {
// SSE format: "data: {...}"
guard line.hasPrefix("data: ") else { continue }
let jsonString = String(line.dropFirst(6))
// End-of-stream marker
if jsonString == "[DONE]" {
break
}
// Decode the JSON chunk
guard let data = jsonString.data(using: .utf8),
let chunk = try? JSONDecoder().decode(
OpenAIStreamChunk.self, from: data
),
let content = chunk.choices.first?.delta.content else {
continue
}
continuation.yield(content)
}
continuation.finish()
} catch {
continuation.finish(throwing: error)
}
}
}
}
// MARK: - Request Building
private func buildRequest(messages: [ChatMessage], stream: Bool) throws -> URLRequest {
var request = URLRequest(url: baseURL.appendingPathComponent("chat/completions"))
request.httpMethod = "POST"
request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
let body = OpenAIChatRequest(
model: model,
messages: messages.map { OpenAIMessage(role: $0.role.rawValue, content: $0.content) },
stream: stream,
temperature: 0.7,
maxTokens: 4096
)
request.httpBody = try JSONEncoder().encode(body)
return request
}
}
// MARK: - OpenAI Data Models
struct OpenAIChatRequest: Codable {
let model: String
let messages: [OpenAIMessage]
let stream: Bool
let temperature: Double
let maxTokens: Int
enum CodingKeys: String, CodingKey {
case model, messages, stream, temperature
case maxTokens = "max_tokens"
}
}
struct OpenAIMessage: Codable {
let role: String
let content: String
}
struct OpenAIChatResponse: Codable {
let choices: [Choice]
struct Choice: Codable {
let message: OpenAIMessage
}
}
struct OpenAIStreamChunk: Codable {
let choices: [StreamChoice]
struct StreamChoice: Codable {
let delta: Delta
}
struct Delta: Codable {
let content: String?
}
}
struct OpenAIErrorResponse: Codable {
let error: ErrorDetail
struct ErrorDetail: Codable {
let message: String
}
}
// MARK: - Error Definitions
enum LLMError: LocalizedError {
case invalidResponse
case apiError(code: Int, message: String)
case networkError(Error)
var errorDescription: String? {
switch self {
case .invalidResponse:
return "Invalid response from server"
case .apiError(let code, let message):
return "API Error (\(code)): \(message)"
case .networkError(let error):
return "Network Error: \(error.localizedDescription)"
}
}
}
3.3 Claude API Integration
class ClaudeService: LLMServiceProtocol {
private let apiKey: String
private let baseURL: URL
private let model: String
private let session: URLSession
init(
apiKey: String,
baseURL: URL = URL(string: "https://api.anthropic.com")!,
model: String = "claude-3-5-sonnet-20241022"
) {
self.apiKey = apiKey
self.baseURL = baseURL
self.model = model
self.session = URLSession(configuration: .default)
}
func sendMessage(_ messages: [ChatMessage]) async throws -> String {
let request = try buildRequest(messages: messages, stream: false)
let (data, response) = try await session.data(for: request)
guard let httpResponse = response as? HTTPURLResponse,
httpResponse.statusCode == 200 else {
throw LLMError.invalidResponse
}
let result = try JSONDecoder().decode(ClaudeResponse.self, from: data)
return result.content.first?.text ?? ""
}
func streamMessage(_ messages: [ChatMessage]) -> AsyncThrowingStream<String, Error> {
AsyncThrowingStream { continuation in
Task {
do {
let request = try buildRequest(messages: messages, stream: true)
let (bytes, response) = try await session.bytes(for: request)
guard let httpResponse = response as? HTTPURLResponse,
httpResponse.statusCode == 200 else {
throw LLMError.invalidResponse
}
// Claude's SSE format differs slightly: each data payload carries an event type
readLoop: for try await line in bytes.lines {
guard line.hasPrefix("data: ") else { continue }
let jsonString = String(line.dropFirst(6))
guard let data = jsonString.data(using: .utf8),
let event = try? JSONDecoder().decode(
ClaudeStreamEvent.self, from: data
) else { continue }
switch event.type {
case "content_block_delta":
if let text = event.delta?.text {
continuation.yield(text)
}
case "message_stop":
// A plain `break` would only exit the switch, so label the loop
break readLoop
default:
continue
}
}
continuation.finish()
} catch {
continuation.finish(throwing: error)
}
}
}
}
private func buildRequest(messages: [ChatMessage], stream: Bool) throws -> URLRequest {
var request = URLRequest(url: baseURL.appendingPathComponent("v1/messages"))
request.httpMethod = "POST"
request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
// Claude requires the system message as a separate top-level field
let systemMessage = messages.first { $0.role == .system }?.content
let chatMessages = messages.filter { $0.role != .system }
let body = ClaudeRequest(
model: model,
maxTokens: 4096,
system: systemMessage,
messages: chatMessages.map {
ClaudeMessage(role: $0.role.rawValue, content: $0.content)
},
stream: stream
)
request.httpBody = try JSONEncoder().encode(body)
return request
}
}
// MARK: - Claude Data Models
struct ClaudeRequest: Codable {
let model: String
let maxTokens: Int
let system: String?
let messages: [ClaudeMessage]
let stream: Bool
enum CodingKeys: String, CodingKey {
case model, system, messages, stream
case maxTokens = "max_tokens"
}
}
struct ClaudeMessage: Codable {
let role: String
let content: String
}
struct ClaudeResponse: Codable {
let content: [ContentBlock]
struct ContentBlock: Codable {
let text: String
}
}
struct ClaudeStreamEvent: Codable {
let type: String
let delta: Delta?
struct Delta: Codable {
let text: String?
}
}
3.4 iFLYTEK Spark API Integration (WebSocket)
iFLYTEK Spark streams over WebSocket rather than SSE, which requires special handling:
import Foundation
import CryptoKit
class SparkService: NSObject, LLMServiceProtocol {
private let appId: String
private let apiKey: String
private let apiSecret: String
private let model: SparkModel
private var webSocketTask: URLSessionWebSocketTask?
private var streamContinuation: AsyncThrowingStream<String, Error>.Continuation?
enum SparkModel: String {
case ultra = "4.0Ultra"
case max = "generalv3.5"
case pro = "generalv3"
case lite = "generalv2"
var domain: String { rawValue }
var wsURL: String {
switch self {
case .ultra: return "wss://spark-api.xf-yun.com/v4.0/chat"
case .max: return "wss://spark-api.xf-yun.com/v3.5/chat"
case .pro: return "wss://spark-api.xf-yun.com/v3.1/chat"
case .lite: return "wss://spark-api.xf-yun.com/v2.1/chat"
}
}
}
init(appId: String, apiKey: String, apiSecret: String, model: SparkModel = .ultra) {
self.appId = appId
self.apiKey = apiKey
self.apiSecret = apiSecret
self.model = model
super.init()
}
func sendMessage(_ messages: [ChatMessage]) async throws -> String {
var fullContent = ""
for try await chunk in streamMessage(messages) {
fullContent += chunk
}
return fullContent
}
func streamMessage(_ messages: [ChatMessage]) -> AsyncThrowingStream<String, Error> {
AsyncThrowingStream { [weak self] continuation in
guard let self = self else {
continuation.finish(throwing: LLMError.invalidResponse)
return
}
self.streamContinuation = continuation
Task {
do {
// Generate the signed auth URL
let authURL = try self.generateAuthURL()
// Open the WebSocket connection
let session = URLSession(
configuration: .default,
delegate: self,
delegateQueue: nil
)
self.webSocketTask = session.webSocketTask(with: authURL)
self.webSocketTask?.resume()
// Send the chat request
let request = self.buildSparkRequest(messages: messages)
let data = try JSONEncoder().encode(request)
let message = URLSessionWebSocketTask.Message.data(data)
try await self.webSocketTask?.send(message)
// Start receiving responses
self.receiveMessage()
} catch {
continuation.finish(throwing: error)
}
}
}
}
private func receiveMessage() {
webSocketTask?.receive { [weak self] result in
guard let self = self else { return }
switch result {
case .success(let message):
switch message {
case .data(let data):
self.handleResponseData(data)
case .string(let text):
if let data = text.data(using: .utf8) {
self.handleResponseData(data)
}
@unknown default:
break
}
case .failure(let error):
self.streamContinuation?.finish(throwing: error)
}
}
}
private func handleResponseData(_ data: Data) {
guard let response = try? JSONDecoder().decode(SparkResponse.self, from: data) else {
return
}
// Extract the text content
if let text = response.payload?.choices?.text?.first?.content {
streamContinuation?.yield(text)
}
// Check whether the stream has finished (status == 2)
if response.header?.status == 2 {
streamContinuation?.finish()
webSocketTask?.cancel(with: .goingAway, reason: nil)
} else {
// Keep receiving
receiveMessage()
}
}
// MARK: - Auth URL Generation
private func generateAuthURL() throws -> URL {
guard var components = URLComponents(string: model.wsURL) else {
throw LLMError.invalidResponse
}
let dateFormatter = DateFormatter()
dateFormatter.dateFormat = "EEE, dd MMM yyyy HH:mm:ss z"
dateFormatter.locale = Locale(identifier: "en_US_POSIX")
dateFormatter.timeZone = TimeZone(abbreviation: "GMT")
let date = dateFormatter.string(from: Date())
// Build the string to sign
let signatureOrigin = """
host: \(components.host ?? "")
date: \(date)
GET \(components.path) HTTP/1.1
"""
// Sign it with HMAC-SHA256
let key = SymmetricKey(data: Data(apiSecret.utf8))
let signature = HMAC<SHA256>.authenticationCode(
for: Data(signatureOrigin.utf8),
using: key
)
let signatureBase64 = Data(signature).base64EncodedString()
// Assemble the authorization parameter
let authorizationOrigin = """
api_key="\(apiKey)", algorithm="hmac-sha256", headers="host date request-line", signature="\(signatureBase64)"
"""
let authorization = Data(authorizationOrigin.utf8).base64EncodedString()
// Append the query parameters
components.queryItems = [
URLQueryItem(name: "authorization", value: authorization),
URLQueryItem(name: "date", value: date),
URLQueryItem(name: "host", value: components.host)
]
guard let url = components.url else {
throw LLMError.invalidResponse
}
return url
}
private func buildSparkRequest(messages: [ChatMessage]) -> SparkRequest {
SparkRequest(
header: .init(appId: appId),
parameter: .init(chat: .init(domain: model.domain, temperature: 0.5)),
payload: .init(message: .init(
text: messages.map { .init(role: $0.role.rawValue, content: $0.content) }
))
)
}
}
extension SparkService: URLSessionWebSocketDelegate {
func urlSession(
_ session: URLSession,
webSocketTask: URLSessionWebSocketTask,
didOpenWithProtocol protocol: String?
) {
print("WebSocket connected")
}
func urlSession(
_ session: URLSession,
webSocketTask: URLSessionWebSocketTask,
didCloseWith closeCode: URLSessionWebSocketTask.CloseCode,
reason: Data?
) {
print("WebSocket closed: \(closeCode)")
}
}
// MARK: - Spark Data Models
struct SparkRequest: Codable {
let header: Header
let parameter: Parameter
let payload: Payload
struct Header: Codable {
let appId: String
enum CodingKeys: String, CodingKey {
case appId = "app_id"
}
}
struct Parameter: Codable {
let chat: Chat
struct Chat: Codable {
let domain: String
let temperature: Double
}
}
struct Payload: Codable {
let message: Message
struct Message: Codable {
let text: [TextItem]
}
}
struct TextItem: Codable {
let role: String
let content: String
}
}
struct SparkResponse: Codable {
let header: Header?
let payload: Payload?
struct Header: Codable {
let code: Int?
let status: Int?
}
struct Payload: Codable {
let choices: Choices?
struct Choices: Codable {
let text: [TextItem]?
}
}
struct TextItem: Codable {
let content: String?
}
}
4. Unified Service Management Layer
Multi-Model Routing
class LLMManager {
enum Provider: String, CaseIterable {
case openAI = "OpenAI"
case claude = "Claude"
case spark = "iFLYTEK Spark"
}
static let shared = LLMManager()
private var services: [Provider: LLMServiceProtocol] = [:]
private var currentProvider: Provider = .openAI
private init() {}
// MARK: - Configuration
func configure(provider: Provider, service: LLMServiceProtocol) {
services[provider] = service
}
func setCurrentProvider(_ provider: Provider) {
guard services[provider] != nil else {
fatalError("Provider \(provider) not configured")
}
currentProvider = provider
}
// MARK: - Convenience Configuration
func configureOpenAI(apiKey: String, baseURL: URL? = nil) {
let service = OpenAIService(
apiKey: apiKey,
baseURL: baseURL ?? URL(string: "https://api.openai.com/v1")!
)
configure(provider: .openAI, service: service)
}
func configureClaude(apiKey: String) {
let service = ClaudeService(apiKey: apiKey)
configure(provider: .claude, service: service)
}
func configureSpark(appId: String, apiKey: String, apiSecret: String) {
let service = SparkService(appId: appId, apiKey: apiKey, apiSecret: apiSecret)
configure(provider: .spark, service: service)
}
// MARK: - Public Interface
var currentService: LLMServiceProtocol {
guard let service = services[currentProvider] else {
fatalError("No service configured for provider \(currentProvider)")
}
return service
}
func sendMessage(_ messages: [ChatMessage]) async throws -> String {
try await currentService.sendMessage(messages)
}
func streamMessage(_ messages: [ChatMessage]) -> AsyncThrowingStream<String, Error> {
currentService.streamMessage(messages)
}
// MARK: - Smart Routing (pick the best model for the task type)
func sendWithBestModel(
_ messages: [ChatMessage],
taskType: TaskType
) async throws -> String {
let provider = selectBestProvider(for: taskType)
guard let service = services[provider] else {
return try await sendMessage(messages)
}
return try await service.sendMessage(messages)
}
enum TaskType {
case codeGeneration
case creativeWriting
case translation
case summarization
case conversation
}
private func selectBestProvider(for taskType: TaskType) -> Provider {
switch taskType {
case .codeGeneration:
// Claude is strong at code
return services[.claude] != nil ? .claude : currentProvider
case .creativeWriting:
return services[.openAI] != nil ? .openAI : currentProvider
case .translation, .summarization:
// Domestic Chinese models handle Chinese text well
return services[.spark] != nil ? .spark : currentProvider
case .conversation:
return currentProvider
}
}
}
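With the manager in place, configuration and use stay compact. A minimal usage sketch; the key variables are placeholders that should come from your own backend or the Keychain (see section 6), never from source code:

// At app startup (openAIKey / claudeKey are hypothetical placeholders):
LLMManager.shared.configureOpenAI(apiKey: openAIKey)
LLMManager.shared.configureClaude(apiKey: claudeKey)
LLMManager.shared.setCurrentProvider(.openAI)

// Anywhere in the app:
Task {
let messages = [ChatMessage(role: .user, content: "Summarize SSE in one sentence.")]
// sendWithBestModel(_:taskType:) would route a .codeGeneration task to Claude instead.
let reply = try await LLMManager.shared.sendMessage(messages)
print(reply)
}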
5. Chat UI Implementation
ViewModel Design
import SwiftUI
import Combine
@MainActor
class ChatViewModel: ObservableObject {
@Published var messages: [ChatMessage] = []
@Published var inputText: String = ""
@Published var isLoading: Bool = false
@Published var error: Error?
@Published var streamingContent: String = ""
private let llmManager = LLMManager.shared
private var streamTask: Task<Void, Never>?
// System prompt
var systemPrompt: String = """
You are a friendly, professional AI assistant. Answer questions in clear, concise language.
If you are not sure of an answer, say so honestly.
"""
// MARK: - Send Message (Streaming)
func sendMessage() {
let userMessage = inputText.trimmingCharacters(in: .whitespacesAndNewlines)
guard !userMessage.isEmpty, !isLoading else { return }
// Append the user message
messages.append(ChatMessage(role: .user, content: userMessage))
inputText = ""
// Append an empty assistant message as a placeholder
let assistantMessage = ChatMessage(role: .assistant, content: "")
messages.append(assistantMessage)
isLoading = true
streamingContent = ""
// Kick off the streaming request
streamTask = Task {
do {
// Build the full message list (with the system prompt)
var fullMessages = [ChatMessage(role: .system, content: systemPrompt)]
fullMessages.append(contentsOf: messages.dropLast()) // exclude the empty assistant placeholder
let stream = llmManager.streamMessage(fullMessages)
for try await chunk in stream {
streamingContent += chunk
// Update the last (assistant) message in place
if let lastIndex = messages.indices.last {
messages[lastIndex].content = streamingContent
}
}
isLoading = false
} catch {
self.error = error
isLoading = false
// Remove the failed assistant placeholder
if messages.last?.role == .assistant && messages.last?.content.isEmpty == true {
messages.removeLast()
}
}
}
}
// MARK: - Cancel the Request
func cancelStream() {
streamTask?.cancel()
streamTask = nil
isLoading = false
}
// MARK: - Clear the Conversation
func clearMessages() {
messages.removeAll()
streamingContent = ""
}
// MARK: - Regenerate
func regenerateLastResponse() {
guard messages.count >= 2,
messages.last?.role == .assistant else { return }
messages.removeLast()
// Resend the last user message
if let lastUserMessage = messages.last, lastUserMessage.role == .user {
inputText = lastUserMessage.content
messages.removeLast()
sendMessage()
}
}
}
SwiftUI Chat View
import SwiftUI
struct ChatView: View {
@StateObject private var viewModel = ChatViewModel()
@FocusState private var isInputFocused: Bool
var body: some View {
VStack(spacing: 0) {
// Message list
ScrollViewReader { proxy in
ScrollView {
LazyVStack(spacing: 16) {
ForEach(viewModel.messages) { message in
MessageBubble(message: message)
.id(message.id)
}
}
.padding()
}
.onChange(of: viewModel.messages.count) { _ in
withAnimation {
proxy.scrollTo(viewModel.messages.last?.id, anchor: .bottom)
}
}
.onChange(of: viewModel.streamingContent) { _ in
withAnimation {
proxy.scrollTo(viewModel.messages.last?.id, anchor: .bottom)
}
}
}
Divider()
// Input area
InputBar(
text: $viewModel.inputText,
isLoading: viewModel.isLoading,
onSend: viewModel.sendMessage,
onCancel: viewModel.cancelStream
)
.focused($isInputFocused)
}
.navigationTitle("AI Assistant")
.toolbar {
ToolbarItem(placement: .navigationBarTrailing) {
Menu {
ForEach(LLMManager.Provider.allCases, id: \.self) { provider in
Button(provider.rawValue) {
LLMManager.shared.setCurrentProvider(provider)
}
}
} label: {
Image(systemName: "ellipsis.circle")
}
}
}
.alert("Error", isPresented: .constant(viewModel.error != nil)) {
Button("OK") { viewModel.error = nil }
} message: {
Text(viewModel.error?.localizedDescription ?? "")
}
}
}
// MARK: - Message Bubble
struct MessageBubble: View {
let message: ChatMessage
var body: some View {
HStack(alignment: .top, spacing: 12) {
if message.role == .user {
Spacer(minLength: 60)
}
// Assistant avatar
if message.role == .assistant {
Image(systemName: "brain.head.profile")
.font(.title2)
.foregroundColor(.purple)
.frame(width: 36, height: 36)
.background(Color.purple.opacity(0.1))
.clipShape(Circle())
}
// Message content
VStack(alignment: message.role == .user ? .trailing : .leading, spacing: 4) {
Text(message.content)
.padding(.horizontal, 14)
.padding(.vertical, 10)
.background(
message.role == .user
? Color.blue
: Color(.systemGray5)
)
.foregroundColor(message.role == .user ? .white : .primary)
.cornerRadius(18)
.textSelection(.enabled)
Text(message.timestamp, style: .time)
.font(.caption2)
.foregroundColor(.secondary)
}
// User avatar
if message.role == .user {
Image(systemName: "person.fill")
.font(.title3)
.foregroundColor(.blue)
.frame(width: 36, height: 36)
.background(Color.blue.opacity(0.1))
.clipShape(Circle())
}
if message.role == .assistant {
Spacer(minLength: 60)
}
}
}
}
// MARK: - Input Bar
struct InputBar: View {
@Binding var text: String
let isLoading: Bool
let onSend: () -> Void
let onCancel: () -> Void
var body: some View {
HStack(spacing: 12) {
TextField("输入消息...", text: $text, axis: .vertical)
.textFieldStyle(.plain)
.padding(.horizontal, 16)
.padding(.vertical, 10)
.background(Color(.systemGray6))
.cornerRadius(20)
.lineLimit(1...5)
.disabled(isLoading)
Button {
if isLoading {
onCancel()
} else {
onSend()
}
} label: {
Image(systemName: isLoading ? "stop.fill" : "arrow.up.circle.fill")
.font(.system(size: 32))
.foregroundColor(
text.isEmpty && !isLoading ? .gray : .blue
)
}
.disabled(text.isEmpty && !isLoading)
}
.padding(.horizontal)
.padding(.vertical, 8)
.background(Color(.systemBackground))
}
}
6. Security Considerations
API Key Protection Strategies
┌───────────────────────────────────────────────────────────────────────────┐
│ API Key Security Levels │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ ❌ Worst: hard-coded in the client │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ let apiKey = "sk-xxxxxxxxxxxx" // trivially extracted by reversing │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ⚠️ Bad: stored in Info.plist or an xcconfig │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ // can still be extracted from the .ipa │ │
│ │ let apiKey = Bundle.main.infoDictionary?["API_KEY"] │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ⚠️ Medium: obfuscated storage │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ // raises the reversing bar, but still breakable in principle │ │
│ │ let apiKey = ObfuscatedKeys.openAIKey.decrypt() │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ✅ Recommended: server-side proxy │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ // API key lives only on the server; client uses a user token │ │
│ │ App → your backend (user auth) → LLM API (API key) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ✅ Best: server-side proxy + defense in depth │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ • User authentication (JWT) │ │
│ │ • Request signature verification │ │
│ │ • IP allowlist │ │
│ │ • Rate limiting (per user / per IP) │ │
│ │ • Request audit logging │ │
│ │ • Sensitive-word filtering │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└───────────────────────────────────────────────────────────────────────────┘
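On the client side this means storing a user token, not an API key. A minimal Keychain sketch for that token follows; the KeychainManager name matches the project layout in section 10, but this particular implementation is an illustrative assumption, not a library API:

import Foundation
import Security

enum KeychainManager {
/// Saves (or replaces) the user's auth token in the Keychain.
static func save(token: String, account: String = "llm.user.token") {
let query: [String: Any] = [
kSecClass as String: kSecClassGenericPassword,
kSecAttrAccount as String: account
]
SecItemDelete(query as CFDictionary) // drop any stale entry first
var attributes = query
attributes[kSecValueData as String] = Data(token.utf8)
SecItemAdd(attributes as CFDictionary, nil)
}
/// Loads the token, or returns nil if none is stored.
static func loadToken(account: String = "llm.user.token") -> String? {
let query: [String: Any] = [
kSecClass as String: kSecClassGenericPassword,
kSecAttrAccount as String: account,
kSecReturnData as String: true,
kSecMatchLimit as String: kSecMatchLimitOne
]
var item: CFTypeRef?
guard SecItemCopyMatching(query as CFDictionary, &item) == errSecSuccess,
let data = item as? Data else { return nil }
return String(data: data, encoding: .utf8)
}
}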
Server-Side Security Implementation (Node.js Example)
// server.js - example Express server-side proxy
const express = require('express');
const rateLimit = require('express-rate-limit');
const { OpenAI } = require('openai');
const jwt = require('jsonwebtoken');
const app = express();
app.use(express.json());
// Initialize the OpenAI client (the API key exists only on the server)
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY
});
// Rate-limiting middleware
const limiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 20, // at most 20 requests per window
message: { error: 'Too many requests, please try again later.' }
});
// JWT authentication middleware
const authenticateToken = (req, res, next) => {
const authHeader = req.headers['authorization'];
const token = authHeader && authHeader.split(' ')[1];
if (!token) {
return res.status(401).json({ error: 'No token provided' });
}
jwt.verify(token, process.env.JWT_SECRET, (err, user) => {
if (err) {
return res.status(403).json({ error: 'Invalid token' });
}
req.user = user;
next();
});
};
// Content moderation
const contentFilter = (messages) => {
const sensitiveWords = ['badword1', 'badword2']; // placeholders; use a professional moderation service in production
for (const msg of messages) {
for (const word of sensitiveWords) {
if (msg.content.includes(word)) {
throw new Error('Content contains sensitive words');
}
}
}
};
// Chat endpoint
app.post('/api/chat', authenticateToken, limiter, async (req, res) => {
try {
const { messages, stream = false } = req.body;
// Moderate content
contentFilter(messages);
// Log the request
console.log(`User ${req.user.id} sent message:`, messages.slice(-1)[0]?.content);
if (stream) {
// Streaming response
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
stream: true
});
for await (const chunk of completion) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
res.write(`data: ${JSON.stringify({ content })}\n\n`);
}
}
res.write('data: [DONE]\n\n');
res.end();
} else {
// Non-streaming response
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages
});
res.json({
content: completion.choices[0]?.message?.content || ''
});
}
} catch (error) {
console.error('Chat error:', error);
res.status(500).json({ error: error.message });
}
});
// Usage statistics endpoint
app.get('/api/usage', authenticateToken, async (req, res) => {
// Return the user's API usage stats
res.json({
userId: req.user.id,
requestsToday: 42,
tokensUsed: 15000,
limit: 100000
});
});
app.listen(3000, () => {
console.log('LLM Proxy Server running on port 3000');
});
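On the app side, the service then points at your proxy instead of the vendor. A sketch of a proxy-backed client; the base URL is a placeholder, and the /api/chat path and response shape match the Express example above:

// Talks to your own backend; authenticates with the user's JWT, never an API key.
class ProxyLLMService {
private let baseURL: URL
private let session = URLSession(configuration: .default)
init(baseURL: URL) {
self.baseURL = baseURL
}
func sendMessage(_ messages: [ChatMessage]) async throws -> String {
var request = URLRequest(url: baseURL.appendingPathComponent("api/chat"))
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
if let token = KeychainManager.loadToken() {
request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
}
let payload: [String: Any] = [
"messages": messages.map { ["role": $0.role.rawValue, "content": $0.content] }
]
request.httpBody = try JSONSerialization.data(withJSONObject: payload)
let (data, response) = try await session.data(for: request)
guard (response as? HTTPURLResponse)?.statusCode == 200 else {
throw LLMError.invalidResponse
}
// Non-streaming case: the proxy replies with { "content": "..." }.
struct Reply: Decodable { let content: String }
return try JSONDecoder().decode(Reply.self, from: data).content
}
}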
7. Performance Optimization Techniques
7.1 Response Caching
class LLMCache {
static let shared = LLMCache()
private let cache = NSCache<NSString, CachedResponse>()
private let fileManager = FileManager.default
private let cacheDirectory: URL
private init() {
cache.countLimit = 100
cache.totalCostLimit = 10 * 1024 * 1024 // 10MB
cacheDirectory = fileManager.urls(for: .cachesDirectory, in: .userDomainMask)[0]
.appendingPathComponent("LLMCache")
try? fileManager.createDirectory(at: cacheDirectory, withIntermediateDirectories: true)
}
// Generate a cache key
func cacheKey(for messages: [ChatMessage]) -> String {
let content = messages.map { "\($0.role.rawValue):\($0.content)" }.joined(separator: "|")
return content.sha256()
}
// Read from the cache
func get(for messages: [ChatMessage]) -> String? {
let key = cacheKey(for: messages)
// Check the in-memory cache first
if let cached = cache.object(forKey: key as NSString) {
if Date() < cached.expiresAt {
return cached.content
} else {
cache.removeObject(forKey: key as NSString)
}
}
// Then check the disk cache
let fileURL = cacheDirectory.appendingPathComponent(key)
if let data = try? Data(contentsOf: fileURL),
let cached = try? JSONDecoder().decode(CachedResponse.self, from: data),
Date() < cached.expiresAt {
// Promote to the in-memory cache
cache.setObject(cached, forKey: key as NSString)
return cached.content
}
return nil
}
// Write to the cache
func set(_ content: String, for messages: [ChatMessage], ttl: TimeInterval = 3600) {
let key = cacheKey(for: messages)
let cached = CachedResponse(
content: content,
expiresAt: Date().addingTimeInterval(ttl)
)
// Store in memory
cache.setObject(cached, forKey: key as NSString)
// Store on disk
let fileURL = cacheDirectory.appendingPathComponent(key)
if let data = try? JSONEncoder().encode(cached) {
try? data.write(to: fileURL)
}
}
}
class CachedResponse: NSObject, Codable {
let content: String
let expiresAt: Date
init(content: String, expiresAt: Date) {
self.content = content
self.expiresAt = expiresAt
}
}
import CommonCrypto // required for CC_SHA256

extension String {
func sha256() -> String {
let data = Data(self.utf8)
var hash = [UInt8](repeating: 0, count: Int(CC_SHA256_DIGEST_LENGTH))
data.withUnsafeBytes {
_ = CC_SHA256($0.baseAddress, CC_LONG(data.count), &hash)
}
return hash.map { String(format: "%02x", $0) }.joined()
}
}
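Wiring the cache into a request path then takes only a few lines. A sketch; note that caching only makes sense for deterministic, repeated prompts (FAQ-style queries), since normal LLM output varies between calls:

// Check the cache before spending tokens; store the reply on a miss.
func cachedSendMessage(
_ messages: [ChatMessage],
using service: LLMServiceProtocol
) async throws -> String {
if let hit = LLMCache.shared.get(for: messages) {
return hit // served locally: no network, no token cost
}
let reply = try await service.sendMessage(messages)
LLMCache.shared.set(reply, for: messages, ttl: 3600) // cache for one hour
return reply
}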
7.2 Request Coalescing and Debouncing
class RequestDebouncer {
private var pendingTask: Task<String, Error>?
private var lastMessages: [ChatMessage]?
private let debounceInterval: TimeInterval
init(debounceInterval: TimeInterval = 0.5) {
self.debounceInterval = debounceInterval
}
func debounce(
messages: [ChatMessage],
action: @escaping ([ChatMessage]) async throws -> String
) async throws -> String {
// Cancel the previous pending request
pendingTask?.cancel()
lastMessages = messages
// Keep a handle to the new work so the next call can cancel it
let task = Task<String, Error> {
// Wait out the debounce interval
try await Task.sleep(nanoseconds: UInt64(self.debounceInterval * 1_000_000_000))
// Bail out if a newer request has superseded this one
guard !Task.isCancelled, messages == self.lastMessages else {
throw CancellationError()
}
// Perform the actual request
return try await action(messages)
}
pendingTask = task
return try await task.value
}
}
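Typical usage from an async context, e.g. a live "suggest as you type" feature (the call site and draftMessages variable are hypothetical):

let debouncer = RequestDebouncer(debounceInterval: 0.5)
// Fired on every keystroke; only the last call within 0.5s reaches the LLM.
let suggestion = try await debouncer.debounce(messages: draftMessages) { msgs in
try await LLMManager.shared.sendMessage(msgs)
}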
7.3 Token Counting and Control
struct TokenCounter {
// Simplified token estimate (use a real tokenizer such as tiktoken in production)
static func estimateTokens(_ text: String) -> Int {
// English: roughly 4 characters per token
// Chinese: roughly 1-2 characters per token
let englishChars = text.filter { $0.isASCII }.count
let chineseChars = text.filter { !$0.isASCII }.count
return (englishChars / 4) + chineseChars
}
static func estimateTokens(_ messages: [ChatMessage]) -> Int {
messages.reduce(0) { $0 + estimateTokens($1.content) + 4 } // per-message formatting overhead
}
// Truncate messages to fit the context window
static func truncateMessages(
_ messages: [ChatMessage],
maxTokens: Int,
preserveSystem: Bool = true
) -> [ChatMessage] {
var result: [ChatMessage] = []
var tokenCount = 0
// Keep the system message
if preserveSystem, let systemMessage = messages.first(where: { $0.role == .system }) {
result.append(systemMessage)
tokenCount += estimateTokens(systemMessage.content)
}
// Walk the history backwards (keep the most recent turns)
let insertIndex = result.count // 1 if a system message was kept, else 0
let userAssistantMessages = messages.filter { $0.role != .system }.reversed()
for message in userAssistantMessages {
let messageTokens = estimateTokens(message.content)
if tokenCount + messageTokens > maxTokens {
break
}
result.insert(message, at: insertIndex)
tokenCount += messageTokens
}
return result
}
}
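Applied before every request, truncation keeps long conversations inside the model's window. A sketch; fullHistory is a hypothetical variable, and the 8_000 budget is an arbitrary illustration that leaves headroom below a 128K window for the reply:

// fullHistory: the complete stored conversation (hypothetical).
let trimmed = TokenCounter.truncateMessages(fullHistory, maxTokens: 8_000)
let reply = try await LLMManager.shared.sendMessage(trimmed)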
8. On-Device Model Integration
For offline or privacy-sensitive scenarios, an on-device model is worth considering:
Running a Local Model with Core ML
import CoreML
import NaturalLanguage
class LocalLLMService: LLMServiceProtocol {
private var model: MLModel?
private let modelURL: URL
init(modelName: String) {
// Assumes the model has already been downloaded to the Documents directory
modelURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
.appendingPathComponent("\(modelName).mlmodelc")
}
func loadModel() async throws {
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU // use GPU acceleration
model = try await MLModel.load(contentsOf: modelURL, configuration: config)
}
func sendMessage(_ messages: [ChatMessage]) async throws -> String {
guard let model = model else {
throw LLMError.invalidResponse
}
// Build the input
let prompt = messages.map { "\($0.role.rawValue): \($0.content)" }.joined(separator: "\n")
// The exact implementation depends on the model's input format;
// the following is illustrative only
let input = try MLDictionaryFeatureProvider(dictionary: [
"input_text": MLFeatureValue(string: prompt)
])
let output = try model.prediction(from: input)
guard let result = output.featureValue(for: "output_text")?.stringValue else {
throw LLMError.invalidResponse
}
return result
}
func streamMessage(_ messages: [ChatMessage]) -> AsyncThrowingStream<String, Error> {
// On-device models generally don't support true streaming output,
// but a character-by-character typing effect can be simulated
AsyncThrowingStream { continuation in
Task {
do {
let result = try await sendMessage(messages)
// Simulate a typing effect
for char in result {
continuation.yield(String(char))
try await Task.sleep(nanoseconds: 20_000_000) // 20ms
}
continuation.finish()
} catch {
continuation.finish(throwing: error)
}
}
}
}
}
On-Device Model Comparison
| Approach | Model Size | Inference Speed | Capability | Use Case |
|---|---|---|---|---|
| Apple Intelligence | System-integrated | Fast | Medium | iOS 18.1+ devices |
| llama.cpp | 1-7GB | Medium | Fairly strong | General conversation |
| Whisper | 75MB-1.5GB | Fast | Speech recognition | Speech-to-text |
| MLC-LLM | 2-4GB | Fairly fast | Medium | Lightweight chat |
9. Best Practices Summary
Architecture Choice
| Scenario | Recommended Approach |
|---|---|
| Commercial app | Server-side proxy (mandatory) |
| Personal project / demo | Direct connection acceptable; rotate keys regularly |
| Offline app | On-device model |
| High concurrency | Server-side proxy + load balancing + caching |
User Experience
| Aspect | Recommendation |
|---|---|
| Response speed | Prefer streaming responses for immediate feedback |
| Loading state | Show a typing indicator rather than a blank wait |
| Error handling | Friendly messages plus a retry button |
| History | Persist locally; support multiple sessions |
| Context management | Count tokens and auto-truncate older messages |
Cost Control
| Strategy | Notes |
|---|---|
| Model selection | Use small models (GPT-4o-mini / Haiku) for simple tasks |
| Caching | Serve identical questions from cache |
| Rate limiting | Cap requests per user / per time window |
| Prompt optimization | Trim the system prompt to reduce token usage |
| Monitoring & alerts | Alert when spend crosses thresholds |
10. Complete Example Project Structure
AIChat/
├── App/
│ ├── AIChatApp.swift
│ └── AppDelegate.swift
├── Core/
│ ├── LLM/
│ │ ├── LLMServiceProtocol.swift
│ │ ├── LLMManager.swift
│ │ ├── Services/
│ │ │ ├── OpenAIService.swift
│ │ │ ├── ClaudeService.swift
│ │ │ └── SparkService.swift
│ │ ├── Models/
│ │ │ ├── ChatMessage.swift
│ │ │ ├── OpenAIModels.swift
│ │ │ ├── ClaudeModels.swift
│ │ │ └── SparkModels.swift
│ │ └── Cache/
│ │ └── LLMCache.swift
│ ├── Network/
│ │ ├── APIClient.swift
│ │ └── SSEParser.swift
│ └── Security/
│ ├── KeychainManager.swift
│ └── TokenManager.swift
├── Features/
│ ├── Chat/
│ │ ├── ChatView.swift
│ │ ├── ChatViewModel.swift
│ │ ├── MessageBubble.swift
│ │ └── InputBar.swift
│ ├── Settings/
│ │ ├── SettingsView.swift
│ │ └── ModelPickerView.swift
│ └── History/
│ ├── HistoryView.swift
│ └── ConversationListView.swift
├── Resources/
│ └── Prompts/
│ └── SystemPrompts.json
└── Tests/
├── LLMServiceTests.swift
└── TokenCounterTests.swift