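Contents
- Theoretical Framework and Mathematical Foundations
- Classification and Modeling of Red Envelope Allocation Algorithms
- Mathematical Proofs for the Core Algorithms
- Implementation and Optimization
- Performance Analysis and Complexity Theory
- Fairness Measures and Statistical Tests
- Advanced Optimization Techniques
- Practical Applications and Engineering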
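1. Theoretical Framework and Mathematical Foundations
1.1 Formal Problem Definition
The red envelope (红包) allocation problem can be stated precisely as follows.
Definition 1.1 (Red envelope allocation problem): Given a total amount $M > 0$ and a number of participants $n \in \mathbb{N}^+$, an allocation function $f: \{1, 2, \ldots, n\} \rightarrow \mathbb{R}^+$ must satisfy:
- Total constraint: $\sum_{i=1}^n f(i) = M$
- Minimum-amount constraint: $\forall i,\ f(i) \geq \epsilon$, where $\epsilon > 0$ is the minimum allocation (feasibility requires $n\epsilon \leq M$)
- Randomness constraint: $f$ is generated by a random process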
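1.2 Foundations in Probability Theory
1.2.1 Probability Space
Let $\Omega$ be the sample space, $\mathcal{F}$ the σ-algebra of events, and $P$ a probability measure:
(Ω, ℱ, P) where:
Ω = {(x₁, x₂, ..., xₙ) | xᵢ ≥ ε, Σxᵢ = M}
ℱ = the Borel σ-algebra on Ω
P = the allocation distribution induced by the chosen algorithm
1.2.2 Properties of the Random Variables
Let the random variable $X_i$ denote the amount received by person $i$. For an exchangeable allocation (one whose distribution does not depend on the order of the participants):
E[Xᵢ] = M/n (fairness in expectation)
Var(Xᵢ) ≥ 0 (variance is non-negative)
Cov(Xᵢ, Xⱼ) = -Var(Xᵢ)/(n-1) for i ≠ j (negative correlation forced by ΣXᵢ = M)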
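The covariance identity follows in one step from the total constraint. The derivation below is a short sketch; the symbols $\sigma^2$ and $\rho$ are introduced here only for illustration and stand for the common variance and the common pairwise covariance of an exchangeable allocation.

```latex
\begin{align*}
0 &= \operatorname{Var}\Bigl(\sum_{i=1}^{n} X_i\Bigr)
   = \sum_{i=1}^{n} \operatorname{Var}(X_i)
     + \sum_{i \neq j} \operatorname{Cov}(X_i, X_j) \\
  &= n\,\sigma^2 + n(n-1)\,\rho
  \quad\Longrightarrow\quad
  \rho = -\frac{\sigma^2}{n-1}.
\end{align*}
```

The first equality holds because $\sum_i X_i = M$ is a constant, so its variance is zero.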
1.3 Fairness Measures
1.3.1 Gini Coefficient
For an allocation $\mathbf{x} = (x_1, x_2, \ldots, x_n)$:
G = (2/(n²·x̄)) × Σᵢ₌₁ⁿ (i - (n+1)/2) × x_(i)
where x_(i) is the i-th smallest allocation and x̄ = M/n is the mean allocation.
1.3.2 Information Entropy
The information entropy of an allocation is defined on the shares pᵢ = xᵢ/M:
H(x) = -Σᵢ₌₁ⁿ pᵢ × log₂ pᵢ
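A small worked example may make the two measures concrete. The sketch below is illustrative only: the function names `gini_rank`, `gini_pairwise`, and `share_entropy` are hypothetical, and the entropy is computed on the shares x_i / Σx, which is an assumption about how the probabilities are meant to be read.

```python
from itertools import product
from math import log2
from typing import Sequence


def gini_rank(xs: Sequence[float]) -> float:
    """Gini coefficient from the sorted (rank) formula."""
    n, total = len(xs), sum(xs)
    ordered = sorted(xs)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return weighted / (n * total)


def gini_pairwise(xs: Sequence[float]) -> float:
    """Gini coefficient from the sum of all pairwise absolute differences."""
    n, total = len(xs), sum(xs)
    diffs = sum(abs(a - b) for a, b in product(xs, repeat=2))
    return diffs / (2 * n * total)


def share_entropy(xs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the shares x_i / sum(x)."""
    total = sum(xs)
    return -sum((x / total) * log2(x / total) for x in xs if x > 0)


example = [1.0, 2.0, 3.0, 4.0]
print(gini_rank(example), gini_pairwise(example), share_entropy(example))
```

Both Gini forms agree on the example, and an equal split of four amounts reaches the maximum entropy of 2 bits.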
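2. Classification and Modeling of Red Envelope Allocation Algorithms
2.1 Classification
Based on how random numbers are generated and how constraints are handled, allocation algorithms fall into the following classes.
2.1.1 Linearly Constrained Optimization
Algorithm family A: solve min E[ Σᵢ (Xᵢ - M/n)² ]
subject to: ΣXᵢ = M, Xᵢ ≥ ε
2.1.2 Stochastic Processes
Algorithm family B: allocation processes based on Markov chains
X⁽ᵏ⁾ = T(X⁽ᵏ⁻¹⁾), where T is the transition operator
2.1.3 Heuristics
Algorithm family C: biologically inspired allocation mechanisms,
such as genetic algorithms and particle swarm optimization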
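2.2 Problem Modeling
2.2.1 Mathematical Programming Model
Objective: max f(X₁, X₂, ..., Xₙ) = α·fairness + β·randomness - γ·variance
Constraints:
g₁(X) = Σᵢ₌₁ⁿ Xᵢ - M = 0
g₂,ᵢ(X) = Xᵢ - ε ≥ 0, ∀i ∈ {1, ..., n}
hᵢ(X) = M - (n-1)·ε - Xᵢ ≥ 0, ∀i ∈ {1, ..., n} (implied by g₁ and g₂, stated for convenience)
2.2.2 Dynamical-System Model
The allocation process can be written as a discrete dynamical system:
X(k+1) = X(k) + η(k)·∇J(X(k))
where:
η(k) is the step size (learning rate) at step k
J(X) is the objective function to be maximized
∇J(X) is its gradient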
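3. Mathematical Proofs for the Core Algorithms
3.1 Simple Random Allocation
3.1.1 Algorithm Description
Simple random allocation draws amounts from uniform distributions. The analysis below treats the idealized case in which the allocation vector is uniform on the simplex {x | xᵢ ≥ 0, Σxᵢ = M}, that is, ε = 0. The sequential implementation in Section 4.1 does not satisfy this assumption: it draws each amount uniformly from the remaining feasible range, so earlier participants receive more in expectation.
3.1.2 Proof
Theorem 3.1.1: If the allocation vector is uniformly distributed on the simplex, the probability density of the i-th allocation is:
f_Xᵢ(x) = (n-1)/M × (1 - x/M)ⁿ⁻² for 0 ≤ x ≤ M
Proof:
Let $Y_j = X_j/M$ be the normalized allocations, so that $\sum_{j=1}^n Y_j = 1$ and $Y_j \geq 0$.
The uniform distribution on the simplex is Dirichlet(1, ..., 1), and the marginal distribution of its k-th component is a Beta distribution:
Y_k ~ Beta(1, n-1)
Therefore:
f_Y_k(y) = (n-1)·(1-y)ⁿ⁻²
f_X_k(x) = f_Y_k(x/M)·(1/M) = (n-1)/M·(1-x/M)ⁿ⁻²
Q.E.D.
3.1.3 Mean and Variance
E[X_k] = ∫₀ᴹ x·f_X_k(x)dx = M/n
Var(X_k) = E[X_k²] - (E[X_k])² = M²(n-1)/(n²(n+1))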
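The mean and variance above can be checked numerically. The sketch below is an illustration under one assumption: the uniform-on-simplex allocation is sampled with the sorted cut-point construction (the gaps between n-1 sorted uniform points on [0, M]), which is a standard way to draw from Dirichlet(1, ..., 1). The function names are hypothetical.

```python
import random
from typing import List, Tuple


def simplex_allocation(total: float, n: int, rng: random.Random) -> List[float]:
    """Draw an allocation uniformly from the simplex using sorted cut points."""
    cuts = sorted(rng.uniform(0.0, total) for _ in range(n - 1))
    bounds = [0.0] + cuts + [total]
    # Each allocation is the gap between two neighbouring cut points.
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def empirical_moments(total: float, n: int, trials: int,
                      seed: int = 0) -> Tuple[float, float]:
    """Sample mean and variance of the first allocation over many trials."""
    rng = random.Random(seed)
    firsts = [simplex_allocation(total, n, rng)[0] for _ in range(trials)]
    mean = sum(firsts) / trials
    var = sum((x - mean) ** 2 for x in firsts) / trials
    return mean, var


M, n = 100.0, 5
mean, var = empirical_moments(M, n, trials=20_000)
print("sample mean:", mean, "theory:", M / n)
print("sample var: ", var, "theory:", M**2 * (n - 1) / (n**2 * (n + 1)))
```

With enough trials the sample values should settle near M/n and M²(n-1)/(n²(n+1)); the exact figures depend on the seed.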
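3.2 Double-Mean Method
3.2.1 Principle
The double-mean method (二倍均值法) draws each amount uniformly from (0, 2 × remaining amount / remaining people). This keeps every conditional expectation equal to the current mean and limits the variance.
3.2.2 Variance Bound
Theorem 3.2.1: For fixed M and ε = 0, the allocations $\{X_i\}$ produced by the double-mean method satisfy:
lim_{n→∞} Var(X_i) = 0
Proof:
Suppose step k starts with remaining amount $M_k$ and $n_k \geq 2$ remaining people. Then X_k = 2·U_k·M_k/n_k with U_k ~ Uniform(0, 1) independent of M_k, so:
E[X_k | M_k, n_k] = M_k/n_k
Var(X_k | M_k, n_k) = M_k²/(3·n_k²)
Since M_{k+1} = M_k·(1 - 2U_k/n_k) and E[(1 - 2U_k/n_k)²] = ((n_k - 1)² + 1/3)/n_k², multiplying over the steps gives E[M_k²] ≤ C·M²·n_k²/n² with C = exp(π²/18). Hence:
Var(X_k) ≤ E[X_k²] = (4/3)·E[M_k²]/n_k² ≤ (4C/3)·M²/n² → 0 as n → ∞
The last person receives the remainder, whose second moment obeys the same bound C·M²/n².
Q.E.D.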
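The variance of each allocation under the double-mean method can be computed exactly from a second-moment recursion. The sketch below assumes ε = 0 and a continuous uniform draw on (0, 2·M_k/n_k); the function name `double_mean_variances` is hypothetical, and exact rational arithmetic is used so that no rounding enters the check.

```python
from fractions import Fraction
from typing import List


def double_mean_variances(total: Fraction, n: int) -> List[Fraction]:
    """Exact Var(X_k) for k = 1..n under the double-mean method with eps = 0."""
    mean = total / n                      # E[X_k] = M/n for every k
    second_moment_remaining = total * total   # E[M_1^2] = M^2
    variances: List[Fraction] = []
    for people_left in range(n, 1, -1):
        # X_k = 2*U*M_k/people_left with U ~ Uniform(0, 1), and E[(2U)^2] = 4/3.
        second_moment_x = Fraction(4, 3) * second_moment_remaining / people_left**2
        variances.append(second_moment_x - mean * mean)
        # M_{k+1} = M_k*(1 - 2U/p), and E[(1 - 2U/p)^2] = (3(p-1)^2 + 1) / (3p^2).
        factor = Fraction(3 * (people_left - 1) ** 2 + 1, 3 * people_left**2)
        second_moment_remaining *= factor
    # The last person takes whatever is left.
    variances.append(second_moment_remaining - mean * mean)
    return variances


for k, v in enumerate(double_mean_variances(Fraction(100), 10), start=1):
    print(k, float(v))
```

The first entry equals M²/(3n²); later entries grow because the remaining amount itself becomes random, which is why later participants see a wider spread.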
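3.2.3 Optimality Analysis
Theorem 3.2.2: Subject to the total constraint, the functional
J(X) = E[Σᵢ (Xᵢ - M/n)²]
is minimized by the deterministic equal split Xᵢ = M/n. The double-mean method does not minimize J; it accepts J > 0 in exchange for randomness while keeping E[Xᵢ] = M/n (for ε = 0).
Proof:
Using Lagrange multipliers, define the Lagrangian:
L(X, λ) = Σᵢ (Xᵢ - M/n)² + λ(Σᵢ Xᵢ - M)
Differentiating with respect to X_k:
∂L/∂X_k = 2(X_k - M/n) + λ = 0
⇒ X_k = M/n - λ/2
The constraint ΣX_k = M gives:
n·(M/n - λ/2) = M
⇒ λ = 0
⇒ X_k = M/n
The equal split has no randomness, so it cannot serve as a red envelope scheme. The double-mean method is a randomized alternative and is therefore suboptimal with respect to J.
Q.E.D.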
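3.3 Linear Congruential Method
3.3.1 Algorithm Description
A linear congruential generator (LCG) produces pseudorandom numbers with the recurrence:
X₀ = seed
Xₖ₊₁ = (a·Xₖ + c) mod m
3.3.2 Period Analysis
Theorem 3.3.1 (Hull-Dobell): With c ≠ 0, the LCG has the full period m for every seed if and only if:
- $c$ and $m$ are coprime
- $a-1$ is divisible by every prime factor of $m$
- if $m$ is divisible by 4, then $a-1$ is also divisible by 4
Proof:
This is the Hull-Dobell theorem; a full proof appears in Knuth, The Art of Computer Programming, Vol. 2, Section 3.2.1.2. The argument based on a generator $g$ of order $\phi(m)$ in the multiplicative group applies to the multiplicative case c = 0 and does not cover c ≠ 0.
Q.E.D.
3.3.3 Uniformity
Theorem 3.3.2: For an LCG satisfying the conditions of Theorem 3.3.1, the sequence $\{X_k\}$ is equidistributed over the residues modulo $m$:
lim_{N→∞} (1/N)·Σₖ₌₀ᴺ⁻¹ f(X_k) = (1/m)·Σₓ₌₀ᵐ⁻¹ f(x)
for every function $f: \{0, 1, \ldots, m-1\} \rightarrow \mathbb{R}$.
Proof:
A full-period LCG visits each of the m residues exactly once per period, so the average of f over any whole number of periods equals the right-hand side. A trailing partial period changes the average by at most O(m/N), which vanishes as N → ∞.
Q.E.D.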
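The full-period conditions are easy to test on a small modulus, where the period can be found by brute force. The sketch below is illustrative; the helper names are hypothetical, and the brute-force search is only practical for small m.

```python
from math import gcd
from typing import Dict, List


def prime_factors(n: int) -> List[int]:
    """Distinct prime factors of n by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def satisfies_hull_dobell(a: int, c: int, m: int) -> bool:
    """Check the three full-period conditions for x -> (a*x + c) mod m."""
    if gcd(c, m) != 1:
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False
    return m % 4 != 0 or (a - 1) % 4 == 0


def brute_force_period(a: int, c: int, m: int, seed: int = 0) -> int:
    """Length of the cycle eventually entered from the given seed."""
    seen: Dict[int, int] = {}
    x, step = seed % m, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]


for a in (5, 3):
    print(a, satisfies_hull_dobell(a, 3, 16), brute_force_period(a, 3, 16))
```

With m = 16 and c = 3, the multiplier a = 5 meets all three conditions and cycles through all 16 residues, while a = 3 fails the divisibility-by-4 condition and has a shorter cycle.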
4. Implementation and Optimization
4.1 Simple Random Allocation Implementation
4.1.1 Basic Implementation
import random
import math
from typing import List, Tuple
import numpy as np
from scipy import stats
class SimpleRandomDistribution:
"""
简单随机分配算法实现
基于均匀分布的随机数生成,满足基本约束条件
"""
def __init__(self, min_amount: float = 0.01, random_seed: int = None):
"""
初始化分配器
Args:
min_amount: 最小分配金额
random_seed: 随机种子,用于重现性
"""
self.min_amount = min_amount
if random_seed is not None:
random.seed(random_seed)
np.random.seed(random_seed)
def allocate(self, total_amount: float, num_people: int) -> List[float]:
"""
执行简单随机分配
Args:
total_amount: 总金额
num_people: 参与人数
Returns:
List[float]: 每个人获得的金额列表
Raises:
ValueError: 当参数无效时抛出
"""
if total_amount <= 0 or num_people <= 0:
raise ValueError("总金额和参与人数必须为正数")
if total_amount < self.min_amount * num_people:
raise ValueError(f"总金额不足以满足最小分配: 至少需要 {self.min_amount * num_people}")
# 计算剩余金额
remaining_amount = total_amount
allocations = []
# 为前n-1个人随机分配
for i in range(num_people - 1):
# 计算最小和最大可能的分配金额
min_allocation = self.min_amount
max_allocation = (remaining_amount - self.min_amount * (num_people - i - 1))
if min_allocation > max_allocation:
# 如果无法满足约束,将剩余金额分配给当前人
allocation = remaining_amount
else:
# 在约束范围内均匀随机选择
allocation = random.uniform(min_allocation, max_allocation)
allocation = round(allocation, 2)
allocations.append(allocation)
remaining_amount -= allocation
# 最后一个人获得剩余金额
allocations.append(round(remaining_amount, 2))
return allocations
def get_statistics(self, allocations: List[float]) -> dict:
"""
计算分配结果的统计特征
Args:
allocations: 分配结果列表
Returns:
dict: 包含各种统计指标的字典
"""
if not allocations:
return {}
n = len(allocations)
total = sum(allocations)
mean = total / n
# 计算方差和标准差
variance = sum((x - mean) ** 2 for x in allocations) / n
std_dev = math.sqrt(variance)
# 计算基尼系数
sorted_allocations = sorted(allocations)
gini = 0
for i, val in enumerate(sorted_allocations):
gini += (2 * (i + 1) - n - 1) * val
gini = gini / (n * total)
# 计算变异系数
cv = std_dev / mean if mean > 0 else 0
return {
'total': total,
'mean': mean,
'variance': variance,
'std_dev': std_dev,
'min': min(allocations),
'max': max(allocations),
'gini_coefficient': abs(gini),
'coefficient_of_variation': cv,
'sample_size': n
}
# 测试代码
def test_simple_random_distribution():
"""测试简单随机分配算法"""
allocator = SimpleRandomDistribution(min_amount=0.01)
# 测试案例
test_cases = [
(10.0, 5),
(100.0, 10),
(50.0, 3),
(1.0, 2)
]
print("=== 简单随机分配算法测试 ===")
for total, people in test_cases:
allocations = allocator.allocate(total, people)
stats = allocator.get_statistics(allocations)
print(f"\n总金额: {total}, 参与人数: {people}")
print(f"分配结果: {allocations}")
print(f"统计信息:")
for key, value in stats.items():
if isinstance(value, float):
print(f" {key}: {value:.4f}")
else:
print(f" {key}: {value}")
if __name__ == "__main__":
test_simple_random_distribution()
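Rounding each draw to two decimals can let the amounts drift by a cent or push the last share below the minimum. One way to avoid this, sketched below as an alternative and not as the method used above, is to work in integer cents. The function name `allocate_cents` and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def allocate_cents(total_cents: int, num_people: int, min_cents: int = 1,
                   seed: Optional[int] = None) -> List[int]:
    """Sequential uniform allocation in integer cents (no rounding drift)."""
    if num_people <= 0 or total_cents < min_cents * num_people:
        raise ValueError("total is too small for the minimum allocation")
    rng = random.Random(seed)  # private generator, global state untouched
    remaining = total_cents
    result: List[int] = []
    for people_left in range(num_people, 1, -1):
        # Leave at least min_cents for everyone who has not drawn yet.
        upper = remaining - min_cents * (people_left - 1)
        amount = rng.randint(min_cents, upper)
        result.append(amount)
        remaining -= amount
    result.append(remaining)  # the last person takes the exact remainder
    return result


shares = allocate_cents(10_000, 5, seed=42)
print(shares, sum(shares))
```

Because every amount is an integer, the shares always sum to the total exactly, and converting to currency units is a single division by 100 at display time.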
4.1.2 Advanced Optimized Version
import random
import math
import numpy as np
from typing import List, Tuple, Optional
from dataclasses import dataclass
import concurrent.futures
from scipy.optimize import minimize
@dataclass
class DistributionResult:
"""分配结果数据类"""
allocations: List[float]
statistics: dict
algorithm_info: dict
class OptimizedRandomDistribution(SimpleRandomDistribution):
"""
优化的随机分配算法
包含方差控制、并行计算、收敛性保证等高级功能
"""
def __init__(self, min_amount: float = 0.01, random_seed: Optional[int] = None,
target_variance: Optional[float] = None):
super().__init__(min_amount, random_seed)
self.target_variance = target_variance
self.allocation_history = []
def allocate_with_variance_control(self, total_amount: float,
num_people: int,
target_std: Optional[float] = None) -> List[float]:
"""
带方差控制的分配算法
Args:
total_amount: 总金额
num_people: 参与人数
target_std: 目标标准差
Returns:
List[float]: 分配结果
"""
if target_std is None:
if self.target_variance is not None:
target_std = math.sqrt(self.target_variance)
else:
target_std = (total_amount / num_people) * 0.1 # default: coefficient of variation of 10% (std = 0.1 × mean)
# 使用多阶段分配策略
allocations = []
remaining_amount = total_amount
remaining_people = num_people
# 阶段1: 粗分配
while remaining_people > 1:
mean_remaining = remaining_amount / remaining_people
# 根据目标标准差调整分配范围
max_allocation = min(2 * mean_remaining,
remaining_amount - self.min_amount * (remaining_people - 1))
min_allocation = self.min_amount
# 使用正态分布生成分配金额
std_dev = min(target_std, mean_remaining / 3) # 防止标准差过大
allocation = self._truncated_normal_random(min_allocation, max_allocation,
mean_remaining, std_dev)
allocation = round(allocation, 2)
allocations.append(allocation)
remaining_amount -= allocation
remaining_people -= 1
# 最后一个人获得剩余金额
allocations.append(round(remaining_amount, 2))
return allocations
def _truncated_normal_random(self, low: float, high: float,
mean: float, std_dev: float) -> float:
"""
在指定区间内生成正态分布随机数
Args:
low: 最小值
high: 最大值
mean: 正态分布均值
std_dev: 正态分布标准差
Returns:
float: 截断正态分布随机数
"""
# 使用拒绝采样
while True:
value = np.random.normal(mean, std_dev)
if low <= value <= high:
return value
def batch_allocation(self, total_amount: float, num_people: int,
num_simulations: int) -> List[DistributionResult]:
"""
批量分配模拟
Args:
total_amount: 总金额
num_people: 参与人数
num_simulations: 模拟次数
Returns:
List[DistributionResult]: 批量模拟结果
"""
results = []
for i in range(num_simulations):
# 为每次模拟设置不同的种子
seed = (len(self.allocation_history) + 1) * 1000 + i
allocator = OptimizedRandomDistribution(self.min_amount, seed, self.target_variance)
allocations = allocator.allocate_with_variance_control(total_amount, num_people)
statistics = allocator.get_statistics(allocations)
result = DistributionResult(
allocations=allocations,
statistics=statistics,
algorithm_info={
'simulation_id': i,
'seed': seed,
'target_variance': self.target_variance
}
)
results.append(result)
self.allocation_history.extend(results)
return results
def parallel_batch_allocation(self, total_amount: float, num_people: int,
num_simulations: int, num_workers: int = 4) -> List[DistributionResult]:
"""
并行批量分配
Args:
total_amount: 总金额
num_people: 参与人数
num_simulations: 模拟次数
num_workers: 并行工作进程数
Returns:
List[DistributionResult]: 并行模拟结果
"""
def simulate_batch(batch_size: int, batch_id: int) -> List[DistributionResult]:
batch_results = []
for i in range(batch_size):
simulation_id = batch_id * batch_size + i
seed = simulation_id * 1000 + batch_id
allocator = OptimizedRandomDistribution(self.min_amount, seed, self.target_variance)
allocations = allocator.allocate_with_variance_control(total_amount, num_people)
statistics = allocator.get_statistics(allocations)
result = DistributionResult(
allocations=allocations,
statistics=statistics,
algorithm_info={
'simulation_id': simulation_id,
'seed': seed,
'batch_id': batch_id
}
)
batch_results.append(result)
return batch_results
# 将模拟分配到不同的批次
batch_size = math.ceil(num_simulations / num_workers)
batch_ids = list(range(num_workers))
results = []
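# simulate_batch is a nested function and cannot be pickled, so ProcessPoolExecutor would fail here; threads avoid pickling.
# The workers share the global random state, so per-seed results are not reproducible under threading.
with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor: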
futures = [executor.submit(simulate_batch, batch_size, batch_id)
for batch_id in batch_ids]
for future in concurrent.futures.as_completed(futures):
batch_results = future.result()
results.extend(batch_results[:num_simulations - len(results)])
return results[:num_simulations]
# 高级测试代码
def test_optimized_random_distribution():
"""测试优化版随机分配算法"""
print("=== 优化版随机分配算法测试 ===")
# 创建优化分配器
allocator = OptimizedRandomDistribution(
min_amount=0.01,
random_seed=42,
target_variance=0.1
)
# 单次分配测试
total_amount = 100.0
num_people = 10
print(f"\n单次分配测试 - 总金额: {total_amount}, 人数: {num_people}")
allocations = allocator.allocate_with_variance_control(total_amount, num_people, target_std=2.0)
stats = allocator.get_statistics(allocations)
print(f"分配结果: {allocations}")
print(f"标准差: {stats['std_dev']:.4f}")
print(f"基尼系数: {stats['gini_coefficient']:.4f}")
# 批量模拟测试
print("\nBatch simulation test - 100 simulations")
results = allocator.batch_allocation(total_amount, num_people, 100)
# 分析模拟结果
gini_coefficients = [r.statistics['gini_coefficient'] for r in results]
std_devs = [r.statistics['std_dev'] for r in results]
print(f"基尼系数统计:")
print(f" 平均值: {np.mean(gini_coefficients):.4f}")
print(f" 标准差: {np.std(gini_coefficients):.4f}")
print(f" 范围: [{min(gini_coefficients):.4f}, {max(gini_coefficients):.4f}]")
print(f"\n标准差统计:")
print(f" 平均值: {np.mean(std_devs):.4f}")
print(f" 标准差: {np.std(std_devs):.4f}")
print(f" 范围: [{min(std_devs):.4f}, {max(std_devs):.4f}]")
if __name__ == "__main__":
test_optimized_random_distribution()
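Process-based parallelism needs a worker that can be pickled, which in practice means a module-level function. The sketch below shows one possible shape under stated assumptions: the names `simulate_one` and `run_parallel` are hypothetical, amounts are integer cents, and each task carries its own seed so that results do not depend on scheduling.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple


def simulate_one(task: Tuple[int, int, int]) -> List[int]:
    """One double-mean allocation in integer cents; task = (seed, total, people)."""
    seed, total_cents, num_people = task
    if total_cents < num_people:
        raise ValueError("need at least one cent per person")
    rng = random.Random(seed)  # private generator per task
    remaining, result = total_cents, []
    for people_left in range(num_people, 1, -1):
        # Cap at twice the current mean and keep one cent for each later person.
        upper = min(2 * remaining // people_left, remaining - (people_left - 1))
        amount = rng.randint(1, upper)
        result.append(amount)
        remaining -= amount
    result.append(remaining)
    return result


def run_parallel(total_cents: int, num_people: int, num_simulations: int,
                 workers: int = 4) -> List[List[int]]:
    """Run independent simulations across worker processes, in seed order."""
    tasks = [(seed, total_cents, num_people) for seed in range(num_simulations)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_one, tasks))
```

`run_parallel` should be called from under an `if __name__ == "__main__":` guard, since platforms that spawn worker processes re-import the main module.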
4.2 Double-Mean Method Implementation
import random
import math
from typing import List, Tuple
import numpy as np
from dataclasses import dataclass
@dataclass
class DoubleMeanResult:
"""二倍均值法结果"""
allocations: List[float]
remaining_amounts: List[float]
mean_values: List[float]
convergence_info: dict
class DoubleMeanAllocator:
"""
二倍均值法分配算法
基于动态调整的随机范围,确保分配的公平性和收敛性
"""
def __init__(self, min_amount: float = 0.01, random_seed: int = None):
self.min_amount = min_amount
if random_seed is not None:
random.seed(random_seed)
np.random.seed(random_seed)
# 算法参数
self.convergence_threshold = 1e-6
self.max_iterations = 1000
def allocate(self, total_amount: float, num_people: int,
track_history: bool = True) -> DoubleMeanResult:
"""
二倍均值法分配
Args:
total_amount: 总金额
num_people: 参与人数
track_history: 是否跟踪分配历史
Returns:
DoubleMeanResult: 分配结果和跟踪信息
"""
if total_amount < self.min_amount * num_people:
raise ValueError("总金额不足以满足最小分配约束")
allocations = []
remaining_amounts = []
mean_values = []
remaining_amount = total_amount
current_people = num_people
iteration = 0
while current_people > 1 and iteration < self.max_iterations:
# 计算当前均值
current_mean = remaining_amount / current_people
# 确定分配范围 [0.01, 2 * mean]
max_allocation = min(2 * current_mean,
remaining_amount - self.min_amount * (current_people - 1))
min_allocation = self.min_amount
if min_allocation > max_allocation:
# 约束冲突,使用等分策略
allocation = remaining_amount / current_people
else:
# 均匀随机选择
allocation = random.uniform(min_allocation, max_allocation)
allocation = round(allocation, 2)
allocations.append(allocation)
remaining_amount -= allocation
current_people -= 1
if track_history:
remaining_amounts.append(remaining_amount)
mean_values.append(current_mean)
iteration += 1
# 最后一个人获得剩余金额
final_allocation = round(remaining_amount, 2)
allocations.append(final_allocation)
if track_history:
remaining_amounts.append(0.0)
mean_values.append(0.0)
# 计算收敛信息
if track_history and len(mean_values) > 1:
convergence_info = self._analyze_convergence(allocations, mean_values)
else:
convergence_info = {'iterations': iteration, 'converged': True}
return DoubleMeanResult(
allocations=allocations,
remaining_amounts=remaining_amounts,
mean_values=mean_values,
convergence_info=convergence_info
)
def _analyze_convergence(self, allocations: List[float],
mean_values: List[float]) -> dict:
"""
分析算法收敛性
Args:
allocations: 分配结果
mean_values: 历史均值
Returns:
dict: 收敛分析结果
"""
# 计算均值收敛性
if len(mean_values) < 2:
return {'iterations': len(allocations), 'converged': True}
# 计算均值变化率
mean_diffs = [abs(mean_values[i] - mean_values[i+1])
for i in range(len(mean_values)-1)]
max_diff = max(mean_diffs) if mean_diffs else 0
# 检查是否收敛
converged = max_diff < self.convergence_threshold
# 计算分配均匀性
allocation_variance = np.var(allocations)
expected_variance = (sum(allocations) / len(allocations)) ** 2 / 3 # variance of Uniform(0, 2·mean) is (2·mean)² / 12 = mean² / 3
return {
'iterations': len(allocations),
'converged': converged,
'max_mean_difference': max_diff,
'allocation_variance': allocation_variance,
'expected_uniform_variance': expected_variance,
'variance_ratio': allocation_variance / expected_variance if expected_variance > 0 else 0
}
def theoretical_analysis(self, total_amount: float, num_people: int) -> dict:
"""
理论分析
Args:
total_amount: 总金额
num_people: 参与人数
Returns:
dict: 理论分析结果
"""
# 理论期望
theoretical_mean = total_amount / num_people
# 理论方差推导
# 对于二倍均值法,方差可以通过积分计算
theoretical_variance = 0
# 递归计算理论方差
def recursive_variance(remaining_amount: float, remaining_people: int) -> float:
if remaining_people == 1:
return 0
current_mean = remaining_amount / remaining_people
max_alloc = min(2 * current_mean, remaining_amount - self.min_amount * (remaining_people - 1))
min_alloc = self.min_amount
if min_alloc >= max_alloc:
return 0
# 计算当前分配的方差
alloc_mean = (min_alloc + max_alloc) / 2
alloc_variance = (max_alloc - min_alloc) ** 2 / 12
# 递归计算剩余分配的方差
expected_remaining_amount = remaining_amount - alloc_mean
remaining_variance = recursive_variance(expected_remaining_amount, remaining_people - 1)
return alloc_variance + remaining_variance
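# Average the per-step variances to approximate the variance of a single allocation
theoretical_variance = recursive_variance(total_amount, num_people) / num_people
# Heuristic Gini estimate: no derivation is given for ln(n) / (2n), so validate it against simulation before relying on it
theoretical_gini = math.log(num_people) / (2 * num_people)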
return {
'theoretical_mean': theoretical_mean,
'theoretical_variance': theoretical_variance,
'theoretical_std_dev': math.sqrt(theoretical_variance),
'theoretical_gini_coefficient': theoretical_gini,
'expected_fairness_score': 1 - theoretical_gini
}
def test_double_mean_allocator():
"""测试二倍均值法"""
print("=== 二倍均值法测试 ===")
allocator = DoubleMeanAllocator(min_amount=0.01, random_seed=42)
test_cases = [
(100.0, 10),
(50.0, 5),
(200.0, 20)
]
for total_amount, num_people in test_cases:
print(f"\n测试: 总金额={total_amount}, 人数={num_people}")
result = allocator.allocate(total_amount, num_people, track_history=True)
print(f"分配结果: {result.allocations}")
print(f"收敛信息: {result.convergence_info}")
# 理论分析
theory = allocator.theoretical_analysis(total_amount, num_people)
print(f"理论分析:")
for key, value in theory.items():
if isinstance(value, float):
print(f" {key}: {value:.6f}")
else:
print(f" {key}: {value}")
if __name__ == "__main__":
test_double_mean_allocator()
4.3 Linear Congruential Method Implementation
import random
import math
from typing import List, Tuple, Optional
import numpy as np
from dataclasses import dataclass
@dataclass
class LCGResult:
"""线性同余法结果"""
allocations: List[float]
random_sequence: List[int]
period_info: dict
distribution_quality: dict
class LinearCongruentialAllocator:
"""
线性同余法分配算法
基于LCG伪随机数生成器,提供可控的随机性和统计性质
"""
def __init__(self, min_amount: float = 0.01,
a: int = 1664525, c: int = 1013904223, m: int = 2**32):
"""
初始化LCG
Args:
min_amount: 最小分配金额
a: 乘数因子
c: 增量因子
m: 模数
"""
self.min_amount = min_amount
self.a = a
self.c = c
self.m = m
# 检查最大周期条件
self.max_period_conditions = self._check_max_period_conditions()
def _check_max_period_conditions(self) -> dict:
"""检查是否满足最大周期条件"""
conditions = {
'c_and_m_coprime': math.gcd(self.c, self.m) == 1,
'a_minus_1_divisible_by_prime_factors': True,
'a_minus_1_divisible_by_4_if_m_divisible_by_4': True
}
# 检查质因数条件
m_factors = self._prime_factors(self.m)
for p in m_factors:
if (self.a - 1) % p != 0:
conditions['a_minus_1_divisible_by_prime_factors'] = False
break
# 检查4的条件
if self.m % 4 == 0 and (self.a - 1) % 4 != 0:
conditions['a_minus_1_divisible_by_4_if_m_divisible_by_4'] = False
conditions['can_reach_max_period'] = all([
conditions['c_and_m_coprime'],
conditions['a_minus_1_divisible_by_prime_factors'],
conditions['a_minus_1_divisible_by_4_if_m_divisible_by_4']
])
return conditions
def _prime_factors(self, n: int) -> List[int]:
"""计算n的质因数"""
factors = []
d = 2
while d * d <= n:
while n % d == 0:
if d not in factors:
factors.append(d)
n //= d
d += 1
if n > 1:
factors.append(n)
return factors
def generate_lcg_sequence(self, seed: int, length: int) -> List[int]:
"""
生成LCG序列
Args:
seed: 初始种子
length: 序列长度
Returns:
List[int]: LCG序列
"""
sequence = []
x = seed % self.m
for _ in range(length):
x = (self.a * x + self.c) % self.m
sequence.append(x)
return sequence
def detect_period(self, seed: int, max_iterations: int = 1000000) -> Tuple[int, bool]:
"""
检测LCG序列的周期
Args:
seed: 初始种子
max_iterations: 最大迭代次数
Returns:
Tuple[int, bool]: (检测到的周期, 是否达到最大周期)
"""
sequence = {}
x = seed % self.m
for i in range(max_iterations):
if x in sequence:
period = i - sequence[x]
is_max_period = (period == self.m)
return period, is_max_period
sequence[x] = i
x = (self.a * x + self.c) % self.m
return max_iterations, False
def lcg_to_uniform(self, lcg_value: int) -> float:
"""
将LCG整数值转换为[0,1)均匀分布
Args:
lcg_value: LCG整数值
Returns:
float: [0,1)均匀分布浮点数
"""
return lcg_value / self.m
def allocate(self, total_amount: float, num_people: int,
seed: int, track_sequence: bool = True) -> LCGResult:
"""
使用LCG进行分配
Args:
total_amount: 总金额
num_people: 参与人数
seed: LCG种子
track_sequence: 是否跟踪随机序列
Returns:
LCGResult: 分配结果
"""
if total_amount < self.min_amount * num_people:
raise ValueError("总金额不足以满足最小分配约束")
# 生成所需的随机数
required_random_nums = num_people
random_sequence = self.generate_lcg_sequence(seed, required_random_nums)
# 转换到[0,1)区间
uniform_sequence = [self.lcg_to_uniform(x) for x in random_sequence]
allocations = []
remaining_amount = total_amount
for i in range(num_people - 1):
# 计算分配范围
min_allocation = self.min_amount
max_allocation = remaining_amount - self.min_amount * (num_people - i - 1)
if min_allocation >= max_allocation:
allocation = remaining_amount / (num_people - i)
else:
# 使用LCG生成的均匀随机数
random_value = uniform_sequence[i]
allocation = min_allocation + random_value * (max_allocation - min_allocation)
allocation = round(allocation, 2)
allocations.append(allocation)
remaining_amount -= allocation
# 最后一个人获得剩余金额
allocations.append(round(remaining_amount, 2))
# 分析分布质量
distribution_quality = self._analyze_distribution_quality(allocations, uniform_sequence)
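# Period analysis: the full period is known from theory when the conditions of Theorem 3.3.1 hold; otherwise probe a bounded number of steps
period, is_max_period = (self.m, True) if self.max_period_conditions['can_reach_max_period'] else self.detect_period(seed, max_iterations=100_000)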
period_info = {
'detected_period': period,
'is_maximum_period': is_max_period,
'theoretical_max_period': self.m if self.max_period_conditions['can_reach_max_period'] else 'unknown'
}
return LCGResult(
allocations=allocations,
random_sequence=random_sequence,
period_info=period_info,
distribution_quality=distribution_quality
)
def _analyze_distribution_quality(self, allocations: List[float],
uniform_sequence: List[float]) -> dict:
"""
分析随机数分布质量
Args:
allocations: 分配结果
uniform_sequence: 均匀分布序列
Returns:
dict: 分布质量分析
"""
# Kolmogorov-Smirnov检验
from scipy.stats import kstest
ks_statistic, ks_p_value = kstest(uniform_sequence, 'uniform')
# 序列相关性分析
autocorrelations = []
max_lag = min(10, len(uniform_sequence) // 4)
for lag in range(1, max_lag + 1):
if len(uniform_sequence) > lag:
corr = np.corrcoef(uniform_sequence[:-lag], uniform_sequence[lag:])[0, 1]
autocorrelations.append(corr)
# 分配结果的统计特征
allocation_stats = {
'mean': np.mean(allocations),
'std': np.std(allocations),
'min': np.min(allocations),
'max': np.max(allocations),
'range': np.max(allocations) - np.min(allocations)
}
return {
'kolmogorov_smirnov': {
'statistic': ks_statistic,
'p_value': ks_p_value,
'is_uniform': ks_p_value > 0.05
},
'autocorrelations': autocorrelations,
'max_autocorrelation': max(abs(c) for c in autocorrelations) if autocorrelations else 0,
'allocation_statistics': allocation_stats
}
def optimize_parameters(self, target_period: int = None) -> dict:
"""
优化LCG参数以达到最佳性能
Args:
target_period: 目标周期
Returns:
dict: 优化结果
"""
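# Search power-of-two moduli: with c odd and a ≡ 1 (mod 4) the full-period conditions of Theorem 3.3.1 hold.
# A prime modulus with c ≠ 0 cannot reach full period for any 1 < a < m, and trial-division factoring of 2**127 - 1 would not finish.
candidate_moduli = [2**16, 2**24, 2**32]
best_params = None
best_score = -1
for m in candidate_moduli:
for a in range(5, 1000, 4):
c = 1 # simple odd increment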
lcg = LinearCongruentialAllocator(self.min_amount, a, c, m)
# 评估参数质量
score = self._evaluate_lcg_quality(lcg)
if score > best_score:
best_score = score
best_params = {'a': a, 'c': c, 'm': m}
return {
'best_parameters': best_params,
'quality_score': best_score,
'optimization_criteria': ['max_period', 'distribution_quality', 'computational_efficiency']
}
def _evaluate_lcg_quality(self, lcg) -> float:
"""评估LCG质量"""
score = 0
# 周期质量 (30%)
if lcg.max_period_conditions['can_reach_max_period']:
score += 30
# 分布质量 (40%)
test_result = lcg.allocate(100.0, 10, 42, track_sequence=True)
quality = test_result.distribution_quality
if quality['kolmogorov_smirnov']['is_uniform']:
score += 20
if quality['max_autocorrelation'] < 0.1:
score += 20
# 计算效率 (30%)
# 简单的线性操作,计算效率很高
score += 30
return score
def test_linear_congruential_allocator():
"""测试线性同余法"""
print("=== 线性同余法测试 ===")
# 标准LCG参数 (Numerical Recipes)
allocator = LinearCongruentialAllocator(
min_amount=0.01,
a=1664525,
c=1013904223,
m=2**32
)
print("LCG参数条件检查:")
for condition, satisfied in allocator.max_period_conditions.items():
print(f" {condition}: {satisfied}")
# 测试分配
total_amount = 100.0
num_people = 10
seed = 42
result = allocator.allocate(total_amount, num_people, seed)
print(f"\n分配结果: {result.allocations}")
print(f"周期信息: {result.period_info}")
# 分布质量分析
quality = result.distribution_quality
print(f"\n分布质量分析:")
print(f" KS检验 p值: {quality['kolmogorov_smirnov']['p_value']:.6f}")
print(f" 最大自相关: {quality['max_autocorrelation']:.6f}")
print(f" 分配统计: {quality['allocation_statistics']}")
# 参数优化
print(f"\n参数优化测试:")
optimized = allocator.optimize_parameters()
print(f"最佳参数: {optimized['best_parameters']}")
print(f"质量评分: {optimized['quality_score']}")
if __name__ == "__main__":
test_linear_congruential_allocator()
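5. Performance Analysis and Complexity Theory
5.1 Time Complexity
5.1.1 Complexity of Each Algorithm
Using big-O notation, the complexity of each allocation algorithm is analyzed below.
Theorem 5.1.1: Simple random allocation runs in $O(n)$ time, where $n$ is the number of participants.
Proof:
The main operations are:
- Building the allocation array: $O(n)$
- The allocation loop: $(n-1)$ iterations at $O(1)$ each
- Total: $O(n) + (n-1) \times O(1) = O(n)$
Q.E.D.
Theorem 5.1.2: The double-mean method runs in $O(n)$ time.
Proof:
Beyond the same linear loop, the double-mean method adds only:
- Computing the current mean: $O(1)$ per iteration
- The optional history analysis of Section 4.2: one $O(n)$ pass over the recorded means
Each participant is processed once and nothing is sorted, so the total is $O(n)$.
Q.E.D.
Theorem 5.1.3: The linear congruential method runs in $O(n)$ time after $O(1)$ parameter setup.
Proof:
The main operations are:
- Parameter initialization: $O(1)$
- Sequence generation: $n$ linear congruential steps at $O(1)$ each
- Allocation: $n$ iterations at $O(1)$ each
Total: $O(n)$. Optional period detection is not included in this count; it can take up to $O(m)$ time and memory.
Q.E.D.
5.1.2 Space Complexity
Theorem 5.1.4: All three algorithms use $O(n)$ space.
Proof:
Each algorithm stores:
- The array of allocations: $O(n)$
- Temporary variables and state: $O(1)$
Total: $O(n)$
Q.E.D.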
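5.2 Convergence Analysis
5.2.1 Deviation Bound for the Double-Mean Method
Theorem 5.2.1: With ε = 0, every allocation of the double-mean method satisfies:
E[|X_k - M/n|] ≤ Var(X_k)^{1/2}
Proof:
Since E[X_k] = M/n, write μ = M/n. By Jensen's inequality for the concave square root (equivalently, by Cauchy-Schwarz):
E[|X_k - μ|] = E[√((X_k - μ)²)]
≤ √(E[(X_k - μ)²])
= √(Var(X_k))
Q.E.D.
5.2.2 LCG Period
Theorem 5.2.2: An LCG that satisfies the full-period conditions has period $m$.
Proof:
By Theorem 3.3.1 the state sequence $\{X_k\}$ has period $m$. There are only $m$ possible states and each state determines its successor, so the sequence visits every residue modulo $m$ exactly once per cycle.
Q.E.D.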
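5.3 Performance Optimization Strategies
5.3.1 Parallelization
Theorem 5.3.1: Batch allocation can approach linear speedup under parallel execution.
Proof:
Let $T_{seq}(n)$ be the serial running time and $T_{par}(n,p)$ the running time on $p$ processors.
Then:
S_p = T_seq(n) / T_par(n,p) ≤ p
The individual simulations are independent, so the work splits into $p$ equal batches with no communication between them. The speedup approaches $p$ when process start-up and result collection are small compared with the simulation time.
Q.E.D.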
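How close the speedup gets to p depends on the share of work that can run in parallel. The sketch below uses Amdahl's law, which is not part of the analysis above and is brought in here as an assumption for illustration; the function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


for fraction in (1.0, 0.95, 0.5):
    print(fraction, [round(amdahl_speedup(fraction, p), 2) for p in (1, 2, 4, 8)])
```

A fully parallel batch reaches the bound S_p = p, while even a small serial share (seeding, collecting results) caps the achievable speedup well below p as the worker count grows.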
5.3.2 Cache Optimization
Algorithm 5.3.1: Precompute frequently used allocation patterns
import functools
from typing import Dict, List, Tuple
class CacheOptimizedAllocator:
"""
缓存优化的分配器
通过预计算和缓存提高重复分配的性能
"""
def __init__(self, min_amount: float = 0.01):
self.min_amount = min_amount
self.allocation_cache: Dict[Tuple[float, int], List[float]] = {}
self.cache_size_limit = 10000
@functools.lru_cache(maxsize=1000)
def _generate_base_allocation(self, total_amount: float, num_people: int) -> Tuple[float, ...]:
"""
生成基础分配模式
Args:
total_amount: 总金额
num_people: 参与人数
Returns:
Tuple[float, ...]: 基础分配模式
"""
# 使用确定性算法生成基础模式
import hashlib
# 基于输入参数生成种子
seed = int(hashlib.md5(f"{total_amount}_{num_people}".encode()).hexdigest(), 16)
# 使用LCG生成确定性随机序列
result = []
remaining_amount = total_amount
x = seed % (2**32)
a, c, m = 1664525, 1013904223, 2**32
for i in range(num_people - 1):
x = (a * x + c) % m
uniform_value = x / m
min_allocation = self.min_amount
max_allocation = remaining_amount - self.min_amount * (num_people - i - 1)
if min_allocation < max_allocation:
allocation = min_allocation + uniform_value * (max_allocation - min_allocation)
allocation = round(allocation, 2)
else:
allocation = remaining_amount / (num_people - i)
result.append(allocation)
remaining_amount -= allocation
result.append(round(remaining_amount, 2))
return tuple(result)
def allocate_with_cache(self, total_amount: float, num_people: int,
randomness_level: float = 1.0) -> List[float]:
"""
带缓存的分配
Args:
total_amount: 总金额
num_people: 参与人数
randomness_level: 随机性级别 [0, 1]
Returns:
List[float]: 分配结果
"""
cache_key = (total_amount, num_people)
# 检查缓存
if cache_key in self.allocation_cache:
base_allocation = self.allocation_cache[cache_key]
else:
# 生成并缓存基础分配
base_allocation = self._generate_base_allocation(total_amount, num_people)
# 管理缓存大小
if len(self.allocation_cache) >= self.cache_size_limit:
# Simple FIFO eviction: drop the oldest inserted entry (dicts preserve insertion order)
oldest_key = next(iter(self.allocation_cache))
del self.allocation_cache[oldest_key]
self.allocation_cache[cache_key] = base_allocation
# 根据随机性级别调整结果
if randomness_level < 1.0:
adjusted_allocation = []
for i, base_amount in enumerate(base_allocation):
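# Random adjustment whose magnitude is proportional to (1 - randomness_level)
import random
import zlib
# zlib.crc32 is stable across runs, unlike hash() on a str, which is salted per process
random.seed(zlib.crc32(f"{total_amount}_{num_people}_{i}".encode()))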
adjustment = (random.random() - 0.5) * 2 * (1 - randomness_level) * base_amount
adjusted_amount = base_amount + adjustment
adjusted_allocation.append(max(self.min_amount, round(adjusted_amount, 2)))
# 重新归一化以满足总额约束
adjusted_allocation = self._renormalize(adjusted_allocation, total_amount)
return adjusted_allocation
return list(base_allocation)
def _renormalize(self, allocations: List[float], target_total: float) -> List[float]:
"""
重新归一化分配结果
Args:
allocations: 调整后的分配
target_total: 目标总额
Returns:
List[float]: 归一化的分配
"""
current_total = sum(allocations)
if current_total == 0:
return allocations
scale_factor = target_total / current_total
renormalized = [round(max(self.min_amount, amount * scale_factor), 2) for amount in allocations]
# Round first, then let the last entry absorb the rounding drift so the total matches
difference = target_total - sum(renormalized)
renormalized[-1] = round(renormalized[-1] + difference, 2)
return renormalized
def get_cache_statistics(self) -> Dict:
"""获取缓存统计信息"""
return {
'cache_size': len(self.allocation_cache),
'cache_hit_potential': len(self.allocation_cache) / self.cache_size_limit if self.cache_size_limit > 0 else 0,
'cached_patterns': list(self.allocation_cache.keys())
}
# 性能测试
def performance_comparison():
"""性能对比测试"""
import time
from concurrent.futures import ProcessPoolExecutor
print("=== 算法性能对比测试 ===")
test_cases = [
(100.0, 10),
(1000.0, 100),
(10000.0, 1000)
]
algorithms = {
'简单随机': SimpleRandomDistribution(),
'二倍均值': DoubleMeanAllocator(),
'LCG': LinearCongruentialAllocator(),
'缓存优化': CacheOptimizedAllocator()
}
for total, people in test_cases:
print(f"\n测试场景: 总金额={total}, 人数={people}")
for name, allocator in algorithms.items():
start_time = time.time()
# 执行多次分配以获得可靠的时间测量
num_runs = 100 if people <= 100 else 10
for _ in range(num_runs):
if name == 'LCG':
result = allocator.allocate(total, people, 42)
elif name == '缓存优化':
result = allocator.allocate_with_cache(total, people, randomness_level=0.8)
else:
result = allocator.allocate(total, people)
end_time = time.time()
avg_time = (end_time - start_time) / num_runs
print(f" {name}: 平均时间 {avg_time*1000:.4f} ms")
if __name__ == "__main__":
performance_comparison()
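6. Fairness Measures and Statistical Tests
6.1 Fairness Measures
6.1.1 The Gini Coefficient in Depth
The Gini coefficient is a classical measure of inequality in an allocation; lower values mean a more equal split. Its definition is:
Definition 6.1.1: For a discrete allocation vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with non-negative entries and positive sum, the Gini coefficient is:
G(\mathbf{x}) = \frac{\sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|}{2n \sum_{i=1}^n x_i}
Theorem 6.1.1: The Gini coefficient satisfies:
- $0 \leq G(\mathbf{x}) \leq 1 - 1/n$
- $G(\mathbf{x}) = 0$ if and only if all $x_i$ are equal
- $G(\mathbf{x}) = 1 - 1/n$ if and only if all entries but one are 0
Proof:
The lower bound and property 2 hold because the numerator is a sum of absolute values. For the upper bound, $|x_i - x_j| \leq x_i + x_j$ for $i \neq j$ gives $\sum_i \sum_j |x_i - x_j| \leq 2(n-1) \sum_i x_i$, with equality exactly when at most one entry is non-zero.
Q.E.D.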
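6.1.2 Information Entropy
Information entropy offers another view of fairness:
Definition 6.1.2: The information entropy of an allocation is:
H(\mathbf{x}) = -\sum_{i=1}^n p_i \log_2 p_i
where p_i = x_i / \sum_{j=1}^n x_j
6.1.3 Theil Index
The Theil index is another widely used inequality measure:
Definition 6.1.3: The Theil index is (with the convention 0 \log_2 0 = 0):
T(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n \frac{x_i}{\mu} \log_2 \frac{x_i}{\mu}
where \mu = \frac{1}{n} \sum_{i=1}^n x_i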
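The two definitions are linked by the identity T(x) = log₂ n − H(x), which follows by substituting x_i/μ = n·p_i into the Theil index. The sketch below checks the identity numerically; the function names are hypothetical.

```python
from math import log2
from typing import Sequence


def share_entropy(xs: Sequence[float]) -> float:
    """Entropy in bits of the shares p_i = x_i / sum(x)."""
    total = sum(xs)
    return -sum((x / total) * log2(x / total) for x in xs if x > 0)


def theil_index(xs: Sequence[float]) -> float:
    """Theil index with base-2 logarithm; zero entries contribute 0."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x / mean) * log2(x / mean) for x in xs if x > 0) / n


example = [1.0, 2.0, 3.0, 4.0]
n = len(example)
print(theil_index(example), log2(n) - share_entropy(example))
```

An equal split gives T = 0 and H = log₂ n, while an allocation that hands everything to one person gives T = log₂ n and H = 0.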
6.2 Statistical Tests
6.2.1 Kolmogorov-Smirnov Test
The KS test checks whether the allocations are consistent with a theoretical distribution:
import numpy as np
from scipy import stats
from typing import List, Tuple, Dict
import matplotlib.pyplot as plt
class FairnessAnalyzer:
"""
公平性分析器
提供多种公平性度量和统计检验方法
"""
def __init__(self, significance_level: float = 0.05):
self.significance_level = significance_level
def calculate_gini_coefficient(self, allocations: List[float]) -> float:
"""计算基尼系数"""
if not allocations or sum(allocations) == 0:
return 0
# 排序
sorted_allocations = sorted(allocations)
n = len(allocations)
# Rank formula equivalent to Definition 6.1.1: G = Σ (2i - n - 1) · x_(i) / (n · Σ x)
# (the trapezoid rule over the Lorenz points omitted the origin and returned a non-zero value for equal allocations)
values = np.asarray(sorted_allocations, dtype=float)
ranks = np.arange(1, n + 1)
gini = np.sum((2 * ranks - n - 1) * values) / (n * values.sum())
return float(gini)
def calculate_theil_index(self, allocations: List[float]) -> float:
"""计算Theil指数"""
if not allocations:
return 0
n = len(allocations)
mean_allocation = np.mean(allocations)
if mean_allocation == 0:
return 0
theil_sum = 0
for allocation in allocations:
if allocation > 0:
ratio = allocation / mean_allocation
theil_sum += ratio * np.log2(ratio)
return theil_sum / n
def calculate_information_entropy(self, allocations: List[float]) -> float:
"""计算信息熵"""
if not allocations:
return 0
total = sum(allocations)
if total == 0:
return 0
# 计算概率分布
probabilities = [allocation / total for allocation in allocations if allocation > 0]
if not probabilities:
return 0
# 计算熵
entropy = 0
for p in probabilities:
entropy -= p * np.log2(p)
return entropy
def kolmogorov_smirnov_test(self, allocations: List[float],
distribution: str = 'uniform',
theoretical_params: Dict = None) -> Dict:
"""
Kolmogorov-Smirnov检验
Args:
allocations: 分配数据
distribution: 理论分布名称
theoretical_params: 理论分布参数
Returns:
Dict: KS检验结果
"""
if not allocations:
return {'statistic': 0, 'p_value': 1, 'is_rejected': False}
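# Rescale by twice the sample mean so that draws from Uniform(0, 2·mean) map onto [0, 1];
# shares obtained by dividing by the total sum to 1 and are not comparable with Uniform(0, 1)
scale = 2 * sum(allocations) / len(allocations)
normalized_data = [x / scale for x in allocations]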
# 执行KS检验
if distribution == 'uniform':
ks_statistic, p_value = stats.kstest(normalized_data, 'uniform')
elif distribution == 'normal':
mean = np.mean(normalized_data)
std = np.std(normalized_data)
ks_statistic, p_value = stats.kstest(normalized_data,
lambda x: stats.norm.cdf(x, mean, std))
else:
ks_statistic, p_value = stats.kstest(normalized_data, distribution)
is_rejected = p_value < self.significance_level
return {
'statistic': ks_statistic,
'p_value': p_value,
'is_rejected': is_rejected,
'conclusion': '拒绝原假设' if is_rejected else '不能拒绝原假设'
}
def chi_square_goodness_of_fit(self, allocations: List[float],
expected_distribution: List[float]) -> Dict:
"""
卡方拟合优度检验
Args:
allocations: 实际分配
expected_distribution: 期望分布
Returns:
Dict: 卡方检验结果
"""
if len(allocations) != len(expected_distribution):
raise ValueError("实际分布和期望分布的长度必须相同")
# 计算卡方统计量
observed = np.array(allocations)
expected = np.array(expected_distribution)
# 避免除零
expected = np.maximum(expected, 1e-10)
chi2_statistic = np.sum((observed - expected) ** 2 / expected)
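        # Degrees of freedom = number of categories - 1 (no parameters are estimated here)
        degrees_of_freedom = len(allocations) - 1
        # p-value from the survival function, which is more accurate than 1 - cdf in the tail
        p_value = stats.chi2.sf(chi2_statistic, degrees_of_freedom)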
is_rejected = p_value < self.significance_level
return {
'chi2_statistic': chi2_statistic,
'degrees_of_freedom': degrees_of_freedom,
'p_value': p_value,
'is_rejected': is_rejected,
'conclusion': '拒绝原假设' if is_rejected else '不能拒绝原假设'
}
def comprehensive_fairness_analysis(self, allocations: List[float],
expected_equal_allocation: float = None) -> Dict:
"""
综合公平性分析
Args:
allocations: 分配结果
expected_equal_allocation: 期望的等分金额
Returns:
Dict: 综合分析结果
"""
if not allocations:
return {'error': '分配数据为空'}
results = {}
# 基本统计
results['basic_statistics'] = {
'mean': np.mean(allocations),
'median': np.median(allocations),
'std': np.std(allocations),
'min': np.min(allocations),
'max': np.max(allocations),
'range': np.max(allocations) - np.min(allocations),
'coefficient_of_variation': np.std(allocations) / np.mean(allocations) if np.mean(allocations) > 0 else 0
}
# 公平性指标
results['fairness_metrics'] = {
'gini_coefficient': self.calculate_gini_coefficient(allocations),
'theil_index': self.calculate_theil_index(allocations),
'information_entropy': self.calculate_information_entropy(allocations),
'relative_mean_deviation': np.mean(np.abs(np.array(allocations) - np.mean(allocations))) / np.mean(allocations) if np.mean(allocations) > 0 else 0
}
# 统计检验
results['statistical_tests'] = {
'uniformity_test': self.kolmogorov_smirnov_test(allocations, 'uniform'),
'normality_test': self.kolmogorov_smirnov_test(allocations, 'normal')
}
# 与期望分配的比较
if expected_equal_allocation is not None:
expected_allocation = [expected_equal_allocation] * len(allocations)
results['expected_comparison'] = self.chi_square_goodness_of_fit(allocations, expected_allocation)
# 公平性评级
gini = results['fairness_metrics']['gini_coefficient']
results['fairness_rating'] = self._calculate_fairness_rating(gini, results['statistical_tests'])
return results
def _calculate_fairness_rating(self, gini_coefficient: float,
statistical_tests: Dict) -> str:
"""计算公平性评级"""
if gini_coefficient < 0.1:
base_rating = "优秀"
elif gini_coefficient < 0.2:
base_rating = "良好"
elif gini_coefficient < 0.3:
base_rating = "一般"
else:
base_rating = "较差"
# 根据统计检验结果调整评级
uniformity_p = statistical_tests['uniformity_test']['p_value']
if uniformity_p < 0.01:
base_rating += " (高度不均匀)"
elif uniformity_p < 0.05:
base_rating += " (不均匀)"
return base_rating
# 综合测试
def comprehensive_fairness_test():
"""综合公平性测试"""
print("=== 公平性分析综合测试 ===")
analyzer = FairnessAnalyzer(significance_level=0.05)
# 生成测试数据
test_scenarios = {
'完全平均': [20.0, 20.0, 20.0, 20.0, 20.0],
'轻度不均': [25.0, 23.0, 21.0, 19.0, 17.0],
'高度不均': [50.0, 20.0, 10.0, 5.0, 1.0],
'极端情况': [90.0, 3.0, 2.0, 2.0, 2.0, 1.0]
}
for scenario_name, allocations in test_scenarios.items():
print(f"\n=== {scenario_name} 场景 ===")
result = analyzer.comprehensive_fairness_analysis(
allocations,
expected_equal_allocation=sum(allocations) / len(allocations)
)
# 打印主要指标
print(f"分配结果: {allocations}")
print(f"基本统计:")
for key, value in result['basic_statistics'].items():
print(f" {key}: {value:.4f}")
print(f"公平性指标:")
for key, value in result['fairness_metrics'].items():
print(f" {key}: {value:.6f}")
print(f"统计检验:")
for test_name, test_result in result['statistical_tests'].items():
print(f" {test_name}: p值={test_result['p_value']:.6f}, {test_result['conclusion']}")
print(f"公平性评级: {result['fairness_rating']}")
if __name__ == "__main__":
comprehensive_fairness_test()
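The uniformity test in `FairnessAnalyzer` compares shares with Uniform(0, 1), a law they are not expected to follow. For the simple random scheme, Theorem 3.1.1 gives the marginal Y_k ~ Beta(1, n-1), whose CDF is F(y) = 1 - (1-y)^(n-1). The following is a minimal sketch of a KS check against that law, under stated assumptions: it uses only the standard library, draws the first share from many independent envelopes (shares within one envelope are dependent, so they should not be pooled), and compares the statistic with the asymptotic 5% critical value 1.36/√N. The helper names and the sampling scheme (uniform cut points on [0, 1]) are chosen for illustration.

```python
import math
import random
from typing import Callable, List


def first_share(n: int, rng: random.Random) -> float:
    """First share of one envelope, obtained from n-1 uniform cut points."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    return cuts[0] if cuts else 1.0


def ks_statistic(sample: List[float], cdf: Callable[[float], float]) -> float:
    """Two-sided one-sample KS statistic D = sup |F_empirical - F|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


n_people = 5
n_envelopes = 2000
rng = random.Random(2024)
shares = [first_share(n_people, rng) for _ in range(n_envelopes)]


def beta_cdf(y: float) -> float:
    """CDF of Beta(1, n_people - 1)."""
    return 1.0 - (1.0 - y) ** (n_people - 1)


def uniform_cdf(y: float) -> float:
    """CDF of Uniform(0, 1)."""
    return y


critical = 1.36 / math.sqrt(n_envelopes)  # asymptotic 5% critical value
d_beta = ks_statistic(shares, beta_cdf)
d_uniform = ks_statistic(shares, uniform_cdf)
print(f"D vs Beta(1, n-1): {d_beta:.4f} (critical value {critical:.4f})")
print(f"D vs Uniform(0,1): {d_uniform:.4f}")
```

When SciPy is available, `stats.kstest(shares, 'beta', args=(1, n_people - 1))` performs the same comparison and also returns a p-value.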
7. Advanced Optimization Techniques
7.1 Adaptive Weight Allocation
7.1.1 Dynamic Weight Algorithm
import numpy as np
from typing import List, Tuple, Optional
from dataclasses import dataclass
import math
@dataclass
class AdaptiveWeightConfig:
"""自适应权重配置"""
initial_variance_target: float = 1.0
convergence_rate: float = 0.1
max_iterations: int = 100
variance_tolerance: float = 1e-6
class AdaptiveWeightAllocator:
"""
自适应权重分配器
根据历史分配结果动态调整权重,实现更优的公平性控制
"""
def __init__(self, min_amount: float = 0.01,
config: AdaptiveWeightConfig = None):
self.min_amount = min_amount
self.config = config or AdaptiveWeightConfig()
# 历史统计
self.allocation_history = []
self.variance_history = []
self.convergence_history = []
def allocate_adaptive(self, total_amount: float, num_people: int,
use_history: bool = True) -> Tuple[List[float], dict]:
"""
自适应权重分配
Args:
total_amount: 总金额
num_people: 参与人数
use_history: 是否使用历史数据
Returns:
Tuple[List[float], dict]: (分配结果, 算法信息)
"""
if use_history and self.allocation_history:
# 基于历史数据调整参数
adapted_params = self._adapt_parameters()
else:
# 使用默认参数
adapted_params = {
'variance_target': self.config.initial_variance_target,
'weight_distribution': [1.0] * num_people
}
# 执行分配
allocations, allocation_info = self._weighted_allocation(
total_amount, num_people, adapted_params
)
# 更新历史
self._update_history(allocations, allocation_info)
return allocations, allocation_info
def _adapt_parameters(self) -> dict:
"""基于历史数据自适应参数"""
if not self.variance_history:
return {
'variance_target': self.config.initial_variance_target,
'weight_distribution': [1.0]
}
# 分析最近的方差趋势
recent_variances = self.variance_history[-10:] # 最近10次
if len(recent_variances) < 3:
return {
'variance_target': self.config.initial_variance_target,
'weight_distribution': [1.0]
}
# 计算方差变化趋势
variance_trend = np.polyfit(range(len(recent_variances)), recent_variances, 1)[0]
# 调整方差目标
current_target = self.variance_history[-1]
if variance_trend > 0:
# 方差在增加,减少目标
new_variance_target = current_target * (1 - self.config.convergence_rate)
else:
# 方差在减少或稳定,增加目标
new_variance_target = current_target * (1 + self.config.convergence_rate)
# 确保在合理范围内
new_variance_target = max(0.1, min(new_variance_target, 10.0))
# 计算权重分布
weight_distribution = self._calculate_weights_from_history()
return {
'variance_target': new_variance_target,
'weight_distribution': weight_distribution
}
def _calculate_weights_from_history(self) -> List[float]:
"""从历史数据计算权重分布"""
if len(self.allocation_history) < 2:
return [1.0]
# 分析最近的分配模式
recent_allocations = self.allocation_history[-5:] # 最近5次
# 计算每个人的平均分配
n = len(recent_allocations[0]) if recent_allocations else 1
avg_allocations = []
for i in range(n):
person_allocations = [alloc[i] for alloc in recent_allocations if i < len(alloc)]
avg_allocation = np.mean(person_allocations)
avg_allocations.append(avg_allocation)
# 基于分配差异计算权重
total_avg = sum(avg_allocations)
if total_avg == 0:
return [1.0] * n
weights = []
for avg_alloc in avg_allocations:
# 分配较少的人获得更高权重
weight = total_avg / (avg_alloc + 1e-10)
weights.append(weight)
# 归一化权重
weight_sum = sum(weights)
weights = [w / weight_sum for w in weights]
return weights
    def _weighted_allocation(self, total_amount: float, num_people: int,
                             params: dict) -> Tuple[List[float], dict]:
        """Perform one weighted allocation pass over all recipients."""
        variance_target = params['variance_target']
        weights = list(params['weight_distribution'])
        # Fall back to uniform weights when the history does not match this group size
        if len(weights) != num_people or sum(weights) <= 0:
            weights = [1.0] * num_people
        allocations = []
        remaining_amount = total_amount
        # A single pass is enough; repeating it while money remains would
        # append more than num_people shares
        for i in range(num_people - 1):
            # Expected share of person i among those not yet served
            remaining_weight = sum(weights[i:])
            if remaining_weight <= 0:
                expected = remaining_amount / (num_people - i)
            else:
                expected = (weights[i] / remaining_weight) * remaining_amount
            # Random perturbation whose scale shrinks for later recipients
            randomness_factor = math.exp(-i / 10)
            noise = np.random.normal(0, math.sqrt(variance_target) * randomness_factor)
            raw_allocation = expected + noise
            # Keep at least min_amount for everyone who has not been served yet
            min_allocation = self.min_amount
            max_allocation = remaining_amount - self.min_amount * (num_people - i - 1)
            allocation = max(min_allocation, min(raw_allocation, max_allocation))
            allocation = round(allocation, 2)
            allocations.append(allocation)
            remaining_amount -= allocation
        # The last person receives whatever is left
        allocations.append(round(remaining_amount, 2))
        # Compare the realised variance with the target
        final_variance = float(np.var(allocations))
        convergence_achieved = abs(final_variance - variance_target) < self.config.variance_tolerance
        allocation_info = {
            'iterations': 1,
            'final_variance': final_variance,
            'target_variance': variance_target,
            'convergence_achieved': convergence_achieved,
            'final_weights': weights
        }
        return allocations, allocation_info
def _update_history(self, allocations: List[float], info: dict):
"""更新历史记录"""
self.allocation_history.append(allocations)
self.variance_history.append(info['final_variance'])
self.convergence_history.append(info['convergence_achieved'])
# 限制历史长度
max_history = 100
if len(self.allocation_history) > max_history:
self.allocation_history = self.allocation_history[-max_history:]
self.variance_history = self.variance_history[-max_history:]
self.convergence_history = self.convergence_history[-max_history:]
def get_adaptation_statistics(self) -> dict:
"""获取自适应统计信息"""
if not self.variance_history:
return {'message': '暂无历史数据'}
return {
'total_adaptations': len(self.variance_history),
'average_variance': np.mean(self.variance_history),
'variance_trend': np.polyfit(range(len(self.variance_history)), self.variance_history, 1)[0],
'convergence_rate': np.mean(self.convergence_history),
'current_target_variance': self.variance_history[-1] if self.variance_history else None
}
# 多目标优化算法
class MultiObjectiveAllocator:
"""
多目标优化分配器
同时考虑公平性、效率、用户体验等多个目标
"""
def __init__(self, min_amount: float = 0.01):
self.min_amount = min_amount
# 目标权重(可调)
self.objective_weights = {
'fairness': 0.4, # 公平性权重
'efficiency': 0.3, # 效率权重
'user_satisfaction': 0.3 # 用户满意度权重
}
def allocate_multi_objective(self, total_amount: float, num_people: int,
preferences: dict = None) -> Tuple[List[float], dict]:
"""
多目标优化分配
Args:
total_amount: 总金额
num_people: 参与人数
preferences: 用户偏好(可选)
Returns:
Tuple[List[float], dict]: (分配结果, 优化信息)
"""
if preferences:
self._adjust_weights(preferences)
# 候选解生成
candidate_solutions = self._generate_candidate_solutions(total_amount, num_people, 20)
# 多目标评估
evaluated_solutions = []
for solution in candidate_solutions:
scores = self._evaluate_objectives(solution, total_amount)
evaluated_solutions.append((solution, scores))
# Pareto前沿分析
pareto_front = self._find_pareto_front(evaluated_solutions)
# 选择最优解
best_solution = self._select_best_solution(pareto_front)
return best_solution[0], {
'objective_scores': best_solution[1],
'pareto_front_size': len(pareto_front),
'optimization_weights': self.objective_weights.copy()
}
def _adjust_weights(self, preferences: dict):
"""根据用户偏好调整权重"""
for objective, adjustment in preferences.items():
if objective in self.objective_weights:
self.objective_weights[objective] += adjustment
# 确保权重和为1
total = sum(self.objective_weights.values())
self.objective_weights = {k: v/total for k, v in self.objective_weights.items()}
def _generate_candidate_solutions(self, total_amount: float, num_people: int,
num_candidates: int) -> List[List[float]]:
"""生成候选解"""
candidates = []
# 简单随机生成
for _ in range(num_candidates // 4):
import random
random.seed(None)
allocations = []
remaining = total_amount
for i in range(num_people - 1):
allocation = random.uniform(self.min_amount, remaining - self.min_amount * (num_people - i - 1))
allocation = round(allocation, 2)
allocations.append(allocation)
remaining -= allocation
allocations.append(round(remaining, 2))
candidates.append(allocations)
# 二倍均值生成
from algorithms import DoubleMeanAllocator
dm_allocator = DoubleMeanAllocator(self.min_amount)
for _ in range(num_candidates // 4):
result = dm_allocator.allocate(total_amount, num_people, track_history=False)
candidates.append(result.allocations)
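        # LCG-based generation. The seed is varied per candidate so the candidates
        # differ (assumes the third positional argument of allocate() is the seed)
        from algorithms import LinearCongruentialAllocator
        lcg_allocator = LinearCongruentialAllocator(self.min_amount)
        for k in range(num_candidates // 4):
            result = lcg_allocator.allocate(total_amount, num_people, 42 + k, track_sequence=False)
            candidates.append(result.allocations)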
# 启发式生成(基于规则)
for _ in range(num_candidates // 4):
allocations = []
remaining = total_amount
# 指数衰减分配
decay_factor = 0.8
for i in range(num_people - 1):
expected = (decay_factor ** i) * (total_amount * (1 - decay_factor))
allocation = max(self.min_amount, min(expected, remaining - self.min_amount * (num_people - i - 1)))
allocation = round(allocation, 2)
allocations.append(allocation)
remaining -= allocation
allocations.append(round(remaining, 2))
candidates.append(allocations)
return candidates
def _evaluate_objectives(self, solution: List[float], total_amount: float) -> dict:
"""评估目标函数"""
# 公平性评分(基于基尼系数)
from fairness_analysis import FairnessAnalyzer
analyzer = FairnessAnalyzer()
gini = analyzer.calculate_gini_coefficient(solution)
fairness_score = max(0, 1 - gini) # 基尼系数越小,公平性越高
# 效率评分(基于方差)
variance = np.var(solution)
efficiency_score = max(0, 1 - variance / (total_amount / len(solution)) ** 2)
# 用户满意度(基于分配范围)
allocation_range = max(solution) - min(solution)
avg_allocation = np.mean(solution)
satisfaction_score = max(0, 1 - allocation_range / avg_allocation) if avg_allocation > 0 else 0
return {
'fairness': fairness_score,
'efficiency': efficiency_score,
'user_satisfaction': satisfaction_score,
'total_score': (fairness_score * self.objective_weights['fairness'] +
efficiency_score * self.objective_weights['efficiency'] +
satisfaction_score * self.objective_weights['user_satisfaction'])
}
def _find_pareto_front(self, evaluated_solutions: List[Tuple[List[float], dict]]) -> List[Tuple[List[float], dict]]:
"""寻找Pareto前沿"""
pareto_front = []
for solution, scores in evaluated_solutions:
is_dominated = False
for other_solution, other_scores in evaluated_solutions:
if solution == other_solution:
continue
# 检查是否被支配
dominates = all(other_scores[key] >= scores[key] for key in scores if key != 'total_score')
strictly_better = any(other_scores[key] > scores[key] for key in scores if key != 'total_score')
if dominates and strictly_better:
is_dominated = True
break
if not is_dominated:
pareto_front.append((solution, scores))
return pareto_front
def _select_best_solution(self, pareto_front: List[Tuple[List[float], dict]]) -> Tuple[List[float], dict]:
"""从Pareto前沿选择最优解"""
        if not pareto_front:
            # No candidates were supplied: return an empty allocation and empty scores
            return [], {}
# 基于权重选择最优解
best_solution = None
best_score = -1
for solution, scores in pareto_front:
if scores['total_score'] > best_score:
best_score = scores['total_score']
best_solution = (solution, scores)
return best_solution
# 高级优化测试
def test_advanced_optimization():
"""测试高级优化技术"""
print("=== 高级优化技术测试 ===")
# 自适应权重测试
print("\n--- 自适应权重分配测试 ---")
adaptive_allocator = AdaptiveWeightAllocator(
min_amount=0.01,
config=AdaptiveWeightConfig(
initial_variance_target=2.0,
convergence_rate=0.05
)
)
total_amount = 100.0
num_people = 10
for i in range(5):
allocations, info = adaptive_allocator.allocate_adaptive(total_amount, num_people)
print(f"第{i+1}次分配: {allocations}")
print(f" 目标方差: {info['target_variance']:.4f}, 实际方差: {info['final_variance']:.4f}")
print(f" 收敛: {info['convergence_achieved']}")
print("\n自适应统计:")
stats = adaptive_allocator.get_adaptation_statistics()
for key, value in stats.items():
print(f" {key}: {value}")
# 多目标优化测试
print("\n--- 多目标优化测试 ---")
multi_allocator = MultiObjectiveAllocator(min_amount=0.01)
allocations, info = multi_allocator.allocate_multi_objective(total_amount, num_people)
print(f"多目标优化结果: {allocations}")
print(f"目标评分: {info['objective_scores']}")
print(f"Pareto前沿大小: {info['pareto_front_size']}")
if __name__ == "__main__":
test_advanced_optimization()
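The allocators above round each share to two decimals and hand the remainder to the last person, which can leave floating-point residue or push the last share below the minimum. One common alternative is to work in integer cents and distribute the rounding remainder by largest fractional part. The sketch below applies that idea to a weight vector such as the one produced by `_calculate_weights_from_history`. The function name `allocate_by_weights_cents` and its parameters are hypothetical and not part of the classes above.

```python
from typing import List, Sequence


def allocate_by_weights_cents(total_cents: int, weights: Sequence[float],
                              min_cents: int = 1) -> List[int]:
    """Split total_cents proportionally to weights, each share >= min_cents.

    Uses the largest-remainder method so the shares sum to total_cents exactly.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must not be empty")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    if total_cents < n * min_cents:
        raise ValueError("total is too small to give everyone the minimum")

    # Reserve the minimum for everyone, then split the rest by weight
    pool = total_cents - n * min_cents
    weight_sum = sum(weights)
    exact = [pool * w / weight_sum for w in weights]
    shares = [int(e) for e in exact]  # int() floors because e >= 0

    # Hand out the cents lost to flooring, largest fractional part first
    leftover = pool - sum(shares)
    order = sorted(range(n), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1

    return [s + min_cents for s in shares]


# 100.00 split among 4 people, with the second person weighted double
result = allocate_by_weights_cents(10000, [1.0, 2.0, 1.0, 1.0])
print(result, sum(result))
```

Because every quantity is an integer number of cents, the total-amount constraint holds exactly and no final rounding step is needed; amounts are converted to decimal currency only for display.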
7.2 Machine-Learning-Enhanced Allocation
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from typing import List, Tuple, Dict
import bisect  # required by ReinforcementLearningAllocator.get_state
import pickle
from dataclasses import dataclass
@dataclass
class MLTrainingData:
"""机器学习训练数据"""
features: List[List[float]]
targets: List[List[float]]
class MLEnhancedAllocator:
"""
机器学习增强分配器
使用历史数据训练模型,优化未来的分配决策
"""
def __init__(self, min_amount: float = 0.01):
self.min_amount = min_amount
self.models = {}
self.scalers = {}
self.training_data = []
self.feature_names = [
'total_amount', 'num_people', 'target_variance',
'time_step', 'previous_variance', 'fairness_score'
]
# 初始化模型
for i in range(10): # 假设最多支持10人的分配
self.models[f'person_{i}'] = RandomForestRegressor(
n_estimators=50,
max_depth=10,
random_state=42
)
self.scalers[f'person_{i}'] = StandardScaler()
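    def generate_features(self, total_amount: float, num_people: int,
                          iteration: int = 0, historical_data: dict = None) -> List[float]:
        """Build the feature vector in the same order as self.feature_names."""
        history = historical_data or {}
        # Each entry lines up with the name at the same index in self.feature_names.
        # The 'target_variance' key of historical_data is an assumed convention;
        # it defaults to 0.0 when absent.
        features = [
            total_amount,                           # total_amount
            num_people,                             # num_people
            history.get('target_variance', 0.0),    # target_variance
            iteration,                              # time_step
            history.get('previous_variance', 0.0),  # previous_variance
            history.get('fairness_score', 0.5)      # fairness_score
        ]
        return [float(value) for value in features]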
def add_training_data(self, allocation_result: Dict):
"""添加训练数据"""
features = allocation_result['features']
targets = allocation_result['targets']
self.training_data.append({
'features': features,
'targets': targets,
'quality_score': allocation_result.get('quality_score', 0.5)
})
# 限制训练数据大小
max_training_samples = 1000
if len(self.training_data) > max_training_samples:
self.training_data = self.training_data[-max_training_samples:]
def train_models(self):
"""训练机器学习模型"""
if len(self.training_data) < 10:
print("训练数据不足,跳过模型训练")
return
print("开始训练机器学习模型...")
        # Assemble the feature matrix X and a fixed-width target matrix Y
        max_people = len(self.models)  # one model per recipient position (10 by default)
        X = []
        Y = []
        for data in self.training_data:
            X.append(data['features'])
            # Truncate or zero-pad each target list to max_people entries so that
            # every row of Y has the same length (ragged rows cannot form an array)
            targets = list(data['targets'][:max_people])
            targets += [0.0] * (max_people - len(targets))
            Y.append(targets)
        X = np.array(X, dtype=float)
        Y = np.array(Y, dtype=float)
        # Train the model for each recipient position
        for person_idx in range(max_people):
            model_name = f'person_{person_idx}'
            y_person = Y[:, person_idx]
            # Standardize the features with this model's own scaler
            X_scaled = self.scalers[model_name].fit_transform(X)
            # Fit the model
            self.models[model_name].fit(X_scaled, y_person)
        print(f"模型训练完成,共训练了 {max_people} 个模型")
def predict_allocation(self, total_amount: float, num_people: int,
iteration: int = 0) -> List[float]:
"""使用训练好的模型预测分配"""
if not self.models or len(self.training_data) < 10:
# 如果模型未训练,使用启发式方法
return self._heuristic_allocation(total_amount, num_people)
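        features = np.array([self.generate_features(total_amount, num_people, iteration)])
        predictions = []
        # Only the first len(self.models) recipients are covered by a model
        for person_idx in range(min(len(self.models), num_people)):
            model_name = f'person_{person_idx}'
            # Scale the raw features with this model's own scaler; the output of
            # the previous iteration must not be scaled again
            features_scaled = self.scalers[model_name].transform(features)
            # Predict this recipient's amount
            prediction = self.models[model_name].predict(features_scaled)[0]
            predictions.append(prediction)
        # Rescale the predictions so that they satisfy the total-amount constraint
        predictions = self._normalize_allocation(predictions, total_amount)
        return predictions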
def _heuristic_allocation(self, total_amount: float, num_people: int) -> List[float]:
"""启发式分配方法"""
allocations = []
remaining_amount = total_amount
for i in range(num_people - 1):
# 基于位置的权重
weight = 1.0 / (1 + i * 0.1) # 前面的人获得稍多
expected_allocation = (weight / sum(1.0 / (1 + j * 0.1) for j in range(num_people - 1))) * remaining_amount
min_allocation = self.min_amount
max_allocation = remaining_amount - self.min_amount * (num_people - i - 1)
allocation = max(min_allocation, min(expected_allocation, max_allocation))
allocation = round(allocation, 2)
allocations.append(allocation)
remaining_amount -= allocation
allocations.append(round(remaining_amount, 2))
return allocations
def _normalize_allocation(self, predictions: List[float], target_total: float) -> List[float]:
"""归一化分配结果"""
if not predictions:
return []
current_total = sum(predictions)
if current_total == 0:
return [target_total / len(predictions)] * len(predictions)
scale_factor = target_total / current_total
normalized = [max(self.min_amount, pred * scale_factor) for pred in predictions]
# 确保最后一个值满足约束
difference = target_total - sum(normalized[:-1]) if len(normalized) > 1 else target_total
normalized[-1] = max(self.min_amount, difference)
return [round(amount, 2) for amount in normalized]
def ml_guided_allocation(self, total_amount: float, num_people: int,
quality_threshold: float = 0.7) -> Tuple[List[float], Dict]:
"""
机器学习指导的分配
Args:
total_amount: 总金额
num_people: 参与人数
quality_threshold: 质量阈值
Returns:
Tuple[List[float], Dict]: (分配结果, 质量评估)
"""
# 使用ML模型预测
ml_predictions = self.predict_allocation(total_amount, num_people)
# 评估预测质量
quality_score = self._evaluate_prediction_quality(ml_predictions, total_amount, num_people)
if quality_score >= quality_threshold:
# 高质量预测,直接使用
return ml_predictions, {
'method': 'ml_prediction',
'quality_score': quality_score,
'confidence': 'high'
}
else:
# 质量不足,使用混合策略
heuristic_result = self._heuristic_allocation(total_amount, num_people)
# 加权混合
mixed_result = []
ml_weight = quality_score
heuristic_weight = 1 - quality_score
for i in range(min(len(ml_predictions), len(heuristic_result))):
mixed = ml_weight * ml_predictions[i] + heuristic_weight * heuristic_result[i]
mixed_result.append(round(mixed, 2))
# 确保总额约束
mixed_result = self._normalize_allocation(mixed_result, total_amount)
return mixed_result, {
'method': 'mixed_ml_heuristic',
'quality_score': quality_score,
'ml_weight': ml_weight,
'confidence': 'medium'
}
def _evaluate_prediction_quality(self, predictions: List[float],
total_amount: float, num_people: int) -> float:
"""评估预测质量"""
# 检查约束满足度
constraint_score = 1.0 if abs(sum(predictions) - total_amount) < 0.01 else 0.5
# 检查最小值约束
min_constraint_score = 1.0 if all(p >= self.min_amount for p in predictions) else 0.3
# 检查方差合理性
variance = np.var(predictions)
expected_variance = (total_amount / num_people) ** 2
variance_score = max(0, 1 - abs(variance - expected_variance) / expected_variance)
# 综合评分
quality_score = (constraint_score + min_constraint_score + variance_score) / 3
return quality_score
def save_models(self, filepath: str):
"""保存训练好的模型"""
model_data = {
'models': self.models,
'scalers': self.scalers,
'training_data': self.training_data,
'feature_names': self.feature_names
}
with open(filepath, 'wb') as f:
pickle.dump(model_data, f)
print(f"模型已保存到: {filepath}")
def load_models(self, filepath: str):
"""加载训练好的模型"""
try:
with open(filepath, 'rb') as f:
model_data = pickle.load(f)
self.models = model_data['models']
self.scalers = model_data['scalers']
self.training_data = model_data['training_data']
self.feature_names = model_data['feature_names']
print(f"模型已从 {filepath} 加载")
except FileNotFoundError:
print(f"模型文件 {filepath} 不存在")
# 强化学习分配器
class ReinforcementLearningAllocator:
"""
强化学习分配器
通过试错学习优化分配策略
"""
def __init__(self, min_amount: float = 0.01, learning_rate: float = 0.1):
self.min_amount = min_amount
self.learning_rate = learning_rate
# Q表存储状态-动作值
self.q_table = {}
self.epsilon = 0.1 # 探索率
self.gamma = 0.9 # 折扣因子
# 状态空间定义
self.amount_bins = [0, 50, 100, 500, 1000, float('inf')]
self.people_bins = [0, 5, 10, 20, 50, float('inf')]
# 动作空间:不同的分配策略
self.action_strategies = [
'uniform',
'biased_forward',
'biased_backward',
'exponential',
'random_weighted'
]
def get_state(self, total_amount: float, num_people: int) -> Tuple[int, int]:
"""将连续状态离散化"""
amount_bin = bisect.bisect_right(self.amount_bins, total_amount) - 1
people_bin = bisect.bisect_right(self.people_bins, num_people) - 1
return (amount_bin, people_bin)
def select_action(self, state: Tuple[int, int]) -> str:
"""使用ε-贪婪策略选择动作"""
if np.random.random() < self.epsilon or state not in self.q_table:
# 探索:随机选择策略
return np.random.choice(self.action_strategies)
else:
# 利用:选择最佳策略
return max(self.q_table[state], key=self.q_table[state].get)
def execute_strategy(self, strategy: str, total_amount: float,
num_people: int) -> Tuple[List[float], Dict]:
"""执行指定的分配策略"""
allocations = []
remaining_amount = total_amount
        if strategy == 'uniform':
            # Equal split; the last share absorbs the rounding remainder
            share = round(total_amount / num_people, 2)
            allocations = [share] * (num_people - 1)
            allocations.append(round(total_amount - share * (num_people - 1), 2))
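        elif strategy in ('biased_forward', 'biased_backward'):
            # Linear position bias. 'biased_backward' was listed as an action but
            # had no branch of its own; here it applies the same biases in reverse order
            biases = [1.0 + (num_people - j) * 0.1 for j in range(num_people - 1)]
            if strategy == 'biased_backward':
                biases.reverse()
            bias_sum = sum(biases)
            for i, bias in enumerate(biases):
                allocation = (bias / bias_sum) * remaining_amount
                # Leave at least min_amount for every remaining recipient
                max_allocation = remaining_amount - self.min_amount * (num_people - i - 1)
                allocation = max(self.min_amount, min(allocation, max_allocation))
                allocation = round(allocation, 2)
                allocations.append(allocation)
                remaining_amount -= allocation
            allocations.append(round(remaining_amount, 2))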
elif strategy == 'exponential':
# 指数衰减
decay_factor = 0.7
for i in range(num_people - 1):
expected = (decay_factor ** i) * (total_amount * (1 - decay_factor))
allocation = max(self.min_amount, min(expected, remaining_amount - self.min_amount * (num_people - i - 1)))
allocation = round(allocation, 2)
allocations.append(allocation)
remaining_amount -= allocation
allocations.append(round(remaining_amount, 2))
else: # random_weighted
# 随机权重
weights = np.random.exponential(1, num_people - 1)
weights /= weights.sum()
for i, weight in enumerate(weights):
allocation = weight * remaining_amount
allocation = max(self.min_amount, min(allocation, remaining_amount - self.min_amount * (num_people - i - 1)))
allocation = round(allocation, 2)
allocations.append(allocation)
remaining_amount -= allocation
allocations.append(round(remaining_amount, 2))
# 计算质量指标
quality_metrics = self._calculate_quality_metrics(allocations, total_amount)
return allocations, quality_metrics
def _calculate_quality_metrics(self, allocations: List[float],
total_amount: float) -> Dict:
"""计算分配质量指标"""
from fairness_analysis import FairnessAnalyzer
analyzer = FairnessAnalyzer()
return {
'gini_coefficient': analyzer.calculate_gini_coefficient(allocations),
'variance': np.var(allocations),
'coefficient_of_variation': np.std(allocations) / np.mean(allocations) if np.mean(allocations) > 0 else 0,
'fairness_score': 1 - analyzer.calculate_gini_coefficient(allocations),
'total_score': 1 - analyzer.calculate_gini_coefficient(allocations) # 简化的总评分
}
def rl_guided_allocation(self, total_amount: float, num_people: int,
num_episodes: int = 100) -> Tuple[List[float], Dict]:
"""
强化学习指导的分配
Args:
total_amount: 总金额
num_people: 参与人数
num_episodes: 训练回合数
Returns:
Tuple[List[float], Dict]: (分配结果, 学习信息)
"""
state = self.get_state(total_amount, num_people)
# 训练阶段
for episode in range(num_episodes):
action = self.select_action(state)
allocations, rewards = self.execute_strategy(action, total_amount, num_people)
# 更新Q值
if state not in self.q_table:
self.q_table[state] = {strategy: 0.0 for strategy in self.action_strategies}
current_q = self.q_table[state][action]
max_next_q = max(self.q_table.get(state, {}).values(), default=0)
# Q学习更新
new_q = current_q + self.learning_rate * (rewards['total_score'] + self.gamma * max_next_q - current_q)
self.q_table[state][action] = new_q
        # Execution phase: act greedily on the learned Q-values (no exploration)
        q_values = self.q_table.get(state)
        if q_values:
            best_action = max(q_values, key=q_values.get)
        else:
            best_action = str(np.random.choice(self.action_strategies))
        final_allocations, final_rewards = self.execute_strategy(best_action, total_amount, num_people)
return final_allocations, {
'strategy_used': best_action,
'training_episodes': num_episodes,
'q_values': self.q_table.get(state, {}),
'quality_metrics': final_rewards,
'learning_progress': len(self.q_table)
}
# ML增强测试
def test_ml_enhanced_allocation():
"""测试ML增强分配"""
print("=== 机器学习增强分配测试 ===")
ml_allocator = MLEnhancedAllocator(min_amount=0.01)
# 模拟训练数据
print("生成模拟训练数据...")
for _ in range(50):
total_amount = np.random.uniform(50, 200)
num_people = np.random.randint(3, 15)
allocation_result = ml_allocator._heuristic_allocation(total_amount, num_people)
# 评估质量
from fairness_analysis import FairnessAnalyzer
analyzer = FairnessAnalyzer()
quality_score = 1 - analyzer.calculate_gini_coefficient(allocation_result)
training_data = {
'features': ml_allocator.generate_features(total_amount, num_people),
'targets': allocation_result,
'quality_score': quality_score
}
ml_allocator.add_training_data(training_data)
# 训练模型
ml_allocator.train_models()
# 测试ML指导的分配
print("\n测试ML指导的分配:")
total_amount = 100.0
num_people = 8
allocations, info = ml_allocator.ml_guided_allocation(total_amount, num_people)
print(f"ML增强分配结果: {allocations}")
print(f"分配信息: {info}")
# 强化学习测试
print("\n测试强化学习分配:")
rl_allocator = ReinforcementLearningAllocator(min_amount=0.01)
rl_allocations, rl_info = rl_allocator.rl_guided_allocation(total_amount, num_people, num_episodes=50)
print(f"RL分配结果: {rl_allocations}")
print(f"RL信息: {rl_info}")
if __name__ == "__main__":
test_ml_enhanced_allocation()
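Because `rl_guided_allocation` stays in a single state, its Q-learning loop behaves like a multi-armed bandit over the allocation strategies. A stripped-down sketch of that idea follows: ε-greedy selection with an incremental value update, preceded by a short warm-up so that no estimate is stuck at its initial value. The reward table is made up for illustration, and `EpsilonGreedySelector` is a hypothetical name, not part of the allocator above.

```python
import random
from typing import Dict, List


class EpsilonGreedySelector:
    """Single-state epsilon-greedy learner over a fixed set of strategies."""

    def __init__(self, actions: List[str], epsilon: float = 0.1,
                 learning_rate: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)
        self.q_values: Dict[str, float] = {a: 0.0 for a in self.actions}

    def select(self) -> str:
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best()

    def best(self) -> str:
        """Greedy choice, used once training is finished."""
        return max(self.q_values, key=self.q_values.get)

    def update(self, action: str, reward: float) -> None:
        """Move the estimate a fraction learning_rate towards the reward."""
        old = self.q_values[action]
        self.q_values[action] = old + self.learning_rate * (reward - old)


# Hypothetical fairness scores (1 - Gini) per strategy, for illustration only
mean_reward = {'uniform': 1.0, 'biased_forward': 0.8, 'exponential': 0.5}

selector = EpsilonGreedySelector(list(mean_reward))

# Warm-up: try every strategy 20 times before trusting the estimates
for _ in range(20):
    for action in selector.actions:
        selector.update(action, mean_reward[action])

# Main loop: epsilon-greedy selection with incremental updates
for _ in range(200):
    action = selector.select()
    selector.update(action, mean_reward[action])

print(selector.best(), selector.q_values)
```

Separating `select` (exploring) from `best` (greedy) keeps exploration out of the final decision, so the strategy actually executed is the one with the highest learned value.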
8. Practical Applications and Engineering Implementation
8.1 High-Concurrency System Design
8.1.1 Distributed Allocation Architecture
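The request/worker pattern at the heart of this architecture can be tried in-process before Redis is involved. In this minimal sketch a small worker pool drains an `asyncio.Queue`, and each request carries an `asyncio.Future` through which its result or error is delivered, so a failed request does not leave the caller waiting for the timeout. The names `Job`, `worker`, `submit`, and `equal_split` are hypothetical stand-ins; an allocator from the earlier sections would replace `equal_split`.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    total_cents: int
    num_people: int
    result: asyncio.Future  # resolved by the worker that handles the job


def equal_split(total_cents: int, num_people: int) -> List[int]:
    """Placeholder allocator: equal shares, remainder spread over the first few."""
    if num_people <= 0 or total_cents < num_people:
        raise ValueError("invalid request")
    base, extra = divmod(total_cents, num_people)
    return [base + (1 if i < extra else 0) for i in range(num_people)]


async def worker(queue: asyncio.Queue) -> None:
    """Process jobs forever; cancelled by the owner on shutdown."""
    while True:
        job = await queue.get()
        try:
            shares = equal_split(job.total_cents, job.num_people)
            if not job.result.done():
                job.result.set_result(shares)
        except Exception as exc:
            # Report the failure to the caller instead of dropping the request
            if not job.result.done():
                job.result.set_exception(exc)
        finally:
            queue.task_done()


async def submit(queue: asyncio.Queue, total_cents: int, num_people: int,
                 timeout: float = 5.0) -> List[int]:
    """Enqueue one job and wait for its result."""
    loop = asyncio.get_running_loop()
    job = Job(total_cents, num_people, loop.create_future())
    await queue.put(job)
    return await asyncio.wait_for(job.result, timeout=timeout)


async def main() -> List[List[int]]:
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(queue)) for _ in range(4)]
    try:
        return await asyncio.gather(
            submit(queue, 10000, 3),
            submit(queue, 500, 5),
        )
    finally:
        # Stop the workers and wait until they have exited
        for task in workers:
            task.cancel()
        await asyncio.gather(*workers, return_exceptions=True)


results = asyncio.run(main())
print(results)
```

Passing the future along with the job avoids polling a shared response store; in a multi-process deployment that role would fall to Redis or another broker.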
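import asyncio
# The standalone aioredis package is no longer maintained; its API ships with redis-py (>= 4.2)
from redis import asyncio as aioredis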
from typing import List, Dict, Optional
from dataclasses import dataclass
import uuid
from enum import Enum
class AllocationStatus(Enum):
PENDING = "pending"
PROCESSING = "processing"
COMPLETED = "completed"
FAILED = "failed"
@dataclass
class AllocationRequest:
"""分配请求"""
request_id: str
total_amount: float
num_people: int
user_id: str
preferences: Dict
timestamp: float
priority: int = 1
@dataclass
class AllocationResponse:
"""分配响应"""
request_id: str
status: AllocationStatus
allocations: Optional[List[float]]
execution_time: float
error_message: Optional[str] = None
class DistributedAllocator:
"""
分布式红包分配系统
支持高并发、分布式处理、容错恢复
"""
def __init__(self, redis_url: str = "redis://localhost:6379"):
self.redis_url = redis_url
self.redis_pool = None
self.allocation_workers = {}
self.request_queue = asyncio.Queue()
self.response_cache = {}
# 系统配置
self.max_concurrent_requests = 100
self.request_timeout = 30.0
self.cache_ttl = 300 # 5分钟
async def initialize(self):
"""初始化系统"""
self.redis_pool = aioredis.ConnectionPool.from_url(self.redis_url)
self.redis = aioredis.Redis(connection_pool=self.redis_pool)
# 启动工作进程
for i in range(4):
worker_id = f"worker_{i}"
self.allocation_workers[worker_id] = asyncio.create_task(
self._worker_process(worker_id)
)
print(f"分布式分配系统已启动,包含 {len(self.allocation_workers)} 个工作进程")
async def process_allocation_request(self, request: AllocationRequest) -> AllocationResponse:
"""
处理分配请求
Args:
request: 分配请求
Returns:
AllocationResponse: 分配响应
"""
start_time = asyncio.get_event_loop().time()
try:
# 检查缓存
cached_response = await self._get_cached_response(request)
if cached_response:
return cached_response
# 提交到队列
await self.request_queue.put(request)
# 等待响应
response = await asyncio.wait_for(
self._wait_for_response(request.request_id),
timeout=self.request_timeout
)
execution_time = asyncio.get_event_loop().time() - start_time
response.execution_time = execution_time
# 缓存结果
await self._cache_response(request, response)
return response
except asyncio.TimeoutError:
execution_time = asyncio.get_event_loop().time() - start_time
return AllocationResponse(
request_id=request.request_id,
status=AllocationStatus.FAILED,
allocations=None,
execution_time=execution_time,
error_message="请求超时"
)
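    async def _worker_process(self, worker_id: str):
        """Main loop of a worker task."""
        while True:
            # Take the next request from the queue
            request = await self.request_queue.get()
            try:
                # Record that processing has started
                await self.redis.hset(
                    f"request:{request.request_id}",
                    "status",
                    AllocationStatus.PROCESSING.value
                )
                # Run the allocation
                allocations = await self._execute_allocation(request)
                # Build and store the response
                response = AllocationResponse(
                    request_id=request.request_id,
                    status=AllocationStatus.COMPLETED,
                    allocations=allocations,
                    execution_time=0.0
                )
                await self._store_response(response)
            except Exception as e:
                print(f"Worker {worker_id} error: {e}")
                # Publish a FAILED response so the caller does not wait for the timeout
                try:
                    await self._store_response(AllocationResponse(
                        request_id=request.request_id,
                        status=AllocationStatus.FAILED,
                        allocations=None,
                        execution_time=0.0,
                        error_message=str(e)
                    ))
                except Exception as store_error:
                    print(f"Worker {worker_id} could not store failure: {store_error}")
                await asyncio.sleep(1)  # brief pause to avoid a tight error loop
            finally:
                # Mark the item done on success and on failure alike
                self.request_queue.task_done()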
    async def _execute_allocation(self, request: AllocationRequest) -> List[float]:
        """
        Run the allocation logic.
        Args:
            request: the allocation request
        Returns:
            List[float]: the allocated amounts
        """
        # Choose an algorithm from the request preferences
        algorithm = self._select_algorithm(request.preferences)
        if algorithm == "double_mean":
            allocator = DoubleMeanAllocator(min_amount=0.01)
            result = allocator.allocate(request.total_amount, request.num_people, track_history=False)
            return result.allocations
        elif algorithm == "ml_enhanced":
            allocator = MLEnhancedAllocator(min_amount=0.01)
            allocations, _ = allocator.ml_guided_allocation(request.total_amount, request.num_people)
            return allocations
        else:
            # "simple_random" and any unrecognized name fall back to simple random allocation
            allocator = SimpleRandomDistribution(min_amount=0.01)
            return allocator.allocate(request.total_amount, request.num_people)
    def _select_algorithm(self, preferences: Dict) -> str:
        """Choose an algorithm name from the request preferences."""
        # An explicit choice always wins
        if 'algorithm' in preferences:
            return preferences['algorithm']
        # Otherwise pick based on the preference flags
        if preferences.get('require_fairness', False):
            return "double_mean"
        elif preferences.get('enable_ml', False):
            return "ml_enhanced"
        else:
            return "simple_random"
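    async def _get_cached_response(self, request: AllocationRequest) -> Optional[AllocationResponse]:
        """Return the cached response for this request, or None on a miss."""
        import ast  # literal_eval parses literals only and never executes code

        cache_key = self._generate_cache_key(request)
        cached_data = await self.redis.get(cache_key)
        if cached_data:
            try:
                # The cache holds str(dict), which literal_eval reads back safely
                data = ast.literal_eval(cached_data.decode())
                # Restore the enum that was stored as its .value
                data['status'] = AllocationStatus(data['status'])
                return AllocationResponse(**data)
            except (ValueError, SyntaxError, KeyError, TypeError):
                pass  # treat a corrupt cache entry as a miss
        return None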
    async def _cache_response(self, request: AllocationRequest, response: AllocationResponse):
        """Cache a response for cache_ttl seconds."""
        cache_key = self._generate_cache_key(request)
        cache_data = {
            'request_id': response.request_id,
            'status': response.status.value,  # store the enum as its plain value
            'allocations': response.allocations,
            'execution_time': response.execution_time,
            'error_message': response.error_message
        }
        await self.redis.setex(cache_key, self.cache_ttl, str(cache_data))
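    def _generate_cache_key(self, request: AllocationRequest) -> str:
        """Build the cache key from the request parameters."""
        import hashlib  # built-in hash() of a str is salted per process by default

        # A SHA-256 digest is stable across processes, so every instance
        # sharing the same Redis computes the same key
        prefs_repr = str(sorted(request.preferences.items()))
        prefs_digest = hashlib.sha256(prefs_repr.encode()).hexdigest()[:16]
        key_components = [
            f"amount:{request.total_amount:.2f}",
            f"people:{request.num_people}",
            f"user:{request.user_id}",
            f"prefs:{prefs_digest}"
        ]
        return f"allocation:{':'.join(key_components)}"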
    async def _store_response(self, response: AllocationResponse):
        """Store a worker's response in Redis."""
        response_key = f"response:{response.request_id}"
        response_data = {
            'request_id': response.request_id,
            'status': response.status.value,
            'allocations': response.allocations,
            'execution_time': response.execution_time,
            'error_message': response.error_message
        }
        # setex writes the value and its expiry in one command
        await self.redis.setex(response_key, self.cache_ttl, str(response_data))
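    async def _wait_for_response(self, request_id: str) -> AllocationResponse:
        """Poll Redis until a terminal response appears (the caller applies the timeout)."""
        import ast  # literal_eval parses literals only and never executes code

        while True:
            response_data = await self.redis.get(f"response:{request_id}")
            if response_data:
                try:
                    data = ast.literal_eval(response_data.decode())
                    status = AllocationStatus(data['status'])
                    if status in (AllocationStatus.COMPLETED, AllocationStatus.FAILED):
                        data['status'] = status  # restore the enum
                        return AllocationResponse(**data)
                except (ValueError, SyntaxError, KeyError, TypeError):
                    pass  # ignore a malformed entry and keep polling
            await asyncio.sleep(0.1)  # short wait between polls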
    async def get_system_statistics(self) -> Dict:
        """Collect system statistics."""
        stats = await self.redis.info()
        return {
            'redis_connected_clients': stats.get('connected_clients', 0),
            'redis_used_memory': stats.get('used_memory', 0),
            'redis_keyspace_hits': stats.get('keyspace_hits', 0),
            'redis_keyspace_misses': stats.get('keyspace_misses', 0),
            'queue_size': self.request_queue.qsize(),
            'active_workers': len(self.allocation_workers)
        }
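    async def shutdown(self):
        """Shut the system down."""
        # Cancel the worker tasks and wait for them to finish
        tasks = list(self.allocation_workers.values())
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        # Close the Redis connection; getattr covers the case
        # where initialize() was never called
        redis_client = getattr(self, 'redis', None)
        if redis_client:
            await redis_client.close()
        if self.redis_pool:
            await self.redis_pool.disconnect()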
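# Load balancer
class LoadBalancer:
    """
    Load balancer.
    Routes each request to the allocator instance with the fewest in-flight requests.
    """

    def __init__(self, allocators: List[DistributedAllocator]):
        self.allocators = allocators
        self.current_index = 0
        # In-flight request count per allocator index
        self.request_counts = {i: 0 for i in range(len(allocators))}

    async def route_request(self, request: AllocationRequest) -> AllocationResponse:
        """Route a request to the least-loaded allocator."""
        index = self._select_least_loaded_index()
        self.request_counts[index] += 1
        try:
            return await self.allocators[index].process_allocation_request(request)
        finally:
            # Release the slot whether the request succeeded or failed
            self.request_counts[index] -= 1

    def _select_least_loaded_index(self) -> int:
        """Pick the index with the fewest in-flight requests; ties go round-robin."""
        n = len(self.allocators)
        # Scan from current_index so equally loaded allocators take turns
        order = [(self.current_index + k) % n for k in range(n)]
        index = min(order, key=lambda i: self.request_counts[i])
        self.current_index = (index + 1) % n
        return index

    def _select_least_loaded_allocator(self) -> DistributedAllocator:
        """Return the least-loaded allocator."""
        return self.allocators[self._select_least_loaded_index()]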
# Performance monitoring
class PerformanceMonitor:
    """Performance monitoring system."""

    def __init__(self):
        self.metrics = {
            'total_requests': 0,
            'successful_requests': 0,
            'failed_requests': 0,
            'average_response_time': 0,
            'requests_per_second': 0,
            'error_rate': 0
        }
        self.response_times = []
        self.request_timestamps = []

    async def record_request(self, response: AllocationResponse):
        """Record the metrics for one request."""
        self.metrics['total_requests'] += 1
        # Timestamp every request, failed ones included, so the rate reflects all traffic
        self.request_timestamps.append(asyncio.get_event_loop().time())
        if response.status == AllocationStatus.COMPLETED:
            self.metrics['successful_requests'] += 1
            self.response_times.append(response.execution_time)
        else:
            self.metrics['failed_requests'] += 1
        # Refresh the derived metrics
        self._update_metrics()
    def _update_metrics(self):
        """Refresh the derived metrics."""
        total = self.metrics['total_requests']
        if total > 0:
            self.metrics['error_rate'] = self.metrics['failed_requests'] / total
        if self.response_times:
            self.metrics['average_response_time'] = sum(self.response_times) / len(self.response_times)
        # Requests per second, averaged over the last 60 seconds
        current_time = asyncio.get_event_loop().time()
        recent_requests = [t for t in self.request_timestamps if t > current_time - 60]
        self.metrics['requests_per_second'] = len(recent_requests) / 60
    def get_performance_report(self) -> Dict:
        """Build the performance report."""
        return {
            'total_requests': self.metrics['total_requests'],
            'success_rate': self.metrics['successful_requests'] / max(1, self.metrics['total_requests']),
            'error_rate': self.metrics['error_rate'],
            'average_response_time_ms': self.metrics['average_response_time'] * 1000,
            'requests_per_second': self.metrics['requests_per_second'],
            'p95_response_time_ms': self._calculate_percentile(95) * 1000,
            'p99_response_time_ms': self._calculate_percentile(99) * 1000
        }

    def _calculate_percentile(self, percentile: float) -> float:
        """Return the response-time percentile using the nearest-rank method."""
        if not self.response_times:
            return 0
        sorted_times = sorted(self.response_times)
        n = len(sorted_times)
        # Nearest rank is ceil(percentile / 100 * n); -(-a // b) is a ceiling division
        rank = int(-(-(n * percentile) // 100))
        index = min(max(rank, 1), n) - 1
        return sorted_times[index]
# Application test
async def test_distributed_allocation():
    """Exercise the distributed allocation system with concurrent requests."""
    print("=== Distributed red-packet allocation test ===")
    # Create the distributed allocator
    allocator = DistributedAllocator()
    await allocator.initialize()
    # Create the performance monitor
    monitor = PerformanceMonitor()
    # Generate the test requests
    test_requests = []
    for i in range(20):
        request = AllocationRequest(
            request_id=f"test_{i}",
            total_amount=100.0 + i * 10,
            num_people=5 + (i % 5),
            user_id=f"user_{i % 10}",
            preferences={'algorithm': 'simple_random'},
            timestamp=asyncio.get_event_loop().time(),
            priority=1
        )
        test_requests.append(request)
    print("Processing concurrent requests...")
    # Process the requests concurrently
    tasks = [
        allocator.process_allocation_request(request)
        for request in test_requests
    ]
    responses = await asyncio.gather(*tasks, return_exceptions=True)
    # Record the performance metrics
    for response in responses:
        if isinstance(response, Exception):
            print(f"Request raised an exception: {response!r}")
            continue
        await monitor.record_request(response)
        print(f"Request {response.request_id}: {response.status.value} in {response.execution_time:.4f}s")
    # Performance report
    print("\nPerformance report:")
    report = monitor.get_performance_report()
    for key, value in report.items():
        if isinstance(value, float):
            print(f"  {key}: {value:.4f}")
        else:
            print(f"  {key}: {value}")
    # System statistics
    print("\nSystem statistics:")
    stats = await allocator.get_system_statistics()
    for key, value in stats.items():
        print(f"  {key}: {value}")
    # Shut the system down
    await allocator.shutdown()


if __name__ == "__main__":
    asyncio.run(test_distributed_allocation())
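The listing above writes responses to Redis with `str(dict)` and has to parse that text back. A JSON round trip is one common alternative that avoids evaluating stored text at all. The sketch below is a minimal illustration under assumptions: `AllocationStatus` and `AllocationResponse` are hypothetical stand-ins that only mirror the field names used in the listing, and `encode_response` / `decode_response` are illustrative helper names, not part of the original code.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import List, Optional


class AllocationStatus(Enum):
    # Hypothetical stand-in for the article's status enum
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class AllocationResponse:
    # Field names follow the listing; the definition itself is assumed
    request_id: str
    status: AllocationStatus
    allocations: Optional[List[float]]
    execution_time: float
    error_message: Optional[str] = None


def encode_response(response: AllocationResponse) -> str:
    """Serialize a response to JSON, storing the enum as its plain value."""
    data = asdict(response)
    data["status"] = response.status.value
    return json.dumps(data)


def decode_response(raw: bytes) -> AllocationResponse:
    """Parse JSON read back from the store and restore the enum."""
    data = json.loads(raw)
    data["status"] = AllocationStatus(data["status"])
    return AllocationResponse(**data)


# Usage: what a writer stores and a reader gets back
stored = encode_response(
    AllocationResponse("req-1", AllocationStatus.COMPLETED, [1.5, 2.5], 0.01)
).encode()
restored = decode_response(stored)
```

Restoring the enum on read matters because callers compare `response.status` against enum members and read `response.status.value`.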
8.2 Security and Protection Mechanisms
8.2.1 Security Audit System
import hashlib
import hmac
import json
import uuid  # used by _log_security_event to generate event IDs
from typing import Dict, List, Optional, Set
from dataclasses import dataclass
from datetime import datetime, timedelta
import jwt
from cryptography.fernet import Fernet
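@dataclass
class SecurityEvent:
    """A recorded security event."""
    event_id: str
    event_type: str
    user_id: str
    request_id: str
    timestamp: float
    severity: str  # LOW, MEDIUM, HIGH, CRITICAL
    details: Dict
    ip_address: str
    user_agent: str


@dataclass
class SecurityConfig:
    """Security configuration."""
    jwt_secret: str
    encryption_key: bytes
    max_requests_per_minute: int = 60
    max_requests_per_hour: int = 1000
    suspicious_keywords: Optional[Set[str]] = None
    blocked_ips: Optional[Set[str]] = None

    def __post_init__(self):
        # Replace None with an empty set so membership tests and len() work
        if self.suspicious_keywords is None:
            self.suspicious_keywords = set()
        if self.blocked_ips is None:
            self.blocked_ips = set()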
class SecurityAuditSystem:
    """
    Security audit system.
    Provides request auditing, security checks, and threat detection.
    """

    def __init__(self, config: SecurityConfig):
        self.config = config
        self.security_events = []
        self.user_request_history = {}
        self.rate_limit_data = {}
        # Initialize encryption
        self.cipher_suite = Fernet(self.config.encryption_key)
        # Threat patterns, matched as case-insensitive substrings
        self.threat_patterns = {
            'sql_injection': [
                'union', 'select', 'drop', 'insert', 'update', 'delete',
                'exec', 'execute', 'script', 'eval', 'function'
            ],
            'buffer_overflow': [
                'a' * 1000, 'x' * 500, '{' * 200
            ],
            'path_traversal': [
                '../../../', '..\\..\\', '%2e%2e%2f', '..%2f'
            ]
        }
        # Anomaly detection thresholds
        self.anomaly_thresholds = {
            'rapid_requests': 10,  # 10 requests within 10 seconds
            'unusual_amount': 10000,  # unusually large total amount
            'unusual_people_count': 1000,  # unusually many recipients
            'geometric_progression': 0.9  # minimum common ratio that gets flagged
        }
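    def create_secure_token(self, user_id: str, request_data: Dict) -> str:
        """
        Create a security token.
        Args:
            user_id: user ID
            request_data: request data
        Returns:
            str: the JWT
        """
        # datetime.now().timestamp() is the true epoch time; .timestamp() on the
        # naive datetime.utcnow() would be shifted by the local UTC offset
        issued_at = datetime.now().timestamp()
        payload = {
            'user_id': user_id,
            'timestamp': issued_at,
            'request_hash': self._create_request_hash(request_data),
            # Numeric 'exp' claim: epoch seconds, 24 hours from now
            'exp': int(issued_at + timedelta(hours=24).total_seconds())
        }
        token = jwt.encode(payload, self.config.jwt_secret, algorithm='HS256')
        return token

    def verify_secure_token(self, token: str, expected_request_data: Dict) -> bool:
        """
        Verify a security token.
        Args:
            token: the JWT
            expected_request_data: the request data the token should cover
        Returns:
            bool: True if the token is valid for this request data
        """
        try:
            # jwt.decode checks the signature and the 'exp' claim; an expired token
            # raises ExpiredSignatureError, a subclass of InvalidTokenError
            payload = jwt.decode(token, self.config.jwt_secret, algorithms=['HS256'])
        except jwt.InvalidTokenError:
            return False
        # Check request-data integrity with a constant-time comparison
        actual_hash = self._create_request_hash(expected_request_data)
        return hmac.compare_digest(actual_hash, str(payload.get('request_hash', '')))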
    def _create_request_hash(self, request_data: Dict) -> str:
        """Hash the request data (SHA-256 over JSON with sorted keys)."""
        data_str = json.dumps(request_data, sort_keys=True)
        return hashlib.sha256(data_str.encode()).hexdigest()

    def encrypt_sensitive_data(self, data: str) -> str:
        """Encrypt sensitive data."""
        return self.cipher_suite.encrypt(data.encode()).decode()

    def decrypt_sensitive_data(self, encrypted_data: str) -> str:
        """Decrypt sensitive data."""
        return self.cipher_suite.decrypt(encrypted_data.encode()).decode()
    def check_rate_limit(self, user_id: str, ip_address: str) -> bool:
        """
        Check the rate limits.
        Args:
            user_id: user ID
            ip_address: IP address
        Returns:
            bool: True if the request is within the limits, False if a limit was exceeded
        """
        current_time = datetime.utcnow().timestamp()
        # Drop expired entries
        self._cleanup_expired_rate_data(current_time)
        # Check the per-user limit
        if not self._check_user_rate_limit(user_id, current_time):
            self._log_security_event(
                'RATE_LIMIT_EXCEEDED',
                user_id,
                'HIGH',
                {'limit_type': 'user', 'ip': ip_address}
            )
            return False
        # Check the per-IP limit
        if not self._check_ip_rate_limit(ip_address, current_time):
            self._log_security_event(
                'IP_RATE_LIMIT_EXCEEDED',
                user_id,
                'HIGH',
                {'ip': ip_address}
            )
            return False
        return True
    def _check_user_rate_limit(self, user_id: str, current_time: float) -> bool:
        """Check the per-user rate limit."""
        if user_id not in self.rate_limit_data:
            self.rate_limit_data[user_id] = {'requests': [], 'ip_requests': {}}
        user_data = self.rate_limit_data[user_id]
        # Per-minute limit
        recent_minute = [t for t in user_data['requests'] if current_time - t < 60]
        if len(recent_minute) >= self.config.max_requests_per_minute:
            return False
        # Per-hour limit
        recent_hour = [t for t in user_data['requests'] if current_time - t < 3600]
        if len(recent_hour) >= self.config.max_requests_per_hour:
            return False
        # Record this request
        user_data['requests'].append(current_time)
        return True
    def _check_ip_rate_limit(self, ip_address: str, current_time: float) -> bool:
        """Check the per-IP rate limit (stricter than the per-user limit)."""
        ip_key = f"ip_{ip_address}"
        if ip_key not in self.rate_limit_data:
            self.rate_limit_data[ip_key] = []
        recent_requests = [t for t in self.rate_limit_data[ip_key] if current_time - t < 60]
        # At most 30 requests per minute per IP
        if len(recent_requests) >= 30:
            return False
        # Append under the same key that is read above, so the history accumulates
        self.rate_limit_data[ip_key].append(current_time)
        return True
    def _cleanup_expired_rate_data(self, current_time: float):
        """Drop rate-limit timestamps older than one hour."""
        cutoff_time = current_time - 3600  # one hour ago
        for key in list(self.rate_limit_data.keys()):
            if isinstance(self.rate_limit_data[key], dict) and 'requests' in self.rate_limit_data[key]:
                # Per-user entry
                self.rate_limit_data[key]['requests'] = [
                    t for t in self.rate_limit_data[key]['requests'] if t > cutoff_time
                ]
            elif isinstance(self.rate_limit_data[key], list):
                # Per-IP entry
                self.rate_limit_data[key] = [
                    t for t in self.rate_limit_data[key] if t > cutoff_time
                ]
    def detect_threat_patterns(self, request_data: Dict) -> List[str]:
        """
        Detect threat patterns.
        Args:
            request_data: request data
        Returns:
            List[str]: detected threats, each formatted as "type:field_path"
        """
        threats = []

        # Walk every string field. Matching is by substring, so ordinary text
        # containing words such as "update" or "select" is also flagged.
        def check_recursive(obj, path=""):
            if isinstance(obj, dict):
                for key, value in obj.items():
                    check_recursive(value, f"{path}.{key}" if path else key)
            elif isinstance(obj, list):
                for i, item in enumerate(obj):
                    check_recursive(item, f"{path}[{i}]")
            elif isinstance(obj, str):
                for threat_type, patterns in self.threat_patterns.items():
                    for pattern in patterns:
                        if pattern.lower() in obj.lower():
                            threats.append(f"{threat_type}:{path}")

        check_recursive(request_data)
        return list(set(threats))  # deduplicate
    def detect_anomalies(self, request_data: Dict, user_id: str) -> List[str]:
        """
        Detect anomalous behavior.
        Args:
            request_data: request data
            user_id: user ID
        Returns:
            List[str]: detected anomaly types
        """
        anomalies = []
        # Unusually large total amount
        total_amount = request_data.get('total_amount', 0)
        if total_amount > self.anomaly_thresholds['unusual_amount']:
            anomalies.append(f"UNUSUAL_AMOUNT:{total_amount}")
        # Unusually many recipients
        num_people = request_data.get('num_people', 0)
        if num_people > self.anomaly_thresholds['unusual_people_count']:
            anomalies.append(f"UNUSUAL_PEOPLE_COUNT:{num_people}")
        # Suspicious allocation pattern
        allocations = request_data.get('allocations', [])
        if len(allocations) > 2:
            # Ratios between consecutive amounts
            ratios = []
            for i in range(1, len(allocations)):
                if allocations[i-1] != 0:
                    ratio = allocations[i] / allocations[i-1]
                    ratios.append(ratio)
            # Flag a near-constant ratio only when it exceeds the threshold (0.9);
            # a sequence that halves each step (ratio 0.5) is not flagged
            if ratios and all(abs(r - ratios[0]) < 0.1 for r in ratios):
                if ratios[0] > self.anomaly_thresholds['geometric_progression']:
                    anomalies.append("GEOMETRIC_PROGRESSION_DETECTED")
        # User behavior pattern
        if not self._check_user_behavior_pattern(user_id, request_data):
            anomalies.append("ANOMALOUS_USER_BEHAVIOR")
        return anomalies
    def _check_user_behavior_pattern(self, user_id: str, request_data: Dict) -> bool:
        """Return False if the user's recent request pattern looks anomalous."""
        current_time = datetime.utcnow().timestamp()
        if user_id not in self.user_request_history:
            self.user_request_history[user_id] = []
        user_history = self.user_request_history[user_id]
        # Rapid consecutive requests within the last 10 seconds
        recent_requests = [req for req in user_history if current_time - req['timestamp'] < 10]
        if len(recent_requests) >= self.anomaly_thresholds['rapid_requests']:
            return False
        # Add the current request to the history
        user_history.append({
            'timestamp': current_time,
            'total_amount': request_data.get('total_amount', 0),
            'num_people': request_data.get('num_people', 0)
        })
        # Bound the history: past 100 entries, keep the latest 50
        if len(user_history) > 100:
            user_history[:] = user_history[-50:]
        return True
    def _log_security_event(self, event_type: str, user_id: str,
                            severity: str, details: Dict):
        """Record a security event."""
        event = SecurityEvent(
            event_id=str(uuid.uuid4()),
            event_type=event_type,
            user_id=user_id,
            request_id=details.get('request_id', ''),
            timestamp=datetime.utcnow().timestamp(),
            severity=severity,
            details=details,
            ip_address=details.get('ip', ''),
            user_agent=details.get('user_agent', '')
        )
        self.security_events.append(event)
        # Bound the event log: past 10000 entries, keep the latest 5000
        if len(self.security_events) > 10000:
            self.security_events[:] = self.security_events[-5000:]
    def comprehensive_security_check(self, request_data: Dict,
                                     user_id: str, ip_address: str,
                                     user_agent: str, token: Optional[str] = None) -> Dict:
        """
        Run the combined security check.
        Args:
            request_data: request data
            user_id: user ID
            ip_address: IP address
            user_agent: user agent
            token: security token
        Returns:
            Dict: the security check result
        """
        security_result = {
            'is_secure': True,
            'threats': [],
            'anomalies': [],
            'rate_limited': False,
            'token_valid': False,
            'risk_score': 0,
            'risk_level': 'LOW',  # always present, including on the early return below
            'recommendations': []
        }
        # Check the token
        if token:
            security_result['token_valid'] = self.verify_secure_token(token, request_data)
            if not security_result['token_valid']:
                security_result['is_secure'] = False
                security_result['threats'].append('INVALID_TOKEN')
        # Check the rate limits
        if not self.check_rate_limit(user_id, ip_address):
            security_result['rate_limited'] = True
            security_result['is_secure'] = False
            security_result['recommendations'].append("Enforce stricter rate limiting")
        # Check the IP blocklist (blocked_ips may be None when not configured)
        if ip_address in (self.config.blocked_ips or set()):
            security_result['is_secure'] = False
            security_result['threats'].append('BLOCKED_IP')
            security_result['risk_level'] = 'HIGH'
            return security_result
        # Detect threat patterns
        threats = self.detect_threat_patterns(request_data)
        security_result['threats'].extend(threats)
        if threats:
            security_result['is_secure'] = False
        # Detect anomalous behavior
        anomalies = self.detect_anomalies(request_data, user_id)
        security_result['anomalies'].extend(anomalies)
        if anomalies:
            security_result['risk_score'] += len(anomalies) * 10
        # Compute the risk score
        if security_result['threats']:
            security_result['risk_score'] += len(security_result['threats']) * 20
        if security_result['rate_limited']:
            security_result['risk_score'] += 15
        if not security_result['token_valid']:
            # A missing token is scored the same as an invalid one
            security_result['risk_score'] += 30
        # Map the score to a risk level
        if security_result['risk_score'] >= 70:
            security_result['risk_level'] = 'HIGH'
            security_result['is_secure'] = False
        elif security_result['risk_score'] >= 40:
            security_result['risk_level'] = 'MEDIUM'
        else:
            security_result['risk_level'] = 'LOW'
        # Generate recommendations
        if security_result['threats']:
            security_result['recommendations'].append("Review and sanitize the request data")
        if security_result['anomalies']:
            security_result['recommendations'].append("Monitor this user's behavior pattern")
        if security_result['rate_limited']:
            security_result['recommendations'].append("Consider raising the limit threshold for legitimate users")
        return security_result
    def get_security_dashboard(self) -> Dict:
        """Collect the data for the security dashboard."""
        current_time = datetime.utcnow().timestamp()
        recent_events = [e for e in self.security_events if current_time - e.timestamp < 3600]
        # Count by event type
        event_counts = {}
        for event in recent_events:
            event_counts[event.event_type] = event_counts.get(event.event_type, 0) + 1
        # Count by severity
        severity_counts = {}
        for event in recent_events:
            severity_counts[event.severity] = severity_counts.get(event.severity, 0) + 1
        return {
            'total_events_last_hour': len(recent_events),
            'event_types': event_counts,
            'severity_breakdown': severity_counts,
            # Counts every user with tracked rate-limit data, not only users currently limited
            'active_rate_limited_users': len([user for user in self.rate_limit_data.keys()
                                              if not user.startswith('ip_')]),
            # blocked_ips may be None when not configured
            'blocked_ips_count': len(self.config.blocked_ips or ()),
            'security_recommendations': self._generate_security_recommendations()
        }
    def _generate_security_recommendations(self) -> List[str]:
        """Generate security recommendations from the last hour of events."""
        recommendations = []
        current_time = datetime.utcnow().timestamp()
        recent_events = [e for e in self.security_events if current_time - e.timestamp < 3600]
        rate_limit_events = len([e for e in recent_events if e.event_type == 'RATE_LIMIT_EXCEEDED'])
        if rate_limit_events > 100:
            recommendations.append("Consider a finer-grained rate limiting strategy")
        # Only rate-limit events are logged in this class, so this branch stays
        # inactive unless detected threats are also recorded as events
        threat_events = len([e for e in recent_events if 'sql_injection' in e.event_type.lower()])
        if threat_events > 10:
            recommendations.append("Strengthen input validation and data sanitization")
        return recommendations
# Security system test
def test_security_system():
    """Exercise the security audit system."""
    print("=== Security audit system test ===")
    # Generate a random encryption key
    encryption_key = Fernet.generate_key()
    config = SecurityConfig(
        jwt_secret="your-secret-key-here",  # placeholder; load a real secret from configuration
        encryption_key=encryption_key,
        max_requests_per_minute=60,
        max_requests_per_hour=1000
    )
    security_system = SecurityAuditSystem(config)
    # Token creation and verification
    print("\n--- Token security test ---")
    request_data = {
        'total_amount': 100.0,
        'num_people': 5,
        'user_id': 'test_user'
    }
    token = security_system.create_secure_token('test_user', request_data)
    print(f"Generated token: {token[:50]}...")
    is_valid = security_system.verify_secure_token(token, request_data)
    print(f"Token verification result: {is_valid}")
    # Threat detection
    print("\n--- Threat detection test ---")
    malicious_data = {
        'total_amount': 100,
        'num_people': 5,
        'user_note': 'UNION SELECT * FROM users WHERE 1=1'
    }
    threats = security_system.detect_threat_patterns(malicious_data)
    print(f"Detected threats: {threats}")
    # Anomaly detection
    print("\n--- Anomaly detection test ---")
    anomalous_data = {
        'total_amount': 50000,  # unusually large amount
        'num_people': 5,
        # Geometric sequence with ratio 0.5, below the 0.9 threshold,
        # so only the amount anomaly is reported
        'allocations': [1000, 500, 250, 125, 62.5]
    }
    anomalies = security_system.detect_anomalies(anomalous_data, 'test_user')
    print(f"Detected anomalies: {anomalies}")
    # Combined security check. The token was issued for request_data, so
    # checking it against malicious_data also reports INVALID_TOKEN.
    print("\n--- Combined security check ---")
    security_result = security_system.comprehensive_security_check(
        request_data=malicious_data,
        user_id='test_user',
        ip_address='192.168.1.100',
        user_agent='Test Agent',
        token=token
    )
    print("Security check result:")
    for key, value in security_result.items():
        print(f"  {key}: {value}")
    # Security dashboard
    print("\n--- Security dashboard ---")
    dashboard = security_system.get_security_dashboard()
    print("Dashboard data:")
    for key, value in dashboard.items():
        print(f"  {key}: {value}")


if __name__ == "__main__":
    test_security_system()
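The rate-limit checks in the listing rebuild a filtered list of timestamps on every call. A per-key sliding window backed by a deque is a compact alternative. The sketch below is a minimal illustration, not the article's implementation: the class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are all assumptions introduced here, and the state is in-memory for a single process only.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record the request and return True if the key is within its limit."""
        now = self.clock()
        hits = self._hits[key]
        # Evict timestamps that have left the window (oldest are on the left)
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True


# Usage: separate limiters for users and IPs, mirroring the two checks above
user_limiter = SlidingWindowLimiter(max_requests=60, window_seconds=60.0)
ip_limiter = SlidingWindowLimiter(max_requests=30, window_seconds=60.0)
allowed = user_limiter.allow("test_user") and ip_limiter.allow("192.168.1.100")
```

Keeping users and IPs in separate limiter instances avoids the shared-dictionary key collisions that a single `rate_limit_data` map has to guard against with prefixes.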