4.1 Hands-On Practice: Data Visualization with pandas

Latest recommended article published 2025-02-28 00:15:00

批量小王子

Column: 03_pandas系列课程 | Tags: information visualization, pandas, data analysis

Article link: https://blog.csdn.net/m0_58149406/article/details/145170205

4.1 Data Visualization

Matplotlib Basics

Line chart:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
dates = pd.date_range('20230101', periods=100)
values = np.random.randn(100).cumsum()
df = pd.DataFrame({'date': dates, 'value': values})

# Draw the line chart
plt.figure(figsize=(10, 6))
plt.plot(df['date'], df['value'], marker='o', linestyle='-', color='b')
plt.title('Value Over Time', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
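plt.show()
"""
Example output:
A line chart with dates on the x-axis and values on the y-axis, showing the trend over time.
It includes grid lines, data-point markers, and rotated date labels.
"""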

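The line chart above calls Matplotlib directly. As a minimal sketch, the same figure can probably be produced through pandas' own `DataFrame.plot` wrapper, which draws with Matplotlib and returns the Axes. The random seed and generator used here are assumptions for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Sample data in the same shape as the example above (seed is an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=100),
    "value": rng.standard_normal(100).cumsum(),
})

# DataFrame.plot wraps Matplotlib and returns the Axes it drew on
ax = df.plot(x="date", y="value", figsize=(10, 6), marker="o",
             title="Value Over Time", grid=True, rot=45)
ax.set_xlabel("Date")
ax.set_ylabel("Value")
# In a script, call matplotlib.pyplot.show() afterwards to display the figure
```

Because the return value is a regular Matplotlib Axes, any further styling from the example above can still be applied to it.
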
Bar chart:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
categories = ['A', 'B', 'C', 'D', 'E']
values = np.random.randint(50, 200, size=len(categories))
df = pd.DataFrame({'category': categories, 'value': values})

# Pick one color per bar from the viridis colormap
colors = plt.cm.viridis(np.linspace(0, 1, len(categories)))

# Create the bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(df['category'], df['value'], color=colors, edgecolor='black')

# Add value labels above the bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.0f}',
             ha='center', va='bottom', fontsize=12)

# Set chart style
plt.title('Value by Category', fontsize=14, pad=20)
plt.xlabel('Category', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
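plt.show()
"""
Example output:
A bar chart with categories on the x-axis and values on the y-axis, comparing values across categories.
It includes:
- Bars colored along a gradient
- Value labels
- Grid lines
- Custom font sizes
- Black bar edges
"""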

Scatter plot:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
x = np.random.normal(0, 1, 100)
y = x * 2 + np.random.normal(0, 0.5, 100)
categories = np.random.choice(['A', 'B', 'C'], size=100)
df = pd.DataFrame({'x': x, 'y': y, 'category': categories})

# Set up the colormap (plt.cm.get_cmap was removed in recent Matplotlib releases)
cmap = plt.get_cmap('viridis', len(df['category'].unique()))

# Create the scatter plot
plt.figure(figsize=(10, 6))
for i, category in enumerate(df['category'].unique()):
    subset = df[df['category'] == category]
    plt.scatter(subset['x'], subset['y'], 
                color=cmap(i), 
                label=category,
                alpha=0.7,
                edgecolor='black',
                s=100)

# Add a regression line (degree-1 least-squares fit)
z = np.polyfit(df['x'], df['y'], 1)
p = np.poly1d(z)
plt.plot(df['x'], p(df['x']), "r--", linewidth=2)

# Set chart style
plt.title('Scatter Plot with Regression Line', fontsize=14)
plt.xlabel('X Value', fontsize=12)
plt.ylabel('Y Value', fontsize=12)
plt.legend(title='Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
"""
Example output:
A scatter plot that includes:
- Data points colored by category
- Black marker edges
- A regression line
- Grid lines
- A legend
- Custom font sizes
"""

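The regression line in the scatter plot comes from `np.polyfit` with degree 1. The sketch below shows what that call returns, using data generated the same way as the example (true slope 2, true intercept 0). The generator and the variable names `slope`, `intercept`, and `line` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 100)
y = 2 * x + rng.normal(0, 0.5, 100)  # true slope 2, true intercept 0

# A degree-1 fit returns the coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
line = np.poly1d([slope, intercept])  # callable: line(v) == slope * v + intercept

# Sorting x keeps the drawn path left-to-right, which matters for higher-degree fits
x_sorted = np.sort(x)
y_fit = line(x_sorted)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope should land close to 2 and the intercept close to 0, with the gap shrinking as the sample grows.
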
Seaborn Basics

Distribution plot:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
normal_data = np.random.normal(0, 1, 1000)
skew_data = np.random.chisquare(5, 1000)
df = pd.DataFrame({
    'Normal': normal_data,
    'Skewed': skew_data
})

# Create the distribution plots
plt.figure(figsize=(12, 6))

# Normal distribution
plt.subplot(1, 2, 1)
sns.histplot(df['Normal'], kde=True, color='blue', bins=30)
plt.title('Normal Distribution', fontsize=14)
plt.xlabel('Value', fontsize=12)
plt.ylabel('Count', fontsize=12)  # histplot shows counts by default
plt.grid(True, linestyle='--', alpha=0.5)

# Skewed distribution
plt.subplot(1, 2, 2)
sns.histplot(df['Skewed'], kde=True, color='green', bins=30)
plt.title('Skewed Distribution', fontsize=14)
plt.xlabel('Value', fontsize=12)
plt.ylabel('Count', fontsize=12)  # histplot shows counts by default
plt.grid(True, linestyle='--', alpha=0.5)

plt.tight_layout()
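plt.show()
"""
Example output:
Two distribution plots side by side, showing:
- A normal distribution compared with a skewed one
- Kernel density estimate (KDE) curves
- Grid lines
- Custom font sizes
- One subplot per distribution
"""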

Heatmap:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
data = np.random.rand(10, 10)
columns = [f'Feature_{i+1}' for i in range(10)]
df = pd.DataFrame(data, columns=columns)

# Create the heatmap
plt.figure(figsize=(12, 8))
ax = sns.heatmap(
    df.corr(),
    annot=True,
    cmap='coolwarm',
    vmin=-1,
    vmax=1,
    linewidths=0.5,
    linecolor='white',
    annot_kws={'size': 10, 'color': 'black'},
    cbar_kws={'shrink': 0.8}
)

# Set chart style
plt.title('Feature Correlation Heatmap', fontsize=16, pad=20)
plt.xticks(fontsize=10, rotation=45)
plt.yticks(fontsize=10)
plt.tight_layout()
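plt.show()
"""
Example output:
A heatmap that includes:
- Pairwise correlations among the 10 features
- Color encoding the sign and strength of each correlation (-1 to 1)
- White separator lines
- Annotated correlation values
- Custom font sizes
- A color bar
- Feature names rotated 45 degrees
"""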

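The heatmap is drawn from `df.corr()`, which computes pairwise Pearson correlations by default. A small sketch with hand-made columns (the column names `a`, `b`, `c` are assumptions for illustration) shows the two extremes of the scale. Independent random features like those in the example above should instead give off-diagonal values near 0.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # b = 2 * a: perfectly positively correlated with a
    "c": [5, 4, 3, 2, 1],   # c = 6 - a: perfectly negatively correlated with a
})

corr = df.corr()  # Pearson correlation is the default method
print(corr.round(2))
```

The diagonal is always 1 because every column is perfectly correlated with itself.
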
Box plot:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
categories = ['A', 'B', 'C', 'D']
data = {
    'category': np.repeat(categories, 100),
    'value': np.concatenate([
        np.random.normal(10, 2, 100),  # A
        np.random.normal(15, 3, 100),  # B
        np.random.normal(20, 4, 100),  # C
        np.random.normal(25, 5, 100)   # D
    ])
}
df = pd.DataFrame(data)

# Create the box plot
plt.figure(figsize=(10, 6))
ax = sns.boxplot(
    x='category',
    y='value',
    data=df,
    palette='Set2',
    showmeans=True,
    meanprops={
        'marker': 'D',
        'markerfacecolor': 'white',
        'markeredgecolor': 'black'
    }
)

# Overlay the raw data points
sns.stripplot(
    x='category',
    y='value',
    data=df,
    color='black',
    alpha=0.3,
    jitter=True,
    ax=ax
)

# Set chart style
plt.title('Value Distribution by Category', fontsize=14)
plt.xlabel('Category', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
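plt.show()
"""
Example output:
A box plot that includes:
- Value distributions for 4 categories
- Boxes in different colors
- Means shown as diamond markers
- Raw data points (black dots)
- Grid lines
- Custom font sizes
- Jittered data points
"""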

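To read a box plot it helps to know the numbers behind it. The sketch below computes the quartiles, the interquartile range, and the 1.5 x IQR fences that the default whisker rule uses, on a tiny hand-made sample (the data is an assumption chosen so that one point is an outlier).

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# The box spans Q1..Q3 and the line inside the box is the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# showmeans=True adds the mean, which outliers can pull far from the median
mean = values.mean()
print(q1, median, q3, iqr, outliers, mean)
```

Here the single value 100 drags the mean to 14.5 while the median stays at 5.5, which is why showing both markers is informative.
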
Advanced Visualization Techniques

Visualizing large datasets:

# Use Datashader to handle millions of data points
import datashader as ds
import datashader.transfer_functions as tf
from datashader import reductions
import pandas as pd
import numpy as np

# Generate one million data points
num_points = 1_000_000
df = pd.DataFrame({
    'x': np.random.normal(0, 1, num_points),
    'y': np.random.normal(0, 1, num_points),
    'value': np.random.normal(0, 1, num_points)
})

# Create the canvas
canvas = ds.Canvas(plot_width=800, plot_height=600)

# Aggregate the data: mean of 'value' per pixel
agg = canvas.points(df, 'x', 'y', ds.mean('value'))

# Render the image
img = tf.shade(agg, cmap=['lightblue', 'darkblue'], how='log')
img = tf.set_background(img, 'black')  # set_background returns a new image
img.to_pil().show()
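
The key idea behind the Datashader example is to aggregate points onto a fixed pixel grid before drawing anything. As a rough stand-in for that idea (not Datashader's actual implementation), the sketch below bins one million points with NumPy alone. The grid bounds and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_points = 1_000_000
x = rng.normal(0, 1, num_points)
y = rng.normal(0, 1, num_points)

# Bin the points onto a 600 x 800 grid (rows follow y, columns follow x)
bounds = [[-4, 4], [-4, 4]]
counts, y_edges, x_edges = np.histogram2d(y, x, bins=(600, 800), range=bounds)

# log1p compresses the dynamic range so sparse regions stay visible
shaded = np.log1p(counts)
print(counts.shape, int(counts.sum()))
```

The resulting array has a fixed size no matter how many points went in, so it could be passed to something like Matplotlib's `imshow` at constant cost.
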

Interactive dashboard:

# Build an interactive dashboard with Dash
import dash
from dash import dcc, html
import plotly.express as px
import pandas as pd
import numpy as np

# Create sample data
df = pd.DataFrame({
    'x': range(100),
    'y': np.random.normal(0, 1, 100),
    'category': np.random.choice(['A', 'B', 'C'], 100)
})

# Create the Dash app
app = dash.Dash(__name__)

app.layout = html.Div([
    dcc.Graph(id='scatter-plot'),
    dcc.Dropdown(
        id='category-dropdown',
        options=[{'label': i, 'value': i} for i in df['category'].unique()],
        value='A'
    )
])

@app.callback(
    dash.dependencies.Output('scatter-plot', 'figure'),
    [dash.dependencies.Input('category-dropdown', 'value')]
)
def update_graph(selected_category):
    filtered_df = df[df['category'] == selected_category]
    fig = px.scatter(filtered_df, x='x', y='y')
    return fig

if __name__ == '__main__':
    app.run(debug=True)  # app.run replaces the older app.run_server in recent Dash releases

Animated visualization:

# Create an animation with Plotly
import plotly.express as px
import pandas as pd
import numpy as np

# Create sample data
df = pd.DataFrame({
    'x': np.random.normal(0, 1, 1000),
    'y': np.random.normal(0, 1, 1000),
    'frame': np.repeat(np.arange(100), 10),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
})

fig = px.scatter(
    df, x='x', y='y', 
    animation_frame='frame',
    color='category',
    range_x=[-5, 5],
    range_y=[-5, 5]
)
fig.show()

3D visualization:

# Create a 3D scatter plot with Plotly
import plotly.express as px
import pandas as pd
import numpy as np

# Create sample data
df = pd.DataFrame({
    'x': np.random.normal(0, 1, 1000),
    'y': np.random.normal(0, 1, 1000),
    'z': np.random.normal(0, 1, 1000),
    'value': np.random.normal(0, 1, 1000)
})

fig = px.scatter_3d(
    df, x='x', y='y', z='z',
    color='value',
    size_max=10,
    opacity=0.7
)
fig.show()

Geographic data visualization:

# Create a geographic scatter plot with Plotly
import plotly.express as px
import pandas as pd
import numpy as np

# Create sample data
df = pd.DataFrame({
    'lat': np.random.uniform(22.5, 25.5, 1000),
    'lon': np.random.uniform(120.5, 122.5, 1000),
    'value': np.random.normal(0, 1, 1000)
})

fig = px.scatter_geo(
    df, lat='lat', lon='lon',
    color='value',
    size=df['value'].abs(),  # marker sizes must be non-negative
    projection='natural earth'
)
fig.show()

Network graph visualization:

# Create a network graph with NetworkX and Plotly
import networkx as nx
import plotly.graph_objects as go
import numpy as np

# Create a sample network
G = nx.random_geometric_graph(200, 0.125)

# Get node positions
pos = nx.get_node_attributes(G, 'pos')

# Build the edge traces
edge_trace = []
for edge in G.edges():
    x0, y0 = pos[edge[0]]
    x1, y1 = pos[edge[1]]
    edge_trace.append(go.Scatter(
        x=[x0, x1, None], y=[y0, y1, None],
        line=dict(width=0.5, color='#888'),
        hoverinfo='none',
        mode='lines'
    ))

# Build the node trace; color each node by its degree (number of connections)
node_x = [pos[node][0] for node in G.nodes()]
node_y = [pos[node][1] for node in G.nodes()]
node_degree = [G.degree(node) for node in G.nodes()]

node_trace = go.Scatter(
    x=node_x, y=node_y,
    text=[f'Node {node}: {deg} connections'
          for node, deg in zip(G.nodes(), node_degree)],
    mode='markers',
    hoverinfo='text',
    marker=dict(
        showscale=True,
        colorscale='YlGnBu',
        color=node_degree,
        size=10,
        colorbar=dict(
            thickness=15,
            title=dict(text='Node Connections', side='right'),
            xanchor='left'
        )
    )
)

# Create the figure
fig = go.Figure(data=edge_trace + [node_trace],
               layout=go.Layout(
                   showlegend=False,
                   hovermode='closest',
                   margin=dict(b=20,l=5,r=5,t=40),
                   xaxis=dict(showgrid=False, zeroline=False),
                   yaxis=dict(showgrid=False, zeroline=False)
               ))
fig.show()

Automated visualization workflow:

import plotly.express as px


class AutoVisualizer:
    def __init__(self, df):
        self.df = df
        
    def visualize(self, plot_type='scatter', **kwargs):
        if plot_type == 'scatter':
            return self._scatter_plot(**kwargs)
        elif plot_type == 'line':
            return self._line_plot(**kwargs)
        elif plot_type == 'bar':
            return self._bar_plot(**kwargs)
        else:
            raise ValueError(f"Unsupported plot type: {plot_type}")
            
    def _scatter_plot(self, x, y, color=None, size=None):
        fig = px.scatter(self.df, x=x, y=y, color=color, size=size)
        return fig
        
    def _line_plot(self, x, y, color=None):
        fig = px.line(self.df, x=x, y=y, color=color)
        return fig
        
    def _bar_plot(self, x, y, color=None):
        fig = px.bar(self.df, x=x, y=y, color=color)
        return fig

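Visualization Best Practices

Choosing the right chart type:
- Comparing values: column charts, bar charts
- Showing trends: line charts, area charts
- Showing distributions: histograms, box plots
- Showing relationships: scatter plots, bubble charts
- Showing composition: pie charts, stacked charts
- Showing geographic data: maps, heatmaps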
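Color guidelines:
- Use color to distinguish categories
- Avoid too many colors (usually no more than 7)
- Use sequential gradients for continuous values
- Ensure sufficient color contrast
- Consider colorblind-friendly palettes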
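Interaction design principles:
- Provide tooltips (hover info)
- Support zooming and panning
- Add filtering and sorting
- Offer data export options
- Keep the design responsive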
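Performance optimization tips:
- Sample or aggregate large datasets
- Use WebGL to accelerate rendering
- Enable lazy loading
- Use server-side rendering
- Optimize the data format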
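Accessibility design:
- Provide text descriptions
- Support keyboard navigation
- Ensure sufficient contrast
- Provide alternative text
- Support screen readers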
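
"Sufficient contrast" can be checked numerically. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio as I understand them. The helper names `relative_luminance` and `contrast_ratio` are hypothetical, and the 4.5:1 threshold mentioned in the comment is the commonly cited WCAG AA level for normal text, so treat it as a guideline rather than a rule for every chart element.

```python
def _linearize(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# A ratio of at least 4.5 is the usual AA target for normal-size text
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

A check like this could be run over a palette and its background color before the palette is used in a chart.
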

Case Studies

E-commerce sales trend:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=90)
base_sales = np.linspace(1000, 5000, 90)
noise = np.random.normal(0, 300, 90)
promotions = np.random.choice([0, 1], size=90, p=[0.8, 0.2]) * 1000
sales = base_sales + noise + promotions
df = pd.DataFrame({'date': dates, 'sales': sales})

# Create the line chart
plt.figure(figsize=(14, 7))
plt.plot(df['date'], df['sales'], 
         marker='o', 
         linestyle='-', 
         color='royalblue',
         linewidth=2,
         markersize=8,
         label='Daily Sales')

# Mark the actual promotion days (where the promotion uplift was added)
promo = df[promotions > 0]
plt.scatter(promo['date'], 
            promo['sales'],
            color='red',
            s=100,
            zorder=5,
            label='Promotion Days')

# Set chart style
plt.title('E-commerce Sales Trend with Promotions', fontsize=16, pad=20)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Sales (USD)', fontsize=14)
plt.xticks(rotation=45, fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(fontsize=12, loc='upper left')
plt.tight_layout()
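plt.show()
"""
Example output:
A line chart that includes:
- The 90-day sales trend
- Underlying sales growth
- Random fluctuations
- Promotion days marked with red dots
- Grid lines
- Custom font sizes
- Date labels rotated 45 degrees
- A legend
"""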

Social media engagement:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
platforms = ['Facebook', 'Instagram', 'Twitter', 'TikTok', 'LinkedIn']
engagement = {
    'Facebook': np.random.normal(5000, 500, 100),
    'Instagram': np.random.normal(7000, 700, 100),
    'Twitter': np.random.normal(3000, 300, 100),
    'TikTok': np.random.normal(9000, 900, 100),
    'LinkedIn': np.random.normal(2000, 200, 100)
}
df = pd.DataFrame(engagement).melt(var_name='platform', value_name='engagement')

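# Create the bar chart
plt.figure(figsize=(12, 7))
ax = sns.barplot(
    x='platform',
    y='engagement',
    data=df,
    palette='viridis',
    errorbar='sd',  # show the standard deviation (seaborn >= 0.12; replaces ci='sd')
    capsize=0.1  # width of the error-bar caps
)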

# Add value labels
for p in ax.patches:
    ax.annotate(
        f'{p.get_height():.0f}',
        (p.get_x() + p.get_width() / 2., p.get_height()),
        ha='center',
        va='center',
        xytext=(0, 10),
        textcoords='offset points',
        fontsize=12
    )

# Set chart style
plt.title('Social Media Engagement Analysis', fontsize=16, pad=20)
plt.xlabel('Platform', fontsize=14)
plt.ylabel('Engagement (per post)', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
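plt.show()
"""
Example output:
A bar chart that includes:
- Engagement compared across 5 social media platforms
- Error bars showing the spread of the data
- Value labels showing the mean
- Bars colored along a gradient
- Grid lines
- Custom font sizes
- Caps on the error bars
"""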

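Each bar in the engagement chart is a group mean, and the error bar reflects the spread within that group. A minimal sketch of the same summary computed directly with pandas follows. The platform names `A` and `B` and the values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "platform": ["A", "A", "A", "B", "B", "B"],
    "engagement": [10, 20, 30, 100, 110, 120],
})

# Mean gives the bar height; std (sample standard deviation, ddof=1) gives the spread
summary = df.groupby("platform")["engagement"].agg(["mean", "std"])
print(summary)
```

Printing a table like this next to the chart is a cheap way to confirm that the value labels match the underlying data.
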
Financial return distribution:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
days = 1000
returns = np.random.normal(0.001, 0.02, days)  # independent daily returns
df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=days),
    'return': returns
})

# Create the distribution figure
plt.figure(figsize=(14, 7))

# Main panel: return distribution
ax1 = plt.subplot2grid((2, 2), (0, 0), colspan=2)
sns.histplot(
    df['return'],
    kde=True,
    bins=50,
    color='royalblue',
    edgecolor='black',
    linewidth=1
)

# Add summary statistics
mean_return = df['return'].mean()
std_return = df['return'].std()
ax1.axvline(mean_return, color='red', linestyle='--', label=f'Mean: {mean_return:.4f}')
ax1.axvline(mean_return + std_return, color='green', linestyle=':', label=f'+1 Std: {mean_return + std_return:.4f}')
ax1.axvline(mean_return - std_return, color='green', linestyle=':', label=f'-1 Std: {mean_return - std_return:.4f}')

# Style the main panel
ax1.set_title('Daily Return Distribution with Key Statistics', fontsize=16, pad=20)
ax1.set_xlabel('Daily Return', fontsize=14)
ax1.set_ylabel('Count', fontsize=14)  # histplot shows counts by default
ax1.legend(fontsize=12)
ax1.grid(True, linestyle='--', alpha=0.5)

# Panel 1: Q-Q plot
ax2 = plt.subplot2grid((2, 2), (1, 0))
from scipy import stats
stats.probplot(df['return'], dist="norm", plot=ax2)
ax2.set_title('QQ Plot', fontsize=14)
ax2.grid(True, linestyle='--', alpha=0.5)

# Panel 2: cumulative return
ax3 = plt.subplot2grid((2, 2), (1, 1))
df['cum_return'] = (1 + df['return']).cumprod() - 1
ax3.plot(df['date'], df['cum_return'], color='purple', linewidth=2)
ax3.set_title('Cumulative Return', fontsize=14)
ax3.set_xlabel('Date', fontsize=12)
ax3.set_ylabel('Cumulative Return', fontsize=12)
ax3.grid(True, linestyle='--', alpha=0.5)

plt.tight_layout()
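plt.show()
"""
Example output:
A combined financial analysis figure that includes:
- Main panel: histogram of daily returns
  * Kernel density estimate (KDE) curve
  * Mean line (red dashed)
  * ±1 standard deviation lines (green dotted)
- Panel 1: Q-Q plot to check normality
- Panel 2: cumulative return curve
- Grid lines
- Custom font sizes
- A legend
"""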

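The cumulative return panel compounds daily returns with `(1 + r).cumprod() - 1`. A short worked example with three made-up daily returns shows how compounding differs from simply adding the returns up.

```python
import pandas as pd

daily = pd.Series([0.10, -0.10, 0.05])  # +10%, -10%, +5%

# Compounding: 1.10 * 0.90 * 1.05 - 1
compounded = (1 + daily).cumprod() - 1

# Naive summation ignores that each return applies to a changed base
summed = daily.cumsum()

print(compounded.round(4).tolist())
print(summed.round(4).tolist())
```

After a 10% gain and a 10% loss the compounded position is down 1%, whereas the plain sum would suggest it is flat.
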
Medical data correlation:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample medical data
np.random.seed(42)
num_patients = 1000
data = {
    'age': np.random.normal(45, 15, num_patients).astype(int),
    'blood_pressure': np.random.normal(120, 20, num_patients),
    'cholesterol': np.random.normal(200, 40, num_patients),
    'bmi': np.random.normal(25, 5, num_patients),
    'glucose': np.random.normal(100, 20, num_patients),
    'smoking': np.random.choice([0, 1], size=num_patients, p=[0.7, 0.3]),
    'diabetes': np.random.choice([0, 1], size=num_patients, p=[0.85, 0.15])
}
df = pd.DataFrame(data)

# Compute the correlation matrix
corr_matrix = df.corr()

# Create the heatmap
plt.figure(figsize=(12, 10))
ax = sns.heatmap(
    corr_matrix,
    annot=True,
    cmap='coolwarm',
    vmin=-1,
    vmax=1,
    linewidths=0.5,
    linecolor='white',
    annot_kws={
        'size': 12,
        'color': 'black',
        'weight': 'bold'
    },
    cbar_kws={
        'shrink': 0.8,
        'label': 'Correlation Coefficient'
    },
    mask=np.triu(np.ones_like(corr_matrix, dtype=bool))  # show only the lower triangle
)

# Set chart style
plt.title('Medical Data Feature Correlation Matrix', 
          fontsize=18, pad=20)
plt.xticks(
    rotation=45,
    fontsize=12,
    horizontalalignment='right'
)
plt.yticks(fontsize=12)
plt.tight_layout()
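plt.show()
"""
Example output:
A correlation heatmap of the medical data that includes:
- Correlations among 7 medical features
- Lower triangle only
- Cool and warm colors encoding correlation (-1 to 1)
- White separator lines
- Bold correlation values
- A labeled color bar
- Feature names rotated 45 degrees
- Custom font sizes
"""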

Education score distribution:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
grades = ['Grade 7', 'Grade 8', 'Grade 9', 'Grade 10']
num_students = 200
data = {
    'grade': np.repeat(grades, num_students),
    'score': np.concatenate([
        np.random.normal(70, 10, num_students),  # Grade 7
        np.random.normal(75, 8, num_students),   # Grade 8
        np.random.normal(80, 7, num_students),   # Grade 9
        np.random.normal(85, 6, num_students)    # Grade 10
    ])
}
df = pd.DataFrame(data)

# Create the box plot
plt.figure(figsize=(12, 8))
ax = sns.boxplot(
    x='grade',
    y='score',
    data=df,
    palette='Set3',
    showmeans=True,
    meanprops={
        'marker': 'D',
        'markerfacecolor': 'white',
        'markeredgecolor': 'black'
    },
    width=0.6,
    linewidth=1.5
)

# Overlay the raw data points
sns.stripplot(
    x='grade',
    y='score',
    data=df,
    color='black',
    alpha=0.3,
    jitter=True,
    ax=ax
)

# Annotate the mean and median of each grade
for i, grade in enumerate(grades):
    mean_score = df[df['grade'] == grade]['score'].mean()
    median_score = df[df['grade'] == grade]['score'].median()
    ax.text(i, mean_score + 2, f'Mean: {mean_score:.1f}', 
            ha='center', va='bottom', fontsize=12, color='blue')
    ax.text(i, median_score - 2, f'Median: {median_score:.1f}', 
            ha='center', va='top', fontsize=12, color='red')

# Set chart style
plt.title('Student Score Distribution by Grade', fontsize=18, pad=20)
plt.xlabel('Grade', fontsize=14)
plt.ylabel('Score', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
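plt.show()
"""
Example output:
A box plot of student scores that includes:
- Score distributions for 4 grades
- Boxes in different colors
- Means shown as diamond markers
- Raw data points (black dots)
- Mean and median annotations
- Grid lines
- Custom font sizes
- Jittered data points
"""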

Retail profit analysis:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=90)
base_profit = np.linspace(1000, 5000, 90)
noise = np.random.normal(0, 300, 90)
promotions = np.random.choice([0, 1], size=90, p=[0.8, 0.2]) * 1000
profit = base_profit + noise + promotions
df = pd.DataFrame({'date': dates, 'profit': profit})

# Create the line chart
plt.figure(figsize=(14, 7))
plt.plot(df['date'], df['profit'], 
         marker='o', 
         linestyle='-', 
         color='royalblue',
         linewidth=2,
         markersize=8,
         label='Daily Profit')

# Mark the actual promotion days (where the promotion uplift was added)
promo = df[promotions > 0]
plt.scatter(promo['date'], 
            promo['profit'],
            color='red',
            s=100,
            zorder=5,
            label='Promotion Days')

# Add a linear trend line
z = np.polyfit(range(len(df)), df['profit'], 1)
p = np.poly1d(z)
plt.plot(df['date'], p(range(len(df))), "r--", linewidth=2, label='Trend Line')

# Set chart style
plt.title('Retail Profit Analysis with Promotions', fontsize=16, pad=20)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Profit (USD)', fontsize=14)
plt.xticks(rotation=45, fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(fontsize=12, loc='upper left')
plt.tight_layout()
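plt.show()
"""
Example output:
A retail profit line chart that includes:
- The 90-day profit trend
- Underlying profit growth
- Random fluctuations
- Promotion days marked with red dots
- A trend line
- Grid lines
- Custom font sizes
- Date labels rotated 45 degrees
- A legend
"""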

Traffic flow patterns:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
hours = np.arange(24)
weekday_pattern = np.sin(2 * np.pi * hours / 24) * 1000 + 3000
weekend_pattern = np.sin(2 * np.pi * hours / 24) * 500 + 1500

# Generate one week of data
data = {
    'hour': np.tile(hours, 7),
    'count': np.concatenate([
        weekday_pattern + np.random.normal(0, 200, 24) * 5,  # Monday
        weekday_pattern + np.random.normal(0, 200, 24) * 5,  # Tuesday
        weekday_pattern + np.random.normal(0, 200, 24) * 5,  # Wednesday
        weekday_pattern + np.random.normal(0, 200, 24) * 5,  # Thursday
        weekday_pattern + np.random.normal(0, 200, 24) * 5,  # Friday
        weekend_pattern + np.random.normal(0, 100, 24) * 2,  # Saturday
        weekend_pattern + np.random.normal(0, 100, 24) * 2   # Sunday
    ]),
    'day_type': np.repeat(['Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekend', 'Weekend'], 24)
}
df = pd.DataFrame(data)

# Create the line chart
plt.figure(figsize=(14, 8))
ax = sns.lineplot(
    x='hour',
    y='count',
    hue='day_type',
    style='day_type',
    data=df,
    palette='Set2',
    linewidth=2.5,
    markers=True,
    dashes=False,
    markersize=10
)

# Annotate the three busiest hours
peak_hours = df.groupby('hour')['count'].mean().nlargest(3).index
for hour in peak_hours:
    plt.axvline(hour, color='red', linestyle='--', alpha=0.3)
    plt.text(hour, df['count'].max() * 0.9, 
             f'Peak Hour\n{hour}:00-{hour+1}:00', 
             ha='center', va='top', fontsize=12, color='red')

# Set chart style
plt.title('Hourly Traffic Pattern Analysis', fontsize=18, pad=20)
plt.xlabel('Hour of Day', fontsize=14)
plt.ylabel('Traffic Count', fontsize=14)
plt.xticks(np.arange(0, 24, 1), fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title='Day Type', fontsize=12, loc='upper right')
plt.tight_layout()
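plt.show()
"""
Example output:
A traffic pattern chart that includes:
- Hourly traffic for all 7 days of the week
- Weekdays and weekends distinguished by color and marker style
- Peak hours marked with red dashed lines
- Data-point markers
- Grid lines
- Custom font sizes
- A legend
"""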

Weather temperature changes:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=365)
base_temp = 15 + 10 * np.sin(2 * np.pi * np.arange(365) / 365)
noise = np.random.normal(0, 3, 365)
heat_waves = np.random.choice([0, 1], size=365, p=[0.95, 0.05]) * 10
temperature = base_temp + noise + heat_waves
df = pd.DataFrame({'date': dates, 'temperature': temperature})

# Create the line chart
plt.figure(figsize=(14, 7))
plt.plot(df['date'], df['temperature'], 
         color='crimson',
         linewidth=2,
         label='Daily Temperature')

# Mark the actual heat-wave days (where the heat-wave offset was added)
heat_wave_days = df[heat_waves > 0]
plt.scatter(heat_wave_days['date'], 
            heat_wave_days['temperature'],
            color='darkorange',
            s=100,
            zorder=5,
            label='Heat Wave Days')

# Add a 7-day moving average (the first 6 values are NaN)
df['7_day_avg'] = df['temperature'].rolling(window=7).mean()
plt.plot(df['date'], df['7_day_avg'], 
         color='navy',
         linestyle='--',
         linewidth=2,
         label='7-Day Moving Average')

# Set chart style
plt.title('Annual Temperature Trend with Heat Waves', fontsize=16, pad=20)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Temperature (°C)', fontsize=14)
plt.xticks(rotation=45, fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(fontsize=12, loc='upper left')
plt.tight_layout()
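plt.show()
"""
Example output:
A temperature trend chart that includes:
- Temperature over 365 days
- The underlying seasonal cycle
- Random fluctuations
- Heat-wave days marked with orange dots
- A 7-day moving average (navy dashed line)
- Grid lines
- Custom font sizes
- Date labels rotated 45 degrees
- A legend
"""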

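The moving average in the temperature chart uses `rolling(window=7).mean()`. The sketch below uses a window of 3 on a five-value series (an assumption to keep the numbers readable) to show how the window settings change where the result is defined.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: trailing window, NaN until a full window is available
strict = s.rolling(window=3).mean()

# min_periods=1 averages whatever is available at the start of the series
lenient = s.rolling(window=3, min_periods=1).mean()

# center=True aligns each average with the middle of its window
centered = s.rolling(window=3, center=True).mean()

print(strict.tolist())
print(lenient.tolist())
print(centered.tolist())
```

A trailing window lags behind turning points in the data, so a centered window is often the better choice when the smoothed line is only used for display.
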
Employee tenure distribution:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
departments = ['Sales', 'Engineering', 'Marketing', 'Finance', 'HR']
tenure_data = {
    'Sales': np.random.gamma(3, 1, 500),
    'Engineering': np.random.gamma(5, 1.2, 500),
    'Marketing': np.random.gamma(4, 0.8, 500),
    'Finance': np.random.gamma(6, 1.5, 500),
    'HR': np.random.gamma(7, 1.8, 500)
}
df = pd.DataFrame(tenure_data).melt(var_name='department', value_name='tenure')

# Create the distribution figure
plt.figure(figsize=(14, 8))

# Main panel: overall distribution
ax1 = plt.subplot2grid((2, 2), (0, 0), colspan=2)
sns.histplot(
    df['tenure'],
    kde=True,
    bins=30,
    color='royalblue',
    edgecolor='black',
    linewidth=1,
    stat='density'
)

# Add summary statistics
mean_tenure = df['tenure'].mean()
median_tenure = df['tenure'].median()
ax1.axvline(mean_tenure, color='red', linestyle='--', label=f'Mean: {mean_tenure:.1f} years')
ax1.axvline(median_tenure, color='green', linestyle=':', label=f'Median: {median_tenure:.1f} years')

# Style the main panel
ax1.set_title('Overall Employee Tenure Distribution', fontsize=16, pad=20)
ax1.set_xlabel('Tenure (years)', fontsize=14)
ax1.set_ylabel('Density', fontsize=14)
ax1.legend(fontsize=12)
ax1.grid(True, linestyle='--', alpha=0.5)

# Panel 1: distribution by department
ax2 = plt.subplot2grid((2, 2), (1, 0))
sns.boxplot(
    x='department',
    y='tenure',
    data=df,
    palette='Set2',
    showmeans=True,
    meanprops={
        'marker': 'D',
        'markerfacecolor': 'white',
        'markeredgecolor': 'black'
    }
)
ax2.set_title('Tenure by Department', fontsize=14)
ax2.set_xlabel('Department', fontsize=12)
ax2.set_ylabel('Tenure (years)', fontsize=12)
ax2.grid(axis='y', linestyle='--', alpha=0.5)

# Panel 2: cumulative distribution
ax3 = plt.subplot2grid((2, 2), (1, 1))
df_sorted = df.sort_values('tenure')
df_sorted['cumulative'] = np.arange(1, len(df) + 1) / len(df)
sns.lineplot(
    x='tenure',
    y='cumulative',
    data=df_sorted,
    color='purple',
    linewidth=2
)
ax3.set_title('Cumulative Distribution', fontsize=14)
ax3.set_xlabel('Tenure (years)', fontsize=12)
ax3.set_ylabel('Cumulative Probability', fontsize=12)
ax3.grid(True, linestyle='--', alpha=0.5)

plt.tight_layout()
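plt.show()
"""
Example output:
A combined employee tenure figure that includes:
- Main panel: histogram of overall tenure
  * Kernel density estimate (KDE) curve
  * Mean line (red dashed)
  * Median line (green dotted)
- Panel 1: box plot of tenure by department
- Panel 2: cumulative distribution curve
- Grid lines
- Custom font sizes
- A legend
"""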

Marketing conversion rates:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
campaigns = ['Email', 'Social Media', 'Search Ads', 'Display Ads', 'Affiliate']
base_rates = [0.15, 0.12, 0.18, 0.10, 0.08]
noise = np.random.normal(0, 0.02, len(campaigns))
conversion_rates = np.clip(base_rates + noise, 0, 1)
cost_per_click = [0.5, 0.3, 1.2, 0.8, 0.4]
revenue_per_conversion = [100, 80, 120, 90, 70]

df = pd.DataFrame({
    'campaign': campaigns,
    'conversion_rate': conversion_rates,
    'cost_per_click': cost_per_click,
    'revenue_per_conversion': revenue_per_conversion
})

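# Compute ROI per click: (expected revenue per click - cost per click) / cost per click
df['roi'] = (df['conversion_rate'] * df['revenue_per_conversion'] - df['cost_per_click']) / df['cost_per_click']

# Create the combined figure
plt.figure(figsize=(14, 8))

# Main panel: conversion-rate bar chart
ax1 = plt.subplot2grid((2, 2), (0, 0), colspan=2)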
bars = sns.barplot(
    x='campaign',
    y='conversion_rate',
    data=df,
    palette='viridis',
    edgecolor='black',
    linewidth=1
)

# Add value labels
for p in bars.patches:
    height = p.get_height()
    ax1.text(p.get_x() + p.get_width()/2., height,
             f'{height:.2%}',
             ha='center', va='bottom', fontsize=12)

# Style the main panel
ax1.set_title('Marketing Campaign Performance Analysis', fontsize=16, pad=20)
ax1.set_xlabel('Campaign', fontsize=14)
ax1.set_ylabel('Conversion Rate', fontsize=14)
ax1.tick_params(axis='x', rotation=45)
ax1.grid(axis='y', linestyle='--', alpha=0.5)

# Panel 1: ROI scatter plot
ax2 = plt.subplot2grid((2, 2), (1, 0))
sns.scatterplot(
    x='conversion_rate',
    y='roi',
    hue='campaign',
    data=df,
    palette='viridis',
    s=200,
    edgecolor='black',
    ax=ax2
)

# Annotate each point with its ROI
for i, row in df.iterrows():
    ax2.text(row['conversion_rate'] + 0.005, row['roi'] + 0.1,
             f'{row["roi"]:.1f}x',
             fontsize=12, ha='left')

ax2.set_title('ROI Analysis', fontsize=14)
ax2.set_xlabel('Conversion Rate', fontsize=12)
ax2.set_ylabel('ROI (x)', fontsize=12)
ax2.grid(True, linestyle='--', alpha=0.5)

# Panel 2: cost analysis
ax3 = plt.subplot2grid((2, 2), (1, 1))
sns.barplot(
    x='campaign',
    y='cost_per_click',
    data=df,
    palette='viridis',
    edgecolor='black',
    linewidth=1,
    ax=ax3
)

# Add cost labels
for p in ax3.patches:
    height = p.get_height()
    ax3.text(p.get_x() + p.get_width()/2., height,
             f'${height:.2f}',
             ha='center', va='bottom', fontsize=12)

ax3.set_title('Cost Per Click Analysis', fontsize=14)
ax3.set_xlabel('Campaign', fontsize=12)
ax3.set_ylabel('Cost Per Click ($)', fontsize=12)
ax3.tick_params(axis='x', rotation=45)
ax3.grid(axis='y', linestyle='--', alpha=0.5)

plt.tight_layout()
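plt.show()
"""
Example output:
A combined marketing analysis figure that includes:
- Main panel: bar chart of conversion rate by channel
  * Conversion-rate value labels
  * Bars colored along a gradient
- Panel 1: ROI scatter plot
  * ROI value labels
  * Channels distinguished by color
- Panel 2: bar chart of cost per click
  * Cost value labels
- Grid lines
- Custom font sizes
- Campaign labels rotated 45 degrees
"""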

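The ROI column in the marketing example is computed per click: expected revenue per click minus cost per click, divided by cost per click. A worked example using the base conversion rates from the sample data (without the random noise) makes the arithmetic concrete. The helper name `roi_per_click` is hypothetical.

```python
def roi_per_click(conversion_rate, revenue_per_conversion, cost_per_click):
    """Return on investment per click, as a multiple of the click cost."""
    expected_revenue = conversion_rate * revenue_per_conversion
    return (expected_revenue - cost_per_click) / cost_per_click


# Email: 0.15 * 100 = 15 expected revenue per click, cost 0.5
email = roi_per_click(0.15, 100, 0.5)

# Search Ads: 0.18 * 120 = 21.6 expected revenue per click, cost 1.2
search = roi_per_click(0.18, 120, 1.2)

print(f"Email ROI: {email:.1f}x, Search Ads ROI: {search:.1f}x")
```

Search Ads has the higher conversion rate, but Email comes out ahead on ROI because its clicks are much cheaper.
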
Multivariate analysis:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
num_samples = 500
data = {
    'age': np.random.normal(35, 10, num_samples).astype(int),
    'income': np.random.lognormal(4, 0.5, num_samples),
    'spending': np.random.normal(500, 200, num_samples),
    'satisfaction': np.random.randint(1, 11, num_samples),
    'gender': np.random.choice(['Male', 'Female'], size=num_samples)
}
df = pd.DataFrame(data)

# Create the PairGrid; it builds its own figure, so size it with `height`
g = sns.PairGrid(df,
                vars=['age', 'income', 'spending', 'satisfaction'],
                hue='gender',
                palette='Set2',
                corner=True,
                height=3)

# Choose the diagonal and off-diagonal plots
g.map_diag(sns.histplot, 
          kde=True, 
          alpha=0.8, 
          edgecolor='black', 
          linewidth=0.5)

g.map_offdiag(sns.scatterplot, 
             alpha=0.7, 
             edgecolor='black', 
             linewidth=0.5, 
             s=80)

# Add regression lines
g.map_offdiag(sns.regplot, 
             scatter=False, 
             color='red', 
             line_kws={'linewidth': 1.5})

# Add a legend
g.add_legend(title='Gender', 
            fontsize=12, 
            bbox_to_anchor=(1.02, 0.5), 
            loc='center left')

# Set the overall title
plt.suptitle('Multivariate Analysis of Customer Data', 
            fontsize=18, 
            y=1.02)

# Adjust subplot spacing
plt.tight_layout()
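plt.show()
"""
Example output:
A multivariate analysis figure that includes:
- Diagonal: histograms of each variable
  * Kernel density estimate (KDE) curves
  * Black bar edges
- Off-diagonal: scatter plots
  * Genders distinguished by color
  * Black marker edges
  * Red regression lines
- A legend
- Custom font sizes
- Tight layout
"""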

Time series decomposition:

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
import numpy as np

# Create sample data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=365*3)  # three years of data
trend = np.linspace(0, 100, len(dates))
seasonality = 10 * np.sin(2 * np.pi * np.arange(len(dates)) / 365)
noise = np.random.normal(0, 5, len(dates))
values = trend + seasonality + noise
df = pd.DataFrame({'date': dates, 'value': values}).set_index('date')

# Set the figure size
plt.figure(figsize=(14, 10))

# Original data
plt.subplot(4, 1, 1)
plt.plot(df['value'], color='royalblue', linewidth=1.5)
plt.title('Original Time Series', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.5)

# Decompose the time series (additive model, yearly period)
decomposition = seasonal_decompose(df['value'], model='additive', period=365)

# Trend component
plt.subplot(4, 1, 2)
plt.plot(decomposition.trend, color='darkorange', linewidth=1.5)
plt.title('Trend Component', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.5)

# Seasonal component
plt.subplot(4, 1, 3)
plt.plot(decomposition.seasonal, color='green', linewidth=1.5)
plt.title('Seasonal Component', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.5)

# Residual component
plt.subplot(4, 1, 4)
plt.plot(decomposition.resid, color='purple', linewidth=1.5)
plt.title('Residual Component', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.5)

# Adjust the layout
plt.tight_layout()
plt.show()
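"""
Example output:
Displays a time series decomposition plot containing:
- Original time series (blue)
- Trend component (orange)
- Seasonal component (green)
- Residual component (purple)
- Grid lines
- Custom font sizes
- Tight layout
"""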
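
To make the four panels above less of a black box, here is a minimal sketch of an additive decomposition done by hand with pandas only. It assumes an odd period (365), in which case the trend can be estimated with a centered moving average. The variable names and the synthetic series are illustrative, and this is an approximation of the idea rather than the statsmodels implementation itself.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend + yearly sine + noise (assumed parameters)
rng = np.random.default_rng(42)
dates = pd.date_range('2023-01-01', periods=365 * 3)
t = np.arange(len(dates))
values = (np.linspace(0, 100, len(dates))
          + 10 * np.sin(2 * np.pi * t / 365)
          + rng.normal(0, 5, len(dates)))
series = pd.Series(values, index=dates)

period = 365

# Trend: centered moving average over one full period.
# Points without a complete window (period // 2 at each end) stay NaN.
trend = series.rolling(window=period, center=True).mean()

# Seasonal: average the detrended values by position within the period,
# then center the pattern so it sums to roughly zero over one cycle.
detrended = series - trend
position = t % period
seasonal_means = detrended.groupby(position).mean()
centered = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(centered.to_numpy()[position], index=dates)

# Residual: whatever is left after removing trend and seasonality
residual = series - trend - seasonal
```

The NaN values at both ends of `trend` explain why the trend and residual curves in the plot are shorter than the original series.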

Geographic data visualization:

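import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import cartopy.crs as ccrs          # map projections (requires the cartopy package)
import cartopy.feature as cfeature  # coastlines, borders, rivers, etc.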

# Create sample data
np.random.seed(42)
num_points = 500
latitudes = np.random.uniform(22.5, 25.5, num_points)
longitudes = np.random.uniform(120.5, 122.5, num_points)
values = np.random.normal(100, 20, num_points)
categories = np.random.choice(['A', 'B', 'C'], size=num_points)

df = pd.DataFrame({
    'latitude': latitudes,
    'longitude': longitudes,
    'value': values,
    'category': categories
})

# Create the geographic scatter plot
plt.figure(figsize=(14, 10))

# Create the base map
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
ax.add_feature(cfeature.LAKES, alpha=0.5)
ax.add_feature(cfeature.RIVERS)

# Draw the scatter plot
scatter = ax.scatter(
    df['longitude'],
    df['latitude'],
    c=df['value'],
    cmap='viridis',
    s=100,
    alpha=0.8,
    edgecolor='black',
    linewidth=0.5,
    transform=ccrs.PlateCarree()
)

# Add a colorbar
cbar = plt.colorbar(scatter, orientation='vertical', pad=0.02)
cbar.set_label('Value', fontsize=12)

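# Set the chart style
plt.title('Geographical Data Visualization', fontsize=16, pad=20)
# On a cartopy GeoAxes the regular x/y axes are typically hidden, so
# plt.xlabel/plt.ylabel/plt.grid may have no visible effect;
# labeled gridlines show longitude and latitude instead
gl = ax.gridlines(draw_labels=True, linestyle='--', alpha=0.5)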

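# Add a legend (all markers share s=100, so it has a single marker-size entry;
# the colorbar, not this legend, encodes 'value')
handles, labels = scatter.legend_elements(prop="sizes", alpha=0.6)
legend = ax.legend(
    handles, labels,
    title="Marker size",
    loc="upper right",
    bbox_to_anchor=(1.15, 1),
    fontsize=10
)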

plt.tight_layout()
plt.show()
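"""
Example output:
Displays a geographic scatter plot containing:
- Base map (coastlines, borders, rivers, etc.)
- Scatter points colored on a gradient
- Colorbar
- Grid lines
- Custom font sizes
- Legend
- Data points with black edges
"""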

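Network graph analysis:

import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx

# Small illustrative edge list; any DataFrame with 'source' and 'target' columns works
df = pd.DataFrame({'source': ['A', 'A', 'A', 'B'],
                   'target': ['B', 'C', 'D', 'C']})
G = nx.from_pandas_edgelist(df, 'source', 'target')
nx.draw(G, with_labels=True)
plt.title('Network Graph')
plt.show()
"""
Example output:
Displays a network graph showing the connections between nodes
"""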
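
Drawing the graph is only the first step of an analysis. As a small sketch of what can come next, the degree of each node (how many edges touch it) can be read straight from the edge list with pandas. The `edges` frame below is made-up sample data, and the graph is assumed to be undirected with no duplicate edges.

```python
import pandas as pd

# Hypothetical undirected edge list with the same column names as above
edges = pd.DataFrame({'source': ['A', 'A', 'A', 'B'],
                      'target': ['B', 'C', 'D', 'C']})

# Each edge adds 1 to the degree of both of its endpoints,
# so stacking both columns and counting occurrences gives the degree
degree = pd.concat([edges['source'], edges['target']]).value_counts().sort_index()

# Node with the most connections
hub = degree.idxmax()
```

Nodes with a high degree are the natural candidates to highlight (for example with a larger node size) when the graph is drawn.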

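3D visualization:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib

# Illustrative random data; any DataFrame with 'x', 'y' and 'z' columns works
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=['x', 'y', 'z'])

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['x'], df['y'], df['z'])
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
"""
Example output:
Displays a 3D scatter plot showing the relationship between three variables
"""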

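Interactive visualization:

import plotly.express as px

# Assumes df has numeric 'x' and 'y' columns and a 'category' column
fig = px.scatter(df, x='x', y='y', color='category')
fig.show()
"""
Example output:
Displays an interactive scatter plot with a different color for each category
"""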

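Dynamic time series:

import plotly.graph_objects as go

# Assumes df has a 'date' column and a numeric 'value' column
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['date'], y=df['value'], mode='lines'))
fig.show()
"""
Example output:
Displays an interactive time series plot that supports zooming and hovering for point details
"""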

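Geographic heatmap:

import plotly.express as px

# Assumes df has 'latitude', 'longitude' and 'value' columns
fig = px.density_mapbox(df, lat='latitude', lon='longitude', z='value', radius=10)
# "open-street-map" needs no access token; the Stamen styles may no longer load
fig.update_layout(mapbox_style="open-street-map")
fig.show()
"""
Example output:
Displays an interactive geographic heatmap where color intensity represents the value density in each area
"""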

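Treemap:

import plotly.express as px

# Assumes df has 'category', 'subcategory' and non-negative 'value' columns
fig = px.treemap(df, path=['category', 'subcategory'], values='value')
fig.show()
"""
Example output:
Displays a treemap where rectangle size represents the value and nesting represents the category hierarchy
"""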
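
The treemap call above (and the sunburst chart later on) expects one row per leaf of the hierarchy. The following sketch shows that shape with made-up data and computes the per-category totals that the parent rectangles should correspond to. All names and numbers here are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per (category, subcategory) leaf
df = pd.DataFrame({
    'category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable', 'Vegetable'],
    'subcategory': ['Apple', 'Banana', 'Carrot', 'Leek', 'Onion'],
    'value': [30, 20, 25, 10, 15],
})

# Leaf values are summed up the path, so each parent area matches these totals
category_totals = df.groupby('category')['value'].sum()

# Share of the whole chart taken by each top-level category
share = category_totals / category_totals.sum()
```

Checking `category_totals` before plotting is a quick way to confirm that the chart proportions will look the way you expect.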

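Parallel coordinates plot:

import plotly.express as px

# Assumes df holds numeric feature columns and a numeric 'target' column
fig = px.parallel_coordinates(df, color='target')
fig.show()
"""
Example output:
Displays a parallel coordinates plot showing the relationships among several variables, colored by the target variable
"""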
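
A parallel coordinates plot works on numeric columns, and the column used for `color` is mapped to a continuous color scale. If the target is stored as text labels, one option is to convert it to integer codes first. The sketch below assumes a hypothetical `label` column and random features.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(6, 3)), columns=['f1', 'f2', 'f3'])
df['label'] = ['a', 'b', 'c', 'a', 'b', 'c']  # hypothetical string labels

# Map each distinct label to an integer code (a -> 0, b -> 1, c -> 2)
df['target'] = df['label'].astype('category').cat.codes

# Columns that can serve as axes or as the color variable
numeric_cols = df.select_dtypes('number').columns.tolist()
```

After this step, `df[numeric_cols]` can be passed to the `px.parallel_coordinates` call above with `color='target'`.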

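Radar chart:

import plotly.express as px

# Assumes df is in long format: one row per 'category' with its 'value'
fig = px.line_polar(df, r='value', theta='category', line_close=True)
fig.show()
"""
Example output:
Displays a radar chart showing how the values compare across several dimensions
"""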
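
Radar data often starts out in wide format, with one column per dimension, while the call above wants one row per dimension. A minimal sketch of the reshaping with `melt`, using a made-up table of player scores:

```python
import pandas as pd

# Hypothetical wide table: one row per player, one column per dimension
wide = pd.DataFrame({
    'player': ['P1', 'P2'],
    'speed': [8, 6],
    'strength': [5, 9],
    'stamina': [7, 7],
})

# Long format: one row per (player, dimension) pair
long_df = wide.melt(id_vars='player', var_name='category', value_name='value')
```

`long_df` now has the `category` and `value` columns used above. Passing `color='player'` as well draws one closed line per player.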

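Sunburst chart:

import plotly.express as px

# Assumes df has 'category', 'subcategory' and non-negative 'value' columns
fig = px.sunburst(df, path=['category', 'subcategory'], values='value')
fig.show()
"""
Example output:
Displays a sunburst chart where the rings represent the category hierarchy and sector size represents the value
"""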

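Funnel chart:

import plotly.express as px

# Assumes df lists the stages in order in 'category' with their counts in 'value'
fig = px.funnel(df, x='value', y='category')
fig.show()
"""
Example output:
Displays a funnel chart showing how the value decreases from one category to the next
"""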
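
A funnel is easier to read alongside the conversion rates between stages. This sketch uses made-up stage names and counts, keeps the rows in pipeline order, and derives both the step-to-step rate and the overall rate relative to the first stage.

```python
import pandas as pd

# Hypothetical funnel stages, listed from the top of the funnel to the bottom
df = pd.DataFrame({
    'category': ['Visit', 'Sign-up', 'Add to cart', 'Purchase'],
    'value': [1000, 400, 200, 50],
})

# Conversion from the previous stage (NaN for the first stage)
df['step_rate'] = df['value'] / df['value'].shift(1)

# Conversion relative to the first stage
df['overall_rate'] = df['value'] / df['value'].iloc[0]
```

The row order of `df` is what determines the order of the funnel bars, so sort or reorder the stages before plotting if they arrive shuffled.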

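Calendar heatmap:

import plotly.express as px

# Assumes df has 'date', 'hour' and 'value' columns
fig = px.density_heatmap(df, x='date', y='hour', z='value')
fig.show()
"""
Example output:
Displays a calendar heatmap where color intensity represents the value for each date and hour
"""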
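
Raw data usually carries a single timestamp column rather than separate `date` and `hour` columns. The sketch below, which assumes a hypothetical `timestamp` column with one week of hourly readings, derives both columns and builds the hour-by-date grid that the heatmap effectively displays.

```python
import numpy as np
import pandas as pd

# One week of hourly timestamps starting at midnight (illustrative data)
hours = np.arange(7 * 24)
ts = pd.Timestamp('2024-01-01') + pd.to_timedelta(hours, unit='h')
df = pd.DataFrame({'timestamp': ts, 'value': hours % 24})

# Split the timestamp into the two heatmap axes
df['date'] = df['timestamp'].dt.date
df['hour'] = df['timestamp'].dt.hour

# Rows are hours of the day, columns are dates, cells are summed values
grid = df.pivot_table(index='hour', columns='date', values='value', aggfunc='sum')
```

Inspecting `grid` is a cheap sanity check for missing hours or duplicated readings before handing `df` to the plotting call.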

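3D surface plot:

import plotly.graph_objects as go

# Assumes df is a 2D grid of z values (rows map to y, columns map to x)
fig = go.Figure(data=[go.Surface(z=df.values)])
fig.show()
"""
Example output:
Displays a 3D surface plot that can be rotated to inspect the three-dimensional shape of the data
"""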
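
Because `go.Surface(z=df.values)` reads the DataFrame as a grid, it helps to see how such a grid can be built. This sketch evaluates a Gaussian bump (an arbitrary example function) on a regular mesh, with rows indexed by y and columns by x.

```python
import numpy as np
import pandas as pd

# Regular grid of x and y coordinates (ranges chosen for illustration)
x = np.linspace(-2, 2, 41)
y = np.linspace(-1, 1, 21)
X, Y = np.meshgrid(x, y)  # both arrays have shape (len(y), len(x))

# Height of the surface at every grid point
Z = np.exp(-(X ** 2 + Y ** 2))

# Rows correspond to y values, columns to x values
df = pd.DataFrame(Z, index=y, columns=x)
```

With this layout, `df.columns` and `df.index` can also be supplied as the `x` and `y` arguments of `go.Surface` so the axes show real coordinates instead of array positions.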

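Animated visualization:

import plotly.express as px

# Assumes df has 'x', 'y', a 'time' column (one frame per value) and a non-negative 'value'
fig = px.scatter(df, x='x', y='y', animation_frame='time', size='value')
fig.show()
"""
Example output:
Displays an animated scatter plot that can be played to show changes over time
"""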
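
Two details tend to trip up animated scatter plots: marker sizes cannot be negative, and the axes do not rescale from frame to frame. The sketch below prepares made-up frame data with both points in mind. The `id` and `size` columns and the padding value are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Build five frames of ten points each (hypothetical data)
frames = []
for t in range(5):
    frames.append(pd.DataFrame({
        'id': np.arange(10),
        'time': t,
        'x': rng.normal(size=10),
        'y': rng.normal(size=10),
        'value': rng.normal(size=10),
    }))
df = pd.concat(frames, ignore_index=True)

# Marker sizes must be non-negative, so use the magnitude of 'value'
df['size'] = df['value'].abs()

# Fixed axis ranges (with a small margin) that cover every frame
pad = 0.5
x_range = [df['x'].min() - pad, df['x'].max() + pad]
y_range = [df['y'].min() - pad, df['y'].max() + pad]
```

These pieces map onto `px.scatter` arguments as `size='size'`, `range_x=x_range`, `range_y=y_range`, and `animation_group='id'` to keep each point's identity across frames.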

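Map markers:

import plotly.express as px

# Assumes df has 'latitude', 'longitude' and a non-negative 'value' column
fig = px.scatter_mapbox(df, lat='latitude', lon='longitude', size='value')
fig.update_layout(mapbox_style="open-street-map")
fig.show()
"""
Example output:
Displays interactive map markers where point size represents the value
"""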

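Interactive 3D scatter plot:

import plotly.express as px

# Assumes df has numeric 'x', 'y', 'z' columns and a 'category' column
fig = px.scatter_3d(df, x='x', y='y', z='z', color='category')
fig.show()
"""
Example output:
Displays an interactive 3D scatter plot with a different color for each category
"""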

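Polar plot:

import plotly.express as px

# Assumes df has an 'angle' column (in degrees) and a 'value' column
fig = px.line_polar(df, r='value', theta='angle', line_close=True)
fig.show()
"""
Example output:
Displays a polar plot showing the relationship between angle and radius
"""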

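Combining multiple charts:

import plotly.express as px

# Assumes df has 'x', 'y', 'category' and 'time' columns with few distinct categories and times
fig = px.scatter(df, x='x', y='y', facet_col='category', facet_row='time')
fig.show()
"""
Example output:
Displays a grid of charts, faceted by category and time
"""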
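
A facet grid gets one panel per combination of row and column value, so it is worth checking how many panels there will be and whether any of them would be empty. A small sketch with made-up data, using `pd.crosstab` to count the rows that land in each panel:

```python
import pandas as pd

# Hypothetical data: three categories observed over two time periods
df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'C'],
    'time': [2023, 2024, 2023, 2024, 2023],
    'x': [1.0, 2.0, 3.0, 4.0, 5.0],
    'y': [2.0, 1.0, 4.0, 3.0, 5.0],
})

# Rows of the table match facet rows, columns match facet columns
counts = pd.crosstab(df['time'], df['category'])

# Grid dimensions and the number of panels that would contain no points
n_rows, n_cols = counts.shape
empty_cells = int((counts == 0).sum().sum())
```

If `n_rows * n_cols` grows large, filtering to the most important categories first usually gives a more readable figure.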