[Big Data] Asthma Patient Symptom Data Visualization and Analysis System | Computer Science Graduation Project | Hadoop + Spark Environment Setup | Data Science and Big Data Technology | Source Code + Documentation + Walkthrough Included

1. About the Author

💖💖 Author: 计算机编程果茶熊
💙💙 About me: I spent many years teaching computer science courses and working as a programming instructor, and I still enjoy teaching. I work across several IT areas, including Java, WeChat Mini Programs, Python, Golang, and Android. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know a few techniques for lowering similarity scores in plagiarism checks. I like sharing solutions to problems I run into during development and talking shop, so feel free to ask me anything about code!
💛💛 A word of thanks: thank you all for your attention and support!
💜💜
Web application projects
Android / Mini Program projects
Big data projects
Graduation project topic ideas
💕💕 To get the source code, contact 计算机编程果茶熊 (see the end of this post)

2. System Introduction

Big data framework: Hadoop + Spark (Hive support requires custom modification)
Development languages: Java + Python (both versions supported)
Database: MySQL
Backend frameworks: Spring Boot (Spring + Spring MVC + MyBatis) and Django (both versions supported)
Frontend: Vue + ECharts + HTML + CSS + JavaScript + jQuery

The Asthma Patient Symptom Data Visualization and Analysis System is a medical data analysis platform built on big data technology. It uses the Hadoop and Spark distributed computing frameworks as its core foundation to store, process, and analyze asthma patients' symptom data efficiently. The backend is developed with Django, the frontend builds its interactive interface and charts with Vue, ElementUI, and ECharts, and the data processing layer uses Spark SQL, Pandas, and NumPy for multi-dimensional data mining. The system's modules cover user management, asthma symptom data management, asthma control analysis, clinical symptom analysis, patient clustering analysis, risk factor analysis, patient profiling, and a visualization dashboard. Patient symptom data is stored in HDFS, and Spark handles large-scale data cleaning and feature extraction. On top of that, the system produces statistics on key indicators such as symptom frequency, control level, and medication usage, applies machine learning algorithms for patient segmentation and risk prediction, and presents the results as intuitive charts, providing data support for medical institutions, assisting clinical decision-making, and improving the efficiency of asthma patient management and treatment outcomes.
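The HDFS storage and Spark cleaning step mentioned above is not part of the code excerpt in Section 5, which reads directly from MySQL. A minimal sketch of that step is shown below; the HDFS paths and the CSV layout are assumptions for illustration, not taken from the project itself.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Minimal sketch of the HDFS ingestion and cleaning step (paths and columns are assumed)
spark = SparkSession.builder.appName("AsthmaCleaning").getOrCreate()
raw_df = (spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("hdfs:///data/asthma/symptom_records/*.csv"))  # assumed HDFS location
# Basic cleaning: drop records without a patient id or date, remove duplicates,
# and filter out obviously invalid negative frequency values
clean_df = (raw_df
    .dropna(subset=["patient_id", "record_date"])
    .dropDuplicates(["patient_id", "record_date"])
    .filter((col("daytime_symptom_freq") >= 0) & (col("reliever_usage") >= 0)))
# Persist the cleaned data back to HDFS as Parquet for downstream Spark SQL analysis
clean_df.write.mode("overwrite").parquet("hdfs:///data/asthma/cleaned/symptom_records")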

3. Video Walkthrough

[Big Data] Asthma Patient Symptom Data Visualization and Analysis System | Computer Science Graduation Project | Hadoop + Spark Environment Setup | Data Science and Big Data Technology | Source Code + Documentation + Walkthrough Included

4. Feature Showcase

[Screenshots of the system's feature modules]

5. Selected Code



from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, count, avg, sum as spark_sum, datediff, current_date, expr
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans
from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
import json
import pandas as pd
import numpy as np
# Shared SparkSession configured to use the Hive warehouse directory on HDFS
spark = SparkSession.builder \
    .appName("AsthmaAnalysis") \
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse") \
    .getOrCreate()
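# View 1: asthma control analysis — scores each symptom record against four control
# criteria and returns per-patient control levels, the overall distribution, and a daily trend.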
@require_http_methods(["POST"])
def asthma_control_analysis(request):
    data = json.loads(request.body)
    patient_ids = data.get('patient_ids', [])
    start_date = data.get('start_date')
    end_date = data.get('end_date')
    # Load symptom records from MySQL into a Spark DataFrame via JDBC
    symptom_df = (spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/asthma_db")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .option("dbtable", "symptom_records")
        .option("user", "root").option("password", "password")
        .load())
    filtered_df = symptom_df.filter((col("patient_id").isin(patient_ids)) & (col("record_date") >= start_date) & (col("record_date") <= end_date))
    # Score each record on four control criteria (1 point each): limited daytime symptoms,
    # no nighttime symptoms, low reliever use, and no activity limitation
    control_df = filtered_df.withColumn("control_score", (when(col("daytime_symptom_freq") <= 2, 1).otherwise(0) + when(col("nighttime_symptom_freq") == 0, 1).otherwise(0) + when(col("reliever_usage") <= 2, 1).otherwise(0) + when(col("activity_limitation") == 0, 1).otherwise(0)))
    # Map the score to a control level: 4 = well controlled, 2-3 = partly controlled, else uncontrolled
    control_df = control_df.withColumn("control_level", when(col("control_score") >= 4, "well controlled").when(col("control_score") >= 2, "partly controlled").otherwise("uncontrolled"))
    # Count records per patient per control level, then keep each patient's most frequent level
    patient_control = control_df.groupBy("patient_id", "control_level").agg(count("*").alias("record_count"))
    patient_main_control = patient_control.groupBy("patient_id").agg(expr("max_by(control_level, record_count) as main_control_level"))
    # Overall control-level distribution and the daily trend of the average control score
    overall_stats = control_df.groupBy("control_level").agg(count("*").alias("total_count"))
    trend_df = control_df.groupBy("record_date").agg(avg("control_score").alias("avg_control_score"), count("*").alias("daily_records")).orderBy("record_date")
    # Collect the small aggregated results to pandas and serialize them for the frontend charts
    patient_control_pd = patient_main_control.toPandas()
    overall_stats_pd = overall_stats.toPandas()
    trend_pd = trend_df.toPandas()
    result = {"patient_control": patient_control_pd.to_dict(orient='records'), "overall_distribution": overall_stats_pd.to_dict(orient='records'), "control_trend": trend_pd.to_dict(orient='records')}
    return JsonResponse(result, safe=False)
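# View 2: patient clustering — builds per-patient features from demographics and symptom
# aggregates, standardizes them, and runs K-Means to segment patients into groups.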
@require_http_methods(["POST"])
def patient_clustering_analysis(request):
    data = json.loads(request.body)
    cluster_num = data.get('cluster_num', 3)
    feature_cols = data.get('feature_cols', ['age', 'symptom_freq', 'fev1_percent', 'medication_adherence', 'exacerbation_count'])
    # Load patient information and symptom records from MySQL via JDBC
    patient_df = (spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/asthma_db")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .option("dbtable", "patient_info")
        .option("user", "root").option("password", "password")
        .load())
    symptom_summary = (spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/asthma_db")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .option("dbtable", "symptom_records")
        .option("user", "root").option("password", "password")
        .load())
    # Aggregate symptom records per patient: average symptom frequency, exacerbation count, reliever use
    symptom_agg = symptom_summary.groupBy("patient_id").agg(avg("daytime_symptom_freq").alias("symptom_freq"), spark_sum(when(col("is_exacerbation") == 1, 1).otherwise(0)).alias("exacerbation_count"), avg("reliever_usage").alias("avg_reliever_usage"))
    # Join with patient info, derive age from the birth date, and fill missing feature values with defaults
    merged_df = patient_df.join(symptom_agg, "patient_id", "left")
    merged_df = merged_df.withColumn("age", datediff(current_date(), col("birth_date")) / 365)
    merged_df = merged_df.na.fill({"symptom_freq": 0, "exacerbation_count": 0, "fev1_percent": 80, "medication_adherence": 0.5, "avg_reliever_usage": 0})
    # Assemble the selected feature columns into a vector and standardize them (zero mean, unit variance)
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    feature_df = assembler.transform(merged_df)
    scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withStd=True, withMean=True)
    scaler_model = scaler.fit(feature_df)
    scaled_df = scaler_model.transform(feature_df)
    # Train K-Means on the standardized features and assign each patient to a cluster
    kmeans = KMeans(k=cluster_num, seed=42, featuresCol="scaled_features", predictionCol="cluster")
    kmeans_model = kmeans.fit(scaled_df)
    clustered_df = kmeans_model.transform(scaled_df)
    # Summarize each cluster and collect per-patient assignments and cluster centers for the response
    cluster_stats = clustered_df.groupBy("cluster").agg(count("*").alias("patient_count"), avg("age").alias("avg_age"), avg("symptom_freq").alias("avg_symptom_freq"), avg("fev1_percent").alias("avg_fev1"), avg("exacerbation_count").alias("avg_exacerbation"))
    patient_clusters = clustered_df.select("patient_id", "patient_name", "cluster", "age", "symptom_freq", "fev1_percent")
    cluster_stats_pd = cluster_stats.toPandas()
    patient_clusters_pd = patient_clusters.toPandas()
    centers = kmeans_model.clusterCenters()
    centers_list = [{"cluster": i, "center": center.tolist()} for i, center in enumerate(centers)]
    result = {"cluster_statistics": cluster_stats_pd.to_dict(orient='records'), "patient_assignments": patient_clusters_pd.to_dict(orient='records'), "cluster_centers": centers_list}
    return JsonResponse(result, safe=False)
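# View 3: risk factor analysis — labels patients as high risk (frequent exacerbations or poor
# control) and compares candidate risk factors between the high-risk and low-risk groups.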
@require_http_methods(["POST"])
def risk_factor_analysis(request):
    data = json.loads(request.body)
    target_variable = data.get('target_variable', 'is_high_risk')
    analysis_factors = data.get('factors', ['age', 'smoking', 'allergen_exposure', 'family_history', 'bmi', 'medication_adherence'])
    # Load patient information and symptom records from MySQL via JDBC
    patient_df = (spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/asthma_db")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .option("dbtable", "patient_info")
        .option("user", "root").option("password", "password")
        .load())
    symptom_df = (spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/asthma_db")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .option("dbtable", "symptom_records")
        .option("user", "root").option("password", "password")
        .load())
    # Summarize per patient: exacerbation count, average control score, and total record count
    risk_summary = symptom_df.groupBy("patient_id").agg(spark_sum(when(col("is_exacerbation") == 1, 1).otherwise(0)).alias("exacerbation_count"), avg("control_score").alias("avg_control_score"), count("*").alias("total_records"))
    risk_df = patient_df.join(risk_summary, "patient_id", "left")
    # Label a patient as high risk if they had two or more exacerbations or a low average control score
    risk_df = risk_df.withColumn("is_high_risk", when((col("exacerbation_count") >= 2) | (col("avg_control_score") < 2), 1).otherwise(0))
    risk_df = risk_df.withColumn("age", datediff(current_date(), col("birth_date")) / 365)
    risk_df = risk_df.na.fill({"smoking": 0, "allergen_exposure": 0, "family_history": 0, "bmi": 22, "medication_adherence": 0.7, "exacerbation_count": 0})
    factor_impact = {}
    for factor in analysis_factors:
        if factor in ['smoking', 'family_history', 'allergen_exposure']:
            # Categorical factors: cross-tabulate factor values against the risk label and compute the high-risk rate
            factor_stats = risk_df.groupBy(factor, "is_high_risk").agg(count("*").alias("count"))
            pivot_stats = factor_stats.groupBy(factor).pivot("is_high_risk", [0, 1]).sum("count").na.fill(0)
            pivot_pd = pivot_stats.toPandas()
            # Spark names the pivoted columns "0" and "1" (as strings), so index them as strings
            pivot_pd['risk_rate'] = pivot_pd['1'] / (pivot_pd['0'] + pivot_pd['1'])
            factor_impact[factor] = pivot_pd.to_dict(orient='records')
        else:
            # Numeric factors: compare the mean value between the high-risk and low-risk groups
            high_risk_avg = risk_df.filter(col("is_high_risk") == 1).agg(avg(factor).alias("high_risk_avg")).collect()[0]["high_risk_avg"]
            low_risk_avg = risk_df.filter(col("is_high_risk") == 0).agg(avg(factor).alias("low_risk_avg")).collect()[0]["low_risk_avg"]
            factor_impact[factor] = {"high_risk_avg": float(high_risk_avg) if high_risk_avg else 0, "low_risk_avg": float(low_risk_avg) if low_risk_avg else 0, "difference": float(high_risk_avg - low_risk_avg) if (high_risk_avg and low_risk_avg) else 0}
    # List the high-risk patients and the overall high/low risk distribution for the response
    high_risk_patients = risk_df.filter(col("is_high_risk") == 1).select("patient_id", "patient_name", "age", "exacerbation_count", "avg_control_score", "smoking", "family_history")
    high_risk_pd = high_risk_patients.toPandas()
    risk_distribution = risk_df.groupBy("is_high_risk").agg(count("*").alias("patient_count"))
    risk_dist_pd = risk_distribution.toPandas()
    result = {"factor_impact_analysis": factor_impact, "high_risk_patients": high_risk_pd.to_dict(orient='records'), "risk_distribution": risk_dist_pd.to_dict(orient='records')}
    return JsonResponse(result, safe=False)
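
The three views above are plain Django function views that accept JSON POST bodies and return JsonResponse objects. A minimal routing sketch is shown below; the module path analysis.views and the URL prefixes are assumptions for illustration, not taken from the project. In a real deployment the frontend would also need to send a CSRF token with these POST requests (or the views would need to be exempted from CSRF protection).

# urls.py — a minimal routing sketch; the module path "analysis.views" is an assumption
from django.urls import path
from analysis import views

urlpatterns = [
    path('api/asthma/control-analysis/', views.asthma_control_analysis),
    path('api/asthma/clustering/', views.patient_clustering_analysis),
    path('api/asthma/risk-factors/', views.risk_factor_analysis),
]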

6. Documentation Excerpt

[Documentation screenshot]

7. END

💕💕 Contact 计算机编程果茶熊 to get the source code
