[2020年冠状病毒肺炎 - 武汉加油] 使用Logistic增长模型预测确诊病人数目

最新推荐文章于 2022-05-11 22:04:58 发布

千瞱

最新推荐文章于 2022-05-11 22:04:58 发布

阅读量8.3k

点赞数 8

分类专栏：机器学习

本文链接：https://blog.csdn.net/qq_26822029/article/details/104213781

版权

机器学习专栏收录该内容

19 篇文章

订阅专栏

在前一段时间，我脑子一热使用指数回归以及多项式回归对新型冠状病毒2019-nCov的感染人数进行预测。后来发现不行啊，感染人数不可能一直上涨啊，总得有停止上涨的时候啊！在多名网友的提醒下，在参考了邢翔瑞大佬的博客之后，我痛定思痛，尝试使用Logistic增长模型对感染人数进行预测，结果如下图（下图是直接使用拟合得到的r值，为0.303，经过简单的测试，这个模型预测的数量偏小，如果有大佬对此比较了解能否指点一二）：

2月9日预测结果（看最近几天的预测结果已经很接近了）：

2月8日预测结果：

2月7日预测结果：

我们在高中生物中有学过，种群的增长曲线有分"J"型和"S"型，其中"J"型为理想型，实际中种群的数量不可能一直像指数函数那样，而是一个S型曲线，如下图所示。

这个函数的曲线和机器学习中的Sigmoid函数 $f(x)=\frac{1}{1+e^{-x}}$ 及其相似，其公式都有点像，logistic增长函数为”

$P(t) = \frac{KP_0e^{rt}}{K+P_0(e^{rt} - 1)}$

其中K为环境最大容量，P0为初始容量，r为增长速率，r越大则增长越快（即更快的逼近上限）。

该模型的微分式是： $\frac{dx}{dt}=rx(1-x)$

如果看不懂上面的内容，直接复制粘贴代码运行一下吧~（其实和之前的思路一样，使用最小二乘法拟合数据，只是拟合的函数换了而已）

from scipy.optimize import curve_fit
import urllib
import json
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import time


def date_encode(date):
    # '01.24' -> 1 * 100 + 24 = 124
    d = date.split('/')
    month, day = int(d[0]), int(d[1])
    return 100 * month + day


def date_decode(date):
    # 124 -> '01.24'
    return '{}.{}'.format(str(date // 100), str(date % 100))
    
    
def sequence_analyse(data):
    date_list, confirm_list, dead_list, heal_list, suspect_list = [], [], [], [], []
    data.sort(key = lambda x: date_encode(x['date']))
    for day in data:
        date_list.append(day['date'])
        confirm_list.append(int(day['confirm']))
        dead_list.append(int(day['dead']))
        heal_list.append(int(day['heal']))
        suspect_list.append(int(day['suspect']))
    return pd.DataFrame({
        'date': date_list, 
        'confirm': confirm_list, 
        'dead': dead_list,
        'heal': heal_list,
        'suspect': suspect_list
    })


def get_date_list(target_month = 3):
    """
    得到从1月13日到month月最后一天的所有日期列表
    """
    month_day = [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    cur_month, cur_day = 1, 13
    ans = []
    while cur_month <= target_month:
        while cur_day <= month_day[cur_month]:
            d = "0" + str(cur_day) if cur_day < 10 else str(cur_day)
            ans += [str(cur_month) + "/" + d]
            cur_day += 1
        cur_day = 1
        cur_month += 1
    return ans


def logistic_function(t, K, P0, r):
    t0 = 0
    exp = np.exp(r * (t - t0))
    return (K * exp * P0) / (K + (exp - 1) * P0)

def predict():
    # 预测未来天数
    predict_days = 20
    
    # 获取实时数据
    url = 'https://view.inews.qq.com/g2/getOnsInfo?name=wuwei_ww_cn_day_counts'
    response = urllib.request.urlopen(url)
    json_data = response.read().decode('utf-8').replace('\n','')
    data = json.loads(json_data)
    data = json.loads(data['data'])
    df = sequence_analyse(data)
    date, confirm = df['date'].values, df['confirm'].values
    x = np.arange(len(confirm))
    date_labels = get_date_list(4)

    # 用最小二乘法估计拟合
    popt, pcov = curve_fit(logistic_function, x, confirm)
    print(popt)

    #近期情况预测
    predict_x = list(x) + [x[-1] + i for i in range(1, 1 + predict_days)]
    predict_x = np.array(predict_x)
    predict_y = logistic_function(predict_x, popt[0], popt[1], popt[2])

    #绘图
    plt.figure(figsize=(15, 8))
    plt.plot(x, confirm, 's',label="confimed infected number")
    plt.plot(predict_x, predict_y, 's',label="predicted infected number")
    plt.xticks(predict_x, date_labels[:len(predict_x) + 1], rotation=90)
    plt.yticks(rotation=90)

    plt.suptitle("Logistic Fitting Curve for 2019-nCov infected numbers(Max = {},  r={:.3})".format(int(popt[0]), popt[2]), fontsize=16, fontweight="bold")
    plt.title("Predict time:{}".format(time.strftime("%Y-%m-%d", time.localtime())), fontsize=16)
    plt.xlabel('date', fontsize=14)
    plt.ylabel('infected number', fontsize=14)
    plt.plot()

predict()

参考：

python 对于任意数据和曲线进行拟合并求出函数表达式的三种方案

python实现logistic增长模型拟合2019-nCov确诊人数2月1日更新