Retention Rate Prediction Model
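From the reference material and the model, the retention curve is well approximated by a power function (not an exponential one), so retention on later days can be predicted from the first 7 days of data: use R's nls function to estimate the coefficients a and b of the power function y=a*x^b.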
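Because y = a*x^b becomes linear after taking logs, log(y) = log(a) + b*log(x), an ordinary linear regression on the log-log scale can supply starting values for nls instead of an arbitrary guess. The following is a minimal sketch under that assumption; the names loglin, start_vals and fit_seeded are illustrative and do not appear in the original script.

```r
# Observed day 1..7 retention (same values as used in this post)
day   <- 1:7
ratio <- c(0.383, 0.268, 0.216, 0.187, 0.167, 0.156, 0.145)

# Linearize y = a * x^b  ->  log(y) = log(a) + b * log(x)
loglin <- lm(log(ratio) ~ log(day))

# Back-transform the intercept to get a; the slope is b
start_vals <- list(a = exp(coef(loglin)[[1]]), b = coef(loglin)[[2]])
print(start_vals)

# Seed the nonlinear least-squares fit with these values
fit_seeded <- nls(ratio ~ a * day^b, start = start_vals)
print(coef(fit_seeded))
```

The log-log estimates minimize error on the log scale, so they generally differ slightly from the nls estimates, which minimize error on the original scale; they are used here only as a starting point.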
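# Observed retention rates for the first seven days
(day <- seq(1:7)) # day index
(ratio <- c(0.383,0.268,0.216,0.187,0.167,0.156,0.145)) # retention rate
# Use nls to estimate the coefficients a and b of the power function y=a*x^b
fit <- nls(ratio ~ a * day^b, start = list(a = 1, b = 1))
# Inspect the fitted model
summary(fit)
# Predict the daily retention rate of new users for each of the next 365 days
predicted <- predict(fit, data.frame(day = seq(1, 365)))
# Inspect the predictions
predicted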
# Plot the predicted retention curve
library(dygraphs) # also provides the %>% pipe used below
data <- as.data.frame(predicted)
data <- ts(data) # time index 1..365 corresponds to the day number
dygraph(data, main = "Predicted retention curve") %>%
  dySeries("predicted", label = "Retention rate", strokeWidth = 2) %>%
  dyOptions(colors = "green", fillGraph = TRUE, fillAlpha = 0.4) %>%
  dyHighlight(highlightCircleSize = 5,
              highlightSeriesBackgroundAlpha = 0.2,
              hideOnMouseOut = FALSE) %>%
  dyAxis("x", label = "Day", drawGrid = FALSE) %>%
  dyAxis("y", label = "Retention rate") %>%
  dyRangeSelector()
The results are as follows:
> (day <- seq(1:7)) # day index
[1] 1 2 3 4 5 6 7
> (ratio <- c(0.383,0.268,0.216,0.187,0.167,0.156,0.145)) # retention rate
[1] 0.383 0.268 0.216 0.187 0.167 0.156 0.145
# Parameter estimates
> summary(fit)
Formula: ratio ~ a * day^b
Parameters:
   Estimate Std. Error t value Pr(>|t|)
a  0.381911   0.002164  176.51 1.11e-10 ***
b -0.508544   0.005571  -91.29 2.99e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002349 on 5 degrees of freedom
Number of iterations to convergence: 7
Achieved convergence tolerance: 9.236e-08
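To see where the reported residual standard error comes from, the fit can be compared with the observed values directly. This is a small sketch, not part of the original script; comparison, n_params and rse are illustrative names, and the model is refitted here only so the block runs on its own.

```r
# Refit the same model so this block is self-contained
day   <- 1:7
ratio <- c(0.383, 0.268, 0.216, 0.187, 0.167, 0.156, 0.145)
fit   <- nls(ratio ~ a * day^b, start = list(a = 1, b = 1))

# Observed vs. fitted retention for the seven training days
res <- as.numeric(residuals(fit))
comparison <- data.frame(
  day      = day,
  observed = ratio,
  fitted   = round(as.numeric(fitted(fit)), 4),
  residual = round(res, 4)
)
print(comparison)

# Residual standard error: sqrt(RSS / (n - number of parameters))
n_params <- length(coef(fit))
rse <- sqrt(sum(res^2) / (length(ratio) - n_params))
print(rse)
```

With 7 observations and 2 parameters, the divisor is the 5 degrees of freedom shown in the summary. Note that seven points is a small sample, so the very small standard errors describe the fit to days 1 to 7, not the reliability of extrapolating to day 365.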
# Predicted retention rates (only the first 12 of the 365 values are shown):
> predicted
[1] 0.38191061 0.26845690 0.21843606 0.18870675 0.16846294 0.15354554
[7] 0.14196843 0.13264787 0.12493582 0.11841787 0.11281510 0.10793196
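The predictions are simply the fitted power function evaluated at each day, which can be checked by hand. The sketch below assumes the same data and model as above; by_hand, from_predict and checkpoints are illustrative names, and the chosen checkpoint days are arbitrary.

```r
# Refit the same model so this block is self-contained
day   <- 1:7
ratio <- c(0.383, 0.268, 0.216, 0.187, 0.167, 0.156, 0.145)
fit   <- nls(ratio ~ a * day^b, start = list(a = 1, b = 1))

# Extract the estimated coefficients
a <- coef(fit)[["a"]]
b <- coef(fit)[["b"]]

# Evaluate y = a * x^b directly and compare with predict()
horizon      <- 1:365
by_hand      <- a * horizon^b
from_predict <- as.numeric(predict(fit, data.frame(day = horizon)))
print(all.equal(by_hand, from_predict))

# Pull out a few horizons of interest (arbitrary example days)
checkpoints <- c(7, 30, 90, 365)
print(data.frame(day = checkpoints, predicted = round(by_hand[checkpoints], 4)))
```

Since b is negative, the predicted retention decreases monotonically with the day number and flattens out over time.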
The plot can be zoomed in to inspect the curve in more detail: