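Overall approach: the back end analyzes each day's behavior logs and aggregates users' level-completion data. The front end shows users only the top 100. For ranks 101-10000, a linear function is fitted with a machine-learning algorithm, and the front end uses the resulting piecewise function to compute and display the rank. If the rank is greater than 10000, the user is shown as unranked.
1. Data analysis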
String iosSql = "select " +
"uid," +
"level," +
"row_number() over(order by level desc) rank " +
"from " +
"(select " +
"uid, " +
"max(str2Num(params['maxlevel'])) level " +
"from bricksdb.s3_stage " +
"WHERE ym='"+ym+"' and day='"+day+"' " +
"and appname='iOS_BricksGame' "+
"group by uid " +
"order by max(str2Num(params['maxlevel'])) desc " +
"limit 3000) t order by level desc ";
spark.sql(iosSql).write().mode(SaveMode.Overwrite).jdbc(JdbcUtils.URL, "ios_rank_level", JdbcUtils.getProJdbcInfo());
2. URL path and parameters
http://{ip}:{port}/u3d/getRank
Content-Type:application/json
uid:4e6a1dafa2ae1674
version:100067
Client-Name:BricksGame|iOS_BricksGame|CN_BricksGame|CN_iOS_BricksGame
Response format:
uid+"#"+name+"#"+photo+"#"+photoFrame+"#"+level+"#"+starts+"#"+rank
user ID+"#"+user name+"#"+avatar+"#"+avatar frame+"#"+level+"#"+stars+"#"+rank
{
"msg": "操作成功!",
"code": 1,
"data": {
"rankList": [
"151a1a0326f5477e#151a1a0326f5477e#1#1#5000#0#1",
"d7c43ad3a3dee434#d7c43ad3a3dee434#1#1#5000#0#2",
"8c7081e07f3fa45e#8c7081e07f3fa45e#1#1#5000#0#3",
"18be19e5d7feb0fa#18be19e5d7feb0fa#1#1#5000#0#4",
"44250e4332eb8d79#44250e4332eb8d79#1#1#5000#0#5",
...
"2895dd9c50cf1068#2895dd9c50cf1068#1#1#4623#0#95",
"279b94e344635720#279b94e344635720#1#1#4622#0#96",
"314291b1b724d9bd#314291b1b724d9bd#1#1#4612#0#97",
"ac461aee63589dc9#ac461aee63589dc9#1#1#4611#0#98",
"d6a0b17fc0828aca#d6a0b17fc0828aca#1#1#4610#0#99",
"9c0bb22866ee2f68#9c0bb22866ee2f68#1#1#4606#0#100"
]
}
}
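To make the entry format concrete, here is a minimal sketch of how a client might split one `rankList` string into typed fields. The names `RankEntry` and `parse_rank_entry` are hypothetical, and the sketch assumes that no field (in particular the user name) contains a `#` character, which the format above does not guarantee.

```python
from typing import NamedTuple


class RankEntry(NamedTuple):
    # Field order follows uid#name#photo#photoFrame#level#stars#rank.
    uid: str
    name: str
    photo: str
    photo_frame: str
    level: int
    stars: int
    rank: int


def parse_rank_entry(raw: str) -> RankEntry:
    """Split one '#'-delimited rankList string into typed fields."""
    parts = raw.split("#")
    if len(parts) != 7:
        # Assumption: exactly seven fields, none of which contains '#'.
        raise ValueError(f"expected 7 fields, got {len(parts)}: {raw!r}")
    uid, name, photo, photo_frame, level, stars, rank = parts
    return RankEntry(uid, name, photo, photo_frame,
                     int(level), int(stars), int(rank))


entry = parse_rank_entry("151a1a0326f5477e#151a1a0326f5477e#1#1#5000#0#1")
print(entry.uid, entry.level, entry.rank)
```

If user names can contain `#`, a delimiter-based format like this becomes ambiguous and the parser would need a different strategy.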
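3. Front-end logic
1. The client fetches the top-100 ranking data and displays it. (Assume the 100th-place player is at level 3500.)
2. Getting the player's own rank:
(a) If the locally stored level is greater than or equal to 3500, traverse the top-100 list. If the player is in the list, display the player's own entry, with local data taking precedence. If not, remove the last-place entry and insert the player's own entry into the top 100.
(b) If the locally stored level is below 3500, the client derives the player's rank from the relationship between level and rank. That relationship is fitted by machine learning over the top 10,000 players, and the server supplies the formula.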
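The two cases above can be sketched as follows. This is only an illustration: `build_display_list`, `estimate_rank`, and the dict keys are hypothetical names, and the text does not say where the player's entry is inserted in case (a), so the sketch assumes it goes in by level and that ranks are then renumbered.

```python
def build_display_list(board, me, estimate_rank):
    """Return (entries_to_show, my_rank).

    board: leaderboard entries sorted by rank, each a dict with
           'uid', 'level' and 'rank' (the top 100 in production).
    me: the local player's data, a dict with 'uid' and 'level'.
    estimate_rank: callable mapping a level to an estimated rank
                   (stands in for the server-supplied formula).
    """
    cutoff_level = board[-1]["level"]
    if me["level"] < cutoff_level:
        # Case (b): below the last-place level, use the fitted formula.
        return list(board), estimate_rank(me["level"])

    # Case (a): the player may already be on the board.
    for i, entry in enumerate(board):
        if entry["uid"] == me["uid"]:
            shown = list(board)
            # Local data takes precedence; the server rank is kept.
            shown[i] = {**entry, **me, "rank": entry["rank"]}
            return shown, entry["rank"]

    # Not on the board: drop last place and insert the player.
    # Assumption: insert before the first entry with a lower level.
    shown = list(board[:-1])
    pos = next((i for i, e in enumerate(shown) if e["level"] < me["level"]),
               len(shown))
    shown.insert(pos, dict(me))
    shown = [{**e, "rank": i + 1} for i, e in enumerate(shown)]
    return shown, pos + 1
```

The function never mutates `board`, so the client can call it again whenever the local level changes.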
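4. Relationship between level and rank
The plot shows that the curve falls into two segments: the slope is steep over 1-400 and much gentler over the remaining 400-5000.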
5. Evaluating machine-learning algorithms
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
import pandas as pd
import matplotlib.pyplot as plt


def getRank():
    """
    Evaluate several regression algorithms.
    """
    data = pd.read_csv("./data/android.csv")
    # data = data.query("rank < 10001")
    # A tiny test split: nearly all rows are used for fitting.
    x_train, x_test, y_train, y_test = train_test_split(data[['level']], data[['rank']], test_size=0.00001)

    # Scatter plot of rank against level.
    plt.figure(figsize=(20, 8), dpi=80)
    plt.scatter(data[['level']], data[['rank']])
    plt.title("android levels and ranks")
    plt.xlabel('levels')
    plt.ylabel('ranks')
    plt.show()

    # Linear regression solved with the normal equation
    lr = LinearRegression()
    lr.fit(x_train, y_train)
    print("coefficient:", lr.coef_, "intercept:", lr.intercept_)
    y_lr_predict = lr.predict(x_test)
    print("normal equation, predicted rank for each test-set level:", y_lr_predict)
    print("normal equation, mean squared error:", mean_squared_error(y_test, y_lr_predict))

    # Ridge regression
    rd = Ridge(alpha=1.0)
    rd.fit(x_train, y_train)
    print("ridge regression coefficient:", rd.coef_, "intercept:", rd.intercept_)
    y_rd_predict = rd.predict(x_test)
    print("ridge regression, predicted rank for each test-set level:", y_rd_predict)
    print("ridge regression, mean squared error:", mean_squared_error(y_test, y_rd_predict))


if __name__ == '__main__':
    getRank()
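The script above fits a single line over all rows, while section 6 reports one line per level interval. The text does not show how the per-interval coefficients were obtained. One plausible way, sketched below in plain Python, is to run ordinary least squares separately on the points inside each interval. The helper names `fit_line` and `fit_segments` are hypothetical, and the demo points are synthetic values generated from two known lines, not real leaderboard data.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x


def fit_segments(points, boundaries):
    """Fit one line per (lo, hi] level interval.

    points: iterable of (level, rank) pairs.
    boundaries: ascending level edges, e.g. [1000, 2000, 3000].
    Returns a list of (lo, hi, slope, intercept) tuples.
    """
    points = list(points)
    fits = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        segment = [(x, y) for x, y in points if lo < x <= hi]
        if len(segment) < 2:
            continue  # not enough data to fit this interval
        xs, ys = zip(*segment)
        fits.append((lo, hi, *fit_line(xs, ys)))
    return fits


# Synthetic demo data drawn from two exact lines (illustrative only).
demo = [(lvl, -10 * lvl + 21000) for lvl in range(1100, 2001, 100)]
demo += [(lvl, -2 * lvl + 5000) for lvl in range(2100, 3001, 100)]
for lo, hi, slope, intercept in fit_segments(demo, [1000, 2000, 3000]):
    print(f"{lo} < level <= {hi}: rank = {slope:.3f} * level + {intercept:.1f}")
```

Independent per-interval fits do not have to meet at the interval edges, so the estimated rank can jump slightly when a player crosses a boundary.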
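6. Conclusion
Relationship between level and rank (Android):
-0.242474 * Rank + 3850 = Level (100 < Rank <= 10000)
which implies approximately 1400 < Level < 3825
-17.31919503 * Level + 43013 = Rank (1500 < Level <= 2000)
-6.24650545 * Level + 20734 = Rank (2000 < Level <= 3000)
-2.1950991 * Level + 8891 = Rank (3000 < Level <= 4000)
-0.51153987 * Level + 2457 = Rank (4000 < Level <= 4600)
Note: when Level is below 1500, the player is unranked. Otherwise, substitute Level into the piecewise function and round the result to an integer. If the resulting rank is greater than 10000 the player is unranked; otherwise the rank is displayed.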
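The Android rule above can be sketched as a small lookup function. The name `android_rank` is hypothetical. Several details are assumptions because the text leaves them open: a level of exactly 1500 is treated as unranked, `round()` is used for the integer conversion, and levels above 4600 raise an error because no segment is given for them (those players are expected to appear in the top-100 list instead).

```python
# (lower bound, upper bound, slope, intercept); each applies to lower < level <= upper.
ANDROID_SEGMENTS = [
    (1500, 2000, -17.31919503, 43013),
    (2000, 3000, -6.24650545, 20734),
    (3000, 4000, -2.1950991, 8891),
    (4000, 4600, -0.51153987, 2457),
]
MAX_RANK = 10000


def android_rank(level):
    """Return the estimated rank, or None when the player is unranked."""
    if level <= ANDROID_SEGMENTS[0][0]:
        return None  # at or below 1500: unranked (assumption for exactly 1500)
    for lower, upper, slope, intercept in ANDROID_SEGMENTS:
        if lower < level <= upper:
            rank = round(slope * level + intercept)  # rounding mode is an assumption
            return rank if rank <= MAX_RANK else None
    # No fitted segment covers this level.
    raise ValueError(f"level {level} is outside the fitted range")


for lvl in (1400, 1600, 1950, 2500, 4300):
    print(lvl, android_rank(lvl))
```

Keeping the coefficients in a table means a new server-side fit only changes data, not control flow.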
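Relationship between level and rank (iOS):
-0.10760038 * Rank + 2073 = Level (100 < Rank <= 10000)
which implies approximately 1000 < Level < 2000
-10.40723106 * Level + 21202 = Rank (1000 < Level <= 2000)
-1.55131557 * Level + 4769 = Rank (2000 < Level <= 3000)
-0.48629439 * Level + 1785 = Rank (3000 < Level < 3500)
Note: when Level is below 1000, the player is unranked. Otherwise, substitute Level into the piecewise function and round the result to an integer. If the resulting rank is greater than 10000 the player is unranked; otherwise the rank is displayed.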
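The level ranges quoted next to the two rank-to-level lines can be checked by evaluating each line at the ends of its rank interval, 100 and 10000. The sketch below does only that arithmetic; `level_at_rank` and `RANK_TO_LEVEL` are hypothetical names, and the coefficients are copied from the formulas above.

```python
# Platform -> (slope, intercept) of Level = slope * Rank + intercept.
RANK_TO_LEVEL = {
    "Android": (-0.242474, 3850),
    "iOS": (-0.10760038, 2073),
}


def level_at_rank(slope, intercept, rank):
    """Evaluate the fitted line Level = slope * Rank + intercept."""
    return slope * rank + intercept


for platform, (slope, intercept) in RANK_TO_LEVEL.items():
    highest = level_at_rank(slope, intercept, 100)    # best rank in the interval
    lowest = level_at_rank(slope, intercept, 10000)   # worst rank in the interval
    print(f"{platform}: level {lowest:.1f} at rank 10000, {highest:.1f} at rank 100")
```

The Android line gives about 1425 to 3826, and the iOS line about 997 to 2062, so the ranges quoted in the text appear to be rounded.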