Python singleton pattern
def __new__(cls, *args, **kwargs):
    '''Singleton pattern'''
    if not hasattr(cls, 'instance'):
        cls.instance = super(CreateFuZhuJianChaRes, cls).__new__(cls)
    return cls.instance
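A minimal sketch of how this __new__ override is typically used, with a hypothetical class name (Config) standing in for the original one; constructing the class twice returns the same cached instance.
class Config:
    def __new__(cls, *args, **kwargs):
        '''Singleton: create the instance once and cache it on the class'''
        if not hasattr(cls, 'instance'):
            cls.instance = super(Config, cls).__new__(cls)
        return cls.instance

a = Config()
b = Config()
assert a is b  # both names refer to the single cached instance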
seaborn tips
1. Add Chinese font support
import seaborn as sns
sns.set_style({'font.sans-serif': ['simhei', 'Arial']})
2. Set the font size
sns.set(font_scale=1)
3. Display negative values on the axes correctly
plt.rcParams['axes.unicode_minus'] = False
4. Scientific notation for tick labels
def formatnum(x, pos):
    return '$%.1f$x$10^{4}$' % (x/10000)
from ma...
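As a hedged companion to tip 4, a formatter such as formatnum can be attached to an axis with matplotlib's FuncFormatter; the sample data below is invented for illustration.
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def formatnum(x, pos):
    return '$%.1f$x$10^{4}$' % (x / 10000)

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [10000, 25000, 40000])               # made-up values in the 10^4 range
ax.yaxis.set_major_formatter(FuncFormatter(formatnum))  # y ticks rendered as n x 10^4
plt.show()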
Common pandas operations
1. Convert data types
df.apply(pd.to_numeric, errors='ignore')
2. Reshape wide data to long data
pd.melt(df, id_vars=['col3'])
3. Rename columns
df.rename(columns={'col1': '列1'}, inplace=True)
4. crosstab confusion matrix
pd.crosstab(df['truth'], df['predict'])
5. join (merge)
merge_df = pd.merge(df1, df2, on='col1', how='left')
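A toy walk-through of a few of these operations; the frames and column names are invented for illustration.
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'a'], 'col2': ['1', '2', '3'], 'col3': ['x', 'y', 'x']})
df2 = pd.DataFrame({'col1': ['a', 'b'], 'score': [0.9, 0.1]})

df1 = df1.apply(pd.to_numeric, errors='ignore')           # col2 becomes numeric, the rest stay as-is
long_df = pd.melt(df1, id_vars=['col3'])                  # wide -> long
merge_df = pd.merge(df1, df2, on='col1', how='left')      # left join on col1

pred_df = pd.DataFrame({'truth': [0, 1, 1, 0], 'predict': [0, 1, 0, 0]})
print(pd.crosstab(pred_df['truth'], pred_df['predict']))  # confusion matrix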
Using SHAP values
import shap
shap.initjs()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train, max_display=80)
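A self-contained sketch around the snippet above, assuming a tree model trained with scikit-learn; the dataset and model below are placeholders rather than the original ones.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X_train, y_train = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

shap.initjs()
explainer = shap.TreeExplainer(model)         # tree-based models get the fast TreeExplainer
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train, max_display=10)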
Python time operations
print('Timestamp to standard date format:')
from datetime import datetime
t = 1640329180
format_time = str(datetime.fromtimestamp(t))
print(format_time)
print('Standard date format to timestamp:')
cday = datetime.strptime('2015-6-1 18:19:59', '%Y-%m-%d %H:%M:%S')
timestamp = cday.timestamp()
print(timestamp)
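A small companion sketch: formatting the current time explicitly with strftime instead of str(), and going back to a timestamp.
from datetime import datetime

now = datetime.now()
print(now.strftime('%Y-%m-%d %H:%M:%S'))  # formatted date string
print(int(now.timestamp()))               # seconds since the epoch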
Misusing resampling, leading to data leakage
In the resampling setting, a common pitfall is to resample the entire dataset before splitting it into train and test partitions. Note that this is equivalent to resampling the train and test partitions together, so information leaks across the split.
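A hedged sketch of the correct order, using imbalanced-learn's SMOTE as the resampler and a synthetic dataset as a stand-in: split first, then resample only the training partition.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Wrong: SMOTE().fit_resample(X, y) before the split would place synthetic
# neighbours of test points into the training set.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)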
Installing Bayesian optimization packages
pip install scikit-optimize
pip install hyperopt
pip install -i https://pypi.douban.com/simple bayesian-optimization
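A minimal scikit-optimize sketch once the packages are installed; the objective below is a made-up stand-in.
from skopt import gp_minimize

def objective(params):
    x, = params
    return (x - 2.0) ** 2        # toy 1-D objective with its minimum at x = 2

result = gp_minimize(objective, [(-5.0, 5.0)], n_calls=20, random_state=0)
print(result.x, result.fun)      # best parameter and best objective value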
PySpark third-party dependencies
spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--num-executors 1 \
--queue default \
--conf spark.yarn.dist.archives=hdfs:///user/xxx/conda_env.zip#python36 \
--conf spark.pyspark.driver.python=./python36/conda_env/bin/python \
Running MySQL statements from the shell
mysql -h IP --port=PORT --database=DATABASE -uUSER -pPASSWORD -e "select * from hello; " > hello.txt
Running a Scala/Java jar with java
Command: java -Djava.ext.dirs=<directory containing the jars> com.hello arg1 arg2
The jar directory needs to include scala-library-2.11.8.jar and the other required dependencies.
Administrative division abbreviations (including alternative names)
"京"->"北京市", "津"->"天津市", "沪"->"上海市", "渝"->"重庆市", "蒙"->"内蒙古自治区", "新"->"新疆维吾尔自治区", "藏"->"西藏自治区", "宁"->"宁夏回族自治区", "桂"->"广西壮族自治区", "港"->"香港特别行政区", "澳"->"澳门特别行政区", "黑"->"黑龙江省", "吉"->"吉林省", "辽"->"辽宁省", "晋"->"山西省", "冀"->"河北省", "青"->...
Sorting a 2D array in Java
Arrays.sort(envelopes, new Comparator<int[]>() {
    public int compare(int[] o1, int[] o2) {
        if (o1[0] == o2[0]) {
            // if the first elements are equal, compare the second elements
            return o1[1] - o2[1];
        } else {
            // otherwise sort by the first element in ascending order
            return o1[0] - o2[0];
        }
    }
});
Common date operations in Linux
# Get the date a few days ago
date -d "20210812 -3 days" +"%Y%m%d"
# Get the date a few months ago
date -d "20210812 -3 month" +"%Y%m%d"
# Get the month a few months ago
first=`date -d "20210803" +"%Y%m"`
month=`date -d "${first}01 -3 month" +"%Y%m"`
Python Spark submission template
spark-submit \
--master yarn \
--deploy-mode mode \
--driver-memory 2g \
--num-executors 30 \
--executor-memory 6G \
--executor-cores 4 \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocatio
Starting pyspark
pyspark --master yarn \
--deploy-mode client \
--conf spark.default.parallelism=240 \
--queue queue \
--driver-memory 2G \
--executor-memory 6G \
--executor-cores 4 \
--num-executors 30
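When the shell is not convenient, roughly the same resources can be requested from inside a Python script via SparkSession.builder; the app name below is a placeholder, and the master/queue are assumed to come from the environment or the submit command.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('demo')
         .config('spark.executor.memory', '6g')
         .config('spark.executor.cores', '4')
         .config('spark.executor.instances', '30')
         .config('spark.default.parallelism', '240')
         .getOrCreate())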