Original: Querying rows that exist in table A but not in table B
SELECT t1.name FROM table1 t1 LEFT JOIN table2 t2 ON t2.name = t1.name WHERE t2.name IS NULL
2020-11-26 17:31:33 161
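The LEFT JOIN ... IS NULL query above is an anti-join: it keeps the rows of table1 that have no match in table2. A minimal sketch for trying it out, assuming an in-memory SQLite database and made-up sample names (neither is part of the original post):

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the post).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO table2 VALUES ('bob');
""")

# Same anti-join as in the post; ORDER BY is added only to make the output stable.
query = """
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    WHERE t2.name IS NULL
    ORDER BY t1.name
"""
only_in_table1 = [row[0] for row in conn.execute(query)]
conn.close()

print(only_in_table1)  # names present in table1 but absent from table2
```

Rows of table1 without a partner get NULL in every t2 column after the LEFT JOIN, which is what the WHERE clause filters on.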
Original: Dividing the counts of two tables in Hive
select 100 * different / total from (select count(*) as different from a, b where ........) t1, (select count(*) as total from a where .......) t2
2020-11-26 10:45:57 2651 1
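The query above cross-joins two single-row subqueries so that one count can be divided by the other. The same arithmetic can be sketched in pandas. The WHERE conditions are elided in the post, so the "same id, different value" rule and the sample frames below are purely hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical sample tables; the post does not show the real schemas.
a = pd.DataFrame({"id": [1, 2, 3, 4], "value": [10, 20, 30, 40]})
b = pd.DataFrame({"id": [1, 2, 3, 4], "value": [10, 25, 30, 45]})

# Hypothetical stand-in for the elided WHERE clause:
# count rows whose id matches but whose value differs.
joined = a.merge(b, on="id", suffixes=("_a", "_b"))
different = int((joined["value_a"] != joined["value_b"]).sum())

# Denominator: total number of rows in a.
total = len(a)

# Guard against an empty table before dividing.
percent = 100 * different / total if total else float("nan")
print(f"{percent:.1f}% of rows differ")
```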
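Original: pandas commands for reading Excel, txt, csv, pkl and other files
Reading a txt file with pandas. Before reading a txt file, check whether it follows a basic format, that is, whether it uses a delimiter such as \t or a comma. A typical txt file looks like the example below, which is space-delimited:
1 2019-03-22 00:06:24.4463094 中文测试
2 2019-03-22 00:06:32.4565680 需要编辑encoding
3 2019-03-22 00:06:32.6835965 ashshsh
4 2017-03-22 00:06:32.8041945 eggg
Reading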
2020-11-20 17:01:22 421
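One way to load the space-delimited sample above is `pandas.read_csv` with a whitespace separator. This is a sketch, not necessarily the approach the truncated post goes on to use; the column names are assumptions, and the text is read from an in-memory string so the block runs without a file:

```python
import io

import pandas as pd

# The sample rows from the post, held in memory instead of a file on disk.
raw = (
    "1 2019-03-22 00:06:24.4463094 中文测试\n"
    "2 2019-03-22 00:06:32.4565680 需要编辑encoding\n"
    "3 2019-03-22 00:06:32.6835965 ashshsh\n"
    "4 2017-03-22 00:06:32.8041945 eggg\n"
)

# sep=r"\s+" splits on any run of whitespace, so the date and the time
# land in separate columns. The column names are hypothetical.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    header=None,
    names=["id", "date", "time", "text"],
)
print(df)
```

When reading from disk, pass the file path instead of the `StringIO` object and set `encoding=` (for example `"utf-8"` or `"gbk"`) to match how the file was saved, since it contains Chinese text.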
Original: Polynomial regression: parameter search
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.model_selection.
2020-10-22 16:09:58 248
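The snippet above is cut off after its imports, so the author's actual search procedure is not visible. One common way to search the polynomial degree, sketched here under the assumption of synthetic quadratic data and a `GridSearchCV` over a pipeline (both are choices made for this example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (hypothetical): a noisy quadratic curve.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(scale=0.1, size=60)

# Polynomial expansion followed by ordinary least squares.
pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("linreg", LinearRegression()),
])

# Shuffle the folds because X is sorted; otherwise each fold would
# test the model on a range of X it never saw during fitting.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid={"poly__degree": [1, 2, 3, 4, 5]},
    scoring="neg_mean_squared_error",
    cv=cv,
)
search.fit(X, y)

best_degree = search.best_params_["poly__degree"]
print("best degree:", best_degree)
print("cross-validated MSE:", -search.best_score_)
```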
Original: Line chart
import matplotlib.pyplot as plt
import pandas as pd
movie_pd = pd.read_csv('IMDB_Movie.csv', header=0)
movie_count = movie_pd.groupby('Year').size().reset_index(name='Count')
movie_count.plot(x='Year', y='Count', color='b', marker='o', legend.
2020-10-13 10:49:32 109
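The groupby-then-plot pattern in the snippet above can be tried without IMDB_Movie.csv. In this sketch the small inline DataFrame, the `legend=False` argument, the axis label, and the non-interactive Agg backend are all assumptions, since the original line is truncated at `legend`:

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the contents of IMDB_Movie.csv.
movie_pd = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F"],
    "Year": [2014, 2014, 2015, 2016, 2016, 2016],
})

# Count movies per year, as in the post.
movie_count = movie_pd.groupby("Year").size().reset_index(name="Count")

# legend=False is an assumption; the original argument is cut off.
ax = movie_count.plot(x="Year", y="Count", color="b", marker="o", legend=False)
ax.set_ylabel("Number of movies")

# Write the figure to an in-memory buffer instead of a file.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
```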
Original: lgb
# importing libraries
import numpy as np
from collections import Counter
import pandas as pd
import lightgbm as lgb
import joblib
from sklearn.datasets import load_breast_cancer, load_boston, load_wine  # note: load_boston was removed in scikit-learn 1.2
from sklearn.model_selection import train_test_split
fro.
2020-10-10 14:28:49 180
Original: Plotting a confusion matrix
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
y_true = ['Cat', 'Dog', 'Rabbit', 'Cat', 'Cat', 'Rabbit']
y_pred = ['Dog', 'Dog', 'Rabbit', 'Dog', 'Dog', 'Rabbit']
classes = ['Cat', 'Dog', 'Rabbit']
confusio.
2020-09-23 18:06:16 355
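The post's snippet stops before the plotting code. A minimal sketch of how the matrix for the labels above could be computed and drawn, assuming a plain `imshow` heatmap with a "Blues" colormap and the off-screen Agg backend (the styling is this example's choice, not the author's):

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ['Cat', 'Dog', 'Rabbit', 'Cat', 'Cat', 'Rabbit']
y_pred = ['Dog', 'Dog', 'Rabbit', 'Dog', 'Dog', 'Rabbit']
classes = ['Cat', 'Dog', 'Rabbit']

# Rows are true labels, columns are predicted labels, ordered as in `classes`.
cm = confusion_matrix(y_true, y_pred, labels=classes)

fig, ax = plt.subplots()
image = ax.imshow(cm, cmap="Blues")
fig.colorbar(image, ax=ax)
ax.set_xticks(np.arange(len(classes)))
ax.set_xticklabels(classes)
ax.set_yticks(np.arange(len(classes)))
ax.set_yticklabels(classes)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")

# Write each count into its cell.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Passing `labels=classes` fixes the row and column order, which keeps the tick labels aligned with the matrix.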
Original: RFE feature selection
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR
import pandas
from sklearn.preprocessing import scale
data = pandas.read_csv("iris.csv")
print(data)
print(data.shape)
X = data.iloc[:,0:-.
2020-09-18 14:07:28 1371 1
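Recursive feature elimination with cross-validation (`RFECV`) repeatedly drops the weakest features and keeps the subset that scores best. The snippet above is truncated before the selector is fitted, so this sketch uses the `make_friedman1` generator the post imports rather than iris.csv; the sample size, step, and fold count are assumptions:

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# Synthetic regression data: 10 features, only the first 5 influence y.
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# RFECV needs an estimator that exposes coef_ or feature_importances_;
# a linear-kernel SVR provides coef_.
estimator = SVR(kernel="linear")
selector = RFECV(estimator, step=1, cv=5)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("support mask:", selector.support_)
print("ranking (1 = kept):", selector.ranking_)
```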
Original: Disabling scientific notation in pandas
import numpy as np
import pandas as pd
np.set_printoptions(suppress=True)
pd.set_option('display.float_format', lambda x: '%.2f' % x)  # show numbers plainly instead of in scientific notation
train_df.describe()
2020-09-04 14:54:32 3772
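The two settings above can also be applied temporarily, which avoids changing display options for the rest of a session. A small sketch with a hypothetical `train_df` (the post does not show its data), using `pd.option_context` and `np.printoptions` as scoped alternatives to the global calls:

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing very large and very small values.
train_df = pd.DataFrame({"amount": [1234567.891, 0.000012, 98765432.1]})

# Same formatter as in the post, but only active inside the with-block.
with pd.option_context("display.float_format", lambda x: "%.2f" % x):
    summary_text = train_df.describe().to_string()

# NumPy equivalent: suppress=True forces fixed-point notation.
with np.printoptions(suppress=True):
    array_text = str(np.array([0.000012, 1234567.891]))

print(summary_text)
print(array_text)
```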
Original: Tuning xgb parameters with GridSearch
import xgboost as xgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
train_data = pd.read_csv('train.csv')  # read the data
y = train_data.pop('30').values  # pop the label column y out of the training data to use as the training target; here '30' is the label.
2020-08-24 10:16:52 424
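The snippet above ends before the grid search itself. The sketch below follows the same steps (pop the label column, split, search, score with ROC AUC) but swaps in scikit-learn's `GradientBoostingClassifier` so that it runs without xgboost installed. The synthetic data and the parameter grid are assumptions; where xgboost is available, `xgb.XGBClassifier` follows the scikit-learn estimator interface and could take the stand-in's place.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for train.csv, with the label stored in a column named '30'.
features, labels = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
train_data = pd.DataFrame(features, columns=[str(i) for i in range(8)])
train_data["30"] = labels

# As in the post: pop removes the label column and returns it.
y = train_data.pop("30").values
X = train_data.values

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid; the post's real grid is not shown.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for an xgboost model
    param_grid={"max_depth": [2, 3], "n_estimators": [50, 100]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
valid_auc = roc_auc_score(y_valid, search.predict_proba(X_valid)[:, 1])
print("best params:", search.best_params_)
print("validation AUC:", round(valid_auc, 3))
```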
pfm_train_without_pca.csv
2020-08-24