Competitions
junjian Li
If the road ahead is unclear, look down at your feet.
NeZha pretraining backup
build_model_and_tokenizer(args)
def build_model_and_tokenizer(args):
    tokenizer = BertTokenizer.from_pretrained(args.vocab_path)
    model_config = NeZhaConfig.from_pretrained(args.pretrain_model_path)
    model = NeZhaForMaskedLM.from_pretrained(pret...
Original · 2022-01-15 00:37:30 · 838 views · 0 comments
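pandas quick reference
Check whether each value in a pandas column appears in a known list A, and keep only the rows whose value is in A.
train_o = train[train['compare'].isin(list_a)]
A row is kept if its train['compare'] value is in list_a and dropped otherwise.
Original · 2021-12-19 12:52:46 · 777 views · 0 comments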
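A small sketch of the same filter on made-up data, to show what the mask looks like and how to get the complement. The column name 'compare' and list_a follow the snippet above; the values and the extra 'value' column are only for illustration.

```python
import pandas as pd

# Toy data; only the names 'compare' and list_a come from the snippet above.
train = pd.DataFrame({
    "compare": ["a", "b", "c", "a", "d"],
    "value": [1, 2, 3, 4, 5],
})
list_a = ["a", "c"]

# Boolean mask: True where the row's 'compare' value appears in list_a.
mask = train["compare"].isin(list_a)

train_o = train[mask]      # rows whose value is in list_a
train_rest = train[~mask]  # complement: rows whose value is not in list_a
```

Negating the mask with ~ is the usual way to express "not in" without a second call.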
Target Encoding
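Translated from the Kaggle tutorial "Target Encoding".
1. Introduction
Target encoding applies to categorical features. Like one-hot encoding, it turns a categorical feature into a numeric one; the difference is that target encoding uses the labels to build the encoding. This is what we mean by supervised feature engineering.
2. Code
# using target encoding
# Tutorial: https://www.kaggle.com/ryanholbrook/target-encoding
def target_encoding(name, df, m=1...
Original · 2021-12-12 11:49:14 · 3719 views · 0 comments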
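Since the excerpt above is cut off, here is a minimal sketch of one common form of target encoding, the smoothed (m-estimate) mean. It is not the body of the truncated target_encoding function: the name smoothed_target_encode, its arguments, and the toy 'city'/'label' data are assumptions made for illustration.

```python
import pandas as pd

def smoothed_target_encode(train, column, target, m=1.0):
    """Return a Series mapping each category to a smoothed target mean.

    Hypothetical helper: blends the per-category mean with the global mean,
    giving the category mean more weight the more rows it has.
    """
    prior = train[target].mean()  # global target mean
    stats = train.groupby(column)[target].agg(["mean", "count"])
    weight = stats["count"] / (stats["count"] + m)
    return weight * stats["mean"] + (1 - weight) * prior

# Toy data, made up for illustration.
train = pd.DataFrame({
    "city": ["x", "x", "x", "y"],
    "label": [1, 1, 0, 0],
})

encoding = smoothed_target_encode(train, "city", "label", m=1.0)

# Categories not seen when fitting fall back to the global mean.
train["city_te"] = train["city"].map(encoding).fillna(train["label"].mean())
```

Fitting the encoding on the same rows it is applied to leaks the label into the feature, so in practice the mapping is usually computed on a held-out split or out of fold.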
Saving and loading a FeatureUnion
Saving and loading a sklearn.pipeline.FeatureUnion.
Saving:
import pickle
with open("./models/toxic_clear_pipeline.pipeline", "wb") as f:
    pickle.dump(pipeline, f)
Loading:
with open("./models/toxic_clear_pipeline.pipeline", "rb") as f:
    pipeline = pickle.load(f)
Original · 2021-12-02 20:13:28 · 107 views · 0 comments
Fixing the random seed; F1 threshold search and an F-0.3 implementation
def f_score_search(df):
    t0 = 0.01
    v = 0.001
    best_t = t0
    best_f = 0
    best_p, best_r = 0, 0
    for step in range(950):
        curr_t = t0 + step * v
        p, r, curr_f = f_beta(df['pred'], df['label'], threshold=curr_t)
        ...
Original · 2021-11-25 11:18:08 · 570 views · 0 comments
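The excerpt stops before the comparison step and never shows f_beta, so the following is only a guess at how such a search could be completed. The f_beta helper here is hypothetical (it binarizes scores at the threshold and uses the standard F-beta formula with beta = 0.3), and search_threshold and the toy scores are likewise assumptions.

```python
import numpy as np

def f_beta(pred, label, threshold, beta=0.3):
    """Hypothetical helper: precision, recall and F-beta at a threshold."""
    pred = np.asarray(pred) >= threshold
    label = np.asarray(label).astype(bool)
    tp = np.sum(pred & label)
    p = tp / pred.sum() if pred.sum() else 0.0
    r = tp / label.sum() if label.sum() else 0.0
    denom = beta ** 2 * p + r
    f = (1 + beta ** 2) * p * r / denom if denom else 0.0
    return p, r, f

def search_threshold(pred, label, start=0.01, step=0.001, n_steps=950):
    """Grid search over thresholds, keeping the first one with the best F-beta."""
    best_t, best_f = start, -1.0
    for i in range(n_steps):
        t = start + i * step
        _, _, f = f_beta(pred, label, threshold=t)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy out-of-fold scores and labels, made up for illustration.
scores = [0.1, 0.4, 0.35, 0.8, 0.7]
labels = [0, 0, 1, 1, 1]
best_t, best_f = search_threshold(scores, labels)
```

With beta below 1 the score weights precision more heavily than recall, so the chosen threshold tends to sit higher than the one that maximizes F1.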
TF-IDF code
Dim_tfidf = 32
X = df['post_detail'].values
tfv = TfidfVectorizer(max_df=0.6, min_df=6, sublinear_tf=False)
tfv.fit(X)
X_tfidf = tfv.transform(X)
svd = TruncatedSVD(n_components=Dim_tfidf)
svd.fit(X_tfidf)
X_svd = svd.transform(X_tfidf)
for i in range(...
Original · 2021-09-29 13:38:03 · 107 views · 0 comments
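A self-contained version of the same TF-IDF plus TruncatedSVD idea on a toy corpus. The excerpt is cut off at the final loop, so the loop below, which writes each SVD component to its own column, is only one common way to finish it; the column names tfidf_svd_{i}, the toy texts, and the small n_dims value are assumptions.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

n_dims = 2  # small toy value; the excerpt above uses 32

# Toy corpus, made up for illustration.
df = pd.DataFrame({"post_detail": [
    "cheap flights to paris",
    "cheap hotels in paris",
    "python pandas tutorial",
    "pandas dataframe tutorial",
]})

# Default max_df/min_df here, so the tiny corpus keeps its whole vocabulary.
tfv = TfidfVectorizer()
X_tfidf = tfv.fit_transform(df["post_detail"])

# Reduce the sparse TF-IDF matrix to n_dims dense components.
svd = TruncatedSVD(n_components=n_dims, random_state=0)
X_svd = svd.fit_transform(X_tfidf)

# Store each component as its own feature column (names are hypothetical).
for i in range(n_dims):
    df[f"tfidf_svd_{i}"] = X_svd[:, i]
```

Settings such as min_df=6 from the excerpt only make sense on a real corpus; on a handful of documents they would filter out every term.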