Repost: Can GBDT Be Replaced by Deep Learning? TabNet
Paper reading: Can GBDT be replaced by deep learning? TabNet (简书); Who is the king of tabular-data models in 2021? (知乎, zhihu.com)
2021-11-24 11:11:56 383
Repost: Who Is the King of Tabular-Data Models in 2021?
Last time, while discussing autoML frameworks, I briefly mentioned model choices that perform well on tabular data (the dominant data form in business problems). I recently came across several good articles on Twitter, so here I expand on the topic a little. To my current understanding, the mainstream model choices for tabular data fall into two families: tree models (GBDT, random forests, etc.) and neural networks (from MLPs to various complex variants). This post describes and compares these two families. On analyzing Kaggle competitions: if we only consider model accuracy, Kaggle competitions are one of the best tests. For that I recommend two series of articles by 砍手豪: No free l...
2021-11-17 11:12:35 1686
Repost: Knowledge Distillation
知识蒸馏 (Knowledge Distillation), AI Flash, CSDN blog
2021-11-09 19:14:07 138
Repost: Knowledge Distillation: Principles, Algorithms, Applications
Large-scale machine learning and deep learning models are increasingly common. For instance, GPT-3 is trained on 570 GB of text and consists of 175 billion parameters. However, whilst training large models helps improve state-of-the-art performance, depl..
2021-11-09 19:10:52 624
Original: Understanding Domain Adaptation
Note — I assume the reader has some basic knowledge of neural networks and how they work. Domain adaptation is a field of computer vision where our goal is to train a neural network on a source dataset and secure a good accuracy on the target dataset which ...
2021-11-09 19:07:57 325
Repost: Multi-Task Learning in Recommender Systems (MMOE/ESMM/PLE)
Alternative Training; Multi-task learning in recommender systems (MMOE/ESMM/PLE), 知乎
2021-11-02 17:41:41 439
Repost: Multi-Task Learning with Deep Neural Networks: A Survey
https://arxiv.org/abs/2009.09796
2021-09-27 14:57:08 503
Repost: Over-smoothing in Graph Neural Network (GNN) Training
How to solve the over-smoothing problem in GNN training? (知乎) Author: 日知, source: https://www.zhihu.com/question/346942899/answer/835222364. Not all graph neural networks suffer from over-smoothing: for example, models based on RandomWalk + RNN, or on attention, mostly do not have this problem and can safely be stacked deep; only some graph convolutional networks are affected.
2021-09-17 18:39:52 1559
Repost: Contrastive Representation Learning
Contrastive Representation Learning: https://lilianweng.github.io/lil-log/2021/05/31/contrastive-representation-learning.html The main idea of contrastive learning is to learn representations such that similar samples stay close to each other, while dis...
2021-09-14 20:18:00 542
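The core idea above, keeping similar samples close and dissimilar ones apart, can be sketched as a minimal InfoNCE-style loss in plain numpy (the function name, toy data, and temperature are illustrative assumptions, not from the linked post):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Minimal InfoNCE: each anchor's positive is the same-index row in
    `positives`; every other row in the batch acts as a negative."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    # cross-entropy with the diagonal as the correct "class"
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned = info_nce(x, x + 0.01 * rng.normal(size=(8, 16)))  # positives near anchors
shuffled = info_nce(x, rng.normal(size=(8, 16)))            # unrelated "positives"
print(aligned < shuffled)  # aligned pairs yield the lower loss
```

Minimizing this loss pulls each anchor toward its positive and away from the in-batch negatives, which is exactly the behavior the excerpt describes.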
Repost: Graph Neural Networks – Libraries, Tools, and Learning Resources
Graph Neural Networks - Libraries, Tools, and Learning Resources - neptune.ai
2021-09-14 18:44:38 67
Original: Density Peak Clustering (DPCA)
1. Background: The density peaks algorithm ("Clustering by fast search and find of density peaks") was proposed by Alex Rodriguez and Alessandro Laio in 2014 and published in Science. The paper describes a density-based clustering method; the main idea of density-based clustering is to find high-density regions separated by low-density regions. The density peaks algorithm (DPCA) is based on...
2021-07-06 11:54:17 5205 6
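The two quantities the paper builds on can be sketched in a few lines of numpy: a local density rho per point, and a distance delta to the nearest higher-density point; cluster centers are points where both are large. The cutoff dc and the toy blobs below are illustrative assumptions:

```python
import numpy as np

def density_peaks(points, dc):
    """Toy version of Rodriguez & Laio (2014): returns (rho, delta).
    rho[i]   = number of other points within cutoff distance dc
    delta[i] = distance to the nearest point with higher density
               (for the global density peak: distance to the farthest point)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    rho = (d < dc).sum(axis=1) - 1          # exclude the point itself
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()
    return rho, delta

# two well-separated blobs: each blob's densest point should stand out
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
rho, delta = density_peaks(pts, dc=0.5)
centers = np.argsort(rho * delta)[-2:]      # decision-graph score rho * delta
print(sorted(centers))
```

Plotting delta against rho gives the paper's "decision graph"; points in the upper-right corner are the candidate centers.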
Repost: EXPLICITLY OPTIMIZING ON CAUSAL EFFECTS VIA THE CAUSAL RANDOM FOREST: A PRACTICAL INTRODUCTION AND TU...
In this post, I argue for and demonstrate how to train a model optimized on a treatment's causal effect. This involves predicting the lift a treatment is expected to have over the control, which is defined as the difference in an outcome Y between treatmen...
2021-06-17 20:19:41 756
Repost: A First Look at Cost-Sensitive Learning: Designing Biased Loss Functions
Reposted from https://www.cnblogs.com/LittleHann/p/10587512.html
2021-06-16 13:18:57 2638
Repost: Understanding Domain Adaptation
Note — I assume the reader has some basic knowledge of neural networks and how they work. Domain adaptation is a field of computer vision where our goal is to train a neural network on a source dataset and secure a good accuracy on the target dataset which ...
2021-06-16 13:15:01 224
Repost: What's the Difference Between RMSE and RMSLE?
https://medium.com/analytics-vidhya/root-mean-square-log-error-rmse-vs-rmlse-935c6cc1802a Introduction: There have been many evaluation metrics for regression problems, and Root Mean Square Error, or RMSE for short, has been among the "got...
2021-05-21 13:49:39 729
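The distinction the linked post discusses can be shown numerically: RMSE penalizes absolute error symmetrically, while RMSLE works on log1p-transformed values, so it measures relative error and penalizes under-prediction more than over-prediction. The toy numbers below are illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def rmsle(y_true, y_pred):
    # log1p keeps the metric defined at 0 and measures *relative* error
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

y_true = np.array([100.0, 1000.0])
over  = np.array([150.0, 1050.0])   # both predictions off by +50
under = np.array([50.0, 950.0])     # both predictions off by -50

print(rmse(y_true, over) == rmse(y_true, under))    # RMSE: identical penalty
print(rmsle(y_true, over) < rmsle(y_true, under))   # RMSLE: under-prediction costs more
```

Note that an error of 50 on a true value of 100 moves the log far more than the same error on 1000, which is why RMSLE suits targets spanning several orders of magnitude.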
Original: Bayesian Personalized Ranking (BPR) for Recommendation
https://www.biaodianfu.com/bpr.html https://www.jianshu.com/p/fd3081abf951 https://www.jianshu.com/p/eb54c6a5d08b https://blog.csdn.net/qq_38861305/article/details/100942019
2021-03-30 11:27:58 146
Repost: Deep Domain Adaptation in Computer Vision
During the last decade, the field of Computer vision has made huge progress. This progress is mostly due to the undeniable effectiveness of Convolutional Neural Networks (CNNs). CNNs allow for very precise predictions if trained with high-quality annota...
2021-03-15 11:27:04 451
Original: Learning to Rank in Recommender Systems
https://lumingdong.cn/learning-to-rank-in-recommendation-system.html
2021-03-15 11:08:22 135
Repost: Beyond Predictive Models: The Causal Story Behind Hotel Booking Cancellations
Understanding why hotel bookings are cancelled using Microsoft DoWhy in Python. Why did Medium/LinkedIn or any other platform recommend this post to you? More importantly, what piqued your interest and made you click on this post? Was...
2021-03-12 15:18:29 367
Repost: Causal Inference: Trying to Understand the Question of Why
Why are you reading this article?Why did you choose to learn about causal inference? Why are you thinking that this is a really weird way to start an article? Who knows. A more interesting question to ask is why can we, as humans, think about and understa.
2021-03-12 15:03:33 670
Original: Creating New Columns in pandas with loc
A very handy pandas idiom using the loc, iloc, ix, etc. indexers, noted here: df.loc[condition, new_column] = initial_value. If the new column name is an existing column name, the existing column is modified in place. import pandas as pd; import numpy as np; data = pd.DataFrame(np.random.randint(0,100,40).reshape(10,4), columns=list('abcd')); print(data); data.loc[data.d >= 50, '...
2021-03-03 12:10:18 828 1
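Since the snippet above is cut off, here is a self-contained sketch of the same df.loc[condition, new_column] = value pattern (the column names and threshold are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"d": [10, 60, 45, 90]})

# df.loc[condition, new_column] = value creates the column on the fly;
# rows where the condition is False are first filled with NaN,
# so a second assignment covers the complementary rows
df.loc[df.d >= 50, "flag"] = 1
df.loc[df.d < 50, "flag"] = 0
# df["flag"] is now [0.0, 1.0, 0.0, 1.0] (float64, because NaN appeared first)
```

The float dtype is a side effect worth knowing: the intermediate NaN forces the column to float64 even though only integers were assigned.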
Original: Chained if/else on One Line
def is_bad_fpd_7_handle(x):
    if x > 7:
        return 1
    elif x > 0:
        return -1
    elif x <= 0:
        return 0
    else:
        return np.NaN
is equivalent to:
x = np.NaN
b = 1 if x > 7 else -1 if x > 0 else 0 if x <= 0 else np.NaN
2021-03-02 15:38:02 313
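The one-line form above evaluates left to right and reads more clearly with explicit parentheses. A runnable version, using the stdlib's math.nan in place of np.NaN to stay dependency-free:

```python
import math

def label(x):
    # same logic as the if/elif chain, written as one chained
    # conditional expression (evaluated left to right)
    return 1 if x > 7 else (-1 if x > 0 else (0 if x <= 0 else math.nan))

print(label(9))    # 1
print(label(3))    # -1
print(label(-2))   # 0
```

The final math.nan branch is only reached when x is NaN itself, since NaN fails every comparison, which mirrors the original function's else clause.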
Repost: How to Train a GAN Model in Keras?
https://medium.com/dive-into-ml-ai/using-kerass-model-fit-to-train-a-gan-model-a0f02ed6d39e In this article, I present three different methods for training a discriminator-generator (GAN) model using keras (v2.4.3) on a tensorflow (v2.2.0) backend. The...
2021-03-01 11:32:31 176
Original: The tf.py_func() Function
Because TensorFlow builds a static graph, tensors have no actual values before tf.Session().run(), so while building the network you cannot branch on a tensor's value, i.e. you cannot insert if...else-style code. Second, compared with numpy arrays, TensorFlow's tensor-manipulation interface is not very flexible, which reduces TensorFlow's overall flexibility. From the programming experience I have accumulated in a year of using TensorFlow, an important way to extend the flexibility of a TensorFlow program is the tf.py_func interface. API walkthrough...
2021-02-26 11:19:45 142
Original: Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
https://blog.csdn.net/cv_family_z/article/details/78749992 https://blog.csdn.net/cdknight_happy/article/details/102618883 https://zhuanlan.zhihu.com/p/146082763
2021-02-02 10:12:43 146 1
Original: How to Use Learning Curves to Diagnose Machine Learning Model Performance
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
2021-02-01 11:25:27 105
Repost: Diagnosing Your CNN from Training/Validation Loss Curves
Preface: In the article "Tricks for training neural networks (a complete summary)" we covered most of the techniques that can be used when training a neural network, which is effective for improving accuracy. In practice, however, once the approach is largely settled, we usually still need to debug against our own task and our own network design to get good results; it is a process of continual tuning and improvement. (Isn't alchemy the same? Adjusting recipes, temperatures, times, and so on.) So how exactly do we debug? Some of the content below comes from the CS231n course, together with many problems I have encountered myself while training...
2021-02-01 11:22:33 6363 4
Repost: Hyperparameter Tuning Tips
From https://zhuanlan.zhihu.com/p/56745640. This issue's question: can you talk about tuning tricks in deep learning? We cover the following: 1. Which parameters in deep learning need tuning? 2. When does deep learning call for tuning tricks, and how do we tune? 3. What is the general procedure for training a network? 1. So which parameters deserve our attention? Remember one thing: every parameter that needs tuning tricks is a hyperparameter! The question can therefore be restated more precisely: what hyperparameters does a neural network have? Two aspects: parameters related to network design: the neural network's...
2021-01-29 18:38:00 650
Original: Why a Deep Network's Validation Loss Can Be Lower Than Its Training Loss
1. Regularization is applied during training but not when computing the validation loss, for example L1/L2 penalty terms added to the loss function, or dropout. Regularization sacrifices training accuracy but prevents overfitting, improving validation- and test-set accuracy. If the regularization terms were also included in the validation loss, the gap between validation and training loss would shrink. 2. The training loss is computed on the fly during training, not once at the end of an epoch. In practice, data is fed to the model batch by batch; within one epoch, after each batch we compute that ba...
2021-01-28 10:34:39 1587
Repost: In Credit Risk Modeling, Must the Bins Before WOE Be Monotonic?
Today let's discuss a piece of "common knowledge" in financial risk control, the kind we take for granted but find hard to justify, as the title asks: must the bins before WOE be monotonic? Background: anyone who has built models in financial risk control should be familiar with the requirement that bins have a monotonic bad rate, especially when building an A-scorecard with logistic regression. Experienced colleagues keep telling us that variables must be monotonic: once a variable is monotonic, apply the WOE transformation, feed the result to LR as input, train briefly, and the job is done. But to be a qualified risk-modeling practitioner, knowing these routines is not enough; we need to think further about the underlying principles, or...
2021-01-11 16:38:24 2350 1
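For context, WOE for a bin is commonly defined as ln(share of goods in the bin / share of bads in the bin), so a monotonically decreasing bad rate across bins yields monotonically increasing WOE. A small pandas sketch on made-up scorecard data (the feature, bin edges, and labels are purely illustrative):

```python
import numpy as np
import pandas as pd

# toy scorecard data: one numeric feature and a 0/1 default label
df = pd.DataFrame({
    "age": [22, 25, 28, 31, 34, 38, 42, 47, 52, 55, 61, 65],
    "bad": [ 1,  1,  0,  1,  1,  0,  0,  1,  0,  0,  0,  0],
})

df["bin"] = pd.cut(df["age"], bins=[20, 30, 45, 70])

grp = df.groupby("bin", observed=True)["bad"].agg(bads="sum", total="count")
grp["goods"] = grp["total"] - grp["bads"]

# WOE = ln( %goods in bin / %bads in bin ): as bad rate falls, WOE rises
grp["woe"] = np.log((grp["goods"] / grp["goods"].sum()) /
                    (grp["bads"] / grp["bads"].sum()))
grp["bad_rate"] = grp["bads"] / grp["total"]
print(grp[["bad_rate", "woe"]])
```

With this toy binning the bad rate falls from about 0.67 to 0.2 across the three bins, and the WOE column rises accordingly, which is the monotonicity the article interrogates.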
Repost: Calibrating Classifiers with Isotonic Regression
21 December 2015. 1. Introduction: For a supervised machine-learning problem, the usual training workflow has these steps: first build a model, then train it on the training set; if there are hyperparameters, apply cross-validation on a validation set to choose them; in the end you obtain a model. Under this workflow we keep optimizing the model; once it achieves high accuracy, recall, F-score, or AUC on the test set, is the job done? Is the model's output what we actually need? Not necessarily. Given a sample, most classifiers can output a score for the sample belonging to a class, usually between 0 and 1, which we call a probability; strictly speaking, a posterior probability. Mathemat...
2021-01-11 15:18:55 559
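As a concrete sketch of the calibration step the article describes, scikit-learn's IsotonicRegression fits a non-decreasing step function mapping raw scores to calibrated probabilities. The synthetic scores and labels below are illustrative assumptions:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# toy setup: raw classifier scores and the true 0/1 labels
rng = np.random.default_rng(0)
scores = rng.uniform(size=500)
# the true hit probability is a distorted function of the score, so the
# raw scores rank samples correctly but are poorly calibrated
labels = (rng.uniform(size=500) < scores ** 3).astype(int)

iso = IsotonicRegression(out_of_bounds="clip")  # non-decreasing fit, clipped at the edges
iso.fit(scores, labels)
calibrated = iso.predict(scores)

# the isotonic output is monotone in the input score and stays in [0, 1]
order = np.argsort(scores)
print(np.all(np.diff(calibrated[order]) >= -1e-12))
```

Because isotonic regression only assumes monotonicity, it can fix arbitrary monotone distortions of the score, at the cost of needing more calibration data than the parametric (Platt/sigmoid) alternative.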
Repost: Interleaving: Netflix's Fast Online Evaluation Method for Recommender Models
This is the eighteenth article in "王喆的机器学习笔记"; today we look at model evaluation and online testing. Experienced algorithm engineers know very well that in a model's development cycle, the bulk of the work is actually feature engineering plus model evaluation and deployment. Now that machine-learning platforms are very mature, implementing and adjusting a model structure is merely a few lines of code. So anything that makes model evaluation and online A/B testing more efficient would greatly free up algorithm engineers. In this article we introduce streaming giant Netflix's secret weapon for online evaluation: Interleaving. As everyone knows, Netflix is the American streaming giant, ...
2020-12-23 16:00:12 356
Pro Go: The Complete Guide (a recent book for learning the Go language)
2023-06-19
Advanced_Programming_in_the_UNIX_Environment,_3rd
2018-11-30
Deep_Learning_Quick_Reference
2018-09-01
Convex Optimization Algorithms
2018-09-01
Guide.to.Medical.Image.Analysis.Methods.and.Algorithms
2018-09-01
Python Machine Learning Machine Learning and Deep Learning
2018-03-27
Data Structures and Algorithms Using Python and C++
2018-03-27
R_for_Data_Science
2018-03-27
Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow
2018-03-17
Approximate.Dynamic.Programming.2011
2018-01-17
Swarm Intelligence Principles Advances and Applications
2018-01-13
Reinforcement Learning With OpenAI, TensorFlow and Keras Using Python.pdf
2017-12-18
Fundamentals of Deep Learning (complete non-scanned version, 2017)
2017-12-16
Text Mining in Practice with R 2017.12
2017-12-13
Text_Mining-From_Ontology_Learning_to_Automated_Text_Processing_Applications
2017-12-13
TensorFlow machine learning reference manual, 2017
2017-11-22
Spark Big Data Processing Technology (with bookmarks, complete edition)
2017-11-12
Pattern Classification (模式分类) 11
2016-11-07
Programming Collective Intelligence (集体编程智慧)
2016-11-07