IT独白者 - CSDN Blog

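[Original] LeetCode: Longest Common Subsequence and Longest Common Substring

The Longest Common Substring and the Longest Common Subsequence of two strings differ in that a common substring must be contiguous in the original strings, whereas a subsequence only has to preserve relative order and need not be contiguous. For example, with X={a,Q,1,1} and Y={a,1,1,d,f}, the longest common subsequence of the two strings is {a,1,1}, but their longest common substring is {1,1...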

2019-08-06 10:39:13 2183 1
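
To make the distinction in the post above concrete, here is a minimal dynamic-programming sketch in Python. It is not the author's code (the excerpt is truncated before any implementation); the function names are hypothetical, and the strings `"aQ11"` and `"a11df"` stand in for X and Y.

```python
def lcs_subsequence_length(x, y):
    """Length of the longest common subsequence (order kept, gaps allowed)."""
    # dp[i][j] = LCS length of the prefixes x[:i] and y[:j]
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(x)][len(y)]


def longest_common_substring(x, y):
    """Longest common substring (must be contiguous in both strings)."""
    best_len, best_end = 0, 0
    # dp[i][j] = length of the common run that ends at x[i-1] and y[j-1]
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return x[best_end - best_len:best_end]
```

The only difference between the two tables is what happens on a mismatch: the subsequence table carries the best value forward, while the substring table resets the run to zero.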

[Repost] Close Reading of a Paper: DeepFM

A repost of the blog article https://blog.csdn.net/zynash2/article/details/79348540 . For a detailed derivation of the DNN part, see https://blog.csdn.net/zynash2/article/details/79360195

2019-06-02 10:49:36 481

[Repost] Some Go Basics

Go compilation relies on GOPATH to locate src: https://blog.csdn.net/u012210379/article/details/50443636 Go logging: http://www.yeolar.com/note/2014/12/20/glog/ ; the flags -log_dir="./logs" -v=3 set the log directory and log level. Go init: https://stackoverfl...

2019-04-17 13:54:57 375

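[Original] Example: Connecting to a MySQL Database from Python

To connect to a MySQL database from Python, import the third-party library MySQLdb. You can write a DB class that holds the host, user name, password, database name, port, and other connection details. import MySQLdb; import sys; reload(sys); sys.setdefaultencoding('utf8'); import MySQLdb.cursors; class DB(objec...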

2019-03-05 10:04:07 1610

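In CTR prediction, one approach is manual feature engineering: nonlinear features are converted into linear ones and fed to a linear model such as LR for online learning. In this process, directly one-hot encoding categorical features such as user_id and advertisement_id (usually after also taking Cartesian products of features) causes a dimensionality explosion. hashin...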

2019-01-20 20:42:09 2318 5
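
The excerpt above breaks off at "hashin...", which appears to be leading into the hashing trick. Assuming that reading, the following is a minimal sketch of feature hashing: each categorical `field=value` pair is hashed into a fixed number of buckets instead of getting its own one-hot column. The function name, the bucket count, and the use of MD5 as a stable hash are all illustrative assumptions, not taken from the post.

```python
import hashlib


def hash_features(pairs, num_buckets=2 ** 20):
    """Map (field, value) pairs to a sparse {bucket_index: count} vector."""
    vector = {}
    for field, value in pairs:
        key = f"{field}={value}".encode("utf-8")
        # MD5 is used only as a stable hash here; Python's built-in hash()
        # for strings is salted per process and would not be reproducible.
        index = int(hashlib.md5(key).hexdigest(), 16) % num_buckets
        vector[index] = vector.get(index, 0) + 1
    return vector
```

The vector width is fixed by `num_buckets` no matter how many distinct IDs appear; the price is that unrelated features can collide in the same bucket.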

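[Repost] Gradient Descent: Mini-Batch and Parallelization Methods

Introducing the problem: consider a typical supervised machine learning problem. Given m training samples S={x(i),y(i)}, a set of weights w is obtained by empirical risk minimization, so the objective to optimize over the whole training set is: [formula] where [formula] is the loss function of a single training sample (x(i),y(i)). The loss of a single sample is expressed as: [formula] Introducing L2 regularization, that is, adding [formula] to the loss function, the final loss is: [formula] Note that the loss with regularization for a single sample is (without dividing by m): [formula] Interpreting regularization: the regularization term here can...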

2019-01-20 20:13:06 1932 1

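[Repost] A Summary of AUC Calculation Methods

Classification problems in machine learning, especially binary classification, need evaluation criteria, and the most commonly used are accuracy, recall, ROC, and AUC. In practice AUC is often the criterion of choice, so knowing how to compute it matters. The most common approach is to compute the ROC curve and take the area between the curve and the X axis as the AUC. That approach is conceptually simple but is not what is used for actual computation, so another method is needed. This leads to...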

2019-01-20 20:04:47 3867
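
The excerpt above stops before naming the alternative method. One standard alternative, shown here as a sketch and not necessarily the one the reposted article goes on to describe, uses the probabilistic reading of AUC: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name is hypothetical.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct pair
    return wins / (len(pos) * len(neg))
```

This version is quadratic in the number of samples; sorting the scores once and summing the ranks of the positives gives the same value in O(n log n).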

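[Repost] A Study of the FM Algorithm with a Python Implementation

1. What is FM? FM stands for Factorization Machine. 2. Why is FM needed? (1) Feature combination is a problem that comes up in many machine learning modeling tasks. Modeling the features directly is likely to ignore the relationships between features, so building new cross features is one way of combining features to improve model performance. (2) High-dimensional sparse matrices are common in real-world engineering, and they directly lead to excessive computation and slow updates of the feature weights. Imagine a 10000*100 table in which every column...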

2019-01-16 11:45:15 1501
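
As an illustration of how a second-order factorization machine scores one sample, the sketch below computes the prediction twice: once with the explicit pairwise sum over cross features, and once with the well-known linear-time reformulation. It is not the reposted article's code; the function names and the parameter layout (`w0` bias, `w` linear weights, `v` one latent vector per feature) are assumptions made for the example.

```python
def fm_predict_naive(x, w0, w, v):
    """Second-order FM score using the explicit sum over feature pairs."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # The interaction weight is the dot product of two latent vectors.
            dot = sum(vi * vj for vi, vj in zip(v[i], v[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, v):
    """Same score in O(k * n): 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    n, k = len(x), len(v[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y
```

Because the cross weights come from shared latent vectors, a pair of features that never co-occurs in training still gets a usable interaction weight, which is what makes the model workable on sparse data.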

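[Repost] [Machine Learning] Distributed (Parallel) Implementation of LR: Theory

Logistic Regression (LR) is one of the most commonly used classification algorithms in machine learning and is widely applied across the internet industry: CTR prediction in advertising systems, conversion-rate prediction in recommender systems, spam detection in anti-spam systems, and more. LR is popular for its simple principle and broad applicability. In practice, because a single machine is limited in processing power and efficiency, training on large-scale sample data often requires that the process of solving the LR problem be...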

2019-01-06 09:35:23 2038

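[Repost] An Introduction to Feed Streams

I may be joining a new company to work on feed-stream recommendation. Before that, I want to give a brief introduction to the topic, which will also help my own later in-depth work in this area. In the internet industry, especially in today's mobile internet era, feed-stream products are very common. WeChat Moments (朋友圈) and Weibo, which we use every day, are typical feed-stream products, and image-sharing sites such as Pinterest and Huaban (花瓣网) are another form. Beyond these, many apps have a module called either "Activity" (动态), ...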

2018-12-16 14:27:56 9452

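[Original] Finding the k-th Largest Number in an Array, or the k Largest Numbers

This is a common interview question. It can be solved with the idea behind quicksort, and finding the k largest numbers follows the same approach as finding the k-th largest number. Keep partitioning until the pivot lands where every number to its right is larger than it and every number to its left is smaller, and the pivot itself is exactly the k-th largest number. public static void Kth(int[] nums,int k) { int resu...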

2018-11-17 16:17:37 2668
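
The partition idea in the post above can be sketched as a quickselect. This is a Python illustration rather than the author's Java method (which is cut off in the excerpt); the function name, the random pivot, and the choice to work on a copy are assumptions.

```python
import random


def kth_largest(nums, k):
    """Return the k-th largest element using quickselect (average O(n))."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    items = list(nums)            # work on a copy, leave the input untouched
    target = len(items) - k       # index of the answer in ascending order
    lo, hi = 0, len(items) - 1
    while True:
        # Move a random pivot to the end, then partition (Lomuto scheme).
        pivot_index = random.randint(lo, hi)
        items[pivot_index], items[hi] = items[hi], items[pivot_index]
        pivot = items[hi]
        store = lo
        for i in range(lo, hi):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[hi] = items[hi], items[store]
        if store == target:
            return items[store]
        if store < target:
            lo = store + 1        # answer lies to the right of the pivot
        else:
            hi = store - 1        # answer lies to the left of the pivot
```

When the loop stops, everything to the right of index `target` is at least as large as the answer, so that slice plus the answer itself gives the k largest numbers (in no particular order).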

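[Original] Why Does Solving an SVM Require Converting the Primal Problem into the Dual Problem?

The dual problem turns the constraints of the primal problem into equality constraints; it makes it easy to introduce kernel functions; and it changes the complexity of the problem. Solving for the weight vector w becomes solving for the coefficients a: in the primal problem the cost of solving depends on the dimensionality of the samples, that is, the dimension of w, whereas in the dual it depends only on the number of samples. Solving is also more efficient, because only the coefficients a need to be found, and a is nonzero only for the support vectors and zero for all others....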

2018-11-11 21:56:53 3823 1
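
For reference, the standard hard-margin formulation makes the points in the post above visible. The notation is an assumption chosen to match the excerpt: $m$ samples $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, and the coefficients written as $a_i$.

```latex
\begin{aligned}
\text{Primal:}\quad & \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
  \quad \text{s.t.}\quad y_i\,(w^{\top} x_i + b) \ge 1,\quad i = 1,\dots,m \\[4pt]
\text{Dual:}\quad & \max_{a}\ \sum_{i=1}^{m} a_i
  - \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} a_i a_j\, y_i y_j\, x_i^{\top} x_j
  \quad \text{s.t.}\quad \sum_{i=1}^{m} a_i y_i = 0,\quad a_i \ge 0 \\[4pt]
\text{Recovery:}\quad & w = \sum_{i=1}^{m} a_i\, y_i\, x_i
\end{aligned}
```

The data enter the dual only through the inner products $x_i^{\top} x_j$, which is why a kernel $K(x_i, x_j)$ can be substituted directly, and the number of unknowns is $m$ (one $a_i$ per sample) rather than the dimension of $w$.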

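[Original] Binary Tree Preorder (Recursive and Iterative), Inorder (Recursive and Iterative), Postorder (Iterative), and Level-Order Traversal in Java

Preorder traversal of a binary tree, recursive implementation: public List<Integer> preorderTraversal(TreeNode root) { // implemented with a stack List<Integer> list = new ArrayList<Integer>(); PreOrderTraversal(root,list...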

2018-11-11 17:16:07 1613
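
The recursive and stack-based preorder traversals named in the post title above can be sketched as follows. This is a Python illustration, not a transcription of the author's Java code; the `TreeNode` class and function names are assumptions.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def preorder_recursive(root):
    """Visit the root, then the left subtree, then the right subtree."""
    if root is None:
        return []
    return [root.val] + preorder_recursive(root.left) + preorder_recursive(root.right)


def preorder_iterative(root):
    """Same order using an explicit stack instead of the call stack."""
    result = []
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        result.append(node.val)
        # Push the right child first so the left child is popped first.
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)
    return result
```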

[Original] LeetCode Find and Replace Pattern

You have a list of words and a pattern, and you want to know which words in words match the pattern. A word matches the pattern if there exists a permutation of letters p so that after replacing ev...

2018-10-17 22:44:32 707
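
One way to check the match described above is to build the letter mapping in both directions and reject a word as soon as either direction is inconsistent. A minimal sketch follows; the function name is hypothetical, and the author's own solution is not visible in the excerpt.

```python
def find_and_replace_pattern(words, pattern):
    """Return the words whose letters map one-to-one onto the pattern."""
    def matches(word):
        if len(word) != len(pattern):
            return False
        forward, backward = {}, {}   # pattern letter -> word letter, and back
        for w, p in zip(word, pattern):
            # setdefault stores the first pairing seen and returns the stored one.
            if forward.setdefault(p, w) != w or backward.setdefault(w, p) != p:
                return False
        return True

    return [word for word in words if matches(word)]
```

The second dictionary is what enforces a permutation: without it, "ccc" would wrongly match "abb" because two pattern letters could map to the same word letter.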

[Original] LeetCode Repeated String Match

Given two strings A and B, find the minimum number of times A has to be repeated such that B is a substring of it. If no such solution, return -1. For example, with A = "abcd" and B = "cdabcdab". Re...

2018-10-16 23:21:30 467
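
A short sketch of one common approach to the problem above: repeat A just enough times to reach the length of B, then allow one extra copy to cover a match that starts partway through A. The function name is hypothetical and this is not taken from the author's solution.

```python
def repeated_string_match(a, b):
    """Minimum number of copies of a so that b is a substring, or -1."""
    if not a:
        return 0 if not b else -1
    times = -(-len(b) // len(a))      # ceil(len(b) / len(a))
    for extra in (0, 1):
        # One extra copy is enough: a match can begin at most len(a) - 1
        # characters into the first copy.
        if b in a * (times + extra):
            return times + extra
    return -1
```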

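[Repost] Machine Learning in Action (《机器学习实战》): the Bisecting k-Means Algorithm (Bisecting K-Means Clustering)

Bisecting k-means grew out of k-means as a way to address its reliance on a user-supplied number of clusters k. The basic idea: to obtain k clusters, split the set of all points into two clusters, pick one of these clusters and split it again, and continue until k clusters have been produced. Pseudocode: initialize the cluster list so that it contains the single cluster made up of all points. repeat: take a cluster from the cluster list. {run several trial bisections of the chosen cluster} for i=1 to number of trials do ...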

2018-10-16 21:49:28 5311 3

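[Repost] The CART Regression Tree Algorithm

CART is a classification and regression tree algorithm that can be used for both classification and regression. Li Hang's book Statistical Learning Methods (《统计学习方法》), however, mostly covers classification trees and does not describe regression trees in detail, so a brief introduction to CART regression trees is worthwhile. It helps with using CART for regression, and because the GBDT algorithm is also built on CART regression trees, a solid understanding of them is important. Regression tree: uses the minimum squared error criterion. The training set is D={(x1,y1), (x2,...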

2018-10-15 12:27:52 12145 1

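[Repost] A Summary of the Principles of Linear Discriminant Analysis (LDA)

In the summary of the principles of principal component analysis (PCA), we reviewed the dimensionality reduction algorithm PCA. Here we summarize another classic dimensionality reduction method, Linear Discriminant Analysis (LDA). LDA is very widely used in pattern recognition (for example face recognition, ship recognition, and other image recognition tasks), so it is worth understanding how the algorithm works. Before studying LDA, it must be distinguished from the LDA of natural language processing. In natural...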

2018-10-14 16:50:32 430

[Original] LeetCode String Compression

Given an array of characters, compress it in-place. The length after compression must always be smaller than or equal to the original array. Every element of the array should be a character (not int...

2018-10-13 08:56:15 250
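
The excerpt above is truncated before the encoding rule. The sketch below assumes the usual rule for this problem: each run of equal characters is written as the character followed by its count, and the count is omitted for runs of length 1. It uses a read pointer and a write pointer over the same list; the function name is hypothetical.

```python
def compress(chars):
    """Compress runs in place and return the new length."""
    write = 0
    read = 0
    while read < len(chars):
        start = read
        while read < len(chars) and chars[read] == chars[start]:
            read += 1                 # advance to the end of the current run
        chars[write] = chars[start]
        write += 1
        run = read - start
        if run > 1:
            for digit in str(run):    # counts of 10 or more take several slots
                chars[write] = digit
                write += 1
    return write
```

The write pointer can never overtake the read pointer, because a run of length n is encoded in at most n slots, which is what makes the in-place rewrite safe.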

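[Original] Anagram

Two words that contain the same letters in a different order are called anagrams, for example "silent" and "listen", whereas "apple" and "aplee" are not anagrams. Implement a check for whether two words are anagrams with the lowest possible algorithmic complexity. Since the problem asks for minimal time complexity, a more ordinary approach such as a hashMap would have relatively high time complexity, so a different, simpler method is needed. For string problems...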

2018-10-10 17:06:14 796
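
The excerpt above does not show which method the author settles on. One simple linear-time option, given here purely as an illustration, is a fixed array of 26 counters; it assumes both words consist only of lowercase ASCII letters, and the function name is hypothetical.

```python
def is_anagram(a, b):
    """True if b uses exactly the same letters as a (lowercase a-z assumed)."""
    if len(a) != len(b):
        return False
    counts = [0] * 26
    for ch in a:
        counts[ord(ch) - ord("a")] += 1
    for ch in b:
        index = ord(ch) - ord("a")
        counts[index] -= 1
        if counts[index] < 0:     # b has more of this letter than a does
            return False
    return True
```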

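[Repost] [Machine Learning] k-fold cross validation

Purpose of cross validation: in actual training, a model usually fits the training data well but fits data outside the training set poorly. Cross validation is used to evaluate a model's generalization ability and thereby to select a model. Basic idea of cross validation: split the original data (dataset) into groups in some way, with one part serving as the training set (train set) and the other as the validation set (validation set or test set). The model is first trained on the training set, and the validation set is then used to test its generalization error. In addition, in practice the data...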

2018-10-09 22:50:34 7433
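
The grouping step described above can be sketched as a function that yields the train and validation index lists for each of the k rounds. This is a plain-Python illustration with a hypothetical function name; it does not shuffle, so in practice the indices would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices), one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds
```

Each sample lands in the validation set exactly once, and the generalization error is typically reported as the average over the k rounds.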

[Original] LeetCode Repeated Substring Pattern

Given a non-empty string, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together. You may assume the given string consists of lowercase Engli...

2018-10-09 20:27:53 210
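
A compact way to test the property above, offered as a sketch and not as the author's solution: concatenate the string with itself, drop the first and last characters, and look for the original string inside the result. The function name is hypothetical.

```python
def repeated_substring_pattern(s):
    """True if s is two or more copies of some shorter substring."""
    # If s = p * n with n >= 2, then s reappears inside s + s at offset len(p).
    # Trimming one character from each end removes the two trivial matches.
    return s in (s + s)[1:-1]
```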

[Original] LeetCode Student Attendance Record I

You are given a string representing an attendance record for a student. The record only contains the following three characters: 'A' : Absent. 'L' : Late. 'P' : Present. A student could be re...

2018-10-09 14:42:26 295

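[Repost] The Mathematical Derivation for Chapter 5 (Logistic Regression) of Machine Learning in Action (《机器学习实战》)

In the code walkthrough of Chapter 5 (logistic regression) of Machine Learning in Action, the top of p. 79 contains the sentence "a simple mathematical derivation is omitted here". So which simple derivation was omitted? Determined to understand the algorithm thoroughly, I searched Baidu at length and finally found the relevant material, which is given here for reference. As the figure above shows, following the logistic regression algorithm and using gradient descent to find the optimum, the resulting w is given by the final update step in the figure. In Machine Learning in Action, then, understanding through the code...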

2018-10-08 22:25:07 696

[Original] LeetCode Construct String from Binary Tree

You need to construct a string consisting of parentheses and integers from a binary tree using preorder traversal. The null node needs to be represented by an empty parenthesis pair "()". And you ...

2018-10-07 22:21:43 232

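[Repost] The kNN Algorithm and a kd-Tree Implementation

Nearest neighbor and k-nearest neighbors. The picture below contains only three kinds of beans, and three of the beans are of unknown kind. How can their kinds be determined? One idea: an unknown bean is taken to be of the same kind as whichever bean it is closest to. This leads to the definition of the nearest-neighbor algorithm: to determine the class of an unknown sample, take all training samples as representative points, compute the distance from the unknown sample to every training sample, and use the class of the nearest neighbor as the sole basis for deciding the unknown sample's class. The nearest-neighbor algorithm clearly has flaws, however, as in the following example: there is an unknown shape...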

2018-10-04 15:27:54 1168
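
The brute-force version of the idea above, computing the distance to every training sample and letting the k closest ones vote, can be sketched as follows. It does not use a kd-tree, which the reposted article covers later; the function name and the Euclidean metric are assumptions, and `math.dist` requires Python 3.8 or newer.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    # Sort all training samples by Euclidean distance to the query point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With `k=1` this reduces to the plain nearest-neighbor rule from the excerpt; a larger k makes the decision less sensitive to a single noisy neighbor.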

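[Original] LeetCode All Nodes Distance K in Binary Tree

Given a binary tree (with root node root), a target node target, and an integer K, return a list of the values of all nodes at distance K from the target node. The answer may be returned in any order. Example 1: Input: root = [3,5,1,6,2,0,8,null,null,7,4], target = 5, K = 2 Output: [7,4,1] Explanation: the nodes sought are those at distance 2 from the target node (value 5)...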

2018-10-03 22:12:34 335

[Original] Maximum Depth of N-ary Tree

Given an n-ary tree, find its maximum depth. The maximum depth is the number of nodes along the longest path from the root node down to the farthest leaf node. For example, given a 3-ary tree: ...

2018-09-19 16:48:36 232
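
The definition above translates directly into a recursion: the depth of a node is one plus the largest depth among its children. A minimal sketch, with a hypothetical `Node` class and function name:

```python
class Node:
    def __init__(self, val, children=None):
        self.val = val
        self.children = children or []


def max_depth(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    # default=0 handles leaf nodes, which have no children to recurse into.
    return 1 + max((max_depth(child) for child in root.children), default=0)
```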

[Original] LeetCode Hand of Straights

Alice has a hand of cards, given as an array of integers. Now she wants to rearrange the cards into groups so that each group is size W, and consists of W consecutive cards. Return true if and only ...

2018-09-12 10:26:51 273
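
A greedy sketch for the problem above: the smallest remaining card must start a group, so walk the distinct values in increasing order and use each one to start as many groups as it still has copies. The function name is hypothetical and this is not necessarily how the author solved it.

```python
from collections import Counter


def is_n_straight_hand(hand, w):
    """True if the hand splits into groups of w consecutive cards."""
    if w <= 0 or len(hand) % w != 0:
        return False
    counts = Counter(hand)
    for card in sorted(counts):
        need = counts[card]           # groups that must start at this card
        if need == 0:
            continue                  # already used up by earlier groups
        for nxt in range(card, card + w):
            if counts[nxt] < need:    # a missing card reads as count 0
                return False
            counts[nxt] -= need
    return True
```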

[Original] LeetCode Add Strings

Given two non-negative integers num1 and num2 represented as strings, return the sum of num1 and num2. Note: The length of both num1 and num2 is < 5100. Both num1 and num2 contain only digits 0-...

2018-09-04 15:50:33 225
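
The usual way to solve the problem above is schoolbook addition from the rightmost digit with a carry. The sketch below assumes, as the full problem statement is generally understood to require, that the inputs must not be converted to integers wholesale; the function name is hypothetical.

```python
def add_strings(num1, num2):
    """Add two non-negative decimal strings digit by digit."""
    i, j, carry = len(num1) - 1, len(num2) - 1, 0
    digits = []
    while i >= 0 or j >= 0 or carry:
        d1 = ord(num1[i]) - ord("0") if i >= 0 else 0
        d2 = ord(num2[j]) - ord("0") if j >= 0 else 0
        carry, digit = divmod(d1 + d2 + carry, 10)
        digits.append(chr(digit + ord("0")))
        i -= 1
        j -= 1
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))
```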

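[Original] Maximum Contiguous Subarray Sum

Problem: given an array whose elements may be positive or negative, find the maximum sum of a contiguous subsequence. This is a classic interview question that comes up often. There are several solutions with different time complexities, and the best one uses dynamic programming. Method 1: brute force with O(n^3) time complexity. Three nested loops: the first fixes the start, the second fixes the end, and the third iterates between the start and the end. public static int maxSubA...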

2018-09-04 14:11:12 5540 1
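
The dynamic-programming solution the post above calls the best one can be sketched in O(n) as Kadane's algorithm: the best sum ending at position i is either the element alone or the element appended to the best sum ending at i-1. This is a Python illustration with a hypothetical function name, not the author's Java code.

```python
def max_subarray_sum(nums):
    """Maximum sum over all non-empty contiguous subarrays."""
    if not nums:
        raise ValueError("nums must not be empty")
    best = current = nums[0]
    for x in nums[1:]:
        # Either extend the previous run or start a new run at x.
        current = max(x, current + x)
        best = max(best, current)
    return best
```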

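[Repost] Machine Learning "Feature Encoding" Explained

1. Why encode features? The data we get is usually messy and may contain all kinds of non-numeric special symbols, such as Chinese text. The table below shows our rawest dataset. Machine learning models, however, need numeric data, because only numeric types can be computed on. So every kind of special feature value has to be encoded accordingly, which is also a process of quantification. 2. Types of feature encoding. This post mainly covers how to encode categorical features. For categorical data, we usually use two approaches to...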

2018-09-04 06:57:34 14799 2
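
The excerpt above is cut off before it names the two approaches. Two encodings commonly used for categorical features are integer (label) encoding and one-hot encoding; assuming those are the ones meant, a minimal pure-Python sketch looks like this. The function names are hypothetical.

```python
def label_encode(values):
    """Replace each category with an integer code, in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


def one_hot_encode(values):
    """Replace each category with a 0/1 vector that has a single 1."""
    codes, mapping = label_encode(values)
    width = len(mapping)
    return [[1 if i == code else 0 for i in range(width)] for code in codes]
```

Integer codes are compact but impose an artificial order on the categories; one-hot vectors avoid that at the cost of one column per category.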

[Original] LeetCode Min Cost Climbing Stairs

On a staircase, the i-th step has some non-negative cost cost[i] assigned (0 indexed). Once you pay the cost, you can either climb one or two steps. You need to find minimum cost to reach the top of ...

2018-09-03 17:13:12 156
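
A dynamic-programming sketch for the problem above. It assumes, as in the usual full statement (truncated in the excerpt), that you may start on step 0 or step 1 and that the top is one position past the last step. The function name is hypothetical.

```python
def min_cost_climbing_stairs(cost):
    """Minimum total cost to get past the last step."""
    # prev2 and prev1 hold the cheapest cost to arrive at positions i-2 and i-1.
    # Arriving at position 0 or 1 is free because either may be the start.
    prev2 = prev1 = 0
    for i in range(2, len(cost) + 1):
        prev2, prev1 = prev1, min(prev1 + cost[i - 1], prev2 + cost[i - 2])
    return prev1
```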

[Original] LeetCode Positions of Large Groups

In a string S of lowercase letters, these letters form consecutive groups of the same character. For example, a string like S = "abbxxxxzyy" has the groups "a", "bb", "xxxx", "z" and "yy". Call

2018-09-03 13:14:07 221

[Original] LeetCode Maximum Product of Three Numbers

Given an integer array, find three numbers whose product is maximum and output the maximum product. Example 1: Input: [1,2,3] Output: 6 Example 2: Input: [1,2,3,4] Output: 24 Note: The...

2018-09-02 20:46:51 198
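
A short sketch for the problem above. After sorting, the answer is either the product of the three largest values or the product of the two smallest (possibly both negative) values with the largest one. The function name is hypothetical.

```python
def maximum_product(nums):
    """Largest product obtainable from any three elements."""
    if len(nums) < 3:
        raise ValueError("need at least three numbers")
    s = sorted(nums)
    # Two negatives multiply to a positive, so the two smallest values can win.
    return max(s[-1] * s[-2] * s[-3], s[0] * s[1] * s[-1])
```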

[Original] LeetCode Contains Duplicate III

Given an array of integers, find out whether there are two distinct indices i and j in the array such that the absolute difference between nums[i] and nums[j] is at most t and the absolute difference ...

2018-09-02 19:54:37 213

[Original] LeetCode Decode String

Given an encoded string, return its decoded string. The encoding rule is: k[encoded_string], where the encoded_string inside the square brackets is being repeated exactly k times. Note that k is gua...

2018-08-29 23:37:13 413
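
The k[encoded_string] rule above nests naturally, so a stack is a common fit: on "[" save the text built so far together with the pending repeat count, and on "]" pop them and append the repeated inner text. A minimal sketch with a hypothetical function name:

```python
def decode_string(s):
    """Expand every k[...] group, including nested groups."""
    stack = []       # saved (text_before_bracket, repeat_count) pairs
    current = []     # characters of the group currently being built
    count = 0
    for ch in s:
        if ch.isdigit():
            count = count * 10 + int(ch)      # counts may have several digits
        elif ch == "[":
            stack.append((current, count))
            current, count = [], 0
        elif ch == "]":
            previous, repeat = stack.pop()
            current = previous + current * repeat
        else:
            current.append(ch)
    return "".join(current)
```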

[Original] LeetCode Next Greater Element II

Given a circular array (the next element of the last element is the first element of the array), print the Next Greater Number for every element. The Next Greater Number of a number x is the first gre...

2018-08-29 06:42:03 239
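
A monotonic-stack sketch for the circular version above: walk the array twice so that every element also gets to see the elements that wrap around before it. It assumes the usual convention, not visible in the truncated excerpt, that -1 is reported when no greater element exists. The function name is hypothetical.

```python
def next_greater_elements(nums):
    """Next greater value for each element of a circular array, or -1."""
    n = len(nums)
    result = [-1] * n
    stack = []                         # indices still waiting for a greater value
    for i in range(2 * n):
        value = nums[i % n]
        while stack and nums[stack[-1]] < value:
            result[stack.pop()] = value
        if i < n:                      # each index is pushed only on the first pass
            stack.append(i)
    return result
```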

[Original] LeetCode Next Greater Element I

You are given two arrays (without duplicates) nums1 and nums2 where nums1’s elements are a subset of nums2. Find all the next greater numbers for nums1's elements in the corresponding places of nums2....

2018-08-28 10:19:18 320
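
For the two-array version above, one common approach is to precompute the next greater value for every element of nums2 with a monotonic stack and then look up each element of nums1. The sketch assumes the usual convention of returning -1 when no greater element exists; the function name is hypothetical.

```python
def next_greater_element(nums1, nums2):
    """For each value in nums1, the first larger value to its right in nums2."""
    next_greater = {}
    stack = []                         # values still waiting for a greater value
    for value in nums2:
        while stack and stack[-1] < value:
            next_greater[stack.pop()] = value
        stack.append(value)
    # Values left on the stack have no greater element to their right.
    return [next_greater.get(value, -1) for value in nums1]
```

Using values as dictionary keys is safe here only because the problem states that the arrays contain no duplicates.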

[Original] LeetCode Daily Temperatures

Given a list of daily temperatures, produce a list that, for each day in the input, tells you how many days you would have to wait until a warmer temperature. If there is no future day for which this ...

2018-08-28 09:57:07 269
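
The waiting-days problem above is another fit for a monotonic stack: keep the indices of days that have not yet seen a warmer day, and resolve them as soon as a warmer temperature arrives. The sketch assumes that days with no warmer future day report 0, which is the usual convention but is cut off in the excerpt; the function name is hypothetical.

```python
def daily_temperatures(temperatures):
    """Days to wait for a warmer temperature, 0 if none comes."""
    result = [0] * len(temperatures)
    stack = []                         # indices of days awaiting a warmer day
    for day, temp in enumerate(temperatures):
        while stack and temperatures[stack[-1]] < temp:
            earlier = stack.pop()
            result[earlier] = day - earlier
        stack.append(day)
    return result
```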
