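[Original] Computing the weight of each word in an article: information entropy and its implementation
The information entropy of each word can be used as that word's weight. The formula is $H(W) = -\sum_{i} p_i \log p_i$, where $W$ is the word and $p_i$ is the fraction of $W$'s occurrences in which the $i$-th distinct neighboring word appears on the side being measured (left or right). For example, if an article contains "A W C" twice and "B W D" once, the left entropy of W is $-\frac{2}{3}\log\frac{2}{3} - \frac{1}{3}\log\frac{1}{3}$: A appears in 2 of the 3 occurrences (2/3) and B in only one (1/3). The right entropy of W is computed the same way. If the text were "A W C, B W C", the right entropy of W would be 0, because it reduces to $-1 \cdot \log 1 = 0$. For all words…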
2016-06-29 16:15:32 9030 4
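A minimal Java sketch of the left/right entropy computation described above. It assumes the article is already tokenized into a list of words. It uses the natural logarithm because the post does not say which base it uses. The class and method names are hypothetical.

```java
import java.util.*;

/**
 * Sketch: left/right neighbor entropy of a word in a token stream.
 * Assumptions: pre-tokenized input, natural log; names are hypothetical.
 */
class NeighborEntropy {

    // H = -sum(p * log p), where p = count / total for each distinct neighbor.
    static double entropy(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) total += c;
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            h -= p * Math.log(p);
        }
        return h;
    }

    // Counts the words found immediately left (side = -1) or right (side = +1) of target.
    static Map<String, Integer> neighbors(List<String> tokens, String target, int side) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            int j = i + side;
            if (tokens.get(i).equals(target) && j >= 0 && j < tokens.size()) {
                counts.merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    static double leftEntropy(List<String> tokens, String w) {
        return entropy(neighbors(tokens, w, -1));
    }

    static double rightEntropy(List<String> tokens, String w) {
        return entropy(neighbors(tokens, w, +1));
    }
}
```

With the post's example ("A W C" twice, "B W D" once), both sides of W have the distribution {2/3, 1/3}, so the left and right entropies are equal.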
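[Original] Identifying and extracting an article's topic sentence based on title classification
Topic sentence extraction based on title classification. The method can be described as follows: given a news report, compute the similarity between its title and the news topic-word set to judge whether the title is indicative. For an indicative title, extract the sentence in the report most similar to the title as the topic sentence; otherwise, combine multiple features to compute each sentence's importance and take the highest-scoring sentence as the topic sentence. Algorithm: 1. Build the news topic-word set. (1) For crawled news that is labeled or…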
2016-06-24 17:53:46 9669 5
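The decision flow above can be sketched as follows. Everything concrete here is an assumption rather than the post's method:

- sentences are pre-tokenized;
- similarity is bag-of-words cosine;
- the `threshold` for an "indicative" title is a free parameter;
- the fallback score (the number of topic words in a sentence) only stands in for the post's "multiple features".

```java
import java.util.*;

/**
 * Sketch of the title-classification flow. Hypothetical names; cosine similarity,
 * the threshold, and the overlap-count fallback are all assumptions.
 */
class TopicSentenceSketch {

    // Bag-of-words cosine similarity between two token collections.
    static double cosine(Collection<String> a, Collection<String> b) {
        Map<String, Integer> va = new HashMap<>(), vb = new HashMap<>();
        for (String w : a) va.merge(w, 1, Integer::sum);
        for (String w : b) vb.merge(w, 1, Integer::sum);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int v : vb.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Returns the index of the chosen topic sentence.
    static int pickTopicSentence(List<String> title, List<List<String>> sentences,
                                 Set<String> topicWords, double threshold) {
        boolean indicative = cosine(title, topicWords) >= threshold;
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < sentences.size(); i++) {
            List<String> s = sentences.get(i);
            double score;
            if (indicative) {
                score = cosine(s, title);          // sentence most similar to the title
            } else {
                score = 0;                         // placeholder for "multiple features"
                for (String w : s) if (topicWords.contains(w)) score++;
            }
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```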
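[Original] Word segmentation with CRF++: principle and implementation
How CRF word segmentation works and how to implement it. At present, the CRF model gives the best segmentation results in the industry, and CRF++ is a mature CRF implementation. Below is the process of segmenting with CRF++. 1. Preprocess the training corpus with 4-tag labeling: B marks the beginning of a word, E the end of a word, M the middle of a word, and S a single-character word. Then use Python to convert the words in the training corpus into CRF++'s input format. For example, the sentence: 海內外 關注 的 一九九七 年 七月 一 日 終於 來到…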
2016-06-22 20:58:54 8056
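A sketch of the 4-tag preprocessing step, written in Java rather than the Python the post uses. It assumes one space-separated, already segmented sentence per call. It emits the CRF++ training layout: one character per line, the character and its tag separated by a tab, and a blank line after each sentence. `BmesTagger` is a hypothetical name.

```java
/**
 * Sketch: convert one segmented sentence into CRF++ training rows with B/M/E/S tags.
 * Hypothetical helper; input words are separated by whitespace.
 */
class BmesTagger {
    static String toCrfRows(String segmentedSentence) {
        StringBuilder out = new StringBuilder();
        for (String word : segmentedSentence.trim().split("\\s+")) {
            int[] cps = word.codePoints().toArray();   // code points, so non-BMP characters stay whole
            for (int i = 0; i < cps.length; i++) {
                String tag;
                if (cps.length == 1) tag = "S";                // single-character word
                else if (i == 0) tag = "B";                    // beginning of word
                else if (i == cps.length - 1) tag = "E";       // end of word
                else tag = "M";                                // middle of word
                out.appendCodePoint(cps[i]).append('\t').append(tag).append('\n');
            }
        }
        return out.append('\n').toString();   // blank line terminates the sentence
    }
}
```

For example, `BmesTagger.toCrfRows("海內外 關注 的")` produces the rows 海/B, 內/M, 外/E, 關/B, 注/E, 的/S.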
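[Original] Spark performance tuning
There are many ways to tune Spark performance; below are some of the techniques I have used. 1. Balancing the number of RDD partitions against the number of executors. To process data fully in parallel and make full use of every executor, the number of RDD partitions and the number of executors need to be in a suitable ratio. If there are many partitions but few executors, some partitions must wait for an idle executor, so not all data can be processed in parallel at the same time. If there are few partitions but many executors…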
2016-06-21 18:33:33 6328 1
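The partition/executor trade-off above comes down to simple arithmetic. This sketch assumes each executor runs one task per core (`spark.task.cpus = 1`). A stage with P partitions on E executors of C cores each needs ceil(P / (E·C)) waves of tasks. Too many partitions add waves; too few leave task slots idle. The 2-3 tasks-per-core factor follows the recommendation in Spark's tuning guide. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope sketch of partition vs. executor balance.
 * Assumes one task per core per executor; names are hypothetical.
 */
class PartitionPlanner {
    static int taskSlots(int executors, int coresPerExecutor) {
        return executors * coresPerExecutor;
    }

    // Number of sequential task waves needed to run every partition of one stage.
    static int waves(int partitions, int executors, int coresPerExecutor) {
        int slots = taskSlots(executors, coresPerExecutor);
        return (partitions + slots - 1) / slots;
    }

    // Fraction of task slots that sit idle during the last wave.
    static double idleFractionLastWave(int partitions, int executors, int coresPerExecutor) {
        int slots = taskSlots(executors, coresPerExecutor);
        int lastWave = partitions % slots == 0 ? slots : partitions % slots;
        return 1.0 - (double) lastWave / slots;
    }

    // tasksPerCore is the 2-3 factor suggested by Spark's tuning guide.
    static int suggestedPartitions(int executors, int coresPerExecutor, int tasksPerCore) {
        return taskSlots(executors, coresPerExecutor) * tasksPerCore;
    }
}
```

For example, 100 partitions on 10 executors with 4 cores each need 3 waves, and half of the 40 slots are idle in the last one. With only 10 partitions, three quarters of the slots are never used.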
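[Original] Stanford and NLTK English phrase (collocation) extraction tools: principles and source-code notes
I. The Stanford phrase extraction tool implements four methods for collocation extraction. (1) The frequency-count-based method. This method finds contiguous collocations of length 2 or 3, so it only processes bigram and trigram corpora. The candidate phrase set is first filtered with predefined part-of-speech sequences, and phrases whose tag sequence matches none of them are discarded. The predefined POS sequences are: NN_NN, JJ_NN, VB_NN, NN_NN_NN, JJ_NN_NN, NN_…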
2016-06-12 12:07:55 11996 2
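The frequency-based method with its POS filter can be sketched as below. This is not the Stanford source. It assumes input tokens in "word/TAG" form, uses only the POS sequences recoverable from the truncated list above (tags are matched exactly), and ranks candidates by raw count. `CollocationCounter` is a hypothetical name.

```java
import java.util.*;

/**
 * Sketch: count contiguous bigrams/trigrams whose POS sequence is in a whitelist.
 * Hypothetical helper; tokens look like "word/TAG".
 */
class CollocationCounter {
    // POS sequences recoverable from the (truncated) list in the post.
    static final Set<String> PATTERNS = new HashSet<>(Arrays.asList(
            "NN_NN", "JJ_NN", "VB_NN", "NN_NN_NN", "JJ_NN_NN"));

    static Map<String, Integer> count(List<String> taggedTokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 2; n <= 3; n++) {                        // bigrams and trigrams only
            for (int i = 0; i + n <= taggedTokens.size(); i++) {
                StringJoiner words = new StringJoiner(" ");
                StringJoiner tags = new StringJoiner("_");
                for (int k = i; k < i + n; k++) {
                    String tok = taggedTokens.get(k);
                    int slash = tok.lastIndexOf('/');
                    words.add(slash >= 0 ? tok.substring(0, slash) : tok);
                    tags.add(slash >= 0 ? tok.substring(slash + 1) : "");
                }
                if (PATTERNS.contains(tags.toString())) {     // POS-sequence filter
                    counts.merge(words.toString(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```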
[Original] 328. Odd Even Linked List
Given a singly linked list, group all odd nodes together followed by the even nodes. Please note here we are talking about the node number and not the value in the nodes. You should try to do it in…
2016-06-05 18:37:28 417
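The excerpt is truncated, so the sketch below is not necessarily the author's code. It is the common one-pass, O(1)-space approach: keep two tails, one for odd positions and one for even positions, then append the even list after the odd one.

```java
class ListNode {
    int val;
    ListNode next;
    ListNode(int x) { val = x; }
}

class Solution {
    public ListNode oddEvenList(ListNode head) {
        if (head == null) return null;
        ListNode odd = head, even = head.next, evenHead = even;
        while (even != null && even.next != null) {
            odd.next = even.next;   // link the next odd-position node
            odd = odd.next;
            even.next = odd.next;   // link the next even-position node
            even = even.next;
        }
        odd.next = evenHead;        // even-position nodes follow the odd ones
        return head;
    }
}
```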
[Original] 326. Power of Three
Given an integer, write a function to determine if it is a power of three.
public class Solution { public boolean isPowerOfThree(int n) { double res = Math.log(n)/Math.log(3); ret…
2016-06-05 18:36:50 261
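The excerpt's approach divides two logarithms. Because of floating-point rounding, that ratio can land slightly off an integer for some powers of three. A sketch that stays in integer arithmetic avoids the issue:

```java
class Solution {
    public boolean isPowerOfThree(int n) {
        if (n < 1) return false;     // 3^k is always positive
        while (n % 3 == 0) n /= 3;   // strip every factor of 3
        return n == 1;               // only 3^k reduces to exactly 1
    }
}
```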
[Original] 292. Nim Game
You are playing the following Nim Game with your friend: There is a heap of stones on the table, each time one of you take turns to remove 1 to 3 stones. The one who removes the last stone will be the
2016-06-05 18:36:06 271
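A sketch of the usual closed-form answer. The first player loses exactly when n is a multiple of 4: whatever k (1 to 3) they take, the opponent takes 4 - k and keeps the heap a multiple of 4. A small dynamic-programming check is included to confirm the rule for small n.

```java
class Solution {
    public boolean canWinNim(int n) {
        return n % 4 != 0;
    }

    // Brute-force DP used only to cross-check the closed form for small n.
    static boolean canWinDp(int n) {
        boolean[] win = new boolean[n + 1];   // win[0] = false: the opponent took the last stone
        for (int i = 1; i <= n; i++)
            for (int k = 1; k <= 3 && k <= i; k++)
                if (!win[i - k]) win[i] = true;   // move to a losing position for the opponent
        return win[n];
    }
}
```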
[Original] 258. Add Digits
Given a non-negative integer num, repeatedly add all its digits until the result has only one digit. For example: Given num = 38, the process is like: 3 + 8 = 11, 1 + 1 = 2. Since 2 has on…
2016-06-05 18:35:18 280
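Two sketches follow. The first is the direct loop from the statement. The second is the constant-time digital-root formula: because a number is congruent to its digit sum modulo 9, the result is `1 + (num - 1) % 9` for `num > 0`.

```java
class Solution {
    // Repeatedly replace num by the sum of its digits until one digit remains.
    public int addDigitsLoop(int num) {
        while (num >= 10) {
            int sum = 0;
            while (num > 0) {
                sum += num % 10;
                num /= 10;
            }
            num = sum;
        }
        return num;
    }

    // Digital root in O(1).
    public int addDigits(int num) {
        return num == 0 ? 0 : 1 + (num - 1) % 9;
    }
}
```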
[Original] 242. Valid Anagram
Given two strings s and t, write a function to determine if t is an anagram of s. For example, s = "anagram", t = "nagaram", return true. s = "rat", t = "car", return false.
public class Solutio…
2016-06-05 18:34:18 256
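A counting sketch, assuming both strings contain only lowercase letters a-z. Each letter of `s` increments a counter and each letter of `t` decrements it. The strings are anagrams exactly when every counter ends at zero.

```java
class Solution {
    public boolean isAnagram(String s, String t) {
        if (s.length() != t.length()) return false;
        int[] counts = new int[26];   // assumes lowercase a-z only
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i) - 'a']++;
            counts[t.charAt(i) - 'a']--;
        }
        for (int c : counts) {
            if (c != 0) return false;   // some letter appears a different number of times
        }
        return true;
    }
}
```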
[Original] 237. Delete Node in a Linked List
Write a function to delete a node (except the tail) in a singly linked list, given only access to that node. Suppose the linked list is 1 -> 2 -> 3 -> 4 and you are given the third node with value…
2016-06-05 18:33:33 256
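Without access to the previous node, the given node cannot be unlinked directly. The standard trick is to copy the next node's value into it and unlink the next node instead. This is why the problem excludes the tail.

```java
class ListNode {
    int val;
    ListNode next;
    ListNode(int x) { val = x; }
}

class Solution {
    public void deleteNode(ListNode node) {
        node.val = node.next.val;     // take over the successor's value
        node.next = node.next.next;   // and drop the successor from the list
    }
}
```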
[Original] 231. Power of Two
Given an integer, write a function to determine if it is a power of two.
public class Solution { public boolean isPowerOfTwo(int n) { return n > 0 && (n & (n - 1)) == 0; } }
2016-06-05 18:32:23 268
[Original] 226. Invert Binary Tree
Invert a binary tree.
     4
   /   \
  2     7
 / \   / \
1   3 6   9
to
     4
   /   \
  7     2
 / \   / \
9   6 3   1
/** * Definition for a binary tree node. * public class TreeNode { *…
2016-06-05 18:31:17 239
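A recursive sketch: invert both subtrees, then swap them. Applied to the example tree, it produces the mirrored tree shown above.

```java
class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int x) { val = x; }
}

class Solution {
    public TreeNode invertTree(TreeNode root) {
        if (root == null) return null;
        TreeNode left = invertTree(root.left);     // invert each subtree first
        TreeNode right = invertTree(root.right);
        root.left = right;                         // then swap them
        root.right = left;
        return root;
    }
}
```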
[Original] 217. Contains Duplicate
Given an array of integers, find if the array contains any duplicates. Your function should return true if any value appears at least twice in the array, and it should return false if every element
2016-06-05 18:30:13 259
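A hash-set sketch: `Set.add` returns `false` when the value is already present, so the first repeated value ends the scan.

```java
import java.util.HashSet;
import java.util.Set;

class Solution {
    public boolean containsDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int x : nums) {
            if (!seen.add(x)) return true;   // x was seen before
        }
        return false;
    }
}
```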
[Original] 203. Remove Linked List Elements
Remove all elements from a linked list of integers that have value val. Example: Given: 1 --> 2 --> 6 --> 3 --> 4 --> 5 --> 6, val = 6. Return: 1 --> 2 --> 3 --> 4 --> 5
/** * Definition for sing…
2016-06-05 18:29:08 268
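A sketch using a dummy head node, so a matching first node is removed the same way as any other.

```java
class ListNode {
    int val;
    ListNode next;
    ListNode(int x) { val = x; }
}

class Solution {
    public ListNode removeElements(ListNode head, int val) {
        ListNode dummy = new ListNode(0);
        dummy.next = head;
        ListNode prev = dummy;
        while (prev.next != null) {
            if (prev.next.val == val) prev.next = prev.next.next;   // unlink the match
            else prev = prev.next;                                 // keep it and advance
        }
        return dummy.next;
    }
}
```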
[Original] 202. Happy Number
Write an algorithm to determine if a number is "happy". A happy number is a number defined by the following process: Starting with any positive integer, replace the number by the sum of the squares…
2016-06-05 18:28:20 275
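The digit-square sequence either reaches 1 or falls into a cycle that never contains 1. This sketch detects the cycle with Floyd's slow/fast pointers instead of storing seen values in a set; both approaches work.

```java
class Solution {
    // Sum of the squares of the decimal digits of n.
    private int next(int n) {
        int sum = 0;
        while (n > 0) {
            int d = n % 10;
            sum += d * d;
            n /= 10;
        }
        return sum;
    }

    public boolean isHappy(int n) {
        int slow = n, fast = next(n);
        while (fast != 1 && slow != fast) {
            slow = next(slow);          // one step
            fast = next(next(fast));    // two steps
        }
        return fast == 1;
    }
}
```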
[Original] 110. Balanced Binary Tree
Given a binary tree, determine if it is height-balanced. For this problem, a height-balanced binary tree is defined as a binary tree in which the depth of the two subtrees of every node never diffe…
2016-06-05 18:27:07 403
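A bottom-up sketch: compute subtree heights and return -1 as soon as any node is unbalanced. Each node is then visited once, instead of recomputing heights from every ancestor.

```java
class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int x) { val = x; }
}

class Solution {
    // Height of the subtree, or -1 if it is not height-balanced.
    private int height(TreeNode node) {
        if (node == null) return 0;
        int l = height(node.left);
        if (l < 0) return -1;
        int r = height(node.right);
        if (r < 0) return -1;
        if (Math.abs(l - r) > 1) return -1;   // subtree depths differ by more than 1
        return 1 + Math.max(l, r);
    }

    public boolean isBalanced(TreeNode root) {
        return height(root) >= 0;
    }
}
```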
[Original] 104. Maximum Depth of Binary Tree
Given a binary tree, find its maximum depth. The maximum depth is the number of nodes along the longest path from the root node down to the farthest leaf node.
/** * Definition for a binary tree…
2016-06-05 18:26:15 294
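A direct recursive sketch of the definition: an empty tree has depth 0. Any other tree's depth is one more than the deeper of its two subtrees.

```java
class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int x) { val = x; }
}

class Solution {
    public int maxDepth(TreeNode root) {
        if (root == null) return 0;
        return 1 + Math.max(maxDepth(root.left), maxDepth(root.right));
    }
}
```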
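[Original] Using a Spark-based CRF implementation, with source-code analysis
How CRF is implemented on Spark, with source-code analysis. CRF-Spark is built on Spark's L-BFGS implementation: since Spark's MLlib library already provides L-BFGS, CRF training calls it so that the iterations run on the Spark platform, which makes them faster and shortens training time. Source code: https://github.com/lihait/CRF-Spark. The source is written in Scala; after downloading it, package it with the sbt tool into…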
2016-06-03 21:21:57 3712 2
[Original] 70. Climbing Stairs
You are climbing a staircase. It takes n steps to reach the top. Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top?
public class Solution { pu…
2016-06-01 18:42:40 319
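The last move is either 1 step or 2 steps, so ways(n) = ways(n-1) + ways(n-2). This is a Fibonacci recurrence, computed iteratively here in O(1) space.

```java
class Solution {
    public int climbStairs(int n) {
        int prev = 1, cur = 1;          // ways(0) = 1, ways(1) = 1
        for (int i = 2; i <= n; i++) {
            int next = prev + cur;      // ways(i) = ways(i-2) + ways(i-1)
            prev = cur;
            cur = next;
        }
        return cur;
    }
}
```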
[Original] 67. Add Binary
Given two binary strings, return their sum (also a binary string). For example, a = "11", b = "1", return "100".
public class Solution { public String addBinary(String a, String b) {…
2016-06-01 18:41:49 306
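A sketch of schoolbook binary addition. Walk both strings from the least significant bit, carrying as you go, then reverse the collected bits.

```java
class Solution {
    public String addBinary(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int i = a.length() - 1, j = b.length() - 1, carry = 0;
        while (i >= 0 || j >= 0 || carry > 0) {
            int sum = carry;
            if (i >= 0) sum += a.charAt(i--) - '0';
            if (j >= 0) sum += b.charAt(j--) - '0';
            sb.append(sum % 2);   // current bit
            carry = sum / 2;      // carry into the next position
        }
        return sb.reverse().toString();
    }
}
```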
[Original] 66. Plus One
Given a non-negative number represented as an array of digits, plus one to the number. The digits are stored such that the most significant digit is at the head of the list.
import java.math.Big…
2016-06-01 18:41:01 279
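The excerpt's solution appears to go through `java.math.BigInteger`. A lighter alternative propagates the carry directly in the digit array. Only an all-9s input needs a new, longer array.

```java
class Solution {
    public int[] plusOne(int[] digits) {
        for (int i = digits.length - 1; i >= 0; i--) {
            if (digits[i] < 9) {
                digits[i]++;         // no further carry
                return digits;
            }
            digits[i] = 0;           // 9 + 1 = 10: write 0 and carry 1
        }
        int[] result = new int[digits.length + 1];   // every digit was 9
        result[0] = 1;
        return result;
    }
}
```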
[Original] 38. Count and Say
The count-and-say sequence is the sequence of integers beginning as follows: 1, 11, 21, 1211, 111221, ... 1 is read off as "one 1" or 11. 11 is read off as "two 1s" or 21. 21 is read off as…
2016-06-01 18:39:34 237
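Each term is the run-length encoding of the previous one: for every run of equal digits, write the run length followed by the digit. A sketch:

```java
class Solution {
    public String countAndSay(int n) {
        String s = "1";
        for (int k = 1; k < n; k++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.length(); ) {
                char c = s.charAt(i);
                int j = i;
                while (j < s.length() && s.charAt(j) == c) j++;   // extend the run of c
                sb.append(j - i).append(c);                        // "count", then "say"
                i = j;
            }
            s = sb.toString();
        }
        return s;
    }
}
```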
[Original] 28. Implement strStr()
public class Solution { public int strStr(String haystack, String needle) { /** int l1 = haystack.length(); int l2 = needle.length(); char[] haystack1 = haystack.toCharArray();
2016-06-01 18:38:26 226
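The excerpt shows only the start of a commented-out character-array attempt, so here is a separate brute-force sketch. It tries every start position and returns the first index where the whole needle matches. It returns -1 if there is no match and 0 for an empty needle, matching the behavior of `String.indexOf`.

```java
class Solution {
    public int strStr(String haystack, String needle) {
        int n = haystack.length(), m = needle.length();
        for (int i = 0; i + m <= n; i++) {
            int k = 0;
            while (k < m && haystack.charAt(i + k) == needle.charAt(k)) k++;
            if (k == m) return i;   // every character of needle matched
        }
        return -1;
    }
}
```

This runs in O(n·m) time in the worst case; KMP would bring it to O(n + m) at the cost of a prefix table.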
[Original] 27. Remove Element
Given an array and a value, remove all instances of that value in place and return the new length. Do not allocate extra space for another array; you must do this in place with constant memory.
2016-06-01 18:37:21 246
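A sketch with a write pointer. Every element not equal to `val` is copied forward, so the first `len` slots hold the kept elements and no extra array is needed.

```java
class Solution {
    public int removeElement(int[] nums, int val) {
        int len = 0;                         // next write position
        for (int x : nums) {
            if (x != val) nums[len++] = x;   // keep x
        }
        return len;
    }
}
```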
[Original] 26. Remove Duplicates from Sorted Array
Given a sorted array, remove the duplicates in place such that each element appears only once and return the new length. Do not allocate extra space for another array; you must do this in place with…
2016-06-01 18:36:23 338
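Because the array is sorted, duplicates are adjacent. A sketch that keeps an element only when it differs from the last kept one:

```java
class Solution {
    public int removeDuplicates(int[] nums) {
        if (nums.length == 0) return 0;
        int len = 1;                                   // nums[0] is always kept
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] != nums[len - 1]) nums[len++] = nums[i];   // new distinct value
        }
        return len;
    }
}
```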
[Original] 21. Merge Two Sorted Lists
Merge two sorted linked lists and return it as a new list. The new list should be made by splicing together the nodes of the first two lists.
/** * Definition for singly-linked list. * public cla…
2016-06-01 18:34:30 214
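An iterative splicing sketch. A dummy node collects the smaller head at each step. Once one list runs out, the rest of the other is attached as-is.

```java
class ListNode {
    int val;
    ListNode next;
    ListNode(int x) { val = x; }
}

class Solution {
    public ListNode mergeTwoLists(ListNode l1, ListNode l2) {
        ListNode dummy = new ListNode(0), tail = dummy;
        while (l1 != null && l2 != null) {
            if (l1.val <= l2.val) { tail.next = l1; l1 = l1.next; }
            else { tail.next = l2; l2 = l2.next; }
            tail = tail.next;
        }
        tail.next = (l1 != null) ? l1 : l2;   // splice the remaining nodes
        return dummy.next;
    }
}
```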
[Original] 19. Remove Nth Node From End of List
/** * Definition for singly-linked list. * struct ListNode { * int val; * ListNode *next; * ListNode(int x) : val(x), next(NULL) {} * }; */class Solution {public: ListNode* re
2016-06-01 18:33:15 296
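The post's solution for this problem is in C++. For consistency with the other entries, here is a Java sketch of the standard two-pointer approach, assuming 1 ≤ n ≤ list length. Starting from a dummy head, `fast` is moved n + 1 nodes ahead. When `fast` falls off the end, `slow` sits just before the node to delete.

```java
class ListNode {
    int val;
    ListNode next;
    ListNode(int x) { val = x; }
}

class Solution {
    public ListNode removeNthFromEnd(ListNode head, int n) {
        ListNode dummy = new ListNode(0);
        dummy.next = head;
        ListNode fast = dummy, slow = dummy;
        for (int i = 0; i <= n; i++) fast = fast.next;   // open a gap of n + 1 nodes
        while (fast != null) {
            fast = fast.next;
            slow = slow.next;
        }
        slow.next = slow.next.next;   // unlink the n-th node from the end
        return dummy.next;
    }
}
```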
[Original] 13. Roman to Integer
Given a roman numeral, convert it to an integer. Input is guaranteed to be within the range from 1 to 3999.
public class Solution { public int romanToInt(String s) { int res = 0; /…
2016-06-01 18:32:05 343
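A sketch of the usual left-to-right rule: add each symbol's value, but subtract it when a larger symbol follows, as in IV, IX, XL, XC, CD and CM.

```java
class Solution {
    private static int value(char c) {
        switch (c) {
            case 'I': return 1;
            case 'V': return 5;
            case 'X': return 10;
            case 'L': return 50;
            case 'C': return 100;
            case 'D': return 500;
            case 'M': return 1000;
            default: throw new IllegalArgumentException("not a Roman numeral symbol: " + c);
        }
    }

    public int romanToInt(String s) {
        int res = 0;
        for (int i = 0; i < s.length(); i++) {
            int v = value(s.charAt(i));
            // A smaller symbol before a larger one is subtracted (IV = 4, CM = 900).
            if (i + 1 < s.length() && v < value(s.charAt(i + 1))) res -= v;
            else res += v;
        }
        return res;
    }
}
```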
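[Original] Using the Stanford Parser for Chinese syntactic parsing
Contents: I. Two points to note when using it; II. Using Stanford Parser from the command line (1. Parsing a Chinese sentence; 2. POS tagging and generating dependency relations; 3. The graphical interface); III. The parse-tree tag set. I. Two points to note when using it: 1. Memory size for Chinese: under Run → Run Configurations → Arguments → VM arguments, add -Xmx1024m. 2. Tokenize means whether to perform word segmentation. Always select Tokeniz…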
2016-06-01 16:29:42 21565 5