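[Original] Computing a Weight for Each Word in an Article: Information Entropy and a Code Implementation

The information entropy computed for each word can serve as that word's weight. The entropy formula is $H(W) = -\sum_i p_i \log p_i$, where W is the word and $p_i$ is the fraction of W's occurrences in which the i-th distinct neighboring word appears on the given side (left or right). For example, suppose an article contains "A W C" twice and "B W D" once. The left entropy of W is then $-\frac{2}{3}\log\frac{2}{3} - \frac{1}{3}\log\frac{1}{3}$: 2/3 because A appears in 2 of the 3 occurrences, and 1/3 because B appears only once. The right entropy of W is computed the same way. If the occurrences are "A W C" and "B W C", the right entropy of W is 0, because it equals $-1 \cdot \log(1)$. For all words …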

2016-06-29 16:15:32 9030 4
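The left/right entropy described in the entry above can be sketched as follows. This is a minimal illustration, not the post's own code: the class name `NeighborEntropy`, the method names, the flat token list as input, and the use of the natural logarithm are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NeighborEntropy {
    /** Shannon entropy (natural log) of a table of neighbor counts; 0 for an empty table. */
    static double entropy(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total; // relative frequency of this neighbor
            h -= p * Math.log(p);
        }
        return h;
    }

    /** Entropy of the words found at `offset` (-1 = left, +1 = right) from each occurrence of `target`. */
    static double sideEntropy(List<String> tokens, String target, int offset) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            int j = i + offset;
            if (tokens.get(i).equals(target) && j >= 0 && j < tokens.size()) {
                counts.merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return entropy(counts);
    }

    public static void main(String[] args) {
        // "A W C" twice and "B W D" once, as in the worked example.
        List<String> tokens = Arrays.asList("A", "W", "C", "A", "W", "C", "B", "W", "D");
        System.out.println("left entropy of W:  " + sideEntropy(tokens, "W", -1));
        System.out.println("right entropy of W: " + sideEntropy(tokens, "W", +1));
    }
}
```

How the two side entropies are combined into a single weight is not visible in the truncated excerpt, so the sketch stops at the per-side values.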

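[Original] Identifying and Extracting Article Topic Sentences Based on Title Classification

Topic sentence extraction based on title classification can be described as follows: given a news report, compute the similarity between its title and the news topic word set to decide whether the title is indicative. For an indicative title, extract the sentence in the report most similar to the title as the topic sentence; otherwise, combine several features to score the importance of each sentence in the report and take the highest-scoring sentence as the topic sentence. Algorithm: 1. Build the topic word set for the news. (1) For crawled items that carry tags or …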

2016-06-24 17:53:46 9669 5
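The "indicative title" branch of the method in the entry above might look roughly like the sketch below. The excerpt does not say which similarity measure or threshold the post uses, so Jaccard overlap, the `threshold` parameter, and every name here are assumptions made for illustration. The multi-feature scoring used for non-indicative titles is not sketched because the excerpt does not list its features.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TopicSentenceSketch {
    /** Jaccard overlap between two word sets (an assumed stand-in for the post's similarity). */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** A title counts as indicative when it overlaps the topic word set strongly enough. */
    static boolean isIndicative(Set<String> titleWords, Set<String> topicWords, double threshold) {
        return jaccard(titleWords, topicWords) >= threshold;
    }

    /** Index of the sentence most similar to the title, or -1 if there are no sentences. */
    static int mostSimilarSentence(Set<String> titleWords, List<Set<String>> sentences) {
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < sentences.size(); i++) {
            double score = jaccard(titleWords, sentences.get(i));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> title = new HashSet<>(Arrays.asList("spark", "tuning"));
        Set<String> topic = new HashSet<>(Arrays.asList("spark", "tuning", "executor"));
        List<Set<String>> sentences = Arrays.asList(
                new HashSet<>(Arrays.asList("hello", "world")),
                new HashSet<>(Arrays.asList("spark", "tuning", "tips")));
        if (isIndicative(title, topic, 0.5)) {
            System.out.println("topic sentence index: " + mostSimilarSentence(title, sentences));
        }
    }
}
```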

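[Original] Word Segmentation with CRF++: Principles and Implementation

At the time of writing, CRF models give the best word segmentation results in industry, and CRF++ is a fairly mature implementation of CRF. The steps below use CRF++ for segmentation. 1. Preprocess the training corpus with 4-tag labeling: B marks the first character of a word, E the last character, M a middle character, and S a single-character word. Then use Python to convert the words in the training corpus into the CRF input format. For example, the sentence: 海內外  關注  的  一九九七  年  七月  一  日  終於  來到 …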

2016-06-22 20:58:54 8056
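The 4-tag preprocessing step in the entry above can be sketched as follows. The post says it uses Python for this step; the version below is written in Java only to match the other examples on this page, and the class and method names are made up. It assumes words are separated by ASCII whitespace and emits one `character<TAB>tag` line per character with a blank line closing the sentence, which is the column layout CRF++ reads for training data.

```java
final class FourTagLabeler {
    /** Converts one whitespace-segmented sentence into "char<TAB>tag" lines plus a blank line. */
    static String toCrfLines(String segmentedSentence) {
        StringBuilder out = new StringBuilder();
        for (String word : segmentedSentence.trim().split("\\s+")) {
            // Work on code points so characters outside the BMP are not split in half.
            int[] chars = word.codePoints().toArray();
            for (int i = 0; i < chars.length; i++) {
                String tag;
                if (chars.length == 1) {
                    tag = "S"; // single-character word
                } else if (i == 0) {
                    tag = "B"; // first character of a word
                } else if (i == chars.length - 1) {
                    tag = "E"; // last character of a word
                } else {
                    tag = "M"; // middle character
                }
                out.appendCodePoint(chars[i]).append('\t').append(tag).append('\n');
            }
        }
        return out.append('\n').toString(); // blank line marks the end of the sentence
    }

    public static void main(String[] args) {
        // Pass a segmented sentence as the first argument, e.g. the one quoted in the post.
        String sentence = args.length > 0 ? args[0] : "abc de f";
        System.out.print(toCrfLines(sentence));
    }
}
```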

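[Original] Spark Performance Tuning

There are many ways to tune Spark performance; here are some of the techniques I have used. 1. Balancing the number of RDD partitions against the number of executors. To process the data fully in parallel and keep every executor busy, the number of RDD partitions and the number of executors need to be in a suitable ratio. If there are many partitions but few executors, some partitions have to wait for an executor to become free, so the data cannot all be processed at the same time. If there are few partitions while the executo…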

2016-06-21 18:33:33 6328 1
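The trade-off in the entry above can be made concrete with a little arithmetic. The sketch below assumes the default Spark behavior of one task per partition and one task at a time per executor core; the class name and the cluster sizes in `main` are illustrative values, not figures from the post.

```java
final class PartitionWaves {
    /** Tasks that can run at once: one per executor core (assuming spark.task.cpus = 1). */
    static int taskSlots(int executors, int coresPerExecutor) {
        return executors * coresPerExecutor;
    }

    /** Number of sequential "waves" needed to run one task per partition. */
    static int waves(int partitions, int slots) {
        return (partitions + slots - 1) / slots; // ceiling division
    }

    /** Slots left with nothing to do when there are fewer partitions than slots. */
    static int idleSlots(int partitions, int slots) {
        return Math.max(0, slots - partitions);
    }

    public static void main(String[] args) {
        int slots = taskSlots(4, 4); // hypothetical cluster: 4 executors x 4 cores
        System.out.println("100 partitions -> waves: " + waves(100, slots));
        System.out.println("8 partitions   -> idle slots: " + idleSlots(8, slots));
    }
}
```

Many partitions on few slots means later waves wait for earlier ones; few partitions on many slots leaves cores idle. Both are the imbalance the post describes.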

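[Original] Stanford and NLTK English Phrase Extraction Tools: Principles and Source Code Walkthrough

I. The Stanford phrase extraction tool implements four methods for extracting collocations. (1) Frequency-based method. This method finds contiguous collocations of length 2 or 3, so it only processes the bigram and trigram corpora. The candidate phrases are first filtered with predefined part-of-speech sequences, and any phrase that does not match one of them is discarded. The predefined part-of-speech patterns are: NN_NN, JJ_NN, VB_NN, NN_NN_NN, JJ_NN_NN, NN_…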

2016-06-12 12:07:55 11996 2
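The part-of-speech filter followed by frequency counting, as described in the entry above, can be sketched like this. It is not the Stanford tool's code: the class and method names are invented, the input is assumed to be parallel arrays of words and tags, and the pattern set contains only the five patterns that are fully visible in the truncated excerpt.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PosPatternFilter {
    // Only the patterns that appear complete in the excerpt; the real list is longer.
    static final Set<String> PATTERNS = new HashSet<>(Arrays.asList(
            "NN_NN", "JJ_NN", "VB_NN", "NN_NN_NN", "JJ_NN_NN"));

    /** Counts contiguous bigrams and trigrams whose tag sequence matches a known pattern. */
    static Map<String, Integer> countCandidates(String[] words, String[] tags) {
        Map<String, Integer> frequency = new HashMap<>();
        for (int n = 2; n <= 3; n++) {
            for (int start = 0; start + n <= words.length; start++) {
                String pattern = String.join("_", Arrays.copyOfRange(tags, start, start + n));
                if (PATTERNS.contains(pattern)) {
                    String phrase = String.join(" ", Arrays.copyOfRange(words, start, start + n));
                    frequency.merge(phrase, 1, Integer::sum);
                }
            }
        }
        return frequency;
    }

    public static void main(String[] args) {
        String[] words = {"machine", "learning", "is", "fun", "machine", "learning"};
        String[] tags = {"NN", "NN", "VB", "JJ", "NN", "NN"};
        System.out.println(countCandidates(words, tags));
    }
}
```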

[Original] 328. Odd Even Linked List

Given a singly linked list, group all odd nodes together followed by the even nodes. Please note here we are talking about the node number and not the value in the nodes. You should try to do it in …

2016-06-05 18:37:28 417
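One way to do the regrouping from the entry above is to relink nodes into an odd-position chain and an even-position chain and then join them. This is a sketch under that assumption, not necessarily the post's solution; `fromArray` and `render` are hypothetical helpers added only to make the example easy to run.

```java
final class OddEvenList {
    static final class ListNode {
        int val;
        ListNode next;

        ListNode(int val) {
            this.val = val;
        }
    }

    /** Moves nodes at odd positions (1st, 3rd, ...) ahead of nodes at even positions, in place. */
    static ListNode oddEvenList(ListNode head) {
        if (head == null) {
            return null;
        }
        ListNode odd = head;
        ListNode evenHead = head.next;
        ListNode even = evenHead;
        while (even != null && even.next != null) {
            odd.next = even.next;  // link the next odd-position node
            odd = odd.next;
            even.next = odd.next;  // link the next even-position node
            even = even.next;
        }
        odd.next = evenHead;       // append the even chain after the odd chain
        return head;
    }

    /** Hypothetical helper: builds a list from the given values. */
    static ListNode fromArray(int... values) {
        ListNode dummy = new ListNode(0);
        ListNode tail = dummy;
        for (int v : values) {
            tail.next = new ListNode(v);
            tail = tail.next;
        }
        return dummy.next;
    }

    /** Hypothetical helper: renders a list as "1->2->3". */
    static String render(ListNode head) {
        StringBuilder sb = new StringBuilder();
        for (ListNode n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(n.val);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(oddEvenList(fromArray(1, 2, 3, 4, 5)))); // 1->3->5->2->4
    }
}
```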

[Original] 326. Power of Three

Given an integer, write a function to determine if it is a power of three. public class Solution { public boolean isPowerOfThree(int n) { double res = Math.log(n)/Math.log(3); ret…

2016-06-05 18:36:50 261
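The excerpt above starts from a ratio of logarithms, and the rest of that solution is cut off. Because a floating-point ratio can land slightly off a whole number, an integer-only check is a common alternative. The sketch below is that alternative, not a reconstruction of the post's code.

```java
final class PowerOfThree {
    /** Divides out factors of 3; only an exact power of three reduces to 1. */
    static boolean isPowerOfThree(int n) {
        if (n < 1) {
            return false;
        }
        while (n % 3 == 0) {
            n /= 3;
        }
        return n == 1;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfThree(243)); // true
        System.out.println(isPowerOfThree(45));  // false
    }
}
```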

[Original] 292. Nim Game

You are playing the following Nim Game with your friend: There is a heap of stones on the table, each time one of you take turns to remove 1 to 3 stones. The one who removes the last stone will be the

2016-06-05 18:36:06 271

[Original] 258. Add Digits

Given a non-negative integer num, repeatedly add all its digits until the result has only one digit. For example: Given num = 38, the process is like: 3 + 8 = 11, 1 + 1 = 2. Since 2 has on…

2016-06-05 18:35:18 280
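The process in the entry above can be written directly as a loop, and it also has a closed form (the digital root). Both are shown below as a sketch; the post's own solution is not visible in the excerpt, so neither is claimed to be the author's.

```java
final class AddDigits {
    /** Literal simulation: keep summing the digits until one digit remains. */
    static int addDigitsLoop(int num) {
        while (num >= 10) {
            int sum = 0;
            while (num > 0) {
                sum += num % 10;
                num /= 10;
            }
            num = sum;
        }
        return num;
    }

    /** Closed form: for num > 0 the result is 1 + (num - 1) mod 9. */
    static int addDigitsFormula(int num) {
        return num == 0 ? 0 : 1 + (num - 1) % 9;
    }

    public static void main(String[] args) {
        System.out.println(addDigitsLoop(38));    // 2
        System.out.println(addDigitsFormula(38)); // 2
    }
}
```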

[Original] 242. Valid Anagram

Given two strings s and t, write a function to determine if t is an anagram of s. For example, s = "anagram", t = "nagaram", return true. s = "rat", t = "car", return false. public class Solutio…

2016-06-05 18:34:18 256
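A simple way to check the anagram condition from the entry above is to sort both strings' characters and compare. The post's own solution is truncated, so this sort-based version is only one possible approach.

```java
import java.util.Arrays;

final class ValidAnagram {
    /** Two strings are anagrams when their sorted character arrays are identical. */
    static boolean isAnagram(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        char[] a = s.toCharArray();
        char[] b = t.toCharArray();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("anagram", "nagaram")); // true
        System.out.println(isAnagram("rat", "car"));         // false
    }
}
```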

[Original] 237. Delete Node in a Linked List

Write a function to delete a node (except the tail) in a singly linked list, given only access to that node. Suppose the linked list is 1 -> 2 -> 3 -> 4 and you are given the third node with value …

2016-06-05 18:33:33 256

[Original] 231. Power of Two

Given an integer, write a function to determine if it is a power of two. public class Solution { public boolean isPowerOfTwo(int n) { return n > 0 && (n & (n - 1)) == 0; } }

2016-06-05 18:32:23 268

[Original] 226. Invert Binary Tree

Invert a binary tree.

     4
   /   \
  2     7
 / \   / \
1   3 6   9

to

     4
   /   \
  7     2
 / \   / \
9   6 3   1

/** * Definition for a binary tree node. * public class TreeNode { * …

2016-06-05 18:31:17 239

[Original] 217. Contains Duplicate

Given an array of integers, find if the array contains any duplicates. Your function should return true if any value appears at least twice in the array, and it should return false if every element

2016-06-05 18:30:13 259
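The duplicate check in the entry above is commonly done with a hash set. The post's code is not shown in the excerpt, so the following is a sketch of that common approach.

```java
import java.util.HashSet;
import java.util.Set;

final class ContainsDuplicate {
    /** Returns true as soon as a value is seen for the second time. */
    static boolean containsDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int n : nums) {
            if (!seen.add(n)) { // add() returns false when the value is already present
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsDuplicate(new int[] {1, 2, 3, 1})); // true
        System.out.println(containsDuplicate(new int[] {1, 2, 3, 4})); // false
    }
}
```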

[Original] 203. Remove Linked List Elements

Remove all elements from a linked list of integers that have value val. Example. Given: 1 --> 2 --> 6 --> 3 --> 4 --> 5 --> 6, val = 6. Return: 1 --> 2 --> 3 --> 4 --> 5. /** * Definition for sing…

2016-06-05 18:29:08 268

[Original] 202. Happy Number

Write an algorithm to determine if a number is "happy". A happy number is a number defined by the following process: Starting with any positive integer, replace the number by the sum of the squares …

2016-06-05 18:28:20 275
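The excerpt above cuts off mid-definition. The sketch below assumes the usual formulation of the problem: replace the number by the sum of the squares of its digits, and call it happy if the process reaches 1 rather than entering a cycle. The cycle is detected with a set of already-seen values; the names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

final class HappyNumber {
    /** Sum of the squares of the decimal digits of n. */
    static int squareDigitSum(int n) {
        int sum = 0;
        while (n > 0) {
            int digit = n % 10;
            sum += digit * digit;
            n /= 10;
        }
        return sum;
    }

    /** Follows the process until it reaches 1 (happy) or repeats a value (not happy). */
    static boolean isHappy(int n) {
        Set<Integer> seen = new HashSet<>();
        while (n != 1 && seen.add(n)) {
            n = squareDigitSum(n);
        }
        return n == 1;
    }

    public static void main(String[] args) {
        System.out.println(isHappy(19)); // true: 19 -> 82 -> 68 -> 100 -> 1
        System.out.println(isHappy(2));  // false: falls into the cycle through 4
    }
}
```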

[Original] 110. Balanced Binary Tree

Given a binary tree, determine if it is height-balanced. For this problem, a height-balanced binary tree is defined as a binary tree in which the depth of the two subtrees of every node never diffe…

2016-06-05 18:27:07 403

[Original] 104. Maximum Depth of Binary Tree

Given a binary tree, find its maximum depth. The maximum depth is the number of nodes along the longest path from the root node down to the farthest leaf node. /** * Definition for a binary tree …

2016-06-05 18:26:15 294

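[Original] Using a Spark-Based CRF Implementation, with Source Code Analysis

CRF-Spark is built on Spark's L-BFGS implementation. Because the L-BFGS algorithm is already implemented in Spark's MLlib library, calling it during CRF training makes the iterations run faster on the Spark platform and shortens training time. Source code: https://github.com/lihait/CRF-Spark. The source is written in Scala; after downloading it, use the sbt tool to package it into …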

2016-06-03 21:21:57 3712 2

[Original] 70. Climbing Stairs

You are climbing a staircase. It takes n steps to reach the top. Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top? public class Solution { pu…

2016-06-01 18:42:40 319
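For the problem in the entry above, the number of ways to reach step n is the sum of the ways to reach steps n-1 and n-2, since the last move is either 1 or 2 steps. A minimal sketch of that recurrence, using two rolling variables, follows; the post's own code is cut off, so this is not presented as the author's version.

```java
final class ClimbingStairs {
    /** ways(n) = ways(n - 1) + ways(n - 2), with ways(0) = ways(1) = 1. */
    static int climbStairs(int n) {
        int previous = 1; // ways to reach step i - 2
        int current = 1;  // ways to reach step i - 1
        for (int i = 2; i <= n; i++) {
            int next = previous + current;
            previous = current;
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(climbStairs(5)); // 8
    }
}
```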

[Original] 67. Add Binary

Given two binary strings, return their sum (also a binary string). For example, a = "11", b = "1". Return "100". public class Solution { public String addBinary(String a, String b) { …

2016-06-01 18:41:49 306
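The binary addition in the entry above can be done digit by digit from the right with a carry, the same way it is done by hand. The body of the post's `addBinary` is not visible, so the following is a sketch of that schoolbook approach and assumes both inputs contain only '0' and '1'.

```java
final class AddBinary {
    /** Adds two binary strings from the least significant digit, carrying as needed. */
    static String addBinary(String a, String b) {
        StringBuilder digits = new StringBuilder();
        int i = a.length() - 1;
        int j = b.length() - 1;
        int carry = 0;
        while (i >= 0 || j >= 0 || carry != 0) {
            int sum = carry;
            if (i >= 0) {
                sum += a.charAt(i--) - '0';
            }
            if (j >= 0) {
                sum += b.charAt(j--) - '0';
            }
            digits.append((char) ('0' + (sum & 1))); // low bit is the output digit
            carry = sum >> 1;                        // high bit is the carry
        }
        return digits.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(addBinary("11", "1")); // 100
    }
}
```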

[Original] 66. Plus One

Given a non-negative number represented as an array of digits, plus one to the number. The digits are stored such that the most significant digit is at the head of the list. import java.math.Big…

2016-06-01 18:41:01 279

[Original] 38. Count and Say

The count-and-say sequence is the sequence of integers beginning as follows: 1, 11, 21, 1211, 111221, ... 1 is read off as "one 1" or 11. 11 is read off as "two 1s" or 21. 21 is read off as …

2016-06-01 18:39:34 237
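Each term of the sequence in the entry above is produced by reading the previous term aloud as runs of identical digits. The sketch below builds the n-th term that way, with n starting at 1; it is an illustration, since the post's code is not part of the excerpt.

```java
final class CountAndSay {
    /** Reads `term` as runs of equal digits: "1211" -> one 1, one 2, two 1s -> "111221". */
    static String next(String term) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < term.length()) {
            int j = i;
            while (j < term.length() && term.charAt(j) == term.charAt(i)) {
                j++;
            }
            out.append(j - i).append(term.charAt(i)); // run length, then the digit
            i = j;
        }
        return out.toString();
    }

    /** The n-th term of the sequence, where the first term is "1". */
    static String countAndSay(int n) {
        String term = "1";
        for (int k = 1; k < n; k++) {
            term = next(term);
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(countAndSay(5)); // 111221
    }
}
```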

[Original] 28. Implement strStr()

public class Solution { public int strStr(String haystack, String needle) { /** int l1 = haystack.length(); int l2 = needle.length(); char[] haystack1 = haystack.toCharArray();

2016-06-01 18:38:26 226

[Original] 27. Remove Element

Given an array and a value, remove all instances of that value in place and return the new length. Do not allocate extra space for another array, you must do this in place with constant memory. …

2016-06-01 18:37:21 246

[Original] 26. Remove Duplicates from Sorted Array

Given a sorted array, remove the duplicates in place such that each element appears only once and return the new length. Do not allocate extra space for another array, you must do this in place with …

2016-06-01 18:36:23 338
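Because the array in the entry above is sorted, duplicates sit next to each other, so a read index and a write index are enough to compact it in place. A sketch of that two-pointer approach follows; it is one standard way to meet the constant-memory requirement, not a copy of the post's code.

```java
final class RemoveDuplicates {
    /** Compacts the unique values to the front of a sorted array and returns how many there are. */
    static int removeDuplicates(int[] nums) {
        if (nums.length == 0) {
            return 0;
        }
        int write = 1; // next position to fill with a new unique value
        for (int read = 1; read < nums.length; read++) {
            if (nums[read] != nums[write - 1]) {
                nums[write++] = nums[read];
            }
        }
        return write;
    }

    public static void main(String[] args) {
        int[] nums = {1, 1, 2, 2, 3};
        int length = removeDuplicates(nums);
        System.out.println(length); // 3, and nums now starts with 1, 2, 3
    }
}
```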

[Original] 21. Merge Two Sorted Lists

Merge two sorted linked lists and return it as a new list. The new list should be made by splicing together the nodes of the first two lists. /** * Definition for singly-linked list. * public cla…

2016-06-01 18:34:30 214

[Original] 19. Remove Nth Node From End of List

/** * Definition for singly-linked list. * struct ListNode { * int val; * ListNode *next; * ListNode(int x) : val(x), next(NULL) {} * }; */class Solution {public: ListNode* re

2016-06-01 18:33:15 296

[Original] 13. Roman to Integer

Given a roman numeral, convert it to an integer. Input is guaranteed to be within the range from 1 to 3999. public class Solution { public int romanToInt(String s) { int res = 0; / …

2016-06-01 18:32:05 343
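A common way to do the conversion in the entry above is to scan left to right and subtract a symbol's value whenever a larger symbol follows it (as in IV or CM), adding it otherwise. The excerpt only shows the start of the post's method, so the sketch below is that common rule rather than the author's code. It assumes well-formed input in the stated 1 to 3999 range.

```java
final class RomanToInteger {
    /** Value of a single Roman numeral symbol. */
    static int value(char symbol) {
        switch (symbol) {
            case 'I': return 1;
            case 'V': return 5;
            case 'X': return 10;
            case 'L': return 50;
            case 'C': return 100;
            case 'D': return 500;
            case 'M': return 1000;
            default: throw new IllegalArgumentException("not a Roman numeral symbol: " + symbol);
        }
    }

    /** Subtracts a symbol when a larger one follows it, otherwise adds it. */
    static int romanToInt(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            int current = value(s.charAt(i));
            boolean smallerThanNext = i + 1 < s.length() && current < value(s.charAt(i + 1));
            total += smallerThanNext ? -current : current;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(romanToInt("MCMXCIV")); // 1994
    }
}
```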

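[Original] Using the Stanford Parser for Chinese Syntactic Parsing

Contents: I. Two points to note when using it. II. Using the Stanford Parser from the command line: 1. Parsing a Chinese sentence; 2. POS tagging and generating dependency relations; 3. The graphical tool. III. The tag set used in parse trees. I. Two points to note when using it: 1. Memory setting for Chinese: under Run > Run Configurations > Arguments > VM arguments, add -Xmx1024m. 2. Tokenize controls whether the input is segmented into words. Be sure to select Tokeniz…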

2016-06-01 16:29:42 21565 5
