Original A Summary of the Text Mining Preprocessing Pipeline


2014-06-07 01:11:40 27121 1

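Original LeetCode Summary

I recently finished the 151 algorithm problems on the www.leetcode.com online judge. Although some problems have their own clever tricks, most of them are classic algorithm or data structure exercises, so I wrote up the following summary.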

2014-01-04 20:19:03 184447 31

Original Algorithm Review Notes | Comparison of Sorting Algorithms


2013-09-14 00:32:16 4061 1

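Original LeetCode | Reverse Words in a String

Problem: Given an input string, reverse the string word by word. For example, given s = "the sky is blue", return "blue is sky the". Approach: Method 1: First, treat the sentence as a sequence of words, for example "A B C". We can then swap all the characters of the sentence end to end, which gives "C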
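A minimal Python sketch of the double-reversal idea in Method 1. The excerpt is cut off after the first step, so the second step (reversing the characters inside each word) and the whitespace normalization are assumptions, and `reverse_words` is a hypothetical name.

```python
def reverse_words(s: str) -> str:
    """Reverse the word order of s using two rounds of character reversal."""
    # Assumption: collapse runs of whitespace so the output uses single spaces.
    chars = list(" ".join(s.split()))
    # Step 1: reverse every character in the sentence ("A B C" -> "C B A").
    chars.reverse()
    # Step 2 (assumed): reverse the characters inside each word
    # so that every word is spelled forwards again.
    start = 0
    for i in range(len(chars) + 1):
        if i == len(chars) or chars[i] == " ":
            chars[start:i] = chars[start:i][::-1]
            start = i + 1
    return "".join(chars)
```

For single-letter words such as "A B C" the second step changes nothing, which is why the first step alone already yields the answer in that example.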

2014-08-25 22:22:31 23844 13

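Original NLP | Natural Language Processing - Lexicalized Parsing (Lexicalized PCFG)

The chapter "NLP | Natural Language Processing - Parsing, and Context-Free Grammars" covered parsing with PCFGs (Probabilistic Context-Free Grammars). This chapter builds on that material and discusses a more advanced PCFG algorithm.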

2014-08-14 16:32:39 11680 2

Original NLP | Natural Language Processing - Parsing, and Context-Free Grammars


2014-07-12 00:38:15 74891 3

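Original NLP | Natural Language Processing - Tagging Problems, and Hidden Markov Models

Tagging is a common task in natural language processing. Typical examples are: 1) Part-Of-Speech Tagging, which labels every word in a sentence with its part of speech, such as noun or verb; 2) Named Entity Tagging, which labels special words in a sentence, such as addresses, dates, and personal names. At first glance, this is not a simple problem. First, a word may have several meanings and express different ones in different situations; second, the meaning or part of speech of a word is also affected by several of the words around it. Yet the hidden Markov model offers a nearly perfect mathematical solution.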

2014-07-02 01:40:44 10609

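Original NLP | Natural Language Processing - Language Modeling

In a scenario such as "speech recognition", the machine uses some algorithm to convert speech into text, and this process is clearly very error-prone. For example, when the user says "Recognition Speech", the machine may correctly recognize the text as "Recognition speech", but it may also mistakenly recognize it as "Wrench a nice beach". Lexical analysis alone cannot tell us which recognition is correct, and the computer does not understand grammar either, so how should we handle this problem? A simple and practical approach is to use statistics (Markov chains) to judge how likely each candidate recognition is to be correct.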
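A rough Python sketch of how a first-order Markov chain (a bigram model) could score two candidate transcriptions. The toy corpus, the add-one smoothing, and the names `train_bigram` and `log_prob` are all assumptions made for illustration, not the post's actual model.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left-hand contexts over a list of sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in corpus:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing (an assumption) keeps unseen pairs from scoring zero.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


# Hypothetical toy corpus; a real model would be trained on far more text.
toy_corpus = [
    "recognition speech is hard",
    "speech recognition is useful",
    "recognition speech works",
]
model = train_bigram(toy_corpus)
candidates = ["Recognition speech", "Wrench a nice beach"]
best = max(candidates, key=lambda c: log_prob(c, model))
```

On this toy corpus `best` is "Recognition speech", because its word pairs were actually observed while the other candidate relies entirely on smoothed counts.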

2014-06-10 23:48:33 35671 4

Original Python in the .Net Framework - IronPython


2014-06-08 00:07:00 11268 2

Original Small Math Solves Big Problems - Combining Classifiers (Inspired by Democratic Voting)


2014-02-10 14:08:18 5121

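Original Small Math Solves Big Problems - Information Entropy (Inspired by the Logarithm Function)

In this materialistic society, every item has a clearly marked price, and even many intangible things (love, family affection) may be measured in money. In computer science, we likewise want a quantitative measure of information: for example, how much information this blog post contains. Some might say the question is easy, since we can measure it by word count, but on closer thought that does not hold up. For example, at the mention of "apple", many people can picture its shape, color, taste, and so on, which is a great deal of information. At the mention of "鼋鼍", we may have no idea what it is, but we have at least learned a new word. Therefore, to figure out something that is very, very uncertain, or something we know nothing about, we need to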

2014-01-25 00:55:03 16793 2

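Original Small Math Solves Big Problems - Bloom Filter (Inspired by Number Bases)

The Bloom Filter is mentioned in many blog posts, but in this post I want to present it as simply as I can, according to my own understanding. Ever since the origin of humankind, people have tried to use language to describe the world, and language is precisely an important tool for communication between people. For example, when A tells B "apple", B can immediately think of an apple's shape, color, and uses, as well as Apple Inc., Steve Jobs, and so on. Clearly, by passing along just two simple characters, the two people have conveyed a large amount of information that both already knew. With the growth of the Internet, computers also need to communicate with each other, and the principle is somewhat similar to communication between people, except that computers use the language of mathematics to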

2014-01-23 22:12:45 5799

Original Small Math Solves Big Problems - Anagrams (Inspired by the Properties of Prime Numbers)


2014-01-22 12:06:24 3700 2

Original Small Math Solves Big Problems - The Cake-Cutting Problem (Inspired by Arrays)


2014-01-22 10:51:55 12372

Original LeetCode | Surrounded Regions

Problem: Given a 2D board containing 'X' and 'O', capture all regions surrounded by 'X'. A region is captured by flipping all 'O's into 'X's in that surrounded region. For example, X X X

2014-01-04 13:21:00 6877 6

Original LeetCode | Clone Graph

Problem: Clone an undirected graph. Each node in the graph contains a label and a list of its neighbors. OJ's undirected graph serialization: Nodes are labeled uniquely. We use # as a separat

2014-01-03 22:42:29 11229

Original LeetCode | Copy List with Random Pointer

Problem: A linked list is given such that each node contains an additional random pointer which could point to any node in the list or null. Return a deep copy of the list. Approach: An ordinary linked-list copy just

2014-01-03 22:06:42 5114

Original LeetCode | Word Ladder II

Problem: Given two words (start and end), and a dictionary, find all shortest transformation sequence(s) from start to end, such that: Only one letter can be changed at a time. Each intermediate word

2014-01-03 13:59:26 4437

Original LeetCode | Candy


2014-01-02 10:49:16 7893

Original LeetCode | Trapping Rain Water

Problem: Given n non-negative integers representing an elevation map where the width of each bar is 1, compute how much water it is able to trap after raining. For example, given [0,1,0,2,1,0,1,3

2013-12-31 16:35:53 3568

Original LeetCode | Substring with Concatenation of All Words

Problem: You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and w

2013-12-31 16:17:07 3722

Original LeetCode | Scramble String

Problem: Given a string s1, we may represent it as a binary tree by partitioning it to two non-empty substrings recursively. Below is one possible representation of s1 = "great": great /

2013-12-31 15:24:51 3929

Original LeetCode | Longest Valid Parentheses

Problem: Given a string containing just the characters '(' and ')', find the length of the longest valid (well-formed) parentheses substring. For "(()", the longest valid parentheses substring is

2013-12-31 13:56:13 2091

Original LeetCode | Text Justification

Problem: Given an array of words and a length L, format the text such that each line has exactly L characters and is fully (left and right) justified. You should pack your words in a greedy app

2013-12-27 01:00:14 2079 1

Original LeetCode | Count and Say

Problem: The count-and-say sequence is the sequence of integers beginning as follows: 1, 11, 21, 1211, 111221, ... 1 is read off as "one 1" or 11. 11 is read off as "two 1s" or 21. 21 is rea

2013-12-26 23:02:28 2264 3

Original LeetCode | Combination Sum II

Problem: Given a collection of candidate numbers (C) and a target number (T), find all unique combinations in C where the candidate numbers sums to T. Each number in C may only be used once in th

2013-12-26 22:45:34 2218

Original LeetCode | Combination Sum

Problem: Given a set of candidate numbers (C) and a target number (T), find all unique combinations in C where the candidate numbers sums to T. The same repeated number may be chosen from C unlim

2013-12-26 22:34:28 2216

Original LeetCode | Next Permutation

Problem: Implement next permutation, which rearranges numbers into the lexicographically next greater permutation of numbers. If such arrangement is not possible, it must rearrange it as the lowest

2013-12-26 22:14:05 4738

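Original LeetCode | Divide Two Integers

Problem: Divide two integers without using multiplication, division and mod operator. Approach: 1) Handle the boundary cases. 2) Remember that the absolute values of INT_MIN and INT_MAX differ by 1. 3) Handle the sign. Similar to http://blog.csdn.net/lanxu_yy/article/details/11686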
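A hedged Python sketch that follows the three points in the approach. The shift-and-subtract loop is one common way to avoid the forbidden operators and is an assumption here, as are the 32-bit clamp and the function name `divide`; the excerpt does not show the post's own code.

```python
INT_MAX = 2147483647   # largest 32-bit signed integer
INT_MIN = -2147483648  # its magnitude is exactly one more than INT_MAX


def divide(dividend: int, divisor: int) -> int:
    """Integer division truncating toward zero, without *, / or %."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    # Points 1 and 2: INT_MIN / -1 does not fit in 32 bits, so clamp it
    # (assumed behavior; the excerpt does not say what to return).
    if dividend == INT_MIN and divisor == -1:
        return INT_MAX
    # Point 3: work out the sign first, then divide the magnitudes.
    negative = (dividend < 0) != (divisor < 0)
    a, b = abs(dividend), abs(divisor)
    quotient = 0
    while a >= b:
        # Double the divisor by shifting until it would exceed what is left.
        shifted, multiple = b, 1
        while a >= (shifted << 1):
            shifted <<= 1
            multiple <<= 1
        a -= shifted
        quotient += multiple
    return -quotient if negative else quotient
```

Python integers never overflow, so `abs(INT_MIN)` is safe here; in a language with fixed-width integers the magnitudes would need a wider type, which is what point 2 warns about.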

2013-12-26 21:32:32 2047

Original LeetCode | String to Integer (atoi)

Problem: Implement atoi to convert a string to an integer. Hint: Carefully consider all possible input cases. If you want a challenge, please do not see below and ask yourself what are the possible

2013-12-26 20:50:39 1927

Original LeetCode | Word Ladder

Problem: Given two words (start and end), and a dictionary, find the length of shortest transformation sequence from start to end, such that: Only one letter can be changed at a time. Each intermediat

2013-12-26 17:52:39 1949 1

Original LeetCode | Implement strStr()

Problem: Implement strStr(). Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack. Approach: Use the BF (brute-force) or KMP algorithm. BF does not meet the time-complexity requirement. Code: class S
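The post's own code is cut off in this excerpt, so the following is only a Python sketch of the KMP option. It returns an index (or -1) instead of a pointer, and the names `build_failure` and `str_str` are hypothetical.

```python
def build_failure(pattern: str) -> list:
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def str_str(haystack: str, needle: str) -> int:
    """Index of the first occurrence of needle in haystack, or -1."""
    if not needle:
        return 0  # assumption: an empty needle matches at position 0
    fail = build_failure(needle)
    k = 0  # number of needle characters matched so far
    for i, ch in enumerate(haystack):
        while k > 0 and ch != needle[k]:
            k = fail[k - 1]  # reuse the partial match instead of restarting
        if ch == needle[k]:
            k += 1
        if k == len(needle):
            return i - k + 1
    return -1
```

Because the scan never moves backwards in `haystack`, the running time is linear in the combined length of the two strings, whereas brute force can take time proportional to their product.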

2013-12-26 14:24:03 2665

Original LeetCode | Minimum Window Substring

Problem: Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n). For example, S = "ADOBECODEBANC" T = "ABC" Minimum windo

2013-12-26 12:43:54 1678 3

Original LeetCode | Multiply Strings

Problem: Given two numbers represented as strings, return multiplication of the numbers as a string. Note: The numbers can be arbitrarily large and are non-negative. Approach: Simulate the multiplication process. First, each of the second number's
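A short Python sketch of simulating long multiplication digit by digit. The excerpt breaks off before the details, so the in-place carry handling below is an assumed variant, and `multiply` is a hypothetical name.

```python
def multiply(num1: str, num2: str) -> str:
    """Multiply two non-negative decimal strings the way it is done on paper."""
    if num1 == "0" or num2 == "0":
        return "0"
    # The product of an m-digit and an n-digit number has at most m + n digits.
    result = [0] * (len(num1) + len(num2))
    for i in range(len(num1) - 1, -1, -1):
        for j in range(len(num2) - 1, -1, -1):
            # Digit product plus whatever already sits in this column.
            total = int(num1[i]) * int(num2[j]) + result[i + j + 1]
            result[i + j + 1] = total % 10  # digit that stays in this column
            result[i + j] += total // 10    # carry into the next column
    # Drop leading zeros left over from the over-sized buffer.
    return "".join(map(str, result)).lstrip("0") or "0"
```

Only single-digit conversions are used, so the inputs can be arbitrarily long without ever being turned into one large integer.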

2013-12-25 17:07:03 1392

Original LeetCode | Merge Intervals

Problem: Given a collection of intervals, merge all overlapping intervals. For example, given [1,3],[2,6],[8,10],[15,18], return [1,6],[8,10],[15,18]. Approach: First sort by start, then merge adjacent overlapping intervals one by one.
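The sort-then-merge approach can be sketched in Python as follows. Representing each interval as a `[start, end]` pair and the name `merge_intervals` are assumptions; the original problem defines its own interval type.

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals."""
    merged = []
    # Sort by start so that any overlapping intervals end up adjacent.
    for start, end in sorted(intervals, key=lambda iv: iv[0]):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged
```

For instance, `merge_intervals([[1, 3], [2, 6], [8, 10], [15, 18]])` gives `[[1, 6], [8, 10], [15, 18]]`, matching the example in the problem statement.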

2013-12-25 16:43:17 1447

Original LeetCode | Insert Interval

Problem: Given a set of non-overlapping intervals, insert a new interval into the intervals (merge if necessary). You may assume that the intervals were initially sorted according to their start ti

2013-12-25 16:15:46 1490

Original LeetCode | Distinct Subsequences

Problem: Given a string S and a string T, count the number of distinct subsequences of T in S. A subsequence of a string is a new string which is formed from the original string by deleting some (c

2013-12-25 13:00:51 2747

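Original LeetCode | Maximal Rectangle

Problem: Given a 2D binary matrix filled with 0's and 1's, find the largest rectangle containing all ones and return its area. Approach: The basic idea is to loop over every choice of top-left and bottom-right corner and check whether the region they bound consists entirely of '1's. As shown in the figure below: To reuse earlier results, we can use dynamic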

2013-12-24 17:32:55 5727

Original LeetCode | Spiral Matrix II

Problem: Given an integer n, generate a square matrix filled with elements from 1 to n^2 in spiral order. For example, given n = 3, you should return the following matrix: [ [ 1, 2, 3 ], [ 8, 9

2013-12-24 14:26:04 1418

Original LeetCode | Spiral Matrix

Problem: Given a matrix of m x n elements (m rows, n columns), return all elements of the matrix in spiral order. For example, given the following matrix: [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9

2013-12-24 14:15:25 1792

