November 2015_xiewenbo


Repost [spark]groupByKey reduceByKey

Why is it recommended to use groupByKey as little as possible in Spark? Let us look at two different ways of counting words: the first uses reduceByKey, the other uses groupByKey. The code is as follows: # User: 过往记忆 # Date: 2015-05-18 #

2015-11-25 20:13:42 860
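
The contrast drawn in the groupByKey/reduceByKey entry above can be sketched without a cluster. What follows is a plain-Python model, not Spark code and not the reposted author's listing: `partitions`, `reduce_by_key_shuffle` and `group_by_key_shuffle` are hypothetical names introduced here. It assumes that reduceByKey combines values per key inside each partition before the shuffle, while groupByKey ships every pair, and it counts the records that would cross the shuffle in each case.

```python
from collections import Counter, defaultdict

# Hypothetical input: two partitions of words (stand-ins for RDD partitions).
partitions = [
    ["spark", "hadoop", "spark", "spark"],
    ["hadoop", "spark", "hive", "hive"],
]


def reduce_by_key_shuffle(parts):
    """Model reduceByKey: combine per partition, then merge partial sums."""
    shuffled = []  # records that would cross the network
    for part in parts:
        local = Counter(part)            # map-side combine
        shuffled.extend(local.items())   # one (word, partial_count) per key
    totals = defaultdict(int)
    for word, partial in shuffled:
        totals[word] += partial
    return dict(totals), len(shuffled)


def group_by_key_shuffle(parts):
    """Model groupByKey: ship every (word, 1) pair, then sum each group."""
    shuffled = [(word, 1) for part in parts for word in part]
    groups = defaultdict(list)
    for word, one in shuffled:
        groups[word].append(one)
    totals = {word: sum(ones) for word, ones in groups.items()}
    return totals, len(shuffled)


reduce_counts, reduce_records = reduce_by_key_shuffle(partitions)
group_counts, group_records = group_by_key_shuffle(partitions)
```

Both functions return the same word counts. In this toy model the reduceByKey path moves 5 records across the shuffle and the groupByKey path moves 8, and the gap widens as keys repeat more often within a partition.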

Repost [spark]Spark performance tuning in practice

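Spark is particularly well suited to operating repeatedly on a specific dataset, and offers two modes, mem-only and mem & disk. mem-only is efficient but occupies a large amount of memory, which is costly; mem & disk automatically spills to disk once memory is used up, which solves the memory shortage but adds the cost of swapping data. Common Spark tuning tools include nmon, JMeter, and JProfiler. The following is an analysis of a Spark tuning example: 1. Scenario: precise customer segmentation on a 300 GB customer info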

2015-11-25 19:50:55 768

Repost [spark]Spark operators: basic RDD transformations (5): mapPartitions, mapPartitionsWithIndex

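mapPartitions def mapPartitions[U](f: (Iterator[T]) => Iterator[U], preservesPartitioning: Boolean = false)(implicit arg0: ClassTag[U]): RDD[U] This function is similar to map, except that the argument of the mapping function changes from each element of the RDD to an iterator over each partition of the RDD.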

2015-11-25 19:32:56 1536
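
The per-partition behaviour described in the mapPartitions entry above can be illustrated with a small plain-Python model. This is a sketch of the semantics only, not the Spark API: `map_partitions`, `map_partitions_with_index` and the sample `parts` are hypothetical names, and an RDD is assumed to be representable as a list of partitions.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_partitions(parts: List[List[T]],
                   f: Callable[[Iterator[T]], Iterable[U]]) -> List[List[U]]:
    """Model mapPartitions: f is called once per partition with an iterator."""
    return [list(f(iter(part))) for part in parts]


def map_partitions_with_index(parts, f):
    """Model mapPartitionsWithIndex: f also receives the partition index."""
    return [list(f(i, iter(part))) for i, part in enumerate(parts)]


# Hypothetical data: the numbers 1..5 split across two partitions.
parts = [[1, 2], [3, 4, 5]]

# Emit one sum per partition; f sees the whole partition, not single elements.
sums = map_partitions(parts, lambda it: [sum(it)])

# Tag each partition's sum with the index of the partition it came from.
tagged = map_partitions_with_index(parts, lambda i, it: [f"{i}|{sum(it)}"])
```

Because the function receives a whole partition at once, any expensive setup (for example opening a connection) would run once per partition rather than once per element, which is the usual reason to prefer it over map.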

Repost [spark]The difference between map and flatMap

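An experiment to show the difference between map and flatMap in Spark. Step 1: put the test data on HDFS: hadoop dfs -put data1/test1.txt /tmp/test1.txt The test data contains two lines of text: Step 2: create an RDD in Spark to read the HDFS file /tmp/test1.txt Step 3: view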

2015-11-25 19:28:37 460
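
The map versus flatMap experiment in the entry above can be approximated in plain Python. This is a sketch rather than the reposted Spark session: the contents of the two-line test file are not shown in the excerpt, so `lines` below is a hypothetical stand-in, and list operations take the place of RDD transformations.

```python
from itertools import chain

# Hypothetical stand-in for the two-line test file; the real contents are not shown.
lines = ["a b c", "d e"]

# map: exactly one output element per input line (here, a list of words).
mapped = [line.split(" ") for line in lines]

# flatMap: each line's words are flattened into a single sequence.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))
```

With map the result keeps one nested list per line, so its length equals the number of lines; with flatMap the nesting disappears and the length equals the total number of words.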

Repost [spark]Computing the share of each range of video play counts

Printing elements of an RDD Another common idiom is attempting to print out the elements of an RDD using rdd.foreach(println) or rdd.map(println). On a single machine, this will generate the expe

2015-11-25 19:23:23 1147

Repost Submitting JSON parameters with httpclient

Submitting JSON parameters with httpclient via POST (as distinct from form submission): private void httpReqUrl(List list, String url) throws ClientProtocolException, IOException {

2015-11-23 15:28:31 908
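
The distinction drawn in the entry above, a JSON body versus a form body, can be sketched as follows. Note the substitution: this uses Python's standard `urllib` rather than Apache HttpClient, and it is not the reposted `httpReqUrl` method. The endpoint `url` and the `payload` are hypothetical, and no request is actually sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and payload, for illustration only.
url = "http://example.com/api"
payload = {"ids": [1, 2, 3]}

# JSON submission: the body is the serialized JSON document itself.
json_request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Form submission: the body is URL-encoded key=value pairs.
form_request = urllib.request.Request(
    url,
    data=urllib.parse.urlencode({"ids": "1,2,3"}).encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# Nothing is sent here; urllib.request.urlopen(json_request) would perform the call.
```

The two requests differ only in how the body is encoded and in the `Content-Type` header, which is what tells the server how to parse the body.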

httpclient tutorial (httpclient guide)

The httpclient guide includes detailed usage and commonly used code. The Hyper-Text Transfer Protocol (HTTP) is perhaps the most significant protocol used on the Internet today. Web services, network-enabled appliances and the growth of network computing continue to expand the role of the HTTP protocol beyond user-driven web browsers, while increasing the number of applications that require HTTP support. Although the java.net package provides basic functionality for accessing resources via HTTP, it doesn't provide the full flexibility or functionality needed by many applications. HttpClient seeks to fill this void by providing an efficient, up-to-date, and feature-rich package implementing the client side of the most recent HTTP standards and recommendations. Designed for extension while providing robust support for the base HTTP protocol, HttpClient may be of interest to anyone building HTTP-aware client applications such as web browsers, web service clients, or systems that leverage or extend the HTTP protocol for distributed communication.

2018-03-08

mask rcnn paper

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

2018-03-07

Applying Deep Learning To Answer Selection

Applying Deep Learning To Answer Selection- A Study And An Open Task

2018-03-07

Learning Phrase Representations using RNN Encoder–Decoder

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

2018-03-07

BPTT BackPropagation Through Time.pdf

BPTT paper This report provides detailed description and necessary derivations for the BackPropagation Through Time (BPTT) algorithm. BPTT is often used to learn recurrent neural networks (RNN). Contrary to feed-forward neural networks, the RNN is characterized by the ability of encoding longer past information, thus very suitable for sequential models. The BPTT extends the ordinary BP algorithm to suit the recurrent neural architecture.

2018-03-07
