hfs-delimited and lfs-delimited

Hey guys,

I've pushed a snapshot update to Cascalog that includes two new taps -- hfs-delimited and lfs-delimited. These support the same keyword options as the other hfs-* and lfs-* taps, with a few extras I'll detail below.

If any of you find these useful, I'd really appreciate it if you would give them a try and let me know how the API works out for you. This feature is available in either of the following builds:

[cascalog "1.8.7-SNAPSHOT"]
[cascalog "1.9.0-wip8"]

As an example, say you had a text file with data like this:

exchange,stock_symbol,date,open,high,low,close,volume,adj
NYSE,AA,2008-03-05,37.01,37.9,36.13,36.6,17752400,36.6
NYSE,AA,2008-03-04,38.85,39.28,38.26,38.37,11279900,38.37


The default separator is a tab character, and this file contains no tabs, so the standard hfs-delimited tap with no options would produce 1-tuples, each holding a single line of text:

(hfs-delimited "/path/to/file")
;; produces 1-tuples, each holding one full line of text

The ":delimiter" option allows you to change this:

(hfs-delimited "/path/to/data"
               :delimiter ",")

;; produces 9-tuples, all strings

Now we have the problem of the header line getting in the way. :skip-header? to the rescue:

(hfs-delimited "/path/to/data"
               :delimiter ","
               :skip-header? true)

;; produces 9-tuples of strings, with the header line skipped

Next, if you include a vector of classes with the :classes keyword, the tap will do class conversions on the fields for you:

(hfs-delimited "/path/to/data"
               :delimiter ","
               :classes [String String String Float Float Float Float Integer Float]
               :skip-header? true)

;; produces 9-tuples with the above classes -- numbers are parsed properly, strings stay strings.
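
To make the conversion step concrete, here is a rough sketch in plain Clojure of what parsing one line against a vector of classes amounts to. This is not how the tap is implemented; coerce and parse-line are hypothetical helpers written only for illustration, and they cover just the three classes used above:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of Cascalog:
;; converts one string field to the requested class.
(defn coerce [klass s]
  (condp = klass
    String  s
    Float   (Float/parseFloat s)
    Integer (Integer/parseInt s)))

;; Hypothetical helper: splits a comma-delimited line and
;; converts each field using the class at the same position.
(defn parse-line [classes line]
  (mapv coerce classes (str/split line #",")))

(parse-line [String String String Float Float Float Float Integer Float]
            "NYSE,AA,2008-03-05,37.01,37.9,36.13,36.6,17752400,36.6")
;; returns a 9-element vector: three strings, four Floats, an Integer, a Float
```

Note that the classes line up with the fields by position, so the vector needs one entry per column.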

Finally, by providing :outfields you gain the ability to select out specific fields by name:

(def stock-tap
  (hfs-delimited "/path/to/data"
                 :delimiter ","
                 :outfields ["?exchange" "?stock-sym" "?date" "?open" "?high" "?low" "?close" "?volume" "?adj"]
                 :classes [String String String Float Float Float Float Integer Float]
                 :skip-header? true))


(select-fields stock-tap ["?stock-sym" "?open"])
;; returns 2-tuples of [String, Float] pairs representing the stock symbol and opening price for each line.
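
For anyone who wants to try this from a REPL, a query over the selected fields might look something like the sketch below. It assumes stock-tap is defined as above and that hfs-delimited is already referred into your namespace; the query itself only uses the standard ?<-, stdout and select-fields from cascalog.api:

```clojure
(use 'cascalog.api)

;; Assumes stock-tap has been defined as shown above.
;; Prints the stock symbol and opening price for each line to stdout.
(?<- (stdout)
     [?stock-sym ?open]
     ((select-fields stock-tap ["?stock-sym" "?open"]) ?stock-sym ?open))
```

Since lfs-delimited takes the same options, swapping it in with a local path should be enough to run the same query against a file on the local filesystem.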

Looking forward to hearing your feedback! The API here will probably change a bit before release, so get your notes in now.

Cheers,


http://grokbase.com/t/gg/cascalog-user/123ky5apsx/new-taps-hfs-delimited-and-lfs-delimited
