Suppose we have a text file, huge.txt, whose first 5 lines look like this:
f1 | f2 |
---|---|
yewhhgfifsbplrxankqazzewzkhfxjetiprfvyinchmdventatkry | lwxazkmczmpcluechdtfgwapgvyzfxqczcuvadkfqrcciptmpo |
viqxbdjjzkdcytdnjiuexottvgdjkafhykbotjsupyuybvgycqhfsdlypuftbezga | mmoermrlbovwmfnxgctizucfccatwlvugnqvikhbgaqvamwbzqluwavgcjtonutairrafrpywtwtpocgltmfrxz |
plhdyslghehlptlsczizhjbtcqwasvspjqyeifsnqagqovvdukxftsp | tlisnnguudbqgrupqpoqjfshldpuwjdkfeizhkfwsvmdspswusmclhqzzxaumvwrerbsl |
bltnilcncwgnsyxeosdtytvpdbxuiwukdqpgvvbihoqvvmhogmffzpivuysbhgitfqxptyuofsukmz | ajojwbcfptahjetpnmkbsfrblubvvjxyestplybzpxxwsrppgteoreckkscrsu |
… | … |
The file is 200 GB in size, and we need to draw a random sample of 10,000 lines from it.
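To make the requirement concrete, here is a minimal Python sketch of one generic way to take a fixed-size uniform sample from a file too large to hold in memory: reservoir sampling (Algorithm R). It is only an illustration of the task, not the esProc method this article uses. The function names `reservoir_sample` and `sample_file`, the UTF-8 encoding, and the assumption that the first line (`f1`, `f2`) is a header to be skipped are all hypothetical choices made for the example.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Return up to k items chosen uniformly at random from a stream.

    Only k items are kept in memory at any time, so the stream can be
    far larger than RAM.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(line)
        else:
            # Keep item i with probability k / (i + 1) by picking a
            # slot in [0, i]; it replaces an entry only if the slot < k.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir


def sample_file(path: str, k: int, skip_header: bool = True,
                seed: Optional[int] = None) -> List[str]:
    """Sample k data lines from a text file in a single sequential pass."""
    with open(path, "r", encoding="utf-8") as f:  # encoding is an assumption
        if skip_header:
            next(f, None)  # assumed header line, e.g. the f1/f2 row
        return reservoir_sample((ln.rstrip("\n") for ln in f), k, seed)
```

A call such as `sample_file("huge.txt", 10000)` would read the file once from start to finish. On 200 GB that single pass is still slow, which is part of why a dedicated tool is attractive.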
esProc makes this task straightforward.
1. Write the script sample.dfx in esProc:
 | A |
---|---|
1 | =file("huge.txt") |
2 | =A1.cur |