MongoDB: convert to year and month
A year or so back I was asked to have a play with MongoDB. Within half an hour I had downloaded, installed and started the daemon, and had a console window open. After an hour or two of playing at the command line I had created a database or two, a couple of collections and a number of handcrafted JSON documents, at which point I went in search of a GUI and found RockMongo.
Another half hour of playing and I had a web-based GUI. It is great for ad-hoc queries and admin tasks, but you're still left to handcraft your own JSON documents and enter them through a textarea box. At that point I realised that if I was to evaluate the map-reduce functionality, attempt to join data from two collections, let alone identify and evaluate any business reporting tools, I would need to start cutting some code and convert an existing data source.
Data wasn't really an issue, as the company's business was data. That left a simple choice: either hook up to a DB or use one or more of the many Excel reports kicking around, and then pick a language. After a quick surf of the MongoDB site and a read of the Perl tutorial, I chose the Excel reports and Perl.
To install the necessary Perl libraries, enter:
cpan YAML Data::Dumper Spreadsheet::ParseExcel Tie::IxHash Encode Scalar::Util JSON MongoDB MongoDB::OID File::Basename
After a quick play with both the MongoDB and Spreadsheet::ParseExcel examples and a little bit of thought, I had hacked together a very basic (and slightly naughty: it blindly inserts without checking the status) command-line tool that will happily convert an Excel workbook (XLS, not XLSX) into:
A database, named after the file
A series of collections, one per worksheet in the workbook and named accordingly
A series of documents in each collection, where each document represents one row of a worksheet, with its key names taken from the cell (column) names in row 1 of the sheet
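To make that mapping concrete without needing Excel or a MongoDB server, here is a rough sketch in plain Perl. The helpers db_name_for and rows_to_documents are hypothetical. A worksheet is represented as a simple array of rows rather than a Spreadsheet::ParseExcel object, so the sketch only illustrates the naming and row-to-document rules listed above.

```perl
#!/usr/bin/perl
# Sketch of the workbook -> database, row -> document mapping using plain
# Perl data (hypothetical helpers; no Excel file or MongoDB server needed).
use strict;
use warnings;
use File::Basename qw(fileparse);

# Database name: the file name without its directory or .xls extension.
sub db_name_for {
    my ($path) = @_;
    my ($name) = fileparse($path, qr/\.xls/i);
    return $name;
}

# Turn a worksheet, given as an array of rows, into a list of documents.
# Row 0 supplies the field names and every later row becomes one document.
# Columns with a blank heading and empty cells are skipped.
# Note: plain hashes do not keep column order; the script uses Tie::IxHash for that.
sub rows_to_documents {
    my ($rows) = @_;
    my ($header, @data) = @$rows;
    my @docs;
    for my $row (@data) {
        my %doc;
        for my $i (0 .. $#$header) {
            my ($field, $value) = ($header->[$i], $row->[$i]);
            next unless defined $field && length $field;
            next unless defined $value && length $value;
            $doc{$field} = $value;
        }
        push @docs, \%doc if %doc;    # blank rows produce no document
    }
    return @docs;
}

my @docs = rows_to_documents([
    [ 'Name',  'Dept', 'Salary' ],
    [ 'Alice', 'IT',   '50000'  ],
    [ 'Bob',   '',     '42000'  ],
]);
print db_name_for('/tmp/testData.xls'), ": ", scalar(@docs), " documents\n";
```

Because the header row is consumed as field names rather than treated as data, it never turns up as a document of its own.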
Anyway, enough of a rant; here's some code.
#!/usr/bin/perl
# Purpose: Insert each Worksheet in an Excel Workbook (.xls) into a MongoDB database with the same name as the Excel file.
# The worksheet names are mapped to the collection names, and the column names to the document field names.
# Assumes each sheet is named and that the first ROW on each sheet contains the field names.
# Uses the MongoDB::MongoClient API (get_database / get_collection / insert_one); the old
# MongoDB::Connection class and $conn->$name style accessors are not in current drivers.
#
use strict;
use warnings;
use File::Basename qw(fileparse);
use Spreadsheet::ParseExcel;
use MongoDB;
use Tie::IxHash;

die "You must provide a filename to $0 to be parsed as an Excel file\n" unless @ARGV;
my $sFile = $ARGV[0];
# DB name = file name without its directory or .xls extension
my ($sDbName) = fileparse($sFile, qr/\.xls/i);

my $oExcel = Spreadsheet::ParseExcel->new;
my $oBook  = $oExcel->parse($sFile) or die "Cannot parse $sFile: ", $oExcel->error(), "\n";

# Change the host to suit; add credentials if your server requires them
my $oConn = MongoDB::MongoClient->new(host => 'some.server:27017');
my $oDB   = $oConn->get_database($sDbName);

print "FILE :", $sFile, "\n";
print "DB: $sDbName\n";
print "Collection Count :", $oBook->worksheet_count(), "\n";

for my $oWkS ($oBook->worksheets()) {
    my $sColName = $oWkS->get_name();
    my $oCol     = $oDB->get_collection($sColName);
    print "Collection(WorkSheet name):", $sColName, "\n";

    my ($iMinRow, $iMaxRow) = $oWkS->row_range();
    my ($iMinCol, $iMaxCol) = $oWkS->col_range();

    # The first row (MinRow) holds the field names; blank headings are skipped
    my %hFieldName;
    for my $iC ($iMinCol .. $iMaxCol) {
        my $oCell = $oWkS->get_cell($iMinRow, $iC);
        my $sName = $oCell ? $oCell->value() : '';
        $hFieldName{$iC} = $sName if defined $sName && length $sName;
    }

    my $iInserted = 0;
    # Start after the header row so the headings are not inserted as a document
    for my $iR ($iMinRow + 1 .. $iMaxRow) {
        my $oDoc = Tie::IxHash->new;    # keeps the fields in column order
        for my $iC (sort { $a <=> $b } keys %hFieldName) {
            my $oCell = $oWkS->get_cell($iR, $iC);
            $oDoc->Push($hFieldName{$iC} => $oCell->value()) if $oCell;
        }
        next unless $oDoc->Length;      # skip completely empty rows
        $oCol->insert_one($oDoc);       # no explicit status check, but the driver dies on a failed write
        $iInserted++;
    }
    print "Documents inserted(Rows):", $iInserted, "\n";
}
Change the connection ($oConn) host string to suit and, if needed, add a user ID and password to the arguments.
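If the server needs authentication, one option is to put the credentials into a mongodb:// connection string, which MongoDB::MongoClient accepts as its host argument. Characters such as @ and : in a username or password must be percent-encoded first. This is a minimal sketch. The mongo_uri and uri_escape_component helpers, the example credentials and the authSource value of admin are all assumptions to adjust for your own setup.

```perl
#!/usr/bin/perl
# Hypothetical helpers: build a mongodb:// URI with optional, percent-encoded credentials.
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_component {
    my ($s) = @_;
    utf8::encode($s);    # operate on UTF-8 bytes
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

sub mongo_uri {
    my (%opt) = @_;
    my $host = $opt{host} // 'localhost:27017';
    my $auth = '';
    if (defined $opt{username}) {
        $auth = uri_escape_component($opt{username});
        $auth .= ':' . uri_escape_component($opt{password}) if defined $opt{password};
        $auth .= '@';
    }
    my $uri = "mongodb://$auth$host";
    # authSource names the database holding the user (assumed "admin" below)
    $uri .= '/?authSource=' . uri_escape_component($opt{auth_source}) if defined $opt{auth_source};
    return $uri;
}

my $uri = mongo_uri(
    host        => 'some.server:27017',
    username    => 'reporter',     # hypothetical account
    password    => 'p@ss:word',    # hypothetical password
    auth_source => 'admin',
);
print "$uri\n";
# The script would then use: MongoDB::MongoClient->new(host => $uri)
```

Running it prints mongodb://reporter:p%40ss%3Aword@some.server:27017/?authSource=admin. Keep real passwords out of the script itself, for example by reading them from the environment.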
If you need XLSX support, a quick switch to Spreadsheet::XLSX is all that's needed. Alternatively, it only takes a few lines of code to detect the file type and call the appropriate library.
The above is a simple hack that assumes everything in a cell is a string/scalar. If preserving type is important, a little function with a few regexps, used in conjunction with a few if statements, can ensure numbers and dates remain in the appropriate format when written to the DB.
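Both of those extensions are small. Below is a rough sketch of each, with both helper names hypothetical. workbook_type guesses the format from the file's leading bytes rather than trusting the extension: the OLE2 compound-document signature for .xls, and the ZIP signature for .xlsx. coerce_value turns plain numbers into Perl numbers. It assumes slash-separated dates are day-first (dd/mm/yyyy) and rewrites them as ISO 8601 strings.

```perl
#!/usr/bin/perl
# Hypothetical helpers: pick a parser by file signature, and coerce cell text to a better type.
use strict;
use warnings;

# Guess the workbook format from its first bytes; returns 'xls', 'xlsx' or undef.
sub workbook_type {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $read = read($fh, my $magic, 8);
    close $fh;
    return 'xls'  if defined $read && $read == 8 && $magic eq "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
    return 'xlsx' if defined $read && $read >= 4 && substr($magic, 0, 4) eq "PK\x03\x04";
    return undef;
}

# Parser module to load for each detected format
my %parser_for = (xls => 'Spreadsheet::ParseExcel', xlsx => 'Spreadsheet::XLSX');

# - integers/decimals without leading zeros become numbers ("007" stays text, e.g. codes)
# - dd/mm/yyyy dates (day-first is an assumption) become "yyyy-mm-dd" strings
#   (basic range check only; 31/02 is not rejected)
# - anything else is returned unchanged
sub coerce_value {
    my ($s) = @_;
    return $s unless defined $s;
    return 0 + $s if $s =~ /^-?(?:0|[1-9]\d*)(?:\.\d+)?$/;
    if ($s =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}) {
        my ($d, $m, $y) = ($1, $2, $3);
        return sprintf('%04d-%02d-%02d', $y, $m, $d)
            if $m >= 1 && $m <= 12 && $d >= 1 && $d <= 31;
    }
    return $s;
}
```

In the script, coerce_value would wrap each $oCell->value() before the Push. For .xls files, Spreadsheet::ParseExcel cells also provide type() and unformatted(). Those may be a more reliable starting point than regexps.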
Apparently the command line is scary, so if you're asked to share it, consider wrapping your logic in a CGI script / upload form :)
The script should return some output along the following lines:
arober11@wibble:~/src/perl> ./mongoTST.pl testData.xls
FILE :testData.xls
DB: testData
Collection Count :1
Collection(WorkSheet name):Sheet1
Documents inserted(Rows):244
arober11@wibble:~/src/perl>