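The open-source Apache Gora framework provides an in-memory data model and persistence for big data.
Gora supports column stores, key-value stores, document stores, and relational database management systems, and offers extensive Apache Hadoop MapReduce support for analyzing the data.
Steps for using Gora:
1. Configure the gora.properties file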
- gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
- gora.datastore.autocreateschema=true
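The two settings above are ordinary Java properties. Gora reads this file itself (it is normally placed on the classpath), so the following is only a minimal sketch of the key/value structure, using nothing but `java.util.Properties`. The class name `GoraPropertiesSketch` and the `parse` helper are made up for illustration and are not part of Gora.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Gora: shows how the two settings parse as plain properties.
final class GoraPropertiesSketch {

    /** Parses text written in the standard java.util.Properties format. */
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
                "gora.datastore.default=org.apache.gora.hbase.store.HBaseStore",
                "gora.datastore.autocreateschema=true");
        Properties props = parse(text);

        // The default store is a fully qualified class name.
        System.out.println(props.getProperty("gora.datastore.default"));
        // The schema flag is a boolean stored as text.
        System.out.println(Boolean.parseBoolean(props.getProperty("gora.datastore.autocreateschema")));
    }
}
```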
2. Define the data bean. The bean is described in JSON format (an Avro schema).
Create a JSON file with the following content:
- {
- "type": "record",
- "name": "Pageview",
- "namespace": "org.apache.gora.tutorial.log.generated",
- "fields" : [
- {"name": "url", "type": "string"},
- {"name": "timestamp", "type": "long"},
- {"name": "ip", "type": "string"},
- {"name": "httpMethod", "type": "string"},
- {"name": "httpStatusCode", "type": "int"},
- {"name": "responseSize", "type": "int"},
- {"name": "referrer", "type": "string"},
- {"name": "userAgent", "type": "string"}
- ]
- }
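3. Apache Gora uses the Avro framework for the entities in its object mapping. You can use the compiler that ships with Gora to compile the JSON file and generate the entity class you need:
- $ bin/gora goracompiler
The compiler prints the following usage message: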
- $ Usage: GoraCompiler <schema file> <output dir> [-license <id>]
- <schema file> - individual avsc file to be compiled or a directory path containing avsc files
- <output dir> - output directory for generated Java files
- [-license <id>] - the preferred license header to add to the
- generated Java file. Current options include;
- ASLv2 (Apache Software License v2.0)
- AGPLv3 (GNU Affero General Public License)
- CDDLv1 (Common Development and Distribution License v1.0)
- FDLv13 (GNU Free Documentation License v1.3)
- GPLv1 (GNU General Public License v1.0)
- GPLv2 (GNU General Public License v2.0)
- GPLv3 (GNU General Public License v3.0)
- LGPLv21 (GNU Lesser General Public License v2.1)
- LGPLv3 (GNU Lesser General Public License v3.0)
Example:
- $ bin/gora goracompiler gora-tutorial/src/main/avro/pageview.json gora-tutorial/src/main/java/
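4. Define the data store mapping: gora-hbase-mapping.xml
After completing the three steps above, the next task is to configure the mapping between the entity and the table.
An example: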
- <!-- This is gora-sql-mapping.xml
- <gora-orm>
- <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
- <primarykey column="line"/>
- <field name="url" column="url" length="512" primarykey="true"/>
- <field name="timestamp" column="timestamp"/>
- <field name="ip" column="ip" length="16"/>
- <field name="httpMethod" column="httpMethod" length="6"/>
- <field name="httpStatusCode" column="httpStatusCode"/>
- <field name="responseSize" column="responseSize"/>
- <field name="referrer" column="referrer" length="512"/>
- <field name="userAgent" column="userAgent" length="512"/>
- </class>
- ...
- </gora-orm>
- -->
- <gora-orm>
- <table name="Pageview"> <!-- optional descriptors for tables -->
- <family name="common"/> <!-- This can also have params like compression, bloom filters -->
- <family name="http"/>
- <family name="misc"/>
- </table>
- <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
- <field name="url" family="common" qualifier="url"/>
- <field name="timestamp" family="common" qualifier="timestamp"/>
- <field name="ip" family="common" qualifier="ip" />
- <field name="httpMethod" family="http" qualifier="httpMethod"/>
- <field name="httpStatusCode" family="http" qualifier="httpStatusCode"/>
- <field name="responseSize" family="http" qualifier="responseSize"/>
- <field name="referrer" family="misc" qualifier="referrer"/>
- <field name="userAgent" family="misc" qualifier="userAgent"/>
- </class>
- ...
- </gora-orm>
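To make the HBase mapping easier to read, here is a small sketch that parses a trimmed copy of the mapping with the JDK's DOM parser and prints the `family:qualifier` column each field maps to. It does not reproduce how Gora loads the file; `MappingSketch` and `columnsByField` are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Gora: lists field -> "family:qualifier" from a mapping document.
final class MappingSketch {

    /** Returns field name -> "family:qualifier" for every <field> element, in document order. */
    static Map<String, String> columnsByField(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList fields = doc.getElementsByTagName("field");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            result.put(field.getAttribute("name"),
                    field.getAttribute("family") + ":" + field.getAttribute("qualifier"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed copy of the mapping above (three of the eight fields).
        String xml = String.join("\n",
                "<gora-orm>",
                "  <class name='org.apache.gora.tutorial.log.generated.Pageview'",
                "         keyClass='java.lang.Long' table='AccessLog'>",
                "    <field name='url' family='common' qualifier='url'/>",
                "    <field name='httpMethod' family='http' qualifier='httpMethod'/>",
                "    <field name='referrer' family='misc' qualifier='referrer'/>",
                "  </class>",
                "</gora-orm>");
        columnsByField(xml).forEach((field, column) -> System.out.println(field + " -> " + column));
    }
}
```

Each field of the bean ends up in exactly one column family, so fields that are read together are best placed in the same family.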
5. API
1) Initialize and create the HBaseStore object
- private void init() throws IOException {
- dataStore = DataStoreFactory.getDataStore(Long.class, Pageview.class);
- }
At this point Gora creates the corresponding HBase table for you, based on the entity class compiled above and on gora-hbase-mapping.xml.
2) Storing data
- /** Stores the pageview object with the given key */
- private void storePageview(long key, Pageview pageview) throws IOException {
- dataStore.put(key, pageview);
- }
3) Reading data
- /** Fetches a single pageview object and prints it*/
- private void get(long key) throws IOException {
- Pageview pageview = dataStore.get(key);
- printPageview(pageview);
- }
4) Querying
- /** Queries and prints pageview object that have keys between startKey and endKey*/
- private void query(long startKey, long endKey) throws IOException {
- Query<Long, Pageview> query = dataStore.newQuery();
- //set the properties of query
- query.setStartKey(startKey);
- query.setEndKey(endKey);
- Result<Long, Pageview> result = query.execute();
- printResult(result);
- }
Iterating over the results
- private void printResult(Result<Long, Pageview> result) throws IOException {
- while(result.next()) { //advances the Result object and breaks if at end
- long resultKey = result.getKey(); //obtain current key
- Pageview resultPageview = result.get(); //obtain current value object
- //print the results
- System.out.println(resultKey + ":");
- printPageview(resultPageview);
- }
- System.out.println("Number of pageviews from the query:" + result.getOffset());
- }
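The key-range query can be pictured with an in-memory sorted map. The sketch below stands in for the data store with a `TreeMap` and assumes both `startKey` and `endKey` are inclusive; that is an assumption about the range semantics, so check it against the store you use. `RangeQuerySketch` and `range` are hypothetical names, and no Gora classes are involved.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a key-ordered store; not Gora code.
final class RangeQuerySketch {

    /** Returns a view of all entries with startKey <= key <= endKey (both bounds assumed inclusive). */
    static <V> NavigableMap<Long, V> range(NavigableMap<Long, V> store, long startKey, long endKey) {
        return store.subMap(startKey, true, endKey, true);
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> store = new TreeMap<>();
        store.put(1L, "/index.html");
        store.put(5L, "/about.html");
        store.put(10L, "/contact.html");
        store.put(20L, "/index.html");

        // Walk the matching entries in key order, much like the Result loop above.
        range(store, 5L, 10L).forEach((key, url) -> System.out.println(key + ": " + url));
    }
}
```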
5) Deleting data
- /**Deletes the pageview with the given line number */
- private void delete(long lineNum) throws Exception {
- dataStore.delete(lineNum);
- dataStore.flush(); //write changes may need to be flushed before they are committed
- }
- /** This method illustrates delete by query call */
- private void deleteByQuery(long startKey, long endKey) throws IOException {
- //Constructs a query from the dataStore. The matching rows to this query will be deleted
- Query<Long, Pageview> query = dataStore.newQuery();
- //set the properties of query
- query.setStartKey(startKey);
- query.setEndKey(endKey);
- dataStore.deleteByQuery(query);
- }
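Delete-by-query removes every row whose key falls in the query's range. As a rough in-memory analogy (again with a `TreeMap` in place of the data store, and again assuming inclusive bounds), clearing a sub-map view removes the matching entries from the backing map. `DeleteRangeSketch` and `deleteRange` are hypothetical names.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory analogy for delete-by-query; not Gora code.
final class DeleteRangeSketch {

    /** Removes all entries with startKey <= key <= endKey and returns how many were removed. */
    static <V> int deleteRange(NavigableMap<Long, V> store, long startKey, long endKey) {
        NavigableMap<Long, V> matching = store.subMap(startKey, true, endKey, true);
        int removed = matching.size();
        matching.clear(); // the view is backed by the store, so this deletes from the store itself
        return removed;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> store = new TreeMap<>();
        store.put(1L, "/index.html");
        store.put(5L, "/about.html");
        store.put(10L, "/contact.html");
        store.put(20L, "/index.html");

        int removed = deleteRange(store, 5L, 10L);
        System.out.println("removed " + removed + ", remaining keys " + store.keySet());
    }
}
```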
6) MapReduce support
Job:
- public Job createJob(DataStore<Long, Pageview> inStore
- , DataStore<String, MetricDatum> outStore, int numReducer) throws IOException {
- Job job = new Job(getConf());
- job.setJobName("Log Analytics");
- job.setNumReduceTasks(numReducer);
- job.setJarByClass(getClass());
- /* Mappers are initialized with GoraMapper.initMapperJob() or
- * GoraInputFormat.setInput()*/
- GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
- , LogAnalyticsMapper.class, true);
- /* Reducers are initialized with GoraReducer.initReducerJob().
- * If the output is not to be persisted via Gora, any reducer
- * can be used instead. */
- GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);
- return job;
- }
Mapper:
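- private LongWritable one = new LongWritable(1L); // constant count emitted for every pageview
- private TextLong tuple; // reusable output key; must be initialized (for example in setup()) before map() runs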
- protected void map(Long key, Pageview pageview, Context context)
- throws IOException ,InterruptedException {
- Utf8 url = pageview.getUrl();
- long day = getDay(pageview.getTimestamp());
- tuple.getKey().set(url.toString());
- tuple.getValue().set(day);
- context.write(tuple, one);
- };
Reducer:
- private MetricDatum metricDatum = new MetricDatum(); // reusable output value
- protected void reduce(TextLong tuple
- , Iterable<LongWritable> values, Context context)
- throws IOException ,InterruptedException {
- long sum = 0L; //sum up the values
- for(LongWritable value: values) {
- sum+= value.get();
- }
- String dimension = tuple.getKey().toString();
- long timestamp = tuple.getValue().get();
- metricDatum.setMetricDimension(new Utf8(dimension));
- metricDatum.setTimestamp(timestamp);
- String key = metricDatum.getMetricDimension().toString();
- metricDatum.setMetric(sum);
- context.write(key, metricDatum);
- };
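Stripped of the Hadoop plumbing, the mapper emits a count of 1 for each (URL, day) pair and the reducer sums those counts. The plain-Java sketch below runs the same aggregation in memory. It assumes `getDay` rounds a millisecond timestamp down to the start of its day, which the snippets above do not show, and the `View` class is a hypothetical stand-in for the generated `Pageview` bean.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the log analytics job; no Hadoop or Gora classes involved.
final class DailyPageviewSketch {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    /** Minimal stand-in for the generated Pageview bean (only the two fields the job uses). */
    static final class View {
        final String url;
        final long timestamp;

        View(String url, long timestamp) {
            this.url = url;
            this.timestamp = timestamp;
        }
    }

    /** Assumed behavior of getDay: round a millisecond timestamp down to the start of its day. */
    static long getDay(long timestamp) {
        return (timestamp / DAY_MILLIS) * DAY_MILLIS;
    }

    /** Returns url -> (day -> number of pageviews). */
    static Map<String, Map<Long, Long>> countPerUrlAndDay(List<View> views) {
        Map<String, Map<Long, Long>> counts = new TreeMap<>();
        for (View view : views) {
            long day = getDay(view.timestamp);               // "map": emit ((url, day), 1)
            counts.computeIfAbsent(view.url, k -> new TreeMap<>())
                  .merge(day, 1L, Long::sum);                // "reduce": sum the ones per key
        }
        return counts;
    }

    public static void main(String[] args) {
        List<View> views = Arrays.asList(
                new View("/index.html", 0L),
                new View("/index.html", 1_000L),
                new View("/index.html", DAY_MILLIS + 5L),
                new View("/about.html", 10L));
        System.out.println(countPerUrlAndDay(views));
    }
}
```

In the real job the grouping is done by Hadoop's shuffle phase rather than by a nested map, which is why the mapper only has to emit the pair and the constant `one`.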
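Besides HBase, Gora also supports SQL (MySQL, HSQL), DynamoDB, Cassandra, and Accumulo. If you need them, give the other backends a try; they are used in much the same way as described above.
Reference: http://blog.csdn.net/weijonathan/article/details/16863159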