Avro技术应用_6. Avro Format & Text Format 之间的转换 --待完善

本文将跟大家探讨一下,Avro 数据格式与文本文件格式直接的转换方法。

具体内容将会在后续进行完善,敬请期待:
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
greenplum-db-6.2.1-rhel7-x86_64.rpm Pivotal Greenplum 6.2 Release Notes This document contains pertinent release information about Pivotal Greenplum Database 6.2 releases. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see Pivotal Greenplum Database end of life policy. Pivotal Greenplum 6 software is available for download from the Pivotal Greenplum page on Pivotal Network. Pivotal Greenplum 6 is based on the open source Greenplum Database project code. Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support. Release 6.2.1 Release Date: 2019-12-12 Pivotal Greenplum 6.2.1 is a minor release that includes new features and resolves several issues. New Features Greenplum Database 6.2.1 includes these new features: Greenplum Database supports materialized views. Materialized views are similar to views. A materialized view enables you to save a frequently used or complex query, then access the query results in a SELECT statement as if they were a table. Materialized views persist the query results in a table-like form. Materialized view data cannot be directly updated. To refresh the materialized view data, use the REFRESH MATERIALIZED VIEW command. See Creating and Managing Materialized Views. Note: Known Issues and Limitations describes a limitation of materialized view support in Greenplum 6.2.1. The gpinitsystem utility supports the --ignore-warnings option. The option controls the value returned by gpinitsystem when warnings or an error occurs. If you specify this option, gpinitsystem returns 0 if warnings occurred during system initialization, and returns a non-zero value if a fatal error occurs. If this option is not specified, gpinitsystem returns 1 if initialization completes with warnings, and returns value of 2 or greater if a fatal error occurs. PXF version 5.10.0 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.10.0 below. PXF Version 5.10.0 PXF 5.10.0 includes the following new and changed features: PXF has improved its performance when reading a large number of files from HDFS or an object store. PXF bundles newer tomcat and jackson libraries. The PXF JDBC Connector now supports pushdown of OR and NOT logical filter operators when specified in a JDBC named query or in an external table query filter condition. PXF supports writing Avro-format data to Hadoop and object stores. Refer to Reading and Writing HDFS Avro Data for more information about this feature. PXF is now certified with Hadoop 2.x and 3.1.x and Hive Server 2.x and 3.1, and bundles new and upgraded Hadoop libraries to support these versions. PXF supports Kerberos authentication to Hive Server 2.x and 3.1.x. PXF supports per-server user impersonation configuration. PXF supports concurrent access to multiple Kerberized Hadoop clusters. In previous releases of Greenplum Database, PXF supported accessing a single Hadoop cluster secured with Kerberos, and this Hadoop cluster must have been configured as the default PXF server. PXF introduces a new template file, pxf-site.xml, to specify the Kerberos and impersonation property settings for a Hadoop or JDBC server configuration. Refer to About Kerberos and User Impersonation Configuration (pxf-site.xml) for more information about this file. PXF now supports connecting to Hadoop with a configurable Hadoop user identity. PXF previously supported only proxy access to Hadoop via the gpadmin Greenplum user. PXF version 5.10.0 deprecates the following configuration properties. Note: These property settings continue to work. The PXF_USER_IMPERSONATION, PXF_PRINCIPAL, and PXF_KEYTAB settings in the pxf-env.sh file. You can use the pxf-site.xml file to configure Kerberos and impersonation settings for your new Hadoop server configurations. The pxf.impersonation.jdbc property setting in the jdbc-site.xml file. You can use the pxf.service.user.impersonation property to configure user impersonation for a new JDBC server configuration. Note: If you have previously configured a PXF JDBC server to access Kerberos-secured Hive, you must upgrade the server definition. See Upgrading PXF in Greenplum 6.x for more information. Changed Features Greenplum Database 6.2.1 includes these changed features: Greenplum Stream Server version 1.3.1 is included in the Greenplum distribution. Resolved Issues Pivotal Greenplum 6.2.1 is a minor release that resolves these issues: 29454 - gpstart During Greenplum Database start up, the gpstart utility did not report when a segment instance failed to start. The utility always displayed 0 skipped segment starts. This issue has been resolved. gpstart output was also enhanced to provide additional warnings and summary information about the number of skipped segments. For example: [WARNING]:-********
Avro是一个轻量级的数据序列化框架,同时也提供了RPC功能。Avro提供了一个基于JSON的schema定义文件来描述数据结构,使得Avro能够支持动态的数据类型。Avro还提供了一个名为avro-rpc的模块,用于实现基于Avro的RPC。 下面我们来对avro-rpc进行性能测试。我们将使用Python 3.7作为客户端和服务端编程语言,并使用Apache Bench来进行压力测试。 首先,我们需要安装avroavro-rpc模块: ``` pip install avro pip install avro-rpc ``` 接下来,我们编写一个简单的RPC服务端程序: ```python import avro.protocol import avro.ipc import socket PROTOCOL = avro.protocol.parse(open("test.avpr").read()) class RpcServer(object): def __init__(self, host, port): self.server = avro.ipc.HTTPServer(self.handle_request) self.server.add_listener((host, port)) def handle_request(self, request, protocol): message_name = request.message_name request_params = request.request_params print("Received request: {} {}".format(message_name, request_params)) if message_name == "ping": return "pong" elif message_name == "echo": return request_params else: raise avro.AvroRemoteException("Unknown message: {}".format(message_name)) def serve_forever(self): self.server.start() self.server.join() if __name__ == "__main__": server = RpcServer("localhost", 8080) server.serve_forever() ``` 这个RPC服务端程序会监听localhost的8080端口,并实现了两个RPC方法:ping和echo。当客户端调用ping方法时,服务端会返回字符串“pong”;当客户端调用echo方法时,服务端会返回客户端传递的参数。 接下来,我们编写一个简单的RPC客户端程序: ```python import avro.protocol import avro.ipc import socket PROTOCOL = avro.protocol.parse(open("test.avpr").read()) class RpcClient(object): def __init__(self, host, port): self.transceiver = avro.ipc.HTTPTransceiver((host, port)) self.requestor = avro.ipc.Requestor(PROTOCOL, self.transceiver) def ping(self): return self.requestor.request("ping", []) def echo(self, message): return self.requestor.request("echo", [message]) if __name__ == "__main__": client = RpcClient("localhost", 8080) print(client.ping()) print(client.echo("Hello, world!")) ``` 这个RPC客户端程序会连接到localhost的8080端口,并调用服务端实现的ping和echo方法。 接下来,我们使用Apache Bench来进行压力测试: ``` ab -n 10000 -c 10 http://localhost:8080/ ``` 这个命令会模拟10个并发连接,总共发送10000个请求。我们可以通过修改-n和-c参数来改变测试规模。 测试结果如下: ``` Server Software: Server Hostname: localhost Server Port: 8080 Document Path: / Document Length: 4 bytes Concurrency Level: 10 Time taken for tests: 7.194 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1830000 bytes HTML transferred: 40000 bytes Requests per second: 1390.36 [#/sec] (mean) Time per request: 7.194 [ms] (mean) Time per request: 0.719 [ms] (mean, across all concurrent requests) Transfer rate: 248.18 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.3 0 14 Processing: 1 7 2.3 7 24 Waiting: 1 7 2.3 7 24 Total: 1 7 2.3 7 24 Percentage of the requests served within a certain time (ms) 50% 7 66% 8 75% 8 80% 8 90% 10 95% 12 98% 15 99% 17 100% 24 (longest request) ``` 从测试结果中可以看出,avro-rpc在处理10000个请求时,平均每个请求处理时间为7.194毫秒。每秒处理请求数为1390.36,处理速度较快,而且没有出现失败的请求。因此,我们可以认为avro-rpc是一个性能良好的轻量级RPC框架。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值