Use case
As the title says: production data streams continuously into a Kinesis stream, where it is matched against data already cached in Redis. When a match is found, the specified field is replaced and the record is written back into a Kinesis stream, so that Firehose can pick it up on AWS and deliver it to an S3 bucket for further processing.
Depending on the data volume and other factors, there are two ways to write data into Kinesis: put_record and put_records.
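The two write paths differ mainly in batching. A minimal sketch of both, assuming `kinesis` is a boto3 Kinesis client (e.g. `boto3.client('kinesis')`), the payloads are dicts, and the vin is used as the partition key (the partition-key choice and helper names here are illustrative, not from the original code):

```python
import json

KINESIS_MAX_BATCH_SIZE = 500  # PutRecords accepts at most 500 records per call

def chunk_records(records, size=KINESIS_MAX_BATCH_SIZE):
    """Split prepared record entries into PutRecords-sized batches."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def send_one_by_one(kinesis, stream_name, payloads):
    """put_record: one HTTP round trip per record -- simplest, fine for low volume."""
    for p in payloads:
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(p).encode('utf-8'),
            PartitionKey=p['vin'],
        )

def send_batched(kinesis, stream_name, payloads):
    """put_records: up to 500 records per round trip -- better for high volume."""
    entries = [{'Data': json.dumps(p).encode('utf-8'), 'PartitionKey': p['vin']}
               for p in payloads]
    for batch in chunk_records(entries):
        kinesis.put_records(StreamName=stream_name, Records=batch)
```

Note that put_records can partially fail (check `FailedRecordCount` in the response), so batched callers usually retry the failed subset.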
In production, data entering the Kinesis stream is KMS-encrypted, base64-encoded, and Avro-serialized; during testing, the payloads were just strings randomly generated by a script for each field of the avro.schema. None of that is needed here, so I took the crude approach of hard-coding an array of field names matching the values in the stream data. Since I need to look up the value of vin in each record, I convert the values into a dict keyed by those field names (rather than accessing by index). The vin is then matched against the vin values loaded into Redis, and on a successful match the vin in the Kinesis record is replaced with the vehicle_id stored under that vin in Redis (the vin lookup itself confirms the match first).
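The field-array-to-dict matching described above can be sketched as follows. `KEY_ARR` is abbreviated here, and `cache` stands in for the Redis client — anything with a `.get(vin)` method works, e.g. `redis.Redis(decode_responses=True)`, so a plain dict suffices for local testing:

```python
# Abbreviated field list; the real one mirrors the avro.schema fields.
KEY_ARR = ['details_timestamp', 'details_id', 'details_source', 'vin']

def replace_vin(decoded_message, cache):
    """Zip the comma-separated values against the field names, then swap the
    vin for the cached vehicle_id when the vin is found in the cache."""
    record = dict(zip(KEY_ARR, decoded_message.split(',')))
    vehicle_id = cache.get(record['vin'])  # vin lookup confirms the match
    if vehicle_id is not None:
        record['vin'] = vehicle_id         # replace vin with vehicle_id
    return record
```

Records whose vin is not cached pass through unchanged, which keeps the handler idempotent for unmatched data.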
Code example:
import base64
from io import StringIO, BytesIO
import json
import boto3
import redis
import datetime
import os
import time
import multiprocessing
import logging
import hashlib
import uuid
import aws_kinesis_agg.aggregator
from avro.io import DatumReader, BinaryDecoder
import avro.schema
import re
import aws_encryption_sdk
# Create a KMS master key provider.
#master_key_provider = aws_encryption_sdk.KMSMasterKeyProvider(**dict(key_ids=[os.environ.get("KMS_KEY_ID")]))
# Getting the Avro schema
#with open('vehiclestate.avdl','r') as f:
#    schema = avro.schema.Parse(f.read())
#def decrypt_message(message):
#    plaintext, decrypt_header = aws_encryption_sdk.decrypt(
#        source=message,
#        key_provider=master_key_provider
#    )
#    return plaintext
#def avro_decode(message, SCHEMA):
#    reader = DatumReader(SCHEMA)
#    message_bytes = BytesIO(message)
#    decoder = BinaryDecoder(message_bytes)
#    event_dict = reader.read(decoder)
#    return event_dict
#Set logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)
#Get clients from boto3
firehose = boto3.client('firehose')
s3 = boto3.client('s3')
kinesis = boto3.client('kinesis')
#Maximum of 500 records per batch (Kinesis PutRecords limit)
KINESIS_MAX_BATCH_SIZE = 500
#Get current date (module scope: evaluated once per Lambda container, at cold start)
date = datetime.datetime.now()
def lambda_handler(event, context):
    # Logger output
    logger.info(f"Starting processing events from kinesis. Total {len(event['Records'])}")
    # Redis client to query ElastiCache
    cache_client = transform_vin_number_init_()
    # Counter
    record_counter = 0
    # Event json buffer
    json_records = []
    # Go through the records sent by Kinesis
    for record in event['Records']:
        # Get Kinesis approximateArrivalTimestamp
        aws_arrival_timestamp = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(record['kinesis']['approximateArrivalTimestamp']))
        # Counter adjust
        record_counter += 1
        # Time start
        start_time = time.time()
        # Base64-decode the incoming payload to a UTF-8 string (KMS decryption is disabled above)
        decrypted_message = str(base64.b64decode(record['kinesis']['data']), 'utf-8')
        message_arr = decrypted_message.split(',')
        key_arr = ['details_timestamp','details_id','details_source','vin','trigger','servicetype','messagename','messagerecipient',