AWS in Practice: Syncing Data from DynamoDB to Redshift

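This article explains in detail how to sync data from DynamoDB to Redshift on AWS using a Glue ETL Job, a Glue Streaming Job, and Kinesis Firehose. It covers resource preparation (such as creating the VPC and the Redshift Cluster), the scenarios each sync approach suits, their pros and cons, resource deployment, testing, and the pitfalls encountered in practice together with their solutions, giving readers a complete hands-on guide.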

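Introduction to AWS DynamoDB

  • Amazon DynamoDB is a fully managed, serverless NoSQL key-value database designed to run high-performance applications at any scale.
  • DynamoDB delivers consistent single-digit-millisecond response times at any scale, and its storage is virtually unlimited, so performance stays reliable as data grows.
  • DynamoDB provides built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data export tools.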

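Introduction to Redshift

  • Amazon Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service. You can start with a few hundred GB of data and later scale up to petabytes.
  • Redshift is an OLAP (Online Analytical Processing) system: it supports complex analytical operations, is geared toward decision support, and returns query results that are easy to interpret.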

Resource Preparation

VPC

  • VPC
    • CIDR block: 10.10.0.0/16
  • internet gateway
  • elastic ip address
  • nat gateway: uses the elastic ip address as its public IP
  • public subnet
    • three Availability Zones
  • private subnet
    • three Availability Zones
  • public route table: the route table associated with the public subnets
    • destination: 0.0.0.0/0, target: internet-gateway-id (allows communication with the outside world)
    • destination: 10.10.0.0/16, target: local (internal communication)
  • private route table: the route table associated with the private subnets
    • destination: 10.10.0.0/16, target: local (internal communication)
    • destination: 0.0.0.0/0, target: nat-gateway-id (allows resources inside the VPC to reach the outside world)
  • web server security group
    • Allows any IP to access port 443
    • Allows your own IP to access port 22, so that you can SSH into the server and insert data into the database
  • glue redshift connection security group
    • Contains a single self-referencing rule that allows the same security group to access all TCP ports
    • This security group is required when creating the Glue connection:
    • Reference: the Glue connection security group must have a self-referencing rule to allow AWS Glue components to communicate. Specifically, add or confirm that there is a rule of Type All TCP, Protocol is TCP, Port Range includes all ports, and whose Source is the same security group name as the Group ID.
  • private redshift security group
    • Allows access to port 5439 from inside the VPC (the rule uses 10.10.0.0/24, which covers only the first public subnet of the 10.10.0.0/16 VPC)
    • Allows the glue connection security group to access port 5439
  • public redshift security group
    • Allows access to port 5439 from inside the VPC (the same 10.10.0.0/24 range)
    • Allows the public IP range of Kinesis Firehose in your region to access port 5439
      • 13.58.135.96/27 for US East (Ohio)
      • 52.70.63.192/27 for US East (N. Virginia)
      • 13.57.135.192/27 for US West (N. California)
      • 52.89.255.224/27 for US West (Oregon)
      • 18.253.138.96/27 for AWS GovCloud (US-East)
      • 52.61.204.160/27 for AWS GovCloud (US-West)
      • 35.183.92.128/27 for Canada (Central)
      • 18.162.221.32/27 for Asia Pacific (Hong Kong)
      • 13.232.67.32/27 for Asia Pacific (Mumbai)
      • 13.209.1.64/27 for Asia Pacific (Seoul)
      • 13.228.64.192/27 for Asia Pacific (Singapore)
      • 13.210.67.224/27 for Asia Pacific (Sydney)
      • 13.113.196.224/27 for Asia Pacific (Tokyo)
      • 52.81.151.32/27 for China (Beijing)
      • 161.189.23.64/27 for China (Ningxia)
      • 35.158.127.160/27 for Europe (Frankfurt)
      • 52.19.239.192/27 for Europe (Ireland)
      • 18.130.1.96/27 for Europe (London)
      • 35.180.1.96/27 for Europe (Paris)
      • 13.53.63.224/27 for Europe (Stockholm)
      • 15.185.91.0/27 for Middle East (Bahrain)
      • 18.228.1.128/27 for South America (São Paulo)
      • 15.161.135.128/27 for Europe (Milan)
      • 13.244.121.224/27 for Africa (Cape Town)
      • 13.208.177.192/27 for Asia Pacific (Osaka)
      • 108.136.221.64/27 for Asia Pacific (Jakarta)
      • 3.28.159.32/27 for Middle East (UAE)
      • 18.100.71.96/27 for Europe (Spain)
      • 16.62.183.32/27 for Europe (Zurich)
      • 18.60.192.128/27 for Asia Pacific (Hyderabad)
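
The Firehose range has to match the region the stack is deployed to. As a rough sketch (not part of the original template), the range could be selected per region with a CloudFormation mapping instead of being hard-coded. The mapping name FirehoseCidrByRegion and the resource name PublicRedshiftFirehoseInboundRule are hypothetical, only a handful of regions from the list are included, and the fragment assumes a security group resource named PublicRedshiftSecurityGroup exists in the same stack.

```yaml
resources:
  Mappings:
    # Hypothetical mapping; values copied from the list above.
    # Extend it with every region you deploy to.
    FirehoseCidrByRegion:
      us-east-1:
        Cidr: "52.70.63.192/27"
      us-west-2:
        Cidr: "52.89.255.224/27"
      ap-southeast-1:
        Cidr: "13.228.64.192/27"
      ap-northeast-1:
        Cidr: "13.113.196.224/27"
      eu-west-1:
        Cidr: "52.19.239.192/27"

  Resources:
    # Hypothetical standalone ingress rule for the public Redshift security group.
    PublicRedshiftFirehoseInboundRule:
      Type: "AWS::EC2::SecurityGroupIngress"
      Properties:
        GroupId: !GetAtt PublicRedshiftSecurityGroup.GroupId
        IpProtocol: tcp
        FromPort: 5439
        ToPort: 5439
        # Looks up the Firehose range for the region of the current stack.
        CidrIp: !FindInMap [FirehoseCidrByRegion, !Ref "AWS::Region", Cidr]
```

If a rule like this is used, the fixed Firehose CidrIp entry inside the security group itself would be dropped, so the range is not declared twice. Deploying to a region that is missing from the mapping fails at stack creation, which makes a forgotten range visible early.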

The serverless file containing all VPC resources:

  • Replace custom:bucketNamePrefix with the prefix of a bucket you have created
  • Deploy with the following command: sls deploy -c vpc.yml
  • vpc.yml
    service: dynamodb-to-redshift-vpc
    
    custom:
      bucketNamePrefix: "jessica"
    
    provider:
      name: aws
      region: ${opt:region, "ap-southeast-1"}
      stackName: ${self:service}
      deploymentBucket:
        name: com.${self:custom.bucketNamePrefix}.deploy-bucket
        serverSideEncryption: AES256
    
    resources:
      Parameters:
        VpcName:
          Type: String
          Default: "test-vpc"
    
      Resources:
        VPC:
          Type: "AWS::EC2::VPC"
          Properties:
            CidrBlock: "10.10.0.0/16"
            EnableDnsSupport: true
            EnableDnsHostnames: true
            InstanceTenancy: default
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}"
        # Internet Gateway
        InternetGateway:
          Type: "AWS::EC2::InternetGateway"
          Properties:
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_InternetGateway"
        VPCGatewayAttachment:
          Type: "AWS::EC2::VPCGatewayAttachment"
          Properties:
            VpcId: !Ref VPC
            InternetGatewayId: !Ref InternetGateway
    
        # web server security group
        WebServerSecurityGroup:
          Type: AWS::EC2::SecurityGroup
          Properties:
            GroupDescription: Allow access from public
            VpcId: !Ref VPC
            SecurityGroupIngress:
              - IpProtocol: tcp
                FromPort: 443
                ToPort: 443
                CidrIp: "0.0.0.0/0"
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_WebServerSecurityGroup"
    
        # public route table
        RouteTablePublic:
          Type: "AWS::EC2::RouteTable"
          Properties:
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_RouteTablePublic"
        RouteTablePublicInternetRoute:
          Type: "AWS::EC2::Route"
          DependsOn: VPCGatewayAttachment
          Properties:
            RouteTableId: !Ref RouteTablePublic
            DestinationCidrBlock: "0.0.0.0/0"
            GatewayId: !Ref InternetGateway
    
        # public subnet
        SubnetAPublic:
          Type: "AWS::EC2::Subnet"
          Properties:
            AvailabilityZone: !Select [0, !GetAZs ""]
            CidrBlock: "10.10.0.0/24"
            MapPublicIpOnLaunch: true
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_SubnetAPublic"
        RouteTableAssociationAPublic:
          Type: "AWS::EC2::SubnetRouteTableAssociation"
          Properties:
            SubnetId: !Ref SubnetAPublic
            RouteTableId: !Ref RouteTablePublic
    
        SubnetBPublic:
          Type: "AWS::EC2::Subnet"
          Properties:
            AvailabilityZone: !Select [1, !GetAZs ""]
            CidrBlock: "10.10.32.0/24"
            MapPublicIpOnLaunch: true
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_SubnetBPublic"
        RouteTableAssociationBPublic:
          Type: "AWS::EC2::SubnetRouteTableAssociation"
          Properties:
            SubnetId: !Ref SubnetBPublic
            RouteTableId: !Ref RouteTablePublic
    
        SubnetCPublic:
          Type: "AWS::EC2::Subnet"
          Properties:
            AvailabilityZone: !Select [2, !GetAZs ""]
            CidrBlock: "10.10.64.0/24"
            MapPublicIpOnLaunch: true
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_SubnetCPublic"
        RouteTableAssociationCPublic:
          Type: "AWS::EC2::SubnetRouteTableAssociation"
          Properties:
            SubnetId: !Ref SubnetCPublic
            RouteTableId: !Ref RouteTablePublic
    
        # private redshift security group
        PrivateRedshiftSecurityGroup:
          Type: AWS::EC2::SecurityGroup
          Properties:
            GroupDescription: Allow access from inside vpc
            VpcId: !Ref VPC
            SecurityGroupIngress:
              - IpProtocol: tcp
                FromPort: 5439
                ToPort: 5439
                CidrIp: 10.10.0.0/24
              - IpProtocol: tcp
                FromPort: 5439
                ToPort: 5439
                SourceSecurityGroupId: !GetAtt GlueRedshiftConnectionSecurityGroup.GroupId
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_PrivateRedshiftSecurityGroup"
        # public redshift security group
        PublicRedshiftSecurityGroup:
          Type: AWS::EC2::SecurityGroup
          Properties:
            GroupDescription: Allow access from inside vpc and Kinesis Data Firehose CIDR block
            VpcId: !Ref VPC
            SecurityGroupIngress:
              - IpProtocol: tcp
                FromPort: 5439
                ToPort: 5439
                CidrIp: 10.10.0.0/24
              - IpProtocol: tcp
                FromPort: 5439
                ToPort: 5439
                CidrIp: 13.228.64.192/27
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_PublicRedshiftSecurityGroup"
        GlueRedshiftConnectionSecurityGroup:
          Type: AWS::EC2::SecurityGroup
          Properties:
            GroupDescription: Allow self referring for all tcp ports
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_GlueRedshiftConnectionSecurityGroup"
        GlueRedshiftConnectionSecurityGroupSelfReferringInboundRule:
          Type: "AWS::EC2::SecurityGroupIngress"
          Properties:
            GroupId: !GetAtt GlueRedshiftConnectionSecurityGroup.GroupId
            IpProtocol: tcp
            FromPort: 0
            ToPort: 65535
            SourceSecurityGroupId: !GetAtt GlueRedshiftConnectionSecurityGroup.GroupId
            SourceSecurityGroupOwnerId: !Sub "${aws:accountId}"
        # nat gateway
        EIP:
          Type: "AWS::EC2::EIP"
          Properties:
            Domain: vpc
        NatGateway:
          Type: "AWS::EC2::NatGateway"
          Properties:
            AllocationId: !GetAtt "EIP.AllocationId"
            SubnetId: !Ref SubnetAPublic
    
        # private route table
        RouteTablePrivate:
          Type: "AWS::EC2::RouteTable"
          Properties:
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_RouteTablePrivate"
        RouteTablePrivateRoute:
          Type: "AWS::EC2::Route"
          Properties:
            RouteTableId: !Ref RouteTablePrivate
            DestinationCidrBlock: "0.0.0.0/0"
            NatGatewayId: !Ref NatGateway
    
        # private subnet
        SubnetAPrivate:
          Type: "AWS::EC2::Subnet"
          Properties:
            AvailabilityZone: !Select [0, !GetAZs ""]
            CidrBlock: "10.10.16.0/24"
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_SubnetAPrivate"
        RouteTableAssociationAPrivate:
          Type: "AWS::EC2::SubnetRouteTableAssociation"
          Properties:
            SubnetId: !Ref SubnetAPrivate
            RouteTableId: !Ref RouteTablePrivate
        SubnetBPrivate:
          Type: "AWS::EC2::Subnet"
          Properties:
            AvailabilityZone: !Select [1, !GetAZs ""]
            CidrBlock: "10.10.48.0/24"
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_SubnetBPrivate"
        RouteTableAssociationBPrivate:
          Type: "AWS::EC2::SubnetRouteTableAssociation"
          Properties:
            SubnetId: !Ref SubnetBPrivate
            RouteTableId: !Ref RouteTablePrivate
        SubnetCPrivate:
          Type: "AWS::EC2::Subnet"
          Properties:
            AvailabilityZone: !Select [2, !GetAZs ""]
            CidrBlock: "10.10.80.0/24"
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                Value: !Sub "VPC_${VpcName}_SubnetCPrivate"
        RouteTableAssociationCPrivate:
          Type: "AWS::EC2::SubnetRouteTableAssociation"
          Properties:
            SubnetId: !Ref SubnetCPrivate
            RouteTableId: !Ref RouteTablePrivate
    
      Outputs:
        VPC:
          Description: "VPC."
          Value: !Ref VPC
          Export:
            Name: !Sub "${self:provider.stackName}"
        SubnetsPublic:
          Description: "Subnets public."
          Value:
            !Join [
              ",",
              [!Ref SubnetAPublic, !Ref SubnetBPublic, !Ref SubnetCPublic],
            ]
          Export:
            Name: !Sub "${self:provider.stackName}-PublicSubnets"
        SubnetsPrivate:
          Description: "Subnets private."
          Value:
            !Join [
              ",",
              [!Ref SubnetAPrivate, !Ref SubnetBPrivate, !Ref SubnetCPrivate],
            ]
          Export:
            Name: !Sub "${self:provider.stackName}-PrivateSubnets"
        DefaultSecurityGroup:
          Description: "VPC Default Security Group"
          Value: !GetAtt VPC.DefaultSecurityGroup
          Export:
            Name: !Sub "${self:provider.stackName}-DefaultSecurityGroup"
        WebServerSecurityGroup:
          Description: "VPC Web Server Security Group"
          Value: !Ref WebServerSecurityGroup
          Export:
            Name: !Sub "${self:provider.stackName}-WebServerSecurityGroup"
        PrivateRedshiftSecurityGroup:
          Description: "The id of the RedshiftSecurityGroup"
          Value: !Ref PrivateRedshiftSecurityGroup
          Export:
            Name: !Sub "${self:provider.stackName}-PrivateRedshiftSecurityGroup"
        PublicRedshiftSecurityGroup:
          Description: "The id of the RedshiftSecurityGroup"
          Value: !Ref PublicRedshiftSecurityGroup
          Export:
            Name: !Sub "${self:provider.stackName}-PublicRedshiftSecurityGroup"
        GlueRedshiftConnectionSecurityGroup:
          Description: "The id of the self referring security group"
          Value: !Ref GlueRedshiftConnectionSecurityGroup
          Export:
            Name: !Sub "${self:provider.stackName}-GlueSelfRefringSecurityGroup"
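
The resource list earlier describes an SSH rule on port 22 for the web server security group, while the template above opens only port 443 on WebServerSecurityGroup. One possible way to add the missing rule is sketched below. The parameter MyIpCidr, its placeholder default, and the resource name WebServerSshInboundRule are assumptions for illustration and do not come from the original template.

```yaml
resources:
  Parameters:
    # Hypothetical parameter: your own public IP in CIDR form.
    # The default is a documentation-range placeholder; replace it.
    MyIpCidr:
      Type: String
      Default: "203.0.113.10/32"

  Resources:
    # Hypothetical rule: allows SSH to the web server from a single address.
    WebServerSshInboundRule:
      Type: "AWS::EC2::SecurityGroupIngress"
      Properties:
        GroupId: !GetAtt WebServerSecurityGroup.GroupId
        IpProtocol: tcp
        FromPort: 22
        ToPort: 22
        CidrIp: !Ref MyIpCidr
```

Keeping the rule as a separate AWS::EC2::SecurityGroupIngress resource mirrors how the template already declares the self-referencing Glue rule, and a /32 range limits SSH access to one address.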
    

Redshift Cluster

  • Private Cluster subnet group
    • Create a private subnet group that contains the private subnets
  • Private Cluster: used to test syncing data to Redshift with a Glue job. PubliclyAccessible must be set to false; otherwise the Glue job cannot connect
    • ClusterSubnetGroupName
      • Use the private subnet group
    • VpcSecurityGroupIds
      • Use privat
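
The private cluster settings listed so far could be expressed roughly as follows. This is only a sketch under stated assumptions, not the author's file: the resource names, the MasterUserPassword parameter, the node type, the database name, and the user name are placeholders. The two import names assume the VPC stack was deployed with its default stack name, dynamodb-to-redshift-vpc, and the choice of PrivateRedshiftSecurityGroup for the cluster is likewise an assumption.

```yaml
resources:
  Parameters:
    # Hypothetical parameter; supply the value at deploy time
    # rather than committing it to the file.
    MasterUserPassword:
      Type: String
      NoEcho: true

  Resources:
    # Subnet group built from the private subnets exported by the VPC stack.
    PrivateClusterSubnetGroup:
      Type: "AWS::Redshift::ClusterSubnetGroup"
      Properties:
        Description: Private subnets for the Redshift cluster
        SubnetIds: !Split [",", !ImportValue dynamodb-to-redshift-vpc-PrivateSubnets]

    PrivateCluster:
      Type: "AWS::Redshift::Cluster"
      Properties:
        ClusterType: single-node      # placeholder sizing
        NodeType: dc2.large           # placeholder node type
        DBName: dev                   # placeholder database name
        MasterUsername: awsuser       # placeholder user name
        MasterUserPassword: !Ref MasterUserPassword
        # Must be false so that the Glue job can connect.
        PubliclyAccessible: false
        ClusterSubnetGroupName: !Ref PrivateClusterSubnetGroup
        VpcSecurityGroupIds:
          - !ImportValue dynamodb-to-redshift-vpc-PrivateRedshiftSecurityGroup
```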