DynamoDB data migration: modifying the data in a DynamoDB table

You’re working on a long-standing project when you realize that a non-trivial schema change will render all of your historical data useless or invalid. You wouldn’t want to delete all of the data and start collecting from scratch. So how can you alter your entire DynamoDB table without spending too much time combing through documentation or devising an entirely new program to do it?

I encountered this issue while developing a large-scale project that depended on millions of time-specific data entries stored in DynamoDB. After using a standard US date format (MM/dd/yyyy) for some time, I discovered that the ISO date format (yyyy-MM-dd) was better suited for flexible range queries, because ISO dates sort chronologically when compared as strings (you can read more on this in my previous article). I wanted my data to reflect that instead.

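To see why the ISO layout helps range queries, here is a small stdlib-only Go sketch. It is not from the original project, and `sortedStrings` is a hypothetical helper. DynamoDB orders string sort-key values by their UTF-8 bytes, so a date range only behaves like a calendar range if string order matches chronological order. The sketch formats the same three dates in both layouts and sorts each list as plain strings.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Layouts from the article: the old month/day/year format and ISO yyyy-MM-dd.
const (
	oldLayout = "1/2/2006"
	newLayout = "2006-01-02"
)

// sortedStrings returns a sorted copy of dates, compared byte by byte,
// which is how DynamoDB orders string sort-key values.
func sortedStrings(dates []string) []string {
	out := append([]string(nil), dates...)
	sort.Strings(out)
	return out
}

func main() {
	// Three days in chronological order.
	days := []time.Time{
		time.Date(2019, 12, 31, 0, 0, 0, 0, time.UTC),
		time.Date(2020, 1, 15, 0, 0, 0, 0, time.UTC),
		time.Date(2020, 2, 1, 0, 0, 0, 0, time.UTC),
	}
	var oldDates, isoDates []string
	for _, d := range days {
		oldDates = append(oldDates, d.Format(oldLayout))
		isoDates = append(isoDates, d.Format(newLayout))
	}
	fmt.Println(sortedStrings(oldDates)) // string order, not calendar order
	fmt.Println(sortedStrings(isoDates)) // string order matches calendar order
}
```

When sorted as strings, the old-layout values come out as 1/15/2020, 12/31/2019, 2/1/2020. The ISO values stay in calendar order, so a `BETWEEN` on an ISO sort key selects a real date range.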

The project was written in Go, but even with the AWS Go SDK, I found it incredibly difficult to find straightforward answers explaining how to update specific fields in my tables. After doing some lengthy research, I was finally able to write a program that successfully modified millions of table entries so that I could continue using them in my application. Here’s how I did it.

Preface: Use a Global Secondary Index

The Query operation finds items based on specific primary key values, whereas the Scan operation examines every single item. So rather than performing a Scan on your entire table, AWS suggests using Query for faster response times; the larger the table or index, the slower a Scan performs. Creating a global secondary index keyed on the attributes I needed to select by was therefore crucial for this program. With an index, you can efficiently query outdated items by the attributes of interest and modify the appropriate fields.

For my application, I created an index called weekday-date-index with weekday as the partition key and date as the sort key. This allowed me to query the items for each individual weekday, isolate those with the incorrect date format, and update them.

You can read more about best practices for querying and scanning in the AWS DynamoDB documentation.

1. Set up

I started by defining two constants: oldLayout, the format I was targeting, and newLayout, the format I wanted. I also defined a struct for my table items so that I could easily inspect or edit any necessary attributes.

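```go
// Go layouts are written in terms of the reference time Jan 2, 2006.
const oldLayout = "1/2/2006"   // month/day/year, e.g. 3/7/2020
const newLayout = "2006-01-02" // ISO yyyy-MM-dd, e.g. 2020-03-07

// indexedItem mirrors the attributes stored in each table item.
type indexedItem struct {
	UUID    string  `json:"uuid"`
	Date    string  `json:"date"`
	Metric  string  `json:"metric"`
	Value   float64 `json:"value"`
	Weekday string  `json:"weekday"`
}
```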

Next, I established a new DynamoDB session using the appropriate access key and secret key.

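```go
// Create a session with static credentials; dbAccess and dbSecret hold the keys.
sess, err := session.NewSession(&aws.Config{
	Region:      aws.String("us-west-2"),
	Credentials: credentials.NewStaticCredentials(
		dbAccess,
		dbSecret,
		"", // no session token is needed for long-term access keys
	),
})
if err != nil {
	// handle error case
}
dbClient := dynamodb.New(sess)
```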

Note: This example uses static credentials, but the AWS SDK provides other ways to supply them as well. You can read more about this in the AWS SDK for Go documentation.

2. Create a query

Since my approach was to target one weekday at a time using the partition key of my index, I created a slice of weekday strings to loop through.

```go
weekdays := []string{"Monday", "Tuesday", "Wednesday", "Thursday",
	"Friday", "Saturday", "Sunday"}

for _, weekday := range weekdays {
	// Select every index entry whose partition key equals this weekday.
	queryInput := &dynamodb.QueryInput{
		TableName: aws.String("table-example"),
		IndexName: aws.String("weekday-date-index"),
		KeyConditions: map[string]*dynamodb.Condition{
			"weekday": {
				ComparisonOperator: aws.String("EQ"),
				AttributeValueList: []*dynamodb.AttributeValue{
					{S: aws.String(weekday)},
				},
			},
		},
	}
	...
```

In the query input above, I specified a table name, an index name, and a key condition on the weekday attribute. Key conditions provide the selection criteria for a Query operation, and they can only reference the key attributes of the table or index being queried. That restriction is why an effective key design for your index matters so much. The partition key must be matched with an EQ condition. The other comparison operators (LE | LT | GE | GT | BEGINS_WITH | BETWEEN) can only be used in an optional second condition on the sort key.

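The EQ-versus-range rule is easier to see in a toy model. The following is a stdlib-only, in-memory sketch, not the SDK; `item` and `queryIndex` are hypothetical names. The function only mimics what a Query on weekday-date-index does: an exact match on the partition key plus an optional BETWEEN on the string sort key, with results ordered by that sort key. With the real API, the same selection could be written as a KeyConditionExpression such as `weekday = :w AND #date BETWEEN :lo AND :hi`.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a hypothetical, trimmed-down version of the article's indexedItem.
type item struct {
	UUID    string
	Weekday string // partition key of weekday-date-index
	Date    string // sort key of weekday-date-index
}

// queryIndex is a toy model of a Query against the index: the partition key
// must match exactly (EQ), the sort key is narrowed with an inclusive
// BETWEEN lo AND hi, and results are ordered by the sort key as strings.
func queryIndex(items []item, weekday, lo, hi string) []item {
	var out []item
	for _, it := range items {
		if it.Weekday == weekday && it.Date >= lo && it.Date <= hi {
			out = append(out, it)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Date < out[j].Date })
	return out
}

func main() {
	items := []item{
		{"a", "Monday", "2020-01-06"},
		{"b", "Monday", "2020-01-13"},
		{"c", "Monday", "2020-02-03"},
		{"d", "Tuesday", "2020-01-07"},
	}
	// All Mondays in January 2020.
	for _, it := range queryIndex(items, "Monday", "2020-01-01", "2020-01-31") {
		fmt.Println(it.UUID, it.Date)
	}
}
```

Running it prints items a and b, the two Mondays in January 2020. Item c falls outside the sort-key range, and item d lives in a different partition.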

We can now perform the query and store its response:

```go
	resp, err := dbClient.Query(queryInput)
	if err != nil {
		// handle error case
	}
```
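
One caution that the walkthrough doesn't cover is that a single Query call returns at most 1 MB of data. When more data remains, the response carries `LastEvaluatedKey`, and you pass it back as `ExclusiveStartKey` on the next call. With millions of items, one `dbClient.Query` per weekday would likely see only the first page. In aws-sdk-go v1, `dbClient.QueryPages(queryInput, fn)` can drive this loop for you. The stdlib-only sketch below shows the loop itself. The hypothetical `key`, `page`, `fetchFunc`, `queryAll`, and `fakeFetch` stand in for the SDK's `QueryInput`/`QueryOutput` and a real table.

```go
package main

import "fmt"

// key stands in for LastEvaluatedKey / ExclusiveStartKey (hypothetical type).
type key map[string]string

// page is one hypothetical Query response: items plus the key to resume from.
type page struct {
	Items            []string
	LastEvaluatedKey key
}

// fetchFunc stands in for one Query call made with ExclusiveStartKey = start.
type fetchFunc func(start key) (page, error)

// queryAll keeps fetching until a response has no LastEvaluatedKey, which is
// how DynamoDB signals that the final page has been returned.
func queryAll(fetch fetchFunc) ([]string, error) {
	var all []string
	var start key // nil: begin at the start of the partition
	for {
		p, err := fetch(start)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if len(p.LastEvaluatedKey) == 0 {
			return all, nil
		}
		start = p.LastEvaluatedKey
	}
}

// fakeFetch serves data in pages of n items, resuming after start["uuid"].
func fakeFetch(data []string, n int) fetchFunc {
	return func(start key) (page, error) {
		i := 0
		if start != nil {
			for i < len(data) && data[i] != start["uuid"] {
				i++
			}
			if i < len(data) {
				i++ // resume after the last evaluated item
			}
		}
		end := i + n
		if end >= len(data) {
			return page{Items: data[i:]}, nil // final page: no LastEvaluatedKey
		}
		return page{Items: data[i:end], LastEvaluatedKey: key{"uuid": data[end-1]}}, nil
	}
}

func main() {
	items, err := queryAll(fakeFetch([]string{"u1", "u2", "u3", "u4", "u5"}, 2))
	if err != nil {
		panic(err)
	}
	fmt.Println(items)
}
```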

3. Create an update

The query response contains a slice of items whose attributes match the criteria we specified in the input. With the help of the dynamodbattribute subpackage, these items can be unmarshaled into Go types for more straightforward access to each attribute.

```go
	for _, i := range resp.Items {
		// Unmarshal the raw attribute map into the indexedItem struct.
		item := indexedItem{}
		err = dynamodbattribute.UnmarshalMap(i, &item)
		if err != nil {
			// handle error case
		}
		...
```

This is where you can check for inconsistencies in the data and make any changes you need. In my case, I was looking for date entries in the MM/dd/yyyy format that I could reformat to ISO yyyy-MM-dd.

```go
		// Old-layout dates contain slashes; ISO dates do not.
		if strings.Contains(item.Date, "/") {
			incorrectDate, err := time.Parse(oldLayout, item.Date)
			if err != nil {
				// handle error case
			}
			// Reformat the parsed time into the ISO layout.
			correctDate := incorrectDate.Format(newLayout)
			...
```

You should now have everything you need to create the input for your update.

```go
			...
			input := &dynamodb.UpdateItemInput{
				TableName: aws.String("table-example"),
				// Identify the item by the table's primary key.
				Key: map[string]*dynamodb.AttributeValue{
					"uuid": {S: aws.String(item.UUID)},
				},
				// "date" is a reserved word, so it is referenced as #date.
				ExpressionAttributeNames: map[string]*string{
					"#date": aws.String("date"),
				},
				ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
					":d": {S: aws.String(correctDate)},
				},
				ReturnValues:     aws.String("UPDATED_NEW"),
				UpdateExpression: aws.String("set #date=:d"),
			}
			...
```

Here’s a breakdown of the input fields I included:

  • TableName (required field): Your table name

  • Key (required field): The query response should return each item with its primary key (a UUID in my case), so now this information can be easily referenced to update each item

  • ExpressionAttributeNames: These are substitution tokens for attribute names, used to avoid conflicts with DynamoDB reserved words or to prevent special characters in an attribute name from being misinterpreted. The # prefix marks an attribute-name placeholder. Mapping #date to the string "date" (the actual name of the field) therefore lets us use it in the UpdateExpression, which is necessary because DATE is a DynamoDB reserved word

  • ExpressionAttributeValues: A similar idea to ExpressionAttributeNames, except that these are the actual substituted values. The : prefix marks an attribute-value placeholder, so here I used :d for the string containing the correctly formatted date, which I reference in the UpdateExpression

  • ReturnValues: A sanity check! Use this if you want to get item attributes as they appeared before or after the update. I used UPDATED_NEW to return only the updated attributes, as they appear after the update takes place

  • UpdateExpression: A string expression that should define which attributes to update, which action to perform, and what values to use. Here, I use set to replace the already existing date attribute with a new value

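To illustrate how the two placeholder maps relate to the UpdateExpression, here is a toy expansion function. It models the substitution idea only and is not how DynamoDB actually parses expressions. `expand` and `placeholder` are hypothetical names, and attribute values are simplified to plain strings. The function does mirror one real rule: DynamoDB rejects an expression that uses a placeholder missing from the maps.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches expression tokens such as #date or :d.
var placeholder = regexp.MustCompile(`[#:][A-Za-z0-9_]+`)

// expand replaces #name tokens from names (ExpressionAttributeNames) and
// :value tokens from values (ExpressionAttributeValues, simplified to
// strings). Values are shown quoted so names and values stay distinguishable.
func expand(expr string, names, values map[string]string) (string, error) {
	var missing string
	out := placeholder.ReplaceAllStringFunc(expr, func(tok string) string {
		table := values
		if tok[0] == '#' {
			table = names
		}
		v, ok := table[tok]
		if !ok {
			missing = tok
			return tok
		}
		if tok[0] == ':' {
			return fmt.Sprintf("%q", v)
		}
		return v
	})
	if missing != "" {
		return "", fmt.Errorf("undefined placeholder %s", missing)
	}
	return out, nil
}

func main() {
	s, err := expand("set #date=:d",
		map[string]string{"#date": "date"},
		map[string]string{":d": "2020-03-07"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

For the article's expression, this prints `set date="2020-03-07"`, which is the assignment the real UpdateItem call performs on the item.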

The last step is to actually update the item using your input.

```go
			...
			_, err = dbClient.UpdateItem(input)
			if err != nil {
				// handle error case
			}
```

You should now have all the working parts to update your DynamoDB table!

Thanks for reading! This was my first time working with DynamoDB, and since there are few AWS developer guides for Go, I hope this helps others facing similar issues.

If you’re interested in the source code, you can access it here!

Translated from: https://medium.com/cloud-native-the-gathering/modify-data-in-your-dynamodb-table-97f43bfedfaa