DynamoDB data migration
You’re working on a long-standing project when you realize that a non-trivial schema change will render all of your historical data useless or invalid. You don’t want to delete everything and start collecting from scratch, so how can you alter your entire DynamoDB table without spending hours combing through documentation or writing an entirely new program?
I encountered this issue while developing a large-scale project that depended on millions of time-specific data entries stored in DynamoDB. After using a standard date format (MM/dd/yyyy) for some time, I discovered that the ISO date format (yyyy-MM-dd) was better suited to flexible range queries, because ISO date strings sort lexicographically in chronological order (you can read more on this in my previous article). So I wanted my data to use that format instead.
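To see why, recall that DynamoDB orders string sort keys by their UTF-8 bytes, so a range condition on a date string only behaves chronologically if the string's byte order matches time order. The standalone sketch below (the helper names sortedCopy and inRange are illustrative, not from the original project) compares both formats using plain string ordering, which is what a BETWEEN condition on a string sort key relies on.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy returns a lexicographically sorted copy of dates, mirroring how
// DynamoDB orders string sort keys (byte by byte).
func sortedCopy(dates []string) []string {
	out := append([]string(nil), dates...)
	sort.Strings(out)
	return out
}

// inRange reports whether date lies in [from, to] using plain string
// comparison, the way a BETWEEN condition on a string sort key behaves.
func inRange(date, from, to string) bool {
	return date >= from && date <= to
}

func main() {
	// The same three days, oldest first, in both formats.
	us := []string{"12/31/2019", "01/15/2020", "02/01/2020"}
	iso := []string{"2019-12-31", "2020-01-15", "2020-02-01"}

	fmt.Println(sortedCopy(us))  // [01/15/2020 02/01/2020 12/31/2019]: not chronological
	fmt.Println(sortedCopy(iso)) // [2019-12-31 2020-01-15 2020-02-01]: chronological

	// A "January 2020" range over MM/dd/yyyy strings wrongly admits a 2019 date.
	fmt.Println(inRange("01/20/2019", "01/01/2020", "01/31/2020")) // true
	fmt.Println(inRange("2019-01-20", "2020-01-01", "2020-01-31")) // false
}
```

With MM/dd/yyyy strings the month dominates the comparison, so the year is effectively ignored; ISO strings compare exactly as the dates they represent.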
The project was written in Go, but even with the AWS Go SDK, I struggled to find straightforward answers on how to update specific fields in my tables. After some lengthy research, I finally wrote a program that successfully modified millions of table entries so that I could keep using them in my application. Here’s how I did it.
Preface: Use a Global Secondary Index
The Query operation finds items based on primary key values, whereas the Scan operation examines every item in a table or index. So rather than performing a Scan on your entire table, AWS suggests using Query for faster response times. Moreover, the larger the table or index, the slower a Scan performs. Creating a global secondary index keyed on the attributes I needed to filter by was therefore crucial for this program: with an index, you can efficiently query outdated items by the attributes of interest and modify the appropriate fields.
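As a rough intuition for the difference, here is a toy in-memory model. It is not how DynamoDB actually stores data, and item, scan, buildIndex and query are hypothetical names. A Scan touches every item and filters afterwards, while a Query against an index keyed on weekday only reads the items under one partition key value.

```go
package main

import "fmt"

// item is a simplified, hypothetical stand-in for a table item.
type item struct {
	Weekday string
	Date    string
}

// scan models a Scan: every item is examined, then filtered.
func scan(items []item, weekday string) (matches []item, examined int) {
	for _, it := range items {
		examined++
		if it.Weekday == weekday {
			matches = append(matches, it)
		}
	}
	return matches, examined
}

// buildIndex groups items by partition key, loosely like a GSI on weekday.
func buildIndex(items []item) map[string][]item {
	idx := make(map[string][]item)
	for _, it := range items {
		idx[it.Weekday] = append(idx[it.Weekday], it)
	}
	return idx
}

// query models a Query: only items under one partition key are read.
func query(idx map[string][]item, weekday string) (matches []item, examined int) {
	matches = idx[weekday]
	return matches, len(matches)
}

func main() {
	items := []item{
		{"Monday", "03/01/2021"}, {"Tuesday", "2021-03-02"},
		{"Monday", "2021-03-08"}, {"Sunday", "03/07/2021"},
	}
	_, scanned := scan(items, "Monday")
	_, queried := query(buildIndex(items), "Monday")
	fmt.Println(scanned, queried) // 4 2
}
```

The gap grows with table size: the scan cost tracks the whole table, while the query cost tracks only the matching partition.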
For my application, I created an index called weekday-date-index with weekday as its partition key and date as its sort key. This allowed me to query the items under each individual weekday, isolate the ones with the incorrect date format, and update them.
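The article doesn't say how the weekday attribute is populated. Assuming it is simply the day of the week of the item's date, Go's time package can derive it. time.Weekday's String method returns English names such as "Monday", which match the weekday strings used in the query loop of this article. A helper like this one (weekdayFor is a hypothetical name) could also serve as a sanity check on migrated items.

```go
package main

import (
	"fmt"
	"time"
)

// isoLayout is Go's layout for yyyy-MM-dd.
const isoLayout = "2006-01-02"

// weekdayFor returns the English weekday name for an ISO date string,
// assuming (as stated above) that the weekday attribute is derived from date.
func weekdayFor(isoDate string) (string, error) {
	t, err := time.Parse(isoLayout, isoDate)
	if err != nil {
		return "", err
	}
	return t.Weekday().String(), nil
}

func main() {
	day, err := weekdayFor("2021-03-07")
	if err != nil {
		panic(err)
	}
	fmt.Println(day) // Sunday
}
```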
You can read more on scan and query best practices here.
1. Set up
I started by defining constants for the old format I was targeting, oldLayout, and the new format I wanted, newLayout. I also defined a struct for my table items so that I could easily inspect or edit any attribute I needed.
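// Go layouts describe how the reference time Mon Jan 2 15:04:05 MST 2006 is written.
const oldLayout = "1/2/2006"   // MM/dd/yyyy
const newLayout = "2006-01-02" // ISO yyyy-MM-dd

// indexedItem mirrors the attributes stored in each table item.
type indexedItem struct {
	UUID    string  `json:"uuid"`
	Date    string  `json:"date"`
	Metric  string  `json:"metric"`
	Value   float64 `json:"value"`
	Weekday string  `json:"weekday"`
}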
Next, I established a new AWS session using the appropriate access key and secret key, and created a DynamoDB client from it.
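sess, err := session.NewSession(&aws.Config{
	Region: aws.String("us-west-2"),
	Credentials: credentials.NewStaticCredentials(
		dbAccess, // access key ID
		dbSecret, // secret access key
		"",       // session token (not used here)
	),
})
if err != nil {
	// handle error case
}
dbClient := dynamodb.New(sess)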
Note: This example uses static credentials, but the AWS SDK supports other ways of supplying them as well. You can read more about it here.
2. Create a query
Since my approach was to target one weekday at a time using the partition key of my index, I created a slice of weekday strings to loop through.
weekdays := []string{"Monday", "Tuesday", "Wednesday", "Thursday",
	"Friday", "Saturday", "Sunday"}

for _, weekday := range weekdays {
	// Select every item in the index whose partition key equals this weekday.
	queryInput := &dynamodb.QueryInput{
		TableName: aws.String("table-example"),
		IndexName: aws.String("weekday-date-index"),
		KeyConditions: map[string]*dynamodb.Condition{
			"weekday": {
				ComparisonOperator: aws.String("EQ"),
				AttributeValueList: []*dynamodb.AttributeValue{
					{S: aws.String(weekday)},
				},
			},
		},
	}
	...
For the query input above, I specified a table name, an index name, and a key condition on the weekday attribute. Key conditions provide the selection criteria for a query operation. For a query on a table, key conditions can only be on primary key attributes, a crucial detail that underscores the importance of choosing an effective key for your index. The partition key name and value must be provided as an EQ condition; the other comparison operators (LE | LT | GE | GT | BEGINS_WITH | BETWEEN) are only available for an optional second condition on the sort key. (KeyConditions is a legacy parameter; newer code typically uses KeyConditionExpression instead.)
We can now perform the query and store its response:
resp, err := dbClient.Query(queryInput)
if err != nil {
	// handle error case
}
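One caveat with millions of items: a single Query call returns at most 1 MB of data. When more results remain, the response's LastEvaluatedKey is non-empty, and you pass it back as ExclusiveStartKey in the next request (aws-sdk-go v1 also offers QueryPages, which runs this loop for you). The sketch below models that loop against a hypothetical in-memory fetchPage function standing in for dbClient.Query. The page type and string keys are simplifications, not the SDK's types.

```go
package main

import "fmt"

// page mimics the two parts of a Query response this loop cares about:
// the returned items and LastEvaluatedKey (empty when no pages remain).
type page struct {
	Items            []string
	LastEvaluatedKey string
}

// fetchPage is a hypothetical stand-in for a Query call with
// ExclusiveStartKey = startKey; it serves pageSize items at a time.
func fetchPage(all []string, startKey string, pageSize int) page {
	start := 0
	if startKey != "" {
		for i, v := range all {
			if v == startKey {
				start = i + 1
				break
			}
		}
	}
	end := start + pageSize
	if end >= len(all) {
		return page{Items: all[start:]}
	}
	return page{Items: all[start:end], LastEvaluatedKey: all[end-1]}
}

// queryAll keeps requesting pages until LastEvaluatedKey comes back empty.
func queryAll(all []string, pageSize int) (items []string, calls int) {
	startKey := ""
	for {
		p := fetchPage(all, startKey, pageSize)
		calls++
		items = append(items, p.Items...)
		if p.LastEvaluatedKey == "" {
			return items, calls
		}
		startKey = p.LastEvaluatedKey
	}
}

func main() {
	got, calls := queryAll([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(got, calls) // [a b c d e] 3
}
```

Without such a loop, each weekday's Query would silently process only its first page of results.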
3. Create an update
The query response returns a list of items whose attributes match the criteria we specified as input. These items can be unmarshaled into Go types with the dynamodbattribute subpackage for more straightforward access to each attribute.
for _, i := range resp.Items {
	item := indexedItem{}
	// Convert the raw attribute-value map into an indexedItem.
	err = dynamodbattribute.UnmarshalMap(i, &item)
	if err != nil {
		// handle error case
	}
	...
This is where you can check for inconsistencies in the data and make whatever changes you need. In my case, I was looking for date entries in the MM/dd/yyyy format that I could reformat to ISO yyyy-MM-dd.
// check whether the date is still in the old format
if strings.Contains(item.Date, "/") {
	incorrectDate, err := time.Parse(oldLayout, item.Date)
	if err != nil {
		// handle error case
	}
	// reformat the date into the new ISO layout
	correctDate := incorrectDate.Format(newLayout)
	...
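It can help to pull this check-and-reformat step into a pure function that can be tested without touching AWS. The sketch below (convertDate is a hypothetical helper) keeps the article's assumption that any date containing "/" is in the old format. It leaves ISO dates untouched, so re-running the migration is harmless, and it returns an error instead of falling through. That last point matters: time.Parse returns the zero time on failure, so if the error branch doesn't skip the item, incorrectDate.Format(newLayout) yields 0001-01-01 and that value would be written back.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

const oldLayout = "1/2/2006"
const newLayout = "2006-01-02"

// convertDate returns the ISO form of date and whether it changed.
// Dates without a "/" are assumed to be ISO already and are returned as-is.
func convertDate(date string) (string, bool, error) {
	if !strings.Contains(date, "/") {
		return date, false, nil
	}
	t, err := time.Parse(oldLayout, date)
	if err != nil {
		return "", false, fmt.Errorf("parse %q: %w", date, err)
	}
	return t.Format(newLayout), true, nil
}

func main() {
	for _, d := range []string{"03/07/2021", "2021-03-07", "13/40/2021"} {
		iso, changed, err := convertDate(d)
		fmt.Println(iso, changed, err)
	}
}
```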
You should now have everything you need to create the input for your update.
...
input := &dynamodb.UpdateItemInput{
	TableName: aws.String("table-example"),
	// Key identifies the item to update by the table's primary key.
	Key: map[string]*dynamodb.AttributeValue{
		"uuid": {S: aws.String(item.UUID)},
	},
	ExpressionAttributeNames: map[string]*string{
		"#date": aws.String("date"),
	},
	ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
		":d": {S: aws.String(correctDate)},
	},
	ReturnValues:     aws.String("UPDATED_NEW"),
	UpdateExpression: aws.String("set #date=:d"),
}
...
Here’s a breakdown of the input fields I included:
TableName (required field): Your table name
Key (required field): The query response returns each item along with its primary key (a UUID in my case), so that value can be referenced directly to update each item
ExpressionAttributeNames: Substitution tokens for attribute names, used to avoid conflicts with DynamoDB reserved words or to keep special characters in an attribute name from being misinterpreted. The # character dereferences an attribute name, so mapping #date to the string "date" (the actual name of the field) lets us use it in the UpdateExpression; this is necessary because DATE is a DynamoDB reserved word
ExpressionAttributeValues: A similar idea to ExpressionAttributeNames, except these stand in for the actual values. The : character dereferences an attribute value, so here I used :d to hold the correctly formatted date string that I reference in the UpdateExpression
ReturnValues: A sanity check! Use this if you want to get an item's attributes as they appeared before or after the update. I used UPDATED_NEW to return only the updated attributes, as they appear after the update takes place
UpdateExpression: A string expression that defines which attributes to update, which action to perform, and what values to use. Here, I use set to replace the existing date attribute with a new value
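To make the placeholder mechanics concrete, here is a deliberately tiny model of how set #date=:d is resolved: tokens starting with # are looked up in the names map and tokens starting with : in the values map. applySet is a hypothetical helper that only understands a single set clause over string attributes; it is an illustration, not DynamoDB's expression parser, and it does not check reserved words.

```go
package main

import (
	"fmt"
	"strings"
)

// applySet resolves a single "set <name>=<value>" clause against item,
// substituting #name tokens from names and :value tokens from values.
func applySet(item map[string]string, expr string,
	names map[string]string, values map[string]string) error {
	expr = strings.TrimSpace(expr)
	if len(expr) < 4 || !strings.EqualFold(expr[:4], "set ") {
		return fmt.Errorf("unsupported expression %q", expr)
	}
	parts := strings.SplitN(expr[4:], "=", 2)
	if len(parts) != 2 {
		return fmt.Errorf("unsupported expression %q", expr)
	}
	attr := strings.TrimSpace(parts[0])
	val := strings.TrimSpace(parts[1])

	// Resolve an attribute-name placeholder such as #date.
	if strings.HasPrefix(attr, "#") {
		name, ok := names[attr]
		if !ok {
			return fmt.Errorf("no ExpressionAttributeNames entry for %s", attr)
		}
		attr = name
	}
	// In this model, values must always come from a placeholder such as :d.
	v, ok := values[val]
	if !strings.HasPrefix(val, ":") || !ok {
		return fmt.Errorf("no ExpressionAttributeValues entry for %s", val)
	}
	item[attr] = v
	return nil
}

func main() {
	item := map[string]string{"uuid": "1234", "date": "03/07/2021"}
	err := applySet(item, "set #date=:d",
		map[string]string{"#date": "date"},
		map[string]string{":d": "2021-03-07"})
	fmt.Println(item["date"], err) // 2021-03-07 <nil>
}
```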
The last step is to actually update the item using your input.
...
_, err = dbClient.UpdateItem(input)
if err != nil {
	// handle error case
}
You should now have all the working parts to update your DynamoDB table!
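Putting the pieces together, the overall flow is: for each weekday, query the index, convert any old-format dates, and write them back. The sketch below assumes a hypothetical store interface. In a real implementation its two methods would wrap dbClient.Query (on weekday-date-index) and dbClient.UpdateItem; here an in-memory memStore stands in for DynamoDB so the logic can be checked locally. Unlike the snippets above, it stops at the first error rather than leaving error handling as a placeholder.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

const oldLayout = "1/2/2006"
const newLayout = "2006-01-02"

type indexedItem struct {
	UUID    string
	Date    string
	Weekday string
}

// store is a hypothetical abstraction over the two DynamoDB calls needed.
type store interface {
	QueryByWeekday(weekday string) ([]indexedItem, error)
	UpdateDate(uuid, date string) error
}

// memStore is an in-memory fake used to exercise the migration logic.
type memStore struct{ items map[string]indexedItem }

func (m *memStore) QueryByWeekday(weekday string) ([]indexedItem, error) {
	var out []indexedItem
	for _, it := range m.items {
		if it.Weekday == weekday {
			out = append(out, it)
		}
	}
	return out, nil
}

func (m *memStore) UpdateDate(uuid, date string) error {
	it, ok := m.items[uuid]
	if !ok {
		return fmt.Errorf("no item %s", uuid)
	}
	it.Date = date
	m.items[uuid] = it
	return nil
}

// migrate converts every old-format date and returns how many were updated.
func migrate(s store) (int, error) {
	weekdays := []string{"Monday", "Tuesday", "Wednesday", "Thursday",
		"Friday", "Saturday", "Sunday"}
	updated := 0
	for _, weekday := range weekdays {
		items, err := s.QueryByWeekday(weekday)
		if err != nil {
			return updated, err
		}
		for _, item := range items {
			if !strings.Contains(item.Date, "/") {
				continue // already ISO
			}
			t, err := time.Parse(oldLayout, item.Date)
			if err != nil {
				return updated, fmt.Errorf("item %s: %w", item.UUID, err)
			}
			if err := s.UpdateDate(item.UUID, t.Format(newLayout)); err != nil {
				return updated, err
			}
			updated++
		}
	}
	return updated, nil
}

func main() {
	s := &memStore{items: map[string]indexedItem{
		"a": {"a", "03/07/2021", "Sunday"},
		"b": {"b", "2021-03-08", "Monday"},
	}}
	n, err := migrate(s)
	fmt.Println(n, err, s.items["a"].Date) // 1 <nil> 2021-03-07
}
```

Running migrate a second time on the same data updates nothing, because converted dates no longer contain "/". Keeping the AWS calls behind an interface like this also makes it possible to dry-run the conversion logic before pointing it at a production table.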
Thanks for reading! This was my first time working with DynamoDB, and since AWS developer guides for Go are limited, I hope this helps others facing similar issues.
If you’re interested in the source code, you can access it here!
Translated from: https://medium.com/cloud-native-the-gathering/modify-data-in-your-dynamodb-table-97f43bfedfaa