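Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable performance with seamless scalability. With DynamoDB you offload the administrative burden of operating and scaling a distributed database, so you do not have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling. DynamoDB also offers encryption at rest, which removes the operational burden and complexity involved in protecting sensitive data.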
Background
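Sometimes you need a complete copy of the data in a DynamoDB table, and that calls for the Scan operation.
A Scan in Amazon DynamoDB reads every item in a table or secondary index. By default it returns all attributes of every item. A single Scan request, however, retrieves at most 1 MB of data, so a large table needs several requests, each one continuing from where the previous one stopped. Keep the table's read throughput limit in mind as well: if the scan exceeds it, requests are throttled and alarms fire.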
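To get a feel for how many requests a full copy takes and how much read capacity it uses, here is a rough calculation. It is only a sketch: the class name ScanCostEstimate and the 500 MB table size are made up for illustration, and it assumes the standard accounting in which one read capacity unit covers up to 4 KB for a strongly consistent read and half that cost for an eventually consistent read (the Scan default). Real pages can hold less than 1 MB, so the page count is a lower bound.

```java
/** Back-of-the-envelope cost of a full table Scan (plain arithmetic, no SDK calls). */
final class ScanCostEstimate {
    static final long PAGE_BYTES = 1024L * 1024L;   // one Scan response holds at most 1 MB
    static final long READ_UNIT_BYTES = 4L * 1024L; // one read capacity unit covers up to 4 KB

    /** Minimum number of Scan requests needed to walk a table of the given size. */
    static long pagesNeeded(long tableBytes) {
        return (tableBytes + PAGE_BYTES - 1) / PAGE_BYTES; // ceiling division
    }

    /** Read capacity units consumed by scanning the given number of bytes. */
    static double readUnits(long bytes, boolean stronglyConsistent) {
        long units = (bytes + READ_UNIT_BYTES - 1) / READ_UNIT_BYTES; // round up to 4 KB blocks
        return stronglyConsistent ? units : units / 2.0;             // eventual consistency costs half
    }

    public static void main(String[] args) {
        long tableBytes = 500L * 1024 * 1024; // hypothetical 500 MB table
        System.out.println("pages needed:        " + pagesNeeded(tableBytes));
        System.out.println("RCU per full page:   " + readUnits(PAGE_BYTES, false));
        System.out.println("RCU for whole table: " + readUnits(tableBytes, false));
    }
}
```

A full 1 MB page read with eventual consistency therefore costs 128 read capacity units, which is why scanning pages back to back can quickly exhaust a table with modest provisioned capacity.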
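My table is structured as follows: each item has a string attribute named query and a list attribute named asin_list whose elements are strings.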
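Java environment setup
(1) Installing the JDK and Maven is not covered here.
(2) Put the user's credentials in a configuration file. They can also be written directly in the code (solution 2 below does exactly that).
(3) Maven configuration
The results are written to a CSV file, so the javacsv library is included as well.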
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-bom</artifactId>
            <version>1.11.327</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-dynamodb</artifactId>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.javacsv</groupId>
        <artifactId>javacsv</artifactId>
        <version>2.0</version>
    </dependency>
</dependencies>
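For the credentials file from step (2): the PropertiesCredentials class used in solution 1 reads the access key from a property named accessKey and the secret key from a property named secretKey. The small check below is a sketch of how one might verify a key file before handing it to the SDK. The class name KeyFileCheck is hypothetical and the values are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Sanity check for a key.properties file (illustrative helper, not part of the AWS SDK). */
final class KeyFileCheck {
    /** True when both property names read by PropertiesCredentials are present and non-blank. */
    static boolean hasRequiredKeys(Properties props) {
        return isSet(props.getProperty("accessKey")) && isSet(props.getProperty("secretKey"));
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder contents of src/main/resources/key.properties; never commit real keys.
        String sample = "accessKey=XXXXXX\nsecretKey=XXXXXXXXXXX\n";
        Properties props = new Properties();
        props.load(new StringReader(sample));
        System.out.println("key file looks complete: " + hasRequiredKeys(props));
    }
}
```

Keeping the keys in a file like this, and out of version control, is generally safer than hard-coding them in the source.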
Solution 1: reading a small table
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;

public class DynamoDBUtils {
    public static void main(String[] args) throws IOException {
        // Load the access key and secret key from the properties file
        AWSCredentials credentials = new PropertiesCredentials(new File("src/main/resources/key.properties"));
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .withRegion("us-east-1")
                .build();

        // A single Scan call returns at most 1 MB, which is enough for a small table
        ScanRequest scanRequest = new ScanRequest().withTableName("yucheng");
        ScanResult result = client.scan(scanRequest);
        for (Map<String, AttributeValue> item : result.getItems()) {
            System.out.println(item);
            String query = item.get("query").getS();
            System.out.println(query);
            List<AttributeValue> asinList = item.get("asin_list").getL();
            for (AttributeValue value : asinList) {
                System.out.print(value.getS() + " ");
            }
            System.out.println();
        }

        // A non-null LastEvaluatedKey means the table did not fit in one page
        if (result.getLastEvaluatedKey() != null) {
            System.err.println("The table is larger than one Scan page; the output above is incomplete.");
        }
    }
}
The output looks like this.
Solution 2: reading a large table
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import com.csvreader.CsvWriter;

public class DynamoDBUtils {
    private static final String REGION = "us-east-1"; // replace with your own
    private static final String AWS_ACCESS_KEY_ID = "XXXXXX"; // replace with your own
    private static final String AWS_SECRET_ACCESS_KEY = "XXXXXXXXXXX"; // replace with your own
    // Pause between pages; tune it to the table's read capacity
    private static final long PAUSE_MILLIS = 40000L;

    public static void main(String[] args) {
        try {
            exportTable();
        } catch (InterruptedException | IOException e) {
            e.printStackTrace();
        }
    }

    public static void exportTable() throws IOException, InterruptedException {
        String filePath = "XXXX.tsv"; // replace with your own
        AWSCredentials credentials = new BasicAWSCredentials(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY);
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .withRegion(REGION)
                .build();
        CsvWriter csvWriter = new CsvWriter(filePath, '\t', StandardCharsets.UTF_8);
        try {
            int count = 0;
            Map<String, AttributeValue> lastKeyEvaluated = null;
            do {
                count++;
                if (count % 10 == 0) {
                    System.out.println(count); // progress: pages requested so far
                }
                // Continue from where the previous page stopped (null on the first request)
                ScanRequest scanRequest = new ScanRequest()
                        .withTableName("yucheng")
                        .withExclusiveStartKey(lastKeyEvaluated);
                ScanResult result = client.scan(scanRequest);
                for (Map<String, AttributeValue> item : result.getItems()) {
                    String query = item.get("query").getS();
                    List<AttributeValue> asinList = item.get("asin_list").getL();
                    List<String> asins = new ArrayList<>();
                    for (AttributeValue value : asinList) {
                        asins.add(value.getS());
                    }
                    // One row per item: the query, then the ASINs joined by commas
                    String[] line = new String[2];
                    line[0] = query;
                    line[1] = String.join(",", asins);
                    csvWriter.writeRecord(line);
                }
                lastKeyEvaluated = result.getLastEvaluatedKey();
                // Pause between pages to limit the read rate; otherwise DynamoDB
                // is overloaded and raises alarms. No pause is needed after the last page.
                if (lastKeyEvaluated != null) {
                    Thread.sleep(PAUSE_MILLIS);
                }
            } while (lastKeyEvaluated != null);
        } finally {
            // Close the writer even if a scan fails, so the rows written so far are flushed
            csvWriter.close();
        }
    }
}
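The fixed 40-second pause in solution 2 was chosen by hand. One possible refinement, sketched below under assumptions, is to derive the pause from how much capacity each page actually consumed and from a read budget you are willing to spend. The class name ScanPacer and the budget of 4 read capacity units per second are hypothetical. In the real loop the consumed value could come from result.getConsumedCapacity().getCapacityUnits(), which is populated only when the request sets withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL).

```java
/** Computes how long to wait between Scan pages (illustrative helper, no SDK calls). */
final class ScanPacer {
    /**
     * Milliseconds to wait after a page so the average read rate stays within the budget.
     *
     * @param consumedUnits read capacity units the last page consumed
     * @param rcuPerSecond  share of the table's read capacity the export may use
     * @param morePages     false once LastEvaluatedKey is null
     */
    static long pauseMillis(double consumedUnits, double rcuPerSecond, boolean morePages) {
        if (rcuPerSecond <= 0) {
            throw new IllegalArgumentException("rcuPerSecond must be positive");
        }
        if (!morePages || consumedUnits <= 0) {
            return 0L; // nothing left to read, so there is no reason to wait
        }
        return (long) Math.ceil(consumedUnits / rcuPerSecond * 1000.0);
    }

    public static void main(String[] args) {
        // 128 units is what a full 1 MB page costs with eventually consistent reads.
        System.out.println(pauseMillis(128.0, 4.0, true));  // pause before the next page
        System.out.println(pauseMillis(128.0, 4.0, false)); // last page: no pause
    }
}
```

With these sample numbers the pause comes to 32 seconds per full page, in the same range as the hand-picked 40 seconds, and it shrinks automatically when a page is smaller than 1 MB.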