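When maintaining indexes in Elasticsearch, a use case that needs frequent, large-volume index writes is poorly served by the per-document maintenance approach from the previous article: one request per document is clearly inefficient. In that situation, bulkIndex is the recommended way to improve throughput. The right batch size depends on your data set and your cluster configuration.
Below we build a sample project with Spring Boot and Elasticsearch, starting with the basic pom configuration: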
```xml
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>1.4</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
```
The application.properties configuration:
```properties
# elasticsearch config
spring.data.elasticsearch.cluster-name=elasticsearch
spring.data.elasticsearch.cluster-nodes=192.168.1.105:9300

# application config
server.port=8080
spring.application.name=esp-app
```
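Next we define the domain entity, along with basic Spring Data CRUD support. The identifier field is marked with the @Id annotation; if no ID field is specified, the entity cannot be indexed through Spring Data Elasticsearch. The index name and type must also be specified, and the @Document annotation additionally lets us set the number of shards and replicas.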
```java
import java.io.Serializable;
import java.math.BigDecimal;

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

// Elasticsearch index names must be lowercase; these values match the constants in IndexerService.
@Data
@Document(indexName = "car_index", type = "car_type", shards = 1, replicas = 0)
public class Car implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    private Long id;
    private String brand;
    private String model;
    private BigDecimal amount;

    public Car(Long id, String brand, String model, BigDecimal amount) {
        this.id = id;
        this.brand = brand;
        this.model = model;
        this.amount = amount;
    }
}
```
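Next we define an IndexerService that uses bulk requests to handle indexing. Before writing, it first checks whether the index exists, to avoid exceptions. To get more familiar with the Java API, this example uses ElasticsearchTemplate rather than the ElasticsearchRepository from the previous article; the template offers a comparatively richer set of operations.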
```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

import com.google.gson.Gson;
import org.apache.commons.lang3.RandomStringUtils;
import org.apache.commons.lang3.RandomUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.stereotype.Service;

@Service
public class IndexerService {

    private static final String CAR_INDEX_NAME = "car_index";
    private static final String CAR_INDEX_TYPE = "car_type";
    private static final int BATCH_SIZE = 500;

    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;

    /** Indexes the test data in batches and returns the number of index requests submitted. */
    public long bulkIndex() throws Exception {
        int counter = 0;
        try {
            // Create the index if it does not exist yet
            if (!elasticsearchTemplate.indexExists(CAR_INDEX_NAME)) {
                elasticsearchTemplate.createIndex(CAR_INDEX_NAME);
            }
            Gson gson = new Gson();
            List<IndexQuery> queries = new ArrayList<IndexQuery>();
            List<Car> cars = assembleTestData();
            for (Car car : cars) {
                IndexQuery indexQuery = new IndexQuery();
                indexQuery.setId(car.getId().toString());
                indexQuery.setSource(gson.toJson(car));
                indexQuery.setIndexName(CAR_INDEX_NAME);
                indexQuery.setType(CAR_INDEX_TYPE);
                queries.add(indexQuery);
                counter++;
                // Submit every full batch of BATCH_SIZE documents
                if (counter % BATCH_SIZE == 0) {
                    elasticsearchTemplate.bulkIndex(queries);
                    queries.clear();
                    System.out.println("bulkIndex counter : " + counter);
                }
            }
            // Do not forget to submit the final, partially filled batch
            if (!queries.isEmpty()) {
                elasticsearchTemplate.bulkIndex(queries);
            }
            elasticsearchTemplate.refresh(CAR_INDEX_NAME);
            System.out.println("bulkIndex completed.");
        } catch (Exception e) {
            System.out.println("IndexerService.bulkIndex e: " + e.getMessage());
            throw e;
        }
        return counter;
    }

    private List<Car> assembleTestData() {
        List<Car> cars = new ArrayList<Car>();
        // Generate 10000 random cars for the bulk write. IDs are random in [1, 11111),
        // so duplicate IDs overwrite each other in the index.
        for (int i = 0; i < 10000; i++) {
            cars.add(new Car(RandomUtils.nextLong(1, 11111), RandomStringUtils.randomAscii(20),
                    RandomStringUtils.randomAlphabetic(15), BigDecimal.valueOf(78000)));
        }
        return cars;
    }
}
```
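The batch-and-flush pattern used in the service can be tried out without a running cluster. The following is a minimal, dependency-free sketch, not part of the sample project: `BatchSubmitter` and `submitInBatches` are hypothetical names, and the `sink` callback stands in for a call such as `elasticsearchTemplate.bulkIndex(queries)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper illustrating batched submission; it has no Elasticsearch dependency. */
final class BatchSubmitter {

    /**
     * Passes the items to the sink in batches of at most batchSize
     * and returns the total number of items submitted.
     */
    static <T> int submitInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int submitted = 0;
        for (T item : items) {
            buffer.add(item);
            if (buffer.size() == batchSize) {
                // Hand over a copy so that clearing the buffer cannot affect the sink
                sink.accept(new ArrayList<>(buffer));
                submitted += buffer.size();
                buffer.clear();
            }
        }
        // Flush the final, partially filled batch
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            submitted += buffer.size();
        }
        return submitted;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            items.add(i);
        }
        List<Integer> batchSizes = new ArrayList<>();
        int total = submitInBatches(items, 500, batch -> batchSizes.add(batch.size()));
        System.out.println(batchSizes + " total=" + total); // prints "[500, 500, 234] total=1234"
    }
}
```

Flushing on the buffer size rather than on a separate counter keeps every batch except the last at exactly `batchSize` items, which makes it easier to experiment with different batch sizes for your own data set and cluster.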
The rest is straightforward: write either a RestController that triggers the test on request, or a CommandLineRunner that calls the method above when the application starts.
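```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class ESPApplicatoin {

    public static void main(String[] args) {
        SpringApplication.run(ESPApplicatoin.class, args);
    }

    @Autowired
    IndexerService indexService;

    // POST /bulkIndex triggers one bulk indexing run
    @RequestMapping(value = "/bulkIndex", method = RequestMethod.POST)
    public void bulkIndex() {
        try {
            indexService.bulkIndex();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```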
The CommandLineRunner class:
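```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class AppLoader implements CommandLineRunner {

    @Autowired
    IndexerService indexerService;

    // Runs once after the application context has started
    @Override
    public void run(String... strings) throws Exception {
        indexerService.bulkIndex();
    }
}
```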
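Before checking the result, it helps to know how many documents to expect. `assembleTestData()` draws each ID with `RandomUtils.nextLong(1, 11111)`, that is, uniformly from 11110 possible values, and documents with the same ID overwrite each other. The sketch below estimates the number of distinct IDs after 10000 draws; `DistinctIdEstimate` is a hypothetical class written for illustration, and `java.util.Random` with a fixed seed stands in for `RandomUtils`.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical illustration: how many distinct IDs do uniform random draws produce? */
final class DistinctIdEstimate {

    /** Expected number of distinct values after `draws` uniform draws from `range` possible values. */
    static double expectedDistinct(int range, int draws) {
        return range * (1.0 - Math.pow(1.0 - 1.0 / range, draws));
    }

    /** Simulates the draws with a seeded generator and counts the distinct IDs produced. */
    static int simulateDistinct(int range, int draws, long seed) {
        Random random = new Random(seed);
        Set<Long> ids = new HashSet<>();
        for (int i = 0; i < draws; i++) {
            ids.add(1L + random.nextInt(range)); // uniform in [1, range]
        }
        return ids.size();
    }

    public static void main(String[] args) {
        int range = 11110; // nextLong(1, 11111) has an exclusive upper bound
        int draws = 10000;
        System.out.printf("expected distinct IDs: %.0f%n", expectedDistinct(range, draws));
        System.out.println("simulated distinct IDs: " + simulateDistinct(range, draws, 42L));
    }
}
```

Under these assumptions the estimate comes out at roughly 6,600 distinct IDs, so a document count well below 10000 after one run does not by itself indicate a failed bulk request.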
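Once the run has finished, you can open http://localhost:9200/car_index/_search/ to check whether the documents were actually indexed. Note: pay close attention to version compatibility. According to the table below, this Spring Data Elasticsearch approach cannot be used with ES 5+.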
Spring Boot Version (x) | Spring Data Elasticsearch Version (y) | Elasticsearch Version (z) |
---|---|---|
x <= 1.3.5 | y <= 1.3.4 | z <= 1.7.2* |
x >= 1.4.x | 2.0.0 <=y < 5.0.0** | 2.0.0 <= z < 5.0.0** |
(*) - requires a manual change in your project's pom file (solution 2.)
(**) - the next major ES release, with breaking changes
Sample project: https://github.com/backkoms/spring-boot-elasticsearch
Further reading:
Spring Boot + Elasticsearch: routine index maintenance
A hands-on microservices architecture case study based on Spring Cloud: preface