42. Amazon CloudSearch

Overview

  • Amazon CloudSearch is a fully managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your website or application.
  • With Amazon CloudSearch you can search large collections of data such as web pages, document files, forum posts, or product information. 
  • As your volume of data and traffic fluctuates, Amazon CloudSearch scales to meet your needs.
  • You can use Amazon CloudSearch to index and search both structured data and plain text. Amazon CloudSearch features:
    • Full text search with language-specific text processing
    • Boolean search
    • Prefix searches
    • Range searches
    • Term boosting
    • Faceting
    • Highlighting
    • Autocomplete suggestions
  • You can get search results in JSON or XML, sort and filter results based on field values, and sort results alphabetically, numerically, or according to custom expressions.
  • To build a search solution with Amazon CloudSearch, you take the following steps:
    • Create and configure a search domain: If you have multiple collections of data that you want to make searchable, you can create multiple search domains.
    • Upload the data you want to search to your domain. 
    • Search your domain. 

How Search Works

  • The collection of data that you want to search (sometimes referred to as your corpus) can consist of unstructured full-text documents, semi-structured documents such as those formatted in mark-up languages like XML, or structured data that conforms to a strict data model.
  • Each item that you want to be able to search (such as a forum post or web page) is represented as a document.
  • Every document has a unique ID and one or more fields that contain the data that you want to search and include in results.
  • To make your data searchable:
    • You represent it as a batch of documents in either JSON or XML and upload the batch to your search domain.
    • Amazon CloudSearch then generates a search index from your document data according to your domain's configuration options.
    • You submit queries against this index to find the documents that meet specific search criteria.
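As a rough illustration of the batch step above, the sketch below builds a JSON document batch from a few forum posts. The add/delete operation layout follows the 2013-01-01 batch format as commonly documented; the field names (title, body), the document IDs, and the make_batch helper are made up for this example.

```python
import json

def make_batch(posts, deleted_ids=()):
    """Build a CloudSearch-style document batch as a JSON string.

    Each post becomes an "add" operation; each ID in deleted_ids
    becomes a "delete" operation. Helper name is hypothetical.
    """
    operations = []
    for post in posts:
        operations.append({
            "type": "add",
            "id": post["id"],
            # Everything except the ID is sent as a document field.
            "fields": {k: v for k, v in post.items() if k != "id"},
        })
    for doc_id in deleted_ids:
        operations.append({"type": "delete", "id": doc_id})
    return json.dumps(operations, indent=2)

# Hypothetical forum posts used only for illustration.
posts = [
    {"id": "post-1", "title": "Scaling search", "body": "How domains grow with traffic."},
    {"id": "post-2", "title": "Faceting basics", "body": "Counting hits per category."},
]
batch = make_batch(posts, deleted_ids=["post-0"])
print(batch)
```

The resulting string is what would be uploaded to the domain's document endpoint; the upload call itself is left out because it depends on your client setup.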

Indexing in Amazon CloudSearch

  • To build a search index from your data, Amazon CloudSearch needs the following information:
    • Which document fields do you want to search?
    • Which document field values do you want to retrieve with the search results?
    • Which document fields represent categories that you want to use to refine and filter search results?
    • How should the text within a particular field be processed?
  • You define this metadata in your domain configuration by configuring indexing options.
  • You must configure a corresponding index field for each document field that occurs in your data—there's a one-to-one mapping between document fields and the fields in your Amazon CloudSearch index. In addition to the index field name, you specify the following:
    • The index field type
    • Whether the field is searchable (text and text-array fields are always searchable)
    • Whether the field can be used as a category (facet)
    • Whether the field value can be returned with the search results
    • Whether the field can be used to sort the results
    • Whether highlights can be returned for the field
    • A default value to use if no value is specified in the document data.
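To make the per-field options above concrete, here is a small sketch that models them as a plain Python dataclass. This is not the CloudSearch configuration API: the IndexField class and its attribute names are hypothetical, chosen to mirror the list above, and the example fields (title, category, price) are invented.

```python
from dataclasses import dataclass
from typing import Optional

# Field types that are always searchable, per the notes above.
ALWAYS_SEARCHABLE = {"text", "text-array"}

@dataclass
class IndexField:
    """Hypothetical model of one index field's options."""
    name: str
    field_type: str
    searchable: bool = True
    facet: bool = False
    returnable: bool = True
    sortable: bool = False
    highlight: bool = False
    default: Optional[str] = None

    def __post_init__(self):
        # text and text-array fields cannot be made non-searchable.
        if self.field_type in ALWAYS_SEARCHABLE:
            self.searchable = True

# One index field per document field (one-to-one mapping).
fields = [
    IndexField("title", "text", highlight=True),
    IndexField("category", "literal", facet=True, searchable=False),
    IndexField("price", "double", facet=True, sortable=True, default="0"),
]
for f in fields:
    print(f)
```

Writing the options down this way makes it easy to review, for each document field, which of the questions in this section have been answered before the domain is configured.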

Facets in Amazon CloudSearch

  • A facet is an index field that represents a category that you want to use to refine and filter search results.
  • A facet can be any date, literal, or numeric field that has faceting enabled in your domain configuration.
  • For each facet, Amazon CloudSearch calculates the number of hits that share the same value.
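The counting described above can be pictured with a few lines of Python. This is only an illustration of the idea, run locally on made-up hits with an assumed category field; the service computes these counts itself on the server side.

```python
from collections import Counter

# Made-up search hits; "category" is an assumed literal field with faceting enabled.
hits = [
    {"id": "p1", "category": "laptops"},
    {"id": "p2", "category": "phones"},
    {"id": "p3", "category": "laptops"},
    {"id": "p4", "category": "tablets"},
]

def facet_counts(hits, field):
    """Count how many hits share each value of the given field."""
    return Counter(hit[field] for hit in hits if field in hit)

counts = facet_counts(hits, "category")
for value, count in counts.most_common():
    print(f"{value}: {count}")
```

A user interface would typically show these counts next to each category so that a user can narrow the results to one value.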

Text Processing in Amazon CloudSearch

  • During indexing, Amazon CloudSearch processes the contents of text and text-array fields according to the language-specific analysis scheme configured for the field.
  • An analysis scheme controls how the text is normalized, tokenized, and stemmed, and specifies any stopwords or synonyms to take into account during indexing.
  • Amazon CloudSearch provides default analysis schemes for each supported language. 
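The toy pipeline below gives a feel for what an analysis scheme does: normalize, tokenize, drop stopwords, stem, and map synonyms. It is a deliberately naive sketch and not how the service is implemented; the stopword list, the synonym map, and the suffix-stripping stem function are all assumptions made for this example.

```python
import re

# Tiny, made-up stopword and synonym tables for illustration.
STOPWORDS = {"the", "a", "an", "of", "for"}
SYNONYMS = {"notebook": "laptop"}

def stem(term):
    """Naive suffix stripping; a real stemmer is language-specific."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def analyze(text):
    """Lowercase, tokenize, remove stopwords, stem, then map synonyms."""
    # Normalize case and split on anything that is not an ASCII letter or digit.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    terms = [t for t in terms if t not in STOPWORDS]
    stemmed = [stem(t) for t in terms]
    return [SYNONYMS.get(t, t) for t in stemmed]

print(analyze("The Best Notebooks for Traveling"))
```

Because the same processing is applied to indexed text and to search strings, a query for "notebook" and a document containing "laptops" can end up matching on the same term.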

Sorting Results in Amazon CloudSearch

  • You can customize how search results are ranked by defining expressions that calculate custom values for every document that matches your search criteria. 
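As a sketch of the idea, the snippet below computes a custom value for each matching document and ranks by it. The weights, the popularity field, and the sample scores are invented for illustration; in a real domain the expression would be defined in the domain configuration or in the request and evaluated by the service.

```python
# Made-up hits: a relevance score plus an assumed numeric "popularity" field.
hits = [
    {"id": "p1", "_score": 2.0, "popularity": 10},
    {"id": "p2", "_score": 3.0, "popularity": 1},
    {"id": "p3", "_score": 1.0, "popularity": 30},
]

def custom_rank(hit):
    """Hypothetical expression: blend text relevance with popularity."""
    return 0.7 * hit["_score"] + 0.3 * hit["popularity"]

# Highest custom value first.
ranked = sorted(hits, key=custom_rank, reverse=True)
print([hit["id"] for hit in ranked])
```

Here the most popular document ranks first even though its relevance score is the lowest, which is the kind of trade-off a custom expression lets you tune.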

Search Requests in Amazon CloudSearch

  • You submit search requests to your domain's search endpoint as HTTP/HTTPS GET requests.
  • You can specify a variety of options to constrain your search, request facet information, control ranking, and specify what you want to be returned in the results.
  • You can get search results in either JSON or XML. By default, Amazon CloudSearch returns results in JSON.
  • When you submit a search request, Amazon CloudSearch performs text processing on the search string. The search string is processed to:
    • Convert all characters to lowercase
    • Split the string into separate terms on whitespace and punctuation boundaries
    • Remove terms that are on the stopword list for the field being searched.
    • Map stems and synonyms according to the stemming and synonym options configured for the field being searched.
  • By default, Amazon CloudSearch returns search results ranked according to the hits' relevance scores (_score).
  • Alternatively, your request can specify the index field or expression that you want to use to sort the hits.
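A search request of the kind described in this section can be assembled as an ordinary GET URL. The sketch below assumes the 2013-01-01 search API path and its commonly documented parameters (q, return, sort, size, format, facet.FIELD); the endpoint host, the field names, and the build_search_url helper are placeholders, not values from a real domain.

```python
from urllib.parse import urlencode

def build_search_url(endpoint, query, *, return_fields=None, sort=None,
                     facets=(), size=10, fmt="json"):
    """Assemble a GET URL for a search request (helper name is hypothetical)."""
    params = {"q": query, "size": size, "format": fmt}
    if return_fields:
        # Comma-separated list of fields to include in the results.
        params["return"] = ",".join(return_fields)
    if sort:
        # An index field or expression name plus a direction, e.g. "year desc".
        params["sort"] = sort
    for field in facets:
        # An empty JSON object requests facet counts with default options.
        params[f"facet.{field}"] = "{}"
    return f"https://{endpoint}/2013-01-01/search?{urlencode(params)}"

url = build_search_url(
    "search-movies-EXAMPLE.us-east-1.cloudsearch.amazonaws.com",  # placeholder endpoint
    "star wars",
    return_fields=["title", "year"],
    sort="year desc",
    facets=["genres"],
)
print(url)
```

Omitting sort leaves the default ordering by relevance score, and changing fmt to "xml" asks for XML instead of the default JSON.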

Automatic Scaling

  • When you create a search domain, a single instance is deployed for the domain.
  • Amazon CloudSearch automatically scales the domain by adding instances as the volume of data or traffic increases.
  • When the amount of data you add to your domain exceeds the capacity of the initial search instance type:
    • Amazon CloudSearch scales your search domain to a larger search instance type.
    • After a domain exceeds the capacity of the largest search instance type, Amazon CloudSearch partitions the search index across multiple search instances.
    • The number of search instances required to hold the index partitions is sometimes referred to as the domain's width.
  • As your search request volume or complexity increases, it takes more processing power to handle the load.
    • When a search instance nears its maximum load, Amazon CloudSearch deploys a duplicate search instance to provide additional processing power.
    • The number of duplicate search instances is sometimes referred to as the domain's depth.
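The two scaling directions can be pictured with a back-of-the-envelope model. Everything numeric here is an assumption: the notes do not give the actual capacity of an instance type or the load at which a duplicate is added, so largest_instance_gb and instance_qps are made-up parameters, and treating every partition as copied the same number of times is a simplification.

```python
import math

def estimate_instances(index_gb, queries_per_sec, *, largest_instance_gb, instance_qps):
    """Toy model: partitions grow with data size, copies grow with traffic.

    Returns (partitions, copies_per_partition, total_instances).
    All capacities are hypothetical inputs, not published limits.
    """
    # Scaling for data: split the index once it outgrows the largest instance.
    partitions = max(1, math.ceil(index_gb / largest_instance_gb))
    # Scaling for traffic: add copies of each partition to share the query load.
    copies = max(1, math.ceil(queries_per_sec / instance_qps))
    return partitions, copies, partitions * copies

partitions, copies, total = estimate_instances(
    index_gb=250, queries_per_sec=450, largest_instance_gb=100, instance_qps=200
)
print(f"{partitions} partitions x {copies} copies = {total} instances")
```

In this model a small, lightly used domain stays at a single instance, which matches the starting point described at the top of this section.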

Reference

What Is Amazon CloudSearch? - Amazon CloudSearch
