Understanding the Search Service Architecture

Search is one of the most important services in SharePoint 2013, as it has been throughout SharePoint product development. It enables users to quickly and easily find content held in SharePoint, both in SharePoint content databases and in external data sources reached through Business Connectivity Services. In SharePoint 2013, the underlying architecture of Search has been redeveloped to provide a richer enterprise search experience for users and streamlined administration.

 

The Search Architecture

 

The Search service has been rearchitected in SharePoint 2013. The architecture changes have been made to ensure that Search offers a higher level of redundancy for single and multiple farm environments. One of the most obvious changes in the new architecture is the inclusion of the FAST technologies in the Search service. This enhances search functionality and makes it easier for solution architects to design and deploy a full-featured search solution in their organizations. Search combines a range of components on application servers, which include:

 

  • Crawl Component
  • Content Processing Component
  • Analytics Processing Component
  • Indexing Component
  • Query Processing Component
  • Search Administration Component

 

These components interact to ingest search data into the search index and to surface query results. To ingest data, the Crawl Component interrogates the content sources that you have configured, either on the SharePoint farm or on external sources, such as Microsoft Exchange or Lotus Notes. The crawled items are processed by the Content Processing Component to format them appropriately to be stored on the index. Information from the Analytics Processing Component is used in this process to identify useful associated item information, such as previous user interaction. The data, or artifacts, are written to the index, which is a series of files and folders that are stored on disk and referred to, collectively, as the index file. The Query Processing Component receives queries from the Web Front End (WFE) server, processes each query, and sends it to the Indexing Component, which returns result sets. The Query Processing Component performs additional processing to aggregate and clean the results and then sends the result sets back to the WFE to be rendered for the user. Temporary and permanent storage databases are used throughout the process.
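The ingestion and query flow described above can be sketched as a toy pipeline. This is only an illustrative model, not SharePoint code: the names `ToyIndex`, `crawl`, `process_content`, `ingest`, and `query` are hypothetical, the content source is a plain dictionary, and the "index" is an in-memory map from terms to URLs rather than files on disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Stand-in for the search index: maps a term to the set of item URLs."""
    postings: dict = field(default_factory=dict)

    def add(self, url, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(url)

    def lookup(self, term):
        return self.postings.get(term.lower(), set())


def crawl(content_source):
    """Crawl step: yield (url, raw_text) for every item in a content source."""
    for url, raw_text in content_source.items():
        yield url, raw_text


def process_content(raw_text):
    """Content processing step: normalize whitespace before indexing."""
    return " ".join(raw_text.split())


def ingest(content_source, index):
    """Ingestion: crawl, process, then write the result to the index."""
    for url, raw_text in crawl(content_source):
        index.add(url, process_content(raw_text))


def query(index, query_text):
    """Query step: look up each term and keep only items matching all terms."""
    result_sets = [index.lookup(term) for term in query_text.split()]
    if not result_sets:
        return []
    return sorted(set.intersection(*result_sets))


# Hypothetical content source: URL -> raw text.
source = {
    "http://intranet/a": "Quarterly   sales report",
    "http://intranet/b": "Sales team contacts",
}
index = ToyIndex()
ingest(source, index)
print(query(index, "sales report"))  # ['http://intranet/a']
```

The sketch keeps crawling, content processing, indexing, and query processing as separate functions, mirroring the separation of components in the text. It omits analytics, databases, and ranking.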

 

The Crawl Component

 

The Crawl Component is part of the crawl and content processing architecture, which comprises the Crawl Component, the crawl database, and the Content Processing Component. The crawl role is responsible for crawling content sources to build a search index. This means that the component reviews each of the documents or pages in a source location.

 

Crawl Connectors:

 

  • HTTP
  • File Shares
  • SharePoint Sites
  • User profiles
  • Exchange
  • Lotus Notes
  • Custom

 

The Crawl Component delivers content from files and pages, together with their associated metadata, to the Content Processing Component. The crawled items that are passed to the Content Processing Component have associated properties, such as title and author. Crawl uses connectors, such as the Lotus Notes Connector, to access content sources and retrieve data. These were known as protocol handlers in previous versions of SharePoint Products and Technologies. The properties are grouped based on the connector or IFilter used to crawl the content source. (An IFilter, now referred to as a format handler, is a piece of code that enables specific file formats to be indexed and thus be searchable.) For example, properties from Microsoft Office documents, such as Word and Excel documents, are grouped under Office, whereas properties from websites are grouped under Web. You can include the contents and metadata of crawled properties in the search index file. To do this, you must map the crawled properties to managed properties, because only managed properties are included in the search index.
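The mapping from crawled properties to managed properties can be pictured as a lookup that drops anything unmapped. The sketch below is a simplified illustration only. The property names in `PROPERTY_MAPPINGS` and the function `to_managed_properties` are hypothetical and do not reflect the actual SharePoint search schema or its API.

```python
# Hypothetical mapping from crawled property names to managed property names.
PROPERTY_MAPPINGS = {
    "Office:Title": "Title",
    "Office:Author": "Author",
    "Web:title": "Title",
}


def to_managed_properties(crawled_properties, mappings=PROPERTY_MAPPINGS):
    """Keep only crawled properties that have a mapping; drop the rest."""
    managed = {}
    for crawled_name, value in crawled_properties.items():
        managed_name = mappings.get(crawled_name)
        if managed_name is not None:
            managed[managed_name] = value
    return managed


# Example crawled item: the revision number has no mapping, so it is not indexed.
crawled = {
    "Office:Title": "Budget 2013",
    "Office:Author": "A. Writer",
    "Office:RevisionNumber": "7",
}
print(to_managed_properties(crawled))
```

In this model, a crawled property that is not mapped never reaches the index, which is the behavior the paragraph above describes.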

The crawler itself does not parse documents; that function is provided by the Content Processing Component.

The Crawl Component uses one or more crawl databases to temporarily store information about crawled items and to maintain a crawl history. The database holds information such as the last crawl time, the last crawl ID, and the last crawl update type. Information about content sources, such as their schedules and locations, is synchronized to the registry on crawl role servers from the search administration database.

 

Content Processing Component

When the Crawl Component forwards content and metadata, the Content Processing Component performs tasks on the content to prepare it for the search index file. The search index works with processed content, called artifacts. The Content Processing Component tasks include parsing documents, property mapping, and linguistics processing. The latter detects language and extracts language-based entities. The Content Processing Component also writes information about the URLs into the link database, which holds information about links rather than content.

Including and excluding file types

 

The Content Processing Component can only process file data if the file type (extension) is included in the list of available file types on the Manage File Types page and the crawl server has the appropriate format handler installed. By default, some file types, such as email messages with a .eml extension, are supported. The default format handlers focus on content, so while the Microsoft PowerPoint .pptx format appears on the Manage File Types list by default, the Microsoft PowerPoint .pps presentation file does not.
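The two conditions above (the extension is on the file types list, and a format handler is installed) can be expressed as a simple check. This is a sketch under assumptions: `can_process` is a hypothetical helper, and the example sets are invented for illustration rather than taken from a real farm configuration.

```python
import os


def can_process(filename, included_extensions, installed_handlers):
    """Return True only if the extension is listed AND a handler is installed."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    return extension in included_extensions and extension in installed_handlers


# Illustrative values only: .pps is assumed to have a handler but is not listed.
included = {"eml", "pptx", "docx"}
handlers = {"eml", "pptx", "docx", "pps"}

print(can_process("deck.PPTX", included, handlers))  # True
print(can_process("show.pps", included, handlers))   # False
```

Both conditions must hold, so adding an extension to the list without installing a handler, or the reverse, leaves the file type unprocessed in this model.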

 

Analytics Processing Component

 

One of the major changes to the search service is the inclusion of web analytics functionality. This was previously a separate service application in SharePoint Server 2010. The new analytics function in Search analyzes both the crawled items and how users interact with search results; these two kinds of analysis are called Search Analytics and Usage Analytics.

 

Search Analytics

 

The search analytics information is used to improve search relevance and to create search reports, recommendations, and data links. This information is then returned to the Content Processing Component for storage in the search index. Information about search activity, such as the number of search clicks from a search results page, helps to improve the relevance of the search results by analyzing previous user activities. This information is stored in the link database. This information is then further analyzed by a series of sub-analyses. The following table shows the sub-analyses that act on Search Analytics results.

 

Usage Analytics

 

Usage analytics analyzes usage events, such as views, from the event store. When a user completes an action, such as viewing a page, the event is collected and stored in usage files on each WFE server. This information is pushed to an event store, where it is stored until it is processed by the Analytics Processing Component. The results are then returned to the Content Processing Component to be included in the search index. The usage events that are analyzed include:

 

  • Views
  • Recommendations displayed
  • Recommendations clicked
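As a rough illustration of what processing usage events from an event store might look like, the sketch below counts events per item and event type. It is a simplified model only: the event type strings, the `(event_type, url)` tuple layout, and the function `summarize_usage` are assumptions made for this example, not the format SharePoint uses.

```python
from collections import Counter, defaultdict

# Hypothetical identifiers for the three event types listed above.
EVENT_TYPES = ("view", "recommendation_displayed", "recommendation_clicked")


def summarize_usage(events):
    """Count events per item URL and event type; ignore unknown event types."""
    counts = defaultdict(Counter)
    for event_type, url in events:
        if event_type in EVENT_TYPES:
            counts[url][event_type] += 1
    return {url: dict(counter) for url, counter in counts.items()}


# Illustrative event store contents.
event_store = [
    ("view", "/sites/hr/policy.docx"),
    ("view", "/sites/hr/policy.docx"),
    ("recommendation_displayed", "/sites/hr/benefits.docx"),
    ("recommendation_clicked", "/sites/hr/benefits.docx"),
    ("view", "/sites/hr/benefits.docx"),
]
print(summarize_usage(event_store))
```

Per-item counts like these are the kind of result that could be handed back for inclusion in the index, as the paragraph above describes.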

 

Index Component

 

The search index is a set of files that are stored in separate folders on a server. The Content Processing Component processes items provided by the Crawl Components, maps crawled properties to managed properties, and formats these as artifacts that can be stored on the search index. The indexes can include:

 

  • Full-text indexes
  • Indexes of the Managed properties
  • An index for attribute vectors
  • Numeric indexes
  • Index Component
  • Index Partition
  • Index Replica

 

Query Processing Component

 

The Query Processing Component, which sits between the Index Component and the search front-end client, handles processing when a user executes a search query and processes the results to be returned. When the Indexing Component, or another search provider, returns a result set, the Query Processing Component performs any additional processing that is required.

 

The Query Processing Component performs some linguistic processing to maximize query efficiency and effectiveness, such as:

  • Word stemming. Stemming returns words closely related to or stemming from another word. For example, the stemmer relates words such as “jumping,” “jumped,” and “jumper” to the verb “to jump.”
  • Word breaking. This refers to the breaking of words that are linked by some form of hyphenation; for example, the term “server-based” is linked by a hyphen. In this case, word breaking returns both “server” and “based,” with higher relevance given to an item with both present.
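The two techniques in the list above can be illustrated with a deliberately crude sketch. It is not how SharePoint implements linguistic processing: `toy_stem` is a naive suffix stripper for a handful of English endings, `break_words` only splits on the ASCII hyphen, and `score` is a hypothetical relevance count introduced for this example.

```python
# Naive English suffixes, checked in order; real stemmers are language-specific.
SUFFIXES = ("ing", "ed", "er", "s")


def toy_stem(word):
    """Strip one known suffix, keeping at least three characters of the stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def break_words(term):
    """Split a hyphenated term into its parts."""
    return [part for part in term.split("-") if part]


def score(item_text, query_term):
    """Count how many parts of the query term appear (by stem) in the item."""
    item_stems = {toy_stem(word) for word in item_text.split()}
    return sum(1 for part in break_words(query_term) if toy_stem(part) in item_stems)


print([toy_stem(w) for w in ("jumping", "jumped", "jumper")])
print(break_words("server-based"))
print(score("a server based design", "server-based"))
print(score("server room", "server-based"))
```

An item containing both "server" and "based" scores higher than one containing only "server", which matches the relevance behavior described for word breaking.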

 

The query processing workflow is as follows:

  • The Query Processing Component receives a query from the search front-end client and processes it to maximize precision, recall, and relevancy. These actions include:
  • Applying Web Part transformations.

 

 
