See what actually gets indexed in FAST

From: http://blogs.msdn.com/b/thomsven/archive/2011/01/26/seeing-what-actual-gets-indexed.aspx

When FAST Search for SharePoint indexes documents, it uses an internal format known as Fixml (FAST Internal XML). Fixml represents exactly the data that gets indexed, so being able to inspect it is useful when troubleshooting things such as:

  • lemmatized forms of nouns and adjectives
  • refiner values
  • tokenization of terms
  • etc.

Fixml is stored in gzip'ed files under %FASTSEARCH%\data\data_fixml. Un-gzip'ing them is fairly straightforward with the command-line tools that come with FAST, but knowing which file to extract is far less trivial. I will show you an easy way to find out. The prerequisites for this approach are:

  1. SharePoint permissions to use "Edit Page" on the FAST Search Center's results page.
  2. File access to the data_fixml folder on the FAST server(s) doing the indexing.
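If you want to peek at the raw files yourself, decompressing one is easy in any language. The following is a minimal Python sketch using the standard gzip module. It assumes (unverified) that the files under data_fixml end in .gz and contain UTF-8 text. The helpers read_fixml and find_files_containing are hypothetical. The second one is a brute-force scan that decompresses every file in full, so it is slow on a large index. The walkthrough below avoids that scan by pinpointing the document directly.

```python
import gzip
from pathlib import Path


def read_fixml(path):
    """Decompress one gzip'ed Fixml file and return its text.

    Assumes UTF-8; undecodable bytes are replaced rather than raising.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return f.read()


def find_files_containing(folder, needle):
    """Brute-force scan (hypothetical helper): every *.gz file under folder,
    searched recursively, whose decompressed text contains needle."""
    return [p for p in sorted(Path(folder).rglob("*.gz"))
            if needle in read_fixml(p)]
```

For example, assuming %FASTSEARCH% is C:\FASTSEARCH, find_files_containing(r"C:\FASTSEARCH\data\data_fixml", "a unique phrase from the document") lists the candidate files.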

OK, so let's get started. First, run a query in a FAST Search Center, so you get a results page, and click "Edit Page":

[Screenshot: Edit Page on the FAST results page]

Then, go down to the "Search Core Results" web part and choose "Edit Web Part":

[Screenshot: editing the Search Core Results web part]

Scroll up to the web part tool pane in the top right-hand corner and expand the web part's "Display Properties":

[Screenshot: web part Display Properties]

Uncheck "Use Location Visualization" to enable editing of the "Fetched Properties" setting. Copy out the text and, in an editor, add the last two items (<Column Name="internalid"/> <Column Name="contentid"/>), shown at the end of the list here:

<Columns>

<Column Name="WorkId"/>
<Column Name="Rank"/>
<Column Name="Title"/>
<Column Name="Author"/>
<Column Name="Size"/>
<Column Name="Path"/>
<Column Name="Description"/>
<Column Name="Write"/>
<Column Name="SiteName"/>
<Column Name="CollapsingStatus"/>
<Column Name="HitHighlightedSummary"/>
<Column Name="HitHighlightedProperties"/>
<Column Name="ContentClass"/>
<Column Name="IsDocument"/>
<Column Name="PictureThumbnailURL"/>
<Column Name="Url"/>
<Column Name="ServerRedirectedUrl"/>
<Column Name="FileExtension"/>
<Column Name="SpSiteUrl"/>
<Column Name="docvector"/>
<Column Name="fcocount"/>
<Column Name="fcoid"/>
<Column Name="PictureHeight"/>
<Column Name="PictureWidth"/>
<Column Name="internalid"/>
<Column Name="contentid"/>

</Columns>

Note: when you paste the XML above into the "Fetched Properties" box, it all has to be on one line!
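Because the Fetched Properties box needs the whole <Columns> element on one line, the edit can be scripted instead of done by hand. The following is a minimal Python sketch using the standard xml.etree.ElementTree module, and add_columns_one_line is a hypothetical name. It appends the internalid and contentid columns if they are missing and serializes the result without line breaks. ElementTree writes empty elements as <Column Name="..." />, with a space before the slash. That is still well-formed XML, but it is an assumption that the web part accepts it exactly like the original form.

```python
import xml.etree.ElementTree as ET


def add_columns_one_line(columns_xml, extra=("internalid", "contentid")):
    """Append any missing <Column Name="..."/> entries to a <Columns>
    element and return the whole element serialized on one line."""
    root = ET.fromstring(columns_xml)
    present = {c.get("Name") for c in root.findall("Column")}
    for name in extra:
        if name not in present:
            ET.SubElement(root, "Column", Name=name)
    # Drop the newlines/indentation between elements so the output is one line.
    root.text = None
    for child in root:
        child.tail = None
    return ET.tostring(root, encoding="unicode")
```

Pass it the text copied out of Fetched Properties as a string. Running it again on its own output changes nothing, because columns that are already present are skipped.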

Next, click "XSL Editor" and replace the default XSLT with the XSLT from the attached zip file.

You are all set, and can click "OK" at the bottom of the web part editor, and then "Save & Close" in the ribbon on the top of the page. Your screen should now look like this:

[Screenshot: the "Show all" link appears]

Clicking the "Show all" link toggles the display of all columns returned from FAST. Since we just added "internalid" and "contentid", they appear at the end of the list:

[Screenshot: all columns shown, ending with internalid and contentid]

Next, we need Get-FastFixml, a PowerShell script available in the TechNet Script Repository. Download it to your FAST indexing server, then run it with the internalid and contentid of the relevant document as input:

[Screenshot: running Get-FastFixml]

As you can see, it has written the full Fixml to C:\FASTSEARCH\Var\Fixml.xml. We can open this file in an editor, and see for example how "deep dive" has been lemmatized to "deep deeper deepest" and "dive diva divas":

[Screenshot: Fixml opened in Notepad]

So we can try searching for "deeper diva" and see that it returns the expected results, even highlighting "deep dive"!

[Screenshot: search results for "deeper diva"]
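Why does "deeper diva" match a document containing "deep dive"? The Fixml above suggests that lemmatization stores every form of each word at index time, so a query term only needs to equal one of those stored forms. The Python sketch below models just that idea. It is a deliberate simplification, not FAST's actual matching logic. The EXPANSIONS table is hand-copied from the Fixml example rather than produced by a real lemmatizer, and the function names are hypothetical.

```python
# Lemma forms as they appear in the Fixml example (hand-copied; a real
# index gets these from FAST's lemmatizer).
EXPANSIONS = {
    "deep": {"deep", "deeper", "deepest"},
    "dive": {"dive", "diva", "divas"},
}


def indexed_forms(text):
    """For each word in text, the set of forms stored in the index."""
    return [EXPANSIONS.get(w, {w}) for w in text.lower().split()]


def matches(query, text):
    """True if every query term equals a stored form of some word in text."""
    forms = indexed_forms(text)
    return all(any(term in f for f in forms) for term in query.lower().split())


def highlighted(query, text):
    """Words of text that some query term matched, i.e. what gets highlighted."""
    terms = set(query.lower().split())
    return [w for w, f in zip(text.split(), indexed_forms(text)) if terms & f]
```

With this model, matches("deeper diva", "Deep Dive") is True and highlighted("deeper diva", "Deep Dive") gives ["Deep", "Dive"]. The expansion happens on the document side, so the query itself is never rewritten.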

Reading the Fixml contents can be challenging at first. Here is a quick overview of the most important tag name conventions:

Prefix         Sample               Meaning
bcat           bcatcontent          Composite field, for searching across fields
bconf<level>   bconf7 src="title"   Each field inside bcat, with its level
bcon           bconlanguage         Separately searchable field
batv           batvwrite            Sortable field
bavn           bavnauthor           Refinable field
bsum           bsumteaser           Static result field, shown as is
bsrc           bsrcbody             Dynamic result field, matched to query terms
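When scanning a large Fixml file, it can help to decode these names mechanically. The Python sketch below encodes the prefix table, and classify is a hypothetical name. One subtlety: "bconf" followed by a level number must be checked before the plain "bcon" prefix. Otherwise "bconf7" would be misread as a separately searchable field named "f7". Conversely, a bcon field whose name merely starts with "f" (a hypothetical "bconformat") should not count as bconf. The sketch assumes the table's prefixes are the only ones that matter, and any other name returns None. It decodes only the name, so the src attribute in a sample like bconf7 src="title" is out of scope.

```python
import re

# Prefix rules from the table. Order matters: "bconf<level>" (bconf
# followed only by digits) is tried before the general "bcon" prefix.
RULES = [
    (r"bconf(\d+)", "Field inside bcat, with level"),
    (r"bcat(.*)", "Composite field, for searching across fields"),
    (r"bcon(.*)", "Separately searchable field"),
    (r"batv(.*)", "Sortable field"),
    (r"bavn(.*)", "Refinable field"),
    (r"bsum(.*)", "Static result field"),
    (r"bsrc(.*)", "Dynamic result field"),
]


def classify(name):
    """Return (meaning, remainder) for a Fixml name such as "bavnauthor".

    The remainder is the field name after the prefix ("author"), or the
    level for bconf names ("7" for "bconf7"). Unknown names give (None, name).
    """
    for pattern, meaning in RULES:
        m = re.fullmatch(pattern, name)
        if m:
            return meaning, m.group(1)
    return None, name
```

For example, classify("bavnauthor") returns ("Refinable field", "author"), which tells you that "author" is a refiner.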

I must admit that two of my colleagues did all the hard work here:

  • Aaron Grant did the "Show All" toggle XSL
  • Brent Groom did the Get-FASTFixml

Thank you very much, guys!

Reposted from: https://www.cnblogs.com/frankzye/archive/2013/03/21/2972394.html
