# Next-generation search and analytics with Apache Lucene and Solr 4


## Quick start: Search and analytics in action

• Supports several types of joins and grouping options
• Has optional column-oriented storage
• Provides several ways to deal with text, as well as with enumerated and numerical data types
• Enables you to define your own complex data types as well as storage, ranking, and analytics functions

### Setup

• Lucene and Solr.
• Java 6 or higher.
• A modern web browser. (I tested on Google Chrome and Firefox.)
• 4GB of disk space — less if you don't want to use all of the flight data.
• Terminal access with a bash (or similar) shell on *nix. For Windows, you need Cygwin. I tested only on OS X with the bash shell.
• wget, if you choose to use the download script in the sample code package to download the data. You can also download the flight data manually.
• Apache Ant 1.8+, for compiling and packaging purposes, if you want to run any of the Java code examples.

1. Download this article's sample code ZIP file and unzip it to a directory of your choice. (I refer to this directory as $SOLR_AIR.)
2. On the command line, change to the $SOLR_AIR directory:
cd $SOLR_AIR
3. Start Solr:
./bin/start-solr.sh
4. Run the script that creates the fields needed to model the data:
./bin/setup.sh
5. Point your browser at http://localhost:8983/solr/#/ to display the new Solr Admin UI. Figure 1 shows an example:

##### Figure 1. The Solr UI

6. In a terminal, view the contents of the bin/download-data.sh script for details on what to download from RITA and OpenFlights. Download the data sets either manually or by running the script:
./bin/download-data.sh
The download can take a significant amount of time, depending on your bandwidth.
7. After the download finishes, index some or all of the data. To index all of the data:
bin/index.sh
To index data for a single year, use any value from 1987 through 2008 for the year. For example:
bin/index.sh 1987
8. After indexing completes (which can take a significant amount of time, depending on your machine), point your browser at http://localhost:8983/solr/collection1/travel. You'll see a UI similar to Figure 2:

##### Figure 2. The Solr Air UI

### Exploring the data

With the Solr Air application up and running, you can look through the data and the UI to get a sense of the kinds of questions you can ask. In the browser, you should see two primary interface points: the map and the search box. For the map, I started with D3's excellent Airports example (see Related topics). I modified and extended the code to load all of the airport information directly from Solr, instead of from the sample CSV file that comes with the D3 example. I also did some initial statistical calculations about each airport, which you can see by hovering over a particular airport.

I'll use the search box to showcase a few key pieces of functionality that help you build sophisticated search and analytics applications. To follow along in the code, see the solr/collection1/conf/velocity/map.vm file. The key focus areas are:

• Pivot facets
• Statistical functionality
• Grouping
• Lucene and Solr's expanded geospatial support

Each of these areas helps you get answers such as the average delay of arriving aircraft at a particular airport, or the most common delay times for aircraft flying between two airports (per airline, or from a starting airport to all nearby airports). The application uses Solr's statistical functionality, combined with Solr's longstanding faceting capabilities, to draw the initial map of airport "dots" and to generate basic information such as total flights and average, minimum, and maximum delay times. (This capability alone is a fantastic way to find bad data, or at least extreme outliers.)

To demonstrate these areas (and to show how easy it is to integrate Solr with D3), I implemented a small amount of lightweight JavaScript code that:

1. Parses the query. (A production-quality application would likely do most of the query parsing on the server side, or even as a Solr query-parser plugin.)
2. Creates various Solr requests.
3. Displays the results.

The request types are:

• Lookup per three-letter airport code, such as RDU or SFO.
• Lookup per route, such as SFO TO ATL or RDU TO ATL. (Multiple hops are not supported.)
• Clicking the Search button when the search box is empty, which shows various statistics for all flights.
• Finding nearby airports by using the near operator, as in near:SFO or near:SFO TO ATL.
• Finding likely delays at various travel distances (less than 500 miles, 500 to 1,000, 1,000 to 2,000, and 2,000 and beyond), such as likely:SFO.
• Any arbitrary Solr query to feed to Solr's /travel request handler, such as &q=AirportCity:Francisco.

The first three request types are all variations of the same type. These variants highlight Solr's pivot-faceting capabilities to show, for instance, the most common arrival-delay times per route (such as SFO TO ATL), per airline, per flight number. The near option leverages the new Lucene and Solr spatial capabilities to perform significantly enhanced spatial calculations, such as complex polygon intersections. The likely option showcases Solr's grouping capabilities to show airports, at a range of distances from an originating airport, that had arrival delays of more than 30 minutes. All of these request types augment the map with display information via a small amount of D3 JavaScript. For the last request type in the list, I simply return the associated JSON; this request type lets you explore the data on your own. If you use this request type in your own applications, you'll naturally want to consume the response in an application-specific way.

Now try some queries on your own. For instance, if you search for SFO TO ATL, you should see results similar to Figure 3:

##### Figure 3. Example SFO TO ATL screen

In Figure 3, the two airports are highlighted in the map on the left side. The Route Stats list on the right side shows the most common arrival-delay times per flight per airline. (I loaded the 1987 data only.) For instance, it tells you that Delta flight 156 arrived in Atlanta five minutes late on five occasions and six minutes early on four occasions.

You can view the underlying Solr request in your browser's console (for example, in Chrome on a Mac, choose View -> Developer -> Javascript Console) and in the Solr logs. The SFO-TO-ATL request that I used (split into three lines here for formatting purposes only) is:

/solr/collection1/travel?&wt=json&facet=true&facet.limit=5&fq=Origin:SFO
AND Dest:ATL&q=*:*&facet.pivot=UniqueCarrier,FlightNum,ArrDelay&
f.UniqueCarrier.facet.limit=10&f.FlightNum.facet.limit=10

The facet.pivot parameter provides the key functionality in this request: it pivots from the airline (called UniqueCarrier) to FlightNum to ArrDelay, thereby producing the nested structure that's displayed in Figure 3's Route Stats.

If you try a near query, as in near:JFK, your result should look similar to Figure 4:

##### Figure 4. Example screen showing airports near JFK

The Solr request that underlies the near query takes advantage of Solr's new spatial functionality, which I detail later in this article. For now, you can get a sense of this new functionality's power by looking at the request itself (shortened here for formatting purposes):

... &fq=source:Airports&q=AirportLocationJTS:"IsWithin(Circle(40.639751,-73.778925 d=3))" ...

As you might guess, this request finds all airports that fall within a circle whose center is at 40.639751 degrees latitude and -73.778925 degrees longitude, with a radius of 3 degrees (a degree of latitude is roughly 111 kilometers).

By now you should have a strong sense that Lucene and Solr applications can slice and dice data (numerical, textual, or otherwise) in interesting ways. And because Lucene and Solr are both open source, with commercially friendly licenses, you're free to add your own customizations. Better still, the 4.x line of Lucene and Solr adds many places where you can plug in your own ideas and capabilities without having to overhaul all of the code. Keep this capability in mind as you look next at some of the highlights of Lucene 4 (version 4.4 as of this writing) and then of Solr 4.

## Lucene 4: Foundations for next-generation search and analytics

Some of Lucene's key additions and changes fall into the categories of speed and memory, flexibility, data structures, and faceting. (For all of the details on the changes in Lucene, read the CHANGES.txt file that's included in each Lucene release.)

### Speed and memory

Although previous Lucene releases were generally considered fast enough, especially relative to comparable general-purpose search libraries, the enhancements in Lucene 4 make many operations significantly faster than in earlier versions.

The graph in Figure 5 captures Lucene indexing performance as measured in gigabytes per hour. (Lucene committer Mike McCandless maintains the nightly Lucene benchmarking graphs; see Related topics.) Figure 5 shows a dramatic performance improvement in the first half of May [year?]:

##### Figure 5. Lucene indexing performance

The improvement that Figure 5 shows comes from a series of changes to how Lucene builds its index structures and how it handles concurrency while building them (along with a few other changes, including JVM changes and the use of solid-state drives). The changes focused on removing synchronization while Lucene writes the index to disk; for the details, which are beyond this article's scope, see Related topics for links to Mike McCandless's blog posts.

In addition to raising overall indexing performance, Lucene 4 can perform near real-time (NRT) indexing operations. NRT operations can significantly reduce the time it takes for the search engine to reflect changes to the index. To use NRT operations, you must do some coordination in your application between Lucene's IndexWriter and IndexReader. Listing 1 (a snippet from the src/main/java/IndexingExamples.java file in the download package) illustrates this interplay:

##### Listing 1. Example of NRT search in Lucene
...
doc = new HashSet<IndexableField>();
index(writer, doc);
//Get a searcher
IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(directory));
printResults(searcher);
//Now, index one more doc
doc.add(new StringField("id", "id_" + 100, Field.Store.YES));
doc.add(new TextField("body", "This is document 100.", Field.Store.YES));
writer.addDocument(doc);
//The results are still 100
printResults(searcher);
//Don't commit; just open a new searcher directly from the writer
searcher = new IndexSearcher(DirectoryReader.open(writer, false));
//The results now reflect the new document that was added
printResults(searcher);
...

In Listing 1, I first index and commit a set of documents to the Directory and then search the Directory, which is the traditional approach in Lucene. NRT comes in when I proceed to index one more document: instead of doing a full commit, Lucene creates a new IndexSearcher directly from the IndexWriter and then searches with it. You can run this example by going to the $SOLR_AIR directory and executing this sequence of commands:

1. ant compile
2. cd build/classes
3. java -cp ../../lib/*:. IndexingExamples

...
Num docs: 100
Num docs: 100
Num docs: 101
...

Lucene 4 also includes memory improvements that take advantage of some more advanced data structures (which I cover in more detail in Finite state automata and other goodness). These improvements not only reduce Lucene's memory footprint but also dramatically speed up wildcard- and regular-expression-based queries. In addition, the code base has moved away from working with Java String objects in favor of managing large allocations of byte arrays. (The BytesRef class now seems ubiquitous in Lucene.) As a result, String overhead is reduced and the number of objects on the Java heap is better controlled, which reduces the likelihood of stop-the-world garbage collections.

### Flexibility

The flexibility improvements in Lucene 4.x unlock valuable opportunities for developers (and researchers) who want to squeeze every last bit of quality and performance out of Lucene. To enhance flexibility, Lucene offers two new well-defined plugin points. Both plugin points have already had a significant effect on the way Lucene is developed and used.

##### Listing 2. Example of changing the Codec instance in Lucene
...
conf.setCodec(new SimpleTextCodec());
File simpleText = new File("simpletext");
directory = new SimpleFSDirectory(simpleText);
//Let's write to disk so that we can see what it looks like
writer = new IndexWriter(directory, conf);
index(writer, doc);//index the same docs as before
...

##### Listing 3. A portion of the _0.cfs plain-text index file
...
term id_97
doc 97
term id_98
doc 98
term id_99
doc 99
END
doc 0
numfields 4
field 0
name id
type string
value id_100
field 1
name body
type string
value This is document 100.
...

##### Listing 4. Changing the Similarity in Lucene
conf = new IndexWriterConfig(Version.LUCENE_44, analyzer);
directory = new RAMDirectory();
writer = new IndexWriter(directory, conf);
index(writer, DOC_BODIES);
writer.close();
searcher = new IndexSearcher(DirectoryReader.open(directory));
System.out.println("Lucene default scoring:");
TermQuery query = new TermQuery(new Term("body", "snow"));
printResults(searcher, query, 10);

BM25Similarity bm25Similarity = new BM25Similarity();
conf.setSimilarity(bm25Similarity);
Directory bm25Directory = new RAMDirectory();
writer = new IndexWriter(bm25Directory, conf);
index(writer, DOC_BODIES);
writer.close();
searcher = new IndexSearcher(DirectoryReader.open(bm25Directory));
searcher.setSimilarity(bm25Similarity);
System.out.println("Lucene BM25 scoring:");
printResults(searcher, query, 10);

### Finite state automata and other goodness

• DocValues (also known as column-stride fields).
• Finite state automata (FSAs) and finite state transducers (FSTs). For the rest of this article, I refer to both as FSAs. (Technically, an FST outputs values as its nodes are visited, but for this article's purposes the distinction is unimportant.)

Both DocValues and FSAs provide significant new performance benefits for certain kinds of operations that might affect your application.
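To see why column-stride fields help, consider the difference between scanning whole documents and scanning one dense per-field array. The following is a minimal pure-Java sketch of the column-stride layout; it is an illustration of the idea only, not Lucene's DocValues implementation, and the field name is borrowed from the flight data:

```java
// Row-oriented storage keeps each document's fields together; a
// column-stride (DocValues-like) layout keeps one array per field,
// indexed by document id, so sorting and stats scan a single column.
public class ColumnStrideSketch {
    // ArrDelay values for docs 0..3, stored as one contiguous column.
    public static long[] delaysColumn = {5, -6, 12, 0};

    // Computing a statistic touches only this one array; no documents
    // are loaded and no per-document objects are created.
    public static long maxDelay() {
        long max = Long.MIN_VALUE;
        for (long d : delaysColumn) {
            if (d > max) max = d;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("Max delay: " + maxDelay()); // Max delay: 12
    }
}
```

The same dense-array idea is what makes DocValues-backed sorting and faceting both faster and lighter on the heap than the older field-cache approach.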

##### Listing 5. Example of a simple Lucene automaton
String[] words = {"hockey", "hawk", "puck", "text", "textual", "anachronism", "anarchy"};
Collection<BytesRef> strings = new ArrayList<BytesRef>();
for (String word : words) {
strings.add(new BytesRef(word));
}
//build up a simple automaton out of several words
Automaton automaton = BasicAutomata.makeStringUnion(strings);
CharacterRunAutomaton run = new CharacterRunAutomaton(automaton);
System.out.println("Match: " + run.run("hockey"));
System.out.println("Match: " + run.run("ha"));

### Faceting

##### Listing 6. Lucene faceting example
...
DirectoryTaxonomyWriter taxoWriter =
new DirectoryTaxonomyWriter(facetDir, IndexWriterConfig.OpenMode.CREATE);
FacetFields facetFields = new FacetFields(taxoWriter);
for (int i = 0; i < DOC_BODIES.length; i++) {
String docBody = DOC_BODIES[i];
String category = CATEGORIES[i];
Document doc = new Document();
CategoryPath path = new CategoryPath(category, '/');
//Setup the fields
facetFields.addFields(doc, Collections.singleton(path));//just do a single category path
doc.add(new StringField("id", "id_" + i, Field.Store.YES));
doc.add(new TextField("body", docBody, Field.Store.YES));
writer.addDocument(doc);
}
writer.commit();
taxoWriter.commit();
DirectoryReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
DirectoryTaxonomyReader taxor = new DirectoryTaxonomyReader(taxoWriter);
ArrayList<FacetRequest> facetRequests = new ArrayList<FacetRequest>();
CountFacetRequest home = new CountFacetRequest(new CategoryPath("Home", '/'), 100);
home.setDepth(5);
facetRequests.add(home);
facetRequests.add(new CountFacetRequest(new CategoryPath("Home/Sports", '/'), 10));
facetRequests.add(new CountFacetRequest(new CategoryPath("Home/Weather", '/'), 10));
FacetSearchParams fsp = new FacetSearchParams(facetRequests);

FacetsCollector facetsCollector = FacetsCollector.create(fsp, reader, taxor);
searcher.search(new MatchAllDocsQuery(), facetsCollector);

for (FacetResult fres : facetsCollector.getFacetResults()) {
FacetResultNode root = fres.getFacetResultNode();
printFacet(root, 0);
}

Home (0.0)
Home/Children (3.0)
Home/Children/Nursery Rhymes (3.0)
Home/Weather (2.0)
Home/Sports (2.0)
Home/Sports/Rock Climbing (1.0)
Home/Sports/Hockey (1.0)
Home/Writing (1.0)
Home/Quotes (1.0)
Home/Quotes/Yoda (1.0)
Home/Music (1.0)
Home/Music/Lyrics (1.0)
...

## Solr 4: Search and analytics at scale

### Search, faceting, and relevance

Several of Solr 4's new features make it easier to build next-generation, data-driven applications, both at indexing time and at search and faceting time. Table 1 summarizes the highlights and, where applicable, includes commands and code examples:

##### Table 1. Highlights of indexing, searching, and faceting in Solr 4

Pivot faceting Compute facet counts nested within other facets. For example, pivot on origin, destination, carrier, flight number, and arrival delay:
http://localhost:8983/solr/collection1/travel?&wt=json&facet=true&facet.limit=5&fq=&q=*:* &facet.pivot=Origin,Dest,UniqueCarrier,FlightNum,ArrDelay&indent=true
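A facet.pivot request like the one above conceptually builds a tree of nested counts: for each value of the first field, count the values of the next field, and so on. As a rough pure-Java sketch of that nesting (two levels for brevity; this illustrates the idea, not Solr's implementation):

```java
import java.util.*;

public class PivotSketch {
    // Count ArrDelay values nested under each UniqueCarrier, the way
    // facet.pivot=UniqueCarrier,ArrDelay nests counts. (The real request
    // adds FlightNum as a middle level.)
    public static Map<String, Map<String, Integer>> pivot(String[][] rows) {
        Map<String, Map<String, Integer>> counts =
                new TreeMap<String, Map<String, Integer>>();
        for (String[] row : rows) {
            String carrier = row[0], delay = row[1];
            Map<String, Integer> inner = counts.get(carrier);
            if (inner == null) {
                inner = new TreeMap<String, Integer>();
                counts.put(carrier, inner);
            }
            Integer c = inner.get(delay);
            inner.put(delay, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] rows = {{"DL", "5"}, {"DL", "5"}, {"UA", "-6"}};
        System.out.println(pivot(rows)); // {DL={5=2}, UA={-6=1}}
    }
}
```

Solr computes these nested counts index-side over the matching documents and returns the tree in the response, which is what drives the Route Stats display in the Solr Air UI.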

Relevance functions as pseudo-fields Return the value of a relevance function, such as the document frequency of a term, with each returned document:
http://localhost:8983/solr/collection1/travel?&wt=json&q=*:*&fl=*, {!func}docfreq('Origin',%20'SFO')&indent=true

Joins Represent more-complex document relationships by joining documents on a field, such as joining airports (by IATA code) to flight origins:
http://localhost:8983/solr/collection1/travel?&wt=json&indent=true&q={!join%20from=IATA%20to=Origin}*:*
Codec support Change the Codec for the index and the postings format for individual fields. For example, use the SimpleText postings format for a field:
<fieldType name="string_simpletext" class="solr.StrField" postingsFormat="SimpleText" />

New update-processing capabilities, including:

• Field mutation (for example, concatenating fields, parsing numbers, and trimming)
• Scripting: process documents during updates with JavaScript or other languages supported by the scripting engine. See the update-script.js file in the Solr Air example.
• Language detection (available since 3.5, but worth mentioning here) for identifying the language (such as English or Japanese) that's used in a document

Atomic updates Send only the changed fields, and Solr applies them to the stored document. For example, set the Origin field of document 243551 to FOO:
curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d ' [{"id": "243551","Origin": {"set":"FOO"}}]'
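Conceptually, an atomic update like the curl command above sends only the changed field, and the server merges it into the stored document. A minimal pure-Java sketch of the "set" merge semantics (an illustration only; internally, Solr rebuilds the document from its stored fields):

```java
public class AtomicUpdateSketch {
    // Apply {"field": {"set": value}} semantics to a stored document,
    // producing a merged copy without mutating the original.
    public static java.util.Map<String, Object> applySet(
            java.util.Map<String, Object> stored, String field, Object value) {
        java.util.Map<String, Object> merged =
                new java.util.HashMap<String, Object>(stored);
        merged.put(field, value); // "set" replaces (or adds) just this field
        return merged;
    }

    public static void main(String[] args) {
        java.util.Map<String, Object> doc =
                new java.util.HashMap<String, Object>();
        doc.put("id", "243551");
        doc.put("Origin", "SFO");
        doc.put("Dest", "ATL");
        java.util.Map<String, Object> updated = applySet(doc, "Origin", "FOO");
        // Only Origin changed; the other stored fields carry over.
        System.out.println(updated.get("Origin") + " " + updated.get("Dest"));
    }
}
```

This is why atomic updates require the other fields to be stored: the server needs them to reconstruct the full document that it re-indexes.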

### Scaling, NoSQL, and NRT

How does SolrCloud work? Documents that are sent to Solr 4 when it's running in (optional) distributed mode are routed according to a hashing mechanism to a node in the cluster (called the leader ). The leader is responsible for indexing the document into a shard . A shard is a single index that is served by a leader and zero or more replicas. As an illustration, assume that you have four machines and two shards. When Solr starts, each of the four machines communicates with the other three. Two of the machines are elected leaders, one for each shard. The other two nodes automatically become replicas of one of the shards. If one of the leaders fails for some reason, a replica (in this case the only replica) becomes the leader, thereby guaranteeing that the system still functions properly. You can infer from this example that in a production system enough nodes must participate to ensure that you can handle system outages.
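The hashing mechanism can be pictured as mapping each document id to a shard number. Here's a minimal pure-Java sketch of the idea; note that Solr 4 actually uses a MurmurHash-based hash over per-shard hash ranges, so String.hashCode() below is only a stand-in for illustration:

```java
public class RoutingSketch {
    // Route a document id to one of numShards shards by hashing the id.
    // The modulo is normalized so negative hash codes still map to a
    // valid shard index.
    public static int shardFor(String docId, int numShards) {
        int hash = docId.hashCode();
        return ((hash % numShards) + numShards) % numShards;
    }

    public static void main(String[] args) {
        // The same id always lands on the same shard, which is how
        // later updates and deletes find the original document.
        System.out.println("Doc 243551 -> shard " + shardFor("243551", 2));
    }
}
```

Deterministic routing is also what lets any node in the cluster forward a document to the correct leader without a central coordinator.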

To see SolrCloud in action, you can launch a two-node, two-shard system by running the start-solr.sh script that you used in the Solr Air example with a -z flag. From the *NIX command line, first shut down your old instance:

kill -9 PROCESS_ID

Then restart the system:

bin/start-solr.sh -c -z

The -c flag erases the old index. The -z flag tells Solr to start up with an embedded version of Apache ZooKeeper.

Point your browser at the SolrCloud admin page, http://localhost:8983/solr/#/~cloud, to verify that two nodes are participating in the cluster. You can now re-index your content, and it will be spread across both nodes. All queries to the system are also automatically distributed. You should get the same number of hits for a match-all-documents search against two nodes that you got for one node.

The start-solr.sh script launches Solr with the following command for the first node:

java -Dbootstrap_confdir=$SOLR_HOME/solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar

The script tells the second node where Zookeeper is:

java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

Embedded Zookeeper is great for getting started, but to ensure high availability and fault tolerance for production systems, set up a stand-alone set of Zookeeper instances in your cluster.

Stacked on top of the SolrCloud capabilities are support for NRT and many NoSQL-like functions, such as:

• Optimistic locking
• Atomic updates
• Real-time gets (retrieving a specific document before it is committed)
• Transaction-log-backed durability
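Optimistic locking, the first item in the list above, amounts to a compare-and-set on a per-document version: an update carries the version the client last saw, and the update is rejected if the stored version has since changed. A minimal pure-Java sketch of that idea (Solr implements this with its _version_ field; this is an illustration, not Solr code):

```java
public class OptimisticLockSketch {
    private long version = 1;   // plays the role of Solr's _version_ field
    private String value = "SFO";

    // Apply the update only if the caller's version matches the stored one.
    public synchronized boolean update(long expectedVersion, String newValue) {
        if (expectedVersion != version) {
            return false;       // conflict: someone else updated first
        }
        value = newValue;
        version++;              // a successful update bumps the version
        return true;
    }

    public synchronized long getVersion() { return version; }
    public synchronized String getValue() { return value; }

    public static void main(String[] args) {
        OptimisticLockSketch doc = new OptimisticLockSketch();
        long seen = doc.getVersion();          // reader A sees version 1
        doc.update(seen, "ATL");               // A's update succeeds
        boolean ok = doc.update(seen, "JFK");  // B's stale update is rejected
        System.out.println("Stale update accepted: " + ok); // false
    }
}
```

In Solr, the rejected case surfaces as a version-conflict error on the update request, and the client decides whether to re-read and retry.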

Many of the distributed and NoSQL functions in Solr — such as automatic versioning of documents and transaction logs — work out of the box. For a few other features, the descriptions and examples in Table 2 will be helpful:

##### Table 2. Summary of distributed and NoSQL features in Solr 4

Realtime get Retrieve a document, by ID, regardless of its state of indexing or distribution. Get the document whose ID is 243551:
http://localhost:8983/solr/collection1/get?id=243551
Shard splitting Split your index into smaller shards so they can be migrated to new nodes in the cluster. Split shard1 into two shards:
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
NRT Use NRT to search for new content much more quickly than in previous versions. Turn on <autoSoftCommit> in your solrconfig.xml file. For example:
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
Document routing Specify which documents live on which nodes. Ensure that all of a user's data is on certain machines. Read Joel Bernstein's blog post (see Related topics ).

Collections API Create and manage collections programmatically. Create a new collection named hockey with two shards:
http://localhost:8983/solr/admin/collections?action=CREATE&name=hockey&numShards=2

### Going schemaless

Solr's schemaless functionality enables clients to add content rapidly without the overhead of first defining a schema.xml file. Solr examines the incoming data and passes it through a cascading set of value parsers. The value parsers guess the data's type and then automatically add the fields to the internal schema and add the content to the index.

A typical production system (with some exceptions) shouldn't use schemaless, because the value guessing isn't always perfect. For instance, the first time Solr sees a new field, it might identify the field as an integer and thus define an integer FieldType in the underlying schema. But you may discover three weeks later that the field is useless for searching because the rest of the content that Solr sees for that field consists of float point values.
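The cascading set of value parsers can be pictured as a chain of increasingly general guesses, trying the most specific type first. Here's a minimal pure-Java sketch of such a cascade; it's an illustration only, and Solr's actual parser chain and type names differ:

```java
public class TypeGuessSketch {
    // Try the most specific parse first, then fall back, ending at string.
    // This is also where the pitfall lives: if "42" arrives first, the
    // field is typed as long, and a later "3.14" no longer fits.
    public static String guessType(String raw) {
        try {
            Long.parseLong(raw);
            return "long";
        } catch (NumberFormatException e) { /* fall through */ }
        try {
            Double.parseDouble(raw);
            return "double";
        } catch (NumberFormatException e) { /* fall through */ }
        if (raw.equalsIgnoreCase("true") || raw.equalsIgnoreCase("false")) {
            return "boolean";
        }
        return "string";
    }

    public static void main(String[] args) {
        System.out.println(guessType("42"));    // long
        System.out.println(guessType("3.14"));  // double
        System.out.println(guessType("true"));  // boolean
        System.out.println(guessType("SFO"));   // string
    }
}
```

The first value a field ever receives fixes the guess, which is exactly the three-weeks-later problem described above.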

However, schemaless is especially helpful for early-stage development or for indexing content whose format you have little to no control over. For instance, Table 2 includes an example of using the collections API in Solr to create a new collection:

http://localhost:8983/solr/admin/collections?action=CREATE&name=hockey&numShards=2

After you create the collection, you can use schemaless to add content to it. First, though, take a look at the current schema. As part of implementing schemaless support, Solr also added Representational State Transfer (REST) APIs for accessing the schema. You can see all of the fields defined for the hockey collection by pointing your browser (or cURL on the command line) at http://localhost:8983/solr/hockey/schema/fields. You see all of the fields from the Solr Air example. The schema uses those fields because the create option used my default configuration as the basis for the new collection. You can override that configuration if you want. (A side note: The setup.sh script that's included in the sample code download uses the new schema APIs to create all of the field definitions automatically.)

To add to the collection by using schemaless, run:

bin/schemaless-example.sh

The following JSON is added to the hockey collection that you created earlier:

[
{
"id": "id1",
"team": "Carolina Hurricanes",
"description": "The NHL franchise located in Raleigh, NC",
"cupWins": 1
}
]

As you know from examining the schema before you added this JSON to the collection, the team , description , and cupWins fields are new. When the script ran, Solr guessed their types automatically and created the fields in the schema. To verify, refresh the results at http://localhost:8983/solr/hockey/schema/fields. You should now see team , description , and cupWins all defined in the list of fields.

### Spatial (not just geospatial) improvements

Solr's longstanding support for point-based spatial searching enables you to find all documents that are within some distance of a point. Although Solr supports this approach in an n -dimensional space, most people use it for geospatial search (for example, find all restaurants near my location ). But until now, Solr didn't support more-involved spatial capabilities such as indexing polygons or performing searches within indexed polygons. Some of the highlights of the new spatial package are:

• Support through the Spatial4J library (see Related topics ) for many new spatial types — such as rectangles, circles, lines, and arbitrary polygons — and support for the Well Known Text (WKT) format
• Multivalued indexed fields, which you can use to encode multiple points into the same field
• Configurable precision that gives the developer more control over accuracy versus computation speed
• Fast filtering of content
• Query support for Is Within , Contains , and IsDisjointTo
• Optional support for the Java Topological Suite (JTS) (see Related topics )
• Lucene APIs and artifacts

The schema for the Solr Air application has several field types that are set up to take advantage of this new spatial functionality. I defined two field types for working with the latitude and longitude of the airport data:

<fieldType name="location_jts" class="solr.SpatialRecursivePrefixTreeFieldType"
distErrPct="0.025" spatialContextFactory=
"com.spatial4j.core.context.jts.JtsSpatialContextFactory"
maxDistErr="0.000009" units="degrees"/>

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
distErrPct="0.025" geo="true" maxDistErr="0.000009" units="degrees"/>

The location_jts field type explicitly uses the optional JTS integration to define a point, and the location_rpt field type doesn't. If you want to index anything more complex than simple rectangles, you need to use the JTS version. The fields' attributes help to define the system's accuracy. These attributes are required at indexing time because Solr, via Lucene and Spatial4j, encodes the data in multiple ways to ensure that the data can be used efficiently at search time. For your applications, you'll likely want to run some tests with your data to determine the tradeoffs to make in terms of index size, precision, and query-time performance.

In addition, the near query that's used in the Solr Air application uses the new spatial-query syntax ( IsWithin on a Circle ) for finding airports near the specified origin and destination airports.
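For point data, an IsWithin(Circle(...)) predicate boils down to a distance test between the circle's center and each indexed point. The following is a minimal pure-Java sketch using the haversine formula; it illustrates the idea only, and Spatial4j's distance math and the prefix-tree index that makes the filtering fast are more involved:

```java
public class CircleFilterSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon points, in kilometers,
    // via the haversine formula.
    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // IsWithin(Circle(centerLat,centerLon d=degrees)), treating one degree
    // as roughly 111 km of radius, as in the near:JFK example.
    public static boolean isWithin(double centerLat, double centerLon,
                                   double radiusDegrees,
                                   double lat, double lon) {
        return haversineKm(centerLat, centerLon, lat, lon)
                <= radiusDegrees * 111.0;
    }

    public static void main(String[] args) {
        // Center on JFK (from the near:JFK request); LaGuardia is ~17 km away.
        System.out.println(isWithin(40.639751, -73.778925, 3, 40.7769, -73.8740));
    }
}
```

The indexed prefix-tree representation lets Lucene skip most points without ever computing a distance; the per-point test above is only the final refinement step.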

### New administration UI

In wrapping up this section on Solr, I would be remiss if I didn't showcase the much more user-friendly and modern Solr admin UI. The new UI not only cleans up the look and feel but also adds new functionality for SolrCloud, document additions, and much more.

For starters, when you first point your browser at http://localhost:8983/solr/#/, you should see a dashboard that succinctly captures much of the current state of Solr: memory usage, working directories, and more, as in Figure 7:

##### Figure 7. Example Solr dashboard

If you select Cloud in the left side of the dashboard, the UI displays details about SolrCloud. For example, you get in-depth information about the state of configuration, live nodes, and leaders, as well as visualizations of the cluster topology. Figure 8 shows an example. Take a moment to work your way through all of the cloud UI options. (You must be running in SolrCloud mode to see them.)

##### Figure 8. Example SolrCloud UI

The last area of the UI to cover that's not tied to a specific core/collection/index is the Core Admin set of screens. These screens provide point-and-click control over the administration of cores, including adding, deleting, reloading, and swapping cores. Figure 9 shows the Core Admin UI:

##### Figure 9. Example of Core Admin UI

By selecting a core from the Core list, you access an overview of information and statistics that are specific to that core. Figure 10 shows an example:

##### Figure 10. Example core overview

Most of the per-core functionality is similar to the pre-4.x UI's functionality (albeit in a much more pleasant way), with the exception of the Documents option. You can use the Documents option to add documents in various formats (JSON, CSV, XML, and others) to the collection directly from the UI, as Figure 11 shows:

##### Figure 11. Example of adding a document from the UI

You can even upload rich document types such as PDF and Word. Take a moment to add some documents into your index or browse the other per-collection capabilities such as the Query interface or the revamped Analysis screen.

## The road ahead

Next-generation search-engine technology gives users the power to decide what to do with their data. This article gave you a good taste of what Lucene and Solr 4 are capable of, and, I hope, a broader sense of how search engines solve non-text-based search problems that involve analytics and recommendations.

Lucene and Solr are in constant motion, thanks to a large sustaining community that's backed by more than 30 committers and hundreds of contributors. The community is actively developing two main branches: the current officially released 4.x branch and the trunk branch, which represents the next major (5.x) release. On the official release branch, the community is committed to backward compatibility and an incremental approach to development that focuses on easy upgrades of current applications. On the trunk branch, the community is a bit less restricted in terms of ensuring compatibility with previous releases. If you want to try out the cutting edge in Lucene or Solr, check out the trunk branch of the code from Subversion or Git (see Related topics). Whichever path you choose, you can take advantage of Lucene and Solr for powerful search-based analytics that go well beyond plain text search.

### Acknowledgments

Thanks to David Smiley, Erik Hatcher, Yonik Seeley, and Mike McCandless for their help.
