Solr.1.4.Enterprise.Search.Server.Aug.2009 (Solr Cores)

envykok

Refer to Solr.1.4.Enterprise.Search.Server.Aug.2009.pdf

Solr cores

Recall from Chapter 2 that you can either put different types of data in the same index or use separate indexes. Up to this point, the only way you would know how to use separate indexes is to actually run multiple instances of Solr. However, running a complete additional Solr instance for each type of data you want to index is more heavyweight than necessary. Solr cores, introduced in Solr 1.3, are the solution to managing multiple indexes within a single Solr instance. Because cores can be reloaded and swapped while Solr is running, they also make administering even a single index easier. Each Solr core has its own configuration files and its own index. Searching and indexing in a multicore setup work almost exactly as they do without cores: you just add the core's name to the URL. Instead of doing a search through the URL:

http://localhost:8983/solr/select?q=dave%20matthews

in a multicore environment, you would access a core named mbartists through:

http://localhost:8983/solr/mbartists/select?q=dave%20matthews

Other than the introduction of the core name in the URL, you still perform all of your management tasks, searches, and updates in the same way as you always did in a single core setup.
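To make the URL difference concrete, here is a minimal Python sketch that builds both forms of the search URL. It assumes Solr is running at http://localhost:8983/solr as in the examples above; select_url is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import quote, urlencode

# Assumption: the example Solr setup at localhost:8983, web app mounted at /solr.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(query, core=None, base=SOLR_BASE):
    """Hypothetical helper: build a /select search URL.

    In a multicore setup the core name is simply an extra path segment
    between the Solr base URL and the request handler.
    """
    path = f"{base}/{core}/select" if core else f"{base}/select"
    # quote_via=quote encodes the space as %20, matching the URLs above.
    return f"{path}?{urlencode({'q': query}, quote_via=quote)}"


print(select_url("dave matthews"))
print(select_url("dave matthews", core="mbartists"))
```

Everything after the core segment, including the query parameters, stays the same, so client code mostly needs a configurable base URL per core.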

Configuring solr.xml

When Solr starts up, it checks for the presence of a solr.xml file in the solr.home directory. If one exists, then it loads up all the cores defined in solr.xml. We've used multiple cores in the sample Solr setup shipped with this book to manage the various indexes used in the examples. You can see the multicore configuration at ./examples/cores/solr.xml:

<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="mbtracks" instanceDir="mbtracks" dataDir="../../cores_data/mbtracks" />
    <core name="mbartists" instanceDir="mbartists" dataDir="../../cores_data/mbartists" />
    <core name="mbreleases" instanceDir="mbreleases" dataDir="../../cores_data/mbreleases" />
    <core name="crawler" instanceDir="crawler" dataDir="../../cores_data/crawler" />
    <core name="karaoke" instanceDir="karaoke" dataDir="../../cores_data/karaoke" />
  </cores>
</solr>

Some of the key configuration values are:

persistent="false" specifies that any changes we make to the cores at runtime, such as copying them, are not persisted. If you want changes to the cores to survive a restart, set persistent="true". You would definitely do this if your indexing strategy called for indexing into a fresh core and then swapping it with the live core when done.

sharedLib="lib" specifies the path to the lib directory containing JAR files shared by all the cores. If a core needs its own specific JAR files, place them in that core's own lib directory. For example, the karaoke core uses Solr Cell (see Chapter 3) to index rich content, so the JARs for parsing and extracting data from rich documents are located in ./examples/cores/karaoke/lib/.
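The Python sketch below reads a solr.xml like the one above with the standard library's xml.etree.ElementTree and reports the settings just described. The sample XML is a trimmed copy of that configuration; read_solr_xml and jar_dirs are hypothetical helpers. Treating a missing persistent attribute as false, and resolving sharedLib and instanceDir relative to the solr.home directory, are this sketch's assumptions.

```python
import posixpath
import xml.etree.ElementTree as ET

# Trimmed copy of ./examples/cores/solr.xml (two of the five cores).
SOLR_XML = """
<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="mbtracks" instanceDir="mbtracks" dataDir="../../cores_data/mbtracks" />
    <core name="karaoke" instanceDir="karaoke" dataDir="../../cores_data/karaoke" />
  </cores>
</solr>
"""


def read_solr_xml(text):
    """Hypothetical helper: return (persistent, sharedLib, cores) from solr.xml."""
    root = ET.fromstring(text.strip())
    # Sketch's choice: anything other than "true" counts as not persistent.
    persistent = root.get("persistent", "false").lower() == "true"
    shared_lib = root.get("sharedLib")
    # Map each core name to its attributes (instanceDir, dataDir, ...).
    cores = {core.get("name"): dict(core.attrib) for core in root.iter("core")}
    return persistent, shared_lib, cores


def jar_dirs(solr_home, shared_lib, instance_dir):
    """Hypothetical helper: directories a core's JARs would come from,
    the shared lib directory first, then the core's own lib/ directory."""
    dirs = [posixpath.join(solr_home, instance_dir, "lib")]
    if shared_lib:
        dirs.insert(0, posixpath.join(solr_home, shared_lib))
    return dirs


persistent, shared_lib, cores = read_solr_xml(SOLR_XML)
print(persistent, shared_lib, sorted(cores))
print(jar_dirs("./examples/cores", shared_lib, cores["karaoke"]["instanceDir"]))
```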

Managing cores

While there isn't a nice GUI for managing Solr cores the way there is for some other options, the URLs you use to issue commands to Solr cores are very straightforward, and they can easily be integrated into other management applications. If you specify persistent="true" in solr.xml, these changes are preserved across a restart because Solr updates solr.xml to reflect them. We'll cover a few of the common commands using the example Solr setup in ./examples. The individual URLs listed below are stored in plain text files in ./examples/7/ to make it easier to follow along in your own browser:

STATUS: Getting the status of the current cores is done through http://localhost:8983/solr/admin/cores?action=STATUS. You can select the status of a specific core, such as mbartists, through http://localhost:8983/solr/admin/cores?action=STATUS&core=mbartists. The status command provides a nice summary of the various cores, and it is an easy way to monitor statistics showing the growth of your various cores.

CREATE: You can generate a new core called karaoke_test based on the karaoke core, on the fly, using the CREATE command through http://localhost:8983/solr/admin/cores?action=CREATE&name=karaoke_test&instanceDir=./examples/cores/karaoke_test&config=./cores/karaoke/conf/solrconfig.xml&schema=./cores/karaoke/conf/schema.xml&dataDir=./examples/cores_data/karaoke_test. If you create a new core that has the same name as an old core, then the existing core serves up requests until the new one is generated, and then the new one takes over.

RENAME: Renaming a core can be useful when you have fixed core names in your client and you want to make a core fit one of those names. To rename the mbartists core to the more explicit name music_brainz_artists, use the URL http://localhost:8983/solr/admin/cores?action=RENAME&core=mbartists&other=music_brainz_artists. The name change happens only in memory: it doesn't update the filesystem paths of the index and configuration directories, so it doesn't make much sense unless you are persisting the name change to solr.xml.

SWAP: Swapping two cores is one of the key benefits of using Solr cores. Swapping allows you to have an offline "on deck" core that is fully populated with updated data. In a single atomic operation, you can swap out the current live core that is servicing requests with your freshly populated "on deck" core. As it's an atomic operation, your clients won't see any delay, and there isn't any chance of mixed data being sent to the client.

As an example, we can swap the mbtracks core with the mbreleases core through http://localhost:8983/solr/admin/cores?action=SWAP&core=mbreleases&other=mbtracks. You can verify the swap occurred by going to the mbtracks Admin page and verifying that Solr Home is displayed as cores/mbreleases/.

RELOAD: As you make minor changes to a core's configuration in solrconfig.xml or schema.xml, you don't want to be stopping and starting Solr constantly. In an environment with even a couple of cores, restarting all of them can take tens of seconds, and sometimes a couple of minutes. The reload command lets you reload one specific core without impacting the others. A good use of this is to configure a Solr core to be optimized for bulk indexing data. Once the data is fully indexed, change the configuration to optimize for search performance and just reload the core! A simple example for mbartists is http://localhost:8983/solr/admin/cores?action=RELOAD&core=mbartists.

MERGE: The merge command (action=mergeindexes) is new in Solr 1.4 and allows you to merge one or more indexes into yet another core. This can be very useful if you've split data across multiple cores and now want to bring them together without re-indexing all the data. First issue a commit to each of the individual indexes that are sources of data. After merging, issue another commit to the target core so that its searchers become aware of the new data. The full set of commands using curl is listed in ./7/MERGE_COMMAND.txt.
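Because each command above is just an HTTP GET against the CoreAdmin URL, a small script can drive them. The Python sketch below builds those URLs with the standard library. core_admin_url, commit_url, merge_plan, and send are hypothetical helpers; the core names and index directories in the merge example are made up; and committing through /update?commit=true is an assumption about how you would issue the commits the MERGE description calls for. Only send contacts a server, so the URL builders can be tried without Solr running.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the example setup, with adminPath="/admin/cores" in solr.xml.
SOLR = "http://localhost:8983/solr"
ADMIN_CORES = f"{SOLR}/admin/cores"


def core_admin_url(action, **params):
    """Build a CoreAdmin URL such as ?action=SWAP&core=a&other=b.

    List values become repeated parameters; mergeindexes takes one
    indexDir parameter per source index.
    """
    query = [("action", action)]
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        query.extend((key, v) for v in values)
    return f"{ADMIN_CORES}?{urlencode(query)}"


def commit_url(core):
    """Assumed commit request for a single core."""
    return f"{SOLR}/{core}/update?commit=true"


def merge_plan(target, sources, index_dirs):
    """URLs in the order described above: commit each source core,
    merge their index directories into target, then commit target."""
    plan = [commit_url(core) for core in sources]
    plan.append(core_admin_url("mergeindexes", core=target, indexDir=list(index_dirs)))
    plan.append(commit_url(target))
    return plan


def send(url, timeout=30):
    """Issue one command against a running Solr and return the raw response."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(core_admin_url("STATUS", core="mbartists"))
    print(core_admin_url("RENAME", core="mbartists", other="music_brainz_artists"))
    print(core_admin_url("SWAP", core="mbreleases", other="mbtracks"))
    print(core_admin_url("RELOAD", core="mbartists"))
    # Hypothetical cores and index paths, for illustration only.
    for url in merge_plan("merged", ["core_a", "core_b"], ["/x/a/index", "/x/b/index"]):
        print(url)
```

urlencode percent-encodes the slashes in path-valued parameters such as indexDir; that is ordinary query-string encoding, and the values are decoded before Solr sees them.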

Why use multicore

Solr's support for multiple cores in a single instance does more than enable serving multiple indexes of data from one Solr instance. Multiple cores also address some key needs for maintaining Solr in a production environment:

Rebuilding an index: While Solr has a lot of features for making sparse updates to an index with minimal impact on performance, occasionally you need to bulk update significant amounts of your data. This invariably leads to performance issues, as your searchers are constantly being reopened. Because you can populate a separate index in bulk, you can optimize that offline index for updating content. Once the offline index has been fully populated, you can use the SWAP command to take the offline index and make it the live index.

Testing configuration changes: Configuration changes can have very different effects depending on the type of data you have. If your production Solr has massive amounts of data, moving that data to a test or development environment may not be possible. By using the CREATE and MERGE commands, you can make a copy of a core and test it in relative isolation from the core being used by your end users. Use the RELOAD command to restart your test core and validate your changes. Once you are happy with them, you can either SWAP the cores or reapply your changes to your live core and RELOAD it.

Merging separate indexes together: You may find that over time you have more separate indexes than you need, and you want to merge them together. You can use the merge command (action=mergeindexes) to merge two cores together into a third core. However, note that you need to commit on both source cores and ensure that no new data is indexed into them while the merge is happening.

Renaming cores at runtime: You can build multiple versions of the same basic core and control which one is accessed by your clients by using the RENAME command to rename a core to match the URL the clients are connecting to.
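The rebuild-then-swap workflow described under "Rebuilding an index" can be written down as an ordered list of requests. This Python sketch only builds the URLs. rebuild_plan and the mbartists_ondeck core are hypothetical. It assumes that CREATE picks up solrconfig.xml and schema.xml from the instance directory's conf/ when config and schema are not given. The dataDir value mirrors the relative form used in solr.xml and may need adjusting for your layout. The bulk indexing into the on-deck core is left out.

```python
from urllib.parse import urlencode

# Assumption: the example Solr setup from this chapter.
SOLR = "http://localhost:8983/solr"


def admin_url(action, **params):
    """Build a CoreAdmin command URL; parameters keep their given order."""
    return f"{SOLR}/admin/cores?{urlencode({'action': action, **params})}"


def rebuild_plan(live, on_deck, instance_dir, data_dir):
    """Hypothetical helper: requests for rebuilding `live` offline.

    1. CREATE an on-deck core sharing the live core's instance directory
       (and so its configuration) but writing to its own data directory.
    2. Bulk-index into the on-deck core (not shown), then commit it.
    3. SWAP it with the live core in one atomic operation.
    """
    return [
        admin_url("CREATE", name=on_deck, instanceDir=instance_dir, dataDir=data_dir),
        f"{SOLR}/{on_deck}/update?commit=true",
        admin_url("SWAP", core=live, other=on_deck),
    ]


for step in rebuild_plan("mbartists", "mbartists_ondeck", "mbartists",
                         "../../cores_data/mbartists_ondeck"):
    print(step)
```

After the swap, requests to /solr/mbartists are served by the rebuilt index, and the previous index is still reachable as mbartists_ondeck, so swapping again rolls the change back. Set persistent="true" in solr.xml if the swap should survive a restart.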

Why isn't using multiple cores the default approach?

Multicore support was first added in Solr 1.3 and has matured further in Solr 1.4. We strongly encourage you to start out with the multiple core approach, even if your solr.xml currently has only a single core configured in it! While slightly more complex than having a single index, doing this allows you to take advantage of all the administrative goodness of cores. Perhaps one day commands like RELOAD and STATUS will be supported in a single-core setup, or perhaps the single-core configuration will become deprecated. Multiple cores will be the key to Solr's future support for massively distributed indexes and/or huge numbers of individualized indexes, so you can expect to see the feature continue to evolve.

You can learn more about Solr core related features at http://wiki.apache.org/solr/CoreAdmin.