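SolrCloud and Elasticsearch are currently the most widely used Lucene-based full-text indexing solutions. Once a single table in your system passes the 100-million-row mark, either of the two is strongly recommended for its more powerful fuzzy search and real-time analytics, and as a way out of the performance bottleneck. To meet a "去MIO" requirement you can also consider migrating the database to an MPP system such as Postgres-XL, or to HBase.
I have been developing on SolrCloud recently. There are plenty of articles online about deploying SolrCloud, but very few cover both the deployment and the custom development process from end to end.
First, how to deploy SolrCloud 4.10.2 + ZooKeeper in pseudo-distributed mode on a developer's Windows 7 machine.
1. ZooKeeper deployment
1.1 Download ZooKeeper: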
http://www.apache.org/dyn/closer.cgi/zookeeper/
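I use two ZooKeeper servers as the ensemble. Create a solrcloud directory and place two copies of ZooKeeper in it, zkServer-1 and zkServer-2.
1.2 ZooKeeper configuration
Rename zoo_sample.cfg under solrcloud\zkServer-1\conf to zoo.cfg and edit the settings in a text editor.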
zkServer-1:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=D:/solrcloud/zkServer-1/data
clientPort=2181
server.1=127.0.0.1:2881:2771
server.2=127.0.0.1:2882:2772
Remember to change dataDir, the directory where ZooKeeper stores its data.
In that directory, create a file named myid whose content is 1, which identifies this instance as zkServer 1.
zkServer-2:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=D:/solrcloud/zkServer-2/data
clientPort=2182
server.1=127.0.0.1:2881:2771
server.2=127.0.0.1:2882:2772
Remember to change dataDir, the directory where ZooKeeper stores its data.
In that directory, create a file named myid whose content is 2, which identifies this instance as zkServer 2.
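The two zoo.cfg files differ only in dataDir and clientPort, so they can be generated instead of edited by hand. The following is a minimal sketch under stated assumptions: `zoo_cfg` and `write_server` are hypothetical helper names, not part of ZooKeeper, and the port arithmetic (2180 + id, 2880 + id, 2770 + id) is chosen only to reproduce the values listed above.

```python
from pathlib import Path


def zoo_cfg(server_id, base_dir="D:/solrcloud", count=2):
    """Return the zoo.cfg text for one server of a local pseudo-distributed ensemble."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        # Each server needs its own data directory and client port.
        f"dataDir={base_dir}/zkServer-{server_id}/data",
        f"clientPort={2180 + server_id}",
    ]
    # The server.N list is identical in every zoo.cfg of the ensemble.
    for n in range(1, count + 1):
        lines.append(f"server.{n}=127.0.0.1:{2880 + n}:{2770 + n}")
    return "\n".join(lines) + "\n"


def write_server(server_id, base_dir="D:/solrcloud", count=2):
    """Write conf/zoo.cfg and data/myid for one server under base_dir."""
    root = Path(base_dir) / f"zkServer-{server_id}"
    (root / "conf").mkdir(parents=True, exist_ok=True)
    (root / "data").mkdir(parents=True, exist_ok=True)
    (root / "conf" / "zoo.cfg").write_text(zoo_cfg(server_id, base_dir, count))
    # myid holds only the server number and must match the N in server.N.
    (root / "data" / "myid").write_text(str(server_id))
```

Calling `write_server(1)` and `write_server(2)` would produce the two configurations shown above; the ZooKeeper binaries still have to be copied into each directory separately.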
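1.3 Start the ZooKeeper services:
Go to zkServer-1\bin and double-click zkServer.cmd to start the zk1 service.
The connection errors about server 2 can be ignored; they disappear once server 2 is started.
Start zkServer2 the same way.
Make sure the DOS window is not in Quick Edit mode, otherwise the service pauses until you press Enter.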
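To confirm that both processes are listening without reading the console output, one option is ZooKeeper's `ruok` four-letter command, which a running server answers with `imok`. This is a minimal sketch, assuming the client ports 2181 and 2182 configured above; `zk_ruok` is a hypothetical helper name. A positive answer only shows that the process is up, not that the ensemble has elected a leader.

```python
import socket


def zk_ruok(host, port, timeout=2.0):
    """Send the 'ruok' four-letter command; return True if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused or timed out: nothing is answering on this port.
        return False
```

For example, `zk_ruok("127.0.0.1", 2181)` and `zk_ruok("127.0.0.1", 2182)` can be called after starting each server.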
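2. Building and running ZooKeeper in Eclipse
Download the zookeeper 3.4.6 package. It includes the source code, so it can be built and developed in Eclipse directly.
2.1 Create the Eclipse project and compile the code
In a DOS window, from the directory where the package was extracted, run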
ant eclipse
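As long as the network is working, this step runs without trouble, and it does not download many packages.
On success, the Eclipse .project and .classpath files are generated in the same directory; import the project into Eclipse.
ZooKeeper also contains non-Java code, which is not built here.
The generated project targets JDK 1.8. If only 1.7 is installed, change the project's compiler compliance level to 1.7 to clear the compile errors.
2.2 Running the ZooKeeper server in Eclipse
The content of zkServer.cmd is: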
setlocal
call "%~dp0zkEnv.cmd"
set ZOOMAIN=org.apache.zookeeper.server.quorum.QuorumPeerMain
echo on
java "-Dzookeeper.log.dir=%ZOO_LOG_DIR%" "-Dzookeeper.root.logger=%ZOO_LOG4J_PROP%" -cp "%CLASSPATH%" %ZOOMAIN% "%ZOOCFG%" %*
endlocal
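Configure the Eclipse launch:
- Set the launch command:
Run -> Run Configurations opens the settings dialog.
Right-click Java Application -> New and fill in the launch settings; the main class is the ZOOMAIN value from zkServer.cmd, org.apache.zookeeper.server.quorum.QuorumPeerMain.
- Set the launch arguments: the program argument is the zoo.cfg path that zkServer.cmd passes as %ZOOCFG%, here D:\solrcloud\zkServer-1\conf\zoo.cfg.
The first run prints log4j warnings: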
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.quorum.QuorumPeerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
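The log4j configuration was not picked up. Put conf\log4j.properties on the classpath, for example by adding the conf directory as a source folder of the project. After rebuilding, a run prints: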
2015-04-14 11:25:33,497 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: D:\solrcloud\zkServer-1\conf\zoo.cfg
2015-04-14 11:25:33,512 [myid:] - WARN [main:QuorumPeerConfig@293] - No server failure will be tolerated. You need at least 3 servers.
2015-04-14 11:25:33,512 [myid:] - INFO [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2015-04-14 11:25:33,528 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2015-04-14 11:25:33,528 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2015-04-14 11:25:33,528 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2015-04-14 11:25:33,637 [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2015-04-14 11:25:33,684 [myid:1] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2015-04-14 11:25:33,700 [myid:1] - INFO [main:QuorumPeer@959] - tickTime set to 2000
2015-04-14 11:25:33,700 [myid:1] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2015-04-14 11:25:33,700 [myid:1] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2015-04-14 11:25:33,700 [myid:1] - INFO [main:QuorumPeer@1005] - initLimit set to 10
2015-04-14 11:25:33,715 [myid:1] - INFO [main:FileSnap@83] - Reading snapshot D:\solrcloud\zkServer-1\data\version-2\snapshot.0
2015-04-14 11:25:33,731 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: /127.0.0.1:2771
2015-04-14 11:25:33,746 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2015-04-14 11:25:33,746 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x0
2015-04-14 11:25:33,765 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2015-04-14 11:25:34,786 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address /127.0.0.1:2772
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Unknown Source)
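Once zkServer2 is started from cmd, the exception no longer appears. ZooKeeper now builds and runs in Eclipse, and you can set breakpoints to debug it whenever needed.
3. Deploying SolrCloud
3.1 Download and start Solr 5.0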
solr.cmd start -c -z "127.0.0.1:2181,127.0.0.1:2182"
After a successful start, the following is displayed:
Starting Solr on port 8983 from E:\bigdata\solr-5.0.0\server
Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI
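3.2 Solr 5.0 admin UI
Main directories of the installation:
Path | Description |
solr-5.0.0\server\webapps\solr.war | The war that Jetty runs; it can also be deployed under Tomcat |
solr-5.0.0\server\solr-webapp\webapp | The webapp extracted from the war |
solr-5.0.0\server\solr | Default location of the solr.xml configuration file; the path can be overridden with a parameter |
solr-5.0.0\server\logs | Runtime logs; check here if startup fails |
solr-5.0.0\example\example-DIH | A complete sample Solr configuration |
solr-5.0.0\contrib | Third-party contributions; the commonly used one is DataImportHandler |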
3.3 Create a collection
The distribution ships with a sample configuration:
example/example-DIH/solr/solr/conf
Important parameters of solr create_collection:
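-c : name of the collection
-d : path to the configuration directory; the sample configuration above can be used
-n : name of the configuration, which may differ from the collection name; if omitted, the collection name is used as the configuration name
-shards : number of shards to create; matching the number of cluster nodes is recommended
-replicationFactor : number of replicas per shard; for cluster stability, at least 2 and at most (number of cluster nodes / number of shards) * 2 is recommended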
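The bin\solr script turns these options into a Collections API request, which it prints when it runs. The sketch below rebuilds that request URL from the same inputs. `create_collection_url` is a hypothetical helper, and the way `maxShardsPerNode` is derived here (total cores divided by node count, rounded up) is an assumption that happens to match a single-node setup with 2 shards and 2 replicas; it is not taken from the Solr source.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor,
                          node_count=1, config_name=None):
    """Build a Collections API CREATE request URL."""
    total_cores = num_shards * replication_factor
    # Assumption: spread all cores evenly over the nodes, rounding up.
    max_shards_per_node = -(-total_cores // node_count)
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        # Like -n: the config name defaults to the collection name.
        "collection.configName": config_name or name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"
```

With `create_collection_url("http://192.168.56.1:8983/solr", "traffic", 2, 2)` the result can be compared against the URL in the command output of the traffic example in this section.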
E:\bigdata\solr-5.0.0\bin>solr create_collection -c traffic -d E:/bigdata/solr-5.0.0/example/example-DIH/solr/solr/conf/ -shards 2 -replicationFactor 2
Connecting to ZooKeeper at joepc:2181,joepc:2182
Uploading E:\bigdata\solr-5.0.0\example\example-DIH\solr\solr\conf for config traffic to ZooKeeper at joepc:2181,joepc:2182
Creating new collection 'traffic' using command:
http://192.168.56.1:8983/solr/admin/collections?action=CREATE&name=traffic&numShards=2&replicationFactor=2&maxShardsPerNode=4&collection.configName=traffic
{
"responseHeader":{
"status":0,
"QTime":10728},
"success":{"":{
"responseHeader":{
"status":0,
"QTime":10276},
"core":"traffic_shard2_replica2"}}}
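The Solr admin UI shows that the traffic collection has been created in the ZooKeeper data and that the configuration files have been uploaded to ZooKeeper. If you later change schema.xml or solrconfig.xml, upload the configuration again.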
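One way to re-upload is the zkcli script that ships with Solr, using its `upconfig` command, followed by a Collections API `RELOAD` so the running cores pick up the change. The sketch below only assembles the command line and the reload URL. The helper names are hypothetical, and the script location (server\scripts\cloud-scripts\zkcli.bat) is an assumption to verify against your installation.

```python
from urllib.parse import urlencode


def upconfig_command(zkcli, zk_hosts, conf_dir, conf_name):
    """Argument list for uploading a config directory to ZooKeeper with zkcli."""
    return [zkcli, "-zkhost", zk_hosts, "-cmd", "upconfig",
            "-confdir", conf_dir, "-confname", conf_name]


def reload_url(base_url, collection):
    """Collections API URL that reloads a collection after its config changed."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url}/admin/collections?{query}"
```

The list returned by `upconfig_command` can be passed to `subprocess.run`, and the reload URL opened in a browser or fetched with any HTTP client.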
4. Building and running SolrCloud from source
4.1 Building the Solr 5 source:
Download solr-5.0.0-src.tgz and run ant to list the targets:
Main targets:
check-svn-working-copy Checks the status of the SVN working copy
clean Clean Lucene and Solr build dirs
clean-eclipse Removes all Eclipse configuration files
clean-idea Removes all IntelliJ IDEA configuration files
clean-jars Remove all JAR files from lib folders in the checkout
clean-maven-build Clean up Maven POMs in working copy
clean-netbeans Removes all Netbeans configuration files
compile Compile Lucene and Solr
compile-test Compile Lucene and Solr tests and test-frameworks
documentation Generate Lucene and Solr Documentation
documentation-lint Validates the generated documentation (HTML errors, broken links,...)
eclipse Setup Eclipse configuration
generate-maven-artifacts Generate Maven Artifacts for Lucene and Solr
get-maven-poms Copy Maven POMs from dev-tools/maven/ to maven-build/
idea Setup IntelliJ IDEA configuration
ivy-bootstrap Download and install Ivy in the users ant lib dir
jar Build Lucene and Solr Jar files
jar-checksums Recompute SHA1 checksums for all JAR files.
jar-src Build Lucene and Solr Source Jar files
netbeans Setup Netbeans configuration
nightly-smoke Builds an unsigned release and smoke tests it (pass '-DsmokeTestRelease.java8=/path/to/jdk1.8.0' to additionally test
pitest Run PITest on both Lucene and Solr
precommit Run basic checks before committing
rat-sources Runs rat across all sources and tests
regenerate Runs all code regenerators
remove-maven-artifacts Removes all Lucene/Solr Maven artifacts from the local repository
resolve Resolves all dependencies
run-clover Runs all tests to measure coverage and generates report (pass "ANT_OPTS=-Xmx1536M" as environment)
run-maven-build Runs the Maven build using automatically generated POMs
test Test both Lucene and Solr
test-help Test runner help
test-with-heapdumps Runs tests with heap dumps on OOM enabled (if VM supports this)
validate Validate dependencies, licenses, etc.
validate-maven-dependencies Validates maven dependencies, licenses, etc.
Default target: -projecthelp
The targets used here are ant eclipse, ant jar, ant compile, and so on.
Because of network conditions, point Ivy at a mirror in China and at the local Maven repository (convenient for developers who already use Maven regularly):
solr-5.0.0\lucene\ivy-settings.xml
<ivysettings>
  <settings defaultResolver="default"/>
  <!-- changed: location of the local Maven repository -->
  <property name="local-maven2-dir" value="E:\m2-repo\" />
  <properties file="${ivy.settings.dir}/ivy-versions.properties" override="false"/>
  <include url="${ivy.default.settings.dir}/ivysettings-public.xml"/>
  <include url="${ivy.default.settings.dir}/ivysettings-shared.xml"/>
  <include url="${ivy.default.settings.dir}/ivysettings-local.xml"/>
  <include url="${ivy.default.settings.dir}/ivysettings-main-chain.xml"/>
  <caches lockStrategy="artifact-lock" resolutionCacheDir="${common.build.dir}/ivy-resolution-cache" />
  <resolvers>
    <!-- changed: root points at a mirror in China -->
    <ibiblio name="sonatype-releases" root="http://maven.oschina.net/content/groups/public/" m2compatible="true" />
    <ibiblio name="maven.restlet.org" root="http://maven.restlet.org" m2compatible="true" />
    <ibiblio name="releases.cloudera.com" root="http://repository.cloudera.com/content/repositories/releases" m2compatible="true" />
    <!-- needed only for newer svnkit releases, e.g. 1.8.x -->
    <ibiblio name="svnkit-releases" root="http://maven.tmatesoft.com/content/repositories/releases" m2compatible="true" />
    <!-- you might need to tweak this from china so it works -->
    <ibiblio name="working-chinese-mirror" root="http://uk.maven.org/maven2" m2compatible="true" />
    <!-- added: resolve from the local Maven repository -->
    <filesystem name="local-maven-2" m2compatible="true" local="true">
      <artifact
          pattern="${local-maven2-dir}/[organisation]/[module]/[revision]/[module]-[revision].[ext]" />
      <ivy
          pattern="${local-maven2-dir}/[organisation]/[module]/[revision]/[module]-[revision].pom" />
    </filesystem>
    <chain name="default" returnFirst="true" checkmodified="true" changingPattern=".*SNAPSHOT">
      <resolver ref="local"/>
      <!-- added: consult the local Maven repository before remote ones -->
      <resolver ref="local-maven-2" />
      <resolver ref="main"/>
      <resolver ref="maven.restlet.org" />
      <resolver ref="sonatype-releases" />
      <resolver ref="releases.cloudera.com"/>
      <!-- <resolver ref="svnkit-releases" /> -->
      <resolver ref="working-chinese-mirror" />
    </chain>
  </resolvers>
</ivysettings>
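The places that need changing are the local-maven2-dir property, the root of sonatype-releases, the local-maven-2 filesystem resolver, and its entry in the default chain. Together they point Ivy at the local Maven repository and at a mirror in China. http://maven.restlet.org and http://repository.cloudera.com/content/repositories/releases tend to time out; you can download the artifacts from them by hand and place them in the local Ivy cache (default: C:\Users\Administrator\.ivy2\cache), in the path the error message indicates.
Run ant eclipse
After it succeeds, import the project into Eclipse.
src contains all the source and test code. After a long wait for Eclipse's automatic build, everything should compile except the kitesdk-related tests under contrib\morphlines-core\src\test, which fail because of an API change.
In real development you do not need to keep this much source. Keep only the core code you need to customize and debug, and use the packaged jars for the rest.
4.2 Building and packaging the Solr source:
The Solr build needs Python, Perl, and Subversion, so first install Python, ActivePerl, and TortoiseSVN and add their bin directories to the global PATH.
Run the packaging command: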
ant jar
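The first run fails because the kitesdk tests under contrib\morphlines-core\src\test do not compile after an API change. Delete the Java classes under test and run again; everything then passes, and the jars appear under the build directory of solr.
Packaging the war:
In the solr directory, run ant server to package the server code.
On success, solr.war can be found in the corresponding directory; see the directory table earlier in this article.
All ant targets under solr: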
usage:
[echo] Welcome to the Solr project!
[echo] Use 'ant server' to create the Solr server.
[echo] Use 'bin/solr' to run the Solr after it is created.
[echo] And for developers:
[echo] Use 'ant clean' to clean compiled files.
[echo] Use 'ant compile' to compile the source code.
[echo] Use 'ant dist' to build the project JAR files.
[echo] Use 'ant documentation' to build documentation.
[echo] Use 'ant generate-maven-artifacts' to generate maven artifacts.
[echo] Use 'ant package' to generate zip, tgz for distribution.
[echo] Use 'ant test' to run unit tests.
BUILD SUCCESSFUL
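At this point all compiling and packaging is done, and the Solr service can be run as described earlier.
4.3 Running and debugging Solr 5 in Eclipse
1. Enable the Dynamic Web Module facet.
2. Set the web content path (in .settings\org.eclipse.wst.common.component):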
<wb-resource deploy-path="/solr" source-path="/solr/webapp/web" tag="defaultRootSource"/>
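Edit solr-5.0.0\.settings\.jsdtscope and change <classpathentry kind="src" path="WebContent"/>
to <classpathentry kind="src" path="solr/webapp/web"/>
Close the project and reopen it for the change to take effect.
3. Refresh the project.
4. Set the output location for the classes compiled from source.
5. Turn /solr/webapp/web into a complete, runnable webapp.
6. Add log4j, slf4j, and the other logging jars.
7. Errors that remain after compiling:
- 1. package.html error: the line <!doctype html public "-//w3c//dtd html 4.0 transitional//en"> appears twice; delete one copy.
- 2. JettySolrRunner compile error: because the goal is to run under Tomcat, the Jetty jars were not added. Either add the Jetty jars or exclude this class from the build; it is not needed under Tomcat.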
For reference, the Solr 5 instance started with solr.cmd runs with these JVM arguments:
-DSTOP.KEY=solrrocks
-DSTOP.PORT=7983
-Djava.io.tmpdir=E:\bigdata\solr-5.0.0\server\tmp
-Djava.net.preferIPv4Stack=true
-Djetty.home=E:\bigdata\solr-5.0.0\server
-Djetty.port=8983
-Dlog4j.configuration=file:E:\bigdata\solr-5.0.0\server\resources\log4j.properties
-Dsolr.install.dir=E:\bigdata\solr-5.0.0
-Dsolr.solr.home=E:\bigdata\solr-5.0.0\server\solr
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=127.0.0.1:2181,127.0.0.1:2182
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:-UseSuperWord
-XX:CMSFullGCsBeforeCompaction=1
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:CMSTriggerPermRatio=80
-XX:ConcGCThreads=4
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:E:\bigdata\solr-5.0.0\server\logs/solr_gc.log
-Xms512m
-Xmx512m
-Xss256k
-verbose:gc
Of these, keep only the ones that matter:
-Xms512m
-Xmx512m
-Dsolr.solr.home=E:\bigdata\solr-5\solr\server\solr
-DzkHost=127.0.0.1:2181,127.0.0.1:2182
Add them to Tomcat's JVM arguments. With the Sysdeo Tomcat plugin, set them in the plugin's JVM settings.