solr搭建企业搜索平台,配置文件

原文:http://blog.csdn.net/xiaozhengdong/article/details/7035914

本文的前提条件是,你已经完成了第一节,将solr搭建起来了。


solr版本solr 3.1

              solr有几个配置文件是最重要的。solr.xml,solrconfig.xml,schema.xml,db-data-config.xml

               如果你不使用solr自带的更新索引的功能,想用solrj编程来实现索引更新,那么我可以告诉你db-data-config.xml文件没什么用,一个一个来介绍下这几个配置文件吧。

    看了网上很多大神的blog,然后自己开发经验之后发现这些配置文件都是蛮好理解的。


    1.  solr.xml配置例子:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
< solr sharedLib="lib" persistent="true">
    <cores adminPath="/admin/cores">
        <core default="true" instanceDir="db" name="db"/>
        <core default="false" instanceDir="mail" name="mail"/>
        <core default="false" instanceDir="tika" name="tika"/>
    </cores>
< /solr>


告诉搜索引擎,db这个文件夹下的配置文件是可用的配置。


2.solrconfig.xml 配置例子

    这个配置文件的东西就多了。先配置最简单的一个也是必须配置的一个。以后祥谈。

  <dataDir>${solr.data.dir:./solr/db/data}</dataDir>


3.db-data-config.xml,即相当于索引对应的数据库是什么,数据库表是什么。

这个是依你的实际情况来看的。

<dataConfig> 
        <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://127.0.0.1:1434;DatabaseName=lovo.cn" user="sa" password="xiaozd1!"/> 
    <document name="goodsInfo"> 
            <entity name="solrdb" pk="id" query="select  GoodsId,GoodsName  
                                                                                                                                                                                                                                   
                                                                                                    from table
                                                                                                    ;"> 
                                                                                               
            <field column="GoodsId" name="id" /> 
            <field column="GoodsName" name="GoodsName" />
        </entity> 
    </document> 
< /dataConfig> 

4.schema.xml 配置文件

主要是告诉搜索引擎建索引的时候哪些字段需要分词,哪些字段是什么类型。使用什么分词器,等等

<fields>
   <!-- Valid attributes for fields:
     name: mandatory - the name for the field
     type: mandatory - the name of a previously defined type from the
       <types> section
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field should be retrievable
     multiValued: true if this field may contain multiple values per document
     omitNorms: (expert) set to true to omit the norms associated with
       this field (this disables length normalization and index-time
       boosting for the field, and saves some memory).  Only full-text
       fields or fields that need an index-time boost need norms.
     termVectors: [false] set to true to store the term vector for a
       given field.
       When using MoreLikeThis, fields used for similarity should be
       stored for best performance.
     termPositions: Store position information with the term vector. 
       This will increase storage costs.
     termOffsets: Store offset information with the term vector. This
       will increase storage costs.
     default: a value that should be used if no value is specified
       when adding a document.
   -->

   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
< !--   <field name="name" type="textgen" indexed="true" stored="true"/>-->
   <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
   <field name="manu" type="textgen" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="includes" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

   <field name="weight" type="float" indexed="true" stored="true"/>
<!--  <field name="price"  type="float" indexed="true" stored="true"/>-->
   <field name="popularity" type="int" indexed="true" stored="true" />
   <field name="inStock" type="boolean" indexed="true" stored="true" />

   <!--
   The following store examples are used to demonstrate the various ways one might _CHOOSE_ to
    implement spatial.  It is highly unlikely that you would ever have ALL of these fields defined.
    -->
   <field name="store" type="location" indexed="true" stored="true"/>

   <!-- Common metadata fields, named specifically to match up with
     SolrCell metadata when parsing rich documents such as Word, PDF.
     Some fields are multiValued only because Tika currently may return
     multiple values for them.
   -->
< !--   <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>-->
   <field name="subject" type="text" indexed="true" stored="true"/>
   <field name="description" type="text" indexed="true" stored="true"/>
   <field name="comments" type="text" indexed="true" stored="true"/>
   <field name="author" type="textgen" indexed="true" stored="true"/>
   <field name="keywords" type="textgen" indexed="true" stored="true"/>
< !--   <field name="category" type="textgen" indexed="true" stored="true"/>-->
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>


   <!-- catchall field, containing all other searchable text fields (implemented
        via copyField further on in this schema  -->
   <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
   
   <!-- catchall text field that indexes tokens both normally and in reverse for efficient
        leading wildcard queries. -->
   <field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/>

   <!-- non-tokenized version of manufacturer to make it easier to sort or group
        results by manufacturer.  copied from "manu" via copyField -->
   <field name="manu_exact" type="string" indexed="true" stored="false"/>

   <field name="payloads" type="payloads" indexed="true" stored="true"/>

   <!-- Uncommenting the following will create a "timestamp" field using
        a default value of "NOW" to indicate when each document was indexed.
     -->
   <!--
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
     -->
  

   <!-- Dynamic field definitions.  If a field name is not found, dynamicFields
        will be used if the name matches any of the patterns.
        RESTRICTION: the glob-like pattern in the name attribute must have
        a "*" only at the start or the end.
        EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
        Longer patterns will be matched first.  if equal size patterns
        both match, the first appearing in the schema will be used.  -->
   <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
   <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
   <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
   <dynamicField name="*_t"  type="text"    indexed="true"  stored="true"/>
   <dynamicField name="*_txt" type="text"    indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
   <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
   <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>

   <!-- Type used to index the lat and lon components for the "location" FieldType -->
   <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>

   <dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>
   <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>

   <!-- some trie-coded dynamic fields for faster range queries -->
   <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
   <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
   <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
   <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
   <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>

   <dynamicField name="*_pi"  type="pint"    indexed="true"  stored="true"/>

   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
   <dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>

   <dynamicField name="random_*" type="random" />

</fields>


后面专门再写一下solrconfig和schema.xml的详细配置。这里先只谈一下他们各自的作用。

 

solr3.1版本,solr3.x版本大部分应该一致。

  一个一个的配置项来谈谈schema.xml 配置:

以下是针对schema.xml 配置文件的剖析:


              1.  <types></types>这个标签和它的意义一样,是用来表示数据有哪些类型,这些类型当然是solr内部定义的类型和自定义类型。

               2.    <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->

                   和他上面解释一样,string类型是不分词的,要建索引,要存储

                 <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

              3.数值类型,有如下几个类型是默认数值类型,如果想用于排序请用   tint/tfloat/tlong/tdouble类型

       <!--
Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
    -->
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>


   <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>


        4.时间类型:如果想用于快速排序查询,用tdate(看到这里我的排序没用tdate,得改啊。。)

     Note: For faster range queries, consider the tdate type

    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>


    5.专门用于分词的字段。在里面包含了定义使用什么分词器,可以手工定制。

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">

         <!--建索引用到的分词器,我用的是IKTokenizerFactory,中文分词-->

        <analyzer type="index">
        <!--<tokenizer class="solr.WhitespaceTokenizerFactory"/>-->
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>

     <!-- <analyzer type="query">
       <!-- <tokenizer class="solr.WhitespaceTokenizerFactory"/>-->
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>

     <!--查询用到的分词器IKTokenizerFactory,为了支持中文查询,这是必须的。-->

      <analyzer type="query">
       <!-- <tokenizer class="solr.WhitespaceTokenizerFactory"/>-->
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
  </fieldType>


其他几个类别都是不常用的,也是通过分词器来定义不同的类别。和第五个类似。

6.索引字段名称定义。

   <fields>
   <!-- Valid attributes for fields:
     name: mandatory - the name for the field
     type: mandatory - the name of a previously defined type from the
       <types> section
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field should be retrievable
     multiValued: true if this field may contain multiple values per document
     omitNorms: (expert) set to true to omit the norms associated with
       this field (this disables length normalization and index-time
       boosting for the field, and saves some memory).  Only full-text
       fields or fields that need an index-time boost need norms.
     termVectors: [false] set to true to store the term vector for a
       given field.
       When using MoreLikeThis, fields used for similarity should be
       stored for best performance.
     termPositions: Store position information with the term vector. 
       This will increase storage costs.
     termOffsets: Store offset information with the term vector. This
       will increase storage costs.
     default: a value that should be used if no value is specified
       when adding a document.
   -->

   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
< !--   <field name="name" type="textgen" indexed="true" stored="true"/>-->
   <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
   <field name="manu" type="textgen" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="includes" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

   <field name="weight" type="float" indexed="true" stored="true"/>
<!--  <field name="price"  type="float" indexed="true" stored="true"/>-->
   <field name="popularity" type="int" indexed="true" stored="true" />
   <field name="inStock" type="boolean" indexed="true" stored="true" />
</fields>


  id:是索引字段的唯一标识。

termVectors="true"属性主要用于相关搜索。

multiValued="true"属性,一般用于多个字段组成一个字段的情况。

   <field name="content_type" type="text" indexed="true" stored="true" multiValued="true"/>

一般用于查询的字段定义为multiValued。

7.  <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>表示动态字段,暂时没用到。

 

在使用Python来安装geopandas包时,由于geopandas依赖于几个其他的Python库(如GDAL, Fiona, Pyproj, Shapely等),因此安装过程可能需要一些额外的步骤。以下是一个基本的安装指南,适用于大多数用户: 使用pip安装 确保Python和pip已安装: 首先,确保你的计算机上已安装了Python和pip。pip是Python的包管理工具,用于安装和管理Python包。 安装依赖库: 由于geopandas依赖于GDAL, Fiona, Pyproj, Shapely等库,你可能需要先安装这些库。通常,你可以通过pip直接安装这些库,但有时候可能需要从其他源下载预编译的二进制包(wheel文件),特别是GDAL和Fiona,因为它们可能包含一些系统级的依赖。 bash pip install GDAL Fiona Pyproj Shapely 注意:在某些系统上,直接使用pip安装GDAL和Fiona可能会遇到问题,因为它们需要编译一些C/C++代码。如果遇到问题,你可以考虑使用conda(一个Python包、依赖和环境管理器)来安装这些库,或者从Unofficial Windows Binaries for Python Extension Packages这样的网站下载预编译的wheel文件。 安装geopandas: 在安装了所有依赖库之后,你可以使用pip来安装geopandas。 bash pip install geopandas 使用conda安装 如果你正在使用conda作为你的Python包管理器,那么安装geopandas和它的依赖可能会更简单一些。 创建一个新的conda环境(可选,但推荐): bash conda create -n geoenv python=3.x anaconda conda activate geoenv 其中3.x是你希望使用的Python版本。 安装geopandas: 使用conda-forge频道来安装geopandas,因为它提供了许多地理空间相关的包。 bash conda install -c conda-forge geopandas 这条命令会自动安装geopandas及其所有依赖。 注意事项 如果你在安装过程中遇到任何问题,比如编译错误或依赖问题,请检查你的Python版本和pip/conda的版本是否是最新的,或者尝试在不同的环境中安装。 某些库(如GDAL)可能需要额外的系统级依赖,如地理空间库(如PROJ和GEOS)。这些依赖可能需要单独安装,具体取决于你的操作系统。 如果你在Windows上遇到问题,并且pip安装失败,尝试从Unofficial Windows Binaries for Python Extension Packages网站下载相应的wheel文件,并使用pip进行安装。 脚本示例 虽然你的问题主要是关于如何安装geopandas,但如果你想要一个Python脚本来重命名文件夹下的文件,在原始名字前面加上字符串"geopandas",以下是一个简单的示例: python import os # 指定文件夹路径 folder_path = 'path/to/your/folder' # 遍历文件夹中的文件 for filename in os.listdir(folder_path): # 构造原始文件路径 old_file_path = os.path.join(folder_path, filename) # 构造新文件名 new_filename = 'geopandas_' + filename # 构造新文件路径 new_file_path = os.path.join(folder_path, new_filename) # 重命名文件 os.rename(old_file_path, new_file_path) print(f'Renamed "{filename}" to "{new_filename}"') 请确保将'path/to/your/folder'替换为你想要重命名文件的实际文件夹路径。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值