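A full-text index can conveniently index information stored on the Internet. The database only needs to store the URLs of the documents to be indexed.
Since the topic at hand is the DATASTORE attribute, this example indexes only HTML documents; documents that require the FILTER attribute will be discussed later.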
SQL> show user;
USER is "MYUSER"
SQL> CREATE TABLE T (ID NUMBER, DOCS VARCHAR2(1000));
Table created.
SQL> INSERT INTO T VALUES (1, 'http://yangtingkun.itpub.net/');
1 row created.
SQL> INSERT INTO T VALUES (2, 'http://www.itpub.net/');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> exec CTX_DDL.CREATE_PREFERENCE('TEST_URL', 'URL_DATASTORE');
PL/SQL procedure successfully completed.
SQL> exec ctx_ddl.set_attribute('TEST_URL','HTTP_PROXY','172.17.61.29:8002');
PL/SQL procedure successfully completed.
SQL> exec ctx_ddl.set_attribute('TEST_URL','Timeout','300');
PL/SQL procedure successfully completed.
SQL> CREATE INDEX IND_T_DOCS ON T (DOCS) INDEXTYPE IS CTXSYS.CONTEXT
2 PARAMETERS ('DATASTORE TEST_URL');
Index created.
SQL> SELECT * FROM T WHERE CONTAINS(DOCS, 'ORACLE') > 0;
no rows selected
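Because our company requires a proxy for Internet access, I found that specifying the proxy server's IP address and port did not work; the proxy must be specified by its host name and port.
If a proxy is required to reach the Internet, set the HTTP_PROXY and related attributes of URL_DATASTORE. URL_DATASTORE also has a number of other attributes: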
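As a rough sketch of how one might diagnose and correct this, the statements below first look at the indexing errors recorded for the index and then rebuild the preference with a host name instead of an IP address. The host name proxy.example.com is a placeholder assumption, not the real proxy; the port and timeout are carried over from the session above.

```sql
-- Show rows that failed during indexing (for example, URLs that could not be fetched).
SELECT ERR_TIMESTAMP, ERR_TEXTKEY, ERR_TEXT
  FROM CTX_USER_INDEX_ERRORS
 WHERE ERR_INDEX_NAME = 'IND_T_DOCS'
 ORDER BY ERR_TIMESTAMP;

-- Drop the index and the preference, then recreate both with a proxy host name.
-- 'proxy.example.com' is a hypothetical name; substitute the real proxy host.
DROP INDEX IND_T_DOCS;
EXEC CTX_DDL.DROP_PREFERENCE('TEST_URL');
EXEC CTX_DDL.CREATE_PREFERENCE('TEST_URL', 'URL_DATASTORE');
EXEC CTX_DDL.SET_ATTRIBUTE('TEST_URL', 'HTTP_PROXY', 'proxy.example.com:8002');
EXEC CTX_DDL.SET_ATTRIBUTE('TEST_URL', 'TIMEOUT', '300');

CREATE INDEX IND_T_DOCS ON T (DOCS) INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE TEST_URL');
```

Dropping and recreating is the bluntest option; it is used here only because it leaves no doubt about which preference values the index was built with.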
Attribute     Attribute Values
timeout       Specify the timeout in seconds. The valid range is 15 to 3600 seconds. The default is 30.
maxthreads    Specify the maximum number of threads that can be running simultaneously. Use a number between 1 and 1024. The default is 8.
urlsize       Specify the maximum length of the URL string in bytes. Use a number between 32 and 65535. The default is 256.
maxurls       Specify the maximum size of the URL buffer. Use a number between 32 and 65535. The default is 256.
maxdocsize    Specify the maximum document size. Use a number between 256 and 2,147,483,647 bytes (2 gigabytes). The default is 2,000,000.
http_proxy    Specify the host name of the HTTP proxy server. Optionally specify a port number with a colon, in the form hostname:port.
ftp_proxy     Specify the host name of the FTP proxy server. Optionally specify a port number with a colon, in the form hostname:port.
no_proxy      Specify the domain for which no proxy server is used. Use a comma-separated string of up to 16 domain names.
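To illustrate how several of these attributes might be combined, here is a minimal sketch that creates a separate preference and sets a few values inside the documented ranges. The preference name TEST_URL_TUNED, the host proxy.example.com, the domain intranet.example.com, and all numeric values are assumptions chosen for illustration only.

```sql
BEGIN
  -- Hypothetical preference name; any unused name works.
  CTX_DDL.CREATE_PREFERENCE('TEST_URL_TUNED', 'URL_DATASTORE');

  -- Wait up to 60 seconds per URL (valid range: 15 to 3600).
  CTX_DDL.SET_ATTRIBUTE('TEST_URL_TUNED', 'TIMEOUT', '60');

  -- Fetch with at most 4 concurrent threads (valid range: 1 to 1024).
  CTX_DDL.SET_ATTRIBUTE('TEST_URL_TUNED', 'MAXTHREADS', '4');

  -- Allow documents up to 5,000,000 bytes (default is 2,000,000).
  CTX_DDL.SET_ATTRIBUTE('TEST_URL_TUNED', 'MAXDOCSIZE', '5000000');

  -- Proxy given as hostname:port; the host name is a placeholder.
  CTX_DDL.SET_ATTRIBUTE('TEST_URL_TUNED', 'HTTP_PROXY', 'proxy.example.com:8002');

  -- Hosts in this (placeholder) domain are fetched directly, bypassing the proxy.
  CTX_DDL.SET_ATTRIBUTE('TEST_URL_TUNED', 'NO_PROXY', 'intranet.example.com');
END;
/
```

The preference only takes effect for an index that names it in its PARAMETERS clause, for example PARAMETERS ('DATASTORE TEST_URL_TUNED').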
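Finally, note that Oracle stores only the URLs of the indexed documents. If a document itself changes, Oracle has no way of knowing and cannot synchronize the index on its own. In that case you must modify the indexed column, that is, the URL column, to notify Oracle that the indexed data has changed.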
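One common way to do this, sketched here under the assumption that the index uses the default manual synchronization, is to update the URL column to its own value and then synchronize the index. The choice of ID = 1 is only an example row.

```sql
-- Re-assign the URL to itself; the update on the indexed column
-- marks the row as pending for the Text index even though the value is unchanged.
UPDATE T SET DOCS = DOCS WHERE ID = 1;
COMMIT;

-- Optional: list the rows waiting to be re-indexed.
SELECT PND_INDEX_NAME, PND_ROWID, PND_TIMESTAMP
  FROM CTX_USER_PENDING;

-- Process the pending rows; the URL is fetched and indexed again here.
EXEC CTX_DDL.SYNC_INDEX('IND_T_DOCS');
```

Because the document is fetched only when the index is synchronized, this has to be repeated whenever the remote page is known to have changed.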
Source: ITPUB blog, http://blog.itpub.net/271283/viewspace-1022086/. If you reprint this article, please credit the source; otherwise legal action may be taken.