A Quick Guide to Running Your First Crawl Job

After you install Heritrix and log in to the Web User Interface (WUI), the Main Console page is displayed.

Enter the name of the new job in the text box labeled "Create new job with recommended starting configuration." Then click "create."
The new job will be displayed in the list of jobs on the Main Console page. In Heritrix 3.0, the job is based on the profile-defaults profile. As of Heritrix 3.1, the profile-defaults profile has been eliminated. See Profiles for more information.

Click on the name of the new job and you will be taken to the job page.

The name of the configuration file, crawler-beans.cxml, will be displayed at the top of the page. Next to the name is an "edit" link.
Click on the "edit" link and the contents of the configuration file will be displayed in an editable text area.
At this point you must enter several properties to make the job runnable.
First, add a valid value to the metadata.operatorContactUrl property, such as http://www.archive.org.
Next, populate the <prop> element of the longerOverrides bean with the seed URLs for the crawl. A test seed is already configured for reference. When done, click "save changes" at the top of the page. For more detailed information on configuring jobs, see Configuring Jobs and Profiles.
From the job screen, click "build." This command will build the Spring infrastructure needed to run the job. In the Job Log the following message will display: "INFO JOB instantiated."
Next, click the "launch" button. This command launches the job in "paused" mode. At this point the job is ready to run.
To run the job, click the "unpause" button. The job will now begin sending requests to the seeds of your crawl. The status of the job will be set to "Running." Refresh the page to see updated statistics.
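As a rough illustration of the two configuration edits described in the steps above, the relevant part of crawler-beans.cxml might look something like the fragment below. This is a sketch under assumptions, not a copy of the file shipped with any particular release: the simpleOverrides bean name, the seeds.textSource.value key, and the exact layout are recalled from a typical Heritrix 3 default configuration and may differ in your version, and the seed URL is a placeholder. Only metadata.operatorContactUrl, the longerOverrides bean, and its <prop> element are named in this guide.

```xml
<!-- Assumed layout: simple key=value overrides live in a bean like this one.
     The bean id "simpleOverrides" is an assumption, not taken from this guide. -->
<bean id="simpleOverrides"
      class="org.springframework.beans.factory.config.PropertyOverrideConfigurer">
  <property name="properties">
    <value>
# Replace the placeholder with a URL where webmasters can reach the crawl operator.
metadata.operatorContactUrl=http://www.archive.org
    </value>
  </property>
</bean>

<!-- The longerOverrides bean holds multi-line values such as the seed list.
     The key "seeds.textSource.value" is assumed from the default configuration. -->
<bean id="longerOverrides"
      class="org.springframework.beans.factory.config.PropertyOverrideConfigurer">
  <property name="properties">
    <props>
      <prop key="seeds.textSource.value">
# One seed URL per line; this one is a placeholder, not a real crawl target.
http://example.example/example
      </prop>
    </props>
  </property>
</bean>
```

If your file is laid out differently, search it for operatorContactUrl and longerOverrides and edit the values in place rather than pasting this fragment.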
Note
A job is not modified when the profile or job it was based on is later changed.
Jobs based on the default profile are not ready to run as-is. The metadata.operatorContactUrl must be set to a valid value.