Heritrix is relatively easy to automate at the command line using the cron program normally found on Unix and Linux systems. The crontab program is used to create a schedule of “cron jobs,” which are scheduled executions of one or more commands. The command to create or edit the schedule is crontab -e (with an ordinary hyphen), which opens the crontab file in the user’s default editor. Each line of the crontab file has the following syntax:
min hour dayofmonth month dayofweek(0-6) command
An asterisk matches every value of a field, so it is used when that field should not restrict the schedule. In the day-of-week field, 0 stands for Sunday. Multiple days or months can be written as a range with a hyphen if they are sequential (for example, 1-7 for the 1st through the 7th of a month) or as a comma-separated list if they are not (for example, 1,3,5 for the 1st, the 3rd, and the 5th). A crontab file to have Heritrix on fiat.ischool.utexas.edu run at midnight on January 1, June 1, and September 1 would read:
0 0 1 1,6,9 * /path/to/bin/heritrix --nowui ORDER_FILE
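To check a schedule before installing it, it can help to expand each field into the values it matches. The following Python sketch does that for the syntax described above. The function names expand_field and describe are hypothetical, chosen only for this illustration. The sketch assumes fields contain only asterisks, single numbers, hyphenated ranges, and comma-separated lists. It does not handle step values or month and day names, and it does not validate that numbers fall within the allowed range.

```python
def expand_field(field, low, high):
    """Expand one crontab field into the sorted list of values it matches.

    low and high are the smallest and largest legal values for the field,
    used only to expand an asterisk.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            # An asterisk matches every legal value of the field.
            values.update(range(low, high + 1))
        elif "-" in part:
            # A hyphen denotes an inclusive range, e.g. 1-7.
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def describe(line):
    """Split a crontab line into its five time fields and the command."""
    # maxsplit=5 keeps the command, including its arguments, in one piece.
    minute, hour, dom, month, dow, command = line.split(None, 5)
    return {
        "minute": expand_field(minute, 0, 59),
        "hour": expand_field(hour, 0, 23),
        "dayofmonth": expand_field(dom, 1, 31),
        "month": expand_field(month, 1, 12),
        "dayofweek": expand_field(dow, 0, 6),
        "command": command,
    }


schedule = describe("0 0 1 1,6,9 * /path/to/bin/heritrix --nowui ORDER_FILE")
print(schedule["month"])       # [1, 6, 9]
print(schedule["dayofmonth"])  # [1]
print(schedule["command"])     # /path/to/bin/heritrix --nowui ORDER_FILE
```

For the example line, the expansion shows that the job runs at minute 0 of hour 0 on the 1st of months 1, 6, and 9, on any day of the week.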