
Project 1-1: Configuring and Running Heritrix on Ubuntu

Date: 2017/2/28 16:27:19   Editor: Linux教程

1. Since I decided to run Heritrix from source on Ubuntu, the first step is to install Eclipse:

$ sudo apt-get install eclipse

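2. Download the Heritrix source archive, heritrix-1.14.4-src.tar.gz.

3. Extract it (tar unpacks into the current directory, so change to the workspace directory first):

$ tar xzvf /home/xxx/workspace/heritrix-1.14.4-src.tar.gz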

4. Rename the folder to heritrix (optional; it just makes the import into Eclipse easier):

$ mv heritrix-1.14.4 heritrix
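Steps 3 and 4 can be combined into one small helper. The following is only a sketch: the function names `src_dir_name` and `extract_heritrix` are hypothetical, and it assumes the archive unpacks into a folder named after the tarball without the `-src` suffix, as heritrix-1.14.4-src.tar.gz does above.

```shell
# Hypothetical helper: derive the unpacked folder name from the tarball name,
# e.g. heritrix-1.14.4-src.tar.gz -> heritrix-1.14.4 (assumed naming scheme).
src_dir_name() {
    base=$(basename "$1")
    base=${base%.tar.gz}
    printf '%s\n' "${base%-src}"
}

# Hypothetical helper: unpack the tarball into the workspace and rename the
# resulting folder to "heritrix" so it matches the Eclipse project name.
extract_heritrix() {
    tarball=$1
    workspace=$2
    dir=$(src_dir_name "$tarball")
    # Refuse to overwrite an existing project folder.
    if [ -e "$workspace/heritrix" ]; then
        echo "$workspace/heritrix already exists" >&2
        return 1
    fi
    tar xzf "$tarball" -C "$workspace" && mv "$workspace/$dir" "$workspace/heritrix"
}

# Usage (paths are placeholders, as in the steps above):
# extract_heritrix /home/xxx/workspace/heritrix-1.14.4-src.tar.gz /home/xxx/workspace
```

Passing `-C` to tar makes the result independent of the current directory, which avoids unpacking into the wrong place by accident.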

5. Open Eclipse and choose New->Java Project->Create project from existing source.
Make sure the Project Name you enter matches the name of the project folder.

6. Open Run Configurations and set the Main class to

org.archive.crawler.Heritrix
Add the VM arguments:

-Dheritrix.development -Xmx512m

On the Classpath tab, use Advanced->Add External Folder to add src/conf and src/resources to the classpath. Otherwise the following two errors are reported:

Exception in thread "main" java.io.IOException: Failed to load properties file from filesystem or from classpath.
at org.archive.crawler.Heritrix.getPropertiesInputStream(Heritrix.java:924)
at org.archive.crawler.Heritrix.loadProperties(Heritrix.java:863)
at org.archive.crawler.Heritrix.containerInitialization(Heritrix.java:492)
at org.archive.crawler.Heritrix.main(Heritrix.java:555)

2010-07-10 10:03:01.250 嚴重 thread-10 org.archive.util.ArchiveUtils.<clinit>() TLD list unavailable
java.lang.NullPointerException
at java.io.Reader.<init>(Unknown Source)
at java.io.InputStreamReader.<init>(Unknown Source)
at org.archive.util.ArchiveUtils.<clinit>(ArchiveUtils.java:759)
........
Worse still, the selections on Heritrix's Modules page can no longer be changed.
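For reference, the run configuration from step 6 corresponds roughly to a plain `java` command line. The sketch below only builds the classpath string and checks that the two folders exist. It rests on assumptions not stated in the original: that Eclipse writes compiled classes to `bin/`, that the dependency jars are in `lib/` under the project folder, and the function name `heritrix_classpath` is hypothetical.

```shell
# Hypothetical helper: print a classpath for running Heritrix from source.
# Assumptions: compiled classes are in bin/ and dependency jars are in lib/.
heritrix_classpath() {
    project=$1
    # Without these two folders Heritrix cannot load its properties file
    # or the TLD list, which causes the two errors shown in step 6.
    for dir in src/conf src/resources; do
        if [ ! -d "$project/$dir" ]; then
            echo "missing $project/$dir" >&2
            return 1
        fi
    done
    # "lib/*" is left unexpanded so that java itself picks up every jar in lib/.
    printf '%s\n' "$project/bin:$project/src/conf:$project/src/resources:$project/lib/*"
}

# Usage (not run here; the project path is a placeholder):
# java -Dheritrix.development -Xmx512m \
#      -cp "$(heritrix_classpath /home/xxx/workspace/heritrix)" \
#      org.archive.crawler.Heritrix
```

If the function reports a missing folder, fix the classpath entries first; starting Heritrix without them leads to the failures described above.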


7. Other tasks, such as setting the username and password or adding and running jobs, are not covered in detail here.
