
Errors That May Occur When Running Nutch 1.2, and How to Fix Them

Date: 2017/2/28 15:49:33   Editor: Linux教程

Error 1: caused by the Linux limit on the maximum number of open files

Error message:
java.io.IOException: background merge hit exception: _0:C500->_0 _1:C500->_0 _2:C500->_..... [optimize]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2310)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2249)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2219)
at org.apache.nutch.indexer.lucene.LuceneWriter.close(LuceneWriter.java:237)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.FileNotFoundException: /var/lib/crawlzilla/nutch-crawler/mapred/local/index/_682243155/_6a.frq (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:76)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:87)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:67)
at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:129)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4239)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3917)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:231)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:288)

Cause and fix:
This exception can occur when a Java program runs on Unix/Linux and performs a large number of file operations. Unix/Linux limits how many file handles a process may hold open at once; running ulimit -n shows the current limit, which defaults to 1024. When the program has close to (or more than) 1024 files open concurrently while also reading and writing them frequently, the exception above is thrown. The fix is to raise the open-file limit to match what the workload actually needs.
Command:
ulimit -n 32768
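As a rough illustration, the commands below show how to inspect and raise the limit; the value 32768 and the limits.conf entries are example settings, not required ones:

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n

# Raise the limit for this session (the new value cannot exceed the
# hard limit, shown by: ulimit -Hn); 32768 is only an example value
ulimit -n 32768

# Confirm the limit now in effect
ulimit -n

# To make the change permanent, add entries like these to
# /etc/security/limits.conf (requires root; exact syntax can vary
# by distribution):
#   *  soft  nofile  32768
#   *  hard  nofile  32768
```

Note that a per-session ulimit only affects processes started from that shell; for a Nutch/Hadoop daemon started at boot, the limit must be raised in the service's environment or in limits.conf.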


Error 2: insufficient disk space

Error message:
Error: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:84)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:218)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:157)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2454)

Cause and fix:
When the disk runs out of space, Nutch waits for free space to become available. Free up space on the affected node, or, if you are running a distributed deployment, add another worker node to the cluster.
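Before adding nodes, it is worth confirming which filesystem is actually full. A quick check might look like this; the path /var/lib matches the Crawlzilla layout from the first stack trace and may differ on your system:

```shell
# Report free space on the filesystem holding the Hadoop/Nutch
# local directories (path is an example; adjust to your install)
df -h /var/lib

# List the largest entries under it to see what could be cleaned up
du -sh /var/lib/* 2>/dev/null | sort -rh | head -n 10
```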

