Planned

[2.0] Some PDF books are locking up scan

Joshua Moon 7 years ago updated 7 years ago 4

So first of all, 2.0 is a beautiful enhancement to an already awesome program. Thanks for your hard work.


With 2.0 set up, I went to scan my books and comics, but it stops scanning when it gets to some PDF books. These items do not show up in the log when this happens. The first time I resolved it by removing the book in question, but I don't want to have to skip what may be hundreds of books.


This is the only PDF-related error I could find (note, however, that this particular PDF imported fine):


20170415 08:14:32 [Scanner thread] WARN com.ubooquity.fileformat.pdf.b - Problem while reading file: F:\Books\Calibre\Douglas Adams\Dirk Gently's Holistic Detective Ag (1496)\Dirk Gently's Holistic Detectiv - Douglas Adams.pdf

java.lang.IndexOutOfBoundsException: Index: 2, Size: 0
at java.util.ArrayList.rangeCheck(Unknown Source) ~[na:1.8.0_73]
at java.util.ArrayList.get(Unknown Source) ~[na:1.8.0_73]
at org.apache.pdfbox.cos.COSArray.get(COSArray.java:210) ~[pdfbox-2.0.4.jar.7365617063950314879.tmp:2.0.4]
at org.apache.pdfbox.rendering.PageDrawer.getAnnotationBorder(PageDrawer.java:987) ~[pdfbox-2.0.4.jar.7365617063950314879.tmp:2.0.4]
at org.apache.pdfbox.rendering.PageDrawer.drawAnnotationLinkBorder(PageDrawer.java:1039) ~[pdfbox-2.0.4.jar.7365617063950314879.tmp:2.0.4]
at org.apache.pdfbox.rendering.PageDrawer.showAnnotation(PageDrawer.java:960) ~[pdfbox-2.0.4.jar.7365617063950314879.tmp:2.0.4]
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:193) ~[pdfbox-2.0.4.jar.7365617063950314879.tmp:2.0.4]
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145) ~[pdfbox-2.0.4.jar.7365617063950314879.tmp:2.0.4]
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94) ~[pdfbox-2.0.4.jar.7365617063950314879.tmp:2.0.4]
at com.ubooquity.fileformat.pdf.b.a(SourceFile:71) ~[Ubooquity.jar:2.0.0]
at com.ubooquity.fileformat.pdf.b.a(SourceFile:44) ~[Ubooquity.jar:2.0.0]
at com.ubooquity.data.feeder.a.b(SourceFile:358) [Ubooquity.jar:2.0.0]
at com.ubooquity.data.feeder.a.a(SourceFile:193) [Ubooquity.jar:2.0.0]
at com.ubooquity.data.feeder.a.a(SourceFile:293) [Ubooquity.jar:2.0.0]
at com.ubooquity.data.feeder.a.b(SourceFile:106) [Ubooquity.jar:2.0.0]
at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_73]


The last PDF it stopped on (before I removed the books and went to comics only) was Randall Munroe's XKCD volume 0.


As a side note, while I love the bookmarking, it would be wonderful to have a "recently read" or "currently reading" section of some sort so that I don't have to dig through my collection every time to resume a comic.


PDF is a never-ending source of pain. :D

The error you got comes from inside the PDF parsing library I use. Not much I can do about it unfortunately (without digging deep inside the lib, which I can't do).
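Independent of the library bug itself, one general way to keep a single bad file from stalling a whole scan is to run each per-file task with a time limit and skip the file if it fails or takes too long. The following is only a minimal sketch of that idea, not Ubooquity's actual code: the class name `GuardedScan` and the method `runWithTimeout` are hypothetical, and the task passed in stands for whatever rendering call the scanner makes.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Ubooquity): runs one per-file task with a time limit. */
final class GuardedScan {

    // Daemon worker threads, so a stuck task cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "guarded-scan");
        t.setDaemon(true);
        return t;
    });

    /**
     * Returns the task's result, or Optional.empty() if the task threw
     * (for example an IndexOutOfBoundsException) or ran past the limit.
     */
    static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis) {
        Future<T> future = POOL.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Best effort only: cancel(true) interrupts the worker, but code
            // that ignores interrupts will keep running in the background.
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The task itself failed; the caller can log it and skip the file.
            return Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the calling (scanner) thread.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        }
    }
}
```

A caller would wrap the cover rendering of each book in `runWithTimeout` and move on to the next file when it gets an empty result. Note the limitation stated in the comment: a render that never checks for interruption still occupies a background thread, so this keeps the scan moving but does not free the resources of a truly stuck render.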


I plan to allow using external PDF applications to parse the files (MuPDF and Poppler seem to be quite good at handling all kinds of PDF files). It should get much better then.
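To give an idea of what calling an external application could look like, here is a rough sketch that renders the first page of a PDF with Poppler's `pdftoppm` command-line tool. It is an illustration under assumptions, not the planned implementation: the class `ExternalCoverRenderer` and its methods are hypothetical, and `pdftoppm` is assumed to be installed and reachable on the PATH.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: renders page 1 of a PDF by calling Poppler's pdftoppm. */
final class ExternalCoverRenderer {

    /** Builds the command line; outputPrefix is the output file name without ".png". */
    static List<String> buildCommand(Path pdf, Path outputPrefix, int dpi) {
        return Arrays.asList(
                "pdftoppm",            // assumed to be installed and on the PATH
                "-png",                // write PNG output
                "-f", "1", "-l", "1",  // first page only
                "-singlefile",         // no page-number suffix on the output name
                "-r", Integer.toString(dpi),
                pdf.toString(),
                outputPrefix.toString());
    }

    /** Runs the tool; returns true only if it exited with status 0 within the limit. */
    static boolean render(Path pdf, Path outputPrefix, int dpi, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, outputPrefix, dpi))
                .inheritIO() // forward the tool's messages instead of buffering them
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // A stuck render can be killed together with the child process.
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

With an output prefix of `cover`, `pdftoppm` writes `cover.png`. One appeal of this design is that a crash or hang stays inside the child process, which can be killed after a timeout without affecting the scanner itself.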


As a side note, while I love the bookmarking, it would be wonderful to have a "recently read" or "currently reading" section of some sort so that I don't have to dig through my collection every time to resume a comic.

I almost started implementing this feature, then I decided to release this beta version instead. ;)

It's definitely planned!

Glad to hear that's something you're working on!


Would it be safe to assume, then, that this error is related to the scan locking up on some PDFs? And would I be better off waiting for support for other apps?

That's hard to say, especially since the PDF that triggered the error you showed me was correctly imported.

If you want, send me a full log (provided there are other errors inside, otherwise I won't find anything useful anyway). I'll take a look and try to understand what happened.


Given my release rhythm, I can't really advise waiting... ;)


(Also, did you have the same problem with a previous version of Ubooquity?)

The log doesn't have anything else useful. And I don't read a lot of digital books at the moment, so I'm not too concerned with waiting.


As far as I can remember, anything before v2.0 worked just fine with the other PDF files.