Answered

Techniques to scan a huge file collection and skip already processed files

Thankful 2 years ago · updated by Tom 2 years ago

Hello,

I am wondering how you have managed to achieve such good performance when scanning our directory of comics. Would you be willing to share that piece of code or the techniques used?


Thank you in advance!

Answered

The scan process principle is very simple: Ubooquity lists the files to get their paths and last modification dates (this is very fast, since the files themselves are not read, only the disk's "table of contents"). It then compares each modification date with the one stored in its database during the previous scan, and processes only the files that have been modified since that scan (by "processes", I mean reading the file to extract the cover and metadata, which is the time-consuming operation).

So the first scan of a big collection will take a long time, but subsequent scans will be very fast if only a few files have been added.
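To illustrate the idea, here is a minimal sketch of such an incremental scan in Java. This is not Ubooquity's actual code: the class name, the `processFile` method and the in-memory map standing in for the database are all illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

public class IncrementalScanSketch {

    // path -> last modification time recorded during the previous scan
    // (a stand-in for the real database)
    private final Map<Path, Long> knownModTimes = new HashMap<>();

    public void scan(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                long modTime = attrs.lastModifiedTime().toMillis();
                Long previous = knownModTimes.get(file);

                // Only new or modified files are actually opened,
                // which is the expensive part of the scan.
                if (previous == null || modTime > previous) {
                    processFile(file);                 // extract cover + metadata
                    knownModTimes.put(file, modTime);
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }

    private void processFile(Path file) {
        // Placeholder for the costly work: opening the archive and
        // extracting the cover image and metadata.
    }
}
```

Since only paths and modification dates are read during directory traversal, a rescan of an unchanged collection never opens the files at all, which is why it stays fast.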

Quick question on this. If a file is removed or moved, does it delete all metadata/thumbnails, or cache them for later use? On one hand, I can see not deleting them being useful if a file is moved and then moved back (or if the other version could be linked somehow), but on the other hand I can see the cache of files growing quite large.

The metadata (in the database) and the cover thumbnail files are deleted.
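A hypothetical cleanup pass matching that behaviour might look like the sketch below (again only illustrative: the map stands in for the database, and the thumbnail deletion is left as a comment).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class StaleEntryCleanup {

    /**
     * Drops entries whose file no longer exists on disk. In the real
     * application the matching cover thumbnail would be deleted as well.
     */
    public static void removeStaleEntries(Map<Path, Long> knownModTimes) {
        knownModTimes.keySet().removeIf(path -> !Files.exists(path));
    }
}
```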