Your comments

We can do that.  I might have to change the logic a little, because I currently pull the publisher logo off the main publisher list page.  To scrape the hidden ones I'd have to go to the publisher page, then navigate to a comic from that publisher to pull their logo.  If I made the flow the same for the hidden and the non-hidden publishers, the code would be more reusable.  Another thing I was thinking of storing in the data is the scraper that was used to pull it in (just thinking ahead, in case there is ever more than one importer).  If I have both the source URL (which I'm already storing) and the importer used, I'd be able to re-scrape an individual publisher, or a list of publishers (more than likely by publisher name).  Then to add a new one, I could either let you add a new entry to the JSON file specifying the URL and the importer to use, or let you add it via the CLI.
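
A minimal Python sketch of the importer idea, not the project's actual code.  The field names (`name`, `source_url`, `importer`), the `IMPORTERS` registry and the stub importer are all assumptions for illustration; a real importer would fetch and parse the page instead of returning a placeholder.

```python
from typing import Callable, Dict, List


def comixology_importer(source_url: str) -> dict:
    """Hypothetical stub: a real importer would fetch and parse source_url."""
    return {"scraped": True}


# Maps the importer name stored in each record to the function implementing it.
IMPORTERS: Dict[str, Callable[[str], dict]] = {
    "comixology": comixology_importer,
}


def rescrape(records: List[dict], names: List[str]) -> List[dict]:
    """Re-run the stored importer for the named publishers only."""
    wanted = set(names)
    result = []
    for record in records:
        if record["name"] in wanted:
            importer = IMPORTERS[record["importer"]]
            fresh = importer(record["source_url"])
            # Freshly scraped fields overwrite the old ones; the rest is kept.
            result.append({**record, **fresh})
        else:
            result.append(record)
    return result
```

With a layout like this, adding a publisher by hand would only mean adding one more record with its URL and importer name.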

True.  If you provide a list of those publishers I could add them.  I'd just need the URLs, and I can pipe them through the same scraping as the rest.  As for distributing the actual publisher folders, I don't think I'll check those in to GitHub; I'll host them on Google Drive or something instead, mainly because they would push the repo over 200 MB, which is pretty crazy.

I can actually get that style working with a slightly different HTML layout.  It would also solve the padding issue I describe in the project, as I would be able to read the padding in from comiXology and use it in this layout.


For reference, these are the proposed layout changes: https://github.com/astraldragon/ubooquity-comixology-publishers/pull/7


I added A Wave Blue World because it showcases the layout problem I was trying to solve, which can't be handled with the way the pages are currently laid out.

Awesome.  Yeah, I noticed that the links in your templates didn't work in my setup because I have the reverse proxy set up.  I decided to follow the layout provided here, as it makes sense to align with that.  I also noticed that comiXology has some other interesting layouts that I didn't account for.  For example, https://www.comixology.co.uk/A-Wave-Blue-World/comics-publisher/76-0?ref=YnJvd3NlL3B1Ymxpc2hlci9kZXNrdG9wL2xpc3QvcHVibGlzaGVyTGlzdA

I've got a project up on GitHub now.  It is pretty rough atm but it gets the job done.  One thing I didn't realize is that comiXology sometimes has custom spacing between the banner and the info below it.  There's not much I can do about that, so I default everything to no spacing, and it has to be hand-tweaked on a publisher-by-publisher basis.  Anywho, the project is at https://github.com/astraldragon/ubooquity-comixology-publishers and I've got the first two publishers generated, so maybe you could check whether they look alright.  If they do, I can generate the full list.  There are a few other issues in terms of polish (error handling and such, making the script additive) that I'll look at over time.

I've got all the templating working now.  The last things left are fetching the images and making the data generation additive instead of starting over from scratch each time.
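
One way the additive part could work, sketched in Python under assumptions: the data lives in a single JSON file holding a list of records, and each record has a `source_url` that identifies it.  The function name and the file layout are hypothetical, not taken from the project.

```python
import json
from pathlib import Path
from typing import List


def add_new_publishers(path: str, scraped: List[dict]) -> List[dict]:
    """Append only publishers that are not already in the JSON file.

    Returns the records that were newly added.
    """
    data_file = Path(path)
    if data_file.exists():
        existing = json.loads(data_file.read_text(encoding="utf-8"))
    else:
        existing = []  # first run: nothing stored yet

    known_urls = {record["source_url"] for record in existing}
    added = [r for r in scraped if r["source_url"] not in known_urls]

    data_file.write_text(json.dumps(existing + added, indent=2), encoding="utf-8")
    return added
```

Running it twice with the same scraped list adds nothing the second time, so hand-tweaked entries already in the file are left alone.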

The other option: because I'm making everything JSON-driven, I could just keep a second list of data for the publishers that can't be scraped automatically, merge my scraped list with that static list, and generate everything from the merged result.
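
A small Python sketch of that merge, assuming both lists hold dicts keyed by a `name` field.  Letting the hand-maintained static entry win on a name clash is an assumption made here, not something the project has settled.

```python
from typing import List


def merge_publishers(scraped: List[dict], static: List[dict]) -> List[dict]:
    """Combine scraped and hand-written publisher records into one list."""
    merged = {record["name"]: record for record in scraped}
    for record in static:
        # Assumption: a static entry replaces a scraped one with the same name.
        merged[record["name"]] = record
    # Sort case-insensitively so the generated pages come out in a stable order.
    return sorted(merged.values(), key=lambda record: record["name"].lower())
```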

From a scraping perspective, the ones that don't exist on the main publisher page would all have to be done by hand, by URL.  That wouldn't be hard, since the pages are the same; I just can't get to them in a nice way.  For my first go I'm just going to focus on what can be browsed through the publisher page.

Cool.  I shall continue working on this then.  So far I have it scraping and populating a JSON file with all the display data I'd need to write out the HTML templates.  Writing out the templates should be pretty simple.  According to my script, comiXology has 198 publishers as of right now.
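
To illustrate the JSON-to-template step, here is a minimal Python sketch using only the standard library.  The markup and the `name` and `logo` fields are placeholders; the real theme templates and record fields will differ.

```python
from html import escape
from string import Template

# Placeholder markup, not the theme's real template.
PAGE = Template(
    '<div class="publisher">\n'
    '  <img src="$logo" alt="$name">\n'
    '  <h1>$name</h1>\n'
    '</div>\n'
)


def render_publisher(record: dict) -> str:
    """Fill the page template from one publisher record."""
    # Escape values so names containing & or quotes don't break the HTML.
    return PAGE.substitute(
        name=escape(record["name"]),
        logo=escape(record["logo"]),
    )
```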

Would anyone be interested in something that populates more publishers?  I see that the extras include a fair number of publishers, but comiXology itself has a lot more.  I started writing a script to extract all the publisher data from comiXology, and I could use that to generate the HTML templates for this theme as well.