Posts from Rogue Scholar

Published in iPhylo

How to cite: Page, R. (2023). Document layout analysis. Some notes to self on document layout analysis. I’m revisiting the problem of taking a PDF or a scanned document and determining its structure (for example, where is the title, abstract, bibliography, where are the figures and their captions, etc.). There are lots of papers on this topic, and lots of tools.

Published in iPhylo

A while ago I posted "BHL to PDF workflow", a sketch of a workflow to generate clean, searchable PDFs from Biodiversity Heritage Library (BHL) content. I've made some progress on putting this together, and have expanded the goal somewhat. In fact, there are several goals: BioStor articles need to be archived somewhere. At the moment they live on my server, and metadata is also served by BHL (as the "parts" you see in a scanned volume).

Published in iPhylo

Since I won't be able to attend the Biodiversity Heritage Library's Life and Literature meeting, I thought I'd share some ideas for their Life and Literature Code Challenge. The deadline is pretty close (October 17), so having ideas now isn't terribly helpful, I admit. That aside, here are some thoughts inspired by the challenge.