Messages de Rogue Scholar

language
Publié in iPhylo

How to cite: Page, R. (2023). Document layout analysis. https://doi.org/10.59350/z574z-dcw92 Some notes to self on document layout analysis. I’m revisiting the problem of taking a PDF or a scanned document and determining its structure (for example, where is the title, abstract, bibliography, where are the figures and their captions, etc.). There are lots of papers on this topic, and lots of tools.

Publié in iPhylo

Note to self. The challenge of finding specimen citations in papers keeps coming around. It seems that this is basically the same problem as finding citations to papers, and can be approached in much the same way. If you want to build a database of reference from scratch, one way is to scrape citations from papers (e.g., from the "literature cited" section), convert those strings into structured data, and add those to your database.