Published December 16, 2024 | https://doi.org/10.59350/g0y96-dwq81

Turn any webpage into markdown for LLM-friendly input

Last week I posted about a web app that turns a GitHub repo into a single text file for LLM-friendly input.

This is great for capturing LLM-friendly text from a GitHub repo, but what about an arbitrary website or PDF? While catching up on Simon Willison's newsletter, I read about an app he made with Claude Artifacts that uses the Jina Reader API to generate Markdown from a website.

You don't need to use the API to do this. Simply adding r.jina.ai/ in front of any URL will return LLM-friendly Markdown for that page.
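Here is a minimal sketch of the same trick in JavaScript, in case you want to do it from a script rather than the address bar. The function names `toReaderUrl` and `fetchMarkdown` are my own placeholders, not part of any Jina library. The sketch assumes a global `fetch` (modern browsers or Node.js 18+) and that a plain GET request returns the Markdown as the response body.

```javascript
// Only the r.jina.ai/ prefix comes from the post; the helper names are hypothetical.
const READER_PREFIX = 'https://r.jina.ai/';

// Build the reader URL by putting the prefix in front of the original URL.
function toReaderUrl(pageUrl) {
  return READER_PREFIX + pageUrl;
}

// Request the page through the reader and return the response body as text.
// Assumes network access and a global fetch (browser or Node.js 18+).
async function fetchMarkdown(pageUrl) {
  const response = await fetch(toReaderUrl(pageUrl));
  if (!response.ok) {
    throw new Error(`Reader request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, `fetchMarkdown('https://blog.stephenturner.us/p/learning-in-public').then(console.log)` should print the text returned for that post, provided the service is reachable.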

Demo

The examples below demonstrate using this service to get LLM-friendly plain text Markdown from a website, PDF, and a GitHub repo.

  1. My Learning in Public post:

    1. Original: https://blog.stephenturner.us/p/learning-in-public

    2. New: r.jina.ai/https://blog.stephenturner.us/p/learning-in-public

  2. This also works on PDFs. This paper on arXiv, "WorkflowHub: a registry for computational workflows":

    1. Original: https://arxiv.org/pdf/2410.06941

    2. New: r.jina.ai/https://arxiv.org/pdf/2410.06941

Bookmarklet

I created a bookmarklet that will add r.jina.ai/ in front of the URL for any page you're currently on.

javascript:(function(){window.location.href='https://r.jina.ai/' + window.location.href;})();

To use this:

  1. Copy the code above.

  2. Open your browser's bookmarks manager, or on Chrome, right-click your bookmark bar and add a new page.

  3. In the URL/location field, paste the code above and save the bookmark.

  4. When you're on a page you want to convert, click the bookmark. The page reloads with r.jina.ai/ added in front of its URL.
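For readability, here is an expanded sketch of what the bookmarklet does. The check that skips URLs already starting with the prefix is my own addition and is not in the one-liner above, and `readerUrlFor` is a hypothetical name. The redirect only runs where a `window` object exists, so the function can also be tried outside a browser.

```javascript
const READER_PREFIX = 'https://r.jina.ai/';

// Return the reader URL for a page. URLs that already start with the
// prefix are returned unchanged (an added guard, not in the original bookmarklet).
function readerUrlFor(currentUrl) {
  return currentUrl.startsWith(READER_PREFIX)
    ? currentUrl
    : READER_PREFIX + currentUrl;
}

// In a browser, send the current tab to the reader version of the page.
// Outside a browser (no window object), this block does nothing.
if (typeof window !== 'undefined') {
  window.location.href = readerUrlFor(window.location.href);
}
```

To use a variant like this as a bookmarklet, it would need to be collapsed to a single line and prefixed with `javascript:`, as in the original.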

Additional details

Identifiers

UUID
cb481daf-5678-4fe1-819f-d084fc6098b3
GUID
150552722
URL
https://blog.stephenturner.us/p/turn-any-webpage-into-markdown-for-llm-friendly-input

Dates

Issued
2024-12-16T15:03:49
Updated
2024-12-16T15:03:49