What you get
A plain UTF-8 .txt file containing the text from the PDF, with a marker at each page boundary. The text is extracted with Google Cloud Vision OCR, so it works whether the PDF contains real text or scanned images of text; both produce the same plain-text output.
Why OCR a PDF?
- Searchable. Pull text out so you can grep it, paste it, feed it into a spreadsheet, or run it through a script.
- Editable. A scanned PDF is just an image — no amount of "edit" in Word will let you change a word. OCR turns it into text first.
- Accessible. Screen readers can't read images; they can read text.
- Re-formattable. Once it's text, you can drop it into anything — DOCX, HTML, Markdown, a CMS.
How it works
- Open the home page and drop your PDF in the box.
- Pick OCR (Extract Text) from the dropdown.
- Hit Convert. Each page is rendered to an image, OCR'd, and assembled into a single text file.
- Click Download. Each page is delimited with a --- Page N --- marker for easy splitting.
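Because the pages are delimited, the downloaded file can be split back into pages with a few lines of script. The following is a minimal sketch, not part of the tool itself. It assumes each marker sits on its own line and looks exactly like --- Page N --- with N as a decimal number; the function name split_pages is made up for illustration.

```python
import re

# Assumed marker format: a line consisting only of "--- Page N ---".
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(text: str) -> dict[int, str]:
    """Map each page number to the text that follows its marker."""
    pages: dict[int, str] = {}
    matches = list(PAGE_MARKER.finditer(text))
    for i, match in enumerate(matches):
        start = match.end()
        # A page runs until the next marker, or to the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages[int(match.group(1))] = text[start:end].strip()
    return pages


if __name__ == "__main__":
    sample = "--- Page 1 ---\nHello\nworld\n--- Page 2 ---\nSecond page\n"
    for number, body in split_pages(sample).items():
        print(number, repr(body))
```

To use it on a real download, read the file with open(path, encoding="utf-8").read() and pass the result to split_pages. If the actual marker differs (extra spaces, different wording), adjust the pattern to match.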
What works well
- Standard text PDFs — extracted nearly perfectly.
- Clean, high-contrast scans (300+ DPI).
- Latin-script European languages (English, French, Spanish, German, Italian, Portuguese).
- Multi-column layouts — Vision is good at reading order.
What doesn't
- Faint, skewed, or low-resolution scans — accuracy drops fast.
- Handwritten pages — partial recognition only.
- Heavily designed pages with text on textured backgrounds.
- Password-protected PDFs — remove the password first.
Tips
- If the source is a paper document, scan at 300 DPI or higher with good lighting.
- For long PDFs, expect up to a minute of processing time, because each page is rendered and then sent to Vision.
- If you only need the text from one page, extract that page first in Preview / Adobe Reader, then upload the smaller file.
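For the 300 DPI tip, it can help to check what resolution a scan really has. DPI is simply pixels divided by inches, so the arithmetic is short. The sketch below is illustrative only: the function names are made up, and the paper size is something you supply yourself (8.5 x 11 inches for US Letter in the example).

```python
def pixels_needed(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions a scan needs to reach the given DPI on this paper size."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Resolution of an existing scan; the lower of the two axes is the limiting one."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("paper dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


if __name__ == "__main__":
    # US Letter at the recommended 300 DPI.
    print(pixels_needed(8.5, 11))
    # A 1275 x 1650 pixel scan of the same page is only half that resolution.
    print(effective_dpi(1275, 1650, 8.5, 11))
```

If effective_dpi comes out well below 300, rescanning is likely to help more than any post-processing.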
FAQ
Will it work on a scanned PDF? Yes — that's exactly what OCR is for.
Does it preserve formatting? No. Output is plain text, page-delimited. For formatted output, OCR to text first, then format manually.
What languages? English by default. Most Latin-script European languages work. CJK and right-to-left scripts are available on request.
How private is this? The PDF and the resulting text are auto-deleted after one hour. See Security.
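On the formatting question: since the output is plain text, any formatting has to be added afterwards. As one possible starting point, this sketch turns the page markers into Markdown headings and leaves the body text untouched. It assumes each marker is a line of the form --- Page N ---; the function name to_markdown and the choice of second-level headings are illustrative, not something the tool provides.

```python
import re

# Assumed marker format: a line consisting only of "--- Page N ---".
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---\s*$")


def to_markdown(text: str) -> str:
    """Replace each page marker with a '## Page N' heading."""
    out: list[str] = []
    for line in text.splitlines():
        match = PAGE_MARKER.match(line)
        if match:
            # Keep a blank line before and after each heading.
            if out and out[-1] != "":
                out.append("")
            out.append(f"## Page {match.group(1)}")
            out.append("")
        else:
            out.append(line.rstrip())
    return "\n".join(out).strip() + "\n"


if __name__ == "__main__":
    print(to_markdown("--- Page 1 ---\nHello\n--- Page 2 ---\nBye\n"))
```

Paragraphs, lists, and tables inside each page still need manual cleanup; the script only handles the page structure.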