Convert PDF to text — extract text from a PDF (OCR)

Q: Does the PDF to text output preserve formatting?

No. Output is plain UTF-8 text, page-delimited with --- Page N --- markers. Bold, italics, fonts, columns, and tables are not preserved. If you need a formatted, editable version of the PDF, convert to DOCX instead — that's a different tool because the OCR pipeline is text-only.

What you get

A plain UTF-8 .txt file containing the text from the PDF, page-delimited with --- Page N --- markers. The text is extracted via Google Cloud Vision OCR, so it works whether the PDF contains real, selectable text or a scanned image of text. Both kinds of input produce the same kind of output.

Why OCR a PDF?

Searchable. Pull text out so you can grep it, paste it, feed it into a spreadsheet, or run it through a script.
Editable. A scanned PDF is just an image, so opening it in Word will not let you change a single word. OCR turns it into text first.
Accessible. Screen readers can't read images; they can read text.
Re-formattable. Once it's text, you can drop it into anything — DOCX, HTML, Markdown, a CMS.

How it works

Use the converter above. No signup required.
Drop your PDF. Drag and drop one or more PDFs into the upload box (up to five files, 20 MB each). Both real text PDFs and scanned image-PDFs work.
Pick OCR (Extract Text) from the dropdown. Each page is rendered to an image and sent to Google Cloud Vision — typical processing time is 1 to 3 seconds per page.
Convert and download the .txt. Click Convert; a download link appears for a UTF-8 text file, page-delimited with --- Page N --- markers for easy splitting.
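
The page markers make the downloaded file easy to split with a short script. The sketch below is one way to do it in Python, not part of the converter itself. It assumes each marker sits alone on its own line exactly as --- Page N --- and that the text belonging to a page follows its marker; the function name split_pages is made up for illustration.

```python
import re

# Marker format as described on this page: a line reading "--- Page N ---".
# Assumption: the marker is alone on its line (trailing spaces tolerated).
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(text: str) -> dict[int, str]:
    """Map page number -> page text, using the markers as boundaries."""
    pages: dict[int, str] = {}
    matches = list(PAGE_MARKER.finditer(text))
    for i, match in enumerate(matches):
        # A page runs from the end of its marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages[int(match.group(1))] = text[match.end():end].strip()
    return pages


if __name__ == "__main__":
    sample = "--- Page 1 ---\nInvoice 0042\n\n--- Page 2 ---\nTotal due: 120.00\n"
    for number, body in split_pages(sample).items():
        print(number, repr(body))
```

Any text that appears before the first marker is ignored by this sketch; adjust the pattern if your downloaded file uses a slightly different marker layout.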

What works well

Standard text PDFs — extracted nearly perfectly.
Clean, high-contrast scans (300+ DPI).
Latin-script European languages (English, French, Spanish, German, Italian, Portuguese).
Multi-column layouts. Vision is good at working out the reading order, so the text comes out in sequence even though the column layout itself is not kept.

What doesn't

Faint, skewed, or low-resolution scans — accuracy drops fast.
Handwritten pages — partial recognition only.
Heavily designed pages with text on textured backgrounds.
Password-protected PDFs — remove the password first.
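
Some of these problems can be caught before uploading. The sketch below checks a batch of files against the limits stated in the steps above (up to five files, 20 MB each) and flags files that look password-protected. It is a local convenience script under stated assumptions, not how the converter validates uploads: it assumes 1 MB means 1,048,576 bytes, the function name preflight is made up, and the /Encrypt search is a crude heuristic that can misfire in either direction.

```python
from pathlib import Path

MAX_FILES = 5                 # "up to five files" per upload
MAX_BYTES = 20 * 1024 * 1024  # "20 MB each"; assumes 1 MB = 1,048,576 bytes


def preflight(paths: list) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files selected; the limit is {MAX_FILES}")
    for path in map(Path, paths):
        size = path.stat().st_size
        if size > MAX_BYTES:
            problems.append(f"{path.name}: {size} bytes exceeds the 20 MB limit")
            continue
        data = path.read_bytes()
        # PDF files start with a "%PDF-" header; readers tolerate a little
        # leading junk, so look within the first 1024 bytes.
        if b"%PDF-" not in data[:1024]:
            problems.append(f"{path.name}: no PDF header found")
        # Heuristic: encrypted PDFs carry an /Encrypt entry in their trailer.
        elif b"/Encrypt" in data:
            problems.append(f"{path.name}: looks encrypted; remove the password first")
    return problems
```

Calling preflight(["scan1.pdf", "scan2.pdf"]) with your own file names returns a list of messages, one per problem found.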

Tips

If the source is a paper document, scan at 300 DPI or higher with good lighting.
For long PDFs, expect a minute or more of processing time: each page is rendered and then sent to Vision, at roughly 1 to 3 seconds per page.
If you only need the text from one page, extract that page first in Preview / Adobe Reader, then upload the smaller file.

FAQ

Does this work on scanned PDFs? Yes — that's exactly what OCR is for. A scanned PDF is just a stack of images, so plain copy-paste or text extraction won't find anything. The converter renders each page to an image, runs it through Google Cloud Vision OCR, and assembles the recognized text into a single .txt file with per-page delimiters.

What languages does the PDF OCR support? English by default, plus most Latin-script European languages (French, Spanish, German, Italian, Portuguese, Dutch). CJK and right-to-left scripts (Arabic, Hebrew) are available on request via the contact form.

How long does PDF OCR take? Roughly 1 to 3 seconds per page, so a 50-page document typically takes about one to two and a half minutes to render and OCR. If you only need a few pages, extract them first in Preview or Adobe Reader and upload the smaller file, which is much faster than processing the full document.
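
The per-page figure can be turned into a rough range for a whole document. The sketch below simply multiplies the page count by the 1 to 3 seconds quoted above. It assumes pages are processed one after another, which this page does not state, so treat the result as a ballpark; the function name estimate_ocr_seconds is made up for illustration.

```python
def estimate_ocr_seconds(pages: int, low: float = 1.0, high: float = 3.0) -> tuple[float, float]:
    """Return (best case, worst case) processing time in seconds.

    low and high are the per-page bounds quoted in the FAQ (1 to 3 seconds).
    Assumes strictly sequential processing with no fixed overhead.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low, pages * high


if __name__ == "__main__":
    fastest, slowest = estimate_ocr_seconds(50)
    print(f"50 pages: {fastest:.0f} to {slowest:.0f} seconds")
```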

What happens to my PDF after the OCR runs? The uploaded PDF and the resulting text are auto-deleted from our servers after one hour. We don't store or analyze your document beyond completing the OCR pass. See Security for the full data-handling policy.

Image → Text (OCR) → same OCR, image source
PDF → DOCX → for layout-preserving edits
DOCX → PDF → save the cleaned-up text back as PDF
