PDF Security

Why Your PDF Might Still Have Hidden Text After Redaction

You draw a black box over a Social Security number, hit export, and send the file to opposing counsel. The redaction looks perfect. What you cannot see is that the nine digits underneath the black bar are still in the file and can be extracted in seconds with any PDF text tool. This is not a rare edge case. It is the default behavior of every PDF editor that uses overlay-based redaction. Here is what actually happens, and how to do it right.

The problem with drawing boxes over text

PDF is a page-description language, not an image format. When you place a black rectangle over existing text, the PDF structure now has two overlapping elements: the original character objects below and the black rectangle above. Viewers paint page content in the order it is drawn, so the rectangle covers the text on screen. But the underlying characters are still in the file's content stream, along with their exact position data.
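
To make this concrete, here is a minimal, hand-written page content stream. It is a hypothetical example, left uncompressed for readability. A string is shown with the Tj operator, and a black rectangle is then filled over it with re and f. The short Python sketch pulls out every string passed to Tj, and the rectangle has no effect on the result. Real files usually compress their streams and may use TJ arrays, hex strings, or custom font encodings, none of which this simplified regex handles.

```python
import re

# Hypothetical, uncompressed page content stream. BT/ET wrap a text object,
# Tj shows a string, and "re f" fills a black rectangle painted after the text.
CONTENT_STREAM = """
BT
/F1 12 Tf
72 700 Td
(SSN: 123-45-6789) Tj
ET
0 0 0 rg
100 695 90 16 re f
"""


def extract_shown_strings(stream: str) -> list[str]:
    """Return every literal string passed to the Tj operator.

    Simplified: ignores TJ arrays, hex strings, escaped parentheses and font encodings.
    """
    return re.findall(r"\((.*?)\)\s*Tj", stream)


print(extract_shown_strings(CONTENT_STREAM))  # ['SSN: 123-45-6789']
```

The rectangle only changes what is painted. It does not change what the stream contains, which is why copy-paste and text extraction still return the value.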

A 2019 study by the University of Illinois found that redaction failures affected over 60% of produced documents in eDiscovery reviews at major law firms. In 2021, a medical insurer accidentally exposed patient SSNs in public court filings because a paralegal had used a highlight tool rather than a proper redaction tool. The redaction had been "applied" in the visual sense, but the underlying data was intact.

  • Black rectangle overlaid on text: original characters still in content stream
  • Copy-paste from the PDF still extracts the hidden text
  • Text search scripts can find and extract all redacted values
  • PDF repair tools and parsers can strip the overlay and reveal the original content
  • Metadata fields may still contain the original data in other parts of the file
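
Stripping the overlay is easy to demonstrate on a hypothetical, uncompressed content stream in which a string is shown with Tj and a black box is then filled over it. This sketch deletes every line that only paints a filled rectangle (the f, F and f* fill operators). What remains is the original text object, fully visible again. It assumes each rectangle sits on its own line, which real PDF writers do not guarantee, so treat it as an illustration rather than a repair tool.

```python
import re

# Hypothetical redacted stream: text first, then a black box painted on top.
REDACTED_STREAM = """BT
/F1 12 Tf
72 700 Td
(SSN: 123-45-6789) Tj
ET
0 0 0 rg
100 695 90 16 re f"""

# "x y w h re" followed by a fill operator (f*, f or F), alone on one line.
RECT_FILL = re.compile(r"^\s*[-\d.]+\s+[-\d.]+\s+[-\d.]+\s+[-\d.]+\s+re\s+(?:f\*|f|F)\s*$")


def strip_filled_rectangles(stream: str) -> str:
    """Drop every line that only paints a filled rectangle; keep everything else."""
    kept = [line for line in stream.splitlines() if not RECT_FILL.match(line)]
    return "\n".join(kept)


print(strip_filled_rectangles(REDACTED_STREAM))
```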

What proper redaction actually requires

True redaction removes the content permanently. On paper, a marker stroke only counts as redaction if nothing underneath can be read afterward. Digital redaction must meet the same standard: replace the original content with an absence of content, not just an opaque layer on top.

For a PDF, one reliable approach is rasterization: converting each page to a flat image at sufficient resolution, then embedding that image as the new page content. After this operation, the page contains no character objects, no text streams, and no position data. What remains is a photograph of the page with the redaction bars baked in as pixels.
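
The arithmetic behind "sufficient resolution" is simple. PDF page sizes are measured in points (1/72 inch by default), so the rendered image needs points × DPI / 72 pixels per side. The sketch below computes that and also shows the kind of content stream an image-only page can use. The image XObject (named Im0 here, a hypothetical name) is scaled to the page with cm and painted with Do, and no text operators remain. It ignores the optional UserUnit entry, and the image itself would also have to be registered in the page's resources.

```python
def raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at `dpi` (assumes 1 unit = 1/72 inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


def image_only_content_stream(width_pt: float, height_pt: float, image_name: str = "Im0") -> str:
    """Content stream that paints a single image XObject over the whole page.

    Images are drawn into a 1 x 1 unit square, so the cm operator scales it
    up to the page size. No text operators (BT, Tj, TJ, ET) are emitted.
    """
    return f"q {width_pt:g} 0 0 {height_pt:g} 0 0 cm /{image_name} Do Q"


print(raster_size(612, 792, 300))           # (2550, 3300): US Letter at 300 DPI
print(image_only_content_stream(612, 792))  # q 612 0 0 792 0 0 cm /Im0 Do Q
```

Halving the DPI halves each side and quarters the pixel count (1275 × 1650 at 150 DPI for the same page), which is the usual trade-off between file size and legibility of the redacted output.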

PDFtopia's Secure mode does exactly this. It first uses the pdf.js text extraction API to find every match and its exact character positions, then rasterizes each page at the chosen DPI with black rectangles painted over those positions in the pixel layer, and reassembles the result into a new PDF. The original text cannot be recovered from the output.
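
Placing the bars requires a small coordinate conversion, because page coordinates and image coordinates differ. PDF user space puts the origin at the bottom-left of the page and measures in points, while a raster image starts at the top-left and measures in pixels. The sketch below assumes each match has already been reduced to a hypothetical Box in PDF user space (how PDFtopia derives boxes from pdf.js output is not described here). It converts the box to a pixel rectangle at the chosen DPI and blacks it out in a simple grayscale bitmap. It rounds outward and adds a little padding so no partially covered pixels are left at the edges.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical match box in PDF user space: origin bottom-left, units in points."""
    x: float
    y: float
    width: float
    height: float


def box_to_pixels(box: Box, page_height_pt: float, dpi: int,
                  pad_pt: float = 1.0) -> tuple[int, int, int, int]:
    """Convert a PDF-space box to a (left, top, right, bottom) pixel rectangle."""
    def to_px(value_pt: float) -> float:
        return value_pt * dpi / 72  # 72 points per inch

    left = to_px(box.x - pad_pt)
    right = to_px(box.x + box.width + pad_pt)
    # Flip the y axis: PDF measures up from the bottom, images measure down from the top.
    top = to_px(page_height_pt - (box.y + box.height) - pad_pt)
    bottom = to_px(page_height_pt - box.y + pad_pt)
    # Round outward so edge pixels are covered rather than left partly visible.
    return math.floor(left), math.floor(top), math.ceil(right), math.ceil(bottom)


def paint_black(bitmap: list[list[int]], rect: tuple[int, int, int, int]) -> None:
    """Set every pixel inside rect (right/bottom exclusive) to 0, clipped to the bitmap."""
    left, top, right, bottom = rect
    for row in range(max(top, 0), min(bottom, len(bitmap))):
        for col in range(max(left, 0), min(right, len(bitmap[row]))):
            bitmap[row][col] = 0


# A tiny 300 x 200 point page rendered at 72 DPI (1 point = 1 pixel), all white.
page = [[255] * 300 for _ in range(200)]
rect = box_to_pixels(Box(x=50, y=100, width=90, height=16), page_height_pt=200, dpi=72)
print(rect)  # (49, 83, 141, 101)
paint_black(page, rect)
```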


What types of data are most commonly missed

Most redaction failures share a common pattern: the person applying the redaction only thought about the most obvious instance of the sensitive data. But PDFs contain the same information in multiple forms: headers, footers, annotations, metadata, and embedded full-text search indexes.

  • Names in headers and footers: attorney names, client names, and company names often appear on every page in running headers.
  • Account numbers and IDs in tables: financial documents list account numbers across multiple rows; a single-row redaction misses the others.
  • Email addresses in CC fields: email threads embedded in PDFs carry CC and To fields that are easy to overlook.
  • Metadata and document properties: author, company, and application fields in PDF metadata may expose information not visible in the body text.
  • Annotations and comments: PDF comments and sticky notes are separate content streams that may survive simple overlay redaction.
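
Of these, the metadata case is the easiest to check mechanically. The sketch below searches raw PDF bytes (a hypothetical sample here) for common document information dictionary keys and prints their values. It is deliberately naive. It only understands uncompressed literal strings without nested or escaped parentheses, and it decodes them as Latin-1 as an approximation of PDFDocEncoding. It does not read XMP metadata streams, hex strings, or UTF-16 values, all of which real files use. Because it scans the whole file, it may also pick up a /Title from elsewhere, such as a bookmark.

```python
import re

# Hypothetical fragment of an uncompressed document information dictionary.
SAMPLE_PDF_BYTES = b"""3 0 obj
<< /Title (Settlement Agreement) /Author (Jane Doe) /Producer (ExampleWriter 2.1) >>
endobj"""

INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer")


def find_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return the literal-string values of common Info dictionary keys found in the bytes."""
    found = {}
    for key in INFO_KEYS:
        # "/Key (value)": simplified, no nested or escaped parentheses, no hex strings.
        match = re.search(rb"/" + key.encode("ascii") + rb"\s*\(([^)]*)\)", pdf_bytes)
        if match:
            # Latin-1 only approximates PDFDocEncoding; UTF-16 values are not handled.
            found[key] = match.group(1).decode("latin-1")
    return found


print(find_info_fields(SAMPLE_PDF_BYTES))
```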

PDFtopia's auto-detectors catch the most common patterns (emails, phone numbers, SSNs, credit card numbers, IBANs, IPv4 addresses, and dates) across the entire document at once. Combined with custom keywords, this covers the most frequent sources of accidental data leakage in redacted documents.
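
Pattern detectors of this kind are commonly built from regular expressions run over the extracted text. The sketch below covers a subset of the listed categories (emails, SSNs, US phone numbers, IPv4 addresses, and ISO-style dates) with hypothetical, intentionally simple patterns. PDFtopia's actual rules are not described in this article, and production patterns need more care with international formats and false positives.

```python
import re

# Hypothetical, intentionally simple patterns; not PDFtopia's actual rules.
DETECTORS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # NNN-NN-NNNN
    "us_phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
    "ipv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),  # YYYY-MM-DD only
}


def scan(text: str) -> dict[str, list[str]]:
    """Run every detector over the text and return the matches per detector."""
    hits = {}
    for name, pattern in DETECTORS.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            hits[name] = matches
    return hits


sample = ("Contact jane.doe@example.com or (555) 123-4567. "
          "SSN 123-45-6789, server 192.168.0.1, filed 2021-03-15.")
for detector, values in scan(sample).items():
    print(detector, values)
```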

How to permanently redact a PDF in under 2 minutes

Using PDFtopia's Secure mode, the workflow for a permanently redacted document is:

  1. Open the PDF Redact tool

    Navigate to PDFtopia pdf-redact and load your PDF. The file stays on your device; nothing is uploaded.

  2. Add terms or enable detectors

    Type specific terms you need removed (names, account numbers, case IDs) in the keywords field. Enable auto-detectors for emails, phone numbers, SSNs, credit cards, IBANs, and dates.

  3. Select Secure mode

    Choose "Secure · text removed" to rasterize the pages at the chosen DPI. This permanently discards the underlying character data.

  4. Review and download

    The tool reports how many matches were found per term. Preview the first page, then download the permanently redacted PDF.
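
The per-term counts in step 4 amount to a simple tally. The sketch below is an assumption about how such a report could be computed, not PDFtopia's code. It counts case-insensitive occurrences of each custom keyword across the extracted text of every page. It keeps zero counts, since a term that matches nothing often points to a typo or a value split across lines. It does not enforce word boundaries, so "Smith" would also match "Smithson".

```python
import re


def count_keyword_matches(pages: list[str], keywords: list[str]) -> dict[str, int]:
    """Count case-insensitive occurrences of each keyword across all pages."""
    counts = {keyword: 0 for keyword in keywords}  # keep zero counts visible
    for keyword in counts:
        # re.escape makes characters such as "." or "(" in a case ID match literally.
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        for page_text in pages:
            counts[keyword] += len(pattern.findall(page_text))
    return counts


pages = [
    "Acme Corp v. Smith, Case No. 21-cv-0042",
    "Page 2: Smith met ACME CORP representatives.",
]
for term, n in count_keyword_matches(pages, ["Acme Corp", "21-cv-0042", "Jones"]).items():
    print(f"{term}: {n} match(es)")
```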


Quick mode vs. Secure mode: when to use each

PDFtopia offers both an overlay (Quick) mode and a rasterization (Secure) mode. The choice depends on your threat model and what the document will be used for.

  • Quick mode (overlay) applies black rectangles using the PDF's own drawing operators. Fast, reversible. Use for drafts, internal review copies, or when you need to preserve the original PDF structure for later editing.
  • Secure mode (rasterization) converts each page to a high-resolution image. Output cannot be undone. Use for any document that will be shared externally: legal filings, court submissions, contract exchanges, HR records, medical documents, or anything subject to privacy regulations.

The safe default is Secure mode. Switch to Quick mode only when you have a specific reason to preserve the underlying PDF structure and will keep the original file.

Frequently asked questions

Does drawing black boxes over text actually remove it from a PDF?

No. When you draw a black rectangle on top of text in a PDF, the underlying text characters remain in the file's content stream. A text-select tool, a copy-paste action, or a simple script can extract the hidden text. This is the most common redaction mistake in legal, medical, and financial documents.

What does 'Secure' redaction do differently?

Secure mode rasterizes each PDF page, converting it to a high-resolution image, and reassembles those images into a new PDF. The original text characters no longer exist anywhere in the file. The black bars are baked into the pixel layer, making the redaction permanent and non-recoverable.

What types of data can PDFtopia's auto-detectors find?

The tool scans for email addresses, US phone numbers, Social Security numbers (NNN-NN-NNNN), credit card numbers, IBAN bank account numbers, IPv4 addresses, and common date formats. You can run any combination of detectors alongside your own custom keywords.
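
Pattern matching alone flags any digit run that has the right shape. A common refinement is to confirm candidates with the checksums built into the formats: the Luhn check used by payment card numbers and the ISO 13616 mod-97 check used by IBANs. The sketch below shows both. It is an assumption, not a description of PDFtopia's detectors, and the 13 to 19 digit card length is a simplifying choice.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum, as used by payment card numbers (spaces and dashes ignored)."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not cleaned.isascii() or not cleaned.isdigit() or not 13 <= len(cleaned) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and require a remainder of 1."""
    s = iban.replace(" ", "").upper()
    if not 15 <= len(s) <= 34 or not s.isascii() or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # int("A", 36) == 10
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))           # True (a well-known test card number)
print(iban_valid("GB82 WEST 1234 5698 7654 32"))   # True (a widely used example IBAN)
```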

Can I preview what will be redacted before downloading?

Yes. The tool scans your document and reports exactly how many matches were found per term or detector. You can review the list before committing to redaction. The results screen shows a preview of the first page so you can confirm the redaction looks correct.

Does the tool work on scanned PDFs?

Scanned PDFs contain images rather than text, so the text scanner cannot find matches in them. For scanned documents, use the Flatten tool first to OCR the content, then apply redaction. Secure mode on its own is not a substitute: it rasterizes every page, but it can only black out text it was able to match, and a scan without a text layer gives it nothing to match.
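
A quick way to tell whether the text scanner will have anything to work with is to look at how much text each page yields. This hypothetical helper takes already-extracted page text and flags pages that have almost none. The 20-character threshold is an arbitrary assumption, and a flagged page may also be a cover image or a blank page rather than a scan.

```python
def likely_scanned_pages(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text is shorter than min_chars."""
    return [number for number, text in enumerate(page_texts, start=1)
            if len(text.strip()) < min_chars]


pages = ["This page has an ordinary text layer with plenty of characters.", "", "  \n  "]
print(likely_scanned_pages(pages))  # [2, 3]
```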

Is my document uploaded to a server when I use the redaction tool?

No. PDFtopia processes files entirely inside your browser. The PDF is loaded into memory, text is extracted and analyzed locally, and the redacted output is generated on your device. No file content is ever transmitted to any server.