pdf-parse2

1.0.4 • Public • Published

PDF Parse

A pure JavaScript, cross-platform module designed for extracting text from PDF files using pdf.js.

Features

  • Extract text from PDF files.
  • Supports both browser and Node.js environments.
  • Easy to use, with a promise-based API.

Installation

npm install pdf-parse2

Or

yarn add pdf-parse2

Usage

Node.js

const fs = require('fs');
const PDFParse = require('pdf-parse2');

(async () => {
  // Read the PDF into a Buffer.
  const dataBuffer = fs.readFileSync('path/to/your/document.pdf');
  // Use a distinct name for the instance so it does not shadow the PDFParse class.
  const parser = new PDFParse();

  try {
    const pdfData = await parser.loadPDF(dataBuffer);
    console.log('Text:', pdfData.text);
  } catch (error) {
    console.error(error);
  }
})();

Browser

Make sure the pdf.js library is included in your project. You can then use PDFParse as in the Node.js example: fetch the PDF file with the Fetch API or XMLHttpRequest and pass the resulting ArrayBuffer to loadPDF.
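
As a rough sketch of the browser flow, the helper below fetches a PDF and hands its bytes to loadPDF. The name extractTextFromUrl and its parameters are hypothetical and not part of pdf-parse2. It assumes that parser is a PDFParse instance created as in the Node.js example, and that loadPDF resolves to an object with a text property, as that example suggests. How PDFParse is loaded in the browser depends on your bundler or script setup and is not shown here.

```javascript
// Hypothetical helper, not part of pdf-parse2.
// `parser` is assumed to be a PDFParse instance; `fetchImpl` defaults to the
// global Fetch API and can be swapped out (for example in tests).
async function extractTextFromUrl(url, parser, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // loadPDF is documented to accept an ArrayBuffer.
  const arrayBuffer = await response.arrayBuffer();
  const pdfData = await parser.loadPDF(arrayBuffer);
  return pdfData.text;
}
```

It could then be called as extractTextFromUrl('/files/report.pdf', parser), where the URL is only a placeholder.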

API Reference

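  • loadPDF(src, options): Loads a PDF file and extracts its text. src can be a Buffer or an ArrayBuffer; options is optional. Returns a promise that resolves to an object whose text property holds the extracted text, as in the Node.js example.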

  • renderPage(pageData, options): A helper function for rendering a single page. This function is used internally by loadPDF.
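
Since loadPDF accepts either a Buffer or an ArrayBuffer, code shared between Node.js and the browser may want to settle on ArrayBuffer. The sketch below shows one way to copy a Node.js Buffer into a standalone ArrayBuffer. The toArrayBuffer helper is hypothetical and not part of pdf-parse2; it uses only standard Buffer and ArrayBuffer properties.

```javascript
// Hypothetical helper, not part of pdf-parse2.
// A Node.js Buffer may be a view into a larger shared pool, so slice out
// exactly the bytes that belong to this Buffer.
function toArrayBuffer(buffer) {
  return buffer.buffer.slice(
    buffer.byteOffset,
    buffer.byteOffset + buffer.byteLength
  );
}
```

The result can be passed as src in the same way as the original Buffer, for example parser.loadPDF(toArrayBuffer(dataBuffer)).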

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an issue for any bugs or feature requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Collaborators

  • necm1