- PDF stands for Portable Document Format.
- It uses the .pdf extension.
- It is used to present and exchange documents reliably, independent of software, hardware, or operating system.
- Invented by Adobe, PDF is now an open standard maintained by the International Organization for Standardization (ISO).
- PDFs can contain links and buttons, form fields, audio, video, and business logic.
- To work with PDF files in Python, we import the PyPDF2 module.
How to install the PyPDF2 module:
pip install PyPDF2
Extracting text from a PDF file:
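import PyPDF2

# Open the file in binary mode; the with-block closes it automatically
with open('example.pdf', 'rb') as pdfFileObj:
    # PyPDF2 3.x names; PdfFileReader, numPages, getPage and extractText were removed
    pdfReader = PyPDF2.PdfReader(pdfFileObj)
    print(len(pdfReader.pages))    # number of pages in the document
    pageObj = pdfReader.pages[0]   # first page (index starts at 0)
    print(pageObj.extract_text())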
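The example above reads only the first page. To collect the text of every page, something like the following sketch may help. It assumes the PyPDF2 3.x names (`PdfReader`, `pages`, `extract_text`). The helper names `extract_all_text` and `read_pdf_text` are made up for illustration and are not part of PyPDF2.

```python
def extract_all_text(reader):
    """Return the text of every page in `reader`, separated by blank lines.

    `reader` is assumed to be any object with a `pages` sequence whose
    items offer `extract_text()`, such as PyPDF2.PdfReader in PyPDF2 3.x.
    """
    texts = []
    for page in reader.pages:
        # A page may yield no text (for example a scanned image),
        # so fall back to an empty string.
        texts.append(page.extract_text() or "")
    return "\n\n".join(texts)


def read_pdf_text(path):
    """Open the PDF at `path` and return the text of all its pages."""
    # Imported here so extract_all_text() can be used without PyPDF2 installed.
    import PyPDF2

    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        return extract_all_text(reader)
```

Calling `print(read_pdf_text('example.pdf'))` would then print the text of the whole document. How clean that text is depends on how the PDF was produced.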