Thursday, 11 May 2017

How to work with PDF files in Python





  • PDF stands for Portable Document Format.
  • It uses the .pdf extension.
  • It is used to present and exchange documents reliably, independent of software, hardware, or operating system.
  • Invented by Adobe, PDF is now an open standard maintained by the International Organization for Standardization (ISO). 
  • PDFs can contain links and buttons, form fields, audio, video, and business logic.
  • To work with PDF files in Python, we import the PyPDF2 module.

How to install the PyPDF2 module:

pip install PyPDF2
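
After installing, it can be handy to confirm that Python can actually find the module before running the rest of the post. The snippet below is a minimal sketch using only the standard library; the helper name is_installed is made up for this example and is not part of PyPDF2.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module.

    is_installed is a hypothetical helper written for this post,
    not something PyPDF2 provides.
    """
    return importlib.util.find_spec(module_name) is not None


if is_installed('PyPDF2'):
    print('PyPDF2 is installed')
else:
    print('PyPDF2 is missing; run: pip install PyPDF2')
```

If the second message is printed, the pip command most likely ran against a different Python installation than the one executing the script.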

Extracting text from a PDF file:

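import PyPDF2

# Note: these are the names used by PyPDF2 releases before 3.0. PyPDF2 3.0
# renamed them to PdfReader, len(reader.pages), reader.pages[0], extract_text().

# Open the PDF in binary mode ('rb'); a PDF is not a plain-text file.
pdfFileObj = open('example.pdf', 'rb')

# Create a reader object for the opened file.
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

# Print the number of pages in the document.
print(pdfReader.numPages)

# Get the first page (page numbers start at 0).
pageObj = pdfReader.getPage(0)

# Print the text extracted from that page.
print(pageObj.extractText())

# Close the file when finished.
pdfFileObj.close()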
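
The example above reads only the first page. As a rough sketch of how the same calls could be extended to every page, the code below loops over all pages and uses a with-block so the file is closed automatically. The function names extract_all_text and read_pdf are invented for this illustration, and the code assumes the same pre-3.0 PyPDF2 names used in this post (PdfFileReader, numPages, getPage, extractText).

```python
def extract_all_text(pdf_reader):
    """Return the text of every page, in order, as a list of strings.

    Hypothetical helper for this post. It expects an object with the
    interface used above: numPages, getPage(i) and page.extractText().
    """
    texts = []
    for page_number in range(pdf_reader.numPages):
        page = pdf_reader.getPage(page_number)
        texts.append(page.extractText())
    return texts


def read_pdf(path):
    """Open the PDF at `path` and return a list with each page's text."""
    # Imported here so extract_all_text itself has no dependency on PyPDF2.
    import PyPDF2

    # The with-block closes the file even if extraction raises an error.
    with open(path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfFileReader(pdf_file)
        return extract_all_text(pdf_reader)
```

Calling read_pdf('example.pdf') returns one string per page, so the result can be printed page by page or joined into a single string. How much text comes back depends on the PDF itself; scanned pages that contain only images usually yield empty strings.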


