- Python can create and modify Word documents, which have the .docx file extension.
- To work with Word documents, we need to install the python-docx module.
- pip install python-docx
- After installing the module, import it with import docx, not import python-docx.
- If you don’t have Word, LibreOffice Writer and OpenOffice Writer are both free alternative applications for Windows, OS X, and Linux that can be used to open .docx files.
- Compared to plain text, .docx files have a lot of structure.
- This structure is represented by three different data types in python-docx.
- At the highest level, a Document object represents the entire document.
- The Document object contains a list of Paragraph objects for the paragraphs in the document.
- Each of these Paragraph objects contains a list of one or more Run objects. A run is a contiguous stretch of text that shares the same style.
Reading Word Documents:
import docx
doc = docx.Document('demo.docx')
len(doc.paragraphs)
doc.paragraphs[0].text
doc.paragraphs[1].text
len(doc.paragraphs[1].runs)
doc.paragraphs[1].runs[0].text
doc.paragraphs[1].runs[1].text
doc.paragraphs[1].runs[2].text
doc.paragraphs[1].runs[3].text
Getting the Full Text from a .docx File:
readDocx.py
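import docx

def getText(filename):
    # Open the document and collect the text of every paragraph.
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    # Join the paragraphs with newlines into a single string.
    return '\n'.join(fullText)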
- The readDocx.py program can be imported like any other module.
- Now if you just need the text from a Word document, you can enter the following:
import readDocx
print(readDocx.getText('demo.docx'))
Writing Word Documents:
import docx
doc = docx.Document()
doc.add_paragraph('Hello world!')
doc.save('helloworld.docx')
Create a Word document:
from docx import Document
d = Document()
d.add_heading('Hamlet')
d.add_heading('dramatis personae', 2)
d.add_paragraph('Hamlet, the Prince of Denmark')
d.save('hamlet.docx')
Read a Word document:
document = Document('hamlet.docx')
for para in document.paragraphs:
    print(para.text)