Reading and Writing XML Files in Python

Introduction

XML (stands for Extensible Markup Language), was designed so that it easily can be interpreted by both computers and humans. This language has some set of rules used to encode documents in a certain format. In this article, we will discuss methods that are used to read and write XML files.

Parsing is the process of reading and writing data in an XML file. In Parsing, the logical components of data are analyzed. Therefore when we say reading an XML file, we are referring to parsing the XML file.

In this article, we will see two libraries that are used for XML parsing. These two libraries are:

BeautifulSoup which is used alongside the LXML and XML parsers.
ElementTree Library.

BeautifulSoup

In Python, BeautifulSoup is a library that is used to read and write on an XML file. Now before using this library, we type the following command:

pip install beautifulsoup4
pip install lxml

Reading Data From an XML File

Two steps are involved while parsing an XML file.

  1. Finding Tags
  2. Extracting from tags

Throughout the article, we will use the XML file that is mentioned below.

<?xml version="1.0" encoding="utf-8"?>
<saranghe>
<child name="Frank" test="0">
  FRANK likes EVERYONE
</child>
<unique>
  Add a video URL in here
</unique>
<child name="Texas" test="1">
  TEXAS is a PLACE
</child>
<child name="Frank" test="2">
  Exclusively
</child>
<unique>
  Add a workbook URL here.
</unique>
<data>
  Add the content of your article here
  <family>
  Add the font family of your text here
  </family>
  <size>
  Add the font size of your text here
  </size>
</data>
</saranghe>

Example 1

from bs4 import BeautifulSoup

with open('dict.xml', 'r') as f:
d = f.read()
   
Bs_d = BeautifulSoup(d, "xml")
unq = Bs_d.find_all('unique')

print(unq)
name = Bs_d.find('child', {'name':'Frank'})

print(b_name)

# Extracting the data stored in a
# specific attribute of the
# `child` tag
value = b_name.get('test')

print(value)

Output

[<unique>
    Add a video URL in here
  </unique>, <unique>
    Add a workbook URL here
  </unique>]
<child name="Frank" test="0">
    FRANK likes EVERYONE
  </child>
0

Example 2

from bs4 import BeautifulSoup

with open('dict.xml', 'r') as f:
data = f.read()

bs_data = BeautifulSoup(data, 'xml')

for tag in bs_data.find_all('child', {'name':'Frank'}):
tag['test'] = "WHAT !!"

print(bs_data.prettify())

Output

<?xml version="1.0" encoding="utf-8"?>
<saranghe>
  <child name="Frank" test="WHAT !!">
FRANK likes EVERYONE
  </child>
  <unique>
Add a video URL in here
  </unique>
  <child name="Texas" test="1">
TEXAS is a PLACE
  </child>
  <child name="Frank" test="WHAT !!">
Exclusively
  </child>
  <unique>
Add a workbook URL here
  </unique>
  <data>
Add the content of your article here
  <family>
Add the font family of your text here
  </family>
  <size>
Add the font size of your text here
  </size>
  </data>
</saranghe>

Using Elementree

This module gives us an abundance of tools to manipulate XML files. It is included in python’s standard library. And because of that, the user does not need to install anything externally.

Code

import xml.etree.ElementTree as ET
t = ET.parse('dict.xml')
r = t.getroot()
print(r)
print(r[0].attrib)
print(r[5][0].text)

Output

<Element 'saranghe' at 0x7fe97c2cee08>
{'name': 'Frank', 'test':'0'}

    Add the font family of your text here

write your code here: Coding Playground

Writing XML Files

Now, we will have a look at some methods which are used to write data on the XML files. In the code below we have to create a new XML file from the scratch.

Now at first, using ET.Element(‘chess’), we will make a parent tag (root) which will be under the chess. Now when the root is defined, the other subtype elements are made under that root tag. After it using the, we will make a subtag named Opening which will be under the chess tag ET.SubElement(). Now using set() (a tag found under a SubElement(), used to determine the attribute to a tag), under the Opening tag, we will create two more tags namely E4 lying under the E4 and D4 tags. and D4. After it uses the attribute text which is inside the SubElement function, we will add text. Now at last we will convert the data type of the content to bytes objects from the ‘XML.etree.ElementTree.Element’ with the help of a function named as toString(). Lastly, we will flush all the data to a file named gos.xml which will be opened in writing binary mode.

Example

import xml.etree.ElementTree as ET

d = ET.Element('chess')

e1 = ET.SubElement(d, 'Opening')

s_e1 = ET.SubElement(e1, 'E4')
s_e2 = ET.SubElement(e1, 'D4')

s_e1.set('type', 'Accepted')
s_e2.set('type', 'Declined')

s_e1.text = "King's Gambit Accepted"
s_e2.text = "Queen's Gambit Declined"

b_xml = ET.tostring(d)

with open("GFG.xml", "wb") as x:
x.write(b_xml)

Output

<chess>
    <Opening>
        <E4 type="Accepted">
            King's Gambit Accepted
        </E4>
        <D4 type="Declined">
            Queen's Gambit Declined
        </ D4>
    </Opening>
</chess>

write your code here: Coding Playground