Introduction
XML (stands for Extensible Markup Language), was designed so that it easily can be interpreted by both computers and humans. This language has some set of rules used to encode documents in a certain format. In this article, we will discuss methods that are used to read and write XML files.
Parsing is the process of reading and writing data in an XML file. In Parsing, the logical components of data are analyzed. Therefore when we say reading an XML file, we are referring to parsing the XML file.
In this article, we will see two libraries that are used for XML parsing. These two libraries are:
BeautifulSoup which is used alongside the LXML and XML parsers.
ElementTree Library.
BeautifulSoup
In Python, BeautifulSoup is a library that is used to read and write on an XML file. Now before using this library, we type the following command:
pip install beautifulsoup4 pip install lxml |
Reading Data From an XML File
Two steps are involved while parsing an XML file.
- Finding Tags
- Extracting from tags
Throughout the article, we will use the XML file that is mentioned below.
<?xml version="1.0" encoding="utf-8"?> <saranghe> <child name="Frank" test="0"> FRANK likes EVERYONE </child> <unique> Add a video URL in here </unique> <child name="Texas" test="1"> TEXAS is a PLACE </child> <child name="Frank" test="2"> Exclusively </child> <unique> Add a workbook URL here. </unique> <data> Add the content of your article here <family> Add the font family of your text here </family> <size> Add the font size of your text here </size> </data> </saranghe> |
Example 1
from bs4 import BeautifulSoup
with open('dict.xml', 'r') as f: d = f.read() Bs_d = BeautifulSoup(d, "xml") unq = Bs_d.find_all('unique')
print(unq) name = Bs_d.find('child', {'name':'Frank'})
print(b_name)
# Extracting the data stored in a # specific attribute of the # `child` tag value = b_name.get('test')
print(value) |
Output
[<unique> Add a video URL in here </unique>, <unique> Add a workbook URL here </unique>] <child name="Frank" test="0"> FRANK likes EVERYONE </child> 0 |
Example 2
from bs4 import BeautifulSoup
with open('dict.xml', 'r') as f: data = f.read()
bs_data = BeautifulSoup(data, 'xml')
for tag in bs_data.find_all('child', {'name':'Frank'}): tag['test'] = "WHAT !!"
print(bs_data.prettify()) |
Output
<?xml version="1.0" encoding="utf-8"?> <saranghe> <child name="Frank" test="WHAT !!"> FRANK likes EVERYONE </child> <unique> Add a video URL in here </unique> <child name="Texas" test="1"> TEXAS is a PLACE </child> <child name="Frank" test="WHAT !!"> Exclusively </child> <unique> Add a workbook URL here </unique> <data> Add the content of your article here <family> Add the font family of your text here </family> <size> Add the font size of your text here </size> </data> </saranghe> |
Using Elementree
This module gives us an abundance of tools to manipulate XML files. It is included in python’s standard library. And because of that, the user does not need to install anything externally.
Code
import xml.etree.ElementTree as ET t = ET.parse('dict.xml') r = t.getroot() print(r) print(r[0].attrib) print(r[5][0].text) |
Output
<Element 'saranghe' at 0x7fe97c2cee08> {'name': 'Frank', 'test':'0'}
Add the font family of your text here |
Writing XML Files
Now, we will have a look at some methods which are used to write data on the XML files. In the code below we have to create a new XML file from the scratch.
Now at first, using ET.Element(‘chess’), we will make a parent tag (root) which will be under the chess. Now when the root is defined, the other subtype elements are made under that root tag. After it using the, we will make a subtag named Opening which will be under the chess tag ET.SubElement(). Now using set() (a tag found under a SubElement(), used to determine the attribute to a tag), under the Opening tag, we will create two more tags namely E4 lying under the E4 and D4 tags. and D4. After it uses the attribute text which is inside the SubElement function, we will add text. Now at last we will convert the data type of the content to bytes objects from the ‘XML.etree.ElementTree.Element’ with the help of a function named as toString(). Lastly, we will flush all the data to a file named gos.xml which will be opened in writing binary mode.
Example
import xml.etree.ElementTree as ET
d = ET.Element('chess')
e1 = ET.SubElement(d, 'Opening')
s_e1 = ET.SubElement(e1, 'E4') s_e2 = ET.SubElement(e1, 'D4')
s_e1.set('type', 'Accepted') s_e2.set('type', 'Declined')
s_e1.text = "King's Gambit Accepted" s_e2.text = "Queen's Gambit Declined"
b_xml = ET.tostring(d)
with open("GFG.xml", "wb") as x: x.write(b_xml) |
Output
<chess> <Opening> <E4 type="Accepted"> King's Gambit Accepted </E4> <D4 type="Declined"> Queen's Gambit Declined </ D4> </Opening> </chess> |