Hello coders!! In this article we will learn how to convert an xml file into a csv file in python. So without much ado lets directly dive into the topic.
What is XML?
XML stands for Extensible Markup Language. It is a hierarchical data used to track data (small to medium amount) without using SQL as the backbone. It is designed to store the data and also transfer it. As a result, the XML format data is not very user-friendly, which gives rise to the need to change the format into a more easy and user-friendly one, like CSV.
What is CSV?
CSV stands for Comma Separated Values. As the name suggests, the data is separated using comma as delimiters. It is the most used format for import and export format of spreadsheets and databases.
Converting an xml to csv file:
Consider the following XML file :
<State>
<Resident Id="100">
<Name>John Doe</Name>
<Phone>1234567891</Phone>
<Email>[email protected]</Email>
</Resident>
<Resident Id="101">
<Name>Jane Doe</Name>
<Phone>1234567891</Phone>
<Email>[email protected]</Email>
</Resident>
.
.
.
.
</State>
In this xml file we are storing the details of the residents of a state. The information stored are:
- Id
- Name
- Phone Number
- Email Address
Here, we have taken only two details as an example:
- Id: 100
- Name: John Doe
- Phone: 1234567891
- Email: [email protected]
- Id: 101
- Name: Jane Doe
- Phone: 1234567891
- Email: [email protected]
Python Program to Convert XML to CSV
From the above example, we can understand that if the number of residents increases, it becomes difficult to read and understand the data. This is the reason why we will now convert this XML format file into a CSV format file.
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("sample.xml")
root = tree.getroot()
Resident_data = open('ResidentData.csv', 'w')
csvwriter = csv.writer(Resident_data)
resident_head = []
count = 0
for member in root.findall('Resident'):
resident = []
address_list = []
if count == 0:
name = member.find('Name').tag
resident_head.append(name)
Phone = member.find('Phone').tag
resident_head.append(Phone)
Email = member.find('Email').tag
resident_head.append(Email)
csvwriter.writerow(resident_head)
count = count + 1
name = member.find('Name').text
resident.append(name)
Phone = member.find('Phone').text
resident.append(Phone)
Email = member.find('Email').text
resident.append(Email)
csvwriter.writerow(resident)
Resident_data.close()
The output of above code:
Name | PhoneNumber | EmailAddress |
---|---|---|
John Doe | 1234567891 | [email protected] |
Jane Doe | 1234567891 | [email protected] |
Explanation of code for Converting Python XML to CSV:
At first,two modules are imported:
- xml.etree.ElementTree: to create a simple API for parsing and implementing XML data
- CSV: to access the data in CSV format
The next two functions is used to parse through the xml file:
- parse(): to parses= the ‘Sample.xml’ file
- getroot(): returns the root element of ‘Sample.xml’
Next, we have opened a file ResidentData.csv dor writing
We then created a CSV writer object Resident_data
Then We iterated through the XML file, starting from the root element
We directly added the details of the root element to our CSV format file and incremented the counter value
For other entries we appended the details to the CSV format file without changing the counter value
After each entry we have also closed the particular row
When all the data is entered, we close the CSV format file as well.
Must Read
- Using Pandas to CSV() with Perfection
- Python int to Binary | Integer to Binary Conversion
- [Best] Ways to Delete a File in Python
- Convert Text File to PDF Using Python | FPDF
Conclusion: Python XML to CSV
In this article, we learned about the conversion of Python XML and CSV format. We saw examples for each and further learned a python implementation of converting an XML file to a CSV file in Python.
However, if you have any doubts or questions, do let me know in the comment section below. I will try to help you as soon as possible.
Happy Pythoning!
This is very neat, but how about an example structured differently, such as where field names are named in a “key’ and values in a “string”, like this? (Yes, this is a .plist export from some management software.)
<dict>
<key>C02GD…1DV13</key>
<dict>
<key>architecture</key>
<string>i386</string>
<key>cn</key>
<string>mb-e2-07</string>
<key>dstudio-bootcamp-windows-computer-name</key>
<string></string>
<key>dstudio-host-ard-field-1</key>
<string></string>
<key>dstudio-host-ard-field-2</key>
<string></string>
<key>dstudio-host-ard-field-3</key>
<string></string>
<key>dstudio-host-ard-field-4</key>
<string></string>
<key>dstudio-host-ard-ignore-empty-fields</key>
<string>NO</string>
<key>dstudio-host-model-identifier</key>
<string>MacBookPro8,1</string>
<key>dstudio-host-primary-key</key>
<string>dstudio-host-serial-number</string>
<key>dstudio-host-serial-number</key>
<string>C02GD…1DV13</string>
<key>dstudio-host-type</key>
<string>Mac</string>
<key>dstudio-hostname</key>
<string>mb-e2-07</string>
<key>dstudio-mac-addr</key>
<string>3c:07:…:76:e0</string>
</dict>
<key>C02GF…SDV13</key>
<dict>
<key>architecture</key>
<string>i386</string>
<key>cn</key>
<string>loan-mac-music-05</string>
<key>dstudio-bootcamp-windows-computer-name</key>
<string></string>
<key>dstudio-host-ard-field-1</key>
<string></string>
<key>dstudio-host-ard-field-2</key>
<string></string>
<key>dstudio-host-ard-field-3</key>
<string></string>
<key>dstudio-host-ard-field-4</key>
<string></string>
<key>dstudio-host-ard-ignore-empty-fields</key>
<string>NO</string>
<key>dstudio-host-model-identifier</key>
<string>MacBookPro8,1</string>
<key>dstudio-host-primary-key</key>
<string>dstudio-host-serial-number</string>
<key>dstudio-host-serial-number</key>
<string>C02GF…SDV13</string>
<key>dstudio-host-type</key>
<string>Mac</string>
<key>dstudio-hostname</key>
<string>loan-mac-music-05</string>
<key>dstudio-last-workflow</key>
<string>394AF8D4-6854-…-9A69-FE8C3244BA59</string>
<key>dstudio-last-workflow-duration</key>
<string>0:00:00</string>
<key>dstudio-last-workflow-execution-date</key>
<string>2017/02/17, 13:13:35</string>
<key>dstudio-last-workflow-status</key>
<string>failed</string>
<key>dstudio-mac-addr</key>
<string>3c:07:…:79:81</string>
</dict>
Hi,
Since you have a different type of data structure, you need to change the code a little bit.
The following code will work best for you to form key and string pairs in CSV.
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("sample.xml")
root = tree.getroot()
data = open('data.csv', 'w',newline='')
csvwriter = csv.writer(data)
for member in root.findall('dict'):
for i, j in zip(member.findall('key'), member.findall('string')):
print(j.text)
if j.text is None:
csvwriter.writerow([i.text, "empty"])
else:
csvwriter.writerow([i.text, j.text])
data.close()
Thank you for the lesson. Very clean.
What if I want the ID to be in a separate column like:
ID Name. PhoneNumber. EmailAddress.
100 John Doe. 1234567891. [email protected]
101 Jane Doe. 1234567891. [email protected]
How do I go about achieving that?
Hi,
You can access the attribute of current member of XML by following code –
ids = member.attrib.get('Id')
then you can use
resident.append(ids)
to add Id attribute in your csv.Regards,
Pratik
Thanks Pratik.
It worked.
Can I reach out to you privately? You can write me an email.
Thanks again.
Sure!
Example structured differently, such as where field names are named in a DataType Unit Valid No BondType No Value Max Min and values in a “string”, like this?
225:41,248:10,249:5,250:18,251:150,252:590,253:0,3001:
<ParaInfo>
<DataType>Short</DataType>
<Unit>ms</Unit>
<Valid>True</Valid>
</ParaInfo>
<Groups>
<Group>
<No>2</No>
<BondType>Normal Wire</BondType>
<Stages>
<Stage>
<No>Contact</No>
<Value>2</Value>
<Max>2</Max>
<Min>2</Min>
</Stage>
<Stage>
<No>Base</No>
<Value>5</Value>
<Max>5</Max>
<Min>2</Min>
</Stage>
</Stages>
</Group>
</Groups>,3002:
I’ve created a simple xml parser for your XML given above.
Integrate this with CSV or any other format you wish to export.
Also, make sure you add a root to your XML. So your XML will look like this –
Regards,
Pratik
I want to extract data from XML save to csv show the results as follows.Because I did it and it didn’t work like this.