XML to CSV Conversion Using Python

Hello coders!! In this article we will learn how to convert an xml file into a csv file in python. So without much ado lets directly dive into the topic.

What is XML?

XML stands for Extensible Markup Language. It is a hierarchical data used to track data (small to medium amount) without using SQL as the backbone. It is designed to store the data and also transfer it. As a result, the XML format data is not very user-friendly, which gives rise to the need to change the format into a more easy and user-friendly one, like CSV.

What is CSV?

CSV stands for Comma Separated Values. As the name suggests, the data is separated using comma as delimiters. It is the most used format for import and export format of spreadsheets and databases. 

Converting an xml to csv file:

Consider the following XML file :

<State>
<Resident Id="100">
<Name>John Doe</Name>
	<Phone>1234567891</Phone>
	<Email>[email protected]</Email>
	
</Resident>
<Resident Id="101">
	<Name>Jane Doe</Name>
	<Phone>1234567891</Phone>
	<Email>[email protected]</Email>
	
</Resident>
.
.
.
.
</State>

In this xml file we are storing the details of the residents of a state. The information stored are:

  • Id
  • Name
  • Phone Number
  • Email Address

Here, we have taken only two details as an example:

Python Program to Convert XML to CSV

From the above example, we can understand that if the number of residents increases, it becomes difficult to read and understand the data. This is the reason why we will now convert this XML format file into a CSV format file.

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("sample.xml")
root = tree.getroot()

Resident_data = open('ResidentData.csv', 'w')

csvwriter = csv.writer(Resident_data)
resident_head = []

count = 0
for member in root.findall('Resident'):
	resident = []
	address_list = []
	if count == 0:
		name = member.find('Name').tag
		resident_head.append(name)
		Phone = member.find('Phone').tag
		resident_head.append(Phone)
		Email = member.find('Email').tag
		resident_head.append(Email)
		
		csvwriter.writerow(resident_head)
		count = count + 1

	name = member.find('Name').text
	resident.append(name)
	Phone = member.find('Phone').text
	resident.append(Phone)
	Email = member.find('Email').text
	resident.append(Email)
	
	csvwriter.writerow(resident)


Resident_data.close()

The output of above code:

NamePhoneNumberEmailAddress
John Doe1234567891[email protected]
Jane Doe1234567891[email protected]
Output

Explanation of code for Converting Python XML to CSV:

At first,two modules are imported:

  • xml.etree.ElementTree: to create a simple API for parsing and implementing XML data
  • CSV: to access the data in CSV format

The next two functions is used to parse through the xml file:

  • parse(): to parses= the ‘Sample.xml’ file
  • getroot(): returns the root element of ‘Sample.xml’

Next, we have opened a file ResidentData.csv dor writing

We then created a CSV writer object Resident_data

Then We iterated through the XML file, starting from the root element

We directly added the details of the root element to our CSV format file and incremented the counter value

For other entries we appended the details to the CSV format file without changing the counter value

After each entry we have also closed the particular row

When all the data is entered, we close the CSV format file as well.

Must Read

Conclusion: Python XML to CSV

In this article, we learned about the conversion of Python XML and CSV format. We saw examples for each and further learned a python implementation of converting an XML file to a CSV file in Python.

However, if you have any doubts or questions, do let me know in the comment section below. I will try to help you as soon as possible.

Happy Pythoning!

Subscribe
Notify of
guest
9 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Ian Green
Ian Green
3 years ago

This is very neat, but how about an example structured differently, such as where field names are named in a “key’ and values in a “string”, like this? (Yes, this is a .plist export from some management software.)

<dict>
<key>C02GD…1DV13</key>
<dict>
<key>architecture</key>
<string>i386</string>
<key>cn</key>
<string>mb-e2-07</string>
<key>dstudio-bootcamp-windows-computer-name</key>
<string></string>
<key>dstudio-host-ard-field-1</key>
<string></string>
<key>dstudio-host-ard-field-2</key>
<string></string>
<key>dstudio-host-ard-field-3</key>
<string></string>
<key>dstudio-host-ard-field-4</key>
<string></string>
<key>dstudio-host-ard-ignore-empty-fields</key>
<string>NO</string>
<key>dstudio-host-model-identifier</key>
<string>MacBookPro8,1</string>
<key>dstudio-host-primary-key</key>
<string>dstudio-host-serial-number</string>
<key>dstudio-host-serial-number</key>
<string>C02GD…1DV13</string>
<key>dstudio-host-type</key>
<string>Mac</string>
<key>dstudio-hostname</key>
<string>mb-e2-07</string>
<key>dstudio-mac-addr</key>
<string>3c:07:…:76:e0</string>
</dict>
<key>C02GF…SDV13</key>
<dict>
<key>architecture</key>
<string>i386</string>
<key>cn</key>
<string>loan-mac-music-05</string>
<key>dstudio-bootcamp-windows-computer-name</key>
<string></string>
<key>dstudio-host-ard-field-1</key>
<string></string>
<key>dstudio-host-ard-field-2</key>
<string></string>
<key>dstudio-host-ard-field-3</key>
<string></string>
<key>dstudio-host-ard-field-4</key>
<string></string>
<key>dstudio-host-ard-ignore-empty-fields</key>
<string>NO</string>
<key>dstudio-host-model-identifier</key>
<string>MacBookPro8,1</string>
<key>dstudio-host-primary-key</key>
<string>dstudio-host-serial-number</string>
<key>dstudio-host-serial-number</key>
<string>C02GF…SDV13</string>
<key>dstudio-host-type</key>
<string>Mac</string>
<key>dstudio-hostname</key>
<string>loan-mac-music-05</string>
<key>dstudio-last-workflow</key>
<string>394AF8D4-6854-…-9A69-FE8C3244BA59</string>
<key>dstudio-last-workflow-duration</key>
<string>0:00:00</string>
<key>dstudio-last-workflow-execution-date</key>
<string>2017/02/17, 13:13:35</string>
<key>dstudio-last-workflow-status</key>
<string>failed</string>
<key>dstudio-mac-addr</key>
<string>3c:07:…:79:81</string>
</dict>

Pratik Kinage
Admin
3 years ago
Reply to  Ian Green

Hi,
Since you have a different type of data structure, you need to change the code a little bit.
The following code will work best for you to form key and string pairs in CSV.

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("sample.xml")
root = tree.getroot()
data = open('data.csv', 'w',newline='')
csvwriter = csv.writer(data)
for member in root.findall('dict'):
for i, j in zip(member.findall('key'), member.findall('string')):
print(j.text)
if j.text is None:
csvwriter.writerow([i.text, "empty"])
else:
csvwriter.writerow([i.text, j.text])
data.close()

Kings
Kings
3 years ago

Thank you for the lesson. Very clean.

What if I want the ID to be in a separate column like:
ID Name. PhoneNumber. EmailAddress.
100 John Doe. 1234567891. [email protected]
101 Jane Doe. 1234567891. [email protected]

How do I go about achieving that?

Pratik Kinage
Admin
3 years ago
Reply to  Kings

Hi,

You can access the attribute of current member of XML by following code –
ids = member.attrib.get('Id')
then you can use resident.append(ids) to add Id attribute in your csv.

Regards,
Pratik

Kings
Kings
3 years ago
Reply to  Pratik Kinage

Thanks Pratik.
It worked.
Can I reach out to you privately? You can write me an email.
Thanks again.

Pratik Kinage
Admin
3 years ago
Reply to  Kings

Sure!

psb
psb
3 years ago

Example structured differently, such as where field names are named in a DataType Unit Valid No BondType No Value Max Min and values in a “string”, like this?
225:41,248:10,249:5,250:18,251:150,252:590,253:0,3001:
<ParaInfo>
    <DataType>Short</DataType>
    <Unit>ms</Unit>
    <Valid>True</Valid>
</ParaInfo>
<Groups>
    <Group>
        <No>2</No>
        <BondType>Normal Wire</BondType>
        <Stages>
            <Stage>
                <No>Contact</No>
                <Value>2</Value>
                <Max>2</Max>
                <Min>2</Min>
            </Stage>
            <Stage>
                <No>Base</No>
                <Value>5</Value>
                <Max>5</Max>
                <Min>2</Min>
            </Stage>
        </Stages>
    </Group>
</Groups>,3002:

Python Pool
Admin
3 years ago
Reply to  psb

I’ve created a simple xml parser for your XML given above.

import xml.etree.ElementTree as ET
 
tree = ET.parse("temp.xml")
root = tree.getroot()


for groups in root.findall('Groups'):
    for group in groups:
        print("No:", group.find("No").text)
        print("BondType:", group.find("BondType").text)
        for stages in group.findall("Stages"):
            for stage in stages.findall("Stage"):
                print("Stage begin")
                print("No:", stage.find("No").text)
                print("Value:", stage.find("Value").text)
                print("Max:", stage.find("Max").text)
                print("Min:", stage.find("Min").text)
                print("Stage end")

Integrate this with CSV or any other format you wish to export.

Also, make sure you add a root to your XML. So your XML will look like this –

<data>
    <ParaInfo>
        ...
    </ParaInfo>
    <Groups>
        ...
    </Groups>
</data>

Regards,
Pratik

Last edited 3 years ago by Python Pool
psb
psb
3 years ago
Reply to  Python Pool

I want to extract data from XML save to csv show the results as follows.Because I did it and it didn’t work like this.

Screenshot 2021-09-07 144849.png