Read and Write Data from Excel using ‘Openpyxl’

The Python library openpyxl reads and writes Excel 2010 and later files with the .xlsx, .xlsm, .xltx, and .xltm extensions. It lets developers work with Excel files programmatically: besides reading and writing cell data, it offers functions for managing Excel documents, such as creating charts, handling worksheets and their names, and applying conditional formatting.

Additionally, it provides more advanced capabilities for Excel document manipulation. It can help automate repetitive tasks, process large amounts of data, or integrate Excel with other tools.

Where it is used

Data analysts use it to automate reading, writing, and modifying data in Excel files as part of large-scale data analysis, manipulation, and visualization tasks.

For example, a person could use openpyxl to read an Excel file, perform data transformations, and then write the transformed data back to the same file or to a different one.
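That read-transform-write workflow might look like the following minimal sketch. So that it runs on its own, it first creates a small sample file; the file names prices.xlsx and prices_updated.xlsx, the column layout, and the 10% price increase are all hypothetical.

```python
import openpyxl

# Create a small sample workbook so the example is self-contained
# (hypothetical data: product names in column A, prices in column B).
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Product", "Price"])
ws.append(["Pen", 1.50])
ws.append(["Notebook", 3.00])
wb.save("prices.xlsx")

# 1. Read: load the existing file and select its active sheet.
wb = openpyxl.load_workbook("prices.xlsx")
ws = wb.active

# 2. Transform: add a column C holding each price raised by 10%.
ws["C1"] = "New price"
for row in ws.iter_rows(min_row=2, max_col=2):
    price_cell = row[1]  # column B of the current row
    ws.cell(row=price_cell.row, column=3, value=round(price_cell.value * 1.10, 2))

# 3. Write: save to a different file, leaving prices.xlsx untouched.
wb.save("prices_updated.xlsx")
```

Saving to a new file name keeps the original data intact; passing the same path to save() would overwrite it instead.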

Typical tasks that involve working with Excel files include:

  • Automating data processing and analysis tasks
  • Integrating Excel data with other systems
  • Automating repetitive tasks in Excel
  • Working with large amounts of data in Excel

Individuals and organizations in many fields, especially those already working in Python, can use it to extend their data processing to Excel files.

Functions of openpyxl:

Openpyxl supports many features for reading, writing, and manipulating Excel files, including:

  • Reading data from an existing file (.xlsx file)
  • Writing data to an existing or new file
  • Modifying cells, rows, and columns in an excel file
  • Adding charts, images, and tables to a .xlsx file
  • Formatting cells, for example font size, color, and bold text
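As an illustration of the chart support in this list, the sketch below builds a bar chart from a small table using BarChart and Reference from openpyxl.chart. The data, chart title, anchor cell D2, and file name are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active

# Hypothetical sample data: a header row followed by three data rows
for row in [("Fruit", "Quantity"), ("Apples", 5), ("Pears", 3), ("Plums", 8)]:
    ws.append(row)

chart = BarChart()
chart.title = "Fruit stock"

# Values in column B; the header in B1 becomes the series name
data = Reference(ws, min_col=2, min_row=1, max_row=4)
# Category labels in column A, excluding the header
categories = Reference(ws, min_col=1, min_row=2, max_row=4)

chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)

# Place the chart's top-left corner at cell D2
ws.add_chart(chart, "D2")
wb.save("chart_example.xlsx")
```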

Let’s look in detail at reading and writing Excel data with some simple code.

Reading data from a File

Reading data with openpyxl means loading the contents of an existing Excel (.xlsx) file into Python objects that your code can work with. Follow these steps:

Import the openpyxl library: Start by importing the openpyxl library into your Python code using the following line:

import openpyxl

Load the workbook: Use the openpyxl.load_workbook() function to load the Excel file into your Python code. The function takes the file path of the Excel file as an argument. 

workbook = openpyxl.load_workbook("example.xlsx")

Select the worksheet: An Excel file contains one or more worksheets. To access a specific worksheet, use the workbook[sheet_name] syntax, where sheet_name is the name of the worksheet you want. Alternatively, you can access the active worksheet through the workbook.active property.

sheet = workbook.active

Access the cells: Once you have selected the worksheet, you can access individual cells within it. To access a cell, call the worksheet's cell() method, passing the row and column numbers (both starting at 1) as arguments. You can also use A1-style notation, such as sheet["A1"].

cell = sheet.cell(row=1, column=1)

Retrieve the cell value: To get the data stored in a cell, read the value attribute of the cell object.

cell_value = cell.value

These are the basic steps for reading data from an existing file with openpyxl.
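Putting the reading steps together, a minimal sketch might look like this. To keep it runnable on its own, it first writes a small example.xlsx; the sheet name "Scores" and its contents are hypothetical. It also uses iter_rows(values_only=True), which yields plain tuples of values instead of cell objects.

```python
import openpyxl

# Create a small example.xlsx first so the sketch runs on its own
# (hypothetical sheet name and data).
setup = openpyxl.Workbook()
setup.active.title = "Scores"
for row in [("Name", "Score"), ("Alice", 91), ("Bob", 78)]:
    setup.active.append(row)
setup.save("example.xlsx")

# Load the workbook and list the names of its worksheets
workbook = openpyxl.load_workbook("example.xlsx")
print(workbook.sheetnames)

# Select a worksheet by name (workbook.active would also work here)
sheet = workbook["Scores"]

# Read a single cell by row and column number (both start at 1)
header = sheet.cell(row=1, column=1).value

# Read every data row as a tuple of values, skipping the header row
records = []
for name, score in sheet.iter_rows(min_row=2, values_only=True):
    records.append((name, score))

print(header, records)
```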

Writing data to an Excel file

Writing data with openpyxl means adding or updating data in an Excel (.xlsx) file from Python, either in an existing file or in a new one that you create. It involves the following steps:

Import the openpyxl library: Start by importing the openpyxl library into your Python code, just as when reading data:

import openpyxl 

Load or create the workbook: To write to an existing file, use the openpyxl.load_workbook() function to load it, passing the file path of the Excel file as an argument.

workbook = openpyxl.load_workbook("existing_file.xlsx")

To create a new file instead, instantiate the openpyxl.Workbook class:

workbook = openpyxl.Workbook()

Select or create the worksheet: Use workbook.active to get the active worksheet, or workbook.create_sheet(title) to add a new one.

sheet = workbook.active

Write to cells: To write data to a cell, call the worksheet's cell() method with the row and column numbers, then set the value attribute of the returned cell object to the data you want to write.

sheet.cell(row=1, column=1).value = "Data 1"
sheet.cell(row=2, column=1).value = "Data 2"

Save the workbook: Finally, save the workbook with the workbook.save() method, passing the file path as an argument. If a file already exists at that path, it is overwritten.

workbook.save("new_file.xlsx")

These are the basic steps for writing data to an existing or a new file.
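The writing steps can be combined into one short sketch: it creates a new workbook, adds a second sheet with create_sheet(), saves it, then reopens the file and updates a cell. The file name new_file.xlsx comes from the steps above; the sheet title "Summary" and the values are hypothetical.

```python
import openpyxl

# Create a new workbook and write to its active sheet
workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.cell(row=1, column=1).value = "Data 1"
sheet.cell(row=2, column=1).value = "Data 2"

# Add a second worksheet and write to it with A1-style indexing
summary = workbook.create_sheet("Summary")
summary["A1"] = "Total rows"
summary["B1"] = 2

workbook.save("new_file.xlsx")

# Reopen the saved file, update an existing cell, and save again
workbook = openpyxl.load_workbook("new_file.xlsx")
workbook.active.cell(row=2, column=1).value = "Data 2 (updated)"
workbook.save("new_file.xlsx")
```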

Features of openpyxl:

The openpyxl library supports various features. You can use them to format and manipulate the contents of Microsoft Excel (.xlsx) files in Python. To use these features, load (or create) the workbook, select the sheet, and then access the cells and their properties. You can then use the available methods and attributes to set cell values, add hyperlinks, merge cells, apply bold formatting, etc. Finally, save the workbook to persist the modifications.

The features covered in detail below are:

  • openpyxl append
  • openpyxl bold
  • openpyxl hyperlink
  • openpyxl conditional formatting
  • openpyxl merge cells

Openpyxl append:

“Append” refers to adding new data to the end of an existing sheet in an Excel workbook. To do this with openpyxl, load the existing workbook, select the sheet you want to add data to, find the last used row (ws.max_row), and write the new values into the row after it. The Worksheet.append() method does this in one call: it takes a list or tuple of values and writes them to the row after the last row in use. Finally, save the workbook to keep the changes.
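A minimal sketch of both approaches, assuming a hypothetical sales.xlsx that holds a header row and one data row (the file is created first so the example is self-contained):

```python
import openpyxl

# Hypothetical existing file with a header and one data row
wb = openpyxl.Workbook()
wb.active.append(["Date", "Amount"])
wb.active.append(["2024-01-01", 120])
wb.save("sales.xlsx")

# Load the existing workbook and select the sheet
wb = openpyxl.load_workbook("sales.xlsx")
ws = wb.active
print(ws.max_row)  # index of the last row currently in use

# append() writes the values into the row after the last used row
ws.append(["2024-01-02", 95])

# Equivalent manual approach: compute the next row and fill its cells
next_row = ws.max_row + 1
ws.cell(row=next_row, column=1, value="2024-01-03")
ws.cell(row=next_row, column=2, value=140)

wb.save("sales.xlsx")
```

append() is usually the simpler choice; the manual version is useful when you need to write to specific columns only.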

Openpyxl bold

Bold in openpyxl refers to a font style that makes the text in a cell appear in a bold typeface. To apply it, assign a Font object with bold=True to the cell's font attribute, for example cell.font = Font(bold=True), where Font is imported from openpyxl.styles.
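For example, the following sketch makes a header row bold. Style objects in openpyxl are immutable, which is why the code assigns a new Font instead of changing cell.font.bold in place. The column names, values, and file name are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active
ws.append(["Name", "Score"])
ws.append(["Alice", 91])
ws.append(["Bob", 78])

# ws[1] is a tuple of the cells in row 1; give each a bold font
bold = Font(bold=True)
for cell in ws[1]:
    cell.font = bold

wb.save("bold_header.xlsx")
```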

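Openpyxl hyperlink

To add a hyperlink to a cell in an Excel worksheet, set the cell's hyperlink attribute. You can assign either a URL string or an openpyxl.worksheet.hyperlink.Hyperlink object, whose target parameter holds the link address.

Here’s a code example of how to add a hyperlink to a cell: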

from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.worksheet.hyperlink import Hyperlink

wb = Workbook()
ws = wb.active

# Make the text in cell A1 bold
c = ws['A1']
c.font = Font(bold=True)

# Add a hyperlink to cell A1; the tooltip appears when hovering over the link
url = "https://www.example.com"
c.value = "Example Website"
c.hyperlink = Hyperlink(ref="A1", target=url, tooltip="Visit the Website")

wb.save("example.xlsx")

Running this script saves example.xlsx with a clickable hyperlink in cell A1.
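The sketch below uses the plain-string form instead and then reads the link back from the saved file. When the cell is empty, openpyxl also uses the URL as the cell's displayed text. The URL, cell, and file name are hypothetical, and applying the built-in "Hyperlink" cell style is optional.

```python
from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active

# Assigning a string makes openpyxl build the Hyperlink object itself.
# Because B2 is empty, its displayed value becomes the URL.
ws["B2"].hyperlink = "https://www.example.com/docs"

# Optional: apply Excel's built-in "Hyperlink" cell style (blue, underlined)
ws["B2"].style = "Hyperlink"

wb.save("links.xlsx")

# Read the link target back from the saved file
link = load_workbook("links.xlsx").active["B2"].hyperlink
print(link.target)
```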

Conditional formatting using openpyxl:

Openpyxl also lets you add conditional formatting to worksheets. Use the conditional_formatting attribute of a worksheet object to add conditional formatting rules to a cell or range of cells.

Here is a simple example: a color scale rule that varies each cell’s color with its value, so that low values appear red and high values appear green. In the code below, values at or below the 10th percentile are fully red and values at or above the 90th percentile are fully green. To add the rule, call the add() method of the conditional_formatting attribute, passing the cell range and the rule to apply.

from openpyxl import Workbook
from openpyxl.formatting.rule import ColorScaleRule

wb = Workbook()
ws = wb.active

data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

for row in data:
    ws.append(row)  # write each list as a new row

color_scale = ColorScaleRule(start_type='percentile', start_value=10, start_color='FF0000',
                             end_type='percentile', end_value=90, end_color='00FF00')
ws.conditional_formatting.add('A1:C3', color_scale)  # apply the rule to A1:C3

wb.save('conditional_formatting.xlsx')  # save the workbook
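Besides color scales, openpyxl.formatting.rule provides other rule types. As a further sketch, CellIsRule highlights cells that satisfy a comparison; here, values greater than 5 get a light-red fill and dark-red text. The range, threshold, colors, and file name are hypothetical choices.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import Font, PatternFill

wb = Workbook()
ws = wb.active
for row in [[1, 2, 3], [4, 5, 6], [7, 8, 9]]:
    ws.append(row)

# Excel applies this fill and font to any cell in A1:C3 whose value is > 5
highlight = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
rule = CellIsRule(operator="greaterThan", formula=["5"],
                  fill=highlight, font=Font(color="9C0006"))
ws.conditional_formatting.add("A1:C3", rule)

wb.save("cell_is_rule.xlsx")
```

Note that the formatting is evaluated by Excel when the file is opened; openpyxl only stores the rule.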

Merge cells in openpyxl:

Openpyxl can also merge cells in a worksheet. Call the merge_cells() method of the Worksheet object with a range string to merge a rectangular block of cells.

For example, ws.merge_cells('A1:C1') merges the cells from A1 to C1. Merging cells helps format a worksheet and make it more readable. However, only the value of the top-left cell in the merged range is preserved; the values of the other cells are discarded. So make sure the data you want to keep is in the top-left cell before merging.
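A minimal sketch of merging a title row, with hypothetical text and file name. It shows that the top-left value survives while the value placed in B1 is lost, and it centers the title with an Alignment style:

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment

wb = Workbook()
ws = wb.active

ws["A1"] = "Quarterly Report"  # top-left cell: kept after merging
ws["B1"] = "discarded"         # lost when A1:C1 is merged

ws.merge_cells("A1:C1")

# Center the title across the merged block
ws["A1"].alignment = Alignment(horizontal="center")

wb.save("merged.xlsx")
```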

Comparison between openpyxl and xlrd:

Openpyxl | xlrd
A library for reading and writing Excel 2010 .xlsx/.xlsm/.xltx/.xltm files. | A library for reading data and formatting information from Excel files. Since version 2.0, it reads only legacy .xls files; earlier versions also read .xlsx.
Existing files are opened with openpyxl.load_workbook(); new ones are created with openpyxl.Workbook(). | Files are opened with xlrd.open_workbook().
Can create and modify Excel spreadsheets and offers many formatting and styling options. | Cannot write or modify Excel files; it only reads them.
Cell ranges can be addressed directly, e.g. ws['A1:C10'] or ws.iter_rows(min_row=2, max_row=10). | Data is read through row and cell accessors such as sheet.row_values() and sheet.cell_value().

Comparison between openpyxl and pandas

Openpyxl | Pandas
Reads, writes, and modifies Excel files. | A general library for data analysis and manipulation.
Represents data as the cells, rows, and columns of an Excel worksheet. | Represents data in a DataFrame, a two-dimensional labeled data structure.
Provides operations for reading, writing, and formatting Excel files at the cell level. | Provides a rich set of data analysis and manipulation operations, including filtering, aggregation, and transformation.
Operates cell by cell, which is slow for analyzing large datasets. | Loads data into memory as a DataFrame and uses vectorized operations, which is much faster for analyzing large datasets.

FAQs

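Which is better, openpyxl or XlsxWriter?

It depends on the task. Openpyxl can read, write, and modify existing Excel files, while XlsxWriter can only create new files. XlsxWriter, however, is optimized for fast writing and low memory use and also offers extensive formatting and charting options. Choose openpyxl when you need to read or edit existing workbooks, and XlsxWriter when you are generating large new reports.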

Which is better, Pandas or openpyxl?

If you’re working with large amounts of data and need to perform complex analysis, Pandas is likely the better choice. If you need cell-level control over an Excel file, such as formatting, charts, or merged cells, openpyxl is the better choice.

Is openpyxl needed for Pandas?

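No, openpyxl is not a required dependency of Pandas, so installing Pandas does not install it. However, Pandas relies on optional engine libraries to handle Excel files: openpyxl is the default engine that pandas.read_excel() uses for .xlsx files, and DataFrame.to_excel() uses it to write .xlsx files when XlsxWriter is not installed. In practice, you need openpyxl (or another suitable engine) installed to work with .xlsx files in Pandas.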

Conclusion

In short, openpyxl can be used to:

  • Read and write data to/from Excel files
  • Create new Excel files
  • Modify existing Excel files
  • Extract information from Excel files

We also looked at several of its features and compared openpyxl with xlrd and with pandas.

Using openpyxl, you can save time and effort by automating Excel tasks and integrating Excel with other tools and systems.
