Overview excel structure

2017-12-30_10-40-27

Install working environment

Install python & openpyxl with anaconda

Both Python and openpyxl could be installed by Anaconda. Go to download page and download Anaconda for Python 3.6. After download, just double click to install with default option.

2017-12-25_13-36-16

After install, open command prompt and typing in python . The anaconda and python version should show up as below

2017-12-25_13-39-22

From python shell, import openpyxl to make sure that it is installed successfully.

2017-12-25_13-41-35

Install Sublime as python IDE

In this course I use Sublimetext as Python IDE to writing code (you can choose any other text edition you want). I choose Sublimtext due to :

Part 1 : Working with existing xlsx file

Loading workbook from local file

First thing we need to do is import module openpyxl with

import openpyxl

This module provide function allow us to to create a workbook or load a workbook from a file. In the first part of this course we will load a existing workbook from a 500.xlsx file.

# loading workbook from file
wb = openpyxl.load_workbook('500.xlsx')

Access worksheet from workbook

After load workbook, we could access current active worksheet with code. Only has one active worksheet at a time, and you could access data only from active worksheet. Default active worksheet is the first sheet.

# get current active worksheet
current_active_worksheet = wb.get_active_sheet()
print(current_active_worksheet)

To get name of all worksheets

# get all sheet name
all_sheet_names = wb.get_sheet_names()
print(all_sheet_names)

Now, you can set active sheet by name

current_active_worksheet = wb.get_sheet_by_name('uk-500')
print(current_active_worksheet)

Following loop could help to access all sheet

# for loop to access all sheet inside workbook
for ws in wb:
	print(ws.title)

Access one cell from worksheet

With MS Excel:

One cell could be accessed simply specified by combination column and row. And then actual data could be accessed with value cell property.

# print out data at column A and row 2 of current worksheet
print(ws['A2'].value)
# set value for one cell
ws['A2'] = 'Tan Pham'
# print out new data
print(ws['A2'].value)

Access one row, column from worksheet

If we only specify just column, we could access all cells inside that column

# print all data from column A (the first column)
for cell in ws['A']:
	print(cell.value)

If we only specify just row, we could access all cells inside that row

# print all data from row 1, which are columns name
for cell in ws['1']:
	print(cell.value)

Access all rows and columns from worksheet

Incase we want to scan the whole sheet, openpyl provide property call rows and columns.

# access all data , row by row
for row in ws.rows:
    for cell in row:
        print(cell.value)
        
# access all data , column by column
for column in ws.columns:
	for cell in column:
		print(cell.value)

Save workbook

After doing modify data, to save workbook with name.

# save workbook with name
wb.save("500.xlsx")

Part 2 : Working with new xlsx file

Create a new workbook

To create a new empty workbook. A workbook is always created with at least one worksheet.

from openpyxl import Workbook
# create a new workbook
wb = Workbook()
# get the default worksheet
ws = wb.active
# set worksheet title
ws.title = "first sheet"

Add cell data

Cell data is set by combination of column and row.

# set value for first row as column name
ws['A1']='first column'
ws['B1']='second column'

# set value for second row
ws['A2']=1
ws['B2']=2

Save workbook

wb.save('new_file.xlsx')

Bonus : Document Reference Link

https://media.readthedocs.org/pdf/openpyxl/latest/openpyxl.pdf