Reading An Excel With Pandas Basing On Columns' Colors
I have an xlsx file, with columns with various coloring. I want to read only the white columns of this excel in python using pandas, but I have no clues on hot to do this. I am ab
Solution 1:
(Disclosure: I'm one of the authors of the library I'm going to suggest)
With StyleFrame (that wraps pandas) you can read an excel file into a dataframe without loosing the style data.
Consider the following sheet:
And the following code:
from styleframe import StyleFrame, utils
# from StyleFrame import StyleFrame, utils (if using version < 3.X)
sf = StyleFrame.read_excel('test.xlsx', read_style=True)
print(sf)
# b p y# 0 nan 3 1000.0# 1 3.0 4 2.0# 2 4.0 5 42902.72396767148
sf = sf[[col for col in sf.columns
if col.style.fill.fgColor.rgb in ('FFFFFFFF', utils.colors.white)]]
# "white" can be represented as 'FFFFFFFF' or# '00FFFFFF' (which is what utils.colors.white is set to)print(sf)
# b# 0 nan# 1 3.0# 2 4.0
Solution 2:
This can not be done in pandas. You will need to use other library to read the xlsx file and determine what columns are white. I'd suggest using openpyxl
library.
Then your script will follow this steps:
- Open xlsx file
- Read and filter the data (you can access the cell color) and save the results
- Create pandas dataframe
Edit: Switched xlrd
to openpyxl
as xlrd
is no longer actively maintained
Post a Comment for "Reading An Excel With Pandas Basing On Columns' Colors"