Skip to content Skip to sidebar Skip to footer

Reading An Excel With Pandas Basing On Columns' Colors

I have an xlsx file, with columns with various coloring. I want to read only the white columns of this excel in python using pandas, but I have no clues on hot to do this. I am ab

Solution 1:

(Disclosure: I'm one of the authors of the library I'm going to suggest)

With StyleFrame (that wraps pandas) you can read an excel file into a dataframe without loosing the style data.

Consider the following sheet:

enter image description here

And the following code:

from styleframe import StyleFrame, utils
# from StyleFrame import StyleFrame, utils (if using version < 3.X)

sf = StyleFrame.read_excel('test.xlsx', read_style=True)
print(sf)

#          b  p                  y#     0  nan  3             1000.0#     1  3.0  4                2.0#     2  4.0  5  42902.72396767148

sf = sf[[col for col in sf.columns
         if col.style.fill.fgColor.rgb in ('FFFFFFFF', utils.colors.white)]]
         # "white" can be represented as 'FFFFFFFF' or# '00FFFFFF' (which is what utils.colors.white is set to)print(sf)

#          b#    0   nan#    1   3.0#    2   4.0

Solution 2:

This can not be done in pandas. You will need to use other library to read the xlsx file and determine what columns are white. I'd suggest using openpyxl library.

Then your script will follow this steps:

  1. Open xlsx file
  2. Read and filter the data (you can access the cell color) and save the results
  3. Create pandas dataframe

Edit: Switched xlrd to openpyxl as xlrd is no longer actively maintained

Post a Comment for "Reading An Excel With Pandas Basing On Columns' Colors"