Reading An Excel With Pandas Basing On Columns' Colors

October 30, 2024 Post a Comment

I have an xlsx file, with columns with various coloring. I want to read only the white columns of this excel in python using pandas, but I have no clues on hot to do this. I am ab

Solution 1:

(Disclosure: I'm one of the authors of the library I'm going to suggest)

With StyleFrame (that wraps pandas) you can read an excel file into a dataframe without loosing the style data.

Consider the following sheet:

And the following code:

Baca Juga

from styleframe import StyleFrame, utils
# from StyleFrame import StyleFrame, utils (if using version < 3.X)

sf = StyleFrame.read_excel('test.xlsx', read_style=True)
print(sf)

#          b  p                  y#     0  nan  3             1000.0#     1  3.0  4                2.0#     2  4.0  5  42902.72396767148

sf = sf[[col for col in sf.columns
         if col.style.fill.fgColor.rgb in ('FFFFFFFF', utils.colors.white)]]
         # "white" can be represented as 'FFFFFFFF' or# '00FFFFFF' (which is what utils.colors.white is set to)print(sf)

#          b#    0   nan#    1   3.0#    2   4.0

Solution 2:

This can not be done in pandas. You will need to use other library to read the xlsx file and determine what columns are white. I'd suggest using openpyxl library.

Then your script will follow this steps:

Open xlsx file
Read and filter the data (you can access the cell color) and save the results
Create pandas dataframe

Edit: Switched xlrd to openpyxl as xlrd is no longer actively maintained

tmahurin

Reading An Excel With Pandas Basing On Columns' Colors

Solution 1:

Solution 2:

Post a Comment for "Reading An Excel With Pandas Basing On Columns' Colors"

Widget HTML #3