Skip to content Skip to sidebar Skip to footer

Pandas Add New "rank" Columns For Every Column

I have a df like so (actual df has 4.5 mil rows, 23 cols): group feature col1 col2 col3 g1 f1 1 10 100 g1 f1 11 9 1000 g1 f2 0 8

Solution 1:

I would use groupby on ['group', 'feature'] to produce an intermediary dataframe containing the sum, avg and max columns (not the ranks), and then again groupby on group only to produce the ranks.

Intermediary dataframe:

df2 = pd.concat([
    df.iloc[:,[0,1,2]].groupby(['group', 'feature']).sum(),
    df.iloc[:,[0,1,3]].groupby(['group', 'feature']).mean(),
    df.iloc[:,[0,1,4]].groupby(['group', 'feature']).max()
    ], axis=1)

The intermediary dataframe is:

               col1      col2  col3
group feature                      
g1    f1         129.5000001000
      f2          08.000000200
g2    f1          27.000000330
      f2          37.000000331
      f3          17.000000100
g3    f1          77.666667101

Now for the final dataframe:

df3 = df2.groupby('group').rank(method='min', ascending=False).reset_index()

which finally gives:

group feature  col1  col2  col3
0    g1      f1   1.01.01.01    g1      f2   2.02.02.02    g2      f1   2.01.02.03    g2      f2   1.01.01.04    g2      f3   3.01.03.05    g3      f1   1.01.01.0

For the second part of the question, I would just change the indexing of the intermediary dataframe, and compute ranks after grouping on 'feature':

dfx4 = dfx.reset_index().set_index(['feature', 'group']
                                   ).sort_index().groupby('feature').rank(
                                   method='min', ascending=False
                                   ).reset_index()

which gives:

  feature group  col1  col2  col3
0      f1    g1   1.01.01.01      f1    g2   3.03.02.02      f1    g3   2.02.03.03      f2    g1   2.01.02.04      f2    g2   1.02.01.05      f3    g2   1.01.01.0

Post a Comment for "Pandas Add New "rank" Columns For Every Column"