Pandas Add New "rank" Columns For Every Column
I have a df like so (actual df has 4.5 mil rows, 23 cols): group feature col1 col2 col3 g1 f1 1 10 100 g1 f1 11 9 1000 g1 f2 0 8
Solution 1:
I would use groupby
on ['group', 'feature']
to produce an intermediary dataframe containing the sum, avg and max columns (not the ranks), and then again groupby
on group
only to produce the ranks.
Intermediary dataframe:
df2 = pd.concat([
df.iloc[:,[0,1,2]].groupby(['group', 'feature']).sum(),
df.iloc[:,[0,1,3]].groupby(['group', 'feature']).mean(),
df.iloc[:,[0,1,4]].groupby(['group', 'feature']).max()
], axis=1)
The intermediary dataframe is:
col1 col2 col3
group feature
g1 f1 129.5000001000
f2 08.000000200
g2 f1 27.000000330
f2 37.000000331
f3 17.000000100
g3 f1 77.666667101
Now for the final dataframe:
df3 = df2.groupby('group').rank(method='min', ascending=False).reset_index()
which finally gives:
group feature col1 col2 col3
0 g1 f1 1.01.01.01 g1 f2 2.02.02.02 g2 f1 2.01.02.03 g2 f2 1.01.01.04 g2 f3 3.01.03.05 g3 f1 1.01.01.0
For the second part of the question, I would just change the indexing of the intermediary dataframe, and compute ranks after grouping on 'feature'
:
dfx4 = dfx.reset_index().set_index(['feature', 'group']
).sort_index().groupby('feature').rank(
method='min', ascending=False
).reset_index()
which gives:
feature group col1 col2 col3
0 f1 g1 1.01.01.01 f1 g2 3.03.02.02 f1 g3 2.02.03.03 f2 g1 2.01.02.04 f2 g2 1.02.01.05 f3 g2 1.01.01.0
Post a Comment for "Pandas Add New "rank" Columns For Every Column"