
Apply Python Function To One Pandas Column And Apply The Output To Multiple Columns

Hello Community, I have read so many answers and blogs, yet I am not able to figure out what simple thing I am missing! I am using 'conditions' function to define all the condi

Solution 1:

One way around this is with a list comprehension:

df[['cat', 'subcat']] = [("insufficient", "resolution")  if word == "Category1" else
                         ("insufficient", "information") if word == "Category2" else
                         ("Duplicate", "ID repeated")    if word == "Category3" else
                         ("NA", "NA")
                         for word in df.remark]

      remark     desc           cat       subcat
0         NA  Present            NA           NA
1         NA  Present            NA           NA
2  Category1       NA  insufficient   resolution
3  Category2  Present  insufficient  information
4  Category3       NA     Duplicate  ID repeated

@dm2's answer shows how to pull it off with your function. The first apply(conditions) creates a series containing tuples, the second apply creates individual columns, forming a dataframe that you can then assign to cat and subcat.

I suggest a list comprehension for two reasons. First, you are dealing with strings, and in Pandas, working with strings via vanilla Python is more often than not faster. Second, with the list comprehension the processing is done once: you do not need to apply the conditions function and then call pd.Series on every row. That should make it faster, but timing both versions on your own data will confirm or refute this.
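The speed claim above can be checked with a small timing sketch. The conditions function here is a hypothetical stand-in (the original is not shown in full), the 2,000-row frame is made up for the test, and the actual timings will depend on your pandas version and machine:

```python
import timeit

import pandas as pd


def conditions(word):
    # Hypothetical stand-in for the asker's function: returns (cat, subcat).
    if word == "Category1":
        return ("insufficient", "resolution")
    if word == "Category2":
        return ("insufficient", "information")
    if word == "Category3":
        return ("Duplicate", "ID repeated")
    return ("NA", "NA")


# Made-up test data: the five sample remarks repeated 400 times.
remarks = ["NA", "NA", "Category1", "Category2", "Category3"] * 400
df = pd.DataFrame({"remark": remarks})


def with_comprehension():
    # One pass over the column; zip(*...) splits the pairs into two sequences.
    cat, subcat = zip(*[conditions(word) for word in df["remark"]])
    return pd.DataFrame({"cat": list(cat), "subcat": list(subcat)}, index=df.index)


def with_double_apply():
    # First apply builds tuples, second apply builds one Series per row.
    out = df["remark"].apply(conditions).apply(pd.Series)
    out.columns = ["cat", "subcat"]
    return out


# Both approaches should produce the same values.
same_result = (
    with_comprehension().to_numpy().tolist()
    == with_double_apply().to_numpy().tolist()
)
print("same result:", same_result)

for fn in (with_comprehension, with_double_apply):
    print(fn.__name__, timeit.timeit(fn, number=3))
```

Run it on a frame closer to your real size before drawing conclusions; small inputs mostly measure overhead.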

Solution 2:

You could do:

 df[['cat','subcat']] = df['remark'].apply(conditions).apply(pd.Series)

Output:

      remark     desc           cat       subcat
0         NA  Present            NA           NA
1         NA  Present            NA           NA
2  Category1       NA  insufficient   resolution
3  Category2  Present  insufficient  information
4  Category3       NA     Duplicate  ID repeated

Edit: This might be the simpler way to apply your function that you already have, but in case you have a huge DataFrame, for faster code check out the answer by @sammywemmy using list comprehension.
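To see what each apply in this solution contributes, here is a step-by-step sketch. The conditions function is again a hypothetical reconstruction, and the sample frame is rebuilt from the output shown above (with "NA" as a plain string):

```python
import pandas as pd


def conditions(word):
    # Hypothetical reconstruction of the asker's function: returns (cat, subcat).
    lookup = {
        "Category1": ("insufficient", "resolution"),
        "Category2": ("insufficient", "information"),
        "Category3": ("Duplicate", "ID repeated"),
    }
    return lookup.get(word, ("NA", "NA"))


df = pd.DataFrame({
    "remark": ["NA", "NA", "Category1", "Category2", "Category3"],
    "desc": ["Present", "Present", "NA", "Present", "NA"],
})

# Step 1: a Series whose elements are 2-tuples.
tuples = df["remark"].apply(conditions)

# Step 2: each tuple becomes a row; the new columns are labelled 0 and 1.
expanded = tuples.apply(pd.Series)

# Step 3: assign the two columns to the new names, in order.
df[["cat", "subcat"]] = expanded
print(df)
```

If the second apply turns out to be the slow part, building the frame in one go with pd.DataFrame(tuples.tolist(), index=df.index) is a common alternative worth trying.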

Solution 3:

You're passing the entire dataframe where you just need to pass the lambda variable (x).

df[['cat','subcat']] = df['remark'].apply(lambda x: pd.Series([*conditions(x)]))

The * operator unpacks an iterable, so a single call to conditions(x) supplies both values and you don't need to call the same function twice to extract its output. Python does not merge repeated calls for you: calling conditions twice runs it twice.
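A short sketch of the unpacking idea, first on a single value and then on the whole column. The conditions function is a hypothetical stand-in for the asker's, and the sample remarks are taken from the outputs shown in the other answers:

```python
import pandas as pd


def conditions(word):
    # Hypothetical stand-in for the asker's function: returns (cat, subcat).
    lookup = {
        "Category1": ("insufficient", "resolution"),
        "Category2": ("insufficient", "information"),
        "Category3": ("Duplicate", "ID repeated"),
    }
    return lookup.get(word, ("NA", "NA"))


# Tuple unpacking: one call, two names.
cat, subcat = conditions("Category2")

# Star-unpacking inside a list literal: one call, a two-element list.
row = pd.Series([*conditions("Category2")])

df = pd.DataFrame({"remark": ["NA", "NA", "Category1", "Category2", "Category3"]})

# Each element of 'remark' is passed as x; the returned Series become rows.
df[["cat", "subcat"]] = df["remark"].apply(lambda x: pd.Series([*conditions(x)]))
print(df)
```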

Solution 4:

You can use Series.replace with a mapping dictionary:

df['cat'] = df.remark.replace({'Category1': 'insufficient',
    'Category2': 'insufficient', 'Category3': 'Duplicate'})
df['subcat'] = df.remark.replace({'Category1': 'resolution',
    'Category2': 'information', 'Category3': 'ID repeated'})

print(df)
      remark     desc           cat       subcat
0         NA  Present            NA           NA
1         NA  Present            NA           NA
2  Category1       NA  insufficient   resolution
3  Category2  Present  insufficient  information
4  Category3       NA     Duplicate  ID repeated
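One thing to keep in mind with this approach: replace leaves any value that is not in the dictionary unchanged, which works here only because the unmatched remarks are already "NA". The sketch below contrasts that with Series.map plus fillna; the "Other" remark is a made-up value added purely to show the difference:

```python
import pandas as pd

cat_map = {
    "Category1": "insufficient",
    "Category2": "insufficient",
    "Category3": "Duplicate",
}

# "Other" is a hypothetical remark that the mapping does not cover.
df = pd.DataFrame({"remark": ["NA", "Category1", "Other"]})

# replace: unmatched values pass through as they are.
replaced = df["remark"].replace(cat_map)

# map: unmatched values become NaN, which fillna turns into an explicit default.
mapped = df["remark"].map(cat_map).fillna("NA")

print(replaced.tolist())
print(mapped.tolist())
```

Which behaviour you want depends on whether unexpected remarks should show up in cat as-is or be collapsed to "NA".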
