Apply Python Function To One Pandas Column And Apply The Output To Multiple Columns
Solution 1:
One way around this is with a list comprehension:
df[['cat', 'subcat']] = [("insufficient", "resolution") if word == "Category1" else
                         ("insufficient", "information") if word == "Category2" else
                         ("Duplicate", "ID repeated") if word == "Category3" else
                         ("NA", "NA")
                         for word in df.remark]
remark desc cat subcat
0 NA Present NA NA
1 NA Present NA NA
2 Category1 NA insufficient resolution
3 Category2 Present insufficient information
4 Category3 NA Duplicate ID repeated
@dm2's answer shows how to pull it off with your function. The first apply(conditions) creates a series containing tuples; the second apply creates individual columns, forming a dataframe that you can then assign to cat and subcat.
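To see the two steps separately, here is a minimal sketch. The conditions function is not shown in this thread, so the version below is a hypothetical stand-in reconstructed from the outputs above, and the sample frame simply mirrors the printed table.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `conditions` function (not shown in this thread).
def conditions(word):
    if word == "Category1":
        return ("insufficient", "resolution")
    if word == "Category2":
        return ("insufficient", "information")
    if word == "Category3":
        return ("Duplicate", "ID repeated")
    return ("NA", "NA")

# Sample data mirroring the table printed above; "NA" is a plain string here.
df = pd.DataFrame({
    "remark": ["NA", "NA", "Category1", "Category2", "Category3"],
    "desc": ["Present", "Present", "NA", "Present", "NA"],
})

# Step 1: a Series whose elements are 2-tuples.
tuples = df["remark"].apply(conditions)

# Step 2: each tuple is expanded into a row, giving a DataFrame with columns 0 and 1.
expanded = tuples.apply(pd.Series)

# The two expanded columns are assigned, in order, to the new labels.
df[["cat", "subcat"]] = expanded
print(df)
```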
I suggest a list comprehension for two reasons. First, you are dealing with strings, and in Pandas, working with strings via vanilla Python is more often than not faster. Second, with the list comprehension the processing is done in a single pass; you do not need to apply the conditions function and then call pd.Series. That should give you a faster result, though testing will confirm or debunk this.
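If you want to test the speed claim yourself, a rough timing sketch could look like the following. The conditions function, the synthetic 2,000-row frame, and the two helper names are all assumptions made for illustration; timings vary with the data and the pandas version, so no result is quoted here.

```python
import timeit
import pandas as pd

# Hypothetical stand-in for the asker's `conditions` function (not shown in this thread).
def conditions(word):
    if word == "Category1":
        return ("insufficient", "resolution")
    if word == "Category2":
        return ("insufficient", "information")
    if word == "Category3":
        return ("Duplicate", "ID repeated")
    return ("NA", "NA")

# Synthetic data for timing only.
df = pd.DataFrame({"remark": ["NA", "Category1", "Category2", "Category3"] * 500})

def with_list_comprehension():
    # Build all rows in one pass, then wrap them in a DataFrame.
    rows = [conditions(word) for word in df["remark"]]
    return pd.DataFrame(rows, columns=["cat", "subcat"], index=df.index)

def with_double_apply():
    # First apply yields tuples, second apply expands them via pd.Series.
    out = df["remark"].apply(conditions).apply(pd.Series)
    out.columns = ["cat", "subcat"]
    return out

t_comprehension = timeit.timeit(with_list_comprehension, number=3)
t_apply = timeit.timeit(with_double_apply, number=3)
print(f"list comprehension: {t_comprehension:.4f}s, double apply: {t_apply:.4f}s")
```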
Solution 2:
You could do:
df[['cat','subcat']] = df['remark'].apply(conditions).apply(pd.Series)
Output:
remark desc cat subcat
0 NA Present NA NA
1 NA Present NA NA
2 Category1 NA insufficient resolution
3 Category2 Present insufficient information
4 Category3 NA Duplicate ID repeated
Edit: This might be the simpler way to apply the function you already have, but if you have a huge DataFrame and need faster code, check out the answer by @sammywemmy using a list comprehension.
Solution 3:
You're passing the entire dataframe where you just need to pass the lambda variable (x).
df[['cat','subcat']] = df['remark'].apply(lambda x: pd.Series([*conditions(x)]))
The * operator unpacks an iterable, so you don't need to call the same function twice to extract its output. Perhaps the compiler resolves this but I don't think so...
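As a small illustration of the unpacking, assuming a hypothetical conditions function that returns a 2-tuple (the real one is not shown in this thread):

```python
import pandas as pd

# Hypothetical stand-in for the asker's `conditions` function (not shown in this thread).
def conditions(word):
    if word == "Category1":
        return ("insufficient", "resolution")
    if word == "Category2":
        return ("insufficient", "information")
    if word == "Category3":
        return ("Duplicate", "ID repeated")
    return ("NA", "NA")

# Star-unpacking inside a list literal copies the tuple's items into a new list,
# so both values come from a single call.
pair = [*conditions("Category1")]  # ["insufficient", "resolution"]

df = pd.DataFrame({"remark": ["NA", "Category1", "Category2", "Category3"]})

# Each row's list becomes a two-element Series; apply stacks them into two columns.
df[["cat", "subcat"]] = df["remark"].apply(lambda x: pd.Series([*conditions(x)]))
print(df)
```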
Solution 4:
You can use Series.replace with a mapping dictionary:
df['cat'] = df.remark.replace({'Category1': 'insufficient',
                               'Category2': 'insufficient', 'Category3': 'Duplicate'})
df['subcat'] = df.remark.replace({'Category1': 'resolution',
                                  'Category2': 'information', 'Category3': 'ID repeated'})
print(df)
remark desc cat subcat
0 NA Present NA NA
1 NA Present NA NA
2 Category1 NA insufficient resolution
3 Category2 Present insufficient information
4 Category3 NA Duplicate ID repeated
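One thing to keep in mind with this approach: replace leaves any value that is not a key in the dictionary unchanged, which is why the "NA" rows above stay "NA". A value outside the three categories would also pass through as-is rather than becoming "NA". The sketch below contrasts this with Series.map on a made-up value ("Other" is an assumption for illustration, not from the question's data).

```python
import pandas as pd

# "Other" is a made-up value used only to show the pass-through behaviour.
remark = pd.Series(["NA", "Category1", "Other"])
mapping = {"Category1": "insufficient"}

# replace: values not in the dictionary are kept unchanged.
replaced = remark.replace(mapping)

# map: values not in the dictionary become NaN.
mapped = remark.map(mapping)

print(replaced.tolist())
print(mapped.isna().tolist())
```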