Key Error When Using Regex Quantifier Python

August 20, 2024

I am trying to capture the words that follow specified stocks in a pandas DataFrame. I have several stocks in the format $IBM and am setting a Python regex pattern to search each tweet for 3-5

Solution 1:

When you build your pattern, an empty alternative is left at the end of the alternation. Because an empty alternative matches the empty string, the pattern effectively matches anywhere, including at positions before text that contains none of the stock symbols.
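To see the effect of an empty alternative, here is a minimal sketch. The symbol list and the faulty construction with a trailing `|` are assumptions made for illustration, since the original pattern-building code is not shown. Only the alternation group is tested here, without the rest of the pattern.

```python
import re

# Hypothetical stock symbols; the original DataFrame column is not shown.
words = ["$IBM", "$GOOGLE"]

# Assumed faulty construction: appending "|" after every word leaves
# an empty alternative at the end, i.e. (?:\$IBM|\$GOOGLE|)
bad = "(?:" + "".join(re.escape(w) + "|" for w in words) + ")"

# Joining with "|" puts the separator only between the words.
good = "(?:" + "|".join(re.escape(w) for w in words) + ")"

text = "no ticker here"

bad_matches = re.findall(bad, text)    # empty strings, one per position
good_matches = re.findall(good, text)  # no matches at all

print(bad)
print(good)
print(len(bad_matches), good_matches)
```

The faulty group succeeds at every position with an empty match, while the joined version finds nothing in text that has no stock symbol.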

You need to build the pattern so that it looks like this:

(?:\$IBM|\$GOOGLE)\s+(\w+(?:\s+\S+){3,5})

You may use

import re
pattern = r'(?:{})\s+(\w+(?:\s+\S+){{3,5}})'.format(
    "|".join(map(re.escape, stock_news['Word'])))

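Note that literal curly braces inside an f-string or a str.format template must be doubled ({{ and }}). With single braces, str.format parses {3,5} as a replacement field instead of a regex quantifier, which is what raises the KeyError.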
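The difference between single and doubled braces can be checked with a short sketch. The two symbols are hypothetical stand-ins for the values in stock_news['Word'].

```python
import re

# Hypothetical symbols standing in for stock_news['Word'].
alternation = "|".join(map(re.escape, ["$IBM", "$GOOGLE"]))

# Single braces: str.format treats {3,5} as a replacement field
# and looks it up as a keyword argument, which fails.
try:
    r'(?:{})\s+(\w+(?:\s+\S+){3,5})'.format(alternation)
    error = None
except KeyError as exc:
    error = exc

# Doubled braces are emitted as literal { and } in the result.
pattern = r'(?:{})\s+(\w+(?:\s+\S+){{3,5}})'.format(alternation)

print(type(error).__name__)
print(pattern)
```

After formatting, the doubled braces collapse to the single-brace quantifier that the regex engine expects.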

Regex details

(?:\$IBM|\$GOOGLE) - a non-capturing group matching either $IBM or $GOOGLE
\s+ - one or more whitespace characters
(\w+(?:\s+\S+){3,5}) - Capturing group 1 (when using str.findall, only this part will be returned):
- \w+ - one or more word characters
- (?:\s+\S+){3,5} - a non-capturing group matching three, four or five occurrences of one or more whitespace characters followed by one or more non-whitespace characters

Note that non-capturing groups are meant to group patterns, or to quantify them, without storing the text they match as a separate group, so you capture only what you need to return or keep.
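As a rough illustration of how the groups affect the result, the sketch below runs the pattern with re.findall on two made-up tweets. The tweet texts are assumptions, and plain re is used instead of a DataFrame; pandas Series.str.findall applies re.findall to each element, so the per-tweet result should be the same.

```python
import re

pattern = r'(?:\$IBM|\$GOOGLE)\s+(\w+(?:\s+\S+){3,5})'

# Made-up tweets for illustration only.
long_tweet = "Analysts say $IBM shares rose sharply after strong quarterly earnings today"
short_tweet = "$GOOGLE is up today"

# Only capturing group 1 is returned: the first word plus up to five more.
captured = re.findall(pattern, long_tweet)

# Fewer than four words follow the symbol, so nothing matches.
too_short = re.findall(pattern, short_tweet)

# Same pattern, but with the repeated inner group made capturing:
# findall now returns tuples, and the inner group keeps only its last repetition.
capturing = r'(?:\$IBM|\$GOOGLE)\s+(\w+(\s+\S+){3,5})'
with_inner_group = re.findall(capturing, long_tweet)

print(captured)
print(too_short)
print(with_inner_group)
```

Keeping the inner group non-capturing means findall returns a flat list of strings rather than tuples with an extra, unwanted element.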

tmahurin
