Key Error When Using Regex Quantifier Python
I am trying to capture words following specified stocks in a pandas df. I have several stocks in the format $IBM and am setting a python regex pattern to search each tweet for 3-5
Solution 1:
When you build your pattern, an empty alternative is left at the end, so the pattern can match the empty string: it matches at every position in the text where no stock name matches.
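A minimal sketch of the effect of a trailing empty alternative. The two patterns and the sample text below are hypothetical stand-ins, not the asker's actual data:

```python
import re

# Hypothetical alternation with a trailing "|", i.e. an empty last alternative.
bad = r"\$IBM|\$GOOGLE|"
# The same alternation without the trailing "|".
good = r"\$IBM|\$GOOGLE"

text = "buy $IBM now"  # made-up sample text

# The empty alternative succeeds wherever the named alternatives fail,
# so findall returns many empty strings next to the real match.
bad_matches = re.findall(bad, text)
good_matches = re.findall(good, text)

# The bad pattern even matches a completely empty string.
matches_empty = re.fullmatch(bad, "") is not None

print(bad_matches)
print(good_matches)
print(matches_empty)
```

Dropping the trailing `|` (or filtering out empty entries before joining) removes the empty matches.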
You need to build the pattern like
(?:\$IBM|\$GOOGLE)\s+(\w+(?:\s+\S+){3,5})
You may use
import re
pattern = r'(?:{})\s+(\w+(?:\s+\S+){{3,5}})'.format(
    "|".join(map(re.escape, stock_news['Word'])))
Mind that literal curly braces inside an f-string or a str.format template must be doubled; a single {3,5} is parsed as a replacement field, which is what raises the KeyError.
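The brace doubling can be checked with a short sketch. It assumes a plain list `words` standing in for `stock_news['Word']`, so that it runs without pandas:

```python
import re

# Assumption: a plain list standing in for the stock_news['Word'] column.
words = ["$IBM", "$GOOGLE"]
alternation = "|".join(map(re.escape, words))

# Single braces: str.format reads {3,5} as a replacement field named "3,5".
try:
    r'(?:{})\s+(\w+(?:\s+\S+){3,5})'.format(alternation)
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Doubled braces: {{3,5}} is emitted as the literal quantifier {3,5}.
pattern = r'(?:{})\s+(\w+(?:\s+\S+){{3,5}})'.format(alternation)

print(raised_key_error)
print(pattern)
```

An alternative that avoids the doubling is plain string concatenation, at the cost of a slightly less readable template.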
Regex details
(?:\$IBM|\$GOOGLE) - a non-capturing group matching either $IBM or $GOOGLE
\s+ - 1+ whitespaces
(\w+(?:\s+\S+){3,5}) - Capturing group 1 (when using str.findall, only this part will be returned):
    \w+ - 1+ word chars
    (?:\s+\S+){3,5} - a non-capturing* group matching three, four or five occurrences of 1+ whitespaces followed with 1+ non-whitespace characters
*Note that non-capturing groups are meant to group some patterns, or quantify them, without actually allocating any memory buffer for the values they match, so that you could capture only what you need to return/keep.
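To see which part `findall` returns, here is a small sketch on a made-up tweet. The sample text is an assumption, and the pattern is the expanded one shown above:

```python
import re

pattern = r'(?:\$IBM|\$GOOGLE)\s+(\w+(?:\s+\S+){3,5})'

# Made-up sample tweet, not taken from the original data.
tweet = "$IBM shares rose sharply after strong quarterly earnings today"

# Only capturing group 1 is returned; the stock symbol sits in a
# non-capturing group and is therefore left out of the result.
found = re.findall(pattern, tweet)
print(found)  # ['shares rose sharply after strong quarterly']
```

Because `\w+` consumes one word before the quantified group, each match holds four to six whitespace-separated tokens. In pandas, the same pattern can be passed to `Series.str.findall`.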