Pyparsing Nestedexpr And Nested Parentheses

October 23, 2024

I am working on a very simple 'querying syntax' usable by people with reasonable technical skills (i.e., not coders per se, but able to touch on the subject). A typical example of w

Solution 1:

nestedExpr is a convenience expression in pyparsing that makes it easy to define text with matched opening and closing characters. When you also need to parse the nested contents, nestedExpr usually does not give you enough structure to work with.

The query syntax you are trying to parse is better served by pyparsing's infixNotation method. You can see several examples at the pyparsing wiki's Examples page - SimpleBool is very similar to what you are parsing.

"Infix notation" is a general parsing term for expressions where the operator is in between its related operands (vs. "postfix notation" where the operator follows the operands, as in "2 3 +" instead of "2 + 3"; or "prefix notation" which looks like "+ 2 3"). Operators can have an order of precedence in evaluation that can override left-to-right order - for instance, in "2 + 3 * 4", precedence of operations dictates that multiplication gets evaluated before addition. Infix notation also supports using parentheses or other grouping characters to override that precedence, as in "(2 + 3) * 4" to force the addition operation to be done first.

pyparsing's infixNotation method takes a base operand expression, and then a list of operator definition tuples, in order of precedence. For instance, 4-function integer arithmetic would look like:

parser = infixNotation(integer,
             [
             (oneOf('* /'), 2, opAssoc.LEFT),
             (oneOf('+ -'), 2, opAssoc.LEFT),
             ])

This means we will parse integer operands, with '*' and '/' as binary left-associative operators at the highest precedence, followed by '+' and '-' as binary left-associative operators. Support for parentheses to override the order is built into infixNotation.
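To make the precedence and parenthesis handling concrete, here is a minimal runnable sketch of the arithmetic parser above. The definition of integer is an assumption (the snippet above does not show it); here it is a Word of digits with a parse action that converts to int.

```python
import pyparsing as pp

# Assumed operand definition: a run of digits, converted to int at parse time
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

parser = pp.infixNotation(integer,
    [
        (pp.oneOf('* /'), 2, pp.opAssoc.LEFT),   # highest precedence first
        (pp.oneOf('+ -'), 2, pp.opAssoc.LEFT),
    ])

# Multiplication binds tighter than addition, so "3 * 4" is grouped first
no_parens = parser.parseString("2 + 3 * 4").asList()
# Parentheses override precedence, so "2 + 3" is grouped first
with_parens = parser.parseString("(2 + 3) * 4").asList()

print(no_parens)    # [[2, '+', [3, '*', 4]]]
print(with_parens)  # [[[2, '+', 3], '*', 4]]
```

Each nested list corresponds to one operation, so the nesting shows the order in which the operations would be evaluated.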

Query strings are often some combination of boolean operations NOT, AND, and OR, and typically evaluated in that order of precedence. In your case, the operands for these operators are comparison expressions, like "address = street" or "age between [20,30]". So if you define an expression for a comparison expression, of the form fieldname operator value, then you can use infixNotation to do the right grouping of AND's and OR's:

import pyparsing as pp

query_expr = pp.infixNotation(comparison_expr,
                [
                    (NOT, 1, pp.opAssoc.RIGHT,),
                    (AND, 2, pp.opAssoc.LEFT,),
                    (OR, 2, pp.opAssoc.LEFT,),
                ])
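The snippet above assumes that comparison_expr, NOT, AND, and OR are already defined. As a rough self-contained sketch, they could look like the following. The field, operator, and value expressions here are simplified stand-ins of my own (only a few operators, no 'between' or 'like'), not the definitions from your question, and the comparison is wrapped in a Group just so each one shows up as its own sublist.

```python
import pyparsing as pp

# Boolean operator keywords (case-insensitive, reported as defined here)
NOT, AND, OR = map(pp.CaselessKeyword, ["NOT", "AND", "OR"])

# Simplified stand-ins for the real field/operator/value expressions
field = pp.Word(pp.alphas, pp.alphanums + "_")
operator = pp.oneOf("= != >= <= > <")
value = pp.pyparsing_common.number | pp.Word(pp.alphas)

# fieldname operator value, grouped so each comparison is one sublist
comparison_expr = pp.Group(field + operator + value)

query_expr = pp.infixNotation(comparison_expr,
    [
        (NOT, 1, pp.opAssoc.RIGHT),
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ])

sample = "age >= 25 AND gender = M OR NOT eyes = blue"
result = query_expr.parseString(sample, parseAll=True).asList()
print(result)
# [[[['age', '>=', 25], 'AND', ['gender', '=', 'M']], 'OR', ['NOT', ['eyes', '=', 'blue']]]]
```

Note how the AND terms are grouped before the OR, and the NOT is grouped with its single operand, without any parentheses in the input.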

Finally, I suggest you define a class to take the comparison tokens as class init args, then you can attach behavior to that class to evaluate the comparisons and output debug strings, something like:

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:({{'field': {!r}, 'operator': {!r}, 'value': {!r}}})".format(
                            *self.tokens.asList())

# attach the class to the comparison expression
comparison_expr.addParseAction(ComparisonExpr)

Then you can get output like:

query_expr.parseString(sample).pprint()

[[Comparison:({'field': 'address', 'operator': 'like', 'value': 'street'}),
  'AND',
  Comparison:({'field': 'vote', 'operator': '=', 'value': True}),
  'AND',
  [[Comparison:({'field': 'age', 'operator': '>=', 'value': 25}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'M'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': 'between', 'value': [20, 30]}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'F'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': '>=', 'value': 70}),
    'AND',
    Comparison:({'field': 'eyes', 'operator': '!=', 'value': 'blue'})]]]]

The SimpleBool.py example has more details to show you how to create this class, and related classes for NOT, AND, and OR operators.
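To give a feel for that approach, here is a hedged sketch in the same spirit, not a copy of SimpleBool.py. The class names (Comparison, Not, And, Or), the matches method, and the equality-only comparison grammar are all hypothetical and chosen just for illustration. The operator classes are passed as the optional fourth element of each infixNotation tuple, which pyparsing uses as a parse action.

```python
import pyparsing as pp

NOT, AND, OR = map(pp.CaselessKeyword, ["NOT", "AND", "OR"])

class Comparison:
    """Leaf node: tests `field = value` against a record dict."""
    def __init__(self, tokens):
        self.field, self.value = tokens[0], tokens[2]
    def matches(self, record):
        return record.get(self.field) == self.value

class Not:
    def __init__(self, tokens):
        # infixNotation passes one group: ['NOT', operand]
        self.arg = tokens[0][1]
    def matches(self, record):
        return not self.arg.matches(record)

class And:
    def __init__(self, tokens):
        # one group: [operand, 'AND', operand, ...]; keep every other item
        self.args = tokens[0][0::2]
    def matches(self, record):
        return all(arg.matches(record) for arg in self.args)

class Or(And):
    def matches(self, record):
        return any(arg.matches(record) for arg in self.args)

# Hypothetical equality-only comparison, for illustration
comparison_expr = pp.Word(pp.alphas) + pp.Literal("=") + pp.Word(pp.alphanums)
comparison_expr.setParseAction(Comparison)

query_expr = pp.infixNotation(comparison_expr,
    [
        (NOT, 1, pp.opAssoc.RIGHT, Not),
        (AND, 2, pp.opAssoc.LEFT, And),
        (OR, 2, pp.opAssoc.LEFT, Or),
    ])

query = query_expr.parseString("gender = F AND NOT eyes = blue", parseAll=True)[0]
print(query.matches({"gender": "F", "eyes": "green"}))  # True
print(query.matches({"gender": "F", "eyes": "blue"}))   # False
```

The parse result is a small tree of objects, so the query is parsed once and can then be tested against as many records as you like.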

EDIT:

"Is there a way to return RESULT with dictionaries and not ComparisonExpr instances?"

pprint displays each item using its __repr__ method, not __str__, so the __str__ method on your ComparisonExpr class is never called. The easiest solution is to add to your class:

__repr__ = __str__

Or just rename __str__ to __repr__.
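A small pyparsing-free sketch of why this matters: when Python prints a list (and pprint does the same), it uses the repr of each element. The class below is a cut-down stand-in that takes a plain list of tokens, just for illustration.

```python
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:({!r})".format(self.tokens)

    # Containers display their items with repr(), so reuse __str__ for it
    __repr__ = __str__

items = [ComparisonExpr(['age', '>=', 25])]
print(items)  # [Comparison:(['age', '>=', 25])]
```

Without the __repr__ line, the same print would fall back to the default object representation (something like <__main__.ComparisonExpr object at 0x...>).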

"The only thing unknown left is for me to turn 'true' into True and '[20,30]' into [20, 30]"

Try:

from pyparsing import CaselessKeyword, Suppress, Group, delimitedList, pyparsing_common

CK = CaselessKeyword  # 'cause I'm lazy
bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
LBRACK,RBRACK = map(Suppress, "[]")
# parse numbers using pyparsing_common.number, which includes the str->int conversion parse action
num_list = Group(LBRACK + delimitedList(pyparsing_common.number) + RBRACK)

Then add these to your VALUE expression:

VALUE = bool_literal | num_list | Word(unicode_printables)
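Here is a quick runnable check of those conversions. Since unicode_printables comes from your own code and is not shown here, this sketch substitutes Word(alphas) as a stand-in for the fallback string value; that substitution is an assumption.

```python
from pyparsing import (CaselessKeyword, Suppress, Group, Word, alphas,
                       delimitedList, pyparsing_common)

CK = CaselessKeyword
# CaselessKeyword reports the keyword as defined ('true'/'false'),
# so the comparison works no matter how the input was capitalized
bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
LBRACK, RBRACK = map(Suppress, "[]")
num_list = Group(LBRACK + delimitedList(pyparsing_common.number) + RBRACK)

# Word(alphas) stands in for Word(unicode_printables)
VALUE = bool_literal | num_list | Word(alphas)

as_bool = VALUE.parseString("TRUE")[0]
as_list = VALUE.parseString("[20,30]").asList()
as_text = VALUE.parseString("street")[0]

print(as_bool)  # True
print(as_list)  # [[20, 30]]
print(as_text)  # street
```

The order of the alternatives matters: bool_literal and num_list have to come before the catch-all string expression, or the catch-all would match first and no conversion would happen.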

Lastly:

from pprint import pprint
pprint(RESULT)

I got so tired of importing pprint all the time to do just this, I just added it to the API for ParseResults. Try:

RESULT.pprint()  # no import required on your part

print(RESULT.dump()) # will also show indented list of named fields

EDIT2:

LASTLY, results names are good to learn. If you make this change to COMPARISON, everything still works as you have it:

COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')

But now you can write:

def asDict(self):
    return self.tokens.asDict()

And you can access the parsed values by name instead of index position (either using result['field'] notation or result.field notation).
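Putting results names and the class together, a minimal sketch could look like this. FIELD, OPERATOR, and VALUE are simplified stand-ins for your own definitions (an assumption on my part), and the class is trimmed to just what is needed for asDict.

```python
import pyparsing as pp

# Simplified stand-ins for the real FIELD / OPERATOR / VALUE expressions
FIELD = pp.Word(pp.alphas)
OPERATOR = pp.oneOf("= != >= <= > <")
VALUE = pp.pyparsing_common.number | pp.Word(pp.alphas)

# Calling an expression with a string attaches a results name to it
COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def asDict(self):
        return self.tokens.asDict()

COMPARISON.addParseAction(ComparisonExpr)

comparison = COMPARISON.parseString("age >= 25")[0]
print(comparison.asDict())            # {'field': 'age', 'operator': '>=', 'value': 25}
print(comparison.tokens['field'])     # age
print(comparison.tokens.operator)     # >=
```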

tmahurin
