
Questions about regcomp.c



Comments in the code say that the handler for [\N{...}] (that is, \N{}
inside a character class) keeps only the first code point when several
are returned.  But I don't understand how this can happen, since \N{}
returns a single code point, unless a user-defined handler can return a
longer string, which the documentation suggests it can.  Is this
correct, or am I missing something?
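
To make the behaviour I am asking about concrete, here is a toy model in
C.  It is not taken from regcomp.c: the struct, the function name and
the 256-entry bitmap are all made up for illustration.  It only shows
what "loses all but the first" would mean if a lookup handed back more
than one code point.

```c
#include <stddef.h>

#define TOY_CLASS_SIZE 256   /* toy class covers code points 0..255 only */

/* Hypothetical stand-in for a character-class bitmap (not regcomp.c's). */
struct toy_class {
    unsigned char member[TOY_CLASS_SIZE];
};

/* Add the result of a named-character lookup to the class.
 * Only cps[0] is used; the return value is how many code points
 * were silently dropped. */
static size_t toy_class_add_named(struct toy_class *cls,
                                  const unsigned *cps, size_t n)
{
    if (n == 0)
        return 0;
    if (cps[0] < TOY_CLASS_SIZE)
        cls->member[cps[0]] = 1;
    return n - 1;
}
```

With a single code point nothing is lost; with a two-element sequence
the second element never reaches the class.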

The FOLDCHAR node is not generated for hex, octal, and \N{} escapes
(Bug #59616).  The problem is that the tests for this are done before
the code determines that the code point warrants this node, and for hex
and octal, another node type has already been emitted by this time.  I
see precedent in the code for overwriting a node with another one when
the code changes its mind about what type is needed.  But would there be
a better way to solve this problem?
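
For reference, the overwrite approach I have in mind looks roughly like
the sketch below.  Everything in it is hypothetical: the node layout,
the toy_* names and the fixed-size array are mine, not regcomp.c's, and
U+00DF is used only as an example of a character whose case fold is
longer than one character.

```c
#include <stddef.h>

/* Hypothetical node types; the layout is NOT regcomp.c's. */
enum toy_op { TOY_EXACT, TOY_FOLDCHAR };

struct toy_node {
    enum toy_op op;
    unsigned    cp;
};

#define TOY_MAX_NODES 16

struct toy_program {
    struct toy_node nodes[TOY_MAX_NODES];
    size_t          count;
};

/* Emit a node and return its index so it can be revisited later.
 * Returns (size_t)-1 if the program is full. */
static size_t toy_emit(struct toy_program *p, enum toy_op op, unsigned cp)
{
    if (p->count >= TOY_MAX_NODES)
        return (size_t)-1;
    p->nodes[p->count].op = op;
    p->nodes[p->count].cp = cp;
    return p->count++;
}

/* Hypothetical predicate: does this code point need the special node? */
static int toy_needs_foldchar(unsigned cp)
{
    return cp == 0xDF;
}

/* Handle a numeric escape: the node type is chosen too early, then
 * corrected in place once the code point is known to need FOLDCHAR. */
static size_t toy_emit_escape(struct toy_program *p, unsigned cp)
{
    size_t at = toy_emit(p, TOY_EXACT, cp);
    if (at != (size_t)-1 && toy_needs_foldchar(cp))
        p->nodes[at].op = TOY_FOLDCHAR;   /* change of mind: overwrite */
    return at;
}
```

The alternative would be to resolve the escape to a code point first and
only then pick the node type, so that nothing has to be rewritten.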

Also, there are hard-coded hex constants in regcomp.c that are used as
cases in switches to determine if FOLDCHAR should be invoked.  It
appears to me that these are only valid for ASCII-ish machines.  Am I
correct?
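
Here is a small sketch of the concern.  The function names are
hypothetical, and the "EBCDIC" mapping is a one-entry simulation: I am
assuming that sharp s sits at 0x59 in code page 1047, and every other
value is passed through unchanged just to keep the example short, which
a real translation table would not do.

```c
/* Switch on a Unicode code point.  0xDF (U+00DF) is only an example
 * of a constant one might hard-code here. */
static int needs_fold_by_unicode(unsigned uni)
{
    switch (uni) {
    case 0xDF:
        return 1;
    default:
        return 0;
    }
}

/* Simulated native-to-Unicode translation for an EBCDIC platform.
 * ASSUMPTION: native 0x59 is sharp s; all other values are returned
 * as-is, which is wrong in general but enough for this sketch. */
static unsigned sim_ebcdic_to_unicode(unsigned native)
{
    return native == 0x59 ? 0xDFu : native;
}

/* Naive check: feeds the native value straight into the switch,
 * which is only right when native values equal Unicode values. */
static int needs_fold_naive(unsigned native)
{
    return needs_fold_by_unicode(native);
}

/* Translated check: converts to Unicode before consulting the switch. */
static int needs_fold_translated(unsigned native)
{
    return needs_fold_by_unicode(sim_ebcdic_to_unicode(native));
}
```

On the simulated platform the naive check misses native 0x59, while the
translated one catches it.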


Follow-Ups from:
demerphq <demerphq@gmail.com>
