
[perl #59616] FOLDCHAR regop not produced for \x, \0, \N{U+....}



# New Ticket Created by  karl williamson 
# Please include the string:  [perl #59616]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=59616 >


The FOLDCHAR regop is used for case-insensitive matching of 3 
characters, including LATIN SMALL LETTER SHARP S.  The problem is that 
it is not applied when those characters are placed in the pattern 
using hex, octal, or the character name format of the form 
\N{U+...}.

From reading the list archive about this, it appears that people 
thought about the hex form, but from my reading of the code and the 
following experiment, it doesn't work.

use charnames ':full';
use utf8;
print __LINE__, " ", ("ss" =~ /ß/i ? "yes" : "no"), "\n";
print __LINE__, " ", ("ss" =~ /\xdf/i ? "yes" : "no"), "\n";
print __LINE__, " ", ("\x100ss" =~ /\x100ß/i ? "yes" : "no"), "\n";
print __LINE__, " ", ("\x100ss" =~ /\x100\xdf/i ? "yes" : "no"), "\n";
print __LINE__, " ", ("\x100ss" =~ /\x100\337/i ? "yes" : "no"), "\n";
print __LINE__, " ", ("\x100ss" =~ /\x100\N{U+00DF}/i ? "yes" : "no"), "\n";
print __LINE__, " ", ("\x100ss" =~ /\x100\N{LATIN SMALL LETTER SHARP S}/i ? "yes" : "no"), "\n";



yields on my computer:
  3 yes
  4 no
  5 yes
  6 no
  7 no
  8 no
  9 yes

The no on line 4 comes from the current requirement that something must be 
stored as utf8 in order to enable full Unicode semantics.  But the 
no's on lines 6-8 shouldn't be there, because I expected the \x100 to 
force the pattern to be stored as utf8.  I haven't analyzed why the full 
character name works when the U+ form does not.
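
One premise of the experiment may be worth double-checking.  As far as I 
know, an unbraced \x escape consumes at most two hex digits, so "\x100" 
would be chr(0x10) followed by "0" rather than U+0100, and would not by 
itself force utf8 storage; the braced form \x{100} would.  A minimal 
sketch to check this (the variable names are mine, and it says nothing 
about FOLDCHAR itself):

```perl
use strict;
use warnings;

# Unbraced form: \x should take only the two hex digits "10",
# leaving a literal "0" as a second character.
my $unbraced = "\x100";

# Braced form: a single character, U+0100.
my $braced = "\x{100}";

for my $pair ( [ unbraced => $unbraced ], [ braced => $braced ] ) {
    my ( $label, $str ) = @$pair;
    # utf8::is_utf8 reports whether the string is stored internally as utf8.
    printf "%-8s length=%d first=0x%X utf8=%s\n",
        $label, length($str), ord($str),
        utf8::is_utf8($str) ? "on" : "off";
}
```

If the unbraced string reports a length of 2 with the utf8 flag off, the 
tests on lines 5-8 would need \x{100} to exercise the utf8 path, and 
the results above would have to be re-run before drawing conclusions.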

I'm using
Summary of my perl5 (revision 5 version 11 subversion 0 patch 34418) configuration:
   Platform:
     osname=linux, osvers=2.6.24-19-generic, archname=i686-linux
     uname='linux karl 2.6.24-19-generic #1 smp wed aug 20 22:56:21 utc 2008 i686 gnulinux '
     config_args=''
     hint=recommended, useposix=true, d_sigaction=define
     useithreads=undef, usemultiplicity=undef
     useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
     use64bitint=undef, use64bitall=undef, uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
     optimize='-O2',
     cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
     ccversion='', gccversion='4.2.3 (Ubuntu 4.2.3-2ubuntu7)', gccosandvers=''
     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
     alignbytes=4, prototype=define
   Linker and Libraries:
     ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
     libpth=/usr/local/lib /lib /usr/lib
     libs=-lnsl -ldl -lm -lcrypt -lutil -lc
     perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
     libc=/lib/libc-2.7.so, so=so, useshrplib=false, libperl=libperl.a
     gnulibc_version='2.7'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
     cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl):
   Compile-time options: PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
                         USE_LARGE_FILES USE_PERLIO
   Locally applied patches:
         DEVEL
   Built under linux
   Compiled at Sep 25 2008 12:31:18
   @INC:
     /home/khw/perl5.11/lib/5.11.0/i686-linux
     /home/khw/perl5.11/lib/5.11.0
     /home/khw/localperl/lib/site_perl/5.11.0/i686-linux
     /home/khw/localperl/lib/site_perl/5.11.0
     /home/khw/localperl/lib/site_perl
     .


