
odd behaviour of substr(lc($utf8)) in 5.8.8



Hi folks,

I don't know if this is a bug, but it sure is odd. I'm more of a Perl
user than a core developer, and I haven't found anyone or any document
that can explain this one.

If it's been fixed recently please point me towards the change logs!

(see below for perl -V output)

---------------------------------------------------------------------------

Test case:

#!/usr/bin/perl
use utf8;  # so we can add UTF-8 chars below
use strict;
use warnings;

my @words=split(' ','Some £long £piece of£ text containing UTF-8, A 
Really£ Long£ £Line £of órandom nicé and fruití words which have VERY 
LITTLE MEANING and are ALL JUMBLED UP by the code, this should be 
interesting to watch');

for(my $i=0;$i<10;$i++){

    # generate a random string from the words above:
    my $long_string='';
    for(my $n=0;$n<6;$n++){
        $long_string.=$words[rand(scalar(@words))].' ';
    }

    my $short1=substr(lc($long_string),0,256);

    my $short2=lc($long_string);
    $short2=substr($short2,0,256); # in theory should be == $short1

    if($short1 ne $short2){
        print "DIFFERENT at iteration $i:\n\t\tshort1=$short1\n\t\tshort2=$short2\n";
    }
}
print "\n";

---------------------------------------------------------------------------

Try running it a few times. On the various machines I've tried, $short1
is truncated at random, on no particular boundary (and sometimes not at
all within the 10 iterations), while $short2 is always exactly as
expected.  It's specific to the construct
substr( lc( $utf8_text ), offset, length ).
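
For reference, a minimal sketch of the workaround implied by the $short2
form: lowercase into a named variable first, then take the substring of
that variable. The helper name lc_prefix is hypothetical (it is not from
any module), and this assumes the two-step form behaves as it did for
$short2 in my runs:

```perl
#!/usr/bin/perl
use utf8;  # the source below contains UTF-8 characters
use strict;
use warnings;

# Hypothetical helper (not from any module): lowercase into a named
# variable first, then take the substring of that variable, mirroring
# the $short2 form from the test case.
sub lc_prefix {
    my ($text, $max_chars) = @_;
    my $lower = lc($text);                 # result stored in a real variable
    return substr($lower, 0, $max_chars);  # substr on the variable, not on lc()
}

binmode(STDOUT, ':utf8');
print lc_prefix('Some £LONG Piece Of Nicé TEXT', 256), "\n";
```

This only sidesteps the nested substr( lc( ... ) ) call; it says nothing
about why the nested form misbehaves.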

----------------------------------------------------------------------------

perl -V output:

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.15.7, archname=x86_64-linux-gnu-thread-multi
    uname='linux king 2.6.15.7 #1 smp sun sep 23 13:51:52 utc 2007 
x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN 
-Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr 
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local 
-Dsitelib=/usr/local/share/perl/5.8.8 
-Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl 
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm 
-Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define 
usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS 
-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include 
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN 
-fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 (Ubuntu 4.1.2-0ubuntu4)', 
gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.5.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
                        PERL_MALLOC_WRAP THREADS_HAVE_PIDS USE_64_BIT_ALL
                        USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
                        USE_PERLIO USE_REENTRANT_API
  Built under linux
  Compiled at Dec  4 2007 09:01:45
  @INC:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl

---------------------------------------------------------------------

Any advice appreciated,

John



Follow-Ups from:
Nicholas Clark <nick@ccl4.org>
