use bytes pragma
Glenn Linderman wrote:
> On approximately 10/7/2008 7:05 AM, came the following characters from
> the keyboard of David Nicol:
>> On Mon, Oct 6, 2008 at 11:04 PM, Glenn Linderman <perl@nevcal.com> wrote:
>>> \w is meaningless on binary data, for example, although character
>>> classes (could be called byte classes) could still be useful without
>>> character semantics.
>>
>> Let's say one is faced with a legacy delimited file that uses 0xFF for
>> a separator. Running
>>
>> use bytes;
>> @strings = $data =~ /(\w+)/g;
>>
>> could be handy.
>
>
> I guess your legacy delimited file is intended to be ASCII text, with
> each string delimited by 0xFF, but that is only a guess, since you
> didn't make it clear.
>
> @strings = split ( /\xFF/, $data )
>
> would do the same job, be independent of "use bytes;", and allow for
> punctuation and control characters in the @strings. You didn't state
> that the @strings should contain only alphanumerics, but your code does.
> Of course, even if the @strings are supposed to only contain
> alphanumerics, your code would treat punctuation and control characters
> as additional delimiters and not only ignore the error case, but make it
> impossible to detect without reexamining $data. My code would treat
> only \xFF as delimiters (per your specification), and then additional
> code could be written to check the resulting @strings for validity as
> appropriate.
>
> You'll need to contrive a more useful, and more completely specified
> example to be convincing.
>
>
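To make the difference between the two quoted approaches concrete, here is a minimal sketch. The sample record in $data is hypothetical (it is not from the thread), and the second field deliberately contains a hyphen to show how the \w+ match silently splits it while split on \xFF keeps it whole.

```perl
use strict;
use warnings;

# Hypothetical record: three fields separated by the byte 0xFF.
# The middle field contains punctuation on purpose.
my $data = "alpha\xFFfoo-bar\xFFgamma";

# Glenn's approach: only 0xFF is treated as a delimiter.
my @by_split = split /\xFF/, $data;

# David's approach: runs of word characters, matched under "use bytes".
my @by_word;
{
    use bytes;
    @by_word = $data =~ /(\w+)/g;
}

printf "split: %d fields (%s)\n", scalar @by_split, join ", ", @by_split;
printf "\\w+:   %d fields (%s)\n", scalar @by_word,  join ", ", @by_word;
```

With this input the split version yields three fields and leaves "foo-bar" available for later validation, whereas the \w+ version yields four and gives no sign that the hyphen was ever there.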
I don't want to get embroiled in changes to this pragma at this time.
I notice that perluniintro says "...the "bytes" pragma and its only
defined function "length()"...". If I understand this correctly, it
supports Glenn's position as to the original intent of this pragma, but
it is also wrong, as the pragma defines a number of other functions,
such as substr().
The code I have looked at is permeated with hooks to make this pragma
work as if one were in the C locale.
So I think it is unwise to change it at this time. I wonder how often
this pragma is used in practice.
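As an illustration of the point about length() and substr() above, the following is a small sketch showing that the bytes namespace offers more than length(). The sample string is an assumption chosen for the example, and the module is loaded with "use bytes ();" so that byte semantics are not switched on for the surrounding code.

```perl
use strict;
use warnings;
use bytes ();   # load the bytes:: functions without enabling byte semantics

# Sample string (an assumption for this sketch): U+263A followed by "abc".
# U+263A takes three bytes in UTF-8, Perl's internal encoding for this string.
my $s = "\x{263A}abc";

my $chars  = length($s);                # counts characters
my $octets = bytes::length($s);         # counts bytes of the internal encoding
my $first2 = bytes::substr($s, 0, 2);   # first two bytes, not two characters

print "characters: $chars\n";
print "bytes:      $octets\n";
print "bytes::substr returned ", length($first2), " byte(s)\n";
```

Note that bytes::substr() here cuts through the middle of a multi-byte character, which is exactly the kind of behaviour that makes the pragma risky on text data.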
- References to:
-
karl williamson <public@khwilliamson.com>
Glenn Linderman <perl@NevCal.com>
"David Nicol" <davidnicol@gmail.com>
Glenn Linderman <perl@NevCal.com>