
Need advice on gotchas on upgrading unicode db to 5.1



I'm working to bring the Unicode database in Perl up to the latest
version, 5.1.  Parts of it were upgraded earlier, but not all, and that
presents a problem.

In particular the Property Value Aliases were not upgraded.  These 
include short names for various properties.  There were a number of new 
scripts defined for 5.1, such as for the Lydian language.  mktables 
looks for the abbreviation for a property and creates a file using that 
name.  If no abbreviation is found, it uses the full name.  This means 
that all the new scripts in 5.1 have been stored using their full names 
instead of their accepted abbreviations, since our list of abbreviations 
was out of date.  With the abbreviations brought up to date, mktables 
generates files using those instead of the full names.  Same content, 
different name.  So, for example, it generates Lydi.pl instead of 
Lydian.pl.
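
To make the naming rule concrete, here is a rough sketch of it.  mktables 
itself is a Perl script and this is not its actual code; the function 
name, the dictionaries, and the Latin/Greek entries are only 
illustrative assumptions.  The Lydian to Lydi mapping is the one from 
the example above.

```python
# Hypothetical excerpt of an alias table: full script name -> abbreviation.
# The "old" table stands in for our out-of-date list, which lacks the
# scripts added in 5.1; the "new" table includes Lydian.
OLD_ALIASES = {"Latin": "Latn", "Greek": "Grek"}
NEW_ALIASES = {**OLD_ALIASES, "Lydian": "Lydi"}


def table_file_name(full_name, aliases):
    """Use the abbreviation if one is known, else fall back to the full name."""
    return aliases.get(full_name, full_name) + ".pl"


print(table_file_name("Lydian", OLD_ALIASES))  # falls back to the full name
print(table_file_name("Lydian", NEW_ALIASES))  # uses the abbreviation
```

With the old table the fallback produces Lydian.pl; with the updated 
table the same script comes out as Lydi.pl, which is exactly the rename 
I'm asking about.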

Since mktables.lst was not generated, these files haven't been listed 
anywhere, and since the documentation was not upgraded, they aren't 
documented.  But I suppose it is possible that some programmer has 
noticed their existence way down in lib/unicore/lib/gc_sc and is using 
them.  I don't recall these being documented as a public interface.  So 
I want to know: is it OK to change their names, should I create 
duplicate files for these 8 files, or something else?

Relatedly, as of 5.1 Unicode has decided to capitalize its preferred 
names for decomposition types, which we store in lib/unicore/lib/dt.  
This means that those file names would also change, for example from 
sqr.pl to Sqr.pl.  I can hack up mktables to always lowercase these, 
but is it necessary?
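
The lowercasing hack would amount to something like the following 
sketch.  Again, this is an illustration in Python rather than the real 
mktables code, and the function name and the force_lower flag are 
hypothetical.

```python
def dt_file_name(preferred_name, force_lower=True):
    """Build a decomposition-type file name, optionally forcing lower case."""
    base = preferred_name.lower() if force_lower else preferred_name
    return base + ".pl"


print(dt_file_name("Sqr"))                     # keeps the pre-5.1 style name
print(dt_file_name("Sqr", force_lower=False))  # follows the 5.1 capitalization
```

Forcing lower case keeps the existing file names stable; leaving the 
capitalization alone tracks Unicode's preferred names at the cost of a 
rename.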

Thanks


Follow-Ups from:
demerphq <demerphq@gmail.com>
