Data::Rlist - A lightweight data language for Perl and C++
    use Data::Rlist;
File and string I/O for any Perl data $thing:
    ### Compile data as text.
                  WriteData $thing, $filename;  # compile data into file
                  WriteData $thing, \$string;   # compile data into buffer
    $string_ref = WriteData $thing;             # ditto
    $string     = OutlineData $thing;           # compile printable text
    $string     = StringizeData $thing;         # compile text in a compact form (no newlines)
    $string     = SqueezeData $thing;           # compile text in a super-compact form (no whitespace)
    ### Parse data from text.
    $thing      = ReadData $filename;           # parse data from file
    $thing      = ReadData \$string;            # parse data from string buffer
ReadData, WriteData etc. are auto-exported functions. Alternatively, call the package functions directly:
    ### Qualified functions to parse text.
    $thing      = Data::Rlist::read($filename);
    $thing      = Data::Rlist::read($string_ref);
    $thing      = Data::Rlist::read_string($string_or_string_ref);
    ### Qualified functions to compile data into text.
                  Data::Rlist::write($thing, $filename);
    $string_ref = Data::Rlist::write_string($thing);
    $string     = Data::Rlist::write_string_value($thing);
    ### Print data to STDOUT.
    PrintData $thing;
The object-oriented interface:
    ### For objects the '-output' attribute refers to a string buffer or is a filename.
    ### The '-data' attribute defines the value or reference to be compiled into text.
    $object     = new Data::Rlist(-data => $thing, -output => \$target);
    $string_ref = $object->write;           # compile into $target, return \$target
    $string_ref = $object->write_string;    # compile into new string ($target not touched)
    $string     = $object->write_string_value; # ditto, but return string value
    ### Print data to STDOUT.
    print $object->write_string_value;
    print ${$object->write};                # returns \$target
    ### Set output file and write $thing to disk.
    $object->set(-output => ".foorc");
    $object->write;                         # write "./.foorc", return 1
    $object->write(".barrc");               # write "./.barrc" (the filename overrides -output)
    ### The '-input' attribute defines the text to be compiled, either as
    ### string reference or filename.
    $object->set(-input => \$input_string); # assign some text
    $thing      = $object->read;            # parse $input_string into Perl data
    $thing      = $object->read($other_string); # parse $other_string (the argument overrides -input)
    $object->set(-input => ".foorc");       # assign some input file
    $foorc      = $object->read;            # parse ".foorc"
    $barrc      = $object->read(".barrc");  # parse some other file
    $thing      = $object->read(\$string);  # parse some string buffer
    $thing      = $object->read_string($string_or_ref); # ditto
Create deep-copies of any Perl data. The metaphor ``keelhaul'' vividly connotes that $thing is stringified, then compiled back:
    ### Compile a value or ref $thing into text, then parse back into data.
    $reloaded   = KeelhaulData $thing;
    $reloaded   = Data::Rlist::keelhaul($thing);
    $object     = new Data::Rlist(-data => $thing);
    $reloaded   = $object->keelhaul;
Do deep-comparisons of any Perl data:
    ### Deep-compare $a and $b and get a description of all type/value differences.
    @diffs      = CompareData($a, $b);
For more information see compile, keelhaul, and deep_compare.
Random-Lists (Rlist) is a tag/value text format, which can ``stringify'' any data structure in 7-bit ASCII text. The basic types are lists and scalars. The syntax is similar, but not identical, to Perl's. For example,
    ( "hello", "world" )
    { "hello" = "world"; }
designates two lists, the first of which is sequential, the second associative. The format...
- allows the definition of hierarchical and constant data,
- has no user-defined types, no keywords, no variables,
- has no arithmetic expressions,
- uses 7-bit-ASCII character encoding and escape sequences,
- uses C-style numbers and strings,
- has an extremely minimal syntax implementable in any programming language and system.
You can write any Perl data structure into files as legible text. As with CSV, the lexical overhead of Rlist is minimal: files are merely data.
You can read compiled texts back in Perl and C++ programs. No information will be lost between different programming languages, and floating-point numbers keep their precision.
You can also compile structured CSV text from Perl data, using special functions from this package that will keep numbers precise and properly quote strings.
Since Rlist has no user-defined types the data is structured out of simple scalars and lists. It is conceivable, however, to develop a simple type system and store type information along with the actual data. Otherwise the data structures are tacit agreements between the users of the data. See also the implementation notes for Perl and C++.
Rlist text uses the 7-bit-ASCII character set. The 95 printable character codes 32 to 126 occupy one character. Codes 0 to 31 and 127 to 255 require four characters each: the \ escape character followed by the octal code number. For example, the German umlaut character ü (252) is translated into \374. The following codes are exceptions:
    ASCII               ESCAPED AS
    -----               ----------
      9 tab               \t
     10 linefeed          \n
     13 return            \r
     34 quote     "       \"
     39 quote     '       \'
     92 backslash \       \\
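The escaping rules above can be sketched in a few lines of Perl. This is only an illustration built from the table and the octal rule; escape7_sketch is a hypothetical name, not the module's own escape7, and it assumes the input holds characters with codes 0 to 255.

```perl
use strict;
use warnings;

# Named escapes from the table above, keyed by character.
my %named = (
    "\t" => '\t', "\n" => '\n', "\r" => '\r',
    '"'  => '\"', "'"  => "\\'", "\\" => '\\\\',
);

# Hypothetical helper (not Data::Rlist::escape7): escape a string of
# characters 0..255 into printable 7-bit ASCII.
sub escape7_sketch {
    my ($text) = @_;
    # Named escapes first, then three-digit octal for codes outside 32..126.
    $text =~ s{(["'\\\t\n\r])|([^\x20-\x7E])}
              {defined $1 ? $named{$1} : sprintf('\\%03o', ord $2)}ge;
    return $text;
}

print escape7_sketch("T\xFCr\n"), "\n";    # ü has code 252, octal 374
```

Each escaped character costs either two characters (named escape) or four (backslash plus three octal digits), which matches the sizes stated above.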
Values are either scalars, array elements or the value of a pair. Each value is constant.
The default scalar value is the empty string "".  So in Perl undef is compiled into "".
Numeric constants adhere to the IEEE 754 syntax for integer- and floating-point numbers (i.e., the same lexical conventions as in C and C++ apply).
String constants consisting only of [a-zA-Z_0-9-/~:.@] characters ``look like identifiers'' (aka symbols) and need not be quoted. Otherwise string constants follow the C language lexicography: they must be placed in double-quotes (single-quotes are not allowed). Quoted strings are also escaped (i.e., characters are converted to the input character set of 7-bit ASCII).
You can define a string using a line-oriented form of quoting based on the UNIX shell here-document syntax and RFC 111. Multiline quoted strings can be expressed with
    <<DELIMITER
Following the sigil << an identifier specifies how to terminate the string scalar. The value of the scalar will be all lines following the current line down to the line starting with the delimiter (i.e., the delimiter must be at column 1). There must be no space between the sigil and the identifier.
EXAMPLES
Quoted strings:
    "Hello, World!"
Unquoted strings (symbols, identifiers):
    foobar   cogito.ergo.sum   Memento::mori
Here-document strings:
    <<hamlet
    "This above all: to thine own self be true". - (Act I, Scene III).
    hamlet
Integers and floats:
    38   10e-6   -.7   3.141592653589793
For more information see is_symbol, is_number and escape7.
We have two types of lists: sequential (aka array) and associative (aka map, hash, dictionary).
EXAMPLES
Arrays:
    ( 1, 2, ( 3, "Audiatur et altera pars!" ) )
Maps:
    {
        key = value;
        standalone-key;
        Pi = 3.14159;
        "meta-syntactic names" = (foo, bar, "lorem ipsum", Acme, ___);
        var = {
            log = {
                messages = <<LOG;
    Nov 27 21:55:04 localhost kernel: TSC appears to be running slowly. Marking it as unstable
    Nov 27 22:34:27 localhost kernel: Uniform CD-ROM driver Revision: 3.20
    Nov 27 22:34:27 localhost kernel: Loading iSCSI transport class v2.0-724.<6>PNP: No PS/2 controller found. Probing ports directly.
    Nov 27 22:34:27 localhost kernel: wifi0: Atheros 5212: mem=0x26000000, irq=11
    LOG
            };
        };
    }
Binary data can be represented as a base64-encoded string, or as a here-document string. For example,
    use MIME::Base64;
    $str = encode_base64($binary_buf);
The result $str will be a string broken into lines of no more than 76 characters each, with every line terminated by a newline "\n". Here is a complete Perl program that creates a file random.rls:
    use MIME::Base64;
    use Data::Rlist;
    our $binary_data = join('', map { chr(int rand 256) } 1..300);
    our $sample = { random_string => encode_base64($binary_data) };
    WriteData $sample, 'random.rls';
These few lines create a file random.rls containing text like the following:
    {
        random_string = <<___
    w5BFJIB3UxX/NVQkpKkCxEulDJ0ZR3ku1dBw9iPu2UVNIr71Y0qsL4WxvR/rN8VgswNDygI0xelb
    aK3FytOrFg6c1EgaOtEudmUdCfGamjsRNHE2s5RiY0ZiaC5E5XCm9H087dAjUHPtOiZEpZVt3wAc
    KfoV97kETH3BU8/bFGOqscCIVLUwD9NIIBWtAw6m4evm42kNhDdQKA3dNXvhbI260pUzwXiLYg8q
    MDO8rSdcpL4Lm+tYikKrgCih9UxpWbfus+yHWIoKo/6tW4KFoufGFf3zcgnurYSSG2KRLKkmyEa+
    s19vvUNmjOH0j1Ph0ZTi2pFucIhok4krJi0B5yNbQStQaq23v7sTqNom/xdRgAITROUIoel5sQIn
    CqxenNM/M4uiUBV9OhyP
    ___
    ;
    }
Note that WriteData uses the predefined "default" configuration, which enables here-doc strings. See also the MIME::Base64 manpage.
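To get the binary data back, the string has to be decoded again after reading the file. The following sketch only exercises the MIME::Base64 round trip on an in-memory buffer; the Rlist file I/O is left out, and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

my $binary_data = join('', map { chr($_ % 256) } 0 .. 299);   # 300 sample bytes
my $encoded     = encode_base64($binary_data);                # multi-line text
my $decoded     = decode_base64($encoded);                    # original bytes

# Every encoded line holds at most 76 characters before its newline.
my @too_long = grep { length($_) > 76 } split /\n/, $encoded;

print "round trip ok\n" if $decoded eq $binary_data && !@too_long;
```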
Rlist text can define embedded Perl programs, called nanoscripts. The embedded program text has the form of a here-document with the special delimiter "perl". After the Rlist text has been parsed you call evaluate_nanoscripts to eval all embedded Perl in the order of definition. The function arranges that within the eval
- the $root variable refers to the root of the input, as unblessed array- or hash-reference;
- the $this variable refers to the array or hash that stores the currently eval'd nanoscript;
- the $where variable stores the name of the key, or the index, within $this.
The nanoscript can use this information to orient itself within the parsed data, or even to modify the data in-place. The result of eval'ing will replace the nanoscript text. You can also eval the embedded Perl code programmatically, using the nanoscripts and result functions.
EXAMPLES
Simple example of an Rlist text that hosts Perl code:
    (<<perl)
    print "Hello, World!";
    perl
Here is a more complex example that defines a list of nanoscripts, and evaluates them:
    use Data::Rlist;
    $data = join('', <DATA>);
    $data = EvaluateData \$data;
    __END__
    ( <<perl, <<perl, <<perl, <<perl )
    print "Hello World!\n"          # english
    perl
    print "Hallo Welt!\n"           # german
    perl
    print "Bonjour le monde!\n"     # french
    perl
    print "Olá mundo!\n"            # portuguese
    perl
When we execute the above script the following output is printed before the script exits:
    Hello World!
    Hallo Welt!
    Bonjour le monde!
    Olá mundo!
Note that when the Rlist text after __END__ is placed in some_file, we can call EvaluateData("some_file") for the same effect. The next example modifies the parsed data in place. Imagine a file this_file_modifies_itself with the following content:
    ( <<perl )
    ReadData(\\'{ foo = bar; }');
    perl
When we parse this file using
    $data = ReadData("this_file_modifies_itself");
$data is assigned the following Perl value
    [ "ReadData(\\'{ foo = bar; }');\n" ]
Next we call Data::Rlist::evaluate_nanoscripts() to ``morph'' this value into
    [ { 'foo' => 'bar' } ]
The same effect can be achieved in just one call
    $data = EvaluateData("this_file_modifies_itself");
Rlist supports multiple forms of comments: // or # single-line-comments, and /* */ multi-line-comments. You may use all three forms at will.
The core functions to cultivate package objects are new, dock, set and get. When a regular package function is called in object context some omitted arguments are read from object attributes. This is true for the following functions: read, write, read_string, write_string, read_csv, write_csv, read_conf, write_conf and keelhaul.
Unless called in object context, the first argument is not a Data::Rlist reference and has the function's regular meaning: read expects an input file or string, write the data to compile, etc.
Create a Data::Rlist object from the hash ATTRIBUTES. For example,
    $self = Data::Rlist->new(-input => 'this.dat',
                             -data => $thing,
                             -output => 'that.dat');
For this object the call $self->read() reads from this.dat, and $self->write() writes any Perl data $thing to that.dat.
REGULAR OBJECT ATTRIBUTES
-input => INPUT
-filter => FILTER
-filter_args => FILTER-ARGS
Defines what Rlist text to parse and how to preprocess an input file. INPUT is a filename or string reference. FILTER can be 1 to select the standard C preprocessor cpp. These attributes are applied by read, read_string, read_conf and read_csv.
-data => DATA
-options => OPTIONS
-output => OUTPUT
Defines the Perl data to be compiled into text (DATA), how it shall be compiled (OPTIONS) and where to store the compiled text (OUTPUT). When OUTPUT is a string reference the compiled text will be stored in that string. When OUTPUT is undef a new string is created. When OUTPUT is a string value it is a filename. These attributes are applied by write, write_string, write_conf, write_csv and keelhaul.
-header => HEADER
Defines an array of text lines, each of which will be prefixed by a # and then written at the top of the output file.
-delimiter => DELIMITER
Defines the field delimiter for .csv-files. Applied by read_csv and read_conf.
-columns => STRINGS
Defines the column names for .csv-files to be written into the first line.
ATTRIBUTES THAT MASQUERADE PACKAGE GLOBALS
The attributes listed below set new values for package globals for the time an object method runs.
-InputRecordSeparator => FLAG
Masquerades $/, which affects how lines are read and written to and from Rlist- and CSV-files. You may also set $/ by yourself. See perlport and perlvar.
-MaxDepth => INTEGER
-SafeCppMode => FLAG
-RoundScientific => FLAG
Masquerade $Data::Rlist::MaxDepth, $Data::Rlist::SafeCppMode and $Data::Rlist::RoundScientific.
-EchoStderr => FLAG
Print read errors and warning messages to STDERR (default: off).
-DefaultCsvDelimiter => REGEX
-DefaultConfDelimiter => REGEX
Masquerades $Data::Rlist::DefaultCsvDelimiter and $Data::Rlist::DefaultConfDelimiter. These globals define the default regexes to use when the -options attribute does not specify the "delimiter" regex. Applied by read_csv and read_conf.
-DefaultConfSeparator => STRING
Masquerades $Data::Rlist::DefaultConfSeparator, the default string to use when the -options attribute does not specify the "separator" string. Applied by write_conf.
Localize object SELF within the package and run SUB. This means that some of SELF's attributes masquerade a few package globals for the time SUB runs. SELF then locks the package, and $Data::Rlist::Locked is greater than 0.
Reset or initialize object attributes, then return SELF. Each ATTRIBUTE is a name/value-pair. See new for a list of valid names. For example,
    $obj->set(-input => \$str, -output => 'temp.rls', -options => 'squeezed');
Get some attribute NAME from object SELF. Returns DEFAULT unless NAME exists. The require method has no default value, hence it dies unless NAME exists. has returns true when NAME exists, false otherwise. For NAME the leading hyphen is optional. For example,
    $self->get('foo');          # returns $self->{-foo} or undef
    $self->get(-foo=>);         # ditto
    $self->get('foo', 42);      # returns $self->{-foo} or 42
Parse data from INPUT, which specifies some Rlist-text. See also errors, write.
PARAMETERS
INPUT shall be either
- some Rlist object created by new,
- a string reference, in which case read and read_string parse Rlist text from it,
- a string scalar, in which case read assumes a file to parse.
See open_input for the FILTER and FILTER-ARGS parameters, which are used to preprocess an input file. When an input file cannot be open'd and flock'd this function dies. When INPUT is an object, arguments for FILTER and FILTER-ARGS, if given, override the -filter and -filter_args attributes.
RESULT
The parsed data as array- or hash-reference, or undef if there was no data. The latter may also be the case when the file consists only of comments/whitespace.
NOTES
This function may die. Dying is Perl's mechanism to raise exceptions, which can be caught with eval. For example,
    my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';
This code fragment traps the die exception, so that eval returns undef or the result of calling hostname. The following example uses eval to trap exceptions thrown by read:
    $object = new Data::Rlist(-input => $thingfile);
    $thing = eval { $object->read };
    unless (defined $thing) {
        if ($object->errors) {
            print STDERR "$thingfile has syntax errors"
        } else {
            print STDERR "$thingfile not found, is locked or empty"
        }
    } else {
        # Can use $thing
            .
            .
    }
Parse data from INPUT, which specifies some comma-separated-values (CSV) text. Both functions
- read data from strings or files,
- use an optional delimiter,
- ignore delimiters in quoted strings,
- ignore empty lines,
- ignore lines beginning with #.
read_conf is a variant of read_csv dedicated to configuration files. Such files consist of lines of the form
    key = value
PARAMETERS
For INPUT see read. For FILTER, FILTER-ARGS see open_input.
OPTIONS can be used to override the "delimiter" regex. For example, a delimiter of '\s+' splits the line at horizontal whitespace into multiple values (while respecting quoted strings). For read_csv the delimiter defaults to '\s*,\s*', and for read_conf to '\s*=\s*'. See also write_csv and write_conf.
RESULT
Both functions return a list of lists. Each embedded array defines the fields in a line.
EXAMPLES
Un/quoting of values happens implicitly. Given a file db.conf
    # Comment
    SERVER      = hostname
    DATABASE    = database_name
    LOGIN       = "user,password"
the call $opts=ReadConf("db.conf") assigns
    [ [ 'SERVER', 'hostname' ],
      [ 'DATABASE', 'database_name' ],
      [ 'LOGIN', 'user,password' ]
    ]
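How a single configuration line ends up as a two-element array can be sketched as follows. This is not the module's parser: split_conf_line_sketch is a hypothetical helper that splits at the first delimiter match only and strips one pair of surrounding double-quotes, so unlike read_conf it would be confused by a delimiter inside a quoted key.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Rlist: split one configuration line
# into [ key, value ] at the first match of the delimiter regex.
sub split_conf_line_sketch {
    my ($line, $delimiter) = @_;
    $delimiter = '\s*=\s*' unless defined $delimiter;
    return undef if $line =~ /^\s*(?:#.*)?$/;   # empty line or #-comment
    $line =~ s/^\s+|\s+$//g;                    # trim the line
    my @fields = split /$delimiter/, $line, 2;
    s/^"(.*)"$/$1/ for @fields;                 # unquote each field
    return \@fields;
}

my @lines = ('# Comment', 'SERVER      = hostname', 'LOGIN       = "user,password"');
my @rows  = grep { defined $_ } map { split_conf_line_sketch($_) } @lines;
print join(' | ', @$_), "\n" for @rows;
```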
The WriteConf function can be used to create or update the configuration:
    push @$opts, [ 'MAGIC VALUE' => 3.14_15 ];
    WriteConf($opts, 'db.conf', { precision => 2 });
This writes to db.conf:
    SERVER = hostname
    DATABASE = database_name
    LOGIN = "user,password"
    "MAGIC VALUE" = 3.14
Calls read to parse Rlist language productions from the string or string-reference INPUT. When INPUT is an object do this for its -input attribute.
Return the last result of calling read, which is either undef or some array- or hash-reference. When SELF is passed as object reference, returns the result that occurred the last time SELF had called read.
In list context return an array of nanoscripts defined by the last call to read. When SELF is passed return this information for the last time SELF had called read. The result has the form:
    ( [ $hash_or_array_ref, $key_or_index ], # 1st nanoscript
      [ $hash_or_array_ref, $key_or_index ], # 2nd nanoscript
        .
        .
        .
    )
In scalar context return a reference to the above. This information defines the location of all embedded Perl scripts within the result, and can be used to eval them programmatically. See also result, evaluate_nanoscripts.
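Evaluating the scripts programmatically amounts to walking this list and storing each result back at its location. The sketch below uses hand-built data instead of a real call to read and nanoscripts, so $root and @nanoscripts are stand-ins that merely follow the structure shown above.

```perl
use strict;
use warnings;

# Hand-built stand-in for parsed data that holds two script texts.
my $root = [ "1 + 2\n", { answer => "6 * 7\n" } ];

# Location list in the form shown above: [ container, key-or-index ].
my @nanoscripts = ( [ $root, 0 ], [ $root->[1], 'answer' ] );

for my $entry (@nanoscripts) {
    my ($this, $where) = @$entry;
    if (ref($this) eq 'ARRAY') {
        $this->[$where] = eval $this->[$where];   # replace text by its result
    } else {
        $this->{$where} = eval $this->{$where};
    }
}

print "$root->[0] $root->[1]{answer}\n";
```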
Evaluates all nanoscripts defined by the last call to read. When called as method evaluates the nanoscripts defined by the last time SELF had called read. Returns the number of scripts or 0 if none were available. Each script is replaced by the result of eval'ing it. (For details and examples see Embedded Perl Code (Nanoscripts).)
In list context returns a list of compile-time messages that occurred in the last call to read. In scalar context returns an array reference. When a package object SELF is passed returns the information for the last time SELF had called read.
Returns the number of syntax errors and warnings that occurred in the last call to read. When called as method returns the number that occurred the last time SELF had called read.
Example:
    use Data::Rlist;
    our $data = ReadData 'things.rls';
    if (Data::Rlist::errors() || Data::Rlist::warnings()) {
        print join("\n", Data::Rlist::messages())
    } else {
        # Ok, $data is an array- or hash-reference.
        die unless $data;
    }
Returns the number of times the last compile violated $Data::Rlist::MaxDepth. When called as method returns the information for the last time SELF had called compile.
Returns true when the last call to parse yielded undef, because there was nothing to parse. When called as method returns the information for the last time SELF had called parse.
Transliterates Perl data into Rlist text and writes the text to a file or string buffer. write is auto-exported as WriteData.
PARAMETERS
DATA is either an object generated by new, or any Perl data including undef. In case of an object the actual DATA value is defined by its -data attribute. (When -data refers to another Rlist object, this other object is invoked.)
OUTPUT defines the output location, as filename, string-reference or undef. When undef the function allocates a string and returns a reference to it. OUTPUT defaults to the -output attribute when DATA defines an object.
OPTIONS define how to compile DATA: when undef or "fast" uses compile_fast, when "perl" uses compile_Perl, otherwise compile. Defaults to the -options attribute when DATA is an object.
HEADER is a reference to an array of strings that shall be printed literally at the top of an output file. Defaults to the -header attribute when DATA is an object.
RESULT
When write creates a file it returns 0 for failure or 1 for success. Otherwise it returns a string reference.
EXAMPLES
    $self = new Data::Rlist(-data => $thing, -output => $output);
    $self->write;   # Compile $thing into a file ($output is a filename)
                    # or string ($output is a string reference).
    Data::Rlist::write($thing, $output);    # ditto, but using the functional interface.
Write DATA as comma-separated-values (CSV) to file or string OUTPUT. write_conf writes configuration files where each line contains a tagname, a separator and a value.
PARAMETERS
DATA is either an object, or defines the data to be compiled as reference to an array of arrays. write_conf uses only the first and second fields. For example,
    [ [ a, b, c ],      # fields of line 1
      [ d, e, f, g ],   # fields of line 2
        .
        .
    ]
OPTIONS specifies the comma-separator ("separator"), how to quote ("auto_quote"), the linefeed ("eol_space") and the numeric precision ("precision"). COLUMNS specifies the column names to be written to the first line. Likewise the text from the HEADER array is written in the form of #-comments at the top of an output file.
RESULT
When a file was created both functions return 0 for failure, or 1 for success. Otherwise they return a reference to the compiled text.
EXAMPLES
Functional interface:
    use Data::Rlist;            # imports WriteCSV
    WriteCSV($thing, "foo.dat");
    WriteCSV($thing, "foo.dat", { separator => '; ' }, [qw/GBKNR VBKNR EL LaD/]);
    WriteCSV($thing, \$target_string);
    $string_ref = WriteCSV($thing);
Object-oriented interface:
    $object = new Data::Rlist(-data => $thing, -output => "foo.dat",
                              -options => { separator => '; ' },
                              -columns => [qw/GBKNR VBKNR EL LaD LaD_V/]);
    $object->write_csv;         # write $thing as CSV to foo.dat
    $object->write;             # write $thing as Rlist to foo.dat
    $object->set(-output => \$target_string);
    $object->write_csv;         # write $thing as CSV to $target_string
Stringify any Perl data and return a reference to the string. Works like write but always compiles to a new string to which it returns a reference. The default for OPTIONS will be "string".
Stringify any Perl data and return the compiled text string value. OPTIONS default to "default". For example,
    print "\n\$thing dumped: ", Data::Rlist::write_string_value($thing);
    $self = new Data::Rlist(-data => $thing);
    print "\nsame \$thing dumped: ", $self->write_string_value;
Do a deep copy of DATA according to OPTIONS. First the function compiles DATA to Rlist text, then restores the data from exactly this text. This process is called ``keelhauling data'', and allows us to
- adjust the accuracy of numbers,
- break circular-references,
- drop \*foo{THING}s,
- bring multiple data sets to the same, common basis.
It is useful (e.g.) when DATA had been hatched by some other code, and you don't know whether it is hierarchical, or if typeglob-refs nest inside. Then keelhaul it to clean it from its past. For example, to bring all numbers in
    $thing = { foo => [ [ .00057260 ], -1.6804e-4 ] };
to a certain accuracy, use
    $deep_copy_of_thing = Data::Rlist::keelhaul($thing, { precision => 4 });
All number scalars in $thing are rounded to 4 decimal places, so they're finally comparable as floating-point numbers. $deep_copy_of_thing is assigned the hash-reference
    { foo => [ [ 0.0006 ], -0.0002 ] }
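The effect of the precision option on this example can be reproduced without the module. The sketch below is an assumption about the outcome only, not about how keelhaul works internally: round_copy_sketch is a hypothetical helper that copies plain arrays and hashes recursively and rounds numbers with sprintf, rather than compiling to text and parsing back.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: deep-copy plain arrays, hashes and scalars, rounding
# every number to $precision decimal places on the way.
sub round_copy_sketch {
    my ($data, $precision) = @_;
    my $type = ref $data;
    return [ map { round_copy_sketch($_, $precision) } @$data ] if $type eq 'ARRAY';
    return +{ map { ($_ => round_copy_sketch($data->{$_}, $precision)) } keys %$data }
        if $type eq 'HASH';
    return 0 + sprintf("%.${precision}f", $data) if looks_like_number($data);
    return $data;
}

my $thing = { foo => [ [ .00057260 ], -1.6804e-4 ] };
my $copy  = round_copy_sketch($thing, 4);
print "$copy->{foo}[0][0] $copy->{foo}[1]\n";
```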
Likewise one can convert all floats to integers:
    $make_integers = new Data::Rlist(-data => $thing, -options => { precision => 0 });
    $thing_without_floats = $make_integers->keelhaul;
When keelhaul is called in an array context it also returns the text from which the copy had been built. For example,
    $deep_copy = Data::Rlist::keelhaul($thing);
    ($deep_copy, $rlist_text) = Data::Rlist::keelhaul($thing);
    $deep_copy = new Data::Rlist(-data => $thing)->keelhaul;
DETAILS
keelhaul neither dies nor returns an error, but be prepared for the following effects:
- ARRAY, HASH, SCALAR and REF references are compiled, whether blessed or not. (Since compiling does not store type information, keelhaul turns blessed references into unblessed ones.)
- IO, GLOB and FORMAT references are converted into strings.
- Depending on the compile options, CODE references are invoked, deparsed back into their function bodies, or dropped.
- Depending on the compile options floats are rounded, or are converted to integers.
- undef'd array elements are converted into the default scalar value "".
- Unless $Data::Rlist::MaxDepth is 0, anything deeper than $Data::Rlist::MaxDepth will be thrown away.
- When the data contains objects, no special methods are triggered to ``freeze'' and ``thaw'' the objects.
See also compile and deep_compare.
Return a predefined hash-reference of compile options. PREDEF-NAME defaults to "default".
Completes OPTIONS with BASICS, so that all pairs not already in OPTIONS are copied from BASICS. Always returns a new hash-reference, i.e., neither OPTIONS nor BASICS are modified. Both arguments define hashes or some predefined options name. BASICS defaults to "default". For example,
    $options = complete_options({ precision => 0 }, 'squeezed')
merges the predefined options for "squeezed" text with a numeric precision of 0 (converts all floats to integers).
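The completion rule itself is an ordinary hash merge in which the pairs from OPTIONS win. A minimal sketch, assuming plain hashes: the contents of %basics are invented for illustration and are not the module's real "squeezed" settings.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a predefined option set; the keys are illustrative.
my %basics  = (precision => 6, separator => ', ');
my %options = (precision => 0);

# Pairs in %options win; missing pairs are copied from %basics.
my $merged = { %basics, %options };

print "$merged->{precision} '$merged->{separator}'\n";
```

Because the merge builds a new anonymous hash, neither input hash is modified, which mirrors the guarantee stated above.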
Open/close Rlist text file or string INPUT for parsing. Used internally by read and read_csv.
PREPROCESSING
The function can preprocess the INPUT file using FILTER. Use the special value 1 to select the default C preprocessor (gcc -E -Wp,-C). FILTER-ARGS is an optional string of additional command-line arguments to be appended to FILTER. For example,
    my $foo = Data::Rlist::read("foo", 1, "-DEXTRA")
does not parse foo itself, but the output of the command
    gcc -E -Wp,-C -DEXTRA foo
Hence C preprocessor statements are now allowed within foo. For example,
    {
    #ifdef EXTRA
    #include "extra.rlist"
    #endif
        123 = (1, 2, 3);
        foobar = {
            .
            .
SAFE CPP MODE
This mode uses sed and a temporary file. It is enabled by setting $Data::Rlist::SafeCppMode to 1 (the default is 0). It protects single-line #-comments when FILTER begins with either gcc, g++ or cpp. open_input then additionally runs sed to convert all input lines beginning with whitespace plus the # character. Only the following cpp-commands are excluded, and only when they appear in column 1:
- #include and #pragma
- #define and #undef
- #if, #ifdef, #else and #endif.
For all other lines sed converts # into ##. This prevents the C preprocessor from evaluating them. Because of Perl's limited open function, which isn't able to dissolve long pipes, the invocation of sed requires a temporary file. The temporary file is created in the same directory as the input file. When you only use // and /* */ comments, however, this read mode is not required.
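The line conversion described here can be approximated in Perl itself. The actual sed expression is not shown in this document, so the following is only a sketch: protect_hash_comment_sketch is a hypothetical helper, and its list of excluded commands is taken literally from the list above.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's actual sed script: double the leading
# '#' of comment lines so that cpp leaves them alone.
sub protect_hash_comment_sketch {
    my ($line) = @_;
    # Preprocessor commands in column 1 pass through unchanged.
    return $line if $line =~ /^#(?:include|pragma|define|undef|ifdef|if|else|endif)\b/;
    $line =~ s/^(\s*)#/${1}##/;
    return $line;
}

print protect_hash_comment_sketch($_), "\n"
    for '#include "extra.rlist"', '# a comment', '    # indented comment';
```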
Lexical scanner. Called by parse to split the current line into tokens. lex reads # or // single-line-comment and /* */ multi-line-comment as regular white-spaces. Otherwise it returns tokens according to the following table:
    RESULT      MEANING
    ------      -------
    '{' '}'     Punctuation
    '(' ')'     Punctuation
    ','         Operator
    ';'         Punctuation
    '='         Operator
    'v'         Constant value as number, string, list or hash
    '??'        Error
    undef       EOF
lex appends a newline character to every here-doc line. For example,
        <<test1
        a
        b
        test1
is effectively read as "a\nb\n", which is the same value the equivalent here-doc has in Perl. So, not all strings can be encoded as a here-doc. For example, it might not be quite obvious to many programmers that "foo\nbar" cannot be expressed as a here-doc.
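The comparison with Perl is easy to check with Perl's own here-document syntax. The sketch below uses no Rlist code at all; it only shows that the value of a non-empty here-doc always ends with a newline.

```perl
use strict;
use warnings;

my $text = <<"test1";
a
b
test1

print $text eq "a\nb\n" ? "same\n" : "different\n";
```

Since the last line always carries its newline, a string such as "foo\nbar", which lacks the trailing newline, has no here-doc form.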
Read the next line of text from the current input. Return 0 if at_eof, otherwise return 1.
Return true if current input file/string is exhausted, false otherwise.
Read Rlist language productions from current input. This is a fast, non-recursive parser driven by the parser map %Data::Rlist::Rules, and fed by lex. It is called internally by read. parse returns an array- or hash-reference, or undef in case of parsing errors.
Build Rlist text from DATA:
- Reference-types SCALAR, HASH, ARRAY and REF are compiled into text, whether blessed or not.
- Reference-types CODE are compiled depending on the "code_refs" setting in OPTIONS.
- Reference-types GLOB (typeglob-refs), IO and FORMAT (file- and directory handles) cannot be dissolved, and are compiled into the strings "?GLOB?", "?IO?" and "?FORMAT?".
- undef'd values in arrays are compiled into the default Rlist scalar value "".
When FH is defined compile directly to this file and return 1. Otherwise build a string and return a reference to it. This is the compilation function called when the OPTIONS argument passed to write is not omitted, and is not "fast" or "perl".
Build Rlist text from DATA, as fast as actually possible with pure Perl:
- Reference-types SCALAR, HASH, ARRAY and REF are compiled into text, whether blessed or not.
- CODE, GLOB, IO and FORMAT references are compiled into the strings "?CODE?", "?GLOB?", "?IO?" and "?FORMAT?".
- undef'd values in arrays are compiled into the default Rlist scalar value "".
compile_fast is the default compilation function. It is called when you pass undef or "fast" in place of the OPTIONS parameter (see write, write_string). Since compile_fast considers no compile options it will not call code, round numbers, detect self-referential data etc. Also compile_fast always compiles into a unique package variable to which it returns a reference.
Like compile_fast, but does not compile Rlist text: it compiles DATA into Perl syntax, which can then be eval'd. This renders more compact and more exact output than Data::Dumper. For example, only strings are quoted. To enable this compilation function pass "perl" as the OPTIONS argument, or set the -options attribute of package objects to this string.
The utility functions in this section are generally useful when handling stringified data. Internally quote7, escape7, is_integer etc. apply precompiled regexes and precomputed ASCII tables. split_quoted and parse_quoted simplify Text::ParseWords. round and equal are working solutions for floating-point numbers. deep_compare is a smart function to ``diff'' two Perl variables. All these functions are very fast and mature.
Returns true when a scalar looks like a positive or negative integer constant. The function applies the compiled regex $Data::Rlist::REInteger.
Test for strings that look like numbers. is_number can be used to test whether a scalar looks like an integer/float constant (numeric literal). The function applies the compiled regex $Data::Rlist::REFloat. Note that it doesn't match
- leading or trailing whitespace,
- lexical conventions such as the "0b" (binary), "0" (octal), "0x" (hex) prefix to denote a number-base other than decimal,
- Perl's legible numbers, e.g. 3.14_15_92, and
- the IEEE 754 notations of Infinite and NaN.
See also
    $ perldoc -q "whether a scalar is a number"
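The exclusions listed above can be tried out with a stand-alone pattern. This is only a sketch: $re_float is a hypothetical stand-in, not the package's $Data::Rlist::REFloat, and looks_like_literal is an illustrative helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $Data::Rlist::REFloat: optional sign, decimal
# digits with an optional fraction, and an optional exponent.
my $re_float = qr/\A[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?\z/;

# Illustrative helper: true when the whole scalar is a decimal literal.
sub looks_like_literal {
    my $s = shift;
    return (defined($s) && $s =~ $re_float) ? 1 : 0;
}

for my $s ('42', '-3.14', '1e-6', ' 42', '0x1F', '3.14_15_92', 'NaN') {
    printf "%-14s %s\n", "'$s'",
        looks_like_literal($s) ? 'number' : 'not a number';
}
```

Because the pattern is anchored at both ends, surrounding whitespace, base prefixes, underscores and NaN all fail to match.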
Test for symbolic names. is_symbol can be used to test whether a scalar looks like a symbolic name. Such strings need not be quoted. Rlist defines symbolic names as a superset of C identifier names:
    [a-zA-Z_0-9]                    # C/C++ character set for identifiers
    [a-zA-Z_0-9\-/\~:\.@]           # Rlist character set for symbolic names
    [a-zA-Z_][a-zA-Z_0-9]*                  # match C/C++ identifier
    [a-zA-Z_\-/\~:@][a-zA-Z_0-9\-/\~:\.@]*  # match Rlist symbolic name
For example, names such as std::foo, msg.warnings, --verbose, calculation-info need not be quoted.
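The two "match" patterns shown above can be applied directly. In this sketch they are anchored to the whole string and the "@" is escaped to keep Perl from interpolating; looks_like_symbol and looks_like_c_identifier are illustrative helpers, not the package's is_symbol.

```perl
use strict;
use warnings;

# The two "match" patterns from the text, anchored to the whole string.
my $c_identifier = qr{\A[a-zA-Z_][a-zA-Z_0-9]*\z};
my $rlist_symbol = qr{\A[a-zA-Z_\-/\~:\@][a-zA-Z_0-9\-/\~:\.\@]*\z};

# Illustrative helpers (hypothetical names).
sub looks_like_c_identifier { my $s = shift; return $s =~ $c_identifier ? 1 : 0 }
sub looks_like_symbol       { my $s = shift; return $s =~ $rlist_symbol ? 1 : 0 }

for my $name ('foo_bar', 'std::foo', 'msg.warnings', '--verbose',
              'calculation-info', 'b u z', '3x') {
    printf "%-18s C identifier: %-3s  Rlist symbol: %s\n", $name,
        (looks_like_c_identifier($name) ? 'yes' : 'no'),
        (looks_like_symbol($name)       ? 'yes' : 'no');
}
```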
Returns true when a scalar is an integer, a number, a symbolic name or some quoted string.
The opposite of is_value. Such scalars will be turned into quoted strings by compile and compile_fast.
Converts TEXT into 7-bit-ASCII. All characters not in the set of the 95 printable ASCII characters are escaped. The following ASCII codes will be converted to escaped octal numbers, i.e. 3 digits prefixed by a backslash:
    0x00 to 0x1F
    0x80 to 0xFF
    " ' \
The difference between the two functions is that quote7 additionally places TEXT into double-quotes. For example, quote7(qq'"Früher Mittag\n"') returns "\"Fr\374her Mittag\n\"", while escape7 returns \"Fr\374her Mittag\n\"
Return quote7(TEXT) if is_random_text(TEXT); otherwise (TEXT defines a symbolic name or number) return TEXT.
Return unquote7(TEXT) when TEXT is enclosed by double-quotes; otherwise returns TEXT.
Combines recipes 1.11 and 1.12 from the Perl Cookbook. HERE-DOC-STRING shall be a here-document. The function checks whether each line begins with a common prefix, and if so, strips that off. If there is no common prefix it takes the amount of leading whitespace found on the first line and removes that much from each subsequent line.
Unless COLUMNS is defined returns the new here-doc-string. Otherwise, takes the string and reformats it into a paragraph having no line more than COLUMNS characters long. FIRSTTAB will be the indent for the first line, DEFAULTTAB the indent for every subsequent line. Unless passed, FIRSTTAB and DEFAULTTAB default to the empty string "".
Divide the string INPUT into a list of strings. DELIMITER is a regular expression specifying where to split (default: '\s+'). The functions won't split at DELIMITERs inside quotes, or which are backslashed.
parse_quoted works like split_quoted but additionally removes all quotes and backslashes from the split fields. Both functions effectively simplify the interface of Text::ParseWords. In an array context they return a list of substrings, otherwise the count of substrings. An empty array is returned in case of unbalanced double-quotes, e.g. split_quoted('foo,"bar').
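The difference between keeping and removing quotes can be seen with the core module itself. This sketch calls Text::ParseWords::parse_line directly; that split_quoted and parse_quoted map onto its KEEP argument in exactly this way is an assumption made for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $input = q("fee foo" bar);

# KEEP = 1 retains quotes and backslashes in the returned fields.
my @kept  = parse_line('\s+', 1, $input);

# KEEP = 0 removes them.
my @naked = parse_line('\s+', 0, $input);

print "kept:  ", join(' | ', @kept),  "\n";
print "naked: ", join(' | ', @naked), "\n";
```

Both calls yield two fields; only the first field differs, with or without its surrounding double-quotes.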
EXAMPLES
    sub split_and_list($) {
        my $i = 0;              # restart the field counter on each call
        print $i++, " '$_'\n" foreach split_quoted(shift);
    }
    split_and_list(q("fee foo" bar))
        0 '"fee foo"'
        1 'bar'
    split_and_list(q("fee foo"\ bar))
        0 '"fee foo"\ bar'
The default DELIMITER '\s+' handles newlines. split_quoted("foo\nbar\n") returns ('foo', 'bar', '') and hence can be used to split a large string of unchomp'd input lines into words:
    split_and_list("foo  \r\n bar\n")
        0 'foo'
        1 'bar'
        2 ''
The DELIMITER matches everywhere outside of quoted constructs, so in case of the default '\s+' you may want to remove leading/trailing whitespace. Consider
    split_and_list("\nfoo")
    split_and_list("\tfoo")
        0 ''
        1 'foo'
and
    split_and_list(" foo ")
        0 ''
        1 'foo'
        2 ''
parse_quoted additionally removes all quotes and backslashes from the split fields:
    sub parse_and_list($) {
        my $i = 0;              # restart the field counter on each call
        print $i++, " '$_'\n" foreach parse_quoted(shift);
    }
    parse_and_list(q("fee foo" bar))
        0 'fee foo'
        1 'bar'
    parse_and_list(q("fee foo"\ bar))
        0 'fee foo bar'
MORE EXAMPLES
String 'field\ one  "field\ two"':
    ('field\ one', '"field\ two"')  # split_quoted
    ('field one', 'field two')      # parse_quoted
String 'field\,one, field", two"' with a DELIMITER of '\s*,\s*':
    ('field\,one', 'field", two"')  # split_quoted
    ('field,one', 'field, two')     # parse_quoted
Split a large string $soup (mnemonic: slurped from a file) into lines, at LF or CR+LF:
    @lines = split_quoted($soup, '\r*\n');
Then transform all @lines by correctly splitting each line into ``naked'' values:
    @table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines
Here is some more complete code to parse a .csv-file with quoted fields and escaped commas:
    use Data::Rlist qw/:strings/;   # imports split_quoted and parse_quoted
    open my $fh, '<', "foo.csv" or die $!;
    local $/;                   # enable localized slurp mode
    my $content = <$fh>;        # slurp whole file at once
    close $fh;
    my @lines = split_quoted($content, '\r*\n');
    die q(unbalanced " in input) unless @lines;
    # one array-ref of ``naked'' fields per line
    my @table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines;
In essence this is what read_csv does. deep_compare allows you to test what split_quoted and parse_quoted return. For example, the following code shall never die:
    croak if deep_compare([split_quoted("fee fie foo")], ['fee', 'fie', 'foo']);
    croak if deep_compare( parse_quoted('"fee fie foo"'), 1);
equal returns true if NUM1 and NUM2 are equal to PRECISION number of decimal places (default: 6). For details see round.
Compare and round floating-point numbers NUM1 and NUM2 (as string- or number scalars).
When the "precision" compile option is defined, round is called during compilation on all
numbers.
Normally round will return a number in fixed-point notation. When the package-global $Data::Rlist::RoundScientific is true, however, round formats the number in either normal or exponential (scientific) notation, whichever is more appropriate for its magnitude. This differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included. Also, the decimal point is not included on whole numbers. For example, round(42) does not return 42.000000, and round(0.12) returns 0.12, not 0.120000.
MACHINE ACCURACY
One needs a function like equal to compare floats, because IEEE 754 single- and double-precision numbers are not exact, in contrast to the numbers they represent. On all machines non-integer numbers are only an approximation to the numeric truth, and the rounding applied after each operation makes floating-point arithmetic non-associative. For example, given three floats a, b and c, the result of (a+b)+c might differ from that of a+(b+c), although mathematically both are the same.
Each machine has its own accuracy, called the machine epsilon, which is the difference between 1 and the smallest exactly representable number greater than one. Most of the time floats can only be compared after they have been rounded to a certain number of decimal places. In general this is the case when two floats that result from a numeric operation are compared, but not two constants. (Constants are accurate through the lexical conventions of the language. The Perl and C syntaxes for numbers simply won't allow you to write down inaccurate numbers.)
See also recipes 2.2 and 2.3 in the Perl Cookbook.
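The idea of comparing floats only up to a number of decimal places can be sketched with sprintf. The helper roughly_equal is a hypothetical name and is not the package's equal; it merely formats both numbers in fixed-point notation and compares the resulting strings.

```perl
use strict;
use warnings;

# Illustrative helper, not the package's equal(): compare two numbers after
# formatting both in fixed-point notation with $places decimal places.
sub roughly_equal {
    my ($num1, $num2, $places) = @_;
    $places = 6 unless defined $places;
    my $s1 = sprintf("%.${places}f", $num1);
    my $s2 = sprintf("%.${places}f", $num2);
    return $s1 eq $s2 ? 1 : 0;
}

my $sum = 0.1 + 0.2;                        # result of a numeric operation
printf "%.17g vs %.17g\n", $sum, 0.3;       # shows the raw representations
print "0.1 + 0.2 equals 0.3 to 6 places\n"  if roughly_equal($sum, 0.3);
print "0.99 equals 0.991 to 2 places\n"     if roughly_equal(0.99, 0.991, 2);
print "0.99 differs from 0.991 at 3 places\n"
    unless roughly_equal(0.99, 0.991, 3);
```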
EXAMPLES
    CALL                    RETURNS NUMBER
    ----                    --------------
    round('0.9957', 3)       0.996
    round(42, 2)             42
    round(0.12)              0.120000
    round(0.99, 2)           0.99
    round(0.991, 2)          0.99
    round(0.99, 1)           1.0
    round(1.096, 2)          1.10
    round(+.99950678)        0.999510
    round(-.00057260)       -0.000573
    round(-1.6804e-6)       -0.000002
Compare and analyze two numbers, strings or references. Generates a list of messages describing exactly all unequal data. Hence, for any Perl data $a and $b one can assert:
    croak "$a differs from $b" if deep_compare($a, $b);
When PRECISION is defined all numbers in A and B are round'd before actually comparing them. When TRACE_FLAG is true, progress is traced.
RESULT
Returns an array of messages, each describing unequal data, or data that cannot be compared because of type- or value-mismatching. The array is empty when deep comparison of A and B found no unequal numbers or strings, and no mismatching types.
EXAMPLES
The result is line-oriented, and for each mismatch it returns a single message. For a simple example,
    Data::Rlist::deep_compare(undef, 1)
yields
    <<undef>> cmp <<1>>   stop! 1st undefined, 2nd defined (1)
Forks a process and waits for completion. The function will extract the exit-code, test whether the process died and print status messages on STDERR. fork_and_wait hence is a handy wrapper around the built-in system and exec functions. Returns an array of three values:
    ($exit_code, $failed, $coredump)
$exit_code is -1 when the program failed to execute (e.g. it wasn't found or the current user has insufficient rights). Otherwise $exit_code is between 0 and 255. When the program died on receipt of a signal (like SIGINT or SIGQUIT) then $signal stores it. When $coredump is true the program died and a core-file was written.
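How exit code, signal and core-dump flag are derived from a child's status can be sketched with the built-in system function and $?, as documented in perlvar. The helper run_and_decode is a hypothetical name and is not fork_and_wait itself; it also returns the signal number where fork_and_wait documents a $failed flag.

```perl
use strict;
use warnings;

# Run a child process and decode $? as described in perlvar.
# run_and_decode is an illustrative name, not part of Data::Rlist.
sub run_and_decode {
    my @command = @_;
    my $status  = system(@command);
    return (-1, 0, 0) if $status == -1;     # program could not be started
    my $exit_code = $? >> 8;                # 0..255
    my $signal    = $? & 127;               # non-zero: died on a signal
    my $coredump  = ($? & 128) ? 1 : 0;     # true: a core file was written
    return ($exit_code, $signal, $coredump);
}

# $^X is the running perl binary, so the example needs no other program.
my ($exit_code, $signal, $coredump) = run_and_decode($^X, '-e', 'exit 3');
print "exit=$exit_code signal=$signal coredump=$coredump\n";
```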
Concatenates and forms all TEXT strings into a symbolic name that can be used as a pathname. synthesize_pathname is a useful function to concatenate strings, thereby converting all characters that do not qualify as filename-characters into "_" and "-". The result can be used not only as file- or URL name, but at the same time also as hash key, database name etc.
Make compile round all numbers to PLACES decimal places, by calling round on each scalar that looks like a number. By default PLACES is undef, which means floats are not rounded.
Causes compile to masquerade $Data::Rlist::RoundScientific. See round.
Defines how compile shall treat CODE references. Legal values for TOKEN are 0 (the default), "call" and "deparse".
- 0 compiles subroutine references into the string "?CODE?".
- "call" calls the code, then compiles the return value.
- "deparse" serializes the code using B::Deparse (reproducing the Perl source).
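What the "call" and "deparse" treatments amount to can be shown with plain Perl and the core module B::Deparse. This is a sketch of the two operations on a single code reference; how compile embeds the results in Rlist text is not shown here.

```perl
use strict;
use warnings;
use B::Deparse;

my $code = sub { my $n = shift; return $n + 1 };

# "call": invoke the subroutine and use its return value.
my $called = $code->(41);

# "deparse": turn the subroutine back into Perl source text.
my $source = B::Deparse->new->coderef2text($code);

print "called:   $called\n";
print "deparsed: $source\n";
```

The deparsed text is a block in braces; its exact layout depends on the Perl version and the pragmas in effect.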
If enabled, compile internally uses multiple threads. Note that this can speed up compilation only on machines with at least COUNT CPUs.
If enabled strings with at least two newlines in them are written as
here-document, when possible.  To qualify as here-document a string has to have
at least two LFs ("\n"), one of which must terminate it.
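The stated condition can be written as a small predicate. The name qualifies_as_here_doc is hypothetical, and the sketch checks only the two rules given above; compile may apply further checks ("when possible").

```perl
use strict;
use warnings;

# Illustrative predicate: at least two LFs, one of which ends the string.
sub qualifies_as_here_doc {
    my $s = shift;
    my $linefeeds = ($s =~ tr/\n//);        # count LFs without modifying $s
    return ($linefeeds >= 2 && $s =~ /\n\z/) ? 1 : 0;
}

for my $s ("one line\n", "two\nlines\n", "no final\nline\nfeed") {
    (my $shown = $s) =~ s/\n/\\n/g;         # make LFs visible in the output
    printf "%-24s %s\n", $shown,
        qualifies_as_here_doc($s) ? 'here-document' : 'quoted string';
}
```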
When true (default) do not quote strings that look like identifiers (see is_symbol). When false quote all strings. Hash keys are not affected.
write_csv and write_conf interpret this flag differently: false means not to quote at all; true quotes only strings that don't look like numbers and that aren't yet quoted.
When NUMBER is greater than 0 use "eol_space" (linefeed) to split data into many lines. It will insert a linefeed after every NUMBERth array value.
If enabled, and "outline_data" is also enabled, prints { and } on distinct lines when
compiling Perl hashes with at least one pair.
The comma-separator string to be used by write_csv.  The default is ','.
Field-delimiter for read_csv. There is no default value. To read configuration files, for example, you may use '\s*=\s*' or '\s+'. To read CSV-files use e.g. '\s*[,;]\s*'.
The following options format the generated Rlist; normally you don't want to modify them:
Count of physical, horizontal TAB characters to use at the beginning of each line per indentation level. Defaults to 1. Note that we don't use blanks, because they blow up the size of generated text without measure.
End-of-line string to use (the linefeed). For example, legal values are "", " ", "\n", "\r\n" etc. The default is undef, which means to use the current value of $/. Note that this is a compile-option that only affects compile. When parsing files the builtin readline function is called, which uses $/.
String to write after ( and {, and before } and ) when compiling arrays and hashes.
Comma and semicolon strings, which shall be at least "," and ";". No matter what, compile will always print the "eol_space" string after the "semicolon_punct" string.
String to make up key/value-pairs. Defaults to " = ".
The OPTIONS parameter accepted by some package functions is either a hash-ref or the name of a predefined set:
- "default": Default if writing to a file.
- "string": Compact, no newlines/here-docs. Renders a ``string of data''.
- "outlined": Optimize the compiled Rlist for maximum readability.
- "squeezed": Very compact, no whitespace at all. For very large Rlists.
- "perl": Compile data in Perl syntax, using compile_Perl, not compile. The output then can be eval'd, but it cannot be read back.
- "fast": Compile data as fast as possible, using compile_fast, not compile.
All functions that define an OPTIONS parameter do implicitly call complete_options to complete the argument from one of the predefined sets, and additionally from "default". Therefore you can always define nothing, or a ``lazy subset of options''. For example,
    my $obj = new Data::Rlist(-data => $thing);
    $obj->write('thing.rls', { scientific => 1, precision => 8 });
Example:
    use Data::Rlist qw/:floats :strings/;
Imports maybe_quote7, quote7, escape7, unquote7, unescape7, unhere, is_random_text, is_number, is_symbol, split_quoted, and parse_quoted.
Imports predefined_options and complete_options.
Imports deep_compare, fork_and_wait and synthesize_pathname.
The following functions are implicitly imported into the caller's symbol table. (But you may say require Data::Rlist instead of use Data::Rlist to prohibit auto-import. See also perlmod.)
These are aliases for Data::Rlist::read, Data::Rlist::read_csv and Data::Rlist::read_conf.
Like ReadData but implicitly call Data::Rlist::evaluate_nanoscripts in case parsing was successful.
These are aliases for Data::Rlist::write, Data::Rlist::write_string, Data::Rlist::write_csv and Data::Rlist::write_conf. OPTIONS default to "default".
These are aliases for Data::Rlist::write_string_value. OutlineData applies the predefined "outlined" options, while StringizeData applies "string" and SqueezeData "squeezed". When specified, OPTIONS are merged into the predefined set. For example,
    print "\n\$thing: ", OutlineData($thing, { precision => 12 });
rounds all numbers in $thing to 12 digits.
An alias for
    print OutlineData(DATA, OPTIONS);
These are aliases for keelhaul and deep_compare. For example,
    use Data::Rlist;
        .
        .
    my($copy, $as_text) = KeelhaulData($thing);
String- and number values:
    "Hello, World!"
    foo                         # compiles to { 'foo' => undef }
    3.1415                      # compiles to { 3.1415 => undef }
Array values:
    (1, a, 4, "b u z")          # list of numbers/strings
    ((1, 2),
     (3, 4))                    # list of lists (2x2 matrix)
    ((1, a, 3, "foo bar"),
     (7, c, 0, ""))             # another list of lists
Here-document strings:
        $hello = ReadData(\<<HELLO)
        ( <<DEUTSCH, <<ENGLISH, <<FRANCAIS, <<CASTELLANO, <<KLINGON, <<BRAINF_CK )
    Hallo Welt!
    DEUTSCH
    Hello World!
    ENGLISH
    Bonjour le monde!
    FRANCAIS
    Ola mundo!
    CASTELLANO
    ~ nuqneH { ~ 'u' ~ nuqneH disp disp } name
    nuqneH
    KLINGON
    ++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++
    ..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
    BRAINF_CK
    HELLO
Compiles $hello as
    [ "Hallo Welt!\n", "Hello World!\n", "Bonjour le monde!\n", "Ola mundo!\n",
      "~ nuqneH { ~ 'u' ~ nuqneH disp disp } name\n",
      "++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++\n..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.\n" ]
Configuration object as hash:
    {
        contribution_quantile = 0.99;
        default_only_mode = Y;
        number_of_runs = 10000;
        number_of_threads = 10;
        # etc.
    }
Altogether:
    Metaphysic-terms =
    {
        Numbers =
        {
            3.141592653589793 = "The ratio of a circle's circumference to its diameter.";
            2.718281828459045 = <<___;
The mathematical constant "e" is the unique real number such that the value of
the derivative (slope of the tangent line) of f(x) = e^x at the point x = 0 is
exactly 1.
___
            42 = "The Answer to Life, the Universe, and Everything.";
        };
        Words =
        {
            ACME = <<Value;
A fancy-free Company [that] Makes Everything: Wile E. Coyote's supplier of equipment and gadgets.
Value
            <<Key = <<Value;
foo bar foobar
Key
[JARGON] A widely used meta-syntactic variable; see foo for etymology.  Probably
originally propagated through DECsystem manuals [...] in 1960s and early 1970s;
confirmed sightings go back to 1972. [...]
Value
        };
    };
The Random Lists (Rlist) syntax is inspired by NeXTSTEP's Property Lists. But Rlist is simpler, more readable and more portable. The Perl and C++ implementations are fast, stable and free. Markus Felten, with whom I worked for a few months on a project at Deutsche Bank, Frankfurt in summer 1998, drew my attention to Property lists. He had implemented a Perl variant of it (http://search.cpan.org/search).
The term ``Random'' underlines the fact that the language
- has four primitive/anonymous types;
- uses the list as its basic building block, which is combined at random with other lists.
Hence the term Random does not mean aimless or accidental. Random Lists are arbitrary lists.
The main difference between Data::Dumper and Data::Rlist is that scalars will be properly encoded as number or string. Data::Dumper writes numbers always as quoted strings, for example
    $VAR1 = {
                'configuration' => {
                                    'verbose' => 'Y',
                                    'importance_sampling_loss_quantile' => '0.04',
                                    'distribution_loss_unit' => '100',
                                    'default_only' => 'Y',
                                    'num_threads' => '5',
                                            .
                                            .
                                   }
            };
where Data::Rlist writes
    {
        configuration = {
            verbose = Y;
            importance_sampling_loss_quantile = 0.04;
            distribution_loss_unit = 100;
            default_only = Y;
            num_threads = 5;
                .
                .
        };
    }
As one can see Data::Dumper writes the data right in Perl syntax, which means the dumped text can be simply eval'd, and the data can be restored very fast. Rlists are not quite Perl-syntax: a dedicated parser is required. In return, Rlist text is portable and can be read from other programming languages such as C++.
With $Data::Dumper::Useqq enabled it was observed that Data::Dumper renders output significantly slower than compile. This is actually surprising, since Data::Rlist tests for each scalar whether it is numeric, and truly quotes/escapes strings. Data::Dumper quotes all scalars (including numbers), and it does not escape strings. This may also result in some odd behaviors. For example,
    use Data::Dumper;
    print Dumper "foo\n";
yields
    $VAR1 = 'foo
    ';
while
    use Data::Rlist;
    PrintData "foo\n"
yields
    { "foo\n"; }
Finally, Data::Rlist generates smaller files. With the default $Data::Dumper::Indent of 2 Data::Dumper's output is 4-5 times that of Data::Rlist's. This is because Data::Dumper recklessly uses blanks, instead of horizontal tabulators, which blows up file sizes without measure.
Rlists are not Perl syntax:
    RLIST    PERL
    -----    ----
     5;       { 5 => undef }
     "5";     { "5" => undef }
     5=1;     { 5 => 1 }
     {5=1;}   { 5 => 1 }
     (5)      [ 5 ]
     {}       { }
     ;        { }
     ()       [ ]
To reduce recursive data structures (into true hierarchies) set $Data::Rlist::MaxDepth to an integer above 0. It then defines the depth under which compile shall not venture deeper. The compilation of Perl data (into Rlist text) then continues, but on STDERR a message like the following is printed:
    ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)
This message will also be repeated as comment when the compiled Rlist is written to a file. Furthermore $Data::Rlist::Broken is incremented by one. While the compilation continues, effectively any attempt to venture deeper than allowed by $Data::Rlist::MaxDepth will be blocked.
See broken.
Much work has been spent to optimize Data::Rlist for speed. Still it is implemented in pure Perl (no XS). A rough estimation for Perl 5.8 is ``each MB takes one second per GHz''. For example, when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz-PC requires about 5-7 seconds. Compiling the same data under Solaris, on a sparcv9 processor operating at 750 MHz, takes about 18-22 seconds.
The process of compiling can be sped up by calling quote7 explicitly on scalars, that is, before calling write or write_string. Big data sets may compile faster when quote7 is called in advance for scalars that certainly do not qualify as symbolic names:
    use Data::Rlist qw/:strings/;
    $data{quote7($key)} = $value;
        .
        .
    Data::Rlist::write(\%data, "data.rlist");
instead of
    $data{$key} = $value;
        .
        .
    Data::Rlist::write(\%data, "data.rlist");
It depends on the case whether the first variant is faster: compile and compile_fast both have to call is_random_text on each scalar. When the scalar is already quoted, i.e., its first character is ", this test ought to run faster.
Internally is_random_text applies the precompiled regex $Data::Rlist::REValue. Note that the expression ($s!~$Data::Rlist::REValue) can be up to 20% faster than the equivalent is_random_text($s).
Normally you don't have to care about strings, since un/quoting happens as required when reading/compiling Rlist or CSV text. A common problem, however, occurs when some string has the same lexical form as a number.
Perl defines the string as the basic building block for all program data, then lets the program decide what strings mean. Analogously, in a printed book the reader has to decipher the glyphs and decide what evidence they hide. Printed text uses well-defined glyphs and typographic conventions, and finally the competence of the reader, to recognize numbers. But computers need to know the exact number type and format. Integer? Float? Hexadecimal? Scientific? Klingon? The Perl Cookbook recommends the use of a regular expression to distinguish number from string scalars (recipe 2.1).
In Rlist, string scalars that look like numbers need to be quoted explicitly. Otherwise, for example, the string scalar "-3.14" appears as -3.14 in the output, "007324" is compiled into 7324 etc. Such text is lost and read back as a number. Of course, in most cases this is just what you want. For hash keys, however, it might be a problem. One solution is to prefix the string with "_":
    my $s = '-9'; $s = "_$s";
Such strings do not qualify as a number anymore. In the C++ implementation it will then become some std::string, not a double. But the leading "_" has to be removed by the reading program. Perhaps a better solution is to explicitly call quote7:
    use Data::Rlist qw/:strings/;
    $k = -9;
    $k = quote7($k);            # returns qq'"-9"'
    $k = 3.14_15_92;
    $k = quote7($k);            # returns qq'"3.141592"'
Again, the need to quote strings that look like numbers is a problem evident only in the Perl implementation of Rlist, since Perl is a language with weak types. With the C++ implementation of Rlist there's no need to quote strings that look like numbers.
See also write, is_number, is_symbol, is_random_text and http://en.wikipedia.org/wiki/American_Standard_Code_for_Information_Interchange.
Installing CPAN packages usually requires administrator privileges. Another way is to copy the Rlist.pm file into a directory of your choice. Instead of use Data::Rlist;, however, you then use the following code. It will find Rlist.pm also in . and ~/bin, and it calls the Exporter explicitly:
    BEGIN {
        $0 =~ /[^\/]+$/;
        push @INC, $`||'.', "$ENV{HOME}/bin";
        require Rlist;
        Data::Rlist->import();
        Data::Rlist->import(qw/:floats :strings/);
    }
    (define-generic-mode 'rlist-generic-mode
       (list "//" ?#)
       nil
       '(;; Punctuators
         ("\\([(){},;?=]\\)" 1 'cperl-array-face)
         ;; Numbers
         ("\\([-+]?[0-9]+\\(\\.[0-9]+\\)?[dDlL]?\\)" 1 'font-lock-constant-face)
         ;; Identifier names
         ("\\([-~A-Za-z_][-~A-Za-z0-9_]+\\)" 1 'font-lock-variable-name-face))
       (list "\\.[rR][lL][iI]?[sS]$")
       ;; Extra functions to setup mode.
       (list 'generic-bracket-support
             '(lambda()
               (require 'cperl-mode)
               ;;(hl-line-mode t)                      ; highlight cursor-line
               (local-set-key [?\t] (lambda()(interactive)(cperl-indent-command)))
               (local-set-key [?\M-q] 'fill-paragraph)
               (set-fill-column 100)))
       "Generic mode for Random Lists (Rlist) files.")
Data::Rlist depends only on a few other packages:
    Exporter
    Carp
    strict
    integer
    Sys::Hostname
    Scalar::Util        # deep_compare() only
    Text::Wrap          # unhere() only
    Text::ParseWords    # split_quoted(), parse_quoted() only
Data::Rlist is free of $&, $` or $'. Reason: once Perl sees that you need one of these meta-variables anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program (see also perlre).
This is supplementary information for compile, the function internally called by write and write_string. We will discuss why compile, compile_fast and compile_Perl transliterate typeglobs and typeglob-refs into "?GLOB?". This is an attempted explanation.
TYPEGLOBS ARE A PERL IDIOSYNCRASY
Perl uses a symbol table per package to map symbolic names like x to Perl values. Typeglob objects are complete symbol table entries. The symbol table hash (stash) is an ordinary hash, named like the package with two colons appended. The main symbol table's name is %main::, or %::. In the C implementation of the Perl interpreter, the main symbol table is simply a global variable, informally called the defstash (default stash). The symbol Data:: in stash %:: addresses the stash of package Data, and the symbol Rlist:: in the stash %Data:: addresses the stash of package Data::Rlist.
Typeglobs are an idiosyncrasy of Perl: different types need only one stash entry, so that one symbol can name all types of Perl data (scalars, arrays, hashes) and nondata (functions, formats, I/O handles). The symbol x is mapped to the typeglob *x, and therein coexist $x (the scalar value), @x (the list value), %x (the hash value), &x (the code value) and x (the I/O handle or the format specifier).
Modifying $x in a Perl program won't change %x, because the typeglob *x is interposed between the stash and the program's actual values for $x, @x etc. The sigil * serves as wildcard for the other sigils %, @, $ and &. (Hint: a sigil is a symbol ``created for a specific magical purpose''; the name derives from the Latin sigillum = seal.)
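That one symbol names several independent values can be checked from Perl itself. The sketch below uses package variables named x and fetches each slot of the typeglob *x with the *x{THING} notation; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

our $x = 42;
our @x = (1, 2, 3);
our %x = (key => 'value');
sub x { return 'code' }

# Each slot of the typeglob *x is reachable through the *x{THING} syntax.
my $scalar_ref = *x{SCALAR};
my $array_ref  = *x{ARRAY};
my $hash_ref   = *x{HASH};
my $code_ref   = *x{CODE};

$x = 43;    # changes the scalar slot only; @x, %x and &x are untouched

print "scalar: $$scalar_ref\n";
print "array:  @$array_ref\n";
print "hash:   $hash_ref->{key}\n";
print "code:   ", $code_ref->(), "\n";
```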
Typeglobs cannot be dissolved by compile, because when (e.g.) $x and %x are in use, *x does not return something useful like
    (SCALAR => \$x, HASH => \%x)
Instead it plays the ball back:
    $ perl -e 'print *x'
    *main::x
Typeglobs are also not interpolated in strings:
    $ perl -e 'print "*x is not interpolated"'
    *x is not interpolated
    $ perl -e 'print "although ".*x." could be a string"'
    although *main::x could be a string
Typeglobs (stash entries) are arranged by perl on the fly, even with the use strict pragma in effect:
    $ perl -e 'package nirvana; use strict; print *x;'
    *nirvana::x
Each typeglob is a full path into the perl stashes, down from the defstash:
    $ perl -e 'print "*x is \"*main::x\"" if *x eq "*main::x"'
    *x is "*main::x"
    $ perl -e 'package nirvana; sub f { local *g=shift; print *g."=$g" }; package main; $x=42; nirvana::f(*x)'
    *main::x=42
GLOB-REFS
In the C implementation of Perl typeglobs have the type GV for ``Glob value''. Each GV is merely a set of pointers to sub-objects for scalars, arrays, hashes etc. In Perl the special syntax *x{ARRAY} accesses the array-sub-object, and is another way to say \@x. But when applied to a typeglob as \*foo it returns a typeglob-ref, or globref. So the Perl backslash operator \ works like the address-of operator & in C.
    $ perl -e 'print *::'
    *main::main::               # ???
    $ perl -e '$x = 42; print $::{x}'
    *main::x                    # typeglob-value 'x' in the stash
    $ perl -e 'print \*::'
    GLOB(0x10010f08)            # some globref
In Perl4 you had to pass typeglob-refs to pass references to functions (the backslash-operator was not yet ``invented''). Since Perl5 saw the light of day, typeglob-refs can be considered as artefacts. Note, however, that these artefacts are still faster than true references, because true references are themselves stored in a typeglob (as REF type) and so need to be dereferenced. For example, when we assign the scalar reference \$x to *x, the typeglob will digest it as REF attribute:
    $ perl -e '$x = 42; *x = \$x; print $x'
    42
Typeglob-refs in contrast can be used directly by the interpreter (they are raw GV-pointers in the Perl interpreter). For example:
    sub f1 { my $bar = shift; ++$$bar }
    sub f2 { local *bar = shift; ++$bar }
    f1(\$x);                  # increments $x
    f2(*x);                   # dto., but faster
GLOB-ALIASES
Typeglob-aliases offer another interesting application for typeglobs. For example, *bar=*x aliases the symbol bar in the current stash, so that x and bar point to the same typeglob. This means that when you declare sub x {} after casting the alias, bar is x.
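The aliasing can be observed in a few lines. In this sketch the names x and bar follow the text; the subroutine body and the value 42 are arbitrary.

```perl
use strict;
use warnings;

our ($x, $bar);

*bar = *x;                      # bar and x now share one typeglob

sub x { return 'called x' }     # declared for x ...
$x = 42;

print bar(), "\n";              # ... and reachable as bar()
print "$bar\n";                 # the scalar slot is shared as well
```

Since both symbols refer to the same typeglob, every slot (scalar, array, hash, code) is shared, not just the subroutine.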
This smells like a free lunch. The penalty, however, is that the bar symbol cannot be easily removed from the stash. One way is to say local *bar, which temporarily assigns a new typeglob to bar with all pointers zeroized:
    sub f { local *bar; ... }
The local-statement will put the bar symbol into the package stash, i.e., the same stash in which f exists.
*foo{THINGS}s
The *x{NAME} expression family is fondly called ``the *foo{THING} syntax'':
    $scalarref = *x{SCALAR};
    $arrayref  = *ARGV{ARRAY};
    $hashref   = *ENV{HASH};
    $coderef   = *handlers{CODE};
    $ioref     = *STDIN{IO};
    $ioref     = *STDIN{FILEHANDLE};    # dto.
    $globref   = *x{GLOB};
    $globref   = \*x;                   # dto.
When Perl THINGs are accessed this way, a few rules apply. First of all, *foo{THING}s are not hashes. The syntax is a stopgap:
    $ perl -e 'print \*x, *x{GLOB}, \*x{GLOB}'
    GLOB(0x100110b8)GLOB(0x100110b8)REF(0x1002e944)
    $ perl -e '$x=1; exists *x{GLOB}'
    exists argument is not a HASH or ARRAY element at -e line 1.
A *foo{THING} is undef if the requested THING hasn't been used yet. The exception is *foo{SCALAR}, which then returns a reference to an anonymous scalar:
    $ perl -e 'print "nope" unless defined *foo{HASH}'
    nope
    $ perl -e 'print *foo{SCALAR}'
    SCALAR(0x1002e94c)
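The following sketch shows that a *foo{THING} reference is the same reference the backslash operator would yield, and that an unused slot gives undef. The symbol name list is an assumption made for this example only.

```perl
use strict;
use warnings;

our @list = (1, 2, 3);          # fills the ARRAY slot of *list
sub list { scalar @list }       # fills the CODE slot of *list

my $arrayref = *list{ARRAY};    # same reference as \@list
my $coderef  = *list{CODE};     # same reference as \&list
my $hashref  = *list{HASH};     # undef, since %list was never used

print "same array\n" if $arrayref == \@list;
print "same code\n"  if $coderef  == \&list;
print "no hash\n"    unless defined $hashref;
print "count: ", $coderef->(), "\n";
```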
In Perl5 it is still not possible to get a reference to an I/O-handle (file-, directory- or socket handle) using the backslash operator. When a function requires an I/O-handle you must therefore pass a globref. For new Perl programmers the syntax is obscure, since it is possible to pass an IO::Handle-reference, a typeglob or a typeglob-ref as the filehandle.
    sub logprint($@) {
        my $fh = shift;
        print $fh map { "$_\n" } @_;
    }
    logprint(*STDOUT{IO}, 'fee');   # pass IO-handle
    logprint(*STDOUT    , 'fie');   # dto., pass typeglob
    logprint(\*STDOUT   , 'foo');   # dto., pass typeglob-ref
VIOLATING STASHES
As we saw, we can access the Perl guts without using a scalpel. Surprisingly, it is also possible to touch the stashes themselves:
    $ perl -e '$x = 42; *x = $x; print *x'
    *main::42
    $ perl -e '$x = 42; *x = $x; print *42'
    *main::42
By assigning the scalar value $x to *x we have effectively demolished the stash: neither $42 nor $main::42 is accessible. Symbols like 42 are invalid, because 42 is a numeric literal, not a string literal.
    $ perl -e '$x = 42; *x = $x; print $main::42'
Nevertheless it is easy to confuse perl this way:
    $ perl -e 'print *main::42'
    *main::42
    $ perl -e 'print 1*9'
    9
    $ perl -e 'print *9'
    *main::9
    $ perl -e '*x = 42; print $::{42}, *x'
    *main::42*main::42
    $ perl -v
    This is perl, v5.8.8 built for cygwin-thread-multi-64int
    (with 8 registered patches, see perl -V for more detail)
Of course these behaviors are not reliable, and may disappear in future versions of perl. In German you say ``Schmutzeffekt'' (dirt effect) for certain mechanical effects that occur unintentionally because machines and electrical circuits are not perfect, and neither is software. However, ``Schmutzeffekte'' are neither bugs nor features, but phenomena.
LEXICAL VARIABLES
Lexical variables (my variables) are not stored in stashes, and do not require typeglobs. These variables are stored in a special array, the scratchpad, assigned to each block, subroutine, and thread. These are really private variables, and they cannot be localized. Each lexical variable occupies a slot in the scratchpad; hence it is addressed by an integer index, not a symbol. my variables are like auto variables in C. They're also faster than locals, because they can be allocated at compile time, not runtime. Therefore you cannot declare *x lexically:
    $ perl -e 'my(*x);'
    Can't declare ref-to-glob cast in "my" at -e line 1, near ");"
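That lexicals never enter a stash can be checked directly by looking into %main::. This is a small sketch; the variable names package_var and lexical_var are assumptions for illustration.

```perl
use strict;
use warnings;

our $package_var = 1;    # package variable: gets an entry in %main::
my  $lexical_var = 2;    # lexical variable: lives in the scratchpad only

my $package_in_stash = exists $main::{package_var} ? "yes" : "no";
my $lexical_in_stash = exists $main::{lexical_var} ? "yes" : "no";

print "package_var in stash: $package_in_stash\n";   # yes
print "lexical_var in stash: $lexical_in_stash\n";   # no
```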
See also the Perl man-pages perlguts, perlref, perldsc and perllol.
In C++ we use a flex/bison scanner/parser combination to read Rlist language productions. The C++ parser generates an Abstract Syntax Tree (AST) of double, std::string, std::vector and std::map values. Since each value is put into the AST as a separate object, we use a free store management that allows the allocation of huge amounts of tiny objects.
We also use reference-counted smart pointers, which allocate themselves on our fast free store. So RAM will not be fragmented, and the allocation of RAM is significantly faster than with the default process heap. As with Perl, Rlist files can have hundreds of megabytes of data (!), and are processable in constant time, with constant memory requirements. For example, a 300 MB Rlist file can be read by a C++ process which will not peak over 400-500 MB of process RAM.
There are no known bugs; this package is stable. Deficiencies and TODOs:
The "deparse" functionality for the "code_refs" compile option has not yet been implemented.
The "threads" compile option has not yet been implemented.
IEEE 754 notations of Infinity and NaN are not yet implemented.
compile_Perl is experimental.
Copyright 1998-2008 Andreas Spindler
Maintained at CPAN (http://search.cpan.org/dist/Data-Rlist/) and the author's site (http://www.visualco.de). Please send mail to rlist@visualco.de.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
Contact the author for the C++ library at rlist@visualco.de.
Thank you for your attention.