MongoDB::DataTypes(3) The data types used with MongoDB

VERSION

version v1.2.3

DESCRIPTION

This goes over the types you can save to the database and use for queries in the Perl driver. If you are using another language, please refer to that language's documentation (<http://api.mongodb.org>).

NOTES FOR SQL PROGRAMMERS

You must query for data using the correct type.

For example, it is perfectly valid to have some records where the field ``foo'' is 123 (integer) and other records where ``foo'' is ``123'' (string). Thus, you must query for the correct type. If you save "{"foo" => "123"}", you cannot query for it with "{"foo" => 123}". MongoDB is strict about types.

If the type of a field is ambiguous and important to your application, you should document what you expect the application to send to the database and convert your data to those types before sending. There are some object-document mappers that will enforce certain types for certain fields for you.

You generally shouldn't save numbers as strings, as they will behave like strings (e.g., range queries won't work correctly) and the data will take up more space. If you set ``looks_like_number'' in MongoDB::BSON, the driver will automatically convert everything that looks like a number to a number before sending it to the database.

Numbers are the only exception to the strict typing: all number types stored by MongoDB (32-bit integers, 64-bit integers, 64-bit floating point numbers) will match each other.

TYPES

Numbers

By default, numbers with a decimal point will be saved as doubles (64-bit).

NOTE: On a perl compiled with long-double support, floating point number precision will be lost when sending data to MongoDB.

32-bit Platforms

Numbers without decimal points will be saved as 32-bit integers. To save a number as a 64-bit integer, use bigint (i.e. Math::BigInt):

    use bigint;
    $collection->insert({"user_id" => 28347197234178})

The driver will die if you try to insert a number beyond the signed 64-bit range: -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807.

Numbers that are saved as 64-bit integers will be decoded as Math::BigInt objects.

64-bit Platforms

Numbers without a decimal point will be saved and returned as 32-bit integers if they will fit and 64-bit integers otherwise.

To force 64-bit encoding, use a Math::BigInt object.

64-bit integers in the shell

The Mongo shell has one numeric type: the 8-byte float. This means that it cannot always represent an 8-byte integer exactly. Thus, when you display a 64-bit integer in the shell, it will be wrapped in a subobject that indicates it might be an approximate value. For instance, if we run this Perl on a 64-bit machine:

    $coll->insert({_id => 1});

then look at it in the shell, we see:

    > db.whatever.findOne()
    {
        "_id" :
            {
                "floatApprox" : 1
            }
    }

This doesn't mean that we saved a float, it just means that the float value of a 64-bit integer may not be exact.

Dealing with numbers and strings in Perl

Perl is very flexible about whether something is number or a string; it generally infers the type from context. Unfortunately, the driver doesn't have any context when it has to choose how to serialize a variable. Therefore, the default behavior is to introspect the internal state of the variable. Any variable that has ever been used in a string context (e.g. printed, compared with 'eq', matched with a regular expression, etc.) will be serialized as a string.

    my $var = "4";
    # stored as the string "4"
    $collection->insert({myVar => $var});
    $var = int($var) if (int($var) eq $var);
    # stored as the int 4
    $collection->insert({myVar => $var});

Because of this, users often end up with more strings than they wanted in their databases.

One technique for eliminating the string representation and store a numeric interpretation is to add 0 to the variable:

    $collection->insert({myVar => 0 + $var});

If you would like to have everything that looks like a number saved as a number without the "0+" technique, use a MongoDB::BSON codec that has the prefer_numeric option set.

    $coll2 = $collection->with_codec( prefer_numeric => 1 );
    $coll2->insert( {myVar => "1.23"} ); # stored as double 1.23

On the other hand, some data looks like a number but should be saved as a string. For example, suppose we are storing zip codes. To ensure a zip code is saved as a string, bless the string as a "MongoDB::BSON::String" type:

    my $z = "04101";
    my $zip = bless(\$z, "MongoDB::BSON::String");
    # zip is stored as "04101"
    $collection->insert({city => "Portland",
        zip => bless(\$zip, "MongoDB::BSON::String")});

Additionally, there are two utility functions, "force_int" and "force_double", that explicitly set Perl's internal type flags to Integer ("IV") and Double ("NV") respectively. These flags trigger MongoDB's recognition of the values as Int32/Int64 (depending on the size of the number) or Double:

    my $x = 1.0;
    MongoDB::force_int($x);
    $coll->insert({x => $x}); # Inserts an integer
    MongoDB::force_double($x);
    $coll->insert({x => $x}); # Inserts a double

Strings

All strings must be valid UTF-8 to be sent to the database. If a string is not valid, it will not be saved. If you need to save a non-UTF-8 string, you can save it as a binary blob (see the Binary Data section below).

All strings returned from the database have the UTF-8 flag set.

Unfortunately, due to Perl weirdness, UTF-8 is not very pretty. For example, suppose we have a UTF-8 string:

    my $str = 'Ă…land Islands';

Now, let's print it:

    print "$str\n";

You can see in the output:

    "\x{c5}land Islands"

Lovely, isn't it? This is how Perl prints UTF-8. To make it ``pretty,'' there are a couple options:

    my $pretty_str = utf8::encode($str);

This, unintuitively, clears the UTF-8 flag.

You can also just run

    binmode STDOUT, ':utf8';

and then the string (and all future UTF-8 strings) will print ``correctly.''

Arrays

Arrays must be saved as array references ("\@foo", not @foo).

Embedded Documents

Embedded documents take the same form as top-level documents: either hash references or Tie::IxHashes.

Dates

The DateTime, Time::Moment or DateTime::Tiny package can be used to insert and query for dates. Dates stored in the database will be returned as instances of one of these classes, depending on the "dt_type" setting of the MongoDB::BSON codec object:

    $codec = MongoDB::BSON->new( dt_type => 'Time::Moment' );
    $client = MongoDB::MongoClient->new( bson_codec => $codec );

An example of storing and retrieving a date:

    use DateTime;
    my $now = DateTime->now;
    $collection->insert({'ts' => $now});
    my $obj = $collection->find_one;
    print "Today is ".$obj->{'ts'}->ymd."\n";

An example of querying for a range of dates:

    my $start = DateTime->from_epoch( epoch => 100000 );
    my $end = DateTime->from_epoch( epoch => 500000 );
    my $cursor = $collection->query({event => {'$gt' => $start, '$lt' => $end}});

Warning: creating Perl DateTime objects is extremely slow. Consider saving dates as epoch seconds and converting the numbers to objects only when needed. A single DateTime field can make deserialization up to 10 times slower.

For example, you could use the time function to store seconds since the epoch:

    $collection->update($criteria, {'$set' => {"last modified" => time()}})

This will be MUCH faster to deserialize. Or, for more precision, consider using the ``time'' in Time::HiRes function to get epoch seconds as a floating-point value.

The Time::Moment module is substantially faster than DateTime and might be a convenient alternative to using integers or floating point numbers and manually inflating to an object before use. Consider comparing benchmarks using "dt_type" set to Time::Moment against those using numbers and inflating on demand.

Note that (at least, as of "DateTime::Tiny" version 1.04) there is no time-zone attribute for "DateTime::Tiny" objects. We therefore consider all such times to be in the "UTC" time zone. Likewise, "DateTime::Tiny" has no notion of milliseconds (yet?), so the milliseconds portion of the datetime will be set to zero.

Regular Expressions

Use "qr/.../" to use a regular expression in a query, but be sure to limit your regular expression to syntax and features supported by PCRE, which are not actually fully compatible with Perl <https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions#Differences_from_Perl>.

    my $cursor = $collection->query({"name" => qr/[Jj]oh?n/});

Regular expressions will match strings saved in the database.

NOTE: only the following flags are supported: ``imxs''.

You can also save and retrieve regular expressions themselves, but regular expressions will be retrieved as MongoDB::BSON::Regexp objects for safety (these will round-trip correctly).

From that object, you can attempt to compile a reference to a "qr{}" using the "try_compile" method. However, due to PCRE differences, this could fail to compile or could have different match behavior than intended.

    $collection->insert({"regex" => qr/foo/i});
    $obj = $collection->find_one;
    if ("FOO" =~ $obj->{regex}->try_compile) { # matches
        print "hooray\n";
    }

SECURITY NOTE: A regular expression can evaluate arbitrary code. You are strongly advised never to use untrusted input as a regular expression.

Booleans

Boolean values are emulated using the boolean package via the "boolean::true" and "boolean::false" functions. Using boolean objects in documents will ensure the documents have the BSON boolean type in the database. Likewise, BSON boolean types in the database will be returned as boolean objects.

An example of inserting boolean values:

    use boolean;
    $collection->insert({"okay" => true, "name" => "fred"});

An example of using boolean values for query operators (only returns documents where the name field exists):

    my $cursor = $collection->query({"name" => {'$exists' => boolean::true}});

Most of the time, you can just use 1 or 0 in query operations instead of "true" and "false", such as for specifying fields to return, but some commands require boolean objects and the database will return an error if integers 1 or 0 are used.

Boolean objects from the following JSON libraries will also be encoded correctly in the database:

  • JSON::XS
  • JSON::PP
  • Cpanel::JSON::XS
  • Mojo::JSON
  • JSON::Tiny

MongoDB::OID

``OID'' stands for ``Object ID'', and is a unique id for identifying documents. OIDs are 12 bytes, which are guaranteed to be unique. Their string form is a 24-character string of hexadecimal digits.

To create a unique id:

    my $oid = MongoDB::OID->new;

To create a MongoDB::OID from an existing 24-character hexadecimal string:

    my $oid = MongoDB::OID->new("value" => "123456789012345678901234");

Binary Data

By default, all database strings are UTF8. You need to store images, binaries, and other non-UTF8 data as binary data. There are two ways to do this.

String Refs

In general, you can pass the string as a reference. For example:

    # non-utf8 string
    my $string = "\xFF\xFE\xFF";
    $collection->insert({"photo" => \$string});

This will save the variable as binary data, bypassing the UTF8 check.

Binary data can be matched exactly by the database, so this query will match the object we inserted above:

    $collection->find({"photo" => \$string});

MongoDB::BSON::Binary type

You can also use the MongoDB::BSON::Binary class. This allows you to preserve the subtype of your data. Binary data in MongoDB stores a ``type'' field, which can be any integer between 0 and 255. Identical data will only match if the subtype is the same.

Perl uses the default subtype "SUBTYPE_GENERIC".

The driver returns binary fields as instances of MongoDB::BSON::Binary to ensure that binary data can successfullly roundtrip. MongoDB::BSON::Binary objects stringify to the underlying data to make it easier to work with.

MongoDB::Code

MongoDB::Code is used to represent JavaScript code and, optionally, scope. To create one:

    use MongoDB::Code;
    my $code = MongoDB::Code->new("code" => "function() { return 'hello, world'; }");

Or, with a scope:

    my $code = MongoDB::Code->new("code" => "function() { return 'hello, '+name; }",
        "scope" => {"name" => "Fred"});

Which would then return ``hello, Fred'' when run.

MongoDB::MinKey

"MongoDB::MinKey" is ``less than'' any other value of any type. This can be useful for always returning certain documents first (or last).

"MongoDB::MinKey" has no methods, fields, or string form. To create one, it is sufficient to say:

    bless $minKey, "MongoDB::MinKey";

MongoDB::MaxKey

"MongoDB::MaxKey" is ``greater than'' any other value of any type. This can be useful for always returning certain documents last (or first).

"MongoDB::MaxKey" has no methods, fields, or string form. To create one, it is sufficient to say:

    bless $minKey, "MongoDB::MaxKey";

MongoDB::Timestamp

    my $ts = MongoDB::Timestamp->new({sec => $seconds, inc => $increment});

Timestamps are used internally by MongoDB's replication. You can see them in their natural habitat by querying "local.main.$oplog". Each entry looks something like:

    { "ts" : { "t" : 1278872990000, "i" : 1 }, "op" : "n", "ns" : "", "o" : { } }

In the shell, timestamps are shown in milliseconds, although they are stored as seconds. So, to represent this document in Perl, we would do:

    my $oplog = {
        "ts" => MongoDB::Timestamp->new("sec" => 1278872990, "inc" => 1),
        "op" => "n",
        "ns" => "",
        "o" => {}
    }

Timestamps are not dates. You should not use them unless you are doing something low-level with replication. To save dates or times, use a number, DateTime object, or DateTime::Tiny object.

# vim: set ts=4 sts=4 sw=4 et tw=75:

AUTHORS

COPYRIGHT AND LICENSE

This software is Copyright (c) 2016 by MongoDB, Inc..

This is free software, licensed under:

  The Apache License, Version 2.0, January 2004