GUIDs/UUIDs (Globally/Universal Unique Identifiers) are very random. Therefore, INSERTing into an index means jumping around a lot. Once the index is too big to be cached, most INSERTs involve a disk hit. Even on a beefy system, this limits you to a few hundred INSERTs per second.
This blog is mostly eliminated in MySQL 8.0 with the advent of the following function: UUID_TO_BIN(str, swap_flag).
A 'standard' GUID/UUID is composed of the time, machine identification and some other stuff. The combination should be unique, even without coordination between different computers that could be generating UUIDs simultaneously.
The top part of the GUID/UUID is the bottom part of the current time. The top part is the primary part of what would be used for placing the value in an ordered list (INDEX). This cycles in about 7.16 minutes.
Some math... If the index is small enough to be cached in RAM, each insert into the index is CPU only, with the writes being delayed and batched. If the index is 20 times as big as can be cached, then 19 out of 20 inserts will be a cache miss. (This math applies to any "random" index.)
36 characters is bulky. If you are using that as a PRIMARY KEY in InnoDB and you have secondary keys, remember that each secondary key has an implicit copy of the PK, thereby making it bulky.
It is tempting to declare the UUID VARCHAR(36). And, since you probably are thinking globally, so you have CHARACTER SET utf8 (or utf8mb4). For utf8:
To compress
But first, a caveat. This solution only works for "Time based" / "Version 1" UUIDs They are recognizable by the "1" at the beginning of the third clump.
The manual's sample: 6ccd780c-baba-1026-9564-0040f4311e29 . A more current value (after a few years): 49ea2de3-17a2-11e2-8346-001eecac3efa . Notice how the 3rd part has slowly changed over time? Let's data is rearranged, thus:
1026-baba-6ccd780c-9564-0040f4311e29 11e2-17a2-49ea2de3-8346-001eecac3efa 11e2-17ac-106762a5-8346-001eecac3efa -- after a few more minutes
Now we have a number that increases nicely over time. Multiple sources won't be quite in time order, but they will be close. The "hot" spot for inserting into an INDEX(uuid) will be rather narrow, thereby making it quite cacheable and efficient.
If your SELECTs tend to be for "recent" uuids, then they, too, will be easily cached. If, on the other hand, your SELECTs often reach for old uuids, they will be random and not well cached. Still, improving the INSERTs will help the system overall.
Let's make Stored Functions to do the messy work of the two actions:
DELIMITER // CREATE FUNCTION UuidToBin(_uuid BINARY(36)) RETURNS BINARY(16) LANGUAGE SQL DETERMINISTIC CONTAINS SQL SQL SECURITY INVOKER RETURN UNHEX(CONCAT( SUBSTR(_uuid, 15, 4), SUBSTR(_uuid, 10, 4), SUBSTR(_uuid, 1, 8), SUBSTR(_uuid, 20, 4), SUBSTR(_uuid, 25) )); // CREATE FUNCTION UuidFromBin(_bin BINARY(16)) RETURNS BINARY(36) LANGUAGE SQL DETERMINISTIC CONTAINS SQL SQL SECURITY INVOKER RETURN LCASE(CONCAT_WS('-', HEX(SUBSTR(_bin, 5, 4)), HEX(SUBSTR(_bin, 3, 2)), HEX(SUBSTR(_bin, 1, 2)), HEX(SUBSTR(_bin, 9, 2)), HEX(SUBSTR(_bin, 11)) )); // DELIMITER ;
Then you would do things like
-- Letting MySQL create the UUID: INSERT INTO t (uuid, ...) VALUES (UuidToBin(UUID()), ...); -- Creating the UUID elsewhere: INSERT INTO t (uuid, ...) VALUES (UuidToBin(?), ...); -- Retrieving (point query using uuid): SELECT ... FROM t WHERE uuid = UuidToBin(?); -- Retrieving (other): SELECT UuidFromBin(uuid), ... FROM t ...;
Do not flip the WHERE; this will be inefficent because it won't use INDEX(uuid):
WHERE UuidFromBin(uuid) = '1026-baba-6ccd780c-9564-0040f4311e29' -- NO
TokuDB is a viable engine if you must have UUIDs (even non-type-1) in a huge table. TokuDB is available in MariaDB as a 'standard' engine, making the barrier to entry very low. There are a small number of differences between InnoDB and TokuDB; I will not go into them here.
Tokudb, with its “fractal” indexing strategy builds the indexes in stages. In contrast, InnoDB inserts index entries “immediately” — actually that indexing is buffered by most of the size of the buffer_pool. To elaborate…
When adding a record to an InnoDB table, here are (roughly) the steps performed to write the data (and PK) and secondary indexes to disk. (I leave out logging, provision for rollback, etc.) First the PRIMARY KEY and data:
Tokudb, on the other hand, does something like
If you are familiar with how sort-merge works, consider the parallels to Tokudb. Each "sort" does some work of ordering things; each "merge" is quite efficient.
To summarize:
This shows three thing for speeding up usage of GUIDs/UUIDs:
Note that the benefit of the "hot spot" is only partial:
Thanks to Trey for some of the ideas here.
The tips in this document apply to MySQL, MariaDB, and Percona.
Written Oct, 2012. Added TokuDB, Jan, 2015.
Rick James graciously allowed us to use this article in the Knowledge Base.
Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.
Original source: http://mysql.rjweb.org/doc.php/uuid
© 2019 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/guiduuid-performance/