crazygringo 4 hours ago

> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time... Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.

So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.

In other words, you can only use UUIDv7 for rows that never need to be looked up by any data coming from the user. And maybe that exists sometimes for certain data in JOINs... but it seems like it might be more the exception than the rule, and you never know when an internal ID might need to become an external one in the future.

  • tracker1 3 hours ago

    This is only really true if leaking the creation time of the record is itself a security concern.

    • nitwit005 22 minutes ago

      It was a concern in the past, as people used password creation tools that were deterministic based on the current time.

      There was previously an article linked here about recovering access to some bitcoin by feeding all possible timestamps in a date range to the password creation tool they used, and trying all of those passwords.

    • kvirani 3 hours ago

      Which I have to assume is rare, right?

      • wongarsu 3 hours ago

        We used to leak approximate creation time all the time back when everyone used sequential keys. If anything sequential keys are far worse: they leak the approximate number of records, make it easy to observe the rate at which new keys are created, and once you know that you can deduce the approximate creation date of any key.

        UUIDv4 removes all three of those vectors. UUIDv7 still removes two of three. It doesn't leak record count or the rate at which you create them, only creation time. And you still can't guess adjacent keys. It's a pretty narrow information leakage for something you routinely reveal on purpose.

        • blackenedgem 3 hours ago

          UUIDv7s are much worse for leaking creation time though, imo. For sequential IDs an attacker needs to have a lot of data to narrow down the creation time. That raises the barrier to entry considerably, to the point that only a committed attacker could infer the time.

          With UUIDv7 the creation time is always leaked, without any sampling. A casual attacker could quite easily look up the time and become motivated to probe and link the account further.

      • wredcoll 3 hours ago

        It seems wildly paranoid, even for security researchers.

        • ibejoeb 3 hours ago

          There are some practical applications that are not necessarily related to security. If you are storing something like a medical record, you don't want to use it as a public ID for a patient visit, because the date is subject to HIPAA.

          • mulmen 26 minutes ago

            But they would have to relate that ID to patient data like their identity, right? The date alone cannot be a HIPAA issue; otherwise every date would be a HIPAA violation, because people go to the doctor every day.

        • replygirl 3 hours ago

          it's not about the individual record, it's about correlating records. if you can sequence everything in time it gets a lot easier to deanonymize data

          • Macha 3 hours ago

            However, if your API has a (very common) createdAt field on these objects, the ability to get the creation time from the identifier is rather academic.

          • tracker1 3 hours ago

            Can you provide an example of where you would legitimately have the ID for a medical record interaction, but not a date/time associated?

            • tyre 2 hours ago

              Email is not secure but sending an email with a link to "Information about your appointment" is fine. If that link goes to `/appointments/sjdhfaskfhjaksdjf`, there is no leaked data. If it goes to `/appointments/20251017lkafjdslfjalsdkjfa`, then the link itself contains PHI.

              Whether creation date is PHI…I could see the argument being yes, since it correlates to medical information (when someone sought treatment, which could be when symptoms present.)

        • oulipo2 2 hours ago

          I remember in the cracking days, when we were trying to crack ElGamal encryption and the like: we noticed that when some code had been written in e.g. Delphi (which used a weak RNG seeded from the datetime), you could guess roughly when the code was compiled and the keys were generated, which gave you a rough time range. If you brute-forced through that time range as seeds to the RNG and tried to generate the random ElGamal key from each one, you would hugely reduce the range of possibilities (e.g. brute-force 10M ints instead of billions or more).

          • noir_lord an hour ago

            An online casino got hit a similar way a long time ago. IIRC someone realised the seed for a known PRNG was the system clock, so you could brute-force every shuffle either side of the approximate timestamp and compare the results to some known cards (i.e. the ones you'd been dealt); once you had a match you knew what everyone else had.

            Always thought that was elegant (the attack, that is - not the use of the time as the seed).

  • oconnore 3 hours ago

    If this is a concern, pass your UUIDv7 ID through a block cipher in ECB mode (a single block, so no IV is needed): 128-bit UUID, 128-bit AES block. Easy, near-zero-overhead way to scramble and unscramble IDs as they go in/out of your application, as sketched below.

    There is no need to put the privacy preserving ID in a database index when you can calculate the mapping on the fly
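
    A minimal sketch of this idea in Postgres, assuming the pgcrypto extension and a hypothetical 16-byte application key (illustrative, not anyone's actual implementation). The 16-byte UUID is encrypted as a single AES-128 block; the result is 128 opaque bits that still fit the uuid type (though no longer a spec-compliant v7), and it round-trips exactly:

      CREATE EXTENSION IF NOT EXISTS pgcrypto;

      -- Internal UUIDv7 -> opaque external ID (one AES-128 block, ECB, no padding).
      CREATE FUNCTION scramble_id(id uuid, key bytea) RETURNS uuid AS $$
        SELECT encode(encrypt(uuid_send(id), key, 'aes-ecb/pad:none'), 'hex')::uuid;
      $$ LANGUAGE sql IMMUTABLE;

      -- Opaque external ID -> internal UUIDv7.
      CREATE FUNCTION unscramble_id(id uuid, key bytea) RETURNS uuid AS $$
        SELECT encode(decrypt(uuid_send(id), key, 'aes-ecb/pad:none'), 'hex')::uuid;
      $$ LANGUAGE sql IMMUTABLE;

    The key lives in application config; nothing extra is stored or indexed in the database.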

    • 10000truths an hour ago

      This is, strictly speaking, an improvement, but not by much. You can't change the cipher key because your downstream users are already relying on the old-key-scrambled IDs, and you lose all the benefits of scrambling as soon as the key is leaked. You could tag your IDs with a "key version" to change the key for newly generated IDs, but then that "key version" itself constitutes an information leak of sorts.

      • DSingularity an hour ago

        Why do you need forward secrecy?

        • 10000truths an hour ago

          I edited that out of my post, as I'm not sure it's the correct term to use, but the problem remains. If the key leaks, then all IDs scrambled with that key can be de-scrambled, and you're back to square one.

    • blackenedgem 3 hours ago

      Then that's just worse and more complicated than storing a 64-bit bigint + a 128-bit UUIDv4. Your salt (an AES block) is larger than a bigint. Unless you're talking about a fixed value for the AES key (is that a thing?), but then that's peppering, which is security through obscurity.

      • cyberax 3 hours ago

        Uhh... What? You just use AES with a fixed key and IV in block mode.

        You put in 128 bits, you get out 128 bits. The encryption is strong, so the clients won't be able to infer anything from it, and your backend can still get all the advantages of sequential IDs.

        You also can future-proof yourself by reserving a few bits from the UUID for the version number (using cycle-walking).

  • matthew16550 3 hours ago

    Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places [1].

    A UUIDv7 primary key seems to reduce / eliminate those problems.

    If there is also an indexed UUIDv4 column for external id, I suspect it would not be used as often as the primary key index so would not cancel out the performance improvements of UUIDv7.

    [1] https://www.cybertec-postgresql.com/en/unexpected-downsides-...

    • crazygringo 2 hours ago

      > I suspect it would not be used as often as the primary key index

      That doesn't matter because it's the creation of the index entry that matters, not how often it's used for lookup. The lookup cost is the same anyways.

      • matthew16550 an hour ago

        The page I linked shows uses after creation where the cost can be different.

  • macote 3 hours ago

    You don't need to add a UUIDv4 column, you could just encrypt your UUIDv7 with format-preserving encryption (FPE).

    • whattheheckheck 3 hours ago

      What's the computational complexity of doing that conversion vs the lookup table of uuidv4 for each uuidv7?

      • benjiro an hour ago

        DB lookups + an extra index are way more expensive than hardware-assisted decoding.

        If your UUIDv4 is cached, you're still suffering from the extra storage and index. Not an issue on a million-row system, but imagine a billion, or 10 billion.

        And what if it's not cached? Great, now you're hitting the disk.

        Computers do not suffer from a lack of CPU performance, especially when you can use dedicated CPU instruction sets. Hell, you do not even need encryption. How about a simple bit shuffle, with a small identifier for which pattern was used? A black box, sure, and not great if leaked, but you have other things to worry about if your actual shift pattern is leaked. Use an extra byte or two for identifying the pattern.

        Obfuscating your IDs is easy. No need for full encryption.

  • gigatexal 3 hours ago

    In a well-normalized setup, idk, maybe not. UUIDv4 for your external IDs, and then a mapping table to correspond that to something you'd use internally. Then you can torch an exposed UUID, update the mapping table, and generate a new one, and none of your pointers and foreign keys need to change internally.

    • crazygringo 2 hours ago

      The point is, that mapping table incurs the same indexing cost that was trying to be eliminated in the first place. Normalization is irrelevant.

  • Illniyar 2 hours ago

    If leaking creation time is a concern, can we not just fake the timestamp? We could do so in a way that keeps most of the performance benefits - say, starting with a base time of 1970 and bumping it intermittently, assigning random months and days to new records (or maybe deriving the offset from the user's ID, so a user's records are temporally consistent with each other but not with other users' records).

    I'm sure there might be a middle ground where most of the performance gains remain but the deanonymizing risk is greatly reduced.

    Edit: encrypting the value in transit seems a simpler solution really

    • hu3 2 hours ago

      In that case, auto increments can also be bumped from time to time. And start from a billion.

      They're more performant than UUIDv7. Why would I still use UUIDs? Perhaps because they can be generated on the client, and because they make incorrect JOINs return no rows.

  • lukebechtel 3 hours ago

    how risky is exposing creation time really though? I feel like for most applications this is not critical

    • Biganon 2 hours ago

      I wouldn't say necessarily "risky", it's more that it forces your hand when you wouldn't want to reveal an entity's creation time. Say you use these IDs for users of your site, and they're used in API queries / URLs etc., then it's trivial to know when a user created their account. Sure, many sites already expose this information, but not all of them do; what if you don't want it exposed? What if you consider that a user's seniority is nobody's business, that it could bias the behavior of other users towards them, etc.?

    • morshu9001 2 hours ago

      It takes consideration. There are plenty of systems like Facebook and Twitter that use IDs somewhat exposing time, but the things they're IDing already have public creation timestamps.

  • jongjong 25 minutes ago

    Great point. Also, having to support multiple IDs is a maintenance headache.

    IMO, a major problem solved by UUIDs is the ability to create IDs on the client-side, hence, they are inherently user-facing. A major reason why this is an important use case for UUIDs is because it allows clients to avoid accidental duplication of records when an insertion fails due to network issues. It provides insertion idempotence.

    For example, when the user clicks on a button on a form to insert a record into a database, the client can generate the UUID on the client-side, then attach it to a JSON object, then send the object to the server for insertion; in the meantime, if there is a network issue and it's unclear whether or not the record was inserted, the code can automatically retry (or user can manually retry) and there is no risk of duplication of data if you use the same UUID.

    This is impossible to do with auto-incrementing IDs because those are generated by the database in a centralized way, so the client cannot know the ID ahead of time. Thus, if there is a network failure while submitting a form, the client cannot automatically know whether or not the record was successfully inserted; if it retries, it may create a duplicate record in the database. There is no way to make the operation idempotent without relying on some kind of fixed ID with a uniqueness constraint on the database side, as in the sketch below.
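
    A minimal sketch of that server-side idempotence in Postgres, with a hypothetical orders table (all names and values are illustrative):

      -- The client generates the UUID; the primary key enforces uniqueness.
      CREATE TABLE orders (
        id   uuid PRIMARY KEY,   -- supplied by the client
        body jsonb NOT NULL
      );

      -- Retrying the same request with the same client-generated id inserts at most one row.
      INSERT INTO orders (id, body)
      VALUES ('018f3c2e-9a7b-7cc1-8a55-2f4d1be9a001', '{"item": "book"}')
      ON CONFLICT (id) DO NOTHING;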

pqdbr 4 hours ago

Great article, especially for this part:

> What can go wrong with using UUIDv7 Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.

> This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization. While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach. Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.

  • SahAssar an hour ago

    > Experts recommend

    What experts? For what scenarios specifically? When do they consider time-of-creation to be sensitive?

  • hn_throwaway_99 4 hours ago

    > Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.

    So then what's the point? How I always did things in the past was to use an auto-increment bigint as the internal primary key, and then a separate random UUID for the external-facing key. I think this recommendation from "experts" is pretty dumb, because you get very little benefit from UUIDv7 (beyond some portability improvements) if you're still keeping a separate external key.

    While I wouldn't use UUIDV7 as a secure token like I would UUIDV4, I don't see anything wrong with using UUIDV7 as externally exposed object keys - you're still going to need permissions checks anyway.

    • morshu9001 2 hours ago

      I asked a similar question, and yeah it seems like this is entirely for distributed systems, even then only some of them. Your basic single DB Postgres should just have a serial PK.

    • crazygringo 3 hours ago

      For distributed databases where you can't use autoincrement.

      Or where, for some reason, the ID needs to be created before being inserted into the database. Like you're inserting into multiple services at once.

      • sgarland 2 hours ago

        Many distributed databases have mechanisms to use an auto-increment, actually - often, generating large chunks at a time to hand out.

  • andy_ppp 4 hours ago

    I wish Postgres would just allow you to look up records by the random component of the field. What are the chances of collisions with 80 bits of randomness? My guess is it's still enough.

    • jagged-chisel 4 hours ago

      You can certainly create that index.
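
      For example, a sketch of an expression index on everything after the 48-bit timestamp, assuming a hypothetical items table with a UUIDv7 id column (the first nibble of that slice is still the version field, so it isn't all random):

        -- Index the last 10 bytes of the UUID (bytes 7-16, i.e. everything after the timestamp).
        CREATE INDEX items_id_random_idx
          ON items ((substring(uuid_send(id) FROM 7)));

        -- Look a row up by the random component alone (10-byte hex value shown for illustration).
        SELECT *
          FROM items
         WHERE substring(uuid_send(id) FROM 7) = '\x7cc18a552f4d1be9a001'::bytea;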

      • andy_ppp 4 hours ago

        Yes - it's just that if it's automated and part of Postgres, people will use it without having to think too much, and it removes one of the objections to what I think is, for most large systems, a sensible way to go rather than something controversial because of security.

    • mamcx 3 hours ago

      What would be better is to allow creating a type with a custom display and in/out functions, backed internally by the native type, in SQL (this currently requires doing it in C).

  • dgb23 2 hours ago

    Or just generate them in bulk and take them from a list?

gopalv 4 hours ago

UUIDv7 is only bad for range partitioning and privacy concerns.

The "naturally sortable" is a good thing for postgres and for most people who want to use UUID, because there is no sorted distribution buckets where the last bucket always grows when inserting.

I want to see something like HBase or S3 paths when UUIDv7 gets used.

  • vlovich123 4 hours ago

    > UUIDv7 is only bad for range partitioning and privacy concerns.

    It's no worse for privacy than other UUID variants if the "privacy" you're worried about leaking is the creation time of the UUID.

    As for range partitioning, you can of course choose to partition on the hash of the UUIDv7, at the cost of giving up cheaper writes / faster indices. On the other hand, that of course gives up locality, which is a common challenge of partitioning schemes. It depends on the end-to-end design of the system, but I wouldn't say that UUIDv7 is inherently good or bad or better/worse than other UUID schemes.

    • saghm 4 hours ago

      Isn't it at least a bit worse than v4, which has no timestamp at all? There might be concerns around non-secure randomness being used to generate the bits, but I don't feel like it's accurate to claim that's indistinguishable from a literal timestamp.

    • ibejoeb 4 hours ago

      UUIDv4 doesn't leak creation time.

stickfigure 3 hours ago

It never occurred to me that Postgres is more efficient when inserting monotonic values. It's the nature of B+ trees so it makes sense. But in the world of distributed databases, monotonic inserts create hot partitions and scalability problems, so evenly-distributed ids are preferred.

In other words, "don't try this with CRDB".

  • chuckadams 25 minutes ago

    It's the nature of B+ trees, multiplied by the nature of clustered indexes: if you use a UUIDv4 as a primary key, your entire row gets moved to random locations, which really sucks when you normally retrieve them sequentially. With a non-clustered index (say, your UUIDv4 id you use for public APIs when you don't want to leak the v7 info) then you'll still get more fragmentation with the random data, but it's something autovacuum can usually keep up with. But it's more work it has to do on top of everything else it does.

  • baq an hour ago

    Leaky abstractions in databases are one of the reasons every developer should at least read the table of contents of the docs for the databases used by the things they're working on. IME almost no one does that.

mfrye0 an hour ago

I can confirm the performance benefits. I wanted to start with UUIDv7 for a new DB earlier this year, so I put together a function to use in the meantime. Once the function is available natively, we'll just migrate to use it instead.

For anyone interested:

  CREATE FUNCTION uuidv7() RETURNS uuid AS $$
    -- Get base random UUID and overlay timestamp
    select encode(
      set_bit(
        set_bit(
          overlay(uuid_send(gen_random_uuid())
                  placing substring(int8send((extract(epoch from clock_timestamp()) * 1000)::bigint) from 3)
                  from 1 for 6),
          52, 1),
        -- Set version bits to 0111
        53, 1),
      'hex')::uuid;
  $$ LANGUAGE sql volatile;
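
A quick sanity check once the function exists (gen_random_uuid() needs Postgres 13+ or the pgcrypto extension):

  SELECT uuidv7();  -- a uuid whose most significant 48 bits encode the current Unix time in ms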

pmontra 3 hours ago

My customers return created_at attributes in all their API calls, so UUIDv7 won't harm them at all. They also use sequential IDs. Only one of them ever used UUIDv4 as a primary key. We didn't have any performance problem, but the whole production system was run by one PostgreSQL instance and one Elixir application server. Probably almost any architectural choice is good at that scale.

caymanjim 2 hours ago

Tangential, but I'm grateful to this article for teaching me that Postgres has "table foo" as shorthand for "select * from foo". I won't use that in code, but I'll happily use it for interactive queries.
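
For reference, the two forms are equivalent (foo being any table):

  TABLE foo;           -- shorthand
  SELECT * FROM foo;   -- same result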

morshu9001 4 hours ago

The article compares UUIDv7 vs v4, but doesn't say why you'd do either instead of just serial/bigserial, which has always been my goto. Did I miss something?

  • molf 3 hours ago

    Good question. There's a few reasons to pick UUID over serial keys:

    - Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application.

    - Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.

    - Because UUIDs can be generated anywhere, sharding is easier.

    The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.

    I've found that the data leaked by serial keys is problematic often enough; whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.

  • edoceo 4 hours ago

    So the client side can create the ID before insert - that's the case that (mostly) drives it for me. The other is where you have distributed systems and then later want to merge the data and not have any ID conflicts.

    • saagarjha 4 hours ago

      Allowing the client to generate IDs for you seems like a bad idea?

      • morshu9001 4 hours ago

        Client = backend here, right? So you could make a bunch of rows that relate to each other then insert, without having to ping the DB each time to assign a serial ID. Normally the latter is what I do, but I can imagine a scenario where it'd be slow.

        • wongarsu 3 hours ago

          The usual flow would be INSERT ... RETURNING id, which gives you the db-generated id for the record you just inserted with no performance penalty. That doesn't work for circular dependencies and it limits the amount of batching you can do. But typically those are smaller penalties than the penalty from having a 128 bit primary key vs a 64 bit key
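
          For example, with a hypothetical events table whose id is generated by the database:

            INSERT INTO events (payload)
            VALUES ('{"type": "signup"}')
            RETURNING id;  -- hands back the db-generated key without a second round trip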

      • coolspot 4 hours ago

        “client” here may refer to a backend app server. So you can have 10-100s of backend servers inserting into a same table without having a single authority coordinating IDs.

        • morshu9001 4 hours ago

          That table is still a single authority, isn't it? But I guess fewer steps is still faster.

          • tracker1 3 hours ago

            Except if you're using a sharded or otherwise clustered database system, where the records themselves may be stored on separate servers, and key generation is distributed as well.

            • morshu9001 2 hours ago

              In those cases yes. There's still a case for sequential there depending on the use pattern, but write-heavy benefits from not waiting on one server for IDs.

      • bramhaag 3 hours ago

        It can be quite elegant. You can avoid the whole temporary or external ID mess when the client generates the ID, this is particularly useful for offline-first clients.

        Of course you need to be sure the server will accept the ID, but that is practically guaranteed by the uniqueness property of UUIDs.

    • jrochkind1 4 hours ago

      yup, I'd say those are the two biggies.

  • Deadron 4 hours ago

    For when you inevitably need to expose the IDs to the public, UUIDs prevent a number of attacks that sequential numbers are vulnerable to. In theory they can also be faster/more convenient in certain setups, since you can generate a UUID without needing something like a central index to coordinate how they are created. They can also be treated as globally unique, which can be useful in certain contexts. I don't think anyone would argue that their overall performance is better than serial/bigserial though, as they take up more space in indexes.

    • morshu9001 4 hours ago

      But these are internal IDs only, and public ones should be a separate col. Being able to generate uuid7 without a central index is useful in distributed systems, but this is a Postgres DB already.

      Now, the index on the public IDs would be faster with a uuid7 than a uuid4, but you have a similar info leak risk that the article mentions.

      • rcfox 4 hours ago

        "Distributed systems" doesn't have to mean some fancy, purpose-built thing. Just correlating between two Postgres databases might be a thing you need to do. Or a database and a flat text file.

        • morshu9001 3 hours ago

          I usually just have a uuid4 secondary for those correlations, with a serial primary. I've done straight uuid4 PK before, things got slow on not very large data because it affected every single join.

    • xienze 3 hours ago

      People really overthink this. You can safely expose internal IDs by doing a symmetric cipher, like a Feistel cipher. Even sequential IDs will appear random.

  • ibejoeb 4 hours ago

    If you need an opaque ID like a UUID because, for example, you need disparate systems to generate non-colliding IDs, the best approach I've found is to separate the two concerns. Use a UUIDv4 for public purposes and a bigint internally. You don't need to worry about exposing creation time, and you can still manage your data in the home system with all the properties that a total ordering affords.
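
    A minimal sketch of that two-key layout (hypothetical table; Postgres 13+ for gen_random_uuid()):

      CREATE TABLE accounts (
        id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- internal, totally ordered
        public_id uuid NOT NULL DEFAULT gen_random_uuid() UNIQUE,   -- opaque external identifier
        name      text NOT NULL
      );

      -- Internal joins and foreign keys use id; APIs and URLs only ever see public_id.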

    • tracker1 3 hours ago

      Now coordinate those sequential ids on a sharded or otherwise clustered database system.

      • ibejoeb 3 hours ago

        That's the point. Those are only system-unique, not universally. It's a lower-level attribute that is an implementation detail, like for referential integrity in an rdbms. At that point, if you need it, you have atomic increment.

  • simongr3dal 4 hours ago

    I believe the concern is that if your primary key in the database is a serial number, it might be exposed to users unless you do extra work to hide that ID from any external APIs, and if there are any flaws in your authorization checks it can allow enumeration attacks exposing private or semi-private info. With UUIDs being virtually unguessable, that's less of a concern.

    • morshu9001 4 hours ago

      uuid7 is still guessable though, as the article says. The assumption is that these are internal only PKs.

      • molf 3 hours ago

        There is a big difference though. Serial keys allow attackers to guess the rate at which data is being added.

        UUID7 allows anyone to know the time of creation, but not how many records have been created (approximately) in a particular time frame. It leaks data about the record itself, but not about other records.

      • tracker1 3 hours ago

        Far, far less than sequential IDs, and the random part is a pretty big value numerically... I mean, there are billions of possible values for every millisecond on the generating server... you aren't practically going to "guess" them.

      • e12e 3 hours ago

        Guessable with 80 bits of entropy?

  • nextaccountic 4 hours ago

    uuids can be generated by multiple services across your stack

    bigserial must be generated by the db

    • coolspot 4 hours ago

      But what if we just use milliseconds as our bigserial? And maybe add some hw-random number at the end to avoid conflicts? Wait

      • crazygringo an hour ago

        Oh yeah, it would be an identifier but it would be unique. Across the universe of all devices, effectively. Should come up with a name for that

      • tracker1 3 hours ago

        Somehow +1 on this comment just doesn't feel like enough.

  • mhuffman 4 hours ago

    >why you'd do either instead of just serial/bigserial, which has always been my goto. Did I miss something?

    So the common response is sequential ID crawling by bad actors. UUIDs are generally un-guessable and you can throw them into slop DBs like Mongo or storage like S3 as primary identifiers without worrying about permissions or having a clever interested party pwn your whole database. A common case of security through obscurity.

  • martinky24 4 hours ago

    You don’t scale horizontally, do you?

    • rcfox 4 hours ago

      Do most people? Not everyone is Google.

      • martinky24 4 hours ago

        Many people have more than 1 server that need to generate coherent identifiers amongst one another. That's not a "Google scale" thing.

        • rcfox 3 hours ago

          Your comment heavily implied (to me) scaling databases horizontally. Yes, it's not necessarily "Google scale" either, but it's a ton of extra complexity that I'm happy to avoid. But a Google employee is probably going to approach every public-facing project with the assumption of scaling everything horizontally.

          With multiple servers talking to a single database, I'd still prefer to let the database handle generating IDs.

          • morshu9001 3 hours ago

            Yeah, there's too much advice jumping straight to uuid4 or 7 PKs for no particular reason. If you're doing a sharded DB, maybe, and even then it depends.

            Speaking of Google, Spanner recommends uuid4, and specifically not any uuid that includes a timestamp at the start like uuid7.

    • morshu9001 4 hours ago

      This is Postgres. There is Citus, but that still supports (maybe recommends?) serial PKs.

lucasyvas 2 hours ago

These are all non-issues - as always, don't allow an end user to determine a serial primary key.

And the amount of information it leaks is negligible - they might know the oldest and the newest and there’s an infinite gulf in between.

It’s better and more practical than SERIAL or BIGSERIAL in every way - if you need a random/external ID, add a second column. Done.

  • morshu9001 2 hours ago

    Why not serial PK with uuid4 secondary? Every join uses your PK and will be faster.

  • Biganon 2 hours ago

    > if you need a random/external ID, add a second column. Done.

    As others have stated, it completely defeats the performance purpose if you need to look records up using the other ID.

rvitorper 2 hours ago

Does anyone have performance issues with uuidv4? I worked with a db with 10s of billions of rows, no issues whatsoever. Would love to hear the mileage of fellow engineers

  • cipehr an hour ago

    What database were you using? For example with SQL server, by default it clusters data on disk by primary key. Random (non-sequential) PKs like uuidv4 require random cluster shuffling to insert a row “in the middle” of a cluster, increasing io load and causing performance issues.

    Postgres on the other hand doesn’t do clustered indexing on the PK… if I recall correctly.

    • rvitorper an hour ago

      Postgres. It was also a single instance, which made it significantly easier. But nice to know that this is an issue on SQL Server

  • crazygringo an hour ago

    Honestly not really. Yes random keys make inserts slower. But if inserts are only 1% of your database load, then yeah it's basically no issues whatsoever.

    On the other hand, if you're basically logging to your database so inserts are like 99% of the load, then it's something to consider.

    • rvitorper an hour ago

      Makes sense. Thanks for the comment

gnatolf 3 hours ago

For me, the sheer length of UUIDs is annoying in payloads, tokens, etc. I wish there was a common way to abbreviate them, similar to the git way.

  • pmontra 2 hours ago

    It's a 128-bit number. If you express that number in base 62 (26 uppercase letters + 26 lowercase letters + 10 digits) you need only 22 characters, since 62^22 > 2^128. You can compress it further by increasing the base with more printable characters.

qntmfred 3 hours ago

any thoughts on uuidv7 vs ulid, nanoid, etc for url-safe encodings?

  • nikisweeting an hour ago

    ULID is the best balance imo, it's more compact, can be double clicked to select, and case-insensitive so it can be saved on macOS filesystems without conflicts.

    Now someone should make a UUIDv7 -> ULID adapter lib that 1:1 translates UUIDv7 <-> ULID preserving all the timestamp resolution and randomness bits so we can use the db-level UUIDv7 support to store ULIDs.

  • thewisenerd 2 hours ago

    i guess that depends on what you mean by url-safe

    uuidv7 (-) and nanoid (_-) have special characters which urlencode to themselves.

    none are small enough that you'd want someone reading them over the phone; but from a character-legibility standpoint, ULID makes more sense.

burnt-resistor an hour ago

Sequential primary keys are pretty important for scalable, stable sorting by record creation time using the primary key's index. For this use case, a UUID "v9"-like approach can be a better option - similar to serial (int) but avoiding the guessing vulnerability: https://uuidv9.jhunt.dev

6r17 4 hours ago

Great read - short, effective; I know what I learned. Very good job.