Cosmologically Unique IDs

251 points - today at 6:37 PM


lisper today at 7:33 PM
This analysis is not quite fair. It takes locality (i.e. the speed of light) into account when designing UUID schemes, but not when computing the odds of a collision. Collisions only matter if the colliding UUIDs actually come into causal contact with each other after being generated. So just as you have to take locality into account when designing UUID trees, you also have to take it into account when computing the odds of an actual local collision. A naive application of the birthday paradox doesn't apply, because it ignores locality. A fair calculation of the required size of a random UUID is therefore going to come out a lot smaller than the ~800 bits the article arrives at. I haven't done the math, but I'd be surprised if the actual answer is more than 256 bits.
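For reference, the birthday bound the parent is talking about is easy to evaluate. This is a sketch with illustrative numbers of my own choosing (10^30 IDs, 256 bits), not figures from the article:

```python
import math

def birthday_collision_prob(n: float, bits: int) -> float:
    """Approximate P(at least one collision) among n random b-bit IDs:
    p ~= 1 - exp(-n^2 / 2^(b+1)). expm1 keeps precision for tiny p."""
    return -math.expm1(-(n * n) / 2.0 ** (bits + 1))

# 10^30 locally generated IDs still make a 256-bit collision
# astronomically unlikely (p on the order of 1e-18).
p = birthday_collision_prob(1e30, 256)
assert 0.0 < p < 1e-15
```

Whether 256 bits suffices then hinges on how many of those IDs can ever come into causal contact, which is the parent's point.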

(Gotta say here that I love HN. It's one of the very few places where a comment that geeky and pedantic can nonetheless be on point. :-)

stonegray today at 11:26 PM
Specifying a CSPRNG as an entropy source to avoid collision is incorrect.

CSPRNGs make prediction of the next number difficult (cracking-AES difficulty) but do not add entropy and must be seeded uniquely otherwise they will output the same numbers. Unless the author is proposing having the same machine generate a single universe-scale list in one run.
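The determinism is easy to demonstrate with a toy counter-mode generator (illustration only, not a vetted DRBG; the construction and names are mine):

```python
import hashlib

def toy_drbg(seed: bytes, n_blocks: int) -> list[str]:
    # Toy counter-mode generator: block i = SHA-256(seed || counter).
    # The output is a pure function of the seed, so two machines seeded
    # identically emit identical "random" IDs.
    return [hashlib.sha256(seed + i.to_bytes(8, "big")).hexdigest()
            for i in range(n_blocks)]

ids_a = toy_drbg(b"same-seed", 3)
ids_b = toy_drbg(b"same-seed", 3)
assert ids_a == ids_b                      # deterministic: guaranteed collisions
assert ids_a != toy_drbg(b"other-seed", 3)  # entropy must come from the seed
```

The unpredictability property protects against adversaries, not against two honest machines starting from the same state.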

Also, "banning" IDs that are all 1s or all 0s is silly; they are just as valid and unique as any other number if you're generating them properly. Although I might suggest purchasing a lottery ticket if you get a UUID with all settable bits set to 1.

m4nu3l today at 9:03 PM
A more realistic estimate of the total number of addressable things should take into account that for anything to be addressable, its address should be stored somewhere at least once.

If it takes at least Npb particles to store one bit of information, then the number of addressable things would decrease with the number of bits of the address.

So let's call Nthg the number of addressable things, and assume the average number of bits per address grows with Nb = f(Nthg).

Then the maximum number of addressable things is the number that satisfies Nthg = Np/(Npb*f(Nthg)), where Np is the total number of particles.
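That self-consistency equation has an easy numeric fixed point. A sketch with illustrative assumptions (Np = 1e80 particles, Npb = 1 particle per bit, f(N) = log2(N) bits per address):

```python
import math

# Fixed point for Nthg = Np / (Npb * f(Nthg)).
Np, Npb = 1e80, 1.0
f = math.log2

Nthg = Np  # initial guess
for _ in range(50):
    Nthg = Np / (Npb * f(Nthg))

# The iteration converges fast; the address-storage overhead costs a
# factor of a few hundred relative to the raw particle count.
residual = abs(Nthg - Np / (Npb * f(Nthg))) / Nthg
assert residual < 1e-12
```

With these assumptions Nthg lands around 4e77, so the correction matters for the constant but not the exponent.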

vessenes today at 11:16 PM
Chiming in from the decentralized world: there's an adversarial/cooperative dynamic in the assignment of these IDs, and in the selection of parents, that isn't discussed in the original. I think you could possibly get to sublinear by allowing a small number of cooperative nodes to assign new IDs.

On the other hand, having the right to assign IDs is powerful; on balance, to my mind the right thing to do is some sort of ZK verifiable random function, e.g. sunspot-based transformations combined with some proof of 'fair' random choice. In that case, I think the 800-bit number seems like plenty. You could also do some sort of epoch-based variable length, where for the next billion years or so we use 1/256 of the ID space (forced first bit to 0), and so on.

j-pb today at 7:16 PM
Great insights and visualisations!

I built a whole database around the idea of using the smallest plausible random identifiers, because they seem to be the only "golden disk" we have for universal communication, except maybe for some convergence property of latent spaces in large enough embodied foundation models.

It's weird that they are so underappreciated in the scientific data management and library science communities; many issues that currently require large organisations could have been solved with better identifiers.

To me the ship of Theseus question is about extrinsic (random / named) identifiers vs. intrinsic (hash / embedding) identifiers.

https://triblespace.github.io/triblespace-rs/deep-dive/ident...

https://triblespace.github.io/triblespace-rs/deep-dive/tribl...

adityaathalye today at 7:13 PM
Just past page 281 of Becky Chambers's delightful "the galaxy, and the ground within".

  Received Message
  Encryption: 0
  From: GC Transit Authority --- Gora System (path: 487-45411-479-4)
  To: Ooli Oht Ouloo (path: 5787-598-66)
  Subject: URGENT UPDATE
Man I love the series.

Looks like this multispecies universe has a centrally-agreed-upon path addressing system.

ekipan today at 8:15 PM
I forget the context but the other day I also learned about Snowflake IDs [1] that are apparently used by Twitter, Discord, Instagram, and Mastodon.

Timestamp + random seems like it could be a good tradeoff to reduce ID sizes while still getting reasonable characteristics; I'm surprised the article didn't explore that (but then again "timestamps" are a lot more nebulous at universal scale, I suppose). Just spitballing here, but I wonder if it would be worthwhile to reclaim ten bits of the Snowflake timestamp and use the low 32 bits for a random number. Four billion IDs for each second.
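That spitballed layout can be sketched directly. Everything here is an assumption carried over from the idea above (a custom epoch, 31 bits of seconds in the high bits, 32 random bits in the low bits), not how Snowflake actually works:

```python
import secrets
import time

EPOCH = 1_600_000_000  # assumed custom epoch, in Unix seconds

def ts_random_id() -> int:
    # 31 bits of seconds since the epoch (good for ~68 years) in the
    # high bits, 32 random bits low: ~4 billion IDs per second before
    # birthday effects start to bite.
    seconds = (int(time.time()) - EPOCH) & 0x7FFF_FFFF
    return (seconds << 32) | secrets.randbits(32)

a, b = ts_random_id(), ts_random_id()
assert 0 <= a < (1 << 63)      # fits in a signed 64-bit integer
assert (b >> 32) >= (a >> 32)  # IDs sort roughly by creation time
```

You keep the k-sortable property of Snowflake IDs while dropping the coordination (no machine ID, no sequence counter), at the cost of a small within-second collision probability.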

There's a Tom Scott video [2] that describes Youtube video IDs as 11-digit base-64 random numbers, but I don't see any official documentation about that. At the end he says how many IDs are available but I don't think he considers collisions via the birthday paradox.

[1]: https://en.wikipedia.org/wiki/Snowflake_ID

[2]: https://youtu.be/gocwRvLhDf8

rini17 today at 8:21 PM
From real life we know that people prefer to have multiple anonymous IDs, or self-selected handles; either makes fully deterministic generation schemes moot.

Also, network routing requires objects to have multiple addresses.

The physics side of the whole thing is funny too: afaik quantum particles require fungibility, i.e. by doxxing atoms you unavoidably change the behavior of the system.

bluecoconut today at 7:29 PM
Fun read.

One upside of the deterministic schemes is that they include provenance/lineage. You can literally trace the history back up the path to the original ID giver.

Kinda has me curious about how much information is required to represent any arbitrary provenance tree/graph on a network of N-nodes/objects (entirely via the self-described ID)?

(Thinking in the comment: the worst case is a linear chain, and if the full provenance should be recoverable from the ID itself, that scales as O(N x id_size), which is quite bad. But assuming the "best case", where any node is expected to be log(N) steps from the root (depth of log(N)), global_id_size = log(N) x local_id_size feels like roughly the optimal limit, so the global ID grows as log(N)^2. Starting from the 399-bit number, a lower limit for a global ID with lineage would then be about (400 bits)^2 ~= 20 kB, because it carries the ordered local-ID provenance information rather than relying on local shared knowledge.)
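A quick sanity check of that back-of-envelope arithmetic (the depth and local ID size are the assumptions from the comment, both taken as ~400):

```python
local_bits = 400  # ~ the article's 399-bit figure, rounded
depth = 400       # assumed provenance depth of ~log2(N) levels

# Full-lineage global ID: one local ID per level up to the root.
global_bits = depth * local_bits
global_bytes = global_bits // 8
assert global_bytes == 20_000  # ~= 20 kB, matching the estimate
```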

ktpsns today at 7:27 PM
Quite offtopic, but: I find UUIDs are overused in many cases. People then abuse them to store data, making them effectively "speaking IDs" or "multi-column indices".
manofmanysmiles today at 7:56 PM
I'd propose using our current view of physical reality to own a subset of the UUID, plus a version field in case new physics is discovered.

10-20 bits: version/epoch

10-20 bits: cosmic region

40 bits: galaxy ID

40 bits: stellar/planetary address

64 bits: local timestamp

This avoids the potentially pathological long chain of provenance, and also encodes coordinates into it.

Every billion years or so it probably makes sense to re-partition.
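A layout like that packs into a single integer in the obvious way. The exact widths here are assumptions picked from the ranges above (16+16+40+40+64 = 176 bits):

```python
FIELDS = [("version", 16), ("region", 16), ("galaxy", 40),
          ("stellar", 40), ("timestamp", 64)]

def pack(values: dict) -> int:
    # Shift each field in, most significant first.
    out = 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        out = (out << width) | v
    return out

def unpack(packed: int) -> dict:
    # Peel fields off from the least significant end.
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = packed & ((1 << width) - 1)
        packed >>= width
    return values

addr = {"version": 1, "region": 7, "galaxy": 123_456,
        "stellar": 42, "timestamp": 1_700_000_000_000}
assert unpack(pack(addr)) == addr  # lossless round trip
```

Putting the version/epoch field in the most significant bits is what makes the billion-year re-partitioning possible: each epoch gets its own disjoint slice of the ID space.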

QuiCasseRien today at 9:54 PM
I really love everything related to cosmology, but I always struggle with two contrary concepts that lead to a paradox (for me):

- Infinity : from school, we learn our universe is infinite.

- We often do calculation with upper limit like this one : 10^240. This is a big number butttttt it's not infinite you know. 10^240+1, 10^240+2...

So :

1. if it's infinite, why do upper-limit calculations?

2. if it's limited, what is there outside that limit?

Extremely paradoxical

small_model today at 8:43 PM
We will probably end up with something like each planet has its own local addressing, and the big router in the sky does NAT, each solar system has a router and so on.
alex_tech92 today at 7:58 PM
It is interesting how much of our infrastructure relies on the assumption that 'close enough' is actually 'good enough' for uniqueness. When we move from UUIDs to things like ULIDs or Snowflake IDs, we are really just trading off coordination cost for a slightly higher collision risk that we will likely never hit in several lifetimes. Thinking about it on a 'cosmological' scale makes you realize how much of a luxury local generation is without needing a central authority. It is that tiny bit of entropy that keeps the whole distributed system from grinding to a halt.
factotvm today at 8:00 PM
> In order to fix this, we might start sending out satellites in every direction

Minor correction: Satellites don't go in every direction; they orbit. Probes or spaceships are more appropriate terms.

philipwhiuk today at 9:30 PM
Note that they almost immediately contract from 'the universe' to 'the visible universe', which isn't the same thing at all.
eudamoniac today at 11:02 PM
I was going to read this, but it starts with an AI slop header image for no purpose, so I intuited that the article was similarly ill-constructed.
dvh today at 9:49 PM
Another blow to the "all electrons are the same electron" theory. Why have only 1 electron with so many possible ids /s
frikit today at 8:47 PM
The best way to solve this is not to, and just give up on the idea of identification.

If you have an infinite multiverse of infinite universes, and perhaps layers on that, with different physics, etc., you can’t have identity outside of all existence.

In Judaism, one/the name of God is translated as "I am". I believe this is because God's existence is all, transcending whatever concepts you have of existence or of IDs. That ID is the only ID.

So, the cosmic solution to IDs is the name of God.