- cross-posted to:
- technology@lemmy.ml
Operation archive the archive?
The Internet Archive probably contains sooo much data, it's probably 4 petabytes, and that's hard to store.
It’s “only” a couple hundred hard drives.
145+ petabytes for a single copy of the archive, and they currently have two copies of everything, for a total of around 290 petabytes.
The largest hard drive I’m aware of is 32TB so you’d “only” need over 9,000 (lol) of the largest drives ever made. I can’t even tell you what that would cost since Seagate doesn’t have a publicly available price for the damn things!
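For anyone who wants to check the "over 9,000" figure, here's the back-of-envelope arithmetic as a quick Python sketch. The 290 PB and 32 TB numbers are just the ones quoted above, and drive pricing is left out since, as noted, there's no public list price:

```python
import math

# Back-of-envelope drive count, using the ~290 PB (two copies) and 32 TB
# figures quoted above; purely illustrative, not official numbers.
total_pb = 290                      # two copies of ~145 PB each
drive_tb = 32                       # largest HDD capacity mentioned above
drives = math.ceil(total_pb * 1000 / drive_tb)
print(f"{drives:,} drives")         # -> 9,063 drives
```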
And it has to be replicated, so 3 copies somewhere (granted, proper backups are compressed).
Let’s say they have a proper backup compressed to (a random) 60% of the original size. That one backup is ~87 petabytes. Add daily incrementals, so another, what, 14 PB to cover two weeks of incrementals? Something in the range of 600 PB total with replicas?
(I’m completely pulling numbers out of my ass, I’m not familiar with how such large datasets are managed from a DR perspective).
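To make the hand-waving explicit, here's that same guesswork as a sketch. Every number in it (60% compression, ~14 PB of incrementals, 3 backup replicas) is an assumption from the comment above, not anything the Internet Archive has published:

```python
# Very rough DR sizing using the (admittedly made-up) numbers above.
single_copy_pb = 145                    # one copy of the archive
live_copies = 2                         # the two live copies they already keep
full_backup_pb = single_copy_pb * 0.6   # assume compression to 60% -> ~87 PB
incrementals_pb = 14                    # guess: ~2 weeks of daily incrementals
backup_replicas = 3                     # replicate the backup set 3x

total_pb = live_copies * single_copy_pb \
    + backup_replicas * (full_backup_pb + incrementals_pb)
print(f"~{total_pb:.0f} PB total")      # lands in the ~600 PB ballpark
```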
Too bad there are no obscenely rich techbros around for whom this would be nothing.
That’s chump yacht money.
You’d need more than 9,000 of the largest hard drives made (32 TB) to store the nearly 300 petabytes of data they have. Still within the reach of an obscenely rich tech bro, but not exactly cheap.
Even then you’d still need networking, caching, the rest of the servers, and someone to deploy all of this.
And you’d want/need redundancy: one on-site backup for quick restoration and one off-site copy for surviving a physical disaster. So you’d need at least 3 times that. In HDD prices, that is roughly $2.5 million per set-up, or $7.5 million total for all three. In SSD prices, it’s about 3x that: $7.5 million per set-up and $22.5 million for all three.
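Spelling that out, under the assumptions above (the ~$2.5M-per-copy HDD figure is a rough estimate from this thread, and SSDs are taken as roughly 3x the price per terabyte):

```python
# Sketch of the cost comparison above; all inputs are the thread's rough
# estimates, not quotes from any vendor.
capacity_tb = 290_000              # ~290 PB per full copy
copies = 3                         # primary + on-site backup + off-site backup
hdd_cost_per_copy = 2_500_000      # USD, rough HDD estimate from the thread
ssd_multiplier = 3                 # SSDs assumed ~3x the price of HDDs

print(f"Implied HDD price: ~${hdd_cost_per_copy / capacity_tb:.2f}/TB")
print(f"HDD total for {copies} copies: ~${copies * hdd_cost_per_copy / 1e6:.1f}M")
print(f"SSD total for {copies} copies: "
      f"~${copies * hdd_cost_per_copy * ssd_multiplier / 1e6:.1f}M")
```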
An alternate option is a distributed backup. They could have people volunteer to store and host, like, 10 gigs each, and hand out each 10-gig chunk to 10 different people. That would take a lot of work to set up, but it would be a lot safer. And there are already programs/systems like that to model after. 10 gigs is just an example; it might be more successful, or even more feasible, in chunks of 1-2 terabytes. Basically one full hard drive per volunteer.
Lol, had to add that after doing the math for 10 gigs to ten people and realising that was 1,000 people per terabyte, so it would take 150 million volunteers. Even at 2 terabytes each, assuming we still wanted 10x redundancy in that model, it would be something like 750 thousand volunteers. Maybe there is no sustainable volunteer-driven model, lol.
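For reference, the volunteer math under those assumptions (~150 PB for one copy and 10x redundancy, both taken from the comments above):

```python
# Volunteer-count math for the distributed backup idea; dataset size and
# redundancy factor are the thread's assumptions, not official figures.
dataset_tb = 150_000         # ~150 PB, roughly one copy of the archive
redundancy = 10              # hand each chunk to 10 different volunteers

for chunk_gb in (10, 2_000):  # 10 GB chunks vs ~2 TB per volunteer
    volunteers = dataset_tb * 1000 * redundancy / chunk_gb
    print(f"{chunk_gb} GB per volunteer -> ~{volunteers:,.0f} volunteers")
# -> ~150,000,000 volunteers at 10 GB each, ~750,000 at 2 TB each
```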