• Buelldozer
          4 · 7 months ago

          145+ Petabytes for a single copy of the archive, and they currently have two copies of everything for a total of around 290 Petabytes.

          The largest hard drive I’m aware of is 32TB, so you’d “only” need over 9,000 (lol) of the largest drives ever made. I can’t even tell you what that would cost since Seagate doesn’t have a publicly available price for the damn things!
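
          Quick back-of-the-envelope, just plugging in the two figures above (~290 PB total, 32 TB per drive) and counting 1 PB as 1,000 TB:

          ```python
          # Rough drive count for ~290 PB at 32 TB per drive (decimal units).
          import math

          total_pb = 290          # both copies of the archive, in petabytes
          drive_tb = 32           # largest HDD mentioned, in terabytes

          drives = math.ceil(total_pb * 1_000 / drive_tb)
          print(f"~{drives:,} drives")    # a bit over 9,000
          ```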

          • @BearOfaTime@lemm.ee
            5 · 7 months ago

            And it’d have to be replicated, so 3 copies somewhere (granted, proper backups are compressed).

            Let’s say they have a proper backup compressed to (a random) 60%. That one backup is 87 petabytes. Then daily incrementals on top, so another what, 14 PB to get through 2 weeks of incrementals? Something in the range of 600 PB total with replicas? (Rough sketch below.)

            (I’m completely pulling numbers out of my ass; I’m not familiar with how such large datasets are managed from a DR perspective).
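
            Plugging those guesses into a quick sketch (145 PB primary, backup compressed to 60%, ~14 PB of incrementals, backup set replicated 3x, on top of the ~290 PB of live copies already mentioned):

            ```python
            # All inputs here are the rough guesses from the comment above.
            primary_pb = 145                          # one full copy of the archive
            backup_pb = primary_pb * 0.60             # compressed full backup, ~87 PB
            incrementals_pb = 14                      # ~2 weeks of daily incrementals (guess)
            backup_set_pb = backup_pb + incrementals_pb

            live_pb = 290                             # the two live copies they already keep
            replicas = 3                              # replicated backup sets

            total_pb = live_pb + backup_set_pb * replicas
            print(f"~{total_pb:.0f} PB total")        # lands around 600 PB
            ```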

      • Optional
        11 · 7 months ago

        Too bad there are no obscenely rich techbros around for whom this would be nothing.

        That’s chump yacht money.

        • Buelldozer
          6 · 7 months ago

          You’d need more than 9,000 of the largest hard drives made (32TB) to store the nearly 300 Petabytes of data they have. Still within the reach of an obscenely rich tech bro, but not exactly cheap.

        • Tarquinn2049
          13 · edited · 7 months ago

          And you’d want/need redundancy: one on-site backup for quick restoration and one off-site for surviving a physical disaster. So you’d need at least 3 times that. In HDD prices, that is roughly 2.5 million per set-up, or 7.5 million total for all three. And in SSD prices, well, it’s about 3x that: 7.5 million per set-up and 22.5 million for all three.
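
          Same numbers in a quick sketch (the ~2.5 million per HDD set-up and the 3x SSD multiplier are the rough guesses above, not quotes):

          ```python
          # Cost sketch using the rough figures from this comment.
          copies = 3                      # live set + on-site backup + off-site backup

          hdd_per_copy = 2.5e6            # ~$2.5M of HDDs per ~300 PB copy (rough guess)
          ssd_multiplier = 3              # SSDs taken as roughly 3x the HDD price

          hdd_total = hdd_per_copy * copies                       # ~$7.5M
          ssd_total = hdd_per_copy * ssd_multiplier * copies      # ~$22.5M
          print(f"HDD ~${hdd_total/1e6:.1f}M, SSD ~${ssd_total/1e6:.1f}M")
          ```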

          An alternate option is a distributed backup. They could have people volunteer to store and host like 10 gigs each, and just hand out each 10-gig chunk to 10 different people. That would take a lot of work to set up, but it would be a lot safer. And there are already programs/systems like that to model after. 10 gigs is just an example; it might be more successful, or even more feasible, in chunks of 1-2 terabytes. Basically one full hard drive per volunteer.

          Lol, had to add that after doing the math for 10 gigs to ten people and realising that was 1,000 people per terabyte, so it would take 150 million volunteers. Even at 2 terabytes each, assuming we still wanted 10x redundancy in that model, it would be like 750 thousand volunteers or something like that. Maybe there is no sustainable volunteer-driven model, lol.
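
          For the curious, that volunteer math in a quick sketch (one ~150 PB copy, every chunk held by 10 different volunteers, one chunk per volunteer):

          ```python
          # Volunteer counts for a distributed copy of ~150 PB with 10x redundancy.
          archive_gb = 150_000_000        # ~150 PB, in gigabytes
          redundancy = 10                 # each chunk handed to 10 people

          for chunk_gb in (10, 2_000):    # 10 GB chunks vs ~2 TB per volunteer
              volunteers = archive_gb // chunk_gb * redundancy
              print(f"{chunk_gb:>5} GB each -> {volunteers:,} volunteers")
          # 10 GB each  -> 150,000,000 volunteers
          # 2 TB each   -> 750,000 volunteers
          ```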