GV is Preserving the Web in Amber and Why That's Great News

TL;DR: Amber will make sure that when we link to something on GV, it will be available for readers in the future via Archive.org, even if the original URL stops working!

Hi all, I am Jer, the Technical Lead for Global Voices. Today I want to tell you about our recent implementation of Amber across all GV sites, and why I think it's such an important step towards accomplishing our mission of creating a record of citizen media from all around the world.

The web is always falling apart, but GV tries to hold it together

This shouldn't come as a surprise to anyone, but with the long view GV has, it becomes painfully obvious how fragile the web is. When we started in 2005, blogging was only a few years old, so every single blog we linked to was “new” by modern standards. Even a “longstanding” platform like Blogger was just a baby compared to how old Twitter currently is, and Twitter, of course, didn't even exist yet.

Since 2005 so many sites have gone offline that it's absurd to try to list them. Back then, it felt like services such as Technorati, Digg (in its original form) and Google Reader, which were “competition” for Global Voices, would be around forever. Today all three are offline.

At the time Technorati was a search engine for blogs, trying to aggregate and categorize all the blogs on the web, and it was pretty useful. People would ask me “why Global Voices when we have Technorati already?” My answer then was the same as my answer now:

GV has a mission to translate, report on and preserve the web that isn't driven by profit, nor is it driven by expensive technology that will break down. GV's mission is driven by real human effort to not only find and catalogue the blogs, websites and social media profiles that make up the web, but to pull out the important quotes, contextualize them and create a record of what is most important on the global web.

This makes the GV archives a treasure trove of internet history, and one we have worked hard to preserve over the years. Even as blogs, services and aggregators drop like flies, abandoning their archives and all the history and context that was invested in them, GV has kept its full archive of over 100,000 original posts online, tagged and searchable (with no small thanks to WordPress, the free, open source platform that was still a baby in 2005 but turned out to be a great long-term investment).

That GV has such wide, deep and grassroots international coverage only makes our preservation efforts more important, because in so many cases, we are the only organization that reported on our subjects. We found the unique voices that were only being heard by a tiny local audience and amplified them for the world.

The only problem is that no matter how well we quote the important parts of the blogs and sites we link to, those quotes will only ever scratch the surface of the unique perspectives and history those sites hold. In far too many cases (thousands, tens of thousands, hundreds of thousands) the sites we linked to over the years have already gone offline, leaving our links to them as tragic dead ends. Readers visiting our posts from 2005 are likely to find 404 error after 404 error when they try to dig deeper.

The phenomenon of broken links on old posts is called Link Rot. Avoiding this fate is exactly what Amber is all about, and it is the reason we've invested in getting it set up across GV sites!

Trapping the web in Amber makes it available for the future

As you can see from the video embedded above, Amber tackles the problem of link rot head-on, by ensuring that every time we link to a URL, we also get a copy of its current state saved for the future. Nothing can make sure that sites run by other people stay online, but with Amber, we can have confidence that at least the pages we link to will still be available.

Amber and Global Voices are family ❤

Amber was built by a team from The Berkman Klein Center for Internet & Society at Harvard University. For those who don't know, Global Voices also traces its origins to the Berkman Klein Center, where our founders Ethan and Rebecca were fellows during GV's inception.

This shouldn't surprise anyone, since the missions of GV and Amber are so closely aligned. Both aim to preserve the web as a historical record with freedom and openness as guiding principles.

We didn't have time to implement Amber during its initial development, but we've been working to contribute our code improvements back using the Amber GitHub project, and hopefully our contributions can be helpful to other large sites that install the plugin.

How does Amber work?

Amber offers a few options for how to save the backups of sites you link to, but for GV's purposes, the ones that save the website directly to your server aren't practical. We have so many links (440k URLs just on the English GV site) that this would quadruple our storage needs and, of course, our hosting bill.

Instead, we are relying on the Archive.org “Wayback Machine” integration in Amber, which sends each URL to Archive.org when we publish the post. This lets the Wayback Machine do what it does best: fetch, store and catalogue that URL for the rest of eternity.
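
For the technically curious, here is a rough sketch of what "sending each URL to Archive.org" could look like. This is not the Amber plugin's actual code: the helper names `outbound_links` and `save_request_url` are made up for illustration, and it assumes the Wayback Machine's public "Save Page Now" address form (`https://web.archive.org/save/` followed by the URL). The sketch only builds the request addresses and never contacts Archive.org.

```python
from html.parser import HTMLParser

# Assumption: the public "Save Page Now" address form of the Wayback Machine.
# Whether Amber uses this exact route internally is not stated in this post.
SAVE_ENDPOINT = "https://web.archive.org/save/"


class LinkCollector(HTMLParser):
    """Collects absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def outbound_links(post_html):
    """Return each absolute link in the post once, in order of appearance."""
    collector = LinkCollector()
    collector.feed(post_html)
    unique = []
    for link in collector.links:
        if link not in unique:
            unique.append(link)
    return unique


def save_request_url(link):
    """Build the address that would ask the Wayback Machine to snapshot `link`."""
    return SAVE_ENDPOINT + link
```

At publish time, a hook could loop over `outbound_links(post_html)` and request `save_request_url(link)` for each one. Relative links are skipped here because they point back at our own site, which we already preserve ourselves.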

Later, Amber re-scans all those URLs, asking Archive.org to check them again, and notes whether each link still works. Once a link is known to be broken, visitors who click it are sent to the Archive.org backup, rather than the original (404) URL.
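
The lookup-and-fallback step can be sketched in the same spirit. Again, this is an illustration and not Amber's implementation: the function names are hypothetical, and it assumes the Wayback Machine's public availability lookup at `https://archive.org/wayback/available`, which answers with JSON containing an `archived_snapshots` object and, when a copy exists, a `closest` snapshot inside it. No network request is made here; the code only builds the query and interprets a response that has already been fetched.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Wayback Machine availability lookup address.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(link):
    """Build the lookup address asking whether a snapshot of `link` exists."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": link})


def snapshot_url(response_text):
    """Return the closest snapshot address from a lookup response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def link_target(original, original_is_down, backup):
    """Pick where a click should lead: the backup only if the original is down."""
    if original_is_down and backup:
        return backup
    return original
```

Keeping the decision in one small function like `link_target` means a working link is never rewritten: the backup only comes into play when the re-scan has flagged the original as broken and a snapshot actually exists.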

Wait, what is the Archive.org “Wayback Machine”?

Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more. […] Today we have 20+ years of web history accessible through the Wayback Machine
About IA

The Internet Archive is a magic tool and one of the few free, beautiful internet things that has lasted as long as GV. Anyone can upload content to Archive.org as long as it has a free license (like Creative Commons) and have it hosted for free forever.

At Global Voices, we love the Internet Archive for so many reasons. It helps us investigate our own past, seeing what GV looked like long ago. It helps us with our journalism, unearthing sources that would be lost otherwise. We even use it to host our podcast audio files, because it's free and works great!

Loving something doesn't mean you necessarily trust it, but in this case I am happy to rely on Archive.org for our Amber implementation. For collaboration to function, we need to be able to rely on our allies, and Archive.org is the kind of ally Global Voices cherishes.

How will Amber look on GV?

Amber popup linking to the Archive.org snapshot of a dead link

Most of the time, Amber won't look like anything at all, since the plugin will handle all the URL snapshotting and evaluation in the background. The one time we'll notice Amber is when a link that was previously working goes down. At that point, if you visit the old article and click the link, you'll see the Amber box, informing you that the URL is probably not working and offering you the choice of visiting the original URL, or visiting the backup on Archive.org.

Global Voices and Amber are working together to preserve the history of the diverse, international web

It was a big project, but I am thrilled to have Amber working on GV, and hope it will help our readers today and in the future to find the nuanced, local perspectives that light up our homepage each day, even if the original sources go offline.

Oh yeah, one more thing: if you have your own blog, why not install the Amber plugin yourself? We can all do our part to preserve internet history, whether it's the enormity of GV, or your own little corner of the web ❤
