Friday, April 16, 2010

The biggest waste of space in mankind's history?

I have a twitter account.

I even tried sending out some regular tweets to see what would happen.

Then I ran out of material and enthusiasm.

Perhaps some folks feel the need to tweet (or twitter, or whatever it's called), but I don't.

Similarly, I don't feel inclined to hang on every word that some celebrity might utter via their Twitter account. Life's far too short for that, and there are plenty of far more interesting things going on in the world, so why waste my time on such fluff?

However, it has to be admitted that lots of folks live for (and on) Twitter.

To them, the cyber-SMS system is the be-all and end-all of their days.

Some addicts have hundreds or even thousands of people on their "follow" lists and/or spend inordinate amounts of time documenting their every move by way of firing off short and often cryptic messages.

Clearly, I may be in the minority with my opinions regarding Twitter and all those twit(terer)s who use it.

One organisation that thinks I'm completely wrong is the US Library of Congress.

In their wisdom (or should that be insanity?), the USLoC plans to archive every public tweet ever made and make that archive available online for all to access.

Given that there are around 55 million tweets published every day, this is surely a gargantuan task, especially considering that they also intend to archive all the tweets that have occurred since 2006. That represents billions of tweets, most of which will be the most mindless dross ever to pass down cyberspace's superhighway.

One can't help but wonder if this isn't a project somewhat akin to collecting, collating, indexing and displaying all the sheets of toilet paper flushed since 1923. Certainly, from my perspective, the content involved will be just as riveting and worthy.

Okay, so I exaggerate a little...

As a profile of public opinion and a social commentary, this massive archive of Twitter posts may have some value to future historians -- but they'll still be faced with the issue of sorting the gold from the mud in order to get any reasonable data.

When I look at how seemingly underfunded projects such as The Internet Archive are, I have to wonder if the USLoC has got its priorities right.

The signal-to-noise ratio appears (at least to me) to be much better when analyzing web content than Twitter postings, and if public money is going to be spent creating such an extensive historical database of public opinion, I'd rather it sourced that data from anywhere other than Twitter.

As an interesting aside, the amount of storage required to archive all those tweets is surprisingly small.

If we assume 55 million tweets per day and take an average length to be (say) 100 bytes, that's just 5.5GB per day. Multiply that by 365 days and the annual data storage required for this archive (excluding indexes) is only about 2 terabytes. A couple of external USB drives would do the job nicely!
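For anyone who wants to check the sums, here's a quick back-of-envelope sketch in Python. It uses the same assumptions as above (55 million tweets a day, roughly 100 bytes each, no indexes) plus decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); the function names are hypothetical, chosen purely for illustration.

```python
# Back-of-envelope storage estimate for a tweet archive.
# Assumptions (illustrative only): 55 million tweets/day, ~100 bytes per tweet,
# decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes), indexes not counted.

TWEETS_PER_DAY = 55_000_000
BYTES_PER_TWEET = 100
DAYS_PER_YEAR = 365


def daily_bytes(tweets_per_day: int, bytes_per_tweet: int) -> int:
    """Raw text bytes archived per day (hypothetical helper)."""
    return tweets_per_day * bytes_per_tweet


def annual_bytes(tweets_per_day: int, bytes_per_tweet: int,
                 days: int = DAYS_PER_YEAR) -> int:
    """Raw text bytes archived per year (hypothetical helper)."""
    return daily_bytes(tweets_per_day, bytes_per_tweet) * days


if __name__ == "__main__":
    per_day = daily_bytes(TWEETS_PER_DAY, BYTES_PER_TWEET)
    per_year = annual_bytes(TWEETS_PER_DAY, BYTES_PER_TWEET)
    print(f"Per day:  {per_day / 1e9:.1f} GB")    # Per day:  5.5 GB
    print(f"Per year: {per_year / 1e12:.2f} TB")  # Per year: 2.01 TB
```

The yearly figure comes out at just over 2 TB (2,007.5 GB), so "about 2 terabytes" holds up.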

Of course, there is the overhead of indexing all these small messages, but that could be offset to some extent by compressing the text involved prior to storage.
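To illustrate the compression idea, here's a minimal Python sketch using the standard `zlib` module. It assumes messages are compressed in batches rather than one at a time; the sample messages, the NUL separator and the helper names are all hypothetical choices for illustration. Batching tends to help because each zlib stream carries a few bytes of header and checksum, and a single ~100-byte message offers little repetition for the compressor to exploit.

```python
import zlib

SEPARATOR = "\x00"  # assumption: messages never contain a NUL character


def compress_batch(messages: list[str]) -> bytes:
    """Join a batch of messages and compress them as one zlib stream."""
    raw = SEPARATOR.join(messages).encode("utf-8")
    return zlib.compress(raw, level=9)


def decompress_batch(blob: bytes) -> list[str]:
    """Reverse compress_batch: decompress and split back into messages."""
    return zlib.decompress(blob).decode("utf-8").split(SEPARATOR)


if __name__ == "__main__":
    # Hypothetical sample data; real tweets would compress differently.
    sample = [f"Just had a coffee, now off to work #{i}" for i in range(1000)]
    raw_size = len(SEPARATOR.join(sample).encode("utf-8"))
    packed = compress_batch(sample)
    print(f"raw: {raw_size} bytes, compressed: {len(packed)} bytes")
    assert decompress_batch(packed) == sample  # round trip is lossless
```

How much space this saves on real tweets would depend entirely on the data, so treat it as a sketch of the approach rather than a prediction.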

Maybe this isn't the biggest waste of space in the history of mankind at all -- but it's still a waste, if you ask me.
