Friday, April 30, 2010

How much memory is enough?

We tend to think of the internet as the single most comprehensive information resource on the face of the planet, and that's probably true.

However, advances in memory technologies such as high-density Flash RAM and ever larger hard drives are making it easier to carry more of that information around with us, even if we don't have internet connectivity.

I find it stunning that, since I became involved in computers back in the 1970s, the price of solid-state memory has fallen from around two cents per byte to an incredibly low half a millionth of a cent per byte today (for USB Flash drives).
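
As a quick sanity check on that figure, here's a back-of-the-envelope sketch in Python. The US$43, 8GB drive is my assumption of a typical early-2010 retail price, not a figure from any particular vendor:

    # Back-of-the-envelope check on the price-per-byte claim.
    # Assumption: an 8GB USB flash drive at about US$43 (plausible in early 2010).
    drive_price_cents = 43 * 100        # US$43 expressed in cents
    drive_bytes = 8 * 1024**3           # 8 GiB in bytes

    cents_per_byte = drive_price_cents / drive_bytes
    print(f"{cents_per_byte:.1e} cents per byte")   # ~5.0e-07: half a millionth of a cent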

Even more astonishing is the massive reduction in the physical size of solid-state memory.

My first hand-built computer had just a few meagre kilobytes of RAM, and every 1024 bytes of it required eight large 18-pin chips on a sizeable circuit board. Today we have gigabytes of memory squeezed into an area little bigger than your thumbnail.
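
The arithmetic behind that chip count, for anyone curious (a sketch; the Intel 1103 is my example of a typical 18-pin, 1024-by-1-bit DRAM of that era):

    # Why eight chips per kilobyte: DRAMs like the 18-pin Intel 1103 were
    # organised as 1024 x 1 bit, so one chip held a single bit of every
    # byte in a 1KB bank.
    bits_per_chip = 1024 * 1            # 1024 addresses, one bit each
    chips_per_bank = 8                  # one chip per bit of the data bus
    bank_bytes = (bits_per_chip * chips_per_bank) // 8
    print(bank_bytes)                   # 1024 -> one kilobyte per eight chips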

And things just keep getting better.

Researchers are now touting another potential breakthrough in memory density which, they claim, could allow over a billion pages of information to be stored in a single square inch.

The key to this technology is a technique for creating very highly packed arrays of magnetic nanodots.
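
To put "a billion pages per square inch" into more familiar units (a rough sketch; the 2,000-characters-per-page figure is my assumption, not the researchers'):

    # Rough conversion of the claimed density into raw capacity.
    # Assumption: a "page" is plain text at roughly 2,000 characters (bytes).
    pages_per_sq_inch = 1_000_000_000
    bytes_per_page = 2_000                  # assumed, not from the source

    bytes_per_sq_inch = pages_per_sq_inch * bytes_per_page
    print(f"{bytes_per_sq_inch / 1024**4:.1f} TiB per square inch")   # ~1.8 TiB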

Imagine being able to carry the entire contents of all the world's libraries in a device small enough to slip comfortably inside your pocket. Indeed, a snapshot of the entire internet could be condensed into an e-Book reader so that it was always available to you, whether you had connectivity or not.

The mind boggles at the potential applications for this density of data storage.

Of course, the technology is far from production-ready at this stage, with several obstacles standing in the way -- not the least of which is coming up with a way to address these nanodots for reading and writing.

The potential for such a device goes far beyond just storing data, however. If the nanodots could be configured so as to be altered by the pattern of light falling on them, then new "gigapixel" cameras could be created, with resolution high enough to make cropping a genuinely practical alternative to optical zoom.

The only problem we may face in the future is coming up with enough data to fill the ever-larger and ever-cheaper memory we find ourselves in possession of.

Unless you want a copy of every YouTube video ever posted, you may find that you reach your fill long before your portable electronic device runs out of memory. And then there's the problem of finding the time to actually assimilate that information.

Perhaps, at some time in the not too distant future, we really will have more computer-based memory than we'll ever need in our lifetimes.

Won't that be nice?

Friday, April 23, 2010

Big numbers are now much smaller

When I was a child back in the 1960s, one hundred was a big number.

One thousand was a huge number.

And a million -- well that was an altogether staggering amount.

Although the imperial system of weights and measures ruled back then and we didn't have fancy metric stuff like metres, grams or joules -- we still used the metric multipliers such as kilo and mega. However, that's usually as far as it went.

Our radios tuned to frequencies measured in kilocycles (remember, this was before we switched to Hertz) and even the newfangled television only used megacycle frequencies.

When the humble microcomputer first arrived back in the late 1970s, its memory was measured in bytes, hundreds of bytes or (if you were really rich) kilobytes.

As you can see, 10^6 was just about the maximum multiplier used in those days for almost anything, and I recall being the only person in my class at school who knew what a googol was.

These days, everyone has heard of Google (although I bet lots still don't know about the mathematical term its name derives from) and sitting beside me on the desk is a drive with one terabyte of capacity.

In fact, the prefixes "giga" and "tera" have replaced "kilo" when talking about memory and disk storage, and now we also refer to processor speeds in gigahertz rather than the megahertz of those old 8-bit CPUs.

The earliest serial data links provided data transfer at rates measured in mere bits per second; now we're talking megabits, hundreds of megabits and even gigabits for today's state-of-the-art wireless, DSL and fibre-based links.

This huge change in multipliers has happened in the relatively short span of just 40-50 years, so where will we be in another half-century?

Well, if we extrapolate from that history, we'd better get used to dealing with prefixes such as peta, exa, zetta and yotta.
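
For reference, here's the full run of those decimal prefixes and the powers of ten they stand for:

    # The decimal SI prefixes used for storage and bandwidth, in order.
    prefixes = [
        ("kilo", 3), ("mega", 6), ("giga", 9), ("tera", 12),
        ("peta", 15), ("exa", 18), ("zetta", 21), ("yotta", 24),
    ]
    for name, power in prefixes:
        print(f"1 {name}byte = 10^{power} bytes")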

No doubt many folks are thinking "what on earth could you possibly do with a zettabyte of memory? That's 10^21 bytes!"

Well, we live in an age when even those who build the computers and write the software have consistently underestimated the rate of growth to come. Whether it's true or just folklore, IBM chairman Thomas J Watson and Microsoft chairman Bill Gates have both been rumored (or misquoted) as grossly underestimating the number of computers that would be needed (Watson) and the amount of memory computer users would find adequate (Gates).

These days, when NASA's Solar Dynamics Observatory is streaming 1.5TB of data back to Earth each day, it's only a matter of time before we start talking petabytes instead of terabytes.
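
At that rate, the petabytes arrive sooner than you might think (a quick sketch, using decimal units):

    # How long before SDO's downlink adds up to a petabyte?
    daily_rate_tb = 1.5               # terabytes per day, per NASA's figure
    petabyte_in_tb = 1_000            # decimal units: 1PB = 1,000TB

    days = petabyte_in_tb / daily_rate_tb
    print(f"{days:.0f} days (~{days / 365:.1f} years) per petabyte")   # 667 days, ~1.8 years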

And, for the record, the human brain is purported to have just 125MB of user-defined memory. That's a very small number (these days).

Friday, April 16, 2010

The biggest waste of space in mankind's history?

I have a twitter account.

I even tried sending out some regular tweets to see what would happen.

Then I ran out of material and enthusiasm.

Perhaps some folks feel the need to tweet (or twitter or whatever it's called) but I don't.

Similarly, I don't feel inclined to hang on every word that some celebrity might utter via their Twitter account -- life's far too short for that and there are plenty of far more interesting things going on in the world to waste my time with such fluff.

However, it has to be admitted that lots of folks live for (and on) Twitter.

To them, the cyber-SMS system is the be-all and end-all of their days.

Some addicts have hundreds or even thousands of people on their "follow" lists and/or spend inordinate amounts of time documenting their every move by way of firing off short and often cryptic messages.

Clearly, I may be in the minority with my opinions regarding Twitter and all those twit(terer)s who use it.

One organisation that thinks I'm completely wrong is the US Library of Congress.

In their wisdom (or should that be insanity?), the USLoC plans to archive every public tweet ever made and make that archive available online for all to access.

Given that around 55 million tweets are published every day, this is surely a gargantuan task, especially considering that there is an intention to also archive all the tweets that have occurred since 2006. That represents billions of tweets -- most of which will be the most mindless dross ever to pass down cyberspace's superhighway.

One can't help but wonder if this isn't a project somewhat akin to collecting, collating, indexing and displaying all the sheets of toilet paper flushed since 1923. Certainly, from my perspective, the content involved will be just as riveting and worthy.

Okay, so I exaggerate a little...

As a profile of public opinion and a social commentary, this massive archive of Twitter posts may have some value to future historians -- but they'll still be faced with the issue of sorting the gold from the mud in order to get any reasonable data.

When I look at how seemingly underfunded such endeavours as The Internet Archive are, I have to wonder whether the USLoC has its priorities right.

The signal-to-noise ratio appears (at least to me) to be much better when analyzing web content than Twitter postings, and if public money is to be spent creating such an extensive historical database of public opinion, I'd rather it sourced that data from anywhere other than Twitter.

As an interesting aside, the amount of storage required to archive all those tweets is surprisingly small.

If we assume 55 million tweets per day and take the average length to be (say) 100 bytes, that's just 5.5GB a day. Multiply that by 365 days and the annual storage required for this archive (excluding indexes) is just 2 terabytes. A couple of external USB drives would do the job nicely!
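
For the sceptical, here's that estimate reproduced in a few lines of Python (the 100-byte average is, as above, just a guess):

    # Reproducing the storage estimate from the paragraph above.
    tweets_per_day = 55_000_000
    avg_tweet_bytes = 100                       # assumed average length

    daily_bytes = tweets_per_day * avg_tweet_bytes
    annual_bytes = daily_bytes * 365

    print(f"per day:  {daily_bytes / 1e9:.1f} GB")     # 5.5 GB
    print(f"per year: {annual_bytes / 1e12:.1f} TB")   # ~2.0 TB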

Of course there is the overhead of indexing all these small messages but that could be offset to some extent by compressing the text involved prior to storage.
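
A minimal sketch of that idea, assuming tweets are batched together before compression (140-character messages compress poorly one at a time, so the savings come from compressing them in blocks):

    import zlib

    # Toy illustration: compress a batch of short messages as a single block.
    tweets = ["Just had coffee.", "Stuck in traffic again!", "Watching the game."] * 1000
    batch = "\n".join(tweets).encode("utf-8")

    compressed = zlib.compress(batch, 9)        # maximum compression level
    print(f"{len(batch)} bytes -> {len(compressed)} bytes "
          f"({len(compressed) / len(batch):.1%} of original)")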

Maybe this isn't the biggest waste of space in the history of mankind at all -- but it's still a waste, if you ask me.

Friday, April 9, 2010

New TLDs a license to print money?

For a long, long time, internet users wishing to have their own domain name were limited to just a handful of generic top level domains (gTLDs) such as .com, .org, .net and various country codes such as .nz, .au, etc.

In 2000 and again in 2003/4, this limited number of gTLDs was extended by the inclusion of new ones including .biz, .info, .aero and others.

There have also been other specialist gTLDs mooted, such as .xxx for "adult" websites, but not much has actually come of that.

However, next year the floodgates will be opened for yet another round of new gTLD applications, with anyone being able to register their own gTLD if they wish to, and more importantly, if they have enough money.

How much money?

Well for a start, there is a non-refundable "evaluation fee" of US$185,000 for each application -- hardly pocket change and something that really limits these new gTLDs to big corporations or companies who are confident they can leverage this new cyberproperty for profit.

But the costs don't stop there. Even if/once your new gTLD is approved (not a guaranteed outcome of the "evaluation process"), you'll be paying an additional US$25,000 per annum to keep it alive.
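
Run those numbers over, say, a decade and the bill looks like this:

    # Indicative cost of holding a new gTLD for ten years, using ICANN's figures.
    evaluation_fee = 185_000        # one-off, non-refundable (US$)
    annual_fee = 25_000             # per year (US$)
    years = 10

    total = evaluation_fee + annual_fee * years
    print(f"US${total:,} over {years} years")   # US$435,000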

ICANN (The Internet Corporation for Assigned Names and Numbers) claims that these fees are not set at a level designed to produce a profit (the organisation being a "not for profit" one) but merely to cover its actual costs.

All I can say to that is that these guys must have some pretty fancy office furniture, medical and retirement schemes!

On the other hand, perhaps it's a good thing that the price of a new gTLD is set so high, since it does keep out the riff-raff and stops the DNS being polluted with a tsunami of vanity domains.

More information on the process can be found at the ICANN website.

Despite ICANN's assurances that this isn't simply a money-grab, I find it difficult to believe that claim.

Any sizable company wishing to protect its brand in cyberspace will soon have little option but to register it as a gTLD to avoid someone else doing the same and potentially diluting its trademark. Even those who choose not to register may end up paying fees to challenge attempts by others to hijack that branding by registering the same or a similar gTLD.

It's almost certain that this cyber-landgrab will buy many a new BMW and Porsche for members of the legal fraternity but will it really do anything to improve the internet?

Do we really care whether a fast-food giant has its website at www.mcdonalds.com or www.mcdonalds?

What this does is create yet more potential for bunfights over who gets the rights to things such as a .mcdonald domain.

Is it McDonald's the fast-food giant, or one of the countless other McDonalds out there? Clearly the former would not want to be mistaken for the latter.

If I were to try to register my own surname (Simpson) as a gTLD, do you think Fox Studios would allow me to do so? Or would they almost certainly object on the basis that, despite my legitimate claim to such a domain, registering that name would constitute a breach of their trademark?

When genuine trademark clashes occur both sides can end up wasting huge amounts of money dealing with an issue that simply shouldn't exist -- were it not for the decision to sell even more gTLDs.

New gTLDs not a license to print money?

I'm not so sure.