Archives For Organization

Yesterday I downloaded Wikipedia. All 2,120,684 English articles are now on my hard drive.
A few stats for the curious (for the English version only):

  • Date of last edit: April 2007 (A September backup is still in progress)
  • Compressed size: 7.19 GB
  • Number of files compressed: 9
  • Number of files decompressed: millions
  • Decompressed size: ~100 GB
  • Time to download: 8 hours
  • Time to decompress: 10 hours

Why on earth would you do this?

  • The more copies of Wikipedia exist in the world, the lower the chance it will get lost if something really depressing happens
  • Having a local version means you can look things up offline
  • Local versions can be used behind corporate firewalls to give users access without granting full access to the web
  • Local files can be parsed by scripts to create reports and study the structure, trends, and patterns
  • Umm... it’s cool. Do I really need a reason to want all the world’s collective knowledge on my laptop?
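
As a rough illustration of the "parsed by scripts" point, here is a minimal sketch that walks a decompressed copy and reports how many files it holds, how much space they take, and which file extensions are most common. It assumes the decompressed dump is an ordinary directory tree of files; the function name `summarize_tree` and the path in the usage note are made up for this example.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Walk `root` and return (file_count, total_bytes, extension_counts).

    Assumption: the decompressed dump is a plain directory tree of files.
    """
    file_count = 0
    total_bytes = 0
    by_ext = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            file_count += 1
            # Count by lowercased extension, e.g. ".html"
            by_ext[os.path.splitext(name)[1].lower()] += 1
    return file_count, total_bytes, by_ext
```

Calling something like `summarize_tree("/path/to/wikipedia")` (a hypothetical path) returns the three values. With millions of files the walk itself will take a while.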

Most people interested in this won’t have any problem downloading the compressed files. But few people have 100 GB free on their laptop. If you decompress the files to an external USB hard drive, it can take 30 hours per file (multiply that by 9 files). Still, that is not too bad. You can now buy 2.5″ laptop drives that hold 300 GB and 3.5″ drives that hold 1000 GB. The only problem is the time it takes to decompress (10+ hours!).
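
Before starting, it may be worth checking the free space on the target drive and adding up the worst-case time. This is a small sketch under a couple of assumptions: sizes are counted in decimal gigabytes, and the helper names (`free_gb`, `has_room`, `total_hours`) are invented for this example. The 100 GB, 30 hours, and 9 files come from the numbers above.

```python
import shutil


def free_gb(path):
    """Free space on the drive holding `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, needed_gb):
    """True if the drive holding `path` has at least `needed_gb` free."""
    return free_gb(path) >= needed_gb


def total_hours(hours_per_file, file_count):
    """Total time if the files are decompressed one after another."""
    return hours_per_file * file_count


if __name__ == "__main__":
    # "." stands in for wherever the decompressed files will go.
    print("Room for ~100 GB here:", has_room(".", 100))
    hours = total_hours(30, 9)  # external USB drive case
    print(f"{hours} hours, about {hours / 24:.2f} days")
```

For the external USB case that works out to 270 hours, a little over 11 days if the files are done one at a time.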

Wikipedia is growing…fast.
Last year (Dec 2006) the total compressed size of the English version was only 5.8 GB. By April 2007 (just 4 months later) it was 7.2 GB (this is the version I downloaded). I am waiting to see what the September version will be. How fast is Wikipedia growing? It is hard to say. The last stats I found are for July 2006, but it was definitely on an exponential curve.
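
For a back-of-the-envelope number, the two compressed sizes above can be turned into a monthly growth rate and a doubling time. This is only a sketch: it assumes steady compound (exponential) growth between the two dumps, which two data points cannot confirm, and the function names are invented for this example.

```python
import math


def monthly_growth_rate(size_start, size_end, months):
    """Compound monthly growth rate implied by two sizes `months` apart."""
    return (size_end / size_start) ** (1 / months) - 1


def doubling_time_months(rate):
    """Months needed to double at a constant compound monthly `rate`."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    # 5.8 GB (Dec 2006) to 7.2 GB (April 2007), 4 months apart
    rate = monthly_growth_rate(5.8, 7.2, 4)
    print(f"Monthly growth: {rate:.1%}")
    print(f"Doubling time: {doubling_time_months(rate):.1f} months")
```

Under that assumption the compressed dump grows about 5.6% a month and doubles in roughly 13 months.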


http://download.wikimedia.org/

http://static.wikipedia.org/downloads/December_2006/en/