The Question:
How big would a database be (as in, how much hard drive space would it take up) if it stored every combination of passwords made of uppercase letters, lowercase letters, and numbers, up to 8 characters long? This does not include special characters. I’m working strictly off the ASCII table, specifically decimal values 48-57 inclusive, 65-90 inclusive, and 97-122 inclusive.
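As a quick sanity check on the character set, the ASCII ranges above can be enumerated and counted. This is only a sketch: `count_passwords` is a hypothetical helper (not part of any library) that sums the number of strings of each length from 1 up to the maximum.

```python
import string

# Build the alphabet from the ASCII ranges named in the question:
# 48-57 (digits), 65-90 (uppercase), 97-122 (lowercase).
ranges = [(48, 57), (65, 90), (97, 122)]
alphabet = "".join(chr(c) for lo, hi in ranges for c in range(lo, hi + 1))

# Same set as the standard library's constants, just spelled out by code point.
assert alphabet == string.digits + string.ascii_uppercase + string.ascii_lowercase


def count_passwords(symbols: int, max_len: int) -> int:
    """Hypothetical helper: number of strings of length 1..max_len."""
    return sum(symbols ** n for n in range(1, max_len + 1))


print(len(alphabet))                       # 62
print(f"{36 ** 8:,}")                      # 2,821,109,907,456
print(f"{len(alphabet) ** 8:,}")           # 218,340,105,584,896
print(f"{count_passwords(len(alphabet), 8):,}")
```

The last line counts every length from 1 to 8 rather than exactly 8, so it comes out slightly larger than the exact-length figure.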
Ok. A database table, for those who don’t know, is laid out much like a secure Excel spreadsheet. There’s a login that authorises you to edit data, and there are rows, columns, and cells. The columns have descriptive names (not “column A, column B, column C”). There’s something called a primary key, which keeps every row in the table unique, and then there’s the actual information that you wish to store. For this specific table (which hackers actually refer to as a “password dictionary”) we will need three columns: the primary key column, called “uid” for lack of a better name; the column that stores the password hash (called “hash”); and the column where the original password is kept (called “pass”).
A password hash exists for security purposes. You cannot practically reverse a hash; if you try, it takes a lot of guesswork and a crap load of time. The algorithm takes in binary shifts, truncations, and all sorts of crap, and after those truncations the original values cannot be recovered. This is why you request a new password instead of having the admins of whatever site you are visiting give you the old one back: they literally cannot recover your password. The most common hash algorithm in use today is called “md5”, and this is what we’ll be working with as a demonstration. This will not include the up-and-coming hash algorithm called “sha1”, but including that will basically double the amount of space that we need. A few details about md5: its output is always exactly 32 hexadecimal characters long (128 bits), whether you put in a single 0 or the phrase “Bobsbitchtits”. (Whoever gets that reference gets +5 internets.) Every input you are likely to put in a table like this will, for practical purposes, get its own unique hash, although collisions are possible in principle because the output size is fixed. This is NOT a random process: the same input always gives the same hash (hence why we’ve started to see a few reverse-engineering capabilities emerge within the past few years). If the process were random, we couldn’t use it to securely log into various websites, as it’d never work. Most sites (if not all who use proper login systems) take this hash and store it in a database with no further additions to it, but that’s another beef of mine, and another story in and of itself as to why they should do stuff to it after it’s been hashed.
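The two properties described above (fixed 32-character output, same input always giving the same hash) are easy to see with Python's standard `hashlib` module. This is just an illustration; `md5_hex` is a made-up helper name, and the sample inputs are arbitrary.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hypothetical helper: MD5 digest of `text` as a hexadecimal string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Arbitrary sample inputs of different lengths.
for sample in ("0", "password", "aB3dE6gH"):
    digest = md5_hex(sample)
    # The digest is 32 hex characters no matter how long the input is.
    print(f"{sample!r:12} -> {digest} ({len(digest)} chars)")

# Deterministic: hashing the same input twice gives the same digest.
print(md5_hex("password") == md5_hex("password"))  # True
```

Because the mapping never changes, anyone who has already stored the pair (hash, password) can look a hash up instead of trying to reverse it, which is the whole point of the table being sized here.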
Now onto creating that table. Since most sites use a MySQL back end to store data, this is what we will be working with. The column types we use will be dictated by what goes in them.
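To make "what goes in them" concrete, here is a rough sketch of the rows such a table would hold, one (uid, hash, pass) tuple per candidate password. It is not MySQL code: `ALPHABET` and `dictionary_rows` are hypothetical names, and the 36-symbol alphabet is an assumption chosen to match the 36^8 count used in the sizing.

```python
import hashlib
from itertools import islice, product

# Assumed 36-symbol alphabet (digits plus one case of letters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def dictionary_rows(length: int):
    """Yield (uid, hash, pass) tuples for every password of the given length."""
    for uid, chars in enumerate(product(ALPHABET, repeat=length), start=1):
        password = "".join(chars)
        digest = hashlib.md5(password.encode("ascii")).hexdigest()
        yield uid, digest, password


# Only peek at the first few rows; the full 8-character run is ~2.8 trillion rows.
for row in islice(dictionary_rows(8), 3):
    print(row)
```

The generator is lazy on purpose: nothing is held in memory beyond the current row, which is the only sane way to even iterate over a set this large.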
First off, that primary key will need to go from 1 to 36^8 (that is, 36*36*36*36*36*36*36*36, or 2,821,109,907,456). That is correct, there are literally 2.8 TRILLION possible combinations here. (Strictly speaking, 36 covers the ten digits plus one case of letters, and only passwords exactly 8 characters long; counting both cases gives 62^8, roughly 218 trillion, so treat everything below as a lower bound.) Luckily for us, MySQL has a numerical type that can count that high. It’s called BIGINT, and using signed values (as in, they can go positive or negative, rather than unsigned which is positive only) the max value is 9,223,372,036,854,775,807 (or 9.2 quintillion). The largest value that column will hold is 2.8 trillion (see number above). I’m counting 9 bytes of space per cell for that kind of information (MySQL’s documentation lists BIGINT itself at 8 bytes). The size of this cell does not change whether the value is 1 or 2.8 trillion. It’s how the engine stores integers, details that I won’t be going into. Not too bad, right? But considering that we’re doing almost 3 trillion of those, that leaves us with 25,389,989,167,104 bytes (about 25.4 terabytes) of space being used in that column alone.
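The range check and the column size above can be reproduced in a few lines. This is a sketch under the post's own assumptions: `BYTES_PER_UID` is the 9-byte per-cell figure used in the text, not something read from MySQL.

```python
ROWS = 36 ** 8                      # number of passwords counted in the post
INT_UNSIGNED_MAX = 2 ** 32 - 1      # largest value of a 4-byte unsigned INT
BIGINT_SIGNED_MAX = 2 ** 63 - 1     # largest value of a signed BIGINT
BYTES_PER_UID = 9                   # per-cell size assumed in the post

print(f"{ROWS:,}")                  # 2,821,109,907,456
print(ROWS > INT_UNSIGNED_MAX)      # True: a plain INT cannot count this high
print(ROWS <= BIGINT_SIGNED_MAX)    # True: BIGINT has plenty of headroom
print(ROWS.bit_length())            # 42 bits are enough for the largest uid
print(f"{ROWS * BYTES_PER_UID:,}")  # 25,389,989,167,104
```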
I’m not done yet either. The md5 hash, which is what the next column will contain, is 32 characters long, no matter the input. Now, if you look at that ASCII table, you’ll see that the highest-valued alphanumeric character is the lowercase “z”. One cell containing 32 “z”s will take up 36 bytes. (Of note, even if you use the lowest-valued character, the size doesn’t start to shrink until you put fewer than 32 characters in that cell.) Multiply that by about 3 trillion and you get 101,559,956,668,416 bytes (about 101.6 terabytes) of space required for THAT column alone. So far we have accumulated about 126.9 terabytes of space required, and we still have one column to go.
The last column will contain the actual value that creates that hash. That means we need a column that holds a word of up to 8 characters and allows both numeric and alpha characters. The largest value that this will use in one cell is 20 bytes (also of very interesting note, if you just put a single “1” in there it will still be counted as 20 bytes. It’s how the engine works. I can explain it to you if you want, but I’m not gonna do that here). This will require 56,422,198,149,120 extra bytes (56.4 terabytes) of space, bringing the total to roughly 183 terabytes.
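For anyone who wants to check the running total, the three column sizes can be added up like this. The per-cell byte counts are the ones quoted in the text (9, 36, and 20), taken as given rather than derived from MySQL; the terabyte figures use decimal units and are rounded, so the last digit can differ from a truncated figure.

```python
ROWS = 36 ** 8

# Per-cell sizes as quoted in the post (assumptions, not measured here).
BYTES_PER_CELL = {"uid": 9, "hash": 36, "pass": 20}

column_bytes = {name: ROWS * size for name, size in BYTES_PER_CELL.items()}
total_bytes = sum(column_bytes.values())

for name, size in column_bytes.items():
    print(f"{name:>5}: {size:,} bytes ({size / 1e12:.1f} TB)")
print(f"total: {total_bytes:,} bytes ({total_bytes / 1e12:.1f} TB)")
```

Changing `ROWS` or any entry in `BYTES_PER_CELL` is enough to redo the estimate for a different alphabet or column layout.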
To put this into perspective, there are ~6.7 billion people on this earth as of 2008. If each person were worth one byte on a hard drive, we’d have a 6.7 GB hard drive. Now there are 1,000 gigabytes on a 1 terabyte hard drive (counting in decimal units), which means that to get 1 terabyte, each person on this earth would have to be worth about 149.25 bytes. To hold all this information, we would require each person on this lovely earth to be worth a little over 27,000 bytes (about 27 kilobytes). That is trivial, and this kinda space is probably already harbored by the US government. It would not surprise me (considering that Bungie, for their Halo 3 saved games server, has at least 47 terabytes of space).
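The per-person comparison is plain division, sketched below. The population is the ~2008 estimate from the text, the terabyte is assumed to be decimal (10^12 bytes), and the 65 bytes per row is the sum of the three per-cell sizes used earlier.

```python
WORLD_POPULATION = 6.7e9                  # ~2008 estimate used in the post
TERABYTE = 10 ** 12                       # decimal terabyte (an assumption)
TOTAL_BYTES = 36 ** 8 * (9 + 36 + 20)     # rows times assumed bytes per row

bytes_per_person_1tb = TERABYTE / WORLD_POPULATION
bytes_per_person_table = TOTAL_BYTES / WORLD_POPULATION

print(f"{bytes_per_person_1tb:.2f} bytes each for 1 TB")   # 149.25
print(f"{bytes_per_person_table:,.0f} bytes each for the whole table")
```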
However, my college recently started requiring 15 characters minimum in their passwords. At exactly 15 characters that gives 36^15, or 221,073,919,720,733,357,899,776 (221 sextillion), possible combinations. I’m not gonna go through the math here, but that would require 14e24 (14 septillion, or 14,000,000,000,000,000,000,000,000) bytes (14 yottabytes) of storage. To get that much space, each person on this earth of ours right now would need about 2 petabytes (2,000 terabytes) of space to their name. And all of that would be used to store this information, not a drop of it for their personal needs. Within about 500 years, however, I’m positive that we will have this kind of space readily available.
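The skipped math can be roughed out as follows. This sketch reuses the 65 bytes per row from the 8-character table, which is an assumption: a real 15-character table would need a wider pass column, and, as the check below shows, a key type larger than a signed BIGINT.

```python
ROWS_15 = 36 ** 15                   # passwords of exactly 15 characters
BYTES_PER_ROW = 9 + 36 + 20          # assumed: same row size as the 8-char table
BIGINT_SIGNED_MAX = 2 ** 63 - 1
WORLD_POPULATION = 6.7e9             # ~2008 estimate used in the post

total_bytes_15 = ROWS_15 * BYTES_PER_ROW
petabytes_per_person = total_bytes_15 / WORLD_POPULATION / 1e15

print(f"{ROWS_15:,} rows")
print(ROWS_15 > BIGINT_SIGNED_MAX)   # True: the uid no longer fits in a BIGINT
print(f"{total_bytes_15:.2e} bytes")
print(f"{petabytes_per_person:.1f} PB per person")
```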
So I hope beyond hope that within 500 years we will be taking security much more seriously than we are on a general level today. I don’t think I’ll be disappointed.
