On Salting

If you’ve ever sat through a security briefing about a website and its login system, you’ll know that one of the most important things you can do besides hashing your passwords (the correct way) is to salt them. The fun thing about salts is that they don’t have to come from a cryptographically secure source to add a layer of security (something is better than nothing). But what changes when we do take the step of making them secure, and which steps are appropriate?

The Avalanche

One thing all secure hashes share is something called ‘the Avalanche Effect’: if you change just one character of your password, the resulting hash looks completely different. For example:

$password = 'abcde';
echo sha1($password);
//Presents: 03de6c570bfe24bfc328ccd7ca46b76eadaf4334
 
$password = 'abcdf';
echo sha1($password);
//Presents: 9693da0e085af20ef1f982b017fc6ec2419848e5

The two look nothing alike to the human eye, but here’s the problem: sha1 is deterministic, so the same password always produces the same hash. An attacker can precompute the hashes of common passwords (a rainbow table) and simply look yours up, and two users with the same password end up with identical hashes. This is where salting comes in. A salt is simply a known value that is stuck (somewhere) onto a private value so that the resulting hash no longer matches the hash of the password alone. This matters especially when users share a password, which is why we also keep salts unique per user: even if two users have the same password, a hacker who has all the hashes and salts won’t be able to tell. Take, for instance, this (with the salts ‘a’ and ‘b’ prepended):

$password = 'aabcde';
echo sha1($password);
//Presents: 404940891010bbba961496918826d91fc2e2f5ac
 
$password = 'babcdf';
echo sha1($password);
//Presents: 09515cccb9e59871f1ea1a2a34920367a23b4a72

Again, to the human eye they look completely different. The biggest prize, though, is that a precomputed rainbow table no longer works: the attacker has to build a new table (or brute-force from scratch) for every salt, which means spending far longer to break each hash. So the question becomes: how do we make him spend as much time on it as possible?
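Here is the salting idea sketched in JavaScript (Node.js, using only its built-in crypto module). It is an illustration, not the PHP code this post builds up to. sha1 is kept only to mirror the examples above, and the names saltedSha1 and newSalt are made up for the sketch. Each user gets a random 16-byte salt, so two users with the same password still end up with different stored hashes.

```javascript
const crypto = require('crypto');

// Prepend the salt to the password and hash the result, as in the 'a'/'b' example above.
// (sha1 only mirrors the examples; it is far too fast for real password storage.)
function saltedSha1(salt, password) {
  return crypto.createHash('sha1').update(salt + password).digest('hex');
}

// A per-user salt: 16 random bytes from the OS, hex-encoded (32 characters).
function newSalt() {
  return crypto.randomBytes(16).toString('hex');
}

const password = 'abcde';
const alice = { salt: newSalt() };
const bob = { salt: newSalt() };
alice.hash = saltedSha1(alice.salt, password);
bob.hash = saltedSha1(bob.salt, password);

// Same password, different salts: the stored hashes no longer give the match away.
console.log(alice.hash === bob.hash); // false (unless the two random salts collide)
```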

Tales from the Crypt

PHP has a wonderful function called crypt(). If your PHP install is set up correctly, you’ll be able to use a very popular Blowfish-based algorithm (bcrypt) through this function. It is currently a preferred password-hashing algorithm because, among other things, it is deliberately slow and has a strong algorithm behind it. Slowness is a key ingredient when hashing passwords: generally, the slower the algorithm, the stronger it is against guessing. (Note, though, that an algorithm that takes 10 seconds to turn “abc” into “abd” or “efg” into “efh” is slow but still weak. Slowness only helps when the underlying function is strong.)
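Node.js has no built-in bcrypt, so the sketch below swaps in PBKDF2 (crypto.pbkdf2Sync) to show the same principle: the work factor is a parameter you turn up. Here that parameter is the iteration count, which plays the role that bcrypt’s cost plays in crypt(). The two values tried are arbitrary, the name slowHash is made up for illustration, and the printed timings depend entirely on the machine.

```javascript
const crypto = require('crypto');

// PBKDF2 with HMAC-SHA-256: `iterations` is the work factor, so each guess costs
// an attacker `iterations` rounds of HMAC. Returns a 32-byte key as hex.
function slowHash(password, salt, iterations) {
  return crypto.pbkdf2Sync(password, salt, iterations, 32, 'sha256').toString('hex');
}

// Rough timing of a single hash at two work factors.
for (const iterations of [1000, 100000]) {
  const start = process.hrtime.bigint();
  slowHash('abc', 'some-salt', iterations);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${iterations} iterations: ${ms.toFixed(1)} ms`);
}
```

The slowness only pays off because HMAC-SHA-256 underneath is strong. Changing 'abc' to 'abd' still gives an unrelated output, unlike the weak “abc” to “abd” example above.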

The curiosity about Blowfish in crypt() is that its salt is limited to 22 characters. So how do we make the most of those 22 characters? Warning: there is maths ahead.

Math Ahoy!

When we calculate the potential strength of a resulting hash, we start by looking at all the possible outputs it can produce. For instance, sha1 produces a result that’s 40 characters long, with each character being 0-9 or a-f (16 possible values per character), so we calculate 16**40 = ~1.5e48 potential combinations. That’s a 15 followed by 47 zeroes, which is no small number. But since the salt is limited to 22 characters in length, we only get 16**22 = ~3.1e26. That is significantly smaller, but still far from diminutive.

So how do we get more out of sha1? One idea is to base64-encode the raw 20-byte digest instead of using the hex string. Long story short, this shortens the 40-character output to 28 characters, but each character can now take 64 different values (A-Z, a-z, 0-9, ‘+’ and ‘/’, with ‘=’ used only as padding) instead of just 16. Encoding can’t add information, so the full digest still has 2**160 = ~1.5e48 possible values. What base64 changes is density: each character carries 6 bits instead of 4. Small fact: the Blowfish salt alphabet in crypt() is ./0-9A-Za-z, so it doesn’t accept ‘+’ (or the ‘=’ padding), and we swap those out. Once we cut down to 22 characters (due to the salt length limitation), we have 64**22 = ~5.4e39 combinations. That beats cutting the hex sha1 output to 22 characters by a fair number of orders of magnitude (13, if my algebra isn’t failing me).
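To check the arithmetic, this sketch (Node.js, built-in crypto only) encodes the same 20-byte sha1 digest both ways. It then counts how many bits each encoding keeps in 22 characters.

```javascript
const crypto = require('crypto');

// The raw sha1 digest is 20 bytes (160 bits), however it is written out.
const digest = crypto.createHash('sha1').update('abcde').digest();

const hex = digest.toString('hex');    // 4 bits per character
const b64 = digest.toString('base64'); // 6 bits per character, '=' padding at the end
console.log(hex.length, b64.length);   // 40 28

// Bits that survive when the encoded digest is cut to a 22-character salt.
const bitsHex = 22 * 4; // 88 bits:  16**22 = 2**88  ≈ 3.1e26
const bitsB64 = 22 * 6; // 132 bits: 64**22 = 2**132 ≈ 5.4e39
console.log(bitsHex, bitsB64);
// Gap in orders of magnitude: 44 bits * log10(2)
console.log(((bitsB64 - bitsHex) * Math.log10(2)).toFixed(1)); // 13.2
```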

So What’s Next?

The biggest problem I have with this is that the security of sha1 has been called into question. Weaknesses that let different inputs produce the same output (collisions) effectively reduce that ~5.4e39 goal. That is why the sha2 family of algorithms (SHA-256, SHA-384 and SHA-512) is generally recommended instead. So, to do better, we can use one of them. The 22 characters we keep are then more unique, and we get closer to the ultimate ~5.4e39 combinations. There are other algorithms whose output we can encode as well, such as Whirlpool (my favourite), but for the purposes of this post they’re effectively equivalent.

The math and process are the same as above, but now we’re drawing on a larger output, which should mean fewer collisions within the first 22 characters than with base64-encoded sha1. The only other step up we can take is randomly generating those 22 characters ourselves. Under normal circumstances I would say ‘no’, but even passing mt_rand() through SHA-512 has been deemed ineffective. mt_rand() is not a cryptographically secure generator, and hashing its output adds no unpredictability, so we are forced to look to other sources. Thankfully, PHP has a way for us.

Tales from the… Mcrypt?

PHP has a function that works on the latest stable builds of PHP for any OS. It’s called mcrypt_create_iv(). An IV, or initialization vector, is used by a lot of secure two-way (encryption) algorithms, and we can use it as the source of our salt. By asking for a 16-byte IV (which is 24 characters when base64-encoded), we get a known-secure source for our salt, ensuring the uniqueness we’re striving for! (mcrypt was later deprecated in PHP 7.1 and removed in 7.2. On PHP 7 and later, random_bytes(16) gives you the same kind of data.)

“But wait!” you say, “Why can’t we just throw THAT value into SHA512 and then encode the result?” Nothing is stopping you, apart from the lack of need. One thing a lot of people get confused about when we talk about a ‘secure salt’ is that the salt itself only has to be unique (or at least ‘unique enough’ as far as a computer is concerned). Beyond that, it doesn’t really matter who knows the salt, as long as the right algorithm is used to hash your passwords. For instance, crypt() with Blowfish returns the salt you gave it as part of the resulting hash, for easy verification at login, and you store the entire thing in the database. The salt is the public half of the hash. The security lies in the password itself and in the hashing algorithm, right where it should be.
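Here is a sketch of ‘store the salt with the hash’ in Node.js. It uses scrypt (crypto.scryptSync) rather than Blowfish, since Node has no built-in bcrypt. The salt$hash record format and both function names are invented for this example. Verification works as described above: split off the public salt, re-hash the login attempt with it, and compare.

```javascript
const crypto = require('crypto');

// Store "<salt hex>$<hash hex>": like crypt()'s output, the record carries its own salt.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 32);
  return salt.toString('hex') + '$' + hash.toString('hex');
}

// Re-derive with the stored (public) salt and compare against the stored hash.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split('$');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const record = hashPassword('correct horse');
console.log(verifyPassword('correct horse', record)); // true
console.log(verifyPassword('wrong guess', record));   // false
```

crypto.timingSafeEqual is used so that the time taken to compare doesn’t reveal how many leading bytes matched.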

The Code!

And now, the moment you’ve all been waiting for. It’s a surprisingly small amount of code this time around, but it is vital to keeping your users’ passwords safe. So until next time, enjoy!

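function salt(){
	$replace = array(
		"+" => ".", //'+' isn't in crypt()'s ./0-9A-Za-z salt alphabet
		"=" => "/"  //'=' padding only appears at positions 23-24, so substr() already drops it
	);
	//Get 16 bytes of random data from the OS (on PHP 7+, where mcrypt is gone, use random_bytes(16))
	$salt = mcrypt_create_iv(16, MCRYPT_DEV_URANDOM);
	$salt = base64_encode($salt); //Base64 encode it (24 characters)
	$salt = substr($salt, 0, 22); //Take it down to the 22 characters Blowfish uses
	$salt = strtr($salt, $replace); //Replace any naughty characters with safe ones
	return $salt;
}
 
//Named hash_password() because PHP already has a built-in hash() function
function hash_password($password){
    return crypt($password, '$2y$12$'.salt()); //Blowfish ($2y$), cost 12, plus our salt
}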
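For comparison, here is the same salt recipe sketched in Node.js. bcryptSalt is a made-up name, and crypto.randomBytes plays the part of mcrypt_create_iv. It covers only the salt step: Node has no built-in crypt(), so the $2y$12$ setting string is printed, not used to hash anything. On the PHP side, PHP 5.5 and later can do all of this for you with password_hash($password, PASSWORD_BCRYPT, ['cost' => 12]), which generates the salt itself.

```javascript
const crypto = require('crypto');

// 16 random bytes -> 24 base64 characters (the last two are '=' padding)
// -> first 22 characters -> '+' swapped for '.', leaving only ./A-Za-z0-9.
function bcryptSalt() {
  return crypto.randomBytes(16)
    .toString('base64')
    .slice(0, 22)
    .replace(/\+/g, '.');
}

const setting = '$2y$12$' + bcryptSalt(); // algorithm, cost, salt: 29 characters
console.log(setting);
```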

ADFGVX Redux

So, a few months ago (OK, probably about a year ago), I learned about an interesting programming language called D. It was designed by a guy who specializes in compilers, so his goal in creating D was a clean, easy-to-use syntax that is also easy to compile, keep secure, and test.

I started out with something simple, Project Euler, to learn the basics. That didn’t hold my interest very long. What irritated me was not knowing how to program those algorithms properly, rather than not knowing the language. So I turned to an old, old idea of mine: the Cryptology Collection, a collection of classic ciphers and codes, plus the more modern ones (broken or otherwise).

Eventually I’d like to apply this project to another one having to do with cryptanalysis, going from ancient to modern techniques, but first, (more or less) the basics. Naturally, the first cipher I did last time was the ADFGVX cipher, so it’s the one I’ll use this time too. This version is written in D, with a bit more care taken to make the code efficient (at some expense to debuggability and readability, which I’ll improve later).

You can follow my development on this project here.

module classic.ciphers.ADFGVX;
 
import std.string;              //toUpper, strip
import std.array : split;       //splitting the cipher text into groups
import std.algorithm : sort;    //ordering the key letters
import std.stdio : writeln;     //used by the unit test
 
class ADFGVX {
    private char[][] _polybius;
    private static immutable char[6] _letters = ['A','D','F','G','V','X'];
 
    public this( ) {
        //TODO: Randomize the creation of this alphabet.
        string alphabet = "MBJYA,Z(?PS.)NX; DURITW:!HGLFECVO'KQ";
 
        //Fill a 6x6 Polybius square row by row from the alphabet.
        _polybius = new char[][]( 6, 6 );
        for( int row = 0, count = 0; row < 6; row++ ) {
            for( int column = 0; column < 6; column++ ) {
                _polybius[row][column] = alphabet[count];
                count++;
            }
        }
    }
 
    //Return the key letters in alphabetical order. std.algorithm's sort can't work
    // on a char[] directly (D treats it as UTF-8 text), so sort its bytes instead.
    private static char[] sortedKeys( char[][char] map ) {
        char[] keys = map.keys;
        sort( cast(ubyte[]) keys );
        return keys;
    }
 
    public string encode( string plain, string key ) {
        char[] coordinates;
        char[][char] map;
        string cipher;
 
        //Get the coordinates of each letter in the message (column letter, then row
        // letter). Characters that aren't in the square are skipped.
        foreach( char c ; plain.toUpper() ) {
            for( int y = 0; y < 6; y++ ) {
                for( int x = 0; x < 6; x++ ) {
                    if( _polybius[y][x] == c ) {
                        coordinates ~= _letters[x];
                        coordinates ~= _letters[y];
                    }
                }
            }
        }
 
        //Map these coordinates under the key letters, going across.
        foreach( i, char c; coordinates ) {
            map[key[i % key.length]] ~= c;
        }
 
        //Read each key letter's column out as one group, in alphabetical order of
        // the key letters. (The key's letters are assumed to be unique.)
        foreach( char c; sortedKeys( map ) ) {
            cipher ~= map[c] ~ " ";
        }
 
        //Return the cipher text, stripping whitespace at each end of the message.
        return cipher.strip();
    }
 
    public string decode( string cipher, string key ) {
        string plain;
        char[][char] map;
        string[] plainMap;
        char[] coordinates;
 
        //Get a "plain" mapping of the message, by splitting the cipher into an array.
        plainMap = cipher.split(" ");
 
        //Initialize the map that we'll be using to translate back into the plaintext.
        foreach( char c; key ) {
            map[c] = [];
        }
 
        //De-sort the collection into the actual map we'll be reconstructing the
        // coordinate string from: the i-th group belongs to the i-th key letter
        // in alphabetical order.
        char[] keys = sortedKeys( map );
        foreach( i, string word; plainMap ) {
            map[keys[i]] ~= word;
        }
 
        //Go through the map and reconstruct the coordinate list, taking one letter
        // from each key letter's column in turn.
        for( size_t i = 0; i < cipher.length; i++ ) {
            char k = key[i % key.length];
            if( map[k].length > 0 ) {
                coordinates ~= map[k][0];
                map[k] = map[k][1 .. $];
            }
        }
 
        //Go through the coordinate list two at a time, grabbing the letters
        // at the X and Y coordinates.
        for( size_t i = 0; i + 1 < coordinates.length; i += 2 ) {
            size_t x, y;
            char cx = coordinates[i];
            char cy = coordinates[i + 1];
            for( size_t k = 0; k < 6; k++ ) {
                if( cx == _letters[k] ) {
                    x = k;
                }
                if( cy == _letters[k] ) {
                    y = k;
                }
            }
            plain ~= _polybius[y][x];
        }
 
        return plain;
    }
 
    //Test that a known plain text translates into a known cipher text.
    unittest {
        ADFGVX cipher = new ADFGVX( );
        string text = "HELLO, WORLD.";
        string key = "fubar";
        string test = cipher.encode( text, key );
        writeln( test );
        assert( test == "VFFDF XVVXX DVXGGD GXVGX VGAFV" );
    }
 
    //Test that a known cipher text translates into a known plain text.
    unittest {
        ADFGVX cipher = new ADFGVX( );
        string text = "VFFDF XVVXX DVXGGD GXVGX VGAFV";
        string key = "fubar";
        string test = cipher.decode( text, key );
 
        assert( test == "HELLO, WORLD." );
    }
}

ShiftedBits CMS is now IceCreamCMS

Hello, world that doesn’t read this blog! I felt like being incredibly self-centered and pretending, for the moment, that the things I say and do about this project of mine are relatively important and affect more people than I know.

Now with that out of the way, let’s move on to WHY the name change has happened.

I was almost done with ShiftedBits. I had used it in production for a client’s website, and all that really needed to happen was for a few more ideas to be put into practice and BAM! Version 1.0. Then something interesting happened: I grew up and actually started doing some research. That research made me realize the setup of ShiftedBits was all wrong! Nothing was inherently testable, there were no mockable objects, and I couldn’t test the integration for crap. Not to mention that user security (one of the BIG topics I took on with SBCMS) wasn’t really implemented correctly. It had its positives, but it also had some major drawbacks in the design.

So I set about thinking. You know, as you do…

I was watching Harry Potter (part 6) on TV one morning, eating a bowl of ice cream for breakfast. I was waiting for a job to start and had been out of school for a while, so this wasn’t unusual. And it hit me: the changes I had to make were so drastic that it was effectively going to be a new platform altogether, and the new name had to be something random. I looked at a box on the coffee table, at the rug underneath both, at the couch I was on, and then straight at my bowl of ice cream. It hit me like the sweet, sweet taste of Vanilla Bean (which is what I happened to be eating at the time).

I knew what to call the project, and I automatically had a guiding principle based on that name. It’s ice cream. Ice cream is sweet. Ice cream is good. Ice cream is for Real People. Now, that “Real People” shindig really got me. One of the major ideas of SBCMS was to be so simple that a non-software developer could understand it. It fit so well to say that Real People were anyone who wasn’t a software developer on the project (effectively, everyone but me…). Making it simple and easy for them to install the system, upgrade it, and install modules (and, if they liked, write their own with little to no overhead) was the best way to go about designing this over-glorified personal project.

So what has happened in the meantime? Well, in short, I discovered Test-Driven Development, Ant, Maven, Continuous Integration, SOLID, PCI-DSS, and OWASP, among others. Got a job too, and moved, but that’s beside the point.

So things are moving on. The biggest actual paradigm shift in the code is that the CMS is now gonna follow more Java-like principles in design. Maybe one day I’ll finally figure out how to properly use namespaces in PHP, and then they’ll act like packages… or something.

But yeah! On the up and up: development is slow at the moment, as I’m still working on getting my “servers” built (yay VMs) and the rest of the system actually designed. Hopefully soon I’ll have the actual process flow charted well enough to post it, along with a UML diagram of the CMS as a whole.

Until next time!

Why HTML Placeholders Don’t Replace HTML Labels

For the background to what you’re about to read, please read this blog post. For those who don’t want to, the short version is this: the current implementation of input placeholders, per the HTML5 spec, isn’t as good as a (slightly) different implementation in Apple’s iOS. The post has some code that implements them the iOS way instead of the normal way. The upcoming rant is not about this new implementation. It’s about the comments on the idea, and the general feeling about and understanding (or lack thereof) of placeholders in the first place.

The Short Rebuttal

When a web developer/designer forgoes labels in favour of placeholders, they’re being dumb. And wrong. And a bunch of other things I won’t list, but that follow the same general pattern.

Some History

The placeholder attribute was “created” many, many moons (OK, a few years) ago by people who were concerned about how a user interface looks when there isn’t much space available. The general idea was to remove the need for labels around an input box, allowing for a more compact and minimalistic experience. The tech behind this idea said, effectively: put the label inside the text element instead of around it. Then, when a user clicks on the input box, the text clears so that it’s ready for user input. That way users still know what’s being asked of them and can provide input when ready. The great irony is that the market that would benefit most from this kind of technology (mobile) doesn’t implement it natively.

After a few years of implementing this in HTML4/XHTML1 and JavaScript (plus CSS for those who wanted special styling for this newfangled placeholder thingamabob), the W3C HTML5 committee added the placeholder attribute to the HTML5 specification. It was soon adopted by all major browsers, and fallback support for older browsers was built into most major JavaScript libraries (such as jQuery, as a plugin).

The Problem

As I read through the comments on that blog post, I was struck by a seemingly dogmatic attitude that placeholders are replacements for input labels. Whether that attitude is actually there is up for debate, but in the meantime we’ll move on to my actual rant.

I am not, and will not claim to be, a master of the HTML5 spec, nor of UI design (as that’s more a matter of taste that varies from person to person), but I will claim to know how to freaking read. The HTML5 spec clearly states that the placeholder attribute should not be used as an alternative to a label. Now, this would seem counter-intuitive, given that the original purpose of placeholders was to save screen space. So what happened?

It seems to me that a lot of designers treat the HTML5 spec the way some Christians treat the Bible. There are parts they like and parts they don’t, and they choose to ignore the parts they don’t (or at least acknowledge that those parts are there, and just don’t practise them). That’s where problem number 1 arises.

The HTML5 spec is not a ‘bible’

It’s a specification. In any real-world work scenario, if you don’t follow specifications to a T, you’d better get ready to eat a lot of shit from your co-workers and management, because it means the customer didn’t get the functionality or look they wanted. It’s not your job to tell customers they’re wrong; it’s your job to let them figure out they’re wrong on their own terms. This is usually done by charging a lot more when people ask you to do stupid things, such as using iFrames instead of Ajax or some such. It’s not that you won’t implement iFrames, it’s just that maintaining them costs so much that you have to make it worth your while.

In the case of web development (and in some respects, web design as well), the customer is giving you two specs, one explicit and one implicit. The explicit one is the design spec they give you: they want a logo in this color, headers in this font, a background with this design, and so on. The implicit one is the HTML5 specification. The customer may not realize it, but as a web developer/designer it is your duty to implement their spec while following the HTML5 specification as closely as possible. The benefit is that (in general practice now, thankfully) properly implemented HTML renders nearly the same across browsers. This isn’t to say there are no per-browser tweaks to be made. But by following the HTML5 spec, you’re doing yourself (or whoever is implementing your design) a big favour by keeping the implementation as standard as possible, which means fewer cross-browser compatibility problems.

This is where problem number 2 arises:

You’re forgetting something

The fanciest design and the coolest graphics won’t stop careless, slow (elderly or cognitively impaired), or disabled (blind, and otherwise) people from visiting your site. Leaving labels off your inputs is nothing more than plain negligence. I will quote a comment (#5, to be specific) from the blog post linked above to prove my point:

I have actually found myself tabbing to another input & back to see the placeholder again, so I also like this implementation.

This also implies something else: people don’t read what they see on a web page, they scan it, and in doing so they miss details. This is a perfect case for why we should not forgo label tags on inputs (which it seems a lot of people do now). Instead, we should use them as they were originally meant to be used: to tell people what to put into a specific input field. That way, people using screen readers, and anyone else going through the form slowly (or quickly), don’t get confused about what’s required.
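In markup, the fix is simply a label whose for attribute matches the input’s id, with the placeholder (if any) kept as an extra hint. The sketch below generates that pairing in JavaScript. labeledInput and escapeHtml are hypothetical helpers written for this example, not part of any library.

```javascript
// Escape the characters that are special inside HTML text and attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// A text input with an always-visible label; the placeholder is optional and only
// adds an example of what to type.
function labeledInput({ id, name, label, placeholder = '' }) {
  const ph = placeholder ? ` placeholder="${escapeHtml(placeholder)}"` : '';
  return `<label for="${escapeHtml(id)}">${escapeHtml(label)}</label>\n` +
    `<input type="text" id="${escapeHtml(id)}" name="${escapeHtml(name)}"${ph} />`;
}

console.log(labeledInput({ id: 'username', name: 'username', label: 'Username', placeholder: 'e.g. jsmith' }));
// <label for="username">Username</label>
// <input type="text" id="username" name="username" placeholder="e.g. jsmith" />
```

The label and the placeholder are not mutually exclusive. The label says what the field is, and the placeholder at most shows an example of what to type.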

Remember, the best thing to aim for is the simplest thing you can get. By this I don’t mean simplest to implement (you can refactor code in any number of ways to keep it simple), but simplest for the user: keeping the user’s job simple and clear. I’ll emphasize this with a story.

I was contracted to build a website at the turn of the year. It was a private website, only to be used by a small handful of people, so it didn’t need to be flashy, just able to get the job done. I will admit that when I first built the account creation layout, I used placeholders instead of labels. I found out this was the wrong approach when someone entered, as their username, the security key they had been given for unique identification in the system. Needless to say, that wasn’t how it was supposed to be used. I was astounded. I thought to myself: “This can’t be possible! It said ‘Username goes here’ very… clear…ly… … … Crap.”

I visited the page and thought to myself: as someone who doesn’t know what to expect from this site, what does this page tell me? I realized it didn’t tell me very much. It gave me a few text boxes whose text disappeared when I clicked on them, and in general it was fairly hard to figure out what went where and why.

Then I put in labels for the form fields.

The difference was amazing. I showed the old page to a friend who had never seen the site before (they knew I had been working on it, but that was about it), and I asked them to tell me what they saw when the page first loaded: it wasn’t much. They said “well the password goes there, and something called the ‘security-key’ goes there,” and then came the dreaded question, “but what goes in that first one again?”

“I’ll show you,” I responded.

I sent them a link to the fixed page (you can consider this hallway A/B testing if you want to put a technical term to it) and said, “What about now?” They said the difference was simple and clear. It was easy to tell what went where, and if they clicked on an input, it was much harder to forget what was supposed to go in it, as the text explaining this never went away.

Conclusion

I will forgive the blog writer for his code not including labels. When writing code for blogs you cut corners, because the code is there to show an idea rather than to be proper in every respect. Also, his demo of the implementation does use labels, so it’s OK. He also mentions in comment #16 (in response to #15) that labels and placeholders are not mutually exclusive.

But to everyone else who thinks using placeholders instead of labels is a good thing: shame on you. You should know better. We’re not out here to make the web pretty; we’re here to make the web better. Now let’s get to it, shall we?

User Handling, part 1

Intro

At the end of my last post, we had just finished verifying that a user registering for your site was, in fact, a human. Now we’re gonna go a step further and discuss registering them and creating a simple login system that uses form-based authentication, or more specifically, an authentication system built on HTML forms and HTTP.

Registering

Setup

Registering a user for your site means gathering various pieces of information about them and storing it in your system for later access. For the simplest of registrations, you’ll need at least a username (handle) and a password. Optional information could be anything from an email address, to a location in the world (city, state, country), to gender and ethnicity. It depends on how you build your community and what’s important to keep track of. Based on that, you’ll have a form based on the one below:

<form method="post" action="register.php">
	<input type="text" name="username" />
	<input type="password" name="password" />
	<!-- Any various more information you want here -->
	<input type="submit" value="Register!" />
</form>

You’ll notice that we’re sending things to the server using the “POST” method defined in HTTP. This is the proper way as defined in the protocol: POSTing is for sending data to be saved on the server, while GETting is for retrieving data. POST also keeps the password out of the URL, and therefore out of browser history and server logs. This is something we’ll have to keep in mind while writing our registration script.

Saving

In PHP, if you run var_dump on $_POST after a form submission, you’ll see that it’s an associative array that looks something like this:

array(2) { ["username"]=> string(5) "fubar" ["password"]=> string(6) "abc123" }

So to retrieve those values, we simply do this:

<?php
$username = $_POST['username'];
$password = $_POST['password'];
?>

From there, we create a new user in the database by inserting all the data we’ve gathered about them.

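<?php
//mysqli replaces the old mysql_* functions, which were removed in PHP 7
$db = new mysqli( "host", "username", "password", "database" );
//A prepared statement keeps user input out of the SQL text (no SQL injection)
$stmt = $db->prepare( "INSERT INTO `users` VALUES( NULL, ?, ? )" );
//Note: the password is still stored as plain text here
$stmt->bind_param( "ss", $username, $password );
$stmt->execute();
?>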

Side note: the NULL in that insert statement is for the user_id column, i.e. a unique identifier for each user. In MySQL that’d be set up as an auto-increment primary key. The statement also assumes the table’s columns are, in order, user_id, username and password.

Logging In

Setup

After registration, logging a user in is quite simple. It takes a form like the one above:

<form method="post" action="login.php">
	<input type="text" name="username" />
	<input type="password" name="password" />
	<input type="submit" value="Login" />
</form>

Unlike the registration form, however, we only need the username and the password in order to log in, not everything else.

Logging in

The act of logging in is simple: you get the password from the form and compare it to the one stored in the database for that username.

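<?php
 
$username = $_POST['username'];
$password = $_POST['password'];
 
//mysqli replaces the old mysql_* functions, which were removed in PHP 7
$db = new mysqli( "host", "user", "password", "database" );
//Look the user up with a prepared statement instead of pasting $username into the SQL
$stmt = $db->prepare( "SELECT * FROM `users` WHERE `username` = ?" );
$stmt->bind_param( "s", $username );
$stmt->execute();
$row = $stmt->get_result()->fetch_assoc(); //get_result() needs the mysqlnd driver
//Only log the user in if the username exists and the password matches
if( $row && $row['password'] === $password )
	setcookie( "<sitename>_login", $username );
?>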

Staying in

Cookies

This is the first of two ways, and the easier and cleaner one. When a user logs in, you set a cookie in their browser, preferably with a unique name. In my example I use “<sitename>_login”, so if your site name was “example.com”, you could name your cookie “example_login”. It’s up to you. In my ShiftedBits framework, cookie names are prefixed with “sb_” just to keep them unique.

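This works the same way a login does, only without the user’s involvement. You get the cookie value and check whether it’s correct (in this case, whether the username it holds exists in the database). Then you base your user’s experience on that. Be aware that the browser sends back whatever value the cookie holds, so anyone can edit it to another user’s name. This is one of the reasons the approach in this post is not secure.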
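Checking only that the username exists means anyone can edit their cookie to hold someone else’s name. One common way to close that gap is to sign the cookie, so the server can tell a value it issued from one a user typed in. The sketch below (Node.js built-in crypto) appends an HMAC-SHA-256 tag to the username. SECRET, signCookie and readCookie are hypothetical names, and a real setup would also need expiry and a way to revoke cookies.

```javascript
const crypto = require('crypto');

// Hypothetical server-side secret; in practice, load a long random value from configuration.
const SECRET = 'replace-with-a-long-random-secret';

// Cookie value = "<username>.<hex HMAC of username>"; only the server can compute the tag.
function signCookie(username) {
  const tag = crypto.createHmac('sha256', SECRET).update(username).digest('hex');
  return `${username}.${tag}`;
}

// Returns the username if the tag is valid, otherwise null.
function readCookie(value) {
  const dot = value.lastIndexOf('.');
  if (dot < 0) return null;
  const username = value.slice(0, dot);
  const given = Buffer.from(value.slice(dot + 1), 'hex');
  const expected = crypto.createHmac('sha256', SECRET).update(username).digest();
  if (given.length !== expected.length) return null; // timingSafeEqual needs equal lengths
  return crypto.timingSafeEqual(given, expected) ? username : null;
}

const cookie = signCookie('fubar');
console.log(readCookie(cookie)); // fubar
const tag = cookie.slice(cookie.lastIndexOf('.') + 1);
console.log(readCookie('admin.' + tag)); // null: that tag was computed for 'fubar'
```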

Sessions

This is the second way, and it’s a teensy bit harder to deal with, especially if you don’t have files to handle boilerplate like this for you. You initialize a session by calling session_start at the beginning of EVERY SCRIPT that displays different output depending on who’s viewing it.

<?php
session_start();
?>

That function call HAS to come before any output at all, ever. Period. It may need to send the session cookie, and headers can’t be sent once output has started. Then, instead of setting a cookie, you set a $_SESSION variable.

<?php
$_SESSION['username'] = $username;
?>

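Then the session has to carry over from page to page. When session_start is called, it looks for a session ID (named “PHPSESSID” by default) sent by the browser, and if it finds one, it uses that ID to continue the session. By default PHP hands the ID to the browser as a cookie, so usually nothing else is needed. If you can’t rely on cookies, you can staple the session ID onto each link you output by adding the SID constant to the end of each URL. (SID is an empty string when the cookie is already in use.) Be aware that IDs in URLs can leak through shared links and Referer headers. Also, since PHP 5.3 the session.use_only_cookies setting is on by default, so PHP ignores IDs in URLs unless you change it: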

<a href="random_file.php?<?php echo htmlspecialchars(SID); ?>">Link</a>
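To make the moving parts concrete, here is a toy model of a session in Node.js. The data stays on the server, keyed by a random, unguessable ID, and only that ID travels with the browser (in a cookie, or in the URL as above). This is not how PHP implements sessions internally; by default PHP stores them in files. startSession, getSession and destroySession are made-up names that roughly mirror session_start, $_SESSION and session_destroy.

```javascript
const crypto = require('crypto');

// Session data lives on the server, keyed by session ID (a Map stands in for PHP's session files).
const sessions = new Map();

// Roughly session_start() for a new visitor: create an unguessable ID and empty data.
function startSession() {
  const id = crypto.randomBytes(16).toString('hex');
  sessions.set(id, {});
  return id; // this is what goes to the browser (cookie or URL)
}

// On later requests, the ID the browser sends back finds the same data again.
function getSession(id) {
  return sessions.get(id) || null;
}

// Roughly session_destroy(): forget the data, so the old ID no longer works.
function destroySession(id) {
  sessions.delete(id);
}

const sid = startSession();
getSession(sid).username = 'fubar'; // like $_SESSION['username'] = $username;
console.log(getSession(sid).username); // fubar
destroySession(sid);
console.log(getSession(sid)); // null
```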

Getting Out

Cookies

Cookies set without an expiry time last for as long as the browser is open, and sometimes that can be a while. Others, depending on the parameters given to setcookie, can outlast the browser session: days, even months. Sometimes a year, but you don’t see that often. So, to get rid of a cookie, you set its expiry time to sometime in the past.

<?php
setcookie( "name", "", time() - 3600 );
?>

Setting the value to an empty string is an extra precaution: even if the cookie lingers, it no longer holds the username.
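What the browser actually receives is a Set-Cookie header. This sketch builds one that deletes a cookie by giving it an empty value, an Expires date in the past, and Max-Age=0; expireCookieHeader is a hypothetical helper. PHP’s setcookie() writes a similar header for you, though its exact text differs slightly. The Path (and domain, if one was used) must match what the cookie was originally set with, or the browser treats it as a different cookie.

```javascript
// Set-Cookie value that deletes `name`: empty value, an expiry in the past, Max-Age=0.
function expireCookieHeader(name, path = '/') {
  return `${name}=; Expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=${path}`;
}

console.log(expireCookieHeader('example_login'));
// example_login=; Expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/
```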

Sessions

Sessions are a little trickier to handle, as there are TWO things that need to be reset. But when I say “a little”, I really mean “a little”.

<?php
session_start();
$_SESSION = array( );
session_destroy();
?>

First off, we have to start up the session otherwise there’d be nothing to destroy. We then set the super-global $_SESSION to an empty array, effectively getting rid of whatever value we had in there. Last but not least, we call session_destroy to kill the session for us.

Summary

This post has gone over three things, all relatively simple: user registration, user login, and keeping the user logged in. They shouldn’t be hard concepts to grasp, as you go through them on nearly every website you visit (Facebook, MySpace, etc.). The whole point of this post was to summarize what needs to happen to let users use your site at a different level than lurkers. Again, if your site just showcases things you’ve made (like an image gallery), there’s no need for any of this.

Also of note: the methods described here are far from secure. If your site is popular enough that user security has to be on the to-do list, wait for my next post. Part 2 will combine everything we learned here with the secure ways browsers authenticate a user when a page requires a username/password to view.

Regulating Real Registrations

Anyone who runs a website will sooner or later have users registering for it. Sometimes it’s one every couple of days or weeks; sometimes it’s multiple registrations per minute. The one thing the entire spectrum has to deal with, though, is making sure the user is a person: someone who actually types in the name, password, and extra details. Granted, if the “hacker” is rich enough, they can pay other real people to sign up and start spamming links or running scams, asking for someone’s password and all that other good stuff. Here are a few unobtrusive ways to help prevent these things from happening.

(Note: All JavaScript code here deliberately uses the jQuery JavaScript library.)

Making Sure They’re Real

The most common and popular way to check for this is to use what’s called a CAPTCHA. That, however, can be bypassed. A lot of new versions keep coming out that try to fool bots using new and interesting transformation techniques; the most popular so far is called reCAPTCHA. This is all well and good, but I feel there are better ways to prove humanity, ones that I’ve seen on other sites, some only in demonstrations.

Pop Quiz

The pop-quiz method (as I call it) is a simple little test that asks the human a basic math question, such as “What is 4 + 3?” The numbers, of course, are randomized, and JavaScript is used to check whether the answer is correct.

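$(function(){
	//Two random whole numbers from 1 to 10.
	var num1 = Math.floor( Math.random() * 10 ) + 1;
	var num2 = Math.floor( Math.random() * 10 ) + 1;
	var sum = num1 + num2;
 
	//Show the question to the user.
	$("#s_num1").text( num1 );
	$("#s_num2").text( num2 );
 
	$("#f_register").submit(function(){
		//val() returns a string; != compares it loosely against the number.
		if( $("#i_sum").val() != sum ){
			alert( "You're not human!" );
			return false; //Cancels the submission.
		}
	});
});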

The code above assumes that the user’s answer goes in an input box with the id “i_sum”, and that the two values to be added are shown in span tags with the ids “s_num1” and “s_num2” respectively. It checks whether the value in i_sum equals the calculated sum; if so, it lets the form submit. Otherwise it pops up an alert (or does whatever else you like to tell the user that they got the answer wrong and are presumed not to be human) and returns false. The “return false;” statement is crucial, as that’s what prevents the form from actually submitting.
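Because the sum is computed and checked in the browser, a bot that posts the form directly, without running any JavaScript, never sees that check. One way around this, sketched below in C++ as an assumption rather than the method used in this post, is to pick the numbers on the server, remember them (for example in the user's session), and verify the posted answer there. The names Quiz, makeQuiz, and checkAnswer are hypothetical.

```cpp
#include <cstdlib>
#include <random>
#include <string>

// The two numbers shown to the user; store this server side with the session.
struct Quiz
{
    int num1;
    int num2;
};

// Draws two whole numbers from 1 to 10, like the JavaScript above.
Quiz makeQuiz(std::mt19937& rng)
{
    std::uniform_int_distribution<int> dist(1, 10);
    int a = dist(rng);
    int b = dist(rng);
    return Quiz{a, b};
}

// Checks the posted answer against the stored quiz, rejecting anything
// that is not a plain whole number (e.g. "", "7abc").
bool checkAnswer(const Quiz& q, const std::string& posted)
{
    if (posted.empty())
        return false;
    char* end = nullptr;
    long value = std::strtol(posted.c_str(), &end, 10);
    if (*end != '\0')
        return false; // trailing junk, or no digits at all
    return value == q.num1 + q.num2;
}
```

Keeping the answer off the page entirely means a bot has to actually read the question, which is the whole point of the quiz.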

Focal Blur

The focal blur (again, my own naming) scheme detects whether or not a user actually clicked on, or otherwise used, any of the elements within the form. As long as the user causes the focus event to fire, they are actively entering an element, and the blur event fires as they leave it. The code assumes two hidden inputs in the form, h_focusTest and h_blurTest, which the event handlers fill in.

$("#f_register").submit(function(){
	if( $("#h_focusTest").val() != "true" && $("#h_blurTest").val() != "true" ){
		alert( "You're not human!");
		return false;
	}
});
 
$("#i_username").focus(function(){
	$("#h_focusTest").val( "true" );
});
$("#i_username").blur(function(){
	$("#h_blurTest").val( "true" );
});

This check banks on the fact that some bots (if not most that I know of) only analyze the HTML of the page, fill out all the form elements without triggering either the focus or the blur event, and then submit it. Some bots, however, are smarter than that: they analyze the HTML but send a POST request with the information directly to the server. Lucky for us, we can also double-check this server side.

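<?php
//Fall back to "" if a field is missing entirely, to avoid undefined index notices.
$focus = isset( $_POST['h_focusTest'] ) ? $_POST['h_focusTest'] : "";
$blur  = isset( $_POST['h_blurTest'] ) ? $_POST['h_blurTest'] : "";
if( $focus != "true" && $blur != "true" )
{
	//Say screw it, and die mercilessly.
	die( "You faker..." );
}
?>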

Beat the Clock

Well, not really. This method is, as the name may suggest, a timing check, but instead of beating the clock, you have to lose to it. It takes time to type in all that information (some registration forms are worse than others), so even if a bot can trigger the focus and blur events, it’s not going to take long to fill out the form; at least, not as long as a normal human would. Given a regular registration form with username, password, and email, a normal human would take maybe 4 to 6 seconds, depending on how good they are with the keyboard, to get through it all and hit the Enter key. If you time it, and the time from the first focus event to the submit action is less than that, you have one of two things on your hands: a user with mad keyboard skills, or a bot.

//Current time in milliseconds since the Unix epoch.
function micro() {
	return new Date().getTime();
}
//========
var startTime = null;
$("#f_register").submit(function(){
	//If the username field was never focused there is no start time,
	//which is itself a sign of a bot.
	if( startTime === null ){
		alert( "Time: You're not human!" );
		return false;
	}
	var total = micro() - startTime; //Elapsed milliseconds.
	if( total < 2500 ){
		alert( "Time: You're not human!" );
		return false;
	}
	$("#h_time").val( total ); //Pass the elapsed time along to the server.
});
 
$("#i_username").focus(function(){
	if( startTime === null )
		startTime = micro();
});

The function micro() returns the current time in milliseconds, the JavaScript near-equivalent of PHP’s microtime( true ) call (which returns seconds as a float). So 1 second == 1000, 2 seconds == 2000, and so on. I have my limit set to 2500 (2.5 seconds) because in my tests I averaged about 3.5 to 4 seconds. Still, a username, password, and email with no math question, especially with short values in each field, doesn’t take that long to fill out if you’re good at typing, so 2000 is also a good limit, especially if you don’t have the math question to prove humanity.
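The elapsed time sent back in h_time comes from the browser, so a bot could simply post a large number. A swapped-in alternative, sketched here in C++ under stated assumptions, is to measure on the server: note when a form (identified by a random token you embed in it) is handed out, and compare that against when it comes back. This times from when the form is served rather than from the first focus event, so for the same threshold it is at least as lenient as the JavaScript check. FormTimer, issued, and slowEnough are hypothetical names.

```cpp
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical server-side timer: remembers when each form token was
// issued and checks how long the visitor took to submit it.
class FormTimer
{
public:
    // Call when the registration form (carrying `token`) is sent out.
    void issued(const std::string& token, Clock::time_point when)
    {
        issued_[token] = when;
    }

    // Call when the form comes back. True only if the token is known and at
    // least `minMillis` milliseconds have passed. Each token is forgotten
    // after one check, so a replayed submission fails.
    bool slowEnough(const std::string& token, Clock::time_point submitted,
                    long long minMillis)
    {
        auto it = issued_.find(token);
        if (it == issued_.end())
            return false;
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            submitted - it->second).count();
        issued_.erase(it);
        return elapsed >= minMillis;
    }

private:
    std::map<std::string, Clock::time_point> issued_;
};
```

In practice you would call timer.issued(token, Clock::now()) when rendering the form and timer.slowEnough(token, Clock::now(), 2500) when it is posted.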

Limitations

As I’ve mentioned in previous blog posts, everything is hackable, and that applies to the three methods above as well, through the use of one simple tool: another human. Some people get paid to register and log into sites and then spam the community there, or do other work that is detrimental to it (e.g. stealing someone’s account on Gaia Online and taking all of their items). However, if spammers need the human touch to get in, then you’re important, so give yourself a pat on the back for being important and then ban the user.

In summary, there are ways outside of a CAPTCHA to validate a user, so don’t depend on that one thing to do all the work for you. As for the onfocus JavaScript event, there are many, many, many more ways you can use it to prove that a user is human (forcing them to click on a colored square, for instance), so use your imagination and let it run wild. Bear in mind another downfall of CAPTCHAs: sometimes they’re unusable because the user can’t read the warped message. With that in mind, make sure that your users (who you should expect to be VERY dumb indeed) can solve your check easily, or better yet don’t have to do any extra work at all, with the verification happening behind the scenes (as with the Focal Blur and Beat the Clock types).

Remember, your website is owned by your users. They are the ones that use it. Make sure they can do so, and at the same time make sure they enjoy doing so, above all else.

Star Wars offshoot as a programming concept?

A couple of days ago I was working on some homework for one of my programming classes. We’re learning to program Android phones using Java, and this particular exercise had us learning how to interact with an SQLite database. The tutorial for it mentioned “giving basic CRUD functionality” to the system through a Database Adapter that had been written for us.

When I first read that line I thought, “CRUD? What the heck is that?”
Well, dear friends, the answer is that CRUD is a checklist of features that a database, or any other form of “persistent data storage”, must implement to be considered fully functional. It stands for “Create, Read, Update, Delete”.

Now, if you look down the list of alternative acronyms for CRUD and you’re a Star Wars geek, you’ll spot a VADE(R) listing. It stands for “View, Add, Delete, Edit, (Restore)”. This particular flavour, with Restore added on the end, is used for transactional databases: databases that keep a log of actions until a specified amount of time has passed or a specific command is given, allowing you to undo any changes before they are made permanent.

My theory is this: what if we didn’t apply this just to databases and persistent storage? What if we used it as a programming paradigm for interacting with data in the first place?

If you really break it down, almost everything on the web has to deal with persistent storage (through databases, flat files, or whatever) in one way or another. The ideal targets? Service sites like YouTube, Photobucket, or even deviantArt. Or heck, ANY forum that you’ve been on in the past 14 years, and even email and Google searches. All of these technologies deal with persistent storage in one way or another.

Now, whether this is already in place, and whether the designer knew about it or not, is a completely different story. The point I’m trying to make is that when it comes to using data (viewing, manipulating, etc.) there’s a certain way we should deal with it, and for me personally CRUD is just too broad and too overbearing.

I prefer VADERPS. Why? Because it keeps the separate functions, well, separate, and small. And I like small, because small is fast, small is easy, and small is awesome. So what does VADERPS stand for? As you’ve probably noticed, it’s based on VADER, so that part is easy: View, Add, Delete, Edit, Restore. The P stands for Purge, and is based on Delete, except that Delete deals with one item while Purge is optimized for deleting many at a time. The S is likewise based on View: it stands for Search and, like Purge, deals with many items instead of one. In all, you have View, Add, Delete, Edit, Restore, Purge, and Search.

The separation between View and Search, and likewise between Delete and Purge, is a distinction I feel I have to make. View deals with just one object while Search deals with many, yet CRUD combines the two into “Read”. I’m sure a bunch of the others in that list do the same thing, or word the acronym to favor one or the other, such as ABCD and its Browse. The one on that list that comes closest to my idea is VEDAS (View, Edit, Delete, Add, Search), but the restore functionality (and mass deletion) is a nice trick too, though the distinction between mass delete and single delete has a different reason behind it.

I chose to separate out Delete and Purge because if you do a mass deletion (say, all of March 2007 on your blog), then a function that deletes those posts one at a time is a bad idea. To put this into perspective, it’s like manually looping through the lines of a file on Linux to find the line you’re looking for, when you could have easily done it with a grep command. Grep loops through the file for you, and does it far more efficiently than any looping and file handling you could come up with. In this analogy, looping through the file on your own is like calling Delete once per blog ID you want removed, whereas grep is like Purge: it does the work for you much faster because it cuts out the middle man. In straight-up SQL, it’s the difference between calling this a bunch of times:

DELETE FROM table_name WHERE id=x;

and doing this:

DELETE FROM table_name WHERE id IN (x, y, z);

The overhead of having PHP loop through making that first call for you for x, y, and z is a HECK of a lot when compared to just having MySQL do it for you efficiently.
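The difference can be sketched in C++ by generating the SQL text for both approaches. This is only an illustration: deleteStatements and purgeStatement are hypothetical helpers, the table name "posts" in the usage is made up, and real code should bind the ids as parameters through your database driver rather than pasting strings together (the table name here is assumed to be trusted).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Delete, called once per id: one statement (and one round trip) each.
std::vector<std::string> deleteStatements(const std::string& table,
                                          const std::vector<long>& ids)
{
    std::vector<std::string> out;
    for (long id : ids)
        out.push_back("DELETE FROM " + table + " WHERE id = " +
                      std::to_string(id) + ";");
    return out;
}

// Purge: a single statement covering every id. Returns an empty string when
// there is nothing to delete, since "IN ()" is not valid SQL.
std::string purgeStatement(const std::string& table, const std::vector<long>& ids)
{
    if (ids.empty())
        return "";
    std::string sql = "DELETE FROM " + table + " WHERE id IN (";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0)
            sql += ", ";
        sql += std::to_string(ids[i]);
    }
    return sql + ");";
}
```

For ids {1, 2, 3}, the first helper yields three statements while the second yields the single "DELETE FROM posts WHERE id IN (1, 2, 3);".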

I bring this up for a reason. I’ve been through many a site that doesn’t allow a user to edit comments they’ve already made, or through code on a website that doesn’t let a programmer easily say “This is the summary, here’s the detail.” And on no site whatsoever have I seen functionality along the lines of “OK, we’ll take this out, but won’t make it permanent for the next 12 or 24 hours, just in case it was a bad idea and a case of the ID10T virus”, when keeping the data in a temporary “garbage collection” state would have been the simplest way to fix the problem*.

So here’s what I propose, in quick format:
VADERPS:
View – One item at a time, usually linked to by a Search, contains all details necessary about that item.
Add – Name says it all. Adds an item to the system, depending on the item type and data required.
Delete – Deletes one thing, and one thing only. Optimized for deleting said one thing.
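Edit – Edits an item. It’s only logical to edit one thing at a time, so no “Multi-edit” functionality is needed… yet.
Restore – Keeps a deleted item in a state of purgatory, so it can be brought back in case the user hit the wrong button.
Purge – Deletes many things at once. Optimized for mass deletion, instead of calling Delete over and over.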
Search – Optimized for viewing multiple items at once, contains a summary of details about all items.
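To make the list concrete, here is a minimal in-memory sketch in C++ of a store exposing all seven operations. It is an assumption-laden illustration, not a prescribed design: the class and method names are hypothetical, Delete is spelled remove() because delete is a C++ keyword, and deleted items go into a "purgatory" map so that Restore can bring them back.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory store showing the seven VADERPS operations.
class Store
{
public:
    // Add: stores a new item and returns its id.
    long add(const std::string& value)
    {
        long id = nextId_++;
        live_[id] = value;
        return id;
    }

    // View: exactly one item, with all of its details.
    std::optional<std::string> view(long id) const
    {
        auto it = live_.find(id);
        if (it == live_.end())
            return std::nullopt;
        return it->second;
    }

    // Edit: changes one item; false if it does not exist.
    bool edit(long id, const std::string& value)
    {
        auto it = live_.find(id);
        if (it == live_.end())
            return false;
        it->second = value;
        return true;
    }

    // Delete: moves one item into purgatory instead of destroying it.
    bool remove(long id)
    {
        auto it = live_.find(id);
        if (it == live_.end())
            return false;
        purgatory_[id] = it->second;
        live_.erase(it);
        return true;
    }

    // Restore: brings one item back out of purgatory.
    bool restore(long id)
    {
        auto it = purgatory_.find(id);
        if (it == purgatory_.end())
            return false;
        live_[id] = it->second;
        purgatory_.erase(it);
        return true;
    }

    // Purge: deletes many items in one call and returns how many it found.
    // Backed by a database, this would be one "DELETE ... IN (...)" statement.
    std::size_t purge(const std::vector<long>& ids)
    {
        std::size_t count = 0;
        for (long id : ids)
            if (remove(id))
                ++count;
        return count;
    }

    // Search: many items at once; ids of the items containing `text`.
    std::vector<long> search(const std::string& text) const
    {
        std::vector<long> hits;
        for (const auto& [id, value] : live_)
            if (value.find(text) != std::string::npos)
                hits.push_back(id);
        return hits;
    }

private:
    long nextId_ = 1;
    std::map<long, std::string> live_;
    std::map<long, std::string> purgatory_;
};
```

A real version would also need a job that empties purgatory after the grace period; that part is left out here.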

In all, a clearer distinction between the different /types/ of data sets (individual vs. multiple) will make things simpler design-wise and faster optimization-wise in the long run, as keeping things simple and stupid is the way to go.

*Note: I actually talked to the CEO of a company who had to revoke the admin privileges of a client using his program, because even after /three/ “Are you sure you want to delete this?” messages, the client still deleted things from the system that really shouldn’t have been deleted. In a case like this it’s better to make things transactional, keeping deleted items in a temporary purgatory status for 24/48/72 hours or whatever, so that they appear deleted but are easy to recover just in case.

The ADFGVX Cipher

This is the first in what I hope will be a long series of blog posts as I go on an adventure of learning cryptography through programming. Today’s entry is about the ADFGVX cipher, used in World War 1, and chosen first simply because it was first on the list.

History

The ADFGVX cipher was invented and used by the Germans during World War 1. It is considered a fractionating transposition cipher, and is in fact an extension of an earlier cipher called ADFGX, invented by Lieutenant Fritz Nebel in 1918. The V was added so that the full alphabet and the numbers 0-9 would fit in the square, instead of combining the letters I and J. The letters A, D, F, G, V, and X were chosen because they sound so dissimilar in Morse code, making it easier to distinguish the various parts of the message.

The cipher was broken on April 5th, 1918, through complex cryptanalysis by a French Army lieutenant, Georges Painvin. The break is often credited with helping the French Army stop the Germans’ Spring Offensive of 1918, since it came a few weeks after the offensive began and supposedly revealed Ludendorff’s plans of attack. However, looking at the dates, by April 5th the attack had already petered out, so this claim is generally regarded as false.

Today the ADFGVX cipher is considered technologically obsolete and cryptographically insecure. However, its security can be increased (as with almost any other cipher) by feeding the output through other ciphers as desired. That still won’t make it unbreakable, but it will take longer to break, which may be enough to give you a distinct advantage over whoever is trying to break your code.

How It Works

A predetermined, randomly shuffled alphabet (referred to as a Mixed Alphabet) is given. In the case of this program, it is generally:

"MBJYA,Z(?PS.)NX; DURITW:!HGLFECVO'KQ"

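as the random shuffle used by the program is seeded the same way each time it runs. Note that it contains the 26 letters plus ten punctuation characters (including the space) rather than the digits 0-9. This alphabet is then written row by row into a 6 by 6 Polybius Square, using the letters A, D, F, G, V, X as the coordinates along both edges.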


  A D F G V X
A M B J Y A ,
D Z ( ? P S .
F ) N X ;   D
G U R I T W :
V ! H G L F E
X C V O ' K Q

(The blank cell in row F, column V holds the space character.)

We then acquire what is called a plaintext (the message to be enciphered), and a transposition key (a collection of non-repeating letters to help obfuscate the message).

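With the plaintext, we go through each character and find its coordinates on the Polybius square. In this program each character becomes two letters: its column first, then its row (H, for example, sits in column D of row V, giving “DV”). So, if the plaintext were: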

HELLO, WORLD.

Going through it would generate the string:

DVXVGVGVFXXAVFVGFXDGGVXFXD
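The substitution step can be sketched in a few lines of C++. This is a separate, simplified illustration, not the ADFGVX class from the program below: it looks each character up in the mixed alphabet, treated as a 6 by 6 grid read row by row, and emits the column letter followed by the row letter, which is the order that reproduces the string above. fractionate, kSquare, and kCoords are names chosen for this sketch.

```cpp
#include <string>

// The mixed alphabet from this post, read as a 6x6 grid row by row.
const std::string kSquare = "MBJYA,Z(?PS.)NX; DURITW:!HGLFECVO'KQ";
const std::string kCoords = "ADFGVX";

// Replaces each character with two coordinate letters: column, then row.
// Characters not in the square (e.g. lowercase letters) are skipped.
std::string fractionate(const std::string& plaintext)
{
    std::string out;
    for (char c : plaintext) {
        std::string::size_type pos = kSquare.find(c);
        if (pos == std::string::npos)
            continue;
        out += kCoords[pos % 6]; // column
        out += kCoords[pos / 6]; // row
    }
    return out;
}
```

fractionate("HELLO, WORLD.") gives exactly the 26-letter string above.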

Next, we take that string and write it out under the key, going across before going down.

The key in this case is: “FUBAR”

F U B A R
D V X V G
V G V F X
X A V F V
G F X D G
G V X F X
D

We then sort the letters of the key alphabetically, keeping each column of values under its corresponding letter:


A B F R U
V X D G V
F V V X G
F V X V A
D X G G F
F X G X V
    D

Finally, we read each column from top to bottom, in the new alphabetical order, separating the columns with spaces. The resulting ciphertext looks like this:

VFFDF XVVXX DVXGGD GXVGX VGAFV
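The columnar transposition can likewise be sketched in C++. As with any sketch here, the function name transpose is hypothetical; it assumes a non-empty key with no repeated letters (as the post requires), writes the coordinate string row by row under the key, and reads the columns out in alphabetical key order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Writes `coords` across under the key, then reads each column top to
// bottom in alphabetical key order, separating the columns with spaces.
std::string transpose(const std::string& coords, const std::string& key)
{
    std::vector<std::string> columns(key.size());
    for (std::size_t i = 0; i < coords.size(); ++i)
        columns[i % key.size()] += coords[i];

    // Column indices, sorted by their key letter.
    std::vector<std::size_t> order(key.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return key[a] < key[b]; });

    std::string out;
    for (std::size_t n = 0; n < order.size(); ++n) {
        if (n > 0)
            out += ' ';
        out += columns[order[n]];
    }
    return out;
}
```

Feeding it the coordinate string and the key “FUBAR” produces the ciphertext shown above.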

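And of course, to decode it you reverse the process: split the ciphertext into its columns, put the columns back in the key’s original order, read the grid across row by row, and look each pair of letters up in the square.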
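Reversing both steps fits in one function. This is a simplified C++ sketch, separate from the program's own decode() below, and it assumes space-separated columns, a non-empty key with no repeated letters, and the column-then-row coordinate order used in this post. adfgvxDecode is a hypothetical name, and it returns an empty string for malformed input.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

const std::string kSquare = "MBJYA,Z(?PS.)NX; DURITW:!HGLFECVO'KQ";
const std::string kCoords = "ADFGVX";

std::string adfgvxDecode(const std::string& ciphertext, const std::string& key)
{
    // Split the ciphertext into its columns (they are in alphabetical key order).
    std::vector<std::string> sorted;
    std::istringstream in(ciphertext);
    for (std::string piece; in >> piece; )
        sorted.push_back(piece);
    if (sorted.size() != key.size())
        return "";

    // Put each column back under its original key position.
    std::vector<std::size_t> order(key.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return key[a] < key[b]; });
    std::vector<std::string> columns(key.size());
    for (std::size_t n = 0; n < order.size(); ++n)
        columns[order[n]] = sorted[n];

    // Read the grid back across, row by row.
    std::string coords;
    for (std::size_t row = 0; ; ++row) {
        bool any = false;
        for (const std::string& col : columns)
            if (row < col.size()) {
                coords += col[row];
                any = true;
            }
        if (!any)
            break;
    }

    // Turn each (column, row) pair back into a character from the square.
    std::string plaintext;
    for (std::size_t i = 0; i + 1 < coords.size(); i += 2) {
        std::size_t x = kCoords.find(coords[i]);     // column
        std::size_t y = kCoords.find(coords[i + 1]); // row
        if (x == std::string::npos || y == std::string::npos)
            return "";
        plaintext += kSquare[y * 6 + x];
    }
    return plaintext;
}
```

Run on the example ciphertext with the key “FUBAR”, it recovers “HELLO, WORLD.”.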

The Code

If you’re still here, this is probably why you’re here in the first place. I wrote the program in C++ using Microsoft’s Visual Studio (Professional Edition). You can find the zipped version of the solution here if you wish to look at it. However if you don’t wanna go through all that trouble, here’s the encode and decode functions for your viewing pleasure.

Encode Function:

/**
 * Encodes a plaintext message using the given key, and polybius square
 */
void	ADFGVX::encode( )
{
	//Local Variables to help encoding.
	string	coordinates = "";
 
	//Create polybius square coordinates for each letter in the phrase.
	for( unsigned int i = 0; i < plaintext.length(); i++ )
	{
		char c = plaintext.at( i );
		for( int y = 0; y < 6; y++ )
		{
			for( int x = 0; x < 6; x++ )
			{
				if( polybius.at(y).at(x) == c )
				{
					coordinates.append( 1, squaremap.at( x ) );
					coordinates.append( 1, squaremap.at( y ) );
				}
			}
		}
	}
 
	//Create Sorted key character, and index pairs.
	//Maps are auto sorted, so we need the index as the second half of the
	//pair to make a successful mapping.
	map< char, int > m;
	for( unsigned int i = 0; i < key.length(); i++ )
	{
		pair<char, int> p( key.at( i ), i );
		m.insert( p );
	}
 
	//Preparing the table to be filled with "coordinates".
	vector< string > v;
	for( unsigned int i = 0; i < key.length(); i++ )
		v.push_back( "" );
 
	//Fill it with coordinates
	for( unsigned int i = 0; i < coordinates.length(); i++ )
		v.at( i % key.length() ).append( 1, coordinates.at( i ) );
 
	//Generate The Cipher Text
	ciphertext = ""; //Clear it out first, in case there's one already there.
	map< char, int >::iterator k; //Iterator through the key map.
 
	//Loop through the map, using the vector to get the proper values to append
	//to the cipher text.
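	for( k = m.begin(); k != m.end(); k++ )
	{
		//Separate the columns with single spaces, without leaving a
		//trailing space at the end.
		if( k != m.begin() )
			ciphertext.append( " " );
		ciphertext.append( v.at( k->second ) );
	}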
}

Decode Function:

/**
 * Decodes a cipher text using the given key, and polybius square.
 */
void	ADFGVX::decode( )
{
	//Temporary variables to help with the decoding.
	vector< string > v, v1;
	string	s;
 
	//Separate each piece into bits in the vector to un-map.
	for( unsigned int i = 0; i < ciphertext.length(); i++ )
	{
		char c = ciphertext.at( i );
		//If we hit a space, we're at the end of a piece, so push it onto the vector.
		//Empty pieces (from doubled or trailing spaces) are skipped.
		if( c == ' ' )
		{
			if( !s.empty() )
				v.push_back( s );
			s = "";
		}
		else
		{
			s.append( 1, c );
		}
	}
	//The last piece isn't followed by a space, so push it here too,
	//unless it is empty (i.e. the ciphertext ended with a space).
	if( !s.empty() )
		v.push_back( s );
 
	//Re-map the key to get the coordinates in their proper places.
	map< char, int > m;
	for( unsigned int i = 0; i < key.length(); i++ )
	{
		pair<char, int> p( key.at( i ), i );
		m.insert( p );
	}
 
	//Go through the key to get the correct mappings.
	map< char, int >::iterator k;
	for( unsigned int i = 0; i < key.length(); i++ )
	{
		//k is an iterator, and we can't get it's relative position while 
		//looping through, so we have j to keep track of that for us.
		int j = 0; 
		for( k = m.begin(); k != m.end(); k++ )
		{
			if( k->first == key.at( i ) )
				v1.push_back( v.at( j ) );
			j++;
		}
	}
 
	//Peel off the letters to recreate the non-mapped version
	//of the cipher text.
	s = "";
	for( unsigned int i = 0; i < ciphertext.length(); i++ )
	{
		int location = i % v1.size(); //v1 holds exactly one column per key letter.
		string at = v1.at( location );
 
		//This is to make sure we're not dealing with any strings that are
		//already translated and are shorter than other strings being
		//translated.
		if( at.length() > 0 )
		{
			char c = at.at( 0 );
			s.append( 1, c );
			string n_at = at.substr( 1 ); //Chop off the first character
			v1.at( location ) = n_at; //And reset the string to the new value.
		}
	}
 
	//Decode the resulting text from the polybius square.
	plaintext = "";
	for( unsigned int i = 0; i + 1 < s.length(); i += 2 )
	{
		int x = 0, y = 0; //Default to the first cell if a letter isn't found.
		char cx = s.at( i );
		char cy = s.at( i + 1 );
 
		//We need the numerical positions of the coordinate letters.
		for( int k = 0; k < 6; k++ )
			if( cx == squaremap.at( k ) )
				x = k;
		for( int k = 0; k < 6; k++ )
			if( cy == squaremap.at( k ) )
				y = k;
 
		plaintext.append( 1, polybius.at(y).at(x) );
	}
}

ore-may ig-pay atenizing-lay

I got bored. So sue me. Redone in C++ (correctly this time, and probably more efficient than this or this.)

#include <iostream>
#include <string>
#include <vector>
#include <cctype>
 
using namespace std;
 
bool	isPunctuation( char c );
string	latinize( string s );
bool	isVowel( char c );
 
int main( )
{
	string input, output, temp;
	vector<string> words;
	vector<char> punctuation;
	cout<< "Enter a phrase you would like to latinize" << endl
		<< "-> ";
	getline( cin, input );
 
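	for( string::size_type i = 0; i < input.length(); i++ )
	{
		//tolower() needs a non-negative value, hence the unsigned char cast.
		char c = static_cast<char>( tolower( static_cast<unsigned char>( input.at( i ) ) ) );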
		if( isPunctuation( c ) )
		{
			words.push_back( temp );
			punctuation.push_back( c );
			temp.clear();
		}
		else
			temp.append( 1, c );
	}
 
	if( !temp.empty() )
	{
		words.push_back( temp );
		punctuation.push_back( ' ' );
	}
 
	for( int i = 0; i < words.size(); i++ )
	{
		string l = latinize( words.at( i ) );
		output.append( l );
		output.append( 1, punctuation.at( i ) );
	}
 
	cout<< output << endl;
 
	return 0;
}
 
bool	isPunctuation( char c )
{
	char punctuation[33] = {
		',', '.', '/', '<', '>', '?', ';', '\'', ':', '"', '[', ']', '\\', 
		'{', '}', '|', '`', '-', '=', '~', '!', '@', '#', '$', '%', '^', 
		'&', '*', '(', ')', '_', '+', ' '
	};
 
	for( int i = 0; i < 33; i++ )
		if( c == punctuation[i] )
			return true;
 
	return false;
}
 
string latinize( string s )
{
	string	postfix, temp;
	for( int i = 0; i < s.length(); i++ )
	{
		char c = s.at( i );
		if( isVowel( c ) )
		{
			postfix.append( "ay" );
			temp = s.substr( i );
			temp.append( "-" );
			temp.append( postfix );
			return temp;
		}
		else
			postfix.append( 1, c );
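	}
	//No vowel was found (or the word is empty); without these lines the
	//function would fall off the end without returning, which is undefined.
	if( s.empty() )
		return s;
	return s + "-ay";
}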
 
bool	isVowel( char c )
{
	char	vowels[5] = { 'a', 'e', 'i', 'o', 'u' };
	for( int i = 0; i < 5; i++ )
		if( c == vowels[i] )
			return true;
	return false;
}

© 2013 - Sitemap - Privacy Policy