How do you secure email?

Wait, didn’t we already cover encrypted email in an earlier post? We did. But encrypted email is not secure email. Encryption solves but part of the secure email conundrum.

An Identity Crisis

Toby, a Cairn Terrier of Distinction, dutifully encrypts all the email he sends his pal Violet. Violet is thrilled to receive email that she believes isn’t junk, but her joy is short-lived. Toby keeps wanting her assistance in moving $1,000,000 through various Caribbean bank accounts! Or insists that she claim the $5,000,000 that a newly discovered uncle, Indian Buffet, has left her. What is going on?

There is mischief afoot. For Attila the Hound has also learnt how to encrypt email and is masquerading as Toby. Attila pulls Violet’s public key (it is public!) from a directory, and then sends her (encrypted) email that looks like this:

From: <TOBY@caninegenius.woof>
To: <violet@mathiscool.xyz>

Subject: Funtestic Finencial Oppurchoonity!

Dear Mr. Violet,
Sir, I beseechfully writes to tell you...

[No, you can’t do this by using Outlook. I think. But you can with Notepad and a few lines of code.]

SMTP Servers (tuned and debugged for nearly 30 years) use the venerable SMTP protocol to reliably push billions of email messages around the planet 24 hours a day. SMTP is simple, insanely successful, eminently spoof-able and rather insecure. SMTP has no concept of verifiable sender identity. Attila the Hound can send Violet email (encrypted or not) claiming to be Iron Man and she has no way of knowing him from Robert Downey Jr. Encryption keeps your email from prying eyes, but it can’t save you from actors.
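
To see how little stands between Attila and a forged sender, here is a minimal sketch using Python's standard email library. The addresses are the made-up ones from the story, and nothing is actually sent; the point is only that the From line is ordinary text supplied by whoever writes the message.

```python
from email.message import EmailMessage

# Hypothetical addresses from the story. Nothing here touches the network.
msg = EmailMessage()
msg["From"] = "toby@caninegenius.woof"   # typed in by Attila, checked by nobody
msg["To"] = "violet@mathiscool.xyz"
msg["Subject"] = "Funtestic Finencial Oppurchoonity!"
msg.set_content("Dear Mr. Violet,\nSir, I beseechfully writes to tell you...")

# The serialized message is just text; the From header is whatever was written.
wire_text = msg.as_string()
print(wire_text)
```

Nothing in the message itself ties the From header to the person who composed it, which is exactly the gap Attila exploits.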

What are Violet and Toby to do? Attila is putting a strain on their friendship. Toby & Violet face an identity crisis.

Something you have, Something you know

Toby asks Grandma Asha for help. He regularly applies her practical everyday wisdom to difficult engineering problems.

In my experience, says Grandma, you prove or assert your identity using:

  • Something you have: Your driver’s license, your passport [token]
  • Something you know: Your social security number, your mother’s maiden name [secret]
  • Your signature.

And the lights blaze in Toby’s ingenious head.

The Digital Signature

To prove and assert his identity, Toby uses a blend of Grandma’s suggestions.

First, he creates a (public, private) key pair (see previous post for an overview of key pairs and asymmetric encryption). The private key is a secret that only Toby knows (and has). If he can prove to Violet that he knows this secret, he can prove to her that he is Toby!

Toby demonstrates his knowledge of his private key by using it to encrypt data both he and Violet have access to – the email he is about to send her. If Violet successfully decrypts the email using Toby’s public key, then Violet knows that Toby must have encrypted the email. This is because the only data Violet can decrypt using Toby’s public key – is data encrypted using Toby’s private key! Asymmetric encryption is genius.

But wait, isn’t asymmetric encryption slow? Not a problem, growls the canine cryptographer.

  • Toby creates a cryptographic hash or digital fingerprint of his email.
  • He encrypts the hash with his private key. This will prove to Violet that the email is really from him. Encrypting the practically unique digital fingerprint of the email is as good as encrypting the email itself.
  • He attaches the encrypted hash to the email.
  • He names his creation a digital signature.
  • Toby has signed the email with his private key.

Violet verifies Toby’s identity by verifying his digital signature:

  • Violet creates a cryptographic hash of the email she receives.
  • She decrypts Toby’s digital signature using his public key. This gives her Toby’s version of the hash.
  • She compares her version of the hash with Toby’s.
  • If the two match, she can confidently state that:
    • Toby sent her the email
    • Nobody tampered with or altered the email after Toby signed it. If they had, her version of the cryptographic hash  – the digital fingerprint – would be different from what came in the digital signature.
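
The sign and verify steps above can be sketched in a few lines of Python. This is a toy, not how real mail software does it: it assumes textbook RSA with no padding, a key built from two well-known Mersenne primes, and SHA-256 as the hash. The function names are made up for illustration. Real signatures use vetted libraries and padding schemes.

```python
import hashlib

# TOY key for illustration only: textbook RSA, no padding, built from two
# well-known Mersenne primes. Never use anything like this for real email.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (Python 3.8+)

def fingerprint(message: bytes) -> int:
    """SHA-256 digest of the message, as an integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes, private_exponent: int, modulus: int) -> int:
    """Toby's side: transform the hash with the private key."""
    return pow(fingerprint(message), private_exponent, modulus)

def verify(message: bytes, signature: int, public_exponent: int, modulus: int) -> bool:
    """Violet's side: recover Toby's hash with the public key and compare."""
    return pow(signature, public_exponent, modulus) == fingerprint(message)

email = b"Dear Violet, support vector machines are neat. Woof, Toby"
signature = sign(email, D, N)

print(verify(email, signature, E, N))                        # True
print(verify(email + b" P.S. send cash", signature, E, N))   # False
```

The second check fails because the tampered email hashes to a different fingerprint than the one locked inside the signature.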

But where does Violet get Toby’s public key? Violet could look it up in a directory, but does not have to. The performance conscious Toby saves her the extra round trip by sending his public key along with the email itself. Public keys are designed for broad dissemination, so this is safe.

How do you send secure email?

To send secure email, you:

  • Sign it with your private key [so the recipient knows you sent it, and nobody else tampered with it]
  • THEN encrypt it with the recipient’s public key [so nobody but the recipient can read it].

And you are done, right? Wrong.

Spoofing Public Keys

For the cunning Attila can also generate his own (public, private) key pair. He uses this pair to continue pretending that he is Toby:

  • Like before, Attila creates an email that claims to be from Toby.
  • He signs the email with his (Attila’s) private key
  • Then he attaches his (Attila’s) public key to the email

Violet receives Attila’s email and runs through her validation procedure. As Attila expected, everything checks out. The digital signature matches! Violet accepts Attila’s email as what it claims to be – an email from her pal Toby.
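
Attila's trick can be reproduced with the same kind of toy arithmetic. The sketch below assumes textbook RSA keys built from known Mersenne primes (illustration only, not secure), and the dictionary standing in for an email is a made-up structure. It shows that a signature verified against the key attached to the email proves only that the signer owns that key, not that the signer is Toby.

```python
import hashlib

E = 65537  # public exponent shared by both toy keys

def make_toy_key(p: int, q: int):
    """Return (modulus, private_exponent) for textbook RSA. Illustration only."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))

def fingerprint(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes, private_exponent: int, modulus: int) -> int:
    return pow(fingerprint(message), private_exponent, modulus)

def verify(message: bytes, signature: int, modulus: int) -> bool:
    return pow(signature, E, modulus) == fingerprint(message)

# Two unrelated toy key pairs built from known Mersenne primes.
toby_n, toby_d = make_toy_key(2**521 - 1, 2**607 - 1)
attila_n, attila_d = make_toy_key(2**1279 - 1, 2**2203 - 1)

# Attila writes the email, signs it with HIS key, and attaches HIS public key.
forged = {
    "from": "toby@caninegenius.woof",
    "body": b"Dear Mr. Violet, Sir, I beseechfully writes to tell you...",
    "attached_public_key": attila_n,
}
forged["signature"] = sign(forged["body"], attila_d, attila_n)

# Violet verifies against the key that came with the email: it checks out.
checks_out = verify(forged["body"], forged["signature"], forged["attached_public_key"])
# Yet the attached key is not Toby's key at all.
key_is_tobys = forged["attached_public_key"] == toby_n
print(checks_out, key_is_tobys)   # True False
```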

Then, Attila’s mentor, Prof. Moriarty, joins the fun. Moriarty figures out that he can intercept Toby’s emails to Violet, but is frustrated because they are encrypted. So, the wily Professor hacks into the public directory that hosts Violet’s public key, and replaces Violet’s public key with his own. Toby is none the wiser as he downloads what he believes to be Violet’s public key. He encrypts email he is sure is for Violet’s eyes only, but will in reality be read by Prof. Moriarty.

Prof. Moriarty reads Toby’s insightful commentary on support vector machines with great interest. Then he re-encrypts the email using Violet’s public key (which he has kept), and forwards it to Violet.

And so we arrive at our next conundrum:

  • How does Violet know that the public key she used to verify Toby’s digital signature on his email – is really Toby’s?
  • How does Toby know that the public key he used to encrypt his email to Violet – is really Violet’s?

Anybody can generate a public, private key pair. Directories can be hacked and spoofed.

In this cruel, untrusting world, who attests that a proffered public key is the genuine public half of a subject’s (public,private key) pair? Who do you trust? How do you trust?

Unfortunately, tonight’s episode must end on that cliffhanging note. Tune in next time for the exciting tale of two X509 Certificates.

To be continued….

What is a cryptographic hash?

The other day, I was in a meeting where somebody said, “….and then you take a SHA-256 hash of the document, which is unique…”.

Not quite. It would be more accurate to say practically unique.

Cryptographic hash functions are astounding. They take arbitrary binary data (document, image, movie, message, bytes...) and crunch over every bit to produce a short, fixed-length summary called a hash value or digest. E.g. the SHA1 hash function creates a 160 bit digest out of any source input, no matter what its size or content. Cryptographic digests have some very important properties.
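
A quick way to see the fixed-length behaviour for yourself, using Python's standard hashlib. The three sample inputs are arbitrary choices for this sketch: nothing, four bytes, and a million bytes.

```python
import hashlib

# Three inputs of wildly different sizes.
inputs = [b"", b"woof", b"x" * 1_000_000]
digests = [hashlib.sha1(data).hexdigest() for data in inputs]

for data, digest in zip(inputs, digests):
    # Each hex character encodes 4 bits, so 40 hex characters = 160 bits.
    print(len(data), "bytes ->", digest, f"({len(digest) * 4} bits)")
```

Every line of output ends in "(160 bits)", whatever went in.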

Say you create a cryptographic digest of a document. You will find it practically impossible to:

  • Find or create a second document (or any other data) that will produce the same digest
  • Change or tamper with the source document – even a single bit – without also altering the digest.
  • Reverse engineer the document from the digest – i.e. by hashing randomly generated documents until you find one that has a matching digest.

These properties make the digest unique for all practical purposes. You can take any binary data and derive a big number that represents that data and that data alone. The digest serves as a digital fingerprint for the data. This property makes cryptographic digests the basis of Digital Signatures and their close cousins, HMACs (we’ll cover both in upcoming posts).

But the fact remains: the cryptographic hash is not actually unique. Where there is a hash value, there will always be a collision: two or more arbitrary pieces of data that reduce down to the same digest.

Collisions

Why are collisions inevitable? Most of you know how a hash table works. If you don’t, then consider the following: Say you were given 5 balls and asked to place them in 3 buckets. It doesn’t take an engineer to realize that you must put at least 2 balls in buckets that already contain at least 1 other ball. Collisions!

The famous SHA-1 function, the one-time champion of cryptographic hashing, produces 160 bit hash values. 160 bits gives 2^160 (2 raised to 160) possible unique values. 2^160 is a very big number: approx. 1.46 x 10^48, a 49-digit number. By comparison, the Earth has an estimated 1.33 x 10^50 atoms.
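
If you want to check the size of these numbers yourself, Python's big integers make it painless. The atom count below is simply the estimate quoted above, not something this sketch verifies.

```python
hash_values = 2 ** 160      # number of distinct 160 bit digests
earth_atoms = 1.33e50       # the estimate quoted in the text

print(f"{hash_values:.3e}")            # scientific notation
digit_count = len(str(hash_values))
print(digit_count, "decimal digits")
print(earth_atoms / hash_values)       # atoms per possible digest, about 91
```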

The number of possible inputs (balls) to the hash function is, for all practical purposes, infinite. The number of possible hash values (buckets) is fixed. Collisions! Multiple balls will land in the same bucket. Eventually. But it may take a long while because there are so many buckets!

In fact, the laws of probability tell us that you have roughly even odds of getting a collision once you have on the order of 2^80 inputs (the square root of 2^160). Which is a smaller but still scarily big number. Why? For the same reason that 23 random people have a 50-50 chance of sharing the same birthday (but not birth year!).
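
The birthday arithmetic is easy to check. The sketch below computes the exact odds for 23 people, then applies the usual approximation 1 - exp(-n^2 / 2N) to a 160 bit hash. Treat the 2^80 figure as a rule of thumb: under this approximation it gives odds of about 39%, and even odds arrive at roughly 1.18 x 2^80 inputs. The function names are made up for this example.

```python
import math

def exact_birthday_collision(people: int, days: int = 365) -> float:
    """Exact chance that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

def approx_collision(n_inputs: int, n_buckets: int) -> float:
    """Standard birthday approximation: 1 - exp(-n^2 / 2N)."""
    return 1.0 - math.exp(-(n_inputs ** 2) / (2 * n_buckets))

buckets = 2 ** 160
n_half = int(1.1774 * 2 ** 80)   # about sqrt(2 * ln 2) * 2^80

print(exact_birthday_collision(23))        # about 0.507
print(approx_collision(2 ** 80, buckets))  # about 0.39
print(approx_collision(n_half, buckets))   # about 0.50
```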

Finding Collisions

So how do you go find collisions and why? The why is obvious: imagine if you found somebody who had the same fingerprint as you – unlikely though it may be. If you were of the miscreant persuasion, you might take advantage of this knowledge. The same holds for digital fingerprints. If you could tamper with or create digital data in such a way that its digital fingerprint matched (collided) with that of the “real data”, you could cause some mischief. Since the digital fingerprint of the bad data matched the one people expected from good data, they would have little reason to be suspicious.

The simplest way to find collisions – brute force – is also the hardest, primarily because of how long it takes. You could brute force collision detection by calculating the hashes of a bazillion (ideally all) inputs using gazillions of computers and then watching a lot of TV as you... wait... forever. Or you could alter various bits of the original input, and try computing hashes and see if any of them stick. You could do cryptanalysis – which these days is a highly sophisticated version of how the British famously hacked the German Enigma machine. There are other techniques and they all are more involved than this short paragraph may let on (you think?), but you get the general idea.
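
Brute force becomes very doable if you shrink the hash. The sketch below is a toy under a loud assumption: it keeps only the first 24 bits of a SHA-256 digest (the BITS constant and the tiny_hash helper are invented for this demo), then hashes numbered messages until two of them land in the same bucket. With 24 bits that typically takes a few thousand tries. With all 256 bits, it is the "wait forever" scenario.

```python
import hashlib
from itertools import count

BITS = 24  # toy digest size (an assumption for this demo); real SHA-256 has 256

def tiny_hash(data: bytes) -> int:
    """Keep only the top BITS bits of the SHA-256 digest."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - BITS)

def find_collision():
    """Hash numbered messages until two share a tiny_hash value."""
    seen = {}  # tiny_hash value -> first message that produced it
    for i in count():
        candidate = f"message {i}".encode()
        h = tiny_hash(candidate)
        if h in seen:
            return seen[h], candidate, i + 1
        seen[h] = candidate

first, second, tries = find_collision()
print(first, second, "collide after", tries, "tries")
```

Each extra bit of digest roughly multiplies the expected work by the square root of two, which is why the real thing stays out of reach.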

To find collisions using the hottest cryptographic hash function in town (the SHA2 family) is (currently) practically impossible. In cryptography, practically impossible means computationally infeasible. Which is a fancy way of saying that even if you used all the current computers, algorithms and known mathematics, it would take you so long to solve the problem that it wouldn’t matter any more. You could use all of Azure and Amazon EC2 to crunch your algorithms, but you would die before you succeeded, as would all of humanity and possibly the Earth too. Of course, a brainy breakthrough that exploited a fundamental flaw in the hash function, or a quantum computing revolution might give you a fighting chance, but until then...

You could also invent some new smarty pants Math that lets you find a collision in feasible time. Cryptographers live in abject terror of computational feasibility – the Freddy Krueger of their dreams.

Avoiding Collisions

Cryptographic hash functions are painstakingly designed to reduce the probability of collisions. If you peek at the code for a hash function, you will find it replete with bit operations like xors, bitwise and/ors, shifts and rotations. They operate on each bit, shoving and pushing and twisting the data with seemingly arbitrary, but carefully chosen and massively tested steps. A software blender using every bit in the original data to make a digital smoothie, with each smoothie having a taste of its own that incorporates the flavor of everything that went in. With values distributed more or less randomly (evenly) across all buckets. Small changes in the original triggering an avalanche of changes in the computed digest.
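
The avalanche is easy to watch. This small sketch flips a single bit of an arbitrary sample message and counts how many of the 256 bits in its SHA-256 digest change. The helper name is made up, and the exact count depends on the message, but it tends to hover around half.

```python
import hashlib

def bits_changed(a: bytes, b: bytes) -> int:
    """Count the digest bits that differ between SHA-256(a) and SHA-256(b)."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

original = b"Toby chases squirrels"
# Flip the lowest bit of the first byte: a one-bit change to the input.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

changed = bits_changed(original, flipped)
print(changed, "of 256 digest bits differ")
```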

The mathematics behind why or how any of this works is way over my balding head. The mathematics are actually so subtle and clever that they may include hidden flaws – either mistakes or deliberate weaknesses that a clever chap may exploit at a later time. This is why cryptographic hash functions are few and far between. Rock stars that hold sway for a while even as they are taken apart by brainiacs. Until one of them discovers a weakness. And so went MD4, MD5 and SHA0. And not so long ago, SHA1 also met its fate: compromised at first only in theory, and then, in 2017, by a publicly demonstrated collision. Cryptographers are a paranoid bunch, which is just fine with me!