How do you secure email?

Wait, didn’t we already cover encrypted email in an earlier post? We did. But encrypted email is not secure email. Encryption solves but part of the secure email conundrum.

An Identity Crisis

Toby, a Cairn Terrier of Distinction, dutifully encrypts all the email he sends his pal Violet. Violet is thrilled to receive email that she believes isn’t junk, but her joy is short-lived. Toby keeps wanting her assistance in moving $1,000,000 through various Caribbean bank accounts! Or insists that she claim the $5,000,000 that a newly discovered uncle, Indian Buffet, has left her. What is going on?

There is mischief afoot. For Attila the Hound has also learnt how to encrypt email and is masquerading as Toby. Attila pulls Violet’s public key (it is public!) from a directory, and then sends her (encrypted) email that looks like this:

From: <TOBY@caninegenius.woof>
To: <violet@mathiscool.xyz>

Subject: Funtestic Finencial Oppurchoonity!

Dear Mr. Violet,
Sir, I beseechfully writes to tell you...

[No, you can’t do this by using Outlook. I think. But you can with Notepad and a few lines of code].

SMTP Servers (tuned and debugged for nearly 30 years) use the venerable SMTP protocol to reliably push billions of email messages around the planet 24 hours a day. SMTP is simple, insanely successful, eminently spoof-able and rather insecure. SMTP has no concept of verifiable sender identity. Attila the Hound can send Violet email (encrypted or not) claiming to be Iron Man and she has no way of knowing him from Robert Downey Jr. Encryption keeps your email from prying eyes, but it can’t save you from actors.
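To see how little the email format cares about who you claim to be, here is a minimal Python sketch. It only builds Attila's message with the standard library's email package and never sends it; the addresses are the ones from the forgery above.

```python
from email.message import EmailMessage

# Attila builds the message. Nothing here checks that he owns the From
# address: the header is just text that the sender fills in.
msg = EmailMessage()
msg["From"] = "TOBY@caninegenius.woof"
msg["To"] = "violet@mathiscool.xyz"
msg["Subject"] = "Funtestic Finencial Oppurchoonity!"
msg.set_content("Dear Mr. Violet,\nSir, I beseechfully writes to tell you...")

forged = msg.as_string()
print(forged)
```

The printed message carries Toby's address in its From line, and nothing in it tells Violet that Toby never wrote it.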

What are Violet and Toby to do? Attila is putting a strain on their friendship. Toby & Violet face an identity crisis.

Something you have, Something you know

Toby asks Grandma Asha for help. He regularly applies her practical everyday wisdom to difficult engineering problems.

In my experience, says Grandma, you prove or assert your identity using:

  • Something you have: Your driver’s license, your passport [token]
  • Something you know: Your social security number, your mother’s maiden name [secret]
  • Your signature.

And the lights blaze in Toby’s ingenious head.

The Digital Signature

To prove and assert his identity, Toby uses a blend of Grandma’s suggestions.

First, he creates a (public, private) key pair (see previous post for an overview of key pairs and asymmetric encryption). The private key is a secret that only Toby knows (and has). If he can prove to Violet that he knows this secret, he can prove to her that he is Toby!

Toby demonstrates his knowledge of his private key by using it to encrypt data both he and Violet have access to – the email he is about to send her. If Violet successfully decrypts the email using Toby’s public key, then Violet knows that Toby must have encrypted the email. This is because the only data Violet can decrypt using Toby’s public key – is data encrypted using Toby’s private key! Asymmetric encryption is genius.

But wait, isn’t asymmetric encryption slow? Not a problem, growls the canine cryptographer.

  • Toby creates a cryptographic hash or digital fingerprint of his email.
  • He encrypts the hash with his private key. This will prove to Violet that the email is really from him. Encrypting the practically unique digital fingerprint of the email is as good as encrypting the email itself.
  • He attaches the encrypted hash to the email.
  • He names his creation a digital signature.
  • Toby has signed the email with his private key.

Violet verifies Toby’s identity by verifying his digital signature:

  • Violet creates a cryptographic hash of the email she receives.
  • She decrypts Toby’s digital signature using his public key. This gives her Toby’s version of the hash.
  • She compares her version of the hash with Toby’s.
  • If the two match, she can confidently state that:
    • Toby sent her the email
    • Nobody tampered with or altered the email after Toby signed it. If they had, her version of the cryptographic hash – the digital fingerprint – would be different from what came in the digital signature.
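For the curious, here is a minimal sketch of Toby's and Violet's steps in Python. It is a toy, not the way a real mail client does it: it uses textbook RSA with no padding, and it builds Toby's key pair from two well-known Mersenne primes purely so the example runs with nothing but the standard library (real keys come from secret, randomly generated primes). The helper names toy_keypair, fingerprint, sign and verify are made up for illustration.

```python
import hashlib

def toy_keypair(p, q, e=65537):
    """Textbook RSA from two primes: returns (public, private) as (n, exponent)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
    return (n, e), (n, d)

def fingerprint(message: bytes) -> int:
    """SHA-256 digest of the message, as an integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes, private_key) -> int:
    """Toby's side: hash the email, then transform the hash with the private key."""
    n, d = private_key
    return pow(fingerprint(message), d, n)

def verify(message: bytes, signature: int, public_key) -> bool:
    """Violet's side: recover Toby's hash with his public key, compare with her own."""
    n, e = public_key
    return pow(signature, e, n) == fingerprint(message)

# Assumption: Mersenne primes stand in for properly generated secret primes.
toby_public, toby_private = toy_keypair(2**521 - 1, 2**607 - 1)

email = b"Dear Violet, fancy a walk in the park?"
signature = sign(email, toby_private)

untouched_ok = verify(email, signature, toby_public)
tampered_ok = verify(email + b" Send money!", signature, toby_public)
print(untouched_ok)  # True: the hashes match
print(tampered_ok)   # False: the email changed after signing
```

Note that only the small hash goes through the slow asymmetric step, never the whole email.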

But where does Violet get Toby’s public key? Violet could look it up in a directory, but does not have to. The performance conscious Toby saves her the extra round trip by sending his public key along with the email itself. Public keys are designed for broad dissemination, so this is safe.

How do you send secure email?

To send secure email, you:

  • Sign it with your private key [so the recipient knows you sent it, and nobody else tampered with it]
  • THEN encrypt it with the recipient’s public key [so nobody but the recipient can read it].

And you are done, right? Wrong.

Spoofing Public Keys

For the cunning Attila can also generate his own (public, private) key pair. He uses this pair to continue pretending that he is Toby:

  • Like before, Attila creates an email that claims to be from Toby.
  • He signs the email with his (Attila’s) private key
  • Then he attaches his (Attila’s) public key to the email

Violet receives Attila’s email and runs through her validation procedure. As Attila expected, everything checks out. The digital signature matches! Violet accepts Attila’s email as what it claims to be – an email from her pal Toby.
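Attila's trick is easy to reproduce in a sketch. The Python below uses the same kind of toy textbook RSA (no padding, key pairs built from well-known Mersenne primes, both assumptions made only so that it runs with the standard library) and made-up helper names. It shows that a signature check is only as trustworthy as the public key you check it against.

```python
import hashlib

def toy_keypair(p, q, e=65537):
    """Textbook RSA from two primes: returns (public, private) as (n, exponent)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # needs Python 3.8+
    return (n, e), (n, d)

def fingerprint(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes, private_key) -> int:
    n, d = private_key
    return pow(fingerprint(message), d, n)

def verify(message: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == fingerprint(message)

# Assumption: toy key pairs for illustration only.
toby_public, toby_private = toy_keypair(2**521 - 1, 2**607 - 1)
attila_public, attila_private = toy_keypair(2**127 - 1, 2**1279 - 1)

# Attila writes an email claiming to be Toby and signs it with HIS OWN key.
forged_email = b"From: TOBY@caninegenius.woof\n\nPlease wire $1,000,000."
forged_signature = sign(forged_email, attila_private)

# Violet uses whatever public key arrived attached to the email: it checks out.
fooled = verify(forged_email, forged_signature, attila_public)
# Had she checked against a key she knew to be Toby's, the forgery would fail.
caught = verify(forged_email, forged_signature, toby_public)
print(fooled)  # True
print(caught)  # False
```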

Then, Attila’s mentor, Prof. Moriarty, joins the fun. Moriarty figures out that he can intercept Toby’s emails to Violet, but is frustrated because they are encrypted. So, the wily Professor hacks into the public directory that hosts Violet’s public key, and replaces Violet’s public key with his own. Toby is none the wiser as he downloads what he believes to be Violet’s public key. He encrypts email he is sure is for Violet’s eyes only, but will in reality be read by Prof. Moriarty.

Prof. Moriarty reads Toby’s insightful commentary on support vector machines with great interest. Then he re-encrypts the email using Violet’s public key (which he has kept), and forwards it to Violet.

And so we arrive at our next conundrum:

  • How does Violet know that the public key she used to verify Toby’s digital signature on his email – is really Toby’s?
  • How does Toby know that the public key he used to encrypt his email to Violet – is really Violet’s?

Anybody can generate a public, private key pair. Directories can be hacked and spoofed.

In this cruel, untrusting world, who attests that a proffered public key is the genuine public half of a subject’s (public,private key) pair? Who do you trust? How do you trust?

Unfortunately, tonight’s episode must end on that cliffhanging note. Tune in next time for the exciting tale of two X509 Certificates.

To be continued….

What is a cryptographic hash?

The other day, I was in a meeting where somebody said, “….and then you take a SHA-256 hash of the document, which is unique…”.

Not quite. It would be more accurate to say practically unique.

Cryptographic hash functions are astounding. They take arbitrary binary data (document, image, movie, message, bytes...) and crunch over every bit to produce a short, fixed-length summary called a hash value or digest. E.g. the SHA1 hash function creates a 160 bit digest out of any source input, no matter what its size or content. Cryptographic digests have some very important properties.
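You can check the fixed-length claim yourself with Python's standard hashlib module. A small sketch (the sample inputs are arbitrary):

```python
import hashlib

# Three inputs of wildly different sizes: empty, 4 bytes, 1,000,000 bytes.
inputs = [b"", b"woof", b"x" * 1_000_000]

sizes = []
for data in inputs:
    sha1_bits = len(hashlib.sha1(data).digest()) * 8
    sha256_bits = len(hashlib.sha256(data).digest()) * 8
    sizes.append((len(data), sha1_bits, sha256_bits))
    print(f"{len(data):>9} input bytes -> SHA1: {sha1_bits} bits, SHA-256: {sha256_bits} bits")
```

Whatever goes in, SHA1 hands back 160 bits and SHA-256 hands back 256.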

Say you create a cryptographic digest of a document. You will find it practically impossible to:

  • Find or create a second document (or any other data) that will produce the same digest
  • Change or tamper with the source document – even a single bit – without also altering the digest.
  • Reverse engineer the document from the digest – i.e. by hashing randomly generated documents until you find one that has a matching digest.

These properties make the digest unique for all practical purposes. You can take any binary data and derive a big number that represents that data and that data alone. The digest serves as a digital fingerprint for the data. This property makes cryptographic digests the basis of Digital Signatures and their close cousins, HMACs (we’ll cover both in upcoming posts).

But the fact remains: the cryptographic hash is not actually unique. Where there is a hash value, there will always be a collision: two or more arbitrary pieces of data that reduce down to the same digest.

Collisions

Why are collisions inevitable? Most of you know how a hash table works. If you don’t, then consider the following: Say you were given 5 balls and asked to place them in 3 buckets. It doesn’t take an engineer to realize that you must put at least 2 balls in buckets that already contain at least 1 other ball. Collisions!

The famous SHA-1 function, the one-time champion of cryptographic hashing, produces 160 bit hash values. 160 bits represents 2^160 (2 raised to 160) possible unique values. 2^160 is a very big number: approx. 1.46 x 10^48 – i.e. 48 zeros. By comparison, the Earth has an estimated 1.33 x 10^50 atoms.

The number of possible inputs (balls) to the hash function is infinite. The number of possible hash values (buckets) is fixed. Collisions! Multiple balls will land in the same bucket. Eventually. But it may take a long while because there are so many buckets!
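Here is the balls and buckets argument as a Python sketch. To make the buckets run out quickly, it uses a deliberately crippled hash that keeps only the first byte of SHA-256, so there are just 256 buckets. The name tiny_hash and the "ball-N" inputs are made up for illustration.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """First byte of SHA-256: a 'hash' with only 256 possible values (buckets)."""
    return hashlib.sha256(data).digest()[0]

buckets = {}
collision = None
for i in range(257):  # one more ball than there are buckets
    ball = f"ball-{i}".encode()
    bucket = tiny_hash(ball)
    if bucket in buckets:
        # Two different balls in the same bucket.
        collision = (buckets[bucket], ball)
        break
    buckets[bucket] = ball

print(collision)
```

With 257 distinct inputs and 256 buckets, the loop cannot finish without finding a collision. In practice it finds one long before the last ball, for reasons the next paragraph explains.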

In fact, the laws of probability tell us that you have a roughly 50-50 chance of getting a collision once you have hashed on the order of 2^80 inputs. Which is a smaller but still scarily big number. Why? For the same reason that in a group of just 23 random people, there is a 50-50 chance that two of them share the same birthday (but not birth year!).
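The birthday arithmetic is short enough to check in Python. The sketch below computes the exact probability for small numbers, and uses the standard approximation 1 - exp(-k(k-1)/2N) for numbers far too large to loop over. The function names are made up for illustration.

```python
import math

def chance_exact(items: int, buckets: int) -> float:
    """Exact probability that at least two of `items` random picks share a bucket."""
    all_different = 1.0
    for i in range(items):
        all_different *= (buckets - i) / buckets
    return 1.0 - all_different

def chance_approx(items, buckets) -> float:
    """Birthday approximation: 1 - exp(-k(k-1) / 2N)."""
    return -math.expm1(-items * (items - 1) / (2 * buckets))

birthday = chance_exact(23, 365)
at_2_80 = chance_approx(2**80, 2**160)
at_half = chance_approx(1.1774 * 2**80, 2**160)

print(round(birthday, 3))  # 0.507: 23 people, 365 possible birthdays
print(round(at_2_80, 2))   # 0.39: exactly 2^80 inputs, 160 bit digests
print(round(at_half, 2))   # 0.5: about 1.18 x 2^80 inputs
```

So for a 160 bit digest the even-odds point sits a shade above 2^80 inputs, which is still the right order of magnitude.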

Finding Collisions

So how do you go find collisions and why? The why is obvious: imagine if you found somebody who had the same fingerprint as you – unlikely though it may be. If you were of the miscreant persuasion, you might take advantage of this knowledge. The same holds for digital fingerprints. If you could tamper with or create digital data in such a way that its digital fingerprint matched (collided) with that of the “real data”, you could cause some mischief. Since the digital fingerprint of the bad data matched the one people expected from good data, they would have little reason to be suspicious.

The simplest way to find collisions – brute force – is also the hardest, primarily because of how long it takes. You could brute force collision detection by calculating the hashes of a bazillion (all) inputs using gazillions of computers and then watching a lot of TV as you… wait……for ever. Or you could alter various bits of the original input, and try computing hashes and see if any of them stick. You could do cryptanalysis – which these days is a highly sophisticated version of how the British famously hacked the German Enigma machine. There are other techniques and they all are more involved than this short paragraph may let on (you think?), but you get the general idea.
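Brute force is perfectly feasible if the digest is short enough. The Python sketch below hunts for a collision in a hash truncated to 3 bytes (24 bits), which is an assumption made so that the search finishes in a blink; the names short_hash and find_collision and the "document-N" inputs are made up for illustration. The same loop against a full-length digest is the "wait for ever" scenario.

```python
import hashlib
from itertools import count

def short_hash(data: bytes, nbytes: int = 3) -> bytes:
    """SHA-256 truncated to `nbytes` bytes, to make collisions findable."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 3):
    """Hash candidate documents one by one until two share a truncated digest."""
    seen = {}  # truncated digest -> the document that produced it
    for attempt in count():
        candidate = f"document-{attempt}".encode()
        digest = short_hash(candidate, nbytes)
        if digest in seen:
            return seen[digest], candidate, attempt + 1
        seen[digest] = candidate

first, second, tries = find_collision()
print(first, second, "found after", tries, "hashes")
```

The birthday effect suggests a collision after a few thousand hashes (on the order of 2^12) rather than anywhere near 2^24. Every extra bit of digest makes the search longer, and at full length the wait becomes hopeless.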

To find collisions using the hottest cryptographic hash function in town (the SHA2 family) is (currently) practically impossible. In cryptography, practically impossible means computationally infeasible. Which is a fancy way of saying that even if you used all the current computers, algorithms and known mathematics, it would take you so long to solve the problem that it wouldn’t matter any more. You could use all of Azure and Amazon EC2 to crunch your algorithms, but you would die before you succeeded, as would all of humanity and possibly the Earth too. Of course, a brainy breakthrough that exploited a fundamental flaw in the hash function, or a quantum computing revolution might give you a fighting chance, but until then..

You could also invent some new smarty pants Math that lets you find a collision in feasible time. Cryptographers live in abject terror of computational feasibility – the Freddy Krueger of their dreams.

Avoiding Collisions

Cryptographic hash functions are painstakingly designed to reduce the probability of collisions. If you peek at the code for a hash function, you will find it replete with bit operations like xors, bitwise and/ors, shifts and rotations. They operate on each bit, shoving and pushing and twisting the data with seemingly arbitrary, but carefully chosen and massively tested steps. A software blender using every bit in the original data to make a digital smoothie, with each smoothie having a taste of its own that incorporates the flavor of everything that went in. With values distributed more or less randomly (evenly) across all buckets. Small changes in the original triggering an avalanche of changes in the computed digest.
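The avalanche is easy to watch in Python. This sketch flips a single bit of the input and counts how many of the 256 digest bits change; the sample text and the helper name bits_changed are arbitrary.

```python
import hashlib

def bits_changed(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = bytearray(b"Toby is a Cairn Terrier of Distinction")
tampered = bytearray(original)
tampered[0] ^= 0x01  # flip one single bit of the input

digest_before = hashlib.sha256(original).digest()
digest_after = hashlib.sha256(tampered).digest()

changed = bits_changed(digest_before, digest_after)
print(changed, "of", len(digest_before) * 8, "digest bits changed")
```

You should see roughly half of the digest bits flip, which is what you would expect if the two digests had nothing to do with each other.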

The mathematics behind why or how any of this works is way over my balding head. The mathematics are actually so subtle and clever that they may include hidden flaws – either mistakes or deliberate weaknesses that a clever chap may exploit at a later time. This is why cryptographic hash functions are few and far between. Rock stars that hold sway for a while even as they are taken apart by brainiacs. Until one of them discovers a weakness. And so went MD4 and MD5, SHA0. And not so long ago, SHA1 also met its fate, even though it was compromised only in theory. Cryptographers are a paranoid bunch, which is just fine with me!

How encrypted email works

I’ve been working on the Direct Project for the past year or more. The Direct Project is a federally sponsored initiative that uses secure email as the foundation for the ubiquitous nationwide exchange of health information.

To secure an email, you have to, among other things, encrypt the message content. It is no surprise that many newcomers to Direct want to know how encrypted email works. Others, who are comfortable with classic message security, notice that unlike point to point messaging (one sender, one receiver), email is inherently multicast (one sender, many receivers). They ask: how do you encrypt email sent to multiple recipients?

In this inaugural posting for my new blog, I will try to answer both questions in plain English.

Encryption Basics

First, a quick refresher on encryption concepts:

  1. Key: An array of carefully generated bits, used to encrypt and decrypt data.
  2. Encryption: You use a key (secret) and a precise series of complicated steps (encryption algorithm or cipher) to mangle (encrypt) data into undecipherable gibberish.
  3. Decryption: You use a key (secret – hopefully the right one) and a precise series of complicated steps (decryption algorithm or cipher) to un-mangle (decrypt) gibberish back into your original data. If you use the wrong key, or the wrong algorithm, you turn the source gibberish into more gibberish.
  4. Symmetric Encryption: You use the same key to both encrypt and decrypt the data. Both the sender and the receiver have a copy of the same key – a shared secret. To share the secret, the sender and receiver must exchange their shared key securely – without an attacker getting a peek. If an attacker can somehow (silently) intercept an inadequately protected secret as it moves from sender to receiver (steaming open the envelope, so to speak), the attacker can also decrypt your encrypted data.
  5. Asymmetric Encryption: You use one key (public) to encrypt the data and an associated but different key (private) to decrypt the data. Data encrypted with your public key can only be decrypted with your associated private key. You boldly give the public part of your key pair to anybody you want to receive encrypted data from. You keep your private key secret and use it to decrypt data that people send you. Unlike symmetric encryption, there is no shared secret to exchange. You can distribute your public key to the entire world without fear. Data encrypted with your public key is truly for your eyes only – because only you can decrypt it with the secret private key that only you have. The reverse is also true. Data encrypted with your private key can only be decrypted using your public key. This property has important implications for digital signatures (more in future posts).

Symmetric and Asymmetric encryption work differently – they use different types of keys and different encryption/decryption algorithms.

Symmetric encryption is fast. Asymmetric encryption is slow.
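To make the key pair idea concrete, here is a toy Python sketch of asymmetric encryption. It is textbook RSA with no padding, and the key pair is built from two well-known Mersenne primes purely so the example runs with the standard library. Both are assumptions for illustration; real keys use secret, randomly generated primes and a vetted library. The helper names are made up.

```python
def toy_keypair(p, q, e=65537):
    """Textbook RSA from two primes: returns (public, private) as (n, exponent)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
    return (n, e), (n, d)

def apply_key(number: int, key) -> int:
    """In textbook RSA the same modular exponentiation serves both directions."""
    n, exponent = key
    return pow(number, exponent, n)

public_key, private_key = toy_keypair(2**521 - 1, 2**607 - 1)

text = b"Meet at the dog park"
secret = int.from_bytes(text, "big")

# Encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(secret, public_key)
recovered = apply_key(ciphertext, private_key).to_bytes(len(text), "big")
print(recovered)  # b'Meet at the dog park'

# The reverse also works: transform with the private key, undo with the public key.
sealed = apply_key(secret, private_key)
reverse_ok = apply_key(sealed, public_key) == secret
print(reverse_ok)  # True
```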

How does email encryption work?

Violet wants people to encrypt the email they send her. To help them do this, Violet creates a (public, private) key pair. She wraps up her public key in a secure package called an X509 Digital Certificate (more on this in future posts) and gives the certificate containing the public key to those she is corresponding with. To make it easy for others to find her public key, she even publishes her certificate in a public directory.

Violet’s good friend Toby decides to send her some encrypted email.

All Toby has to do is use Violet’s public key to encrypt the message, right? Wrong.

To use Violet’s public key to encrypt his email, Toby must use asymmetric encryption. Which, unfortunately, is slow. Toby cannot practically encrypt the content of his email using Violet’s asymmetric public key – it takes too much work!

To encrypt his email content, Toby needs a faster option – symmetric encryption. Toby generates a new symmetric encryption key and uses this key to efficiently encrypt the content of his email.

But how does Violet decrypt Toby’s email? To decrypt, Violet needs a copy of the symmetric encryption key, which she doesn’t have because Toby generated it on the fly and hasn’t given it to her yet! How does Toby securely send Violet a copy of his encryption key?

Toby cleverly solves the problem by attaching the encryption key to the email itself. The message brings its own key with it.

But isn’t that crazy? Anybody can now grab the key and decrypt the email, right? Wrong.

The clever Toby encrypts the symmetric encryption key before attaching it to the email. He does this using Violet’s public key, which he had obtained earlier. And even though this requires slow asymmetric encryption, the performance conscious Toby doesn’t mind because the encryption key is relatively small – usually only 256 bits long at most.

Toby sends his email to Violet. Naturally, Toby does not encrypt the addressing information on the message – the To & From – which have to travel in the clear, just like the addressing information on the envelope of a sealed snail-mail letter. Email servers use the addressing information to transport the email to its destination.

When Violet receives the email, she decrypts the attached encryption key using her private key. She then uses the encryption key to decrypt the email content and receives Toby’s friendly missive.
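The whole exchange fits in a short Python sketch. Everything cryptographic in it is a stand-in chosen so the example runs with only the standard library: the symmetric cipher is a toy that XORs the data with a SHA-256 based keystream, and the asymmetric step is textbook RSA with no padding and a key pair built from well-known Mersenne primes. None of that is what S/MIME actually specifies, and the helper names are made up. The shape of the protocol is the point.

```python
import hashlib
import secrets

def toy_keypair(p, q, e=65537):
    """Textbook RSA from two primes: returns (public, private) as (n, exponent)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # needs Python 3.8+
    return (n, e), (n, d)

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 counter keystream.
    The same call encrypts and decrypts. Illustration only, not for real use."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

violet_public, violet_private = toy_keypair(2**521 - 1, 2**607 - 1)

# --- Toby's side ---
body = b"Dear Violet, the squirrels are back."
session_key = secrets.token_bytes(32)              # fresh 256 bit symmetric key
encrypted_body = keystream_xor(session_key, body)  # fast symmetric step
n, e = violet_public
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)  # slow step, tiny input

# --- Violet's side ---
n, d = violet_private
unwrapped_key = pow(wrapped_key, d, n).to_bytes(32, "big")
decrypted_body = keystream_xor(unwrapped_key, encrypted_body)
print(decrypted_body)
```

Only the 32 byte key ever goes through the slow asymmetric step; the body, however long, gets the fast treatment.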

How do you encrypt email sent to multiple recipients?

Toby wants to send an email message to both Violet and Margaret. How does he encrypt this message?

Should Toby repeat the encryption process twice? Encrypt the email once for Violet and again for Margaret? And what happens if Toby also puts Gitanjali on the To line? Does Toby have to encrypt the message three times? And send out 3 different copies of the same message? Isn’t that getting really inefficient?

Toby has a much better idea. Just like before, he encrypts the email exactly once, using a symmetric encryption key. Then he attaches multiple copies of the same encryption key to the message – one for each recipient and encrypted with that recipient’s public key. Toby encrypts one copy of the encryption key with Violet’s public key. He encrypts a second copy with Margaret’s public key and third with Gitanjali’s. Then he attaches the 3 copies to the message.

When Margaret receives the email, she locates the copy of the encryption key that was intended for her. She decrypts the encryption key, then uses it to decrypt Toby’s note. Violet and Gitanjali do the same.

You can use the same technique to encrypt email sent to as many recipients as you like. Every new recipient merely means the small overhead of an additional attached copy of the encryption key.
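Here is the multi-recipient version as a Python sketch, under the same loud caveats as any toy: the symmetric cipher is a home-made XOR keystream, the asymmetric step is textbook RSA without padding, and each recipient's key pair is built from well-known Mersenne primes so that the example needs nothing beyond the standard library. The helper names are made up for illustration.

```python
import hashlib
import secrets

def toy_keypair(p, q, e=65537):
    """Textbook RSA from two primes: returns (public, private) as (n, exponent)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # needs Python 3.8+
    return (n, e), (n, d)

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (same call encrypts and decrypts). Illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Assumption: toy key pairs, one per recipient, stored as (public, private).
recipients = {
    "violet": toy_keypair(2**521 - 1, 2**607 - 1),
    "margaret": toy_keypair(2**127 - 1, 2**1279 - 1),
    "gitanjali": toy_keypair(2**89 - 1, 2**2203 - 1),
}

# Toby encrypts the body exactly once...
body = b"Walkies at noon?"
session_key = secrets.token_bytes(32)
encrypted_body = keystream_xor(session_key, body)

# ...then attaches one wrapped copy of the key per recipient.
key_as_int = int.from_bytes(session_key, "big")
wrapped_keys = {
    name: pow(key_as_int, public[1], public[0])
    for name, (public, _private) in recipients.items()
}

def open_email(name: str) -> bytes:
    """A recipient finds her copy of the key, unwraps it, and decrypts the body."""
    n, d = recipients[name][1]
    key = pow(wrapped_keys[name], d, n).to_bytes(32, "big")
    return keystream_xor(key, encrypted_body)

for name in recipients:
    print(name, open_email(name))
```

Each wrapped copy is different, because each was made with a different public key, but all three unwrap to the same symmetric key.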

S/MIME

You should now have a high level notion of how email encryption works. Those of you who are interested in the gory details should deep dive into S/MIME, the de facto standard for securing email. Please do peruse the S/MIME and Direct Transport specs for a bit by bit commentary.

It takes more than encryption to secure email. See my follow up posts to learn how:

Source Code

The open source Direct Project Reference implementation contains a full S/MIME and secure email implementation. To learn how to encrypt and sign email and email content in C#, check out the SMIME source code.