Stream ciphers¶
Description¶
A stream cipher is a symmetric-key encryption algorithm that encrypts a stream of bits. Ideally, that stream could be as long as we’d like; real-world stream ciphers have limits, but they are normally sufficiently large that they don’t pose a practical problem.
A naive attempt with block ciphers¶
Let’s try to build a stream cipher using the tools we already have. Since we already have block ciphers, we could simply divide an incoming stream into different blocks, and encrypt each block:
This scheme is called ECB mode (Electronic Code Book Mode), and it is one of the many ways that block ciphers can be used to construct stream ciphers. Unfortunately, while being very common in home-grown cryptosystems, it poses very serious security flaws. For example, in ECB mode, identical input blocks will always map to identical output blocks:
At first, this might not seem like a particularly serious problem. Assuming the block cipher is secure, it doesn’t look like an attacker would be able to decrypt anything. By dividing the ciphertext stream up into blocks, an attacker would only be able to see that a ciphertext block, and therefore a plaintext block, was repeated.
We’ll now illustrate the many flaws of ECB mode with two attacks. First, we’ll exploit the fact that repeating plaintext blocks result in repeating ciphertext blocks, by visually inspecting an encrypted image. Then, we’ll demonstrate that attackers can often decrypt messages encrypted in ECB mode by communicating with the person performing the encryption.
Visual inspection of an encrypted stream¶
To demonstrate that this is, in fact, a serious problem, we’ll use a simulated block cipher of various block sizes and apply it to an image 1. We’ll then visually inspect the different outputs.
- 1
This particular demonstration only works on uncompressed bitmaps. For other media, the effect isn’t significantly less damning: it’s just less visual.
Because identical blocks of pixels in the plaintext will map to identical blocks of pixels in the ciphertext, the global structure of the image is largely preserved.
As you can see, the situation appears to get slightly better with larger block sizes, but the fundamental problem still remains: the macrostructure of the image remains visible in all but the most extreme block sizes. Furthermore, all but the smallest of these block sizes are unrealistically large. For an uncompressed bitmap with three color channels of 8 bit depth, each pixel takes 24 bits to store. Since the block size of AES is only 128 bits, that would equate to or just over 5 pixels per block. That’s significantly fewer pixels per block than the larger block sizes in the example. But AES is the workhorse of modern block ciphers—it can’t be at fault, certainly not because of an insufficient block size.
When we look at a picture of what would happen with an idealized encryption scheme, we notice that it looks like random noise. Keep in mind that “looking like random noise” doesn’t mean something is properly encrypted: it just means that we can’t inspect it using methods this trivial.
Encryption oracle attack¶
In the previous section, we’ve focused on how an attacker can inspect a ciphertext encrypted using ECB mode. That’s a passive, ciphertext-only attack. It’s passive because the attacker doesn’t really interfere in any communication; they’re simply examining a ciphertext. In this section, we’ll study a different, active attack, where the attacker actively communicates with their target. We’ll see how the active attack can enable an attacker to decrypt ciphertexts encrypted using ECB mode.
To do this, we’ll introduce a new concept called an oracle. Formally defined oracles are used in the study of computer science, but for our purposes it’s sufficient to just say that an oracle is something that will compute some particular function for you.
In our case, the oracle will perform a specific encryption for the attacker, which is why it’s called an encryption oracle. Given some data chosen by the attacker, the oracle will encrypt that data, followed by a secret suffix , in ECB mode. Or, in symbols:
The secret suffix is specific to this system. The attacker’s goal is to decrypt it. We’ll see that being able to encrypt other messages surprisingly allows the attacker to decrypt the suffix. This oracle might seem artificial, but is quite common in practice. A simple example would be a cookie encrypted with ECB, where the prefix is a name or an e-mail address field, controlled by the attacker.
You can see why the concept of an oracle is important here: the attacker would not be able to compute themselves, since they do not have access to the encryption key or the secret suffix . The goal of the oracle is for those values to remain secret, but we’ll see how an attacker will be able to recover the secret suffix (but not the key ) anyway. The attacker does this by inspecting the ciphertext for many carefully chosen values of the attacker-chosen prefix .
Assuming that an attacker would have access to such an oracle might seem like a very artificial scenario. It turns out that in practice, a lot of software can be tricked into behaving like one. Even if an attacker can’t control the real software as precisely as they can query an oracle, the attacker generally isn’t thwarted. Time is on their side: they only have to convince the software to give the answer they want once. Systems where part of the message is secret and part of the message can be influenced by the attacker are actually very common, and, unfortunately, so is ECB mode.
Decrypting a block using the oracle¶
The attacker starts by sending in a plaintext that’s just one byte shorter than the block size. That means the block that’s being encrypted will consist of those bytes, plus the first byte of , which we’ll call . The attacker remembers the encrypted block. They don’t know the value of yet, but now they do know the value of the first encrypted block: . In the illustration, this is block :
Then, the attacker tries a full-size block, trying all possible values for the final byte. Eventually, they’ll find the value of ; they know the guess is correct because the resulting ciphertext block will match the ciphertext block they remembered earlier.
The attacker can repeat this for the penultimate byte. They submit a plaintext that’s two bytes shorter than the block size. The oracle will encrypt a first block consisting of that followed by the first two bytes of the secret suffix, . The attacker remembers that block.
Since the attacker already knows , they try followed by all possible values of . Eventually they’ll guess correctly, which, again, they’ll know because the ciphertext blocks match:
The attacker can then rinse and repeat, eventually decrypting an entire block. This allows them to brute-force a block in attempts, where is the number of possible values for each byte (so, for 8-bit bytes, that’s ) and is the block size. This is much better than a regular brute-force attack, where an attacker has to try all of the possible blocks, which would be:
For a typical block size of 16 bytes (or 128 bits), brute forcing would mean trying combinations. That’s a huge, 39-digit number. It’s so large that trying all of those combinations is considered impossible. An ECB encryption oracle allows an attacker to do it in at most tries, a far more manageable number.
Conclusion¶
In the real world, block ciphers are used in systems that encrypt large amounts of data all the time. We’ve seen that when using ECB mode, an attacker can both analyze ciphertexts to recognize repeating patterns, and even decrypt messages when given access to an encryption oracle.
Even when we use idealized block ciphers with unrealistic properties, such as block sizes of more than a thousand bits, an attacker ends up being able to decrypt the ciphertexts. Real world block ciphers only have more limitations than our idealized examples, such as much smaller block sizes.
We aren’t even taking into account any potential weaknesses in the block cipher. It’s not AES (or our test block ciphers) that cause this problem, it’s our ECB construction. Clearly, we need something better.
Block cipher modes of operation¶
One of the more common ways of producing a stream cipher is to use a block cipher in a particular configuration. The compound system behaves like a stream cipher. These configurations are commonly called mode of operations. They aren’t specific to a particular block cipher.
ECB mode, which we’ve just seen, is the simplest such mode of operation.
The letters ECB
stand for electronic code book 2. For reasons
we’ve already gone into, ECB mode is very ineffective. Fortunately,
there are plenty of other choices.
- 2
Traditionally, modes of operation seem to be referred to by a three-letter acronym.
CBC mode¶
CBC mode, which stands for cipher block chaining, is a very common mode of operation where plaintext blocks are XORed with the previous ciphertext block before being encrypted by the block cipher.
Of course, this leaves us with a problem for the first plaintext block: there is no previous ciphertext block to XOR it with. Instead, we pick an IV: a random number that takes the place of the “first” ciphertext in this construction. initialization vectors also appear in many other algorithms. An initialization vector should be unpredictable; ideally, they will be cryptographically random. They do not have to be secret: IVs are typically just added to ciphertext messages in plaintext. It may sound contradictory that something has to be unpredictable, but doesn’t have to be secret; it’s important to remember that an attacker must not be able to predict ahead of time what a given IV will be. We will illustrate this later with an attack on predictable CBC IVs.
The following diagram demonstrates encryption in CBC mode:
Decryption is the inverse construction, with block ciphers in decryption mode instead of encryption mode:
While CBC mode itself is not inherently insecure (unlike ECB mode), its particular use in TLS 1.0 was. This eventually led to the BEAST attack, which we’ll cover in more detail in the section on SSL/TLS. The short version is that instead of using unpredictable initialization vectors, for example by choosing random IVs, the standard used the previous ciphertext block as the IV for the next message. Unfortunately, it turns out that attackers figured out how to exploit that property.
Attacks on CBC mode with predictable IVs¶
Suppose there’s a database that stores secret user information, like medical, payroll or even criminal records. In order to protect that information, the server that handles it encrypts it using a strong block cipher in CBC mode with a fixed key. For now, we’ll assume that that server is secure, and there’s no way to get it to leak the key.
Mallory gets a hold of all of the rows in the database. Perhaps she did it through a SQL injection attack, or maybe with a little social engineering. 3 Everything is supposed to remain secure: Mallory only has the ciphertexts, but she doesn’t have the secret key.
- 3
Social engineering means tricking people into things they shouldn’t be doing, like giving out secret keys, or performing certain operations. It’s usually the most effective way to break otherwise secure cryptosystems.
Mallory wants to figure out what Alice’s record says. For simplicity’s sake, let’s say there’s only one ciphertext block. That means Alice’s ciphertext consists of an IV and one ciphertext block.
Mallory can still try to use the application as a normal user, meaning that the application will encrypt some data of Mallory’s choosing and write it to the database. Suppose that through a bug in the server, Mallory can predict the IV that will be used for her ciphertext. Perhaps the server always uses the same IV for the same person, or always uses an all-zero IV, or…
Mallory can construct her plaintext using Alice’s IV (which Mallory can see) and her own predicted IV . She makes a guess as to what Alice’s data could be. She asks the server to encrypt:
The server dutifully encrypts that message using the predicted IV . It computes:
That ciphertext, CM, is exactly the ciphertext block Alice would have had if her plaintext block was G. So, depending on what the data is, Mallory has figured out if Alice has a criminal record or not, or perhaps some kind of embarrassing disease, or some other issue that Alice really expected the server to keep secret.
Lessons learned: don’t let IVs be predictable. Also, don’t roll your own cryptosystems. In a secure system, Alice and Mallory’s records probably wouldn’t be encrypted using the same key.
Attacks on CBC mode with the key as the IV¶
Many CBC systems set the key as the initialization vector. This seems like a good idea: you always need a shared secret key already anyway. It yields a nice performance benefit, because the sender and the receiver don’t have to communicate the IV explicitly, they already know the key (and therefore the IV) ahead of time. Plus, the key is definitely unpredictable because it’s secret: if it were predictable, the attacker could just predict the key directly and already have won. Conveniently, many block ciphers have block sizes that are the same length or less than the key size, so the key is big enough.
This setup is completely insecure. If Alice sends a message to Bob, Mallory, an active adversary who can intercept and modify the message, can perform a chosen ciphertext attack to recover the key.
Alice turns her plaintext message into three blocks and encrypts it in CBC mode with the secret key and also uses as the IV. She gets a three block ciphertext , which she sends to Bob.
Before the message reaches Bob, Mallory intercepts it. She modifies the message to be , where is a block filled with null bytes (value zero).
Bob decrypts , and gets the three plaintext blocks :
is some random block. Its value doesn’t matter.
Under the chosen-ciphertext attack assumption, Mallory recovers that decryption. She is only interested in the first block () and the third block (). By XORing those two together, she finds . But, the IV is the key, so Mallory successfully recovered the key by modifying a single message.
Lesson learned: don’t use the key as an IV. Part of the fallacy in the introduction is that it assumed secret data could be used for the IV, because it only had to be unpredictable. That’s not true: “secret” is just a different requirement from “not secret”, not necessarily a stronger one. It is not generally okay to use secret information where it isn’t required, precisely because if it’s not supposed to be secret, the algorithm may very well treat it as non-secret, as is the case here. There are plenty of systems where it is okay to use a secret where it isn’t required. In some cases you might even get a stronger system as a result, but the point is that it is not generally true, and depends on what you’re doing.
CBC bit flipping attacks¶
An interesting attack on CBC mode is called a bit flipping attack. Using a CBC bit flipping attack, attackers can modify ciphertexts encrypted in CBC mode so that it will have a predictable effect on the plaintext.
This may seem like a very strange definition of “attack” at first. The attacker will not even attempt to decrypt any messages, but they will just be flipping some bits in a plaintext. We will demonstrate that the attacker can turn the ability to flip some bits in the plaintext into the ability to have the plaintext say whatever they want it to say, and, of course, that can lead to very serious problems in real systems.
Suppose we have a CBC encrypted ciphertext. This could be, for example, a cookie. We take a particular ciphertext block, and we flip some bits in it. What happens to the plaintext?
When we “flip some bits”, we do that by XORing with a sequence of bits, which we’ll call . If the corresponding bit in is 1, the bit will be flipped; otherwise, the bit will remain the same.
When we try to decrypt the ciphertext block with the flipped bits, we will get indecipherable 4 nonsense. Remember how CBC decryption works: the output of the block cipher is XORed with the previous ciphertext block to produce the plaintext block. Now that the input ciphertext block has been modified, the output of the block cipher will be some random unrelated block, and, statistically speaking, nonsense. After being XORed with that previous ciphertext block, it will still be nonsense. As a result, the produced plaintext block is still just nonsense. In the illustration, this unintelligible plaintext block is .
- 4
Excuse the pun.
However, in the block after that, the bits we flipped in the ciphertext will be flipped in the plaintext as well! This is because, in CBC decryption, ciphertext blocks are decrypted by the block cipher, and the result is XORed with the previous ciphertext block. But since we modified the previous ciphertext block by XORing it with , the plaintext block will also be XORed with . As a result, the attacker completely controls that plaintext block , since they can just flip the bits that aren’t the value they want them to be.
TODO: add previous illustration, but mark the path X takes to influence P prime {i + 1} in red or something
This may not sound like a huge deal at first. If you don’t know the plaintext bytes of that next block, you have no idea which bits to flip in order to get the plaintext you want.
To illustrate how attackers can turn this into a practical attack, let’s consider a website using cookies. When you register, your chosen user name is put into a cookie. The website encrypts the cookie and sends it to your browser. The next time your browser visits the website, it will provide the encrypted cookie; the website decrypts it and knows who you are.
An attacker can often control at least part of the plaintext being
encrypted. In this example, the user name is part of the plaintext of
the cookie. Of course, the website just lets you provide whatever value
for the user name you want at registration, so the attacker can just add
a very long string of Z
bytes to their user name. The server will
happily encrypt such a cookie, giving the attacker an encrypted
ciphertext that matches a plaintext with many such Z
bytes in them.
The plaintext getting modified will then probably be part of that
sequence of Z
bytes.
An attacker may have some target bytes that they’d like to see in the
decrypted plaintext, for example, ;admin=1;
. In order to figure out
which bytes they should flip (so, the value of in the
illustration), they just XOR the filler bytes (~ZZZ~…) with that target.
Because two XOR operations with the same value cancel each other out,
the two filler values (~ZZZ~…) will cancel out, and the attacker can
expect to see ;admin=1;
pop up in the next plaintext block:
This attack is another demonstration of an important cryptographic principle: encryption is not authentication! It’s virtually never sufficient to simply encrypt a message. It may prevent an attacker from reading it, but that’s often not even necessary for the attacker to be able to modify it to say whatever they want it to. This particular problem would be solved by also securely authenticating the message. We’ll see how you can do that later in the book; for now, just remember that we’re going to need authentication in order to produce secure cryptosystems.
Padding¶
So far, we’ve conveniently assumed that all messages just happened to fit exactly in our system of block ciphers, be it CBC or ECB. That means that all messages happen to be a multiple of the block size, which, in a typical block cipher such as AES, is 16 bytes. Of course, real messages can be of arbitrary length. We need some scheme to make them fit. That process is called padding.
Padding with zeroes (or some other pad byte)¶
One way to pad would be to simply append a particular byte value until the plaintext is of the appropriate length. To undo the padding, you just remove those bytes. This scheme has an obvious flaw: you can’t send messages that end in that particular byte value, or you will be unable to distinguish between padding and the actual message.
PKCS#5/PKCS#7 padding¶
A better, and much more popular scheme, is PKCS#5/PKCS#7 padding.
PKCS#5, PKCS#7 and later CMS padding are all more or less the same
idea 5. Take the number of bytes you have to pad, and pad them with
that many times the byte with that value. For example, if the block size
is 8 bytes, and the last block has the three bytes 12 34 45
, the
block becomes 12 34 45 05 05 05 05 05
after padding.
- 5
Technically, PKCS#5 padding is only defined for 8 byte block sizes, but the idea clearly generalizes easily, and it’s also the most commonly used term.
If the plaintext happened to be exactly a multiple of the block size, an entire block of padding is used. Otherwise, the recipient would look at the last byte of the plaintext, treat it as a padding length, and almost certainly conclude the message was improperly padded.
This scheme is described in [Hou].
CBC padding attacks¶
We can refine CBC bit flipping attacks to trick a recipient into decrypting arbitrary messages!
As we’ve just discussed, CBC mode requires padding the message to a multiple of the block size. If the padding is incorrect, the recipient typically rejects the message, saying that the padding was invalid. We can use that tiny bit of information about the padding of the plaintext to iteratively decrypt the entire message.
The attacker will do this, one ciphertext block at a time, by trying to get an entire plaintext block worth of valid padding. We’ll see that this tells them the decryption of their target ciphertext block, under the block cipher. We’ll also see that you can do this efficiently and iteratively, just from that little leak of information about the padding being valid or not.
It may be helpful to keep in mind that a CBC padding attack does not actually attack the padding for a given message; instead the attacker will be constructing paddings to decrypt a message.
To mount this attack, an attacker only needs two things:
A target ciphertext to decrypt
A padding oracle: a function that takes ciphertexts and tells the attacker if the padding was correct
As with the ECB encryption oracle, the availability of a padding oracle may sound like a very unrealistic assumption. The massive impact of this attack proves otherwise. For a long time, most systems did not even attempt to hide if the padding was valid or not. This attack remained dangerous for a long time after it was originally discovered, because it turns out that in many systems it is extremely difficult to actually hide if padding is valid or not. We will go into this problem in more detail both in this chapter and in later chapters.
In this chapter, we’ll assume that PKCS#5/PKCS#7 padding is being used, since that’s the most popular option. The attack is general enough to work on other kinds of padding, with minor modifications.
Decrypting the first byte¶
The attacker fills a block with arbitrary bytes . They also pick a target block from the ciphertext that they’d like to decrypt. The attacker asks the padding oracle if the plaintext of has valid padding. Statistically speaking, such a random plaintext probably won’t have valid padding: the odds are in the half-a-percent ballpark. If by pure chance the message happens to already have valid padding, the attacker can simply skip the next step.
Next, the attacker tries to modify the message so that it does have
valid padding. They can do that by indirectly modifying the last byte of
the plaintext: eventually that byte will be 01
, which is always
valid padding. In order to modify the last byte of a plaintext block,
the attacker modifies the last byte of the previous ciphertext block.
This works exactly like it did with CBC bit flipping attacks. That
previous ciphertext block is the block , so the byte being
modified is the last byte of , .
The attacker tries all possible values for that last byte. There are several ways of doing that: modular addition, XORing it with all values up to 256, or even picking randomly; the only thing that matters is that the attacker tries all of them. Eventually, the padding oracle will report that for some ciphertext block , the decrypted plaintext of has valid padding.
Discovering the padding length¶
The oracle has just told the attacker that for our chosen value of , the plaintext of has valid padding. Since we’re working with PKCS#5 padding, that means that the plaintext block ends in one of the following byte sequences:
01
02 02
03 03 03
…
The first option (01
) is much more likely than the others, since it
only requires one byte to have a particular value. The attacker is
modifying that byte to take every possible value, so it is quite
likely that they happened to stumble upon 01
. All of the other valid
padding options not only require that byte to have some particular
value, but also one or more other bytes. For an attacker to be
guaranteed a message with a valid 01
padding, they just have to try
every possible byte. For an attacker to end up with a message with a
valid 02 02
padding, they have to try every possible byte and
happen to have picked a combination of and that
causes the plaintext to have a 02
in that second-to-last position.
(To rephrase: the second-to-last byte of the decryption of the
ciphertext block, XORed with the second-to-last byte of , is
02
.)
In order to successfully decrypt the message, we still need to figure out which one of those options is the actual value of the padding. To do that, we try to discover the length of the padding by modifying bytes starting at the left-hand side of until the padding becomes invalid again. As with everything else in this attack, we modify those bytes in by modifying the equivalent bytes in our chosen block . As soon as padding breaks, you know that the last byte you modified was part of the valid padding, which tells you how many padding bytes there are. Since we’re using PKCS#5 padding, that also tells you what their value is.
Let’s illustrate this with an example. Suppose we’ve successfully found
some block so that the plaintext of has valid
padding. Let’s say that padding is 03 03 03
. Normally, the attacker
wouldn’t know this; the point of this procedure is to discover what that
padding is. Suppose the block size is 8 bytes. So, we (but not the
attacker) know that is currently:
In that equation, are some bytes of the plaintext. Their actual value doesn’t matter: the only thing that matters is that they’re not part of the padding. When we modify the first byte of , we’ll cause a change in the first byte of , so that becomes some other byte :
As you can see, this doesn’t affect the validity of the padding. It also
does not affect , , or .
However, when we continue modifying subsequent bytes, we will eventually
hit a byte that is part of the padding. For example, let’s say we turn
that first 03
into 02
by modifying . now
looks like this:
Since 02 03 03
isn’t valid PKCS#5 padding, the server will reject
the message. At that point, we know that once we modify six bytes, the
padding breaks. That means the sixth byte is the first byte of the
padding. Since the block is 8 bytes long, we know that the padding
consists of the sixth, seventh and eighth bytes. So, the padding is
three bytes long, and, in PKCS#5, equal to 03 03 03
.
A clever attacker who’s trying to minimize the number of oracle queries
can leverage the fact that longer valid padding becomes progressively
more rare. They can do this by starting from the penultimate byte
instead of the beginning of the block. The advantage to this method is
that short paddings (which are more common) are detected more quickly.
For example, if the padding is 0x01
and an attacker starts modifying
the penultimate byte, they only need one query to learn what the padding
was. If the penultimate byte is changed to any other value and the
padding is still valid, the padding must be 0x01
. If the padding is
not valid, the padding must be at least 0x02 0x02
. So, they go back
to the original block and start modifying the third byte from the back.
If that passes, the padding was indeed 0x02 0x02
, otherwise the
padding must be at least 0x03 0x03 0x03
. The process repeats until
they’ve found the correct length. This is a little trickier to
implement; you can’t just keep modifying the same block (if it’s
mutable), and you’re waiting for the oracle to fail instead of pass,
which can be confusing. But other than being faster at the cost of being
slightly more complex, this technique is equivalent to the one described
above.
For the next section, we’ll assume that it was just 01
, since that
is the most common case. The attack doesn’t really change depending on
the length of the padding. If you guess more bytes of padding correctly,
that just means that there are fewer remaining bytes you will have to
guess manually. (This will become clear once you understand the rest of
the attack.)
Decrypting one byte¶
At this point, the attacker has already successfully decrypted the last
byte of the target block of ciphertext! Actually, we’ve decrypted as
many bytes as we have valid padding; we’re just assuming the worst case
scenario where there is only a single byte. How? The attacker knows that
the last byte of the decrypted ciphertext block (we’ll call
that byte ), XORed with the iteratively found value
, is 01
:
By moving the XOR operation to the other side, the attacker gets:
The attacker has now tricked the receiver into revealing the value of the last byte of the block cipher decryption of .
Decrypting subsequent bytes¶
Next, the attacker tricks the receiver into decrypting the next byte.
Remember the previous equation, where we reasoned that the last byte of
the plaintext was 01
:
Now, we’d like to get that byte to say 02
, to produce an almost
valid padding: the last byte would be correct for a 2-byte PKCS#5
padding (02 02
), but that second-to-last byte probably isn’t 02
yet. To do that, we XOR with 01
to cancel the 01
that’s already
there (since two XORs with the same value cancel each other out), and
then we XOR with 02
to get 02
:
So, to produce a value of 02
in the final position of the decrypted
plaintext, the attacker replaces with:
This accomplishes the goal of almost valid padding. Then, they try all
possible values for the second-to-last byte (index ).
Eventually, one of them will cause the message to have valid padding.
Since we modified the random block so that the final byte of the
plaintext will be 02
, the only byte in the second-to-last position
that can cause valid padding is 02
as well. Using the same math as
above, the attacker has recovered the second-to-last byte.
Then, it’s just rinse and repeat. The last two bytes are modified to
create an almost-valid padding of 03 03
, then the third byte from
the right is modified until the padding is valid, and so on. Repeating
this for all the bytes in the block means the attacker can decrypt the
entire block; repeating it for different blocks means the attacker can
read the entire message.
This attack has proven to be very subtle and hard to fix. First of all, messages should be authenticated, as well as encrypted. That would cause modified messages to be rejected. However, many systems decrypt (and remove padding) before authenticating the message; so the information about the padding being valid or not has already leaked. We will discuss secure ways of authenticating messages later in the book.
You might consider just getting rid of the “invalid padding” message; declaring the message invalid without specifying why it was invalid. That turns out to only be a partial solution for systems that decrypt before authenticating. Those systems would typically reject messages with an invalid padding slightly faster than messages with a valid padding. After all, they didn’t have to do the authentication step: if the padding is invalid, the message can’t possibly be valid. An attack that leaks secret information through timing differences is called a timing attack, which is a special case of a side-channel attack: attacks on the practical implementation of a cryptosystem rather than its “perfect” abstract representation. We will talk about these kinds of attacks more later in the book.
That discrepancy was commonly exploited as well. By measuring how long it takes the recipient to reject the message, the attacker can tell if the recipient performed the authentication step. That tells them if the padding was correct or not, providing the padding oracle to complete the attack.
The principal lesson learned here is, again, not to design your own cryptosystems. The main way to avoid this particular problem is by performing constant time authentication, and authenticating the ciphertext before decrypting it. We will talk more about this in a later chapter on message authentication.
Native stream ciphers¶
In addition to block ciphers being used in a particular mode of operation, there are also “native” stream ciphers algorithms that are designed from the ground up to be a stream cipher.
The most common type of stream cipher is called a synchronous stream cipher. These algorithms produce a long stream of pseudorandom bits from a secret symmetric key. This stream, called the keystream, is then XORed with the plaintext to produce the ciphertext. Decryption is the identical operation as encryption, just repeated: the keystream is produced from the key, and is XORed with the ciphertext to produce the plaintext.
You can see how this construction looks quite similar to a one-time pad, except that the truly random one-time pad has been replaced by a pseudorandom stream cipher.
There are also asynchronous or self-synchronizing stream ciphers, where the previously produced ciphertext bits are used to produce the current keystream bit. This has the interesting consequence that a receiver can eventually recover if some ciphertext bits are dropped. This is generally not considered to be a desirable property anymore in modern cryptosystems, which instead prefer to send complete, authenticated messages. As a result, these stream ciphers are very rare, and we don’t talk about them explicitly in this book. Whenever someone says “stream cipher”, it’s safe to assume they mean the synchronous kind.
Historically, native stream ciphers have had their issues. NESSIE, an international competition for new cryptographic primitives, for example, did not result in any new stream ciphers, because all of the participants were broken before the competition ended. RC4, one of the most popular native stream ciphers, has had serious known issues for years. By comparison, some of the constructions using block ciphers seem bulletproof.
Fortunately, more recently, several new cipher algorithms provide new hope that we can get practical, secure and performant stream ciphers.
RC4¶
By far the most common native stream cipher in common use on desktop and mobile devices is RC4.
RC4 is sometimes also called ARCFOUR or ARC4, which stands for alleged RC4. While its source code has been leaked and its implementation is now well-known, RSA Security (the company that authored RC4 and still holds the RC4 trademark) has never acknowledged that it is the real algorithm.
It quickly became popular because it’s very simple and very fast. It’s not just extremely simple to implement, it’s also extremely simple to apply. Being a synchronous stream cipher, there’s little that can go wrong; with a block cipher, you’d have to worry about things like modes of operation and padding. Clocking in at around 13.9 cycles per byte, it’s comparable to AES-128 in CTR (12.6 cycles per byte) or CBC (16.0 cycles per byte) modes. AES came out a few years after RC4; when RC4 was designed, the state of the art was 3DES, which was excruciatingly slow by comparison (134.5 cycles per byte in CTR mode). [Dai]
An in-depth look at RC4¶
This is an optional, in-depth section. It almost certainly won’t help you write better software, so feel free to skip it. It is only here to satisfy your inner geek’s curiosity.
On the other hand, RC4 is incredibly simple, and it may be worth skimming this section.
RC4 is, unfortunately, quite broken. To better understand just how broken, we’ll take a look at how RC4 works. The description requires understanding modular addition; if you aren’t familiar with it, you may want to review the appendix on modular addition.
Everything in RC4 revolves around a state array and two indexes into that array. The array consists of 256 bytes forming a permutation: that is, all possible index values occur exactly once as a value in the array. That means it maps every possible byte value to every possible byte value: usually different, but sometimes the same one. We know that it’s a permutation because starts as one, and all operations that modify always swap values, which obviously keeps it a permutation.
RC4 consists of two major components that work on two indexes and the state array :
The key scheduling algorithm, which produces an initial state array for a given key.
The pseudorandom generator, which produces the actual keystream bytes from the state array which was produced by the key scheduling algorithm. The pseudorandom generator itself modifies the state array as it produces keystream bytes.
The key scheduling algorithm¶
The key scheduling algorithm starts with the identity permutation. That means that each byte is mapped to itself.
Then, the key is mixed into the state. This is done by letting index iterate over every element of the state. The index is found by adding the current value of (starting at 0) with the next byte of the key, and the current state element:
Once has been found, and are swapped:
This process is repeated for all the elements of . If you run out of key bytes, you just wrap around on the key. This explains why RC4 accepts keys from anywhere between 1 and 256 bytes long. Usually, 128 bit (16 byte) keys are used, which means that each byte in the key is used 16 times.
Or, in Python:
from itertools import cycle
def key_schedule(key):
s = range(256)
key_bytes = cycle(ord(x) for x in key)
j = 0
for i in range(256):
j = (j + s[i] + next(key_bytes)) % 256
s[i], s[j] = s[j], s[i]
return s
The pseudorandom generator¶
The pseudorandom generator is responsible for producing pseudorandom bytes from the state . These bytes form the keystream, and are XORed with the plaintext to produce the ciphertext. For each index , it computes ( starts at 0). Then, and are swapped:
To produce the output byte, and are added together. Their sum is used as an index into ; the value at is the keystream byte :
We can express this in Python:
def pseudorandom_generator(s):
j = 0
for i in cycle(range(256)):
j = (j + s[i]) % 256
s[i], s[j] = s[j], s[i]
k = (s[i] + s[j]) % 256
yield s[k]
Attacks¶
This is an optional, in-depth section. It almost certainly won’t help you write better software, so feel free to skip it. It is only here to satisfy your inner geek’s curiosity.
The section on the attacks on RC4 is a good deal more complicated than RC4 itself, so you may want to skip this even if you’ve read this far.
There are many attacks on RC4-using cryptosystems where RC4 isn’t really the issue, but are caused by things like key reuse or failing to authenticate the message. We won’t discuss these in this section. Right now, we’re only talking about issues specific to the RC4 algorithm itself.
Intuitively, we can understand how an ideal stream cipher would produce a stream of random bits. After all, if that’s what it did, we’d end up in a situation quite similar to that of a one-time pad.
The stream cipher is ideal if the best way we have to attack it is to try all of the keys, a process called brute-forcing the key. If there’s an easier way, such as through a bias in the output bytes, that’s a flaw of the stream cipher.
Throughout the history of RC4, people have found many such biases. In the mid-nineties, Andrew Roos noticed two such flaws:
The first three bytes of the key are correlated with the first byte of the keystream.
The first few bytes of the state are related to the key with a simple (linear) relation.
For an ideal stream cipher, the first byte of the keystream should tell me nothing about the key. In RC4, it gives me some information about the first three bytes of the key. The latter seems less serious: after all, the attacker isn’t supposed to know the state of the cipher.
As always, attacks never get worse. They only get better.
Adi Shamir and Itsik Mantin showed that the second byte produced by the cipher is twice as likely to be zero as it should be. Other researchers showed similar biases in the first few bytes of the keystream. This sparked further research by Mantin, Shamir and Fluhrer, showing large biases in the first bytes of the keystream. [FMS01] They also showed that knowing even small parts of the key would allow attackers to make strong predictions about the state and outputs of the cipher. Unlike RC4, most modern stream ciphers provide a way to combine a long-term key with a nonce (a number used once), to produce multiple different keystreams from the same long-term key. RC4, by itself, doesn’t do that. The most common approach was also the simplest: concatenate 6 the long-term key with the nonce : , taking advantage of RC4’s flexible key length requirements. In this context, concatenation means the bits of are appended to the bits of . This scheme meant attackers could recover parts of the combined key, eventually allowing them to slowly recover the long-term key from a large amount of messages (around to , or tens of millions of messages).
- 6
Here we use as the operator for concatenation. Other common symbols for concatenation include (for some programming languages, such as Python) and ⋅ (for formal languages).
WEP, a standard for protecting wireless networks that was popular at the time, was heavily affected by this attack, because it used this simplistic nonce combination scheme. A scheme where the long-term key and the nonce had been securely combined (for example using a key derivation function or a cryptographic hash function) wouldn’t have had this weakness. Many other standards including TLS were therefore not affected.
Again, attacks only get better. Andreas Klein showed more extensive correlation between the key and the keystream. [Kle08] Instead of tens of millions of messages with the Fluhrer, Mantin, Shamir attacks, attackers now only needed several tens of thousands of messages to make the attack practical. This was applied against WEP with great effect.
In 2013, a team of researchers at Royal Holloway in London produced a combination of two independent practical attacks [ABP+]. These attacks proved to be very damning for RC4: while RC4’s weaknesses had been known for a long time, they finally drove the point home for everyone that it really shouldn’t be used anymore.
The first attack is based on single-byte biases in the first 256 bytes of the keystream. By performing statistical analysis on the keystreams produced by a large number of keys, they were able to analyze the already well-known biases in the early keystream bytes of RC4 in much greater detail.
TODO: illustrate: http://www.isg.rhul.ac.uk/tls/RC4_keystream_dist_2_45.txt
The second attack is based on double byte biases anywhere in the keystream. It turns out that adjacent bytes of the keystream have an exploitable relation, whereas in an ideal stream cipher you would expect them to be completely independent.
Byte pair |
Byte position (mod 256) |
Probability |
---|---|---|
This table may seem a bit daunting at first. The probability expression in the rightmost column may look a bit complex, but there’s a reason it’s expressed that way. Suppose that RC4 was a good stream cipher, and all values occurred with equal probability. Then you’d expect the probability for any given byte value to be since there are different byte values. If RC4 was a good stream cipher, two adjacent bytes would each have probability , so any given pair of two bytes would have probability . However, RC4 isn’t an ideal stream cipher, so these properties aren’t true. By writing the probability in the form, it’s easier to see how much RC4 deviates from what you’d expect from an ideal stream cipher.
So, let’s try to read the first line of the table. It says that when the first byte of any 256-byte chunk from the cipher is , then the byte following it is slightly more likely ( times as likely, to be exact) to be 0 than for it to be any other number. We can also see that when one of the keystream bytes is , you can make many predictions about the next byte, depending on where it occurs in the keystream. It’s more likely to be , or the position in the keystream plus one or two.
TODO: demonstrate attack success
Again, attacks only get better. These attacks have primarily focused on the cipher itself, and haven’t been fully optimized for practical attacks on, say, web services. The attacks can be greatly improved with some extra information about the plaintext you’re attempting to recover. For example, HTTP cookies are often base-64 or hex encoded.
There’s no way around it: we need to stop using RC4. Fortunately, we’ve also developed many secure alternatives. The continuing advances in cryptanalysis of RC4 helped contribute to a sense of urgency regarding the improvement of commonly available cryptographic primitives. Throughout 2013 in particular, this led to large improvements in, for example, browser cryptography (we will discuss browser cryptography, notably SSL/TLS, in a later chapter).
Salsa20¶
Salsa20 is a newer stream cipher designed by Dan Bernstein. Bernstein is well-known for writing a lot of open source (public domain) software, most of which is either directly security related or built with information security very much in mind.
There are two minor variants of Salsa20, called Salsa20/12 and Salsa20/8, which are simply the same algorithm except with 12 and 8 rounds 7 respectively, down from the original 20. ChaCha is another, orthogonal tweak of the Salsa20 cipher, which tries to increase the amount of diffusion per round while maintaining or improving performance. ChaCha doesn’t have a “20” after it; specific algorithms do have a number after them (ChaCha8, ChaCha12, ChaCha20), which refers to the number of rounds.
- 7
Rounds are repetitions of an internal function. Typically a number of rounds are required to make an algorithm work effectively; attacks often start on reduced-round versions of an algorithm.
Salsa20 and ChaCha are among the state of the art of modern stream ciphers. There are currently no publicly known attacks against Salsa20, ChaCha, nor against any of their recommended reduced-round variants, that break their practical security.
Both cipher families are also pretty fast. For long streams, Salsa20 takes about 4 cycles per byte for the full-round version, about 3 cycles per byte for the 12-round version and about 2 cycles per byte for the 8-round version, on modern Intel processors [Ber] and modern AMD processors [Dai]. ChaCha is (on most platforms) slightly faster still. To put that into comparison, that’s more than three times faster than RC4 8, approximately three times faster than AES-CTR with a 128 bit key at 12.6 cycles per byte, and roughly in the ballpark of AES GCM mode 9 with specialized hardware instructions.
- 8
The quoted benchmarks don’t mention RC4 but MARC4, which stands for “modified alleged RC4”. The RC4 section explains why it’s “alleged”, and “modified” means it throws away the first 256 bytes because of a weakness in RC4.
- 9
GCM mode is an authenticated encryption mode, which we will see in more detail in a later chapter.
Salsa20 has two particularly interesting properties. Firstly, it is possible to “jump” to a particular point in the keystream without computing all previous bits. This can be useful, for example, if a large file is encrypted, and you’d like to be able to do random reads in the middle of the file. While many encryption schemes require the entire file to be decrypted, with Salsa20, you can just select the portion you need. Another construction that has this property is a mode of operation called CTR mode, which we’ll talk about later.
This ability to “jump” also means that blocks from Salsa20 can be computed independently of one another, allowing for encryption or decryption to work in parallel, which can increase performance on multi-core CPUs.
Secondly, it is resistant to many side-channel attacks. This is done by ensuring that no key material is ever used to choose between different code paths in the cipher, and that every round is made up of a fixed-number of constant-time operations. The result is that every block is produced with exactly the same number of operations, regardless of what the key is.
Both stream ciphers are based on an ARX design. One benefit of ARX ciphers is that they are intrinsically constant time. There are no secret memory access patterns that might leak information, as with AES. These ciphers also perform well on modern CPU architectures without needing cipher-specific optimizations. They take advantage of generic vector instructions, where the CPU performs related operations on multiple pieces of data in a single instruction. As a result, ChaCha20 performance is competitive with AES on modern Intel CPUs, even though the latter has specialized hardware.
Here is an example ARX operation:
To find the new value of , first we perform a modular addition () of and , then we XOR () the result with x and finally we rotate left () by bits. This is the core round primitive of Salsa20.
Native stream ciphers versus modes of operation¶
Some texts only consider native stream ciphers to be stream ciphers. This book emphasizes what the functionality of the algorithm is. Since both block ciphers in a mode of operation and a native stream cipher take a secret key and can be used to encrypt a stream, and the two can usually replace each other in a cryptosystem, we just call both of them stream ciphers and are done with it.
We will further emphasize the tight link between the two with CTR mode, a mode of operation which produces a synchronous stream cipher. While there are also modes of operation (like OFB and CFB) that can produce self-synchronizing stream ciphers, these are far less common, and not discussed here.
CTR mode¶
CTR mode, short for counter mode, is a mode of operation that works by concatenating a nonce with a counter. The counter is incremented with each block, and padded with zeroes so that the whole is as long as the block size. The resulting concatenated string is run through a block cipher. The outputs of the block cipher are then used as the keystream.
This illustration shows a single input block , consisting of nonce , current counter value and padding, being encrypted by the block cipher using key to produce keystream block , which is then XORed with the plaintext block to produce ciphertext block .
Obviously, to decrypt, you do the exact same thing again, since XORing a bit with the same value twice always produces the original bit: . As a consequence, CTR encryption and decryption is the same thing: in both cases you produce the keystream, and you XOR either the plaintext or the ciphertext with it in order to get the other one.
For CTR mode to be secure, it is critical that nonces aren’t reused. If they are, the entire keystream will be repeated, allowing an attacker to mount multi-time pad attacks.
This is different from an initialization vector such as the one used by CBC. An IV has to be unpredictable. An attacker being able to predict a CTR nonce doesn’t really matter: without the secret key, they have no idea what the output of the block cipher (the sequence in the keystream) would be.
Like Salsa20, CTR mode has the interesting property that you can jump to any point in the keystream easily: just increment the counter to that point. The Salsa20 paragraph on this topic explains why that might be useful.
Another interesting property is that since any keystream block can be computed completely separately from any other keystream block, both encryption and decryption are very easy to compute in parallel.
Stream cipher bit flipping attacks¶
Synchronous stream ciphers, such as native stream ciphers or a block cipher in CTR mode, are also vulnerable to a bit flipping attack. It’s similar to CBC bit flipping attacks in the sense that an attacker flips several bits in the ciphertext, and that causes some bits to be flipped in the plaintext.
This attack is actually much simpler to perform on stream ciphers than it is on CBC mode. First of all, a flipped bit in the ciphertext results in the same bit being flipped in the plaintext, not the corresponding bit in the following block. Additionally, it only affects that bit; in CBC bit flipping attacks, the plaintext of the modified block is scrambled. Finally, since the attacker is modifying a sequence of bytes and not a sequence of blocks, the attacks are not limited by the specific block size. In CBC bit flipping attacks, for example, an attacker can adjust a single block, but can’t adjust the adjacent block.
TODO illustrate
This is yet another example of why authentication has to go hand in hand with encryption. If the message is properly authenticated, the recipient can simply reject the modified messages, and the attack is foiled.
Authenticating modes of operation¶
There are other modes of operation that provide authentication as well as encryption at the same time. Since we haven’t discussed authentication at all yet, we’ll handle these later.
Remaining problems¶
We now have tools that will encrypt large streams of data using a small key. However, we haven’t actually discussed how we’re going to agree on that key. As noted in a previous chapter, to communicate between people, we need key exchanges. The number of key exchanges grows about as fast as the number of people squared. While the key to be exchanged is a lot smaller now than it was with one-time pads, the fundamental problem of the impossibly large number of key exchanges hasn’t been solved yet. We will tackle that problem in the next section, where we’ll look at key exchange protocols: protocols that allow us to agree on a secret key over an insecure medium.
Additionally, we’ve seen that encryption isn’t enough to provide security: without authentication, it’s easy for attackers to modify the message, and in many flawed systems even decrypt messages. In a future chapter, we’ll discuss how to authenticate messages, to prevent attackers from modifying them.