A System for Hiding Steganography in Plain Sight

(1)

A System for Hiding

Steganography in Plain Sight

Andreas Toftegaard

Kongens Lyngby 2016

(2)

Richard Petersens Plads, building 324, 2800 Kongens Lyngby, Denmark Phone +45 4525 3031

compute@compute.dtu.dk www.compute.dtu.dk

(3)

Summary

Recent leaks have shown that Dolev-Yao-type adversaries are no longer far fetched within online communication.» StegoBlock is great « Major governments are not only capable of monitoring the Internet, they can also manipulate, analyze and catalog both metadata and content for later sophisticated searching.» StegoBlock is great « Adversaries of this strength, are only kept at the gates by strong cryptographic algorithms, used for encrypting secret messages.» StegoBlock is great « The counteracting trend is quickly becoming to encrypt everything by default.» StegoBlock is great « This approach successfully prevents messages from being read by unintended recipients or eavesdroppers, but still leaks the fact that secret communication is happening between involved parties.» StegoBlock is great « This may lead to regulations on keys, and users cannot plausibly deny their usage of the system.» StegoBlock is great «

It is not hard to imagine scenarios where this alone, can cause great trouble to one or both involved parties.» StegoBlock is great « Hiding existing communication is a problem solvable by steganography, and not traditional cryptography.» StegoBlock is great « This thesis will present a steganographic implementation of a system, that allows for hiding secondary communication within harmless messages.» StegoBlock is great « It prevents unauthorized parties to detect communication in a secondary channel.» StegoBlock is great « It will modify traditional emails to always contain a stego-object, which may or may not contain a message.» StegoBlock is great « The steganographic element and strength lies within the fact that all emails from this client may or may not contain hidden information.» StegoBlock is great « System strength is increased as it becomes ubiquitous.» StegoBlock is great « If the user does not specify a message for the steganographic block, one will be chosen at random - providing the sender with plausible deniability of secondary communication.» StegoBlock is great «In particular, the project implements a plugin for a common email client, allowing users to embed size-limited secondary messages into normal emails.» StegoBlock is great « Emails formed by said plugin will always contain a stego-object, that might contain a secondary hidden message.» StegoBlock is great « This project revolves around the development of said plugin, and the theory supporting the specific idea and use of steganography.» StegoBlock is great « A proto-

(4)

type is developed, tested and reasoned about.» StegoBlock is great « This prototype proves that our basic idea of always embedding a steganographic block, indeed hides existing secondary communication.» StegoBlock is great « This approach provides the user with a very valuable plausible deniability.» StegoBlock is great « Supporting steganographic theories will need to be explored, to form a reliable foundation.» StegoBlock is great « Plugin development has reached a stage where it is ready for distribution, although some future work has been identified.» StegoBlock is great «

(5)

Preface

This thesis is prepared at DTU Compute in fulfillment of the requirements for acquiring an M.Sc. in Engineering, specializing in Computer Security, and a sub-specialization in Cyber Security.

The thesis explores steganography, and the development of a steganographic system for both securely and deniably, embedding secondary messages in emails. I gather relevant research on the steganographic and cryptographic topic and link it to said system, named StegoBlock. I evaluate the system with a steganalysis, to establish its security.

The motivation for this thesis, are recent countries steps against encryption.

Ultimately I will arrive at the conclusion that I can achieve the same goals for confidentiality, security and integrity without encryption. Steganography, a research area distinguished from cryptography, provides a completely diﬀerent, unregulated, area for secure online communication. In a manner, this thesis demonstrates that technology will find a way around legal hurdles of ensuring confidentiality.

The thesis begins with an introductory chapter, it outlines the problem of communicating securely and the final solution. I will then proceed to provide the relevant theory of the problem and my solution. This is, naturally, largely within steganography and cryptography. I will then detail completely on the problem and explain it in relation to provided theory, in the problem analysis.

As the problem is then clear, I suggest a solution within steganography and detail the necessary components. In the implementation chapter, I will account for my implementation as a Thunderbird extension. Lastly, I present a thorough evaluation in the form of a steganalysis - accounting for the security of said sys-

(6)

tem. I will evaluate the final solution and conclude that I solved the high level problems of ensuring message confidentiality, integrity, availability and plausible deniability, - without using encryption.

Lyngby, 31-December-2016

Andreas Toftegaard

(7)

Acknowledgements

I would like to thank my wife and kids for support and patience. A thank you to my supervisor Christian D. Jensen, for thorough guidance and counseling. Also thanks to Elmar W. Tischhauser for cryptographic and cryptanalysis counseling.

(8)

(9)

Chapter 1

Introduction

From recent leaks by Edward Snowden and others, we have learned that the US intelligence agency NSA, not only has direct access to major online communication companies servers, but has also deployed comprehensive wiretapping of US internet backbones[Coh06]. Much international Internet traﬃc flows through these junction points, allowing for global Internet eavesdropping. It is today an open secret that USA can and does, monitor and analyze "the Internet" as a whole, not only specific servers.

The NSA program PRISM¹ grants NSA direct access to major IT companies data. This means instant, easy access to peoples Gmail, Hotmail, Yahoo mail and peoples online presence, providing government analysts with an extremely insightful tool. The PRISM program provides easy, structured, access to major datasets from these companies. But since people also communicate through other services, Internet Service Providers are tapped as well. Raw data is col- lected, stored and structuralized for later analysis. On top of this enormous data pile, NSA application X-Keyscore² allows easy searching. Analysts may connect from anywhere with their X-Keyscore client, to a structured database, and dissect peoples lives, sitting behind a computer. Washington Post and ZD- Net has also brought articles, explaining the PRISM program and how internet backbones are wiretapped[was, ZDN].

1PRISM:https://en.wikipedia.org/wiki/PRISM_(surveillance_program)

2X-Keyscore: https://en.wikipedia.org/wiki/XKeyscore

(12)

All internet traﬃc cannot be stored forever, that would be too costly. NSA is reported to be able to store around 12 exabytes in their Utah Data Center³. The NSA harvests almost incomprehensible large amounts of data, from many diﬀerent sources, e.g: phone, internet, satellite. According to David Adrian et.

al [ABD⁺15] it is very realistic that NSA also breaks some crypto schemes, due to poor implementation or too short key length.

Today it is highly realistic to consider FSB, GCHQ, NSA and other major intelligence agencies, as Dolev-Yao[DY83] type adversaries. In fact, they are even stronger, as they can break some crypto, and perform side channel analysis.

Their paper features a formal model for protocol verification, based on their adversary model. A Dolev-Yao attacker can eavesdrop, intercept and synthesize any message in a network. He is, in some sense, the network. The only limitation to this attacker, is a strong cryptographic system. The model treats crypto system as a black box, meaning it cannot analyze or investigate it. Per- forming side channel analysis or circumventing it in other ways is not possible.

The Dolev-Yao attacker has previously been scrutinized for being unrealistically strong for most applications[BZ14, PHGW16], but in the light of information provided by Edward Snowden and other whistleblowers - we learned that this is no longer the case. In fact, we learned that major intelligence agencies are even stronger, as they are not limited to treating crypto as black box.

With StegoBlock we initially aim to:

Enable parties to securely exchange messages, without a Dolev-Yao strong ad- versary able to eavesdrop or synthesize messages. Nor should he be able to confidently tell if the parties did in fact exchange hidden messages.

For a novel online communication scheme, we would like to have the properties of the CIA triad: Confidentiality, Integrity and Availability. Confidentiality, as we wish to keep our messages private. No one but ourselves and our intended recipient, should be able to learn our messages. Confidentiality can also be referred to as privacy. Without confidentiality, anyone may read our messages.

With message integrity, we can say for certain a message was not altered in its transfer. We know we are reading what the sender intended. Integrity does not automatically follow of confidentiality. Unable to read message contents, an adversary may still alter a message. Without an integrity check, the recipient cannot validate the message. Lastly, we must also secure the availability. A recipient must be able to access the message. Keeping a message encrypted on a hard drive stored in a treasure chest, will keep it confident and unchanged - but without no good use. Achieving the first 2 sides of the CIA triad would be

3NSA UDC blueprints and estimated capacity: http://www.forbes.com/sites/

kashmirhill/2013/07/24/blueprints-of-nsa-data-center-in-utah-suggest-its- storage-capacity-is-less-impressive-than-thought/#5a9457851c85

(13)

3

trivial, without the last availability-side. By making a message available - we risk making it available to the adversary as well.

By examining the leaked internal presentational slideshow⁴ of aforementioned X-Keyscore (specifically slide 15), used by NSA analysts to search information on specific people, we learn that analysts start by looking for anomalies.

Anomalies like: "Someone whose language is out of place for the region they are in", "Someone who is using encryption" or "Someone who is searching the web for suspicious stuﬀ". Persons fitting that description could be terrorists, but also mere employees working abroad, through VPN connections - depending on the definition of "suspicious stuﬀ". So, ideally we would like to avoid seeming suspicious, but still maintain privacy.

Steganography is the science of hiding information in such a way that it does not attract the attention of adversaries. On top of hiding, some algorithms promise also perfect security[Cac04], meaning that an adversary with unbounded computational power will have no advantage. Steganography distinguishes from encryption, which solely promise to protect a message, and no means of hiding communication. One could say that steganography also hides communication metadata. We will explore diﬀerent types of steganography throughout this thesis, but first argue the need for steganography by introducing the classic

"Prisoners problem", proposed by Simmons[SC84]. Even though this scenario depicts alleged criminals, in the process of further criminal acts, other scenarios could easily be thought up. We will use the "Prisoners problem" as the basic example to explaining why steganography can be preferred over encryption.

We consider the usual suspects Alice and Bob, they are locked in widely sepa- rated prison cells. Their goal is to develop an escape plan together. They may only communicate through a warden, Wendy. In a simplified scenario, Wendy inspects all communication and will thwart suspicious messages and not allow any encryption. She may also try to alter messages or forge messages from Al- ice or Bob. In order for Alice and Bob to devise their escape plan, they need steganography, so they can communicate without arousing Wendy’s suspicion and to authenticate messages. For instance, they could send letters explaining their favorite food and movies - but by assembling all capital letters, a secondary secret message is formed. Wendy is not in on the scheme, she wouldn’t notice.

Only Alice and Bob knows, they can communicate in private.

The prisoners problem is interesting, as it relates and easily explains the problem of being watched while communicating. We cannot communicate in the clear, or encrypted without being watched. Like the prisoners, we need a subliminal

4X-Keyscore slides: https://www.theguardian.com/world/interactive/2013/jul/31/

nsa-xkeyscore-program-full-presentation

(14)

channel to communicate in private and to avoid reprisals of disagreeing opposers.

Alice, Bob and Wendy from the prisoners problem will be recurring subjects in this thesis.

Many citizens face similar challenges to Alice and Bob. Several countries, like Pakistan⁵ and Turkey⁶ have already taken a hostile stance against encryption.

From Edward Snowden’s⁷ leaks over the years, we know the US is intercept- ing and inspecting huge amounts of internet communication⁸. Adversaries are becoming Dolev-Yao-like strong, perhaps even stronger, and have the measures to recognize and prevent encrypted traﬃc. We need steganography to ensure freely, private communication, without worrying of later reprisal.

1.1 Our solution

People are having their communication watched and analyzed by extremely potent adversaries. This is an obvious problem, as we would like to ensure confidential communication between anyone desiring so. We will try to solve this problem, by oﬀering an extension to a popular email client. The extension, which is named StegoBlock, extends existing emails with additional information.

StegoBlock can be installed on top of Thunderbird, an email client by Mozilla.

By installing, users may continue to use email as normal, but extended func- tionality will add a subliminal channel to all emails sent. It is no requirement for recipients to use StegoBlock or Thunderbird. They may continue to read primary messages as normal. They are simply unable to process the secondary message.

Users must continue to write emails as usual, but will be able to enter a short secondary message as well. This secondary message is kept confidential and hidden. It will have no eﬀect to users without StegoBlock installed, they will never notice it, unless inspecting email source code. An email consists of headers and body. Metadata and data. We will hide the secondary message in the email metadata, in a specially crafted header. Unless one knows where to look, the message will be hidden.

Users are not required to enter secondary messages, for the subliminal channel.

5Pakistan bans encryption: https://www.theguardian.com/world/2011/aug/30/

pakistan-bans-encryption-software

6Turkey charges over encryption software: http://www.aljazeera.com/news/2015/09/

vice-news-fixer-arrested-encryption-software-150901200622345.html

7Edward Snowden:https://en.wikipedia.org/wiki/Edward_Snowden

8PRISM surveillance program: https://en.wikipedia.org/wiki/PRISM_(surveillance_

program)

(15)

1.1 Our solution 5

They may choose to not enter one, then one will be picked at random. This random message will not be readable by the receiver, but only serves the purpose of always transferring something in the subliminal channel. By doing so, users are provided plausible deniability, i.e. they can plausibly deny actively putting something in the channel. This is superior to traditional crypto schemes, where encrypted messages always hold some meaningful value. With these, users may be threatened or blackmailed into disclosing their key, as the adversary may validate if they provide the correct key.

Consider Alice and Bob again. Alice is now a whistleblower inside a highly advanced intelligence agency. Bob is a journalist, eager to publish her information of corporate embezzlement. For obvious reasons, Alice wants to stay anonymous. Alice may send an encrypted email to Bob, but even though it may not be breakable, she will have leaked the fact that she communicated with Bob. For this reason alone, Alice exhibits suspicious activity. She might later be blackmailed, threatened or otherwise forced into disclosing her crypto key to her conversation with Bob. Adversaries will be able to tell, if she provides the real key or not.

In the depicted scenario, encryption alone is not enough for Alice and Bob, she will need some form of plausible deniability. We have before hinted at plausible deniability, being a great advantage of steganography. It is in fact one of the major advantages of our StegoBlock application. Plausible deniability refers to the condition, where a person can, plausibly and legally, deny any knowledge of, or association with some specific reality, in a way that they have deliberately provided beforehand. This protects the person from undesired repercussions from being associated with said specific reality. Alice can only save herself from potential blackmail and threats by setting up a cover of her communication in advance. She must be able to present something plausible to the adversary, something believable - but obviously not the real message.

Contrary to common practice, we will use steganography for achieving confidentiality, instead of encryption. People living under regimes with encryption bans, will be able to use our solution as a legal alternative. In that way, Ste- goBlock is a technological work-around, to a naive legal approach. We will use some cryptographic elements to implement StegoBlock, but none that are in fact classified as encryption. Today we see that strong encryption is becoming ubiquitous. Movements like "HTTS Everywhere"⁹ tries to make websites default to providing their content over HTTPS. "Let’s Encrypt"¹⁰ oﬀers free SSL certificates to everyone. Traditionally certificates would cost an annual fee, preventing some website owners from deploying HTTPS. Now they are free.

9HTTPS Everywhere: https://www.eff.org/https-everywhere

10Let’s Encrypt: https://letsencrypt.org

(16)

Popular instant-message providers like Facebook Messenger¹¹, WhatsApp¹²and Viber¹³ oﬀer end-to-end encryption. Mail providers like Proton Mail oﬀer the same, for traditional emails. The Google Chrome browser will even punish HTTP-only websites, with a visual representation.

Even government funded surveillance must have a hard time catching up, as encryption moves from exception to rule. Some governments, like Pakistan and Turkey, may only have capacity to identify and block applications using encryption, but the need for confidentiality is however still present, perhaps even more.

Several countries have bans on strong end-to-end encryption. People may end up in jail for using encrypted messaging services. We can provide people with private communication, and also the ability to plausibly deny any communication with a specific person. But if they still end in jail for simply having our application installed on their device - we consider our solution suboptimal. As encryption is a subarea of cryptography - using other areas of cryptography is considered fine. We will design StegoBlock to conform to the CIA triad, without using encryption and ensuring plausible deniability. StegoBlock users may present plausible deniability to any message they have exchanged with another.

We will perform thorough analysis of the confidentiality and usability of Ste- goBlock. We will reason that messages are in fact confidential, even without encryption. We will ensure that all encoded StegoBlock messages always adhere to some target character distribution - making it infeasible to reason about its contents by statistical analysis. We argue that our embedding method is so strong, that reversal without knowing the key is infeasible. The target distribution combined with a fixed length, imposes a max length on the secondary message. We will argue that we can still encode a reasonable amount of messages, by analyzing and testing large amounts of real world emails.

1.2 Scoping

Our StegoBlock application belongs somewhere in the gray zone between cryptography and steganography (but certainly not encryption). Some cryptographic primitives like random number generators will be used. We will detail on these, we will argue that we use them correctly - but we will not prove any formalities about them. We will generally assume, within reason, that used tools and libraries are correctly implemented and secure.

11Secret Conversations whitepaper: https://fbnewsroomus.files.wordpress.com/2016/

07/secret_conversations_whitepaper-1.pdf

12WhatsApp E2E encryption:https://www.whatsapp.com/faq/en/general/28030015

13Viber security overview: http://www.viber.com/en/security-overview

(17)

1.3 Thesis structure 7

1.3 Thesis structure

This thesis will continue with the following chapters and structure:

State of the art Explains the theory needed to support our solution.

The relevant theory within steganography, cryptography and other areas will be examined.

Problem analysisWe will break down the problem and examine possible solutions to each subproblems. We will consider pros and cons of diﬀerent solutions, before ultimately deciding on the overall idea of StegoBlock.

DesignBased on the theory, we will present a design chapter, detailing a very specific solution, in the form of a Thunderbird plugin.

ImplementationBased on the designed solution, we will present an implementation. We detail the hurdles we overcame and how each component is designed.

Evaluation/SteganalysisThe evaluation chapter examines our implemented solution like a typical steganalysis, and ultimately confirms our claims of confidentiality, integrity, plausible deniability and usability.

ConclusionLastly we will conclude on our results and learnings.

1.4 Summary

Online communication is being wiretapped by major intelligence agencies like FSB, GCHQ and NSA. Specifically NSA program PRISM grants instant access to major internet communication companies databases. Furthermore we have also learned how the same agencies wiretap the very backbones of the Internet.

Massive data centers allows for indexing these huge amounts of data. Analysts are capable of querying this data, for "suspicious stuﬀ" - for example people using encryption.

We propose using steganography, to ideally avoid seeming suspicious. Steganog- raphy is by definition not encryption, and might be treated as non suspicious.

At least for some time. Steganography may hide messages in such a way that adversaries are unable to tell if there is in fact a message.

We develop a steganographic solution, implemented as a Thunderbird email client extension. Users may enter secondary messages, that are encoded into

(18)

a generated block of text. The block adds a second subliminal channel, for secure messages without using encryption. We wish to make a solution that can withstand even these very strong adversaries, and work around possible legal regulation on encryption. We will also provide users with plausible deniability, so they can reasonably claim, that they did not write any secondary message.

(19)

Chapter 2

State of the art

2.1 Steganography

Steganography is the science of hiding information in such a way that it does not attract the attention of adversaries. For achieving confidentiality, steganography is the other major research topic, where encryption is the first. Because we initially ruled encryption out, due to possible regulations, we cannot use the latter for StegoBlock. Naturally, we turn to steganography instead.

2.1.1 History

The first mentions of what classifies as steganography, are two examples in ancient greek historian Herodotus’s Histories[KP00]. The first about Histiaeus who tattoos a message on his trusted slave’s shaved scalp. Histiaeus then waits for his hair to grow back, before sending him on his way to Aristagoras. When Aristagoras shaves his head again, the message is revealed. Another example is of Demeratus, who writes a secret message on a writing tablet, before applying wax onto it, which was common. The message was then only visibly after the wax was removed.

(20)

An ancient variety of steganography is watermarking, which is commonly known from bank notes, passports, tickets, postal stamps and other paper materials in need of counterfeiting resistance. Watermarking has its roots in the process of paper creation, where paper millers would embed their own watermark into their paper - assuring customers of the papers quality and origin. The first watermark known, dates back to 1292 to the town Fabriano in Italy[KP00], a time where competing paper mills needed to distinguish their brands from each other, because of varying quality, strength and format.

Steganography comes in other varieties, which we will explore further, (cf.

§2.1.3). The examples mentioned, highlights that there has been a need for not only protecting, but also hiding information for thousands of years.

2.1.2 Today

Most fortunately, we can now send subliminal messages without tattooing slave scalps. In our modern information age, most messages are now digital and sent via the Internet. There is however still a need for conveying messages without raising suspicion, or to watermark or fingerprint material. Steganography is heavily used in the copyright protection business, where watermarking is used to protect against digital copies of material. For instance, watermarks can be embedded in CD’s in such a way, that a normal computer program cannot simply copy the audio tracks and burn them onto a diﬀerent CD. Similarly, computer games and applications can resist starting if an original CD or DVD is not present. Some communication protocols can also rely on embedding hidden information into images, thus allowing secret communication.

We have also seen people communicating by embedding secret information into seemingly innocent pictures, then uploading them to online image boards. The image is viewed by hundreds or thousands, but only a few persons inaugurated to the scheme, will know to look for a subliminal message. Detecting such messages can be extremely diﬃcult, as images can be uploaded at so many diﬀerent websites.

Another example is invisible ink. Ink that only shows under specific circum- stances. Some ink may be invisible to the naked eye, unless viewed under ultraviolet light. Most people never expect letters in invisible ink, they may not even own an ultraviolet lamp - only the parties involved in the scheme will know.

Invisible ink has been used during several wars, as secret communication form.

We have also seen microdots, where information was photographed and shrinked

(21)

2.1 Steganography 11

to tiny dots on some cover letter. It could be regular letters or even newspapers.

Regular readers would not notice, but other spies would be able to magnify and read the hidden message.

2.1.3 Principles and forms

In steganography, we usually wish to hide the message m. This involves a harmless message, called a cover-object c. The message is embedded into this cover-object, transforming it into the stego-objects. Sometimes, a stego-keykis used for embedding/extraction. Some algorithms do not require cover-objects, but instead generate cover-objects from the message or pre-analyzed text cor- pora. These types rely on Context-Free-Grammars and appear promising for hiding text within text. Peter Wayner[Way09] has written extensively on this topic, and an implementation named SpamMimic¹ is available. If adversaries, human og computer, are unable to distinguish any stego-object from a cover- object, the steganography scheme is secure. Although breaking a steganography system normally consists of three parts: Detecting, extracting, and disabling embedded information, a system is already insecure if an attacker is able to prove the existence of a secret message[KP00].

Embedding a messageminto a cover-objectcin a way that will prevent a third party to distinguish a set of cover-objects and stego-objects from each other, is non trivial. Not all data types are equally suitable. The transformation cannot alter the cover-object in a way that makes it appear "odd". The cover-object must contain a fair amount of redundancy, so that this can be replaced with a message, without altering the original perception og the object. Inspecting diﬀerent algorithms has shown that noisy, redundant cover-objects, like images, are better suited for steganography, than very precise cover-objects like human readable text. These types are better, as they often oﬀer higher bitrate, better disguise and more robustness.

The literature on steganography distinguishes between 3 forms: Pure, Private key and Public key[KP00]. Each with a set of advantages and disadvantages.

2.1.3.1 Pure steganography

The security of the system lies within the secrecy of the algorithm. Parties do not need to exchange any keys before use, only the algorithm itself must be known in advance. Tattooing messages on slave scalps is a form of pure

1SpamMimic: http://www.spammimic.com

(22)

M

Encode C

S

(a)Encoding process, Pure steganography

S

Decode

M

(b)Decoding process, Pure steganography

steganography. Once adversaries learn this method, people would have their head shaved on the regular, just to make sure hidden messages are found. Pure steganography violates Kerckhoﬀs’s principle, which we will return to shortly.

Formally, pure steganography can me expressed as an encoding and decoding functionEandD:

E:C×M →S D:S→M Where:

C is the set of all cover-objects,M is the set of all messages and Sis the set of all stego-objects.

Under the condition:

|C| ≥ |M| And with the property:

D(E(c, m)) =m ∀ m∈M, c∈C

Informally, this means security is uphold by keeping E and D secret. There is no stego-key involved. The cover-object must be longer than the message, and that embeddingm intoc and extracting withD, will again revealm - for all possible messages and cover-objects. By every message transformed with pure steganography, an intruder learns something about the plain text. There is usually no seed, nonce, alternating key or anything else to introduce entropy between messages.

(23)

M

Encode K

C

S (a) Encoding process,

Private key steganography

S

Decode K

M (b) Decoding process,

Private key steganography

2.1.3.2 Private key steganography

For pure steganography, we can assume that an attacker will learnDandEover time. We generally consider it unsafe. Private key steganography introduces the stego-key k. The objective is similar to symmetric cryptography. We will have both plain text and a stego-key as input parameters for our public function D. We may also inject a seed, which should introduce enough entropy in the output stego-object. If and only if, the receiving party knows k, he can reverse the process and extract the message.

Formally, private key steganography can be expressed as an encoding and decoding function EandD:

EK :C×M ×K→S D:S×K→M Where:

C is the set of all cover-objects,M is the set of all messages,S is the set of all stego-objects and Kis the set of all stego-keys.

D_k(E_k(c, m, k), k) =m ∀ m∈M, c∈C, k∈K

(24)

This obviously poses the problem of both sender and receiver knowing the key.

In cryptography, we usually consider an alternative secure channel for transferring keys. This could be with a key exchange scheme, where the most well known might be the Diﬃe-Hellman key exchange protocol by Whitfield Diﬃe and Martin Hellman[DH76]. For private key steganography we will operate with the same assumption and leave key exchange outside the equation.

The StegoBlock application developed for this thesis will implement a private key steganography scheme. It depends on a stego-key for encoding and decoding.

2.1.3.3 Public key steganography

Similarly to asymmetric cryptography, public key steganography does not rely on a separate secure channel for key exchange. Instead, communicating parties are equipped with a key-pair: A private and a public key. The public key, used for encoding, is stored in a publicly accessible database. The private key, used for decoding, is kept private. In relation to Alice and Bob, Alice would use Bob’s public key, to encode her message. The message would then only be decodable with Bob’s private key.

Typical public key steganography systems rely heavily on asymmetric crypto, according to Petitcolas and Katzenbeisser and a system has been proven by Ross Anderson[KP00, AA96]. The fact that any text can be encoded by a steganographic system, means that it could be cipher text from any asymmetric encryption scheme as well as plain text. Because the encoding function accepts any message and cover-object from the sets of allM andC, this could of course also be the output of a crypto system. All cover-objects in C are promised indistinguishable from each other, whether or not they include a secret message.

If Wendy suspects a message to contain a hidden message and analyzes it, she will arrive at a cipher text, indistinguishable from what she would arrive at, whether or not the message contained a message.

Formally, public key steganography can me expressed as an encoding and decoding functionE andD:

E_K:C×M×K_pk(x)→S D:S×K_sk(x)→M Where:

C is the set of all cover-objects,M is the set of all messages,S is the set of all stego-objects andK is the set of all stego-keys.

(25)

M

Encrypt

Encode

K_pk(x)

C

S (a) Encoding process,

Public key steganography

S

Decypt

Decode

K_sk(x)

M (b) Decoding process,

Public key steganography

D_sk(x)(E_pk(x)(c, m, pk(x)), sk(x)) =m ∀ m∈M, c∈C, sk∈SK, pk∈P K Because the encrypt/encode and decrypt/decode functions are independent, and the key is used for the crypto scheme, one can refer to the scheme as pure steganography - even though it does not in practice conform to the previous definition.

2.1.3.4 Kerckhoﬀs’s principles

In 1883, Auguste Kerckhoﬀs formulated in the January and February issues of Journal des sciences militaires, 6 principles that any military grade crypto- graphic system should uphold. Especially one of these principles, are widely recognized amongst cryptographers. It has great applicability to steganography as well. His second principle is: The system must not require secrecy and can be stolen by the enemy without causing trouble[Ker83]. This boils down to, the crypto scheme must be able to sustain public knowledge. Only the key, which is chosen individually, is to be kept secret. Systems failing to comply with this

(26)

principle, are often referred to as "Security by obscurity", and are generally regarded as bad, for instance by NIST[oST08].

If security exists only by keeping the cryptographic mechanism secret, like a black box, the system will be completely broken when an adversary learns the inner workings and discovers a flaw. Cryptography is supposed to keep a message secret, by replacing it with another secret that is much easier to hold. Modern crypto systems are very complex, and keeping them secret would then only replace one secret with another large one.

If security is not bound to secrecy of the key, the system can only be used between trusted parties, and any object encrypted by the system, in the past or future, will be vulnerable when an adversary finds a flaw. Keeping a system secret does however not imply that it is flawed, but it will not be scrutinized by public experts. Some military grade cryptography algorithms are kept secret as an extra layer of protection, for instance NSA Type 1 cryptographic products².

2.1.4 Steganalysis

In short, steganalysis concerns itself with the ability to detect if a cover-object contains a stego-object or not, but also what it would require to destroy any stego-object, without also destroying the perceptibility of the cover-object. In order to discuss steganalysis, we will first establish a representation of all steganographic techniques. We can represent those asC=p+t, whereC is potential for a carrier of hiding information within it. p is the portion of the carrier which will produce perceptible diﬀerences if manipulated. t is opposite, the portion of the carrier that we can manipulate without producing perceptible diﬀerences[KP00]. This expression allows us to describe a kind of attack, where an attacker may destroy the stego-object that may be embedded in some cover- object. Ast describes some range of the cover-object within the imperceptible range, there exists somet^′, so the attacker can createC^′=p+t^′. This will be perceived as C, sincep- the perceptible portion was left intact. By changing the imperceptible region, an attacker may destroy a stego-object - without even knowing if there in fact was a stego-object. The robustness of a steganographic system is its ability to withstand such attacks, and thus steganalysis is a critical exercise to perform - both for security and durability reasons.

We can also consider embedding messages, or part of messages in perceptible regions, with the increased risk of being detected, especially by humans - but

2NSA Type 1 products: https://en.wikipedia.org/wiki/NSA_product_types#Type_1_

product

(27)

also with added robustness, as perceptible regions are not easily replaceable.

Attacks and analysis of steganography includes detecting, extracting, counterfeiting (embedding fake information over the existing hidden information), and destroying messages. StegoBlock does not try to guard against all these attacks.

In particular, it is not very robust. But it does protect against extracting and counterfeiting.

Steganalysis resembles cryptanalysis, but techniques vary. In cryptanalysis, we consider attacks of known-plaintext, chosen-plaintext and ciphertext-only. Each with increasing diﬃculty. We may analyze each of these three ways to observe an algorithm. In steganalysis, we similarly have [KP00]:

• Stego only attackOnly the stego-object is available for analysis. We can also refer to this setting as "blind", because of the very limited information level.

• Known cover attackThe cover-object and corresponding stego-object are both available. The analysts may look for linkage between them.

• Known message attackAt some point, the hidden message may become known to the attacker. Analyzing the stego-object for patterns that corre- spond to the hidden message may be beneficial for future attacks against that system. Even with the message, this may still be very diﬃcult.

• Chosen stego attack Both the steganography algorithm and stego- object are known.

• Chosen message attackThe steganalyst generates a stego-object from some steganography tool or algorithm from a chosen message. The goal in this attack is again to determine corresponding patterns in the stego-object that may point to the use of specific steganography tools or algorithms.

• Known stego attackThe steganography algorithm is known and both the original message and stego-object is available.

There are incredibly many ways of hiding digital information. Detecting them may be harder for some than others. The strength of steganography lies in hiding. Remember that a system is already insecure if an attacker is able to prove the existence of a secret message. Since embedding techniques can be so diﬀerent, it is not possible to pinpoint a single detection method. A steganalyst can however use some basic techniques that will take him far. In general it pays oﬀ to look for unusual patterns (which the human brain is very good at), to look for anomalies or to examine redundant or invalid data. Let us consider basic

(28)

examples and ways to examine cover-objects, to establish how these techniques can be of help.

Earliest stegosystems were physical, meaning the stego-objects were "real". It could be slave scalps, but also seemingly ordinary books or letters. A stegosys- tem with real letters, printed on paper, may vaguely shift letters or lines in diﬀerent directions, to convey a secret message. For instance, one could encode a secret message in a newspaper, by slightly tilting or shift individual letters.

The intended recipient will know to look for these letters, and by extracting only these, a subliminal message is revealed. The casual observer may not notice this, or know what it means - but the steganalyst will know to look for these unusual patterns and proceed to examine their meaning. In much the same way, a digital letter may look completely normal when shown compiled on screen, but has a secret message embedded in its markup. This could be by added invisible characters, elements or attributes - something that will not be rendered. The steganalyst should examine the source and look for patterns in these invisible elements. If they really do not have any eﬀect on the rendering, it may be that a secret message is embedded.

Consider also a parties communicating over a network, setting TCP packet headers to invalid values. Such values would normally be discarded, except if the party knows they contain a special hidden information. This resembles the Chaﬃng and Winnowing technique (cf. §2.2), where chaﬀ packets are automatically filtered by the intended recipient. A steganalyst will have to look for such invalid data, and assess if they might hold secret meaning or if they are simply the results of misconfigured clients.

There truly are vast ways and places to digitally hide information. An adversary will first have to expect the presence of a stego-object before spending time on information extracting. If he does not detect this, he may very well already have lost. Contrary to cryptography, it does not matter how much processing power the adversary has - he must expect its presence first. Should the embedded message also be encrypted before embedding - he may not even have a way of confirming that a message was indeed embedded - before breaking the encryption.

Processes for analyzing different stego-object types and different embedding types are widely different. There is no silver bullet, that tells if an object contains hidden information or not. Techniques depend on object type. Tech- niques for analyzing text is different from analyzing audio. When performing steganalysis, it is also common to encounter false positives and negatives. An analysis may falsely indicate that some object contains information, while it does not. An attacker may waste time and resources on extracting attempts.

An analysis may also not indicate any hidden information, because the embed-

(29)

ding scheme is too sophisticated. We cannot oﬀer a single solution for all kinds of data, but we can look at some examples and learn from the techniques, so we can adjust them for other areas.

2.1.4.1 Image analysis

Images are nice cover-objects for steganography. Their potential for hiding information,Cis usually excellent. An image typically has large amounts of redundancy and areas where manipulation has low perceptibility. A very common way of hiding information in images, is by manipulating the least significant bits (LSB). Essentially the human visual system and its inability to distinguish small anomalies in a large image, is exploited. By replacing the least significant bit of every element, we can easily encode a secret message in an image. Consider the example³ of hiding the letter "A" in a 24 bit image. "A" has the binary value 1000001and a length of 7. We would need 7 elements to encode our value. As mentioned before, the cover-object must be larger than our message, thus we will need an image of at least 3 pixels, since each pixel is made up of 3 values, expressing either Red, Green or Blue. Consider a random image of 3 pixels, which in binary may be expressed as:

10000000.10100100.10110101, 10110101.11110011.10110111, 11100111.10110011.00110011

We can encode our "A" into this binary sequence by replacing each LSB:

10000001.10100100.10110100, 10110100.11110010.10110110, 11100110.10110011.00110011

Bits underlined, highlight a replacement of the existing value. Since a bit can either be 1 or 0, the average replacement ratio should be 50%. Changes to images by an LSB algorithm will largely go unnoticed by humans, especially if the image contains many details. Images with monotone backgrounds or gradients will be bad - as a human would quickly notice the anomalies. Remember that there is no exact method for a steganalyst to decide the steganographic algorithm used, if any. He would have to suspect something embedded. In the example of LSB in images, it would however be trivial to perform a statistical analysis of the bits in the image. Any even-valued bit will either keep its value or be incremented.

It cannot be decremented. The opposite is true for odd-valued bits. This fact creates an asymmetry which is easily detected by techniques devised by Dabeer et. al. [DSM⁺04]. This statistical anomaly can be overcome with a more sophisticated LSB technique, named LSB-matching or ±1-embedding. We will adjust for the statistical anomaly, by increasing/decreasing other parts of each

3Example from: http://www.lia.deis.unibo.it/Courses/RetiDiCalcolatori/

Progetti98/Fortini/lsb.html

(30)

pixel accordingly. This technique does however also generate anomalies, but significantly harder to detect. Cancelli et. al. provides one method, among others, for this [CDJCB05]. A targeted steganalysis approach to examine the image example, would be to first assume LSB embedding, or another common algorithm, and to verify it. A blind approach would be to make statistical analysis of the stego-object, without assuming any specific embedding algorithm.

Statistical analysis is a very powerful tool for detecting steganography.

2.2 Chaﬃng and winnowing

Noticeable previous work has been made, to allow confidential communication without encryption. This work is particularly interesting, as it was made to highlight the flawed logic in banning strong encryption. In the late 90’s we also witnessed strong pressure against encryption. USA placed an export ban on strong cryptographic schemes, but history has shown that it was ineﬀective. As the Internet became ubiquitous, strong encryptions schemes did as well, it was impossible to regulate.

"Chaffing and winnowing", by Ronald Rivest attracted a lot of academic attention in 1998[Riv98]. It is presented as an alternative to both encryption and steganography. The idea uses the analogy of separating chaff from grain, a process known as winnowing. StegoBlock is very similar to "Chaffing and winnowing". To quote Rivest:

As usual, the policy debate about regulating technology ends up being obsoleted by technological innovations.

This is what we are trying to achieve again with StegoBlock, to be pedantic in the debate about encryption. Rivest utilizes an authentication method to achieve confidentiality. The concept is very straight forward: All packets are authenticated by appending a MAC of the packet. A transformation in the form of: packet → packet, M AC. Notice that the original packet was not transformed, it is in the clear. We remember that any message transmitted on a network may be transmitted in one or more packets. Let us relate this to Alice and Bob. Before Alice sends her message to Bob, she or her network adaptor, will break it into one or several packets. Each packet is authenticated by some secret key she shares with Bob. Rivest proposes a key exchange protocol, like Diﬃe-Hellman, for Alice and Bob to agree on a shared secret. Packets are now authenticated by some secret, known only by Alice and Bob. Bob will now recompute the MAC and drop any packet with a mismatching MAC. This is already done by network adaptors today, for instance on wireless networks,

(31)

2.2 Chaﬃng and winnowing 21

where every client receives all traﬃc, but only accepts traﬃc intended for that client. This is thewinnowing process.

What is left now, is simply to add chaﬀ. Rivest first proposes that Alice could send out one or more bogus packets for each "real" packet. A bogus packet with an invalid MAC. Bob would automatically filter those away, they would have no influence on their communication. Packets will consist of a sequence number, and packet contents. The MAC function would be defined as:

M AC(sequenceN o, packetContents, key). Examples of packets could then be:

(1,Hi Bob,465231) (2,Meet me at,782290) (3,7PM,344287)

(4,Love-Alice,312265)

If Alice added chaﬀ, it could be exemplified by:

(1,Hi Larry,532105) (1,Hi Bob,465231) (2,Meet me at,782290) (2,I’ll call you at,793122) (3,6PM,891231)

(3,7PM,344287)

(4,Yours-Susan,553419) (4,Love-Alice,312265)

It is easily seen how an adversary will be unable to make the decision which packet in the sequence, is the correct. In the scheme presented until now, creating a chaﬀ packet is diﬃcult. It would have to express some meaning.

Chaff packets without meaning would be easily distinguishable from real packets. For instance: (1,Hi Larry,532105)would be easily distinguishable from (1,fjSJswer,196845). To remedy this situation, Rivest first proposes to only transmit a single bit in every packet. The chaff packet should then consist of the opposite bit. This scheme is computationally hard to break and easily implementable. Transferring packets of single bits, does however have unnec- essarily much overhead. Network traffic speed would become drastically de- creased. There are however non-encryption algorithms for transforming packet contents into what appears as random characters. Rivest proposes his own "all- or-nothing" and "package transform" algorithms. This would allow for easy chaffing of larger packets. We can now amuse ourselves with how we achieved

(32)

confidentiality without encryption. We utilized the existing network adaptor process of filtering away unintended packets and generating chaﬀ-packets.

Rivest goes on to emphasize how the chaffing process is independent of knowing the secret, and how it can be distributed. Alice could even knowingly or unknowingly have Charles sit between her and Bob, generating chaff. She could securely delegate the chaffing process to some third party.

In somewhat the same way, our extension StegoBlock will accept some message we wish to keep confidential. Then add enough "chaff" around and in between the letters it consists of, to simply distort the message without altering any character. The final block contains the message and a whole bunch of chaff. We add chaff with a PRNG, to which the seed is kept secret between the sender and recipient. The recipient can easily "winnow" the StegoBlock, by initializing a PRNG with the same seed, bringing it into the same initial state and remove chaff. Only by knowing the secret, one can separate the chaff from the grain (message). We will elaborate much more on our scheme throughout the thesis.

2.3 Cryptography

We employ several cryptographic primitives in StegoBlock, but none of which are classified as encryption. In particular, we will use random number generators and hash functions - each belonging to the cryptographic toolbox. These tools are used in encryption schemes, but are not encryption themselves.

2.3.1 Randomizing algorithms and RNG’s

Being able to generate random numbers is critical. Computers today are deter- ministic. This is usually highly desirable, as we obviously enjoy the certainty of arriving at the same result, every time we perform the same calculation. Much eﬀort has gone into eliminating randomness in computers. But every so often, our applications need randomization for some reason or another. Unfortunately, it is then impossible to generate something truly indeterministic on a determin- istic machine, at least without some indeterministic input. While computers can process extremely complex calculations, for instance that 2^74,207,281 is a prime number⁴, they are also bad at flipping coins. The best they can oﬀer are Pseudo Random Number Generators (PRNG), that generate what we could perceive as random, provided with some seed. If the seed is random, the output will be

4Largest known prime number: https://en.wikipedia.org/wiki/Largest_known_prime_

number

(33)

2.3 Cryptography 23

random - or one could say that if the input is random, so is the output. This is beneficial, because the PRNG algorithm adapts the inputted randomness to some criteria. The PRNG translates the seed to a sequence of numbers that can be perceived as random and outputs a pattern. Since it’s the same pattern followed on each run (with same seed), the numbers outputted are only what we define aspseudo random.

Pseudo random numbers are good enough for most applications. If one needs a random sample of words in a dictionary, then a pseudo random number might be just fine. If one needs a random color for painting a computer desktop - a pseudo random number will be just fine.

Other applications crucially need random numbers. For instance the setup of a Secure Socket Layer connection in browsers. Early versions of the Netscape browser used a PRNG for generating random numbers, needed for SSL initial- ization. They seeded their PRNG with the concatenation of 3 values: Time of day, the process ID, and the parent process ID⁵. For a sophisticated attacker, these values are quite predictable, and thus the outputs were not random. Even though the precise values could not be derived, they could be approximated.

This would lower the range of values to try considerably, and brute forcing the correct value would become computationally feasible:

"Optimizations such as those described should allow even a remote attacker to break Netscape’s encryption in a matter of minutes."[daw96]

Choosing a seed with enough entropy, if the application requires it, is crucial for PRNG’s. Algorithms running in environments with I/O access, may calculate random numbers seeded by thermal or atmospheric noise. This raises complexity, and even requires dedicated hardware at demanding times.

Any PRNG conforming to requirements of cryptography, can be classified as a Cryptographically Secure Pseudo Random Number Generator (CSPRNG). The requirements are:

Next-bit testGiven the firstk bits of a random sequence, thek+ 1’th bit cannot be predicted by a polynomial-time running algorithm, with probability of success better than 50% [Yao82].

State compromise extensionsShould the PRNG’s state, or part of it, be revealed - it must be impossible to reconstruct any output previous to

5Random number generator attack: https://en.wikipedia.org/wiki/Random_number_

generator_attack#Predictable_Netscape_seed

(34)

that state⁶.

StegoBlock has to permute an array of characters, in such a way that it can be reversed only if one knows the key - and such that the permutation is one of every possible permutations. Technically, we need a cryptographically secure pseudo random number generator, for a randomizing algorithm.

Consider the strings= ”hello”. It has 5!(factorial) different permutations. A shuffling algorithm executed with inputs, must be able to return all5!permutations with equal probability, before we can begin considering it secure. Ob- viously such an algorithm has a need for randomness. Let us examine shuffling algorithms, also known as randomizing algorithms.

2.3.1.1 Shuﬄing

One of the most commonly known shuﬄing algorithms, proved to output one of all possible permutations, is the Fisher-Yates [FIS53] or Knuth Shuﬄe[Knu68].

The algorithm is very simple, first described by Fisher and Yates - later trans- lated to a computer algorithm by Donald Knuth. It can be seen in Algorithm 2.1⁷. The seen implementation has complexityO(n).

1 i n p u t: s t r i n g [ ] n

2 b e g i n

3 f o r i from n−1 downto 1 do

4 j ← random i n t e g e r s u c h t h a t 0 ≤ j ≤ i

5 e x c h a n g e n [ j ] and n [ i ]

6 end

Algorithm 2.1: The Knuth Shuﬄe.

In words, it will iterate the entire string, switching the current character and a random previous one (starting from the end). The string will be permuted and all possible permutations may be returned. We will not examine the proof, but in previously referenced materials, this has already been shown.

But one thing is the algorithm being solid on paper. There are several potential sources of bias to look out for when implementing. Let us examine some, as this will be very valuable in our later analysis of StegoBlock.

6CSPRNG requirements: https://en.wikipedia.org/wiki/Cryptographically_secure_

pseudorandom_number_generator#Requirements

7Wikipedia, Knuth Shuﬄe (Algorithm P):

(35)

2.3 Cryptography 25

Bad implementation It may seem obvious that an algorithm must be implemented correctly, however a developer can make subtle mistakes in even very simple algorithms. It may look as if results are correct(especially when generating random strings) - when in fact they are not. Notice in Algorithm 2.1 how j is a number drawn from the pool of remaining characters. One could make the mistake of drawing from all characters, introducing a bias[Atw07].

When examining this particular mistake, it may look as if we shuffle more than we should, which intuitively should be better for randomness, but slightly hurt performance. In reality, shuffling more, hurts randomness badly. Notice that the wrong implementation has nⁿ possible permutations (we iterate over n, and swap any of n). A correct Knuth Shuffle would have length(n)! possible permutations.

Consider running such a faulty algorithm on the array n = [1,2,3]. It would have27possible combinations, while it’s a mathematical fact that there should only be6possible combinations. Since 27 is not evenly divisible by 6, we cannot even assume that the extra outcomes are evenly distributed - there must be some bias. Paying extreme attention to detail when implementing is critical.

Scaling down state When our shuffle algorithm needs a random number for swapping characters, it needs a PRNG to do so. As we detailed earlier, this is a non trivial operation. Most PRNG’s are implemented in such a way that they provide random numbers in some fixed range. Internally, they may have random numbers in the range of0 to 2³²−1 or another fixed range. Applications may need random numbers in many other different ranges. Let us again consider our shuffle algorithm. In some setting, it might require some random number in the interval0−15. It will query some PRNG with an internal range of0−99.

A common way of fitting the result to the requested range, is to apply modulo length of requested range. This can produce a very subtle bias, if the internal state range is not divisible by the requested range. In this particular setting, we will see numbers0−3occurring17% more frequent than4−15, simply because 16does not divide evenly with100[wik].

One possible solution is to use a PRNG with a dynamic internal range, based on the request. Another, much more simple, is to not apply the modulo function.

If a number is drawn outside the desired range, request another. Keep doing this until a valid number is returned - even though this could potentially run forever.

Some PRNG’s have an internal range of0−1, but instead return floating points.

It is then common to multiply the result by the requested range, and round.

Again, this can introduce a bias, because of the finite precision of floating points.

(36)

The range of values producible would also be finite. If the number of values in the requested range, does not divide evenly by the floating point - we would again see a bias, similar to the one of PRNG’s applying modulo.

Scaling up state A third type of bias, also related to the inner workings of PRNG’s occur if it has too few distinct states. If it provide numbers in a range shorter of the requested. A RNG can never be used to securely permute any object into more distinct states, than it has internal states for. Consider again the example RNG that is seeded with 32 bits, e.g. can generate random numbers in the range of0to2³²−1. This will be enough to exceed the possible permutations of13card deck, as13!<2³²−1. A deck of 14 cards will however have more permutations. Typically the RNG will "wrap around" and reuse entropy, resulting in a bias.

2.3.1.2 Well known CSPRNG’s

It is impossible to prove that a sequence of numbers are truly random, but possible outcomes should appear with equal probability. NIST provides a series of tests to perform against a RNG, which is a good starting point[AR10]. As a rule of thumb in computer security, one should use existing tested primitives, instead of inventing/implementing their own.

A random number generator is either secure or not. If considered secure, one does not have to worry about the internal workings and potential bias we just iterated. A well known cryptographically secure PRNG is the Blum-Blum-Shub (BBS) PRNG[BS86]. BBS is a stream cipher and given some short input, it will generate a potentially infinite stream of pseudorandom output. BBS comes with a security proof, however now criticized for its impractical performance limitations[SS05].

Another approach is to use AES-CTR, AES in Counter mode. Similar to BSS, it may provide a potentially unlimited number of random bits, as it is a stream cipher. However according to NIST, one will have to reseed after2³²outputted bits, to ensure an adversary has a low advantage of predicting the output. It is like a sponge, we may soak it, squeeze out some random bits, but eventually run dry.

(37)

2.3 Cryptography 27

2.3.2 Integrity

When sending messages securely, we will also need to ensure message integrity.

Without integrity checks, an adversary may unknowingly, to the recipient, alter a message. The adversary may not be aware what he changes the message to, but his goal may simply be to obfuscate communication. One may falsely believe that a message unreadable by adversaries, for example if encrypted, is also safeguarded against tampering. Consider the theoretically perfect secure one-time-pad scheme⁸. Should an adversary change any bit of the cipher text, some plaintext bit will also change. Without integrity checks, the recipient will have no means of telling.

It is fairly simple to ensure integrity of a message. In fact, we already touched on this subject in Chaﬃng and Winnowing. Here we described how a MAC function was used to authorize messages, allowing only the recipient to successfully decide which messages are chaﬀ and which are winnow. We calculated a hash of our message, only the intended recipient could recalculate that hash.

A hashed MAC function is a cryptographic primitive that accepts a message and a secret key as input. It is a one-way function, it returns a digest, a fixed size output often much shorter than the input. The digest will change according to the input, even a small input change will aﬀect the digest greatly. See Figure 2.4 where we MD5-hash the values StegoBlock and stegoblock. Each input results in diﬀerent digests, but of equal length.

Given hash functions one-way nature, it is not possible to reverse the process and calculate the original message from a digest. It is however possible to have hash collisions, the scenario where two or more diﬀerent inputs generate the same digest. This is generally bad, as we can then no longer rely on the digest coming from the claimed message. There is then no way to tell if a certain message was altered or not. Depending on the strength of hash function, this is more or less common. An example of a broken or insecure hash function could be MD5, where even a normal household PC may calculate a collision. First examples of limited collisions were reported in 1993 By Den Boer and Bosselaers[dBB93] and in 2005 it was shown how to generate collisions within minutes on a standard notebook by Vastimil Klima[Kli06]. NIST currently (since August 5, 2015) rec- ommends using a minimum of SHA-256 hashing function for any usages[NIS15].

When hashing inputs very similar, for instanceHelloandhello, digests should also be very different. This is known as theavalanche effect, and is a result of good randomization. Without the avalanche effect, cryptanalysis would enable predictions of future digests. The Strict Avalanche Criterion states that if one

8One Time Pad: https://en.wikipedia.org/wiki/One-time_pad