Original Message Date: 06 Jun 91  13:48:41
From: Uucp on 1:125/777
To: Tom Jennings on 1:125/111
Subj: Ralph Merkle's paper 
^AINTL 1:125/111 1:125/777
From: toad.com!gnu (John Gilmore)
To:   pozar, tom
Date: Thu, 6 Jun 91 11:51:59 PDT

From postnews Thu Jul 13 03:10:10 1989
Subject: Merkle's "A Software Encryption Function" now published and Ť
available
Newsgroups: sci.crypt

Ralph Merkle called me today to let me know that Xerox was not going to
let him submit his paper on a nice new set of encryption and hash functions
to a journal for publication.  The story is that a division of Xerox
sells a lot of stuff to NSA and they threatened to pull their business
if Xerox publishes it.  There is no law that says NSA can stop Xerox
from publishing it -- it's just a "business decision" on Xerox's part.

Happily, however, I do not sell anything to the NSA.  And I have
a copy of the paper, which was distributed for several months
by Xerox, without any conditions, before NSA even heard of it.
The work was not government-sponsored or classified; there is no
law that lets the government suppress it.

As a courtesy to Xerox Corporation I could avoid publishing this
paper.  However, I prefer to extend the courtesy to the person who did
the work, Ralph Merkle, who would like to see it released and used.  I
do thank Xerox for supporting his excellent work and hope that they
will continue to do so.  Mr. Merkle did not ask or suggest that I
publish the paper, and should bear none of the blame (if any).

I have published and distributed a number of copies of this paper, and
I hereby offer to sell a copy of this paper to anyone who sends me $10
(cash preferred, checks accepted) and a return address.  Send your Ť
requests to:

        Merkle Paper Publishing
        PO Box 170608
        San Francisco, CA, USA  94117-0608

Since the paper is "published and is generally available to the public
through subscriptions which are available without restriction to any
individual who desires to obtain or purchase the published
information", it is exempt from State Department export control under
22 CFR 120.18 and 22 CFR 125.1 (a), and is exportable to all
destintions under Commerce Department General License GTDA under 15 CFR
379.3(a).  It can therefore be sent to foreign as well as US domestic
individuals.

I believe that the availability of fast, secure cryptography to the
worldwide public will enable us to build much more secure computer
systems and networks, increasing individual privacy as well as making
viruses and worms much harder to write.  For example, the Snefru
one-way hash function described in the paper would be a good choice for
validating copies of programs downloaded from BBS systems or the net,
to detect virus contamination.  If UUCP and TCP/IP links could be
encrypted with Mr. Merkle's Khafre or Khufu ciphers, simple monitoring
of phone wires or Ethernets would not yield login passwords, private
mail, and other serious security violations.  The technology exists;
all that stands in our way is a bureacracy that has no *legal* power
to restrict us, if we follow the published rules.

        John Gilmore

From postnews Thu Jul 13 03:31:56 1989
Subject: Online copy of Merkle's "A Software Encryption Function"
Newsgroups: sci.crypt
References: <7981@hoptoad.uucp>

[See the parent article <7981@hoptoad.uucp> for the legalese.  This
netnews article is exportable under 22 CFR 120.18, 22 CFR 125.1 (a),
and 15 CFR 379.3(a).  It can be sent to all foreign as well as US
domestic destinations.                                  -- John]

                             A Software Encryption Function
                                           by
                                    Ralph C. Merkle
                                       Xerox PARC
                                  3333 Coyote Hill Road
                                  Palo Alto, CA 94304


ABSTRACT

Encryption hardware is not available on most computer systems in use
today.  Despite this fact, there is no well accepted encryption
function designed for software implementation -- instead, hardware
designs are emulated in software and the resulting performance loss is
tolerated.  The obvious solution is to design an encryption function
for implementation in software.  Such an encryption function is
presented here -- on a SUN 4/260 it can encrypt at 4 to 8 megabits per
second.  The combination of modern processor speeds and a faster
algorithm make software encryption feasible in applications which
previously would have required hardware.  This will effectively reduce
the cost and increase the availability of cryptographic protection.


Introduction

The computer community has long recognized the need for security and
the essential role that encryption must play.  Widely adopted,
standard encryption functions will make a great contribution to
security in the distributed heavily networked environment which is
already upon us.  IBM recognized the coming need in the 1970's and
proposed the Data Encryption Standard.  or DES [19].  Although
controversy about its key size has persisted, DES has successfully
resisted all public attack and been widely accepted.  After some
debate its use as a U.S. Federal Standard has recently been reaffirmed
until 1992 [14].  However, given the inherent limitations of the 56
bit key size used in DES [16] it seems clear that the standard will at
least have to be revised at some point.  A recent review of DES by the
Office of Technology Assessment [15] quotes Dennis Branstad as saying
the 'useful lifetime' of DES would be until the late 199O's.

Despite the widespread acceptance of DES the most popular software
commercial encryption packages (for, e.g., the IBM PC or the Apple
Macintosh) typically offer both DES encryption and their own
home-grown encryption function.  The reason is simple -- DES is often
50 to 100 times slower than the home-grown alternative.  While some of
this performance differential is due to a sub-optimal DES
implementation or a faster but less secure home-grown encryption
function, it seems that DES is inherently 5 to 10 times slower than an
equivalent, equally secure encryption function designed for software
implementation.  This is not to fault DES -- one of the design
objectives in DES was quite explicitly a fast hardware implementation:
when hardware is available, DES is an excellent choice.  However, a
number of design decisions were made in DES reflecting the hardware
orientation which result in slow software performance -- making the
current extensive use of DES in software both unplanned-for and rather
anomalous.

Having offered a rationale for an encryption function designed for
software implementation, we now turn to the design principles,
followed by the actual design.


Basic Principles

The basic design principles in DES seem sound.  The fact that DES has
not been publicly broken speaks in their favor.  However, upon
examining specific design decisions in DES, we find that several
should be revised -- either in light of the software orientation of
the new encryption function, or because of the decreased cost of
hardware since the early '70's.  Examining the basic design decisions
one by one, and in no particular order, we can decide what reasonably
should be changed.

The selection of a 56 bit key size is too small, a problem which can
be easily remedied.  This subject has already been debated
extensively, and while 56 bits seems to offer just sufficient
protection for many commercial applications, the negligible cost of
increasing the key size virtually dictates that it be done.

The extensive use of permutations is expensive in software, and should
be eliminated -- provided that a satisfactory alternative can be
found.  While permutations are cheap in hardware and provide an
effective way to spread information (also called `diffusion' [21])
they are not the best choice for software.  In the faster
implementations of DES, the permutations are implemented by table
look-ups on several bits at once.  That is, the 48-to-32 bit
permutation P is implemented by looking up several bits at once in a
table -- where each individual table entry is 32 bits wide and the
table has been pre-computed to contain the permutation of the bits
looked up.  Using a table-lookup in a software encryption function
seems a good idea and can effectively provide the desired `diffusion'
-- however there seems no reason to limit such a table to being a
permutation, Having once paid the cost of looking up an entry in a
table, it seems preferable that the entry should contain as much
information as possible rather than being arbitrarily restricted to a
small range of possibilities.

Each individual S-box in DES provides only 64 entries of 4 bits each,
or 32 bytes per S-box.  Memory sizes have greatly increased since the
mid 1970's when DES was designed, and larger S-boxes seem appropriate.
More subtly, DES uses 8 S-boxes and looks up 8 different values in
them simultaneously.  While this is appropriate for hardware (where
the 8 lookups can occur in parallel) it seems an unreasonable
restriction for software.  In software, each table lookup must follow
the preceding lookups anyway -- for that is the nature of sequential
program execution.  It seems more valuable cryptographically to make
each lookup depend upon the preceding lookup, because in software such
dependency is nearly free.  This means that the cascade of
unpredictable changes that are so central to DES-type encryption
functions can achieve greater depth with fewer lookups.  Looked at
another way, DES has a maximum circuit depth of 16 S-boxes, even
though it has a total of 128 S-box lookups.  If those same 128 S-box
operations were done sequentially, with the output of each lookup
operation altering the input to the next lookup, then the maximum
circuit depth would be 128 S-boxes -- eight times as many and almost
certainly providing greater cryptographic strength.  This change would
have very little impact on the running time of a software
implementation on a typical sequential processor.  A larger S-box size
and sequential (rather than parallel) S-box usage should be adopted.

The initial and final permutations in DES are widely viewed as
cryptographically pointless -- or at least not very important.  They
are therefore discarded.

The key schedule in DES has received mild criticism for not being
sufficiently complicated[9].  In practice, all of the faster DES
software implementations pre-compute the key schedule.  This
pre-computation seems a good idea when large volumes of data are being
encrypted -- the pre-computation allows a more leisurely and careful
arrangement of the encryption tables and means the actual encryption
function can more rapidly scramble the data with less effort.  A more
complex key pre-computation therefore seems desirable.

Finally, the design criteria used for the DES S-boxes were kept
secret.  Even though there is no particular reason to believe that
they conceal a trap door, it would seem better if the criteria for
S-box selection were made explicit, and some sort of assurances
provided that the S-boxes were actually chosen randomly in accordance
with the published criteria.  This would both quiet the concerns about
trap doors, and also allow a fuller and more explicit consideration of
the S-box selection criteria.

With this overview of design principles we can now proceed to the design.


Khufu, Khafre and Snefru

There are actually two encryption functions named Khufu and Khafre,
and a one-way hash function named Snefru (all three names were taken
from the Pharoahs of ancient Egypt following a suggestion by Dan
Greene.  To quote the Encyclopedia Britannica, "The ideal pyramid was
eventually built by Snefru's successor, Khufu, and the first --- the
Great Pyramid at Giza --- was the finest and must successful."  Khafre
was Khufu's son).

The basic hardware model around which they are optimized is a 32-bit
register oriented microprocessor.  The basic operations are 32-bit
load, store, shift, rotate, `xor' and `and'.

The two encryption functions are optimized for somewhat different
tasks, but use very similar design principles.  Khufu is designed for
fast bulk encryption of large amounts of data.  To achieve the fastest
possible speed, the tables used in encryption are pre-computed.  This
pre-computation is moderately expensive, and makes Khufu unsuited for
the encryption of small amounts of data.  The other encryption
function -- Khafre -- does not require any pre-computation.  This
means Khafre can efficiently encrypt small amounts of data.  On the
other hand, Khafre is somewhat slower than Khufu for the encryption of
large volumes of data because it takes more time to encrypt each
block.

The one-way hash function -- Snefru -- is designed to rapidly reduce
large blocks of data to a small residue (perhaps 128 or 256 bits).
Snefru requires no pre-computation and therefore can be efficiently
applied to small arguments.  Snefru will be used exclusively for
authentication purposes and does not provide secrecy.

We first discuss the design of Khufu.


Khufu

Khufu is a block cipher operating on 64-bit blocks.  Although
increasing block size was a very tempting design alternative, the
64-bit block size of DES has not been greatly criticized.  More
important, many systems built around DES assume that the block size is
64 bits.  The pain of using a different encryption function is better
minimized if the new encryption function can be easily 'plugged in' in
place of the old -- which can be done if the block size is the same
and the key size is larger.  The new encryption function essentially
looks exactly like the old encryption function -- but with some new
keys added to the key space.  Increasing the block size might have
forced changes in more than just a single subroutine -- it might (for
example) have forced changes in data formats in a communications
systems.

Khufu, like DES, is a multi-round encryption function in which two
32-bit halves (called L and R for Left and Right) are used alternately
in the computations.  Each half is used as input to a function F,
whose output is XORed with the other half -- the two halves are
exchanged and the computation repeated until the result appears to be
random (no statistically detectable patterns).  Khufu uses a different
F-function than DES -- and uses different F-functions during the
course of encryption.  One round of DES uses an F-function defined by
8 table lookups and associated permutations.  By contrast, one round
of Khufu uses a single table lookup in a larger S-box.  In addition,
in the first step of encryption (prior to the main loop) the plaintext
is XORed with 64 bits of key material, and again in the final step of
encryption (following the main loop) the 64-bit block is XORed with
another 64 bits of key material to produce the ciphertext.

In somewhat more detail, the 64-bit plaintext is first divided into
two 32-bit pieces designated L and R.  L and R are then XORed with two
32-bit pieces of auxiliary key material.  Then the main loop is
started, in which the bottom byte from L is used as the input to a
256-entry S-box.  Each S-box entry is 32-bits wide, The selected
32-bit entry is XORed with R.  L is then rotated to bring a new byte
into position, after which L and R are swapped.  The S-box itself is
changed to a new S-box after every 8 rounds (which we shall sometimes
call an `octet').  This means that the number of S-boxes required
depends on the number of rounds of encryption being used.  Finally,
after the main loop has been completed, we again XOR L and R with two
new 32-bit auxiliary key values to produce the ciphertext.

For efficiency reasons, we restrict the number of rounds to be a
multiple of 8, i.e., an integral number of octets.  If the main
encryption loop is always executed a multiple of 8 times, then it can
be unrolled 8 times -- which is substantially more efficient than the
definitionally correct but inefficient versions given in this paper.
For this reason, the variable `enough' given below must be an exact
multiple of 8.  Various integer calculations will not work correctly
for values of 'enough' that are not multiples of 8.  Encryption of a
single 64-bit plaintext by Khufu can be viewed algorithmically as
follows:

   L, R: int32;
   enough: integer; -- the security parameter, default of 16 seems Ť
appropriate.
              -- values of 8, 16, 24, 32, 40, 48, 56, and 64 are possible.
   SBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32; -- key Ť
material
   AuxiliaryKeys: ARRAY[1 .. 4] OF int32; -- additional key material
   rotateSchedule: ARRAY [1 .. 8] := [16,16,8,8,16,16,24,24];
   octet: integer; -- really (round + 7)/8
             -- it keeps track of which 8-round 'octet' we are currently in

   L := L XOR AuxiliaryKeys[l];
   R := R XOR AuxiliaryKeys[2];
   octet := 1;
   FOR round := 1 to enough DO -- Note that 'enough' must be a multiple of Ť
8 
      Begin
         R := R XOR SBoxes[octet] [L AND #FF];
         L := RotateRight[L, rotateSchedule[(round-1) mod 8 + 1] ];
         SWAP[L,R];
         if (round mod 8 = 0) then octet := octet + 1;
      End;

   L := L XOR AuxiliaryKeys[3];
   R := R XOR AuxiliaryKeys[4];

Notationally, it will be convenient to index the different variables
at different rounds.  This indexing is explicitly given by re-writing
the above algorithm and replacing L and R with arrays.  In addition,
we add the array 'i' to denote the indices used to index into the
S-box.

   L, R: ARRAY [0 .. enough] OF int32;
   enough: integer; -- the security parameter, default of 16 seems Ť
appropriate.
              -- values of 8, 16, 24, 32, 40, 48, 56, and 64 are possible.
   i: ARRAY[0 .. enough] OF int8; -- 8-bit bytes
   SBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32; -- key Ť
material
   AuxiliaryKeys: ARRAY[1 .. 4] OF int32: -- additional key material
   rotateSchedule: ARRAY [1 .. 8] := [16,16,8,8,16,16,24,24];
   octet: integer; -- really (round + 7)/8
              -- it keeps track of which 8-round 'octet' we are currently Ť
in

   L[0] := L[-1] XOR AuxiliaryKeys[l];
   R[0] := R[-1] XOR AuxiliaryKeys[2];

   octet := 1;

   FOR round := 1 to enough DO -- Note that `enough' must be a multiple of Ť
8
      Begin
         i[round] := L[round-1] AND #FF
         L[round] := R[round-1] XOR SBoxes[octet] [ i[round] ];
         R[round] := RotateRight[ L[round-1], rotateSchedule[(round-1) mod Ť
8 + 1] ];
         if (round mod 8 = 0) then octet := octet+ 1;
      End;

   L[enough + 1] := L[enough] XOR AuxiliaryKeys[3];
   R[enough + 1] := R[enough] XOR AuxiliaryKeys[4];

The plaintext is (by definition) [L[-1], R[-1]], while the ciphertext
is [L[enough + 1], R[enough + 1]].  By definition, round 1 computes
L[1] and R[1] from L[0] and R[0], using index 1 -- or i[1].
Similarly, round n computes L[n] and R[n] from L[n-1] and R[n-1] using
i[n].  We shall sometimes say that 'round' 0 computes L[0] and R[0]
from L[-1] and R[-1], and that 'round' enough + 1 computes L[enough + 1]
and R[enough + 1] from L[enough] and R[enough].

The primary purpose of the rotation schedule is to bring new bytes
into position so that all 8 bytes of input are used in the first 8
rounds (or first octet).  This means that a change in any single input
bit is guaranteed to force the use of a different S-box entry within 8
rounds, and so initiate the cascade of unpredictable changes needed to
hide the input.  A secondary purpose of the rotation schedule is to
maximize the number of rotates by 16 because they tend to be faster on
many microprocessors.  For example, the 68000 has a SWAP instruction
which is equivalent to rotating a 32-bit register by 16 bits.  Also,
rotation by 16 tends to be very fast on processors with 16 bit
registers -- simply by altering one`s viewpoint about which register
contains the lower 16 bits and which register contains the upper 16
bits it is possible to perform this operation with no instructions at
all.  The final purpose of the rotation schedule was to restore the
data to its original rotational position after each octet of 8 rounds.
Thus, the sum of the rotations is equal to 0 module 32.

A different S-box is used after each octet of encryption.  This has
two beneficial effects: first, it means that the same S-box entry will
never be used twice with the same rotational alignment.  That is, if a
single S-box were used for all octets, then it might be that i[1] (the
index used to select an S-box entry on the first round) and i[9] might
be the same -- and therefore the same S-box entry would be used in
rounds 1 and 9.  These identical S-box entries would cancel each other
out because a value XORed with itself produces 0.  (If i[1] = i[9],
then SBox[i[1]] XOR ...stuff... XOR SBox[i[9]] would equal
...stuff...)  Both i[1] and i[9] would have had no effect on the
encryption process.  This would weaken the encryption function.  If,
however, the S-box is changed after every octet then even if i[1] =
i[9], cancellation is very unlikely to occur (because SBoxes[1][i[1]]
is almost certainly different from SBoxes[2][i[9]], even though i[1] =
i[9]).  A second beneficial effect is to insure that the encryption
process is entirely different during the second octet than in the
first octet.  If the same S-box were used, then the second octet would
compute the same function as the first octet -- which can be a serious
weakness.

The parameter 'enough' is used because encryption must continue for
enough rounds to obscure and conceal the data.  The exact number of
rounds that is sufficient will no doubt be a matter of considerable
debate -- it is left as a parameter precisely so that those who wish
greater security can use more rounds, while those who are satisfied
with fewer rounds can encrypt and decrypt data more rapidly.  It seems
very unlikely that fewer than 8 rounds (one octet) will ever be used,
nor more than 64 rounds (8 octets).  The author expects that almost
all applications will use 16, 24, or 32 rounds.  Values of 'enough'
that are not multiples of 8 are banned.


Pre-Computing the S-Boxes

While 128 bits of key material is used at the start and finish of the
encryption process (e.g., 64 bits at the start and 64 bits at the
finish from the 128-bit array 'AuxiliaryKeys'), most of the key
material is mixed in implicitly during the encryption process by
selection of entries from the S- boxes.  All the S-boxes (along with
the 128 bits of auxiliary key material) are pre-computed from a
(presumably short) user supplied key.  The S-boxes ARE most of the
key.  This raises the question of how the S-boxes are computed and
what properties they have.  While the specific method of computing the
S-boxes is complex, the essential idea is simple: generate the S-boxes
in a pseudo-random fashion from a user supplied key so that they
satisfy one property: all four of the one-byte columns in each S-box
must be permutations.  Intuitively, we require that selection of a
different S-box entry change all four bytes produced by the S-box.
More formally, (where `#' means 'not equal to' and SBoxes[o][i][k]
refers to the kth byte in the ith 32-bit entry of the SBox used during
octet 'o'): for all o, i, j, k; i # j implies SBoxes[o][i][k] #
SBoxes[o][j][k].

We can divide the pre-computation of a pseudo-random S-box satisfying
the desired properties into two parts: first, we generate a stream of
good pseudo-random bytes; second, we use the stream of pseudo-random
bytes to generate four pseudo-random permutations that map 8 bits to 8
bits.  These four pseudo-random permutations ARE the generated S-box.
We repeat this process and compute additional S-boxes until we have
enough for the number of rounds of encryption that we anticipate.

We could generate a stream of pseudo-random bytes using an encryption
function -- but we have no S-box to use in such an encryption
function!  To circumvent this circularity problem, we can assume the
existence of a single 'initial' S-box.  Although we must get this
initial S-box from somewhere, for the moment we assume it exists and
satisfies the properties described earlier.  We will discuss where it
comes from later.

We (rather arbitrarily) adopt a 64-byte 'state' value for our
pseudo-random byte-stream generator.  That is, the user-provided key
is used to initialize a 64-byte block (which effectively limits the
key size to 512 bits -- this does not seem to be a significant limit).
This 64-byte block is then encrypted using Khufu (using the standard
S-box for all octets, and setting the auxiliary keys to 0) in cipher
block chaining mode.  (Although use of a single S-box for all rounds
will result in occasional cancellations as described earlier, this is
acceptable for this particular application.)  This provides 64
pseudo-random bytes.  When these 64 bytes have been used, the 64-byte
block is again encrypted, providing an additional 64 pseudo-random
bytes.  This process is repeated as long as more pseudo-random bytes
are needed.

Now that we have a stream of pseudo-random bytes, we must convert them
into the needed permutations.  We adopt the algorithm given in Knuth
Vol II.  In this algorithm, we start with some pre-existing (not
necessarily random) permutation.  For our purposes, we can start with
the initial S-box.  We then interchange each element in the initial
permutation with some other randomly chosen element, thus producing a
random permutation.  In a pseudo programming language we have:

FOR octet := 1 to enough/8 DO
   SBox:= initialSBox;
   FOR column := 0 to 3 DO
      BEGIN
          FOR i := 0 to 255 DO
             BEGIN
                randomRow := RandomInRange[i,255]; -- returns a random Ť
number
                                             -- between i and 255, Ť
inclusive
                SwapBytes [SBox[i,column], SBox[randomRow,column]];
             END;
      END;
   SBoxes[octet] := SBox;
END;

The routine 'RandomInRange' uses the stream of random bytes to
actually generate a number in the requested range.


Khafre

The design of Khafre is similar to the design of Khufu except that
Khafre does not pre-compute its S-box.  Instead, Khafre uses a set of
standard S-boxes (discussed in the next section -- note that the
standard S-boxes are different from the one initial S-box).  The use
of standard S-boxes means that Khafre can quickly encrypt a single
64-bit block without the lengthy pre-computation used in Khufu;
however it also means that some new mechanism of mixing in key
material must be adopted because the standard S-boxes can not serve as
the key.  The mechanism of key-mixing is simple -- key material is
XORed with the 64-bit data block before the first round and thereafter
following every 8 rounds.  A consequence of this method is that the
key must be a multiple of 64 bits -- it is expected that 64-bit and
128-bit key sizes will typically be used in commercial applications.
Arbitrarily large key sizes can be used, though this will slow down
encryption.

We can summarize Khafre as follows:

   L, R: int32;
   standardSBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32;
   key: ARRAY [0 .. enoughKey-1] OF ARRAY [0 .. 1] of int32:
   keyIndex: [0 .. enoughKey-1];
   rotateSchedule: ARRAY [1 .. 8] := [16,16,8,8,16,16,24,24];

   L := L XOR key[0][0];
   R := R XOR key[0][1];
   keyIndex := 1 MOD enoughKey;
   round := 1;
   octet := 1;
   WHILE (round <= enough) DO
       Begin
          L := L XOR standardSBoxes[octet] [R AND #FF];
          R := RotateRight[R, rotateSchedule[round mod 8 + 1] ];
          SWAP[L,R];
          IF round MOD 8 = 0 THEN
              BEGIN
                L := L XOR RotateRight[ key[keyIndex][0], octet];
                R := R XOR RotateRight[ key[keyIndex][1], octet];
                keyIndex := keyIndex + 1;
                IF keyIndex = enoughKey THEN keyIndex := 0;
                octet := octet + 1;
              END;
          round := round + 1;
       End;
       IF keyIndex # 0 THEN ERROR;

We again require that the number of rounds be a multiple of 8 for
efficiency reasons.  In addition, we will require that the key size
evenly divide the number of octets + 1, That is, we require that
enoughKey MOD (enough/8 + 1) = 0. (This requirement is shown in the
code by the final test and error condition).  This condition is
imposed to insure that decryption can be done correctly.  If we did
not impose this condition, then we would have to compute the correct
value of 'keyIndex' to use when we started to decrypt.  For example,
if we used a 128-bit key for 32 rounds to encrypt a 64-bit plaintext,
then the final entry used in the key array would be key[1].  If,
however, we had encrypted for 24 rounds the final entry would have
been key[0].  Computing the correct value of keyIndex with which to
start the decryption process would require additional computational
steps -- it would require a MOD operation which is computationally
expensive.  To avoid this, we simply impose the condition that we can
always decrypt by starting from the last entry in the key array
(keyIndex = enoughKey-1).  This effectively imposes the constraint
that (enough/8 + 1)/enoughKey is exactly 0.

Khafre will probably require more rounds than Khufu to achieve a
similar level of security because it uses a fixed S-box.  In addition,
each Khafre round is somewhat more complex than each Khufu round.  As
a consequence of these two factors, Khafre will take longer than Khufu
to encrypt each 64-bit block.  In compensation for this slower
encryption speed, Khafre does not require pre-computation of the S-box
and so will encrypt small amounts of data more quickly than Khufu.


Snefru

Snefru is a one-way hash function.  One-way hash functions are
fundamentally intended for authentication, not secrecy, and so their
design is somewhat different than that of conventional encryption
functions.  However, many of the same basic principles are applicable.
Snefru in particular uses many of the design principles used in Khufu
and Khafre.

One way hash functions (also known as MDC's (Manipulation Detection
Codes)) have been generally known for some time.  The first definition
was apparently given by Merkle [1,2] who also gave a method of
constructing one-way hash functions from random block ciphers.  A more
recent overview was given by Jueneman, Matyas, and Meyer[11] and
Jueneman[4].

The major use of one-way hash functions is for authentication.  If a
value y can be authenticated, then we can authenticate x by computing:

   y = F(x)

No other input x' can be found (although they probably exist) which
will generate y.  A 128 bit y can authenticate an arbitrarily large x.
This property is crucial for the convenient authentication of large
amounts of information.

Broadly speaking, there are two definitions for one-way hash
functions.  The first definition is for a `weak' one-way hash
function.  A weak one-way hash function is a function F such that:

   1.) F can be applied to any argument of any size.  For notational
convenience, F applied to more than one argument is equivalent to F
applied to the bit-wise concatenation of all its arguments.

   2.) F produces a fixed size output. (The output might be 64 bits).

   3.) Given F and x, it is easy to compute F(x).

   4.) Given F and a 'suitably chosen' (e.g., random) x, it is
computationally infeasible to find a x' # x such that F(x) = F(x').

The phrase 'computationally infeasible' used above can be replaced
with any one of several more precise definitions -- each definition
will in turn result in a somewhat different definition of a one way
hash function.  Snefru is intended to be a 'random' one-way hash
function -- e.g., for all practical purposes it can be modelled by a
very large table of random numbers, or by a `demon' who selects a
random number whenever we wish to compute some output value for
Snefru.  This is discussed further in [18].

In a weak one way hash function there is no guarantee that it's
computationally infeasible to find pairs of inputs that produce the
same output.  That is, it might be that F(z) = F(z') for some inputs z
and z' that someone in fact found.  However, if x # z and x # z', then
this doesn't matter.  Clearly, we must prevent someone from
deliberately picking x so that it is equal to a value of z or z' which
they know has this undesirable property.  What's more, it must be
difficult to generate too many z-z' pairs (and it clearly must be
impossible to generate z-z' pairs in which we can effectively control
the value of z or z') or even a randomly chosen x would not be safe.
Thus, choosing x in a random fashion should make it unlikely that
anyone can find an x' such that F(x)=F(x').  It is possible, however,
to choose x in a non-random fashion and still be safe, if we assume
that F is random (a reasonable assumption for Snefru), then any method
of choosing x that does not depend on F is effectively random with
respect to F and therefore suitable.  This observation is sometimes
important in practice, for it allows simpler (therefore faster and
cheaper) methods of selecting x.

Weak one-way hash functions also suffer from the problem that repeated
use weakens them.  That is, if a weak one-way hash function is used
to hash 1,000 different random values for x (which might represent
1,000 different messages signed by a 1,000 different people) then the
chances of finding an x' such that one of the thousand values of x
satisfies F(x) = F(x') might be 1,000 times higher.  As more people
sign more messages with the same weak one-way hash function, the
overall security of the system is degraded.  This undesirable state of
affairs can be dealt with by parameterization, which is the approach
taken with Snefru.  We simply define a family of one-way hash
functions with the property that each member Fi of the family is
different from all other members of the family, and so any information
about how to break Fi will provide no help in breaking Fj for i # j.
If the system is designed so that every use of a weak one-way hash
function is parameterized by a different parameter, then overall
system security can be kept high.

The alternative definition is of a `strong' one-way hash function.  A
strong one-way hash function is a function F such that:

    1.) F can be applied to any argument of any size.  For notational
convenience, F applied to more than one argument is equivalent to F
applied to the bit-wise concatenation of all its arguments.

    2.) F produces a fixed size output.  (The output might be 128 bits).

    3.) Given F and x, it is easy to compute F(x).

    4.) Given F, it is computationally infeasible to find any pair x,
x' such that x # x' and F(x) = F(x').

Strong one-way hash functions are easier to use in systems design than
weak one-way hash functions because there are no pre-conditions on the
selection of x, and they provide the full claimed level of security
even when used repeatedly (or misused either because of error or
sub-optimal system design).  Many authors recommend the exclusive use
of strong one-way hash functions[4,11] because of the practical
difficulties of insuring that the pre-conditions on x imposed by a
weak one-way hash function are met, and the increased problems in
insuring that different parameters are selected for each use.  In
practice, the constraints imposed by use of a weak one- way hash
function imply that x cannot be chosen by an agent who has a motive to
break the system.  If A signs a message which B provides, then A will
have to 'randomize' the message by some means before signing it.  If A
fails to randomize the message, then B can select a 'rigged' message,
and later surprise A by exhibiting a novel message and showing that A
signed it.

Weak one-way hash functions can be quite valuable in some system
designs, although they must be used with caution.  Parameterized weak
one-way hash functions in particular appear to be under-rated.

Snefru can be used as either a weak or a strong one-way hash function,
with or without parameterization depending on the desires of the
system designer.

A one-way hash function F accepts an arbitrarily large input --
however, it is much easier to specify a function that accepts a fixed
size input.  We will, therefore, follow a two-step procedure to define
F.  First, we will define a fixed size function F0 which has the same
properties as F but which accepts a fixed size input, and then we will
use F0 to construct F.  By definition, F0 has properties 2, 3, and 4
listed above for F; but replaces property 1 (which says that F can
have an unlimited input size) with the simpler property that F0 can
accept only a fixed size input.

A fixed-size strong one-way hash function F0 has the following properties:

   1.) F0 can be applied to a fixed size input argument (the input
might be 256 bits).  The size of the input must be larger than the
size of the output.  For notational convenience, F0 applied to more
than one argument is equivalent to F0 applied to the bit-wise
concatenation of all its arguments.

   2.) F0 produces a fixed size output. (The output might be 128 bits).

   3.) Given F0 and x, it is easy to compute F0(x).

   4.) Given F0, it is computationally infeasible to find any pair x,
x' such that x # x' and F0(x) = F0(x').

   It is often convenient if the following property is also satisfied:

   5.) Given F0(x), and a randomly chosen x, it is computationally
infeasible to determine x.

If we view x as an array, then we can define F(x) in terms of F0 in the Ť
following fashion:

FUNCTION F(x[1 .. n])
BEGIN
   result := 0;
   FOR i := 1 to n DO
      result := F0(result,x[i]);
   END DO;
   RETURN(result);
END;

(Note that x can be padded with zero's at the end to insure that x[n]
is of the correct size).

We have added property 5 to our fixed-size strong one-way hash
function because it will be convenient and useful to use F0 as a
normal one-way function -- that is, the input is chosen randomly and
is kept secret, while the output is made public.  When used in this
way, it is not necessary that we reduce the size of the input.  For
this reason, we will only require that F0 have this property, and will
not attempt to show that F also has this property.

We can show that F must satisfy properties 1 through 4 if F0 satisfies
properties 2 through 4, and if F0 accepts only a fixed size input.
From the program defining F it is obvious that it will accept an input
of arbitrary size, so property 1 is satisfied.  It is also obvious
that F will produce a fixed size output -- which is the size of
'result' -- so property 2 is satisfied.  Property 3 follows because
computation of F(x) requires time linear in the size of x -- (which we
actually take as the definition of 'easy') -- and because computation
of F0 is 'easy'.  The fact that property 4 holds can be shown
inductively as follows:

Clearly, if n = 1, then property 4 holds for it holds for F0.  Assume,
then, that the property holds for n-1, and we wish to prove it for n.
We know that:

       result := F0( F(x[1 .. n-1]), x[n])

From the fact that property 4 holds for F0 it follows that neither
F(x[1 .. n-1]) nor x[n] could have been changed -- if they had been,
we would have two inputs to F0 that produced the same output.  But if
F(x[1 .. n-1]) has not been changed, then x[1 .. n-1] has not been
changed, by the induction hypothesis.  Q.E.D.

The preceding paragraphs mean simply that in order to design a strong
one-way hash function we need to design a fixed-size strong one-way
hash function F0, from which we can then build F.  In what follows, we
shall define F0 and present intuitive arguments that it is difficult
to break.

Traditionally, one-way hash functions have been designed by treating the Ť
input, x, as the key and
then encrypting a fixed size block.  We will pursue a different approach.  Ť
In this approach, we will
treat the input, x, as the input to a block cipher which uses a fixed key. Ť
 We will then encrypt x
and exclusive or the output of the encryption function with x.  That is,

      F0(x) is defined as: E(0, x) XOR x

      where E(key, plaintext) is a 'good' conventional block cipher.

We then retain as many bits of the output as we desire.

This general approach has been used before[12,23] but must still be
justified.  We prefer this approach to the more traditional approach
(in which x is treated as the key to an encryption function) because
it is faster.  In the traditional approach, it is necessary to mix the
key with a fixed plaintext block during the encryption process.  This
mixing process requires additional steps.  In particular, the key must
be repeatedly re-loaded from memory to be re-mixed with the block
being encrypted. (By way of example, consider that the 56 bit key used
in DES is actually expanded internally into a 768-bit key, so that
each bit of key material can be mixed into the block being encrypted
several times).  On the other hand, if we treat x as the input to a
block cipher then we can use a fixed key, need do no key mixing, and
can still provide excellent security.  To show that security is
preserved using this method, we will appeal to the intuition that a
'good' encryption function appears to be random.  That is, any change
to the input will produce an unpredictable and apparently random
change in the output.  E(0, x) is totally unrelated to E(0, x XOR 1)
-- changing a single bit produced a 'random' change.  We presume that
there is no effective method of analyzing E and that therefore it must
be viewed as a 'black box' -- it's possible to encrypt or decrypt, but
it's not possible to make any prediction about the value that will be
produced (unless we've already encrypted or decrypted that particular
value).

If E is random in the sense discussed above, then E(0, x) is random --
even if x is highly structured.  Therefore E(0, x) XOR x is random and
cannot be predicted from a knowledge of x.  To determine an x' such
that F0(x) = F0(x'), x' must (by definition) satisfy the equation:

      E(0, x) XOR x = y = E(0, x') XOR x'

However, if we assume that the only operations we can perform for the
cryptanalysis are encryption and decryption of a block, i.e., the
computation of E(0,w) or D(0,w) (where D stands for Decryption) for
any value of w that we choose, then our chances of actually finding x'
are little better than chance.  We can select some w by any method we
desire, and then compute E(0,w) -- but this will produce a nearly
random value which, when XORed with w, will have a very small
probability of being equal to y.  If we operate in the reverse fashion
and compute D(0,w) XOR w it too, for the same reason, will only equal
y by chance.

The critical question that we cannot answer with absolute reliability
is whether F0 is in fact 'random' in the sense needed for the
foregoing proof.  This question (at present) can only be answered
empirically by means of a certificational attack -- we have been
unable to break this system, and so we hope that it cannot be broken.

We will in fact propose 3 fixed-size one-way hash functions: HASH128,
HASH256, and HASH512 which hash down 128, 256, or 512 bits,
respectively.  We will then define the final hash function, HASH, in
terms of these four basic functions.  The primary purpose of providing
4 one-way hash functions is efficiency -- if we wish to hash only 128
bits it is significantly more efficient to use a function which
accepts a 128 bit input than to use a function which accepts a 512 bit
input and zero-pad the 128-bit value out to 512 bits.

In addition, HASH, HASH128 and HASH256 accept a 64-bit parameter,
while HASH512 accepts a 96-bit parameter.  The reader can ignore these
parameters without loss of continuity, the general ideas presented
below do not depend on them.  It is quite possible (and simpler) to
design a system that does not use these parameters -- however, we
might have to double the amount of authentication information that we
store.  That is, the output would have to be around 128 bits to
provide good security[4].  Whether or not this is a significant
problem will depend on the specific system being designed and the
particular way the hash functions are being used.

In essence the additional parameters can be used to make exhaustive
search attacks harder.  The basic idea will be familiar to those who
have thought about the problems of applying a one-way function to
passwords -- if the same one-way function is used for all users, then
the fact that two users used the same password is readily apparent if
you examine the password file that contains the encrypted result for
each user.  Furthermore, by applying the one-way function to all the
words in a dictionary, it is possible to recover all passwords that
are English words.  However, if each one-way function used to encrypt
each user's password is slightly different from all the other one-way
functions, then any would-be system cracker must encrypt each word in
the dictionary not with a globally known and fixed one-way function,
but with each individual user's one-way function before recovering all
passwords that are English words.

In a similar way, if HASH is used throughout a system, then we can
reasonably expect that many arguments x0, x1, x2,....  will have been
hashed into many results y0, y1, y2,....  Now, if an interloper wishes
to attack the system he need only find a false xbad that maps onto
some legitimate yi: that is, HASH(xi) = yi = HASH(xbad).  This clearly
is a problem because it might let the interloper 'authenticate' the
false xbad when in fact it is not authentic.  The more y's there are,
the greater the probability that a randomly chosen x will map onto one
of them.  Thus, the more HASH is used, the more popular it becomes,
the easier it will be to find an xbad for some legitimate xi such that
HASH(xi) = yi = HASH(xbad).  If, however, each use of HASH is
parameterized with a different value, then the interloper must attack
each use of HASH as though it were an entirely different hash function
-- as indeed it will be if the hash function is correctly designed.
More precisely, if the interloper finds an xbad such that HASH(p,xi) =
yi = HASH(p',xbad) this will do him absolutely no good as long as p
and p' are not equal.

This parameter must be passed from HASH to the basic functions
HASH128, and HASH256 which are used to define it.  HASH512 must not
only receive the parameter passed from HASH, but must also receive an
additional parameter because HASH512 is used repeatedly when hashing
down a large argument.  That is, hashing a one-gigabyte file will need
some 16,000,000 usages of HASH512 -- if two of these uses produced the
same result, then we could simply delete the part of the file between
them and produce a shortened file that would still produce the same
result.  To prevent this, all 16,000,000 uses of HASH512 invoked by
HASH to hash down the one-gigabyte file must accept an additional
parameter to insure that each use is different.

We start with the design of HASH512 -- the largest fixed-size one-way
hash function and therefore the most efficient for hashing large
blocks of data.

Conceptually, the hash functions all return 128 bits.  If fewer bits
are needed (e.g., adequate security is provided by 128 bits), then
trailing bits are discarded.  In practice, it will often be more
efficient to return only as many bits as are required.

HASH512 accepts an input, x, and two parameters which are named p64 (a
64-bit parameter) and p32 (a 32-bit parameter).  Because HASH512
accepts p64 and p32 it is necessary to add these arguments to the
conventional block cipher E which is used to define HASH512. (Note
that we assume E returns a full 512-bit value -- conceptually we
discard any unneeded bits after E512 has been computed).  We can now
provide a definition of HASH512 in terms of E512 as follows:

      HASH512(p64, p32, x) = leading 128 bits of ( E512(p64, p32, 0, x) Ť
XOR x)

If we now specify E512(p64, p32, 0, x) -- a conventional block cipher
that encrypts a 512-bit block with parameters 'p64' and 'p32' and
which uses a fixed key -- our task is complete.

In essence, we extend the design of Khufu by assuming a block size of
512 bits (16 32-bit words).  This block size was selected as a
compromise between two factors.  We can more efficiently hash more
data if the block size is large.  On the other hand, if the block size
is too large it won't fit into the registers on the computer
implementing the hash function, so parts of the block will have to be
repeatedly loaded and stored.  Most modern RISC chips have many
registers (more than 16 32-bit registers), so on most of these chips
the entire 512-bit block being encrypted can be kept in registers.
There will then be no need to load or store parts of the block during
computation of the hash function.

We modify the basic encryption loop for Khufu as follows: instead of
two halves, L and R, we have 16 'halves' in an array.  We designate
this array as 'Block[0 .. 15]'.  Because we have also added a 64-bit
parameter to this fixed-size one-way hash function we need to mix in
this parameter as well.  The algorithm for doing this is given below:

Function E512(p64, p32, 0, x)
    p64: INT64;
    p32: 1NT32;
    x: INT512; -- a 512 bit 'integer'.
BEGIN
blockSize = 512; -- the block size in bits
blockSizeInBytes = blockSize/8; -- the block size in 8-bit bytes, here Ť
just 64 bytes
blockSizeInWords = blockSize/32 -- the blocksize in 32-bit words, here Ť
just 16 words
p64upper, p64lower: int32; -- the 64-bit parameter, which is represented Ť
as two 32-bit halves
tempBlock, Block: array [0 .. blockSizeInWords-1] of int32;
StandardSBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32; -- Ť
Fixed for all time
rotateSchedule: ARRAY [1 .. 4] := [16,8,16,24];
index: integer;
byteInWord: integer;
sBoxEntry: int32;

p64upper := Upper32BitsOf(p64);
p64lower := Lower32BitsOf(p64);
Block := x; -- note that x must be 512 bits or smaller.  The
            -- trailing bits in Block are zero-filled

FOR index + 1 to enough/blockSizeInBytes - 1 DO
    -- Mix in the parameters that parameterizes this instance
    Block[0] + Block[0] XOR p64lower;
    Block[1] + Block[1] XOR p64upper;
    Block[2] + Block[2] XOR p32;

    FOR byteInWord := 1 to 4 DO -- for each of the four columns
       FOR i := 0 to blockSizeInWords-1 DO
          next := (i+1) MOD blockSizeInWords;
          last := (i-1) MOD blockSizeInWords:

          -- Pick sboxes in sequence of 1,1,2,2,1,1,2,2,1,1,2,2,...
          -- ...1,1,2,2,3,3,4,4,3,3,4,4,...  etc.  Note that
          -- using the S-boxes in this sequence prevents self cancellation
          -- if the same entry is used twice.
          SBoxEntry := standardSBoxes[ 2*index-1 + (i MOD 2)] Ť
[Block[i].bottomByte];
          Block[next] + Block[next] XOR SBoxEntry;
          Block[last] + Block[last] XOR SBoxEntry;
       ENDLOOP;

       -- rotate all the 32-bit words in the block at once by the correct Ť
amount
       FOR i: INTEGER IN [0..  wordCount) DO
          Block[i] + RotateRight[Block[i], rotateSchedule[byteInWord]];
       ENDLOOP;

    ENDLOOP; -- end of byteInWord going from 1 to 4

ENDLOOP; -- end of index going from 1 to enough/blockSizeInBytes-1

  -- flip the Block.  The first word and the last word are interchanged, Ť
etc.
tempBlock := Block;
For i := 0 to blockSizeInWords-1 DO
BEGIN
    Block[i] := tempBlock[blockSizeInWords-i-1];
END;

RETURN(Block);
End;

For efficiency reasons, it is expected that in an actual
implementation the inner loops would be unrolled blocksize or
4*blocksize times.  This would mean that all shifts would be by fixed
amounts (if unrolled 4*blocksize times) and that no array indexing,
divisions or mod computations would actually be performed in the
unrolled loop because the values would be known at compile time.  The
array 'Block' would be kept in 16 registers, and reference to
individual array elements (because the array indices would be known at
compile time) would be to specific registers.

HASH512 can be used to efficiently hash large amounts of data.  If
only small amounts of data need be hashed, then the related functions
HASH256 and HASH128 should be used.  They are precisely the same as
HASH512 except that 'blockSize' is changed from 512 bits to 256 or 128
bits as appropriate, and p32 (the 32-bit parameter) is always 0.

Finally, we define HASH(p, x) in terms of the fixed size hash
functions.  To insure that HASH will always be computed efficiently,
we first determine the size of the argument, x.  If the argument is
small, we use the appropriate fixed-size hash function for speed of
computation.  If the argument is large, we use HASH512 repeatedly to
reduce its size.

We now define HASH as follows:

Function HASH(p64,x): INT128;
x: ARRAY[0 .. n-1] OF INT512; -- note that this declaration actually Ť
defines n
p64: INT64;
p32: INT32; -- a 32-bit integer variable that counts how many times Ť
HASH512 is used
hashArray: ARRAY[0 .. 3] OF INT128; -- an array of 128-bit 'integers'
hashLoc: integer; -- index into hashArray
BEGIN
   p32 := 0;
   -- SizeOf returns the size of its argument in bits
   -- Yes, it is somewhat inconsistent to view 'x' as an array of 512-bit Ť
elements and
   -- also allow it to have a length of less than 512 bits if there is Ť
only one element
   -- in the array - but this is a very high-level programming language Ť
that understands
   -- such oddities.
   IF SizeOf(x) <= 128 THEN RETURN(HASH128(p64, x)) ELSE
   IF SizeOf(x) <= 256 THEN RETURN(HASH256(p64, x)) ELSE
   IF SizeOf(x) <= 512 THEN RETURN(HASH512(p64, p32, x)) ELSE
   -- actually have to reduce a large amount of data
       BEGIN
         p32 := 0; -- count of number of calls to HASH512 starts at 0
         hashLoc := 0; -- start using hashArray[0]
         hashArray := 0; -- zero out all entries in this array
         FOR i := 0 to n-1 DO
            BEGIN
               hashArray[hashLoc] := HASH512(p64,p32,x[i]);
               p32 := p32 + 1;
               hashLoc := hashLoc + 1;
               IF hashLoc >= 4 THEN
                   BEGIN
                      hashArray[0] := HASH512(p64,p32,hashArray);
                      p32 := p32+1;
                      hashLoc := 1;
                   END;
            END;
       END;
   RETURN(HASH512(hashArray,p64,p32));
END;

HASH is more complex than the function F which we discussed earlier
for two reasons.  First, the precise pattern used to hash down a large
x is different; and second it uses a more efficient mechanism for
small arguments.  The fundamental concepts, however, are identical.


Making the Initial and Standard S-Boxes

We need an initial S-box to generate a pseudo-random stream of bytes.
We also need a set of standard S-boxes to use in Khafre and Snefru
during the encryption process.  In all three applications, we need
assurances about how the S-boxes were generated.  This was a major
question in DES -- whether any structure (intentional or accidental)
might be present in the S-boxes that would weaken the encryption
function.  Because the method of selecting the DES S-boxes was kept
secret, the published articles on the structure of DES are necessarily
incomplete.  Published discussions of the structure in the DES S-boxes
makes it clear that very strong selection criteria were used, and much
of the structure actually found can reasonably be attributed to design
principles intended to strengthen DES.  The purpose behind some of the
structure detected is unclear; though it does not appear to weaken DES
it would be useful to know if the structure serves some purpose or
whether it occurred as an unintended consequence of the particular
method chosen to actually generate the S-boxes.

To avoid these questions, the standard S-boxes will be generated from
the initial S-box according to the standard (and public) algorithm for
generating a set of S-boxes from a key.  The key selected for the
standard S-boxes will be the null (all 0) key.  In this way, not only
the standard S- boxes but also the algorithm for generating them are
made public and can be examined to determine if there are any
weaknesses.

The initial S-box must be generated from some stream of random
numbers.  In order to insure that the initial S-box does not have
hidden or secret structure, we adopt the following rules:

   1.) The program that generates the initial S-box from a stream of
random numbers will be public.

   2.) The stream of random numbers used as input to the program
should be above reproach -- it should be selected in such a fashion
that it could not reasonably have been tampered with in a fashion that
might allow insertion of a trap-door or other weakness.

The first criteria is met rather simply by publishing the algorithm
used to generate the standard S-box (publication is planned in the
near future).  The second criteria is met by using the random numbers
published in 1955 by the RAND corporation in 'A Million Random Digits
with 100,000 Normal Deviates' (available on magnetic tape for a
nominal fee).  Given this approach, insertion of a trap-door would
require (1) that the publicly known programs that generate the initial
and standard S-boxes insert the trap door under the very nose of a
watching world or that (2) the trap-door have been planned and
inserted into the random numbers generated by RAND in 1955, over 30
years prior to the design of Khufu (at a time when Khufu's chief
designer found successfully negotiating a flight of stairs an
absorbing challenge).


Methods of Cryptanalysis

Questions about the security of a new cryptographic algorithm are
inevitable.  Often, these questions are of the form 'Have you
considered the following attack...' It is therefore useful to describe
the attacks that were considered during the design process.  This
serves two purposes.  First, it reassures those who find their attack
has already been considered (and presumably found non-threatening).
Second, it tells those who are considering a new attack that the
matter might not be settled and is worth pursuing further.  A second
question typically asked is 'How many rounds are enough?'  This will
vary with three factors: the value of the data being encrypted, the
encryption speed (delay) that is acceptable, and the estimated cost of
cryptanalysis.  This last cost is inferred by considering how many
rounds are sufficient to thwart various certificational attacks.

Attacks can be broadly divided into a number of categories -- starting
with chosen plaintext, known plaintext and ciphertext only.  We shall
primarily consider attacks of the chosen plaintext variety -- a system
secure against chosen plaintext attacks is presumably also secure
against the two weaker attacks.  Some consideration will be given to
known plaintext and ciphertext only attacks.  Protection against
casual browsers is valuable and can be provided more cheaply (i.e.,
with fewer rounds in the encryption process and hence less delay).  An
attack by a casual browser is better modeled by a ciphertext only
attack.  At the same time, the cryptographic resources the casual
browser is likely to bring to bear are markedly inferior.  Finally,
the cost of encryption (in user inconvenience or delay) might be
significant and the value of the data might not justify much
inconvenience -- the choice might be between rapid encryption that
offers protection against casual attack and no encryption at all.

Without further ado, and in no particular order, we discuss the major
attacks considered during the design phase.

We first consider attacks against Khufu with a reduced number of
rounds.  We shall here consider attacks against an 8 round Khufu and
will start with a chosen plaintext attack.  We assume that the
objective is to determine the entries in the S-box and the values of
the auxiliary keys.  While it might theoretically be possible to take
advantage of the fact that the S-box was generated in a pseudo-random
fashion from a smaller key (effectively limited to 64 bytes) this has
so far not proven to be the case.  The pseudo-random method of
generating the S-box from the key is sufficiently complex that the
64-byte to 1024-byte expansion involved in this process appears quite
strong.  This is probably due to the relaxed computational time
requirements on the pre-computation, i.e., the pre-computation is
probably over-kill, but in most applications an additional fixed delay
of some tens of milliseconds probably won't be noticed, so it wasn't
further optimized.

An 8 round encryption can be readily broken under a chosen plaintext Ť
attack by noting that each
round of the encryption process is affected by only a single byte of the Ť
initial plaintext.
Therefore, given 8 rounds and 8 bytes of plaintext, some byte of plaintext Ť
is used last; e.g., in the
  8th round.  By encrypting two plaintext blocks that differ only in
this last byte, we obtain two ciphertext blocks in which the
encryption process differs only in the 8th round, and therefore in
which the two left halves have the same difference as two S-box
entries.  That is, if the two ciphertext left halves are designated
L[8] and L'[8] and if the indices of the S-box entries used in the 8th
rounds are designated i[8] and i'[8], then L[8] XOR L'[8] equals
SBox[i[8]] XOR SBox[i'[8]].  L[8] and L'[8] are known, as are i[8] and
i'[8], so this provides an equation about two S-box entries.  After we
recover roughly 256 equations we will almost be able to solve for the
256 entries in the S-box.  At this point, the recovery of the S-box
will not quite be complete -- we can arbitrarily set the value of a
single S-box entry and determine values for the rest of the entries
that will satisfy the equations we have.  Further equations will not
help us, for if we have one solution to the equations, we can generate
another solution by complementing the ith bit in every proposed S-box
entry.  The new set of values will also satisfy the equations, for
every equation XOR's two S-box entries, and hence complementing the
ith bit in both entries will leave the XOR of the two bits unchanged.
We need another method for resolving this last ambiguity.  This is
conceptually easy (in the worst case, we could simply examine all
2**32 possibilities) but an efficient algorithm is difficult to
explain in a short space -- we therefore leave this as an exercise for
the reader.  Once the S-box entries are known, it is also relatively
simple to determine the auxiliary keys.

If we consider a known plaintext attack against an 8 round encryption,
we find the problem is more difficult.  Certainly, we could request a
large number of plaintext-ciphertext pairs and hope that at least some
of the pairs differed only in the final few bytes (e.g., the bytes
that are used only on the 7th and 8th rounds of encryption) but this
would require many millions of such pairs.  This, of course, presumes
that the plaintext is selected randomly -- which implies that cipher
block chaining (or some other pseudo-randomization method) is used to
insure that patterns in the plaintext are eliminated prior to
encryption.  Direct encryption (without some 'whitening' or pre-
randomization) of sufficient text would probably result in 8-byte
blocks that differed only in a single byte -- which might allow use of
the method described above.

The following paragraph condenses an entire Ph.D. thesis and some
speculative ideas.  It can be skipped without loss of continuity.

A statistical attack of the 'hill-climbing' variety [22] might prove
successful.  In such attacks the discrete points in the key-space are
embedded in a continuous space.  Once this is done, it is possible to
search the space using continuous optimization methods that are
difficult to apply to a discrete space.  A version of such an attack
that would be relatively easy to implement on commercially available
computers might view the encryption process as acting on bytes.  In
the cryptanalysis, each byte (whether it be a byte of plaintext, of
ciphertext, a byte in the S-box, or a byte in some intermediate
computation) would be represented by a vector with 256 entries, each
entry being a real number in the range from 0.0 to 1.0 inclusive.
Intuitively, these real numbers represent the probability that the
byte takes on the corresponding value, That is, if the 23rd entry in
such a vector were .3, then there would be a 30% chance that the byte
associated with that vector was 23.  (The sum of the entries for a
given vector should be 1.0, i.e., the probability that the byte has
some value between 0 and 255 inclusive should be 1).  Because the
S-box is initially completely unknown, we would associate each byte in
the S-box with a vector all of whose entries were equal to 1/256.  By
representing each actual byte in the computation with a vector of real
numbers, we have effectively mapped the discrete cryptographic
encryption process onto a continuous process.  Given a
plaintext-ciphertext pair, we can map the discrete encryption process
into a continuous analog computation.  We can 'encrypt' the continuous
plaintext with the continuous 'S-box' and produce a continuous
'ciphertext'.  The continuous plaintext, because it is known, would be
represented by 8 vectors each of which had a single 1 entry in the
correct place, with all other entries being 0.  Because the initial
continuous approximation to the S-box does not correspond to any
discrete S-box, the result of the encryption would be continuous
'ciphertext' which would consist of 8 vectors with some (initially
flat) distribution showing the probable values for the bytes in the
actual ciphertext.  Because the computed probability distribution for
the ciphertext will differ from the actual ciphertext (which is known)
it is possible to generate an error signal.  If we compute many
continuous 'ciphertexts' from many plaintexts and compare all the
continuous 'ciphertexts' with the actually known ciphertext values,
then we can generate an aggregate error signal that reflects all the
available information.  By changing the probability distributions
associated with the bytes in the S-box (e.g., by moving to adjacent
points in the continuous key space) it is possible to change this
error signal.  Minimizing the error signal as a function of the S-box
values is an optimization problem, and there are many algorithms in
the literature which can be applied.  The simplest approach would be
to find the direction in the continuous key space which best minimizes
the error, and then travel some distance in that direction.  This is
simply the method of steepest descent.  In general, the solution of
non-linear optimization problems is difficult, and whether or not such
an approach will work with an 8 round Khufu is an interesting and as
yet unanswered question.  Because such an attack must depend on
statistical 'roughness' in the encryption process, and because (as
discussed later) 8 rounds is significantly 'rough', such an attack
might work.

Finally, a ciphertext only attack against an 8-round Khufu results in
a very difficult problem.  A modification of the attack sketched above
might be mounted in which the ciphertext is 'decrypted' and the
resulting 'plaintext' is then viewed statistically to measure how
'close' it is to being reasonable.  For example, we could score the
'decryption' by multiplying the probability vector for each byte of
'plaintext' times the actual probability distribution (assuming that
some reasonable approximation to this distribution is known).
Fundamentally, hill climbing and other statistical attacks must rely
on statistically detectable differences between various alternatives.
If the statistics are flat, then such techniques will fail.  An
important question with Khufu is the number of rounds required to
achieve a statistically flat distribution.  Preliminary results
indicate that 16 rounds produces flat statistics.  The use of
auxiliary keys were largely adopted for three reasons: first, it
seemed intuitively reasonable that randomization of the input by
XORing an unknown quantity would assist in the encryption process.
Second, four additional register-to-register XOR operations are cheap
to implement.  Finally, the auxiliary keys foil a specific chosen
ciphertext attack.  This attack depends on the observation that,
although the S-box has 256 entries, the encryption process does not
use all entries for each plaintext-ciphertext pair.  Even worse,
although a typical 8-round encryption will use 8 different S-box
entries it doesn't have to: some entries could be repeated.  In the
worst case, a single entry would be repeated 8 times -- which would
effectively mean that only 32 bits of key material was used.  If the
auxiliary keys were not present then we could simply guess at the
value of one of the S-box entries, and then confirm our guess if we
could find a plaintext-ciphertext pair that used only that entry for
all 8 rounds.  Because each of the 8 rounds uses a single byte from
the plaintext, we could actually construct the plaintext needed to
confirm our guess (if the auxiliary keys were not present).  For
example, if we guess that the 0th S-box entry has some specific value,
then we would select the first byte of our specially-built plaintext
(by 'first byte' we mean i[1], or that byte of plaintext used as an
index into the S-box in the first round) to be 0.  Then, knowing what
happens in the first round, we can select the second byte of plaintext
so that the 0th entry is again selected on the second round -- which
would tell us what happens in the third round.  By repeating this
process for 8 rounds, we can construct a plaintext which, when
enciphered, will tell us whether or not the 0th S-box entry does or
does not have a specific value.  After trying 2**32 values we will
surely find the correct one.  If we then repeat this whole process for
the 1st entry, and then the 2nd entry, etc, we could determine the
values of all the entries in the S-box.

The auxiliary keys prevent this attack because they effectively inject
64 bits of key material into the encryption process prior to selecting
S-box entries.  Thus, correctly guessing a 32-bit S-box entry is
insufficient because we would also have to guess the 64-bit value
XORed with the plaintext prior to encryption.  If we guessed a single
such bit incorrectly, then the incorrectly guessed bit would (within
the first 8 rounds) cause selection of an uncontrolled S-box entry
which would then initiate the uncontrolled avalanche of changes that
we rely upon to provide cryptographic strength.

Although this attack is actually rather inefficient compared with our
first chosen ciphertext attack, it does point out that there is no
guarantee that multiple different S-box entries have actually been
used during encryption.  Instead, we must assure ourselves that the
risk of this occurring is sufficiently low by explicitly computing the
probability of its occurrence.

Another attack in this general class is the cancellation attack.  In
this attack, we alter the first byte of the 8 bytes in the plaintext,
and then attempt to cancel the effects of this alteration by altering
the other 32-bit half in a compensating fashion.  That is, by altering
the first byte of plaintext used in the first round, we cause a change
in the second round that we can understand.  Because we can also
change the other 32-bit half, this understandable change in the second
round can be cancelled.  (Notice that the auxiliary keys have very
little impact on this attack).  Now, if the first byte were 3, and we
changed it to a 5, then this would produce a predictable change in the
value XORed with the other 32-bit half, R, in the first round.  This
first round is computed as:

   i[1] := L[0] AND #FF:
   L[1] := R[0] XOR SBox[ i[1] ];

 For the first plaintext we encrypted, this would become:

   L[1] := R[0] XOR SBox[3];

while for the second plaintext encrypted, this would become:

   L'[1] := R'[0] XOR SBox[5];

Therefore, if we select R'[0] = R[0] XOR SBox[3] XOR SBox[5], then the
second equation becomes:

   L'[1] := R[0] XOR SBox[3] XOR SBox[5] XOR SBox[5]
   or
   L'[1] := R[0] XOR SBox[3]

But this means that L'[1] = R[0] XOR SBox[3] = L[1]

In other words, L[1] and L'[1] are identical -- by knowing SBox[3] XOR
SBox[5] we were able to cancel out the change that should have taken
place in L[1].  This, of course, means that the avalanche of changes
upon which encryption so critically depends has been thwarted at the
very start.  Notice that after the first round of encryption, the two
blocks differ only in the first byte -- that is, the byte used in the
first round.  After 8 rounds of encryption, the resulting ciphertexts
will also differ in only this one byte.

In practice, this attack seems to require that you first guess the
correct value of SBox[i] XOR SBox[j] for two different values of i and
j.  This is a 32-bit value, and so on average it seems necessary to
try 2**32 different values before encountering the correct one.  After
8 rounds of encryption, however, the fact that we have determined the
correct 32-bit 'cancellation value' will be obvious because the final 64
bits of ciphertext generated by the two different plaintexts will
differ in only a single byte.

It might not at first be obvious, but we can in fact modify this
attack so that only 2 * 2**16 plaintext- ciphertext pairs are required
in order to find the correct cancellation value.  Although as described
above, it would seem that we need 2**32 pairs of plaintext-ciphertext
pairs to test each possible 32-bit cancellation value, this is not the
case.  We can generate two lists of plaintext-ciphertext pairs, and
then by selecting one plaintext-ciphertext pair from one list and the
other plaintext- ciphertext pair from the other list, we can generate
2**32 possible combinations of entries from the two lists.  If we
select the plaintexts used to generate the lists carefully, then every
32-bit cancellation value can be represented by one entry from the
first list, and one entry from the second list.

When we consider this attack on a 16 round Khufu it is much weaker.
If we can determine the correct 32-bit cancellation value it will cause
collapse of the encryption process up until the changed byte is again
used.  If the first byte has been changed, then it will again be used
on the 9th round -- this means that in a 16-round Khufu a cancellation
attack will effectively strip off 8 rounds.  The remaining 8 rounds
must then provide sufficient cryptographics strength to resist attack.
Empirical statistical tests indicate that 8 rounds in which changes
take place in the first one or two rounds will result in apparently
random output -- though of course, this result demonstrates only that
the output was random with respect to the specific statistical tests
used, not that all possible statistical tests would reveal no pattern.

An attack proposed by Dan Greene is based on the observation that each
32-bit half is being XORed with values selected (possibly with a
rotation) from the S-box.  Once the key has been chosen this S-box is
fixed -- so at most 256 different values can be used and each value
can be rotated (in the first 8 rounds of Khufu) in only four different
ways.  That is, we are at best applying a fixed and rather limited set
of operations to each half.  If we focus on the right half R, (and if
we neglect the effect of the auxiliary keys) then we find that:

R8 = R0 XOR ROTATE[SBox[i1,0] XOR ROTATE[SBox[i3],16] XOR Ť
ROTATE[SBox[i5],24]
XOR ROTATE[SBox[i7],8]]

R8 designates the right half following 8 rounds of Khufu, i.e., the
right half of the ciphertext.  R0 designates the right half before
encryption begins, i.e., the right half of the plaintext.  Although
the indices used to select the S-box entries are computed during
encryption, we are going to ignore their actual values.  Instead, we
will assume that i1, i3, i5 and i7 are selected randomly.  This should
not weaken the encryption function, so any cryptanalytic success we
have using this assumption indicates weakness in the original system
as well.

If we define

   Y = Y[i1, i3, i5, i7] = ROTATE[SBox[i1,0] XOR ROTATE[SBox[i3],16] XOR
                           ROTATE[SBox[i5],24] XOR ROTATE[SBox[i7],8]]

we can re-express the earlier equation as:

R8 XOR R0 = Y[i1, i3, i5, i7]

The left side of this equation is readily computed from a
plaintext-ciphertext pair, and with enough such pairs we can compute
the probability distribution of (R8 XOR R0).  The right side should
determine the same distribution (if we assume the actual indices are
more or less random -- which should be a good approximation if the
plaintext is random!).  The 4 8-bit indices clearly could generate at
most 2 possible values for Y, but it seems more plausible that some
values for Y will be produced more than once while other values for Y
will not be produced at all.  That is to say, the distribution of Y's
will not be uniform.  If we can compute this distribution from enough
plaintext-ciphertext pairs, and if it is non-uniform, could we then
cryptanalyze an 8 round Khufu?  Statistical evidence gathered on a
16-round Khufu suggests that this attack will fail for 16 rounds, but
its success for 8 rounds is still unclear.  Even given the
distribution of Y's it is not clear (at the time of writing) how to
determine the actual S-box entries.


Summary

An 8-round Khufu can be broken by several attacks, though it is
reasonably resistant t0 ciphertext only attack.  A 16-round Khufu has
so far resisted cryptanalysis.  Preliminary statistical analysis
suggests that a 16-round Khufu produces random output.  We are hopeful
at the current writing that a 16-round Khufu will be useful for
general commercial encryption, and that a 32-round Khufu can be used
for high security applications.  This conclusion is tentative and
requires further investigation.

The analysis of Khafre and Snefru have so far been less detailed -- as
a consequence, we do not yet have recommendations on the number of
rounds that should be used with them.  It seems probable that Khafre
will require more rounds of encryption to provide equivalent security
than Khufu, because the S-boxes used with Khafre are public.  Khufu,
by contrast, generates different S-boxes for each key and keeps the
S-boxes secret -- and so uses more key material per round than Khafre.

Although we believe at the time of writing that the conclusions given
above are sound, any reader seriously considering use of these
encryption functions is advised that (1) waiting for one or two years
following their publication should allow sufficient time for their
examination by the public cryptographic community and (2) current
information about their status should be obtained by contacting the
author.


Acknowledgements

The author would like to particularly thank Dan Greene for his many
comments and the many hours of discussion about cryptography in
general and the various design proposals for Khufu in particular.
Thanks are also due to Luis Rodriguez, who implemented the C version
of Khufu and gathered most of the statistics.  I would also like to
thank Dan Swinehart and the many researchers at PARC who have shown
both interest and support during the many months this effort has
required.


Bibliography

     1.) `Secrecy, Authentication, and Public Key Systems', Stanford
Ph.D. thesis, 1979, by Ralph C. Merkle.

     2.) `A Certified Digital Signature', unpublished paper, 1979.

     3.)  Moti Yung, private communication.

     4.)  `A High Speed Manipulation Detection Code', by Robert R.
Jueneman, Advances in Cryptology - CRYPTO '86, Springer Verlag,
Lecture Notes on Computer Science, Vol. 263, page 327 to 346.

     5.)  `Another Birthday Attack' by Don Coppersmith, Advances in
Cryptology - CRYPTO '85, Springer Verlag, Lecture Notes on Computer
Science, Vol. 218, pages 14 to 17.

     6.)  `A digital signature based on a conventional encryption
function', by Ralph C. Merkle, Advances in Cryptology CRYPTO 87,
Springer Verlag, Lecture Notes on Computer Science, Vol.  293, page
369-378.

     7.)  `Cryptography and Data Security', by Dorothy E. R. Denning,
Addison-Wesley 1982, page 170.

     8.)  `On the security of multiple encryption', by Ralph C.
Merkle, CACM Vol. 24 No. 7, July 1981 pages 465 to 467.

     9.)  `Results of an initial attempt to cryptanalyze the NBS Data
Encryption Standard', by Martin Hellman et. al., Information Systems
lab. report SEL 76-042, Stanford University 1976.

     10.) `Communication Theory of Secrecy Systems', by C. E. Shannon,
Bell Sys. Tech. Jour. 28 (Oct. 1949) 656-715

     11.) `Message Authentication' by R. R. Jueneman, S. M. Matyas, C.
H. Meyer, IEEE Communications Magazine, Vol. 23, No, 9, September 1985
pages 29-40.

     12.) `Generating strong one-way functions with cryptographic
algorithm', by S. M. Matyas, C. H. Meyer, and J. Oseas, IBM Technical
Disclosure Bulletin, Vol. 27, No. 10A, March 1985 pages 5658-5659

     13.) `Analysis of Jueneman's MDC Scheme', by Don Coppersmith,
preliminary version June 9, 1988.  Analysis of the system presented in
[4].

     14.) `The Data Encryption Standard: Past and Future' by M. E.
Smid and D. K. Branstad, Proc. of the IEEE, Vol 76 No. 5 pp 550-559,
May 1988

     15.) `Defending Secrets, Sharing Data; New Locks and Keys for
Electronic Information', U.S. Congress, Office of Technology
Assessment, OTA-CIT-310, U.S. Government Printing Office, October 1987

     16.) `Exhaustive cryptanalysis of the NBS data encryption
standard', by Whitfield Diffie and Martin Hellman, Computer, June
1977, pages 74-78

     17.) `Cryptography: a new dimension in data security', by Carl H.
Meyer and Stephen M.  Matyas, Wiley 1982.

     18.) 'One Way Hash Functions and DES', by Ralph C. Merkle, Xerox Ť
Disclosure Bulletin.

     19.) 'Data Encryption Standard (DES)', National Bureau of
Standards (U.S.), Federal Information Processing Standards Publication
46, National Technical Information Service, Springfield, VA, Apr. 1977

     20.)  ???

     21.) `Cryptography and Computer Privacy', by H. Feistel, Sci.
Amer. Vol. 228, No. 5 pp 15-23, May 1973

     22.) `Maximum Likelihood Estimation Applied to Cryptanalysis', by
Dov Andelman, Stanford Ph.D. Thesis, 1979

     23.)  IBM has recently proposed a specific one-way hash function
which has so far resisted attack.




X-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-X
 Another file downloaded from:                     The NIRVANAnet(tm) Seven

 & the Temple of the Screaming Electron   Taipan Enigma        510/935-5845
 Burn This Flag                           Zardoz               408/363-9766
 realitycheck                             Poindexter Fortran   510/527-1662
 Lies Unlimited                           Mick Freen           801/278-2699
 The New Dork Sublime                     Biffnix              415/864-DORK
 The Shrine                               Rif Raf              206/794-6674
 Planet Mirth                             Simon Jester         510/786-6560

                          "Raw Data for Raw Nerves"
X-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-X