INTRODUCTION TO CRYPTOGRAPHY
From the dawn of civilization to the highly networked societies that we live in today, communication has always been an integral part of our existence. What started as simple sign communication centuries ago has evolved into many forms of communication today, the Internet being just one such example. Methods of communication today include radio communication, telephonic communication, network communication, and mobile communication.
All these methods and means of communication have played an important role in our lives, but in the past few years network communication, especially over the Internet, has emerged as one of the most powerful methods of communication, with an overwhelming impact on our lives. Such rapid advances in communications technology have also given rise to security threats to individuals and organizations. In the last few years, various measures and services have been developed to counter these threats. All categories of such measures and services, however, share certain fundamental requirements, which include the following:
Confidentiality: This is the process of keeping information private and secret so that only the intended recipient is able to understand it. For example, if Alice has to send a message to Bob, then only Bob (and no other person) should be able to read or understand the message.
Authentication: This is the process of providing proof of the sender's identity to the recipient, so that the recipient can be assured that the person sending the information is who and what he or she claims to be. For example, when Bob receives a message from Alice, he should be able to establish Alice's identity and know that the message was indeed sent by Alice.
Integrity: This is the method of ensuring that information is not tampered with during its transit or its storage on the network. No unauthorized person should be able to tamper with or change the information. For example, when Alice sends a message to Bob, the contents of the message should not be altered and should remain exactly what Alice sent.
Non-repudiation: This is the method of ensuring that information cannot be disowned. Once a non-repudiation process is in place, the sender cannot deny being the originator of the data. For example, after Alice sends a message to Bob, she should not be able to deny later that she sent it.
Cryptography is one of the technological means of securing data transmitted over information and communications systems. It is especially useful for financial and personal data, whether the data is being transmitted over a medium or stored on a storage device. It provides a powerful means of verifying the authenticity of data and of identifying the culprit if the confidentiality or integrity of the data is violated. Cryptographic techniques are extremely critical to the development and use of defense information systems and communication networks and, with the growth of electronic commerce, of commercial systems as well. As already discussed, messages were first encrypted in ancient Egypt using hieroglyphics. The Egyptians encrypted messages simply by replacing the original picture with another picture. This method of encryption is known as a substitution cipher: each letter of the clear text message is replaced by some other letter, which results in an encrypted message, or cipher text.
Nishitha College of Engineering and Technology 2
For example, the message WELCOME TO THE WORLD OF CRYPTOGRAPHY can be encrypted using a substitution cipher as XFMDPNF UP UIF XPSME PG DSZQUPHSBQIZ.
In the preceding example, each letter of the plaintext message has been replaced with the next letter in the alphabet. This type of substitution is also known as a Caesar cipher. The Caesar cipher is an example of a shift cipher, because it involves shifting each letter of the plaintext message by a fixed number of positions to obtain the cipher text. For example, if you shift the letters by 5, you get the following combination of plaintext and cipher text letters:
Plaintext:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher text: F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
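The shift cipher described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original design, and the function names `shift_encrypt` and `shift_decrypt` are hypothetical. A shift of 1 reproduces the WELCOME example, and a shift of 5 reproduces the table.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(plaintext: str, shift: int) -> str:
    """Replace each letter by the letter `shift` places later, wrapping Z back to A."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


def shift_decrypt(ciphertext: str, shift: int) -> str:
    """Decryption is simply a shift in the opposite direction."""
    return shift_encrypt(ciphertext, -shift)


print(shift_encrypt("WELCOME TO THE WORLD OF CRYPTOGRAPHY", 1))
print(shift_encrypt("ABCXYZ", 5))
```

With only 25 useful shifts, an attacker can simply try every key, which is why the text goes on to call simple substitution unreliable.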
However, simple substitution ciphers are not very reliable and can easily be broken. An alternative is to use multiple cipher alphabets instead of one. A cipher that uses multiple cipher alphabets is known as a polyalphabetic substitution cipher; an example is the Vigenère cipher. Recent advances in mathematical techniques have accelerated the development of newer methods of encryption, and some modern ciphers are considered practically impossible to break. Cryptography has now become an industry standard for providing information security and trust, controlling access to resources, and securing electronic transactions. Its use is no longer limited to securing sensitive military information; in fact, cryptography is now recognized as one of the major components of an organization's security policy. Before moving further with cryptography, let us first look at a few terms that are commonly associated with it:
Plaintext: The message that has to be transmitted to the recipient. It is also commonly referred to as clear text.
Encryption: The process of changing the content of a message in such a manner that it hides the actual message.
Cipher text: The output that is generated after encrypting the plaintext.
Decryption: The reverse of encryption; the process of retrieving the original message from its encrypted form. This process converts cipher text to plaintext.
Hash algorithm: An algorithm that converts a text string into a string of fixed length.
Key: A word, number, or phrase that is used to encrypt the clear text. In computer-based cryptography, any text, key word, or phrase is converted to a very large number by applying a hash algorithm to it. This large number, referred to as the key, is then used for encryption and decryption.
Cipher: An algorithm that translates plaintext into an intermediate form, called cipher text, in which the original message is unreadable.
Cryptanalysis: The science of breaking codes and ciphers.
Before looking at the details of various cryptographic techniques, let us look at the steps involved in the conventional encryption model. A sender wants to send a HELLO message to a recipient. The original message, also called plaintext, is converted into seemingly random bits, known as cipher text, by using a key and an algorithm. The algorithm produces a different output for each different key value. The cipher text is transmitted over the transmission medium. At the recipient's end, the cipher text is converted back to the original text using the same algorithm and the same key that were used to encrypt the message.
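Both ideas above, the polyalphabetic Vigenère cipher and the conventional model in which the same algorithm and key encrypt and decrypt, can be illustrated with one small Python sketch. The function name `vigenere` and the key "KEY" are hypothetical examples, and the sketch assumes the key contains only the letters A to Z. Each plaintext letter is shifted by the corresponding letter of the repeating key (A = 0, B = 1, and so on).

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching letter of the repeating key."""
    shifts = [ALPHABET.index(k) for k in key.upper()]
    sign = -1 if decrypt else 1
    out, j = [], 0
    for ch in text.upper():
        if ch in ALPHABET:
            s = shifts[j % len(shifts)]
            out.append(ALPHABET[(ALPHABET.index(ch) + sign * s) % 26])
            j += 1  # the key advances only on letters
        else:
            out.append(ch)
    return "".join(out)


cipher = vigenere("HELLO", "KEY")
print(cipher)                                  # RIJVS
print(vigenere(cipher, "KEY", decrypt=True))   # HELLO
```

The two L's in HELLO encrypt to different letters (J and V), which is what makes a polyalphabetic cipher harder to attack by simple letter-frequency counting than a single-alphabet substitution.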
Public key cryptography: This cryptographic technique is based on a combination of two keys, a private (secret) key and a public key. It is also known as asymmetric encryption. Let us look at each of these methods in detail.
Many secret key algorithms have been developed on the basis of secret key cryptography. The most widely used include Data Encryption Standard (DES), Triple-DES (3DES), International Data Encryption Algorithm (IDEA), RC4, RC5, CAST-128, RC6, and Advanced Encryption Standard (AES).
private for every communicating entity. In public key cryptography, data that is encrypted with the public key can only be decrypted with the corresponding private key. Conversely, data encrypted with the private key can only be decrypted with the corresponding public key. Because of this asymmetry, public key cryptography is also known as asymmetric cryptography. How public key cryptography works: Let us see how this works in practice. Consider an example where Alice wishes to send an encrypted file to Bob. Bob generates a key pair, retains the private key, and distributes the public key, so Alice has a copy of Bob's public key. Alice encrypts the file using Bob's public key and sends the encrypted file to Bob. Since the keys in a pair are complementary, only Bob's private key can decrypt this file. If someone else intercepts the file, they will be unable to decrypt it, because only Bob's private key can be used for decryption. Figure 1.2.1 explains the process of public key cryptography.
This method clearly shows that data sent to a user is encrypted with that user's public key, which anyone may hold, but can be decrypted only with the matching private key, which is kept by the recipient of the data. So there is very little possibility of the data in transit being read by any other person. There is no need to share a secret key, as required for symmetric encryption: all communications involve only public keys, and no private key is ever transmitted or shared. The above mechanism also brings out the point that every recipient has a unique private key, which he or she uses to decrypt the data that has been encrypted with the counterpart public key. Diffie and Hellman first publicly described the concept of asymmetric cryptography.
Many public key algorithms have been developed on the basis of public key cryptography. The most widely used include RSA and ECC (elliptic curve cryptography).
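To make the Alice-and-Bob exchange concrete, here is textbook RSA with deliberately tiny numbers (the classic p = 61, q = 53 example). It is a sketch under strong simplifying assumptions: there is no padding, the key is a toy size, and the message is a single small integer. Real RSA uses keys of 2048 bits or more together with a padding scheme such as OAEP. In this sketch Bob publishes (e, n) and keeps d secret.

```python
# Toy RSA key pair built from two small primes (illustration only).
p, q = 61, 53
n = p * q                 # modulus, part of both keys: 3233
phi = (p - 1) * (q - 1)   # Euler's totient of n: 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent (modular inverse, Python 3.8+): 2753


def encrypt(m: int) -> int:
    """Anyone holding Bob's public key (e, n) can encrypt."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)


c = encrypt(65)
print(n, d, c, decrypt(c))   # 3233 2753 2790 65
```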
The combined technique of encryption is widely used. It is used in Secure Shell (SSH), which secures communications between a client and a server, and in PGP (Pretty Good Privacy) for sending messages. Above all, it is at the heart of Secure Sockets Layer (SSL), which is widely used by Web browsers and Web servers to maintain a secure communication channel with each other. Fig 1.2.2 explains the combined technique of encryption.
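The combined (hybrid) technique is usually understood as follows: a fresh symmetric session key encrypts the bulk data, and only that short key is encrypted with the recipient's public key. The sketch below assumes that reading and uses deliberately toy components, a repeating-key XOR as the symmetric cipher and a tiny textbook RSA key pair (n = 3233, e = 17, d = 2753) applied byte by byte. Neither component is secure; real protocols such as SSL/TLS use ciphers like AES and properly padded public-key operations. All function names are hypothetical.

```python
import secrets

# Toy RSA key pair for the recipient (textbook numbers, far too small for real use).
N, E = 3233, 17   # public key
D = 2753          # private key


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a repeating key; the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def hybrid_encrypt(message: bytes) -> tuple[list[int], bytes]:
    session_key = secrets.token_bytes(8)            # fresh random symmetric key per message
    body = xor_cipher(message, session_key)         # bulk data: fast symmetric step
    wrapped = [pow(k, E, N) for k in session_key]   # session key: public-key step
    return wrapped, body


def hybrid_decrypt(wrapped: list[int], body: bytes) -> bytes:
    session_key = bytes(pow(c, D, N) for c in wrapped)   # recover the key with the private key
    return xor_cipher(body, session_key)


wrapped, body = hybrid_encrypt(b"HELLO BOB")
print(hybrid_decrypt(wrapped, body))   # b'HELLO BOB'
```

The design point is that the slow public-key operation touches only the short session key, while the fast symmetric cipher handles the message itself.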
CHAPTER 2
SYMMETRIC CRYPTOGRAPHY ALGORITHMS
2.1 Data Encryption Standard (DES)
DES is a block cipher: it encrypts and decrypts data in 64-bit blocks using a 64-bit key (although the effective key length is 56 bits, because 8 of the key bits are parity bits). DES is a symmetric algorithm: the same algorithm and key are used for both encryption and decryption. DES is an iterative cipher: the basic building block (a substitution followed by a permutation), called a round, is repeated 16 times. For each round, a sub-key is derived from the original key by a procedure called the key schedule. The key schedule for encryption and decryption is the same, except that the sub-keys are applied in reverse order for decryption.
The basic algorithm for encrypting or decrypting one block of data is as follows. Encryption begins with an initial permutation (IP), which scrambles the 64-bit plaintext in a fixed pattern. The result of the initial permutation is sent to two 32-bit registers, called the right half register and the left half register. These registers hold the two halves of the intermediate results through the succeeding 16 iterations. The contents of the right half register are permuted (permutation E) and sent to an exclusive-OR unit along with the sub-key for that iteration. Some bits are selected twice, which expands the 32-bit register contents to 48 bits. The 48-bit output of the exclusive-OR block is divided into eight 6-bit groups, which address eight substitution memories (S-boxes). Each S-box produces 4 bits, and a permutation P is applied to the resulting 32-bit output; the result is then fed into an exclusive-OR block along with the contents of the left half register. The output of this block is written into a temporary register, concluding the first iteration.
At the next clock cycle, the contents of the temporary register are written into the right half register, and the previous contents of the right half register are written into the left half register. This process repeats through 16 iterations. After the 16 iterations, the right half and left half register contents are subjected to a final permutation, $IP^{-1}$, which is the inverse of the initial permutation. The output of $IP^{-1}$ is the 64-bit cipher text.
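The register swapping described above is the Feistel structure. The Python sketch below reproduces only that structure on a 64-bit block: 16 rounds, each replacing (L, R) with (R, L XOR f(R, K)), a final swap of the halves, and decryption by running the same network with the sub-keys reversed. The round function `toy_round_function` is a hypothetical stand-in for the E-expansion, S-box and P-permutation step, the sub-keys are arbitrary 32-bit values rather than DES's 48-bit ones, and IP and $IP^{-1}$ are omitted because they do not affect the invertibility argument. This is therefore not DES itself.

```python
MASK32 = 0xFFFFFFFF


def toy_round_function(right: int, subkey: int) -> int:
    """Hypothetical stand-in for DES's f(R, K); NOT the real E/S-box/P function."""
    x = right ^ subkey
    x = (x * 0x9E3779B1 + 0x7F4A7C15) & MASK32   # arbitrary mixing step
    return ((x << 11) | (x >> 21)) & MASK32        # rotate left by 11 bits


def feistel_encrypt(block: int, subkeys: list[int]) -> int:
    """DES-style Feistel network on a 64-bit block (IP and IP^-1 omitted)."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in subkeys:
        # New left half = old right half; new right half = old left XOR f(old right, k)
        left, right = right, left ^ toy_round_function(right, k)
    return (right << 32) | left   # swap the halves after the last round


def feistel_decrypt(block: int, subkeys: list[int]) -> int:
    """Decryption is the same network with the sub-keys in reverse order."""
    return feistel_encrypt(block, subkeys[::-1])


subkeys = [(0x0F1E2D3C * (i + 1)) & MASK32 for i in range(16)]   # hypothetical sub-keys
plaintext = 0x0123456789ABCDEF
ciphertext = feistel_encrypt(plaintext, subkeys)
print(hex(ciphertext))
print(feistel_decrypt(ciphertext, subkeys) == plaintext)   # True
```

Decryption works for any choice of round function, which is why DES's f does not itself need to be invertible.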
Addition of integers modulo $2^{16}$, where the 16-bit sub-block is treated as an unsigned integer.
Key scheduling: The algorithm uses 52 sub-keys (six for each of the eight rounds and four more for the output transformation). First, the 128-bit key is divided into eight 16-bit sub-keys. These are the first eight sub-keys for the algorithm (the six for the first round and the first two for the second round). Then the key is rotated 25 bits to the left and again divided into eight sub-keys. The first four are used in round 2, and the last four are used in round 3. The key is rotated another 25 bits to the left for the next eight sub-keys, and so on until the end of the algorithm. The decryption scheme is the same as the encryption scheme, except that it uses a different set of sub-keys derived from the encryption sub-keys, where $K_i$ denotes the encryption sub-keys and $U_i$ the decryption sub-keys, $1 \le i \le 52$.
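The rotate-and-split key schedule can be sketched as follows. The function name is hypothetical, and the sketch produces only the 52 encryption sub-keys $K_i$; the decryption sub-keys $U_i$ would be derived from them and are not shown. For the key whose eight 16-bit words are 1, 2, ..., 8, the first eight sub-keys are simply those words, and the next eight come from the key rotated left by 25 bits.

```python
MASK128 = (1 << 128) - 1


def idea_encryption_subkeys(key: int) -> list[int]:
    """Expand a 128-bit IDEA key into the 52 16-bit encryption sub-keys K1..K52."""
    subkeys = []
    while len(subkeys) < 52:
        # Cut the current 128-bit key into eight 16-bit words, most significant first.
        for i in range(8):
            if len(subkeys) == 52:
                break
            subkeys.append((key >> (112 - 16 * i)) & 0xFFFF)
        # Rotate the whole key 25 bits to the left before taking the next eight.
        key = ((key << 25) | (key >> (128 - 25))) & MASK128
    return subkeys


ks = idea_encryption_subkeys(0x00010002000300040005000600070008)
rounds = [ks[6 * r: 6 * r + 6] for r in range(8)]   # six sub-keys per round
output_transform = ks[48:]                           # last four sub-keys
print([hex(k) for k in ks[8:16]])
# ['0x400', '0x600', '0x800', '0xa00', '0xc00', '0xe00', '0x1000', '0x200']
```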
SubBytes (substitute bytes) transformation: This operation is a non-linear byte substitution. It is composed of two sub-transformations, a multiplicative inverse in GF($2^8$) and an affine transformation. In most implementations, these two sub-steps are combined into a single table lookup called the S-box.
ShiftRows transformation: This step is a simple permutation that operates on individual rows: each row of the state array is cyclically rotated by a certain number of byte positions.
MixColumns transformation: This is a substitution step that uses arithmetic over GF($2^8$). Each column of the state is treated as a polynomial of degree less than 4 with coefficients in GF($2^8$) and is multiplied (in GF($2^8$)) by a fixed matrix.
AddRoundKey: Each byte of the state array is added (with respect to GF(2), i.e., XORed) to the corresponding byte of the round sub-key array.
Excluding the initial key addition and the final round (which omits MixColumns), AES with a 128-bit key performs nine full rounds. Round keys are generated by a procedure called round key expansion or key scheduling. Each new sub-key column is derived from the original key by XORing two previous columns (the immediately preceding column and the column four positions earlier). For columns whose index is a multiple of four, the process also involves a shift (rotation), an S-box substitution, and the addition of a round constant.
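The two sub-steps of SubBytes, inversion in GF($2^8$) followed by the affine transformation, can be combined into the 256-entry table mentioned above. The sketch below builds that table in Python. The helper names are hypothetical, the inverse is found by brute force for clarity rather than speed, and the reduction polynomial is the AES polynomial $x^8 + x^4 + x^3 + x + 1$ ({11b}). The affine step is written in the common rotate-and-XOR form with the constant {63}.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B   # reduce back to 8 bits
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8); {00} maps to itself by convention."""
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sub_byte(a: int) -> int:
    """S-box entry: multiplicative inverse followed by the affine transformation."""
    b = gf_inverse(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sub_byte(a) for a in range(256)]   # the single lookup table used in practice
print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))   # 0x63 0x7c 0xed
```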
2.4 RC6
RC6, like RC5, consists of three components: a key expansion algorithm, an encryption algorithm, and a decryption algorithm. The parameterization is written RC6-w/r/b, where w is the word size in bits, r is the non-negative number of rounds, and b is the length of the encryption key in bytes. Like RC5, RC6 makes essential use of data-dependent rotations (Rivest et al., 1998a). RC6 is based on seven primitive operations, as shown in Table 1. Normally there are only six primitive operations (Rivest et al., 1998a); however, parallel assignment is also a primitive and essential operation in RC6. The addition, subtraction, and multiplication operations use two's-complement representations (Rivest, 1997). Integer multiplication is used to increase diffusion per round and to increase the speed of the cipher (Rivest et al., 1998a).
Operation                     Description
a + b                         Integer addition modulo 2^w
a - b                         Integer subtraction modulo 2^w
a ⊕ b                         Bitwise exclusive-or (XOR) of w-bit words
a × b                         Integer multiplication modulo 2^w
a <<< b                       Rotate the w-bit word a to the left by the amount given by the least significant log2(w) bits of b
a >>> b                       Rotate the w-bit word a to the right by the amount given by the least significant log2(w) bits of b
(A, B, C, D) = (B, C, D, A)   Parallel assignment of the values on the right to the registers on the left
Table 1: RC6 operations
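The word operations of Table 1 map directly onto Python integer arithmetic with masking. The sketch below assumes w = 32, the value used by the RC6-32/20/16 AES candidate, and the function names are hypothetical. Python's tuple assignment provides the parallel assignment (A, B, C, D) = (B, C, D, A).

```python
W = 32                  # word size w in bits (assumed)
MASK = (1 << W) - 1


def add(a: int, b: int) -> int:
    """Integer addition modulo 2^w."""
    return (a + b) & MASK


def sub(a: int, b: int) -> int:
    """Integer subtraction modulo 2^w (two's-complement wrap-around)."""
    return (a - b) & MASK


def xor(a: int, b: int) -> int:
    """Bitwise exclusive-or of w-bit words."""
    return a ^ b


def mul(a: int, b: int) -> int:
    """Integer multiplication modulo 2^w."""
    return (a * b) & MASK


def rotl(a: int, b: int) -> int:
    """Rotate a left by the amount in the least significant log2(w) bits of b."""
    s = b & (W - 1)
    return ((a << s) | (a >> (W - s))) & MASK


def rotr(a: int, b: int) -> int:
    """Rotate a right by the amount in the least significant log2(w) bits of b."""
    s = b & (W - 1)
    return ((a >> s) | (a << (W - s))) & MASK


# Parallel assignment (A, B, C, D) = (B, C, D, A) is ordinary tuple assignment.
A, B, C, D = 0xA, 0xB, 0xC, 0xD
A, B, C, D = B, C, D, A
print(hex(A), hex(D))   # 0xb 0xa
```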
Diffusion involves propagating bit changes from one block to other blocks. The avalanche effect occurs when one small change in the plaintext triggers major changes in the cipher text. To speed up the avalanche of change between rounds, a quadratic function is introduced (Rivest et al., 1998a). By increasing the rate of diffusion, the rotation amounts are more likely to be spoiled sooner by changes from simple differentials (Rivest et al., 1998a). To achieve the security goals for the transformation, the following quadratic function is used twice within each round: $f(x) = x(2x + 1) \bmod 2^w$. The high-order bits of f(x), which depend on all of the bits of x, are used to determine the rotation amount (Rivest et al., 1998a). In conjunction with the quadratic function, the fixed $\log_2 w$-bit shift complicates advanced cryptanalytic attacks (Rivest et al., 1998a). Integer multiplication also contributes by making sure that all of the bits of the rotation amounts depend on the bits of another register (Rivest et al., 1998a).
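One RC6 encryption round, as given in the RC6 specification, applies f to B and D, rotates the results left by $\log_2 w$ so that their high-order bits become the rotation amounts t and u, and then mixes A and C. The sketch below shows a single round and its inverse for w = 32. The round sub-keys `s_even` and `s_odd` (standing in for S[2i] and S[2i+1]) are hypothetical values, and the key expansion and the pre- and post-whitening steps are omitted.

```python
W = 32
MASK = (1 << W) - 1
LG_W = 5   # log2(32)


def rotl(a: int, b: int) -> int:
    s = b & (W - 1)   # only the low log2(w) bits of b are used
    return ((a << s) | (a >> (W - s))) & MASK


def rotr(a: int, b: int) -> int:
    s = b & (W - 1)
    return ((a >> s) | (a << (W - s))) & MASK


def f(x: int) -> int:
    """Quadratic function f(x) = x(2x + 1) mod 2^w."""
    return (x * (2 * x + 1)) & MASK


def rc6_round(A, B, C, D, s_even, s_odd):
    """One RC6 encryption round; s_even and s_odd stand in for S[2i] and S[2i+1]."""
    t = rotl(f(B), LG_W)   # high-order log2(w) bits of f(B) move to the bottom
    u = rotl(f(D), LG_W)
    A = (rotl(A ^ t, u) + s_even) & MASK   # data-dependent rotation by u
    C = (rotl(C ^ u, t) + s_odd) & MASK    # data-dependent rotation by t
    return B, C, D, A                      # (A, B, C, D) = (B, C, D, A)


def rc6_round_inverse(A, B, C, D, s_even, s_odd):
    """Undo rc6_round with the same sub-keys."""
    A, B, C, D = D, A, B, C   # reverse the parallel assignment
    u = rotl(f(D), LG_W)
    t = rotl(f(B), LG_W)
    C = rotr((C - s_odd) & MASK, t) ^ u
    A = rotr((A - s_even) & MASK, u) ^ t
    return A, B, C, D


state = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
out = rc6_round(*state, 0x5A5A5A5A, 0x3C3C3C3C)   # hypothetical sub-keys
print(rc6_round_inverse(*out, 0x5A5A5A5A, 0x3C3C3C3C) == state)   # True
```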
2.5 MARS
MARS takes as input four 32-bit plaintext data words A, B, C, D and produces four 32-bit cipher text data words A', B', C', D'. The cipher is word-oriented, in that all internal operations are performed on 32-bit words. MARS is a type-3 Feistel network divided into three phases: a 16-round cryptographic core phase wrapped with two layers of 8-round forward and backward mixing. The cryptographic core rounds provide strong resistance to all known cryptanalytic attacks, while the mixing rounds provide good avalanche and offer very wide security margins to thwart new (yet unknown) attacks. MARS accepts a variable-size user-supplied key ranging from 4 to 14 words (i.e., 128 to 448 bits). MARS uses a key expansion procedure to expand the user-supplied key (consisting of n 32-bit words, where n is any number between 4 and 14) into a key array K[ ] of 40 words for the encryption/decryption operation. The MARS cipher uses a variety of operations to provide a combination of high security, high speed, and implementation flexibility. Specifically, it combines exclusive-or (XOR), addition, subtraction, multiplication, and both fixed and data-dependent rotations. MARS also uses a single S-box table of 512 32-bit words to provide good resistance against linear and differential attacks, as well as good avalanche of data and key bits. This S-box is also used by the key expansion procedure. Sometimes the S-box is viewed as two tables of 256 entries each, denoted S0 and S1. In the design of the S-box, the MARS designers generated the entries in a pseudo-random fashion and tested that the resulting S-box has good differential and linear properties.
The operations used in the cipher are applied to 32-bit words, which are viewed as unsigned integers. The following notation is used in the pseudo-code. The bits in each word are numbered from 0 to 31, where bit 0 is the least significant (lowest) bit and bit 31 is the most significant (highest) bit. c ⊕ d denotes the bitwise exclusive-or of the two words c and d; c + d denotes addition modulo 2^32, c - d subtraction modulo 2^32, and c × d multiplication modulo 2^32. Finally, c <<< d and c >>> d denote cyclic rotations of the 32-bit word c by d positions to the left and right, respectively. The decryption operation of MARS is the inverse of the encryption operation, and the code for decryption is similar to the code for encryption.
The MARS key expansion procedure expands the user-supplied key of 4 to 14 words into a 40-word key for use in the encryption/decryption operation. The key expansion procedure consists of three steps (Figure 3). The first step is linear expansion, which expands the original user-supplied key to forty 32-bit words using a simple linear transformation. The second step is S-box-based key stirring, which stirs the expanded key using seven rounds of a type-1 Feistel network to destroy linear relations in the key. Finally, a multiplication key-word modifying step examines the key words that are used for multiplication in the MARS encryption/decryption operation and modifies them if needed. In the pseudo-code, c ∧ d denotes the bitwise AND of the two words c and d.
CHAPTER 3
OPERATIONS OF SYMMETRIC KEY ALGORITHMS
3.1 Modulo-2 Addition
The addition of two elements in a finite field is achieved by adding the coefficients of the corresponding powers in the polynomials for the two elements. The addition is performed with the XOR operation (denoted by $\oplus$), i.e., modulo 2, so that $1 \oplus 1 = 0$, $1 \oplus 0 = 1$, and $0 \oplus 0 = 0$.
Alternatively, addition of finite field elements can be described as the modulo-2 addition of corresponding bits in the byte. For two bytes $\{a_7a_6a_5a_4a_3a_2a_1a_0\}$ and $\{b_7b_6b_5b_4b_3b_2b_1b_0\}$, the sum is $\{c_7c_6c_5c_4c_3c_2c_1c_0\}$, where each $c_i = a_i \oplus b_i$ (i.e., $c_7 = a_7 \oplus b_7$, $c_6 = a_6 \oplus b_6$, ...).
Algorithm: Modulo-2 addition
Require: binary polynomials a(z), b(z) of degree at most m-1
Ensure: c(z) = a(z) + b(z)
1: for i from 0 to M-1 do
2:     C[i] ← A[i] ⊕ B[i]
3: end for
4: return c
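The algorithm above can be sketched in Python as follows, assuming each binary polynomial is stored as M words whose bits are its coefficients (the name `binary_poly_add` is hypothetical). Because there are no carries, each word is processed independently of the others. The one-word example reproduces the byte addition {57} ⊕ {83} = {d4} from FIPS-197.

```python
def binary_poly_add(a: list[int], b: list[int]) -> list[int]:
    """c(z) = a(z) + b(z) over GF(2): XOR the coefficient words A[i] and B[i], i = 0..M-1."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of words M")
    return [a_i ^ b_i for a_i, b_i in zip(a, b)]


print([hex(word) for word in binary_poly_add([0x57], [0x83])])   # ['0xd4']
```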
In prime-field arithmetic, the modulo operation requires a division, which takes more time; in binary-field arithmetic, addition is a simple XOR and therefore takes less time.
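Multiplication is achieved in two steps. In the first step, the polynomial product $c(x) = a(x) \bullet b(x)$ is algebraically expanded, and like powers are collected to give

$$c(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0 \qquad (4)$$

where

$$\begin{aligned}
c_0 &= a_0 \bullet b_0 \\
c_1 &= a_1 \bullet b_0 \oplus a_0 \bullet b_1 \\
c_2 &= a_2 \bullet b_0 \oplus a_1 \bullet b_1 \oplus a_0 \bullet b_2 \\
c_3 &= a_3 \bullet b_0 \oplus a_2 \bullet b_1 \oplus a_1 \bullet b_2 \oplus a_0 \bullet b_3 \\
c_4 &= a_3 \bullet b_1 \oplus a_2 \bullet b_2 \oplus a_1 \bullet b_3 \\
c_5 &= a_3 \bullet b_2 \oplus a_2 \bullet b_3 \\
c_6 &= a_3 \bullet b_3
\end{aligned} \qquad (5)$$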
The result, c(x), does not represent a four-byte word. Therefore, the second step of the multiplication is to reduce c(x) modulo a polynomial of degree 4, so that the result is a polynomial of degree less than 4. For the AES algorithm, this is accomplished with the polynomial $x^4 + 1$, so that $x^i \bmod (x^4 + 1) = x^{i \bmod 4}$.
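The modular product of a(x) and b(x), denoted by $a(x) \otimes b(x)$, is given by the four-term polynomial d(x), defined as follows:

$$d(x) = d_3x^3 + d_2x^2 + d_1x + d_0, \quad \text{where} \quad
\begin{aligned}
d_0 &= (a_0 \bullet b_0) \oplus (a_3 \bullet b_1) \oplus (a_2 \bullet b_2) \oplus (a_1 \bullet b_3) \\
d_1 &= (a_1 \bullet b_0) \oplus (a_0 \bullet b_1) \oplus (a_3 \bullet b_2) \oplus (a_2 \bullet b_3) \\
d_2 &= (a_2 \bullet b_0) \oplus (a_1 \bullet b_1) \oplus (a_0 \bullet b_2) \oplus (a_3 \bullet b_3) \\
d_3 &= (a_3 \bullet b_0) \oplus (a_2 \bullet b_1) \oplus (a_1 \bullet b_2) \oplus (a_0 \bullet b_3)
\end{aligned} \qquad (6)$$

When a(x) is a fixed polynomial, the operation defined in equation (6) can be written in matrix form as:

$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} =
\begin{bmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} \qquad (7)$$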
Because $x^4 + 1$ is not an irreducible polynomial over GF($2^8$), multiplication by a fixed four-term polynomial is not necessarily invertible.
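The modular product defined in equation (6) and the invertibility remark can be checked numerically. In the sketch below (hypothetical helper names; coefficient lists ordered [a0, a1, a2, a3]), the AES MixColumns polynomial {03}x^3 + {01}x^2 + {01}x + {02} multiplied by {0b}x^3 + {0d}x^2 + {09}x + {0e} (the inverse polynomial used by InvMixColumns in FIPS-197) gives {01}. By contrast, (x + 1) multiplied by (x^3 + x^2 + x + 1) gives 0 modulo x^4 + 1, so x + 1 is a zero divisor and has no inverse.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly_mul_mod(a: list[int], b: list[int]) -> list[int]:
    """d(x) = a(x) (x) b(x) mod (x^4 + 1); coefficient lists are [c0, c1, c2, c3]."""
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # x^(i+j) mod (x^4 + 1) = x^((i+j) mod 4); coefficient addition is XOR.
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


a = [0x02, 0x01, 0x01, 0x03]       # {03}x^3 + {01}x^2 + {01}x + {02}
a_inv = [0x0E, 0x09, 0x0D, 0x0B]   # {0b}x^3 + {0d}x^2 + {09}x + {0e}
print(poly_mul_mod(a, a_inv))                    # [1, 0, 0, 0]
print(poly_mul_mod([1, 1, 0, 0], [1, 1, 1, 1]))  # [0, 0, 0, 0]
```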
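The result $x \bullet b(x)$ is obtained by reducing the above result modulo m(x), as defined in equation (4.1). If $b_7 = 0$, the result is already in reduced form. If $b_7 = 1$, the reduction is accomplished by subtracting (i.e., XORing) the polynomial m(x). It follows that multiplication by x (i.e., {00000010} or {02}) can be implemented at the byte level as a left shift and a subsequent conditional bitwise XOR with {1b}. This operation on bytes is denoted by xtime(). Multiplication by higher powers of x can be implemented by repeated application of xtime(), and by adding intermediate results, multiplication by any constant can be implemented. For example, $\{57\} \bullet \{13\} = \{fe\}$ because $\{57\} \bullet \{02\} = \text{xtime}(\{57\}) = \{ae\}$, $\{57\} \bullet \{04\} = \text{xtime}(\{ae\}) = \{47\}$, $\{57\} \bullet \{08\} = \text{xtime}(\{47\}) = \{8e\}$, and $\{57\} \bullet \{10\} = \text{xtime}(\{8e\}) = \{07\}$, so that
$$\{57\} \bullet \{13\} = \{57\} \bullet (\{01\} \oplus \{02\} \oplus \{10\}) = \{57\} \oplus \{ae\} \oplus \{07\} = \{fe\}.$$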
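The xtime() operation and multiplication by an arbitrary constant can be reproduced with the following sketch (function names are hypothetical). `mul_by_constant` walks through the bits of the constant and XORs in the matching power-of-x multiple of the input, which for {13} corresponds to writing it as {01} ⊕ {02} ⊕ {10}.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x ({02}): left shift, then XOR with {1b} if bit 7 was set."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B   # drop bit 8 and XOR {1b}, i.e. subtract m(x)
    return b


def mul_by_constant(a: int, const: int) -> int:
    """Multiply a by any constant by adding (XORing) repeated xtime() results."""
    result = 0
    power = a   # a * {01}, then a * {02}, a * {04}, ...
    while const:
        if const & 1:
            result ^= power
        power = xtime(power)
        const >>= 1
    return result


print(hex(xtime(0x57)), hex(xtime(0xAE)), hex(xtime(0x47)), hex(xtime(0x8E)))
# 0xae 0x47 0x8e 0x7
print(hex(mul_by_constant(0x57, 0x13)))   # 0xfe
```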
There are many ways to implement a finite field multiplier. The one originally proposed for AES takes the form of xtime(), which is essentially a multiplication by x, i.e., a left shift with {1B} feedback. This could imply either a bit-serial or a bit-parallel architecture. Rudra proposed an implementation of the Rijndael system with composite field arithmetic. We are considering a fast [...] needed). Notice that the fixed-value multiplications (by {02} or by {03}) lead us to a fixed-coefficient multiplication in GF($2^8$) that fulfils our requirements, and we are investigating this multiplier. Let $S_{i,c} = B(x)$ be an element to be multiplied. B(x) can also be written in polynomial form as
$$B(x) = b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0.$$
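The multiplications used in the MixColumns transformation are $\{03\} \bullet B(x) = (x + 1)B(x)$ and $\{02\} \bullet B(x) = x \bullet B(x)$. Reducing modulo $m(x) = x^8 + x^4 + x^3 + x + 1$, the resulting multiplications are:
$$x \bullet B(x) = b_6x^7 + b_5x^6 + b_4x^5 + (b_3 \oplus b_7)x^4 + (b_2 \oplus b_7)x^3 + b_1x^2 + (b_0 \oplus b_7)x + b_7 \qquad (2)$$
$$(x + 1) \bullet B(x) = (b_7 \oplus b_6)x^7 + (b_6 \oplus b_5)x^6 + (b_5 \oplus b_4)x^5 + (b_4 \oplus b_3 \oplus b_7)x^4 + (b_3 \oplus b_2 \oplus b_7)x^3 + (b_2 \oplus b_1)x^2 + (b_1 \oplus b_0 \oplus b_7)x + (b_0 \oplus b_7) \qquad (3)$$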
Implementations of the above equations are simple, since the additions are simply XORs. As an example, the circuit to compute $x \bullet B_i$ is shown in Figure 3.4.1, and the implementation of $(x + 1) \bullet B_i$ is shown in Figure 3.4.2. According to the terms given in (2) and the architecture shown in Figure 3.4.2, the maximum delay is expected to be that of a single 2-input XOR gate.
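The bit-level form of the two fixed multiplications, x • B(x) and (x + 1) • B(x) reduced modulo m(x) = x^8 + x^4 + x^3 + x + 1, can be checked against the byte-level xtime() definition. The sketch below (hypothetical names) wires each output bit individually and compares the result with a shift-and-conditional-XOR reference for all 256 byte values. In x • B(x) every output bit is either a plain wire or a single 2-input XOR, which is consistent with the 2-input XOR delay stated above.

```python
def to_bits(b: int) -> list[int]:
    """Return [b0, b1, ..., b7] with b0 the least significant bit."""
    return [(b >> i) & 1 for i in range(8)]


def from_bits(c: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(c))


def times_x(b: int) -> int:
    """x.B(x) mod m(x), wired bit by bit: at most one 2-input XOR per output bit."""
    b0, b1, b2, b3, b4, b5, b6, b7 = to_bits(b)
    return from_bits([b7, b0 ^ b7, b1, b2 ^ b7, b3 ^ b7, b4, b5, b6])


def times_x_plus_1(b: int) -> int:
    """(x + 1).B(x) mod m(x) = x.B(x) XOR B(x), wired bit by bit."""
    b0, b1, b2, b3, b4, b5, b6, b7 = to_bits(b)
    return from_bits([b0 ^ b7, b1 ^ b0 ^ b7, b2 ^ b1, b3 ^ b2 ^ b7,
                      b4 ^ b3 ^ b7, b5 ^ b4, b6 ^ b5, b7 ^ b6])


def xtime_reference(b: int) -> int:
    """Byte-level reference: left shift, then conditional XOR with {1b}."""
    shifted = (b << 1) & 0xFF
    return shifted ^ 0x1B if b & 0x80 else shifted


for value in range(256):
    assert times_x(value) == xtime_reference(value)
    assert times_x_plus_1(value) == xtime_reference(value) ^ value
print("bit-level equations match xtime() for all 256 inputs")
```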
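$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$  (3.5.1)
As described in Sec. 3.3, this can be written as a matrix multiplication. Let $s'(x) = a(x) \otimes s(x)$:
$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} =
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
\quad \text{for } 0 \le c < 4 \qquad (1)$$
As a result of this multiplication, the four bytes in a column are replaced by the following:
$$\begin{aligned}
s'_{0,c} &= (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c} \\
s'_{1,c} &= s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c} \\
s'_{2,c} &= s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c}) \\
s'_{3,c} &= (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c})
\end{aligned} \qquad (2)$$
By using the fixed-coefficient multiplier, we can implement MixColumns from equations (1) and (2) and reduce the cost of the multiplications. The state is processed column by column, as shown in Figure 3.5.1, which explains the MixColumns transformation; the architecture is shown in Figure 3.5.2.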
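The column transformation can be sketched with nothing but the fixed {02} multiplier (xtime) and XORs, computing {03} • s as ({02} • s) ⊕ s, which is the fixed-coefficient approach described above. The names are hypothetical, the state is assumed to be given as a list of four columns, and the check column {db, 13, 53, 45} → {8e, 4d, a1, bc} is a widely published MixColumns test vector.

```python
def xtime(b: int) -> int:
    """{02} . b in GF(2^8): left shift with conditional {1b} feedback."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def mix_column(col: list[int]) -> list[int]:
    """MixColumns on one column [s0c, s1c, s2c, s3c] using only {02}/{03} multipliers."""
    s0, s1, s2, s3 = col
    m2 = [xtime(s) for s in col]              # {02} . s for each byte
    m3 = [m2[i] ^ col[i] for i in range(4)]   # {03} . s = ({02} . s) XOR s
    return [
        m2[0] ^ m3[1] ^ s2 ^ s3,
        s0 ^ m2[1] ^ m3[2] ^ s3,
        s0 ^ s1 ^ m2[2] ^ m3[3],
        m3[0] ^ s1 ^ s2 ^ m2[3],
    ]


def mix_columns(state: list[list[int]]) -> list[list[int]]:
    """Apply mix_column to each of the four columns of the state."""
    return [mix_column(c) for c in state]


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```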
A logical right shift inserts 0 bits at the most significant end instead of copying the sign bit, so it is ideal for unsigned binary numbers, while an arithmetic right shift is ideal for signed two's-complement binary numbers. Figure 3.8.2 explains the logical left shift.
During a right rotate (circular right shift), the value shifted in on the left is whatever value was shifted out on the right, as shown in Figure 3.9.2.
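The shift and rotate behaviour described in the last two paragraphs can be modelled on w-bit values as below. This is a behavioural sketch with hypothetical names and a default width of 32 bits to match the processor's word size. An 8-bit example makes the difference easy to see: shifting 0xF0 right by one gives 0x78 logically but 0xF8 arithmetically, because the sign bit is copied.

```python
def logical_shift_right(x: int, n: int, width: int = 32) -> int:
    """Zeros enter at the most significant end (suits unsigned numbers)."""
    return (x & ((1 << width) - 1)) >> n


def arithmetic_shift_right(x: int, n: int, width: int = 32) -> int:
    """Copies of the sign bit enter at the most significant end (two's complement)."""
    mask = (1 << width) - 1
    x &= mask
    if x >> (width - 1):        # negative: reinterpret as x - 2^width
        x -= 1 << width
    return (x >> n) & mask      # Python's >> on negative ints is arithmetic


def logical_shift_left(x: int, n: int, width: int = 32) -> int:
    """Zeros enter at the least significant end; bits shifted out are lost."""
    return (x << n) & ((1 << width) - 1)


def rotate_right(x: int, n: int, width: int = 32) -> int:
    """Bits shifted out on the right re-enter on the left."""
    mask = (1 << width) - 1
    n %= width
    x &= mask
    return ((x >> n) | (x << (width - n))) & mask


print(hex(logical_shift_right(0xF0, 1, 8)), hex(arithmetic_shift_right(0xF0, 1, 8)))  # 0x78 0xf8
print(hex(rotate_right(0x12345678, 8)))   # 0x78123456
```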
The 80286 microprocessor (MP) is an advanced version of the 8086, designed for multiuser and multitasking environments. The 80286 addresses 16 MB of physical memory and 1 GB of virtual memory by using its memory management system. The 80286 does not incorporate internal peripherals; instead, it contains a memory management unit, also called the address unit. The address bus is 24 bits wide to accommodate 16 MB of physical memory. In real mode the 80286 functions like the 8086, but in protected mode it addresses 16 MB of memory space. The 82284 clock generator provides the clock for the 80286, and the system bus controller provides the system signals.
The 80386 is a full 32-bit version of the earlier 16-bit 8086 and 80286 MPs and represents a major advancement in architecture. Along with the larger word size come many improvements and additional features: multitasking, memory management, virtual memory with or without paging, software protection, and a large memory system. All software written for the 8086 is compatible with the 80386. Physical memory is increased from 1 MB in the 8086 and 16 MB in the 80286 to 4 GB in the 80386. The 80386 can switch from protected mode to real mode without resetting the MP. The 80486 is a highly integrated device with a powerful memory management unit. It contains a complete numeric coprocessor that is compatible with the 80387 and a high-speed 8 KB level-1 cache. Its architecture is similar to that of the 80386, except for a new concept called burst cycle, or burst mode, for the retrieval of data.
any provision for memory, so the chip has to depend on external peripherals such as memory. To facilitate this, the MP is programmed with a large number of instructions that specialize in data or instruction fetch operations; this is one of the main reasons for the large instruction set. These processors have complicated memory operations to increase the speed of memory access, and they use concepts such as cache memory, interleaving, and burst-mode operation that aim at optimizing memory interactions. Since they are expected to perform a wide variety of operations, they have a variety of instruction-length formats. They also have a large number of addressing modes, often ranging from 5 to 20 and beyond for higher-end processors.
All operations are done within the registers of the CPU.
Fixed-length, easily decodable instruction format.
Single-cycle instruction execution.
Hardwired rather than micro-programmed control.
A relatively large number of registers in the processor unit.
Use of overlapped register windows to speed up procedure call and return.
Efficient instruction pipeline.
Compiler support for efficient translation of high-level language programs into machine language programs.
The architecture simplifies the instruction set and encourages the optimization of register manipulation. Almost all instructions use simple register addressing. An important aspect of the instruction set is that it is easy to decode, so the opcode and instruction register fields can be accessed simultaneously. Because of the simplification of the instructions and their format, the control logic design is greatly simplified. When implementing a digital logic design, it is convenient to break up the complete architecture into individual modules and test each one individually for functionality. After all the modules are found to work correctly, they are put together and checked for correct operation as a whole. We follow this approach in the implementation of the 32-bit RISC processor.
to the hardware's inherent limitations (speed, response time, power consumption, etc.), which capped the headroom for development. Because this processor performs various cryptographic operations, we call it a cryptography processor.
Instruction formats (field boundaries by bit position):
Type 1: 31-29 | 28-25 | 24-20 | 19-16 | 15-0
Type 2: 31-29 | 28-25 | 24-20 | 19-16 | 15-0
Type 3: 31-29 | 28-25 | 24-20 | 19-16 | 15-0
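The decode stage slices each 32-bit instruction into the fields shown above. Because the field meanings are not given here, the sketch below uses hypothetical names (`field_a` to `field_e`) and only the bit boundaries 31-29, 28-25, 24-20, 19-16, and 15-0, which appear to be shared by all three types. One possible reading, which is purely an assumption, is that the 4-bit fields select one of the sixteen 32-bit general-purpose registers.

```python
FIELDS = [  # (hypothetical name, high bit, low bit)
    ("field_a", 31, 29),   # 3 bits
    ("field_b", 28, 25),   # 4 bits
    ("field_c", 24, 20),   # 5 bits
    ("field_d", 19, 16),   # 4 bits
    ("field_e", 15, 0),    # 16 bits
]


def decode(instr: int) -> dict[str, int]:
    """Slice a 32-bit instruction word into its fields, as the decode stage would."""
    out = {}
    for name, hi, lo in FIELDS:
        width = hi - lo + 1
        out[name] = (instr >> lo) & ((1 << width) - 1)
    return out


def encode(**fields: int) -> int:
    """Inverse of decode(): pack field values into a 32-bit word."""
    word = 0
    for name, hi, lo in FIELDS:
        width = hi - lo + 1
        value = fields.get(name, 0)
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word


print(decode(0xFFFFFFFF))
print(hex(encode(field_a=5, field_b=9, field_c=17, field_d=3, field_e=0xBEEF)))
```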
Instruction Register
Program Counter
Multiplexer (2:1)
Multiplexer A (16:1)
Memory
Multiplexer D (16:1)
unit. Figure 4.6.1 shows the block diagram of the Control and Decode unit, and the simulation results are shown in Figure 4.6.3.
Device utilization summary (selected device: 4vlx15sf363-12)
Number of Slices: 11 out of 6144 (0%)
Number of 4-input LUTs: 19 out of 12288 (0%)
Number of IOs: 41
Number of bonded IOBs: 41 out of 240 (17%)
Timing summary
Maximum output required time after clock: No path found
Maximum combinational path delay: 5.694 ns
Cell usage
# BELS: 18
# LUT4: 16
# FlipFlops/Latches: 512
# Clock Buffers: 1
# IO Buffers: 549 (IBUF: 37, OBUF: 512)
Device utilization summary (selected device: 4vlx15sf363-12)
Number of Slices: 9 out of 6144 (0%)
Number of 4-input LUTs: 18 out of 12288 (0%)
Number of IOs: 550
Number of bonded IOBs: 550 out of 240 (229%) (*)
IOB Flip Flops: 512
Number of GCLKs: 1 out of 32 (3%)
Timing summary (minimum period, minimum input arrival time before clock): No path found
Cell usage
# BELS: 1
# INV: 1
# FlipFlops/Latches: 32
# FDCE: 32
Remaining cell-usage values (labels not recoverable): 1, 1, 66, 34, 32, 32
Device utilization summary (selected device: 4vlx15sf363-12)
Number of Slices: 1 out of 6144 (0%)
Number of Slice Flip Flops: 32 out of 12288 (0%)
Number of 4-input LUTs: 1 out of 12288 (0%)
Number of IOs: 67
Number of bonded IOBs: 67 out of 240 (27%)
IOB Flip Flops: 32
Number of GCLKs: 1 out of 32 (3%)
Device utilization summary (selected device: 4vlx15sf363-12)
Number of Slices: 8 out of 6144 (0%)
Number of Slice Flip Flops: 16 out of 12288 (0%)
Number of 4-input LUTs: 18 out of 12288 (0%)
Number of IOs: 35
Number of bonded IOBs: 35 out of 240 (14%)
Number of GCLKs: 1 out of 32 (3%)
Timing summary
Maximum output required time after clock: 3.806 ns
Maximum combinational path delay: No path found
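# IBUF: 33
# OBUF: 16
Device utilization summary (selected device: 4vlx15sf363-12)
Number of Slices: 9 out of 6144 (0%)
Number of 4-input LUTs: 16 out of 12288 (0%)
Number of IOs: 49
Number of bonded IOBs: 49 out of 240 (20%)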
The main function of mux A is to receive multiple inputs and produce a single output, chosen by the selection lines, which acts as an ALU operand. Here there are 16 inputs, each 32 bits wide. Figure 4.6.17 shows the block diagram of mux A, and the simulation results are shown in Figure 4.6.18.
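Device utilization summary (selected device: 4vlx15sf363-12, speed grade: -12)
Number of Slices: 128 out of 6144 (2%)
Number of 4-input LUTs: 256 out of 12288 (2%)
Number of IOs: 548
Number of bonded IOBs: 548 out of 240 (228%) (*)
Timing summary
Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 6.894 ns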
4.6.7 Memory
The processor is designed with a load/store architecture, with separate memories for instructions (program) and data, so that different stages of the pipeline can access memory simultaneously. This Harvard style of architecture can be used either with two completely different memory spaces or, as in this design, with a single dual-port memory space that holds separate data and instruction areas. Three pipeline stages have been incorporated in the design, which increases the speed of operation. The processor presented here has a RISC instruction set and uses a Single Instruction Single Data (SISD) execution order. Its main characteristics are sixteen 32-bit general-purpose registers and an ALU with basic arithmetic and logical operations.
ROM program memory: As its name suggests, the program memory stores the instructions to be executed. It has to be non-volatile and fast. Internal ROM was chosen as program memory because it was the fastest option and eliminated the need for external storage.
RAM data memory: The RAM is a data storage block where the stack is handled and other data are kept as variables. Figure 4.6.20 shows the block diagram of the memory, and the simulation results are shown in Figure 4.6.21.
Device utilization summary
Selected Device: 4vlx15sf363-12
Number of Slices: 422 out of 6144 (6%)
Number of Slice Flip Flops: 512 out of 12288 (4%)
Number of 4 input LUTs: 275 out of 12288 (2%)
Number of IOs: 50
Number of bonded IOBs: 38 out of 240 (15%)
Number of GCLKs: 16 out of 32 (50%)

Timing Summary
Maximum output required time after clock: 5.353 ns
Maximum combinational path delay: 6.940 ns
Cell usage: IO Buffers: 548 (IBUF: 516, OBUF: 32)

Device utilization summary
Selected Device: 4vlx15sf363-12
Number of Slices: 128 out of 6144 (2%)
Number of 4 input LUTs: 256 out of 12288 (2%)
Number of IOs: 548
Number of bonded IOBs: 548 out of 240 (228%) (*)

Timing Summary
Speed Grade: -12
Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 6.894 ns
Figure 4.6.27: Simulated timing diagram of the ALU
Figure 4.6.26 shows the block diagram of the ALU, and the simulation result is shown in Figure 4.6.27.
Cell usage
Logic cells: LUT3, LUT4, MUXCY, MUXF5, VCC, XORCY
Flip-flops/Latches: 64 (FDC: 64)
Clock Buffers: 1 (BUFGP: 1)
IO Buffers: 101 (IBUF: 69, OBUFT: 32)
DSPs: 22 (DSP48: 22)

Device utilization summary
Selected Device: 4vlx15sf363-12
Number of Slices: 296 out of 6144 (4%)
Number of 4 input LUTs: 539 out of 12288 (4%)
Number of IOs: 102
Number of bonded IOBs: 102 out of 240 (42%)
IOB Flip Flops: 64
Number of GCLKs: 1 out of 32 (3%)
Number of DSP48s: 22 out of 32 (68%)

Timing Summary
Maximum output required time after clock: 3.793 ns
Maximum combinational path delay: No path found
Area is measured as the number of 4-input LUTs, since the processor is designed for a programmable SoC (Spartan 3 board), and operating frequency is given in MHz.

Table 4.1: Summary of various modules
Item                        Area (No. of 4-input LUTs)   Operating Frequency (MHz)
Control & Decoder           539 out of 12288              95
General Purpose Register    18 out of 12288               367
Instruction Register        1 out of 12288                540
Program Counter             16 out of 12288               381
Memory                      275 out of 12288              323
ALU                         539 out of 12288              94
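The percentages in the synthesis summaries and the frequencies in Table 4.1 can be cross-checked with two small formulas: utilization = used / available × 100, and clock period T = 1/f (so 95 MHz corresponds to about 10.5 ns). The Python sketch below applies them to the Table 4.1 figures. The truncation to a whole percent is inferred from the reported values (for example, 422 of 6144 slices is 6.9% but is reported as 6%); it is an assumption about how the reports round, not documented tool behavior. The helper names are hypothetical.

```python
TOTAL_LUTS = 12288  # 4-input LUTs available on the selected device

# Module -> (4-input LUTs used, operating frequency in MHz), from Table 4.1.
TABLE_4_1 = {
    "Control & Decoder": (539, 95),
    "General Purpose Register": (18, 367),
    "Instruction Register": (1, 540),
    "Program Counter": (16, 381),
    "Memory": (275, 323),
    "ALU": (539, 94),
}


def utilization_percent(used: int, available: int) -> int:
    """Resource utilization as a whole percent, truncated toward zero (assumed)."""
    return used * 100 // available


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz: T = 1 / f."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    for name, (luts, mhz) in TABLE_4_1.items():
        print(f"{name:<26} {utilization_percent(luts, TOTAL_LUTS):>3}% of LUTs, "
              f"period {period_ns(mhz):.2f} ns")
```

Since the modules share one clock in the complete processor, the combined design would not be expected to run faster than its slowest module (the ALU, at 94 MHz or about 10.64 ns).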
CHAPTER 5
The cryptographic processor performs instruction fetch, instruction decode and execution all in one clock cycle. First, the PC value is used as an address into the instruction memory, which supplies the 32-bit value of the next instruction to be executed. This instruction is then divided into its fields. The opcode field, bits [31:26], is sent to the control unit to determine the type of instruction to execute. The instruction type determines which control signals are asserted and which function the ALU performs, thus decoding the instruction. The register address fields rs (bits [25:21]), rt (bits [20:16]) and rd (bits [15:11]) are used to address the register file. The register file supports two independent register reads and one register write in one clock cycle: it reads the requested addresses and outputs the data values contained in those registers. The ALU then operates on these values, as directed by the control unit, to compute a memory address (e.g. for a load or store), compute an arithmetic result (e.g. add, sub) or perform a comparison (e.g. for a branch). If the decoded instruction is arithmetic, the ALU result is written to a register. If it is a load or a store, the ALU result is used to address the data memory. The final step writes the ALU result or the memory value back to the register file.
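The single-cycle flow just described can be sketched as a behavioral model. In the Python below, only the field positions (opcode in bits [31:26], rs in [25:21], rt in [20:16], rd in [15:11]) and the two-read/one-write register file come from the text. The opcode values, the signed 16-bit immediate in bits [15:0] for loads, stores and branches, word addressing, the branch-target rule and the 32-entry register file (sized so that any 5-bit field is a valid address) are hypothetical choices made only to get a runnable example; the encoder helpers `r_type` and `i_type` are likewise invented for illustration.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical opcode assignments; the document does not list its encodings.
OP_ADD, OP_SUB, OP_BEQ, OP_LW, OP_SW = 0x00, 0x01, 0x04, 0x23, 0x2B


def r_type(op, rs, rt, rd):
    """Hypothetical encoder for register-register instructions."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11)


def i_type(op, rs, rt, imm):
    """Hypothetical encoder with a 16-bit immediate in bits [15:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode(instr):
    """Split a 32-bit instruction into the fields named in the text."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits [31:26] -> control unit
        "rs": (instr >> 21) & 0x1F,      # bits [25:21] -> read port 1
        "rt": (instr >> 16) & 0x1F,      # bits [20:16] -> read port 2
        "rd": (instr >> 11) & 0x1F,      # bits [15:11] -> write address
        "imm": ((instr & 0xFFFF) ^ 0x8000) - 0x8000,  # sign-extended [15:0] (assumed)
    }


def alu(op, a, b):
    """Add or subtract modulo 2**32; also return a zero flag for compares."""
    result = (a - b if op == "sub" else a + b) & MASK32
    return result, result == 0


def step(pc, imem, regs, dmem):
    """Execute one instruction (fetch, decode, execute, write back); return the next PC."""
    f = decode(imem[pc])                  # fetch from instruction memory, then decode
    a, b = regs[f["rs"]], regs[f["rt"]]   # two register reads in the same cycle
    op, next_pc = f["opcode"], pc + 1
    if op in (OP_ADD, OP_SUB):            # arithmetic: result written to rd
        regs[f["rd"]], _ = alu("sub" if op == OP_SUB else "add", a, b)
    elif op in (OP_LW, OP_SW):            # ALU computes the data-memory address
        addr, _ = alu("add", a, f["imm"])
        if op == OP_LW:
            regs[f["rt"]] = dmem[addr]    # memory value written back to rt
        else:
            dmem[addr] = b
    elif op == OP_BEQ:                    # ALU compare via subtraction
        _, zero = alu("sub", a, b)
        if zero:
            next_pc = pc + 1 + f["imm"]
    # any other opcode is treated as a no-op in this sketch
    return next_pc


if __name__ == "__main__":
    regs = [0] * 32
    regs[1], regs[2] = 5, 7
    dmem = [0] * 16
    program = [
        r_type(OP_ADD, 1, 2, 3),   # r3 = r1 + r2
        i_type(OP_SW, 0, 3, 4),    # dmem[r0 + 4] = r3
        i_type(OP_LW, 0, 4, 4),    # r4 = dmem[r0 + 4]
        i_type(OP_BEQ, 3, 4, 1),   # r3 == r4, so skip the next instruction
        r_type(OP_SUB, 1, 2, 5),   # skipped
        r_type(OP_SUB, 2, 1, 6),   # r6 = r2 - r1
    ]
    pc = 0
    while pc < len(program):
        pc = step(pc, program, regs, dmem)
    print(regs[3], regs[4], regs[6])  # prints: 12 12 2
```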
Once the Verilog implementation of the cryptographic processor is complete, the next task is to pipeline it. Pipelining, a standard feature of RISC processors, is a technique used to improve both clock speed and overall performance. It allows the processor to work on different steps of several instructions at the same time, so more instructions are executed in a shorter period of time. For example, in the single-cycle Verilog implementation the datapath is divided into different modules, each of which must wait for the previous one to finish before it can execute, so one instruction completes in one long clock cycle. When the processor is pipelined, all of those modules, or stages, are in use during the same clock cycle, each working on a different instruction in parallel. The block diagram is shown in Figure 5.1 and the simulation results are shown in Figure 5.3.
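The throughput gain can be illustrated by counting cycles. In an ideal k-stage pipeline with no stalls, n instructions complete in k + n - 1 stage-length cycles, whereas if each instruction had to pass through all k stages before the next one started, they would take k·n such cycles. The Python sketch below prints the stage occupancy for a three-stage pipeline; the stage names IF, ID and EX are assumptions that follow the fetch, decode and execute steps described above and the three pipeline stages mentioned in Section 4.6.7, and hazards and stalls are ignored.

```python
STAGES = ("IF", "ID", "EX")  # assumed three-stage split: fetch, decode, execute


def pipeline_chart(n_instr, stages=STAGES):
    """Stage occupancy per cycle for an ideal pipeline (no hazards or stalls)."""
    rows = []
    for cycle in range(len(stages) + n_instr - 1):
        row = []
        for s, name in enumerate(stages):
            i = cycle - s                 # instruction index occupying stage s
            row.append(f"{name}:I{i}" if 0 <= i < n_instr else f"{name}:--")
        rows.append(row)
    return rows


def cycles_pipelined(n_instr, k=len(STAGES)):
    """Ideal k-stage pipeline: fill the pipeline once, then one result per cycle."""
    return k + n_instr - 1


def cycles_sequential(n_instr, k=len(STAGES)):
    """Each instruction occupies all k stages before the next one starts."""
    return k * n_instr


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_chart(4)):
        print(f"cycle {cycle}: " + "  ".join(row))
    print(cycles_sequential(4), "vs", cycles_pipelined(4), "stage-length cycles")
```

For large n the ratio k·n / (k + n - 1) approaches k, the ideal speedup; in practice, pipeline register overhead and hazards reduce it.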
Figure 5.3
Device utilization summary
Selected Device: 4vlx15sf363-12
Number of Slices: 843 out of 6144 (13%)
Number of Slice Flip Flops: 593 out of 12288 (4%)
Number of 4 input LUTs: 1315 out of 12288 (10%)
Number of IOs: 54
Number of bonded IOBs: 22% of 240

Timing Summary
Minimum input arrival time before clock: 15.683 ns
Maximum output required time after clock: 5.068 ns
Maximum combinational path delay: 6.509 ns
Total memory usage: 279800 kilobytes
Total equivalent gate count for design: 14,518 gates
A 32-bit cryptographic processor that performs the mathematical computations used in symmetric-key algorithms has been designed in Verilog, and the simulations were carried out with the Active-HDL simulator. The design has been verified through exhaustive simulations. The processor architecture executes one instruction in one clock cycle. The cryptographic processor concept builds on the observation that 20% of the instructions do 80% of the work. This increases overall speed while keeping area and propagation delay low.
Future Work
To obtain a more sophisticated architecture, advanced pipelining techniques need to be added. The processor could also be extended to perform floating-point operations and solve differential equations. Apart from this, it can be used in portable gaming kits, smart cards and ATMs.
Nishitha College of Engineering and Technology