

CHAPTER 1


Introduction to VLSI
1.1 Historical perspective:
The electronics industry has achieved phenomenal growth over the last two decades, mainly due to the rapid advances in integration technologies and large-scale systems design - in short, due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications, and consumer electronics has been rising steadily, and at a very fast pace. Typically, the required computational power (or, in other words, the intelligence) of these applications is the driving force for the fast development of this field. The current leading-edge technologies (such as low bit-rate video and cellular communications) already provide the end-users a certain amount of processing power and portability. This trend is expected to continue, with very important implications for VLSI and systems design.

One of the most important characteristics of information services is their increasing need for very high processing power and bandwidth (in order to handle real-time video, for example). The other important characteristic is that the information services tend to become more and more personalized (as opposed to collective services such as broadcasting), which means that the devices must be more intelligent to answer individual demands, and at the same time they must be portable to allow more flexibility and mobility. As more and more complex functions are required in various data processing and telecommunications devices, the need to integrate these functions in a small system or package is also increasing. The level of integration, as measured by the number of logic gates in a monolithic chip, has been steadily rising for almost three decades, mainly due to the rapid progress in processing and interconnect technology.


1.2 Advantages of IC:
The most important message here is that the logic complexity per chip has been (and still is) increasing exponentially. The monolithic integration of a large number of functions on a single chip usually provides:

Less area/volume and therefore compactness
Less power consumption
Less testing requirements at system level
Higher reliability, mainly due to improved on-chip interconnects
Higher speed, due to significantly reduced interconnection length
Significant cost savings

1.3 Levels Of ICs:
Digital circuits are constructed with integrated circuits. An integrated circuit (IC) is a small silicon semiconductor crystal, called a chip, containing the electronic components for the digital gates. The various gates are interconnected inside the chip to form the circuit.


Digital ICs are categorized according to their circuit complexity, as measured by the number of logic gates in a single package:

Small Scale Integration (SSI)
Medium Scale Integration (MSI)
Large Scale Integration (LSI)
Very Large Scale Integration (VLSI)

1.4 Classification of ICs by device count:


Nomenclature | Active Device Count | Technology | Functions / Applications
SSI | 1-100 | Bipolar | Gates, op-amps, many linear applications
MSI | 100-1,000 | Bipolar-like: TTL, ECL | Registers, filters, etc.
LSI | 1,000-10,000 | MOS: NMOS, PMOS | Microprocessors, memories
VLSI | 100,000-1,000,000 | CMOS | Computers, signal processors

Very Large Scale Integration: A microelectronic chip which contains billions of physical components or millions of logical components integrated (embedded) on an IC. The feature size (physical dimension) of the components placed on a VLSI chip is measured in terms of microns. A CMOS IC fabricated with Very Deep Sub-micron (VDSM) technology (0.09 micron ...)

1.5 VLSI Design Flow:


1. Design Specification:
The first step in the high-level design flow is the design specification process. This process involves specifying the behavior expected of the final design. The specification describes the expected function and behavior of the design using textual descriptions and graphic elements.

2. Behavioral Description:
A behavioral description is created to analyze the functionality and the algorithm; once framed, its performance and compliance to standards are verified.

The VLSI design flow is summarized below:

Design Specification -> Behavioral Description -> RTL Description (VHDL) -> Functional Verification & Testing -> Logic Synthesis -> Gate-Level Netlist -> Logic Verification & Testing -> Floor Planning, Automatic Placement & Routing -> Physical Layout

3. RTL Description (VHDL):
Once the algorithm is scrutinized, the code is written keeping in mind its functionality and its ability to be synthesized. The RTL description can be written at the gate, dataflow, or behavioral level. A standard VHDL simulator can be used to read the RTL description and verify the correctness of the design (a small illustrative RTL fragment is given after this list).

4. Functional Verification & Testing:
The VHDL simulator reads the VHDL description, compiles it into an internal format, and then executes the compiled format using test vectors. Any syntax errors found during compilation have to be removed and the design recompiled. After analyzing the results of the simulation, stimulus for the design has to be added. This may be a file of input stimulus or a file of output stimulus; using a waveform editor, the respective output waveforms are observed to test the functionality of the design.

5. Logic Synthesis:
Once the code is validated, VHDL synthesis tools are used to implement the design. The goal of the VHDL synthesis step is to create a design that implements the required functionality and meets the constraints provided. The logic synthesis tool converts the given RTL code into an optimized gate-level netlist.


6. Gate Level:
A gate-level netlist is the description of the design (circuit) in terms of gates and the connections between them. The gate-level netlist is the input to the automatic place and route tool.

7. Logic Verification & Testing:
The VHDL synthesis tool reports syntax and synthesis errors, and gives errors and warnings if it finds mismatches between the RTL simulation results and the output netlist simulation results. If the design is error free, the next step is to map the design.

8. Floor Planning, Automatic Placing and Routing:
Place and route tools are used to take the design netlist and implement the design on the target technology device.

9. Physical Layout:
Each component or primitive from the netlist is placed on the target device according to the design or architecture. The signals from one module to another are also connected to form a physical layout.
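To make step 3 above concrete, the fragment below is a hypothetical sketch (not code from this project) of a small RTL description in VHDL: the same 2-to-1 multiplexer is written once in a dataflow style and once in a behavioral (process-based) style, and a standard VHDL simulator can read either architecture and verify it against test vectors as in steps 3 and 4.

-- Hypothetical RTL description of a 2-to-1 multiplexer, written two ways.
library ieee;
use ieee.std_logic_1164.all;

entity mux2 is
  port (a, b : in  std_logic;
        sel  : in  std_logic;
        y    : out std_logic);
end entity mux2;

-- Dataflow style: a single concurrent signal assignment.
architecture dataflow of mux2 is
begin
  y <= a when sel = '0' else b;
end architecture dataflow;

-- Behavioral style: a process describing the same function.
architecture behavioral of mux2 is
begin
  process (a, b, sel)
  begin
    if sel = '0' then
      y <= a;
    else
      y <= b;
    end if;
  end process;
end architecture behavioral;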

1.6 INTRODUCTION TO VHDL

1.6.1 What is VHDL?


VHDL stands for VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit. As the name implies, VHDL is a language for describing the behavior of digital hardware: another way of describing what outputs of a digital circuit are desired when it is given certain inputs. The critical difference between VHDL and other methods of describing hardware is that it can be readily interpreted by software, enabling the computer to accomplish much of your design work for you.

As the size and complexity of digital systems increase, more computer-aided design tools are introduced into the hardware design process. The early paper-and-pencil design methods have given way to sophisticated design entry, verification, and automatic hardware generation tools. The newest addition to this design methodology is the hardware description language (HDL). Although the concept of HDLs is not new, their widespread use in digital system design is no more than a decade old. Based on HDLs, new digital system CAD tools have been developed and are now being utilized by hardware designers.

1.6.2 VHDL History:
In 1980 the US government began the Very High Speed Integrated Circuit (VHSIC) project to enhance the electronic design process, technology, and procurement, spawning the development of many advanced integrated circuit (IC) process technologies. This was followed by the arrival of the VHSIC Hardware Description Language (VHDL).

1.6.3 Why We Use VHDL?
There are many reasons why it makes good design sense to use VHDL:

1. Portability: Technology changes so quickly in the digital industry that discrete digital devices require constant rework in order to remain current. VHDL is designed to be device-independent, meaning that if you describe your circuit in VHDL, as opposed to designing it with discrete devices, changing hardware becomes a (relatively) trivial process.

2. Flexibility: Most working engineers can recall a situation where they felt frustrated with their customer, supervisor, or team members because the design specification that they were working with was constantly changing. Sometimes these changes can't be helped. Design work is usually focused on creating small, easily maintainable components and then integrating these components into a larger device. On larger projects, different teams of engineers will each design separate parts of the project at the same time. This can mean that if one component in the project changes, all of the components must change, even those being worked on by other engineering teams. Suppose you were told to design a simple counter that set an output bit after it had counted to 100. However, the software engineer working on this project discovered that the entire design could be radically simplified if your counter could count down from 300 instead of up to 100. If you had implemented your design in discrete circuits, you'd have to start over from scratch. But, if you'd designed using VHDL, all you'd have to do is change your code.
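As a minimal sketch of this flexibility argument (the entity and generic names are assumptions made for illustration, not part of the original design), the counter below takes its terminal count as a generic; changing the specification from counting to 100 to counting to 300, or reworking it to count down, means editing a few lines of code rather than rebuilding discrete hardware.

-- Hypothetical counter that asserts "done" once it has counted LIMIT cycles.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity limit_counter is
  generic (LIMIT : natural := 100);            -- assumed terminal count
  port (clk, reset : in  std_logic;
        done       : out std_logic);
end entity limit_counter;

architecture rtl of limit_counter is
  signal count : unsigned(8 downto 0) := (others => '0');  -- wide enough for 300
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count <= (others => '0');
        done  <= '0';
      elsif count = to_unsigned(LIMIT, count'length) then
        done  <= '1';                          -- output bit set at the limit
      else
        count <= count + 1;
      end if;
    end if;
  end process;
end architecture rtl;

Re-targeting the design to the changed specification would then be a matter of instantiating it with, say, generic map (LIMIT => 300) and adjusting the process, rather than starting over from scratch.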

1.6.4 VHDL Features:


General features: VHDL can be used for design documentation, high-level design, simulation, synthesis, and testing of hardware, and as a driver for physical design tools.

1. Concurrency: In VHDL the transfer statements, descriptions of components, and instantiations of gates or logical units can all be executed such that in the end they appear to have been executed simultaneously.

2. Support for design hierarchy: In VHDL the operation of a system can be specified based on its functionality, or it can be specified structurally in terms of its smaller subcomponents.

3. Library support: User-defined and system-defined primitives and descriptions reside in the library system. VHDL provides a mechanism for accessing various libraries, and different designers can access these libraries.

4. Sequential statements: VHDL provides mechanisms for executing sequential statements. These statements provide an easy method for modeling hardware components based on their functionality. Sequential or procedural capability is only for convenience, and the overall structure of the VHDL language remains highly concurrent.

5. Type declaration and usage:


VHDL is not limited to just bit or boolean types; it also supports integer, floating-point, enumerated, and user-defined types. In addition, VHDL allows array-type declarations and composite-type definitions.

6. Use of subprograms: VHDL allows the use of functions and procedures, which can be used in type conversions, logic unit definitions, operator redefinitions, new operation definitions, and other applications.

7. Timing control: VHDL allows the designer to schedule values to signals and delay the actual assignment of values until a later time. It also allows the use of any number of explicitly defined clock signals. It provides features for edge detection, delay specification, setup and hold time specification, pulse width checking, and setting various time constraints.

8. Structural specification: VHDL allows the designer to describe a generic 1-bit design and use it when describing multibit regular structures in one or more dimensions.
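The small, self-contained fragment below (hypothetical names; a sketch rather than anything from this project) touches several of the features listed above in one place: a user-defined enumerated type, a concurrent assignment with an explicit delay, and sequential statements inside a process.

-- Hypothetical fragment illustrating concurrency, a sequential process,
-- a user-defined enumerated type, and a scheduled (delayed) assignment.
library ieee;
use ieee.std_logic_1164.all;

entity feature_demo is
  port (a, b         : in  std_logic;
        y_and        : out std_logic;
        state_is_run : out std_logic);
end entity feature_demo;

architecture mixed of feature_demo is
  type state_t is (idle, run, halt);          -- user-defined enumerated type
  signal state : state_t := idle;
begin
  -- Concurrent statement: value scheduled 2 ns after a or b changes.
  y_and <= (a and b) after 2 ns;

  -- Sequential statements, executed whenever a or b changes.
  process (a, b)
  begin
    if a = '1' and b = '1' then
      state <= run;
    else
      state <= idle;
    end if;
  end process;

  -- Another concurrent statement, evaluated alongside the process.
  state_is_run <= '1' when state = run else '0';
end architecture mixed;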

1.7 Advantages of VHDL:


VHDL offers the following advantages for digital design:

1. Standard: VHDL is an IEEE standard. Just like any standard (such as the X Window graphics standard, bus communication interface standards, high-level programming languages, and so on), it reduces confusion and makes interfaces between tools, companies, and products easier. Any development conforming to the standard has a better chance of lasting longer and less chance of becoming obsolete due to incompatibility with others.

2. Government support: VHDL is a result of the VHSIC program; hence, it is clear that the US government supports the VHDL standard for electronic procurement. The Department of Defense (DoD) requires contractors to supply VHDL for all Application Specific Integrated Circuit (ASIC) designs.


3. Industry support: With the advent of more powerful and efficient VHDL tools has come the growing support of the electronics industry. Companies use VHDL tools not only with regard to defense contracts, but also for their commercial designs.

4. Portability: The same VHDL code can be simulated and used in many design tools and at different stages of the design process. This reduces dependency on a set of design tools whose limited capability may not be competitive in later markets. The VHDL standard also makes transferring design data much easier than a design database of a proprietary design tool.

5. Modeling capability: VHDL was developed to model all levels of designs, from electronic boxes to transistors. VHDL can accommodate behavioral constructs and mathematical routines that describe complex models, such as queuing networks and analog circuits. It allows the use of multiple architectures associated with the same design during various stages of the design process. VHDL can describe everything from low-level transistors up to very large systems.

6. Reusability: Certain common designs can be described, verified, and modified slightly in VHDL for future use. This eliminates reading and marking changes to schematic pages, which is time consuming and subject to error. For example, a parameterized multiplier VHDL code can be reused easily by changing the width parameters so that the same VHDL code can do either 16 by 16 or 12 by 8 multiplication (a sketch of such a parameterized multiplier is given at the end of this section).

7. Technology and foundry independence: The functionality and behavior of the design can be described with VHDL and verified, making it foundry and technology independent. This frees the designer to proceed without having to wait for the foundry and technology to be selected.


8. Documentation: The documentation and the design can be kept in a single place by embedding the documentation in the code. The combining of comments and the code that actually dictates what the design should do reduces the ambiguity between specification and implementation.

9. New design methodology: Using VHDL and synthesis creates a new methodology that increases design productivity, shortens the design cycle, and lowers costs. It amounts to a revolution comparable to that introduced by the automatic semi-custom layout synthesis tools of the last few years. Synthesis, in the domain of digital design, is a process of translation and optimization. For example, layout synthesis is a process of taking a design netlist and translating it into a form of data that facilitates placement and routing, resulting in optimized timing and/or chip size. Logic synthesis, on the other hand, is the process of taking a form of input (VHDL), translating it into an intermediate form (Boolean equations, specific to the synthesis tool), and then optimizing in terms of propagation delay and/or area. After the VHDL code is translated into an internal form, the optimization process can be performed based on constraints such as speed, area, and power.
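The reusability point above (item 6) can be made concrete with a short sketch; the entity and generic names here are assumptions for illustration, not code from the text. The operand widths are generics, so the same description serves 16 by 16, 12 by 8, or any other multiplication simply by changing the generic values at instantiation.

-- Hypothetical parameterized multiplier: operand widths are generics.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity param_mult is
  generic (A_WIDTH : positive := 16;
           B_WIDTH : positive := 16);
  port (a : in  unsigned(A_WIDTH - 1 downto 0);
        b : in  unsigned(B_WIDTH - 1 downto 0);
        p : out unsigned(A_WIDTH + B_WIDTH - 1 downto 0));
end entity param_mult;

architecture rtl of param_mult is
begin
  p <= a * b;   -- numeric_std "*" yields a'length + b'length result bits
end architecture rtl;

Instantiating it with generic map (A_WIDTH => 12, B_WIDTH => 8) gives a 12 by 8 multiplier without touching the body of the code.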


CHAPTER 2

INTRODUCTION TO LOSSLESS COMPRESSION


2.1. Objective
With the increase in silicon densities, it is becoming feasible for multiple compression systems to be implemented in parallel on a single chip. A 32-bit system with a distributed memory architecture is based on having multiple data compression and decompression engines working independently on different data at the same time. This data is stored in memory distributed to each processor. The objective of the project is to design a lossless parallel data compression system which operates at high speed to achieve a high compression rate. By using a parallel architecture of compressors, the data compression rates are significantly improved, and the inherent scalability of the parallel architecture becomes possible. The main parts of the system are the two XMatchPro-based data compressors operating in parallel and the control blocks providing control signals for the data compressors, allowing appropriate control of the routing of data into and from the system. Each data compressor can process four bytes of data into and from a block of data every clock cycle. The data entering the system needs to be clocked in at a rate of 4n bytes every clock cycle, where n is the number of compressors in the system. This ensures that adequate data is present for all compressors to process rather than being in an idle state.

2.2. Goal of the Thesis
To achieve higher compression rates using a 32-bit compression/decompression architecture with the least increase in latency.

2.3. LITERATURE SURVEY

2.3.1. Compression Techniques


At present there is an insatiable demand for ever-greater bandwidth in communication networks and for ever-greater storage capacity in computer systems. This has led to the need for an efficient compression technique. Compression is the process that is required either to reduce the volume of information to be transmitted (text, fax, and images) or to reduce the bandwidth that is required for its transmission (speech, audio, and video). The compression technique is first applied to the source information prior to its transmission. Compression algorithms can be classified into two types, namely:

Lossless Compression
Lossy Compression

2.3.1.1. Lossless Compression
In a lossless compression algorithm, the aim is to reduce the amount of source information to be transmitted in such a way that, when the compressed information is decompressed, there is no loss of information. Lossless compression is therefore said to be reversible; i.e., data is not altered or lost in the process of compression or decompression. Decompression generates an exact replica of the original object. The various lossless compression techniques are:

Packbits encoding
CCITT Group 3 1D
CCITT Group 3 2D
Lempel-Ziv and Welch algorithm (LZW)
Huffman
Arithmetic

Example applications of lossless compression are transferring data over a network as a text file (since, in such applications, it is normally imperative that no part of the source information is lost during either the compression or decompression operations), file storage systems (tapes, hard disk drives, solid state storage, file servers), and communication networks (LAN, WAN, wireless).

2.3.1.2. Lossy Compression
The aim of lossy compression algorithms is normally not to reproduce an exact copy of the source information after decompression but rather a version of it that is perceived by the recipient as a true copy. The lossy compression algorithms are:

JPEG (Joint Photographic Experts Group)
MPEG (Moving Picture Experts Group)
CCITT H.261 (Px64)

Example applications of lossy compression are the transfer of digitized images and audio and video streams. In such cases, the sensitivity of the human eye or ear is such that any fine details that may be missing from the original source signal after decompression are not detectable.

2.3.1.3. Text Compression
There are three different types of text (unformatted, formatted, and hypertext), and all are represented as strings of characters selected from a defined set. The compression algorithm associated with text must be lossless, since the loss of just a single character could modify the meaning of a complete string. Text compression is therefore restricted to the use of entropy encoding and, in practice, statistical encoding methods. There are two types of statistical encoding methods which are used with text: one which uses single characters as the basis of deriving an optimum set of code words, and the other which uses variable-length strings of characters. Two examples of the former are the Huffman and Arithmetic coding algorithms, and an example of the latter is the Lempel-Ziv (LZ) algorithm. The majority of work on hardware approaches to lossless parallel data compression has used an adapted form of the dictionary-based Lempel-Ziv algorithm, in which a large number of simple processing elements are arranged in a systolic array [1], [2], [3], [4].

2.3.2. Previous work on Lossless Compression Methods


A second Lempel-Ziv method used a content addressable memory (CAM) capable of performing a complete dictionary search in one clock cycle [5], [6], [7]. The search for the most common string in the dictionary (normally the most computationally expensive operation in the Lempel-Ziv algorithm) can be performed by the CAM in a single clock cycle, while the systolic array method uses a much slower deep pipelining technique to implement its dictionary search. However, compared to the CAM solution, the systolic array method has advantages in terms of reduced hardware costs and lower power consumption, which may be more important criteria in some situations than having faster dictionary searching.

In [8], the authors show that hardware main memory data compression is both feasible and worthwhile. The authors also describe the design and implementation of a novel compression method, the XMatchPro algorithm, and exhibit the substantial impact such memory compression has on overall system performance. The adaptation of compression code for parallel implementation is investigated by Jiang and Jones [9]. They recommended the use of a processing array arranged in a tree-like structure. Although compression can be implemented in this manner, the implementation of the decompressor's search and decode stages in parallel hardware would greatly increase the complexity of the design, and it is likely that these aspects would need to be implemented sequentially. An FPGA implementation of a parallel binary arithmetic coding architecture that is able to process 8 bits per clock cycle, compared to the standard 1 bit per cycle, is described by Stefo et al. [10]. Although little research has been performed on architectures involving several independent compression units working in a concurrent cooperative manner, IBM has introduced the MXT chip [11], which has four independent compression engines operating on a shared memory area. The four Lempel-Ziv compression engines are used to provide data throughput sufficient for memory compression in computer servers. Adaptation of software compression algorithms to make use of multiple CPU systems was demonstrated by the research of Penhorn [12] and Simpson and Sabharwal [13]. Penhorn used two CPUs to compress data using a technique based on the Lempel-Ziv algorithm and showed that useful compression rate improvements can be achieved, but only at the cost of increasing the learning time for the dictionary. Simpson and Sabharwal described the software implementation of a compression system for a multiprocessor system based on the parallel architecture developed by Gonzalez-Smith and Storer [14].

2.3.2.1. Statistical Methods
Statistical modeling of a lossless data compression system is based on assigning values to events depending on their probability: the higher the value, the higher the probability. The accuracy with which this frequency distribution reflects reality determines the efficiency of the model. In Markov modeling, predictions are made based on the symbols that precede the current symbol. Statistical methods in hardware are restricted either to simple high-order modeling using binary alphabets, which limits speed, or to simple multisymbol alphabets using zeroth-order models, which limits compression. Binary alphabets limit speed because only a few bits (typically a single bit) are processed in each cycle, while zeroth-order models limit compression because they can only provide an inexact representation of the statistical properties of the data source.

2.3.2.2. Dictionary Methods


Dictionary methods try to replace a symbol or group of symbols by a dictionary location code. Some dictionary-based techniques use simple uniform binary codes to process the information supplied. Both software- and hardware-based dictionary models achieve good throughput and competitive compression. The UNIX utility compress uses the Lempel-Ziv-2 (LZ2) algorithm, and the data compression Lempel-Ziv (DCLZ) family of compressors, initially invented by Hewlett-Packard [16] and currently being developed by AHA [17], [18], also use LZ2 derivatives. Bunton and Borriello present another LZ2 implementation in [19] that improves on the Data Compression Lempel-Ziv method. It uses a tag attached to each dictionary location to identify which node should be eliminated once the dictionary becomes full.

2.4. XMatchPro Based System


The lossless data compression system is a derivative of the XMatchPro algorithm, which originates from previous research of the authors [15] and advances in FPGA technology. The flexibility provided by using this technology is of great interest, since the chip can be adapted to the requirements of a particular application easily. The drawbacks of some of the previous methods are overcome by using the XMatchPro algorithm in the design. The objective is then to obtain better compression ratios and still maintain a high throughput so that the compression/decompression processes do not slow the original system down.


CHAPTER 3

FUNCTIONS OF LOSSLESS COMPRESSION

3.1. BASICS OF COMMUNICATION
A sender can compress data before transmitting it and a receiver can decompress the data after receiving it, thus effectively increasing the data rate of the communication channel. Lossless data compression is the process of encoding a body of data into a smaller body of data that can, at a later time, be uniquely decoded back to the original data.

Lossless compression removes redundant information from the data while it is being transmitted or before it is stored in memory, and lossless decompression reintroduces the redundant information to fully recover the original data. In the same way, data is compressed before it is stored and decompressed when it is retrieved, thus increasing the effective capacity of the storage device.

3.2. Proposed Method



In [1], the author discusses a parallel algorithm that can be implemented for high-speed data compression. The authors give the basic idea of how data compression is carried out using the Lempel-Ziv algorithm and how it could be altered to parallelise the algorithm. The author describes the Lempel-Ziv algorithm as a very efficient universal data compression technique, based upon an incremental parsing technique, which maintains codebooks of parsed phrases at the transmitter and at the receiver. An important feature of the algorithm is that it is not necessary to determine a model of the source which generates the data. According to the author, in an attempt to increase the speed of the algorithm on general-purpose processors, the algorithm has been parallelised to run on two processors.

3.3. Background
The author explains a novel architecture for a high-performance lossless data compressor that is organized around a selectively shiftable content addressable memory, which permits full matching; the processor offers very high performance with good compression of computer-based data. The author also gives details about the operation, architecture and performance of the data compression techniques, and introduces the XMatchPro lossless data compressor. In [3], the authors discuss parallelism in data compression techniques and explain a parallel architecture for high-speed data compression. In this paper, the author presents data compression as an essential component of high-speed data communication and storage. In [4], the authors discuss the various methods of data compression, their techniques and drawbacks, and propose a new methodology for high-speed parallel lossless data compression. The authors describe the research and hardware implementation of a high-performance parallel multi-compressor chip which is able to meet the intensive data processing demands of highly concurrent systems. The authors also investigate the performance of alternative input and output routing strategies; results for realistic data sets demonstrate that the design of parallel compression devices involves important trade-offs that affect compression performance, latency and throughput. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.


3.4. Usage of XMatchPro Algorithm


The lossless parallel data compression system designed here uses the XMatchPro algorithm. The XMatchPro algorithm uses a fixed-width dictionary of previously seen data and attempts to match the current data element with a match in the dictionary. It works by taking a 4-byte word and trying to match or partially match this word with past data. This past data is stored in a dictionary, which is constructed from a content addressable memory. As each entry is 4 bytes wide, several types of matches are possible. If the bytes do not match with any data present in the dictionary, they are transmitted with an additional miss bit. If all the bytes are matched, then the match location and match type are coded and transmitted, and this match is then moved to the front of the dictionary. The dictionary is maintained using a move-to-front strategy, whereby a new tuple is placed at the front of the dictionary while the rest move down one position. When the dictionary becomes full, the tuple in the last position is discarded, leaving space for a new one. The coding function for a match is required to code several fields as follows - a zero followed by:

1) Match location: the binary code associated with the matching location.
2) Match type: indicates which bytes of the incoming tuple have matched.
3) Characters that did not match, transmitted in literal form.

A description of the XMatchPro algorithm in pseudo-code is given in the figure below.

clear the dictionary;
set the next free location (NFL) to 0;
DO {
    read in a tuple T from the data stream;
    search the dictionary for tuple T;
    IF (full or partial hit) {
        determine the best match location ML and match type MT;
        output 0;
        output any required literal characters of T;
    } ELSE {
        output 1;
        output tuple T;
    }
    IF (full hit) {
        move dictionary entries 0 to ML - 1 down by one location;
    } ELSE {
        move all dictionary entries down by one location;
        increment NFL (if dictionary is not full);
    }
    copy tuple T to dictionary location 0;
} WHILE (more data is to be compressed);

Fig.3.2. Pseudo Code for XMatchPro Algorithm

With the increase in silicon densities, it is becoming feasible for multiple XMatchPros to be implemented in parallel on a single chip. A parallel system with distributed memory architecture is based on having multiple data compression and decompression engines working independently on different data at the same time. This data is stored in memory distributed to each processor. There are several approaches in which data can be routed to and from the compressors that will affect the speed, compression and complexity of the system. Lossless compression removes redundant information from the data while it is transmitted or before it is stored in memory. Lossless decompression reintroduces the redundant information to fully recover the original data. There are two important contributions made by the current parallel compression and decompression work, namely improved compression rates and inherent scalability. Significant improvements in data compression rates have been achieved by sharing the computational requirement between compressors without significantly compromising the contribution made by individual compressors. The scalability feature permits future bandwidth or storage demands to be met by adding additional compression engines.

3.4.1. The XMatchPro based Compression system


Previous research on the lossless XMatchPro data compressor has been on optimising and implementing the XMatchPro algorithm for speed, complexity and compression in hardware. The XMatchPro algorithm uses a fixed-width dictionary of previously seen data and attempts to match the current data element with a match in the dictionary. It works by taking a 4-byte word and trying to match this word with past data. This past data is stored in a dictionary, which is constructed from a content addressable memory. Initially all the entries in the dictionary are empty, and 4 bytes are added to the front of the dictionary, while the rest move one position down, if a full match has not occurred. The larger the dictionary, the greater the number of address bits needed to identify each memory location, reducing compression performance. Since the number of bits needed to code each location address is a function of the current dictionary size, greater compression is obtained in comparison to the case where a fixed-size dictionary uses fixed address codes for a partially full dictionary. In the parallel XMatchPro system, the data stream to be compressed enters the compression system, which is then partitioned and routed to the compressors. For parallel compression systems, it is important to ensure all compressors are supplied with sufficient data by managing the supply so that neither stall conditions nor data overflow occurs.
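As a rough worked example of this effect (assuming a simple phased-in binary code for the match location, which is an illustration rather than the exact XMatchPro coding), a dictionary currently holding $n$ valid entries needs about

$\lceil \log_2 n \rceil$ bits

per match location. Early in a block, when only 10 tuples have been seen, a location costs $\lceil \log_2 10 \rceil = 4$ bits, whereas a fixed address field for a 64-entry dictionary would always cost 6 bits.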

3.4.2. The Main Component - Content Addressable Memory


Dictionary-based schemes copy repetitive or redundant data into a lookup table (such as a CAM) and output the dictionary address as a code to replace the data. The compression architecture is based around a block of CAM to realize the dictionary. This is necessary since the search operation must be done in parallel in all the entries in the dictionary to allow high and data-independent throughput.

Fig.3.3. Conceptual view of CAM

The number of bits in a CAM word is usually large, with existing implementations ranging from 36 to 144 bits. A typical CAM employs a table size ranging between a few hundred entries and 32K entries, corresponding to an address space ranging from 7 bits to 15 bits. The length of the CAM varies with three possible values of 16, 32 or 64 tuples, trading complexity for compression. The number of tuples present in the dictionary has an important effect on compression. In principle, the larger the dictionary, the higher the probability of having a match and improving compression. On the other hand, a bigger dictionary uses more bits to code its locations, degrading compression when processing small data blocks that only use a fraction of the dictionary length available. The width of the CAM is fixed at 4 bytes/word.

A content addressable memory (CAM) compares input search data against a table of stored data, and returns the address of the matching data. CAMs have a single clock cycle throughput, making them faster than other hardware- and software-based search systems. The input to the system is the search word, which is broadcast onto the searchlines to the table of stored data. Each stored word has a matchline that indicates whether the search word and stored word are identical (the match case) or are different (a mismatch case, or miss). The matchlines are fed to an encoder that generates a binary match location corresponding to the matchline that is in the match state. An encoder is used in systems where only a single match is expected. The overall function of a CAM is to take a search word and return the matching memory location.

3.4.2.1. Managing Dictionary entries
Since the initialization of a compression CAM sets all words to zero, a possible input word formed by zeros will generate multiple full matches in different locations. The XMatchPro compression system simply selects the full match closest to the top. This operational mode initializes the dictionary to a state where all the words with location address bigger than zero are declared invalid, without the need for extra logic. The reason is that location x can never generate a match until the data contents of location x-1 are different from zero, because locations closer to the top have higher priority in generating matches. Also, to increase dictionary efficiency, only one dictionary position contains repeated information and, in the best case, all the dictionary positions contain different data.
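To make the description above concrete, the following is a behavioral sketch of such a dictionary CAM (widths, names, and the omission of the move-to-front shifting are assumptions made for illustration; this is not the project's actual code). Every stored word is compared with the search word in the same cycle, and the lowest matching location, the one closest to the top, wins, matching the priority rule just described.

-- Hypothetical behavioral sketch of a dictionary CAM with top-location priority.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cam_dictionary is
  generic (DEPTH     : positive := 64;     -- number of 4-byte tuples
           ADDR_BITS : positive := 6);     -- log2(DEPTH)
  port (clk         : in  std_logic;
        write_en    : in  std_logic;       -- store search_word at write_addr
        write_addr  : in  unsigned(ADDR_BITS - 1 downto 0);
        search_word : in  std_logic_vector(31 downto 0);
        match_hit   : out std_logic;
        match_addr  : out unsigned(ADDR_BITS - 1 downto 0));
end entity cam_dictionary;

architecture behavioral of cam_dictionary is
  type dict_t is array (0 to DEPTH - 1) of std_logic_vector(31 downto 0);
  signal dict : dict_t := (others => (others => '0'));
begin
  -- Dictionary update (the move-to-front shifting is omitted for brevity).
  process (clk)
  begin
    if rising_edge(clk) then
      if write_en = '1' then
        dict(to_integer(write_addr)) <= search_word;
      end if;
    end if;
  end process;

  -- Parallel search; the lowest matching index is assigned last, so it wins.
  process (search_word, dict)
  begin
    match_hit  <= '0';
    match_addr <= (others => '0');
    for i in DEPTH - 1 downto 0 loop
      if dict(i) = search_word then
        match_hit  <= '1';
        match_addr <= to_unsigned(i, ADDR_BITS);
      end if;
    end loop;
  end process;
end architecture behavioral;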


CHAPTER 4

XMATCHPRO LOSSLESS COMPRESSION SYSTEM


4.1. DESIGN METHODOLOGY
The XMatchPro algorithm is efficient at compressing the small blocks of data necessary with cache- and page-based memory hierarchies found in computer systems, and it is suitable for high-performance hardware implementation. The XMatchPro hardware achieves a throughput 2-3 times greater than other high-performance hardware implementations. The core component of the system is the XMatchPro-based compression/decompression engine. XMatchPro is a high-speed lossless dictionary-based data compressor. The XMatchPro algorithm works by taking an incoming four-byte tuple of data and attempting to fully or partially match the tuple with past data.

4.2. FUNCTIONAL DESCRIPTION


The XMatchPro algorithm maintains a dictionary of data previously seen and attempts to match the current data element with an entry in the dictionary, replacing it with a shorter code referencing the match location. Data elements that do not produce a match are transmitted in full (literally), prefixed by a single bit. Each data element is exactly 4 bytes in width and is referred to as a tuple. This feature gives a guaranteed input data rate during compression and thus also guaranteed data rates during decompression, irrespective of the data mix. Also, the 4-byte tuple size gives an inherently higher throughput than other algorithms, which tend to operate on a byte stream.

The dictionary is maintained using a move-to-front strategy, whereby the current tuple is placed at the front of the dictionary and the other tuples move down by one location as necessary to make space. The move-to-front strategy aims to exploit locality in the input data. If the dictionary becomes full, the tuple occupying the last location is simply discarded. A full match occurs when all characters in the incoming tuple fully match a dictionary entry. A partial match occurs when at least any two of the characters in the incoming tuple match exactly with a dictionary entry, with the characters that do not match being transmitted literally. The use of partial matching improves the compression ratio when compared with allowing only 4-byte matches, but still maintains high throughput. If neither a full nor partial match occurs, then a miss is registered and a single miss bit of 1 is transmitted, followed by the tuple itself in literal form. The only exception to this is the first tuple in any compression operation, which will always generate a miss as the dictionary begins in an empty state. In this case no miss bit is required to prefix the tuple. At the beginning of each compression operation, the dictionary size is reset to zero. The dictionary then grows by one location for each incoming tuple being placed at the front of the dictionary, with all other entries in the dictionary moving down by one location. A full match does not grow the dictionary, but the move-to-front rule is still applied. This growth of the dictionary means that code words are short during the early stages of compressing a block. Because the XMatchPro algorithm allows partial matches, a decision must be made about which of the locations provides the best overall match, with the selection criterion being the shortest possible number of output bits.

4.3 Parallel XMatchPro Compression


The input router of the system divides the data to be processed, and the output router concatenates the data to form the output of the parallel compression system. The data split by the input router is sent to each of the XMatchPro compression engines, where the data is compressed and then sent to the output router, which merges the compressed data and sends it out as the compressed output.


For multiple compression systems, it is important to ensure all compressors are supplied with sufficient data by managing the supply so that neither stall conditions nor data overflow occurs. There are several approaches in which data can be routed in and out of the compressors. The basic method for input routing used in this project is to accept an input twice the width of a single XMatchPro compressor: the lower 32 bits are given to Compressor 0 and the higher 32 bits are given to Compressor 1. A similar method is used for output routing, with additional output pins assigned to both Compressor 0 and Compressor 1.
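A minimal sketch of this splitting is shown below; the signal and entity names are illustrative assumptions, not the project's actual code.

-- Hypothetical input-router fragment: the 64-bit input word is split so that
-- the lower 32 bits feed Compressor 0 and the upper 32 bits feed Compressor 1.
library ieee;
use ieee.std_logic_1164.all;

entity input_router is
  port (data_in  : in  std_logic_vector(63 downto 0);
        to_comp0 : out std_logic_vector(31 downto 0);
        to_comp1 : out std_logic_vector(31 downto 0));
end entity input_router;

architecture rtl of input_router is
begin
  to_comp0 <= data_in(31 downto 0);    -- lower-order word to Compressor 0
  to_comp1 <= data_in(63 downto 32);   -- higher-order word to Compressor 1
end architecture rtl;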

4.4. DATA FLOW FOR PARALLEL XMATCHPRO COMPRESSOR


The figure below shows graphically the general concept of this approach. The data stream to be compressed enters the compression system, where it is partitioned and routed to the compressors. Appropriate methods for routing the data are discussed below, but to achieve good compression performance, it is important that the partitioning mechanism supplies the compressors with sufficient data to keep them active for as great a proportion of the time that the stream is entering the system as possible. As the compressors operate independently, each producing its own compressed data stream, a mechanism is required to merge these streams in such a way that subsequent decompression can be performed correctly. Also, subsequent decompression needs to be capable of operating in an appropriate parallel fashion; otherwise, a disparity in compression and decompression speeds will occur, reducing overall throughput. The data flow for the parallel compression system is given in the figure below.

4.5. INPUT ROUTING


As per the algorithm, each XMatchPro engine can process four bytes of data per clock cycle, so to ensure that all engines are busy, data must enter the system at a rate of 4n bytes per clock cycle, where n is the number of compressors in the system. This can be achieved by two methods:
1. Interleaved input method
2. Blocked input method

4.5.1 INTERLEAVED INPUT METHOD
In the interleaved input approach, the router divides the input data into 4-byte wide data streams that are fed into the compressors. This is illustrated in the figure below for two compressors, but the technique can be extended to supply data to any required number of compressors.


(Figure: incoming tuples 1-8 are routed alternately by the input router to the two XMatchPro compressors.)

Fig.4.3. Interleaved Input Routing

The interleaved method avoids the need for input buffering, as data are continuously fed to the compressors and acted upon immediately on arrival. This minimization of latency is an important advantage of the approach.

4.5.2. BLOCKED INPUT METHOD
In the blocked input approach, a fixed-length block of data is sent from the incoming data stream to each of the compressors in turn, as shown in the following figure. In this scheme, the data has to arrive at the dedicated memory of the compressor at a rate slower than it can be processed, thereby allowing the memory to be filled with data.


To minimize the latency introduced in blocked mode, compressors need to start processing data as it arrives. It is also important to ensure that sufficient data are available for the compressor to work on while data are being routed to the other compressors, as no new data can be added to the dedicated memory until this process has been completed.

4.5.3. PROPOSED INPUT ROUTING
In this project, the blocked input routing method is used for inputting data to the compression system, as it is more advantageous than the interleaved input approach. The advantage of this method is that the complexity in designing and coding is reduced, and it helps in achieving a superior compression ratio. At the same time, the number of input pins increases, as another set of pins is assigned for the second XMatchPro compressor. The input data size for one XMatchPro compressor is 32 bits, so another 32 bits are required for the second XMatchPro compressor. To achieve this, the parallel compressor is designed with a 64-bit input; the lower-order 32 bits are sent to one XMatchPro compressor and the higher-order 32 bits are sent to the second XMatchPro compressor. By doing so, both XMatchPro compressors are supplied with data simultaneously, and this increases the speed of compression.

4.6. OUTPUT ROUTING
The lengths of the compressed data output blocks from an array of parallel compressors will generally not be constant, due to the variability of redundancy in the data. As the decompression system would not know the data boundaries of each block, these data cannot be sent directly to the output bus, and additional manipulation is needed in order to guarantee that the original data can be recovered. This is achieved by three methods, namely:
1. Single Compressed Block
2. Multiple Compressed Block
3. Interleaved Compressed Block

4.6.1. SINGLE COMPRESSED BLOCK
In this method, it is assumed that the data enters the system using the blocked mode technique and that the compressed data are collected in the compressors' output buffers. The buffer outputs are routed in strict order of the compressor number, and a boundary tag that contains information on the block length is added so as to precede the data. As the tag enters the decompression system first, it will know the length of the compressed data input belonging to any given decompression engine. The introduction of tags is detrimental to the compression ratio, but this effect diminishes as the block length is increased, as the overhead of one tag per block of compressed data is largely constant.

One of the drawbacks of this approach is that the data output may contain idle time. This arises since a whole block of data needs to be compressed before the appropriate tag values can be determined and, so, a compressor may still be compressing its data when the router becomes available.

4.6.2. MULTIPLE COMPRESSED BLOCK
The figure illustrates the format of an output data stream containing multiple blocks. This is similar to the single block scheme, but, instead of waiting for each compressor to finish processing its block of data, all compressors need to finish compressing their blocks before the data are sent. In this technique, the tag provides information on the length of the compressed data to enable correct decompression. As all compressors need to have completed their operations before an output can be produced, this approach has a greater latency compared with the single compressed block case, but, as fewer tags are needed, the effect on the compression ratio is reduced. The combined tag is shorter than the sum of the individual tags, as the output bus granularity is of fixed width. Output tags are sized in accordance with the output bus width in order to simplify the routing architecture and decoding operations, even though fewer bits are required to determine block size boundaries.

4.6.3. INTERLEAVED COMPRESSED BLOCK
The figure illustrates the interleaved approach for routing multiple compressed blocks of data to the output stream. Instead of waiting for a whole block to be compressed, a predefined fixed length of compressed data is always sent to the output. If a compressor has not completed its operations, the system must wait until the data block has been produced.

There are two benefits of this approach compared with the previously discussed methods. First, there is a reduction in latency, since data can be sent to the output before the whole block is compressed. Second, since no boundary tags are required, the compression ratio is improved. At the end of a compression sequence, the interleaved approach needs to add dummy tags to the output stream: on receipt of the stop signal, output routing continues until all compressors have completed operations on their input blocks. It is likely that the final interleaved block from each compressor will contain insufficient data to fill the required fixed output length and, so, dummy data tags are added as required in order to maintain the interleave length.

4.6.4. PROPOSED OUTPUT ROUTING
In this project, the interleaved technique was selected as the output routing method, as it imparts no overhead to maintain compressed data boundaries and so has no detrimental effect on the compression ratio. The advantage of this method is that the complexity in designing and coding is reduced. At the same time, the number of output pins increases, as another set of pins is assigned for the second XMatchPro compressor. The output data size for one 32-bit compressor is either 7 bits (match found) or 33 bits (match not found), so another set of 33 bits (in case of no match) and 7 bits (in case of a match) is required for the second compressor. To achieve this, the parallel compressor is designed with two sets of 7-bit as well as two 33-bit output pins. By doing so, both compressors deliver data simultaneously, and the output data is transmitted via the external bus.

4.7. IMPLEMENTATION OF XMATCHPRO BASED COMPRESSOR
The block diagram gives the details of the components of a single 32-bit compressor. There are three components, namely COMPARATOR, ARRAY and CAM COMPARATOR. The comparator is used to compare two 32-bit data words and to set or reset the output bit as 1 for equal and 0 for unequal. The CAM comparator searches the CAM dictionary entries for a full match of the input data given.

The reason for choosing a full match is to obtain a prototype of the high-throughput XMatchPro compressor with reduced complexity and high performance. If a full match occurs, the match-hit signal is generated and the corresponding match location is given as output by the CAM comparator. If no full match occurs, the data that was given as input at that time is given as output. The array is 64 x 32-bit locations long. It is used to store the unmatched incoming data, and when new data arrives, the incoming data is compared with all the data stored in this array. If a match occurs, the corresponding match location is sent as output; otherwise the incoming data is stored in the next free location of the array and is sent as output. The last component is the CAM comparator, which is used to send the match location of the CAM dictionary as output if a match has occurred. This is done by getting match information as input from the comparator. If the output of the comparator goes high for any input, a match is found and the corresponding address is retrieved and sent as output along with one bit to indicate that a match has been found. If no match occurs, that is, no matched data is found, the incoming data is stored in the array and is sent as the output. These are the functions of the three components of the compressor (a small VHDL sketch of the comparator is given at the end of this section).

The hardware descriptions of these modules are written in the VHDL language. VHDL is an acronym for Very High Speed Integrated Circuit Hardware Description Language. It can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level. The VHDL language can be regarded as an integrated amalgamation of the following languages:

Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language

So the language has constructs that enable you to express the concurrent or sequential behavior of a digital system with or without timing. It also allows modeling the system as an inter-connection of components. Test waveforms can also be generated using the same constructs. The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. VHDL is event driven, to allow for efficient simulation of hardware. Computations are only performed when some data has changed (event occurred).
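As a concrete illustration of the simplest of the three components described in Section 4.7 (a hypothetical sketch with assumed names, not the project's code), a 32-bit equality comparator can be written in a few lines of VHDL; the ARRAY and CAM comparator are built around the same comparison.

-- Hypothetical 32-bit equality comparator: '1' when the inputs are equal,
-- '0' otherwise, as described for the COMPARATOR block.
library ieee;
use ieee.std_logic_1164.all;

entity comparator32 is
  port (a, b  : in  std_logic_vector(31 downto 0);
        equal : out std_logic);
end entity comparator32;

architecture rtl of comparator32 is
begin
  equal <= '1' when a = b else '0';
end architecture rtl;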

CHAPTER 5

DESIGN OF PARALLEL LOSSLESS COMPRESSION/DECOMPRESSION SYSTEM


5.1. DESIGN OF COMPRESSOR / DECOMPRESSOR

The block diagram [Fig. 5.1] gives the details of the components of a single 32-bit compressor / decompressor. The same design approach is used for the 64-bit compression/decompression system, which is essentially used to compare the increased compression rates given by the 64-bit lossless parallel high-speed data compression system. There are three components, namely the COMPRESSOR, the DECOMPRESSOR and the CONTROL.

The compressor has the following components: COMPARATOR, ARRAY and CAM COMPARATOR. The comparator is used to compare two 32-bit data words and to set or reset the output bit as 1 for equal and 0 for unequal. The array holds 64 locations of 32 bits each. It is used to store the unmatched incoming data; when the next new data word arrives, it is compared with all the data stored in this array. If the incoming data matches any of the data stored in the array, the comparator generates a match signal and sends it to the CAM comparator. The last component is the CAM comparator, which feeds the incoming data and all the stored array data one by one to the comparator. If the output of the comparator goes high for any entry, the match is found and the corresponding address (match location) is retrieved and sent as output along with one bit to indicate that the match is found. If no match is found, the incoming data stored in the array is sent as output. These are the functions of the three components of the XMatchPro based compressor.

The decompressor has the following components: ARRAY and PROCESSING UNIT. The array has the same function as the array unit used in the compressor, and it is of the same length. The processing unit checks the incoming match-hit bit: if it is 0, the data is not present in the array, so the processing unit stores the data in the array; if the match-hit bit is 1, the data is present in the array, so the processing unit retrieves the data from the array with the help of the address input and sends it to the data output.

Fig.5.1. Block Diagram of 32 bit Compressor/Decompressor
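A minimal behavioural sketch of the decompressor's processing unit, capturing only the decision described above, is shown below. The decompressor source is not reproduced in the appendix, so the entity name (de_processing) and its port names are illustrative assumptions rather than the project's actual code.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity de_processing is
    Port ( clk       : in  STD_LOGIC;
           reset     : in  STD_LOGIC;
           match_hit : in  STD_LOGIC;                       -- 1 = a match location was received
           addr_in   : in  STD_LOGIC_VECTOR (5 downto 0);   -- match location from the compressed stream
           datain    : in  STD_LOGIC_VECTOR (31 downto 0);  -- literal data when match_hit = 0
           dataout   : out STD_LOGIC_VECTOR (31 downto 0));
end de_processing;

architecture behav of de_processing is
    type ram_array is array (0 to 63) of std_logic_vector(31 downto 0);
    signal data    : ram_array;                   -- same 64 x 32-bit array as in the compressor
    signal wr_addr : integer range 0 to 63 := 0;  -- next free location
begin
    process(clk, reset)
    begin
        if reset = '1' then
            data    <= (others => (others => '0'));
            wr_addr <= 0;
        elsif clk'event and clk = '1' then
            if match_hit = '0' then
                -- Data not yet in the array: store the literal and forward it.
                data(wr_addr) <= datain;
                dataout       <= datain;
                if wr_addr = 63 then
                    wr_addr <= 0;
                else
                    wr_addr <= wr_addr + 1;
                end if;
            else
                -- Data already in the array: read it back from the match location.
                dataout <= data(conv_integer(addr_in));
            end if;
        end if;
    end process;
end behav;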

The CONTROL block has an input bit called C/D (Compression / Decompression) which indicates whether compression or decompression has to be done. If it has the value 0, the compressor is started; when the value is 1, decompression is done.
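A small sketch of this control decision is given below, assuming a single select input named c_d and 32-bit data paths; the entity and signal names are illustrative and may differ from the project's actual CONTROL block.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity control is
    Port ( c_d          : in  STD_LOGIC;                       -- 0 = compress, 1 = decompress
           comp_out     : in  STD_LOGIC_VECTOR (31 downto 0);  -- output of the compressor
           decomp_out   : in  STD_LOGIC_VECTOR (31 downto 0);  -- output of the decompressor
           start_comp   : out STD_LOGIC;
           start_decomp : out STD_LOGIC;
           dataout      : out STD_LOGIC_VECTOR (31 downto 0));
end control;

architecture behav of control is
begin
    -- c_d selects the data path: 0 routes the compressor output, 1 the decompressor output.
    dataout <= comp_out when c_d = '0' else decomp_out;
    -- Illustrative enable indicators for the two engines (polarity is an assumption;
    -- the start inputs of the appendix entities are treated as active-low there).
    start_comp   <= not c_d;
    start_decomp <= c_d;
end behav;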

5.2. DESIGN OF 64 BIT SINGLE COMPRESSION/DECOMPRESSION SYSTEM


The 64-bit single compression/decompression system is built to compare its compression rate and area with those of the parallel compression/decompression system, which gives higher throughput. The design and functionality of the 64-bit single compression system are the same as those of the 32-bit compressor/decompressor discussed above, except that the input is widened from 32 bits to 64 bits; hence, to accommodate more data in the CAM dictionary, the array size is increased from 64 x 32 to 128 x 64. The match location is now given by 7 bits for the fixed 128 locations of the memory (2^7 = 128). In the compression system, the comparator compares the incoming 64-bit data with the data entries previously stored in the memory. If any of the dictionary entries matches the incoming data, a match signal is generated and the corresponding match location is provided as output along with the match signal. If no match occurs, the incoming data is stored in a dictionary entry and the data is given as the output of the compressor. The decompression system therefore receives either the 64-bit data, if no match occurred, or the 1-bit match signal and the 7-bit match location, which are processed by the 128 x 64 array in the decompressor to give the data at the match location as output. The block diagram of the 64-bit compression / decompression system is given below.

Fig.5.2. Block Diagram of 64 bit Compression / Decompression system
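For reference, a trimmed view of the 64-bit compressor interface is shown below. The ports listed are taken from the cam64 entity in the appendix; the 128 individual per-location match signals (mh0..mh127) of the full entity are omitted here for brevity.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Trimmed interface of the 64-bit compressor (cam64 in the appendix).
entity cam64 is
    Port ( clk      : in  STD_LOGIC;
           reset    : in  STD_LOGIC;
           start    : in  STD_LOGIC;
           datain   : in  STD_LOGIC_VECTOR (63 downto 0);
           dataout  : out STD_LOGIC_VECTOR (63 downto 0);
           matchhit : out STD_LOGIC;                       -- 1 when a dictionary entry matches
           addr_out : out STD_LOGIC_VECTOR (6 downto 0));  -- 7 bits address the 128 locations
end cam64;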

5.3. PARALLEL COMPRESSION / DECOMPRESSION SYSTEM

5.3.1 DESIGN OF PARALLEL COMPRESSION SYSTEM

The block diagram gives the details of the components of a parallel compression system. Here the compressor is instantiated twice, once for each of the two processing engines. The numbers of input and output pins are twice those of the single compressor.
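A structural sketch of this duplication is given below. It assumes a trimmed version of the 32-bit compressor entity cam32 from the appendix (its per-location match lines are left unbound here), and the top-level entity and port names are illustrative; the synthesized parallel module is reported simply as d_comp.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity parallel_comp_sketch is
    Port ( clk, reset, start : in  STD_LOGIC;
           datain            : in  STD_LOGIC_VECTOR (63 downto 0);  -- 64-bit input word
           dataout0          : out STD_LOGIC_VECTOR (31 downto 0);  -- literal output, engine 0
           dataout1          : out STD_LOGIC_VECTOR (31 downto 0);  -- literal output, engine 1
           matchhit0         : out STD_LOGIC;
           matchhit1         : out STD_LOGIC;
           addr_out0         : out STD_LOGIC_VECTOR (5 downto 0);   -- match location, engine 0
           addr_out1         : out STD_LOGIC_VECTOR (5 downto 0));  -- match location, engine 1
end parallel_comp_sketch;

architecture struct of parallel_comp_sketch is
    component cam32
        port ( start, clk, reset : in  STD_LOGIC;
               datain   : in  STD_LOGIC_VECTOR (31 downto 0);
               dataout  : out STD_LOGIC_VECTOR (31 downto 0);
               matchhit : out STD_LOGIC;
               addr_out : out STD_LOGIC_VECTOR (5 downto 0));
    end component;
begin
    -- Two identical 32-bit XMatchPro engines, each with its own dictionary,
    -- compress the two halves of the 64-bit input word in the same cycle.
    A1 : cam32 port map ( start => start, clk => clk, reset => reset,
                          datain => datain(31 downto 0),
                          dataout => dataout0, matchhit => matchhit0, addr_out => addr_out0 );
    A2 : cam32 port map ( start => start, clk => clk, reset => reset,
                          datain => datain(63 downto 32),
                          dataout => dataout1, matchhit => matchhit1, addr_out => addr_out1 );
end struct;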

The components of each instantiated compressor are the same as those of the 32-bit compressor, namely the COMPARATOR, ARRAY and CAM COMPARATOR. The comparator is used to compare two 32-bit data words and to set or reset the output bit as 1 for equal and 0 for unequal. The array holds 64 locations of 32 bits each. It is used to store the unmatched incoming data; when a new data word arrives, it is compared with all the data stored in this array for a match. If no match occurs, the incoming data is stored in the next free location of the array. The last component is the CAM comparator, which feeds the incoming data and all the stored array data one by one to the comparator.

If the output of the comparator goes high for any input, the match is found and the corresponding address is retrieved and sent as output along with one bit to indicate that a match is found. If no match is found, the incoming data is stored in the array and is sent as output. These are the functions of the three components of the 32-bit compressor.

5.3.2 DESIGN OF PARALLEL DECOMPRESSION SYSTEM

The parallel decompression system is implemented by concatenating the outputs of the two compressors of the parallel architecture and giving those data as input to a parallel decompression system comprising two of the 32-bit decompressors discussed above for the single compression system. The 32-bit decompressor has the following components: ARRAY and PROCESSING UNIT. The array has the same function as the array unit used in the compressor, and it is of the same length. The processing unit checks the incoming match-hit bit: if it is 0, the data is not present in the array, so the processing unit stores the data in the array; if the match-hit bit is 1, the data is present in the array, so the processing unit retrieves the data from the array with the help of the address input (match location) and sends it to the data output.

5.4. SIMULATION RESULTS

The design coded in VHDL is simulated using ModelSim from Mentor Graphics. The obtained waveforms are as follows.

Fig.5.4. Comparator

Fig.5.5. Cam Comparator

Fig.5.6.Content Addressable Memory

Fig.5.7. 32-bit Single Compression Top Module

Fig.5.8. 32-bit Single Compression Top Module Decimal inputs

Fig.5.9. 64-bit Single Compression System -Top module

Fig.5.10. 64-bit Single Compression System -Test bench Waveform

Fig.5.11. 32-bit Single Decompression Top Module

Fig.5.12. 32-bit Single Decompression- Test bench Waveform

Fig.5.13. Parallel Compression System - 64-bit input Top module

Fig.5.14. Parallel Compression System - 64-bit input Test bench

5.5. RTL SCHEMATIC

The RTL schematics for the VHDL code are generated using Xilinx Project Navigator 8.1i.

Fig.5.15. 32 bit Single Compression System

Fig.5.16. 32 bit Single Compression System

Fig.5.17. 64 bit Single Compression System

Fig.5.18. RTL Schematic for 64 bit Single Compression System

Fig5.19. 64 bit Parallel Compression System

Fig.5.20. RTL Schematic for 64 bit Parallel Compression System

5.6. Xilinx Synthesis Results for Target Device xc2v1500bg575-6

5.6.1. 32-bit Single Compression System

===============================================================
* Synthesis Options Summary *
===============================================================
---- Source Parameters
Input File Name : "xmatchpro.prj"
Input Format : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name : "xmatchpro"
Output Format : NGC
Target Device : xc2v1500-6-bg575
===============================================================
* HDL Compilation *
===============================================================
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/comparator.vhd" in Library work.
Architecture arch_comp of Entity comparator is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/camcomp.vhd" in Library work.
Architecture arch_cam64 of Entity camcomp is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/cam.vhd" in Library work.
Architecture arch_cam of Entity cam is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/xmatchpro.vhd" in Library work.
Architecture arch_xmatch of Entity xmatchpro is up to date.

Table 5.1. 32-bit Single Compression System - HDL Synthesis Report

Macro Statistics                     No.
# ROMS                                64
  4x1 bit ROM                         64
# Adders/Subtractors                   1
  32-bit adder                         1
# Registers                           68
  1-bit register                       1
  32-bit register                     66
  6-bit register                       1
# Latches                              2
  1-bit latch                          1
  6-bit latch                          1
# Comparators                         64
  32-bit comparator equal             64

5.6.2. 64-bit Single Compression System


===============================================================
* Synthesis Options Summary *
===============================================================
---- Source Parameters
Input File Name : "xmatchpro.prj"
Input Format : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name : "xmatchpro"
Output Format : NGC
Target Device : xc2v1500-6-bg575
===============================================================
* HDL Compilation *
===============================================================
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/comparator.vhd" in Library work.
Architecture arch_comp of Entity comparator is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/camcomp.vhd" in Library work.
Architecture arch_cam64 of Entity camcomp is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/cam.vhd" in Library work.
Architecture arch_cam of Entity cam is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/xmatchpro.vhd" in Library work.
Architecture arch_xmatchpro of Entity xmatchpro is up to date.

Table 5.2. 64-bit Single Compression System - HDL Synthesis Report

Macro Statistics                     Nos.
# ROMS                                128
  4x1 bit ROM                         128
# Adders/Subtractors                    1
  32-bit adder                          1
# Registers                           132
  1-bit register                        1
  32-bit register                       1
  64-bit register                     129
  7-bit register                        1
# Latches                               2
  1-bit latch                           1
  7-bit latch                           1
# Comparators                         128
  64-bit comparator equal             128

5.6.4. 64-bit Parallel Decompression System =============================================================== * Synthesis Options Summary * =============================================================== ---- Source Parameters Input File Name : "LL_decomp.prj" Input Format : mixed Ignore Synthesis Constraint File : NO ---- Target Parameters Output File Name : "LL_decomp" Output Format : NGC Target Device : xc2v1500-6-bg575 =============================================================== * HDL Compilation * =============================================================== Compiling vhdl file "E:/proj/xilinx/dual_decomp/dual_decomp/de_xmatchpro.vhd" in Library work. Architecture arch_de_camcomparator of Entity de_xmatchpro is up to date. Compiling vhdl file "E:/proj/xilinx/dual_decomp/dual_decomp/LL_decomp.vhd" in Library work. Architecture arch_dualdecomp of Entity ll_decomp is up to date. Table 5.4. 64-bit Parallel Decompression System - HDL Synthesis Report

Macro Statistics                     Nos.
# Adders/Subtractors                    2
  32-bit adder                          2
# Latches                             130
  32-bit latch                        130
# Multiplexers                          2
  32-bit 64-to-1 multiplexer            2

CHAPTER 6

ANALYSIS OF RESULTS

6.1. Device Utilization of Various Modules

Table 6.1. Compression Device Utilization Summary for Selected Device: xc2v1500bg575-6

Modules:                      32-bit Single Compression   64-bit Single Compression   64-bit Parallel Compression
Number of Slices:             1756 out of 7680   22%      6819 out of 7680   88%      3560 out of 7680   46%
Number of Slice Flip Flops:   2064 out of 15360  13%      8168 out of 15360  53%      4206 out of 15360  27%
Number of 4 input LUTs:       1368 out of 15360   8%      4776 out of 15360  31%      2930 out of 15360  19%
Number of bonded IOBs:        74 out of 392      18%      139 out of 392     35%      145 out of 392     36%
IOB Flip Flops:               39                          72                          78
Number of GCLKs:              2 out of 16        12%      2 out of 16        12%      2 out of 16        12%

6.2. CADENCE RTL Compiler Reports The Hardware designs done are compiled in Cadence RTL compiler and the results are as follows: 6.2.1. 32-bit Single Compression System 6.2.1.1. Area Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 17 2007 08:42:56 PM Module: scomp_32 Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Instance Cells Cell Area Net Area Wireload ---------------------------------------------------------------------scomp_32 5393 116863 0 TSMC32K_Conservative (S)

6.2.1.2. Power Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 17 2007 08:43:13 PM Module: scomp_32 Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Leakage Internal Net Switching Instance Cells Power(nW) Power(nW) Power(nW) Power(nW) ----------------------------------------------------------------------scomp_32 5393 4.255 5832894.166 2001783.940 7834678.105 6.2.1.3. Timing Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 17 2007 08:42:21 PM Module: scomp_32 Technology libraries: typical 1.3

tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Fanout Load Slew Delay Arrival (fF) (ps) (ps) (ps) ---------------------------------------------------------------------(clock clk) launch 0R u3 Wr_addr_reg_reg[31]/CK setup 0 +365 7451 R ----------------------------------(clock clk) capture 7991 R ---------------------------------------------------------------------Timing slack : 540ps Start-point : u3/Wr_addr_reg_reg[28]/CK End-point : u3/Wr_addr_reg_reg[31]/SI 6.2.2. 64-bit Single Compression System 6.2.2.1 Area Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 17 2007 11:52:05 PM Module: scomp_64 Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Instance Cells Cell Area Net Area Wireload ---------------------------------------------------------------------scomp_64 19722 453869 0 TSMC128K_Conservative (S) u3 19722 453869 0 TSMC128K_Conservative (S) u128 357 2641 0 TSMC8K_Conservative (S) 6.2.2.2. Power Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 17 2007 11:52:21 PM Module: scomp_64 Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 Pin Type

pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Leakage Internal Net Switching Instance Cells Power(nW) Power(nW) Power(nW) Power(nW) -----------------------------------------------------------------------scomp_64 19722 16.615 15342272.460 8297750.296 23640022.756 u3 19722 16.615 15342272.460 3961783.800 19304056.260 u128 357 0.148 4627.810 5019.480 9647.290 6.2.2.3. Timing Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 17 2007 11:51:20 PM Module: scomp_64 Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Fanout Load Slew Delay Arrival (fF) (ps) (ps) (ps) ---------------------------------------------------------------------(clock clk) launch 0R u3 ----------------------------------(clock clk) capture 14887 R ---------------------------------------------------------------------Timing slack : 7221ps Start-point : u3/Wr_addr_reg_reg[29]/CK End-point : u3/Wr_addr_reg_reg[31]/SI 6.2.3. 64-bit Parallel Compression System 6.2.3.1. Area Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 18 2007 12:41:44 AM Module: d_comp Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented Pin Type

============================================================ Instance Cells Cell Area Net Area Wireload ------------------------------------------------------------------------d_comp 10875 237975 0 TSMC64K_Conservative (S) A2 5359 118081 0 TSMC32K_Conservative (S) u3 5359 118081 0 TSMC32K_Conservative (S) u64 167 1317 0 TSMC8K_Conservative (S) inc_add_147_24 61 519 0 TSMC8K_Conservative (S) 6.2.3.2 Power Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 18 2007 12:42:00 AM Module: d_comp Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Leakage Internal Net Switching Instance Cells Power(nW) Power(nW) Power(nW) Power(nW) ----------------------------------------------------------------------d_comp 10875 8.757 11741538.652 3983148.824 15724687.476 A1 5359 4.302 5877754.854 918183.960 6795938.814 u3 5359 4.302 5877754.854 918183.960 6795938.814 u64 167 0.085 2840.508 2840.400 5680.908 inc_add_147_24 61 0.060 5387.845 5562.000 10949.845 6.2.3.3 Timing Report ============================================================ Generated by: Encounter(r) RTL Compiler v06.10-p003_1 Generated on: Apr 18 2007 12:41:26 AM Module: d_comp Technology libraries: typical 1.3 tpz973gtc 230 ram_128x16A 0.0 ram_256x16A 0.0 rom_512x16A 0.0 pllclk 4.3 Operating conditions: typical (balanced_tree) Wireload mode: segmented ============================================================ Fanout Load Slew Delay Arrival (fF) (ps) (ps) (ps) ----------------------------------------------------------------------(clock clk) launch 0R A2 u3 Pin Type

Wr_addr_reg_reg[31]/CK setup 0 +365 6488 R
-----------------------------------
(clock clk) capture 7991 R
----------------------------------------------------------------------
Timing slack : 1503ps
Start-point : A2/u3/Wr_addr_reg_reg[0]/CK
End-point : A2/u3/Wr_addr_reg_reg[31]/SI

6.2.3.4. Gates Used Report

Type         Instances   Area          Area %
----------------------------------------------
sequential   4328        173111.036    72.7
inverter     289         991.282       0.4
buffer       2           13.579        0.0
logic        6256        63859.583     26.8
----------------------------------------------
total        10875       237975.480    100.0

6.3. Cadence RTL Compiler Screenshots

The ASIC synthesis is done using Cadence RTL Compiler 6.0, and the screenshots after elaborating and synthesizing the code are as follows.

Fig.6.1. 32-bit Single Compression System

Fig.6.2. 32-bit comparator

Fig.6.3. 32-bit Cam Comparator

Fig.6.4. 64-bit Comparator Schematic 1

Fig.6.5. 64-bit Comparator Schematic 2

Fig.6.6. 64-bit Parallel Compression System

Fig.6.7. Expanded Schematic of Parallel Compression System

6.4. Obtained Compression Ratio

The parallel architecture achieves high compression rates. For this project, the compression ratio was verified using test benches for the top modules of the 64-bit single compression system and the 64-bit parallel compression system. Bit sequences are given as an input file, and the size of the compressed output file is noted. The compression ratio is given by the uncompressed data size divided by the compressed data size for the designed modules. For typical disk data:

The compression ratio for the 64-bit single compression system is 1.69 to 1.82.
The compression ratio for the 64-bit parallel compression system is 1.886 to 1.96.

The parallel compression system also operates at high speed, achieving a throughput of 271 Mbytes/s: the system runs at a frequency of 33.89 MHz for a 64-bit (8-byte) input, and 33.89 MHz x 8 bytes per cycle gives approximately 271 Mbytes/s. This is shown using synthesis in Leonardo Spectrum as follows.

6.5. Critical Path Report of Leonardo Spectrum

Critical path #1, (path slack = 0.1):

data arrival time                                       29.73
data required time (default specified - setup time)     29.78
--------------------------------------------------------------
data required time                                      29.78
data arrival time                                       29.73
--------------------------------------------------------------
slack                                                    0.05

FUTURE SCOPE
As future work, improving compression for the disk data set by increasing the dictionary length and introducing run-length coding techniques into the algorithm to improve the compression ratio is considered. Further work includes providing the facility to select or dynamically change the routing strategy in the multiple-compressor system depending on data characteristics or system requirements. Similarly, a dynamic system that allocates additional compressors depending on the current throughput is also possible.

CONCLUSION
The various modules are designed and coded in VHDL. The source code is simulated and the waveforms are obtained for all the modules. Since the compression/decompression system uses the XMatchPro algorithm, the compression throughput is high. An improved compression ratio is achieved by the parallel compression architecture with the least increase in latency, and high-speed throughput is achieved. The architecture provides inherent scalability for the future. The total time required to transmit compressed data is less than that required to transmit uncompressed data. This can lead to a performance benefit, as the bandwidth of a link appears greater when transmitting compressed data, and hence more data can be transmitted in a given amount of time. There is a potential for doubling the performance of a storage/communication system by increasing the available transmission bandwidth and data capacity with minimum investment. The design can be applied in computer systems and high-performance storage devices.

BIBLIOGRAPHY
[1] S. Henriques and N. Ranganathan, High Speed VLSI Design for Lempel-Ziv Based Data Compression, IEEE Trans. Circuits and Systems, vol. 40, no. 2, pp. 90-106, Feb. 1993.
[2] B. Jung and W.P. Burleson, A VLSI Systolic Array Architecture for Lempel-Ziv Based Data Compression, Proc. IEEE Intl Symp. Circuits and Systems, pp. 65-68, June 1994.
[3] B. Jung and W.P. Burleson, Real Time VLSI Compression for High Speed Wireless Local Networks, Proc. Data Compression Conf., Mar. 1995.
[4] J.A. Storer and J.H. Rief, A Parallel Architecture for High Speed Data Compression, J. Parallel and Distributed Computing, vol. 13, pp. 222-227, 1991.
[5] C.Y. Lee and R.Y. Yang, High-Throughput Data Compressor Designs Using Content Addressable Memory, IEE Proc. Conf. Circuits Devices Systems, vol. 142, pp. 69-73, Feb. 1995.
[6] B.W.Y. Wei, R. Tarver, J.S. Kim, and K. Ng, A Single Chip Lempel-Ziv Data Compressor, Proc. IEEE Intl Symp. Circuits and Systems (ISCAS), pp. 1953-1955, May 1993.
[7] S. Jones, Partial-Matching Lossless Data Compression Hardware, IEE Proc. Conf. Computer and Digital Techniques, vol. 147, no. 5, pp. 329-334, Sept. 2000.
[8] M. Kjelso, M. Gooch, and S. Jones, Design & performance of a main memory hardware data compressor, Proc. 22nd Eur. Micro Conf., Prague, Czech Republic, Sept. 1996, pp. 423-430.
[9] J. Jiang and S. Jones, Parallel Design of Arithmetic Coding, IEE Proc. Conf. Computer and Digital Techniques, vol. 141, no. 6, pp. 327-323, Nov. 1994.
[10] R. Stefo, J.L. Nunez, C. Feregrino, S. Mahapatra, and S. Jones, FPGA-Based Modelling Unit for High Speed Lossless Arithmetic Coding, Proc. 11th Intl Conf. Field-Programmable Logic and Applications, pp. 643-647, Aug. 2001.
[11] P.A. Franaszek, P. Heidelbeger, D.E. Poff, and J.T. Robinson, Algorithm and Data Structures for Compressed Memory Machines, IBM J. Research and Development, vol. 45, no. 2, Mar. 2001.
[12] W.T. Penzhorn, A Parallel Algorithm for High-Speed Data Compression, Proc. IEEE South African Symp. Comm. and Signal Processing, pp. 173-175, Sept. 1992.
[13] J.L. Simpson and C.L. Sabharwal, A Multiple Processor Approach to Data Compression, Applied Computing Machinery, pp. 641-649, 1998.
[14] M.E. Gonzalez Smith and J.A. Storer, Parallel Algorithms for Data Compression, J. ACM, vol. 32, no. 2, pp. 344-373, Apr. 1985.
[15] Mark Milward, J.L. Nunez and David Mulvaney, Design and Implementation of a Lossless Parallel High-Speed Data Compression System, IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 6, June 2004.

[16] M. Bianchi, J. Katto, and D. Van Maren, Data compression in a half-inch reel-to-reel tape drive, Hewlett-Packard J., vol. 40, no. 6, pp. 26-31, 1989.
[17] Primer: Data Compression Lempel-Ziv (DCLZ), Advanced Hardware Architectures Inc., Pullman, WA, 1996.
[18] AHA3211 40 Mbytes/s DCLZ Data Compression Coprocessor IC, Advanced Hardware Architectures Inc., Pullman, WA, 1997.
[19] S. Bunton and G. Borriello, Practical dictionary management for hardware data compression, Commun. ACM, vol. 35, no. 1, pp. 95-104, 1992.
[20] J.L. Nunez, C. Feregrino, S. Jones, and S. Bateman, X-MatchPRO: A ProASIC-based 200 Mbytes/s Full-Duplex Lossless Data Compressor, Proc. 11th Intl Conf. Field-Programmable Logic and Applications, pp. 613-617, Aug. 2001.
[21] J. Bhasker, A VHDL Primer, 3rd Edition.

APPENDIX

PROGRAM
library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; ---- Uncomment the following library declaration if instantiating ---- any Xilinx primitives in this code. --library UNISIM; --use UNISIM.VComponents.all; entity cam32 is Port ( start : in STD_LOGIC; clk : in STD_LOGIC; reset : in STD_LOGIC; datain : in STD_LOGIC_VECTOR (31 downto 0); dataout : out STD_LOGIC_VECTOR (31 downto 0); mh0, mh1, mh2, mh3, mh4, mh5, mh6, mh7 : inout std_logic; mh8,mh9,mh10,mh11,mh12,mh13,mh14,mh15 : inout std_logic; mh16,mh17,mh18,mh19,mh20,mh21,mh22,mh23 : inout std_logic; mh24,mh25,mh26,mh27,mh28,mh29,mh30,mh31 : inout std_logic; mh32,mh33,mh34,mh35,mh36,mh37,mh38,mh39 : inout std_logic; mh40,mh41,mh42,mh43,mh44,mh45,mh46,mh47 : inout std_logic; mh48,mh49,mh50,mh51,mh52,mh53,mh54,mh55 : inout std_logic; mh56,mh57,mh58,mh59,mh60,mh61,mh62,mh63 : inout std_logic; matchhit : out std_logic; --Found Match addr_out : out std_logic_vector(5 downto 0)); end cam32; architecture Behavioral of cam32 is component comparator32 port ( data1 : in std_logic_vector(31 downto 0); data2 : in std_logic_vector(31 downto 0); reset : in std_logic; eqout : out std_logic); end component; component camcomp32 port (reset : in std_logic; -- Reset start : in std_logic; -- Write h0, h1, h2, h3, h4, h5, h6, h7 : in std_logic; h8,h9,h10,h11,h12,h13,h14,h15 : in std_logic; h16,h17,h18,h19,h20,h21,h22,h23 : in std_logic; h24,h25,h26,h27,h28,h29,h30,h31 : in std_logic; h32,h33,h34,h35,h36,h37,h38,h39 : in std_logic; h40,h41,h42,h43,h44,h45,h46,h47 : in std_logic; h48,h49,h50,h51,h52,h53,h54,h55 : in std_logic; h56,h57,h58,h59,h60,h61,h62,h63 : in std_logic; mhit : out std_logic; --Found Match addr_out : out std_logic_vector(5 downto 0) ); end component ; type ram_array is array (0 to 63) of std_logic_vector(31 downto 0); signal data : ram_array;

signal sg_matchhit : std_logic; signal sg_addr_out : std_logic_vector(5 downto 0); begin u0 : comparator32 port map (data1 => datain, data2 => data(conv_integer(0)), reset => reset, eqout => mh0); u1 : comparator32 port map (data1 => datain, data2 => data(conv_integer(1)), reset => reset, eqout => mh1); u2 : comparator32 port map (data1 => datain, data2 => data(conv_integer(2)), reset => reset, eqout => mh2); u3 : comparator32 port map (data1 => datain, data2 => data(conv_integer(3)), reset => reset, eqout => mh3); u4 : comparator32 port map (data1 => datain, data2 => data(conv_integer(4)), reset => reset, eqout => mh4); u5 : comparator32 port map (data1 => datain, data2 => data(conv_integer(5)), reset => reset, eqout => mh5); u6 : comparator32 port map (data1 => datain, data2 => data(conv_integer(6)), reset => reset, eqout => mh6); u7 : comparator32 port map (data1 => datain, data2 => data(conv_integer(7)), reset => reset, eqout => mh7); u8 : comparator32 port map (data1 => datain, data2 => data(conv_integer(8)), reset => reset, eqout => mh8); u9 : comparator32 port map (data1 => datain, data2 => data(conv_integer(9)), reset => reset, eqout => mh9); u10 : comparator32 port map (data1 => datain, data2 => data(conv_integer(10)), reset => reset, eqout => mh10); u11 : comparator32 port map (data1 => datain, data2 => data(conv_integer(11)), reset => reset, eqout => mh11); u12 : comparator32 port map (data1 => datain, data2 => data(conv_integer(12)), reset => reset, eqout => mh12); u13 : comparator32 port map (data1 => datain, data2 => data(conv_integer(13)), reset => reset, eqout => mh13); u14 : comparator32 port map (data1 => datain, data2 => data(conv_integer(14)), reset => reset, eqout => mh14); u15 : comparator32 port map (data1 => datain, data2 => data(conv_integer(15)), reset => reset, eqout => mh15); u16 : comparator32 port map (data1 => datain, data2 => data(conv_integer(16)), reset => reset, eqout => mh16); u17 : comparator32 port map (data1 => datain, data2 => data(conv_integer(17)), reset => reset, eqout => mh17); u18 : comparator32 port map (data1 => datain, data2 => data(conv_integer(18)), reset => reset, eqout => mh18); u19 : comparator32 port map (data1 => datain, data2 => data(conv_integer(19)), reset => reset, eqout => mh19); u20 : comparator32 port map (data1 => datain, data2 => data(conv_integer(20)), reset => reset, eqout => mh20); u21 : comparator32 port map (data1 => datain, data2 => data(conv_integer(21)), reset => reset, eqout => mh21); u22 : comparator32 port map (data1 => datain, data2 => data(conv_integer(22)), reset => reset, eqout => mh22); u23 : comparator32 port map (data1 => datain, data2 => data(conv_integer(23)), reset => reset, eqout => mh23); u24 : comparator32 port map (data1 => datain, data2 => data(conv_integer(24)), reset => reset, eqout => mh24); u25 : comparator32 port map (data1 => datain, data2 => data(conv_integer(25)),

reset => reset, eqout => mh25); u26 : comparator32 port map (data1 => datain, data2 => data(conv_integer(26)), reset => reset, eqout => mh26); u27 : comparator32 port map (data1 => datain, data2 => data(conv_integer(27)), reset => reset, eqout => mh27); u28 : comparator32 port map (data1 => datain, data2 => data(conv_integer(28)), reset => reset, eqout => mh28); u29 : comparator32 port map (data1 => datain, data2 => data(conv_integer(29)), reset => reset, eqout => mh29); u30 : comparator32 port map (data1 => datain, data2 => data(conv_integer(30)), reset => reset, eqout => mh30); u31 : comparator32 port map (data1 => datain, data2 => data(conv_integer(31)), reset => reset, eqout => mh31); u32 : comparator32 port map (data1 => datain, data2 => data(conv_integer(32)), reset => reset, eqout => mh32); u33 : comparator32 port map (data1 => datain, data2 => data(conv_integer(33)), reset => reset, eqout => mh33); u34 : comparator32 port map (data1 => datain, data2 => data(conv_integer(34)), reset => reset, eqout => mh34); u35 : comparator32 port map (data1 => datain, data2 => data(conv_integer(35)), reset => reset, eqout => mh35); u36 : comparator32 port map (data1 => datain, data2 => data(conv_integer(36)), reset => reset, eqout => mh36); u37 : comparator32 port map (data1 => datain, data2 => data(conv_integer(37)), reset => reset, eqout => mh37); u38 : comparator32 port map (data1 => datain, data2 => data(conv_integer(38)), reset => reset, eqout => mh38); u39 : comparator32 port map (data1 => datain, data2 => data(conv_integer(39)), reset => reset, eqout => mh39); u40 : comparator32 port map (data1 => datain, data2 => data(conv_integer(40)), reset => reset, eqout => mh40); u41 : comparator32 port map (data1 => datain, data2 => data(conv_integer(41)), reset => reset, eqout => mh41); u42 : comparator32 port map (data1 => datain, data2 => data(conv_integer(42)), reset => reset, eqout => mh42); u43 : comparator32 port map (data1 => datain, data2 => data(conv_integer(43)), reset => reset, eqout => mh43); u44 : comparator32 port map (data1 => datain, data2 => data(conv_integer(44)), reset => reset, eqout => mh44); u45 : comparator32 port map (data1 => datain, data2 => data(conv_integer(45)), reset => reset, eqout => mh45); u46 : comparator32 port map (data1 => datain, data2 => data(conv_integer(46)), reset => reset, eqout => mh46); u47 : comparator32 port map (data1 => datain, data2 => data(conv_integer(47)), reset => reset, eqout => mh47); u48 : comparator32 port map (data1 => datain, data2 => data(conv_integer(48)), reset => reset, eqout => mh48); u49 : comparator32 port map (data1 => datain, data2 => data(conv_integer(49)), reset => reset, eqout => mh49); u50 : comparator32 port map (data1 => datain, data2 => data(conv_integer(50)), reset => reset, eqout => mh50); u51 : comparator32 port map (data1 => datain, data2 => data(conv_integer(51)), reset => reset, eqout => mh51); u52 : comparator32 port map (data1 => datain, data2 => data(conv_integer(52)),

reset => reset, eqout => mh52); u53 : comparator32 port map (data1 => datain, data2 => data(conv_integer(53)), reset => reset, eqout => mh53); u54 : comparator32 port map (data1 => datain, data2 => data(conv_integer(54)), reset => reset, eqout => mh54); u55 : comparator32 port map (data1 => datain, data2 => data(conv_integer(55)), reset => reset, eqout => mh55); u56 : comparator32 port map (data1 => datain, data2 => data(conv_integer(56)), reset => reset, eqout => mh56); u57 : comparator32 port map (data1 => datain, data2 => data(conv_integer(57)), reset => reset, eqout => mh57); u58 : comparator32 port map (data1 => datain, data2 => data(conv_integer(58)), reset => reset, eqout => mh58); u59 : comparator32 port map (data1 => datain, data2 => data(conv_integer(59)), reset => reset, eqout => mh59); u60 : comparator32 port map (data1 => datain, data2 => data(conv_integer(60)), reset => reset, eqout => mh60); u61 : comparator32 port map (data1 => datain, data2 => data(conv_integer(61)), reset => reset, eqout => mh61); u62 : comparator32 port map (data1 => datain, data2 => data(conv_integer(62)), reset => reset, eqout => mh62); u63 : comparator32 port map (data1 => datain, data2 => data(conv_integer(63)), reset => reset, eqout => mh63); u64 : camcomp32 port map (reset => reset, start => start, h0 => mh0, h1 => mh1, h2 => mh2, h3 => mh3, h4 => mh4, h5 => mh5, h6 => mh6, h7 => mh7, h8 => mh8, h9 => mh9, h10 => mh10, h11 => mh11, h12 => mh12, h13 => mh13, h14 => mh14, h15 => mh15, h16 => mh16, h17 => mh17, h18 => mh18, h19 => mh19, h20 => mh20, h21 => mh21, h22 => mh22, h23 => mh23, h24 => mh24, h25 => mh25, h26 => mh26, h27 => mh27, h28 => mh28, h29 => mh29, h30 => mh30, h31 => mh31, h32 => mh32, h33 => mh33, h34 => mh34, h35 => mh35, h36 => mh36, h37 => mh37, h38 => mh38, h39 => mh39, h40 => mh40, h41 => mh41, h42 => mh42, h43 => mh43, h44 => mh44, h45 => mh45, h46 => mh46, h47 => mh47, h48 => mh48, h49 => mh49, h50 => mh50, h51 => mh51, h52 => mh52, h53 => mh53, h54 => mh54, h55 => mh55, h56 => mh56, h57 => mh57, h58 => mh58, h59 => mh59, h60 => mh60, h61 => mh61, h62 => mh62, h63 => mh63, mhit => sg_matchhit, addr_out => sg_addr_out); process(clk,reset,start,sg_matchhit,sg_addr_out) variable Wr_addr :integer := 0; begin -wait until clk'event and clk = '1'; if (reset = '1') then data <= (others=>(others=>'0')); matchhit <= '0'; addr_out <= "000000"; elsif ( clk'event and clk='1') then if (start = '0') then if (sg_matchhit = '0') then if (wr_addr = 63) then wr_addr := 0;

end if; data(conv_integer(wr_addr)) <= datain; matchhit <= '0'; dataout <= datain; wr_addr := wr_addr + 1; else matchhit <= '1'; addr_out <= sg_addr_out; end if; end if; end if; end process; end Behavioral; library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; ---- Uncomment the following library declaration if instantiating ---- any Xilinx primitives in this code. --library UNISIM; --use UNISIM.VComponents.all; entity cam64 is Port ( clk : in STD_LOGIC; reset : in STD_LOGIC; start : in STD_LOGIC; datain : in STD_LOGIC_VECTOR (63 downto 0); dataout : out STD_LOGIC_VECTOR (63 downto 0); mh0, mh1, mh2, mh3, mh4, mh5, mh6, mh7 : inout std_logic; mh8,mh9,mh10,mh11,mh12,mh13,mh14,mh15 : inout std_logic; mh16,mh17,mh18,mh19,mh20,mh21,mh22,mh23 : inout std_logic; mh24,mh25,mh26,mh27,mh28,mh29,mh30,mh31 : inout std_logic; mh32,mh33,mh34,mh35,mh36,mh37,mh38,mh39 : inout std_logic; mh40,mh41,mh42,mh43,mh44,mh45,mh46,mh47 : inout std_logic; mh48,mh49,mh50,mh51,mh52,mh53,mh54,mh55 : inout std_logic; mh56,mh57,mh58,mh59,mh60,mh61,mh62,mh63 : inout std_logic; mh64,mh65,mh66,mh67,mh68,mh69,mh70,mh71 : inout std_logic; mh72,mh73,mh74,mh75,mh76,mh77,mh78,mh79 : inout std_logic; mh80,mh81,mh82,mh83,mh84,mh85,mh86,mh87 : inout std_logic; mh88,mh89,mh90,mh91,mh92,mh93,mh94,mh95 : inout std_logic; mh96,mh97,mh98,mh99,mh100,mh101,mh102,mh103 : inout std_logic; mh104,mh105,mh106,mh107,mh108,mh109,mh110,mh111 : inout std_logic; mh112,mh113,mh114,mh115,mh116,mh117,mh118,mh119 : inout std_logic; mh120,mh121,mh122,mh123,mh124,mh125,mh126,mh127 : inout std_logic; matchhit : out std_logic; --Found Match addr_out: out std_logic_vector(6 downto 0)); end cam64; architecture Behavioral of cam64 is component comparator64 port ( data1 : in std_logic_vector(63 downto 0); data2 : in std_logic_vector(63 downto 0);

reset : in std_logic; eqout : out std_logic); end component; component camcomp64 port (reset : in std_logic; -- Reset start : in std_logic; -- Write h0, h1, h2, h3, h4, h5, h6, h7 : in std_logic; h8,h9,h10,h11,h12,h13,h14,h15 : in std_logic; h16,h17,h18,h19,h20,h21,h22,h23 : in std_logic; h24,h25,h26,h27,h28,h29,h30,h31 : in std_logic; h32,h33,h34,h35,h36,h37,h38,h39 : in std_logic; h40,h41,h42,h43,h44,h45,h46,h47 : in std_logic; h48,h49,h50,h51,h52,h53,h54,h55 : in std_logic; h56,h57,h58,h59,h60,h61,h62,h63 : in std_logic; h64, h65, h66, h67, h68, h69, h70, h71 : in std_logic; h72,h73,h74,h75,h76,h77,h78,h79 : in std_logic; h80,h81,h82,h83,h84,h85,h86,h87 : in std_logic; h88,h89,h90,h91,h92,h93,h94,h95 : in std_logic; h96,h97,h98,h99,h100,h101,h102,h103 : in std_logic; h104,h105,h106,h107,h108,h109,h110,h111 : in std_logic; h112,h113,h114,h115,h116,h117,h118,h119 : in std_logic; h120,h121,h122,h123,h124,h125,h126,h127 : in std_logic; mhit : out std_logic; addr_out: out std_logic_vector(6 downto 0)); end component ; type ram_array is array (0 to 127) of std_logic_vector(63 downto 0); signal data : ram_array; signal sg_matchhit : std_logic; signal sg_addr_out : std_logic_vector(6 downto 0); begin u0 : comparator64 port map (data1 => datain, data2 => data(conv_integer(0)), reset => reset, eqout => mh0); u1 : comparator64 port map (data1 => datain, data2 => data(conv_integer(1)), reset => reset, eqout => mh1); u2 : comparator64 port map (data1 => datain, data2 => data(conv_integer(2)), reset => reset, eqout => mh2); u3 : comparator64 port map (data1 => datain, data2 => data(conv_integer(3)), reset => reset, eqout => mh3); u4 : comparator64 port map (data1 => datain, data2 => data(conv_integer(4)), reset => reset, eqout => mh4); u5 : comparator64 port map (data1 => datain, data2 => data(conv_integer(5)), reset => reset, eqout => mh5); u6 : comparator64 port map (data1 => datain, data2 => data(conv_integer(6)), reset => reset, eqout => mh6); u7 : comparator64 port map (data1 => datain, data2 => data(conv_integer(7)), reset => reset, eqout => mh7); u8 : comparator64 port map (data1 => datain, data2 => data(conv_integer(8)), reset => reset, eqout => mh8); u9 : comparator64 port map (data1 => datain, data2 => data(conv_integer(9)), reset => reset, eqout => mh9); u10 : comparator64 port map (data1 => datain, data2 => data(conv_integer(10)), reset => reset, eqout => mh10); u11 : comparator64 port map (data1 => datain, data2 => data(conv_integer(11)),

reset => reset, eqout => mh11); u12 : comparator64 port map (data1 => datain, data2 => data(conv_integer(12)), reset => reset, eqout => mh12); u13 : comparator64 port map (data1 => datain, data2 => data(conv_integer(13)), reset => reset, eqout => mh13); u14 : comparator64 port map (data1 => datain, data2 => data(conv_integer(14)), reset => reset, eqout => mh14); u15 : comparator64 port map (data1 => datain, data2 => data(conv_integer(15)), reset => reset, eqout => mh15); u16 : comparator64 port map (data1 => datain, data2 => data(conv_integer(16)), reset => reset, eqout => mh16); u17 : comparator64 port map (data1 => datain, data2 => data(conv_integer(17)), reset => reset, eqout => mh17); u18 : comparator64 port map (data1 => datain, data2 => data(conv_integer(18)), reset => reset, eqout => mh18); u19 : comparator64 port map (data1 => datain, data2 => data(conv_integer(19)), reset => reset, eqout => mh19); u20 : comparator64 port map (data1 => datain, data2 => data(conv_integer(20)), reset => reset, eqout => mh20); u21 : comparator64 port map (data1 => datain, data2 => data(conv_integer(21)), reset => reset, eqout => mh21); u22 : comparator64 port map (data1 => datain, data2 => data(conv_integer(22)), reset => reset, eqout => mh22); u23 : comparator64 port map (data1 => datain, data2 => data(conv_integer(23)), reset => reset, eqout => mh23); u24 : comparator64 port map (data1 => datain, data2 => data(conv_integer(24)), reset => reset, eqout => mh24); u25 : comparator64 port map (data1 => datain, data2 => data(conv_integer(25)), reset => reset, eqout => mh25); u26 : comparator64 port map (data1 => datain, data2 => data(conv_integer(26)), reset => reset, eqout => mh26); u27 : comparator64 port map (data1 => datain, data2 => data(conv_integer(27)), reset => reset, eqout => mh27); u28 : comparator64 port map (data1 => datain, data2 => data(conv_integer(28)), reset => reset, eqout => mh28); u29 : comparator64 port map (data1 => datain, data2 => data(conv_integer(29)), reset => reset, eqout => mh29); u30 : comparator64 port map (data1 => datain, data2 => data(conv_integer(30)), reset => reset, eqout => mh30); u31 : comparator64 port map (data1 => datain, data2 => data(conv_integer(31)), reset => reset, eqout => mh31); u32 : comparator64 port map (data1 => datain, data2 => data(conv_integer(32)), reset => reset, eqout => mh32); u33 : comparator64 port map (data1 => datain, data2 => data(conv_integer(33)), reset => reset, eqout => mh33); u34 : comparator64 port map (data1 => datain, data2 => data(conv_integer(34)), reset => reset, eqout => mh34); u35 : comparator64 port map (data1 => datain, data2 => data(conv_integer(35)), reset => reset, eqout => mh35); u36 : comparator64 port map (data1 => datain, data2 => data(conv_integer(36)), reset => reset, eqout => mh36); u37 : comparator64 port map (data1 => datain, data2 => data(conv_integer(37)), reset => reset, eqout => mh37); u38 : comparator64 port map (data1 => datain, data2 => data(conv_integer(38)),

reset => reset, eqout => mh38); u39 : comparator64 port map (data1 => datain, data2 => data(conv_integer(39)), reset => reset, eqout => mh39); u40 : comparator64 port map (data1 => datain, data2 => data(conv_integer(40)), reset => reset, eqout => mh40); u41 : comparator64 port map (data1 => datain, data2 => data(conv_integer(41)), reset => reset, eqout => mh41); u42 : comparator64 port map (data1 => datain, data2 => data(conv_integer(42)), reset => reset, eqout => mh42); u43 : comparator64 port map (data1 => datain, data2 => data(conv_integer(43)), reset => reset, eqout => mh43); u44 : comparator64 port map (data1 => datain, data2 => data(conv_integer(44)), reset => reset, eqout => mh44); u45 : comparator64 port map (data1 => datain, data2 => data(conv_integer(45)), reset => reset, eqout => mh45); u46 : comparator64 port map (data1 => datain, data2 => data(conv_integer(46)), reset => reset, eqout => mh46); u47 : comparator64 port map (data1 => datain, data2 => data(conv_integer(47)), reset => reset, eqout => mh47); u48 : comparator64 port map (data1 => datain, data2 => data(conv_integer(48)), reset => reset, eqout => mh48); u49 : comparator64 port map (data1 => datain, data2 => data(conv_integer(49)), reset => reset, eqout => mh49); u50 : comparator64 port map (data1 => datain, data2 => data(conv_integer(50)), reset => reset, eqout => mh50); u51 : comparator64 port map (data1 => datain, data2 => data(conv_integer(51)), reset => reset, eqout => mh51); u52 : comparator64 port map (data1 => datain, data2 => data(conv_integer(52)), reset => reset, eqout => mh52); u53 : comparator64 port map (data1 => datain, data2 => data(conv_integer(53)), reset => reset, eqout => mh53); u54 : comparator64 port map (data1 => datain, data2 => data(conv_integer(54)), reset => reset, eqout => mh54); u55 : comparator64 port map (data1 => datain, data2 => data(conv_integer(55)), reset => reset, eqout => mh55); u56 : comparator64 port map (data1 => datain, data2 => data(conv_integer(56)), reset => reset, eqout => mh56); u57 : comparator64 port map (data1 => datain, data2 => data(conv_integer(57)), reset => reset, eqout => mh57); u58 : comparator64 port map (data1 => datain, data2 => data(conv_integer(58)), reset => reset, eqout => mh58); u59 : comparator64 port map (data1 => datain, data2 => data(conv_integer(59)), reset => reset, eqout => mh59); u60 : comparator64 port map (data1 => datain, data2 => data(conv_integer(60)), reset => reset, eqout => mh60); u61 : comparator64 port map (data1 => datain, data2 => data(conv_integer(61)), reset => reset, eqout => mh61); u62 : comparator64 port map (data1 => datain, data2 => data(conv_integer(62)), reset => reset, eqout => mh62); u63 : comparator64 port map (data1 => datain, data2 => data(conv_integer(63)), reset => reset, eqout => mh63); u64 : comparator64 port map (data1 => datain, data2 => data(conv_integer(64)), reset => reset, eqout => mh64); u65 : comparator64 port map (data1 => datain, data2 => data(conv_integer(65)),

reset => reset, eqout => mh65); u66 : comparator64 port map (data1 => datain, data2 => data(conv_integer(66)), reset => reset, eqout => mh66); u67 : comparator64 port map (data1 => datain, data2 => data(conv_integer(67)), reset => reset, eqout => mh67); u68 : comparator64 port map (data1 => datain, data2 => data(conv_integer(68)), reset => reset, eqout => mh68); u69 : comparator64 port map (data1 => datain, data2 => data(conv_integer(69)), reset => reset, eqout => mh69); u70 : comparator64 port map (data1 => datain, data2 => data(conv_integer(70)), reset => reset, eqout => mh70); u71 : comparator64 port map (data1 => datain, data2 => data(conv_integer(71)), reset => reset, eqout => mh71); u72 : comparator64 port map (data1 => datain, data2 => data(conv_integer(72)), reset => reset, eqout => mh72); u73 : comparator64 port map (data1 => datain, data2 => data(conv_integer(73)), reset => reset, eqout => mh73); u74 : comparator64 port map (data1 => datain, data2 => data(conv_integer(74)), reset => reset, eqout => mh74); u75 : comparator64 port map (data1 => datain, data2 => data(conv_integer(75)), reset => reset, eqout => mh75); u76 : comparator64 port map (data1 => datain, data2 => data(conv_integer(76)), reset => reset, eqout => mh76); u77 : comparator64 port map (data1 => datain, data2 => data(conv_integer(77)), reset => reset, eqout => mh77); u78 : comparator64 port map (data1 => datain, data2 => data(conv_integer(78)), reset => reset, eqout => mh78); u79 : comparator64 port map (data1 => datain, data2 => data(conv_integer(79)), reset => reset, eqout => mh79); u80 : comparator64 port map (data1 => datain, data2 => data(conv_integer(80)), reset => reset, eqout => mh80); u81 : comparator64 port map (data1 => datain, data2 => data(conv_integer(81)), reset => reset, eqout => mh81); u82 : comparator64 port map (data1 => datain, data2 => data(conv_integer(82)), reset => reset, eqout => mh82); u83 : comparator64 port map (data1 => datain, data2 => data(conv_integer(83)), reset => reset, eqout => mh83); u84 : comparator64 port map (data1 => datain, data2 => data(conv_integer(84)), reset => reset, eqout => mh84); u85 : comparator64 port map (data1 => datain, data2 => data(conv_integer(85)), reset => reset, eqout => mh85); u86 : comparator64 port map (data1 => datain, data2 => data(conv_integer(86)), reset => reset, eqout => mh86); u87 : comparator64 port map (data1 => datain, data2 => data(conv_integer(87)), reset => reset, eqout => mh87); u88 : comparator64 port map (data1 => datain, data2 => data(conv_integer(88)), reset => reset, eqout => mh88); u89 : comparator64 port map (data1 => datain, data2 => data(conv_integer(89)), reset => reset, eqout => mh89); u90 : comparator64 port map (data1 => datain, data2 => data(conv_integer(90)), reset => reset, eqout => mh90); u91 : comparator64 port map (data1 => datain, data2 => data(conv_integer(91)), reset => reset, eqout => mh91); u92 : comparator64 port map (data1 => datain, data2 => data(conv_integer(92)),

reset => reset, eqout => mh92); u93 : comparator64 port map (data1 => datain, data2 => data(conv_integer(93)), reset => reset, eqout => mh93); u94 : comparator64 port map (data1 => datain, data2 => data(conv_integer(94)), reset => reset, eqout => mh94); u95 : comparator64 port map (data1 => datain, data2 => data(conv_integer(95)), reset => reset, eqout => mh95); u96 : comparator64 port map (data1 => datain, data2 => data(conv_integer(96)), reset => reset, eqout => mh96); u97 : comparator64 port map (data1 => datain, data2 => data(conv_integer(97)), reset => reset, eqout => mh97); u98 : comparator64 port map (data1 => datain, data2 => data(conv_integer(98)), reset => reset, eqout => mh98); u99 : comparator64 port map (data1 => datain, data2 => data(conv_integer(99)), reset => reset, eqout => mh99); u100 : comparator64 port map (data1 => datain, data2 => data(conv_integer(100)), reset => reset, eqout => mh100); u101 : comparator64 port map (data1 => datain, data2 => data(conv_integer(101)), reset => reset, eqout => mh101); u102 : comparator64 port map (data1 => datain, data2 => data(conv_integer(102)), reset => reset, eqout => mh102); u103 : comparator64 port map (data1 => datain, data2 => data(conv_integer(103)), reset => reset, eqout => mh103); u104 : comparator64 port map (data1 => datain, data2 => data(conv_integer(104)), reset => reset, eqout => mh104); u105 : comparator64 port map (data1 => datain, data2 => data(conv_integer(105)), reset => reset, eqout => mh105); u106 : comparator64 port map (data1 => datain, data2 => data(conv_integer(106)), reset => reset, eqout => mh106); u107 : comparator64 port map (data1 => datain, data2 => data(conv_integer(107)), reset => reset, eqout => mh107); u108 : comparator64 port map (data1 => datain, data2 => data(conv_integer(108)), reset => reset, eqout => mh108); u109 : comparator64 port map (data1 => datain, data2 => data(conv_integer(109)), reset => reset, eqout => mh109); u110 : comparator64 port map (data1 => datain, data2 => data(conv_integer(110)), reset => reset, eqout => mh110); u111 : comparator64 port map (data1 => datain, data2 => data(conv_integer(111)), reset => reset, eqout => mh111); u112 : comparator64 port map (data1 => datain, data2 => data(conv_integer(112)), reset => reset, eqout => mh112); u113 : comparator64 port map (data1 => datain, data2 => data(conv_integer(113)), reset => reset, eqout => mh113); u114 : comparator64 port map (data1 => datain, data2 => data(conv_integer(114)), reset => reset, eqout => mh114); u115 : comparator64 port map (data1 => datain, data2 => data(conv_integer(115)), reset => reset, eqout => mh115); u116 : comparator64 port map (data1 => datain, data2 => data(conv_integer(116)), reset => reset, eqout => mh116); u117 : comparator64 port map (data1 => datain, data2 => data(conv_integer(117)), reset => reset, eqout => mh117); u118 : comparator64 port map (data1 => datain, data2 => data(conv_integer(118)), reset => reset, eqout => mh118); u119 : comparator64 port map (data1 => datain, data2 =>

data(conv_integer(119)), reset => reset, eqout => mh119); u120 : comparator64 port map (data1 => datain, data2 => data(conv_integer(120)), reset => reset, eqout => mh120); u121 : comparator64 port map (data1 => datain, data2 => data(conv_integer(121)), reset => reset, eqout => mh121); u122 : comparator64 port map (data1 => datain, data2 => data(conv_integer(122)), reset => reset, eqout => mh122); u123 : comparator64 port map (data1 => datain, data2 => data(conv_integer(123)), reset => reset, eqout => mh123); u124 : comparator64 port map (data1 => datain, data2 => data(conv_integer(124)), reset => reset, eqout => mh124); u125 : comparator64 port map (data1 => datain, data2 => data(conv_integer(125)), reset => reset, eqout => mh125); u126 : comparator64 port map (data1 => datain, data2 => data(conv_integer(126)), reset => reset, eqout => mh126); u127 : comparator64 port map (data1 => datain, data2 => data(conv_integer(127)), reset => reset, eqout => mh127); u128 : camcomp64 port map (reset => reset, start => start, h0 => mh0, h1 => mh1, h2 => mh2, h3 => mh3, h4 => mh4, h5 => mh5, h6 => mh6, h7 => mh7, h8 => mh8, h9 => mh9, h10 => mh10, h11 => mh11, h12 => mh12, h13 => mh13, h14 => mh14, h15 => mh15, h16 => mh16, h17 => mh17, h18 => mh18, h19 => mh19, h20 => mh20, h21 => mh21, h22 => mh22, h23 => mh23, h24 => mh24, h25 => mh25, h26 => mh26, h27 => mh27, h28 => mh28, h29 => mh29, h30 => mh30, h31 => mh31, h32 => mh32, h33 => mh33, h34 => mh34, h35 => mh35, h36 => mh36, h37 => mh37, h38 => mh38, h39 => mh39, h40 => mh40, h41 => mh41, h42 => mh42, h43 => mh43, h44 => mh44, h45 => mh45, h46 => mh46, h47 => mh47, h48 => mh48, h49 => mh49, h50 => mh50, h51 => mh51, h52 => mh52, h53 => mh53, h54 => mh54, h55 => mh55, h56 => mh56, h57 => mh57, h58 => mh58, h59 => mh59, h60 => mh60, h61 => mh61, h62 => mh62, h63 => mh63, h64 => mh64,h65 => mh65,h66=> mh66,h67 => mh67,h68 => mh68,h69 => mh69,h70 => mh70, h71 => mh71,h72 => mh72,h73 => mh73,h74 => mh74,h75 => mh75,h76 => mh76,h77 => mh77,h78 => mh78, h79 => mh79,h80 => mh80,h81 => mh81,h82 => mh82,h83 => mh83,h84 => mh84,h85 => mh85,h86 => mh86, h87 => mh87,h88 => mh88,h89 => mh89,h90 => mh90,h91 => mh91,h92 => mh92,h93 => mh93,h94 => mh94,h95 => mh95,h96 => mh96, h97 => mh97,h98 => mh98,h99 => mh99,h100 => mh100,h101 => mh101,h102 => mh102,h103 => mh103,h104 => mh104,h105 => mh105, h106 => mh106,h107 => mh107,h108 => mh108,h109 => mh109,h110 => mh110,h111 => mh111,h112 => mh112,h113 => mh113,h114 => mh114,h115 => mh115,h116 => mh116, h117 => mh117,h118 => mh118,h119 => mh119,h120 => mh120,h121 => mh121,h122 => mh122,h123 => mh123,h124 => mh124,h125 => mh125,h126 => mh126,h127 => mh127, mhit => sg_matchhit, addr_out => sg_addr_out); process(clk,reset,start,sg_matchhit,sg_addr_out) --process variable Wr_addr :integer := 0; begin --wait until clk'event and clk = '1';

if (reset = '1') then data <= (others=>(others=>'0')); matchhit <= '0'; addr_out <= "0000000"; elsif ( clk'event and clk='1') then if (start = '0') then if (sg_matchhit = '0') then if (wr_addr = 127) then wr_addr := 0; end if; data(conv_integer(wr_addr)) <= datain; matchhit <= '0'; dataout <= datain; wr_addr := wr_addr + 1; else matchhit <= '1'; addr_out <= sg_addr_out; end if; end if; end if; end process; end Behavioral; library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; ---- Uncomment the following library declaration if instantiating ---- any Xilinx primitives in this code. --library UNISIM; --use UNISIM.VComponents.all; entity camcomp32 is Port ( reset : in STD_LOGIC; start : in STD_LOGIC; h0 : in STD_LOGIC; h1 : in STD_LOGIC; h2 : in STD_LOGIC; h3 : in STD_LOGIC; h4, h5, h6, h7 : in std_logic; h8,h9,h10,h11,h12,h13,h14,h15 : in std_logic; h16,h17,h18,h19,h20,h21,h22,h23 : in std_logic; h24,h25,h26,h27,h28,h29,h30,h31 : in std_logic; h32,h33,h34,h35,h36,h37,h38,h39 : in std_logic; h40,h41,h42,h43,h44,h45,h46,h47 : in std_logic; h48,h49,h50,h51,h52,h53,h54,h55 : in std_logic; h56,h57,h58,h59,h60,h61,h62,h63 : in std_logic; mhit : out std_logic; --Found Match addr_out : out std_logic_vector(5 downto 0)); end camcomp32; architecture Behavioral of camcomp32 is

begin process(reset,start,h0, h1, h2, h3, h4, h5, h6, h7, h8,h9,h10,h11,h12,h13,h14,h15, h16,h17,h18,h19,h20,h21,h22,h23, h24,h25,h26,h27,h28,h29,h30,h31, h32,h33,h34,h35,h36,h37,h38,h39, h40,h41,h42,h43,h44,h45,h46,h47, h48,h49,h50,h51,h52,h53,h54,h55, h56,h57,h58,h59,h60,h61,h62,h63) begin if (reset = '1') then mhit <= '0'; addr_out <= "000000"; elsif (start = '0') then if (h0 = '1') then mhit <= '1'; addr_out <= "000000"; elsif (h1 = '1') then mhit <= '1'; addr_out <= "000001"; elsif (h2 = '1') then mhit <= '1'; addr_out <= "000010"; elsif (h3 = '1') then mhit <= '1'; addr_out <= "000011"; elsif (h4 = '1') then mhit <= '1'; addr_out <= "000100"; elsif (h5 = '1') then mhit <= '1'; addr_out <= "000101"; elsif (h6 = '1') then mhit <= '1'; addr_out <= "000110"; elsif (h7 = '1') then mhit <= '1'; addr_out <= "000111"; elsif (h8 = '1') then mhit <= '1'; addr_out <= "001000"; elsif (h9 = '1') then mhit <= '1'; addr_out <= "001001"; elsif (h10 = '1') then mhit <= '1'; addr_out <= "001010"; elsif (h11 = '1') then mhit <= '1'; addr_out <= "001011"; elsif (h12 = '1') then mhit <= '1'; addr_out <= "001100";

elsif (h13 = '1') then mhit <= '1'; addr_out <= "001101"; elsif (h14 = '1') then mhit <= '1'; addr_out <= "001110"; elsif (h15 = '1') then mhit <= '1'; addr_out <= "001111"; elsif (h16 = '1') then mhit <= '1'; addr_out <= "010000"; elsif (h17 = '1') then mhit <= '1'; addr_out <= "010001"; elsif (h18 = '1') then mhit <= '1'; addr_out <= "010010"; elsif (h19 = '1') then mhit <= '1'; addr_out <= "010011"; elsif (h20 = '1') then mhit <= '1'; addr_out <= "010100"; elsif (h21 = '1') then mhit <= '1'; addr_out <= "010101"; elsif (h22 = '1') then mhit <= '1'; addr_out <= "010110"; elsif (h23 = '1') then mhit <= '1'; addr_out <= "010111"; elsif (h24 = '1') then mhit <= '1'; addr_out <= "011000"; elsif (h25 = '1') then mhit <= '1'; addr_out <= "011001"; elsif (h26 = '1') then mhit <= '1'; addr_out <= "011010"; elsif (h27 = '1') then mhit <= '1'; addr_out <= "011011"; elsif (h28 = '1') then mhit <= '1'; addr_out <= "011100"; elsif (h29 = '1') then mhit <= '1'; addr_out <= "011101"; elsif (h30 = '1') then mhit <= '1'; addr_out <= "011110";

elsif (h31 = '1') then mhit <= '1'; addr_out <= "011111"; elsif (h32 = '1') then mhit <= '1'; addr_out <= "100000"; elsif (h33 = '1') then mhit <= '1'; addr_out <= "100001"; elsif (h34 = '1') then mhit <= '1'; addr_out <= "100010"; elsif (h35 = '1') then mhit <= '1'; addr_out <= "100011"; elsif (h36 = '1') then mhit <= '1'; addr_out <= "100100"; elsif (h37 = '1') then mhit <= '1'; addr_out <= "100101"; elsif (h38 = '1') then mhit <= '1'; addr_out <= "100110"; elsif (h39 = '1') then mhit <= '1'; addr_out <= "100111"; elsif (h40 = '1') then mhit <= '1'; addr_out <= "101000"; elsif (h41 = '1') then mhit <= '1'; addr_out <= "101001"; elsif (h42 = '1') then mhit <= '1'; addr_out <= "101010"; elsif (h43 = '1') then mhit <= '1'; addr_out <= "101011"; elsif (h44 = '1') then mhit <= '1'; addr_out <= "101100"; elsif (h45 = '1') then mhit <= '1'; addr_out <= "101101"; elsif (h46 = '1') then mhit <= '1'; addr_out <= "101110"; elsif (h47 = '1') then mhit <= '1'; addr_out <= "101111"; elsif (h48 = '1') then mhit <= '1'; addr_out <= "110000";

elsif (h49 = '1') then mhit <= '1'; addr_out <= "110001"; elsif (h50 = '1') then mhit <= '1'; addr_out <= "110010"; elsif (h51 = '1') then mhit <= '1'; addr_out <= "110011"; elsif (h52 = '1') then mhit <= '1'; addr_out <= "110100"; elsif (h53 = '1') then mhit <= '1'; addr_out <= "110101"; elsif (h54 = '1') then mhit <= '1'; addr_out <= "110110"; elsif (h55 = '1') then mhit <= '1'; addr_out <= "110111"; elsif (h56 = '1') then mhit <= '1'; addr_out <= "111000"; elsif (h57 = '1') then mhit <= '1'; addr_out <= "111001"; elsif (h58 = '1') then mhit <= '1'; addr_out <= "111010"; elsif (h59 = '1') then mhit <= '1'; addr_out <= "111011"; elsif (h60 = '1') then mhit <= '1'; addr_out <= "111100"; elsif (h61 = '1') then mhit <= '1'; addr_out <= "111101"; elsif (h62 = '1') then mhit <= '1'; addr_out <= "111110"; elsif (h63 = '1') then mhit <= '1'; addr_out <= "111111"; else mhit <= '0'; end if; end if; end process; end Behavioral;
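The camcomp32 architecture above is a combinational priority encoder: the lowest-numbered hit line that is '1' wins and its index is driven on addr_out, while mhit flags whether any line was active. For reference, a minimal sketch of the same first-match-wins behaviour written with a loop is given below. The entity name camcomp32_loop and the bundling of the 64 hit lines into a single std_logic_vector are assumptions made only for illustration; they are not part of the original design.

    -- Illustrative sketch (not from the original sources): loop-based priority encoder,
    -- assuming the 64 hit lines h0..h63 are bundled into one vector "h".
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.STD_LOGIC_ARITH.ALL;

    entity camcomp32_loop is
        Port ( reset    : in  STD_LOGIC;
               start    : in  STD_LOGIC;
               h        : in  STD_LOGIC_VECTOR (63 downto 0);   -- hit lines h0..h63
               mhit     : out STD_LOGIC;
               addr_out : out STD_LOGIC_VECTOR (5 downto 0));
    end camcomp32_loop;

    architecture Behavioral of camcomp32_loop is
    begin
        process (reset, start, h)
        begin
            if reset = '1' then
                mhit     <= '0';
                addr_out <= (others => '0');
            elsif start = '0' then
                mhit <= '0';                               -- default: no match
                for i in 0 to 63 loop                      -- lowest index wins
                    if h(i) = '1' then
                        mhit     <= '1';
                        addr_out <= conv_std_logic_vector(i, 6);
                        exit;                              -- stop at the first hit
                    end if;
                end loop;
                -- addr_out keeps its previous value when no line is active,
                -- mirroring the elsif chain above.
            end if;
        end process;
    end Behavioral;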

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- Priority encoder for the 128-entry CAM: reports the lowest-numbered active hit line.
entity camcomp64 is
    Port ( reset : in STD_LOGIC;
           start : in STD_LOGIC;
           h0, h1, h2, h3, h4, h5, h6, h7 : in std_logic;
           h8,h9,h10,h11,h12,h13,h14,h15 : in std_logic;
           h16,h17,h18,h19,h20,h21,h22,h23 : in std_logic;
           h24,h25,h26,h27,h28,h29,h30,h31 : in std_logic;
           h32,h33,h34,h35,h36,h37,h38,h39 : in std_logic;
           h40,h41,h42,h43,h44,h45,h46,h47 : in std_logic;
           h48,h49,h50,h51,h52,h53,h54,h55 : in std_logic;
           h56,h57,h58,h59,h60,h61,h62,h63 : in std_logic;
           h64, h65, h66, h67, h68, h69, h70, h71 : in std_logic;
           h72,h73,h74,h75,h76,h77,h78,h79 : in std_logic;
           h80,h81,h82,h83,h84,h85,h86,h87 : in std_logic;
           h88,h89,h90,h91,h92,h93,h94,h95 : in std_logic;
           h96,h97,h98,h99,h100,h101,h102,h103 : in std_logic;
           h104,h105,h106,h107,h108,h109,h110,h111 : in std_logic;
           h112,h113,h114,h115,h116,h117,h118,h119 : in std_logic;
           h120,h121,h122,h123,h124,h125,h126,h127 : in std_logic;
           mhit : out std_logic;                            -- Found Match
           addr_out : out std_logic_vector(6 downto 0));
end camcomp64;

architecture Behavioral of camcomp64 is
begin
    process(reset, start,
            h0, h1, h2, h3, h4, h5, h6, h7, h8, h9, h10, h11, h12, h13, h14, h15,
            h16, h17, h18, h19, h20, h21, h22, h23, h24, h25, h26, h27, h28, h29, h30, h31,
            h32, h33, h34, h35, h36, h37, h38, h39, h40, h41, h42, h43, h44, h45, h46, h47,
            h48, h49, h50, h51, h52, h53, h54, h55, h56, h57, h58, h59, h60, h61, h62, h63,
            h64, h65, h66, h67, h68, h69, h70, h71, h72, h73, h74, h75, h76, h77, h78, h79,
            h80, h81, h82, h83, h84, h85, h86, h87, h88, h89, h90, h91, h92, h93, h94, h95,
            h96, h97, h98, h99, h100, h101, h102, h103, h104, h105, h106, h107, h108, h109, h110, h111,
            h112, h113, h114, h115, h116, h117, h118, h119, h120, h121, h122, h123, h124, h125, h126, h127)
    begin
        if (reset = '1') then

mhit <= '0'; addr_out <= "0000000"; elsif (start = '0') then if (h0 = '1') then mhit <= '1'; addr_out <= "0000000"; elsif (h1 = '1') then mhit <= '1'; addr_out <= "0000001"; elsif (h2 = '1') then mhit <= '1'; addr_out <= "0000010"; elsif (h3 = '1') then mhit <= '1'; addr_out <= "0000011"; elsif (h4 = '1') then mhit <= '1'; addr_out <= "0000100"; elsif (h5 = '1') then mhit <= '1'; addr_out <= "0000101"; elsif (h6 = '1') then mhit <= '1'; addr_out <= "0000110"; elsif (h7 = '1') then mhit <= '1'; addr_out <= "0000111"; elsif (h8 = '1') then mhit <= '1'; addr_out <= "0001000"; elsif (h9 = '1') then mhit <= '1'; addr_out <= "0001001"; elsif (h10 = '1') then mhit <= '1'; addr_out <= "0001010"; elsif (h11 = '1') then mhit <= '1'; addr_out <= "0001011"; elsif (h12 = '1') then mhit <= '1'; addr_out <= "0001100"; elsif (h13 = '1') then mhit <= '1'; addr_out <= "0001101"; elsif (h14 = '1') then mhit <= '1'; addr_out <= "0001110"; elsif (h15 = '1') then mhit <= '1'; addr_out <= "0001111"; elsif (h16 = '1') then mhit <= '1'; addr_out <= "0010000";

elsif (h17 = '1') then mhit <= '1'; addr_out <= "0010001"; elsif (h18 = '1') then mhit <= '1'; addr_out <= "0010010"; elsif (h19 = '1') then mhit <= '1'; addr_out <= "0010011"; elsif (h20 = '1') then mhit <= '1'; addr_out <= "0010100"; elsif (h21 = '1') then mhit <= '1'; addr_out <= "0010101"; elsif (h22 = '1') then mhit <= '1'; addr_out <= "0010110"; elsif (h23 = '1') then mhit <= '1'; addr_out <= "0010111"; elsif (h24 = '1') then mhit <= '1'; addr_out <= "0011000"; elsif (h25 = '1') then mhit <= '1'; addr_out <= "0011001"; elsif (h26 = '1') then mhit <= '1'; addr_out <= "0011010"; elsif (h27 = '1') then mhit <= '1'; addr_out <= "0011011"; elsif (h28 = '1') then mhit <= '1'; addr_out <= "0011100"; elsif (h29 = '1') then mhit <= '1'; addr_out <= "0011101"; elsif (h30 = '1') then mhit <= '1'; addr_out <= "0011110"; elsif (h31 = '1') then mhit <= '1'; addr_out <= "0011111"; elsif (h32 = '1') then mhit <= '1'; addr_out <= "0100000"; elsif (h33 = '1') then mhit <= '1'; addr_out <= "0100001"; elsif (h34 = '1') then mhit <= '1'; addr_out <= "0100010";

elsif (h35 = '1') then mhit <= '1'; addr_out <= "0100011"; elsif (h36 = '1') then mhit <= '1'; addr_out <= "0100100"; elsif (h37 = '1') then mhit <= '1'; addr_out <= "0100101"; elsif (h38 = '1') then mhit <= '1'; addr_out <= "0100110"; elsif (h39 = '1') then mhit <= '1'; addr_out <= "0100111"; elsif (h40 = '1') then mhit <= '1'; addr_out <= "0101000"; elsif (h41 = '1') then mhit <= '1'; addr_out <= "0101001"; elsif (h42 = '1') then mhit <= '1'; addr_out <= "0101010"; elsif (h43 = '1') then mhit <= '1'; addr_out <= "0101011"; elsif (h44 = '1') then mhit <= '1'; addr_out <= "0101100"; elsif (h45 = '1') then mhit <= '1'; addr_out <= "0101101"; elsif (h46 = '1') then mhit <= '1'; addr_out <= "0101110"; elsif (h47 = '1') then mhit <= '1'; addr_out <= "0101111"; elsif (h48 = '1') then mhit <= '1'; addr_out <= "0110000"; elsif (h49 = '1') then mhit <= '1'; addr_out <= "0110001"; elsif (h50 = '1') then mhit <= '1'; addr_out <= "0110010"; elsif (h51 = '1') then mhit <= '1'; addr_out <= "0110011"; elsif (h52 = '1') then mhit <= '1'; addr_out <= "0110100";

elsif (h53 = '1') then mhit <= '1'; addr_out <= "0110101"; elsif (h54 = '1') then mhit <= '1'; addr_out <= "0110110"; elsif (h55 = '1') then mhit <= '1'; addr_out <= "0110111"; elsif (h56 = '1') then mhit <= '1'; addr_out <= "0111000"; elsif (h57 = '1') then mhit <= '1'; addr_out <= "0111001"; elsif (h58 = '1') then mhit <= '1'; addr_out <= "0111010"; elsif (h59 = '1') then mhit <= '1'; addr_out <= "0111011"; elsif (h60 = '1') then mhit <= '1'; addr_out <= "0111100"; elsif (h61 = '1') then mhit <= '1'; addr_out <= "0111101"; elsif (h62 = '1') then mhit <= '1'; addr_out <= "0111110"; elsif (h63 = '1') then mhit <= '1'; addr_out <= "0111111"; elsif (h64 = '1') then mhit <= '1'; addr_out <= "1000000"; elsif (h65 = '1') then mhit <= '1'; addr_out <= "1000001"; elsif (h66= '1') then mhit <= '1'; addr_out <= "1000010"; elsif (h67 = '1') then mhit <= '1'; addr_out <= "1000011"; elsif (h68 = '1') then mhit <= '1'; addr_out <= "1000100"; elsif (h69 = '1') then mhit <= '1'; addr_out <= "1000101"; elsif (h70 = '1') then mhit <= '1'; addr_out <= "1000110";

elsif (h71 = '1') then mhit <= '1'; addr_out <= "1000111"; elsif (h72 = '1') then mhit <= '1'; addr_out <= "1001000"; elsif (h73 = '1') then mhit <= '1'; addr_out <= "1001001"; elsif (h74 = '1') then mhit <= '1'; addr_out <= "1001010"; elsif (h75= '1') then mhit <= '1'; addr_out <= "1001011"; elsif (h76 = '1') then mhit <= '1'; addr_out <= "1001100"; elsif (h77 = '1') then mhit <= '1'; addr_out <= "1001101"; elsif (h78= '1') then mhit <= '1'; addr_out <= "1001110"; elsif (h79 = '1') then mhit <= '1'; addr_out <= "1001111"; elsif (h80 = '1') then mhit <= '1'; addr_out <= "1010000"; elsif (h81 = '1') then mhit <= '1'; addr_out <= "1010001"; elsif (h82 = '1') then mhit <= '1'; addr_out <= "1010010"; elsif (h83= '1') then mhit <= '1'; addr_out <= "1010011"; elsif (h84 = '1') then mhit <= '1'; addr_out <= "1010100"; elsif (h85 = '1') then mhit <= '1'; addr_out <= "1010101"; elsif (h86 = '1') then mhit <= '1'; addr_out <= "1010110"; elsif (h87= '1') then mhit <= '1'; addr_out <= "1010111"; elsif (h88 = '1') then mhit <= '1'; addr_out <= "1011000";

elsif (h89 = '1') then mhit <= '1'; addr_out <= "1011001"; elsif (h90 = '1') then mhit <= '1'; addr_out <= "1011010"; elsif (h91 = '1') then mhit <= '1'; addr_out <= "1011011"; elsif (h92 = '1') then mhit <= '1'; addr_out <= "1011100"; elsif (h93= '1') then mhit <= '1'; addr_out <= "1011101"; elsif (h94 = '1') then mhit <= '1'; addr_out <= "1011110"; elsif (h95 = '1') then mhit <= '1'; addr_out <= "1011111"; elsif (h96 = '1') then mhit <= '1'; addr_out <= "1100000"; elsif (h97 = '1') then mhit <= '1'; addr_out <= "1100001"; elsif (h98 = '1') then mhit <= '1'; addr_out <= "1100010"; elsif (h99 = '1') then mhit <= '1'; addr_out <= "1100011"; elsif (h100= '1') then mhit <= '1'; addr_out <= "1100100"; elsif (h101 = '1') then mhit <= '1'; addr_out <= "1100101"; elsif (h102 = '1') then mhit <= '1'; addr_out <= "1100110"; elsif (h103 = '1') then mhit <= '1'; addr_out <= "1100111"; elsif (h104 = '1') then mhit <= '1'; addr_out <= "1101000"; elsif (h105 = '1') then mhit <= '1'; addr_out <= "1101001"; elsif (h106 = '1') then mhit <= '1'; addr_out <= "1101010";

elsif (h107 = '1') then mhit <= '1'; addr_out <= "1101011"; elsif (h108 = '1') then mhit <= '1'; addr_out <= "1101100"; elsif (h109 = '1') then mhit <= '1'; addr_out <= "1101101"; elsif (h110 = '1') then mhit <= '1'; addr_out <= "1101110"; elsif (h111 = '1') then mhit <= '1'; addr_out <= "1101111"; elsif (h112 = '1') then mhit <= '1'; addr_out <= "1110000"; elsif (h113 = '1') then mhit <= '1'; addr_out <= "1110001"; elsif (h114 = '1') then mhit <= '1'; addr_out <= "1110010"; elsif (h115= '1') then mhit <= '1'; addr_out <= "1110011"; elsif (h116 = '1') then mhit <= '1'; addr_out <= "1110100"; elsif (h117 = '1') then mhit <= '1'; addr_out <= "1110101"; elsif (h118 = '1') then mhit <= '1'; addr_out <= "1110110"; elsif (h119 = '1') then mhit <= '1'; addr_out <= "1110111"; elsif (h120 = '1') then mhit <= '1'; addr_out <= "1111000"; elsif (h121 = '1') then mhit <= '1'; addr_out <= "1111001"; elsif (h122 = '1') then mhit <= '1'; addr_out <= "1111010"; elsif (h123 = '1') then mhit <= '1'; addr_out <= "1111011"; elsif (h124= '1') then mhit <= '1'; addr_out <= "1111100";

        elsif (h125 = '1') then
            mhit <= '1'; addr_out <= "1111101";
        elsif (h126 = '1') then
            mhit <= '1'; addr_out <= "1111110";
        elsif (h127 = '1') then
            mhit <= '1'; addr_out <= "1111111";
        else
            mhit <= '0';                     -- no hit line is active
        end if;
        end if;
    end process;
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- 32-bit equality comparator: one per CAM location, drives that location's hit line.
entity comparator32 is
    Port ( data1 : in STD_LOGIC_VECTOR (31 downto 0);
           data2 : in STD_LOGIC_VECTOR (31 downto 0);
           reset : in STD_LOGIC;
           eqout : out STD_LOGIC);
end comparator32;

architecture Behavioral of comparator32 is
begin
    process(reset, data1, data2)
    begin
        if (reset = '1') then
            eqout <= '0';
        else
            if (data1 = data2) then
                eqout <= '1';                -- incoming word equals the stored word
            else
                eqout <= '0';
            end if;
        end if;
    end process;
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- 64-bit equality comparator: one per location of the 64-bit CAM.
entity comparator64 is
    Port ( data1 : in STD_LOGIC_VECTOR (63 downto 0);
           data2 : in STD_LOGIC_VECTOR (63 downto 0);
           reset : in STD_LOGIC;
           eqout : out STD_LOGIC);
end comparator64;

architecture Behavioral of comparator64 is
begin
    process(reset, data1, data2)
    begin
        if (reset = '1') then
            eqout <= '0';
        else
            if (data1 = data2) then
                eqout <= '1';
            else
                eqout <= '0';
            end if;
        end if;
    end process;
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- Top-level wrapper that pairs two 32-bit XMatchPro channels to cover a 64-bit word.
entity llcomp is
    Port ( clk, reset, start : in std_logic;
           data_in   : in  STD_LOGIC_VECTOR (63 downto 0);
           data_out  : out STD_LOGIC_VECTOR (63 downto 0);
           addr_out  : out STD_LOGIC_VECTOR (11 downto 0);
           match_hit : out STD_LOGIC_VECTOR (1 downto 0));
end llcomp;

architecture Behavioral of llcomp is

    component xmatchpro32
        port( clk, reset : in std_logic;
              start   : in std_logic;
              udata   : in std_logic_vector (31 downto 0);
              dataout : out std_logic_vector(31 downto 0);
              addrout : out std_logic_vector (5 downto 0);

              matchhit : out std_logic);
    end component;

    signal d_out0, d_out1 : std_logic_vector(31 downto 0);
    signal a_out0, a_out1 : std_logic_vector(5 downto 0);
    signal mhit0, mhit1   : std_logic;

begin

    -- One 32-bit channel per half of the 64-bit input word.
    A1: xmatchpro32 port map ( clk, reset, start,
        udata => data_in(31 downto 0), dataout => d_out0(31 downto 0),
        addrout => a_out0(5 downto 0), matchhit => mhit0);
    A2: xmatchpro32 port map ( clk, reset, start,
        udata => data_in(63 downto 32), dataout => d_out1(31 downto 0),
        addrout => a_out1(5 downto 0), matchhit => mhit1);

    -- Combine the two channel results: literal data is passed through on a miss,
    -- the 6-bit match address is forwarded on a hit, and match_hit carries one
    -- flag per half-word.
    process(mhit1, mhit0, a_out1, a_out0, d_out1, d_out0) -- also can add a_out0,a_out1,d_out0,d_out1
    begin
        if mhit1 = '0' then
            if mhit0 = '0' then
                data_out <= d_out1 & d_out0;
            else
                data_out <= d_out1 & "00000000000000000000000000000000";
                addr_out <= "000000" & a_out0;
            end if;
        elsif mhit1 = '1' then
            if mhit0 = '0' then
                data_out <= "00000000000000000000000000000000" & d_out0;
                addr_out <= a_out1 & "000000";
            else
                addr_out <= a_out1 & a_out0;
            end if;
        end if;
        match_hit <= mhit1 & mhit0;
    end process;

end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- 32-bit XMatchPro channel: thin wrapper around the 64-location CAM (cam32).
entity xmatchpro32 is
    Port ( clk      : in STD_LOGIC;
           reset    : in STD_LOGIC;
           start    : in STD_LOGIC;
           udata    : in STD_LOGIC_VECTOR (31 downto 0);
           dataout  : out STD_LOGIC_VECTOR (31 downto 0);
           addrout  : out STD_LOGIC_VECTOR (5 downto 0);
           matchhit : out STD_LOGIC);
end xmatchpro32;

architecture Behavioral of xmatchpro32 is

component cam32 port (clk : in std_logic; -- Clock reset : in std_logic; -- Reset start : in std_logic; -- Write datain : in std_logic_vector(31 downto 0);-- Tag Data dataout : out std_logic_vector(31 downto 0);-- Data out mh0, mh1, mh2, mh3, mh4, mh5, mh6, mh7 : inout std_logic; mh8,mh9,mh10,mh11,mh12,mh13,mh14,mh15 : inout std_logic; mh16,mh17,mh18,mh19,mh20,mh21,mh22,mh23 : inout std_logic; mh24,mh25,mh26,mh27,mh28,mh29,mh30,mh31 : inout std_logic; mh32,mh33,mh34,mh35,mh36,mh37,mh38,mh39 : inout std_logic; mh40,mh41,mh42,mh43,mh44,mh45,mh46,mh47 : inout std_logic; mh48,mh49,mh50,mh51,mh52,mh53,mh54,mh55 : inout std_logic; mh56,mh57,mh58,mh59,mh60,mh61,mh62,mh63 : inout std_logic; matchhit : out std_logic; --Found Match addr_out : out std_logic_vector(5 downto 0)); end component; begin u3:cam32 port map (clk => clk, reset => reset, start => start, datain => udata, dataout => dataout, mh0 => open, mh1 => open, mh2 => open, mh3 => open, mh4 => open, mh5 => open, mh6 => open, mh7 => open, mh8 => open, mh9 => open, mh10 => open, mh11 => open, mh12 => open, mh13 => open, mh14 => open, mh15 => open, mh16 => open, mh17 => open, mh18 => open, mh19 => open, mh20 => open, mh21 => open, mh22 => open, mh23 => open, mh24 => open, mh25 => open, mh26 => open, mh27 => open, mh28 => open, mh29 => open, mh30 => open, mh31 => open, mh32 => open, mh33 => open, mh34 => open, mh35 => open, mh36 => open, mh37 => open, mh38 => open, mh39 => open, mh40 => open, mh41 => open, mh42 => open, mh43 => open, mh44 => open, mh45 => open, mh46 => open, mh47 => open, mh48 => open, mh49 => open, mh50 => open, mh51 => open, mh52 => open, mh53 => open, mh54 => open, mh55 => open, mh56 => open, mh57 => open, mh58 => open, mh59 => open, mh60 => open, mh61 => open, mh62 => open, mh63 => open, matchhit => matchhit, addr_out => addrout); end Behavioral; library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; ---- Uncomment the following library declaration if instantiating ---- any Xilinx primitives in this code. --library UNISIM; --use UNISIM.VComponents.all; entity xmatchpro64 is

Port ( clk : in STD_LOGIC; reset : in STD_LOGIC; start : in STD_LOGIC; udata : in STD_LOGIC_VECTOR (63 downto 0); dataout : out STD_LOGIC_VECTOR (63 downto 0); addrout : out STD_LOGIC_VECTOR (6 downto 0); matchhit : out STD_LOGIC); end xmatchpro64; architecture Behavioral of xmatchpro64 is component cam64 port (clk : in std_logic; -- Clock reset : in std_logic; -- Reset start : in std_logic; -- Write datain : in std_logic_vector(63 downto 0);-- Tag Data dataout : out std_logic_vector(63 downto 0);-- Data out mh0, mh1, mh2, mh3, mh4, mh5, mh6, mh7 : inout std_logic; mh8,mh9,mh10,mh11,mh12,mh13,mh14,mh15 : inout std_logic; mh16,mh17,mh18,mh19,mh20,mh21,mh22,mh23 : inout std_logic; mh24,mh25,mh26,mh27,mh28,mh29,mh30,mh31 : inout std_logic; mh32,mh33,mh34,mh35,mh36,mh37,mh38,mh39 : inout std_logic; mh40,mh41,mh42,mh43,mh44,mh45,mh46,mh47 : inout std_logic; mh48,mh49,mh50,mh51,mh52,mh53,mh54,mh55 : inout std_logic; mh56,mh57,mh58,mh59,mh60,mh61,mh62,mh63 : inout std_logic; mh64, mh65, mh66, mh67, mh68, mh69, mh70, mh71 : inout std_logic; mh72,mh73,mh74,mh75,mh76,mh77,mh78,mh79 : inout std_logic; mh80,mh81,mh82,mh83,mh84,mh85,mh86,mh87 : inout std_logic; mh88,mh89,mh90,mh91,mh92,mh93,mh94,mh95 : inout std_logic; mh96,mh97,mh98,mh99,mh100,mh101,mh102,mh103 : inout std_logic; mh104,mh105,mh106,mh107,mh108,mh109,mh110,mh111 : inout std_logic; mh112,mh113,mh114,mh115,mh116,mh117,mh118,mh119 : inout std_logic; mh120,mh121,mh122,mh123,mh124,mh125,mh126,mh127 : inout std_logic; matchhit : out std_logic; --Found Match addr_out : out std_logic_vector(6 downto 0)); end component; begin u3:cam64 port map (clk => clk,reset => reset,start => start,datain => udata,dataout => dataout, mh0 => open, mh1 => open, mh2 => open, mh3 => open, mh4 => open, mh5 => open, mh6 => open, mh7 => open, mh8 => open, mh9 => open, mh10 => open, mh11 => open, mh12 => open, mh13 => open, mh14 => open, mh15 => open, mh16 => open, mh17 => open, mh18 => open, mh19 => open, mh20 => open, mh21 => open, mh22 => open, mh23 => open, mh24 => open, mh25 => open, mh26 => open, mh27 => open, mh28 => open, mh29 => open, mh30 => open, mh31 => open, mh32 => open, mh33 => open, mh34 => open, mh35 => open, mh36 => open, mh37 => open, mh38 => open, mh39 => open, mh40 => open, mh41 => open, mh42 => open, mh43 => open, mh44 => open, mh45 => open, mh46 => open, mh47 => open, mh48 => open, mh49 => open, mh50 => open, mh51 => open, mh52 => open, mh53 => open, mh54 => open, mh55 => open,

mh56 => open, mh57 => open, mh58 => open, mh59 => open, mh60 => open, mh61 => open, mh62 => open, mh63 => open, mh64 => open,mh65 => open,mh66 => open,mh67 => open,mh68 => open, mh69 => open,mh70 => open,mh71 => open,mh72 => open,mh73 => open, mh74 => open,mh75 => open,mh76 => open,mh77 => open,mh78 => open, mh79 => open,mh80 => open,mh81 => open,mh82 => open,mh83 => open, mh84 => open,mh85 => open,mh86 => open,mh87 => open,mh88 => open, mh89 => open,mh90 => open,mh91 => open,mh92 => open,mh93 => open,mh94 => open, mh95 => open,mh96 => open,mh97 => open,mh98 => open,mh99 => open,mh100 => open, mh101 => open,mh102 => open,mh103 => open,mh104 => open,mh105 => open,mh106 => open, mh107 => open,mh108 => open,mh109 => open,mh110 => open,mh111 => open,mh112 => open, mh113 => open,mh114 => open,mh115 => open,mh116 => open,mh117 => open,mh118 => open, mh119 => open,mh120 => open,mh121 => open,mh122 => open,mh123 => open,mh124 => open,mh125=> open, mh126 => open,mh127 => open, matchhit => matchhit, addr_out => addrout); end Behavioral;
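The listing ends with the structural wrappers above. As a quick way to exercise the design in simulation, a small testbench sketch for the 64-bit wrapper xmatchpro64 is given below. The testbench name, clock period and stimulus words are assumptions chosen only to illustrate the expected behaviour (a repeated input word should raise matchhit with the address at which its first occurrence was stored); the sketch is not part of the original sources.

    -- Illustrative simulation sketch (not from the original sources).
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;

    entity tb_xmatchpro64 is
    end tb_xmatchpro64;

    architecture sim of tb_xmatchpro64 is
        signal clk      : std_logic := '0';
        signal reset    : std_logic := '1';
        signal start    : std_logic := '0';
        signal udata    : std_logic_vector(63 downto 0) := (others => '0');
        signal dataout  : std_logic_vector(63 downto 0);
        signal addrout  : std_logic_vector(6 downto 0);
        signal matchhit : std_logic;
    begin
        uut : entity work.xmatchpro64
            port map (clk => clk, reset => reset, start => start,
                      udata => udata, dataout => dataout,
                      addrout => addrout, matchhit => matchhit);

        clk <= not clk after 5 ns;           -- assumed 10 ns clock period

        stimulus : process
        begin
            wait for 20 ns;
            reset <= '0';                     -- release reset; CAM is empty
            udata <= x"0123456789ABCDEF";     -- first word: no match, gets stored
            wait for 10 ns;
            udata <= x"FEDCBA9876543210";     -- second word: stored at the next address
            wait for 10 ns;
            udata <= x"0123456789ABCDEF";     -- repeat of the first word: matchhit expected
            wait for 20 ns;
            wait;                             -- end of stimulus
        end process;
    end sim;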
