
AMD FUSION

MITHUN.M VII Sem, B.Tech, CSE Department, NCERC, Thrissur, Kerala.

ABSTRACT
AMD Fusion is the first fusion processor: a combination of a graphics processing unit (GPU) and a central processing unit (CPU) on one die, called an accelerated processing unit (APU). Given a full hybrid CPU-and-graphics workload, it can smoothly decode 1080p Blu-ray video, calculate Pi to 32 million decimal places, and generate particle effects on the GPU, all at the same time. There are two flavors of Fusion currently or nearly available: one with its CPU logic based on the Bobcat core, and the other with its CPU logic based on the K10 core.

Introduction
Khan and Kundu (2011) documented that the semiconductor industry prides itself on rapid improvements in system performance, yet hardware that runs fast enough to enable these advanced capabilities still costs far too much for high-volume deployment. Every two years, advances in semiconductor technology allow chip architects to double the number of transistors that fit in a given area of silicon. Over the past decade, these extra transistors have been used to increase the size of on-chip caches and to add more x86 processor cores, making today's CPUs the fastest processors ever. Traditional CPU architectures and application programming tools, optimized for scalar data structures and serial algorithms, fit poorly with the new vector-oriented, multi-threaded, data-parallel programming models.

Advanced Micro Devices (AMD) Fusion is the marketing name for a series of APUs from AMD. There are two flavors of Fusion currently or nearly available: one with its CPU logic based on the Bobcat core, and the other with its CPU logic based on the K10 core. In both cases the GPU logic is a Radeon HD 6000-series design, itself based on the mobile variant of the Radeon HD 5000 series. Fusion was announced in 2006 and has been in development since then. The final design is a product of the merger between AMD and ATI, combining general-purpose processor execution with 3D geometry processing and the other functions of modern GPUs in a single unit.

Graphics processing units (GPUs), originally intended to enhance three-dimensional (3D) visualization, have evolved into powerful, programmable vector processors that can accelerate a wide variety of software applications. Software tools such as DirectCompute and OpenCL let developers create standards-based applications that combine the power of CPU cores and programmable GPU cores, and run on a wide variety of hardware platforms. AMD's forthcoming accelerated processing units (APUs) build upon this momentum and take personal computer (PC) computing to the next level. These new processors are designed to accelerate multimedia and vector processing applications, enhance the end user's PC experience, reduce power consumption, and offer a superior visual graphics experience at mainstream system price points.

Accelerated Processing Unit
As pointed out by Johnson et al. (2011), AMD's new accelerated processing units combine general-purpose x86 CPU cores with programmable vector processing engines on a single silicon die. AMD's APUs also include a variety of critical system elements, including memory controllers, input/output controllers, specialized video decoders, display outputs, and bus interfaces, but the real appeal of these chips stems from the inclusion of both scalar and vector hardware as full-fledged processing elements. AMD's APUs are set to arrive in a variety of shapes and sizes adapted to the requirements of their target markets. These APUs combine multiple superscalar x86 processor cores with an array of programmable single-instruction, multiple-data (SIMD) engines leveraged from AMD's discrete graphics portfolio. The key point is that all the major system elements (x86 cores, vector SIMD engines, and a unified video decoder, or UVD, for HD decoding tasks) attach directly to the same high-speed bus, and thus to main system memory.
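As an illustration of the data-parallel style that tools such as OpenCL expose, a per-element "kernel" is mapped over every index of its input arrays; a GPU runtime would launch one work-item per index in parallel. The sketch below is a pure-Python stand-in (the function names are hypothetical, not AMD's or OpenCL's actual API), so it simply loops where real hardware would run the work-items across SIMD lanes:

```python
def saxpy_kernel(i, a, x, y, out):
    # Per-element "kernel": work-item i computes only output element i,
    # independently of every other element.
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    # Stand-in for a GPU runtime: on real hardware the n work-items
    # would execute in parallel; here we iterate serially.
    for i in range(n):
        kernel(i, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
launch(saxpy_kernel, 4, 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because each element is computed independently, the same kernel scales transparently from a handful of CPU threads to hundreds of GPU cores.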
This design concept eliminates one of the fundamental constraints that limits the performance of traditional integrated graphics processors (IGPs). Transistor budget constraints typically mandated a two-chip solution for such systems, with the system architecture forcing a chip-to-chip crossing between the memory controller and either the CPU or the GPU. These transfers add memory latency, consume system power, and thus hurt battery life. In an APU, the scalar x86 cores and SIMD engines share a common path to system memory, avoiding these constraints. Total system performance can be further enhanced through the addition of a discrete GPU: the common architecture of the APU and GPU allows a multi-GPU configuration in which the system scales to harness all available resources for exceptional graphics and truly breathtaking overall performance.

Although the APU's scalar x86 cores and SIMD engines share a common path to system memory, AMD's first-generation implementations divide that memory into regions managed by the operating system running on the x86 cores and regions managed by software running on the SIMD engines. AMD provides high-speed block transfer engines that move data between the x86 and SIMD memory partitions. Unlike transfers between an external frame buffer and system memory, these transfers never hit the system's external bus. Software developers can overlap the loading and unloading of blocks in SIMD memory with execution involving data in other blocks. Insight 64 anticipates that future APU architectures will evolve towards a more seamless memory management model that allows even higher levels of balanced performance scaling. Because AMD's architecture weaves x86 cores and GPU cores into a single hardware fabric, software can now begin to weave high-performance vector algorithms into programs that were previously constrained by the limited computational capabilities of conventional scalar processors, even when arranged in multi-core configurations.

Fusion System Architecture
Maniatakos et al. (2011) noted that AMD's new series is a CPU and GPU on one chip (an APU). The series is just the first step in unifying the CPU and GPU architectures in such a way that they are seen as a single, multi-purpose processing unit.
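The overlapping of block transfers with execution described above is classic double buffering. The sketch below models it with hypothetical helper names, using a thread as a stand-in for the block transfer engine: while the SIMD side processes one block, the engine is already staging the next.

```python
import threading
import queue

def transfer_engine(blocks, staging):
    # Stand-in for the block transfer engine: moves blocks from the
    # x86 partition into the SIMD partition's staging area.
    for block in blocks:
        staging.put(block)
    staging.put(None)  # sentinel: no more blocks

def process(block):
    # Stand-in for SIMD execution on one block.
    return [x * 2 for x in block]

blocks = [[1, 2], [3, 4], [5, 6]]
staging = queue.Queue(maxsize=1)  # small buffer: transfer overlaps compute
mover = threading.Thread(target=transfer_engine, args=(blocks, staging))
mover.start()

results = []
while (block := staging.get()) is not None:
    # While we process this block, the mover thread is already
    # transferring the next one into the staging area.
    results.append(process(block))
mover.join()
print(results)  # [[2, 4], [6, 8], [10, 12]]
```

The bounded queue is what creates the overlap: the transfer engine stays at most one block ahead, so transfer time hides behind compute time instead of adding to it.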
It is called the Fusion System Architecture (FSA). The very long instruction word (VLIW) design has finally been replaced with multiple-instruction, multiple-data (MIMD) execution, but instead of a purely scalar setup, the FSA uses a combination of vector and scalar units, which should massively improve its ability to deal with varied, general processing tasks. SIMD is still supported, and everything remains massively parallel. These features are wrapped up into what AMD calls compute units (CUs): entirely discrete cores that even come with their own blocks of L1 cache. These compute units have x86-64 memory controllers able to read and write installed system RAM, which makes AMD's new chips incredibly flexible, much as Nvidia has worked to make its chips suitable for general-purpose GPU (GPGPU) tasks. Complementing the CPU-like Fusion System Architecture is Microsoft's C++ Accelerated Massive Parallelism (C++ AMP), which allows developers to program GPUs in much the same way they program CPUs. C++ AMP is not designed to replace architecture-specific platforms such as Nvidia's CUDA; rather, it is designed to give developers easy access to GPUs from a familiar build environment. With the Fusion System Architecture, AMD's approach now actually resembles that of reduced instruction set computing (RISC) designs. The Fusion System Architecture is shown in Figure 1.

Figure 1. Fusion system architecture.

The Fusion System Architecture is designed to let developers write programs that transparently use whatever hardware is under the hood, exactly like the apps found on ARM-powered Android devices and iPhones. Instead of going with a purely scalar setup like Nvidia, AMD opted for a vector-plus-scalar solution. The new architecture revolves around the compute unit (CU), which contains all of the functional units. The CU can almost be viewed as a fully independent processor: it features its own level-1 (L1) cache, branch and message unit, control and decode unit, instruction fetch arbitration functionality, and the scalar and vector units. The vector units are the primary workers in the CU. Each CU contains four vector cores and allows four wavefronts to be processed at any one time, now that AMD has stepped away from the VLIW5/VLIW4 architectures and gone with a vector-plus-scalar setup. The scalar unit is responsible for all of the pointer and branching code. This particular setup harkens back to the Cray supercomputers of the 1980s, where the combination of scalar and vector processors proved very effective. The combination of these processors and the overall design of each CU give it the properties of several different types of units. It is multiple-instruction, multiple-data (MIMD) in that it can address four threads per cycle per vector unit, from different applications. It acts as single-instruction, multiple-data (SIMD), much like the previous generation of GPUs. Finally, it has symmetric multithreading (SMT), so that all four vector cores can be working on different instructions, with up to 40 wavefronts active in each CU at any one time.

Memory and Caches
Each CU has its own L1 cache, divided into data, instruction, and load/store portions. The GPU then has a shared L2 cache which is fully coherent. Each L1 cache has a 64-bit interface with the L2, and as this scales in terms of both CU count and GPU clock speed, multiple terabytes per second of bandwidth can be expected between the caches. The L1 caches and texture caches are now read/write, as compared to the read-only units in previous architectures. This is a big nod not only to efficiency and performance, but also to the type of caches needed for serious compute workloads. The next level of memory support is full virtualization of memory with the CPU; the GPU supports multiple asynchronous and independent command streams.

Previous generations of products were limited by memory and cache. This posed limitations not only on graphics content, but was also problematic in compute scenarios: large data sets proved troublesome and required a memory virtualization system separate from the CPU's virtual memory. Adopting x86-64 virtual memory support on the GPU gets rid of many of the problems seen in previous cards.

Graphical Processing Unit
The GPU shares the virtual memory space, which improves data handling and locality, and lets it gracefully survive things like page faults and oversubscription. This again is aimed at improving the programming model. With virtual memory, the GPU's state is not hidden, which should also allow fast context switches as well as context-switch pre-emption. State changes and context switches can be quite costly, so in an environment that features both graphics and compute workloads, the added features described above should make things go a whole lot smoother, as well as significantly faster, limiting the amount of downtime per compute unit. It also opens up new advantages for traditional graphics. Megatextures, which will not fit in a card's frame buffer, can be stored in virtual memory; while not as fast as onboard memory, this is still far faster than loading the texture from the hard drive. This memory virtualization will be shared between discrete GPUs as well as integrated parts. The GPU has access to the CPU memory controller, and in the Fusion parts it is actually given priority over the CPU. Current quad-core CPUs really only consume around 8 to 12 GB per second of bandwidth when fully loaded. AMD reworked its memory controller, and it can feed upwards of 30 GB per second of data to the GPU. Stream benchmarks will not show this kind of utilization, but in testing there are very distinct performance improvements in graphics applications when going from the 1.3 GHz speed up to the 1.8 GHz speed supported by the new AMD processors. The CPU can handle the more serial operations, while the GPU goes for the

highly parallel ones. With the shared virtual memory space, the CPU and GPU can schedule work for each other.

Vector Processors
Jachan et al. (2009) explained that vector processors, like those used in advanced GPUs, have dozens and sometimes hundreds of calculating units that operate simultaneously. When an application wants to add two thousand-element vectors using ten of the system's available processing units, the vector software restructures the work so that each calculation executes simultaneously on ten separate elements, and thus completes the work in as little as one tenth of the time. The operations applied to any element of a vector can be performed independently of the operations applied to the other elements of that same vector. For small data arrays, however, the overhead of setting up vector operations can outweigh the time saved through parallel execution, and many problems and algorithms have proven a poor fit for this technology and are best handled using scalar approaches. AMD's accelerated processing unit designs are built to tackle scalar workloads using AMD's proven x86 core technology and vector workloads using enhanced versions of its GPU technology. AMD had to overcome many technical challenges to merge its vector and scalar technologies in a manner that preserves the advantages of both; owning the core IP for both kinds of processing element gives AMD a significant advantage over other hardware designs.

Conclusion
The AMD Fusion family of accelerated processing units is scheduled to arrive in 2011. Its compatibility with Windows 7 and DirectX 11 will ensure an outstanding experience for those who purchase PCs based on these processors. Its processing power and power efficiency will enable sharp and clear video, realistic and responsive games, and notebooks that run longer between battery charges.
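The ten-way partitioning described in the Vector Processors section can be sketched as follows. This is a toy model, not AMD's scheduler: each "processing unit" is a thread that owns one contiguous slice of the input vectors, and the slices are independent, so all units can run at once.

```python
from concurrent.futures import ThreadPoolExecutor

def add_partition(lo, hi, a, b, out):
    # One processing unit handles one contiguous slice of the vectors.
    for i in range(lo, hi):
        out[i] = a[i] + b[i]

def vector_add(a, b, units=10):
    # Split two thousand-element vectors across `units` workers, as in
    # the ten-unit example above; each slice is independent of the rest.
    n = len(a)
    out = [0] * n
    step = n // units
    bounds = [(k * step, n if k == units - 1 else (k + 1) * step)
              for k in range(units)]
    with ThreadPoolExecutor(max_workers=units) as pool:
        for lo, hi in bounds:
            pool.submit(add_partition, lo, hi, a, b, out)
    return out

a = list(range(1000))
b = list(range(1000))
result = vector_add(a, b)
print(result[:5])  # [0, 2, 4, 6, 8]
```

Because no slice reads another slice's elements, the workers need no synchronization beyond joining at the end, which is exactly the property that makes vector workloads parallelize so cleanly.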
The processing power of AMD Fusion processors will allow users to tackle problems that lie beyond the capabilities of today's mainstream systems. It will enable developers to step up and update existing applications, or invent new programs that take advantage of GPU acceleration. These features will be a standard part of every APU. The dramatic increase in performance enabled by AMD Fusion technology can create new opportunities for entrepreneurial developers to innovate and make the world a better and richer place.

References:
1. Jachan, M., G. Matz and F. Hlawatsch (2009). "Vector Time-Frequency AR Models for Nonstationary Multivariate Random Processes", IEEE Transactions on Signal Processing, 57(12), pp. 4646-4658.
2. Johnson, M.K., K. Dale, S. Avidan, H. Pfister, W.T. Freeman and W. Matusik (2011). "CG2Real: Improving the Realism of Computer Generated Images Using a Large Collection of Photographs", IEEE Transactions on Visualization and Computer Graphics, 17(5), pp. 1273-1283.
3. Khan, O. and S. Kundu (2011). "Hardware/Software Codesign Architecture for Online Testing in Chip Multiprocessors", IEEE Transactions on Dependable and Secure Computing, 8(5), pp. 714-724.

4. Maniatakos, M., N. Karimi, C. Tirumurti, A. Jas and Y. Makris (2011). "Instruction-Level Impact Analysis of Low-Level Faults in a Modern Microprocessor Controller", IEEE Transactions on Computers, 60(9), pp. 1260-1272.
