Performance Scaling of Individual SPEC INT 2006 Results for AMD Processors

 

 

Abdul KAREEM P. and R. A. SINGH

 

Department of Physics and Electronics, Dr H S Gour University, Sagar, India 470003

E-mail: kareemskpa@hotmail.com

 

 

Abstract

High performance is a critical requirement for all microprocessor manufacturers. In this paper we describe the performance scaling trends in the AMD Opteron 2000+ and AMD Opteron 8000+ series processors. The microarchitecture of these processors forms the basis of a new family of AMD x86-64 processors. These processors can provide a performance boost in many key application areas. We analyze the scaling of performance in these two major series (AMD Opteron 2000+ and AMD Opteron 8000+) by using the published performance numbers of the 12 CPU2006 integer benchmarks, which exhibit significant differences in performance. Our results and analysis can be used by performance engineers, scientists and developers to better understand performance scaling in modern processors.

Keywords

Processor Performance, Benchmarks, Moore’s Law

 

 

Introduction

 

AMD Athlon™ processors are members of a new family of seventh-generation AMD processors designed to meet the computation-intensive requirements of cutting-edge software applications running on high-performance desktop systems, workstations, and servers with the most advanced x86-64 architecture. They incorporate new microarchitectural features, including a system bus running at more than 200 MHz, a high-performance cache architecture, and enhanced 3DNow!™ technology [1].

AMD Athlon processors are manufactured on AMD’s robust 0.18-micron aluminum process technology and on AMD’s leading-edge HiP6L 0.18-micron process technology featuring copper interconnects. The new AMD Athlon processor has approximately 37 million transistors and a die size of 120 mm² on 0.18-micron technology [2].

 

Scope of This Study

The analysis presented in this paper examines the scaling of performance in two series of AMD processors (AMD Opteron 2000+ and AMD Opteron 8000+), which are built to meet the requirements of current workloads. Furthermore, in contrast to prior work, we not only quantify the predicted performance of the processors but also evaluate, using a simple statistical correlation technique, the scalability of the memory wait time, which degrades processor performance. This analysis is intended to help performance engineers, scientists and developers better understand how performance scaling can be exploited in modern processors.

 

 

Processor Performance Trends

 

The performance of modern processors is increasing rapidly as both the clock frequency and the number of transistors used in a given implementation grow. Moore’s Law states that the device density of processors doubles every 18 months. Figure 1 shows the transistor count per die of processors introduced by AMD over the past 35 years [3, 4, 5].

The processor that performs a given task in the least amount of time has the highest performance. Increased performance implies reduced execution time; the performance of the processor is given by

Performance = IPC × Frequency (IPC: instructions per clock) [6].
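The relation above can be illustrated with a short sketch. The IPC and frequency values below are hypothetical and chosen only to show that a 30% gain in either factor gives the same 30% gain in performance; they are not measurements of any AMD processor.

```python
def performance(ipc: float, frequency_hz: float) -> float:
    """Instructions executed per second: IPC multiplied by clock frequency."""
    return ipc * frequency_hz


# Hypothetical design points (assumed values, not taken from the paper).
baseline = performance(ipc=1.0, frequency_hz=2.0e9)
higher_clock = performance(ipc=1.0, frequency_hz=2.6e9)  # clock raised by 30%
higher_ipc = performance(ipc=1.3, frequency_hz=2.0e9)    # IPC raised by 30%

print(higher_clock / baseline)
print(higher_ipc / baseline)
```

Both ratios come out at about 1.3, which is why a microarchitectural improvement in IPC and an increase in clock frequency are interchangeable in this simple model.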

Figure 1. Moore’s Law for microprocessor transistor counts assuming a starting point of 1960 and doubling time of 18 months.
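The doubling rule behind Figure 1 can be sketched as follows. The function name `moore_transistors` and the starting count of one device in 1960 are assumptions made for illustration; only the 1960 starting point and the 18-month doubling time come from the caption.

```python
def moore_transistors(year: float,
                      start_year: float = 1960.0,
                      start_count: float = 1.0,
                      doubling_months: float = 18.0) -> float:
    """Projected device count, doubling every `doubling_months` months.

    start_count is an assumed normalization (one device at start_year).
    """
    doublings = (year - start_year) * 12.0 / doubling_months
    return start_count * 2.0 ** doublings


# Growth factor over a 15-year span: ten doublings.
print(moore_transistors(1975.0) / moore_transistors(1960.0))
```

Under these assumptions, every 15 years corresponds to ten doublings, that is, roughly a thousandfold increase in device count.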

 

 

Benchmarks

 

Benchmarks are used for the performance evaluation of processors. Different types of benchmarks are available; among them, SPEC, HINT, and TPC are the most important and popular. SPEC is a nonprofit corporation formed to establish, maintain, and endorse a standardized set of benchmarks. SPEC’s membership includes computer hardware and software vendors, leading universities, and research facilities worldwide. SPEC CPU2006 is designed to provide a comparative measure of compute-intensive performance across a range of hardware. It comprises two suites of benchmarks: CINT2006 gauges compute-intensive integer performance, and CFP2006 measures floating-point performance. CINT2006 and CFP2006 results are presented as ratios, which are calculated from a reference time determined by SPEC and the measured runtime of the benchmark; higher scores indicate better performance [7].

The SPEC CPU2006 suite contains 18 floating-point programs (written in C, C++ and Fortran) and 13 integer programs (8 written in C, 4 in C++ and 1 in ANSI C). Tables 1 and 2 list the benchmarks in the SPEC CPU2006 suite. The SPEC CPU2006 benchmarks replace the SPEC89, SPEC92, SPEC95 and SPEC CPU2000 benchmarks [7, 8, 9, 10].
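As a minimal sketch of how such ratios are formed, the following example divides a reference time by a measured runtime and combines the per-benchmark ratios with a geometric mean, which is how SPEC aggregates them into an overall score. The benchmark names and times are hypothetical placeholders, not published SPEC results.

```python
import math


def spec_ratio(reference_time_s: float, run_time_s: float) -> float:
    """Ratio of the reference time to the measured runtime (higher is better)."""
    return reference_time_s / run_time_s


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, computed through logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference time, measured runtime) pairs in seconds.
times = {
    "bench_a": (9000.0, 900.0),
    "bench_b": (8000.0, 500.0),
}

ratios = [spec_ratio(ref, run) for ref, run in times.values()]
overall = geometric_mean(ratios)
print(ratios, overall)
```

The geometric mean is used rather than the arithmetic mean so that no single benchmark with a very large ratio dominates the overall score.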

Table 1. The CINT 2006 Suite Benchmarks

S. No  Integer Benchmark  Language  Description
1      400.perlbench      C++       PERL Programming Language
2      401.bzip2          C         Data Compression
3      403.gcc            C         C Language Optimizing Compiler
4      429.mcf            C         Combinatorial Optimization
5      445.gobmk          C         Artificial Intelligence: Game Playing
6      456.hmmer          C         Search a Gene Sequence Database
7      458.sjeng          C         Artificial Intelligence: Chess
8      462.libquantum     C         Physics / Quantum Computing
9      464.h264ref        C         Video Compression
10     471.omnetpp        C++       Discrete Event Simulation
11     473.astar          C++       Path-Finding Algorithm
12     483.xalancbmk      C++       XSLT Processor
13     998.specrand       ANSI C    -

 

Table 2. The CFP2006 Suite Benchmarks

S. No  Floating Point Benchmark  Language        Description
1      410.bwaves                Fortran-77      Computational Fluid Dynamics
2      416.gamess                Fortran         Quantum Chemical Computations
3      433.milc                  C               Physics / Quantum Chromodynamics
4      434.zeusmp                Fortran-77      Physics / Magnetohydrodynamics
5      435.gromacs               C / Fortran     Chemistry / Molecular Dynamics
6      436.cactusADM             C / Fortran-90  Physics / General Relativity
7      437.leslie3d              Fortran-90      Computational Fluid Dynamics
8      444.namd                  C++             Scientific, Structural Biology, Classical Molecular Dynamics Simulation
9      447.dealII                C++             Solution of Partial Differential Equations using the Adaptive Finite Element Method
10     450.soplex                C++             Simplex Linear Programming Solver
11     453.povray                C++             Computer Visualization / Ray Tracing
12     454.calculix              C / Fortran-90  Structural Mechanics
13     459.GemsFDTD              Fortran-90      Computational Electromagnetics
14     465.tonto                 Fortran-95      Quantum Crystallography
15     470.lbm                   C               Computational Fluid Dynamics
16     481.wrf                   C / Fortran-90  Weather Processing
17     482.sphinx3               C               Speech Recognition
18     999.specrand              ANSI C          Mine Canary

 

Analysis of Benchmark Results

In this study we use the integer benchmarks from the newly released SPEC CPU2006 suite [7] for the performance evaluation of AMD Opteron 2000+ series and AMD Opteron 8000+ series processors under the same operating conditions. From their performance numbers and frequencies we calculated the task completion time and plotted the benchmark runtime against the CPU core clock period for both series. The scalability of the AMD Opteron series processors is shown in Fig. 2 and Fig. 3.

We use the core clock period (clock cycle time), expressed in nanoseconds (ns), as the value characterizing a particular processor speed grade. The conversion formula is simple: clock period [ns] = 1000 / f [MHz]. The processor performance equation is written as

Y = [Processor time] = [Processor clock cycles for a program] × [Clock cycle time] = A × x,

where x is the clock cycle time. The proportionality coefficient A is simply the number of processor clocks it takes to perform the task. If the total number of instructions in the particular task (instruction count, IC) is known, dividing this coefficient by IC gives the well-known microarchitectural parameter CPI, “clocks per instruction” [11]:

A = [Processor clock cycles for a program] = IC × CPI.
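The relations above can be checked with a small worked example. The frequency, cycle count and instruction count below are hypothetical values chosen for easy arithmetic, and the function names are illustrative only.

```python
def clock_period_ns(frequency_mhz: float) -> float:
    """Core clock period in nanoseconds: 1000 / f [MHz]."""
    return 1000.0 / frequency_mhz


def processor_time_s(clock_cycles: float, period_ns: float) -> float:
    """Processor time Y = A * x, with x converted from ns to seconds."""
    return clock_cycles * period_ns * 1e-9


def cpi(clock_cycles: float, instruction_count: float) -> float:
    """Clocks per instruction: CPI = A / IC."""
    return clock_cycles / instruction_count


# Hypothetical task on a hypothetical 2000 MHz part (assumed values).
period = clock_period_ns(2000.0)       # 0.5 ns
cycles = 1.2e12                        # A: clock cycles for the program
instructions = 1.0e12                  # IC: instruction count

print(period)
print(processor_time_s(cycles, period))  # about 600 seconds
print(cpi(cycles, instructions))         # about 1.2
```

With A fixed, halving the clock period halves the processor time, which is the proportionality that the scaling plots are meant to test.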

 

Figure 2. Scalability of AMD Opteron 2000+ Series Processors on SPECint2006

 

Figure 3. Scalability of AMD Opteron 8000+ Series Processors on SPECint2006
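One way to read scaling plots such as Fig. 2 and Fig. 3 is to fit a straight line to runtime versus core clock period. The sketch below does this with ordinary least squares on synthetic data. Interpreting the slope term as core time and the intercept as frequency-independent memory wait time is an assumption made here for illustration; the paper does not spell out how its core time and memory wait time percentages were obtained, and the data points are invented.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points generated from T = 1000 * x + 100 (assumed, not measured).
periods_ns = [0.40, 0.50, 0.625]       # 2500, 2000 and 1600 MHz
runtimes_s = [500.0, 600.0, 725.0]

slope, intercept = fit_line(periods_ns, runtimes_s)

# Assumed decomposition at the fastest speed grade:
# slope * x scales with the clock, the intercept does not.
core_time_s = slope * periods_ns[0]
core_share = core_time_s / runtimes_s[0]
memory_wait_share = intercept / runtimes_s[0]
print(slope, intercept, core_share, memory_wait_share)
```

For this synthetic data the fit recovers a slope of about 1000 s/ns and an intercept of about 100 s, giving an 80% core share and a 20% memory wait share at the 0.40 ns speed grade.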

 

 

Results and Discussion

 

The performance of the processor is calculated using the relation Performance = Core Utilization Time % / Memory Wait Time %. The increase in performance across the two series, AMD Opteron 2000+ and AMD Opteron 8000+, was studied using the performance numbers published by SPEC for CPU2006 [7]. We calculated the scaling of the task completion time (s) with respect to the core clock period (ns).

Figure 4. Variation of Memory wait time and Core time in AMD Opteron 2000+ and AMD Opteron 8000+ Series Processors

 

The scaling of memory wait time and core time in the AMD Opteron 2000+ and AMD Opteron 8000+ series processors was evaluated. The AMD Opteron 8000+ series shows the higher performance because it has the lower memory wait time. From Fig. 4, the AMD Opteron 8000+ series processors show 85.5% core utilization time and 14.5% memory wait time, whereas the AMD Opteron 2000+ series processors show 40.3% core utilization time and 59.7% memory wait time on the SPECint2006 benchmark suite. The performance of the two series is compared by normalizing to the performance of the AMD Opteron 2000+ series, which indicates that the AMD Opteron 8000+ series processors are about 8.7 times faster than the AMD Opteron 2000+ series (Fig. 5). This method makes it possible to compare the performance of modern processors without prior knowledge of the benchmark source code or detailed execution traces.

 

Figure 5. Comparison of performance of AMD Opteron 2000+ and AMD Opteron 8000+ Series Processors
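The normalized comparison can be reproduced directly from the percentages reported for Fig. 4, using the relation Performance = Core Utilization Time % / Memory Wait Time %. The function name `performance_index` is illustrative; the four percentages are the ones quoted in the text.

```python
def performance_index(core_pct: float, memory_wait_pct: float) -> float:
    """Performance = core utilization time % / memory wait time %."""
    return core_pct / memory_wait_pct


opteron_2000 = performance_index(core_pct=40.3, memory_wait_pct=59.7)
opteron_8000 = performance_index(core_pct=85.5, memory_wait_pct=14.5)

# Normalize to the Opteron 2000+ series.
relative = opteron_8000 / opteron_2000
print(round(opteron_2000, 3), round(opteron_8000, 3), round(relative, 1))
```

The indices are about 0.675 and 5.897, so the normalized ratio is about 8.7, matching the figure quoted for the AMD Opteron 8000+ series.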

 

 

Acknowledgement

 

The authors would like to thank Prof. D. K. Gautam, Head, Department of Electronics, North Maharashtra University, Jalgaon (M.S.), India, for many stimulating comments and discussions, and Alex Predtechenski, Advanced Micro Devices, for the SPEC benchmark analysis. One of the authors (A.K.P.) gratefully acknowledges the financial support of UGC through a meritorious research fellowship. He (A.K.P.) is also thankful to Fazal Noor Basha for helpful discussions.

 

 

References

 

1.   AMD Athlon Processor Architecture, white paper, 28 Aug 2000, available at http://www.amd.com

2.   AMD x86-64 Architecture Manuals, http://www.amd.com

3.   I. Tuomi, The Lives and Death of Moore’s Law, First Monday, Oct 11, 2002.

4.   www.dell.com/powersolutions

5.   D. J. Lilja, Measuring Computer Performance: A Practitioner’s Guide, Cambridge University Press, New York, NY, 2000.

6.   Understanding Processor Performance, white paper, August 24, 2001, available at http://www.amd.com

7.   Standard Performance Evaluation Corporation (SPEC), Benchmarks, http://www.spec.org

8.   SPEC CPU2000 Press Release FAQ, available at http://www.spec.org/osg/cpu2000/press/faq.html

9.   A. KleinOsowski and D. Lilja, MinneSPEC: A new SPEC benchmark workload for simulation-based computer architecture research, Computer Architecture Letters, Volume 1, June 2002.

10.  J. L. Henning, SPEC CPU2000: Measuring CPU performance in the new millennium, IEEE Computer, July 2000.

11.  A. Predtechenski, AMD, “A Method for Benchmarks Analysis”, available at http://www.spec.org/events/specworkshop/abs.html, 2006.