Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

 

Abdul Kareem PARCHUR1,*, Kuppangari Krishna RAO2, Fazal NOORBASHA1 and Ram Asaray SINGH1

 

1Department of Physics and Electronics, Dr. H S Gour University, 470003, Sagar, India. 2University Computer Centre, Dr. H S Gour University, 470003, Sagar, India.

E-mail: kareemskpa@hotmail.com

(*Corresponding author: +91-9907048098)

 

Received: 30 August 2010 / Accepted: 20 December 2010 / Published: 24 December 2010

 

 

Abstract

As processor architectures have advanced, Intel introduced its Core 2 Duo series of processors. The performance of Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studies the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimate the task completion time (TCT) at processor frequencies of 1 GHz, 2 GHz and 3 GHz. Our results show the performance scalability of Intel Core 2 Duo series processors. Even though the AI benchmarks have similar execution times, they have dissimilar characteristics, which are identified using principal component analysis and a dendrogram. As the processor frequency increases from 1.8 GHz to 3.167 GHz, the execution time decreases by ~370 sec for AI workloads and by ~940 sec for Physics/Quantum Computing programs.

Keywords

Processor Performance; Benchmarks; Artificial Intelligence (AI)

 

 

Introduction

 

In the past few years, processor architecture has undergone tremendous changes, which are reflected in improved processor performance. Nowadays, Artificial Intelligence (AI) applications are becoming an increasingly common processor workload. Examples of AI applications include game playing and chess. The use of AI programs has increased significantly, and many users typically have a large number of AI application programs on their desktop computers. Today’s desktop computers execute a variety of tasks simultaneously thanks to their efficient architecture, including multimedia applications, data compression, scientific computing simulations and discrete event simulations. Recently, many changes have been made to processor architecture to optimize performance [1]. Performance has become a critical requirement for processor manufacturers. Intel, AMD and other vendors have launched new architectures and technologies, such as AMD’s 3DNow! and Sun Microsystems’ VIS [2]. In recent years Intel released its highly efficient “Intel Core 2 Duo” processor. Manufacturers have also increased clock frequencies and integrated multiple cores on a single die at deep-submicron feature sizes, so that processors achieve higher chip-level Instructions Per Cycle (IPC). Intel Core 2 Duo processors are manufactured using 65 nm technology with more than 291 million CMOS transistors.

This paper demonstrates the behavior of artificial intelligence (AI) workloads on Intel Core 2 Duo series processors. We estimate the similarities and dissimilarities between the SPEC CPU INT 2006 benchmark programs and also identify the hotspots in the benchmarks’ memory space.

 

 

Material and Method

 

We used the SPEC CPU INT 2006 benchmark scores of the Intel Core 2 Duo E6300, E6400, E6700, E6750, E6850, E8500, T7100, T7400, T7600, T7700 and T9500 processors (11 processors) for the analysis. The detailed methodology for calculating the task completion time at 1 GHz, 2 GHz and 3 GHz is explained in our previous studies [3, 4]. The SPEC CPU2006 suite contains 17 floating-point programs (some written in C and some in Fortran) and 12 integer programs (9 written in C and 3 in C++). The SPEC CPU2006 benchmarks replace the SPEC89, SPEC92, SPEC95 and SPEC CPU 2000 benchmarks [5, 6]. The SPEC CPU INT 2006 suite contains two AI integer benchmark programs, 445.gobmk (Artificial Intelligence: Game Playing) and 458.sjeng (Artificial Intelligence: Chess). The scalability of the Intel Core 2 Duo series processors is shown in Figure 1. Extrapolating the runtime trendlines down to a core clock period of zero provides a basis for interpreting system behavior. The trendline is fitted with R² = 0.9215 and intersects the task completion time axis at ~1337 sec. The negative component of task completion time indicates negligible system memory wait time.

The task completion time (TCT) at 1 GHz, 2 GHz and 3 GHz for the 12 SPEC CPU INT 2006 benchmarks is shown in Table 1. The two AI benchmarks, 445.gobmk and 458.sjeng, show similar behavior on the Intel Core 2 Duo series, each with R² ≈ 97%. However, 458.sjeng has a longer execution time than 445.gobmk. The percentage of TCT on Intel Core 2 Duo series processors is 5% and 7.4% at 1 GHz, 9.5% and 13.7% at 2 GHz, and 13.6% and 19.3% at 3 GHz for 445.gobmk and 458.sjeng, respectively.
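As an illustration, the percentages quoted above can be approximately reproduced from the slope and intercept listed in Table 1. The sketch below assumes that run time is modelled as slope × core clock period (in ns) + intercept, and that the percentage of TCT is the intercept expressed as a share of that modelled run time. Both are assumptions made here for illustration; the actual procedure is described in [3, 4]. The function names are hypothetical.

```python
# Slope (A) and intercept (B) for the two AI benchmarks, copied from Table 1.
FITS = {
    "445.gobmk": (1401.0, 74.0),
    "458.sjeng": (1634.0, 130.0),
}


def runtime(slope, intercept, freq_ghz):
    """Modelled run time: slope * clock period (ns) + intercept (assumed model)."""
    period_ns = 1.0 / freq_ghz  # a 1 GHz clock has a period of 1 ns
    return slope * period_ns + intercept


def tct_percent(slope, intercept, freq_ghz):
    """Intercept as a percentage of the modelled run time (assumed definition)."""
    return 100.0 * intercept / runtime(slope, intercept, freq_ghz)


for name, (slope, intercept) in FITS.items():
    cells = ", ".join(
        f"{tct_percent(slope, intercept, f):.1f}% @{f:g} GHz" for f in (1.0, 2.0, 3.0)
    )
    print(f"{name}: {cells}")
```

Under these assumptions the computed values agree with the Table 1 percentages to within about 0.1 percentage point, a difference that is plausibly due to rounding of the published slope and intercept.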

It is also found that the AI benchmarks show a higher task completion time than the data compression, C optimizing compiler and Physics/Quantum Computing benchmarks. The Physics/Quantum Computing benchmark (462.libquantum) shows a lower task completion time. Intel Core 2 Duo series processors are most efficient for simulating Physics/Quantum Computing programs (like Gaussian 03W). As the processor frequency increases from 1.8 GHz to 3.167 GHz, the execution time decreases by ~370 sec for AI workloads and by ~940 sec for Physics/Quantum Computing programs. The decrease in execution time is due to wider pipelines, more functional units, and a shared L2 cache. However, performance does not scale uniformly across the individual benchmark applications. Two other benchmarks in the suite, 471.omnetpp and 403.gcc, have similar execution times even though the %TCT of 471.omnetpp is very high. Different benchmarks have similar and dissimilar characteristics in memory space. To identify the similarities between the benchmarks, we used principal component analysis (PCA).
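The execution-time reduction quoted above for AI workloads can be checked roughly against the fitted slopes. The sketch below assumes the same linear model (run time = slope × core clock period in ns + intercept), so the intercept cancels and the saving depends only on the slope and the two clock periods. It is restricted to the two AI benchmarks, whose fits have R² close to 97%; benchmarks with a poorer fit need not follow the line as closely. The function name is hypothetical.

```python
# Slopes (A) for the two AI benchmarks, copied from Table 1.
AI_SLOPES = {"445.gobmk": 1401.0, "458.sjeng": 1634.0}


def time_saved(slope, f_low_ghz, f_high_ghz):
    """Run-time reduction when moving from f_low to f_high under the linear model."""
    # The intercept cancels, leaving slope * (difference in clock period, ns).
    return slope * (1.0 / f_low_ghz - 1.0 / f_high_ghz)


savings = {name: time_saved(a, 1.8, 3.167) for name, a in AI_SLOPES.items()}
mean_saved = sum(savings.values()) / len(savings)

for name, saved in savings.items():
    print(f"{name}: {saved:.0f} sec saved")
print(f"mean over AI benchmarks: {mean_saved:.0f} sec")
```

The mean saving obtained this way lies in the mid-360s of seconds, which is in line with the ~370 sec reported for AI workloads.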

 

 

Results and Discussion

 

The benchmark runtime versus core clock period, which shows how the performance of Intel Core 2 Duo series processors scales, is presented in Figure 1, while the results for the studied processor frequencies are presented in Table 1.

Figure 1. Benchmark runtime vs. core clock period, showing the performance scaling of Intel Core 2 Duo series processors. Extrapolating the runtime trendlines down to a core clock period of zero provides a basis for interpreting system behavior.

 

Table 1. Percentage of TCT for Intel Core 2 Duo processors @1GHz, @2GHz and @3GHz processor frequency

Fit of base score against core clock (ns): A = slope, B = intercept, R² = RSQ (%).

Benchmark         Slope (A)   Intercept (B)   RSQ, R² (%)   TCT @3GHz (%)   TCT @2GHz (%)   TCT @1GHz (%)
400.perlbench           911             200         89.30           39.80           30.60           18.00
401.bzip2              2074            -105         99.12          -18.00          -11.30           -5.40
403.gcc                2161            -234         71.03          -48.00          -27.60          -12.10
429.mcf                1112             -44         91.44          -13.40           -8.60           -4.10
445.gobmk              1401              74         96.64           13.60            9.50            5.00
456.hmmer              2902            -393         90.72          -68.50          -37.20          -15.70
458.sjeng              1634             130         96.75           19.30           13.70            7.40
462.libquantum         4913           -1014         66.14         -162.40          -70.20          -26.00
464.h264ref            1921              78         97.89           10.80            7.50            3.90
471.omnetpp            1112              -3         87.04           -0.90           -0.60           -0.30
473.astar              1512             -49         97.57          -10.80           -6.90           -3.30
483.xalancbmk           828              23         93.54            7.80            5.30            2.70

 

PCA computes principal components: new variables that are linear combinations of the original variables such that all principal components are uncorrelated.

PCA transforms the p variables $X_1, X_2, \dots, X_p$ into p principal components $Z_1, Z_2, \dots, Z_p$ with

$$Z_i = \sum_{j=1}^{p} a_{ij} X_j, \qquad 1 \le i \le p.$$

This transformation has the properties:

· $\mathrm{Var}[Z_1] > \mathrm{Var}[Z_2] > \dots > \mathrm{Var}[Z_p]$, which means that $Z_1$ contains the most information and $Z_p$ the least, and:

· $\mathrm{Cov}[Z_i, Z_j] = 0$ for $i \ne j$, which means that there is no information overlap between the principal components.

The total variance in the data remains the same before and after the transformation, as given by the formula [7]:

$$\sum_{i=1}^{p} \mathrm{Var}[X_i] = \sum_{i=1}^{p} \mathrm{Var}[Z_i]$$
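The properties above can be checked numerically. The following is a minimal sketch using NumPy on synthetic data (12 rows and 4 correlated columns, generated here purely for illustration; these are not the memory access characteristics used in this study). It obtains the coefficients a_ij from the eigenvectors of the covariance matrix, which is one common way of computing PCA and is an assumption about the implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 12 "benchmarks" x 4 correlated "characteristics".
latent = rng.normal(size=(12, 1))
X = latent @ np.array([[3.0, 2.0, 1.0, 0.5]]) + 0.3 * rng.normal(size=(12, 4))

# Centre the variables; this does not change their variances.
Xc = X - X.mean(axis=0)

# Eigen-decomposition of the covariance matrix (eigh returns ascending order).
cov_X = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov_X)
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Z_i = sum_j a_ij X_j, where a_ij is entry (j, i) of eigvecs.
Z = Xc @ eigvecs

var_X = X.var(axis=0, ddof=1)
var_Z = Z.var(axis=0, ddof=1)
cov_Z = np.cov(Z, rowvar=False)
explained = eigvals / eigvals.sum()        # share of variance per component

print("total variance before:", var_X.sum())
print("total variance after: ", var_Z.sum())
print("share of variance per PC:", np.round(explained, 3))
```

The total variance printed before and after the transformation should agree, the off-diagonal entries of `cov_Z` should vanish up to floating-point error, and `explained` plays the role of the retained-information percentages reported for each principal component.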

Figure 2 shows the eigenvalue plot of the four most significant principal components (PC1 to PC4), which explain the variance in the workload. Principal component 1 (PC1) and principal component 2 (PC2) explain most of the information in the system: PC1 and PC2 retain 93.2% and 3.8% of the information, respectively.

Figure 2. Eigenvalues plot of all principal components, which explain the variance in the workload (PC1 to PC4)

Figure 3. SPEC CINT 2006 programs plotted in the PC space using memory access characteristics (PC1 vs. PC2)

 

Figures 3-5 show the SPEC CPU INT 2006 benchmarks in principal component space (PC1 vs. PC2, PC2 vs. PC3 and PC3 vs. PC4, respectively). The AI benchmarks 445.gobmk and 458.sjeng overlap in principal component space. This is due to the similar execution times of the AI workloads, as is clearly observed in Figures 3-5.

Figure 4. SPEC CINT 2006 programs plotted in the PC space using memory access characteristics (PC3 vs. PC2).

 

However, 462.libquantum behaves differently from the other benchmarks in memory space (Figure 3). PC3 and PC4 account for 1.2% and 0.8% of the information in the benchmark results, respectively, and show many hotspots in principal component space, as can be seen in Figures 4 and 5.

Figure 5. SPEC CINT 2006 programs plotted in the PC space using memory access characteristics (PC4 vs. PC3).

 

The similarities and dissimilarities between the benchmarks on Intel Core 2 Duo processors are identified using a dendrogram, which is shown in Figure 6. The AI benchmark workloads (445.gobmk and 458.sjeng) are linked at different linkage distances. The Physics/Quantum Computing benchmark (462.libquantum) shows a higher linkage distance.

Figure 6. Dendrogram showing similarity between SPEC CINT2006 Benchmark Programs behavior with linkage distance
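The way a dendrogram assigns linkage distances can be sketched with a small agglomerative clustering routine. The example below is an illustration only: it assumes single linkage with Euclidean distance, and it uses the three % of TCT columns of Table 1 as stand-in features, whereas the features and linkage rule behind Figure 6 are not stated here. The function name is hypothetical.

```python
import math

# % of TCT at 3, 2 and 1 GHz, copied from Table 1 (stand-in features).
FEATURES = {
    "400.perlbench": (39.80, 30.60, 18.00),
    "401.bzip2": (-18.00, -11.30, -5.40),
    "403.gcc": (-48.00, -27.60, -12.10),
    "429.mcf": (-13.40, -8.60, -4.10),
    "445.gobmk": (13.60, 9.50, 5.00),
    "456.hmmer": (-68.50, -37.20, -15.70),
    "458.sjeng": (19.30, 13.70, 7.40),
    "462.libquantum": (-162.40, -70.20, -26.00),
    "464.h264ref": (10.80, 7.50, 3.90),
    "471.omnetpp": (-0.90, -0.60, -0.30),
    "473.astar": (-10.80, -6.90, -3.30),
    "483.xalancbmk": (7.80, 5.30, 2.70),
}


def single_linkage(points):
    """Merge the two closest clusters repeatedly; return (a, b, distance) per merge."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = single_linkage(FEATURES)
for a, b, d in merges:
    print(f"{sorted(a)} + {sorted(b)} at linkage distance {d:.1f}")
```

With these stand-in features, 462.libquantum is the last benchmark to join the tree and does so at by far the largest linkage distance, which matches the qualitative observation made above. A library routine such as SciPy's hierarchical clustering would normally be used instead of this hand-written loop.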

 

 

Conclusion

 

We have studied the behavior of Artificial Intelligence (AI) workloads on Intel Core 2 Duo series processors. In this study we estimated the performance of AI benchmarks by comparing them with the other SPEC CPU INT 2006 benchmark workloads. It is observed that as the processor frequency increases from 1.8 GHz to 3.167 GHz, the execution time decreases by ~370 sec for AI workloads and by ~940 sec for Physics/Quantum Computing programs, i.e., the performance gain of Physics/Quantum Computing workloads is larger than that of AI workloads. We have also identified the similarities and dissimilarities between SPEC CPU INT 2006 benchmarks on the Intel Core 2 Duo commercial desktop processor. The Intel Core 2 Duo processors we studied benefit some, but not all, of the benchmark programs in SPEC.

 

Disclaimer

All the observations and analysis in this paper on the SPEC CPU2006int benchmarks are the authors’ opinions and should not be used as official or unofficial guidelines from SPEC in selecting benchmarks for any purpose. This paper only provides guidelines for performance engineers, academic users, scientists and developers to better understand performance scaling in modern processors and to choose a subset of benchmarks should the need arise.

 

Acknowledgements

Author A. K. Parchur gratefully acknowledges the financial support of UGC through a meritorious research fellowship.

 

 

References

 

1. Peng L., Peir J., Prakash T.K., Staelin C., Chen Y., Koppelman D., Memory hierarchy performance measurement of commercial dual-core desktop processors, Journal of Systems Architecture, 2008, 54, p. 816-828.

2. Abdul Kareem P., Singh R.A., Performance Scaling of Individual SPEC INT 2006 Results for AMD Processors, Leonardo Electronic Journal of Practices and Technologies, 2009, 8(14), p. 65-72.

3. Abdul Kareem P., Singh R.A., TCT Analysis of 0.2 ns Core Clock Series Processors, GESJ: Computer Science and Telecommunications, 2010, 26(3), p. 31-39.

4. Abdul Kareem P., Noorbasha F., Singh R.A., Study the Task completion Time of the Benchmarks @1GHz, 2GHz and 3GHz Processors, e-Journal of Science & Technology, 2010, 5(2), p. 15-22.

5. Standard Performance Evaluation Corporation (SPEC), http://www.spec.org, [accessed on 1-12-2009].

6. Aashish P., Ajay J., Lizy K.J., Analysis of Redundancy and Application Balance in the SPEC CPU2006 Benchmark Suite, ISCA’07, 2007, p. 412-423.

7. Abdul Kareem P., Singh R.A., Principal Component and Cluster Analysis of SPEC CPUint2006 Benchmarks: Input Data set Selection, e-Journal of Science & Technology, 2009, 4(3), p. 79-89.