C From: mcmahon@ocfmail.ocf.llnl.gov (Francis H Mcmahon)
C  echo '********** LFK tests script for CRAY / UNICOS'
C  rm $1.o $1.l $1.m $1.x a.out output bino
C  cft77 -o aggress,bl -dp -e sxz $1.f
C  echo '********** ASSEMBLED.'
C  bld qz bino $1.o
C  bld dz bino :SECOND
C  segldr -M $1.m -o $1.x bino
C  echo '********** LINKED.'
C  $1.x
c
c
c     PROGRAM DPMFLOPS(TAPE6=OUTPUT)          Double Precision Test
c     LATEST KERNEL MODIFICATION DATE: 22/DEC/86
c     LATEST FILE   MODIFICATION DATE: 12/OCT/93   version mf528
c****************************************************************************
c MEASURES CPU PERFORMANCE RANGE OF THE COMPUTATION/COMPILER/COMPUTER COMPLEX
c****************************************************************************
c
c    L. L. N. L.   F O R T R A N   K E R N E L S   T E S T:   M F L O P S
c
c                  Our little systems have their day;
c                  They have their day and cease to be:
c                  They are but broken parts of Thee,
c                  And Thou, O Lord, are more than they.
c                                   Alfred, Lord Tennyson (1850)
c
c
c   These kernels measure Fortran numerical computation rates for a
c   spectrum of CPU-limited computational structures.  Mathematical
c   throughput is measured in units of millions of floating-point
c   operations executed per second, called Mega-Flops/Sec.
c
c   The experimental design of some traditional benchmark tests is
c   defective when applied to computers employing vector or parallel
c   processing, because the range of CPU performance is 10 to 100 times
c   the range of conventional serial processors.  In particular, the
c   effective CPU performance of supercomputers now ranges from a few
c   megaflops to a few thousand megaflops.  Attempts by some marketeers
c   and decision makers to reduce this three-orders-of-magnitude range
c   of CPU performance to a single number are unscientific and have
c   produced much confusion.  The LFK test has also been abused by
c   some analysts who quote only a single, average performance number.
c
c   The Livermore Fortran Kernels (LFK) test contains a broad sample
c   of generic Fortran computations which have been used to measure an
c   effective numerical performance range, thus avoiding the peril of
c   a single performance "rating".  A complete report of 72 LFK test
c   results must quote six performance range statistics (rates): the
c   minimum, the harmonic, geometric, and arithmetic means, the
c   maximum, and the standard deviation.  No single rate quotation is
c   sufficient or honest.  These measurements show a realistic
c   variance in Fortran CPU performance that has stood the test of
c   time and that is vital data for circumspect computer evaluations.
c   Quote statistics from the SUMMARY table of 72 timings (DO Span= 167).
c
c   This LFK test may be used as a standard performance test, as a test
c   of compiler accuracy (checksums), or as a hardware endurance test.
c   The LFK methodology is discussed in subroutine REPORT with references.
c   The glossary and module hierarchy are documented in subroutine INDEX.
c
c   Use of this program is granted with the request that a copy of the
c   results be sent to the author at the address shown below, to be
c   added to our studies of computer performance.  Please send your
c   complete LFK test output file on a 5" DOS floppy disk, or by E-mail.
c   Your timing results may be held as proprietary data, if so marked.
c   Otherwise your results will be quoted in published reports and will
c   be disseminated through a publicly accessible computer network.
c   Most computer vendors have run the LFK test (a.k.a. the Livermore
c   Loops test) and can provide LFK test results to prospective customers
c   on request.  Enhanced versions of this LFK test may be obtained from
c   the author:
c
c
c                  F.H. McMahon     L-35
c                  Lawrence Livermore National Laboratory
c                  P.O. Box 808
c                  Livermore, CA.  94550
c
c                  (510) 422-1647
c                  mcmahon@ocfmail.ocf.llnl.gov
c                  MCMAHON3@LLNL.GOV
c
c
c                  (C) Copyright 1983 the Regents of the
c              University of California. All Rights Reserved.
c
c              This work was produced under the sponsorship of
c              the U.S. Department of Energy. The Government
c              retains certain rights therein.
c****************************************************************************
c
c
c                              DIRECTIONS
c
c  1. We REQUIRE one test-run of the Fortran kernels as is, that is, with
c     no reprogramming.  Standard product compiler directives may be used
c     for optimization, as these do not constitute reprogramming.  Special
c     compiler coding used only for specific LFK kernels is PROHIBITED.
c     We REQUIRE one mono-processed run (1 CPU) of this unaltered test.
c
c     The performance of the standard, "as is" LFK test (no modifications)
c     correlates well with the performance of the majority of CPU-bound
c     Fortran applications, and hence of diverse workloads.  These measured
c     correlations show the LFK to be a good sampling of the existing
c     inventory of Fortran coding practice in general.  The extremes of
c     the Fortran inventory are represented, from serial recurrences on
c     small arrays to global-parallel computation on large arrays.
c
c  2. In addition, the vendor may, if so desired, reprogram the kernels to
c     demonstrate high-performance hardware features.  Kernels 13,14,23
c     are partially vectorisable and kernels 15,16,24 are vectorisable if
c     re-written.  Kernels 5,6,11,17,19,20,23 are implicit computations that
c     must NOT be explicitly vectorised using compiler directives to
c     ignore dependencies.  In any case, compiler listings of the codes
c     actually used should be returned along with the timing results.
c
c     We permit the LFK kernels to be reprogrammed ONLY as a partial
c     demonstration of the performance of innovative, high-performance
c     architectures.  We may then infer from the reprogramming work
c     the kind and degree of optimisations which are necessary to achieve
c     high performance, as well as the cost in time and effort.
c     Only if it can be shown that this reprogramming can be automated
c     could we establish a correlation with the existing Fortran inventory.
c     These non-standard tests using the LFK samples are intended to explore
c     programming requirements and should not be correlated with standard
c     LFK test results (as in 1 above).
c
c  3. For vector processors, we REQUIRE an ALL-scalar compilation test-run
c     to measure the basic scalar performance range of the processor.
c
c  4. On computers where default single precision is REAL*4, we REQUIRE an
c     additional test-run with all mantissas .GE. 47 bits.  Declare all
c     REAL*8 using:   IMPLICIT DOUBLE PRECISION (A-H,O-Z)
c
c     To change REAL*4 (MFLOPS) to REAL*8 Double Precision:
c
c        vi... :1,$s/cANSI/ /g
c        vi... :1,$s/ DOUBLE PRE/Cout DOUBLE PRE/g
c        ( some redundancy in IQRANF,REPORT,RESULT,SEQDIG,TALLY,TRIAL,VALUES)
c
c     To reverse REAL*8 (DPMFLOPS) to REAL*4 Single Precision:
c
c        vi... :1,$s/ IMPLICIT DOUBLE PRE/ IMPLICIT DOUBLE PRE/g
c        vi... :1,$s/Cout DOUBLE PRE/ DOUBLE PRE/g
c
c  5. Installation includes verifying or changing the following:
c
c     First : the definition of function SECOND for CPU time only,
c     Second: the definition of function MOD2N in KERNEL, and
c     Third : the system names Komput, Kontrl, and Kompil in MAIN.
c     During check-out, run time can be reduced by setting Nruns= 1 in SIZES.
c     For the standard LFK benchmark test, verify Nruns= 7 in SIZES.
c
c  6. Each kernel's computation is check-summed for easy validation.
c     Your checksums should agree with the reference values to the precision
c     used, within round-off.  The number of correct significant digits in
c     your checksums is printed in the OK column next to each checksum.
c     Single precision should produce 6 to 8 OK digits, and double precision
c     should produce 11 to 16 OK digits.
c     Try REAL*16 in subr SIGNEL and SUMO to improve accuracy of DP checksums.
c
c  7. Verify CPU time measurements from function SECOND by comparing the clock
c     calibration printout of total CPU time with system or real-time measures.
c     The accuracy of SECOND is also tested using subr VERIFY and CALIBR.
c     Each kernel's execution may be repeated arbitrarily many times
c     (MULTI >> 100) without overflow and still produce verifiable checksums.
c
c     Default uni-processor tests measure job CPU time in SECOND (TSS mode).
c     Parallel-processing tests should measure real time in stand-alone mode.
c
c  8. On computers with virtual storage (VS) systems, ensure a working-set
c     space larger than the entire program so that page faults are
c     negligible, because we must measure the CPU-limited computation rates.
c     IT IS ALSO NECESSARY to run this test stand-alone, i.e. NO timesharing.
c     In VS systems a series of runs is needed to show stable CPU timings.
c
c  9. On computers with cache memories and high-resolution CPU clocks we
c     need, if feasible, another ALL-scalar test-run setting Loop= 1
c     in SIZES to measure un-primed-cache (as well as in-cache) CPU rates.
c     Increase the size of array CACHE (in subr. VALUES) from 8192 to the
c     cache size.
c
c 10. On parallel computer systems which compile parallel multi-tasking
c     at the DO-loop level (micro-tasking), parallelisation of each
c     kernel is encouraged, but the number of processors used must be
c     reported.  Parallelisation of, or invariant-code hoisting outside of,
c     the outermost repetition loop around each kernel (including TEST)
c     is PROHIBITED.  You may NOT declare function TEST as NO-SIDE-EFFECTS.
c
c 11. A long endurance test can be set up by redefining "laps" in SIZES.
c
c
c
c 12. Interpretation of LFK performance rates is discussed in subr REPORT and:
c
c     F.H. McMahon, The Livermore Fortran Kernels:
c     A Computer Test Of The Numerical Performance Range,
c     Lawrence Livermore National Laboratory,
c     Livermore, California, UCRL-53745, December 1986.
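c
c     Editorial worked example for direction 6.  This is a sketch only.
c     It ASSUMES that the OK column counts agreeing significant digits as
c     roughly -LOG10( |x - xref| / |xref| ).  This header does not state
c     the exact rule the program uses, and the REAL*4 result below is
c     HYPOTHETICAL:
c
c        xref = SUMS(1,1,5) (Kernel 1 reference)  = 5.114652693224671D+04
c        x    = hypothetical REAL*4 checksum      = 5.114653D+04
c        |x - xref| / |xref| = 3.07D-03 / 5.11D+04 = 6.0D-08 (approx.)
c        -LOG10( 6.0D-08 )   = 7.2 (approx.)  ->  about 7 OK digits,
c     which lies within the 6 to 8 digits expected for single precision.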
c
c
c     Quote statistics from the SUMMARY table of 72 timings (DO Span= 167).
c     This table is located near line 700 of the output file and ends with
c     the BOTTOM-LINE banner shown below:
c
c                    ********************************************
c                    THE LIVERMORE FORTRAN KERNELS:  * SUMMARY *
c                    ********************************************
c
c                    Computer :  CRAY Y-MP1
c                    System   :  UNICOS 5.1
c                    Compiler :  CF77 4.0
c                    Date     :  06/03/90
c                    .
c                    .
c                    .
c                    MFLOPS    RANGE:          REPORT ALL RANGE STATISTICS:
c                    Mean DO Span   =   167
c                    Code Samples   =    72
c
c                    Maximum Rate   =   294.34 Mega-Flops/Sec.
c                    Quartile Q3    =   123.27 Mega-Flops/Sec.
c                    Average Rate   =    82.71 Mega-Flops/Sec.
c                    Geometric Mean =    43.42 Mega-Flops/Sec.
c                    Median Q2      =    31.14 Mega-Flops/Sec.
c                    Harmonic Mean  =    23.20 Mega-Flops/Sec.
c                    Quartile Q1    =    17.16 Mega-Flops/Sec.
c                    Minimum Rate   =     2.74 Mega-Flops/Sec.
c           <<<<<<<<<<<<<<<<<<<<<<<<<<<*>>>>>>>>>>>>>>>>>>>>>>>>>>>
c           <  BOTTOM-LINE: 72 SAMPLES LFK TEST RESULTS SUMMARY.  >
c           < USE RANGE STATISTICS ABOVE FOR OFFICIAL QUOTATIONS. >
c           <<<<<<<<<<<<<<<<<<<<<<<<<<<*>>>>>>>>>>>>>>>>>>>>>>>>>>>
c
c     Sadly, some analysts quote only the long-vector (DO span= 471) LFK
c     statistics because they are the most impressive.  They are not the
c     best guide to the performance of a large, diverse workload; the
c     SUMMARY statistics are.
c
c     A complete LFK performance-range report must include the minimum, the
c     harmonic, geometric, and arithmetic means, the maximum, and the
c     standard deviation.  The best central measure is the Geometric Mean
c     (GM) of the 72 rates, because the GM is less biased by outliers than
c     the Harmonic (HM) or Arithmetic (AM) means.  CRAY hardware monitors
c     have demonstrated that net Mflop rates for the LLNL and UCSD tuned
c     workloads are closest to the 72 LFK test GM rate.
c     [ However, CRAY memories are "all cache".  LLNL codes ported to
c     microprocessors with smaller caches typically perform at only the
c     LFK Harmonic mean MFlop rates.]
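c
c     Editorial worked example of the three means for N rates r(1..N).
c     This is a sketch, not code from this program.  The three-rate sample
c     is HYPOTHETICAL and was chosen only to show how outliers pull each
c     mean:
c
c        AM = ( r(1) + ... + r(N) ) / N
c        GM = ( r(1) * ... * r(N) ) ** (1/N)
c        HM = N / ( 1/r(1) + ... + 1/r(N) )
c
c        r = 1, 10, 100 MFlops:   AM = 37.0    GM = 10.0    HM = 2.70
c
c     The slowest kernel dominates HM.  With equal flops per kernel, HM
c     equals total flops divided by total time.  The fastest kernel
c     dominates AM, and GM lies between the two.  For any positive rates,
c     HM .LE. GM .LE. AM.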
c
c
c        CORRESPONDENCE OF LFK TEST PERFORMANCE MEANS WITH LARGE WORKLOAD TUNING
c
c        -------      --------     ----------   -----------------------
c        Type of      CRAY-C90/1   Fraction     Tuning of Workload
c        Mean         (VL=167)     Flops in     Correlated with
c                     (MFlops)     Vector Ops   LFK Mean Performance
c        -------      --------     ----------   -----------------------
c
c        2*AM         382.         .97          Best applications
c
c        AM           191.         .89          Optimized applications
c
c        GM            86.         .74          Tuned workload
c
c        HM            41.         .45          Untuned workload
c
c        HM(scalar)    18.         .0           All-scalar applications
c        -------      --------     ----------   -----------------------
c        (AM,GM,HM stand for Arithmetic, Geometric, Harmonic Mean Rates)
c
c
c
c
c     The Livermore Loops test reports 8 standard statistics
c     (min to max) to represent the entire MFlops performance
c     distribution measured by the LFK samples.  These statistics
c     provide a few well-known points in the performance range.
c     Analysts may use them to match a point in the LFK performance
c     range with the MFlops performance of their application, or of
c     a workload of applications.
c
c     For example, an application named SNAIL has an
c     MFlops performance a little below Q1, the lower quartile
c     defined by the LFK test on a half dozen different workstations.
c     The reason for this poor performance is that SNAIL was
c     formulated for CRAY and uses vector gather/scatter heavily,
c     which causes poor cache performance on workstations.
c     Using this correspondence, we can predict that SNAIL will
c     run near Q1 on the IBM-590, i.e. at about 17 MFlops,
c     BEFORE it is timed on the 590.  In general, SNAIL's speedup
c     on any future cache-based workstation can be predicted
c     by the ratio LFK Q1(new)/Q1(old).
c     [ On the CRAY-YMP, which is NOT cache based and has vector gather/
c     scatter, SNAIL runs at about 150 MFlops; on the C90, at about
c     300 MFlops.]
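c
c     Editorial worked example of this Q1 correspondence.  This is a
c     sketch.  The 15.0 MFlops SNAIL rate is HYPOTHETICAL; the Q1 values
c     are the DEC/AXP 10000610 and IBM 6000/590 entries of the LFK
c     table below:
c
c        SNAIL(new)  =  SNAIL(old) * Q1(new) / Q1(old)   (approx.)
c        SNAIL(590)  =  15.0 * 17.25 / 16.32  =  15.0 * 1.057  =  15.9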
c
c
c
c
c
c     Some of the following super-scalar workstations are over 200 times
c     faster than the VAX-780 workhorse of the 1980s (GM= 0.17 MFlops):
c
c
c D.335 LFK Test#     --335.1  ----335.3  ----335.4  ----335.5  ----335.6  ----335.7
c
c Vendor              CRAY RI   DEC /AXP         HP        IBM    CRAY RI       CRAY
c Model              YMP1 6.0   10000610     PA-755   6000/590   YMP1 6.0   Y16/1C90
c OSystem            UNICOS 5    OpenVMS   HPUX 9.0    AIX 3.2   UNICOS 5   UNICOS 7
c Compiler           CF77 4.0   GEM X3.2   f77 9.0b    XLF 3.1   CF77 4.0   CF77 5.0
c OptLevel             Scalar     200MHz   -O+OS+OP   VAST2 4.       VAST       VAST
c Nr.Procs                  1          1          1          1          1          1
c Samples                  72         72         72         72         72         72
c WordSize                 64         64         64         64         64         64
c DO Span                 167        167        167        167        167        167
c Year                   1990       1992       1992       1993       1990       1992
c Kernel/MFlops    ---------- ---------- ---------- ---------- ---------- ----------
c   1                   23.39      49.71      72.73     110.31     254.88     708.38
c   2                   14.35      24.47      29.54      92.49      68.06     105.56
c   3                   25.12      36.60      61.10     206.83     237.95     495.60
c   4                   22.70      34.30      57.52     123.85      85.67     172.02
c   5                   19.22      20.95      37.23      22.17      19.65      33.80
c   6                    9.26      20.90      44.14      31.41      20.97      31.94
c   7                   29.95     106.53      68.75     204.03     294.34     826.09
c   8                   29.74      74.22      65.14     178.42     214.43     596.88
c   9                   29.29      49.01      64.99     151.26     235.05     604.54
c  10                   18.63      13.63      23.60      46.99     113.05     291.82
c  11                   19.67      22.65      26.05      25.34      19.59      33.64
c  12                   16.82      51.36      19.75      81.03     125.57     309.15
c  13                    6.68      11.30       9.94       7.22      20.81      62.69
c  14                    9.86      10.39      19.05      17.25      29.37      99.98
c  15                    7.39      15.88      24.60      15.49      32.00     130.25
c  16                    8.31      24.22      15.46      15.16       8.32      10.76
c  17                   15.92      28.14      24.31      27.32      15.91      21.93
c  18                   24.93      30.04      47.70      76.57     197.67     553.79
c  19                   20.02      27.13      37.21      33.27      19.89      35.81
c  20                   17.89      16.26      28.91      18.34      17.68      31.97
c  21                   20.50      96.96      48.30     223.97     281.72     798.72
c  22                    8.82      16.20      17.27      15.14      97.06     187.66
c  23                   20.01      37.51      51.09      63.98      35.71      73.65
c  24                    3.94      12.96      16.83      13.07      38.60      83.51
c -------------          ....       ....       ....       ....       ....       ....
c
c Maximum Rate   =      29.95     131.60      72.82     223.98     294.34     826.09
c Quartile Q3    =      20.50      47.69      47.70     106.50     123.27     261.57
c Average Rate   =      16.55      35.52      35.53      68.98      82.72     190.56
c Geometric Mean =      14.59      28.06      30.63      43.21      43.43      86.26
c Median Q2      =      16.97      24.47      28.91      33.27      31.15      83.51
c Harmonic Mean  =      12.45      23.13      26.26      27.84      23.21      40.73
c Quartile Q1    =       8.96      16.32      18.81      17.25      17.16      31.15
c Minimum Rate   =       3.87       8.08       9.70       7.21       2.75       6.79
c
c Average Ratio  =       1.00       2.15       2.15       4.17       5.00      11.52
c Geometric Ratio=       1.00       1.92       2.10       2.96       2.98       5.91
c Harmonic Ratio =       1.00       1.86       2.11       2.24       1.86       3.27
c
c Standard Dev.  =       7.52      27.41      18.88      65.13      88.20     227.25
c Avg Efficiency =     48.71%     21.32%     42.06%     19.29%     14.75%      10.4%
c
c
c
c
c     REALISTIC CPU PERFORMANCE COMPARISONS USING LIVERMORE LOOPS TEST MEAN RATES
c
c     The speed-ups shown below are ratios of three performance-mean
c     statistics.  Their range has a very small variance compared with
c     the enormous performance ranges, so these ratios are convergent
c     speed-up estimates of the relative performance of diverse workloads.
c
c
c
c     TABLE OF SPEED-UP RATIOS OF LIVERMORE LOOPS MEAN RATES (72 Samples)
c
c     The Geometric Mean is the statistic least biased by outliers.
c     (AM,GM,HM stand for Arithmetic, Geometric, Harmonic Mean Rates)
c     But HM is the best MFlops estimate for cache-based workstation workloads.
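c
c     Editorial worked check (not program output).  Every ratio divides one
c     system's mean rate by another system's mean rate of the same kind.
c     In the speed-up table that follows, the entry in row R under column C
c     is MEAN(R) / MEAN(C).  For example:
c
c        SX-3/14 AM / Y16/1C90 AM  =  311.82 / 190.56  =  1.636 (approx.)
c        SX-3/14 GM / Y16/1C90 GM  =   95.59 /  86.27  =  1.108 (approx.)
c
c     The Average, Geometric, and Harmonic Ratio rows of the table above use
c     the CRAY YMP1 scalar column as the base, e.g. 35.52 / 16.55 = 2.15.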
c
c
c -------- ----  ------   -------- -------- -------- -------- -------- --------
c SYSTEM   MEAN  MFLOPS    SX-3/14   VP2600 Y16/1C90 6000/590 9000/755 200MH610
c -------- ----  ------   -------- -------- -------- -------- -------- --------
c
c NEC      AM=  311.820 :    1.000    1.054    1.636    4.520    8.776    8.779
c SX-3/14  GM=   95.590 :    1.000    1.028    1.108    2.212    3.121    3.407
c F77v.012 HM=   38.730 :    1.000    0.916    0.951    1.391    1.475    1.674
c          SD=  499.780
c
c FUJITSU  AM=  295.790 :    0.949    1.000    1.552    4.288    8.325    8.327
c VP2600   GM=   93.030 :    0.973    1.000    1.078    2.153    3.037    3.315
c F77 V12  HM=   42.260 :    1.091    1.000    1.038    1.518    1.609    1.827
c          SD=  514.490
c
c CRAY     AM=  190.560 :    0.611    0.644    1.000    2.763    5.363    5.365
c Y16/1C90 GM=   86.270 :    0.903    0.927    1.000    1.997    2.817    3.074
c CF77 5.0 HM=   40.730 :    1.052    0.964    1.000    1.463    1.551    1.761
c          SD=  227.250
c
c IBM      AM=   68.980 :    0.221    0.233    0.362    1.000    1.941    1.942
c 6000/590 GM=   43.210 :    0.452    0.464    0.501    1.000    1.411    1.540
c XLF 3.1. HM=   27.840 :    0.719    0.659    0.684    1.000    1.060    1.204
c          SD=   65.130
c
c HP       AM=   35.530 :    0.114    0.120    0.186    0.515    1.000    1.000
c 9000/755 GM=   30.630 :    0.320    0.329    0.355    0.709    1.000    1.092
c f77 9.0b HM=   26.260 :    0.678    0.621    0.645    0.943    1.000    1.135
c          SD=   18.880
c
c DEC      AM=   35.520 :    0.114    0.120    0.186    0.515    1.000    1.000
c 200MH610 GM=   28.060 :    0.294    0.302    0.325    0.649    0.916    1.000
c GEM X3.2 HM=   23.130 :    0.597    0.547    0.568    0.831    0.881    1.000
c          SD=   27.410
c -------- ----  ------   -------- -------- -------- -------- -------- --------
c
c
c
c
c                     EVOLUTION OF AVERAGE COMPUTING RATES
c
c ---------- -- ---- ---- -------------- ------------ -------------
c Uniprocessor
c                         Primary        Average      1.5e+10 Flops
c Computer      Nr. Oper Memory(K=1024) Computing    Problem
c Vendor     YR Proc Regs ( *dec digits) Rate(MFlops) Time (Hours)
c ---------- -- ---- ---- -------------- ------------ -------------
c
c UNIVAC     52    1    1        .1K *12         .001         4000.
c
c IBM-650    53    1    4         2K *10        .0002        20000.
c
c IBM-704    54    1    6         8K * 7         .008          500.
c
c IBM-7090   59    1   11        32K * 7          .05          83.
c
c IBM-7030   61    1             90K *16           .2          20.
c
c CDC-6600   64    1   24       128K *14           .5           8.
c
c CDC-7600   69    1   24       576K *14           3.          1.3
c
c CRAY-1     76    1  656      1024K *14      5 - 50.    .80 - .08
c
c CRAY-YMP   89    8  656    131072K *14     15 -150.   .26 - .026
c
c CRAY-C90   91   16  656   1048576K *14     30 -600.   .13 - .006
c ---------- -- ---- ---- -------------- ------------ -------------
c
c
c ---------- -- ---- ---- -------------- ------------ -------------
c Microprocessor
c                         Primary        Average      1.5e+10 Flops
c Computer      Nr. Oper Memory(K=1024) Computing    Problem
c Vendor     YR Proc Regs ( *dec digits) Rate(MFlops) Time (Hours)
c ---------- -- ---- ---- -------------- ------------ -------------
c
c IBM-8086   81    1             64K *16         .002         2000.
c
c IBM-8087   81    1             64K *16         .009          440.
c
c IBM-80286  84    1            512K *16          .07          57.
c
c IBM-80386  87    1           2048K *16           .3          13.
c
c IBM-80486  89    1           4096K *16           .9           4.
c
c IBM-6M560  92    1          16384K *16     14 - 45.           .2
c ---------- -- ---- ---- -------------- ------------ -------------
c
c 1. IBM-650 Magnetic Drum Data Processing Machine, Manual of Operation,
c    Form 22-6060-1, pp. 79-83 (1953).
c 2. IBM-650 MDDPM Additional Features, Form 22-6258-0, p. 11 (1955).
c 3. IBM-704 Electronic Data Processing Machine, pp. 6, 91 (1954).
c 4. F.H. McMahon, The Livermore Fortran Kernels: A Computer Test Of The
c    Numerical Performance Range, LLNL, Livermore, CA, UCRL-53745 (1986).
c
c****************************************************************************
c
c
c
c              DEVELOPMENT HISTORY OF THE LIVERMORE LOOPS TEST PROGRAM
c
c     The first version of the LFK test (a.k.a. the Livermore Loops, circa
c     1970) consisted of 12 numerical Fortran kernels.  It was developed
c     and enhanced by F.H. McMahon unless noted otherwise below.
c     The author is grateful for the constructive criticism of colleagues:
c     J.Owens, H.Nelson, L.Berdahl, D.Fuss, L.Sloan, T.Rudy, M.Seager.
c     Mainframe computers of that era all provided CPU timers with
c     microsecond time resolution.  Each kernel was therefore executed
c     just once and timed with negligible experimental timing errors.
c
c     In 1980 the number of Fortran samples was doubled to 24 kernels
c     to represent a broad range of computational structures that would
c     challenge a compiler's capability to generate optimal machine code.
c
c     In 1983 the LFK test driver was extended to execute all 24 kernels
c     three times using three sets of DO loop limits (Avg: 18, 89, 468),
c     since parallel computer performance depends on scale or granularity.
c     These 72 sample statistics are more robust and definitive.
c
c     In 1985 a repetition loop was placed around each kernel so that each
c     kernel runs long enough for accurate timing with the standard UNIX
c     timer ETIME, which has a crude time resolution of 0.01 seconds.
c
c     In 1986 the LFK test driver was extended to run the entire test
c     seven times so that experimental timing errors for each of the
c     72 samples could be measured.  Reports of these timing errors
c     are necessary for honest scientific experiments.  See App. B, C:
c
c     F.H. McMahon, The Livermore Fortran Kernels:
c     A Computer Test Of The Numerical Performance Range,
c     Lawrence Livermore National Laboratory,
c     Livermore, California, UCRL-53745, December 1986.
c
c     In 1986 Greg Astfalk (AT&T) reprogrammed subroutine KERNEL, which
c     contains the 24 samples, in the C language.  This C module can then
c     be linked with the standard Fortran LFK test-driver program and
c     tested under benchmark conditions identical to those of the Fortran
c     samples.  This C module was refined at LLNL by K.O'Hair, C.Rasbold,
c     and M.Seager.
c
c     In 1990 the repetition loops around each kernel were modified
c     following reports of some code-hoisting by global optimization.
c     These repetition loops were submerged into function TEST, beyond
c     the scope of optimizers, so the 72 samples are now bullet-proof.
c     New, highly accurate, convergent methods to measure overhead time
c     were implemented (in VERIFY, SECOVT, TICK).
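c
c     Editorial worked example of the timer-resolution arithmetic behind
c     the single timing of circa 1970 and the 1985 repetition loop.  This
c     is a sketch.  The 0.001 sec single-execution time is HYPOTHETICAL,
c     and so is the name NREP, which stands in for the repetition control
c     that this program calls MULTI:
c
c        relative timing error  ~  timer resolution / measured time
c
c        microsecond timer (circa 1970), one execution of 0.001 sec:
c              error ~ 1.0E-06 / 0.001 = 0.001, i.e. 0.1%
c        ETIME (0.01 sec), one execution of 0.001 sec:
c              error ~ 0.01 / 0.001 = 10, i.e. 1000% (useless)
c        NREP executions, NREP * 0.001 sec .GE. 1 sec  =>  NREP .GE. 1000:
c              error ~ 0.01 / 1.0 = 0.01, i.e. 1%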
c
c     In 1991 the LFK test runtime control MULTI was increased twentyfold
c     for accurate timing when crude UNIX timers having poor time resolution
c     (Tmin= 0.01 sec) were used on very fast computers.  This was only a
c     temporary fix, since under UNIX each kernel must always run for
c     at least 1 sec for 1% accuracy, despite ever-increasing CPU speeds.
c     New algorithms were therefore implemented that automatically choose
c     values of MULTI large enough for accurate timing of the kernels on
c     any system.  A new method of repetition allows MULTI to be increased
c     indefinitely (MULTI >> 100) in future without causing overflow, while
c     still computing verifiable checksums.  New checksums were generated
c     using IEEE 754 standard floating-point hardware on SUN, SGI, and HP
c     workstations.  Operational accuracy of the test program is assured
c     in future.
c
c****************************************************************************
c
c
c
c
c/      PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
c/      PARAMETER( nk= 47, nl= 3, nr= 8 )
      parameter( ntimes= 18 )
c
      CHARACTER Komput*24, Kontrl*24, Kompil*24, Kalend*24, Identy*24
      COMMON /SYSID/ Komput, Kontrl, Kompil, Kalend, Identy
c
      COMMON /ALPHA/ mk,ik,im,ml,il,Mruns,Nruns,jr,iovec,NPFS(8,3,47)
      COMMON /ORDER/ inseq, match, NSTACK(20), isave, iret
      COMMON /TAU/   tclock, tsecov, testov, cumtim(4)
      DIMENSION  FLOPS(141), TR(141), RATES(141), ID(141)
      DIMENSION  LSPAN(141), WG(141), OSUM (141), TERR(141), TK(6)
cLOX  REAL*8 SECOND
cLLNL CALL DROPFILE ( '+MFLOPS' )
c                            Job start Cpu time
      cumtim(1)= 0.0d0
      ti= SECOND( cumtim(1))
c
c                  DEFINE YOUR COMPUTER SYSTEM:
      Komput = 'CRAY-YMP (6.0ns) '
      Kontrl = 'UNICOS fully loaded '
      Kompil = 'CFT77 4.0.3.4 '
      Kalend = '91.07.14 '
      Identy = 'Frank McMahon, LLNL '
c
c                  Initialize variables and Open Files
      CALL INDATA( TK, iou)
c                  Record name in active linkage chain in COMMON /DEBUG/
      CALL TRACE (' MAIN. ')
c
c                  Verify Sufficient Loop Size Versus Cpu Clock Accuracy
      CALL VERIFY( iou )
      tj= SECOND( cumtim(1))
      nt= ntimes
c                  Define control limits:  Nruns(runs), Loop(time)
      CALL SIZES(-1)
c
c                  Run test Mruns times Cpu-limited; I/O is deferred:
      DO 2 k= 1,Mruns
         i= k
         jr= MOD( i-1,7) + 1
         CALL IQRAN0( 256)
c                  Run test using one of 3 sets of DO-Loop spans:
c                  Set iou Negative to suppress all I/O during Cpu timing.
         DO 1 j= im,ml
            il= j
            tock= TICK( -iou, nt)
c
            CALL KERNEL( TK)
    1    continue
         CALL TRIAL( iou, i, ti, tj)
    2 continue
c
c                  Report timing errors, Mflops statistics:
      DO 3 j= im,ml
         il= j
         CALL RESULT( iou,FLOPS,TR,RATES,LSPAN,WG,OSUM,TERR,ID)
c
c                  Report Mflops for Vector Cpus( short, medium, long vectors):
c
         iovec= 0
         IF( iovec.EQ.1 ) THEN
            CALL REPORT( iou, mk,mk,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
         ENDIF
    3 continue
c                  Report Mflops SUMMARY Statistics: for Official Quotations
c
      CALL REPORT( iou,3*mk,mk,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
c
      cumtim(1)= 0.0d0
      totjob= SECOND( cumtim(1)) - ti - tsecov
      WRITE( iou,9) inseq, totjob, TK(1), TK(2)
      WRITE( *,9) inseq, totjob, TK(1), TK(2)
    9 FORMAT( '1',//,' Version: 22/DEC/86 mf528 ',2X,I12,/,1P,
     1 ' CHECK FOR CLOCK CALIBRATION ONLY: ',/,
     2 ' Total Job Cpu Time = ',e14.5, ' Sec.',/,
     3 ' Total 24 Kernels Time = ',e14.5, ' Sec.',/,
     4 ' Total 24 Kernels Flops= ',e14.5, ' Flops')
c
c                  Optional Cpu Clock Calibration Test of SECOND:
c     CALL CALIBR
      STOP
      END
c***********************************************
      BLOCK DATA
c***********************************************
c
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
cIBM  IMPLICIT REAL*8 (A-H,O-Z)
cout  DOUBLE PRECISION SUMS                                      REDUNDNT
c
c     l1 := param-dimension governs the size of most 1-d arrays
c     l2 := param-dimension governs the size of most 2-d arrays
c
c     ISPAN := Array of limits for DO loop control in the kernels
c     IPASS := Array of limits for multiple pass execution of each kernel
c     FLOPN := Array of floating-point operation counts for one pass thru kernel
c     WT    := Array of weights to average kernel execution rates.
c     SKALE := Array of scale factors for SIGNEL data generator.
c     BIAS  := Array of scale factors for SIGNEL data generator.
c
c     MUL   := Array of multipliers * FLOPN for each pass
c     WTP   := Array of multipliers * WT for each pass
c     FR    := Array of vectorisation fractions in REPORT
c     SUMW  := Array of quartile weights in REPORT
c     IQ    := Array of workload weights in REPORT
c     SUMS  := Array of Verified Checksums of Kernels results: Nruns= 1 and 7.
c
c/      PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
c/      PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 )
c/      PARAMETER( l14=2048, l16= 75, l416= 4*l16 , l21= 25 )
c
c/      PARAMETER( l1= 27, l2= 15, l1d= 2*1001 )
c/      PARAMETER( l13= 8, l13h= 8/2, l213= 8+4, l813= 8*8 )
c/      PARAMETER( l14= 16, l16= 15, l416= 4*15 , l21= 15)
c
c
c/      PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
c/      PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 )
c/      PARAMETER( l14= 2048, l16= 75, l416= 4*75 , l21= 25)
c
c/      PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
c/      PARAMETER( m1= 1001-1, m2= 101-1, m7= 1001-6 )
      parameter( nsys= 5, ns= nsys+1, nd= 11, nt= 4 )
c
      COMMON /SPACES/ ion,j5,k2,k3,MULTI,laps,Loop,m,kr,LP,n13h,ibuf,nx,
     1 L,npass,nfail,n,n1,n2,n13,n213,n813,n14,n16,n416,n21,nt1,nt2,
     2 last,idebug,mpy,Loops2,mucho,mpylim, intbuf(16)
c
      COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), ticks,
     1 FR(9), TERR1(47), SUMW(7), START,
     2 SKALE(47), BIAS(47), WS(95), TOTAL(47), FLOPN(47),
     3 IQ(7), NPF, NPFS1(47)
c
      CHARACTER NAMES*8
      COMMON /TAGS/ NAMES(nd,nt)
      COMMON /RATS/ RATED(nd,nt)
      COMMON /SPACEI/ WTP(3), MUL(3), ISPAN(47,3), IPASS(47,3)
c
      COMMON /ORDER/ inseq, match, NSTACK(20), isave, iret
c
      COMMON /PROOF/ SUMS(24,3,8)
c     ****************************************************************
c
      DATA ( ISPAN(i,1), i= 1,47) /
     1 1001, 101, 1001, 1001, 1001, 64, 995, 100,
     2 101, 101, 1001, 1000, 64, 1001, 101, 75,
     3 101, 100, 101, 1000, 101, 101, 100, 1001, 23*0/
c
c*   : l1, l2, l1, l1, l1, l13, m7, m2,
c*   : l2, l2, l1, m1, l13, l1, l2, l16,
c*   : l2, m2, l2, m1, l21, l2, m2, l1, 23*0/
c
      DATA ( ISPAN(i,2), i= 1,47) /
     1 101, 101, 101, 101, 101, 32, 101, 100,
     2 101, 101, 101, 100, 32, 101, 101, 40,
     3 101, 100, 101, 100, 50, 101, 100, 101, 23*0/
c
      DATA ( ISPAN(i,3), i= 1,47) /
     1 27, 15, 27, 27, 27, 8, 21, 14,
     2 15, 15, 27, 26, 8, 27, 15, 15,
     3 15, 14, 15, 26, 20, 15, 14, 27, 23*0/
c
      DATA ( IPASS(i,1), i= 1,47) /
     1 7, 67, 9, 14, 10, 3, 4, 10, 36, 34, 11, 12,
     2 36, 2, 1, 25, 35, 2, 39, 1, 1, 11, 8, 5, 23*0/
c
      DATA ( IPASS(i,2), i= 1,47) /
     1 40, 40, 53, 70, 55, 7, 22, 6, 21, 19, 64, 68,
     2 41, 10, 1, 27, 20, 1, 23, 8, 1, 7, 5, 31, 23*0/
c
      DATA ( IPASS(i,3), i= 1,47) /
     1 28, 46, 37, 38, 40, 21, 20, 9, 26, 25, 46, 48,
     2 31, 8, 1, 14, 26, 2, 28, 7, 1, 8, 7, 23, 23*0/
c
      DATA ( MUL(i), i= 1,3) / 1, 2, 8 /
      DATA ( WTP(i), i= 1,3) / 1.0, 2.0, 1.0 /
c
c     The following flop-counts (FLOPN) are required for scalar or serial
c     execution.  The scalar version defines the NECESSARY computation
c     generally, in the absence of proof to the contrary.  The vector
c     or parallel executions are only credited with executing the same
c     necessary computation.  If the parallel methods do more computation
c     than is necessary, the extra flops are not counted as throughput.
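c
c     Editorial worked example of flop crediting for Kernel 1, span set 1.
c     This is a sketch.  It ASSUMES that the kernel's inner loop runs ISPAN
c     times per pass and that IPASS counts the passes.  The values come
c     from the ISPAN and IPASS DATA statements above and from the FLOPN
c     DATA statement that follows:
c
c        FLOPN(1) = 5.,   ISPAN(1,1) = 1001,   IPASS(1,1) = 7
c        credited flops per pass  =  5 * 1001  =   5005
c        credited flops, 7 passes =  7 * 5005  =  35035
c        MFlops = credited flops / ( CPU seconds * 1.0E+06 )
c
c     A vector version that executed, say, 6 flops per iteration
c     (HYPOTHETICAL) would still be credited with only FLOPN(1) = 5.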
c
      DATA ( FLOPN(i), i= 1,47)
     1 /5., 4., 2., 2., 2., 2., 16., 36., 17., 9., 1., 1.,
     2 7., 11., 33.,10., 9., 44., 6., 26., 2., 17., 11., 1., 23*0.0/
c
      DATA ( WT(i), i= 1,47) /
     1 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
     2 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
     3 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 23*0.0/
c
c
      DATA ( SKALE(i), i= 1,47) /
     1 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0,
     2 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0,
     3 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0,
     4 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0, 0.100D+0,
     5 23*0.000D+0 /
c
c    : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
c    : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
c    : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 23*0.0/
c
      DATA ( BIAS(i), i= 1,47) /
     1 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
     2 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
     3 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 23*0.0/
c
      DATA ( FR(i), i= 1,9) /
     1 0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0/
c
      DATA ( SUMW(i), i= 1,7) /
     1 1.0, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5/
c
      DATA ( IQ(i), i= 1,7) /
     1 1, 2, 1, 2, 1, 2, 1/
c
c
c
c     NEC SX-3/14
c
      DATA ( NAMES(1,i), i= 1,3) /
     1 'NEC ', 'SX-3/14 ', 'F77v.012' /
c
      DATA ( RATED(1,i), i= 1,4) /
     1 311.82, 95.59, 38.73, 499.78 /
c
c     FUJITSU VP2600
c
      DATA ( NAMES(2,i), i= 1,3) /
     1 'FUJITSU ','VP2600 ','F77 V12' /
c
      DATA ( RATED(2,i), i= 1,4) /
     1 295.79, 93.03, 42.26, 514.49 /
c
c     CRAY-Y16/1 C90
c
      DATA ( NAMES(3,i), i= 1,3) /
     1 'CRAY ', 'Y16/1C90', 'CF77 5.0' /
c
      DATA ( RATED(3,i), i= 1,4) /
     1 190.56, 86.27, 40.73, 227.25 /
c
c     IBM 6000/560
c
      DATA ( NAMES(4,i), i= 1,3) /
     1 'IBM ', '6000/560', 'XLF 2.2.' /
c
      DATA ( RATED(4,i), i= 1,4) /
     1 27.15, 20.21, 14.52, 20.15 /
c
c     HP 9000/730
c
      DATA ( NAMES(5,i), i= 1,3) /
     1 'HP ', '9000/730', 'f77 8.05' /
c
      DATA ( RATED(5,i), i= 1,4) /
     1 18.31, 15.72, 13.28, 9.68 /
c
c
c
c
c     CRAY-YMP/1
c
c      DATA ( NAMES(2,i), i= 1,3) /
c    : 'CRAY ', 'YMP/1 ', 'CFT771.2' /
c
c      DATA ( RATED(2,i), i= 1,4) /
c    : 78.23, 36.63, 17.66, 86.75 /
c
c     IBM 3090S180
c
c      DATA ( NAMES(2,i), i= 1,3) /
c    : 'IBM ', '3090s180', 'VSF2.2.0' /
c
c      DATA ( RATED(2,i), i= 1,4) /
c    : 17.56, 12.23, 9.02, 16.32 /
c
c     IBM 6000/540
c
c      DATA ( NAMES(4,i), i= 1,3) /
c    : 'IBM ', '6000/540', 'XL v0.90' /
c
c      DATA ( RATED(4,i), i= 1,4) /
c    : 14.17, 10.73, 7.45, 9.59 /
c
c     COMPAQ i486/25
c
c      DATA ( NAMES(5,i), i= 1,3) /
c    : 'COMPAQ ', 'i486/25 ', ' ' /
c
c      DATA ( RATED(5,i), i= 1,4) /
c    : 1.15, 1.05, 0.92, 0.48 /
c
c
c
c
      DATA START /0.0/, NPF/0/, ibuf/0/, match/0/, MULTI/200/, laps/1/
      DATA npass/0/, nfail/0/, last/-1/
c
c     MULTI= 200
c
      DATA ( SUMS(i,1,5), i= 1,24 ) /
     15.114652693224671D+04,1.539721811668385D+03,1.000742883066363D+01,
     25.999250595473891D-01,4.548871642387267D+03,4.375116344729986D+03,
     36.104251075174761D+04,1.501268005625798D+05,1.189443609974981D+05,
     47.310369784325296D+04,3.342910972650109D+07,2.907141294167248D-05,
     51.202533961842803D+11,3.165553044000334D+09,3.943816690352042D+04,
     65.650760000000000D+05,1.114641772902486D+03,1.015727037502300D+05,
     75.421816960147207D+02,3.040644339351239D+07,1.597308280710199D+08,
     82.938604376566697D+02,3.549900501563623D+04,5.000000000000000D+02/
c
      DATA ( SUMS(i,2,5), i= 1,24 ) /
     15.253344778937972D+02,1.539721811668385D+03,1.009741436578952D+00,
     25.999250595473891D-01,4.589031939600982D+01,8.631675645333210D+01,
     36.345586315784055D+02,1.501268005625798D+05,1.189443609974981D+05,
     47.310369784325296D+04,3.433560407475758D+04,7.127569130821465D-06,
     59.816387810944345D+10,3.039983465145393D+07,3.943816690352042D+04,
     66.480410000000000D+05,1.114641772902486D+03,1.015727037502300D+05,
75.421816960147207D+02,3.126205178815431D+04,7.824524877232093D+07, 82.938604376566697D+02,3.549900501563623D+04,5.000000000000000D+01/ c DATA ( SUMS(i,3,5), i= 1,24 ) / 13.855104502494961D+01,3.953296986903059D+01,2.699309089320672D-01, 25.999250595473891D-01,3.182615248447483D+00,1.120309393467088D+00, 32.845720217644024D+01,2.960543667875003D+03,2.623968460874250D+03, 41.651291227698265D+03,6.551161335845770D+02,1.943435981130448D-06, 53.847124199949426D+10,2.923540598672011D+06,1.108997288134785D+03, 65.152160000000000D+05,2.947368618589360D+01,9.700646212337040D+02, 71.268230698051003D+01,5.987713249475302D+02,5.009945671204667D+07, 86.109968728263972D+00,4.850340602749970D+02,1.300000000000000D+01/ c c MULTI= 100 c DATA ( SUMS(i,1,4), i= 1,24 ) / 15.114652693224671D+04,1.539721811668385D+03,1.000742883066363D+01, 25.999250595473891D-01,4.548871642387267D+03,4.375116344729986D+03, 36.104251075174761D+04,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.342910972650109D+07,2.907141294167248D-05, 54.958101723583047D+10,3.165278275112100D+09,3.943816690352042D+04, 62.825760000000000D+05,1.114641772902486D+03,7.507386432940455D+04, 75.421816960147207D+02,3.040644339351239D+07,8.002484742089500D+07, 82.938604376566697D+02,3.549900501563623D+04,5.000000000000000D+02/ c DATA ( SUMS(i,2,4), i= 1,24 ) / 15.253344778937972D+02,1.539721811668385D+03,1.009741436578952D+00, 25.999250595473891D-01,4.589031939600982D+01,8.631675645333210D+01, 36.345586315784055D+02,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.433560407475758D+04,7.127569130821465D-06, 53.542728632259964D+10,3.015943681556781D+07,3.943816690352042D+04, 63.240410000000000D+05,1.114641772902486D+03,7.507386432940455D+04, 75.421816960147207D+02,3.126205178815431D+04,3.916171317449981D+07, 82.938604376566697D+02,3.549900501563623D+04,5.000000000000000D+01/ c DATA ( SUMS(i,3,4), i= 1,24 ) / 13.855104502494961D+01,3.953296986903059D+01,2.699309089320672D-01, 
25.999250595473891D-01,3.182615248447483D+00,1.120309393467088D+00, 32.845720217644024D+01,2.960543667875003D+03,2.623968460874250D+03, 41.651291227698265D+03,6.551161335845770D+02,1.943435981130448D-06, 51.161063924078402D+10,2.609194549277411D+06,1.108997288134785D+03, 62.576160000000000D+05,2.947368618589360D+01,9.700646212337040D+02, 71.268230698051003D+01,5.987713249475302D+02,2.505599006414913D+07, 86.109968728263972D+00,4.850340602749970D+02,1.300000000000000D+01/ c c MULTI= 50 c DATA ( SUMS(i,1,3), i= 1,24 ) / 15.114652693224671D+04,1.539721811668385D+03,1.000742883066363D+01, 25.999250595473891D-01,4.548871642387267D+03,4.375116344729986D+03, 36.104251075174761D+04,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.342910972650109D+07,2.907141294167248D-05, 52.217514090251080D+10,3.165140890667983D+09,3.943816690352042D+04, 61.413260000000000D+05,1.114641772902486D+03,6.203834985242972D+04, 75.421816960147207D+02,3.040644339351239D+07,4.017185709583275D+07, 82.938604376566697D+02,3.549900501563623D+04,5.000000000000000D+02/ c DATA ( SUMS(i,2,3), i= 1,24 ) / 15.253344778937972D+02,1.539721811668385D+03,1.009741436578952D+00, 25.999250595473891D-01,4.589031939600982D+01,8.631675645333210D+01, 36.345586315784055D+02,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.433560407475758D+04,7.127569130821465D-06, 51.430504282675192D+10,3.003923789762475D+07,3.943816690352042D+04, 61.620410000000000D+05,1.114641772902486D+03,6.203834985242972D+04, 75.421816960147207D+02,3.126205178815431D+04,1.961994537558922D+07, 82.938604376566697D+02,3.549900501563623D+04,5.000000000000000D+01/ c DATA ( SUMS(i,3,3), i= 1,24 ) / 13.855104502494961D+01,3.953296986903059D+01,2.699309089320672D-01, 25.999250595473891D-01,3.182615248447483D+00,1.120309393467088D+00, 32.845720217644024D+01,2.960543667875003D+03,2.623968460874250D+03, 41.651291227698265D+03,6.551161335845770D+02,1.943435981130448D-06, 
53.899370197966012D+09,2.452021524580127D+06,1.108997288134785D+03, 61.288160000000000D+05,2.947368618589360D+01,9.700646212337040D+02, 71.268230698051003D+01,5.987713249475302D+02,1.253425674020030D+07, 86.109968728263972D+00,4.850340602749970D+02,1.300000000000000D+01/ c c c MULTI= 10 Old Checksums used before 1991 (longer run-times were needed) c DATA ( SUMS(i,1,2), i= 1,24 ) / 15.114652693224671D+04,1.539721811668385D+03,1.000742883066363D+01, 25.999250595473891D-01,4.548871642387267D+03,4.375116344729986D+03, 36.104251075174761D+04,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.342910972650109D+07,2.907141294167248D-05, 54.057110454105199D+09,3.165030983112689D+09,3.943816690352042D+04, 62.832600000000000D+04,1.114641772902486D+03,5.165625410754861D+04, 75.421816960147207D+02,3.040644339351239D+07,8.289464835782872D+06, 82.938604376566697D+02,3.549834542443621D+04,5.000000000000000D+02/ c DATA ( SUMS(i,2,2), i= 1,24 ) / 15.253344778937972D+02,1.539721811668385D+03,1.009741436578952D+00, 25.999250595473891D-01,4.589031939600982D+01,8.631675645333210D+01, 36.345586315784055D+02,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.433560407475758D+04,7.127569130821465D-06, 52.325318944820753D+09,2.994307876327030D+07,3.943816690352042D+04, 63.244100000000000D+04,1.114641772902486D+03,5.165625410754861D+04, 75.421816960147207D+02,3.126205178815431D+04,3.986531136460764D+06, 82.938604376566697D+02,3.549894609774404D+04,5.000000000000000D+01/ c DATA ( SUMS(i,3,2), i= 1,24 ) / 13.855104502494961D+01,3.953296986903059D+01,2.699309089320672D-01, 25.999250595473891D-01,3.182615248447483D+00,1.120309393467088D+00, 32.845720217644024D+01,2.960543667875003D+03,2.623968460874250D+03, 41.651291227698265D+03,6.551161335845770D+02,1.943435981130448D-06, 54.755211251524082D+08,2.326283104822299D+06,1.108997288134785D+03, 62.577600000000000D+04,2.947368618589360D+01,9.700646212337040D+02, 
71.268230698051003D+01,5.987713249475302D+02,2.516870081041265D+06, 86.109968728263972D+00,4.850340602749970D+02,1.300000000000000D+01/ c c MULTI= 1 Old Checksums used before 1986 (longer run-times were needed) c DATA ( SUMS(i,1,1), i= 1,24 ) / 15.114652693224671D+04,1.539721811668385D+03,1.000742883066363D+01, 25.999250595473891D-01,4.548871642387267D+03,4.375116344729986D+03, 36.104251075174761D+04,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.342910972650109D+07,2.907141294167248D-05, 54.468741170140841D+08,3.165006253912748D+09,3.943816690352042D+04, 62.901000000000000D+03,1.227055736845479D+03,4.932243865816480D+04, 75.421816960147207D+02,3.040644339351239D+07,1.115926577271652D+06, 82.938604376566697D+02,3.138872788135057D+04,5.000000000000000D+02/ c DATA ( SUMS(i,2,1), i= 1,24 ) / 15.253344778937972D+02,1.539721811668385D+03,1.009741436578952D+00, 25.999250595473891D-01,4.589031939600982D+01,8.631675645333210D+01, 36.345586315784055D+02,1.501268005625798D+05,1.189443609974981D+05, 47.310369784325296D+04,3.433560407475758D+04,7.127569130821465D-06, 52.323352389500009D+08,2.992144295804055D+07,3.943816690352042D+04, 63.281000000000000D+03,1.114641772902486D+03,4.932243865816480D+04, 75.421816960147207D+02,3.126205178815431D+04,4.690129326568575D+05, 82.938604376566697D+02,3.228104575530876D+04,5.000000000000000D+01/ c DATA ( SUMS(i,3,1), i= 1,24 ) / 13.855104502494961D+01,3.953296986903059D+01,2.699309089320672D-01, 25.999250595473891D-01,3.182615248447483D+00,1.120309393467088D+00, 32.845720217644024D+01,2.960543667875003D+03,2.623968460874250D+03, 41.651291227698265D+03,6.551161335845770D+02,1.943435981130448D-06, 54.143805389489125D+07,2.297991960376787D+06,1.108997288134785D+03, 62.592000000000000D+03,2.947368618589360D+01,9.700646212337040D+02, 71.268230698051003D+01,5.987713249475302D+02,2.629580827304779D+05, 86.109968728263972D+00,4.850340602749970D+02,1.300000000000000D+01/ c 
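c
c     Comparing checksums.  Each SUMS(i,jr,il) value above is a
c     verified checksum that a run's CSUM(i) should reproduce, and the
c     INDEX lists SEQDIG for counting equal significant digits in such
c     pairs.  The commented function below is only a sketch of that
c     idea under stated assumptions: the name NEQDIG is hypothetical,
c     it is NOT the SEQDIG routine of this program, and it assumes a
c     DOUBLE PRECISION result carries at most 16 significant digits.
c
c      INTEGER FUNCTION NEQDIG( a, b)
c      DOUBLE PRECISION a, b, d
cc                  equal values (including a = b = 0) agree fully
c      NEQDIG= 16
c      IF( a .EQ. b ) RETURN
cc                  relative difference, scaled by the larger magnitude
c      d= ABS( a - b) / MAX( ABS( a), ABS( b))
cc                  about -LOG10(d) digits agree; clip count to 0..16
c      NEQDIG= MAX( 0, MIN( 16, INT( -LOG10( d))))
c      RETURN
c      END
c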
c**************************************************************************** c c The following DP checksums are NOT used for the standard LFK c performance test but may be used to test Fortran compiler precision. c c Checksums for Quadruple-Precision (IBM,DEC); or CRAY Double-Precision. c Quadruple precision checksums computed by Dr. D.S. Lindsay, HITACHI. c These Checksums were obtained with MULTI= 10. (BLOCKDATA) c Change the numerical edit descriptor Q to D on CRAY systems. cQc cQ DATA ( SUMS(i,1,1), i= 1,24 ) / cQ a 0.5114652693224705102247326Q+05, 0.5150345372943066022569677Q+03, cQ b 0.1000742883066623145122027Q+02, 0.5999250595474070357564935Q+00, cQ c 0.4548871642388544199267412Q+04, 0.5229095383954675635496207Q+13, cQ d 0.6104251075163778121943921Q+05, 0.1501268005627157186827043Q+06, cQ e 0.1189443609975085966254160Q+06, 0.7310369784325972183233686Q+05, cQ f 0.3342910972650530676553892Q+08, 0.2907141428639174056565229Q-04, cQ g 0.4057110454105263471505061Q+10, 0.2982036205992255154832180Q+10, cQ h 0.3943816690352311804312052Q+05, 0.2832600000000000000000000Q+05, cQ i 0.1114641772903091760464680Q+04, 0.5165625410757306606559174Q+05, cQ j 0.5421816960150398899460410Q+03, 0.3040644339317275409518862Q+08, cQ k 0.8289464835786202431495974Q+07, 0.2938604376567099667790619Q+03, cQ l 0.3549834542446150511553453Q+05, 0.5000000000000000000000000Q+03/ cQc cQ DATA ( SUMS(i,2,1), i= 1,24 ) / cQ a 0.5253344778938000681994399Q+03, 0.5150345372943066022569677Q+03, cQ b 0.1009741436579188086885138Q+01, 0.5999250595474070357564935Q+00, cQ c 0.4589031939602131581035992Q+02, 0.2693280957416549457193910Q+16, cQ d 0.6345586315772524401198340Q+03, 0.1501268005627157186827043Q+06, cQ e 0.1189443609975085966254160Q+06, 0.7310369784325972183233686Q+05, cQ f 0.3433560407476162346605343Q+05, 0.7127569144561925151361427Q-05, cQ g 0.2325318944820836005421577Q+10, 0.3045676741897511424188763Q+08, cQ h 0.3943816690352311804312052Q+05, 0.3244100000000000000000000Q+05, cQ i 
0.1114641772903091760464680Q+04, 0.5165625410757306606559174Q+05, cQ j 0.5421816960150398899460410Q+03, 0.3126205178811007613028089Q+05, cQ k 0.3986531136462291709063170Q+07, 0.2938604376567099667790619Q+03, cQ l 0.3549894609776936556634240Q+05, 0.5000000000000000000000000Q+02/ cQc cQ DATA ( SUMS(i,3,1), i= 1,24 ) / cQ a 0.3855104502494983491740258Q+02, 0.1199847611437483513040755Q+02, cQ b 0.2699309089321296439173090Q+00, 0.5999250595474070357564935Q+00, cQ c 0.3182615248448271678796560Q+01, 0.8303480073326955433087865Q+12, cQ d 0.2845720217638848365786224Q+02, 0.2960543667877649943946702Q+04, cQ e 0.2623968460874419268457298Q+04, 0.1651291227698377392796690Q+04, cQ f 0.6551161335846537217862474Q+03, 0.1943435981776804808483341Q-05, cQ g 0.4755211251524563699634913Q+09, 0.2547733008933910800455698Q+07, cQ h 0.1108997288135066584075059Q+04, 0.2577600000000000000000000Q+05, cQ i 0.2947368618590713935189324Q+02, 0.9700646212341513210532085Q+03, cQ j 0.1268230698051747067958265Q+02, 0.5987713249471801461035250Q+03, cQ k 0.2516870081042209239664473Q+07, 0.6109968728264795136407718Q+01, cQ l 0.4850340602751675804605762Q+03, 0.1300000000000000000000000Q+02/ cQc END c c c*************************************** SUBROUTINE CALIBR c*********************************************************************** c * c CALIBR - Cpu clock calibration tests accuracy of SECOND function.* c * c CALIBR tests function SECOND by using it to time a computation * c repeatedly. These SECOND timings are written to stdout(terminal)* c one at a time as the cpu-clock is read, so we can observe a real * c external clock time and thus check the accuracy of SECOND code. * c Comparisons with an external clock require a stand-alone run. * c Otherwise compare with system charge for total job cpu time. * c * c Sample Output from CRAY-YMP1: * c * c * c CPU CLOCK CALIBRATION: START STOPWATCH NOW ! 
* c TESTS ACCURACY OF FUNCTION SECOND() * c Monoprocess this test, stand-alone, no TSS * c Verify T or DT observe external clock: * c * c ------- ------- ------ ----- * c Total T ? Delta T ? Mflops ? Flops * c ------- ------- ------ ----- * c 1 0.00 0.00 9.15 4.00000e+04 4.98000e-02 * c 2 0.01 0.01 11.67 1.20000e+05 8.98000e-02 * c 3 0.02 0.01 12.84 2.80000e+05 1.69800e-01 * c 4 0.04 0.02 13.47 6.00000e+05 3.29800e-01 * c 5 0.09 0.05 13.81 1.24000e+06 6.49800e-01 * c 6 0.18 0.09 14.00 2.52000e+06 1.28980e+00 * c 7 0.36 0.18 14.12 5.08000e+06 2.56980e+00 * c 8 0.72 0.36 14.19 1.02000e+07 5.12980e+00 * c 9 1.44 0.72 14.20 2.04400e+07 1.02498e+01 * c 10 2.88 1.44 14.23 4.09200e+07 2.04898e+01 * c 11 5.74 2.87 14.27 8.18800e+07 4.09698e+01 * c 12 11.48 5.74 14.27 1.63800e+08 8.19298e+01 * c 13 22.98 11.50 14.26 3.27640e+08 1.63850e+02 * c 14 45.92 22.94 14.27 6.55320e+08 3.27690e+02 * c 15 91.88 45.96 14.26 1.31068e+09 6.55369e+02 * c*********************************************************************** IMPLICIT DOUBLE PRECISION (A-H,O-Z) cIBM IMPLICIT REAL*8 (A-H,O-Z) c parameter( limitn= 101, ndim= limitn+10 ) DIMENSION X(ndim), Y(ndim), cumtim(10) c c CALL TRACE ('CALIBR ') cumtim(1)= 0.0d0 t0= SECOND( cumtim(1)) c WRITE( *,111) WRITE( *,110) WRITE( *,112) WRITE( *,113) WRITE( *,114) WRITE( *,115) WRITE( *,114) 111 FORMAT(//,' CPU CLOCK CALIBRATION: START STOPWATCH NOW !') 110 FORMAT(' TESTS ACCURACY OF FUNCTION SECOND()') 112 FORMAT(' Monoprocess this test, stand-alone, no TSS') 113 FORMAT(' Verify T or DT observe external clock:',/) 114 FORMAT(' ------- ------- ------ -----') 115 FORMAT(' Total T ? Delta T ? Mflops ? Flops') 119 FORMAT(4X,I2,3F12.2,2E15.5) c l= 0 n= 0 m= 200 nflop= 0 totalt= 0.00d0 deltat= 0.00d0 flops= 0.00d0 rn= 0.00d0 t1= 0.00d0 t2= 0.00d0 cumtim(1)= 0.0d0 t2= SECOND( cumtim(1)) IF( t2.GT. 1.00d04 ) GO TO 911 IF( t2.LT. 
1.00d-8 ) GO TO 911
c
   10 l= l + 1
      m= m + m
c
      X(1)= 0.0098000d0
      Y(1)= 0.0000010d0
      DO 2 i= 2,limitn
          Y(i)= Y(1)
    2 continue
c                            Compute LFK Kernel 11 m times
      DO 5 j= 1,m
          DO 4 k= 2,limitn
              X(k)= X(k-1) + Y(k)
    4     continue
          X(1)= X(limitn)
    5 continue
c
      t1= t2
      cumtim(1)= 0.0d0
      t2= SECOND( cumtim(1))
c                            IF elapsed time can be observed, Print Mark.
      totalt= t2 - t0
      deltat= t2 - t1
      nflop= nflop + (limitn - 1) * m
      IF( deltat .GT. 2.00d0 .OR. l.GT.12 ) THEN
          n= n + 1
          rn= REAL( nflop)
          flops= 1.00d-6 *( REAL( nflop)/( totalt +1.00d-9))
          WRITE( *,119) l, totalt, deltat, flops, rn, X(limitn)
      ENDIF
      IF( deltat .LT. 200.0d0 .OR. n.LT.3 ) GO TO 10
c
      IF( n.LE.0 ) THEN
          WRITE( *,119) l, totalt, deltat, flops, rn, X(limitn)
      ENDIF
      STOP
c
  911 WRITE( *,61)
      WRITE( *,62) totalt
      STOP
c
   61 FORMAT(1X,'FATAL(CALIBR): cannot measure time using func ',
     1          'SECOND()')
   62 FORMAT(/,13X,'using SECOND(): totalt=',1E20.8,' ?')
c
      END
c
c***********************************************
      SUBROUTINE INDEX
c***********************************************
c  MODULE   PURPOSE
c  ------   -----------------------------------------------
c
c  CALIBR   cpu clock calibration tests accuracy of SECOND function
c
c  INDATA   initialize variables
c
c  IQRANF   computes a vector of pseudo-random indices
c  IQRAN0   define seed for new IQRANF sequence
c
c  KERNEL   executes 24 samples of Fortran computation
c
c  PFM      optional call to system hardware performance monitor
c
c  RELERR   relative error between u,v (0.,1.)
c
c  REPORT   prints timing results
c
c  RESULT   computes execution rates into pushdown store
c
c  SECOND   cumulative CPU time for task in seconds (M.K.S. units)
c
c  SECOVT   measures the Overhead time for calling SECOND
c
c  SENSIT   sensitivity analysis of harmonic mean to 49 workloads
c
c  SEQDIG   computes nr significant, equal digits in pairs of numbers
c
c  SIGNEL   generates a set of floating-point numbers near 1.0
c
c  SIMD     sensitivity analysis of harmonic mean to SISD/SIMD model
c
c  SIZES    test and set the loop controls before each kernel test
c
c  SORDID   simple sort
c
c  SPACE    sets memory pointers for array variables (optional)
c
c  SPEDUP   computes Speed-ups: A circumspect method of comparison.
c
c  STATS    calculates unweighted statistics
c
c  STATW    calculates weighted statistics
c
c  SUMO     check-sum with ordinal dependency
c
c  SUPPLY   initializes common blocks containing type real arrays.
c
c  TALLY    computes average and minimum Cpu timings and variances.
c
c  TDIGIT   counts lead digits followed by trailing zeroes
c
c  TEST     Repeats and times the execution of each kernel
c
c  TESTS    Checksums and initializes the data for each kernel test
c
c  TICK     measures timing overhead of subroutine TEST
c
c  TILE     computes m-tile value and corresponding index
c
c  TRACE, TRACK  push/pop caller's name and serial nr. in /DEBUG/
c
c  TRAP     checks that index-list values are in valid domain
c
c  TRIAL    validates checksums of current run for endurance trial
c
c  VALID    compresses valid timing results
c
c  VALUES   initializes special values
c
c  VERIFY   verifies sufficient Loop size versus cpu clock accuracy
c
c  WATCH    can continually test COMMON variables and localize bugs
c
c ------------ -------- -------- -------- -------- -------- --------
c ENTRY LEVELS:   1        2        3        4        5        6
c ------------ -------- -------- -------- -------- -------- --------
c MAIN.
SECOND c INDATA c VERIFY SECOND c SIZES IQRAN0 c STATS SQRT c TDIGIT LOG10 c SIZES IQRAN0 c c TICK TEST TESTS SECOND c SIZES c SUMO c VALUES SUPPLY SIGNEL c IQRANF MOD c SECOND c VALID TRAP TRAP c STATS SQRT c IQRANF MOD c TRAP c KERNEL SPACE c SQRT c EXP c TEST TESTS SECOND c SIZES c SUMO c VALUES SUPPLY SIGNEL c IQRANF MOD c SECOND c TRIAL SEQDIG LOG10 TDIGIT c IQRAN0 c c RESULT TALLY SIZES IQRAN0 TRAP c PAGE c STATS SQRT c c SEQDIG LOG10 TDIGIT c c REPORT VALID TRAP c MOD c STATW SORDID TRAP c TILE c SQRT c LOG10 c PAGE c TRAP c SENSIT VALID TRAP c SORDID TRAP c PAGE c STATW SORDID TRAP c TILE c SIMD VALID TRAP c STATW SORDID TRAP c TILE c SPEDUP c STOP c c c c c All subroutines also call TRACE , TRACK , and WATCH to assist debugging. c c c c c c c c ------ ---- ------ ----- ------------------------------------ c BASE TYPE CLASS NAME GLOSSARY c ------ ---- ------ ----- ------------------------------------ c SPACE0 R Array BIAS - scale factors for SIGNEL data generator c SPACE0 R Array CSUM - checksums of KERNEL result arrays c BETA R Array CSUMS - sets of CSUM for all test runs c BETA R Array DOS - sets of TOTAL flops for all test runs c SPACE0 R Array FLOPN - flop counts for one execution pass c BETA R Array FOPN - sets of FLOPN for all test runs c SPACE0 R Array FR - vectorisation fractions; abscissa for REPORT c SPACES I scalar ibuf - flag enables one call to SIGNEL c ALPHA I scalar ik - current number of executing kernel c ALPHA I scalar il - selects one of three sets of loop spans c SPACES I scalar ion - logical I/O unit number for output c SPACEI I Array IPASS - Loop control limits for multiple-pass loops c SPACE0 I Array IQ - set of workload weights for REPORT c SPACEI I Array ISPAN - loop control limits for each kernel c SPACES I scalar j5 - datum in kernel 16 c ALPHA I scalar jr - current test run number (1 thru 7) c SPACES I scalar k2 - counter in kernel 16 c SPACES I scalar k3 - counter in kernel 16 c SPACES I scalar kr - a copy of mk c SPACES I 
scalar laps  - multiplies Nruns for long Endurance test
c SPACES I    scalar Loop  - current multiple-pass loop limit in KERNEL
c SPACES I    scalar m     - temp integer datum
c ALPHA  I    scalar mk    - number of kernels to evaluate .LE.24
c ALPHA  I    scalar ml    - maximum value of il= 3
c SPACES I    scalar mpy   - repetition counter of MULTI pass loop
c SPACES I    scalar Loops2- repetition loop limit
c ALPHA  I    scalar Mruns - number of complete test runs .GE.Nruns
c SPACEI I    Array  MUL   - multipliers * IPASS defines Loop
c SPACES I    scalar MULTI - Multiplier used to compute Loop in SIZES
c SPACES I    scalar n     - current DO loop limit in KERNEL
c SPACES I    scalar n1    - dimension of most 1-D arrays
c SPACES I    scalar n13   - dimension used in kernel 13
c SPACES I    scalar n13h  - dimension used in kernel 13
c SPACES I    scalar n14   - dimension used in kernel 14
c SPACES I    scalar n16   - dimension used in kernel 16
c SPACES I    scalar n2    - dimension of most 2-D arrays
c SPACES I    scalar n21   - dimension used in kernel 21
c SPACES I    scalar n213  - dimension used in kernel 21
c SPACES I    scalar n416  - dimension used in kernel 16
c SPACES I    scalar n813  - dimension used in kernel 13
c SPACE0 I    scalar npf   - temp integer datum
c ALPHA  I    Array  NPFS  - sets of NPFS1 for all test runs
c SPACE0 I    Array  NPFS1 - number of page-faults for each kernel
c ALPHA  I    scalar Nruns - number of complete test runs .LE.7
c SPACES I    scalar nt1   - total size of common -SPACE1- words
c SPACES I    scalar nt2   - total size of common -SPACE2- words
c BETA   R    Array  SEE   - (i,1,jr,il) sets of TEST overhead times
c BETA   R    Array  SEE   - (i,2,jr,il) sets of csums of SPACE1
c BETA   R    Array  SEE   - (i,3,jr,il) sets of csums of SPACE2
c SPACE0 R    Array  SKALE - scale factors for SIGNEL data generator
c SPACE0 R    scalar start - temp start time of each kernel
c PROOF  R    Array  SUMS  - sets of verified checksums for all test runs
c SPACE0 R    Array  SUMW  - set of quartile weights for REPORT
c TAU    R    scalar tclock- minimum cpu clock time= resolution
c SPACE0 R    Array  TERR1 - overhead-time errors for each kernel
c BETA   R    Array  TERRS - sets of TERR1 for all runs
c TAU    R    scalar testov- average overhead time in TEST linkage
c BETA   R    scalar tic   - average overhead time in SECOND (copy)
c SPACE0 R    scalar ticks - average overhead time in TEST linkage (copy)
c SPACE0 R    Array  TIME  - net execution times for all kernels
c BETA   R    Array  TIMES - sets of TIME for all test runs
c SPACE0 R    Array  TOTAL - total flops computed by each kernel
c TAU    R    scalar tsecov- average overhead time in SECOND
c SPACE0 R    Array  WS    - unused
c SPACE0 R    Array  WT    - weights for each kernel sample
c SPACEI R    Array  WTP   - weights for the 3 span-varying passes
c SPACE0 R    Array  WW    - unused
c
c
c --------- -----------------------------------------------------------------
c  COMMON   Usage
c --------- -----------------------------------------------------------------
c
c /ALPHA /
c           VERIFY   TICK     TALLY    SIZES    RESULT   REPORT   KERNEL
c           MAIN.
c /BASE1 /
c           SUPPLY
c /BASE2 /
c           SUPPLY
c /BASER /
c           SUPPLY
c /BETA  /
c           TICK     TALLY    SIZES    RESULT   REPORT   KERNEL
c /DEBUG /
c           TRACE    TRACK    TRAP
c /ORDER /
c           TRACE    TRACK    TRAP
c /PROOF /
c           RESULT   BLOCKDATA
c /SPACE0/
c           VALUES   TICK     TEST     TALLY    SUPPLY   SIZES    RESULT
c           REPORT   KERNEL   BLOCKDATA
c /SPACE1/
c           VERIFY   VALUES   TICK     TEST     SUPPLY   SPACE    KERNEL
c /SPACE2/
c           VERIFY   VALUES   TICK     TEST     SUPPLY   SPACE    KERNEL
c /SPACE3/
c           VALUES
c /SPACEI/
c           VERIFY   VALUES   TICK     TEST     SIZES    RESULT   REPORT
c           KERNEL   BLOCKDATA
c /SPACER/
c           VALUES   TICK     TEST     SUPPLY   SIZES    KERNEL
c /SPACES/
c           VERIFY   VALUES   TICK     TEST     SUPPLY   SIZES    KERNEL
c           BLOCKDATA
c --------- -----------------------------------------------------------------
c
c
c     Subroutine Timing on CRAY-XMP1:
c
c     Subroutine   Time(%)    All Scalar
c
c     KERNEL        52.24%
c     SUPPLY        17.85%
c     VERIFY         8.76%
c     VALUES         6.15%
c     STATS          5.44%
c     DMPY           1.97%
c     DADD           1.53%
c     EXP            1.02%
c     SQRT            .99%
c     SORDID          .81%
c     DDIV            .38%
c     IQRANF          .25%
c     SUMO            .22%
c     TRACE           .19%
c     SIGNEL          .16%
c     TRAP            .10%
c     TRACK           .10%
c     STATW           .08%
c     TILE            .04%
c     SIZES           .03%
c     ALOG10          .03%
c
c     Subroutine   Time(%)    Auto Vector
c
c     KERNEL        56.28%
c     VALUES        10.33%
c     STATS          8.57%
c     DADD           4.34%
c     DMPY           3.86%
c     VERIFY         2.61%
c     SUPPLY         2.28%
c     SQRT           2.10%
c     SORDID         1.84%
c     SUMO            .80%
c     DDIV            .78%
c     SDOT            .67%
c     TRACE           .53%
c     IQRANF          .50%
c     SIGNEL          .36%
c     EXP             .32%
c     TRACK           .23%
c     TRAP            .20%
c     ALOG10          .18%
c     STATW           .16%
c
c
      RETURN
      END
c
c***************************************
      SUBROUTINE INDATA( TK, iou)
c***************************************
c
c     INDATA      initialize variables
c
      IMPLICIT  DOUBLE PRECISION (A-H,O-Z)
cIBM  IMPLICIT  REAL*8           (A-H,O-Z)
c
c/      PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
c/      PARAMETER( nk= 47, nl= 3, nr= 8 )
      DIMENSION  TK(6)
      COMMON /ALPHA/ mk,ik,im,ml,il,Mruns,Nruns,jr,iovec,NPFS(8,3,47)
      COMMON /TAU/   tclock, tsecov, testov, cumtim(4)
      COMMON /BETA / tic, TIMES(8,3,47), SEE(5,3,8,3),
     1              TERRS(8,3,47), CSUMS(8,3,47),
     2              FOPN(8,3,47), DOS(8,3,47)
c
      COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), ticks,
     1          FR(9), TERR1(47), SUMW(7), START,
     2          SKALE(47), BIAS(47), WS(95), TOTAL(47), FLOPN(47),
     3          IQ(7), NPF, NPFS1(47)
c
      COMMON /ORDER/ inseq, match, NSTACK(20), isave, iret
      COMMON /SPACES/ ion,j5,k2,k3,MULTI,laps,Loop,m,kr,LP,n13h,ibuf,nx,
     1 L,npass,nfail,n,n1,n2,n13,n213,n813,n14,n16,n416,n21,nt1,nt2,
     2 last,idebug,mpy,Loops2,mucho,mpylim, intbuf(16)
c
      TK(1)= 0.00d0
      TK(2)= 0.00d0
      testov= 0.00d0
      ticks = 0.00d0
      tclock= 0.00d0
      tsecov= 0.00d0
      tic   = 0.00d0
c
      jr    = 1
      Nruns = 1
      il    = 1
      mk    = 1
      ik    = 1
c
      inseq = 0
      isave = 0
      iret  = 0
c
      Loops2= 1
      mpylim= Loops2
      mpy   = 1
      MULTI = 1
      mucho = 1
      L     = 1
      Loop  = 1
      LP    = Loop
      n     = 0
c
      iou   = 8
      ion   = iou
      CALL INITIO( 8, 'output')
c     CALL INITIO( 7, 'chksum')
c
      CALL TRACE ('INDATA ')
cPFM  IF( INIPFM( ion, 0) .NE. 0 ) THEN
cPFM      CALL WHERE(20)
cPFM  ENDIF
c
cLLL.      call Q8EBM
c
      WRITE ( *,7002)
      WRITE ( *,7003)
      WRITE ( *,7002)
      WRITE ( iou,7002)
      WRITE ( iou,7003)
      WRITE ( iou,7002)
 7002 FORMAT( ' *********************************************')
 7003 FORMAT( ' THE LIVERMORE FORTRAN KERNELS "MFLOPS" TEST:')
      WRITE( iou, 797)
      WRITE( iou, 798)
  797 FORMAT(' >>> USE 72 SAMPLES LFK TEST RESULTS SUMMARY (line 330+)')
  798 FORMAT(' >>> USE ALL RANGE STATISTICS FOR OFFICIAL QUOTATIONS. ')
      CALL TRACK ('INDATA ')
      RETURN
      END
c
c*************************************************
      SUBROUTINE INITIO( iou, name )
c***********************************************************************
c                                                                      *
c INITIO - Assign logical device nr "iou" to disk file "name"          *
c                                                                      *
c iou  - logical i/o device number                                     *
c name - name to assign to disk file                                   *
c                                                                      *
c***********************************************************************
      LOGICAL  LIVING
      CHARACTER *(*) name
c
c     CALL TRACE ('INITIO ')
c
      INQUIRE( FILE=name, EXIST= LIVING )
      IF( LIVING ) THEN
          OPEN ( UNIT=iou, FILE=name, STATUS='OLD')
          CLOSE( UNIT=iou, STATUS='DELETE')
      ENDIF
      OPEN (UNIT=iou, FILE=name, STATUS='NEW')
c
c     CALL TRACK ('INITIO ')
      RETURN
      END
c
c***************************************
      SUBROUTINE IQRAN0( newk)
c***************************************
c
c     IQRAN0  - define seed for new IQRANF sequence
c
      IMPLICIT  DOUBLE PRECISION (A-H,K,O-Z)
cIBM  IMPLICIT  REAL*8           (A-H,K,O-Z)
c
      COMMON /IQRAND/ k0, k, k9
      CALL TRACE ('IQRAN0 ')
c
      IF( newk.LE.0 ) THEN
          CALL WHERE(1)
      ENDIF
      k = newk
c
      CALL TRACK ('IQRAN0 ')
      RETURN
      END
c
c***************************************
      SUBROUTINE IQRANF( M, Mmin,Mmax, n)
c***********************************************************************
c                                                                      *
c IQRANF - computes a vector of pseudo-random indices                  *
c          in the domain (Mmin,Mmax)                                   *
c                                                                      *
c M    - result array, pseudo-random positive integers                 *
c Mmin - input integer, lower bound for random integers                *
c Mmax - input integer, upper bound for random integers                *
c n    - input integer, number of results in M.                        *
c                                                                      *
c M(i)= Mmin + INT( (Mmax-Mmin) * RANF(0))                             *
c                                                                      *
c CALL IQRAN0( 256 )                                                   *
c CALL IQRANF( IX, 1,1001, 30) should produce in IX:                   *
c    3 674 435 415 389  54  44 790 900 282                             *
c  177 971 728 851 687 604 815 971 155 112                             *
c  877 814 779 192 619 894 544 404 496 505 ...                         *
c                                                                      *
c S.K.Park, K.W.Miller, Random Number Generators: Good Ones            *
c     Are Hard To Find, Commun ACM, 31(10), 1192-1201 (1988).          *
c***********************************************************************
c
      IMPLICIT  DOUBLE PRECISION (A-H,K,O-Z)
cIBM  IMPLICIT  REAL*8           (A-H,K,O-Z)
cout  DOUBLE PRECISION  dq, dp, per, dk, spin, span      REDUNDNT
c
      dimension  M(n)
      COMMON /IQRAND/ k0, k, k9
c     save k
      CALL TRACE ('IQRANF ')
      IF( n.LE.0 ) GO TO 73
      inset= Mmin
      span= Mmax - Mmin
c     spin= 16807.00d0
c     per= 2147483647.00d0
      spin= 16807
      per= 2147483647
      realn= n
      scale= 1.0000100d0
      q= scale*(span/realn)
c
      dk= k
      DO 1 i= 1,n
         dp= dk*spin
c        dk= DMOD( dp, per)
         dk= dp -INT( dp/per)*per
         dq= dk*span
         M(i)= inset + ( dq/ per)
         IF( M(i).LT.Mmin .OR. M(i).GT.Mmax ) M(i)= inset + i*q
    1 continue
      k= dk
c
c
ciC   double precision  k, ip, iq, id
ci    inset= Mmin
ci    ispan= Mmax - Mmin
ci    ispin= 16807
ci    id= 2147483647
ci    q= (REAL(ispan)/REAL(n))*1.00001
ciC
ci    DO 2 i= 1,n
ci    ip= k*ispin
ci    k= MOD( ip, id)
ci    iq= k*ispan
ci    M(i)= inset + ( iq/ id)
ci    IF( M(i).LT.Mmin .OR. M(i).GT.Mmax ) M(i)= inset + i*q
ci  2 continue
c
      CALL TRAP( M, ' IQRANF ' , 1, Mmax, n)
c
   73 CONTINUE
      CALL TRACK ('IQRANF ')
      RETURN
c
c     DATA  k /256/
c     IQRANF TEST PROGRAM:
c      parameter( nrange= 10000, nmaps= 1001 )
c      DIMENSION  IX(nrange), IY(nmaps), IZ(nmaps), IR(nmaps)
c      COMMON /IQRAND/ k0, k, k9
cc
c      CALL LINK( 'UNIT6=( output,create,text)//')
c      iou= 8
c      DO 7 j= 1,256,255
c      CALL IQRAN0( j )
c      CALL IQRANF( IX, 1, nmaps, nrange)
c      DO 1 i= 1,nmaps
c      IY(i)= 0
c    1 IZ(i)= 0
cc                  census for each index generated in (1:nmaps)
c      DO 2 i= 1,nrange
c    2 IY( IX(i))= IY( IX(i)) + 1
cc                  distribution of census tallies about nrange/nmaps
c      DO 3 i= 1,nmaps
c    3 IZ( IY(i))= IZ( IY(i)) + 1
c      IR(1)= IZ(1)
cc                  integral of distribution (starts at 2: IR(0) is undefined)
c      DO 4 i= 2,nmaps
c    4 IR(i)= IR(i-1) + IZ(i)
c      WRITE( iou,112) j, IR(nmaps), k
c      WRITE( iou,113) ( IX(i), i= 1,20 )
c      WRITE( iou,113) ( IY(i), i= 1,20 )
c      WRITE( iou,113) ( IZ(i), i= 1,20 )
c      WRITE( iou,113) ( IR(i), i= 1,20 )
c  112 FORMAT(/,1X,4I20)
c  113 FORMAT(20I4)
c    7 continue
c      STOP
c
c                   1                1000          1043618065
c   1 132 756 459 533 219  48 679 680 935 384 520 831  35  54 530 672   8 384  67
c  17  12   7  10  10  10  10  12   9   9   4  15  10   7   7   9   9   9  10  11
c   0   1   8  19  40  60  86 109 133 128 107 104  70  52  39  26   7   7   2   2
c   0   1   9  28  68 128 214 323 456 584 691 795 865 917 956 982 989 996 9981000
c
c                 256                1000           878252412
c   3 674 435 415 389  54  44 790 900 282 177 971 728 851 687 604 815 971 155 112
c  11  17  19   6  11  11   7   9  12   7  13   7   9  11  14   9   9  12   9   9
c   1   2  10  16  30  71  93 109 131 119 118 105  69  47  28  15  15   9   5   3
c   1   3  13  29  59 130 223 332 463 582 700 805 874 921 949 964 979 988 993 996
      END
c
c***********************************************
      SUBROUTINE KERNEL( TK)
c***********************************************************************
c                                                                      *
c KERNEL executes 24 samples of Fortran computation                    *
c TK(1) - total cpu time to execute only the 24 kernels.               *
c TK(2) - total Flops executed by the 24 Kernels                       *
c***********************************************************************
c                                                                      *
c L. L. N. L.
F O R T R A N   K E R N E L S:   M F L O P S   *
c                                                                      *
c   These kernels measure Fortran numerical computation rates for a    *
c   spectrum of CPU-limited computational structures. Mathematical     *
c   through-put is measured in units of millions of floating-point     *
c   operations executed per Second, called Mega-Flops/Sec.             *
c                                                                      *
c   This program measures a realistic CPU performance range for the    *
c   Fortran programming system on a given day. The CPU performance     *
c   rates depend strongly on the maturity of the Fortran compiler's    *
c   ability to translate Fortran code into efficient machine code.     *
c   [ The CPU hardware capability apart from compiler maturity (or     *
c   availability), could be measured (or simulated) by programming the *
c   kernels in assembly or machine code directly. These measurements   *
c   can also serve as a framework for tracking the maturation of the   *
c   Fortran compiler during system development.]                       *
c                                                                      *
c   Fonzi's Law: There is not now and there never will be a language   *
c                in which it is the least bit difficult to write       *
c                bad programs.                                         *
c                                           F.H.MCMAHON 1972           *
c***********************************************************************
c
c  l1   := param-dimension governs the size of most 1-d arrays
c  l2   := param-dimension governs the size of most 2-d arrays
c
c  Loop := multiple pass control to execute kernel long enough to time.
c  n    := DO loop control for each kernel.  Controls are set in subr. SIZES
c
c    ******************************************************************
c
      IMPLICIT  DOUBLE PRECISION (A-H,O-Z)
cIBM  IMPLICIT  REAL*8           (A-H,O-Z)
c
c/      PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
c/      PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 )
c/      PARAMETER( l14=2048, l16= 75, l416= 4*l16 , l21= 25 )
c/      PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
c
c
c/      PARAMETER( nk= 47, nl= 3, nr= 8 )
      INTEGER TEST, AND
c
      COMMON /ALPHA/ mk,ik,im,ml,il,Mruns,Nruns,jr,iovec,NPFS(8,3,47)
      COMMON /BETA / tic, TIMES(8,3,47), SEE(5,3,8,3),
     1  TERRS(8,3,47), CSUMS(8,3,47),
     2  FOPN(8,3,47), DOS(8,3,47)
c
      COMMON /SPACES/ ion,j5,k2,k3,MULTI,laps,Loop,m,kr,LP,n13h,ibuf,nx,
     1 L,npass,nfail,n,n1,n2,n13,n213,n813,n14,n16,n416,n21,nt1,nt2,
     2 last,idebug,mpy,Loops2,mucho,mpylim, intbuf(16)
c
      COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33,
     1  AR,BR,C0,CR,DI,DK,
     2  DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX,
     3  Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM
c
cPFM  COMMON /KAPPA/ iflag1, ikern, statis(100,20), istats(100,20)
c
      COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), ticks,
     1  FR(9), TERR1(47), SUMW(7), START,
     2  SKALE(47), BIAS(47), WS(95), TOTAL(47), FLOPN(47),
     3  IQ(7), NPF, NPFS1(47)
c
      COMMON /SPACEI/ WTP(3), MUL(3), ISPAN(47,3), IPASS(47,3)
c
c/      INTEGER    E,F,ZONE
c/      COMMON /ISPACE/ E(l213), F(l213),
c/     1  IX(l1), IR(l1), ZONE(l416)
c/C
c/      COMMON /SPACE1/ U(l1), V(l1), W(l1),
c/     1  X(l1), Y(l1), Z(l1), G(l1),
c/     2  DU1(l2), DU2(l2), DU3(l2), GRD(l1), DEX(l1),
c/     3  XI(l1), EX(l1), EX1(l1), DEX1(l1),
c/     4  VX(l14), XX(l14), RX(l14), RH(l14),
c/     5  VSP(l2), VSTP(l2), VXNE(l2), VXND(l2),
c/     6  VE3(l2), VLR(l2), VLIN(l2), B5(l2),
c/     7  PLAN(l416), D(l416), SA(l2), SB(l2)
c/C
c/      COMMON /SPACE2/ P(4,l813), PX(l21,l2), CX(l21,l2),
c/     1  VY(l2,l21), VH(l2,7), VF(l2,7), VG(l2,7), VS(l2,7),
c/     2  ZA(l2,7) , ZP(l2,7), ZQ(l2,7), ZR(l2,7), ZM(l2,7),
c/     3  ZB(l2,7) , ZU(l2,7), ZV(l2,7), ZZ(l2,7),
c/     4  B(l13,l13), C(l13,l13), H(l13,l13),
c/     5  U1(5,l2,2), U2(5,l2,2), U3(5,l2,2)
c
c    ******************************************************************
c
c
c/      PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
c/      PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 )
c/      PARAMETER( l14= 2048, l16= 75, l416= 4*75 , l21= 25)
c
c
care
c
      INTEGER    E,F,ZONE
      COMMON /ISPACE/ E(96), F(96),
     1  IX(1001), IR(1001), ZONE(300)
c
      COMMON /SPACE1/ U(1001), V(1001), W(1001),
     1  X(1001), Y(1001), Z(1001), G(1001),
     2  DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001),
     3  XI(1001), EX(1001), EX1(1001), DEX1(1001),
     4  VX(1001), XX(1001), RX(1001), RH(2048),
     5  VSP(101), VSTP(101), VXNE(101), VXND(101),
     6  VE3(101), VLR(101), VLIN(101), B5(101),
     7  PLAN(300), D(300), SA(101), SB(101)
c
      COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101),
     1  VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7),
     2  ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7),
     3  ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7),
     4  B(64,64), C(64,64), H(64,64),
     5  U1(5,101,2), U2(5,101,2), U3(5,101,2)
c
c    ******************************************************************
c
      DIMENSION ZX(1023), XZ(1500), TK(6)
      EQUIVALENCE ( ZX(1), Z(1)), ( XZ(1), X(1))
c
c
c//   DIMENSION E(96), F(96), U(1001), V(1001), W(1001),
c//  1  X(1001), Y(1001), Z(1001), G(1001),
c//  2  DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001),
c//  3  IX(1001), XI(1001), EX(1001), EX1(1001), DEX1(1001),
c//  4  VX(1001), XX(1001), IR(1001), RX(1001), RH(2048),
c//  5  VSP(101), VSTP(101), VXNE(101), VXND(101),
c//  6  VE3(101), VLR(101), VLIN(101), B5(101),
c//  7  PLAN(300), ZONE(300), D(300), SA(101), SB(101)
c//C
c//   DIMENSION P(4,512), PX(25,101), CX(25,101),
c//  1  VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7),
c//  2  ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7),
c//  3  ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7),
c//  4  B(64,64), C(64,64), H(64,64),
c//  5  U1(5,101,2), U2(5,101,2), U3(5,101,2)
c//C
c//C  ******************************************************************
c//C
c//   COMMON /POINT/ ME,MF,MU,MV,MW,MX,MY,MZ,MG,MDU1,MDU2,MDU3,MGRD,
c//  1  MDEX,MIX,MXI,MEX,MEX1,MDEX1,MVX,MXX,MIR,MRX,MRH,MVSP,MVSTP,
c//  2  MVXNE,MVXND,MVE3,MVLR,MVLIN,MB5,MPLAN,MZONE,MD,MSA,MSB,
c//  3  MP,MPX,MCX,MVY,MVH,MVF,MVG,MVS,MZA,MZP,MZQ,MZR,MZM,MZB,MZU,
c//  4  MZV,MZZ,MB,MC,MH,MU1,MU2,MU3
c//C
c//   POINTER (ME,E), (MF,F), (MU,U), (MV,V), (MW,W),
c//  1  (MX,X), (MY,Y), (MZ,Z), (MG,G),
c//  2  (MDU1,DU1),(MDU2,DU2),(MDU3,DU3),(MGRD,GRD),(MDEX,DEX),
c//  3  (MIX,IX), (MXI,XI), (MEX,EX), (MEX1,EX1), (MDEX1,DEX1),
c//  4  (MVX,VX), (MXX,XX), (MIR,IR), (MRX,RX), (MRH,RH),
c//  5  (MVSP,VSP), (MVSTP,VSTP), (MVXNE,VXNE), (MVXND,VXND),
c//  6  (MVE3,VE3), (MVLR,VLR), (MVLIN,VLIN), (MB5,B5),
c//  7  (MPLAN,PLAN), (MZONE,ZONE), (MD,D), (MSA,SA), (MSB,SB)
c//C
c//   POINTER (MP,P), (MPX,PX), (MCX,CX),
c//  1  (MVY,VY), (MVH,VH), (MVF,VF), (MVG,VG), (MVS,VS),
c//  2  (MZA,ZA), (MZP,ZP), (MZQ,ZQ), (MZR,ZR), (MZM,ZM),
c//  3  (MZB,ZB), (MZU,ZU), (MZV,ZV), (MZZ,ZZ),
c//  4  (MB,B), (MC,C), (MH,H),
c//  5  (MU1,U1), (MU2,U2), (MU3,U3)
c..   COMMON DUMMY(2000)
c..   LOC(X) =.LOC.X
c..   IQ8QDSP = 64*LOC(DUMMY)
c
c    ******************************************************************
c
c     STANDARD PRODUCT COMPILER DIRECTIVES MAY BE USED FOR OPTIMIZATION
c
cDIR$ VECTOR
cLLL. OPTIMIZE LEVEL i
cLLL. OPTION INTEGER (7)
cLLL. OPTION ASSERT (NO HAZARD)
cLLL. OPTION NODYNEQV
c
c    ******************************************************************
c       BINARY MACHINES MAY USE THE AND(P,Q) FUNCTION IF AVAILABLE
c       IN PLACE OF THE FOLLOWING CONGRUENCE FUNCTION (SEE KERNEL 13, 14)
c                                          IFF:  j= 2**N
c     IAND(j,k) = AND(j,k)
cLLL. IAND(j,k) = j.INT.k
c     MOD2N(i,j)= MOD(i,j)
      MOD2N(i,j)= IAND(i,j-1)
c                             i is Congruent to MOD2N(i,j) mod(j)
c    ******************************************************************
c
c
c
c
c
      CALL TRACE ('KERNEL ')
c
      CALL SPACE
c
cPFM  call OUTPFM( 0, ion)
      mpy   = 1
      Loops2= 1
      mpylim= Loops2
      L     = 1
      Loop  = 1
      LP    = Loop
      it0   = TEST(0)
cPFM  iflag1= 13579
c
c*******************************************************************************
c***  KERNEL 1      HYDRO FRAGMENT
c*******************************************************************************
c
cdir$ ivdep
 1001 DO 1 k = 1,n
    1   X(k)= Q + Y(k) * (R * ZX(k+10) + T * ZX(k+11))
c
c...................
      IF( TEST(1) .GT. 0) GO TO 1001
c                       we must execute DO k= 1,n repeatedly for accurate timing
c
c*******************************************************************************
c***  KERNEL 2      ICCG EXCERPT (INCOMPLETE CHOLESKY - CONJUGATE GRADIENT)
c*******************************************************************************
c
c
 1002 II= n
      IPNTP= 0
  222 IPNT= IPNTP
      IPNTP= IPNTP+II
      II= II/2
      i= IPNTP+1
cdir$ ivdep
c:ibm_dir:ignore recrdeps (x)
c
      DO 2 k= IPNT+2,IPNTP,2
        i= i+1
    2   X(i)= X(k) - V(k) * X(k-1) - V(k+1) * X(k+1)
      IF( II.GT.1) GO TO 222
c
c...................
      IF( TEST(2) .GT. 0) GO TO 1002
c
c*******************************************************************************
c***  KERNEL 3      INNER PRODUCT
c*******************************************************************************
c
c
 1003 Q= 0.000d0
      DO 3 k= 1,n
    3   Q= Q + Z(k) * X(k)
c
c...................
      IF( TEST(3) .GT. 0) GO TO 1003
c
c*******************************************************************************
c***  KERNEL 4      BANDED LINEAR EQUATIONS
c*******************************************************************************
c
      m= (1001-7)/2
      fw= 1.000d-25
c
 1004 DO 404 k= 7,1001,m
        lw= k-6
        temp= XZ(k-1)
cdir$ ivdep
        DO 4 j= 5,n,5
          temp = temp - XZ(lw) * Y(j)
    4     lw= lw+1
        XZ(k-1)= Y(5) * temp
  404 CONTINUE
c
c...................
      IF( TEST(4) .GT. 0) GO TO 1004
c
c*******************************************************************************
c***  KERNEL 5      TRI-DIAGONAL ELIMINATION, BELOW DIAGONAL (NO VECTORS)
c*******************************************************************************
c
c
cdir$ novector
 1005 DO 5 i = 2,n
    5   X(i)= Z(i) * (Y(i) - X(i-1))
cdir$ vector
c
c...................
      IF( TEST(5) .GT. 0) GO TO 1005
c
c*******************************************************************************
c***  KERNEL 6      GENERAL LINEAR RECURRENCE EQUATIONS
c*******************************************************************************
c
c
 1006 DO 6 i= 2,n
        W(i)= 0.0100d0
cdir$ novector
        DO 6 k= 1,i-1
          W(i)= W(i) + B(i,k) * W(i-k)
    6 CONTINUE
cdir$ vector
c
c...................
      IF( TEST(6) .GT. 0) GO TO 1006
c
c*******************************************************************************
c***  KERNEL 7      EQUATION OF STATE FRAGMENT
c*******************************************************************************
c
c
cdir$ ivdep
 1007 DO 7 k= 1,n
        X(k)= U(k ) + R*( Z(k ) + R*Y(k )) +
     1        T*( U(k+3) + R*( U(k+2) + R*U(k+1)) +
     2        T*( U(k+6) + Q*( U(k+5) + Q*U(k+4))))
    7 CONTINUE
c
c...................
      IF( TEST(7) .GT. 0) GO TO 1007
c
c
c*******************************************************************************
c***  KERNEL 8      A.D.I. INTEGRATION
c*******************************************************************************
c
c
 1008 nl1 = 1
      nl2 = 2
      fw= 2.000d0
      DO 8 kx = 2,3
cdir$ ivdep
      DO 8 ky = 2,n
      DU1(ky)=U1(kx,ky+1,nl1) - U1(kx,ky-1,nl1)
      DU2(ky)=U2(kx,ky+1,nl1) - U2(kx,ky-1,nl1)
      DU3(ky)=U3(kx,ky+1,nl1) - U3(kx,ky-1,nl1)
      U1(kx,ky,nl2)=U1(kx,ky,nl1) +A11*DU1(ky) +A12*DU2(ky) +A13*DU3(ky)
     1  + SIG*(U1(kx+1,ky,nl1) -fw*U1(kx,ky,nl1) +U1(kx-1,ky,nl1))
      U2(kx,ky,nl2)=U2(kx,ky,nl1) +A21*DU1(ky) +A22*DU2(ky) +A23*DU3(ky)
     1  + SIG*(U2(kx+1,ky,nl1) -fw*U2(kx,ky,nl1) +U2(kx-1,ky,nl1))
      U3(kx,ky,nl2)=U3(kx,ky,nl1) +A31*DU1(ky) +A32*DU2(ky) +A33*DU3(ky)
     1  + SIG*(U3(kx+1,ky,nl1) -fw*U3(kx,ky,nl1) +U3(kx-1,ky,nl1))
    8 CONTINUE
c
c...................
      IF( TEST(8) .GT. 0) GO TO 1008
c
c*******************************************************************************
c***  KERNEL 9      INTEGRATE PREDICTORS
c*******************************************************************************
c
c
 1009 DO 9 k = 1,n
      PX( 1,k)= DM28*PX(13,k) + DM27*PX(12,k) + DM26*PX(11,k) +
     1        DM25*PX(10,k) + DM24*PX( 9,k) + DM23*PX( 8,k) +
     2        DM22*PX( 7,k) + C0*(PX( 5,k) + PX( 6,k))+ PX( 3,k)
    9 CONTINUE
c
c...................
      IF( TEST(9) .GT. 0) GO TO 1009
c
c*******************************************************************************
c***  KERNEL 10     DIFFERENCE PREDICTORS
c*******************************************************************************
c
c
 1010 DO 10 k= 1,n
        AR      = CX(5,k)
        BR      = AR - PX(5,k)
        PX(5,k) = AR
        CR      = BR - PX(6,k)
        PX(6,k) = BR
        AR      = CR - PX(7,k)
        PX(7,k) = CR
        BR      = AR - PX(8,k)
        PX(8,k) = AR
        CR      = BR - PX(9,k)
        PX(9,k) = BR
        AR      = CR - PX(10,k)
        PX(10,k)= CR
        BR      = AR - PX(11,k)
        PX(11,k)= AR
        CR      = BR - PX(12,k)
        PX(12,k)= BR
        PX(14,k)= CR - PX(13,k)
        PX(13,k)= CR
   10 CONTINUE
c
c...................
      IF( TEST(10) .GT. 0) GO TO 1010
c
c*******************************************************************************
c***  KERNEL 11     FIRST SUM.   PARTIAL SUMS.              (NO VECTORS)
c*******************************************************************************
c
c
 1011 X(1)= Y(1)
cdir$ novector
      DO 11 k = 2,n
   11   X(k)= X(k-1) + Y(k)
cdir$ vector
c
c...................
      IF( TEST(11) .GT. 0) GO TO 1011
c
c*******************************************************************************
c***  KERNEL 12     FIRST DIFF.
c*******************************************************************************
c
c
cdir$ ivdep
 1012 DO 12 k = 1,n
   12   X(k)= Y(k+1) - Y(k)
c
c...................
      IF( TEST(12) .GT. 0) GO TO 1012
c
c*******************************************************************************
c***  KERNEL 13     2-D PIC   Particle In Cell
c*******************************************************************************
c
      fw= 1.000d0
c
 1013 DO 13 k= 1,n
        i1= P(1,k)
        j1= P(2,k)
        i1= 1 + MOD2N(i1,64)
        j1= 1 + MOD2N(j1,64)
        P(3,k)= P(3,k) + B(i1,j1)
        P(4,k)= P(4,k) + C(i1,j1)
        P(1,k)= P(1,k) + P(3,k)
        P(2,k)= P(2,k) + P(4,k)
        i2= P(1,k)
        j2= P(2,k)
        i2= MOD2N(i2,64)
        j2= MOD2N(j2,64)
        P(1,k)= P(1,k) + Y(i2+32)
        P(2,k)= P(2,k) + Z(j2+32)
        i2= i2 + E(i2+32)
        j2= j2 + F(j2+32)
        H(i2,j2)= H(i2,j2) + fw
   13 CONTINUE
c
c...................
      IF( TEST(13) .GT. 0) GO TO 1013
c
c*******************************************************************************
c***  KERNEL 14     1-D PIC   Particle In Cell
c*******************************************************************************
c
c
      fw= 1.000d0
c
 1014 DO 141 k= 1,n
        VX(k)= 0.0d0
        XX(k)= 0.0d0
        IX(k)= INT( GRD(k))
        XI(k)= REAL( IX(k))
        EX1(k)= EX ( IX(k))
        DEX1(k)= DEX ( IX(k))
  141 CONTINUE
c
      DO 142 k= 1,n
        VX(k)= VX(k) + EX1(k) + (XX(k) - XI(k))*DEX1(k)
        XX(k)= XX(k) + VX(k) + FLX
        IR(k)= XX(k)
        RX(k)= XX(k) - IR(k)
        IR(k)= MOD2N( IR(k),2048) + 1
        XX(k)= RX(k) + IR(k)
  142 CONTINUE
c
      DO 14 k= 1,n
        RH(IR(k) )= RH(IR(k) ) + fw - RX(k)
        RH(IR(k)+1)= RH(IR(k)+1) + RX(k)
   14 CONTINUE
c
c...................
      IF( TEST(14) .GT. 0) GO TO 1014
c
c
c*******************************************************************************
c***  KERNEL 15     CASUAL FORTRAN.  DEVELOPMENT VERSION.
c*******************************************************************************
c
c
c       CASUAL ORDERING OF SCALAR OPERATIONS IS TYPICAL PRACTICE.
c       THIS EXAMPLE DEMONSTRATES THE NON-TRIVIAL TRANSFORMATION
c       REQUIRED TO MAP INTO AN EFFICIENT MACHINE IMPLEMENTATION.
c
c
 1015 NG= 7
      NZ= n
      AR= 0.05300d0
      BR= 0.07300d0
      DO 45 j = 2,NG
      DO 45 k = 2,NZ
      IF( j-NG) 31,30,30
   30 VY(k,j)= 0.0d0
      GO TO 45
   31 IF( VH(k,j+1) -VH(k,j)) 33,33,32
   32 T= AR
      GO TO 34
   33 T= BR
   34 IF( VF(k,j) -VF(k-1,j)) 35,36,36
   35 R= MAX( VH(k-1,j), VH(k-1,j+1))
      S= VF(k-1,j)
      GO TO 37
   36 R= MAX( VH(k,j), VH(k,j+1))
      S= VF(k,j)
   37 VY(k,j)= SQRT( VG(k,j)**2 +R*R)*T/S
      IF( k-NZ) 40,39,39
   39 VS(k,j)= 0.0d0
      GO TO 45
   40 IF( VF(k,j) -VF(k,j-1)) 41,42,42
   41 R= MAX( VG(k,j-1), VG(k+1,j-1))
      S= VF(k,j-1)
      T= BR
      GO TO 43
   42 R= MAX( VG(k,j), VG(k+1,j))
      S= VF(k,j)
      T= AR
   43 VS(k,j)= SQRT( VH(k,j)**2 +R*R)*T/S
   45 CONTINUE
c
c...................
      IF( TEST(15) .GT. 0) GO TO 1015
c
c
c*******************************************************************************
c***  KERNEL 16     MONTE CARLO SEARCH LOOP
c*******************************************************************************
c
      II= n/3
      LB= II+II
      k2= 0
      k3= 0
c
c
 1016 m= 1
      i1= m
  410 j2= (n+n)*(m-1)+1
      DO 470 k= 1,n
      k2= k2+1
      j4= j2+k+k
      j5= ZONE(j4)
      IF( j5-n ) 420,475,450
  415 IF( j5-n+II ) 430,425,425
  420 IF( j5-n+LB ) 435,415,415
  425 IF( PLAN(j5)-R) 445,480,440
  430 IF( PLAN(j5)-S) 445,480,440
  435 IF( PLAN(j5)-T) 445,480,440
  440 IF( ZONE(j4-1)) 455,485,470
  445 IF( ZONE(j4-1)) 470,485,455
  450 k3= k3+1
      IF( D(j5)-(D(j5-1)*(T-D(j5-2))**2+(S-D(j5-3))**2
     1        +(R-D(j5-4))**2)) 445,480,440
  455 m= m+1
      IF( m-ZONE(1) ) 465,465,460
  460 m= 1
  465 IF( i1-m) 410,480,410
  470 CONTINUE
  475 CONTINUE
  480 CONTINUE
  485 CONTINUE
c
c...................
      IF( TEST(16) .GT. 0) GO TO 1016
c
c*******************************************************************************
c***  KERNEL 17     IMPLICIT, CONDITIONAL COMPUTATION       (NO VECTORS)
c*******************************************************************************
c
c         RECURSIVE-DOUBLING VECTOR TECHNIQUES CAN NOT BE USED
c         BECAUSE CONDITIONAL OPERATIONS APPLY TO EACH ELEMENT.
c
      dw= 5.0000d0/3.0000d0
      fw= 1.0000d0/3.0000d0
      tw= 1.0300d0/3.0700d0
cdir$ novector
c
 1017 k= n
      j= 1
      ink= -1
      SCALE= dw
      XNM= fw
      E6= tw
      GO TO 61
c                                            STEP MODEL
   60 E6= XNM*VSP(k)+VSTP(k)
      VXNE(k)= E6
      XNM= E6
      VE3(k)= E6
      k= k+ink
      IF( k.EQ.j) GO TO 62
   61 E3= XNM*VLR(k) +VLIN(k)
      XNEI= VXNE(k)
      VXND(k)= E6
      XNC= SCALE*E3
c                                            SELECT MODEL
      IF( XNM .GT.XNC) GO TO 60
      IF( XNEI.GT.XNC) GO TO 60
c                                            LINEAR MODEL
      VE3(k)= E3
      E6= E3+E3-XNM
      VXNE(k)= E3+E3-XNEI
      XNM= E6
      k= k+ink
      IF( k.NE.j) GO TO 61
   62 CONTINUE
cdir$ vector
c
c...................
      IF( TEST(17) .GT. 0) GO TO 1017
c
c*******************************************************************************
c***  KERNEL 18     2-D EXPLICIT HYDRODYNAMICS FRAGMENT
c*******************************************************************************
c
c
 1018 T= 0.003700d0
      S= 0.004100d0
      KN= 6
      JN= n
      DO 70 k= 2,KN
      DO 70 j= 2,JN
        ZA(j,k)= (ZP(j-1,k+1)+ZQ(j-1,k+1)-ZP(j-1,k)-ZQ(j-1,k))
     1          *(ZR(j,k)+ZR(j-1,k))/(ZM(j-1,k)+ZM(j-1,k+1))
        ZB(j,k)= (ZP(j-1,k)+ZQ(j-1,k)-ZP(j,k)-ZQ(j,k))
     1          *(ZR(j,k)+ZR(j,k-1))/(ZM(j,k)+ZM(j-1,k))
   70 CONTINUE
c
      DO 72 k= 2,KN
      DO 72 j= 2,JN
        ZU(j,k)= ZU(j,k)+S*(ZA(j,k)*(ZZ(j,k)-ZZ(j+1,k))
     1                    -ZA(j-1,k) *(ZZ(j,k)-ZZ(j-1,k))
     2                    -ZB(j,k)   *(ZZ(j,k)-ZZ(j,k-1))
     3                    +ZB(j,k+1) *(ZZ(j,k)-ZZ(j,k+1)))
        ZV(j,k)= ZV(j,k)+S*(ZA(j,k)*(ZR(j,k)-ZR(j+1,k))
     1                    -ZA(j-1,k) *(ZR(j,k)-ZR(j-1,k))
     2                    -ZB(j,k)   *(ZR(j,k)-ZR(j,k-1))
     3                    +ZB(j,k+1) *(ZR(j,k)-ZR(j,k+1)))
   72 CONTINUE
c
      DO 75 k= 2,KN
      DO 75 j= 2,JN
        ZR(j,k)= ZR(j,k)+T*ZU(j,k)
        ZZ(j,k)= ZZ(j,k)+T*ZV(j,k)
   75 CONTINUE
c
c...................
      IF( TEST(18) .GT. 0) GO TO 1018
c
c*******************************************************************************
c***  KERNEL 19     GENERAL LINEAR RECURRENCE EQUATIONS     (NO VECTORS)
c*******************************************************************************
c
 1019 KB5I= 0
c
c     IF( JR.LE.1 ) THEN
cdir$ novector
      DO 191 k= 1,n
        B5(k+KB5I)= SA(k) +STB5*SB(k)
        STB5= B5(k+KB5I) -STB5
  191 CONTINUE
c     ELSE
c
      DO 193 i= 1,n
        k= n-i+1
        B5(k+KB5I)= SA(k) +STB5*SB(k)
        STB5= B5(k+KB5I) -STB5
  193 CONTINUE
c     ENDIF
cdir$ vector
c
c...................
      IF( TEST(19) .GT. 0) GO TO 1019
c
c*******************************************************************************
c***  KERNEL 20     DISCRETE ORDINATES TRANSPORT: RECURRENCE (NO VECTORS)
c*******************************************************************************
c
      dw= 0.200d0
cdir$ novector
c
 1020 DO 20 k= 1,n
        DI= Y(k)-G(k)/( XX(k)+DK)
        DN= dw
        IF( DI.NE.0.0) DN= MAX( S,MIN( Z(k)/DI, T))
        X(k)= ((W(k)+V(k)*DN)* XX(k)+U(k))/(VX(k)+V(k)*DN)
        XX(k+1)= (X(k)- XX(k))*DN+ XX(k)
   20 CONTINUE
cdir$ vector
c
c...................
      IF( TEST(20) .GT. 0) GO TO 1020
c
c*******************************************************************************
c***  KERNEL 21     MATRIX*MATRIX PRODUCT
c*******************************************************************************
c
c
 1021 DO 21 k= 1,25
      DO 21 i= 1,25
      DO 21 j= 1,n
        PX(i,j)= PX(i,j) +VY(i,k) * CX(k,j)
   21 CONTINUE
c
c...................
      IF( TEST(21) .GT. 0) GO TO 1021
c
c
c*******************************************************************************
c***  KERNEL 22     PLANCKIAN DISTRIBUTION
c*******************************************************************************
c
c
c     EXPMAX= 234.500d0
      EXPMAX= 20.0000d0
      fw= 1.00000d0
      U(n)= 0.99000d0*EXPMAX*V(n)
c
 1022 DO 22 k= 1,n
care    IF( U(k) .LT. EXPMAX*V(k)) THEN
        Y(k)= U(k)/V(k)
care    ELSE
care    Y(k)= EXPMAX
care    ENDIF
        W(k)= X(k)/( EXP( Y(k)) -fw)
   22 CONTINUE
c...................
      IF( TEST(22) .GT. 0) GO TO 1022
c
c*******************************************************************************
c***  KERNEL 23     2-D IMPLICIT HYDRODYNAMICS FRAGMENT
c*******************************************************************************
c
      fw= 0.17500d0
c
 1023 DO 23 j= 2,6
      DO 23 k= 2,n
        QA= ZA(k,j+1)*ZR(k,j) +ZA(k,j-1)*ZB(k,j) +
     1      ZA(k+1,j)*ZU(k,j) +ZA(k-1,j)*ZV(k,j) +ZZ(k,j)
   23 ZA(k,j)= ZA(k,j) +fw*(QA -ZA(k,j))
c
c...................
      IF( TEST(23) .GT. 0) GO TO 1023
c
c*******************************************************************************
c***  KERNEL 24     FIND LOCATION OF FIRST MINIMUM IN ARRAY
c*******************************************************************************
c
c     X( n/2)= -1.000d+50
      X( n/2)= -1.000d+10
c
 1024 m= 1
      DO 24 k= 2,n
        IF( X(k).LT.X(m)) m= k
   24 CONTINUE
c
c     m= imin1( n,x,1)        35 nanosec./element STACKLIBE/CRAY
c...................
      IF( TEST(24) .NE. 0) GO TO 1024
c
c*******************************************************************************
c
cPFM  iflag1= 0
      sum= 0.00d0
      som= 0.00d0
      DO 999 k= 1,mk
        sum= sum + TIME (k)
        TIMES(jr,il,k)= TIME (k)
        TERRS(jr,il,k)= TERR1(k)
        NPFS (jr,il,k)= NPFS1(k)
        CSUMS(jr,il,k)= CSUM (k)
        DOS  (jr,il,k)= TOTAL(k)
        FOPN (jr,il,k)= FLOPN(k)
        som= som + FLOPN(k) * TOTAL(k)
  999 continue
c
      TK(1)= TK(1) + sum
      TK(2)= TK(2) + som
c                        Dumpout Checksums
c     WRITE ( 7,706) jr, il
c 706 FORMAT(1X,2I3)
c     WRITE ( 7,707) ( CSUM(k), k= 1,mk)
c 707 FORMAT(5X,'&',1PE21.15,',',1PE21.15,',',1PE21.15,',')
c
      CALL TRACK ('KERNEL ')
      RETURN
      END
c***********************************************
      SUBROUTINE PAGE( iou)
c***********************************************
      CALL TRACE ('PAGE ')
      WRITE(iou,1)
    1 FORMAT('1')
c   1 FORMAT(1H )
      CALL TRACK ('PAGE ')
      RETURN
      END
c
c********************************************
      FUNCTION RELERR( U,V)
c********************************************
c
c     RELERR - RELATIVE ERROR BETWEEN  U,V  (0.,1.)
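c              = 0                                 if U = V,
c              = 1 - MIN(|U|,|V|)/MAX(|U|,|V|)     if U, V differ but
c                                                  have the same sign,
c              = 1                                 if the signs differ.
c              e.g.  RELERR( 2.0d0, 2.5d0) = 1 - 2.0/2.5 = 0.2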
c     U      - INPUT
c     V      - INPUT
c********************************************
c
      IMPLICIT  DOUBLE PRECISION (A-H,O-Z)
cIBM  IMPLICIT  REAL*8           (A-H,O-Z)
cout  DOUBLE PRECISION x, y      REDUNDANT
c
      CALL TRACE ('RELERR ')
      w= 0.00d0
      IF( u .NE. v ) THEN
        w= 1.00d0
        o= 1.00d0
        IF( SIGN( o, u) .EQ. SIGN( o, v)) THEN
          a= ABS( u)
          b= ABS( v)
          x= MAX( a, b)
          y= MIN( a, b)
          IF( x .NE. 0.00d0) THEN
            w= 1.00d0 - y/x
          ENDIF
        ENDIF
      ENDIF
c
      RELERR= w
      CALL TRACK ('RELERR ')
      RETURN
      END
c
c***********************************************************************
      SUBROUTINE REPORT( iou, ntk,nek,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
c***********************************************************************
c                                                                      *
c     REPORT - Prints Statistical Evaluation Of Fortran Kernel Timings *
c                                                                      *
c     iou   - Logical Output Device Number                             *
c     ntk   - Total number of Kernels to Edit in Report                *
c     nek   - Number of Effective Kernels in each set to Edit          *
c     FLOPS - Array: Number of Flops executed by each kernel           *
c     TR    - Array: Time of execution of each kernel(microsecs)       *
c     RATES - Array: Rate of execution of each kernel(megaflops/sec)   *
c     LSPAN - Array: Span of inner DO loop in each kernel              *
c     WG    - Array: Weight assigned to each kernel for statistics     *
c     OSUM  - Array: Checksums of the results of each kernel           *
c***********************************************************************
c
c     REFERENCES
c
c     F.H.McMahon, The Livermore Fortran Kernels:
c     A Computer Test Of The Numerical Performance Range,
c     Lawrence Livermore National Laboratory,
c     Livermore, California, UCRL-53745, December 1986.
c
c     from:  National Technical Information Service
c            U.S. Department of Commerce
c            5285 Port Royal Road
c            Springfield, VA. 22161
c
c     J.T. Feo, An Analysis Of The Computational And Parallel
c     Complexity Of The Livermore Loops, PARALLEL COMPUTING
c     (North Holland), Vol 7(2), 163-185, (1988).
c
c
c                    NOTICE
c
c     "This report was prepared as an account
c     of work sponsored by the United States
c     Government.
c     Neither the United States
c     nor the United States Department of
c     Energy, nor any of their employees, nor
c     any of their contractors, subcontractors,
c     or their employees, makes any warranty,
c     express or implied, or assumes any legal
c     liability or responsibility for the
c     accuracy, completeness or usefulness of
c     any information, apparatus, product or
c     process disclosed, or represents that its
c     use would not infringe privately-owned
c     rights."
c
c     Reference to a company or product name
c     does not imply approval or recommendation
c     of the product by the University of
c     California or the U.S. Department of
c     Energy to the exclusion of others that
c     may be suitable.
c
c
c     Work performed under the auspices of the
c     U.S. Department of Energy by the Lawrence
c     Livermore Laboratory under contract
c     number W-7405-ENG-48.
c
c***********************************************************************
c
c     Abstract
c
c     A computer performance test that measures a realistic floating-point
c     performance range for Fortran applications is described. A variety
c     of computer performance analyses may be easily carried out using this
c     small central processing unit (cpu) test that would be infeasible or
c     too costly using complete applications as benchmarks, particularly in
c     the developmental phase of an immature computer system. The problem
c     of benchmarking numerical applications sufficiently, especially on
c     new supercomputers, is analyzed to identify several useful roles for
c     the Livermore Fortran Kernel (LFK) test. The 24 LFK contain enough
c     samples of Fortran practice to expose many specific inefficiencies in
c     the formulation of the Fortran source, in the quality of compiled cpu
c     code, and in the capability of the instruction architecture.
c     Examples show how the LFK may be used to study compiled Fortran code
c     efficiency, to test the ability of compilers to vectorize Fortran, to
c     simulate mature coding of Fortran on new computers, and to estimate
c     the effective subrange of supercomputer performance for Fortran
c     applications.
c
c     Cpu performance measurements of several Fortran benchmarks and
c     numerical applications that correlate well with the cpu performance
c     range measured by the LFK test are presented. The numerical
c     performance metric Mflops, first introduced in 1970 in this cpu test
c     to quantify the cpu performance range of numerical applications, is
c     discussed. Analyses of the LFK performance results argue against
c     reducing the cpu performance range of supercomputers to a single
c     number. The 24 LFK measured rates show a realistic variance in
c     Fortran cpu performance that is essential data for circumspect
c     computer evaluations. Cpu performance data measured by the LFK test
c     on a number of recent computer systems are tabulated for reference.
c
c
c
c                 I: FORTRAN CPU PERFORMANCE ANALYSIS
c
c
c     These kernels measure Fortran numerical computation rates for a
c     spectrum of CPU-limited computational structures or benchmarks.
c     The kernels benchmark contains extracts or kernels from more
c     than a score of CPU-limited scientific application programs. These
c     kernels are the most important CPU time components from the
c     application programs. This benchmark may be easily extended
c     with important new kernels leaving performance statistics intact.
c
c     The time required to convert, debug, execute and time many
c     entire, large programs on new machines, each having a new
c     implementation of Fortran, or several implementations or
c     dialects, rapidly becomes excessive. Almost all the conversion
c     costs are in segments of the programs which are irrelevant for
c     evaluation of the CPU, e.g., I/O, Fortran variations, memory
c     allocation, overlays, job control, etc.
c     All of these complexities are reduced to a single, small benchmark
c     which uses a minimum of I/O and a single level of storage. Further,
c     the computation in the kernels is the most stable part of the
c     Fortran language.
c
c     The kernels benchmark is sufficient to determine a range of CPU
c     performance for many different computational structures in a
c     single computer run. Since the range in performance is usually
c     large, the mean has a secondary significance. To estimate the
c     performance of a particular, CPU-limited application program,
c     select the case(s) most similar to the application as the most
c     relevant to the estimate. The performance ratio of a kernel on
c     two different machines, or compiled by two different compilers
c     on the same machine, will approximate the ratio of through-puts
c     for an application which is very similar in structure.
c
c     This set of kernels was chosen to measure lower and upper bounds
c     for scalar Fortran computation rates. The upper bound on scalar
c     rates serves as a base to evaluate the effectiveness of vector
c     computation. The kind of Fortran which has the highest MIP
c     rates is pure arithmetic in DO-loops where complete local code
c     optimization by a Fortran compiler is possible. All other kinds
c     of Fortran operations execute at much lower MIP rates on
c     multiple register machines (these ops may not be necessary).
c
c     Through-put is measured in units of floating-point operations
c     executed per micro-second, called results per micro-second or
c     mega-flops. The Mflop is a measure of the NECESSARY results in
c     a scientific application program regardless of the number or
c     kind of operations or processing. The ratio of Mflops for two
c     different machines will approximate the ratio of through-puts
c     for the majority of compute-limited scientific applications on
c     the two machines. The kernels measure performance scale
c     factors.
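c
c     For example, in the CDC-7600 sample output of section IV, kernel 1
c     executes 500 flops in 94.4 time units. Assuming the TIME column is
c     in microseconds (which is consistent with its MFLOPS column):
c
c         Mflops = flops / time(microseconds) = 500 / 94.4 = 5.30
c
c     A hypothetical machine running the same kernel at 10.60 Mflops
c     would finish the same necessary computation in half the time,
c     500 / 10.60 = 47.2 microseconds.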
c
c
c     II: FORTRAN PROGRAMMING SYSTEM MATURITY
c
c     Hardware performance gains depend critically on compiler
c     maturity. These kernels measure the joint performance of
c     hardware and Fortran compiler software and may easily be used
c     for a comparative analysis of all the available compilers or
c     options on a given machine. For a new or proposed machine where
c     no compiler is available, the performance may be estimated by
c     simulating a reasonable compilation. An example of simulation
c     rationale is given below.
c
c     Fortran compilers for new types of machines require a lengthy
c     development cycle to achieve an effective level of machine
c     utilization. A fully mature compiler may not be completed in
c     the first years of a new machine. Indeed, maturity is not a
c     stationary state but evolves with advances in program
c     optimization techniques. Some of these techniques depend on
c     special facilities in the new machines, and serious development
c     and implementation cannot start much earlier than development of
c     the new machine. Assumptions on the maturity of available
c     Fortran compilers are crucial to the evaluation of Fortran
c     performance; thus, compiler characteristics should be
c     explicit parameters of the performance analysis.
c
c
c -----------------------------------------------------------------------------
c     III: A CPU Performance Metric For Computational Physics: Mega-Flops/sec.
c -----------------------------------------------------------------------------
c
c
c     A: Floating-Point Instructions: The Necessary Mathematics
c
c     Computational physics applies systems of PDEs from mathematical physics
c     to simulate the evolution of physical systems. The mathematical methods
c     depend on real valued functions, and the algorithms are programmed,
c     almost exclusively, in Fortran as floating-point computer operations
c     (Flops). These floating-point operations are, unquestionably, the
c     NECESSARY computer operations on ANY computer, and the total number is
c     INVARIANT.
c     Thus a meaningful computation rate can always be measured by counting
c     the total number of Flops and dividing by the total execution time of
c     a program.
c
c     B: Procedural Machine Instructions: Artifices Of An Architecture
c
c     All of the non-arithmetic instructions in a machine program are
c     artifices of a particular hardware architecture, i.e. machine
c     dependent, as well as the result of a particular compiler's imperfect
c     coding techniques. How many of these procedural machine instructions
c     are strictly necessary can only be determined by further, tedious
c     analysis, which is ALWAYS machine dependent. A famous example of
c     software masking hardware capabilities is the PASCAL compiler written
c     by N. Wirth, which used only 50% of the command set to generate
c     machine programs for the CDC-7600.
c
c     Unless the next generation computer design is constrained, for some
c     reason, to closely resemble its obsolete predecessor, the instruction
c     mix used in current machines is not necessarily relevant. Furthermore,
c     the instruction mix is not a definitive characterization of the
c     intrinsic physics or the mathematical algorithms.
c
c     1. Primary Memory Access Instructions
c
c     The number of memory instructions that are necessary for a given
c     algorithm depends strongly on the number and kind of CPU registers and
c     is a highly machine dependent number. Operating registers, scratch-pad
c     memories, vector buffers, short-stop and feed-back paths in the cpu
c     are examples of hardware artifices which reduce the number of primary
c     memory operations. Compilers and other coders must make intelligent
c     use of these particular cpu resources to minimize memory operations,
c     and, as is well known, this is generally not the case.
c
c     2. Branching Instructions
c
c     Branching instructions are the slowest and most expensive procedural
c     instructions and are very often unnecessary.
c     Here the source programmer has primary responsibility to minimize
c     branching in the program by avoiding IF statements whenever possible,
c     using MAX, MIN, or merge functions like CSMG. Careful logical
c     reduction and placement of IF tests is required to minimize the
c     execution of branching operations. Compilers can do very little to
c     change or optimize the branch graph specified in the source program.
c
c     On vector computers ALL IF tests over mesh or array (state) variables
c     can be eliminated. Conditional computation can be vectorized by direct
c     construction using explicit sub-set mappings. Vector relationals
c     replace the IF clauses. Then sparse, one-to-one mappings called vector
c     Compress/Decompress and one-to-many mappings called vector
c     Gather/Scatter are necessary and sufficient to compose sub-vector
c     operands for simple vector operations.
c
c
c
c                 IV: PERFORMANCE MEASUREMENTS
c
c
c     Through-put is measured in units of millions of floating-point
c     operations executed per second, called mflops.
c
c
c     Artificially long computer runs do not have to be contrived for
c     timing on machines where a cpu clock may be read in job mode.
c     Statistics on the accuracy of the timing method should be
c     measured.
c
c     Net mflops is meaningful only if the real run time of each kernel
c     is adjusted so that it weights the total time in proportion
c     to the actual usage of that category of computation in the
c     total workload.
c
c
c
c     1. Assignment Of Weights To Floating-Point Operations
c
c     Weights are assigned to different kinds of floating-point
c     operations to normalize their hardware execution time to
c     addition time, so that the flop rates computed for various
c     Fortran Kernels will be commensurable.
c
c            Operation         Weight
c            +, -, *             1
c            /, SQRT             4
c            EXP, SIN, ETC.      8
c            IF(X.REL.Y)         1
c
c
c     Each Kernel flop-count is the weighted number of flops required for
c     serial execution. The scalar version defines the NECESSARY computation
c     generally, in the absence of proof to the contrary.
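c
c     For example, applying these weights to the serial loop bodies of
c     two of the kernels above gives, per iteration:
c
c       Kernel  1:  X(k)= Q + Y(k) * (R * ZX(k+10) + T * ZX(k+11))
c                   3 multiplies + 2 adds              =  5 flops
c
c       Kernel 22:  Y(k)= U(k)/V(k)
c                   1 divide                           =  4
c                   W(k)= X(k)/( EXP( Y(k)) -fw)
c                   1 EXP + 1 subtract + 1 divide
c                   = 8 + 1 + 4                        = 13
c                   total                              = 17 flops
c
c     A span of 100 iterations would then give 500 and 1700 flops. This
c     is consistent with the FLOPS entries for kernels 1 and 22 in the
c     CDC-7600 sample output.
c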
c     The vector or parallel executions are only credited with executing the
c     same necessary computation. If the parallel methods do more computation
c     than is necessary, then the extra flops are not counted as through-put.
c
c
c     2. SAMPLE OUTPUT: CDC-7600/FTN-4.4
c
c       KERNEL    FLOPS   TIME(usec)   MFLOPS
c            1      500         94.4     5.30
c            2      300         45.3     6.62
c            3      100         21.9     4.57
c            4      300        109.3     2.75
c            5      100         25.6     3.91
c            6      100         27.8     3.60
c            7      640         88.2     7.25
c            8     1440        249.0     5.78
c            9      680        123.2     5.52
c           10      360        102.8     3.50
c           11       49         34.8     1.41
c           12       49         18.3     2.68
c           13      224        107.7     2.08
c           14     3300        809.3     4.08
c           15     3960       1769.5     2.24
c           16      530        320.3     1.65
c           17      405         92.2     4.39
c           18     6600       1121.5     5.88
c           19      540        105.8     5.11
c           20     1300        266.0     4.89
c           21     1250        370.9     3.37
c           22     1700        601.9     2.82
c           23     1650        362.4     4.55
c           24      200        171.7     1.16
c
c       AVERAGE  RATE =  3.96 MEGA-FLOPS/SEC.
c       MEDIAN   RATE =  4.08 MEGA-FLOPS/SEC.
c       HARMONIC MEAN =  3.15 MEGA-FLOPS/SEC.
c       STANDARD DEV. =  1.61 MEGA-FLOPS/SEC.
c
c                                              F.H.MCMAHON 1972
c
c
c
c     3. INTERPRETATION OF OUTPUT FILE FROM SUBROUTINE REPORT:
c
c
c     The highly instrumented LFK test program measures the effective cpu
c     performance range and has sufficient timed samples for many statistical
c     analyses, thus avoiding the PERIL of a SINGLE performance "rating".
c     A COMPLETE REPORT OF LFK TEST RESULTS MUST QUOTE THE PERFORMANCE RANGE
c     STATISTICS BASED ON THE SUMMARY OF 72 TIMED SAMPLES: the minimum,
c     the equi-weighted harmonic, geometric, and arithmetic means, and the
c     maximum rates. The standard deviation must also be quoted to show the
c     variance in performance rates. NO SINGLE RATE QUOTATION IS SUFFICIENT
c     OR HONEST.
c
c     The LFK test (Livermore loops) outputs data for three benchmarking
c     contexts, following print-outs of cpu clock checks and experimental
c     timing errors:
c
c
c     1. Conventional "Balanced" Cpus, e.g. PCs, DEC-VAXs, IBM-370s.
c
c        1.1. [Refer to SUMMARY of 72 timings on pp.9-10 of LFK test OUTPUT file.
c The bottom line is the set of nine performance range statistics,
c min thru max plus standard deviation, listed after the SUMMARY table.
c These statistics may be used for computer comparisons as shown
c in figure 11, p.24 of the LFK report UCRL-53745. Ratios of the
c range statistics from two computers show the range of speed-ups.]
c
c 1.2. An all-scalar coded LFK test (NOVECTOR) measures the basal scalar,
c mono-processor computing capability.
c
c
c
c 2. Vector "Unbalanced" Cpus, e.g. CRAY, NEC, IBM-3090.
c
c 2.1. [Pages 2-8 of the LFK test OUTPUT file analyze three different
c runs of the 24 Livermore loops with short, medium, and long DO
c loop spans (vector lengths). The performance range statistics
c for each of these three runs on vector computers should be compared
c as shown in figure 12, p.25 of the LFK report UCRL-53745.]
c
c 2.2. The performance rates of most applications on vector computers are
c observed in a sub-range from approximately the harmonic mean through
c the mean rate of the 24 LFK samples (thru the two middle quartiles).
c
c 2.2.1. The equi-weighted arithmetic mean (AM) of 72 LFK rates
c correlates with highly vectorised applications in the workload
c (80%-90% of flops) because the average is dominated by the high
c vector rates. Very highly vectorised applications (95%-99%+)
c may run at several times the average rate (figure 10, p.21, ibid).
c
c 2.2.2. The equi-weighted harmonic mean (HM) of 72 LFK rates
c correlates with poorly vectorised applications in the workload
c (30%-40% of flops) because the HM is dominated by the low
c scalar rates. An all-scalar coded LFK test (NOVECTOR)
c measures the basal scalar, mono-processor computing capability.
c
c 2.2.3. The best central measure is the Geometric Mean (GM) of 72 rates,
c because it is least biased by outliers. CRAY hardware monitors
c have demonstrated that net Mflop rates for the LLNL and UCSD
c workloads are closest to the 72 LFK test geometric mean rate.
c
c
c
c 3. Parallel "Unbalanced" Cpus, e.g.
CRAY, NEC, IBM-3090.
c
c 3.1. The lower, uni-processor bound of an MP system is given by 1.2.
c
c 3.2. The upper, multi-processor bound of an MP system is estimated by
c multiplying the LFK performance statistics from 1.2 or from 2.2
c by N, the number of processors.
c
c
c
c
c Comparison of two or more computers should make use of all the
c performance range statistics in the tables below (DO span = 167):
c the extrema, the mean rates, and the standard deviation.
c NO SINGLE MFLOPS RATE QUOTATION IS SUFFICIENT OR HONEST.
c If the performance range is very large the causes and implicatio