A benchmark is a point of reference for a measurement. The term presumably originates from the practice of making dimensional height measurements of an object on a workbench using a graduated scale or similar tool, and using the surface of the workbench as the origin for the measurements.
In surveying, benchmarks are landmarks of reliable, precisely-known altitude, and are often man-made objects, such as features of permanent structures that are unlikely to change, or special-purpose "trig points", which are typically small concrete obelisks, approximately 3 feet tall and 1 foot at the base, set permanently into the earth.
In computing, a benchmark is the result of running a computer program, or a set of programs, in order to assess the relative performance of an object, by running a number of standard tests and trials against it. The term, benchmark, is also commonly used for specially-designed benchmarking programs themselves. Benchmarking is usually associated with assessing performance characteristics of computer hardware, e.g., the floating point operation performance of a CPU, but there are circumstances when the technique is also applicable to software. Software benchmarks are, for example, run against compilers or database management systems.
Benchmarks provide a method of comparing the performance of various subsystems across different chip/system architectures.
As computer architecture advanced, it became more and more difficult to compare the performance of various computer systems simply by looking at their specifications. Therefore, tests were developed that could be performed on different systems, allowing the results from these tests to be compared across different architectures. For example, Intel Pentium 4 processors generally carried a higher clock (hertz) rating than AMD Athlon XP processors of comparable computational speed; in other words, a 'slower'-clocked AMD processor could be as fast on benchmark tests as a higher-clocked Intel processor.
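The gap between clock rate and delivered performance follows from the usual throughput identity: performance ≈ clock rate × instructions per cycle (IPC). The sketch below uses invented figures, not measurements of real Pentium 4 or Athlon XP chips, purely to illustrate how a lower-clocked processor can come out ahead.

```python
# Illustrative only: effective throughput = clock rate * IPC.
# The clock rates and IPC values below are made up for the sketch,
# not taken from any real processor's datasheet.

def mips(clock_hz: float, ipc: float) -> float:
    """Millions of instructions retired per second."""
    return clock_hz * ipc / 1e6

cpu_a = mips(2.0e9, 0.8)   # 2.0 GHz, lower IPC  -> ~1600 MIPS
cpu_b = mips(1.5e9, 1.2)   # 1.5 GHz, higher IPC -> ~1800 MIPS

# The 'slower'-clocked CPU delivers more work per second here.
print(cpu_a, cpu_b)
```

This is why specification sheets alone stopped being a useful basis for comparison: the clock rate is only one factor in the product.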
Benchmarks are designed to mimic a particular type of workload on a component or system. "Synthetic" benchmarks do this by specially-created programs that impose the workload on the component. "Application" benchmarks, instead, run actual real-world programs on the system. Whilst application benchmarks usually give a much better measure of real-world performance on a given system, synthetic benchmarks still have their use for testing out individual components, like a hard disk or networking device.
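A minimal synthetic benchmark simply imposes a fixed, artificial workload on a component and times it. The toy loop below illustrates the idea for floating-point throughput; it is a sketch of the technique, not any standard benchmark program.

```python
import time

def synthetic_fp_benchmark(n: int = 1_000_000) -> float:
    """Toy synthetic benchmark: time n iterations of a
    floating-point multiply-add and return the achieved MFLOPS
    (counting 2 floating-point operations per iteration)."""
    x = 1.0
    start = time.perf_counter()
    for _ in range(n):
        x = x * 1.0000001 + 1e-9   # one multiply + one add
    elapsed = time.perf_counter() - start
    return 2 * n / elapsed / 1e6

print(f"{synthetic_fp_benchmark():.1f} MFLOPS")
```

An application benchmark would instead time a real program (a compile, a database query batch) end to end, which folds in memory, disk, and OS behavior that a tight loop like this never exercises.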
Computer manufacturers have a long history of trying to set up their systems to give unrealistically high performance on benchmark tests that is not replicated in real usage. For instance, during the 1980s some compilers could detect a specific mathematical operation used in a well-known floating-point benchmark and replace the operation with a mathematically equivalent one that was much faster; such a transformation was rarely useful outside the benchmark. Manufacturers also commonly report only those benchmarks (or aspects of benchmarks) that show their products in the best light, and have been known to misrepresent the significance of benchmarks to the same end. Taken together, these practices are called bench-marketing.
Users are recommended to take benchmarks, particularly those provided by manufacturers themselves, with ample quantities of salt. If performance is really critical, the only benchmark that matters is the actual workload that the system is to be used for. If that is not possible, benchmarks that resemble real workloads as closely as possible should be used, and even then used with skepticism. It is quite possible for system A to outperform system B when running program "furble" on workload X (the workload in the benchmark), and the order to be reversed with the same program on your own workload.
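The point that the ordering of two systems can reverse between workloads is easy to show with arithmetic. Suppose, hypothetically, that system A is faster at integer operations and system B at floating point; the overall runtime then depends entirely on each workload's operation mix. All numbers below are invented for illustration.

```python
# Hypothetical per-operation costs (seconds per million operations).
# Invented to illustrate the reversal, not measured on real hardware.
SYSTEMS = {
    "A": {"int": 1.0, "fp": 4.0},   # fast integer, slow floating point
    "B": {"int": 2.0, "fp": 2.0},   # balanced
}

def runtime(system: str, mix: dict[str, float]) -> float:
    """Total time for a workload given its operation mix (fractions)."""
    return sum(frac * SYSTEMS[system][op] for op, frac in mix.items())

benchmark_x   = {"int": 0.9, "fp": 0.1}   # integer-heavy benchmark
your_workload = {"int": 0.2, "fp": 0.8}   # floating-point-heavy job

# On benchmark X, A wins (1.3 s vs 2.0 s); on the FP-heavy
# workload the order reverses (3.4 s vs 2.0 s).
print(runtime("A", benchmark_x), runtime("B", benchmark_x))
print(runtime("A", your_workload), runtime("B", your_workload))
```

This is the whole argument for benchmarking with your own workload: a published result fixes the mix, and your mix may differ.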
Types of benchmarks
- Real program
  - word processing software
  - tool software of CDA
  - user's application software (e.g., MIS)
- Kernel
  - contains key code
  - normally abstracted from an actual program
  - popular kernel: Livermore loops
  - LINPACK benchmark (contains basic linear algebra subroutines, written in FORTRAN)
  - results are represented in MFLOPS
- Toy benchmark
  - a small program that users can write themselves and use to test a computer's basic components
- Synthetic benchmark
  - procedure for programming a synthetic benchmark:
    - take statistics of all types of operations from many application programs
    - get the proportion of each operation
    - write a program based on the proportions above
  - a well-known synthetic benchmark is Whetstone, whose results are represented in KWIPS (kilo Whetstone instructions per second); it is not suitable for measuring pipelined computers
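The three-step procedure above can be sketched in code: gather the operation mix of representative programs, reduce it to proportions, then generate a loop that reproduces that mix. Everything here is a hedged toy sketch with invented statistics; real synthetic benchmarks such as Whetstone were constructed by hand from much larger instruction-mix studies.

```python
import time
from collections import Counter

# Steps 1-2: operation statistics, hypothetically gathered from many
# application programs, reduced to proportions of the total.
observed_ops = Counter({"add": 500_000, "mul": 300_000, "div": 200_000})
total = sum(observed_ops.values())
proportions = {op: n / total for op, n in observed_ops.items()}

# The operations the synthetic program will mix together.
OPS = {
    "add": lambda x: x + 1.000001,
    "mul": lambda x: x * 1.000001,
    "div": lambda x: x / 1.000001,
}

def synthetic_benchmark(iterations: int = 300_000) -> float:
    """Step 3: run a loop whose operation mix matches the observed
    proportions, and report achieved operations per second."""
    # Build an instruction stream matching the proportions.
    stream = []
    for op, frac in proportions.items():
        stream.extend([OPS[op]] * int(frac * iterations))
    x = 1.0
    start = time.perf_counter()
    for fn in stream:
        x = fn(x)
    elapsed = time.perf_counter() - start
    return len(stream) / elapsed

print(f"{synthetic_benchmark():,.0f} synthetic ops/sec")
```

The design intent is that a program built this way stresses a machine in roughly the same proportions as real applications do, without needing to ship or run those applications.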