Parallel vector processors

The great supercomputers of the past were built around custom vector processors. These are the expensive, high-performance masterpieces pioneered by Seymour Cray. Only a few computers currently in production still use vector processors, and all of them are parallel vector processors (PVPs) that run small numbers of vector processors within the same machine.

    NEC Earth Simulator     8 GFlops / 500 MHz proc  --> 41 TFlops peak
    Cray SV1                8 GFlops / vector proc   --> 64 GFlops / cabinet
    NEC SX-6 (Cray SX-6)    8 GFlops / proc
    Cray SV2                                         --> 10s of TFlops

Vector processors operate on large vectors of data at once. Where possible, the compiler automatically vectorizes the innermost loops, breaking the work into blocks that are often 64 elements long. The functional units are pipelined so that the elements of a block stream through them back to back, delivering one or more results per clock cycle once the pipeline is full (8 GFlops at 500 MHz works out to 16 floating-point operations per cycle), and the memory subsystem is optimized to keep the processors fed at this rate.

Since the compilers do much of the optimization automatically, the user only needs to attack the problem areas where something prevents the compiler from working out how to vectorize a loop, such as a dependence between loop iterations. MPI, compiler directives, OpenMP, and pthreads can be used to parallelize a program to run across multiple vector processors.
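
As a rough illustration of the vectorization discussion above, the C sketch below shows a loop a vectorizing compiler can typically handle, the same loop split by hand into 64-element blocks (roughly what the compiler does internally), and a loop whose iterations depend on one another, which is the kind of impediment the user has to attack. The function names and the VECTOR_LENGTH constant are made up for this example and do not come from any particular compiler or machine; the value 64 simply follows the block size mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block size for illustration only; real hardware vector
   lengths vary from machine to machine. */
#define VECTOR_LENGTH 64

/* Vectorizable: every iteration is independent of the others, and
   `restrict` promises the compiler that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* The same computation, strip-mined by hand into blocks of at most
   VECTOR_LENGTH elements. Each pass of the inner loop corresponds
   conceptually to one set of vector instructions. */
void saxpy_strip_mined(size_t n, float a,
                       const float *restrict x, float *restrict y)
{
    for (size_t start = 0; start < n; start += VECTOR_LENGTH) {
        /* The last block may be shorter than VECTOR_LENGTH. */
        size_t len = (n - start < VECTOR_LENGTH) ? n - start : VECTOR_LENGTH;
        assert(len >= 1 && len <= VECTOR_LENGTH);
        for (size_t j = 0; j < len; j++)
            y[start + j] = a * x[start + j] + y[start + j];
    }
}

/* Impediment: a loop-carried dependence. Iteration i needs the result
   of iteration i - 1, so the loop cannot be vectorized as written and
   would need to be restructured by the programmer. */
void running_sum(size_t n, float *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] = x[i] + x[i - 1];
}
```

Calling saxpy and saxpy_strip_mined on the same inputs gives identical results; only the loop structure differs. On a machine with several vector processors, the outer block loop of the strip-mined version is the natural place to divide work among processors, for example with an OpenMP directive, while each processor vectorizes its own inner loop.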