Memory-level parallelism

Memory-level parallelism (MLP) is a term in computer architecture referring to the ability to have multiple memory operations pending at the same time, in particular cache misses or translation lookaside buffer (TLB) misses.
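The following Python sketch makes the definition concrete by modelling the time spent servicing a run of cache misses. It is a toy model, not a description of any real processor. It assumes every miss takes a fixed `latency`, at most `max_outstanding` misses may be pending at once, and issue bandwidth can be ignored. The function name `miss_cycles` and its parameters are hypothetical. When each miss depends on the previous one (as in pointer chasing through a linked list), no two misses can overlap, so extra miss-handling capacity does not help. When the misses are independent, they overlap, and the total time shrinks roughly by the number that can be outstanding.

```python
import heapq


def miss_cycles(n_misses: int, latency: int, max_outstanding: int,
                dependent: bool) -> int:
    """Cycles needed to service n_misses cache misses in a toy model.

    Hypothetical model (assumptions, not a real microarchitecture):
    - every miss takes exactly `latency` cycles to complete;
    - at most `max_outstanding` (>= 1) misses may be pending at once;
    - a new miss may issue in the same cycle that a slot frees up;
    - if `dependent` is True, each miss's address comes from the data
      returned by the previous miss (pointer chasing), so it cannot
      issue until that miss completes.
    """
    now = 0            # earliest cycle at which the next miss may issue
    last_done = 0      # completion cycle of the most recently issued miss
    in_flight = []     # min-heap of completion cycles of recent misses
                       # (entries that already finished may linger here)
    for _ in range(n_misses):
        if dependent:
            # The address is unknown until the previous miss returns.
            now = max(now, last_done)
        if len(in_flight) == max_outstanding:
            # Every slot may be busy: wait for the earliest to complete.
            # If it already finished, `now` is unchanged.
            now = max(now, heapq.heappop(in_flight))
        last_done = now + latency
        heapq.heappush(in_flight, last_done)
    # `now` never decreases, so the last miss issued finishes last.
    return last_done
```

For example, with 8 misses of 100 cycles each and room for 4 outstanding misses, `miss_cycles(8, 100, 4, dependent=False)` returns 200, while `miss_cycles(8, 100, 4, dependent=True)` returns 800. The hardware is the same in both cases, but only the independent workload exhibits MLP.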

In a single processor, MLP may be considered a form of instruction-level parallelism (ILP). However, ILP is often conflated with superscalar execution, the ability to execute more than one instruction at the same time. For example, a processor such as the Intel Pentium Pro is five-way superscalar, able to start executing five different microinstructions in a given cycle, yet it can have four different cache misses outstanding for up to 20 different load microinstructions at any time. Its capacity for outstanding memory operations is therefore a separate quantity from its issue width.

It is possible to have a machine that is not superscalar but which nevertheless has high MLP.

Arguably, a machine that has no ILP (one that is not superscalar and executes one instruction at a time in a non-pipelined manner) but that performs hardware prefetching (as opposed to software prefetching by instructions) exhibits MLP, because multiple prefetches can be outstanding, but not ILP. Such a machine has multiple memory operations outstanding but never more than one instruction. The distinction is easy to miss because instructions are often conflated with operations.
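The prefetching argument can be checked by counting how many memory operations are outstanding in each cycle. The Python sketch below takes a list of (issue, completion) cycle pairs, one per memory operation, and reports the peak number outstanding and the average number outstanding over the cycles in which at least one operation is pending. This average is one common way to quantify MLP. The function name `mlp_profile` is hypothetical, and the trace in the usage note is invented for illustration: a non-pipelined machine takes a 100-cycle miss at cycle 0 while an assumed hardware prefetcher issues three more 100-cycle prefetches at cycles 1, 2 and 3.

```python
from typing import List, Tuple


def mlp_profile(intervals: List[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (peak, average) number of outstanding memory operations.

    `intervals` holds one half-open (issue_cycle, complete_cycle) pair per
    memory operation (demand miss or prefetch). The average is taken only
    over cycles in which at least one operation is outstanding.
    """
    events = []
    for start, end in intervals:
        events.append((start, +1))   # operation becomes outstanding
        events.append((end, -1))     # operation completes
    # At equal cycles, -1 sorts before +1, so a completion and an issue
    # in the same cycle are not counted as overlapping (half-open ranges).
    events.sort()

    outstanding = peak = 0
    busy_cycles = weighted = 0
    prev = None
    for cycle, delta in events:
        if prev is not None and outstanding > 0:
            span = cycle - prev
            busy_cycles += span              # cycles with >= 1 pending
            weighted += span * outstanding   # operation-cycles
        outstanding += delta
        peak = max(peak, outstanding)
        prev = cycle

    average = weighted / busy_cycles if busy_cycles else 0.0
    return peak, average
```

For that trace, `mlp_profile([(0, 100), (1, 101), (2, 102), (3, 103)])` returns a peak of 4 and an average of 400/103, about 3.88: only one instruction is ever in flight, yet up to four memory operations are pending. A strictly dependent pair such as `[(0, 100), (100, 200)]` gives a peak and average of 1.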

Furthermore, multiprocessor and multithreaded computer systems may be said to exhibit MLP and ILP through their thread- or process-level parallelism, but not intra-thread, single-process ILP and MLP. Often, however, the terms MLP and ILP are restricted to parallelism extracted from what appears to be non-parallel, single-threaded code.

This article is issued from Wikipedia (version of 12/1/2016). The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.
