An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-chip Integration
Abstract: Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller, the coherence hardware, and the network interface/router. In this paper, we exploit this integration scale, presenting a novel node architecture aimed at reducing the long L2 miss latencies and the memory overhead of using directories that characterize cc-NUMA machines and limit their scalability. Our proposal replaces the traditional directory with a novel three-level directory architecture and adds a small shared data cache to each of the nodes of a multiprocessor system. Due to their small size, the first-level directory and the shared data cache are integrated into the processor chip in every node, which enhances performance by saving accesses to the slower main memory. Scalability is guaranteed by keeping the second- and third-level directories outside the processor chip and using compressed data structures. A taxonomy of L2 misses, classified by the actions the directory performs to satisfy them, is also presented. Using execution-driven simulations, we show that the proposed node architecture yields significant latency reductions, which translate into reductions of more than 30 percent in application execution time in several cases.

Index Terms: cc-NUMA multiprocessor, directory memory overhead, L2 miss latency, three-level directory, shared data cache, on-processor-chip integration.
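The core idea of the abstract can be sketched in code: a small on-chip first-level directory holds precise sharing information for a few recently referenced lines, while the larger off-chip levels fall back to a compressed sharing code (here a coarse bit-vector, one bit per cluster of nodes). This is an illustrative sketch only; the class, the capacity parameter, and the specific compression scheme are assumptions for demonstration, not the paper's exact design.

```python
# Sketch of a two-tier view of the paper's three-level directory idea:
# a tiny, precise on-chip level backed by a compressed off-chip level.
# All names and parameters below are hypothetical.

NODES = 16    # assumed system size
CLUSTER = 4   # assumed nodes per coarse-vector bit (compression ratio)

class ThreeLevelDirectory:
    def __init__(self, l1_capacity=2):
        self.l1 = {}                 # line -> exact sharer set (on-chip, tiny)
        self.l2 = {}                 # line -> coarse bit-vector (off-chip, compressed)
        self.l1_capacity = l1_capacity

    def record_sharer(self, line, node):
        # Keep a precise entry only while the small on-chip level has room.
        if line in self.l1 or len(self.l1) < self.l1_capacity:
            self.l1.setdefault(line, set()).add(node)
        # The compressed entry is always maintained, so eviction of a
        # precise entry never loses correctness, only precision.
        bits = self.l2.get(line, 0)
        self.l2[line] = bits | (1 << (node // CLUSTER))

    def sharers(self, line):
        if line in self.l1:          # on-chip hit: exact sharers, no memory access
            return set(self.l1[line])
        bits = self.l2.get(line, 0)  # off-chip: a superset of the true sharers
        return {n for n in range(NODES) if bits & (1 << (n // CLUSTER))}
```

A lookup that hits the on-chip level returns the exact sharer set; a miss falls back to the compressed level, which returns a conservative superset (here, every node in each marked cluster), trading extra invalidation traffic for directory memory overhead.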
Computer Science