Show simple item record

dc.contributor.advisorYun, Heechul
dc.contributor.authorKumar Valsan, Prathap
dc.date.accessioned2017-01-02T20:18:45Z
dc.date.available2017-01-02T20:18:45Z
dc.date.issued2016-05-31
dc.date.submitted2016
dc.identifier.otherhttp://dissertations.umi.com/ku:14736
dc.identifier.urihttp://hdl.handle.net/1808/22348
dc.description.abstractThe reduced space, weight, and power (SWaP) characteristics of multi-core systems have motivated the real-time systems research community to explore them in mixed-criticality embedded systems. However, the major challenge in mixed-criticality embedded systems is to provide determinism to real-time tasks in the presence of co-running non-real-time tasks. Shared resources such as the memory subsystem (caches and DRAM) and the bus make this a challenging effort. This work focuses on the shared memory subsystem. First, we studied Commercial-Off-The-Shelf (COTS) DRAM controllers to demonstrate the interference caused by shared resources inside DRAM controllers and its impact on predictability, and we propose a DRAM controller design, called MEDUSA, to provide predictable memory performance in multicore-based real-time systems. MEDUSA provides high time predictability when needed for real-time tasks but also strives to provide high average performance for non-real-time tasks through close collaboration between the OS and the DRAM controller. In our approach, the OS partially partitions DRAM banks into two groups: reserved banks and shared banks. The reserved banks are exclusive to each core to provide predictable timing, while the shared banks are shared by all cores to efficiently utilize the resources. MEDUSA has two separate queues for read and write requests, and it prioritizes reads over writes. In processing read requests, MEDUSA employs a two-level scheduling algorithm that prioritizes memory requests to the reserved banks in a round-robin fashion to provide strong timing predictability. In processing write requests, MEDUSA largely relies on FR-FCFS (First Ready-First Come First Serve) scheduling for high throughput but switches immediately to reads upon the arrival of read requests to the reserved banks.
We implemented MEDUSA in a cycle-accurate full-system simulator and a Linux kernel and performed experiments using a set of synthetic and SPEC2006 benchmarks to analyze the performance impact of MEDUSA on both real-time and non-real-time tasks. The results show that MEDUSA achieves up to 91% better worst-case performance for real-time tasks while achieving up to 29% throughput improvement for non-real-time tasks. Second, we studied the contention at shared caches and its impact on predictability. We demonstrate that the prevailing cache partitioning techniques do not necessarily ensure predictable cache performance in modern COTS multicore platforms that use non-blocking caches to exploit memory-level parallelism (MLP). Through carefully designed experiments using three real COTS multicore platforms (four distinct CPU architectures) and a cycle-accurate full-system simulator, we show that special hardware registers in non-blocking caches, known as Miss Status Holding Registers (MSHRs), which track the status of outstanding cache misses, can be a significant source of contention; we observe up to 21X WCET (Worst-Case Execution Time) increase in a real COTS multicore platform due to MSHR contention. We propose a hardware and system software (OS) collaborative approach to efficiently eliminate MSHR contention for multicore real-time systems. Our approach includes a low-cost hardware extension that enables dynamic control of per-core MLP by the OS. Using the hardware extension, the OS scheduler then globally controls each core's MLP in such a way that eliminates MSHR contention and maximizes overall throughput of the system. We implement the hardware extension in a cycle-accurate full-system simulator and the scheduler modification in the Linux 3.14 kernel. We evaluate the effectiveness of our approach using a set of synthetic and macro benchmarks.
In a case study, we achieve up to 19% WCET reduction (average: 13%) for a set of EEMBC benchmarks compared to a baseline cache partitioning setup.
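The read-scheduling policy the abstract describes — per-core reserved-bank queues served round-robin ahead of shared-bank traffic — can be sketched as a toy queue model. This is an illustrative simplification, not the thesis's actual controller: all DRAM timing, bank-state, and write-queue behavior is omitted, and shared-bank reads are served plain FCFS here.

```python
from collections import deque

class MedusaReadScheduler:
    """Toy model of MEDUSA's two-level read scheduling (illustrative only).

    Level 1: round-robin over the cores' reserved-bank queues, giving each
    core a bounded wait and hence predictable timing.
    Level 2: shared-bank reads are served only when no reserved-bank read
    is pending (simplified to FCFS here).
    """

    def __init__(self, num_cores):
        self.reserved = [deque() for _ in range(num_cores)]  # per-core reserved-bank reads
        self.shared = deque()   # reads to shared banks, from any core
        self.rr = 0             # round-robin pointer over cores

    def enqueue(self, core, bank_is_reserved, req):
        if bank_is_reserved:
            self.reserved[core].append(req)
        else:
            self.shared.append(req)

    def next_request(self):
        n = len(self.reserved)
        # Level 1: scan cores round-robin, starting at the pointer.
        for i in range(n):
            core = (self.rr + i) % n
            if self.reserved[core]:
                self.rr = (core + 1) % n  # advance past the served core
                return self.reserved[core].popleft()
        # Level 2: fall back to shared-bank reads.
        return self.shared.popleft() if self.shared else None

sched = MedusaReadScheduler(2)
sched.enqueue(0, True, "A0")   # core 0, reserved bank
sched.enqueue(0, True, "A1")
sched.enqueue(1, True, "B0")   # core 1, reserved bank
sched.enqueue(0, False, "S0")  # shared bank
order = [sched.next_request() for _ in range(4)]
print(order)  # reserved reads alternate between cores; shared read goes last
```

Note how core 0's second request (`A1`) is held until core 1 has had its turn, which is what bounds each core's worst-case queueing delay.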
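The second contribution rests on a simple invariant: if the sum of the per-core MLP caps never exceeds the shared cache's MSHR count, no core can ever stall waiting for an MSHR held by another core. The sketch below shows an OS-side budget computation that maintains that invariant; the allocation policy used here (a guaranteed minimum of one slot per core, with the remainder handed out demand-proportionally) is an assumption for illustration, not the scheduler algorithm from the thesis, and `demands` is a hypothetical per-core estimate of desired outstanding misses.

```python
def partition_mlp(total_mshrs, demands):
    """Compute per-core MLP caps whose sum never exceeds the MSHR count.

    Illustrative sketch only: real per-core demand estimation and the
    thesis's actual global scheduling policy are not modeled.
    """
    n = len(demands)
    assert total_mshrs >= n, "need at least one MSHR slot per core"
    caps = [1] * n                  # minimum of one outstanding miss per core
    spare = total_mshrs - n
    # Hand remaining slots, one at a time, to the core with the largest
    # unmet demand, so throughput-hungry cores get more parallelism.
    while spare > 0:
        unmet = [(demands[i] - caps[i], i) for i in range(n)]
        unmet.sort(reverse=True)
        gap, i = unmet[0]
        if gap <= 0:
            break                   # all demands met; leave the rest spare
        caps[i] += 1
        spare -= 1
    return caps

# Four cores sharing 8 MSHRs; core 0 is memory-intensive.
print(partition_mlp(8, [6, 2, 1, 1]))
```

Because `sum(caps) <= total_mshrs` always holds, MSHR contention is eliminated by construction, which mirrors the abstract's claim that the OS can trade a little per-core parallelism for a predictable worst case.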
dc.format.extent81 pages
dc.language.isoen
dc.publisherUniversity of Kansas
dc.rightsCopyright held by the author.
dc.subjectComputer engineering
dc.subjectDRAM Controller
dc.subjectPredictable-Memory-Performance
dc.subjectReal-time-systems
dc.subjectShared Cache
dc.subjectShared-Memory
dc.subjectThesis
dc.titleTowards achieving predictable memory performance on multi-core based mixed criticality embedded systems
dc.typeThesis
dc.contributor.cmtememberYun, Heechul
dc.contributor.cmtememberKulkarni, Prasad
dc.contributor.cmtememberEl-Araby, Esam
dc.thesis.degreeDisciplineElectrical Engineering & Computer Science
dc.thesis.degreeLevelM.S.
dc.identifier.orcid
dc.rights.accessrightsopenAccess

