Show simple item record

dc.contributor.advisorYun, Heechul
dc.contributor.authorKumar Valsan, Prathap
dc.date.accessioned2017-01-02T20:18:45Z
dc.date.available2017-01-02T20:18:45Z
dc.date.issued2016-05-31
dc.date.submitted2016
dc.identifier.otherhttp://dissertations.umi.com/ku:14736
dc.identifier.urihttp://hdl.handle.net/1808/22348
dc.description.abstractThe reduced space, weight, and power (SWaP) characteristics of multi-core systems have motivated the real-time systems research community to explore them in mixed-criticality embedded systems. However, the major challenge in mixed-criticality embedded systems is to provide determinism to real-time tasks in the presence of co-running non-real-time tasks. Shared resources such as the memory subsystem (caches and DRAM) and the bus make this a challenging effort. This work focuses on the shared memory subsystem. First, we studied Commercial-Off-The-Shelf (COTS) DRAM controllers to demonstrate the interference caused by shared resources inside DRAM controllers and its impact on predictability, and we propose a DRAM controller design, called MEDUSA, to provide predictable memory performance in multicore-based real-time systems. MEDUSA provides high time predictability when needed for real-time tasks but also strives to provide high average performance for non-real-time tasks through close collaboration between the OS and the DRAM controller. In our approach, the OS partially partitions DRAM banks into two groups: reserved banks and shared banks. The reserved banks are exclusive to each core to provide predictable timing, while the shared banks are shared by all cores to efficiently utilize the resources. MEDUSA has two separate queues for read and write requests, and it prioritizes reads over writes. In processing read requests, MEDUSA employs a two-level scheduling algorithm that prioritizes memory requests to the reserved banks in a round-robin fashion to provide strong timing predictability. In processing write requests, MEDUSA largely relies on FR-FCFS (First Ready-First Come First Serve) scheduling for high throughput but switches immediately to reads upon the arrival of read requests to the reserved banks.
We implemented MEDUSA in a cycle-accurate full-system simulator and a Linux kernel and performed experiments using a set of synthetic and SPEC2006 benchmarks to analyze the performance impact of MEDUSA on both real-time and non-real-time tasks. The results show that MEDUSA achieves up to 91% better worst-case performance for real-time tasks while achieving up to 29% throughput improvement for non-real-time tasks. Second, we studied the contention at shared caches and its impact on predictability. We demonstrate that the prevailing cache partitioning techniques do not necessarily ensure predictable cache performance in modern COTS multicore platforms that use non-blocking caches to exploit memory-level parallelism (MLP). Through carefully designed experiments using three real COTS multicore platforms (four distinct CPU architectures) and a cycle-accurate full-system simulator, we show that special hardware registers in non-blocking caches, known as Miss Status Holding Registers (MSHRs), which track the status of outstanding cache misses, can be a significant source of contention; we observe up to 21X WCET (Worst-Case Execution Time) increase in a real COTS multicore platform due to MSHR contention. We propose a hardware and system software (OS) collaborative approach to efficiently eliminate MSHR contention for multicore real-time systems. Our approach includes a low-cost hardware extension that enables dynamic control of per-core MLP by the OS. Using the hardware extension, the OS scheduler then globally controls each core's MLP in such a way that eliminates MSHR contention and maximizes overall throughput of the system. We implement the hardware extension in a cycle-accurate full-system simulator and the scheduler modification in the Linux 3.14 kernel. We evaluate the effectiveness of our approach using a set of synthetic and macro benchmarks.
In a case study, we achieve up to 19% WCET reduction (average: 13%) for a set of EEMBC benchmarks compared to a baseline cache partitioning setup.
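The read-scheduling policy the abstract describes — per-core reserved-bank queues served round-robin ahead of shared-bank traffic — can be sketched as a toy queue model. This is an illustrative simplification, not the thesis's actual controller: all DRAM timing, bank-state, and write-queue behavior is omitted, and shared-bank reads are served plain FCFS here.

```python
from collections import deque

class MedusaReadScheduler:
    """Toy model of MEDUSA's two-level read scheduling (illustrative only).

    Level 1: round-robin over the cores' reserved-bank queues, giving each
    core a bounded wait and hence predictable timing.
    Level 2: shared-bank reads are served only when no reserved-bank read
    is pending (simplified to FCFS here).
    """

    def __init__(self, num_cores):
        self.reserved = [deque() for _ in range(num_cores)]  # per-core reserved-bank reads
        self.shared = deque()   # reads to shared banks, from any core
        self.rr = 0             # round-robin pointer over cores

    def enqueue(self, core, bank_is_reserved, req):
        if bank_is_reserved:
            self.reserved[core].append(req)
        else:
            self.shared.append(req)

    def next_request(self):
        n = len(self.reserved)
        # Level 1: scan cores round-robin, starting at the pointer.
        for i in range(n):
            core = (self.rr + i) % n
            if self.reserved[core]:
                self.rr = (core + 1) % n  # advance past the served core
                return self.reserved[core].popleft()
        # Level 2: fall back to shared-bank reads.
        return self.shared.popleft() if self.shared else None

sched = MedusaReadScheduler(2)
sched.enqueue(0, True, "A0")   # core 0, reserved bank
sched.enqueue(0, True, "A1")
sched.enqueue(1, True, "B0")   # core 1, reserved bank
sched.enqueue(0, False, "S0")  # shared bank
order = [sched.next_request() for _ in range(4)]
print(order)  # reserved reads alternate between cores; shared read goes last
```

Note how core 0's second request (`A1`) is held until core 1 has had its turn, which is what bounds each core's worst-case queueing delay.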
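The second contribution rests on a simple invariant: if the sum of the per-core MLP caps never exceeds the shared cache's MSHR count, no core can ever stall waiting for an MSHR held by another core. The sketch below shows an OS-side budget computation that maintains that invariant; the allocation policy used here (a guaranteed minimum of one slot per core, with the remainder handed out demand-proportionally) is an assumption for illustration, not the scheduler algorithm from the thesis, and `demands` is a hypothetical per-core estimate of desired outstanding misses.

```python
def partition_mlp(total_mshrs, demands):
    """Compute per-core MLP caps whose sum never exceeds the MSHR count.

    Illustrative sketch only: real per-core demand estimation and the
    thesis's actual global scheduling policy are not modeled.
    """
    n = len(demands)
    assert total_mshrs >= n, "need at least one MSHR slot per core"
    caps = [1] * n                  # minimum of one outstanding miss per core
    spare = total_mshrs - n
    # Hand remaining slots, one at a time, to the core with the largest
    # unmet demand, so throughput-hungry cores get more parallelism.
    while spare > 0:
        unmet = [(demands[i] - caps[i], i) for i in range(n)]
        unmet.sort(reverse=True)
        gap, i = unmet[0]
        if gap <= 0:
            break                   # all demands met; leave the rest spare
        caps[i] += 1
        spare -= 1
    return caps

# Four cores sharing 8 MSHRs; core 0 is memory-intensive.
print(partition_mlp(8, [6, 2, 1, 1]))
```

Because `sum(caps) <= total_mshrs` always holds, MSHR contention is eliminated by construction, which mirrors the abstract's claim that the OS can trade a little per-core parallelism for a predictable worst case.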
dc.format.extent81 pages
dc.language.isoen
dc.publisherUniversity of Kansas
dc.rightsCopyright held by the author.
dc.subjectComputer engineering
dc.subjectDRAM Controller
dc.subjectPredictable-Memory-Performance
dc.subjectReal-time-systems
dc.subjectShared Cache
dc.subjectShared-Memory
dc.subjectThesis
dc.titleTowards achieving predictable memory performance on multi-core based mixed criticality embedded systems
dc.typeThesis
dc.contributor.cmtememberYun, Heechul
dc.contributor.cmtememberKulkarni, Prasad
dc.contributor.cmtememberEl-Araby, Esam
dc.thesis.degreeDisciplineElectrical Engineering & Computer Science
dc.thesis.degreeLevelM.S.
dc.identifier.orcid
dc.rights.accessrightsopenAccess

