ECE/CS 757 Project Topic Information

For the 757 course project, you need to form a team of 3 (possibly 4) students to conduct a research-focused project during the second half of the semester. The projects typically involve some implementation, evaluation, and analysis, along with a thorough survey of prior work.

You are encouraged to come up with your own ideas, but a list is provided here to help you in case you are stuck.

Requirements:

· You need to work in teams of 3-4 students to complete the project. Smaller or larger teams must adjust the scope of their work to match the size of the team.

· You must submit a written project proposal of up to two pages via the learn@uw dropbox by March 17, 2017. The proposal must include the names of all team members, a summary of the proposed topic and a research plan that outlines how you will accomplish your goals. For a hardware implementation project, you must also describe your proposed testbench and validation methodology.

· You must submit a written project status report via the learn@uw dropbox on April 21, 2017. The status report must describe your project activities to date, whether or not you are on track to complete your original goals, and any changes in your project plan and goals.

· Project findings will be presented orally during class time on May 1 & 3, 2017 (the last week of class).

· Written project reports that fully document your activities and findings are due via the learn@uw dropbox by the end of the day on May 8, 2017.

· The project report must also include a statement of work that identifies the contributions of each individual on the team. This statement of work must reflect a team consensus and must be signed by all team members. I recommend that you structure this statement as a table with a row for each project milestone and a column for each team member, where each entry gives that team member's percentage contribution to that milestone.

One option to consider is to implement hardware for some critical piece of a parallel system. Note that you are largely on your own for tool support with hardware projects, so you should probably rely on pre-existing familiarity with simulation and synthesis tools from prior research or coursework (e.g. ECE 551):

· Design and evaluate a fully-functional interconnection network router with advanced features (e.g. NoX-style crossbar, virtual multicast trees, single-cycle routing, express virtual channels, flattened butterfly topology, etc.).

· Design and evaluate alternatives for on-chip interconnection, such as swizzle-switch crossbars [K. Sewell et al., U Michigan].

· Design and evaluate an advanced, high-concurrency cache coherence controller and/or directory coherence controller, including all details of MSHRs, transient states, etc.

· Design and evaluate the hardware features needed for the recently announced transactional memory support in, e.g., IBM zSeries, IBM Power, or Intel Haswell. Information on the Haswell extensions can be found at this link.

· Design and evaluate an advanced DRAM controller with features like read bypassing and different open-page/closed-page policies.
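
For the DRAM controller topic above, a quick software model can help you decide which page policies are worth building in hardware. The following is only a minimal sketch under stated assumptions: the timing constants and the function name access_latency are hypothetical, a single bank is modeled, and the closed-page policy is assumed to hide the precharge off the critical path.

```python
# Toy model: compare open-page and closed-page row-buffer policies for one DRAM bank.
# All timing values are illustrative assumptions, not taken from any datasheet.
T_RCD = 14  # activate -> column command, in cycles (assumed)
T_CAS = 14  # column command -> data, in cycles (assumed)
T_RP = 14   # precharge, in cycles (assumed)


def access_latency(rows, policy):
    """Return total latency in cycles for a sequence of row addresses to one bank."""
    open_row = None  # row currently held in the row buffer, if any
    total = 0
    for row in rows:
        if policy == "closed":
            # The row is precharged after every access (assumed hidden),
            # so each access pays activate + column access.
            total += T_RCD + T_CAS
        elif policy == "open":
            if open_row == row:
                total += T_CAS                 # row-buffer hit
            elif open_row is None:
                total += T_RCD + T_CAS         # bank idle: activate, then read
            else:
                total += T_RP + T_RCD + T_CAS  # row conflict: precharge first
            open_row = row
        else:
            raise ValueError(f"unknown policy: {policy}")
    return total


streaming = [7] * 8     # high row locality
thrashing = [0, 1] * 4  # alternating rows, no locality
for name, trace in [("streaming", streaming), ("thrashing", thrashing)]:
    print(name, access_latency(trace, "open"), access_latency(trace, "closed"))
```

In this toy model the open-page policy wins on the streaming trace and loses on the alternating trace, which is the trade-off a real controller design would need to quantify with realistic timing, multiple banks, and request scheduling.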

The project requirements in general are open-ended: you can work on any computer architecture-related research topic pertinent to the course. Your goal should be to replicate the scope and quality of a typical conference paper in computer architecture. It is not likely that you will reach this goal during the semester, but you should at least have a good start towards that objective. The topic need not be original or novel, but originality is encouraged.

I prefer that you come up with your own ideas of what you are interested in. Attached are some ideas if you get stuck.

Some Possible Research Topics

· Professor Mark Hill has volunteered to guide research-focused projects in this class. Please stop by during his office hours to discuss further.

· Propose and evaluate a novel cache replacement scheme, and participate in the cache replacement policy championship. One specific idea I have for this is to free up some space in each cache line in memory using simple compression techniques as proposed by Palframan [“COP”, ISCA 2015], and use that space to remember some useful information so that on the next fill of that block from memory the replacement policy can make a better decision.

· Implement and evaluate various enhancements to the widely-used gem5 multiprocessor simulation infrastructure. Some ideas on this front include better modeling of virtual memory/TLB, enhancing classic coherence to support MOESI, adding modern prefetchers and evaluating their impact from a coherence perspective (see Enright et al., Friendly Fire), adding SMT support to the core model, and adding a third level of coherent cache to the Ruby model.

· Implement atomic coherence [Vantrease, HPCA 2011] and atomic consistency [Gope, HPCA 2014] in gem5, and assess hardware savings in the coherence system (since there are no colliding requests, none of the MSHR and snoop queue structures need to be searchable).

· Investigate the use of update protocols (PushS) as proposed by Vantrease [PhD Thesis, 2010] to accelerate deep neural networks and deep convolutional networks.

· Parallelize, characterize, and optimize new or emerging workloads that you may be familiar with (for example, in domains like machine learning or computer vision). You can implement applications in several parallel programming environments and models and evaluate performance, ease of programming, etc. across them (e.g. TBB, Transactional Memory, pthreads, OpenMP, Cilk, CUDA, OpenCL, SSE/AVX, etc.).

· Reinvestigate prior approaches for prediction in coherence protocols (e.g. multicast snooping) with newer workloads.

· Investigate the use of update protocols for on-chip multicore cache coherence.

· Evaluate various alternatives for on-chip, multilevel coherence protocols (writethrough, writeback, coherent vs. noncoherent LLC, etc.).

· Evaluate various interconnection network topologies for on-chip networks (ring, mesh, torus, flattened butterfly, hypercube, etc.).

· Evaluate interconnection network topologies assuming 3D stacking of chips in future designs.

· Evaluate, compare, and contrast some shared on-chip cache proposals from the literature (e.g. victim replication, R-NUCA, etc.).

· Evaluate and invent alternatives to the F (forward) state for providing efficient cache-to-cache transfers for shared lines.

· Evaluate opportunities for opportunistic caching of shared blocks to minimize off-chip misses.

· Reimplement and study any of the techniques described in the assigned readings.

· Utilize or recycle aging smartphones for some novel parallel application. We have ~20 Nokia N95 phones and ~15 Google G1 phones that could be used as a substrate for parallel computation. For example, use the displays, arrayed physically, as an HD display, and implement a distributed video decoder that plays back an HD movie. Alternatively, take an open-source game like Doom and recreate it in distributed form for such a display, so that each tile is rendered by the local GPU. Another idea would be to use the phone cameras in an array organization to collaboratively provide high-def photos or video, do 3D reconstruction based on multiple perspectives, or provide high-dynamic range (HDR) photos and/or video. Similar ideas for using arrays of microphones can also be explored.

· Investigate implementation of TLS (thread-level speculation) using the IBM or Intel TM support that is appearing in upcoming processor designs.

· Study additional network router approaches in the context of the NoX crossbar from MICRO 2011 [Hayenga]. Specifically, consider how to add virtual channel support, whether or not dimension slicing makes sense, whether crossbar speedup provides additional benefits, etc.
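
For the on-chip topology comparison topic listed above (ring, mesh, torus, etc.), a first-order analytical check can be useful before you commit to detailed simulation. The sketch below is only an illustration under assumptions: the function names are hypothetical, traffic is uniform random, routing is minimal, and the ring is assumed to connect all k*k nodes in row-major order. It computes the average hop count between distinct nodes and ignores contention, link width, and router delay.

```python
from itertools import product


def ring_distance(a, b, n):
    """Minimal hops between positions a and b on an n-node bidirectional ring."""
    d = abs(a - b)
    return min(d, n - d)


def hop_distance(src, dst, k, topology):
    """Minimal hop count between two (x, y) nodes of a k x k network."""
    (x1, y1), (x2, y2) = src, dst
    if topology == "mesh":
        return abs(x1 - x2) + abs(y1 - y2)
    if topology == "torus":
        return ring_distance(x1, x2, k) + ring_distance(y1, y2, k)
    if topology == "ring":
        # Assumption: the k*k nodes sit on one bidirectional ring in row-major order.
        return ring_distance(x1 * k + y1, x2 * k + y2, k * k)
    raise ValueError(f"unknown topology: {topology}")


def average_hops(k, topology):
    """Average minimal hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    total = sum(hop_distance(s, d, k, topology)
                for s in nodes for d in nodes if s != d)
    return total / (len(nodes) * (len(nodes) - 1))


for topo in ("ring", "mesh", "torus"):
    print(f"{topo:5s} k=4: {average_hops(4, topo):.3f} average hops")
```

Hop count alone says nothing about bisection bandwidth or wiring cost, so a project on this topic would still need a network simulator and a cost model to draw real conclusions.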