ECE/CS 757 Project Topic Information
For the 757 course project, you need to form a team of 3 (possibly 4)
students to conduct a research-focused project during the second half of the
semester. The projects typically
involve some implementation, evaluation, and analysis, along with a thorough
survey of prior work.
You are encouraged to come up with your own ideas, but a list is provided
here to help you in case you are stuck.
Teams should have 3-4 students. Smaller or larger teams must adjust the
scope of their work to match the size of the team.
You must submit a written project proposal of up to two pages via the learn@uw dropbox by March 17,
2017. The proposal must include the
names of all team members, a summary of the proposed topic and a research plan
that outlines how you will accomplish your goals. For a hardware implementation
project, you must also describe your proposed testbench
and validation methodology.
You must submit a written project status report via the learn@uw dropbox on April 21,
2017. The status report must
describe your project activities to date, whether or not you are on track to
complete your original goals, and any changes in your project plan and goals.
Project findings will be presented orally during class time on May 1
and 3, 2017 (the last week of class).
Written project reports that fully document your activities and findings
are due at the end of the day via the learn@uw dropbox on May 8, 2017.
The project report must also include a statement of work that identifies
the contributions of each individual on the team. This statement of work must reflect a
team consensus and must be signed by all team members. I recommend that you structure this
statement as a table with a row for each project milestone and a column for each
team member, where each entry gives that member's percentage contribution to
that milestone.
One option to consider is to implement hardware for some critical piece
of a parallel system. Note that you
are largely on your own for tool support with hardware projects, so you should
probably rely on pre-existing familiarity with simulation and synthesis tools
from prior research or coursework (e.g. ECE 551):
Design and evaluate a fully-functional interconnection network router
with advanced features (e.g. NoX-style crossbar,
virtual multicast trees, single-cycle routing, express virtual channels,
flattened butterfly topology, etc.).
Design and evaluate alternatives for on-chip interconnection, such as
swizzle-switch crossbars [K. Sewell et al., U Michigan].
Design and evaluate an advanced, high-concurrency cache coherence
controller and/or directory coherence controller, including all details of
MSHRs, transient states, etc.
Design and evaluate the hardware features needed for the recently announced
transactional memory support in, e.g., IBM zSeries, IBM
Power, or Intel Haswell. Information on the Haswell extensions
can be found at this link.
Design and evaluate an advanced DRAM controller with features like read
bypassing and different open-page/closed-page policies.
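To make the open-page/closed-page trade-off concrete, here is a minimal Python sketch of a single DRAM bank under each policy. The timing constants, the single-bank model, and the function name total_latency are illustrative assumptions, not values or interfaces from any real device or from this course; a real controller would also model multiple banks, refresh, and request scheduling.

```python
# Toy model of DRAM row-buffer policies for ONE bank.
# All latencies are assumed, illustrative cycle counts.
T_CAS = 15  # column access to a row that is already open
T_RCD = 15  # activate (open) a row
T_RP = 15   # precharge (close) the currently open row


def total_latency(rows, policy):
    """Return total cycles to service `rows`, a list of row ids for one bank."""
    open_row = None
    cycles = 0
    for row in rows:
        if policy == "open":
            if open_row == row:        # row hit: column access only
                cycles += T_CAS
            elif open_row is None:     # bank idle: activate, then access
                cycles += T_RCD + T_CAS
            else:                      # row conflict: precharge first
                cycles += T_RP + T_RCD + T_CAS
            open_row = row
        elif policy == "closed":
            # The row is closed right after each access. The precharge is
            # assumed to be hidden in idle time, so every access pays
            # activate + column access.
            cycles += T_RCD + T_CAS
        else:
            raise ValueError(f"unknown policy: {policy}")
    return cycles


streaming = [0, 0, 0, 0, 1, 1, 1, 1]  # high row-buffer locality
ping_pong = [0, 1, 0, 1, 0, 1, 0, 1]  # alternating rows, no locality

for name, trace in (("streaming", streaming), ("ping_pong", ping_pong)):
    print(name,
          "open:", total_latency(trace, "open"),
          "closed:", total_latency(trace, "closed"))
```

Under these assumptions the open-page policy wins on the streaming trace and loses on the alternating trace, which is the kind of behavior a project evaluation would need to quantify on real workloads.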
The project requirements in general are open-ended: you can work on any computer
architecture-related research topic pertinent to the course. Your goal should be to replicate the
scope and quality of a typical conference paper in computer architecture. It is not likely that you will reach
this goal during the semester, but you should at least have a good start
towards that objective. The topic
need not be original or novel, but originality is encouraged.
I prefer that you come up with your own ideas based on what interests
you. Some ideas are listed below in case you get stuck.
Some Possible Research Topics
Professor Mark Hill has
volunteered to guide research-focused projects in this class. Please stop by
during his office hours to discuss further.
Propose and evaluate a novel
cache replacement scheme, and participate in the cache replacement policy championship. One
specific idea I have for this is to free up some space in each cache line in
memory using simple compression techniques as proposed by Palframan
[“COP”, ISCA 2015], and use that space to remember some useful
information so that on the next fill of that block from memory the replacement
policy can make a better decision.
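As a rough illustration of the idea above, the following Python sketch models one cache set that writes a single "was this block reused?" bit back to memory on eviction and consults it on the next fill. Everything here is a hypothetical simplification: the class name CacheSet, the memory_hint dictionary standing in for the space freed in the memory line, and the choice to insert predicted-dead blocks at the LRU position are assumptions for illustration, not the COP mechanism or a prescribed design.

```python
from collections import OrderedDict


class CacheSet:
    """One cache set; OrderedDict order runs from LRU (first) to MRU (last)."""

    def __init__(self, ways, use_hints):
        self.ways = ways
        self.use_hints = use_hints
        self.blocks = OrderedDict()  # addr -> reused during this residency?
        self.memory_hint = {}        # addr -> reused last residency (hypothetical metadata)
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.blocks:
            self.hits += 1
            self.blocks[addr] = True
            self.blocks.move_to_end(addr)  # promote to MRU
            return
        self.misses += 1
        if len(self.blocks) >= self.ways:
            victim, reused = self.blocks.popitem(last=False)  # evict LRU
            self.memory_hint[victim] = reused  # hint travels back with the block
        self.blocks[addr] = False              # fill at MRU by default
        # A block that saw no reuse last time is filled at the LRU position,
        # so it is the first candidate for eviction.
        if self.use_hints and self.memory_hint.get(addr) is False:
            self.blocks.move_to_end(addr, last=False)


# Two hot blocks (0 and 1) interleaved with a four-block scan, 4-way set.
trace = [0, 1, 0, 1, 100, 101, 102, 103] * 10

lru = CacheSet(ways=4, use_hints=False)
hinted = CacheSet(ways=4, use_hints=True)
for addr in trace:
    lru.access(addr)
    hinted.access(addr)

print("LRU hits:", lru.hits, "hinted hits:", hinted.hits)
```

On this synthetic trace the hinted set keeps the hot blocks resident across scans while plain LRU does not. A real study would use the championship's simulation framework and traces rather than a toy like this.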
Implement and evaluate various
enhancements to the widely-used gem5 multiprocessor simulation
infrastructure. Some ideas on this
front include better modeling of virtual memory/TLB, enhancing classic
coherence to support MOESI, adding modern prefetchers
and evaluating their impact from a coherence perspective (see Enright et al.,
Friendly Fire), adding SMT support to the core model, and adding a third level of coherent
cache to the Ruby model.
Implement atomic coherence [Vantrease, HPCA 2011] and atomic consistency [Gope, HPCA 2014] in gem5, and assess hardware savings in
the coherence system (since there are no colliding requests, none of the MSHR
and snoop queue structures need to be searchable).
Investigate the use of update
protocols (PushS) as proposed by Vantrease
[PhD thesis, 2010] to accelerate deep neural networks
and deep convolutional networks.
Parallelize, characterize, and
optimize new or emerging workloads that you may be familiar with (for example,
in domains like machine learning or computer vision). You can implement applications in
several parallel programming environments and models and evaluate performance,
ease of programming, etc. across them (e.g. TBB, transactional memory, pthreads, OpenMP, Cilk, CUDA, OpenCL, SSE/AVX).
Reinvestigate prior approaches
for prediction in coherence protocols (e.g. multicast snooping) with newer
Investigate the use of update
protocols for on-chip multicore cache coherence.
Evaluate various alternatives for
on-chip, multilevel coherence protocols (writethrough,
writeback, coherent vs. noncoherent
Evaluate various interconnection
network topologies for on-chip networks (ring, mesh, torus, flattened
butterfly, hypercube, etc.).
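A first-order comparison of such topologies can start from average hop count. The Python sketch below builds 16-node ring, 4x4 mesh, 4x4 torus, and 4-dimensional hypercube graphs and measures the mean shortest-path distance with breadth-first search. The helper names and the 16-node size are assumptions chosen for illustration; hop count ignores router delay, link width, and contention, all of which a real evaluation would need to model.

```python
from collections import deque
from itertools import product


def ring(n):
    """n-node bidirectional ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def mesh(k, wrap=False):
    """k x k 2D mesh; wrap=True adds wraparound links to form a torus."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        nbrs = []
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if wrap:
                nx, ny = nx % k, ny % k
            if 0 <= nx < k and 0 <= ny < k:
                nbrs.append((nx, ny))
        adj[(x, y)] = nbrs
    return adj


def hypercube(dim):
    """2**dim nodes; neighbors differ in exactly one address bit."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}


def average_hops(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


topologies = {
    "ring": ring(16),
    "mesh": mesh(4),
    "torus": mesh(4, wrap=True),
    "hypercube": hypercube(4),
}
for name, adj in topologies.items():
    print(f"{name:10s} average hops = {average_hops(adj):.3f}")
```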
Evaluate interconnection network
topologies assuming 3D stacking of chips in future designs.
Evaluate, compare, and contrast
some shared on-chip cache proposals from the literature (e.g. victim replication,
Evaluate and invent alternatives
to the F (forward) state for providing efficient cache-to-cache transfers for
Evaluate opportunities for
opportunistic caching of shared blocks to minimize off-chip misses.
Reimplement and study any of the techniques described in the assigned readings.
Utilize or recycle aging
smartphones for some novel parallel application. We have ~20 Nokia N95 phones and ~15
Google G1 phones that could be used as a substrate for parallel computation.
For example, use the displays, arrayed physically, as an HD display, and
implement a distributed video decoder that plays back an HD movie. Alternatively, take an open-source game
like Doom and recreate it in distributed form for such a display, so that each
tile is rendered by the local GPU.
Another idea would be to use the phone cameras in an array organization
to collaboratively provide high-definition photos or video,
do 3D reconstruction based on multiple perspectives, or provide high-dynamic
range (HDR) photos and/or video.
Similar ideas for using arrays of microphones can also be explored.
Investigate implementation of TLS
(thread-level speculation) using the IBM or Intel TM support
that is appearing in upcoming processor designs.
Study additional network router
approaches in the context of the NoX crossbar from MICRO 2011 [Hayenga].
Specifically, consider how to add virtual channel support, whether or not
dimension slicing makes sense, whether crossbar speedup provides additional