I architect systems that make analyzing large data sets easier and more efficient. My current focus is simplifying the design of hardware accelerators for specific tasks, including data-parallel pipelines and image processing. In the past, I've worked on decreasing the cost of using CNNs for computer vision tasks and on using distributed computing to quality-control and model large financial data sets. I'm supported by an NSF Graduate Research Fellowship and a Stanford Graduate Fellowship in Science and Engineering.
Designing efficient, application-specialized
hardware accelerators requires assessing trade-offs
between a hardware module's performance and resource
requirements. To facilitate hardware design space
exploration, we describe Aetherling, a system for
automatically compiling data-parallel programs into
statically scheduled, streaming hardware circuits.
Aetherling contributes a space- and time-aware
intermediate language featuring data-parallel
operators that represent parallel or sequential
hardware modules, and sequence data types that
encode a module's throughput by specifying when
sequence elements are produced or consumed. As a
result, well-typed operator composition in the
space-time language corresponds to connecting
hardware modules via statically scheduled, streaming
interfaces.
We provide rules for transforming programs written
in a standard data-parallel language (that carries
no information about hardware implementation)
into equivalent space-time language programs. We
then provide a scheduling algorithm that searches
over the space of transformations to quickly
generate area-efficient hardware designs that
achieve a programmer-specified throughput. Using
benchmarks from the image processing domain, we
demonstrate that Aetherling enables rapid exploration of
hardware designs with different throughput and area
characteristics, and yields results that require
1.8-7.9x fewer FPGA slices than those of
prior hardware generation systems.
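
As a rough illustration of how sequence types can encode throughput, here is a minimal Python sketch. It is not Aetherling's implementation: the names SSeq, TSeq, Module, throughput, and compose, and the exact shape of the types, are assumptions made for this example. A space sequence delivers all of its elements in one clock cycle, a time sequence delivers one element per clock and may be followed by idle clocks, and two modules may be connected only when the producer's output type equals the consumer's input type.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union


@dataclass(frozen=True)
class SSeq:
    """Space sequence (assumed shape): n elements in parallel on one clock."""
    n: int


@dataclass(frozen=True)
class TSeq:
    """Time sequence (assumed shape): n elements, one per clock, then i idle clocks."""
    n: int
    i: int = 0


SeqType = Union[SSeq, TSeq]


def throughput(t: SeqType) -> Fraction:
    """Elements produced or consumed per clock cycle, read off the type."""
    if isinstance(t, SSeq):
        return Fraction(t.n, 1)
    return Fraction(t.n, t.n + t.i)


@dataclass(frozen=True)
class Module:
    """A hypothetical hardware module described only by its interface types."""
    name: str
    inp: SeqType
    out: SeqType


def compose(first: Module, second: Module) -> Module:
    """Connect two modules; reject the connection if their schedules differ."""
    if first.out != second.inp:
        raise TypeError(
            f"cannot connect {first.name} ({first.out}) to {second.name} ({second.inp})"
        )
    return Module(f"{first.name} >>> {second.name}", first.inp, second.out)


# Two stages that each handle 4 elements over 8 clocks (half an element per clock).
blur = Module("blur", TSeq(4, 4), TSeq(4, 4))
sharpen = Module("sharpen", TSeq(4, 4), TSeq(4, 4))
pipeline = compose(blur, sharpen)
```

Because the schedule lives in the type, a mismatch such as feeding an SSeq(4) output into a TSeq(4) input is caught when the modules are composed rather than when the circuit is simulated.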
Stanford AHA Monthly Meeting - October 2020
Reconfigurable accelerators promise an exciting
set of benefits compared to other processors in
the cloud and on mobile devices. They can enable
application implementations that are more
parallel, more energy efficient, and have lower
latency. However, it can be challenging to
predict the real-world situations where
reconfigurability delivers these benefits. In
this talk, I will examine five benchmarks that
represent workloads interesting to Adobe. These
benchmarks show that there is a precise niche of
applications that benefit from
reconfigurability: applications that can be
implemented in a manner that takes advantage of
custom cache hierarchies and specialized
functional units. For other applications, there
are other ways to improve performance, including
programming languages and compilers that
efficiently use existing, non-reconfigurable
accelerators with greater peak compute
performance and memory bandwidth.
PLDI 2020 - June 2020
The conference talk for the Aetherling paper.
In this talk, I focus on Aetherling's data-parallel
IR with space- and time-types, a higher-level input
language whose types are unaware of space and time,
and a simple set of rewrite rules for converting
from the higher-level language to the space-time IR.
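
To give a flavor of this kind of rewrite, the sketch below lowers a sequence of n elements that carries no space or time information into an (outer, inner) split: outer clock cycles, each processing inner elements in parallel. The function names lowerings and schedule are hypothetical, and using the least parallel split as a stand-in for the smallest-area design is an assumption of this example, not the scheduling algorithm from the paper.

```python
from typing import List, Tuple


def lowerings(n: int) -> List[Tuple[int, int]]:
    """All (outer, inner) splits with outer * inner == n, least parallel first."""
    return [(n // inner, inner) for inner in range(1, n + 1) if n % inner == 0]


def schedule(n: int, target: int) -> Tuple[int, int]:
    """Pick the least parallel split that processes at least `target` elements per clock."""
    for outer, inner in lowerings(n):
        if inner >= target:
            return outer, inner
    raise ValueError(f"no split of {n} elements reaches {target} elements per clock")


# A fully sequential design, a partially parallel one, and a fully parallel one.
sequential = schedule(8, 1)
partial = schedule(8, 3)
parallel = schedule(8, 8)
```

Each split computes the same result; only the trade-off between clock cycles and parallel hardware changes, which is what makes the rewrite safe to search over.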
Spark Summit East 2016 - February 2016
TopNotch is a framework for quality controlling big data through data quality metrics that scale up to large data sets, across schemas, and throughout large teams. TopNotch's SQL-based interface enables users across the technical spectrum to quality control data sets in their areas of expertise and understand data sets from other areas. I was the project lead and main developer for TopNotch while I worked at BlackRock.
Spark-NYC Meetup - September 2015
This presentation addresses the gap between the current and desired big data user experiences. I demonstrate a web application with a scatterplot matrix visualization that lets non-technical users analyze large data sets with Spark.