Our lab's research interests lie broadly in operating systems, distributed systems, cloud computing, and mobile systems. Our main focus is researching the reliability and fault-tolerance of computing systems.
Order® := {Operating, Reliable, Defensible, Efficient, Responsive}
Our research is driven by in-depth observations of important reliability problems in real-world systems.
Our projects often intersect different areas including OS, program analysis, software engineering, and ML.
We collaborate closely with leading industry companies and strive to evaluate our work in real settings.
Large systems frequently suffer from partial failures. We develop a program reduction approach that synthesizes custom watchdogs, runtime monitors that mimic the main program to detect and localize partial failures. Best Paper Award at NSDI '20.
Gray failures in the cloud are challenging to detect. We observe that the impact of a gray fault is often seen by the "requesters" in a system. Based on this insight, we design a tool that can systematically capture observability for reporting gray failures.
Many apps use the constrained mobile resources wastefully. We adapt the lease mechanism from distributed systems into a mobile OS resource management abstraction to mitigate app energy bugs and use utility to make informed lease decisions.
Check out Phair and Patternful AI.
We appreciate our sponsors for their funding and support, which made our research possible.