Dec 2023
Legolas is accepted to appear at NSDI '24.
Partial failures are a notorious problem in distributed systems. Such failures are often triggered only by subtle faults at rare timings, which makes the underlying bugs difficult to expose during testing. Legolas is a fault injection testing framework that uses instrumentation to simulate fine-grained faults in a system. It automatically infers abstract states from the system and leverages those states to explore the fault injection space efficiently. This work is another step in our research agenda on dealing with complex failures in distributed systems.
Aug 2023
Welcome new PhD students Wanning, Yi, and Yuxuan!
July 2023
pBox is accepted to appear at SOSP '23.
Modern applications are highly concurrent with a diverse mix of activities. One noisy activity can negatively impact other activities and cause performance interference inside an application. pBox allows developers to systematically achieve strong performance isolation within an application. This work is the final piece in Yigong's PhD thesis research.
July 2023
Welcome Ruiming Lu, a PhD student from SJTU who will be visiting our lab.
June 2023
Welcome 10 summer interns to our lab: Yunchi, Xiaoyang, Zeyin, Zhewen, Shuangyu, Angting, Yujin, Yicheng, Yuqi, and Dimas! This will be a fun summer.
June 2023
Yigong passed his PhD defense, making him PhD #2 from the lab. He will join the University of Washington as a postdoc. Congrats, Dr. Hu! We look forward to the next chapter of your research!
Yigong's PhD dissertation is titled "Reasoning About and Mitigating Performance Issues in Large-Scale Systems". It covers the three major pieces of work Yigong did during his PhD to address performance issues in modern applications. It develops a symbolic execution method that systematically reasons about the performance effects of configuration parameters and detects misconfiguration-induced performance issues offline (Violet [OSDI '20]). It designs an operating system-level abstraction, along with a runtime library, that allows developers to achieve fine-grained performance isolation within their applications (pBox [SOSP '23]). It adapts the lease mechanism from distributed systems and repurposes leases to mitigate energy misbehavior in mobile systems (LeaseOS [ASPLOS '19]).
May 2023
Chang passed his PhD defense, making him PhD #1 from the lab. He will join the University of Virginia as an Assistant Professor. Congrats, Dr. Lou! Best wishes with your new faculty role and research group!
Chang's PhD dissertation is titled "Enhancing Cloud System Runtime to Address Complex Failures". It addresses three classes of complex failures: partial failures (OmegaGen [NSDI '20]), silent semantic violations (OathKeeper [OSDI '22]), and slow failures in the form of memory leaks (RESIN [OSDI '22]). It combines program analysis, instrumentation, dynamic tracing, and statistical analysis techniques to design novel, principled solutions that enhance the runtime of cloud systems with watchdog checkers, semantic rules, and monitors and tracers. These runtime components enable a cloud system to detect complex failures quickly and reliably.
Jan 2023
vProf is accepted to appear at EuroSys '23.
Traditional profilers are ineffective at debugging subtle performance issues when the most costly operations are not the root cause. vProf introduces a new profiling methodology that continuously captures program variable values in addition to costs, enabling more accurate performance reasoning. This is a collaboration with Lingmei, Jason, and Junfeng, and a follow-up to our previous collaboration, Argus [ATC '21].
Sep 2022
Ryan gave a talk at Strange Loop on generating runtime checkers for distributed systems.
The talk summarizes the progress and results from several years' research in the lab on this topic, including Panorama [OSDI '18], OmegaGen [NSDI '20], and OathKeeper [OSDI '22].