Flowmill helps SREs accelerate production incident resolution.
By monitoring every service dependency pair, Flowmill answers questions such as “which of these 30 services is the likely cause of this incident?” in seconds, making it possible to direct escalations to fewer, more relevant engineers. This expedites triage, focuses mitigation efforts, and dramatically reduces war room staffing and engineer burnout. Flowmill monitoring has negligible overhead, uses no sampling, requires no per-service configuration or code changes, and can be deployed with less than 20 minutes of configuration management work.
I received my Ph.D. in MIT CSAIL’s Networks and Mobile Systems group, advised by Hari Balakrishnan and Devavrat Shah, with the thesis “Centralized performance control for datacenter networks”, during which we collaborated with Microsoft Research (2011) and Facebook (2013-2017). Before that, I spent 7 years in communication systems R&D and HPC algorithm development as an officer in an army technological unit.
The Ph.D. research revolved around enabling fast detection of, and reaction to, undesirable incidents in datacenter and cloud networks by designing extremely fine-grained, low-overhead, low-latency monitoring, processing, and control of service interactions. The systems produced mostly controlled network transfers, as this use case poses extreme challenges for the technology. Fastpass aims for high utilization with zero queueing: a logically centralized arbiter controls and orchestrates all network transfers. Flowtune assigns shares of network throughput to pairs of applications according to organizational policy, maximizing the organization’s utility.
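To make the centralized-arbiter idea concrete, here is a minimal sketch in the spirit of Fastpass, not its actual implementation. It assumes that each endpoint can send at most one packet and receive at most one packet per timeslot, so that packets never have to queue in the network. The function names and the greedy, oldest-first policy are illustrative assumptions.

```python
from collections import deque


def allocate_timeslot(pending):
    """Admit requests for one timeslot (hypothetical helper).

    `pending` is a deque of (src, dst) packet requests, oldest first.
    At most one packet per source and one per destination is admitted;
    everything else stays in `pending` for a later timeslot.
    """
    busy_src, busy_dst = set(), set()
    admitted, deferred = [], deque()
    while pending:
        src, dst = pending.popleft()
        if src in busy_src or dst in busy_dst:
            deferred.append((src, dst))  # endpoint already used this slot
        else:
            busy_src.add(src)
            busy_dst.add(dst)
            admitted.append((src, dst))
    pending.extend(deferred)  # keep the original order for the next slot
    return admitted


def schedule(requests):
    """Assign every request to a timeslot, one allocation pass per slot."""
    pending = deque(requests)
    slots = []
    while pending:
        slots.append(allocate_timeslot(pending))
    return slots


if __name__ == "__main__":
    demo = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")]
    for i, slot in enumerate(schedule(demo)):
        print(f"timeslot {i}: {slot}")
```

Because the arbiter sees all demands, the contention for destination C is resolved before any packet is sent, rather than in a switch queue.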
Other research deals with rateless error-correcting codes for wireless networks: Spinal Codes (with source code) are efficient, high-performance error-correcting codes, especially suited for analog channels.
- J. Perry, H. Balakrishnan, D. Shah, Flowtune: Flowlet Control for Datacenter Networks, NSDI 2017.
- J. Perry, A. Ousterhout, H. Balakrishnan, D. Shah, H. Fugal, Fastpass: A Centralized “Zero-Queue” Datacenter Network, SIGCOMM 2014.
- J. Perry, P. Iannucci, K. Fleming, H. Balakrishnan, D. Shah, Spinal Codes, SIGCOMM 2012.
- J. Perry, H. Balakrishnan, D. Shah, Rateless Spinal Codes, HotNets 2011.
- P. Iannucci, J. Perry, H. Balakrishnan, D. Shah, No Symbol Left Behind: A Link-Layer Protocol for Rateless Codes, MobiCom 2012.
- D. Shah, J. Perry, P. Iannucci, H. Balakrishnan, De-randomizing Shannon: The Design and Analysis of a Capacity-Achieving Rateless Code, Manuscript in preparation/submission.
- P. Iannucci, K. Fleming, J. Perry, H. Balakrishnan, D. Shah, A Hardware Spinal Decoder, ANCS 2012.