Jonathan Perry


Starting something new!

OpenTelemetry Maintainer (opentelemetry-ebpf)

Founder and CEO of Flowmill: software-based (eBPF) network monitoring, acquired by Splunk.

PhD in Networks and Distributed Systems, MIT

Flowmill helps SREs accelerate production incident resolution.

By monitoring every service dependency pair, Flowmill answers questions such as “which of these 30 services is likely the cause of this incident?” in seconds, making it possible to direct escalations to fewer, more relevant engineers. This expedites triage, focuses mitigation efforts, and dramatically reduces war room staffing and engineer burnout. Flowmill monitoring has negligible overhead, uses no sampling, requires no per-service configuration or code changes, and can be deployed in less than 20 minutes of configuration management.

I received my Ph.D. at MIT CSAIL’s Networks and Mobile Systems group, advised by Hari Balakrishnan and Devavrat Shah, with the thesis “Centralized performance control for datacenter networks”, during which we collaborated with Microsoft Research (2011) and Facebook (2013-2017). I had previously spent 7 years in communication systems R&D and HPC algorithm development as an officer in an army technological unit.

The PhD research revolved around enabling fast detection of and reaction to undesirable incidents in datacenter and cloud networks, by designing extremely fine-grained, low-overhead, low-latency monitoring, processing, and control of service interactions. The systems produced mostly controlled network transfers, as this use case poses extreme challenges for the technology. Fastpass aims for high utilization with zero queueing: a logically centralized arbiter controls and orchestrates all network transfers. Flowtune assigns shares of network throughput to pairs of applications according to organizational policy, maximizing the organization’s utility.

Other research deals with rateless error-correcting codes for wireless networks: Spinal Codes (with source code) are efficient, high-performance error correction codes, especially suited for analog channels.

Selected Publications

More publications


Spring 2014: 6.824 Distributed Systems
Spring 2013: 6.829 Computer Networks