Using Apple’s RPM tool

macOS Monterey ships with a tool that measures the responsiveness of your network connection. It saturates the network with traffic for 20 seconds, then measures the rate of short transactions to compute “Responses Per Minute.” Big numbers (above 2000) mean your network remains responsive when the network is heavily loaded. Small numbers (under 800 or so) mean your network isn’t responsive – potentially caused by bufferbloat.

There’s an iOS version described at https://support.apple.com/en-gb/HT212313

Stuart Cheshire and Vidhi Goel talked about the RPM tool at WWDC 2021. Apple also published an Internet-Draft that describes the RPM technique.

Here’s a sample run from my Mac. The RPM tool displays my download and upload speeds (nominally 25 Mbps), and the number of simultaneous flows required to saturate the link (12, in this case). It shows the responsiveness as 1995 round-trips per minute. That’s really good: the average latency – even during heavy load – only increases a bit above the baseline (idle) 21 msec.

% /usr/bin/networkQuality -v
==== SUMMARY ====
Upload capacity: 22.657 Mbps
Download capacity: 23.755 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (1995 RPM)
Base RTT: 21
Start: 11/7/21, 7:18:37 AM
End: 11/7/21, 7:18:47 AM
OS Version: Version 12.1 (Build 21C5021h)
%
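
One way to read the RPM figure is to convert it back into time per round trip. The sketch below assumes RPM is simply 60,000 msec divided by the average round-trip time under load. The Internet-Draft's actual calculation combines several kinds of probes, so treat this as a rough reading aid, not the tool's method. The function names are mine and are not part of networkQuality.

```python
MSEC_PER_MINUTE = 60_000

def rpm_to_msec(rpm):
    """Approximate time per round trip (msec) for a given RPM value."""
    return MSEC_PER_MINUTE / rpm

def msec_to_rpm(msec):
    """Approximate RPM for a given round-trip time in msec."""
    return MSEC_PER_MINUTE / msec

# Values taken from the sample run above.
loaded_msec = rpm_to_msec(1995)   # time per round trip while the link is saturated
idle_rpm = msec_to_rpm(21)        # what the idle "Base RTT" would look like as RPM

print(f"1995 RPM is about {loaded_msec:.1f} msec per round trip")
print(f"21 msec idle RTT is about {idle_rpm:.0f} RPM")
```

Under that assumption, the loaded round trip works out to roughly 30 msec, which fits the "only increases a bit above 21 msec" reading.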

Here’s a video that shows the tool in operation: https://youtu.be/e9DUTB9okMA

NameD•Tective — mDNS over AppleTalk

[From the Archives of Amusing Technology…] Back in the ’90s, Dave Fisher and I created NameD•tective, a Macintosh control panel that gave any Mac on the Dartmouth network a static DNS name.

It used Name Binding Protocol to let someone create a DNS name based on their name plus their AppleTalk zone. The DNS name had the form: person-name.AppleTalk-zone…dartmouth.edu. The NameD•tective server looked up the NBP name, and returned the computer’s current IP address.

This screen shot shows an example: cd-changer.kiewit.atzone.dartmouth.edu was a server in my office that distributed information to other developers in the Kiewit building. NameD•tective probably didn’t get broad use at Dartmouth, but it was a neat demonstration project. It led Dave and me to develop the MacDNS software that was shipped as part of Apple’s Internet Connection Kit. Here’s the page from Dartmouth’s website archived by the Wayback Machine: https://web.archive.org/web/19961220043011/http://www.dartmouth.edu/pages/softdev/named.html

Today, computer naming is much simpler. Modern operating systems let a computer specify an mDNS (multicast DNS) name that can be directly looked up to find a host that provides the service.

Comcast modems decrease bufferbloat

Last month, Comcast released a paper, Improving Latency with Active Queue Management (AQM) During COVID-19, which shows that their PIE AQM dramatically decreases lag/latency (by a factor of 10X, from 250 msec down to 15-30 msec). From the paper (page 13):

 

… As explained earlier, for two variants of XB6 cable modem gateway, upstream DOCSIS-PIE AQM was enabled on the CGM4140COM (experiment) variant but was not available on the TG3482G (control) variant during the measurement period

At a high level, when a device had AQM it consistently experienced between 15-30 milliseconds of latency under load. … [The] non-AQM devices experienced in many cases 250 milliseconds or higher latency under load.

See if you can get an XB6 / CGM4140COM cable modem.

Toward a Consumer Responsiveness Metric

At a recent videoconference, I advocated strongly for a consumer-facing measurement of latency/responsiveness. I had not planned to speak, so I gave off-the-cuff comments. This is an organized explanation of my position. I offer these thoughts for consideration at the IAB Workshop “Measuring Network Quality for End-Users, 2021” – Rich Brown

I hunger for a day when vendors (router manufacturers and service providers) compete on the basis of “responsiveness” in the same way that they compete on speed – “Up to X megabits per second, and Y responsiveness!”

I have been working on the “Bufferbloat Project” [1] since 2011, trying to find layman’s terms for what was happening, and what to do about it. [2] [3] The delay goes by the name “lag”, “latency under load”, or “bufferbloat”. At first, the effects seemed mysterious and non-intuitive. Even to knowledgeable individuals, the magnitude of the delay caused by queueing was astonishing. No matter what name you use, it makes people say, “the internet is slow today”.

My router at home has solved this problem. I enjoy the fruits of the intense research from the mid-2010s that led to well-understood solutions such as fq_codel, cake, PIE, and airtime fairness. Even using 7 Mbps DSL, my network was quite usable, and very responsive.

My frustration in 2021 is that this remains a problem for nearly everyone else. The market has not provided solutions. Every day, people purchase brand name equipment that happily queues hundreds of msec of traffic.

I postulate that vendors have not considered responsiveness to be an important characteristic of their offerings. Consequently, they have not prioritized the engineering resources to incorporate the well-tested solutions listed above.

My hope, from this note, and from our on-going efforts, is that we can come up with a test tool that consumers can use to raise awareness of the problem of bad responsiveness.

Characteristics of a Responsiveness Tool

I seek a “responsiveness tool” with these characteristics:

  1. Easy to use. People need an easy way to measure responsiveness so they can give feedback to their vendors.
  2. A single number, so it’s easy to report and compare.
  3. Bigger must be better. High latency means bad responsiveness. People have no intuitive feel for a millisecond: “Is 100 msec bad? Isn’t that really short…?”
  4. An approximate measure is OK. Consumers won’t mind separate runs varying 20% or 30%, especially since poor responsiveness could be an order of magnitude different from good.
  5. Resistant to cheating. Vendors sometimes optimize pings to make latency look lower. But real people’s traffic doesn’t use pings. The responsiveness test must use protocols that match actual traffic patterns.
  6. Vendor and technology independent. People should use and get similar results from their phone, their desktop, on the web, or using an app.
  7. “Good enough”. A widely implemented and promoted metric that substantially matches people’s real experience is vastly superior to a host of competing metrics that muddy the waters in consumers’ minds.

A Proposed Metric – RPM

Apple has produced an Internet Draft “Responsiveness under Working Conditions” [4] and implementation. It defines a procedure for continually making short HTTPS transactions on a path to a server that has been fully loaded in both directions. The number of transactions in a fixed time is expressed as the number of “round-trips per minute”, which is given the name “RPM”, a wink to the “revolutions per minute” that we use for cars.

The RPM measurement satisfies all my concerns.

Non-requirements

It is not a requirement for the responsiveness test to provide:

  • Strict reproducibility. The wider internet has widely varying conditions, with bottlenecks moving around by time of day or adjacent traffic. It is not reasonable/feasible to expect that any measure used by consumers will be exactly reproducible.

  • Detailed statistics or distributions of measurements. This is not a diagnostic tool. A nuanced data set with medians and percentiles may excite techies, but for others, it’s hard to understand the implications.

  • Performance of any particular protocol. The responsiveness tool must measure a broad variety of typical traffic.

  • Data to be used as input for vendors to design solutions. The responsiveness measure needs to be used the same way we say to our mechanic, “The car makes a funny noise when I …”. I expect the specialist to work to reproduce the symptom, using the provided equipment, and come up with an appropriate solution.

Summary

The research of the last decade has developed a wide variety of solutions. There are plenty of corner-cases where these solutions aren’t perfect. I encourage vendors and researchers to study the field and advance our knowledge further. I would be delighted if they found practices even better than the current state of the art.

But “the rest of the internet” (including my neighbors and family members, for whom I’m the support person) would all benefit from a world where off-the-shelf equipment already incorporated well-known, best practice solutions.

References

[1] Bufferbloat Project https://bufferbloat.net

[2] Bufferbloat and the Ski Shop https://randomneuronsfiring.com/bufferbloat-and-the-ski-shop/

[3] Best Bufferbloat Analogy – Ever https://randomneuronsfiring.com/best-bufferbloat-analogy-ever/

[4] Responsiveness under Working Conditions – Internet-Draft at: https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/ Full disclosure: I am one of the editors of the “Responsiveness Under Working Conditions” I-D.

Best Bufferbloat Analogy – Ever

My friends frequently ask, “Why is my network so slow?” And often, the answer is “latency” or the screwy term, “Bufferbloat” – the “undesirable latency caused when a router buffers too much data.” But what the heck does that mean?

A while back, I attempted a layman’s explanation of Bufferbloat. I compared it to a ski shop. It was pretty unsuccessful: it just didn’t have any intuitive appeal.

That’s why I was delighted that Waveform.com published what I believe is the Best Bufferbloat Analogy – Ever. (I am pleased to have contributed to the final version of their description.) That page also has a well-designed web-based Bufferbloat Tester (on a par with the Ookla Speedtest or the Cloudflare speed test).

They asked, Can you explain bufferbloat like I’m five? and noted that flows of liquids were sort of like flows of packets. The analogy was when a friend dumps a bucket of water into a sink with a narrow drain, it slows other flows (like a teaspoon of oil) from emptying out. Read the whole description…

This made me think about having a SmartSink™ to give a visual image for understanding how a well-designed router can decrease latency.

What’s a SmartSink™?

Instead of accepting a full bucket of water all at once, a SmartSink controls the bucket of water with a valve. It allows just enough water into the sink to keep the drain full. If the water gets too low, the SmartSink opens the valve; if it gets “too full”, it closes it a bit.

A SmartSink also works when lots of friends have their own buckets, pouring in colored water – pink, blue, etc. The valves on the SmartSink control each color. If the SmartSink notices too much pink water, it closes that valve a bit to bring back balance, so that each color gets its “fair share” of the drain’s capacity. And because there’s never too much water (of any color) in the sink, a small new flow always drains quickly.

Reality check: This is just an analogy. I realize that a SmartSink is a ridiculous idea. But it helps me visualize how small flows can drain quickly while big flows share the drain capacity fairly.

What does this have to do with routers?

The Smart Queue Management (SQM) algorithm in a router works like the SmartSink. When a device starts sending a lot of data (maybe a phone starts uploading photos to the cloud), SQM controls the amount of data queued for each flow (each separate upload, videoconference, voice call, gaming session, YouTube, BitTorrent, etc.) to prevent any one flow from using more than its share. Instead of operating valves to control the flow of water, SQM controls the size of each flow’s queue by:

  1. Separating every traffic flow’s arriving packets into their own queue.
  2. Removing a small batch of packets from a queue, round-robin style,
    and transmitting that batch through the (slow) bottleneck link to the ISP.
    When each batch has been fully sent, retrieving a batch from the next queue, and so on.
  3. Offering back pressure to flows that are sending “more than their share” of data.

This process provides these desirable effects:

  • Most importantly, SQM provides low latency. Small flows (with just one or a few small packets) get sent right away in their next “round robin” batch.
  • Equal sharing of the bottleneck: If there are multiple senders, each can send an equal amount of data with each round-robin opportunity.
  • No waste of the bottleneck: If there’s only one sender (one queue with data), that one gets the full capacity of the link.
  • Offering backpressure to bulk senders minimizes lost packets and re-transmissions, making the network globally more efficient.
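
The per-flow queues and round-robin batches described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not fq_codel or cake: it assumes every packet is the same size, that all packets are already queued when transmission starts, and it does not model the back pressure of step 3. The function name, flow names, and batch size are hypothetical.

```python
from collections import OrderedDict, deque

def round_robin_schedule(arrivals, batch_size=2):
    """Return the order in which packets leave the bottleneck link.

    arrivals: list of (flow_id, packet_label) tuples in arrival order.
    """
    # Step 1: separate each flow's packets into its own queue.
    queues = OrderedDict()
    for flow_id, packet in arrivals:
        queues.setdefault(flow_id, deque()).append(packet)

    # Step 2: take a small batch from each queue in turn until all are empty.
    sent = []
    while queues:
        for flow_id in list(queues):
            queue = queues[flow_id]
            for _ in range(min(batch_size, len(queue))):
                sent.append((flow_id, queue.popleft()))
            if not queue:
                del queues[flow_id]
    return sent

# A bulk upload queues six packets, then a single voice packet arrives.
arrivals = [("upload", f"u{i}") for i in range(6)] + [("voice", "v0")]
order = round_robin_schedule(arrivals, batch_size=2)

fifo_position = arrivals.index(("voice", "v0"))  # one shared queue: waits behind all six
sqm_position = order.index(("voice", "v0"))      # per-flow queues: waits for one batch
print(f"voice packet sent in slot {sqm_position} instead of slot {fifo_position}")
```

With a single shared queue the voice packet waits behind the entire upload. With per-flow queues it goes out right after the first upload batch, which is the low-latency effect in the first bullet.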

Does SQM work?

YES! Can I get a router with SQM today? YES!

Got questions? Send them to me and I’ll include them in Part 2 (coming soon) of this blog. Thanks.

Astonishing Lidar View of NH

The NH Stone Wall Mapper project uses Lidar data to display small variations in ground elevation. A UNH project built this map to identify stone walls in the state.

This site can be “misused” (in a good way) to show lots of other topographic features. Here’s a “Lidar view” of the grounds of Loch Lyme Lodge, near Post Pond. The features are shaded as if the sun were shining from the northeast. (Update: 31 Dec: Thanks to the good folks at the NH Geological Survey, the link now goes directly to the desired view!)

But wait… there’s more! You can turn on and off various “layers” to see other kinds of information. To do this:

  1. At the top-left, click the Layers Icon to display various layers
  2. Check on or off the Hillshade box to “show or hide the trees”…
  3. Click the More… icon to enable other features, such as the “Swipe Layers” that lets you compare two layers…

So much fun – play around! Turn on/off layers, scroll to other parts of NH. If you find something interesting, send me a note and I’ll post it. Enjoy!

WireGuard Vanity Keys

A WireGuard VPN provides a fast, secure tunnel between endpoints. It uses public/private key pairs to encrypt the data.

If you have several clients, you have to enter their public keys into your server. Keeping track of those keys gets to be a hassle, since ordinarily, the keys are essentially random numbers.

I found a great project to help with this problem: WireGuard Vanity Address. It continually generates WireGuard private/public key pairs, printing keys that contain a desired string in the first 10 characters. For example, I generated this public key for my MacBook Pro (MBP): MBP/DzPRZ05vNZ0XS3P9tlokZPrLy/1lb1Zsm3du4QA= Note the MBP/ at the start – it makes it easy to know that this is my Mac’s key.

To do it, I ran the wireguard-vanity-address program. Here is sample output:


$ ./wireguard-vanity-address MBP/
searching for 'mbp/' in pubkey[0..10], one of every 299593 keys should match
one trial takes 28.7 us, CPU cores available: 2
est yield: 4.3 seconds per key, 232.30e-3 keys/s
hit Ctrl-C to stop
private qMKPNrCMId59XTn5vgDICUh/QzIfhqZdrZ+XQBIJj2w= public zmbP/YEpC8Zl6MacYhcY1lq126tL2UudFjmrwbl2/18=
private HHtPY8IwGBxQ5OTtJY6GcuFpImXtDp9d187zvI0axFo= public qhIiSMbp/extT5irPy4EJfLRPR9jTzQZHlM15Fo/P2E=
private BEnEu1lVdcRI997nj2uPNGsyCZNPhBTCNfgJuYPPJHA= public hZzmBP/8EthWPOFp5wroEGPeJTHGxZ5KENnMiZvniGY=
private 8HRj+YZfSBnYZn38MPE09W2g03JvRJoGbjlDkHQ0Wnk= public mBP/q2dOd+m457PyKTIvI7MDTuXLCneG6MM0ir9rwRc=
...
private dFE8xsDDWNNNY1OjOIlxQiNVbp7Z6tZhXsaOo/5gPH0= public MBP/DzPRZ05vNZ0XS3P9tlokZPrLy/1lb1Zsm3du4QA=
^C
# This last line contains a public key starting with "MBP/"
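
The estimates printed at the top of that output can be reproduced with some back-of-the-envelope arithmetic. The sketch below is my own reconstruction, not code from wireguard-vanity-address: it assumes each base64 character is one of 64 equally likely symbols, that letters match in either case (the tool reports searching for 'mbp/'), and that the pattern may start at any offset that keeps it inside the first 10 characters. It also ignores the small chance of a key matching at more than one offset.

```python
def expected_trials(pattern, window=10):
    """Estimate how many random keys are needed, on average, per match."""
    # Each letter can match as upper or lower case; other symbols match one way.
    case_variants = 1
    for ch in pattern:
        if ch.isalpha():
            case_variants *= 2
    # Number of starting offsets that keep the pattern inside the window.
    offsets = window - len(pattern) + 1
    return 64 ** len(pattern) / (case_variants * offsets)

trials = expected_trials("MBP/")

# Timing figures taken from the sample output above.
seconds_per_trial = 28.7e-6
cpu_cores = 2
seconds_per_key = trials * seconds_per_trial / cpu_cores

print(f"about one match in every {trials:.0f} keys")
print(f"about {seconds_per_key:.1f} seconds per key, {1 / seconds_per_key:.3f} keys/s")
```

Under these assumptions the numbers line up with the tool's "one of every 299593 keys" and "4.3 seconds per key". Note that a key beginning with exactly "MBP/" (matching case, offset 0) is rarer than any match at all, which is why several other hits scroll by first.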

For more details, read the GitHub page, and also the issue where the author addresses security concerns about decreasing the size of the key space.

Update: I created a Dockerfile to make it even easier to run wireguard-vanity-address. Check out my personal GitHub repo for details.

WireGuard GUI on macOS

A WireGuard VPN provides a fast, secure tunnel between endpoints. A macOS GUI client is available from the App Store.

It works great. But its documentation is minimal. Even though the required keywords (which you must type manually) are the same as other clients, the GUI doesn’t give a hint about whether it’s right until you type it exactly correctly. Consequently, it can be a pain to configure it properly.

This screen shot shows a correctly configured (although fictitious) VPN tunnel. To get to this configuration window, use the WireGuard Manage Tunnels menu, click  and choose Add Empty Tunnel… then fill in the resulting window as shown below:

Screen shot of macOS WireGuard GUI

Although there are plenty of guides to explain WireGuard, this summarizes my best understanding of the meaning of these fields. There may be additional ways to configure the VPN, but following this advice will result in a working secure configuration.

[Interface] Section

  • PrivateKey: Private key for this computer. WireGuard uses this key to encrypt data sent to its peer, and decrypt received data. WireGuard displays the corresponding PublicKey (which you’ll enter into the peer) at the top of the window.
  • Address: Address for the VPN tunnel interface on this computer. Use a /32 address chosen from an address range that is not in either this network or the peer’s network. (This example uses 10.0.10.2/32 for this end. The peer (not shown) is 10.0.10.1/32. They were chosen because the 10.0.10.0/24 subnet is not in use on either side of the tunnel.)
  • DNS: (Optional) Address(es) of DNS servers to be used by this computer. It’s OK to leave this out – by default, WireGuard will use the underlying OS DNS servers.
  • ListenPort: (Optional) WireGuard listens on this port for traffic from its peer. It’s OK to leave this out – by default, WireGuard will select an unused port.

[Peer] Section

  • PublicKey: The public key of the remote peer. WireGuard uses this key to decrypt the packets sent from the peer, and encrypt packets sent to the peer.
  • PresharedKey: (Optional) An extra symmetric key shared by both peers. If specified, it is mixed into the session encryption in addition to (not in place of) the public/private key pairs, adding another layer of protection.
  • AllowedIPs: A comma-separated list of IP (v4 or v6) addresses with CIDR masks which are allowed as destination addresses when sending via this peer and as source addresses when receiving via this peer.
  • Endpoint: (Optional) The address (or DNS name) and port of the remote peer. If specified, this peer will attempt to connect to the endpoint periodically.
  • PersistentKeepalive: (Optional) The number of seconds this peer waits before sending another keep-alive message. These messages “keep the session alive” through NAT.

I would appreciate comments on these descriptions so I can make them more helpful/useful.

Additional Thoughts

The following thoughts are refinements to the advice shown above.

    • The example above only allows traffic to/from the 192.168.4.0/24 and 172.30.42.0/24 subnets to travel through the tunnel. To send all traffic through the tunnel (say, to avoid prying eyes of your ISP, etc.), you can set the AllowedIPs to 0.0.0.0/0. To send all IPv6 traffic through the tunnel, add ::/0.
    • It is neither necessary nor recommended to include the peer’s Address in the AllowedIPs list.
    • Although both Endpoint and PersistentKeepalive are listed as optional, you normally set both when using the macOS WireGuard client. Activating the tunnel (from the WireGuard menu) causes WireGuard to begin sending Keepalive packets to the Endpoint, which starts up the tunnel.
    • Dealing with NAT. If your ISP requires your remote peer to be behind NAT, you must configure your ISP’s router/modem to pass the WireGuard packets through. The setup varies from ISP to ISP, but in general, you’ll need to set up some kind of “virtual server”, “DMZ”, or “port forwarding” in the ISP router/modem to pass the WireGuard packets (on the port specified in the Endpoint) to the peer device.
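
To tie the fields together, here is a small Python sketch that holds a configuration like the one described above as text and checks the advice about keeping the tunnel Address out of the networks on either side. The addresses and subnets come from this post; the keys are placeholders, and the endpoint name, port, and keepalive interval are hypothetical values chosen for illustration. The parser is a minimal stand-in of my own, not WireGuard's.

```python
import ipaddress

# Placeholder keys; Endpoint and PersistentKeepalive values are assumptions.
CONFIG = """\
[Interface]
PrivateKey = <this computer's private key>
Address = 10.0.10.2/32

[Peer]
PublicKey = <the peer's public key>
AllowedIPs = 192.168.4.0/24, 172.30.42.0/24
Endpoint = vpn.example.com:51820
PersistentKeepalive = 25
"""

def parse(text):
    """Parse 'Key = value' lines grouped under [Section] headers."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("["):
            current = sections.setdefault(line.strip("[]"), {})
        else:
            # Split on the first "=" only, so base64 "=" padding stays in the value.
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections

def address_clashes(sections):
    """List AllowedIPs networks that overlap this end's tunnel Address."""
    tunnel = ipaddress.ip_interface(sections["Interface"]["Address"]).network
    allowed = [ipaddress.ip_network(item.strip())
               for item in sections["Peer"]["AllowedIPs"].split(",")]
    return [str(net) for net in allowed if net.overlaps(tunnel)]

cfg = parse(CONFIG)
clashes = address_clashes(cfg)
print("overlapping networks:", clashes or "none")
```

An empty result means the tunnel address sits outside the listed subnets. If AllowedIPs is set to 0.0.0.0/0 to send everything through the tunnel, the check will report an overlap; that is expected and not a misconfiguration.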

Transmission of Covid-19

A friend (thanks, Ted!) directed me to a nice science-based article that assigns some probabilities of risks of transmitting a disease like coronavirus. The author highlights two major scenarios:

  1. Warm body transmission: how far apart should you be from other people if you want to avoid transmission from another “warm body”?
  2. Surface-based transmission: what precautions should you take when you go somewhere that others have passed through recently?

You won’t be surprised by the takeaways:

  • 6-foot distancing is good
  • wearing a mask is good
  • washing hands is good

…but some of the discussion and details are interesting. View the full article at Medium.