Weave vs Flannel: Network Performance

Introduction

Weave and Flannel are currently the two main overlay-network solutions for containers. They both try to solve the same problem: how do you give each container its own IP address so containers can talk to each other, when each container host has only one IP?

They both employ the same IP encapsulation approach: carrying layer 2 (link layer) frames inside UDP datagrams.
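One way to convince yourself that this is what is happening is to capture traffic on the host NIC while containers on the two hosts are talking: the inter-container traffic shows up as UDP between the two host IPs. A rough sketch (eth0 and the capture filter are assumptions; adjust for your interface and hosts):

# On HOST0, while traffic flows between containers on the two hosts
# (Weave and Flannel each encapsulate on their own UDP ports, which vary by version and backend)
$ sudo tcpdump -ni eth0 udp and host $HOST1 -c 10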

The differences lie in the implementation details, which you can find in this post. That post also includes a network performance test between the two, but it doesn't cover Weave's fast datapath.

So in this post, I will compare three setups: Weave with fast datapath, Weave without it, and Flannel.

Setup Weave

I launched two EC2 instances (t2.medium, HVM) in the same availability zone.

To set up Weave, the Weave documentation is all we need, and it is easy to follow.

Setup Flannel

Setting up Flannel is more involved. The steps are below.

First, set up an etcd cluster.

# On both hosts
# Download and untar etcd
$ curl -L  https://github.com/coreos/etcd/releases/download/v2.2.2/etcd-v2.2.2-linux-amd64.tar.gz -o etcd-v2.2.2-linux-amd64.tar.gz
$ tar xzvf etcd-v2.2.2-linux-amd64.tar.gz

# On HOST0, replacing $HOST0 and $HOST1 with the EC2 private IPs of the two hosts
$ export ETCD_INITIAL_CLUSTER="infra0=http://$HOST0:2380,infra1=http://$HOST1:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=new
# Start etcd server in the background
$ nohup ./etcd-v2.2.2-linux-amd64/etcd -name infra0 -initial-advertise-peer-urls http://$HOST0:2380 -listen-peer-urls http://$HOST0:2380 -listen-client-urls http://$HOST0:2379,http://127.0.0.1:2379 -advertise-client-urls http://$HOST0:2379 -initial-cluster-token etcd-cluster-1 &

# On HOST1, replacing $HOST0 and $HOST1 with the EC2 private IPs of the two hosts
$ export ETCD_INITIAL_CLUSTER="infra0=http://$HOST0:2380,infra1=http://$HOST1:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=new
# Start etcd server in the background
$ nohup ./etcd-v2.2.2-linux-amd64/etcd -name infra1 -initial-advertise-peer-urls http://$HOST1:2380 -listen-peer-urls http://$HOST1:2380 -listen-client-urls http://$HOST1:2379,http://127.0.0.1:2379 -advertise-client-urls http://$HOST1:2379 -initial-cluster-token etcd-cluster-1 &
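Before moving on, it's worth a quick sanity check that the two-node cluster actually formed; both members should show up as healthy.

# On either host
$ ./etcd-v2.2.2-linux-amd64/etcdctl cluster-health
$ ./etcd-v2.2.2-linux-amd64/etcdctl member list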

Install Flannel

# On both hosts
$ curl -L https://github.com/coreos/flannel/releases/download/v0.5.5/flannel-0.5.5-linux-amd64.tar.gz -o flannel.tar.gz
$ tar zxf flannel.tar.gz

# On one host
$ ./etcd-v2.2.2-linux-amd64/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16", "Backend": { "Type": "vxlan"} }'
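To double-check that the config is replicated and visible from the other host, read it back:

# On the other host
$ ./etcd-v2.2.2-linux-amd64/etcdctl get /coreos.com/network/config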

Host Connection Test

Install iperf on each host.

# ===TCP===
# On HOST0
$ iperf -f M -i 1 -m -s

# On HOST1
$ iperf -f M -t 60 -i 1 -c $HOST0

# Output
[  3]  0.0-60.0 sec  6887 MBytes   115 MBytes/sec
[  4] MSS size 8949 bytes (MTU 8989 bytes, unknown interface)

# ===UDP===
# On HOST0
$ iperf -f M -i 1 -m -su

# On HOST1
$ iperf -f M -i 1 -t 60 -m -c $HOST0 -b 1000M

# Output
[  4]  0.0-10.1 sec   562 MBytes  55.7 MBytes/sec   0.088 ms  450/401197 (0.11%)

Weave without Fast Datapath Test

First, we need to start Weave without fast datapath, using the environment variable WEAVE_NO_FASTDP.

# HOST0
$ WEAVE_NO_FASTDP=true weave launch
$ eval "$(weave env)"

# HOST1
$ WEAVE_NO_FASTDP=true weave launch $HOST0
$ eval "$(weave env)"

Next, we can launch containers.

# HOST0
$ docker run -it --name test-server ubuntu
## Inside test-server
$ apt-get update; apt-get install iperf

# HOST1
$ docker run -it --name test-client ubuntu
## Inside test-client
$ apt-get update; apt-get install iperf

Then we can run iperf in the same way as above, but now using the container's IP on the overlay network (one way to find it is sketched below).
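For reference, here is one way to find the server container's overlay IP and point iperf at it; the 10.32.0.1 address is just a placeholder, and you could equally read the IP off the ethwe interface inside the container.

# On HOST0, find the overlay IP Weave assigned to test-server
$ weave ps test-server

# Or, inside test-server, inspect the Weave interface directly
$ ip addr show ethwe

# Inside test-server
$ iperf -f M -i 1 -m -s

# Inside test-client, pointing at the overlay IP found above (placeholder)
$ iperf -f M -t 60 -i 1 -c 10.32.0.1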

# Output
# ===TCP===
[  3]  0.0-60.0 sec  5168 MBytes  86.1 MBytes/sec

# ===UDP===
[  3]  0.0-60.0 sec  2634 MBytes  43.9 MBytes/sec   0.217 ms 2246629/4125665 (54%)

Note that the performance is more than 20% lower than the direct host connection. Let's enable fast datapath.

Weave with Fast Datapath Test

Stopping Weave cleanly requires a few steps.

# On both hosts.  For HOST1, change test-server to test-client.
$ docker rm -f test-server
# Necessary to put DOCKER_HOST back to its original value
$ eval "$(weave env --restore)"
$ weave stop
$ weave reset

Now restart Weave as above, but without WEAVE_NO_FASTDP=true.
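For completeness, the restart is just the earlier commands without the environment variable (the test containers also need to be launched again, since we removed them above):

# HOST0
$ weave launch
$ eval "$(weave env)"

# HOST1
$ weave launch $HOST0
$ eval "$(weave env)"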

Run tests the same way.

# Output
# ===TCP===
[  4]  0.0-60.1 sec  2546 MBytes  42.4 MBytes/sec

# ===UDP===
[  3]  0.0-60.1 sec  1438 MBytes  23.9 MBytes/sec   0.556 ms 3098522/4124483 (75%)

Er… fast datapath seems to make performance worse?!

Set Proper MTU

It turns out Weave sets the MTU to 1410 by default, even though the AWS VPC network can handle 9001. We need to tell Weave to use a higher value, since it doesn't detect this automatically (at least I didn't find a way).

$ WEAVE_MTU=8950 weave launch
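The value 8950 is meant to leave room for the encapsulation overhead: the VPC supports 9001-byte frames, and VXLAN adds roughly 50 bytes of outer headers to each inner frame. To verify what actually got applied (assuming the Weave bridge is still named weave in this version):

# On the host: the Weave bridge should now report the larger MTU
$ ip link show weave

# Inside a test container: the ethwe interface should match
$ ip link show ethwe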

And the test results look much better:

# Without fast datapath
# ===TCP===
[  3]  0.0-60.2 sec  5050 MBytes  83.9 MBytes/sec

# ===UDP===
[  3]  0.0-60.0 sec  3895 MBytes  64.9 MBytes/sec   0.291 ms 1349488/4127983 (33%)

# With fast datapath
# ===TCP===
[  4]  0.0-60.0 sec  6897 MBytes   115 MBytes/sec

# ===UDP===
[  3]  0.0-60.2 sec  3245 MBytes  53.9 MBytes/sec   0.518 ms 1808496/4123433 (44%)

OK, the results look more reasonable, especially since TCP with fast datapath now matches host speed (there is some variance, of course, but it shouldn't be that large).

Flannel Test

Flannel appears to detect the network MTU automatically, so we don't need to set it ourselves.

To start Flannel:

# On both hosts
$ nohup ./flannel-0.5.5/flanneld &
$ source /run/flannel/subnet.env
# Stop the Docker daemon that is already running (e.g. sudo service docker stop),
# then restart it with Flannel's bridge and MTU settings
$ nohup docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} &
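The MTU Flannel detected (along with the subnet it leased) is recorded in the env file we just sourced. The values below are only an illustration of the format, not my exact output:

$ cat /run/flannel/subnet.env
# Example format (actual values will differ per host):
# FLANNEL_NETWORK=10.1.0.0/16
# FLANNEL_SUBNET=10.1.42.1/24
# FLANNEL_MTU=8951
# FLANNEL_IPMASQ=false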

Now Docker daemon is configured to use Flannel’s overlay network.

Launch containers and run tests as before.

# Output
# ===TCP===
[  3]  0.0-60.0 sec  6866 MBytes   114 MBytes/sec
[  3] MSS size 8899 bytes (MTU 8939 bytes, unknown interface)

# ===UDP===
[  3]  0.0-60.2 sec  3086 MBytes  51.2 MBytes/sec  13.546 ms 1920728/4122179 (47%)

As we see, the performance is nearly the same as host speed.

Conclusions

Setup                                       TCP        UDP
Host                                        115 MB/s   55.7 MB/s
Weave without Fast Datapath, default MTU    86.1 MB/s  43.9 MB/s
Weave with Fast Datapath, default MTU       42.4 MB/s  23.9 MB/s
Weave without Fast Datapath, MTU=8950       83.9 MB/s  64.9 MB/s
Weave with Fast Datapath, MTU=8950          115 MB/s   53.9 MB/s
Flannel with backend=vxlan                  114 MB/s   51.2 MB/s
  • Weave with Fast Datapath (and the correct MTU) performs about the same as Flannel.
  • UDP performance varied quite a lot during the tests, so the values above may not be representative. Indeed, when I used -l 8950 in iperf, throughput dropped to ~30 MB/s. I'm not sure whether this is due to some kind of throttling on AWS, but it happened with all of the setups.
  • Weave fast datapath currently doesn't support encryption, so that is left to the application.


3 comments:

  1. Why is the speed for UDP slower than TCP?

     Author's reply: I think it's because of iperf. It does some internal processing on each packet, which slows down the data rate. I will try to run the test again and update the results.

  2. iperf gives bad results; use iperf3. Thanks for the comparison though.