Weave vs Flannel: Network Performance
Introduction
Weave and Flannel are currently the two main overlay network solutions for containers. They both try to solve the same problem: how do you give each container its own IP so containers can talk to each other, when each container host only has one IP?
They both employ the IP encapsulation approach: carrying layer2 (link layer) frames inside UDP datagrams.
The difference lies in the implementation details, which are described in this post. That post also includes a network performance test between the two, but it doesn't cover Weave's fast datapath.
So in this post, I will compare three different setups: Weave without fast datapath, Weave with fast datapath, and Flannel.
Weave Setup
I launched two EC2 instances (t2.medium, HVM) in the same AZ.
To set up Weave, the Weave documentation is all we need, and it is easy to follow.
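For reference, the basic steps (taken from the Weave getting-started guide at the time; the download URL and exact commands may have changed since) look like this:
# On both hosts: install the weave script
$ sudo curl -L git.io/weave -o /usr/local/bin/weave
$ sudo chmod a+x /usr/local/bin/weave
# On HOST0
$ weave launch
$ eval "$(weave env)"
# On HOST1, peering with HOST0
$ weave launch $HOST0
$ eval "$(weave env)"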
Flannel Setup
Setting up Flannel is more involved. The steps are here:
Set up an etcd cluster.
# On both hosts
# Download and untar etcd
$ curl -L https://github.com/coreos/etcd/releases/download/v2.2.2/etcd-v2.2.2-linux-amd64.tar.gz -o etcd-v2.2.2-linux-amd64.tar.gz
$ tar xzvf etcd-v2.2.2-linux-amd64.tar.gz
# On HOST0, replacing $HOST0 and $HOST1 with the EC2 private IP
$ export ETCD_INITIAL_CLUSTER="infra0=http://$HOST0:2380,infra1=http://$HOST1:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=new
# Start etcd server in the background
$ nohup ./etcd-v2.2.2-linux-amd64/etcd -name infra0 -initial-advertise-peer-urls http://$HOST0:2380 -listen-peer-urls http://$HOST0:2380 -listen-client-urls http://$HOST0:2379,http://127.0.0.1:2379 -advertise-client-urls http://$HOST0:2379 -initial-cluster-token etcd-cluster-1 &
# On HOST1, replacing $HOST0 and $HOST1 with the EC2 private IP
$ export ETCD_INITIAL_CLUSTER="infra0=http://$HOST0:2380,infra1=http://$HOST1:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=new
# Start etcd server in the background
$ nohup ./etcd-v2.2.2-linux-amd64/etcd -name infra1 -initial-advertise-peer-urls http://$HOST1:2380 -listen-peer-urls http://$HOST1:2380 -listen-client-urls http://$HOST1:2379,http://127.0.0.1:2379 -advertise-client-urls http://$HOST1:2379 -initial-cluster-token etcd-cluster-1 &
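As a quick sanity check (my addition, not part of the original steps), etcdctl can confirm that the two-node cluster formed correctly:
# On either host
$ ./etcd-v2.2.2-linux-amd64/etcdctl cluster-health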
Install Flannel
# On both hosts
$ curl -L https://github.com/coreos/flannel/releases/download/v0.5.5/flannel-0.5.5-linux-amd64.tar.gz -o flannel.tar.gz
$ tar zxf flannel.tar.gz
# On one host
$ ./etcd-v2.2.2-linux-amd64/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16", "Backend": { "Type": "vxlan"} }'
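To confirm the network config was stored, it can simply be read back (optional verification):
$ ./etcd-v2.2.2-linux-amd64/etcdctl get /coreos.com/network/config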
Host Connection Test
Install iperf on each host.
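Assuming Ubuntu hosts (adjust for your distribution), that is just:
# On both hosts
$ sudo apt-get update && sudo apt-get install -y iperf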
# ===TCP===
# On HOST0
$ iperf -f M -i 1 -m -s
# On HOST1
$ iperf -f M -t 60 -i 1 -c $HOST0
# Output
[ 3] 0.0-60.0 sec 6887 MBytes 115 MBytes/sec
[ 4] MSS size 8949 bytes (MTU 8989 bytes, unknown interface)
# ===UDP===
# On HOST0
$ iperf -f M -i 1 -m -su
# On HOST1
$ iperf -f M -i 1 -t 60 -m -c $HOST0 -b 1000M
# Output
[ 4] 0.0-10.1 sec 562 MBytes 55.7 MBytes/sec 0.088 ms 450/401197 (0.11%)
Weave without Fast Datapath Test
First, we need to start Weave without fast datapath, using the environment variable WEAVE_NO_FASTDP.
# HOST0
$ WEAVE_NO_FASTDP=true weave launch
$ eval "$(weave env)"
# HOST1
$ WEAVE_NO_FASTDP=true weave launch $HOST0
$ eval "$(weave env)"
Next, we can launch containers.
# HOST0
$ docker run -it --name test-server ubuntu
## Inside test-server
$ apt-get update; apt-get install iperf
# HOST1
$ docker run -it --name test-client ubuntu
## Inside test-client
$ apt-get update; apt-get install iperf
Then we can run iperf in a similar way as above, but now using the containers' IPs on the overlay network.
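For example (the exact IP depends on what the Weave network assigns; `weave ps` on the host reports a container's Weave IP):
# On HOST0: find test-server's IP on the Weave network
$ weave ps test-server
## Inside test-server
$ iperf -f M -i 1 -m -s
## Inside test-client, using the Weave IP reported above
$ iperf -f M -t 60 -i 1 -c <test-server weave IP>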
# Output
# ===TCP===
[ 3] 0.0-60.0 sec 5168 MBytes 86.1 MBytes/sec
# ===UDP===
[ 3] 0.0-60.0 sec 2634 MBytes 43.9 MBytes/sec 0.217 ms 2246629/4125665 (54%)
Note the performance is over 20% lower than the direct host connection. Let's enable fast datapath.
Weave with Fast Datapath Test
Stopping Weave requires a few steps.
# On both hosts. For HOST1, change test-server to test-client.
$ docker rm -f test-server
# Necessary to put DOCKER_HOST back to its original value
$ eval "$(weave env --restore)"
$ weave stop
$ weave reset
Now restart Weave as above, but without WEAVE_NO_FASTDP=true.
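Concretely, the relaunch is just the earlier commands without the environment variable:
# HOST0
$ weave launch
$ eval "$(weave env)"
# HOST1
$ weave launch $HOST0
$ eval "$(weave env)"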
Run tests the same way.
# Output
# ===TCP===
[ 4] 0.0-60.1 sec 2546 MBytes 42.4 MBytes/sec
# ===UDP===
[ 3] 0.0-60.1 sec 1438 MBytes 23.9 MBytes/sec 0.556 ms 3098522/4124483 (75%)
Er … fast datapath seems to make performance worse ???!!!
Set Proper MTU
It turns out that Weave sets the MTU to 1410 by default, even though the AWS VPC network can handle 9001. We need to tell Weave to use a higher value, as it can't detect this automatically (at least I didn't find out how).
$ WEAVE_MTU=8950 weave launch
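To verify the setting took effect (my own check, not from the Weave docs), inspect the weave bridge on the host and look for "mtu 8950" in the output:
$ ip link show weave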
And the test results look much better:
# Without fast datapath
# ===TCP===
[ 3] 0.0-60.2 sec 5050 MBytes 83.9 MBytes/sec
# ===UDP===
[ 3] 0.0-60.0 sec 3895 MBytes 64.9 MBytes/sec 0.291 ms 1349488/4127983 (33%)
# With fast datapath
# ===TCP===
[ 4] 0.0-60.0 sec 6897 MBytes 115 MBytes/sec
# ===UDP===
[ 3] 0.0-60.2 sec 3245 MBytes 53.9 MBytes/sec 0.518 ms 1808496/4123433 (44%)
OK, the results look more reasonable, especially since TCP with fast datapath now matches the host speed (of course there is some variance, but it shouldn't be that much).
Flannel Test
Flannel seems to be able to detect the network MTU automatically, so we don't need to set it ourselves.
To start Flannel:
# On both hosts
$ nohup ./flannel-0.5.5/flanneld &
$ source /run/flannel/subnet.env
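# Stop any Docker daemon that is already running (e.g. `sudo service docker stop`,
# depending on your init system) before relaunching it with Flannel's settings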
$ nohup docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} &
Now the Docker daemon is configured to use Flannel's overlay network.
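For reference, the subnet.env file written by flanneld looks roughly like this (the values are illustrative and will differ per host):
$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.74.1/24
FLANNEL_MTU=8951
FLANNEL_IPMASQ=false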
Launch containers and run tests as before.
# Output
# ===TCP===
[ 3] 0.0-60.0 sec 6866 MBytes 114 MBytes/sec
[ 3] MSS size 8899 bytes (MTU 8939 bytes, unknown interface)
# ===UDP===
[ 3] 0.0-60.2 sec 3086 MBytes 51.2 MBytes/sec 13.546 ms 1920728/4122179 (47%)
As we can see, the performance is nearly the same as the host speed.
Conclusions
Setup | TCP throughput | UDP throughput |
---|---|---|
Host | 115MB/s | 55.7MB/s |
Weave without Fast Datapath, default MTU | 86.1MB/s | 43.9MB/s |
Weave with Fast Datapath, default MTU | 42.4MB/s | 23.9MB/s |
Weave without Fast Datapath, MTU=8950 | 83.9MB/s | 64.9MB/s |
Weave with Fast Datapath, MTU=8950 | 115MB/s | 53.9MB/s |
Flannel with backend=vxlan | 114MB/s | 51.2MB/s |
- Weave with Fast Datapath (and the correct MTU) seems to have the same performance as Flannel.
- UDP performance varied quite a lot during the tests, so the values above may not be representative. Indeed, when I used `-l 8950` in iperf, the performance dropped to ~30MB/s. I'm not sure if this is due to some kind of throttling on AWS, but it happens with all of the setups.
- Weave fast datapath currently doesn't support encryption, so it'll be down to the application to handle that.
Comments
Q: Why is the speed for UDP slower than TCP?
A: I think it's because of iperf. It does some internal processing on each packet, which slows down the data rate. I will try to run the test again and update the results.