See the previous episode for the concept.
Now it is time to transform this setup:
Into this - so that every virtual appliance can reach the others while also having an Internet connection. We're pretty much simulating a LAN here.
Planning the network
In the end we will have two special networks:
virtualnet0 - the unified virtual LAN, the goal of this project:
- I'm going to use the subnet 172.24.0.0/16.
- I'm assigning the host the address 172.24.0.100.
- The default gateway of this subnet is going to sit at 172.24.0.1. We will examine that in more depth later.
Services             Address pool
Docker containers    172.24.1.x
QEMU hosts           172.24.2.x
VirtualBox hosts     172.24.3.x
I’m turning DHCP off and I am going to assign all IP addresses manually.
- I’m going to use the subnet
router_veth0 -- router_veth1: a dedicated connection for the default namespace to the PAT router.
We are going to move enp0s25 from the default namespace to the PAT router's stack. Therefore, we need a link to the router in order to keep our Internet connection.
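To make the address plan concrete, here is a tiny shell sketch (purely illustrative; the pool_of helper is hypothetical and only mirrors the table above) that maps an address to the pool it belongs to:

```shell
#!/bin/sh
# Hypothetical helper: classify an address according to the plan above.
# Real assignment is done manually with `ip addr add`, not with this script.
pool_of() {
  case "$1" in
    172.24.1.*) echo "Docker containers" ;;
    172.24.2.*) echo "QEMU hosts" ;;
    172.24.3.*) echo "VirtualBox hosts" ;;
    172.24.*)   echo "other virtualnet0 address" ;;
    *)          echo "outside virtualnet0" ;;
  esac
}

pool_of 172.24.1.100   # -> Docker containers
pool_of 172.24.2.100   # -> QEMU hosts
pool_of 192.168.56.1   # -> outside virtualnet0 (the old VirtualBox default)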
Creating the starting scenario
A virtual machine on vboxnet0
I already have a brand new openSUSE Leap 15.2 installation in a virtual machine that is suitable for testing the network reachable through the vboxnet0 interface. The host-only networking mode of VirtualBox creates the setup that can be seen in the first figure.
The IP address and subnet mask configured here are the ones assigned to the host interface. I'm ignoring this setting for now, as I am going to reconfigure the interface manually later.
$ ip addr show vboxnet0
10: vboxnet0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 0a:00:27:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.1/24 brd 192.168.56.255 scope global vboxnet0
       valid_lft forever preferred_lft forever
    inet6 fe80::800:27ff:fe00:0/64 scope link
       valid_lft forever preferred_lft forever
VirtualBox survival tips
I hope I’m saving you some spare hours:
- It tends to reset the vboxnet0 interface and reassign the IP address it thinks we need. Check it from time to time and flush it if it does so. This typically happens when you start or shut down a VM.
- Sometimes packets appear in the machine, you can even see them in tcpdump output, but the end application does not receive them. I have no idea why this happens and I spent a lot of time trying to figure it out. But sometimes, if you shut down the VM, start it again, then flush and configure the interface again, the problem "solves" itself. If you know the reason, please email me so I can state it here.
- For me, this always happened with the bridged networking mode. You should probably stick to the host-only networking mode.
- Stick to the linux-lts kernel on the host OS. This setup can crash the Zen kernel and other variants.
A Docker container in an isolated network segment
First I’m creating a new docker network without the built-in PAT. The first chart does not actually depict this situation exactly. With this setup, I’m essentially cutting the Internet connection - the “Virtual PAT router” part - and start from a clean state, where none of the three networks has a link to the outside world.
# docker network create --internal docker_custom
This command created the bridge br-8670eabfe12d. (This bridge certainly does not know what abuses it is going to face.)
I’m starting up a container connected to that network. I’m grabbing the image
praqma/network-multitool for testing. I’m leaving it now sleeping, for later access via
# docker run -d --name dockernet_tester \ --network docker_custom --cap-add=NET_ADMIN \ --rm praqma/network-multitool sleep infinity
--cap-add=NET_ADMINoption, we won’t be able to alter the network settings from the inside.
For now, I’m ignoring the IP address that Docker assigned for the host OS and for the container.
An unhooked QEMU machine
I’m creating a QEMU machine that provides a TAP virtual network interface for the host.
For this, I’m using a previously set up minimal Alpine installation. It has only a minimal set of tools installed such as
vim and some network utilities.
# qemu-system-x86_64 \
    -m 256 \
    -display sdl -vga qxl \
    -net nic -net tap,ifname=qemutap0,script=no,downscript=no \
    -hda ./alpine.qcow2
I’m running QEMU without
--enable-kvm. I’m doing so because only one hypervisor is able to make use of the hardware virtualization capabilities of the host machine. I’m compensating that by utilizing a very lightweight distribution.
The new network interface appears as expected in the host OS:
# ip link set dev qemutap0 up
# ip addr show qemutap0
8: qemutap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether f6:9c:37:be:7a:d8 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f49c:37ff:febe:7ad8/64 scope link
       valid_lft forever preferred_lft forever
Creating the PAT router namespace
This is the network namespace that is going to be responsible for connecting the appliances and the host, through enp0s25, to the outside world.
# ip netns add router
Assigning our uplink to the new namespace
Be careful, as the next steps will temporarily cut your Internet connection. If you do not succeed, you may lose it "permanently". Well, until the next reboot, because all of these tweaks are ad hoc and won't be applied at system boot. You may want to save this page for offline reading ;)
# ip link set dev enp0s25 netns router
## assuming that you get your IP via DHCP on your local network
# ip netns exec router dhclient enp0s25
Setting up PAT
The router namespace now has a usable link; however, our default namespace does not. Thus, we do not have an Internet connection right now, feel free to check it. It is time to configure our router for network address translation.
- Prefix a command with ip netns exec router to execute it in the router namespace's context.
- Run ip netns exec router bash to get a shell in the namespace.
We’re applying a default-deny policy for the forwarded packets:
# ip netns exec router iptables -P FORWARD DROP
We’re enabling IP forwarding in the network namespace.
# ip netns exec router sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'

Note the sh -c wrapper: without it, the redirection would be performed by the outer shell, which writes the file in the default namespace instead of the router's.
IP forwarding is the ability of an operating system to accept incoming network packets on one interface, recognize that a packet is not meant for the system itself but should be passed on to another network, and then forward it accordingly. (ref: openvpn.net)
If we did not configure this, the incoming packets would simply be dropped by the router instead of being forwarded.
We are establishing a link (router_veth0 -- router_veth1) to connect the default and the router namespaces. We use a VETH pair to achieve that.
# ip link add router_veth0 type veth peer name router_veth1
# ip link set dev router_veth1 netns router
# ip netns exec router ip addr add 10.0.22.1/30 dev router_veth1
# ip netns exec router ip link set dev router_veth1 up
# ip addr add 10.0.22.2/30 dev router_veth0
# ip link set dev router_veth0 up
From now on, instead of enp0s25, router_veth0 is our exit to the outside network (in the default namespace). Therefore, we need to update our default gateway.
# ip route add default dev router_veth0 via 10.0.22.1
The default gateway for the PAT router is reached through enp0s25. I like to think of it as the WAN port of the virtual router. It is configured automatically by my physical LAN's DHCP service, but you may add it manually:
# ip netns exec router ip route add default dev enp0s25 via 192.168.0.1
Configuring PAT with IPTables MASQUERADE
This kind of NAT setup maps the many IP addresses of the virtual network's appliances to the one IP address we have on enp0s25, our network card. By default, every client on the network can initiate connections to the outside world, but the outside world is only allowed to respond to existing connections.
That is, we can request data from google.com, but google.com cannot initiate sending packets behind our NAT; it may only respond to one of our requests.
Not only is it disallowed; the remote server would not even have an IP address in the range of our internal network to specify. What it can do instead is send the response to our public IP address, directing the answer to the source port from which it received our packets. Our router maps these ports dynamically to our clients and forwards the packets back to the appropriate one. You may learn more about PAT in the CCNA study guides.
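To illustrate the port-mapping idea, here is a toy shell sketch of the bookkeeping PAT performs. This is purely illustrative: the real translation table lives in the kernel (conntrack), and the addresses and ports here, including the assumed public IP 192.168.0.10, are made up.

```shell
#!/bin/sh
# Toy sketch of a PAT translation table (illustration only; iptables and
# conntrack do this in the kernel, not with shell variables).
PUBLIC_IP="192.168.0.10"   # assumed address on the WAN interface (enp0s25)

# Outbound: the router rewrites the source 172.24.1.100:40000 to
# $PUBLIC_IP:40001 and remembers the mapping, keyed by the public port.
MAP_40001="172.24.1.100:40000"

# Inbound reply: a packet arriving at $PUBLIC_IP:40001 is looked up by its
# destination port and forwarded back to the stored internal endpoint.
reply_port=40001
eval "client=\$MAP_$reply_port"
echo "forward reply on port $reply_port back to $client"
```

Running this prints which internal client the (made-up) reply would be forwarded to; in the real setup, the MASQUERADE rule below does the equivalent automatically for every connection.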
We are configuring IPTables rules of the PAT router stack to achieve this exact behavior.
If we have no idea what to do with a particular packet, we choose the most secure option: we drop it.
# ip netns exec router iptables -P FORWARD DROP
We configure the outbound interface and ask IPTables to hide (mask) our internal IP addresses behind the public IPs. Without this step, the remote server would receive our packet with a source address valid only in our internal network. There would be no way to know where to send back the response.
# export WAN=enp0s25
# ip netns exec router iptables -t nat -A POSTROUTING -o $WAN -j MASQUERADE
We allow the clients to initiate traffic from the inside, and allow the outside world to respond to existing connections.
# export WAN=enp0s25 LAN=router_veth1
# ip netns exec router \
    iptables -A FORWARD -i $LAN -o $WAN -j ACCEPT
# ip netns exec router \
    iptables -A FORWARD -o $LAN -i $WAN -m state --state RELATED,ESTABLISHED -j ACCEPT
To enable the outside world (everything past our network card) to initiate connections to servers running on the host machine, we either have to apply port forwarding with IPTables rules one by one, or choose a special interface to forward the packets to by default.
Our virtual network setup works similarly to a plain old regular LAN network, where you have to configure the router explicitly to allow external connections getting in.
Connecting the network segments
Now that we have a working router stack, we create a new virtual bridge and connect all our virtual appliances to it, as well as our host. We will connect the router to it too, to provide a gateway to the Internet.
First, we create the virtual bridge with the interface name virtualnet0:
# ip link add name virtualnet0 type bridge
# ip link set virtualnet0 up
Then I set the host OS IP:
# ip addr add 172.24.0.100/16 dev virtualnet0
Now it is time to plug everything together according to the second chart. First, we plug in the virtual interfaces of QEMU and VirtualBox into the bridge:
# ip addr flush vboxnet0
# ip link set vboxnet0 master virtualnet0
# ip link set qemutap0 master virtualnet0
After that, we create a VETH pair to connect the Docker bridge and the virtualnet0 bridge. Unfortunately, longer interface names are not allowed.
# ip link add veth-dv-0 type veth peer name veth-dv-1
# ip addr flush br-8670eabfe12d
# ip link set veth-dv-0 master virtualnet0
# ip link set veth-dv-1 master br-8670eabfe12d
# ip link set veth-dv-0 up
# ip link set veth-dv-1 up
Finally, we check the results - we print the virtual cables connected to our virtual switches (bridges):
# bridge link
10: vboxnet0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master virtualnet0 state disabled priority 32 cost 100
52: qemutap0: <BROADCAST,MULTICAST> mtu 1500 master virtualnet0 state disabled priority 32 cost 100
55: veth…@…: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-8670eabfe12d state forwarding priority 32 cost 2
57: veth-dv-1@veth-dv-0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master br-8670eabfe12d state disabled priority 32 cost 2
58: veth-dv-0@veth-dv-1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master virtualnet0 state disabled priority 32 cost 2
Essentially, what we are seeing now is the same as our second network chart.
Assigning the client IP addresses
In the Docker container (the 172.24.1.x pool):

# docker exec -it dockernet_tester bash
bash-5.0# ip addr flush eth0
bash-5.0# ip addr add 172.24.1.100/16 dev eth0
bash-5.0# ip route add default dev eth0 via 172.24.0.1
In the QEMU guest (the 172.24.2.x pool):

# ip addr flush dev eth0
# ip addr add 172.24.2.100/16 dev eth0
# ip link set dev eth0 up
# ip route add default dev eth0 via 172.24.0.1
In the VirtualBox guest (the 172.24.3.x pool):

# ip addr flush dev eth0
# ip addr add 172.24.3.100/16 dev eth0
# ip link set dev eth0 up
# ip route add default dev eth0 via 172.24.0.1
Duct taping IPTables in the default namespace
We are only going to apply some hacks here and there to make Docker and the virtual appliances cooperate in the new network. What you will see works and is probably secure, but the client programs, especially Docker, are not prepared for the scenario of us overriding IPTables rules by hand. In the worst case, they may interfere with our modifications by accident in a way that could lead to security breaches.
I’m sure there is a much better and more official way of configuring Docker out of the way, but let me do it for now manually anyway, only for demonstration purposes (as always).
IPTables Debugging 101
While testing with ICMP (ping), seeing timed-out requests can be frustrating. Fortunately, there are tools that make our work easier and show what is actually happening to our packets in the system. With these, we can more easily pinpoint where changes are required.

tcpdump and wireshark are really useful to track which packets appear on which interface.
I also advise installing the Python 2 package watchall. It is a replacement for the watch utility with the ability to scroll. It can run a command periodically and show the differences in the output. It will be very useful while you're debugging the firewall rules.
Once it is installed, we can examine the byte counters of all iptables rules in real time. This is very practical while investigating where requests get dropped.
# iptables -Z # optionally reset the byte counters
# watchall -n1 -d -- iptables -L -v
There is another trick I recommend. If you find that one of your iptables chains is dropping packets by default, but you're not sure what filter rules you could create to avoid that (preferably without side effects), you can append a rule to the end of the chain that matches all the remaining packets (those that would otherwise be dropped) and logs them first.
iptables -A FORWARD -m limit --limit 2/min -j LOG --log-prefix "IPTables-Dropped: " --log-level 4
After setting up that rule, you may attempt your request again and examine the system logs, with journalctl -f on a systemd-based Linux, for example. You may see lines like this:
aug 30 13:07:42 ceres kernel: IPTables-Dropped: IN=virtualnet0 OUT=virtualnet0 PHYSIN=veth-dv-0 PHYSOUT=qemutap0 MAC=52:54:00:12:34:56:02:42:ac:15:00:02:08:00 SRC=172.24.1.100 DST=172.24.2.100 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=57703 DF PROTO=ICMP TYPE=8 CODE=0 ID=40 SEQ=1
Preventing this packet from being dropped leads us to the first hack.
IPTables hack #1: accepting bridge traffic
Append this rule to allow the appliances on the bridge to reach each other.
iptables -A FORWARD -i virtualnet0 -o virtualnet0 -j ACCEPT
IPTables hack #2: removing the Docker isolation rules
Find the problematic rules with:
# iptables-save | grep br-8670eabfe12d
-A FORWARD -i br-8670eabfe12d -o br-8670eabfe12d -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 ! -s 172.21.0.0/16 -o br-8670eabfe12d -j DROP
-A DOCKER-ISOLATION-STAGE-1 ! -d 172.21.0.0/16 -i br-8670eabfe12d -j DROP
Then delete them:

# iptables -D DOCKER-ISOLATION-STAGE-1 ! -s 172.21.0.0/16 -o br-8670eabfe12d -j DROP
# iptables -D DOCKER-ISOLATION-STAGE-1 ! -d 172.21.0.0/16 -i br-8670eabfe12d -j DROP
We could re-create them with the new IP range, but we do not, because that would block the container's Internet access. So, do NOT execute these:
# iptables -I DOCKER-ISOLATION-STAGE-1 ! -s 172.24.0.0/16 -o br-8670eabfe12d -j DROP
# iptables -I DOCKER-ISOLATION-STAGE-1 ! -d 172.24.0.0/16 -i br-8670eabfe12d -j DROP
At this point, we should be able to ping the host from the container and vice versa.
Tip: it is even better if you try to send some messages with netcat from one host to the other. Start a server on one host with nc -l -p 5000 and connect to it from the other with nc <ip> 5000. Then start chatting. The same messages should appear on both sides.
Connecting the network to the internet
The last step is to connect the default gateway to the virtualnet0 bridge. This connection is made by the vnetveth0 -- vnetveth1 VETH pair:
# ip link add vnetveth0 type veth peer name vnetveth1
# ip link set dev vnetveth1 netns router
# ip netns exec router ip addr add 172.24.0.1/16 dev vnetveth1
# ip netns exec router ip link set dev vnetveth1 up
# ip link set dev vnetveth0 master virtualnet0
# ip link set dev vnetveth0 up
Enable PAT forwarding for vnetveth1 in the router:
# export WAN=enp0s25 LAN=vnetveth1
# ip netns exec router \
    iptables -A FORWARD -i $LAN -o $WAN -j ACCEPT
# ip netns exec router \
    iptables -A FORWARD -o $LAN -i $WAN -m state --state RELATED,ESTABLISHED -j ACCEPT
Port forwarding example
If you wish to make your servers reachable on your physical LAN IP, you can apply additional rules like this:
# export SERVER_IP=10.0.22.2
# export EXTERNAL_PORT=4000
# export INTERNAL_PORT=4000
# ip netns exec router \
    iptables -t nat -A PREROUTING -p tcp --dport $EXTERNAL_PORT -j DNAT --to-destination $SERVER_IP:$INTERNAL_PORT
# ip netns exec router \
    iptables -A FORWARD -p tcp -d $SERVER_IP --dport $EXTERNAL_PORT -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
We also need to apply a technique called NAT loopback in order to allow clients on the internal LAN (the only one being the host OS sitting at 10.0.22.2) to access the server at its public IP address. You can read more about this problem here.
The basic idea is that we apply the same MASQUERADE rule that we apply to the outside world if these two criteria are met:
- The request comes from the LAN IP range (10.0.22.0/30 in this case)
- The request targets the enabled destination port
# export HOST_VETH_NETWORK=10.0.22.0/30
# ip netns exec router \
    iptables -t nat -A POSTROUTING -s $HOST_VETH_NETWORK -d $SERVER_IP -p tcp --dport $EXTERNAL_PORT -j MASQUERADE
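The two-criteria match above can be sketched as a toy decision function. This is illustrative only: the real matching is done by the iptables rule, and the needs_loopback_masq helper is made up for this sketch.

```shell
#!/bin/sh
# Toy sketch of the NAT-loopback match (illustration only).
EXTERNAL_PORT=4000

needs_loopback_masq() {
  src="$1"; dport="$2"
  case "$src" in
    10.0.22.[0-3]) in_lan=yes ;;  # 10.0.22.0/30 covers .0 through .3
    *)             in_lan=no ;;
  esac
  if [ "$in_lan" = yes ] && [ "$dport" = "$EXTERNAL_PORT" ]; then
    echo MASQUERADE
  else
    echo no-match
  fi
}

needs_loopback_masq 10.0.22.2 4000     # host OS hitting the forwarded port -> MASQUERADE
needs_loopback_masq 172.24.1.100 4000  # not in the host VETH range -> no-match
```

If both criteria match, the packet gets the same MASQUERADE treatment as outbound traffic, so the server sees the router as the source and replies through it instead of answering the host directly.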
And we’re finally done! We have created a mutual LAN for virtual appliances and simulated a home network setup. All the machines and the container should have an Internet access and should see be able reach each other. We could even create port forwarding rules.
It was not an easy run, but it was fun! Well, sometimes at least. I was quite angry at the end. But it was worth it. Really.
(Also, TL;DR for the first question in the previous episode: yes, it is possible.)
Now get back to work ;)