Creating a Unified Virtual LAN

04 September 2020

See the previous episode for the concept.

Now it is time to transform this setup:

Into this - so that all the virtual appliances can reach each other, while having an Internet connection. We’re pretty much simulating a home LAN here.

Planning the network

In the end we will have two special networks:

  1. virtualnet0 - the unified virtual LAN, the goal of this project:

    • I’m going to use the subnet 172.24.0.0/16
    • I’m assigning the host the address 172.24.0.100.
    • The default gateway of this subnet is going to sit at 172.24.0.1. We will examine that in more depth later.
    Service               Address pool
    Docker containers     172.24.1.x
    QEMU hosts            172.24.2.x
    VirtualBox hosts      172.24.3.x

    I’m turning DHCP off and I am going to assign all IP addresses manually.

  2. router_veth0 -- router_veth1: a dedicated link connecting the default namespace to the PAT router.

We are going to move enp0s25 from the default namespace to the PAT router’s stack. Therefore, we need a link to the router in order to keep our Internet connection.

Creating the starting scenario

A virtual machine on vboxnet0

I already have a fresh OpenSUSE Leap 15.2 installation in a virtual machine, suitable for testing the network reachable through the vboxnet0 interface. VirtualBox’s host-only networking mode creates the setup shown in the first figure.

The IP address and subnet mask configured here are the ones assigned to the host interface. I’m ignoring this setting for now, as I am going to reconfigure the interface manually later.

$ ip addr show vboxnet0
10: vboxnet0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 0a:00:27:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.1/24 brd 192.168.56.255 scope global vboxnet0
       valid_lft forever preferred_lft forever
    inet6 fe80::800:27ff:fe00:0/64 scope link 
       valid_lft forever preferred_lft forever

VirtualBox survival tips

I hope I’m saving you some spare hours:

  • VirtualBox tends to reset the vboxnet0 interface and reassign the IP address it thinks we need, typically when you start or shut down a VM. Check it occasionally and flush it if that happens (see the recovery snippet after this list).
  • Sometimes packets arrive at the machine, you can even see them in the tcpdump output, but the end application never receives them. I have no idea why this happens and spent a lot of time trying to figure it out. But sometimes, if you shut the VM down, turn it back on, then flush and configure the interface again, the problem “solves” itself. If you know the reason, please email me so I can state it here.
    • For me, it always happened with the bridged networking mode. You should probably stick to the host-only networking mode.
  • Stick to the linux-lts kernel on the host OS. VirtualBox can crash the Zen kernel and other variants.
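
A quick recovery sketch for when that reset happens - this assumes the virtualnet0 bridge and the addressing introduced later in this post:

## flush whatever address VirtualBox re-assigned and re-attach the interface to our bridge
# ip addr flush dev vboxnet0
# ip link set vboxnet0 master virtualnet0
# ip link set vboxnet0 up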

A Docker container in an isolated network segment

First, I’m creating a new Docker network without the built-in PAT. The first chart does not actually depict this situation exactly: with this setup, I’m essentially cutting the Internet connection - the “Virtual PAT router” part - and starting from a clean state, where none of the three networks has a link to the outside world.

# docker network create --internal docker_custom

This command has created the bridge br-8670eabfe12d. (this bridge certainly does not know what abuses it is going to face)

I’m starting up a container connected to that network, using the image praqma/network-multitool for testing. I’m leaving it sleeping for now, for later access via docker exec.

# docker run -d --name dockernet_tester  \
	--network docker_custom --cap-add=NET_ADMIN \
	--rm praqma/network-multitool sleep infinity

Without the --cap-add=NET_ADMIN option, we won’t be able to alter the network settings from the inside.
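
For instance, had we omitted the flag, the address assignment we will perform later would be refused from inside the container (the exact error wording may differ):

## hypothetical run without NET_ADMIN
# docker exec dockernet_tester ip addr add 172.24.1.100/16 dev eth0
RTNETLINK answers: Operation not permitted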

For now, I’m ignoring the IP addresses that Docker assigned to the host OS and to the container.

An unhooked QEMU machine

I’m creating a QEMU machine that provides a TAP virtual network interface for the host.

For this, I’m using a previously set up minimal Alpine installation. It has only a minimal set of tools installed such as vim and some network utilities.

# qemu-system-x86_64 \
        -m 256 \
        -display sdl -vga qxl \
        -net nic -net tap,ifname=qemutap0,script=no,downscript=no \
        -hda ./alpine.qcow2

I’m running QEMU without --enable-kvm, because only one hypervisor at a time can make use of the hardware virtualization capabilities of the host machine (and VirtualBox is already using them). I’m compensating for that by using a very lightweight distribution.

The new network interface appears as expected in the host OS:

# ip link set dev qemutap0 up
# ip addr show qemutap0            
8: qemutap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether f6:9c:37:be:7a:d8 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f49c:37ff:febe:7ad8/64 scope link 
       valid_lft forever preferred_lft forever

Creating the PAT router namespace

This is the network namespace that is going to be responsible for connecting the appliances and the host, through enp0s25, to the outside world.

# ip netns add router

Be careful, as the next steps will temporarily cut your Internet connection. If you do not succeed, you may lose it “permanently” - well, until the next reboot, because all these tweaks are ad hoc and won’t be applied at system boot. You may want to save this page for offline reading ;)

# ip link set dev enp0s25 netns router

## assuming that you get your IP via DHCP on your local network
# ip netns exec router dhclient enp0s25		
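
If your physical LAN uses static addressing instead of DHCP, a rough equivalent is to configure the interface by hand - the address below is only a placeholder for your own network, and the default route is added later, in the PAT section:

# ip netns exec router ip link set dev enp0s25 up
# ip netns exec router ip addr add 192.168.0.50/24 dev enp0s25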

Setting up PAT

Now the router namespace has a usable link, however, our default namespace does not. Thus, we do not have an Internet connection right now, feel free to check it. It is time now to configure our router for network address translation.

  • Prefix a command with ip netns exec router to execute it in the router namespace.
  • Execute ip netns exec router bash to get a shell in the namespace.

We’re applying a default-deny policy for the forwarded packets:

# ip netns exec router iptables -P FORWARD DROP

We’re enabling IP forwarding in the network namespace. Note the sh -c wrapper: with a plain redirection, the outer shell would open /proc/sys/net/ipv4/ip_forward in the default namespace, not in the router’s.

# ip netns exec router sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'

IP forwarding is the ability for an operating system to accept incoming network packets on one interface, recognize that it is not meant for the system itself, but that it should be passed on to another network, and then forwards it accordingly. (ref: openvpn.net)

If we did not configure this, incoming packets would be ignored on the router_veth1 interface.
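
To double-check that the flag landed in the router namespace and not in the default one, you can read it back; it should print 1:

# ip netns exec router cat /proc/sys/net/ipv4/ip_forward
1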

We are establishing a link (router_veth0 -- router_veth1) to connect the default and the router namespaces. We use a VETH pair to achieve that.

# ip link add router_veth0 type veth peer name router_veth1
# ip link set dev router_veth1 netns router

# ip netns exec router ip addr add 10.0.22.1/30 dev router_veth1
# ip netns exec router ip link set dev router_veth1 up
# ip addr add 10.0.22.2/30 dev router_veth0
# ip link set dev router_veth0 up

From now on, instead of enp0s25, router_veth0 is our exit to the outside network (in the default namespace). Therefore, we need to update our default gateway.

# ip route add default dev router_veth0 via 10.0.22.1

The PAT router’s default route points out through enp0s25. I like to think of it as the WAN port of the virtual router. It is configured automatically by my physical LAN’s DHCP service, but you may add it manually:

# ip netns exec router ip route add default dev enp0s25 via 192.168.0.1
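
At this point the router namespace itself should already reach the outside world on its own (assuming 1.1.1.1 is reachable from your physical network):

# ip netns exec router ping -c 1 1.1.1.1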

Configuring PAT with IPTables MASQUERADE

This kind of NAT setup maps one IP address (the one we have on enp0s25, our network card) to many IP addresses (the virtual network’s appliances). By default, every client on the network can initiate connections to the outside world, but the outside world is only allowed to respond to existing connections.

That is, we can request data from google.com, but google.com cannot initiate a connection to a machine behind our NAT; it can only respond to one of our requests.

Not only is it disallowed, the remote server would not even have a usable address to target, since our internal addresses mean nothing outside our network. What it can do instead is send the response to our public IP address, directed at the source port from which our packets arrived. Our router maps these ports dynamically to our clients and forwards the replies back to the appropriate one. You may learn more about PAT in the CCNA study guides.
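
Once the rules below are in place, you can watch these dynamic mappings live in the router namespace - assuming you have the conntrack CLI from conntrack-tools installed; it is not needed for anything else here:

# ip netns exec router conntrack -L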

We are configuring IPTables rules of the PAT router stack to achieve this exact behavior.

  1. If we have no idea what to do with a particular packet, we choose the most secure option. That is, we drop it.

     # ip netns exec router iptables -P FORWARD DROP
    
  2. We configure the outbound interface and ask IPTables to hide (mask) our internal IP addresses behind the public IPs. Without this step, the remote server would receive our packet with a source address valid only in our internal network. There would be no way to know where to send back the response.

     # export WAN=enp0s25
     # ip netns exec router iptables -t nat -A POSTROUTING -o $WAN -j MASQUERADE
    
  3. We allow the clients to initiate traffic from the inside, and allow the outside world to respond to existing connections.

     # export WAN=enp0s25 LAN=router_veth1
     # ip netns exec router \
         iptables -A FORWARD -i $LAN -o $WAN -j ACCEPT
     # ip netns exec router \
         iptables -A FORWARD -o $LAN -i $WAN -m state --state RELATED,ESTABLISHED -j ACCEPT
    

    To enable the outside world (everything past our network card) to initiate connections to servers running on the host machine, we either have to apply port forwarding with IPTables one by one, or choose a special interface to forward the packets to by default.

    Our virtual network setup works similarly to a plain old LAN, where you have to configure the router explicitly to allow external connections in.
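
With these rules in place, the default namespace should have its Internet connection back through the router. A quick sanity check, pinging by IP to leave DNS out of the picture:

# ping -c 3 1.1.1.1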

Connecting the network segments

Now that we have a working router stack, we create a new virtual bridge and connect all our virtual appliances to it, as well as our host. We are going to connect the router to it too, to provide a gateway to the Internet.

First, we create the virtual bridge with the interface virtualnet0.

# ip link add name virtualnet0 type bridge
# ip link set virtualnet0 up

Then I set the host OS IP:

# ip addr add 172.24.0.100/16 dev virtualnet0

Now it is time to plug everything together according to the second chart. First, we plug in the virtual interfaces of QEMU and VirtualBox into the bridge:

# ip addr flush vboxnet0
# ip link set vboxnet0 master virtualnet0

# ip link set qemutap0 master virtualnet0

After that, we create a VETH pair to connect the Docker bridge and the virtualnet0 bridge. Unfortunately, longer interface names are not allowed (the limit is 15 characters).

# ip link add veth-dv-0 type veth peer name veth-dv-1
# ip addr flush br-8670eabfe12d
# ip link set veth-dv-0 master virtualnet0
# ip link set veth-dv-1 master br-8670eabfe12d

# ip link set veth-dv-0 up
# ip link set veth-dv-1 up

Finally, we check the results - we print the virtual cables connected to our virtual switches (bridges):

# bridge link
10: vboxnet0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master virtualnet0 state disabled priority 32 cost 100 
52: qemutap0: <BROADCAST,MULTICAST> mtu 1500 master virtualnet0 state disabled priority 32 cost 100 
55: vethXXXXXXX@ifXX: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-8670eabfe12d state forwarding priority 32 cost 2 
57: veth-dv-1@veth-dv-0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master br-8670eabfe12d state disabled priority 32 cost 2 
58: veth-dv-0@veth-dv-1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master virtualnet0 state disabled priority 32 cost 2 

Essentially, what we are seeing now is the same as our second network chart.

Assigning the client IP addresses

Docker

# docker exec -it dockernet_tester bash
bash-5.0# ip addr flush eth0
bash-5.0# ip addr add 172.24.1.100/16 dev eth0
bash-5.0# ip route add default dev eth0 via 172.24.0.1

QEMU

# ip addr flush dev eth0
# ip addr add 172.24.2.100/16 dev eth0
# ip link set dev eth0 up
# ip route add default dev eth0 via 172.24.0.1

VirtualBox

# ip addr flush dev eth0
# ip addr add 172.24.3.100/16 dev eth0
# ip link set dev eth0 up
# ip route add default dev eth0 via 172.24.0.1

Duct taping IPTables in the default namespace

We are only going to apply some hacks here and there to make Docker and the virtual appliances cooperate in the new network. What you will see works and is probably secure, but the client programs, especially Docker, are not prepared for the scenario where we override IPTables rules by hand. In the worst case, they may interfere with our modifications by accident in a way that leads to security breaches.

I’m sure there is a much better and more official way of configuring Docker to stay out of the way, but let me do it manually for now, only for demonstration purposes (as always).

IPTables Debugging 101

While testing with ICMP (ping), seeing timed-out requests can be frustrating. Luckily, there are tools that make our work easier and show what is actually happening to our packets in the system. With these, we can pinpoint the place of the required changes much more easily.

The tools tcpdump and wireshark are really useful to track which packets appear on what interface.

I also advise installing the Python 2 package watchall. It is a replacement for the watch utility with the ability to scroll: it runs a command periodically and shows the differences in the output. It will be very useful while you’re debugging the firewall rules.

Once it is installed, we can examine the byte counters of all iptables rules in real time. This is very practical while we’re investigating where requests get dropped.

# iptables -Z  # optionally reset the byte counters
# watchall -n1 -d -- iptables -L -v 

There is another trick I recommend. If you find that one of your iptables chains is dropping packets by default, but you’re not sure what filter rules you could create to avoid that (preferably without side effects), you can append a rule to the end of the chain that matches all the remaining packets (the ones that would otherwise be dropped) and logs them first.

iptables -A FORWARD -m limit --limit 2/min -j LOG --log-prefix "IPTables-Dropped: " --log-level 4

(reference: thegeekstuff.com)

After setting up that rule, you may attempt your request again and examine the system logs, with journalctl -f on a systemd-based Linux for example. You may see lines like this:

aug 30 13:07:42 ceres kernel: IPTables-Dropped: IN=virtualnet0 OUT=virtualnet0 PHYSIN=veth-dv-0 PHYSOUT=qemutap0 MAC=52:54:00:12:34:56:02:42:ac:15:00:02:08:00 SRC=172.24.1.100 DST=172.24.2.100 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=57703 DF PROTO=ICMP TYPE=8 CODE=0 ID=40 SEQ=1 

Preventing this packet from being dropped leads us to the first iptables abuse:

IPTables hack #1: accepting bridge traffic

Append this rule to allow the appliances on the bridge to reach each other. (Bridged frames traverse the iptables FORWARD chain when the br_netfilter module is loaded - Docker typically loads it - which is why the restrictive FORWARD rules were catching them.)

iptables -A FORWARD -i virtualnet0 -o virtualnet0 -j ACCEPT

IPTables hack #2: removing the Docker isolation rules

Find the problematic rules with:

# iptables-save | grep br-8670eabfe12d
-A FORWARD -i br-8670eabfe12d -o br-8670eabfe12d -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 ! -s 172.21.0.0/16 -o br-8670eabfe12d -j DROP
-A DOCKER-ISOLATION-STAGE-1 ! -d 172.21.0.0/16 -i br-8670eabfe12d -j DROP

Remove them:

# iptables -D DOCKER-ISOLATION-STAGE-1 ! -s 172.21.0.0/16 -o br-8670eabfe12d -j DROP
# iptables -D DOCKER-ISOLATION-STAGE-1 ! -d 172.21.0.0/16 -i br-8670eabfe12d -j DROP

We could recreate them with the new IP range, but we do not, because that would block the Internet access of the container. So, do NOT execute these:

# iptables -I DOCKER-ISOLATION-STAGE-1 ! -s 172.24.0.0/16 -o br-8670eabfe12d -j DROP
# iptables -I DOCKER-ISOLATION-STAGE-1 ! -d 172.24.0.0/16 -i br-8670eabfe12d -j DROP

At this point, we should be able to ping the host from the container and vice versa.
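
For example, using the addresses we assigned above:

# docker exec dockernet_tester ping -c 3 172.24.0.100
# ping -c 3 172.24.1.100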

Tip: it is even better if you try to send some messages with netcat from one host to the other.

Start a server on one host with nc -l -p 5000 and connect to it on the other with nc <ip> 5000. Then, start chatting. The same messages should appear on both sides.
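
A concrete round with the QEMU guest and the Docker container from above:

## inside the QEMU guest (172.24.2.100): start a listener
# nc -l -p 5000

## from the Docker container: connect and start typing
# docker exec -it dockernet_tester nc 172.24.2.100 5000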

Connecting the network to the internet

The last step is to connect the default gateway to the virtualnet0 bridge. This connection is made by the vnetveth0-vnetveth1 pair.

# ip link add vnetveth0 type veth peer name vnetveth1
# ip link set dev vnetveth1 netns router
# ip netns exec router ip addr add 172.24.0.1/16 dev vnetveth1
# ip netns exec router ip link set dev vnetveth1 up

# ip link set dev vnetveth0 master virtualnet0
# ip link set dev vnetveth0 up

Enable PAT forwarding for vnetveth1 in the router.

# export WAN=enp0s25 LAN=vnetveth1
# ip netns exec router \
    iptables -A FORWARD -i $LAN -o $WAN -j ACCEPT
# ip netns exec router \
    iptables -A FORWARD -o $LAN -i $WAN -m state --state RELATED,ESTABLISHED -j ACCEPT
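
The clients should now reach the Internet through the router as well - for example, from the container (pinging by IP, since we have not configured DNS for it):

# docker exec dockernet_tester ping -c 3 1.1.1.1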

Port forwarding example

If you wish to make your servers reachable on your physical LAN IP, you can apply additional rules like this:

# export SERVER_IP=10.0.22.2
# export EXTERNAL_PORT=4000
# export INTERNAL_PORT=4000

# ip netns exec router \
    iptables -t nat -A PREROUTING -p tcp --dport $EXTERNAL_PORT -j DNAT --to-destination $SERVER_IP:$INTERNAL_PORT
# ip netns exec router \
    iptables -A FORWARD -p tcp -d $SERVER_IP --dport $EXTERNAL_PORT -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT

We also need to apply a technique called NAT loopback (hairpin NAT) in order to allow clients on the internal LAN - the only such client being the host OS sitting at 10.0.22.2 - to access the server at its public IP address. You can read more about this problem here.

The basic idea is that we apply the same MASQUERADE rule that we apply towards the outside world whenever these two criteria are met:

  • the request comes from the LAN IP range (10.0.22.0/30 in this case),
  • the request is destined for the forwarded port.

# export HOST_VETH_NETWORK=10.0.22.0/30
# ip netns exec router \
    iptables -t nat -A POSTROUTING -s $HOST_VETH_NETWORK -d $SERVER_IP -p tcp --dport $EXTERNAL_PORT -j MASQUERADE
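
To test the forwarding, you can start a throwaway server on the host and query it from another machine on your physical LAN - python3 on the host and curl on the other machine are assumptions here, any server/client pair will do:

## on the host, in the default namespace (10.0.22.2)
# python3 -m http.server 4000

## from another machine on the physical LAN, using the address of enp0s25
$ curl http://<enp0s25 address>:4000/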

Celebrate!

And we’re finally done! We have created a shared LAN for the virtual appliances and simulated a home network setup. All the machines and the container should have Internet access and should be able to reach each other. We can even create port forwarding rules.

It was not an easy run, but it was fun! Well, sometimes at least. I was quite angry by the end. But it was worth it. Really.

(Also, TL;DR for the first question in the previous episode: yes, it is possible.)

Now get back to work ;)