Advanced Connectivity Troubleshooting
This guide covers the basics of troubleshooting connectivity between ClearOS and other networks. It requires the use of command line tools, so you will need to connect to your server through the command line console or through an SSH terminal client such as PuTTY.
Identifying the problem and the right tool
Connectivity between networks is often described using the OSI model, and we will use that model here to explain which tools are useful for detecting problems at each layer.
Data-unit | Layer | Function |
---|---|---|
Data | 7.Application | Network process to application |
Data | 6.Presentation | Data representation, encryption and decryption, convert machine dependent data to machine independent data |
Data | 5.Session | Interhost communication, managing sessions between applications |
Segments | 4.Transport | End-to-end connections, reliability and flow control |
Packet/Datagram | 3.Network | Path determination and logical addressing |
Frame | 2.DataLink | Physical addressing |
Bit | 1.Physical | Media, signal and binary transmission |
On ClearOS, incoming traffic passes through these stages in order:
- physical network
- logical network
- firewall
- application
For troubleshooting, we will use the following tools:
- ping (included on ClearOS)
- ifconfig (included on ClearOS)
- ethtool (included on ClearOS)
- tcpdump (install at command line with 'yum -y install tcpdump')
Ping
Ping is by far the most useful tool on the internet for troubleshooting issues with connectivity. It is the 'leatherman of the interwebs'. We will use ping throughout this guide for several reasons:
- It's easy
- We can see the reply to each packet.
- We can see when we are BEING pinged! (See tcpdump)
To ping an IP address from the server, use the following example:
ping 192.168.1.123
When using multiwan, you can specify the interface that you want to use to ping from:
ping -I eth0 8.8.8.8
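When comparing links, it can help to send a fixed number of probes and look only at the loss figure. The following is a minimal sketch, assuming the Linux ping summary line format; `ping_loss` is a hypothetical helper name, not a ClearOS tool.

```shell
#!/bin/bash
# Print the packet-loss percentage found in ping's summary line on stdin.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Typical use on the server (send 4 probes out of eth0, then stop):
#   ping -c 4 -I eth0 8.8.8.8 | ping_loss

# Offline demonstration with an illustrative summary line:
echo "4 packets transmitted, 3 received, 25% packet loss, time 3004ms" | ping_loss   # prints 25
```

Running the same command once per WAN interface gives a quick side-by-side view of which link is dropping packets.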
ifconfig
The ifconfig tool can show you the actual settings of the interfaces on the system. To show all the settings and statistics for all the interfaces run:
ifconfig
To show a specific interface, include the name of the interface:
ifconfig eth0
This program will show you errors on the interface. This is really important if you suspect that your connection to your ISP or network is 'flaky'.
Let's break down the results line by line:
eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:10.1.1.1  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16015427 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25563269 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1668374532 (1.5 GiB)  TX bytes:34073500057 (31.7 GiB)
          Interrupt:16
In the above example,
- Line 1 contains
  - the network device name as referenced by Linux (eth0)
  - the type of network device (Ethernet)
  - the real or assumed MAC address of the network card (HWaddr)
- Line 2 contains
  - the IP address assigned to this logical interface
  - the broadcast address used by this interface (listening/sending broadcasts)
  - the subnet mask
- Line 3 contains various network flags including,
  - the status flag (UP)
  - broadcast flag (BROADCAST)
  - required resources allocated flag (RUNNING)
  - multicast mode flag (MULTICAST)
  - size of the maximum transmission unit (MTU)
  - priority of interfaces (Metric)
- Line 4 contains
  - number of received packets since interface was started (RX)
  - number of errors (errors)
  - number of dropped packets (dropped)
  - number of buffer overruns (overruns)
  - number of malformed frames (frame)
- Line 5 contains
  - number of transmitted packets since interface was started (TX)
  - number of errors (errors)
  - number of dropped packets (dropped)
  - number of buffer overruns (overruns)
  - number of carrier errors (carrier)
- Line 6 contains
  - number of collisions (collisions)
  - transmit queue length (txqueuelen)
- Line 7 contains
  - received bytes (RX bytes)
  - transmitted bytes (TX bytes)
- Line 8 contains
  - the IRQ or Interrupt request resource of the network card (Interrupt)
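When you suspect a flaky link, the error and dropped counters on the RX and TX lines are the ones to watch. The following is a minimal sketch that pulls them out, assuming the older colon-separated ifconfig layout shown in this guide (newer net-tools releases print a different layout); `iface_errors` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the error and dropped counters from ifconfig output read on stdin.
# Assumes the older "name:value" layout shown in this guide.
iface_errors() {
    awk '
        $1 ~ /^(RX|TX)$/ && $2 ~ /^packets:/ {
            for (i = 3; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == "errors")  errs  = kv[2]
                if (kv[1] == "dropped") drops = kv[2]
            }
            print $1, "errors=" errs, "dropped=" drops
        }'
}

# Typical use on the server:
#   ifconfig eth0 | iface_errors
```

Running it twice a few minutes apart shows whether the counters are still climbing, which matters more than their absolute values.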
The ifconfig tool can also bring an interface up or down. Be careful with this: if you take down the interface that carries your SSH session, you cut off the very connection you are using to enter commands.
ethtool
The program ethtool is useful for looking at the physical link settings of a particular NIC, such as the supported and negotiated speed, duplex, auto-negotiation and whether a link is detected.
Here is an example of the output from ethtool:
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
                               drv probe link timer ifdown ifup rx_err tx_err
        Link detected: yes
One of the most useful lines here is the last (Link detected). If this does not say yes, the physical media is not connected.
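Since only a few of these lines matter for a quick check, they can be condensed into one. The following is a minimal sketch, assuming the "Name: value" layout ethtool prints; `link_summary` is a hypothetical helper name.

```shell
#!/bin/bash
# Condense ethtool output (read on stdin) to link state, speed and duplex.
link_summary() {
    awk -F': ' '
        /Speed:/         { speed  = $2 }
        /Duplex:/        { duplex = $2 }
        /Link detected:/ { link   = $2 }
        END { print "link=" link, "speed=" speed, "duplex=" duplex }'
}

# Typical use on the server:
#   ethtool eth0 | link_summary
```

For the example output above, this would report a detected link at 100Mb/s, full duplex.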
telnet
Telnet is a useful tool for determining whether connectivity is happening over TCP. While it is no help for ICMP or UDP, it is quite valuable for the many protocols that are TCP oriented. Telnet is useful here because of how TCP opens a session: the client sends a 'synchronize' (SYN) packet, the server answers with a SYN acknowledgement (SYN-ACK), and the client confirms with an ACK. Telnet starts its session as soon as this handshake completes. So if you telnet to a TCP port and the telnet session begins, you know the server is listening! This happens before any encryption or authentication, so it serves the purpose of a TCP ping. The beauty of this trick is that you don't need the real client program to determine whether your service and firewall are working properly. Additionally, this works on the server against itself using the loopback IP address, so there is nothing extra to install when troubleshooting firewalls, connections and services.
There are three outcomes here: the telnet client will time out (typically a firewall silently dropping the traffic, or an unreachable host), be refused (nothing is listening on that port, or a firewall is rejecting the connection), or connect (the service is running and is not being firewalled from the machine you are connecting from). If the connection works, you can exit telnet by pressing Ctrl+] (the ']' key is above the Enter key on a US keyboard). This will give you the 'telnet> ' command prompt. From that prompt, type 'quit'. This is what it can look like:
Good telnet to localhost SMTP
[root@server ~]# telnet localhost 25
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 server.example.com ESMTP Postfix
^]
telnet> quit
Connection closed.
Bad telnet to localhost non-existent service
[root@server ~]# telnet localhost 26
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Good telnet to server's SSH port
My-Mac-Computer:~ user$ telnet 192.168.1.1 22
Trying 192.168.1.1...
Connected to server.example.com.
Escape character is '^]'.
SSH-2.0-OpenSSH_5.3
^]
telnet> quit
Connection closed.
NOTE: the '^]' escape character is created by pressing the Control and ']' keys together (i.e. Ctrl+])
Bad telnet to server's non-listening port
My-Mac-Computer:~ user$ telnet 192.168.1.1 23
Trying 192.168.1.1...
telnet: connect to address 192.168.1.1: Connection refused
telnet: Unable to connect to remote host
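The same 'TCP ping' can be scripted without an interactive telnet session. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available; `tcp_check` is a hypothetical helper name. Unlike telnet, this sketch does not distinguish a refused connection from a timeout.

```shell
#!/bin/bash
# tcp_check HOST PORT: report whether a TCP handshake to HOST:PORT completes.
tcp_check() {
    local host=$1 port=$2
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout gives up after 5 seconds if nothing answers.
    if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed or filtered"
    fi
}

# Example: is the local SMTP service accepting connections?
#   tcp_check localhost 25
```

Because it prints a single word per check, it is easy to loop over several ports or hosts when testing firewall rules.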
Good 'telnet' to SSL IMAP
You can also take a look at SSL connections much the same way as telnet but you have to wrap your session with the ability to exchange certificates. For this, you can use 'openssl' instead of 'telnet'.
My-Mac-Computer:~ user$ openssl s_client -connect 192.168.1.1:993 -quiet
depth=0 C = CA, L = Toronto, O = ClearOS, OU = ClearOS, CN = system.lan, emailAddress = noreply@localhost
verify error:num=18:self signed certificate
verify return:1
depth=0 C = CA, L = Toronto, O = ClearOS, OU = ClearOS, CN = system.lan, emailAddress = noreply@localhost
verify return:1
* OK [CAPABILITY IMAP4rev1 LITERAL+ AUTH=PLAIN] Zarafa IMAP gateway ready
a1 LOGIN testuser asdfqwerQWER
a1 OK [CAPABILITY IMAP4rev1 LITERAL+ CHILDREN XAOL-OPTION NAMESPACE QUOTA IDLE] LOGIN completed
a2 LIST "" "*"
* LIST (\HasChildren) "/" "Public folders"
* LIST (\HasNoChildren) "/" "INBOX"
* LIST (\HasNoChildren) "/" "Outbox"
* LIST (\HasNoChildren) "/" "Deleted Items"
* LIST (\HasNoChildren) "/" "Sent Items"
* LIST (\HasNoChildren) "/" "Drafts"
* LIST (\HasNoChildren) "/" "Junk E-mail"
a2 OK LIST completed
a3 EXAMINE INBOX
* 3 EXISTS
* 3 RECENT
* FLAGS (\Seen \Draft \Deleted \Flagged \Answered $Forwarded)
* OK [PERMANENTFLAGS (\Seen \Draft \Deleted \Flagged \Answered $Forwarded)] Permanent flags
* OK [UIDNEXT 315623] Predicted next UID
* OK [UNSEEN 3] First unseen message
* OK [UIDVALIDITY 736314] UIDVALIDITY value
a3 OK [READ-ONLY] EXAMINE completed
a4 LOGOUT
* BYE Zarafa server logging out
a4 OK LOGOUT completed
tcpdump
The magic starts to happen with tcpdump. Why? tcpdump has the ability to show you every packet that goes in and out of your box. The problem, though, is that the output can be overwhelming, so it helps to filter for just the traffic you care about. If you cannot ping an IP address, tcpdump can tell you which side of the stack the problem is on.
Can see the packet in TCPDUMP
- problem exists with service (check service status and log files for that service)
- problem exists with iptables firewall (check connectivity with firewall disabled or with different/custom firewall rules)
Can NOT see the packet in TCPDUMP
- Check status of NIC (ethtool, ifconfig config and errors)
- Check driver of NIC (dmesg or kernel messages in logs)
- Check hops between NIC and source (troubleshoot by eliminating switches and connecting directly with crossover cable)
We've identified that tcpdump can be quite useful, so let's give some examples.
To see ping requests involving eth2:
tcpdump -i eth2 'icmp[icmptype]=icmp-echo'
To see ping replies involving eth2:
tcpdump -i eth2 'icmp[icmptype]=icmp-echoreply'
To see ping requests and replies involving eth2:
tcpdump -i eth2 'icmp[icmptype]=icmp-echoreply or icmp[icmptype]=icmp-echo'
To see ping requests and replies from only one HOST on the internet involving eth2:
tcpdump -i eth2 'host 8.8.8.8 and (icmp[icmptype]=icmp-echoreply or icmp[icmptype]=icmp-echo)'
The quotes keep the shell from interpreting the brackets and parentheses, and the parentheses make the host restriction apply to both requests and replies.
OK, so that covers ping, but your pings may be perfect while some other service still fails. For example, you may be running ClearOS as a mail server but you aren't getting any mail. Is it YOUR box that is blocking mail, or is your ISP firewalling your mail? Here, tcpdump comes to the rescue.
To see all traffic to or from a specific port (25) involving eth0:
tcpdump -i eth0 port 25
To see ALL traffic involving just one host:
tcpdump host 8.8.8.8
To see all traffic to or from a specific host on port 25 involving eth0:
tcpdump -i eth0 port 25 and host 8.8.8.8
Telnet is a great program for diagnosing problems with TCP. The first stage of a TCP connection is that the client sends a synchronize packet (SYN). Basically, the host says, 'are you listening?' and the server says, 'yes'. A SYN packet exchange can look like this via tcpdump:
11:48:56.049895 IP 16-7-1-40.ip.xmission.com.52721 > ia-in-f27.1e100.net.smtp: Flags [S], seq 2217988302, win 14600, options [mss 1460,sackOK,TS val 237735685 ecr 0,nop,wscale 7], length 0
11:48:56.104055 IP ia-in-f27.1e100.net.smtp > 16-7-1-40.ip.xmission.com.52721: Flags [S.], seq 4106579329, ack 2217988303, win 14180, options [mss 1430,sackOK,TS val 4918825 ecr 237735685,nop,wscale 6], length 0
OK. That looks like a bunch of gobbledygook, but what you see here is that the host (16-7-1-40.ip.xmission.com) sent a SYN packet (Flags [S]) to the server (ia-in-f27.1e100.net) and then, in line 2, the server sent back to the host a SYN-ACK packet (Flags [S.]) acknowledging it (ack 2217988303, which is the host's sequence number 2217988302 plus one).
So if this is getting blocked, you will see one and not the other. Telnet is disabled by default in Windows but you can easily turn it on.
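Spotting 'one and not the other' by eye gets tedious on a busy link, so the counting can be scripted. The following is a minimal sketch, assuming tcpdump prints flags in the `Flags [S]` / `Flags [S.]` form; `handshake_summary` is a hypothetical helper name.

```shell
#!/bin/bash
# Count SYN and SYN-ACK packets in tcpdump output read on stdin.
handshake_summary() {
    awk '
        /Flags \[S\],/   { syn++ }      # client SYN
        /Flags \[S\.\],/ { synack++ }   # server SYN-ACK
        END {
            printf "SYN=%d SYN-ACK=%d\n", syn, synack
            if (syn > 0 && synack == 0) {
                print "SYN seen but no SYN-ACK: check the firewall and the service"
            }
        }'
}

# Live use on the server (stop after 20 handshake packets on SMTP):
#   tcpdump -l -n -c 20 -i eth0 'tcp port 25 and tcp[tcpflags] & tcp-syn != 0' | handshake_summary
```

The filter in the usage line keeps only packets with the SYN bit set, so the capture stays small even when the mail server is busy.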
This is what a telnet session may look like:
[root@home]# telnet alt1.aspmx.l.google.com 25
Trying 74.125.133.27...
Connected to alt1.aspmx.l.google.com.
Escape character is '^]'.
The tcpdump program will display the first half of this SYN request.
Host
[root@home]# telnet alt1.aspmx.l.google.com 25
Trying 74.125.133.27...
Server
11:48:56.049895 IP 16-7-1-40.ip.xmission.com.52721 > ia-in-f27.1e100.net.smtp: Flags [S], seq 2217988302, win 14600, options [mss 1460,sackOK,TS val 237735685 ecr 0,nop,wscale 7], length 0
And the reply will look like this:
Host
Connected to alt1.aspmx.l.google.com.
Escape character is '^]'.
Server
11:48:56.104055 IP ia-in-f27.1e100.net.smtp > 16-7-1-40.ip.xmission.com.52721: Flags [S.], seq 4106579329, ack 2217988303, win 14180, options [mss 1430,sackOK,TS val 4918825 ecr 237735685,nop,wscale 6], length 0
To cancel telnet on the host, press Control+]. This returns you to the telnet prompt, where you can type 'quit'.
^]
telnet> quit
Connection closed.
traceroute
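The traceroute tool lists each router hop between your server and a destination, which helps show where along the path traffic stops. In Windows, the command is 'tracert'.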
mtr
'My traceroute' combines the functions of 'traceroute' and 'ping' into one network diagnostic tool.
'mtr' is not installed by default on ClearOS; to install it, execute the following command:
yum -d1 -y install mtr
Problems
Now that you have the tools let's talk about some of the typical problems one might run into and how to fix them.
MTU
MTU (Maximum Transmission Unit) is the largest packet size that an interface will send on the network without fragmenting it. Most systems use the default of 1500 bytes, but there are cases where your ISP may force a smaller value on you in order to provide connectivity. If your ISP has told you about an MTU restriction, then you must tweak your configuration in order to make things work properly.
It can be difficult to diagnose a bad MTU but there are some symptoms. For example:
- You can SSH to your server but the session hangs when you run certain commands (typically ones that produce a lot of output).
- You can ping the server but can't send mail to it, even though sending works from an SSH session to localhost.
- Small messages are delivered but big messages never are.
- Simple webpages work but big ones don't.
If this is the case, call your ISP and find out what the MTU setting is.
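Before or after that call, you can get a rough idea of what fits through the path by pinging with the Don't Fragment bit set. The following is a minimal sketch, assuming the Linux (iputils) ping with its `-M do` option; `payload_for_mtu` and `probe_mtu` are hypothetical helper names. The ping payload is the MTU minus 28 bytes (20 for the IPv4 header and 8 for the ICMP header).

```shell
#!/bin/bash
# Largest ICMP payload that fits in one packet of the given MTU:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# probe_mtu HOST MTU: send one ping with the Don't Fragment bit set.
# -M do forbids fragmentation, -s sets the payload size, -W waits 2 seconds.
probe_mtu() {
    ping -c 1 -W 2 -M do -s "$(payload_for_mtu "$2")" "$1" >/dev/null 2>&1
}

payload_for_mtu 1500   # prints 1472
payload_for_mtu 1380   # prints 1352

# Typical use on the server:
#   probe_mtu 8.8.8.8 1500 || echo "a 1500-byte packet did not get through"
```

A failed probe can also mean the host is simply unreachable, so confirm with a plain ping first and then step the MTU value down until the probe succeeds.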
Setting MTU for your ISP interface
To add an MTU to your interface, modify the file that correlates with your external interface, for example:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
You will add the following line with the appropriate MTU value from your ISP (1380 in our example):
MTU="1380"
Setting MTU restrictions for your gateway
If you are using ClearOS as a gateway, you may also need to 'clamp' the TCP maximum segment size (MSS) on that interface. Packets flagged Don't Fragment (DF) cannot be split to fit a smaller MTU, so clamping makes TCP connections through the gateway negotiate segments that already fit the MTU of the interface. Use the custom firewall tool to add this permanently (you can also test it by entering it at the command line).
To do this, add the following custom firewall rule (replace 'eth0' with the interface name for your external interface):
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o eth0 -j TCPMSS --clamp-mss-to-pmtu