[NCLUG] Connectivity problems to ebay, (and possibly others)

John L. Bass jbass at dmsd.com
Tue Dec 27 13:57:17 MST 2005


Hi Guys,

For the last month I've been working to resolve a very interesting problem
connecting to ebay (and, I believe, doubleclick and a few DNS servers that
cause SERVFAILs from bind). Pages served from various servers hang until
the browser times out, and then the browser retries with a different
server. I've seen the problem mostly on various ebay servers, but I've also
seen a similar problem (possibly a different one) with ads served by
various other services, mostly doubleclick.

The problem is most likely a checksum or CRC failure in a router or load
balancer between CWX and ebay's Denver connection. A change as small as a
different IP TTL value alters the behavior, and so does changing the source
IP address: each source address causes a different subset of the following
addresses to fail (sometimes intermittently):

        [root at fastbox log]# telnet 216.113.178.140 80
        telnet: connect to address 216.113.178.140: Connection timed out
        [root at fastbox log]# telnet 216.113.178.140 80
        telnet: connect to address 216.113.178.140: Connection timed out
        [root at fastbox log]# telnet 216.113.185.41 80
        telnet: connect to address 216.113.185.41: Connection timed out
        [root at test ~]# telnet 216.113.178.10 80
        telnet: connect to address 216.113.178.10: Connection timed out
        [root at Bass ~]# telnet 216.113.176.11 80
        telnet: connect to address 216.113.176.11: Connection timed out

On several Class C's I've tested from, there are IPs which never seem to
fail on any of these, some which fail on one or more every time, and
others which fail only one connection in ten or fewer. So it's something
of a data-pattern-sensitive intermittent problem. The problem has been
reproduced intermittently at our ISP's help desk, which indicates to me
that the problem is broader than just our network connection.

For testing I've just telnetted to each of the above addresses and, if it
connects, hit return twice, then used up arrow to run it again, about 5-6
times each. I've scanned about a dozen IPs on my Class C, and other
machines in CWX on two different Class C's, with similar results.
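
The manual telnet test above can be automated. What follows is only a
minimal sketch in Python, not the procedure actually used: the function
names, the 10 second timeout and the default of 6 attempts are assumptions
chosen to mirror the "5-6 times each" described above. It only checks
whether the TCP connection completes, which is the step that timed out in
the telnet sessions.

```python
import socket

# Addresses from the failing telnet sessions above (port 80 in each case).
TARGETS = ["216.113.178.140", "216.113.185.41",
           "216.113.178.10", "216.113.176.11"]


def try_connect(host, port=80, timeout=10.0):
    """Return True if a TCP connection to host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def count_failures(host, port=80, attempts=6, timeout=10.0,
                   connect=try_connect):
    """Try to connect `attempts` times and return how many attempts failed."""
    return sum(1 for _ in range(attempts)
               if not connect(host, port, timeout))
```

Calling count_failures(host) for each entry in TARGETS gives a failure
count per address. Running it from several source machines shows which
ones never fail, always fail, or fail only occasionally.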

Our ISP has been unable to determine whether the problem is with one of
their routers, at the Denver NAP, or at ebay. The indication from them this
morning is that we will just have to live with the problem, since no one
else is complaining and ebay believes it's not their problem. The problem
exists from our ISP with both Level3 and Cogent transport, with the twist
that the set of source IPs that work or fail changes between them. That
indicates that as little as changing the TTL in the IP header provokes or
avoids the failure, strongly suggesting a checksum/CRC hardware failure or
possibly a hashing failure in a load balancer.
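
One way to probe the TTL and source-address sensitivity directly is to set
both on the socket before connecting. This is a hedged sketch, not
something the testing above actually did: connect_with_ttl and sweep_ttl
are hypothetical helper names, and it assumes that changing the initial TTL
at the sender is a fair stand-in for the TTL differences seen across
transit paths. It uses the standard IP_TTL socket option and an explicit
bind() to pick the source address.

```python
import socket


def connect_with_ttl(host, port=80, ttl=64, source_ip=None, timeout=10.0):
    """Try one TCP connect with a chosen initial IP TTL and source address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        # Initial TTL for packets sent on this socket.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        if source_ip is not None:
            # Port 0 lets the OS choose the source port.
            sock.bind((source_ip, 0))
        sock.connect((host, port))
        return True
    except OSError:
        # Note: a source_ip not configured on this host also lands here.
        return False
    finally:
        sock.close()


def sweep_ttl(host, ttls, port=80, source_ip=None, timeout=10.0):
    """Map each TTL in `ttls` to whether the connection succeeded."""
    return {ttl: connect_with_ttl(host, port, ttl, source_ip, timeout)
            for ttl in ttls}
```

For example, sweep_ttl("216.113.178.140", [60, 61, 62, 63, 64]) repeated
from each available source address would show whether the failing set
moves with the TTL, as the Level3 versus Cogent behavior suggests.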

Try telnetting to each of the addresses above a few times and let me know if
you are also seeing the problem. If you are, call your ISP's help desk and
report it too.  If they reply "no problem found", be persistent.  If you have
more than one public IP address, try it from several. Have your friends and
coworkers do the same.

Thanks,
John Bass
CWX.net


