
UDP Packets dropped on the way out of BHS datacenter


gedalya
2017-02-11, 11:08 PM
I've run into an issue where UDP packets sent over IPv4 from the BHS datacenter to destinations outside of OVH are dropped when the destination port is 3784 or 6784.

One situation where this is a problem is running a DNS server. DNS clients are supposed to send queries from a random source port, so occasionally that port will be 3784 or 6784. The reply is addressed back to that same port, so it never arrives at the other end.

This issue was confirmed on 3 different servers, in 3 different accounts for which I am the responsible sysadmin.
If you have a dedicated server at BHS, I expect you should be able to reproduce this problem easily.

If you have a DNS server running, you can test in the following way. Let's assume your DNS server's address is ns1.example.com, and let's assume it's either authoritative for example.com, or is a recursive resolver.

If you run this command from a location *outside* the OVH network:
dig example.com @ns1.example.com
you would expect to get an answer.
If you specify the source port, or run that command enough times for 3784 or 6784 to be chosen randomly, the request will time out.
You can try:
dig -b0.0.0.0#3784 example.com @ns1.example.com
or:
dig -b0.0.0.0#6784 example.com @ns1.example.com
and the query should time out in the same way.
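
If you would rather not repeat the command by hand, the checks above can be wrapped in a small loop. This is only a sketch: the function names probe_source_port and probe_ports and the DIG variable are made up for illustration, and it assumes a dig that accepts -b address#port, +time and +tries. Any nonzero exit from dig is reported as "no-reply", which also covers failures other than a dropped packet (for example, the source port already being in use on the machine you test from).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from any OVH or BIND tooling): query a DNS server
# from fixed UDP source ports and report which ports get an answer.

DIG=${DIG:-dig}   # command used to send the query; override for testing

# Usage: probe_source_port <server> <name> <source_port>
# Prints "<port> ok" if dig exits successfully, "<port> no-reply" otherwise.
probe_source_port() {
    local server=$1 name=$2 port=$3
    # -b 0.0.0.0#PORT pins the IPv4 source port; one try with a 2 second timeout.
    if "$DIG" -b "0.0.0.0#${port}" +time=2 +tries=1 "$name" "@${server}" >/dev/null 2>&1; then
        echo "${port} ok"
    else
        echo "${port} no-reply"
    fi
}

# Usage: probe_ports <server> <name> <source_port>...
probe_ports() {
    local server=$1 name=$2
    shift 2
    local port
    for port in "$@"; do
        probe_source_port "$server" "$name" "$port"
    done
}
```

Run from outside the OVH network, something like "probe_ports ns1.example.com example.com 3784 6784 6785" would be expected to report no-reply for the first two ports and ok for the last one if the server is affected.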

This issue is not specific to DNS. You can try to just send arbitrary UDP packets.
On a server *outside* the OVH network, run:
tcpdump -n host {ovh.server.ip.addr}
On your OVH server, run:
echo Test >/dev/udp/{ip.addr.of.others}/3784
echo Test >/dev/udp/{ip.addr.of.others}/6784
echo Test >/dev/udp/{ip.addr.of.others}/6785
The last packet should arrive; in my testing, the first two never show up.
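
The same test can be scripted so that several ports are sent in one go. Again just a sketch: send_udp_probes is a made-up name, and it relies on bash's built-in /dev/udp redirection, so it needs bash (built with network redirections enabled) rather than plain sh. Note that "sent" only means the local send succeeded; whether the packet arrives still has to be checked in tcpdump on the other side.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one small UDP datagram to each listed destination
# port using bash's /dev/udp/<host>/<port> redirection (bash only).
# Usage: send_udp_probes <destination-ip> <port>...
send_udp_probes() {
    if [ "$#" -lt 2 ]; then
        echo "usage: send_udp_probes <destination-ip> <port>..." >&2
        return 2
    fi
    local dest=$1
    shift
    local port
    for port in "$@"; do
        # The payload names the port so each packet is easy to tell apart
        # on the receiving side (for example with tcpdump -A).
        if echo "probe-${port}" >"/dev/udp/${dest}/${port}"; then
            echo "sent ${port}"
        else
            echo "send-failed ${port}"
        fi
    done
}
```

On the OVH server you would call, for example, "send_udp_probes {ip.addr.of.others} 3784 6784 6785" while "tcpdump -n udp and host {ovh.server.ip.addr}" runs on the outside machine, then compare which ports show up there.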

I reported this issue to OVH support on January 18. For over two weeks they proved unable to understand what the issue is, and quite unwilling to help. I went back and forth with them several times until at some point they said, without any detail, that "[our experts] did the same procedure as you and they did not get any issue on their backend". They now require me to boot my server into rescue mode and try to reproduce the issue that way.

I am unwilling to do this, as the support team has so far done nothing but stonewall and has shown neither communication skills nor technical skills. As far as I can tell this issue is 100% reproducible on all servers I have access to, and it seems to have nothing to do with system configuration. Moreover, they themselves have confirmed that the packets go through fine *within* the OVH network, but there has been no concrete word on what happens *at the border* on the way out.
The server is doing useful work and we cannot afford to take it down for several hours because of someone's stubborn and unhelpful attitude.

To top it all off, this is a known (and already fixed) issue in Cisco software:
https://quickview.cloudapps.cisco.co...bug/CSCvc67963
I have, of course, provided them with this link.

My question is, can anyone here take a minute and test this from their own server?