All systems operational

Past incidents

20th February 2020

No incidents reported

19th February 2020

No incidents reported

18th February 2020

No incidents reported

17th February 2020

Current status - network problems / 2

Despite all efforts, the known network disruptions are still occurring, although their frequency has decreased slightly. We have already put enormous effort into solving the problem. In addition to low-level hardware debugging, we tried to rule out hardware faults by replacing every component (including the fiber optic cables). Even a complete reset of the devices followed by reconfiguration from scratch did not help. We are currently investigating the BGP sessions as a possible cause. Since the problem first occurred, we have been working to resolve it with all the resources available to us.

Update 2020-02-17 @ 16:07 After checking the BGP sessions, we have deactivated our free BGP router for the time being. We currently suspect that a customer's session is flapping and pushing too many updates to the core router, which in turn causes the routing equipment at Interxion to struggle with the load.

Update 2020-02-17 @ 21:51 We continued our investigation throughout the day and identified an issue at Layer 3 that, under certain conditions, causes traffic destined for a specific host to be processed by the Routing Engine. This causes high load, which in turn makes BGP sessions flap between states; this is noticeable as short downtimes or packet loss. We have implemented further measures to mitigate the impact for now and will keep monitoring the router. Tomorrow morning we will implement hardware-based measures by installing a separate switch to isolate this traffic.

Update 2020-02-18 @ 11:30 The separate switch has been installed.

Update 2020-02-19 @ 12:01 The network appears to have been stable over the past 24 hours. We identified a bug in Juniper Junos, or at least in the chipset of the QFX devices we use, which caused the repeated issues. We have made extensive changes to mitigate the bug.

16th February 2020

No incidents reported

15th February 2020

No incidents reported

14th February 2020

No incidents reported

13th February 2020

Current status - Network problems

Regarding the network problems we had last week, there is unfortunately still no solution. As stated in the previous status reports, we have already replaced all of the hardware (including all cables and modules). The problem recurs, though somewhat less often. Our current assumption, which is increasingly being confirmed, is that traffic is being processed via the control plane for unknown reasons. This inevitably leads to load problems, which also cause our sessions to drop. We are currently analyzing the problem together with a Juniper consultant.

Update 2020-02-15 @ 12:18 We are currently evaluating moving all traffic from the current QFX5100 to an MX480 in order to rule out a potential hardware bug related to the Broadcom chipset used by both the QFX5100 and the QFX3500.