Emergency Planned Work :: Switch Reboot in Telehouse West

We have seen a few brief dropouts today for a small number of customers, due to a switch in Telehouse West having forwarding issues on a number of core VLANs passing through that data centre.

We had been planning to announce a switch reboot for late this evening to resolve this. However, the drops have become more frequent this afternoon, so we have taken the decision to reboot the switch this afternoon to clear the issue.

We are aware of the cause of the issue (a bug in Juniper OS that sometimes strikes after a config update) and had plans to upgrade the switches to a more powerful model last week. However, as one of our engineers has COVID, this work has been slightly delayed.

We can see the switch is now back up and passing traffic. Sorry for the lack of notice here, but we wanted to restore stability as quickly as we could and ensure that all services work as they should.

Switch Issue – Telehouse West [9/3/22]

We are currently seeing an issue with some traffic routing via Telehouse West. We believe one of our core switches has a forwarding issue.

Our NOC team are currently working to resolve this, which MAY involve a brief reboot of the affected switch, causing a short downtime for services directly connected to that switch.

We apologise for the short notice of this brief reboot, but we want to resolve the issue as quickly as we can.

[Update]
The switch is now back from the reboot and we are seeing traffic and routing back as they should be. We will investigate the cause further as a background task to ensure it does not recur.

 

 

At Risk – Huntingdon Data Centre / Supplier Link Down 31/1/2022 [resolved]

We are aware that one of the two 10G links to Huntingdon is down (this is not the link that was worked on over the weekend).

We have a case raised to our supplier and are waiting for updates. This is not service affecting but removes resilience for our Huntingdon Data Centre and our Manchester PoP.

We will update this notice as we know more.

[Update 15:30]
Virgin Media have confirmed this is the result of a fibre break in Leicester. This is being worked on and we will update you as we hear more. As before, this is not currently affecting service but does leave our service at risk.

[Update 19:30]
The link came back at approximately 17:15, and Virgin have now confirmed that they located damaged fibre, which has been re-spliced. Virgin have closed the fault. We will continue to monitor, but we do not anticipate any further impact.

 

Reboot – THW Core Switch 23/12/2021 [resolved]

We are seeing some VLANs on our core network flap when they pass through a core switch in Telehouse West.

We may need to reset or reload this switch to clear this issue. This MAY cause some instability to the network for a short period, and services hosted in Telehouse West may drop for 15-20 minutes.

We will update this notice as investigations continue.

[Update 1 – 23:30]

We have reloaded one of our core Juniper switches and are monitoring for stability.

[Update 24/12 9:44]

We have monitored the network since the switch reload at 11pm last night, and it has stayed stable with no further flaps. While the switch showed no error logs during the incident, it was randomly blocking traffic on some VLANs, including a backup VLAN between two core routers in Telehouse East and North. This was causing some packet loss and drops as traffic switched between links. We apologise for the issues seen. While we are aware this issue started during the day, we needed to perform the reboot out of hours since it caused a 10-15 minute outage to directly connected services.

UPDATE: Outage on some broadband lines [13/10/2021]

14:46 UPDATE
The supplier advises that their issues should now be resolved. However, it may take some time for all circuits to reconnect due to the increased load of subscribers attempting to log in to their RADIUS. Our apologies again for this supplier outage.
13:20 UPDATE
Good afternoon.
The supplier had a power outage at one of their data centres that has affected their network transit and backhaul, and therefore some lines. We are seeing some lines return but not all, and will update here as we get more news.

————-
Good afternoon.
We are aware of an issue with one of the suppliers we use for these lines. At this point we don’t have an announcement or full details. The issue is also affecting their portal, which we access, and their status page isn’t working.
We are trying to get an update on this, which we will pass on as soon as we can.
Note: this issue is not on the Merula network but within the supplier network. Our apologies to those whose circuits are affected.

UPDATE: 14th April 20:45

Once again, please accept our apologies for the problems you’ve seen over the previous couple of days. We realise that this has caused you all serious issues and for that, we’re very sorry.

Various internal changes have been implemented over the last 48 hours, and we currently believe that the network and associated services are stable and will remain that way. We continue to monitor the situation closely to ensure that our network remains stable and there’s no further impact to your services.

Please email us in the normal way if you have any questions or concerns. Thanks again for your support through this incident.