Issue with hex.cr1 – RESOLVED

We are aware of an issue with one of our Juniper Core Routers in Harbour Exchange Square.

The vast majority of services have routed around this and are not affected. A small number of directly connected customers may be seeing an issue.

Our engineers are working to restore service to this router. We will update this as we know more.

[Update 18:40]
A remote hands reboot of the core router has not restored service. Our engineers are therefore en route to the data centre to investigate and restore service; we expect them to be on site by approx 20:15 this evening. We will update as soon as they arrive.

[Update 20:15]
Our engineer is approx 20 minutes from the data centre; they have collected spare parts en route in case any hardware needs swapping out. Next update by 21:15.

[Update 21:15]
Our engineer is on site and working on the core router. We can see that the file system is corrupt, which is why the router did not boot when power cycled. We are working to restore this asap. In parallel, we have moved some lines onto the switch on site, which is working, and we are moving other links as well to restore their service. Next update within an hour.

[Update 22:45]
The router has now booted after the disk corruption was cleared. The config is being copied back and applied. We hope to have service restored very shortly. Next update by 23:30.

[Update 23:20]
The router is now back and passing traffic. We have not yet enabled all peers, so some traffic may take a slightly different route to 'normal', but no customer services are now impacted. As part of the recovery the software upgrade planned for this router has been applied, so the planned work for that upgrade is no longer needed.

[Update 23:30]
During the checking process we have identified the need for a reboot to ensure the router and config are updated completely. The reboot is in progress and we expect this to complete in 10-20 minutes.

[Update 23:50]
The reboot cleared the router alarm and all routing is now back up. Monitoring is showing all links working as they should be. There are some further checks to complete; however, we do not expect any further issues. We apologise to any customers affected by the issues this evening.

 

UPDATE: 14th April 20:45

Once again, please accept our apologies for the problems you’ve seen over the previous couple of days. We realise that this has caused you all serious issues and for that, we’re very sorry.

Various internal changes have been implemented over the last 48 hours and we believe that the network and associated services are now stable and will remain that way. We continue to monitor the situation closely to ensure that our network remains stable and there is no further impact to your services.

Please email us in the normal way if you have any questions or concerns. Thanks again for your support through this incident.

Ongoing DDoS attack against our network.

23:30 Again, our apologies

We continue to undertake remedial work to mitigate this ongoing attack.

We will update here as usual.

————

We are currently seeing a new, large-scale DDoS attack against our IP range. We are working to mitigate this but some services are being affected, with packet loss, routing failures or intermittent outages. Some email delivery will be queued until this is resolved.

We will update here as usual.

13:06 UPDATE

We are mitigating a large portion of this attack traffic, but the transit links remain saturated, which is causing the ongoing problems. We continue to work to resolve this as quickly as possible and apologise for the ongoing inconvenience caused.

13:44 UPDATE

We are seeing most services recovering. The attack target remains offline but we believe that this incident is now contained. We apologise again for this interruption in service. If you are still seeing issues, please restart your equipment. Tickets can now be raised in the normal manner and the support line remains ready to assist.

17:43 UPDATE

The offsite server that hosts the NOC status site went down during the afternoon. This was purely coincidental, but it meant we weren't able to access it to add more frequent updates. It's now back and we'll update the status on the DDoS attack issues shortly. Our apologies that this wasn't available when it was needed the most.

10th April 2021 – internet issues [update]

We are currently seeing a large scale DDoS attack against our IP range.

This will lead to significant packet loss and access issues for our customers. Our NOC team are already at work to mitigate this. We will post a further update as soon as we have it.

[Update 11/04/21 – 10:00am]
We believe the issue cleared shortly after 7pm yesterday. We are still monitoring this closely; however, we do not believe there is currently any ongoing customer impact.

ISSUE: Virgin Media circuits – RESOLVED

We have seen alerts that some circuits from Virgin Media have dropped. We believe this is a Virgin Media issue but are currently investigating.
 
We will update this status as soon as we know more and within the hour at the latest.
 
UPDATE: 16:09
Virgin have confirmed an issue at one of their hub sites affecting some parts of Hertfordshire. Senior engineers and the Core Incident team are working on this at the moment.
 
If service is not restored, we will have another update within approx 30 minutes.
 
UPDATE: 16:20
We are seeing the affected circuits back online. At this point we do not have an 'all clear', so these circuits should be considered at risk, but we hope this is resolved.
 
RESOLVED 16:20
We have received the following update from Virgin Media:
"Please be advised that @ 16:22, Restoration Details: IOM Card 2 and MDA Card 1 remotely reseated on T0090 Luton Metnet 2a, restoring all services."
 
This appears to have been a card failure at one of the Virgin Media hub sites. We apologise for the issues that occurred here.

Rack Issue – Huntingdon 16/4/2020 [update]

There appears to have been a loss of power and/or a switch failure in a single rack in our Huntingdon Data Centre. This dropped at approx 2am this morning. This rack houses a small number of Merula and customer servers.

We are aware of this and will investigate and resolve it asap. We plan to be on site at approx 7am and will resolve the issue then.

We apologise for any issues this may cause and will update this as soon as we have more details.

 

[Update 8:15am]

The issue appears to be related to the switch in the rack. After being offline for approx 90 minutes, the switch came back up and connectivity was restored to most servers in the rack. However, we are still seeing connectivity issues with a couple of servers in this rack. We have therefore taken the decision to manually reboot the switch to see if this restores service, given that the servers themselves look OK and have not rebooted. This will unfortunately result in a loss of connectivity to all services in this rack for a couple of minutes. We will update this as we know more.

[Update 8:53]

The switch was rebooted and the latest saved config has been re-applied. We believe this has restored service to the services that we are aware had an issue. We are continuing to check for anything else with an issue and are investigating the cause of the switch outage further. We may have to schedule a swap-out of the switch if we cannot locate an obvious issue here. However, we believe that all services in Huntingdon should now be restored. Please do email support if you continue to see any issues.

 

[Update 9:20]

The affected switch appears to have failed again. We will now start swapping it out for a replacement switch. We will have an update within the next 45 minutes.

[Update 11:30am]

The switch has been replaced and we believe all services have recovered. We are checking for any remaining issues. If you are still seeing any issues, please raise them with support@merula.net. We will update this further later in the day or if we locate any remaining issues.