[CLOSED] Connectivity issues

UPDATED:

Connectivity and routing problems occurred yesterday and today, limiting connectivity for a handful of our customers only. They were caused by a couple of unusual routing problems inside both the BT and Level3 networks. All our testing today indicates that this has now cleared. If you are still seeing problems, please first switch off your router, wait 30 minutes and restart it. If you still need assistance after this, please email support@merula.net

We are in the process of re-routing around the Level3 network.

We’re aware of a few reports of connectivity issues, either failed name resolution or ADSL connection failures. We’re currently unaware of the reason but will update here as soon as we know more. Apologies for this.

ADSL issues [resolved]

UPDATE: This appears to have been an issue inside BT which has now been cleared by them.

We are seeing a few reports of users unable to log in or authenticate ADSL sessions. We can see no problems here at the moment but are investigating further. If you are having problems, please email support@merula.net or call the support line.

Some servers are down [resolved]

A core internal Cisco switch just died. We are waiting for the replacement to “learn” the current network and this should be in place within approx. 20 mins. Currently, this means that the following servers are inaccessible; the likely effect of each is listed next to it. Please email support@merula.net if you have any questions.

RADIUS authentication: Possible slow logins on ADSL
DNS backup server: Should have very little effect
One backup mail server: Possible slight delays in mail being delivered
One shared web server: The sites hosted on here will be down

Maintenance work in our London suites on 23rd June [completed]

The work below is now complete. There was a short outage in both SOV and HEX; we believe all servers are now working as they should. If you have any issues, please contact support.

We are planning some preventative maintenance work on various items of equipment in our suites at Harbour Exchange Square (HEX) and Sovereign House (SOV) in London.

The works planned are as follows.

At SOV:

(1) Install new UPS batteries as the existing ones are reaching the end of their planned life. This operation is designed to be performed whilst the UPS is running. It should not therefore affect the electrical supply to this cabinet. However, we have a second, spare, tested UPS on-site in case anything does break on the first one and we then lose power.

If this UPS work fails and we have to fall back to the spare, the following services or items could be affected for a period of no more than 30 minutes:

– RADIUS authentication – we have a backup server running in another location, so ADSL users who are not already connected and who try to connect during this time may find that authentication takes a few seconds longer. No existing connections should be lost or dropped.

– One backup name server will drop off the Merula network; there are 3 others in separate locations. There should be no effect on any clients or other services.

– One server running shared web-sites. All clients on this server who might therefore be affected have already been advised of the work planned.

– One co-lo server. This client has also been advised.

– Merula web-sites: the NOC server, portal server and mymerula sites would all be off-line for this period. The new off-site NOC box mentioned before is now not due to go live until early next week, as the other ISP is undertaking planned works and it can’t go live until then.

(2) Swap out an internal firewall. This will be done at the same time as the above work. The only sites affected will be internal Merula servers. No customer boxes or services are behind this firewall.

(3) Un-rack a couple of obsolete servers. None of these are currently live. No effect on anybody inside or outside the Merula network.

At HEX:

(1) Install a new backup router. No effect on any services or customers. It will be enabled at some future point, which we’ll advise of in advance.

(2) Replace cabinet UPS batteries and install a new monitoring NIC. Again, this is a hot-swap operation. The UPS should not go down. If problems do arise, again we have a spare, tested UPS on-site and maximum down-time for the services below will be 30 minutes.

– Internal monitoring server and fax server. No external effects.

– Two co-located client servers. These clients have already been advised of this possible down-time.

– One core switch – if power is lost, there will be a small amount of packet loss for 5-10 seconds as the ring stabilises. There is no direct effect on ADSL and connectivity services, although there may be a few brief periods where access and traffic are intermittent as switches & routers pick up new routing due to the work.

If you have not received an email from us, then you should expect to see nothing more, at the worst, than a little bit of instability on your ADSL connection for a period of no more than 30 minutes.

Please do email support@merula.net if you have any concerns or issues about the scope of this work.

Authentication and DNS issues 23/04/2011

Due to a hardware fault on a single server which was temporarily hosting a number of internal systems, some users may have seen an issue on Saturday night.

Systems affected were RADIUS authentication for ADSL (users who were connected were fine, though those who had to reconnect would have had problems), one of our DNS servers, and the Merula website and portal. These were all being hosted as virtual servers on the same physical box until the new hardware for each of them, which we were in the process of installing that day, went live.

The cause turned out to be two faulty hard drives, which appear to have come from a bad batch of drives. Service was restored to some degree by about 2pm and after that we moved the services and servers, one at a time, to their new hardware. We believe that all services are now running fully, but if you are seeing any remaining issues please call or email support in the normal way.

UPDATE: Power Outage Huntingdon

Last night the mains power to the building failed and kept tripping in and out. The UPS immediately took over and the generator kicked in as planned. Because of the up and down nature of the electrical supply, we left the building running on generator until we were happy that the supply company had resolved the power problem, which took a number of hours.

Unfortunately, over the last week or so we’ve had approx. 6 similar (albeit lesser) power feed issues and that’s apparently drained the UPS batteries. The engineer attended this morning and advised us that the expected life would be 8-10 years. However, we are now advised, after diagnostic tests, that most of the 60-odd cells need replacing now (after only 2-3 years’ usage). We’re arranging this over the course of the next few days.

The effect of these problems last night was that connectivity with a few servers was lost and not restored until this morning. We found these to be internal issues: dirty file systems preventing booting, and other services on client servers failing to restart correctly. As far as we can see, apart from the building UPS batteries reaching “end of life” far earlier than we expected, all our systems reacted correctly.