Authentication and DNS issues 23/04/2011

Due to a hardware fault on a single server that was temporarily hosting a number of internal systems, some users may have seen an issue on Saturday night.

The systems affected were RADIUS authentication for ADSL (users who were already connected were fine, though those who had to reconnect would have had problems), one of our DNS servers, and the Merula website and portal. These were all being hosted as virtual servers on the same physical box until the new hardware for each of them, which we were in the process of installing that day, went live.

The cause turned out to be two faulty hard drives, which appear to have come from a bad batch. Service was restored to some degree by about 2pm, and after that we moved the services and servers, one at a time, to their new hardware. We believe that all services are now running fully, but if you are seeing any remaining issues please call or email support in the normal way.

UPDATE: Power Outage Huntingdon

Last night the mains power to the building failed and kept tripping in and out. The UPS immediately took over and the generator kicked in as planned. Because of the up-and-down nature of the electrical supply, we left the building running on generator until we were happy that the supply company had resolved the power problem, which took a number of hours.

Unfortunately, over the last week or so we’ve had approximately six similar (albeit lesser) power feed issues, and these have apparently drained the UPS batteries. The engineer attended this morning and advised us that the expected battery life would be 8-10 years. However, after diagnostic tests, we are now advised that most of the 60-odd cells need replacing now (after only 2-3 years’ usage). We are arranging this over the course of the next few days.

The effect of these problems last night was that connectivity with a few servers was lost and not restored until this morning. We found these to be internal issues, with dirty file systems preventing booting and other services on client servers failing to restart correctly. As far as we can see, apart from the building UPS batteries reaching “end of life” far earlier than we expected, all our systems reacted correctly.