Old May 23rd, 2008, 11:23 AM   #1 (permalink)
Senior Member
Surpass Staff
 
Kayla's Avatar
 
Joined in May 2003
Lives in Orlando
25,069 posts
Gave thanks: 959
Thanked 840 times
[Data Center Update]

Affected clients, co-workers, partners:

At approximately 8 A.M. EST a segment of our data center experienced a power outage. Shortly afterward we realized the city had experienced a power outage as well, which contributed to the loss we saw in the NOC. We immediately switched to generator power and are still running on generator.

When power is lost, we immediately go onto UPS for a few minutes while the generator powers up. The UPS ensures there is no service disruption while the generator starts. This happened normally, but approximately 40% of the servers tripped during the process. Power loss followed by a switch to UPS and then generator power has happened a few times in the past, but we have never experienced what happened today: the power surge during this outage was extremely high.

We are meeting with our electrical engineers as we speak to analyze the situation and learn exactly what happened. We are keeping the data center on generator because power in the city could still be unstable and we do not want to risk any further outages. As many of you know, these kinds of issues do not happen often, and we are taking this incident very seriously. The data center is equipped with a high level of UPS protection, and generators are in place to ensure maximum uptime. We apologize for the inconvenience, as we know this has affected a lot of you and your clients. We will be compensating per our SLA, and we will find out in detail what happened, resolve it completely, and put measures in place so it does not happen again.

All affected servers should be operating normally now, and we sincerely thank you for your patience.
__________________
Add Surpass on Twitter and Facebook
And check out Surpass at WebHostingTalk!
Old May 23rd, 2008, 11:37 AM   #2 (permalink)
Kayla (Surpass Staff)
If you are still unable to reach your site, please open a thread here with your domain name or server IP so that we can check them immediately and reply - you do not need to open a ticket. Since these servers were rebooted there may be firewall issues with your IP or another isolated issue even though the servers are broadcasting normally now.
Old May 23rd, 2008, 1:56 PM   #3 (permalink)
Kayla (Surpass Staff)
We're still resolving isolated issues with servers. Please continue posting if you still cannot access your site; we are replying to each thread.
Old May 23rd, 2008, 4:38 PM   #4 (permalink)
Kayla (Surpass Staff)
The following servers are still being worked on; please take note if you are on one of them. We are working as fast as we can.

sh63.surpasshosting.com
sh65.surpasshosting.com
sh66.surpasshosting.com
sh68.surpasshosting.com
sh73.surpasshosting.com
sh85.surpasshosting.com
sh86.surpasshosting.com
sh102.surpasshosting.com
sh103.surpasshosting.com
sh106.surpasshosting.com
sh134.surpasshosting.com
sh135.surpasshosting.com
sh137.surpasshosting.com
sh138.surpasshosting.com
sh139.surpasshosting.com

pass14.dizinc.com
pass16.dizinc.com
pass25.dizinc.com
pass27.dizinc.com
pass30.dizinc.com
pass32.dizinc.com
pass43.dizinc.com
pass46.dizinc.com
pass48.dizinc.com
pass49.dizinc.com
pass56.dizinc.com
pass60.dizinc.com
pass61.dizinc.com
pass63.dizinc.com
pass64.dizinc.com
pass68.dizinc.com
pass74.dizinc.com

dior.surpasshosting.com
spiffy.surpasshosting.com
saprus.dizinc.com
mac.dizinc.com
gotti.surpasshosting.com
maya.surpasshosting.com
Old May 23rd, 2008, 5:41 PM   #5 (permalink)
Kayla (Surpass Staff)
The remaining servers are being worked on by a team of over 15 staff.

We understand and regret that these issues have caused inconvenience, business loss, and stress to everyone. We are working to have everything fully back up within the next 3 hours at most. We ask for your continued patience, and once everything is resolved we will be happy to begin the process of compensating.

Measures are already in place to ensure this does not happen again. This is the latest update we have as of 5:40 PM EST.
Old May 23rd, 2008, 11:40 PM   #6 (permalink)
Kayla (Surpass Staff)
Just when we thought everything was back to normal, we unfortunately experienced another outage this evening. All staff are on site working to get servers online. We need and ask for your patience and support more than ever at this time. We will make this right for you, now and in the future. 20% of the servers are back online now and we are still working. Rest assured that we are doing everything we can right now. Thank you.
Old May 26th, 2008, 3:58 AM   #7 (permalink)
CTO, Surpass Hosting
Super #1
 
Emmanuel's Avatar
 
Joined in Apr 2003
Lives in Florida
1,834 posts
Gave thanks: 10
Thanked 149 times
View our newest data center video by clicking here.



There are no words to describe how deeply we apologize for the downtime that occurred on Friday, May 23, 2008. The incident has been deeply distressing for our organization, both because of the love and dedication our team has for our entire community and because we realize the level of damage it may have caused you. We know that neither money nor words can replace the losses that each of you may have experienced. Our organization is forever in debt to you all for the frustration and grief endured. Disasters are never easy, but many of you showed your support as we worked non-stop to get things back to normal. We want to thank all of you for your patience, understanding, and support during such a difficult time. You are rightfully owed a formal incident report of our investigation, and below is the detailed summary of events as they occurred. Please note that some of you may not have experienced any outage; not all clients were affected, but we wanted to keep everyone updated.

What happened:

At approximately 8 A.M. EST our data center experienced a surge from our electrical utility provider, Progressive Energy, followed by a power outage that lasted several minutes. The surge tripped our facility's main breaker; this breaker is designed with a certain level of sensitivity so that it trips in the event of a severe surge to protect the load (servers and critical equipment) from being burned. Our generator started up automatically within a few seconds. While the automatic transfer switch (ATS) transitions the load from the unavailable raw (utility) power to generator power, our uninterruptible power system (UPS), together with its battery set, is supposed to sustain continuous power to the load. However, it appeared this did not happen. In any case, generator power was available within a minute of the outage.

Our engineers and electricians came on site immediately after the outage. Their diagnosis revealed a fault within a battery string connected to the UPS. This fault prevented the UPS from fully sustaining continuous power to the load while the ATS transitioned the facility from the raw power lines to the generator power lines. During this time a large portion of the data center experienced a sudden power loss, which caused many servers to power cycle. Unfortunately, after a sudden power loss some systems require manual administrator intervention before full function is restored. Our team immediately began checking all systems and servers that may have been adversely affected.
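
The handoff described above can be sketched as a simple timing check: the load stays up only if the UPS batteries can carry it for as long as the generator start and ATS transfer take. This is an illustrative sketch, not a model of the actual facility. The class name, field names, and every number below are hypothetical; the report gives only "a few seconds" for generator start and no figure for UPS hold-up time.

```python
from dataclasses import dataclass


@dataclass
class PowerEvent:
    # All values are hypothetical, chosen only to illustrate the logic.
    generator_start_s: float  # seconds until generator output is available
    ats_transfer_s: float     # seconds for the ATS to switch the load over
    ups_holdup_s: float       # seconds the UPS batteries can carry the load


def load_stays_up(event: PowerEvent) -> bool:
    """Return True if the UPS can bridge the gap until generator power arrives."""
    gap_s = event.generator_start_s + event.ats_transfer_s
    return event.ups_holdup_s >= gap_s


# A healthy battery string bridges the gap; a faulted string (modelled here
# as zero hold-up time) drops the load even though the generator starts.
healthy = PowerEvent(generator_start_s=10.0, ats_transfer_s=2.0, ups_holdup_s=300.0)
faulted = PowerEvent(generator_start_s=10.0, ats_transfer_s=2.0, ups_holdup_s=0.0)

print(load_stays_up(healthy))  # True
print(load_stays_up(faulted))  # False
```

The point of the sketch is that a fast generator start is not enough on its own: any gap, however short, is an outage for the servers if the battery string cannot carry the load.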

What was done to correct the problem:

Our on-call UPS maintenance technician, along with our electricians and engineers, immediately came together on site to conduct a thorough diagnosis and put together a plan of action to correct any and all possible issues.

Although the age of the battery supply in use was well within the manufacturer's life expectancy, the entire battery supply was replaced with a new set. In addition, our UPS underwent a thorough, in-depth inspection, and all critical components were individually inspected and reconditioned as necessary. Lastly, the batteries and UPS were load tested before being returned to the overall power backup system to ensure 100% reliability. All of this was completed within several hours of the incident.

Who was affected:

The power outage experienced was intermittent. However, once power was fully restored to the facility, many servers required file system checks (FSCK), some required power supply replacements, and a few others required hard drive replacements due to excessive I/O errors. Unfortunately, depending on how much space the system occupies on the drive, an FSCK can take anywhere from 30 minutes to nine hours or more (approximately 200 servers counted). The worst affected were the systems that had excessive I/O errors and needed hard drive replacements (approximately 12 servers counted). Again, unfortunately, a hard drive replacement may take 4-12 hours or more to complete, depending on the space occupied on the drive. The least affected were servers that only required a power supply replacement (approximately 60 servers counted).

For the servers that experienced the greatest downtime, the cause was not power unavailability directly but rather the adverse after-effects of sudden power loss described above.
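
For readers who want the figures above in one place, here is a small tally of the approximate counts and repair times quoted in this report. It is only a summary sketch: the dictionary layout and function name are made up for illustration, and it assumes the three categories do not overlap, which the report does not state.

```python
# Approximate counts and durations quoted in the report.
# Durations are (lower figure, upper figure) in hours; the report says
# "plus" for each upper figure, so they are not hard maximums.
CATEGORIES = {
    "file system check (FSCK)": {"servers": 200, "hours": (0.5, 9.0)},
    "hard drive replacement": {"servers": 12, "hours": (4.0, 12.0)},
    "power supply replacement": {"servers": 60, "hours": None},  # no duration given
}

# Assumption: no server is counted in more than one category.
total_servers = sum(info["servers"] for info in CATEGORIES.values())


def describe(name, info, total):
    """Format one category as a line of text with its share of the total."""
    share = 100.0 * info["servers"] / total
    if info["hours"] is None:
        duration = "duration not stated"
    else:
        low, high = info["hours"]
        duration = f"{low:g} to {high:g}+ hours"
    return f"{name}: ~{info['servers']} servers ({share:.0f}% of counted), {duration}"


for name, info in CATEGORIES.items():
    print(describe(name, info, total_servers))
print(f"total counted: ~{total_servers}")
```

Under that non-overlap assumption the three categories add up to roughly 272 servers, with file system checks accounting for the large majority.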

What preventative measures are being taken:

All critical power systems and loads in our data center have been, and continue to be, regularly inspected and maintained. This includes the generator, UPS, breakers, etc. In fact, our UPS underwent an inspection and maintenance service during the week of May 12, 2008. The service report showed that the UPS and the battery supply set were in good working condition. The only advice given was to consider replacing the battery supply set, as it was approaching the last year of the manufacturer's life expectancy. Proactively following up on the maintenance engineer's advice, we ordered a new battery supply set right away and scheduled it to be installed this Tuesday, May 27, 2008.

Unfortunately, the battery supply set is what ended up being at fault, and ironically it was already scheduled for routine replacement. It is difficult to say that more could have been done, as the batteries were within their life expectancy limits but fell short in this situation. Something of this magnitude, unfortunately, could not be predicted, and it was already being addressed proactively with the new battery supply set. Nonetheless, we have now adopted a new standard: battery reliability tests will be completed monthly. This will allow us to catch any and all types of possible issues with any battery sooner, greatly reducing the probability of a failure during critical times.
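
As a rough illustration of what a monthly test schedule looks like on a calendar, the sketch below lists upcoming test dates. The choice of the 1st of each month and the function name are assumptions made for the example; the report says only that tests will be completed monthly.

```python
from datetime import date


def monthly_test_dates(start: date, count: int) -> list[date]:
    """Return `count` dates, the 1st of each calendar month after `start`.

    Using the 1st of the month is an assumption for illustration only.
    """
    dates = []
    year, month = start.year, start.month
    for _ in range(count):
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
        dates.append(date(year, month, 1))
    return dates


# Starting from the scheduled battery installation date given in the report.
for test_date in monthly_test_dates(date(2008, 5, 27), 3):
    print(test_date.isoformat())
```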

Our data center employs a 500 kVA UPS and a 500 kW generator. This can be verified from the pictures and videos taken yesterday afternoon. If you have any doubt whatsoever about this, we kindly ask for the opportunity to put that doubt to rest. Our backup systems have protected the entire data center from several past outages. We are showing what some of you may not have known has been in place in our facility since day one, so you can see that your services with us are secure.

We have been in the industry close to 8 years now and have always tried our best to ensure 100% uptime for all of you. This is the first outage of this severity in our entire existence. It is not only our job but our passion to give you the best level of service possible. We do not want to use the misfortune of this unpredictable situation as an excuse for the downtime experienced. Despite the nature of the situation, we accept full responsibility for the outage and we are ready to compensate you in any way we can.

We value your business and the level of trust you put in us. We know many of you will want to cancel with us because of the losses you have incurred, and will question the integrity of our systems. We ask you to please talk to someone in management before you make your decision, as we understand how important this is to each one of you. We work in a highly volatile environment where anything can happen, just as it can with any of our competitors. However, we promise that, no matter what, we will always be here whenever any issue occurs, with an open hand to help resolve it as fast as humanly possible. Misfortunes happen to the best of us; how they are handled makes the difference. If there is anything at all we can do to help you minimize your losses, please just ask and consider it done.

Our awareness and commitment as a company have tripled, and you can be assured that this has only made us stronger and more experienced. It is not every day that people or companies can overcome such issues and have the support and loyalty that many of you have given us. If you wish to reach out to me personally with any concerns, recommendations, suggestions, venting, or ways we can compensate you, please PM me on this forum. I will be happy to talk to you in person.
__________________
Eman Vivar
SurpassNetworks
AIM: Surpass Eman
Skype: Surpasshosting
MSN: eman.v surpasshosting.com
Cell: 407.467.2053
Orlando, FL 32801
http://www.SurpassHosting.com