Aug 20-21 Hill Data Center Outage

Final Status Report as of 08/21/2016:  20:40

Full restoration of data center services did take longer than anticipated but services remain stable at this time. The issue with the emergency generator has been resolved.

Several central service applications have been restored to normal operation; most remaining central services should be operating normally by 21:30.  Many other hosted applications have been, or are being, brought back online as well.

Thank you again for your understanding and cooperation during this weekend’s events.

Have a great evening.

Ed


Status Update as of 8/21/2016:  19:45 – Full restoration of data center services did take longer than anticipated but services remain stable at this time.  Several central service applications have been restored to normal operation; most remaining central services should be operating normally by 21:30.  Other hosted applications are also being brought back online at this time.

Emergency generator service technician is now on site.


Status Update as of 8/21/2016:  18:00 – Data Center services have been restored and are running normally at this time.  Utility power is stable and the UPS is fully operational and online.   We are experiencing an issue with the emergency generator and the service vendor is en route.  Application restoration for many services is in progress.

 


Status Update as of 08/21/2016:  14:00 –  High voltage switch gear was successfully (and safely) re-energized, with normal transfer from emergency power.  HVAC service initial start up and test was completed and successful.  UPS re-certification is well underway and should complete by about 14:35.  Once completed we will initiate the fire suppression, life safety and EPO test and certifications.  If things continue to go well, we should finish with the emergency power re-test and then begin to put the data center back into normal configuration.   We are still on target to restore data center services on or before 17:00 today.


Major Tasks Completed or In progress:

 

– Re-energizing high voltage switch gear – completed

– HVAC services inspection and test – completed, no leaks, all PSI and flow rate ‘in the green’

– UPS Re-certification – In progress

Tasks starting soon:

– Data Center power distribution panel checkout

– Re-engage fire suppression and life safety systems

– Conduct EPO testing and certification

– initiate Emergency power retest

Once these steps are completed we will begin to restore the data center to normal configuration and notify system administrators when application service startup may commence.

Next status update approximately 15:15.  Almost there.  Thanks for hanging in there with us.


Status Update as of 08/21/2016:  12:00 – The critical tasks associated with the scheduled upgrade and repair work have been completed.  We are moving to the final inspection, test and certification phase of the shutdown period.  The testing and certifications may take 2 hours or more, depending on how things go.  We are still on target to restore data center services on or before 17:00 today.


Major Tasks Completed or In progress:

 

– Prep for UPS re-certification complete

– Cooling tower water loop pipe repairs complete; PSI looks good
– CRAC water loop pipe repairs complete; PSI looks good

– Additional water treatment valve installation completed

– High voltage switch gear modifications complete; inspection passed

Tasks starting soon:

– Re-energizing high voltage switch gear

– Data Center power distribution panel checkout

– UPS Re-certification

– HVAC services inspection and test

– Re-engage fire suppression and life safety systems

– Conduct EPO testing and certification

– initiate Emergency power retest

Once these steps are completed we will begin to restore the data center to normal configuration and notify system administrators when application service startup may commence.

Next status update approximately 14:00

 


Status Update as of 08/21/2016: 09:30 – Work has resumed as of 06:30 this morning. Building emergency systems continue to run on generator power; fuel burn is nominal. We still expect to be able to complete all critical tasks before restoring data center services by 17:00.

Major Tasks Completed or In progress:

– Data Center UPS & Batteries reconfiguration is complete. Prepping for Recertification
– Cooling tower water loop pipe repairs nearing completion
– CRAC water loop pipe repairs nearing completion
– High voltage switch gear modifications nearing completion

Tasks starting soon:

– Inspection of high voltage electrical configuration & re-energizing
– UPS Recertification
– HVAC services inspection and test
– Re-engage fire suppression and life safety systems
– Conduct EPO testing and certification
– Initiate emergency power retest

Next status update approximately Sunday (8/21) 12:00

Ed Humphrey
Data Centers & Technical Support
Office of Information Technology

Status Update as of 08/20/2016:  19:45 – We’ve made up some time and have suspended work for the day.  Work will resume at 06:30 on Sunday.   We still expect to be able to complete all critical tasks before restoring data center services by 17:00 on Sunday.   RU Security is on-site to monitor building.


Major Tasks Completed or Paused :

 

– Data Center UPS & Batteries reconfiguration is about 90% complete 

– Cooling tower water loop re-filled; pipe repairs nearing completion
– CRAC water loop drained; pipe repairs nearing completion

– High voltage switch gear inspection and modifications in progress

– Repairs to PDU #5 completed

Next status update  approximately Sunday (8/21) 09:00

Have a good evening.


Status Update as of 08/20/2016:  17:00 – Low voltage switch gear modifications have been completed and passed inspection.   We are still behind schedule but the electrical and mechanical teams are continuing their work.  We still expect to be able to complete all critical tasks before restoring data center services by 17:00 on Sunday.  


Major Tasks Completed or Underway:

– Low voltage switch gear modifications completed; low voltage power restored at approximately 14:45

– Data Center UPS & Batteries reconfiguration is about 90% complete 

– Cooling tower water loop being re-filled; pipe repairs nearing completion
– CRAC water loop drained; pipe repairs nearing completion

– Water treatment valve installation completed

– High voltage switch gear inspection and modifications in progress

Next status update  approximately 21:00


Status Update as of 08/20/2016:  12:30 – Low voltage switch gear modifications are running approximately 2 hours behind schedule, which will delay the start of the high voltage switch gear work.

The electrical and mechanical teams will continue to work throughout the day to complete the scheduled tasks.  We expect to be able to complete all critical tasks before restoring data center services by 17:00 on Sunday.


Major Tasks Completed or Underway:

– Low voltage switch gear modifications in progress – behind schedule

– Data Center UPS & Batteries reconfiguration in progress

– Cooling tower water loop drained; pipe repairs in progress
– CRAC water loop drained; pipe repairs in progress

– Water treatment valve installation in progress

Next status update  approximately 16:00


Status Update as of 08/20/2016:  08:15 – Data Center shutdown procedures have started, as scheduled.  A few application services are still in the process of being brought down but, as of now, are not impacting the scheduled tasks.


Major Tasks Completed or Underway:
– Low voltage switch gear modifications in progress
– PDU #1 decommissioned
– Data Center UPS & Batteries relocated; reconfiguration in progress
– CO2 Fire Suppression systems isolated
– HVAC services halted (CRACs, primary and secondary water pumps, cooling towers)
– Cooling tower water loop is draining
– Security and Life Safety ‘walk arounds’ in progress

Next status update @ 12:00 or sooner

Reminder:


The Hill Center complex, including the data center, must REMAIN vacated by non-essential personnel until normal building power is restored and construction activities have been completed.  Only authorized emergency, project management and construction personnel may be in the building for the duration of the shutdown.

Status Update as of 08/19/2016:  18:00 – The shutdown of data center services remains on schedule for Saturday beginning at 08:00.   Hosted applications impacted by this event will be unavailable by that time.

Based on our progress and scheduling, particularly for the inspection and certification work required for new and reconfigured infrastructure components, we will not be completing activities on Saturday but we are still on target to finish  on or before our Sunday deadline of 17:00.

Reminder:  


The Hill Center complex, including the data center, must be vacated by non-essential personnel no later than 08:10 and remain so until normal building power is restored and construction activities have been completed.  Only authorized emergency, project management and construction personnel may be in the building for the duration of the shutdown.  This is required for safety purposes and to facilitate the timely completion of the construction activities.  Signage has been posted on all egress doors as of 09:00 today.

Updates on progress,  expected completion of construction activities, and when service restoration can begin will be distributed via email and posted to the rutgersit.rutgers.edu site.  

Status Update as of 08/19/2016:  11:15 – The shutdown of data center services remains on schedule for Saturday beginning at 08:00.   Hosted applications impacted by this event will be unavailable by that time.

Note:  The Hill Center complex, including the data center, must be vacated by non-essential personnel no later than 08:10 and remain so until normal building power is restored and construction activities have been completed.  Only authorized emergency, project management and construction personnel may be in the building for the duration of the shutdown.  This is required for safety purposes and to facilitate the timely completion of the construction activities.  Signage has been posted on all egress doors as of 09:00 today.

Updates on progress,  expected completion of construction activities, and when service restoration can begin will be distributed via email and posted to the rutgersit.rutgers.edu site.

Ed Humphrey

Status Update as of 08/18/2016:  18:00 – The shutdown of data center services is on schedule for Saturday beginning at 08:00.   Hosted applications impacted by this event will be unavailable by that time.

Considerable progress has been made in preparation for the electrical and mechanical upgrade activities.  Fabrication work associated with planned repairs is on schedule as well.

This notice is being sent to several mailing lists and will also be posted at:

rutgersit.rutgers.edu and my.rutgers.edu


Status Update as of 08/16/2016:  15:30 – The shutdown of data center services is on schedule for Saturday beginning at 07:00.  Below is a summary of the major tasks to be performed during this upgrade and maintenance period:

  • High Voltage Switch Board Modifications
  • Low Voltage Switch Board Modifications
  • New Distribution Panels – Complete Configurations
    • For HV / LV compute power
    • For HVAC Motor Control
  • New Distribution Panels – Terminations with metering
  • New EPO configuration
  • Complete Relocation of Data Center UPS and Batteries
  • UPS / Battery By-Pass and Disconnect configuration and terminations
  • Existing HVAC pipe modifications / new heat exchanger
  • Existing Pipe Leak Repairs

Status Updates will be issued periodically throughout the remainder of the week and may be seen at:  https://rutgersit.rutgers.edu/Aug-20-21-hill-outage

Updates will also be sent to the Hill Data Center mailing lists.


The Office of Information Technology (OIT) will be performing the Hill Center Data Center upgrade, which will require electrical power in the Data Center to be shut down at various intervals from 7:00 AM on Saturday, August 20 to 5:00 PM on Sunday, August 21, 2016. This is required to accommodate greater research computing capacity and to improve reliability.

Please Note: If there are any issues with the Hill Center Data Center upgrade on August 20 and August 21, 2016, OIT will complete the Data Center upgrade the following weekend, 7:00 AM Saturday, August 27 until 5:00 PM on Sunday, August 28, 2016.

OIT is committed to improving the quality and reliability of IT services.  We apologize for any inconvenience this may cause.

This shut down involves the following:

  • Any Hill DC hosted equipment and services dependent on the electrical and mechanical systems will be out of service.
  • All data center hosted systems and services will need to be shut down before we disable the data center power distribution and HVAC services.
  • Once power is shut down, the building must be vacated and remain unoccupied, except for the construction team, during this event.   This includes the Network Operations Center (NOC) and Help Desk (Note: The Help Desk will still have the ability to receive calls during the shutdown).
  • There will be no power for elevators or life safety systems.
  • There will be manned fire watches in place for the duration.
  • Given the considerable activity in and around the data center, and electrical and mechanical rooms, access to the building will be limited to personnel directly involved with the construction and building security.  This will help avoid life safety issues and allow the construction team to complete the work in a timely manner.

The central services listed below will be unavailable from 7:00 AM on Saturday, August 20 to 5:00 PM on Sunday, August 21, 2016.

  • Eden Services
  • Mailman (ad hoc mass emailing service)
  • RAMS (Rutgers official mass emailing service)
  • RATS
  • CMS/Drupal (UCM website hosting service)
  • Certain Enterprise Client Services Hosted Applications
  • Sakai Course Management and Collaboration Services
  • Apps.rutgers.edu
  • RUMail Collaboration Services
  • RCI Email Reading (Webmail, IMAP, POP, Pine)
  • RCI Hosted Web Sites
  • RCI Hosted Virtual Mail domains
  • Other RCI Services
  • Nagios Service Monitoring

Some network services in the Hill Center will be unavailable.

Hosted Services for the following Academic and Administrative departments may be degraded or unavailable:

  • www.rutgers.edu
  • Climatology
  • Dining Services
  • Division of Continuing Studies
  • Enrollment Management
  • Finance – Procurement
  • Housing
  • Information Technology Services
  • Internal Auditing
  • Planning and Public Policy
  • School of Arts and Sciences (SAS)
  • School of Communications and Information (SCI)

Hosted Services for the following Research departments may be degraded or unavailable:

  • Chemistry / PDB / Proteomics
  • Office of Advanced Research Computing (OARC)
  • RDI2
  • Statistics

The following services are NOT affected by this upgrade activity on Saturday, August 20, 2016 and Sunday, August 21, 2016 (note: this is not an exhaustive list):

  • Athletics
  • EAS Services (payroll, registration and other administrative services)
  • EAS LDAP (authentication services)
  • EAS Kerberos
  • MyRutgers
  • Central TD services will be unaffected by this maintenance.
  • Internet services will not be affected.
  • VoIP/Aastra System
  • OIT Computer Lab logins and applications
  • RUWireless /RUWireless Secure
  • Pharos printing on all campuses.
  • Non-OIT Wireless Services
  • RBHS (UMDNJ Legacy) Services
  • ScarletMail (Google Apps for Education @ Rutgers)
  • RUConnect (Microsoft Office 365)
  • Shibboleth

The network uplinks for these buildings on Busch campus should NOT be affected:

  • BTN Networks for Busch Stadium and Yurcak Field
  • Busch Gas Pump Area
  • Busch Motor Pool
  • Busch Visitor Center
  • Electrical Engineering
  • Emergency Services 3569
  • Facilities Maintenance 3527
  • Facilities Maintenance 3530
  • Fire and Emergency Services Central Heating Plant
  • SERC
  • Serin Physics Laboratory
  • Sonny Werblin Recreation Center
  • Yurcak Soccer Field
  • Newark Campus network connectivity to New Brunswick
  • Newark Campus RUWireless Access Points

If you have any questions or concerns regarding this upgrade please contact:

Thank you,
Bill Lansbury
University Director
OIT, Enterprise Infrastructure