
FD.io Jenkins Maintenance: 2019-12-10 1900 UTC to 2200 UTC

Vanessa Valderrama
 

What:
  • Jenkins
    • OS and security updates
    • Upgrade to 2.190.3
    • Plugin updates
  • Nexus
    • OS updates
  • Jira
    • OS updates
  • Gerrit
    • OS updates
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates
When:  2019-12-10 1900 UTC to 2200 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1800 UTC. Please let us know if specific jobs cannot be aborted.
The following systems will be unavailable during the maintenance window:
  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok
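
For readers who want to convert the window to their own time zone, here is a minimal Python sketch. It assumes the "YYYY-MM-DD HHMM UTC to HHMM UTC" format used in these announcements. The function name parse_window and the one-hour lead before shutdown mode are illustrative assumptions taken from the times quoted above, not an LF tool.

```python
from datetime import datetime, timedelta, timezone


def parse_window(text, lead=timedelta(hours=1)):
    """Parse 'YYYY-MM-DD HHMM UTC to HHMM UTC' into (shutdown, start, end), all in UTC.

    `lead` is an assumed gap between Jenkins entering shutdown mode and the
    start of the window (one hour in the announcement above).
    """
    date_part, start_part, _utc1, _to, end_part, _utc2 = text.split()
    start = datetime.strptime(f"{date_part} {start_part}", "%Y-%m-%d %H%M")
    end = datetime.strptime(f"{date_part} {end_part}", "%Y-%m-%d %H%M")
    start = start.replace(tzinfo=timezone.utc)
    end = end.replace(tzinfo=timezone.utc)
    if end <= start:
        # Window crosses midnight UTC, so the end time falls on the next day.
        end += timedelta(days=1)
    return start - lead, start, end


shutdown, start, end = parse_window("2019-12-10 1900 UTC to 2200 UTC")
print("Shutdown mode:", shutdown.isoformat())
print("Window start: ", start.isoformat())
print("Window end:   ", end.isoformat())
# Convert to the machine's local time zone for display.
print("Local start:  ", start.astimezone().isoformat())
```

Calling astimezone() with no argument converts to the local zone of the machine running the script.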


Re: FD.io Gerrit Restart Required

Vanessa Valderrama
 

Gerrit restart is complete. Jenkins has been taken out of shutdown mode.

Thank you,
Vanessa


On 11/6/19 2:30 PM, Vanessa Valderrama wrote:

What:

LF is setting up a local Gerrit mirror to resolve the Gerrit cloning timeout errors that have been causing intermittent job failures. We'll need to restart Gerrit for the replication settings to take effect.

Impact:

  • Jenkins will be placed in shutdown mode to allow verify and merge jobs to complete before the Gerrit restart
    • Jenkins will NOT be restarted and NO jobs will need to be terminated
  • Gerrit will be unavailable during the restart, for approximately 1 minute
Thank you,
Vanessa


FD.io Gerrit Restart Required

Vanessa Valderrama
 

What:

LF is setting up a local Gerrit mirror to resolve the Gerrit cloning timeout errors that have been causing intermittent job failures. We'll need to restart Gerrit for the replication settings to take effect.

Impact:

  • Jenkins will be placed in shutdown mode to allow verify and merge jobs to complete before the Gerrit restart
    • Jenkins will NOT be restarted and NO jobs will need to be terminated
  • Gerrit will be unavailable during the restart, for approximately 1 minute
Thank you,
Vanessa

---------------------------------------------------------------------------------------------------------------------------------------------

Status Update

Issue: Gateway Timeout Errors

  • Summary: Intermittent Gateway Timeout Errors on the ci-management-jjb-merge jobs are causing stability issues with Jenkins, resulting in unplanned downtime
    • We have put in a change to take Nginx out of the picture and allow the build node to talk directly to Jenkins
    • We'll be monitoring closely to ensure this resolves the issue

Issue: Gerrit cloning timeouts

  • Summary: Intermittent job failures caused by a timeout when cloning a Gerrit repo
    • We have opened a Vexxhost ticket for Vexxhost and Ed Kern to troubleshoot the latency within the network the Nomad cluster is on
    • We are also setting up a local Gerrit mirror which should help resolve/improve cloning - this should be complete by the end of the week

Issue: CSIT: s3-t21-sut1 (10.30.51.44) failure

  • Summary: The s3-t21-sut1 device is having an SSH disk read-only issue and is unreachable over the network
    • We've opened a Vexxhost ticket to check the machine

Issue: Hung jobs

  • Summary: Jobs intermittently get stuck/hung and have to be aborted
    • We believe this issue was resolved with the latest Jenkins upgrade

Please let me know if you need additional information. If you experience any hung jobs or gateway timeout errors, please open a ticket at support.linuxfoundation.org.


Re: [tsc] FD.io Production Jenkins Restart Required

Vanessa Valderrama
 

Status Update

Issue: Gateway Timeout Errors

  • Summary: Intermittent Gateway Timeout Errors on the ci-management-jjb-merge jobs are causing stability issues with Jenkins, resulting in unplanned downtime
    • We have put in a change to take Nginx out of the picture and allow the build node to talk directly to Jenkins
    • We'll be monitoring closely to ensure this resolves the issue

Issue: Gerrit cloning timeouts

  • Summary: Intermittent job failures caused by a timeout when cloning a Gerrit repo
    • We have opened a Vexxhost ticket for Vexxhost and Ed Kern to troubleshoot the latency within the network the Nomad cluster is on
    • We are also setting up a local Gerrit mirror which should help resolve/improve cloning - this should be complete by the end of the week

Issue: CSIT: s3-t21-sut1 (10.30.51.44) failure

  • Summary: The s3-t21-sut1 device is having an SSH disk read-only issue and is unreachable over the network
    • We've opened a Vexxhost ticket to check the machine

Issue: Hung jobs

  • Summary: Jobs intermittently get stuck/hung and have to be aborted
    • We believe this issue was resolved with the latest Jenkins upgrade

Please let me know if you need additional information. If you experience any hung jobs or gateway timeout errors, please open a ticket at support.linuxfoundation.org.

Thank you,
Vanessa

On 11/6/19 9:56 AM, Maciek Konstantynowicz (mkonstan) wrote:
Hi Vanessa, Thanks for the note. CSIT project keeps experiencing issues
due to Jenkins outages. Do you have ETA for the fix that will stop these
outages?

-Maciek

On 5 Nov 2019, at 23:18, Vanessa Valderrama <vvalderrama@...> wrote:

Jenkins has been restarted, job views restored, jobs are running.

We will continue to investigate the Gateway Timeout and JNLP errors
we've been seeing the last couple of days.

If you experience any issues, please open a ticket at
support.linuxfoundation.org

Thank you,
Vanessa


On 11/5/19 4:39 PM, Vanessa Valderrama wrote:
We continue having issues with Gateway Timeouts on the CI merge job which has
corrupted the Jenkins job views.

Jenkins will need to be restarted to resolve this issue.

Thank you,
Vanessa

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#1152): https://lists.fd.io/g/tsc/message/1152
Mute This Topic: https://lists.fd.io/mt/42686762/675185
Group Owner: tsc+owner@...
Unsubscribe: https://lists.fd.io/g/tsc/unsub  [mkonstan@...]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [tsc] FD.io Production Jenkins Restart Required

Maciek Konstantynowicz (mkonstan)
 

Hi Vanessa, Thanks for the note. CSIT project keeps experiencing issues
due to Jenkins outages. Do you have ETA for the fix that will stop these
outages?

-Maciek

On 5 Nov 2019, at 23:18, Vanessa Valderrama <vvalderrama@...> wrote:

Jenkins has been restarted, job views restored, jobs are running.

We will continue to investigate the Gateway Timeout and JNLP errors
we've been seeing the last couple of days.

If you experience any issues, please open a ticket at
support.linuxfoundation.org

Thank you,
Vanessa


On 11/5/19 4:39 PM, Vanessa Valderrama wrote:
We continue having issues with Gateway Timeouts on the CI merge job which has
corrupted the Jenkins job views.

Jenkins will need to be restarted to resolve this issue.

Thank you,
Vanessa


Re: FD.io Production Jenkins Restart Required

Vanessa Valderrama
 

Jenkins has been restarted, job views restored, jobs are running.

We will continue to investigate the Gateway Timeout and JNLP errors
we've been seeing the last couple of days.

If you experience any issues, please open a ticket at
support.linuxfoundation.org

Thank you,
Vanessa

On 11/5/19 4:39 PM, Vanessa Valderrama wrote:
We continue having issues with Gateway Timeouts on the CI merge job which has
corrupted the Jenkins job views.

Jenkins will need to be restarted to resolve this issue.

Thank you,
Vanessa


FD.io Production Jenkins Restart Required

Vanessa Valderrama
 

We continue having issues with Gateway Timeouts on the CI merge job which has
corrupted the Jenkins job views.

Jenkins will need to be restarted to resolve this issue.

Thank you,
Vanessa


Re: Jenkins Issue

Vanessa Valderrama
 

Jenkins is out of shutdown mode. Jobs are starting. If you experience
any issues please open a ticket at support.linuxfoundation.org.

Thank you,
Vanessa

On 11/4/19 4:33 PM, Vanessa Valderrama wrote:
Jenkins has been restarted.

We're pushing the jobs now.


On 11/4/19 2:22 PM, Vanessa Valderrama wrote:
We are having issues with Gateway Timeouts on the CI merge job which has
corrupted the Jenkins job views. We are aware of the issue and
will have it resolved shortly.

Jenkins will be placed in shutdown mode while we generate the jobs.

Thank you,
Vanessa


Re: Jenkins Issue

Vanessa Valderrama
 

Jenkins has been restarted.

We're pushing the jobs now.

On 11/4/19 2:22 PM, Vanessa Valderrama wrote:
We are having issues with Gateway Timeouts on the CI merge job which has
corrupted the Jenkins job views. We are aware of the issue and
will have it resolved shortly.

Jenkins will be placed in shutdown mode while we generate the jobs.

Thank you,
Vanessa


Jenkins Issue

Vanessa Valderrama
 

We are having issues with Gateway Timeouts on the CI merge job which has
corrupted the Jenkins job views. We are aware of the issue and
will have it resolved shortly.

Jenkins will be placed in shutdown mode while we generate the jobs.

Thank you,
Vanessa


Re: FD.io Maintenance: 2019-10-23 1700 UTC to 2100 UTC

Vanessa Valderrama
 

The migration went much quicker towards the end. Maintenance is complete. All systems are available. Please open a ticket at support.linuxfoundation.org if you experience any issues.

Thank you,
Anton & Vanessa

On 10/23/2019 03:56 PM, Vanessa Valderrama wrote:

We need to extend the maintenance window until 2300 to allow the volume migration to complete.


On 10/23/2019 11:10 AM, Vanessa Valderrama wrote:

Jenkins has been placed in shutdown mode in preparation for maintenance.


On 10/22/2019 09:35 AM, Vanessa Valderrama wrote:
Maintenance reminder
On Oct 17, 2019, at 1:04 PM, Vanessa Valderrama <vvalderrama@...> wrote:

What:

LF will be performing standard system maintenance and a Jenkins migration

  • Jenkins
    • Migrate to two new SSD volumes
    • OS updates
    • Jenkins upgrade to 2.190.1
    • Plugin upgrades
  • Nexus
    • OS updates
    • Nexus upgrade to 2.14.15-01
  • Jira
    • OS updates
    • Jira upgrade to 7.13.9
  • Gerrit
    • OS updates
    • Gerrit upgrade to 2.16.12
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When:

2019-10-23 1700 UTC to 2100 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1600 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:

  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok




Re: FD.io Maintenance: 2019-10-23 1700 UTC to 2100 UTC

Vanessa Valderrama
 

We need to extend the maintenance window until 2300 to allow the volume migration to complete.


On 10/23/2019 11:10 AM, Vanessa Valderrama wrote:

Jenkins has been placed in shutdown mode in preparation for maintenance.


On 10/22/2019 09:35 AM, Vanessa Valderrama wrote:
Maintenance reminder
On Oct 17, 2019, at 1:04 PM, Vanessa Valderrama <vvalderrama@...> wrote:

What:

LF will be performing standard system maintenance and a Jenkins migration

  • Jenkins
    • Migrate to two new SSD volumes
    • OS updates
    • Jenkins upgrade to 2.190.1
    • Plugin upgrades
  • Nexus
    • OS updates
    • Nexus upgrade to 2.14.15-01
  • Jira
    • OS updates
    • Jira upgrade to 7.13.9
  • Gerrit
    • OS updates
    • Gerrit upgrade to 2.16.12
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When:

2019-10-23 1700 UTC to 2100 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1600 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:

  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok



Re: FD.io Maintenance: 2019-10-23 1700 UTC to 2100 UTC

Vanessa Valderrama
 

Jenkins has been placed in shutdown mode in preparation for maintenance.


On 10/22/2019 09:35 AM, Vanessa Valderrama wrote:
Maintenance reminder
On Oct 17, 2019, at 1:04 PM, Vanessa Valderrama <vvalderrama@...> wrote:

What:

LF will be performing standard system maintenance and a Jenkins migration

  • Jenkins
    • Migrate to two new SSD volumes
    • OS updates
    • Jenkins upgrade to 2.190.1
    • Plugin upgrades
  • Nexus
    • OS updates
    • Nexus upgrade to 2.14.15-01
  • Jira
    • OS updates
    • Jira upgrade to 7.13.9
  • Gerrit
    • OS updates
    • Gerrit upgrade to 2.16.12
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When:

2019-10-23 1700 UTC to 2100 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1600 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:

  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok


Re: FD.io Maintenance: 2019-10-23 1700 UTC to 2100 UTC

Vanessa Valderrama
 

Maintenance reminder

On Oct 17, 2019, at 1:04 PM, Vanessa Valderrama <vvalderrama@...> wrote:

What:

LF will be performing standard system maintenance and a Jenkins migration

  • Jenkins
    • Migrate to two new SSD volumes
    • OS updates
    • Jenkins upgrade to 2.190.1
    • Plugin upgrades
  • Nexus
    • OS updates
    • Nexus upgrade to 2.14.15-01
  • Jira
    • OS updates
    • Jira upgrade to 7.13.9
  • Gerrit
    • OS updates
    • Gerrit upgrade to 2.16.12
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When:

2019-10-23 1700 UTC to 2100 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1600 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:

  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok


FD.io Nexus Maintenance: 2019-10-23 1700 UTC to 2100 UTC

Vanessa Valderrama
 

What:

LF will be performing standard system maintenance and a Jenkins migration

  • Jenkins
    • Migrate to two new SSD volumes
    • OS updates
    • Jenkins upgrade to 2.190.1
    • Plugin upgrades
  • Nexus
    • OS updates
    • Nexus upgrade to 2.14.15-01
  • Jira
    • OS updates
    • Jira upgrade to 7.13.9
  • Gerrit
    • OS updates
    • Gerrit upgrade to 2.16.12
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When:

2019-10-23 1700 UTC to 2100 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1600 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:

  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok


FD.io Jenkins Issues

Vanessa Valderrama
 

We experienced issues with the Nomad containers this morning and we're
now experiencing issues with Jenkins due to the queue size.

Ed Kern and I are working to get this resolved as quickly as possible.

Thank you,
Vanessa


Re: FD.io Nexus Maintenance: 2019-09-25 1700 UTC to 2100 UTC

Vanessa Valderrama
 

Nexus and Jenkins are back up. Jobs are running. Please report any issues to support.linuxfoundation.org.

Thank you,
Vanessa

On 09/25/2019 12:07 PM, Vanessa Valderrama wrote:

Starting maintenance


On 09/25/2019 11:02 AM, Vanessa Valderrama wrote:

Jenkins has been placed in shutdown mode to prepare for maintenance.


On 09/23/2019 04:08 PM, Vanessa Valderrama wrote:

What:

LF will be performing maintenance on the Nexus server to migrate data to two new SSD volumes in an attempt to resolve the intermittent issue with hung jobs we are experiencing

  • Migrate Nexus data to new volumes
When:
2019-09-25 1700 UTC to 2100 UTC

Impact:
We will place Jenkins in shutdown mode at 1600 UTC. Jobs will be aborted at 1700 UTC
Jenkins and Nexus will be unavailable during this time

You can subscribe to status updates here:
https://fdio.statuspage.io/incidents/f5tfpfdyd8l0






Re: FD.io Nexus Maintenance: 2019-09-25 1700 UTC to 2100 UTC

Vanessa Valderrama
 

Starting maintenance


On 09/25/2019 11:02 AM, Vanessa Valderrama wrote:

Jenkins has been placed in shutdown mode to prepare for maintenance.


On 09/23/2019 04:08 PM, Vanessa Valderrama wrote:

What:

LF will be performing maintenance on the Nexus server to migrate data to two new SSD volumes in an attempt to resolve the intermittent issue with hung jobs we are experiencing

  • Migrate Nexus data to new volumes
When:
2019-09-25 1700 UTC to 2100 UTC

Impact:
We will place Jenkins in shutdown mode at 1600 UTC. Jobs will be aborted at 1700 UTC
Jenkins and Nexus will be unavailable during this time

You can subscribe to status updates here:
https://fdio.statuspage.io/incidents/f5tfpfdyd8l0





Re: FD.io Nexus Maintenance: 2019-09-25 1700 UTC to 2100 UTC

Vanessa Valderrama
 

Jenkins has been placed in shutdown mode to prepare for maintenance.


On 09/23/2019 04:08 PM, Vanessa Valderrama wrote:

What:

LF will be performing maintenance on the Nexus server to migrate data to two new SSD volumes in an attempt to resolve the intermittent issue with hung jobs we are experiencing

  • Migrate Nexus data to new volumes
When:
2019-09-25 1700 UTC to 2100 UTC

Impact:
We will place Jenkins in shutdown mode at 1600 UTC. Jobs will be aborted at 1700 UTC
Jenkins and Nexus will be unavailable during this time

You can subscribe to status updates here:
https://fdio.statuspage.io/incidents/f5tfpfdyd8l0




Re: FD.io Nexus Maintenance: 2019-09-25 1700 UTC to 2100 UTC

Vanessa Valderrama
 

Maintenance Reminder


On 09/23/2019 04:08 PM, Vanessa Valderrama wrote:

What:

LF will be performing maintenance on the Nexus server to migrate data to two new SSD volumes in an attempt to resolve the intermittent issue with hung jobs we are experiencing

  • Migrate Nexus data to new volumes
When:
2019-09-25 1700 UTC to 2100 UTC

Impact:
We will place Jenkins in shutdown mode at 1600 UTC. Jobs will be aborted at 1700 UTC
Jenkins and Nexus will be unavailable during this time

You can subscribe to status updates here:
https://fdio.statuspage.io/incidents/f5tfpfdyd8l0