Re: FD.io Production Jenkins - Restart Required

Vanessa Valderrama
 

Jenkins is back up. Jobs have started. Ed and I are still trying to
resolve an issue with a few of the CSIT hourly jobs. Please open a
ticket at support.linuxfoundation.org if you have any issues.

Thank you,
Vanessa

On 12/16/19 2:10 PM, Vanessa Valderrama wrote:
We're also making some Nomad changes, which are causing a slight delay
in the restart. We should be done within about 30 minutes.

Thank you,

Vanessa


On 12/16/19 12:54 PM, Vanessa Valderrama wrote:
We've placed Jenkins in shutdown mode to allow for a restart. We need to
uninstall the OpenStack Single-Use Slave plugin. The plugin is no longer
required and we believe it's causing an issue that prevents the
OpenStack slaves from being removed in Jenkins.

We'll do the restart at 20:00 UTC.

The restart will take less than 5 minutes.

Thank you,
Vanessa


Re: FD.io Production Jenkins - Restart Required

Vanessa Valderrama
 

We're also making some Nomad changes, which are causing a slight delay
in the restart. We should be done within about 30 minutes.

Thank you,

Vanessa


FD.io Production Jenkins - Restart Required

Vanessa Valderrama
 

We've placed Jenkins in shutdown mode to allow for a restart. We need to
uninstall the OpenStack Single-Use Slave plugin. The plugin is no longer
required and we believe it's causing an issue that prevents the
OpenStack slaves from being removed in Jenkins.

We'll do the restart at 20:00 UTC.

The restart will take less than 5 minutes.

Thank you,
Vanessa


FD.io - Network Issues

Vanessa Valderrama
 

There appears to be a network routing issue with an upstream provider at our PDX data center. This affects the availability of some core infrastructure: the Kernel.org and PDX mirrors are affected, as well as some CAF properties (wiki) and other services.

We have a workaround in place to restore service to OpenStack builders in the Vexxhost infrastructure.

FD.io systems affected

  • git.fd.io
  • FD.io lab machines

The router is still showing as down after being rebooted. Vexxhost is working to resolve the issue.

For updates, please check
https://status.linuxfoundation.org

Thank you,
Vanessa


Re: FD.io Jenkins Maintenance: 2019-12-10 1900 UTC to 2200 UTC

Vanessa Valderrama
 

Maintenance is complete. All systems are available. Please open a ticket at support.linuxfoundation.org if you experience any issues.

Thank you,
Anton & Vanessa


On 12/10/19 1:04 PM, Vanessa Valderrama wrote:

Starting maintenance

On 12/10/19 7:15 AM, Vanessa Valderrama wrote:

Jenkins sandbox maintenance is complete. Jenkins production will be shut down at 1800 UTC in preparation for maintenance.

Thanks,
Vanessa


On 12/3/19 9:57 AM, Vanessa Valderrama wrote:

What:
  • Jenkins
    • OS and security updates
    • Upgrade to 2.190.3
    • Plugin updates
  • Nexus
    • OS updates
  • Jira
    • OS updates
  • Gerrit
    • OS updates
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When: 2019-12-10 1900 UTC to 2200 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1800 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:
  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok


Re: FD.io Jenkins Maintenance: 2019-12-10 1900 UTC to 2200 UTC

Vanessa Valderrama
 

Starting maintenance


Re: FD.io Jenkins Maintenance: 2019-12-10 1900 UTC to 2200 UTC

Vanessa Valderrama
 

Jenkins sandbox maintenance is complete. Jenkins production will be shut down at 1800 UTC in preparation for maintenance.

Thanks,
Vanessa


Re: FD.io Jenkins Maintenance: 2019-12-10 1900 UTC to 2200 UTC

Vanessa Valderrama
 

Maintenance reminder


FD.io Jenkins Maintenance: 2019-12-10 1900 UTC to 2200 UTC

Vanessa Valderrama
 

What:
  • Jenkins
    • OS and security updates
    • Upgrade to 2.190.3
    • Plugin updates
  • Nexus
    • OS updates
  • Jira
    • OS updates
  • Gerrit
    • OS updates
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When: 2019-12-10 1900 UTC to 2200 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1800 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:
  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok


Re: FD.io Gerrit Restart Required

Vanessa Valderrama
 

Gerrit restart is complete. Jenkins has been taken out of shutdown mode.

Thank you,
Vanessa


On 11/6/19 2:30 PM, Vanessa Valderrama wrote:

What:

LF is setting up a local Gerrit mirror to resolve the Gerrit cloning timeout errors that have been causing intermittent job failures. We'll need to restart Gerrit for the replication settings to take effect.

Impact:

  • Jenkins will be placed in shutdown mode to allow verify and merge jobs to complete before the Gerrit restart
    • Jenkins will NOT be restarted and NO jobs will need to be terminated
  • Gerrit will be unavailable for approximately 1 minute during the restart

Thank you,
Vanessa


FD.io Gerrit Restart Required

Vanessa Valderrama
 

What:

LF is setting up a local Gerrit mirror to resolve the Gerrit cloning timeout errors that have been causing intermittent job failures. We'll need to restart Gerrit for the replication settings to take effect.

Impact:

  • Jenkins will be placed in shutdown mode to allow verify and merge jobs to complete before the Gerrit restart
    • Jenkins will NOT be restarted and NO jobs will need to be terminated
  • Gerrit will be unavailable for approximately 1 minute during the restart

Thank you,
Vanessa
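
Shutdown mode here is Jenkins' quiet-down state: builds that are already running (such as verify and merge jobs) are allowed to finish, but no new builds start until the mode is cancelled. As a minimal sketch, assuming the instance exposes the standard `/api/json` endpoint with its top-level `quietingDown` flag and allows anonymous read access, a script could check this state before queuing work. The helper names below are hypothetical and are not part of any LF tooling.

```python
import json
import urllib.request

# Assumption: the Jenkins JSON API exposes a top-level "quietingDown" flag,
# which is true while Jenkins is in shutdown (quiet-down) mode.
API_PATH = "/api/json?tree=quietingDown"


def parse_quieting_down(payload: str) -> bool:
    """Return True if a Jenkins /api/json payload reports shutdown mode."""
    data = json.loads(payload)
    # Treat a missing field as "not in shutdown mode" rather than failing.
    return bool(data.get("quietingDown", False))


def is_in_shutdown_mode(base_url: str, timeout: float = 10.0) -> bool:
    """Query a Jenkins instance (hypothetical base_url) for shutdown mode."""
    url = base_url.rstrip("/") + API_PATH
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_quieting_down(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline example using a sample payload instead of a live server.
    sample = '{"_class": "hudson.model.Hudson", "quietingDown": true}'
    print(parse_quieting_down(sample))  # True
```

For example, `is_in_shutdown_mode("https://jenkins.fd.io")` would query the production instance; that URL is an assumption here, and an instance that requires authentication would need credentials added to the request.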


Re: [tsc] FD.io Production Jenkins Restart Required

Vanessa Valderrama
 

Status Update

Issue: Gateway Timeout Errors

  • Summary: Intermittent Gateway Timeout Errors on the ci-management-jjb-merge jobs are causing Jenkins stability issues, resulting in unplanned downtime
    • We have put in a change to take Nginx out of the picture and allow the build node to talk directly to Jenkins
    • We'll be monitoring closely to ensure this resolves the issue

Issue: Gerrit cloning timeouts

  • Summary: Intermittent job failures caused by a timeout when cloning a Gerrit repo
    • We have opened a Vexxhost ticket so that Vexxhost and Ed Kern can troubleshoot the latency within the network the Nomad cluster is on
    • We are also setting up a local Gerrit mirror, which should help resolve the cloning timeouts or at least improve cloning; this should be complete by the end of the week

Issue: CSIT: s3-t21-sut1 (10.30.51.44) failure

  • Summary: The s3-t21-sut1 device is having a read-only disk issue over SSH and is unreachable over the network
    • We've opened a Vexxhost ticket to check the machine

Issue: Hung jobs

  • Summary: Jobs intermittently get stuck (hang) and need to be aborted
    • We believe this issue was resolved with the latest Jenkins upgrade

Please let me know if you need additional information. If you experience any hung jobs or gateway timeout errors, please open a ticket at support.linuxfoundation.org.

Thank you,
Vanessa

On 11/6/19 9:56 AM, Maciek Konstantynowicz (mkonstan) wrote:
Hi Vanessa, Thanks for the note. The CSIT project keeps experiencing issues
due to Jenkins outages. Do you have an ETA for the fix that will stop these
outages?

-Maciek

On 5 Nov 2019, at 23:18, Vanessa Valderrama <vvalderrama@...> wrote:

Jenkins has been restarted, job views restored, jobs are running.

We will continue to investigate the Gateway Timeout and JNLP errors
we've been seeing the last couple of days.

If you experience any issues, please open a ticket at
support.linuxfoundation.org

Thank you,
Vanessa


On 11/5/19 4:39 PM, Vanessa Valderrama wrote:
We continue to have issues with Gateway Timeouts on the CI merge job, which have
corrupted the Jenkins job views.

Jenkins will need to be restarted to resolve this issue.

Thank you,
Vanessa

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#1152): https://lists.fd.io/g/tsc/message/1152
Mute This Topic: https://lists.fd.io/mt/42686762/675185
Group Owner: tsc+owner@...
Unsubscribe: https://lists.fd.io/g/tsc/unsub  [mkonstan@...]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [tsc] FD.io Production Jenkins Restart Required

Maciek Konstantynowicz (mkonstan)
 

Hi Vanessa, Thanks for the note. The CSIT project keeps experiencing issues
due to Jenkins outages. Do you have an ETA for the fix that will stop these
outages?

-Maciek


Re: FD.io Production Jenkins Restart Required

Vanessa Valderrama
 

Jenkins has been restarted, job views restored, jobs are running.

We will continue to investigate the Gateway Timeout and JNLP errors
we've been seeing the last couple of days.

If you experience any issues, please open a ticket at
support.linuxfoundation.org

Thank you,
Vanessa


FD.io Production Jenkins Restart Required

Vanessa Valderrama
 

We continue to have issues with Gateway Timeouts on the CI merge job, which have
corrupted the Jenkins job views.

Jenkins will need to be restarted to resolve this issue.

Thank you,
Vanessa


Re: Jenkins Issue

Vanessa Valderrama
 

Jenkins is out of shutdown mode. Jobs are starting. If you experience
any issues please open a ticket at support.linuxfoundation.org.

Thank you,
Vanessa

On 11/4/19 4:33 PM, Vanessa Valderrama wrote:
Jenkins has been restarted.

We're pushing the jobs now.


On 11/4/19 2:22 PM, Vanessa Valderrama wrote:
We are having issues with Gateway Timeouts on the CI merge job, which have
corrupted the Jenkins job views. We are aware of the issue and
will have it resolved shortly.

Jenkins will be placed in shutdown mode while we generate the jobs.

Thank you,
Vanessa


Re: Jenkins Issue

Vanessa Valderrama
 

Jenkins has been restarted.

We're pushing the jobs now.


Jenkins Issue

Vanessa Valderrama
 

We are having issues with Gateway Timeouts on the CI merge job, which have
corrupted the Jenkins job views. We are aware of the issue and
will have it resolved shortly.

Jenkins will be placed in shutdown mode while we generate the jobs.

Thank you,
Vanessa


Re: FD.io Maintenance: 2019-10-23 1700 UTC to 2100 UTC

Vanessa Valderrama
 

The migration went much quicker towards the end. Maintenance is complete. All systems are available. Please open a ticket at support.linuxfoundation.org if you experience any issues.

Thank you,
Anton & Vanessa

On 10/23/2019 03:56 PM, Vanessa Valderrama wrote:

We need to extend the maintenance window until 2300 UTC to allow the volume migration to complete.


On 10/23/2019 11:10 AM, Vanessa Valderrama wrote:

Jenkins has been placed in shutdown mode in preparation for maintenance.


On 10/22/2019 09:35 AM, Vanessa Valderrama wrote:
Maintenance reminder
On Oct 17, 2019, at 1:04 PM, Vanessa Valderrama <vvalderrama@...> wrote:

What:

LF will be performing standard system maintenance and a Jenkins migration

  • Jenkins
    • Migrate to two new SSD volumes
    • OS updates
    • Jenkins upgrade to 2.190.1
    • Plugin upgrades
  • Nexus
    • OS updates
    • Nexus upgrade to 2.14.15-01
  • Jira
    • OS updates
    • Jira upgrade to 7.13.9
  • Gerrit
    • OS updates
    • Gerrit upgrade to 2.16.12
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates

When:

2019-10-23 1700 UTC to 2100 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1600 UTC. Please let us know if specific jobs cannot be aborted.

The following systems will be unavailable during the maintenance window:

  • Jenkins sandbox
  • Jenkins production
  • Nexus
  • Jira
  • Gerrit
  • Sonar
  • OpenGrok




Re: FD.io Maintenance: 2019-10-23 1700 UTC to 2100 UTC

Vanessa Valderrama
 

We need to extend the maintenance window until 2300 UTC to allow the volume migration to complete.

