This incident has been resolved.
Dec 1, 2023, 6:16 PM UTC
A small percentage of Copilot Chat users are still experiencing long request times and errors. We are still investigating to determine the root cause.
Dec 1, 2023, 5:36 PM UTC
Some customers are experiencing higher latency for Copilot Chat. We are continuing our investigation.
Dec 1, 2023, 4:41 PM UTC
Copilot is experiencing degraded performance. We are continuing to investigate.
Dec 1, 2023, 3:55 PM UTC
We are investigating reports that some customers are experiencing increased latency and failed requests for Copilot Chat.
Dec 1, 2023, 3:53 PM UTC
We are currently investigating this issue.
Dec 1, 2023, 3:49 PM UTC
This incident has been resolved.
Nov 28, 2023, 7:40 PM UTC
We were not able to publish webhooks in response to push events triggered between 16:23 and 17:12 UTC. To avoid further disruption to customer workflows, we’ve decided not to continue our attempts to re-process those events.
Workflows that trigger on pull_request or git push events may not have been run during this time period.
You can run impacted workflows manually or by pushing a new commit to the same branch.
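As a rough aid for deciding which pushes to re-run, the sketch below checks whether a push timestamp falls inside the affected window of 16:23 to 17:12 UTC on November 28, 2023. The function name is hypothetical, and sourcing timestamps from your own logs is an assumption; this is not a GitHub-provided tool.

```python
from datetime import datetime, timezone

# Affected window taken from the incident update above (UTC).
WINDOW_START = datetime(2023, 11, 28, 16, 23, tzinfo=timezone.utc)
WINDOW_END = datetime(2023, 11, 28, 17, 12, tzinfo=timezone.utc)


def push_was_affected(pushed_at: datetime) -> bool:
    """Return True if a timezone-aware push time falls in the affected window."""
    if pushed_at.tzinfo is None:
        raise ValueError("pushed_at must be timezone-aware")
    return WINDOW_START <= pushed_at <= WINDOW_END
```

Pushes for which this returns True are candidates for a manual workflow run or a fresh commit on the same branch.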
Nov 28, 2023, 7:36 PM UTC
Customers saw push event deliveries for Actions and Webhooks fail between 16:23 and 17:12 (UTC). We fixed the issue, and we are working to re-process push events for the affected time period.
Nov 28, 2023, 6:09 PM UTC
We are currently investigating this issue.
Nov 28, 2023, 6:09 PM UTC
This incident has been resolved. An interaction between two feature flag rollouts caused us to suppress delivery of push webhooks between 16:23 and 17:11 UTC in a manner that evaded our existing observability. This affected 71,000 repositories whose users will have noted missing webhook deliveries and/or experienced Actions jobs with push triggers failing to start. During this period, around 25% of new Actions jobs were impacted. The issues were resolved by disabling one of the feature flags in question.
After weighing various options for retroactively dispatching old push webhooks, we concluded that the risk of delivering stale data to customers with these redeliveries outweighed the possible benefit.
As a follow-up to this incident, we are working to improve monitoring of webhook throughput and to document a policy around webhook redelivery timelines.
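One generic way to notice suppressed deliveries is to alert on volume rather than on errors, since a silent drop produces no failures to count. The sketch below is an illustration under that assumption, not a description of GitHub's monitoring; the function name, the per-window counts, and the 0.5 threshold are all hypothetical.

```python
from statistics import median


def throughput_dropped(history, current, min_ratio=0.5):
    """Flag the current window if its delivery count is far below the baseline.

    history: delivery counts from recent comparable windows.
    current: delivery count for the window being checked.
    """
    if not history:
        return False  # no baseline to compare against
    baseline = median(history)
    return current < baseline * min_ratio
```

The median keeps a single unusual window in the history from skewing the baseline.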
Nov 28, 2023, 5:59 PM UTC
Actions is operating normally.
Nov 28, 2023, 5:59 PM UTC
Customers saw pull request push event deliveries for Actions and Webhooks fail between 16:23 and 17:12 (UTC). We fixed the issue, and we are working to re-process push events for the affected time period.
Nov 28, 2023, 5:36 PM UTC
We are investigating reports of degraded performance for Actions and Webhooks.
Nov 28, 2023, 5:24 PM UTC
This incident has been resolved.
On November 27, 2023, at 18:46 UTC, we attempted to rotate our OpenID Connect (OIDC) authentication flow certificates. Due to an error in the certificate formatting, we uploaded an invalid certificate configuration that was not observed in our pre-production testing. Our background job servers were unable to start because a valid configuration is required at worker startup. As a result, users experienced delays in Pull Requests, Webhooks, Issues, Actions and Projects. Rollback of the change was slowed by the invalid certificate as our deployment system relied on the same certificate. Rollback was completed at 20:35 UTC. Most services recovered by 20:44 UTC.
Delayed updates to Issues and Pull Requests were applied normally once the changes were rolled back. After the change was rolled back, a large queue of Actions-related jobs built up, which included Pull Request, Pull Request review and Pull Request review comment events. About 2.3% of Actions jobs failed during the incident. Job queue times returned to normal once all remaining jobs were processed.
We are working to improve our certificate testing and rotation process to reduce the risk of customer-impacting errors.
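As an illustration of the kind of pre-upload formatting check that can catch a malformed certificate file, the sketch below verifies only the PEM armor: the header line, the footer line, and a strictly valid base64 body. It does not parse X.509 and is not GitHub's rotation tooling; the function name is hypothetical.

```python
import base64
import binascii

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_armor_is_well_formed(pem_text: str) -> bool:
    """Check PEM header, footer, and strict base64 body; no X.509 parsing."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    if len(lines) < 3 or lines[0] != PEM_HEADER or lines[-1] != PEM_FOOTER:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode("".join(lines[1:-1]), validate=True)
    except binascii.Error:
        return False
    return True
```

A check like this would flag a certificate whose line breaks were collapsed or whose body was corrupted, though a real pipeline would also need to load the certificate with the same library the workers use.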
Nov 27, 2023, 9:11 PM UTC
Webhooks is operating normally.
Nov 27, 2023, 8:44 PM UTC
Issues is operating normally.
Nov 27, 2023, 8:44 PM UTC
Pull Requests is operating normally.
Nov 27, 2023, 8:43 PM UTC
Actions customers are experiencing workflow start delays as part of the ongoing PRs incident. We are seeing previously delayed runs kick off and will continue to monitor.
Nov 27, 2023, 8:39 PM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
Nov 27, 2023, 8:30 PM UTC
Customers are also experiencing delays in webhook delivery and issue updates. We are seeing recovery and are continuing to monitor.
Nov 27, 2023, 8:22 PM UTC
Webhooks is experiencing degraded performance. We are continuing to investigate.
Nov 27, 2023, 8:16 PM UTC
Issues is experiencing degraded performance. We are continuing to investigate.
Nov 27, 2023, 8:16 PM UTC
Customers are seeing delays in pushed commits appearing on pull requests. We are currently investigating.
Nov 27, 2023, 7:46 PM UTC
We are investigating reports of degraded performance for Pull Requests.
Nov 27, 2023, 7:43 PM UTC
On November 21, 2023, at 09:50 UTC, GitHub Actions jobs encountered delays due to an incident in our background job service caused by excessive rebalancing in a Kafka consumer group. After a quick mitigation, we began to see recovery on the job queues by 10:02 UTC. During this window, 100% of Actions jobs were delayed in starting by up to 11 minutes.
Unfortunately, the rapid queue recovery sent a thundering herd of jobs to Actions hosted runner pools, causing a database deadlock that resulted in some hosted runner pools having increased latency when accepting new jobs. This affected only a small percentage of overall jobs, around 2%. Configuration changes resolved the issue; the system had fully recovered by 11:27 UTC, and all in-progress jobs were processed.
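A thundering herd is commonly softened by spreading the release of a backlog over a time window with random jitter, so that downstream pools do not receive every job at once. The sketch below shows that generic technique only; it is not the configuration change described above, and the function name and parameters are hypothetical.

```python
import random


def jittered_release_delays(job_count, window_seconds, seed=None):
    """Return one random delay per job, spread uniformly over the window.

    The delays are sorted so callers can schedule jobs in release order.
    """
    rng = random.Random(seed)
    return sorted(rng.uniform(0, window_seconds) for _ in range(job_count))
```

For example, `jittered_release_delays(1000, 300.0)` yields 1,000 delays between 0 and 300 seconds, turning a single burst into a roughly even flow over five minutes.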
The incident is now resolved.
Nov 21, 2023, 11:27 AM UTC
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
Nov 21, 2023, 11:12 AM UTC
We have recovery for the underlying issue but are waiting for Actions queues to catch up. We expect this to be completed in less than 1 hour.
Nov 21, 2023, 10:24 AM UTC
We are investigating reports of degraded performance for Actions.
Nov 21, 2023, 10:11 AM UTC
On 2023-11-15, from 09:44 to 10:42 UTC, some GitHub customers experienced increased latency or errors accessing repo data.
High concurrent access to a specific git object exposed a bug that forced a backend service to perform excessive calculations, overloading the service. Access to this repo was paused while load was rerouted, mitigating the problem.
The conditions that triggered the expensive operations have been identified and refactored.
Nov 15, 2023, 11:34 AM UTC
Error rates and performance have returned to normal.
Nov 15, 2023, 11:33 AM UTC
We have identified the source of the issue and have removed the additional load from the service. Sporadic delays in pull request experiences and intermittent 500s are still occurring and impacting a very small percentage of traffic. Next update is expected within 30 minutes.
Nov 15, 2023, 11:21 AM UTC
We are seeing connectivity issues between some of our systems and git backend services. This is causing intermittent error responses and delays in pull request experiences for a very small percentage of traffic. We are investigating mitigations and expect to provide another update within 30 minutes.
Nov 15, 2023, 11:04 AM UTC
We are currently investigating this issue.
Nov 15, 2023, 9:50 AM UTC
Between 20:35 and 21:38 UTC, we experienced up to a 20-minute delay delivering around 30,000 notifications due to side effects of some planned maintenance on supporting systems. We have noted the unexpected user impact of this type of maintenance and will address it in future maintenance planning.
Nov 13, 2023, 9:38 PM UTC
An issue related to notifications has been resolved. Users should again be seeing their notifications.
Nov 13, 2023, 9:38 PM UTC
We're seeing issues related to notifications.
Nov 13, 2023, 9:15 PM UTC
We are currently investigating this issue.
Nov 13, 2023, 9:13 PM UTC
On November 11, 2023, at 1:00 UTC, GitHub background jobs encountered delays lasting up to 50 minutes. This delay affected various services utilizing background jobs, including Actions, Webhooks, Pull Requests, and Pages. The impact persisted for approximately one hour until 2:10 UTC.
During the incident, some customers experienced delays in starting GitHub Actions workflow runs and Pages builds. We estimate that about 10% of Actions workflow runs were delayed during the impact window and 99% of Pages builds failed from 1:00 UTC to 1:20 UTC. Users may have experienced a delay in seeing recent pushes reflected in pull request views. This delay averaged between 5 and 10 minutes and affected up to 30% of pull request page views during the incident. 1% of pull request page views experienced delays of up to 60 minutes. Finally, 30% of webhook deliveries in this window missed our target of being delivered within 1 minute of the triggering event.
This incident was caused by excessive rebalancing in our Kafka consumer group that feeds our background job system. We have altered our Kafka configuration to reduce the likelihood of this issue, created diagnostic tools to identify future causes, and will be breaking up this relay into multiple groups to limit the blast radius if the problem does reoccur.
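For readers unfamiliar with the settings that influence consumer group rebalancing, the sketch below lists a few standard Kafka consumer properties as a plain Python dictionary. The property names are real Kafka consumer settings, but every value, the group name, and the instance id are assumptions for illustration; this is not the configuration GitHub changed.

```python
# Illustrative consumer settings only; all values here are assumptions.
consumer_config = {
    "group.id": "background-jobs",     # hypothetical group name
    # Static membership: a member that restarts within the session
    # timeout rejoins without triggering a rebalance.
    "group.instance.id": "worker-01",  # hypothetical instance id
    "session.timeout.ms": 45000,
    "heartbeat.interval.ms": 15000,
    "max.poll.interval.ms": 300000,
    # Incremental rebalancing. This is the librdkafka spelling; the Java
    # client takes the CooperativeStickyAssignor class name instead.
    "partition.assignment.strategy": "cooperative-sticky",
}


def heartbeat_within_guideline(config):
    """Check the usual guideline: heartbeat interval at most 1/3 of session timeout."""
    return config["heartbeat.interval.ms"] * 3 <= config["session.timeout.ms"]
```

A session timeout that is too short relative to processing pauses is one common source of repeated rebalances, which is why the heartbeat and session values are usually tuned together.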
Nov 11, 2023, 2:14 AM UTC
Pages is operating normally.
Nov 11, 2023, 2:14 AM UTC
Actions is operating normally.
Nov 11, 2023, 2:13 AM UTC
Rebalancing completed and job queues are improving. We continue to monitor for full recovery of Webhooks, Actions, and Pages workflows.
Nov 11, 2023, 1:53 AM UTC
Actions is experiencing degraded performance. We are continuing to investigate.
Nov 11, 2023, 1:42 AM UTC
Webhooks is experiencing degraded performance. We are continuing to investigate.
Nov 11, 2023, 1:41 AM UTC
Pages builds, webhooks, and other workflows were delayed starting at 1:00 UTC. We have failed over the service that was contributing to the delays and see successful processing. We are continuing to monitor for full recovery.
Nov 11, 2023, 1:40 AM UTC
We are investigating reports of degraded performance for Pages.
Nov 11, 2023, 1:26 AM UTC