good

This incident has been resolved.
Apr 26, 2024, 4:49 PM UTC

minor

The issue appears to be limited in scope to a few internal users, with no reports of issues from outside GitHub. We are adding logging to our WebAuthn flow to detect this in the future. If you cannot use your mobile passkey to sign in, please contact support or reach out to us here.
Apr 26, 2024, 4:49 PM UTC

minor

Signing in to GitHub.com with a passkey from a mobile device is currently failing. Users may see an error message saying that passkey sign-in failed, or may not see any passkeys available after signing in with their password.
This issue affects both GitHub.com on mobile devices and cross-device authentication, where the phone's passkey is used to authenticate in a desktop browser.
To work around this issue, sign in with your password and the 2FA method you set up before adding your passkey, either TOTP or SMS.
Apr 26, 2024, 3:10 PM UTC

minor

We are currently investigating this issue.
Apr 26, 2024, 3:10 PM UTC

good

This incident has been resolved.
Apr 24, 2024, 6:20 PM UTC

minor

The previous mitigation has been rolled back and updates to the pull request merge button should be working again. If you are still seeing issues, please attempt refreshing the pull request page.
Apr 24, 2024, 6:01 PM UTC

minor

One of our mitigations from the previous incident disabled live updates to the pull request merge button for some customers. Refreshing the page will update the mergeability status.
Apr 24, 2024, 5:45 PM UTC

minor

We are investigating reports of degraded performance for Pull Requests
Apr 24, 2024, 5:40 PM UTC

good

This incident has been resolved.
Apr 24, 2024, 4:16 PM UTC

minor

Issues is operating normally.
Apr 24, 2024, 4:13 PM UTC

minor

Actions is operating normally.
Apr 24, 2024, 4:13 PM UTC

minor

Pull Requests is operating normally.
Apr 24, 2024, 4:13 PM UTC

minor

Webhooks is operating normally.
Apr 24, 2024, 4:13 PM UTC

minor

Git Operations is operating normally.
Apr 24, 2024, 4:13 PM UTC

minor

API Requests is operating normally.
Apr 24, 2024, 4:12 PM UTC

minor

We are seeing site-wide recovery, but we continue to closely monitor our systems and are putting additional mitigations in place to ensure we are back to full health.
Apr 24, 2024, 3:50 PM UTC

minor

We are continuing to see consistent impact, and we’re continuing to work on multiple mitigations to reduce load on our systems.
Apr 24, 2024, 2:08 PM UTC

minor

We have found an issue that may be adding load to the website and are working on mitigations. We are not seeing any additional impact at this time. We will provide another update within an hour if this investigation leads to improvements or a full mitigation.
Apr 24, 2024, 12:47 PM UTC

minor

We have applied some mitigations and now see fewer than 0.3 percent of requests failing site-wide. Because 500 errors remain elevated, we will keep this incident open and continue investigating until we are confident the error rate has returned to baseline.
Apr 24, 2024, 12:00 PM UTC

minor

We are seeing increased 500 errors for various GraphQL and REST APIs related to database issues. Some users may see periodic 500 errors. The team is looking into the problematic queries and mitigations now.
Apr 24, 2024, 11:13 AM UTC

minor

Actions is experiencing degraded performance. We are continuing to investigate.
Apr 24, 2024, 11:09 AM UTC

minor

Git Operations is experiencing degraded performance. We are continuing to investigate.
Apr 24, 2024, 11:06 AM UTC

minor

Pull Requests is experiencing degraded performance. We are continuing to investigate.
Apr 24, 2024, 10:55 AM UTC

minor

Webhooks is experiencing degraded performance. We are continuing to investigate.
Apr 24, 2024, 10:52 AM UTC

minor

Issues is experiencing degraded performance. We are continuing to investigate.
Apr 24, 2024, 10:51 AM UTC

minor

We are investigating reports of degraded performance for API Requests
Apr 24, 2024, 10:45 AM UTC

good

This incident has been resolved.
Apr 24, 2024, 11:01 AM UTC

minor

We are investigating reports of degraded performance for Git Operations
Apr 24, 2024, 10:56 AM UTC

good

This incident has been resolved.
Apr 18, 2024, 6:47 PM UTC

minor

Codespaces customers using our 16-core machines in the West US 2 and West US 3 regions may experience issues creating new Codespaces and resuming existing Codespaces. We suggest that customers experiencing issues switch to the East US region.
Apr 18, 2024, 6:41 PM UTC

minor

We are investigating reports of degraded performance for Codespaces
Apr 18, 2024, 6:25 PM UTC

good

On April 16th, 2024, between 22:31 UTC and 00:11 UTC on April 17th, Copilot Chat users experienced elevated request errors. The error rate averaged 1.2% and peaked at 5.2%. This was caused by a rolling application upgrade applied to a backend system during a maintenance event.

The incident was resolved once the rolling upgrade was completed.

We are working to improve monitoring and alerting of our services, increase resilience to failures, and better coordinate maintenance events, so that we can reduce our time to detect and mitigate issues like this in the future.
Apr 17, 2024, 12:48 AM UTC

minor

We're continuing to investigate issues with Copilot
Apr 17, 2024, 12:30 AM UTC

minor

Copilot is experiencing degraded performance. We are continuing to investigate.
Apr 16, 2024, 11:59 PM UTC

minor

We're investigating issues with Copilot availability
Apr 16, 2024, 11:57 PM UTC

minor

We are currently investigating this issue.
Apr 16, 2024, 11:51 PM UTC

good

Between April 15th, 2024 (09:45 UTC) and April 18th, 2024 (19:10 UTC), Copilot completions experienced intermittent periods of degraded service availability affecting portions of Europe and North America, impacting 3.5% of users globally at its peak. This was due to rolling Operating System level maintenance updates to Copilot infrastructure within those regions, which failed to gracefully restart as intended.

The incident was mitigated by routing traffic to other regions, and was resolved once the update was completed and normal traffic routing was restored.

We are working to resolve the root issue that prevented systems from restarting gracefully, as well as improving our coordination and monitoring around backend maintenance operations going forward to reduce time to recovery from such issues in the future.
Apr 15, 2024, 2:53 PM UTC

minor

We have applied mitigation for Copilot in EU region and are working towards the full recovery of the service.
Apr 15, 2024, 2:13 PM UTC

minor

Due to an outage in one Copilot region, traffic is currently being served from other regions. European users may experience higher response times.
Apr 15, 2024, 1:35 PM UTC

minor

We are investigating reports of degraded performance for Copilot
Apr 15, 2024, 12:58 PM UTC

good

Beginning at 17:30 UTC on April 11th and lasting until 20:30 UTC on April 14th, GitHub.com saw significant delays (up to 2 hours) in delivering emails. At 14:21 UTC on April 14th, community reports of this were confirmed and an incident was declared. The emails most affected by the delay were password reset and unrecognized device verification emails. These contain time-sensitive links or verification codes that must be acted on for password resets or unrecognized sign-ins to proceed.

Users attempting to reset their password during the incident were unable to complete the reset. Users without two-factor authentication (2FA), signing in on an unrecognized device, were unable to complete device verification. Enterprise Managed Users, users with 2FA, and users on recognized devices or IP addresses were still able to sign in. This impacted 800-1000 user device verifications and 300-400 password resets.

The mailer delays were caused by increased usage of a shared resource pool; a separate internal job queue became unhealthy and prevented the mailer queue from being worked on.

We have made some immediate improvements to better detect and respond to this type of situation. As a short-term mitigation, we have added the ability to bypass the queue for time-sensitive emails, such as password reset and unrecognized device verification. We can enable this setting if we observe email delays recurring, which will ensure that future incidents do not affect users' ability to complete critical login flows. We have paused the unhealthy job queue to prevent impact to other queues that use the shared resources. We have also updated our methods for detecting anomalous email delivery so that we can identify this issue sooner.
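The queue-bypass mitigation described above can be illustrated with a minimal Python sketch. It assumes a single in-memory queue and a boolean bypass flag. The `Mailer` class, the `TIME_SENSITIVE` set, and the email type names are all hypothetical and do not reflect GitHub's actual mailer.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical names for the time-sensitive email types described in the incident.
TIME_SENSITIVE = {"password_reset", "device_verification"}


@dataclass
class Mailer:
    # Hypothetical flag an operator would enable when email delays are observed.
    bypass_enabled: bool = False
    queue: deque = field(default_factory=deque)
    sent: list = field(default_factory=list)

    def submit(self, kind: str, recipient: str) -> str:
        """Deliver time-sensitive mail immediately when bypass is on; queue everything else."""
        if self.bypass_enabled and kind in TIME_SENSITIVE:
            self._deliver(kind, recipient)
            return "sent"
        self.queue.append((kind, recipient))
        return "queued"

    def drain(self) -> None:
        """Work the queue in FIFO order (stands in for a background queue worker)."""
        while self.queue:
            self._deliver(*self.queue.popleft())

    def _deliver(self, kind: str, recipient: str) -> None:
        # Placeholder for the real delivery step, e.g. a handoff to an SMTP relay.
        self.sent.append((kind, recipient))


mailer = Mailer()
print(mailer.submit("password_reset", "alice@example.com"))  # queued
mailer.bypass_enabled = True
print(mailer.submit("password_reset", "bob@example.com"))  # sent
print(mailer.submit("newsletter", "carol@example.com"))  # queued
```

In this sketch the bypass is opt-in. Under normal conditions every email shares the queue. Only when the flag is switched on do the critical sign-in flows skip a stalled queue, while all other mail keeps waiting for the worker.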
Apr 14, 2024, 9:53 PM UTC

major

We are seeing a full recovery. Device verification and password reset emails are delivered on time.
Apr 14, 2024, 9:52 PM UTC

major

We are deploying a possible mitigation to the delayed device verification and password change emails.
Apr 14, 2024, 9:34 PM UTC

major

We continue to investigate delays in email delivery, which are preventing users without 2FA enabled from verifying new devices. We will provide more information as it becomes available.
Apr 14, 2024, 7:54 PM UTC

major

We are continuing to investigate issues with the delivery of device verification emails for users without 2FA.
Apr 14, 2024, 3:50 PM UTC

major

We are continuing to investigate issues with the delivery of device verification emails for users without 2FA on new devices.
Apr 14, 2024, 3:01 PM UTC

major

Device verification emails for sign-ins for users without 2FA on new devices are being sent late or not at all. This is blocking successful sign-ins for these users. We are investigating.
Apr 14, 2024, 2:27 PM UTC

major

We are currently investigating this issue.
Apr 14, 2024, 2:21 PM UTC

major

On April 10, 2024, between 18:33 UTC and 19:03 UTC, several services were degraded due to the release of a compute-intensive database query that prevented a key database cluster from serving other queries.

GitHub Actions saw delays and failures across the entire run life cycle and a significant increase in API request timeouts. All Pages deployments failed for the duration of the incident. Git Systems saw approximately 12% of raw file download requests and 16% of repository archive download requests return HTTP 5xx errors for the duration of the incident. Issues experienced increased latency for issue creation and updates. Codespaces saw roughly 500 requests to create or resume a Codespace time out during the incident.

We mitigated the incident by rolling back the offending query. We are working to introduce measures to automatically detect compute-intensive queries in test runs during CI to prevent an issue like this one from recurring.
Apr 10, 2024, 7:03 PM UTC

major

Git Operations, API Requests, Actions, Pages, Issues and Copilot are operating normally.
Apr 10, 2024, 7:03 PM UTC

good

This incident has been resolved.
Apr 10, 2024, 7:03 PM UTC

major

Copilot is experiencing degraded performance. We are continuing to investigate.
Apr 10, 2024, 7:01 PM UTC

major

We're aware of issues impacting multiple services and have rolled back the deployment. Systems appear to be recovering and we will continue to monitor.
Apr 10, 2024, 6:55 PM UTC

major

API Requests is experiencing degraded performance. We are continuing to investigate.
Apr 10, 2024, 6:53 PM UTC

major

Copilot is experiencing degraded availability. We are continuing to investigate.
Apr 10, 2024, 6:45 PM UTC

major

Issues is experiencing degraded performance. We are continuing to investigate.
Apr 10, 2024, 6:42 PM UTC

major

API Requests is experiencing degraded availability. We are continuing to investigate.
Apr 10, 2024, 6:42 PM UTC

major

We are investigating reports of degraded performance for Git Operations, API Requests, Actions and Pages
Apr 10, 2024, 6:41 PM UTC

good

Between 2024-04-09 21:35 UTC and 2024-04-10 19:03 UTC, creation of new Codespaces was degraded by an image upgrade to the virtual machines of new Codespaces. During the incident, approximately 7% of new Codespaces were created but never became available to their owning end users.

We mitigated the incident by reverting to the previous image version. We are working to improve deployment confidence around image upgrades to reduce the likelihood of recurrence.
Apr 10, 2024, 6:07 PM UTC

minor

We have applied a fix and are continuing to monitor. This incident will remain open for now until we have confirmed that the service is fully restored.
Apr 10, 2024, 5:31 PM UTC

minor

We believe we have identified the root cause of the issue and are working to fully restore the Codespaces service. We will provide another update within the next 30 minutes.
Apr 10, 2024, 4:56 PM UTC

minor

We’re seeing issues related to connecting to Codespaces impacting a subset of users. We are actively investigating and will provide another update shortly.
Apr 10, 2024, 4:20 PM UTC

minor

We are investigating reports of degraded performance for Codespaces
Apr 10, 2024, 4:12 PM UTC

good

Between 8:18 and 9:38 UTC on Wednesday, April 10th, customers experienced increased error rates across several services due to an overloaded primary database instance, ultimately caused by an unbounded query. We mitigated the impact by failing the instance over to more capable hardware and shipping an improved version of the query that runs against read replicas. In response to this incident, we are also working to make performance improvements to the class of queries that most frequently resulted in failed requests during this timeframe.

Web-based repository file editing saw a 17% failure rate during the incident with other repository management operations (e.g. rule updates, web-based branch creation, repository renames) seeing failure rates between 1.5% and 8%. API failure rates for these operations were higher.

Issue and Pull Request authoring was heavily impacted during this incident due to reliance on the impacted database primary. We are continuing work to remove our dependence on this particular primary instance from our authoring workflows for these services.

GitHub search saw a 5% failure rate throughout this incident due to reliance on the impacted primary database when authorizing repository access. The majority of failing requests were for search bar autocomplete with a limited number of search result failures as well.
Apr 10, 2024, 9:38 AM UTC

major

Issues and Pull Requests are operating normally.
Apr 10, 2024, 9:38 AM UTC

major

The mitigation rolled out has successfully resolved the issue. We have seen failure rates reduce and normal service return across all affected features.
Apr 10, 2024, 9:38 AM UTC

major

We are aware of impact across a number of GitHub features. This primarily affects write actions for Issues, Repositories and Pull Requests. We are also seeing increased failure rates for search queries.

Our team has rolled out a mitigation and is monitoring for recovery.
Apr 10, 2024, 9:30 AM UTC

major

We are investigating reports of degraded availability for Issues and Pull Requests
Apr 10, 2024, 9:22 AM UTC

good

On April 9, 2024, between 18:00 and 20:17 UTC, Actions was degraded and had failures for new and existing customers. During this time, Actions failed to start for 5,426 new repositories, and 1% of runs for existing customers were delayed, with half of those failing due to an infrastructure error.

The root cause was an expired certificate which caused authentication to fail between internal services. The incident was mitigated once the cert was rotated.

We are working to improve our automation to ensure certs are rotated before expiration.
Apr 9, 2024, 8:17 PM UTC

minor

We continue to work to resolve issues with repositories not being able to enable Actions and Actions network configuration setup not working properly. We have confirmed a fix and are in the process of deploying it to production. Another update will be shared within the next 30 minutes.
Apr 9, 2024, 7:43 PM UTC

minor

We continue to work to resolve issues with repositories not being able to enable Actions and Actions network configuration setup not working properly. We will provide additional information shortly.
Apr 9, 2024, 7:06 PM UTC

minor

We are aware of issues with repositories not being able to enable Actions. We are in the process of restoring full functionality and will provide additional information shortly.
Apr 9, 2024, 6:36 PM UTC

minor

We are investigating reports of degraded performance for Actions
Apr 9, 2024, 6:36 PM UTC

good

On April 9, 2024, between 04:32 UTC and 05:10 UTC, an outage occurred in GitHub Packages that specifically affected downloads of NPM packages. All attempts to download NPM packages failed during this period. Upon investigation, we found that a recent code change in the NPM Registry was the root cause. Customer impact was limited to users of the NPM Registry, with no effect on other registries.

We mitigated the incident by rolling back the problematic change. We are following up with repair items to cover our observability gaps and implementing measures in our CI process to detect such failures early before they can impact customers.
Apr 9, 2024, 5:10 AM UTC

minor

We are investigating reports of issues with downloading NPM packages. We will continue to keep users updated on progress towards mitigation.
Apr 9, 2024, 4:51 AM UTC

minor

We are currently investigating this issue.
Apr 9, 2024, 4:32 AM UTC

good

On April 6, 2024, between 00:00:00 UTC and 02:20:05 UTC, access to Private Pages on the *.pages.github.io domain was degraded while the deployed TLS certificate was expired. Service was restored by uploading the renewed certificate to our CDN. This was due to a process error and a gap in our alerting. While the certificate was renewed and updated in our internal vault, it was not deployed to the CDN.

We are working to reduce the potential for errors in our certificate renewal process, and we are adding the *.pages.github.io domain to our existing TLS alerting system.
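Both expired-certificate incidents on this page come down to noticing an approaching `notAfter` date before it passes. The minimal Python sketch below shows an expiry check of that kind. It uses the standard library's `ssl.cert_time_to_seconds` to parse the `notAfter` field. The function names and the 30-day threshold are hypothetical, chosen only for illustration, and are not GitHub's alerting system.

```python
import socket
import ssl
import time
from typing import Optional

# Hypothetical alert threshold, chosen for illustration only.
ALERT_THRESHOLD_DAYS = 30


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative once expired).

    `not_after` uses the format found in SSLSocket.getpeercert()["notAfter"],
    for example "Apr  6 00:00:00 2024 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def needs_alert(not_after: str, now: Optional[float] = None,
                threshold_days: float = ALERT_THRESHOLD_DAYS) -> bool:
    """True when the certificate expires within the threshold or has already expired."""
    return days_until_expiry(not_after, now) < threshold_days


def fetch_not_after(host: str, port: int = 443) -> str:
    """Read notAfter from the certificate the host is actually serving.

    With the default context, the handshake itself raises
    ssl.SSLCertVerificationError for an already-expired certificate.
    That error should also be treated as an alert.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

This sketch checks the certificate an endpoint is serving, not the copy held in a secrets store. That matches the gap described above: a renewed certificate in the internal vault does not help if the CDN still serves the old one.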
Apr 6, 2024, 2:22 AM UTC

minor

We are investigating issues with Private Pages due to an expired certificate.
Apr 6, 2024, 1:52 AM UTC

minor

We are investigating reports of degraded performance for Pages
Apr 6, 2024, 1:52 AM UTC

good

On April 5, 2024, between 8:11 and 8:58 UTC, a number of GitHub services were degraded and returned error responses. The web request error rate peaked at 6%, and the API request error rate peaked at 10%. Actions had 103,660 workflow runs fail to start.

A database load balancer change caused connection failures in one of our three data centers to various critical database clusters. The incident was mitigated once that change was rolled back.

We have updated our deployment pipeline to better detect this problem in earlier stages of rollout to reduce impact to end users.

Apr 5, 2024, 9:18 AM UTC

major

Pull Requests is operating normally.
Apr 5, 2024, 9:17 AM UTC

major

Issues is operating normally.
Apr 5, 2024, 9:17 AM UTC

major

API Requests is operating normally.
Apr 5, 2024, 9:17 AM UTC

major

Codespaces is operating normally.
Apr 5, 2024, 9:17 AM UTC

major

Actions is operating normally.
Apr 5, 2024, 9:17 AM UTC

major

Pages is operating normally.
Apr 5, 2024, 9:17 AM UTC

major

Actions is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 9:17 AM UTC

major

We've reverted a change we believe caused this, are seeing initial indications of reduced errors, and are monitoring for full recovery.
Apr 5, 2024, 9:00 AM UTC

major

Pages is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:59 AM UTC

major

We're seeing connection failures to some databases in two of three sites and are investigating.
Apr 5, 2024, 8:51 AM UTC

major

Pull Requests is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:50 AM UTC

major

Issues is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:49 AM UTC

major

API Requests is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:49 AM UTC

major

Codespaces is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:49 AM UTC

major

We are investigating reports of degraded availability for Actions
Apr 5, 2024, 8:33 AM UTC

good

This incident has been resolved.
Apr 5, 2024, 8:53 AM UTC

major

We are currently investigating this issue.
Apr 5, 2024, 8:31 AM UTC

good

This incident has been resolved.
Apr 5, 2024, 8:48 AM UTC

minor

Issues, API Requests, Pull Requests and Codespaces are operating normally.
Apr 5, 2024, 8:48 AM UTC

minor

Codespaces is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:36 AM UTC

minor

Pull Requests is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:34 AM UTC

minor

API Requests is experiencing degraded performance. We are continuing to investigate.
Apr 5, 2024, 8:32 AM UTC

minor

We are investigating reports of degraded performance for Issues
Apr 5, 2024, 8:28 AM UTC

good

Between April 3rd, 2024 23:15 UTC and April 4th, 2024 01:10 UTC, GitHub Actions experienced a partial infrastructure outage that led to degraded workflows (failed or delayed starts). Additionally, 0.15% of Webhook deliveries were degraded due to an unrelated spike in database latency in a single availability zone. Actions SLOs were at 90% during the incident, but impact was not evenly distributed across customers. We set the status back to green after an extended period of recovered SLOs, starting at April 4th, 2024 00:35 UTC. During this incident, we also had issues with our incident tooling, which failed to update the public status page and occasionally did not load.

The incident was resolved after the infrastructure issue was mitigated at 2024-04-04 04:27 UTC.

We are working to improve monitoring and processes in response to this incident. We are investigating how we can improve resilience and our communication with our infrastructure provider, and how we can better handle ongoing incidents that are no longer impacting SLOs. We are also improving our incident tooling to ensure that the public status page is updated in a timely manner.
Apr 4, 2024, 1:10 AM UTC

minor

API Requests is operating normally.
Apr 4, 2024, 1:09 AM UTC

minor

Actions is operating normally.
Apr 4, 2024, 1:07 AM UTC

minor

We are seeing recovery in Actions workflow creation and in accessing Actions statuses via the API.
Apr 4, 2024, 12:46 AM UTC

minor

Webhooks is experiencing degraded performance. We are continuing to investigate.
Apr 4, 2024, 12:25 AM UTC

minor

We are investigating Actions workflow failures and delays.
Apr 4, 2024, 12:12 AM UTC

minor

API Requests is experiencing degraded performance. We are continuing to investigate.
Apr 4, 2024, 12:06 AM UTC

minor

We are investigating reports of degraded performance for Actions
Apr 3, 2024, 11:59 PM UTC