good
This incident has been resolved.
May 23, 2024, 4:02 PM UTC
minor
We are investigating increased error rates for customers attempting to start Codespaces across all regions; around 15% of attempts are affected. Affected customers may retry starting their Codespace. We are continuing to investigate.
May 23, 2024, 3:41 PM UTC
minor
We are investigating reports of degraded performance for Codespaces
May 23, 2024, 3:31 PM UTC
good
On May 21, 2024, between 11:40 UTC and 19:06 UTC, various services experienced elevated latency due to a configuration change in an upstream cloud provider.

GitHub Copilot Chat experienced P50 latency of up to 2.5s and P95 latency of up to 6s. GitHub Actions was degraded, with 20-60 minute delays for workflow run updates. GitHub Enterprise Importer customers experienced longer migration run times due to the GitHub Actions delays. Additionally, billing-related metrics for budget notifications and UI reporting were delayed, leading to outdated billing details. No data was lost, and systems caught up after the incident.

At 12:31 UTC, we detected increased latency to cloud hosts. At 14:09 UTC, non-critical traffic was paused, which did not result in restoration of service. At 14:27 UTC, we identified high CPU load within a network gateway cluster caused by a scheduled operating system upgrade that resulted in unintended, uneven distribution of traffic within the cluster. We initiated deployment of additional hosts at 16:35 UTC. Rebalancing completed by 17:58 UTC, with initial system recovery observed at 18:03 UTC and full recovery at 19:06 UTC.

We have identified gaps in our monitoring and alerting for load thresholds. We have prioritized these fixes to improve time to detection and mitigation of this class of issues.
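As a rough illustration of this class of alerting, the sketch below pages when CPU load on a gateway host stays above a threshold for several consecutive samples. It is a minimal example, not our production monitoring: get_cpu_load and page are hypothetical hooks, and the threshold and window values are illustrative.

```python
import time

THRESHOLD = 0.85       # illustrative: alert above 85% CPU load...
WINDOW = 5             # ...sustained for 5 consecutive samples
INTERVAL_SECONDS = 60  # sample once per minute

def watch(host, get_cpu_load, page):
    """Page when `host` breaches THRESHOLD for WINDOW consecutive samples."""
    breaches = 0
    while True:
        load = get_cpu_load(host)  # hypothetical metric source
        breaches = breaches + 1 if load > THRESHOLD else 0
        if breaches >= WINDOW:
            page(f"{host}: CPU load {load:.2f} sustained above {THRESHOLD:.0%}")
            breaches = 0           # re-arm after paging
        time.sleep(INTERVAL_SECONDS)
```

Alerting on sustained rather than instantaneous load avoids paging on transient spikes while still catching the kind of steady overload seen in this incident.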
May 21, 2024, 7:06 PM UTC
minor
Actions is operating normally.
May 21, 2024, 6:14 PM UTC
minor
We are beginning to see recovery from delays to Actions Workflow Runs, Workflow Job Runs, and Check Steps. Customers whose jobs still appear to be stuck may re-run the workflow to see a completed state, as sketched below. We are also seeing recovery for GitHub Enterprise Importer migrations. We are continuing to monitor recovery.
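For customers re-running stuck workflows programmatically, the documented re-run endpoint of the GitHub REST API can be used. This sketch is illustrative: OWNER, REPO, and RUN_ID are placeholders, and a token is assumed to be supplied via the GITHUB_TOKEN environment variable.

```python
import os
import requests

OWNER, REPO, RUN_ID = "octocat", "hello-world", 123456789  # placeholders

resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/actions/runs/{RUN_ID}/rerun",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
)
resp.raise_for_status()  # the API returns 201 Created when the re-run is queued
```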
May 21, 2024, 6:03 PM UTC
minor
We are continuing to investigate delays to status updates for Actions Workflow Runs, Workflow Job Runs, and Check Steps. This is impacting 100% of customers using these features, with an average delay of 20 minutes and a P99 delay of 1 hour. Customers may find that their Actions workflows have completed, but the run appears hung while waiting for its status to update. This is also impacting GitHub Enterprise Importer migrations, which may take longer to complete. We are working with our provider to address the issue and will continue to provide updates as we learn more.
May 21, 2024, 5:41 PM UTC
minor
We are continuing to investigate delays to status updates for Actions Workflow Runs, Workflow Job Runs, and Check Steps. Customers may find that their Actions workflows have completed, but the run appears hung while waiting for its status to update. This is also impacting GitHub Enterprise Importer migrations, which may take longer to complete. We are working with our provider to address the issue and will continue to provide updates as we learn more.
May 21, 2024, 5:14 PM UTC
minor
We are continuing to investigate delays to Actions Workflow Runs, Workflow Job Runs, and Check Steps and will provide further updates as we learn more.
May 21, 2024, 4:02 PM UTC
minor
We have identified a change in a third party network configuration and are working with the provider to address the issue. We will continue to provide updates as we learn more.
May 21, 2024, 3:00 PM UTC
minor
We have identified network connectivity issues causing delays in Actions Workflow Runs, Workflow Job Runs, and Check Steps. We are continuing to investigate.
May 21, 2024, 2:34 PM UTC
minor
We are investigating delayed updates to Actions job statuses.
May 21, 2024, 1:58 PM UTC
minor
We are investigating reports of degraded performance for Actions
May 21, 2024, 12:45 PM UTC
good
Between May 19 at 3:40 AM UTC and May 20 at 5:40 PM UTC, the service responsible for rendering Jupyter notebooks was degraded. During this time, customers were unable to render Jupyter notebooks.

This was caused by an issue with a Redis dependency, which was mitigated by restarting it. An issue with our monitoring delayed our response. We are working to improve the quality and accuracy of our monitors to reduce the time to detection.
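As a minimal sketch of the kind of check that can catch this sooner, the probe below pings a Redis dependency with the redis-py client. The host, port, and the action taken on failure are assumptions for illustration, not details of our internal setup.

```python
import redis

def redis_is_healthy(host="localhost", port=6379, timeout=2.0):
    """Return True if the Redis dependency answers PING within `timeout` seconds."""
    try:
        client = redis.Redis(host=host, port=port, socket_timeout=timeout)
        return client.ping()
    except redis.RedisError:
        return False

if not redis_is_healthy():
    print("Redis dependency unhealthy; the notebook renderer may need a restart")
```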
May 20, 2024, 5:05 PM UTC
minor
We are beginning to see recovery in rendering Jupyter notebooks and are continuing to monitor.
May 20, 2024, 5:01 PM UTC
minor
Customers may experience errors viewing rendered Jupyter notebooks from PR diff pages or the files tab
May 20, 2024, 4:50 PM UTC
minor
We are currently investigating this issue.
May 20, 2024, 4:47 PM UTC
good
On May 16, 2024, between 4:10 UTC and 5:02 UTC, customers experienced delays in background jobs, primarily UI updates for Actions. This was due to degradation in our background job service affecting 22.4% of total jobs. Across all affected services, the average job delay was 2m 22s. Actions jobs themselves were unaffected; the issue affected the timeliness of UI updates, with an average delay of 11m 40s and a maximum of 20m 14s.

This incident was due to a performance problem on a single processing node, where Actions UI updates were being processed. Additionally, a misconfigured monitor did not alert immediately, resulting in a 25-minute delay in detection and a 37-minute increase in total time to mitigation.

We mitigated the incident by removing the problem node from the cluster, which restored service. No data was lost, and all jobs executed successfully.

To reduce our time to detection and mitigation of issues like this one in the future, we have repaired our misconfigured monitor and added additional monitoring to this service.
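To make the monitoring gap concrete, a per-node delay check like the hedged sketch below would surface a single degraded processing node that a cluster-wide average can hide. The job record shape and the alert hook are hypothetical, not our internal schema.

```python
from collections import defaultdict
from statistics import mean

DELAY_BUDGET_SECONDS = 60  # illustrative per-node delay budget

def check_node_delays(completed_jobs, alert):
    """completed_jobs: iterable with .node, .enqueued_at, .started_at (datetimes)."""
    delays = defaultdict(list)
    for job in completed_jobs:
        delays[job.node].append((job.started_at - job.enqueued_at).total_seconds())
    for node, samples in delays.items():
        avg = mean(samples)
        if avg > DELAY_BUDGET_SECONDS:
            alert(node, avg)  # alert per node so one slow node cannot hide in the cluster mean
```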

May 16, 2024, 5:15 AM UTC
minor
We are investigating reports of degraded performance for Actions
May 16, 2024, 4:43 AM UTC
good
This incident has been resolved.
May 14, 2024, 9:04 PM UTC
minor
We are seeing recovery for queue times on Actions Larger Runners and are continuing to monitor full recovery.
May 14, 2024, 8:47 PM UTC
minor
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
May 14, 2024, 8:09 PM UTC
minor
We are continuing to investigate long queue times for Actions Larger Runners
May 14, 2024, 7:16 PM UTC
minor
We are investigating long queue times for Actions Larger Runners
May 14, 2024, 6:40 PM UTC
minor
We are currently investigating this issue.
May 14, 2024, 6:37 PM UTC
good
On May 13, 2024, between 19:03 UTC and 19:57 UTC, some customers experienced delays in receiving status updates for in-progress GitHub Actions workflow runs. The root cause was a bug in the logic for checking the state of a configuration, which manifested only under very specific conditions and caused exceptions. These exceptions impacted the backend process that handles workflow run status updates, so jobs with any annotations were not updated properly. Jobs without annotations were not affected. Jobs affected during the incident will be marked as failed after 24 hours, and affected customers will need to manually retry the jobs they want to execute.

We resolved the incident by reverting the problematic change. We are enhancing our process for deploying changes and reassessing our monitoring of relevant subsystems to prevent similar issues in the future.
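For customers retrying affected jobs in bulk, one hedged approach is to list failed runs from around the incident window and re-run only their failed jobs via the REST API, as sketched below. OWNER, REPO, and the token are placeholders, and the date filter should be narrowed to your own window.

```python
import os
import requests

OWNER, REPO = "octocat", "hello-world"  # placeholders
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

# List workflow runs that failed on the day of the incident
# (first page only; paginate for completeness).
runs = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/actions/runs",
    headers=HEADERS,
    params={"status": "failure", "created": "2024-05-13..2024-05-14"},
).json()["workflow_runs"]

# Re-run only the failed jobs of each affected run.
for run in runs:
    requests.post(
        f"https://api.github.com/repos/{OWNER}/{REPO}/actions/runs/{run['id']}/rerun-failed-jobs",
        headers=HEADERS,
    )
```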

May 13, 2024, 8:10 PM UTC
minor
We are seeing signs of recovery for Actions jobs.
May 13, 2024, 8:10 PM UTC
minor
We are investigating reports of degraded performance for Actions
May 13, 2024, 7:51 PM UTC
good
This incident has been resolved.
May 13, 2024, 3:44 PM UTC
minor
We are applying configuration changes to mitigate impact to Copilot Chat users.
May 13, 2024, 2:56 PM UTC
minor
We continue to investigate the root cause of elevated errors in Copilot Chat.
May 13, 2024, 2:13 PM UTC
minor
Copilot is experiencing degraded performance. We are continuing to investigate.
May 13, 2024, 1:27 PM UTC
minor
We are investigating an increase in exceptions impacting Copilot Chat usage from IDEs.
May 13, 2024, 1:24 PM UTC
minor
We are currently investigating this issue.
May 13, 2024, 1:23 PM UTC
good
Starting on May 7, 2024 at 14:00 UTC, the Elasticsearch cluster that powers Issues and Pull Requests search became unresponsive, coinciding with a spike in usage. This affected GitHub customers who were trying to search issues and pull requests.

We mitigated this incident by adding cluster members, which increased the resources available to the cluster. We are working to add additional safeguards to our endpoints. We are also continuing to investigate the root cause of the instability and whether the index needs to be re-sharded to mitigate future risk.
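For operators watching a cluster like this one, the sketch below polls Elasticsearch cluster health with the official Python client. The endpoint URL is a placeholder, and red/yellow/green are Elasticsearch's standard health statuses rather than anything specific to this incident.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

health = es.cluster.health()
print(health["status"], health["number_of_nodes"], health["active_shards"])

if health["status"] == "red":
    # Red means at least one primary shard is unassigned.
    print("Cluster unhealthy: investigate unassigned shards and node load")
```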
May 7, 2024, 3:55 PM UTC
minor
We are seeing recovery and are continuing to monitor Issues and Pull Requests search results.
May 7, 2024, 3:25 PM UTC
minor
We’re investigating problems with our Issues and Pull Requests search cluster that are impacting result list pages and endpoints.
May 7, 2024, 3:00 PM UTC
minor
We're investigating page load problems with issues and pull requests.
May 7, 2024, 2:24 PM UTC
minor
We are investigating reports of degraded performance for Issues, API Requests and Pull Requests
May 7, 2024, 2:23 PM UTC
good
From May 2, 2024 at 8:00 UTC through May 3 at 2:45 UTC, the GitHub Enterprise Server (GHES) Azure Marketplace offering was degraded, and customers were not able to create GHES VMs using our provided GHES images. This affected all GitHub customers attempting to deploy GHES VMs in Azure using either the API or the Azure Portal. The cause was an incorrect configuration of our Azure Marketplace offering that made the images no longer visible to Azure users.

We mitigated the incident by working with our partners in Azure to restore access to the affected images.

We are working with our partners in Azure to add additional safeguards to ensure our images remain available to customers at all times. In addition, we continue to work with Azure on restoring access to some older patch versions of GHES that remain unavailable at this time.
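One way customers can verify image visibility from their side is to list the marketplace images with the Azure SDK for Python, as in the hedged sketch below. The publisher, offer, and SKU values are illustrative guesses rather than confirmed identifiers, and the subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

images = client.virtual_machine_images.list(
    location="eastus",              # any region you deploy to
    publisher_name="GitHub",        # assumed publisher name
    offer="GitHub-Enterprise",      # assumed offer name
    skus="GitHub-Enterprise",       # assumed SKU name
)
for image in images:
    print(image.name)               # visible GHES image versions
```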
May 3, 2024, 2:45 AM UTC
minor
Azure Marketplace links have been restored and we are validating the images.
May 3, 2024, 2:37 AM UTC
minor
Work is in progress to restore Azure Marketplace.
May 3, 2024, 1:44 AM UTC
minor
GHES images on Azure are now restored and are available via the Azure CLI. The Azure Marketplace listing is not yet available. We will provide updates on the progress of the Azure Marketplace restoration.
May 3, 2024, 1:05 AM UTC
minor
Work is in progress to restore images. ETA is within 30 minutes.
May 3, 2024, 12:32 AM UTC
minor
Work is in progress to restore images; however, there is no ETA yet for when the restore will be complete.
May 2, 2024, 11:58 PM UTC
minor
Work is in progress to restore images; however, there is no ETA yet for when the restore will be complete.
May 2, 2024, 11:17 PM UTC
minor
Work is in progress to restore images.
May 2, 2024, 10:43 PM UTC
minor
We have identified the issue and are working to restore GHES VHD images.
May 2, 2024, 10:09 PM UTC
minor
We are actively engaged and working to mitigate the issue.
May 2, 2024, 8:26 PM UTC
minor
We have identified the root cause and are working to mitigate the issue.
May 2, 2024, 7:48 PM UTC
minor
Currently, customers who use Azure Marketplace are unable to access GHES VHD images. This prevents customers who use Azure from spinning up new GHES instances (running instances are unaffected and are still able to hotpatch to new versions).
May 2, 2024, 7:07 PM UTC
minor
We are currently investigating this issue.
May 2, 2024, 7:07 PM UTC