GitHub Reports December 2024 Service Disruptions
GitHub, a leading platform for software development collaboration, reported two service disruptions in December 2024, according to GitHub's blog. The first degraded performance across core features such as logging in and viewing repositories; the second made some marketing pages inaccessible but did not affect operational products.
Incident on December 17
The first incident occurred on December 17, 2024, from 14:33 UTC to 14:50 UTC. During this period, GitHub users encountered intermittent errors and timeouts, with the error rate averaging 8.5% and peaking at 44.3% of requests. The disruption impacted several core functionalities, including logging in, viewing repositories, and managing pull requests and issues.
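The gap between the average and peak figures is easier to see with a small calculation. The following is a minimal sketch in Python, assuming error counts are collected in fixed time windows; the `Window` type, the `error_rates` function, and the sample numbers are hypothetical and are not taken from GitHub's report.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error counts for one time window (hypothetical shape)."""
    requests: int
    errors: int


def error_rates(windows):
    """Return (average_rate, peak_rate) as percentages.

    The average is total errors over total requests for the whole period;
    the peak is the highest error rate seen in any single window.
    """
    total_requests = sum(w.requests for w in windows)
    total_errors = sum(w.errors for w in windows)
    average = 100.0 * total_errors / total_requests if total_requests else 0.0
    peak = max(
        (100.0 * w.errors / w.requests for w in windows if w.requests),
        default=0.0,
    )
    return average, peak


# Made-up sample data: one bad window dominates the peak,
# while the average stays much lower.
samples = [Window(1000, 20), Window(1000, 400), Window(1000, 50), Window(1000, 10)]
average, peak = error_rates(samples)
print(f"average {average:.1f}%, peak {peak:.1f}%")  # average 12.0%, peak 40.0%
```

A short burst of failures can push the peak far above the average, which is consistent with the pattern GitHub reported for this incident.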
GitHub attributed the errors to web server overload triggered by planned maintenance, which inadvertently caused the live updates service to fail. That service delivers automatic updates to pages users have open; without it, users had to refresh pages manually, which further strained the servers. GitHub mitigated the issue by reverting the maintenance changes and scaling up the service to handle the increased traffic from WebSocket clients.
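A general technique for limiting traffic surges when many real-time clients reconnect at once is exponential backoff with jitter. The sketch below illustrates that technique only; GitHub's report does not say it was used here, and the `reconnect_delay` function and its parameters are assumptions made for the example.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized delay in seconds before reconnect attempt `attempt`.

    Uses "full jitter": the delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so clients that disconnected at the
    same moment spread their retries out instead of retrying together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# The upper bound doubles with each failed attempt until it reaches the cap.
for attempt in range(5):
    print(attempt, round(reconnect_delay(attempt), 2))
```

Without the random component, every client would retry on the same schedule, and the server would see the same spike repeated at each interval.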
Post-incident analysis revealed gaps in GitHub's alerting system, which delayed the assessment of the incident's impact. The company says it is improving its monitoring and alerting to prevent similar issues in the future.
Incident on December 20
The second incident took place on December 20, 2024, between 15:57 UTC and 16:39 UTC. GitHub attributed this disruption to a partial outage at one of its third-party service providers, which made some marketing pages inaccessible and returned HTTP 500 errors to users trying to reach them. No operational products or service areas were affected.
The service provider resolved the issue at 16:39 UTC, restoring access to the affected pages. GitHub is currently exploring ways to improve error handling and ensure graceful degradation of service in the event of future outages.
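Graceful degradation in this context generally means serving something useful when a dependency fails rather than a bare 500 error. The following is a minimal sketch of that idea, assuming page content comes from an external provider through a `fetch` callable; `render_page`, `FALLBACK_PAGE`, and both provider functions are hypothetical and do not describe GitHub's implementation.

```python
from typing import Callable

# Static page shown when the provider is down and no cached copy exists.
FALLBACK_PAGE = "<h1>This page is temporarily unavailable</h1>"


def render_page(fetch: Callable[[str], str], path: str, cache: dict) -> tuple:
    """Return (status_code, body) for `path`.

    On success, the body is cached and returned. If the provider raises,
    the last cached copy is served; if there is none, a static fallback
    is returned with a 503 status instead of an unhandled error.
    """
    try:
        body = fetch(path)
    except Exception:
        if path in cache:
            return 200, cache[path]
        return 503, FALLBACK_PAGE
    cache[path] = body
    return 200, body


def healthy_provider(path):
    """Stand-in for a working third-party provider."""
    return f"<h1>Content for {path}</h1>"


def failing_provider(path):
    """Stand-in for a provider that is experiencing an outage."""
    raise ConnectionError("provider unavailable")


cache = {}
print(render_page(healthy_provider, "/features", cache))  # fresh content, cached
print(render_page(failing_provider, "/features", cache))  # served from cache
print(render_page(failing_provider, "/pricing", cache))   # static fallback, 503
```

Serving a stale copy trades freshness for availability, which is usually acceptable for content that changes rarely, such as marketing pages.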
GitHub continues to work on improving its infrastructure resilience and service reliability. Users can track real-time service status on GitHub's status page and read about ongoing improvements on the GitHub Engineering Blog.