Paxos Cuts Postgres Downtime 50X with Aurora Blue-Green Upgrades
Paxos, the regulated blockchain infrastructure provider, has reported that adopting Aurora Blue-Green upgrades cut Postgres upgrade downtime by roughly 50X, from 30-120 minutes per cluster to about one minute. The improvement helps the company meet its stringent 99.99% uptime service level objectives (SLOs) across its 24/7 asset trading infrastructure. Across more than 60 Postgres clusters, the new strategy also eliminated a month’s worth of coordination work tied to the traditional upgrade process.
Database downtime is a critical operational challenge for Paxos, which serves institutions and consumers in regulated digital assets. Even planned maintenance carries significant risk to trading and transactional reliability. Aurora Blue-Green upgrades, an AWS feature, address this by creating a parallel 'green' environment that stays in sync with the production 'blue' environment via PostgreSQL logical replication. The switchover itself completes in under a minute, minimizing traffic disruption.
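As a rough sketch of that flow, the two steps map onto the RDS API's blue/green deployment calls, exposed in boto3 as `create_blue_green_deployment` and `switchover_blue_green_deployment`. The helpers below only assemble the request parameters and pass them to a client object supplied by the caller. The ARN, deployment name, target version, and timeout are illustrative placeholders, not Paxos' actual settings.

```python
from typing import Any, Dict


def create_request(source_cluster_arn: str, name: str, target_version: str) -> Dict[str, Any]:
    """Parameters for create_blue_green_deployment: green is built from blue."""
    return {
        "BlueGreenDeploymentName": name,
        "Source": source_cluster_arn,           # ARN of the production (blue) cluster
        "TargetEngineVersion": target_version,  # engine version the green cluster runs
    }


def switchover_request(deployment_id: str, timeout_seconds: int = 300) -> Dict[str, Any]:
    """Parameters for switchover_blue_green_deployment."""
    return {
        "BlueGreenDeploymentIdentifier": deployment_id,
        "SwitchoverTimeout": timeout_seconds,   # seconds allowed for the switchover
    }


def start_upgrade(rds_client: Any, source_cluster_arn: str, name: str, target_version: str) -> Any:
    """Create the green environment. rds_client is assumed to be boto3.client("rds")."""
    return rds_client.create_blue_green_deployment(
        **create_request(source_cluster_arn, name, target_version)
    )


def switch_over(rds_client: Any, deployment_id: str, timeout_seconds: int = 300) -> Any:
    """Promote green to production once replication has caught up."""
    return rds_client.switchover_blue_green_deployment(
        **switchover_request(deployment_id, timeout_seconds)
    )
```

Keeping parameter construction separate from the API call makes the plan for each of many clusters easy to review and unit-test before anything touches production.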
Key Challenges and Lessons Learned
While the benefits have been transformative, Paxos encountered several technical hurdles during implementation:
- Ephemeral Roles: Paxos uses Vault to manage temporary database credentials, which generates CREATE ROLE statements. Because PostgreSQL logical replication does not carry DDL, these statements conflict with Blue-Green upgrades, requiring Paxos to disable dynamic role creation during upgrade windows.
- Replication Slot Drops: Before PostgreSQL 17, logical replication slots must be dropped during upgrades, leading to temporary data loss in Change Data Capture (CDC) systems. Paxos relied on backfill strategies to recover missing data, a process that can take up to 30 hours per cluster.
- Cluster ID Changes: Post-upgrade, the green cluster becomes primary and receives a new cluster ID, impacting RDS IAM authentication. Adjusting client configurations added an extra 1-2 hours of work per cluster.
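The cluster ID issue follows from how IAM database authentication policies are scoped: the resource ARN for the `rds-db:connect` action embeds the cluster's resource ID, so a policy written for the old cluster stops matching once the green cluster takes over. The sketch below assumes the "cluster ID" in question is that resource ID. The region, account ID, resource IDs, and user name are invented placeholders.

```python
from typing import Any, Dict, Iterable


def connect_arn(region: str, account_id: str, cluster_resource_id: str, db_user: str) -> str:
    """Build the rds-db ARN that an IAM policy must allow for token-based login."""
    return f"arn:aws:rds-db:{region}:{account_id}:dbuser:{cluster_resource_id}/{db_user}"


def connect_policy(arns: Iterable[str]) -> Dict[str, Any]:
    """IAM policy document granting rds-db:connect on the given ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "rds-db:connect", "Resource": list(arns)}
        ],
    }


# Hypothetical resource IDs before and after the switchover.
old_arn = connect_arn("us-east-1", "123456789012", "cluster-OLDEXAMPLE", "app_user")
new_arn = connect_arn("us-east-1", "123456789012", "cluster-NEWEXAMPLE", "app_user")

# Listing both ARNs during the upgrade window lets clients authenticate
# against whichever cluster is currently primary.
transition_policy = connect_policy([old_arn, new_arn])
```

Allowing both ARNs ahead of the switchover is one possible way to shorten the post-upgrade fix-up; whether it suits a given environment depends on its IAM review process.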
Despite these challenges, the results have been compelling. Traditional upgrade methods required 30-120 minutes of downtime per cluster, which could jeopardize even a 99.9% monthly uptime target: that target allows only about 43 minutes of downtime in a 30-day month. A one-minute Blue-Green switchover fits comfortably within the roughly 4 minutes per month that 99.99% uptime allows, drastically improving service reliability.
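The uptime arithmetic behind these figures can be checked with a few lines of code. This is a minimal sketch that assumes a 30-day month and counts all downtime against the SLO; the function names are illustrative.

```python
def downtime_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period at the given uptime SLO."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def fits_budget(downtime_minutes: float, slo_percent: float, days: int = 30) -> bool:
    """True if a single outage of this length stays within the period's budget."""
    return downtime_minutes <= downtime_budget_minutes(slo_percent, days)


if __name__ == "__main__":
    for slo in (99.9, 99.99):
        print(f"{slo}% uptime allows {downtime_budget_minutes(slo):.2f} minutes per 30 days")
    # Traditional upgrade (30-120 minutes) versus Blue-Green switchover (about 1 minute).
    for outage in (30, 120, 1):
        print(f"{outage} min outage within 99.99%: {fits_budget(outage, 99.99)}")
```

At 99.9% the budget is about 43.2 minutes, so a 120-minute upgrade exceeds it on its own. At 99.99% the budget is about 4.32 minutes, which a one-minute switchover fits inside but a 30-minute upgrade does not.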
Broader Implications for High-Uptime Systems
Aurora Blue-Green upgrades are not new: AWS introduced Blue/Green Deployments for MySQL-compatible engines in 2022 and extended them to PostgreSQL in 2023. Even so, Paxos’ adoption underscores their importance for financial services. High-availability requirements in sectors like digital assets, where even a brief outage can disrupt markets, make these strategies critical. AWS’s promise of sub-one-minute switchovers aligns well with such demands, reducing operational risk while preserving user trust.
Similar use cases have emerged, such as Wiz’s near-zero downtime Aurora PostgreSQL upgrades (August 2025) and InstantDB’s experiments with logical replication issues earlier that year. With PostgreSQL 17 addressing key limitations like replication slot drops, future upgrades promise to be even smoother.
What’s Next for Paxos?
Paxos is prioritizing upgrades to PostgreSQL 17, which eliminates the need to drop replication slots during future migrations. This change alone could prevent data gaps in CDC systems, removing the need for extensive backfilling efforts. Additionally, Paxos is advocating for AWS to support replication slot continuity from reader nodes, which would further streamline the Blue-Green process.
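A pre-upgrade check for CDC exposure might look like the sketch below. It takes the article's statement at face value, namely that logical slots must be dropped when the source cluster is older than PostgreSQL 17, and reports which slots would then need a backfill. The `Slot` type and function name are hypothetical; the query reads the standard `pg_replication_slots` view and would be run with whatever database driver is already in use.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Query to run against the blue cluster before the upgrade.
LOGICAL_SLOTS_SQL = """
SELECT slot_name, plugin, database, active
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""


@dataclass(frozen=True)
class Slot:
    name: str
    plugin: str
    database: str
    active: bool


def slots_needing_backfill(source_major: int, slots: Iterable[Slot]) -> List[str]:
    """Names of logical slots that would be dropped, so their consumers need a backfill.

    Assumption (from the article): slots survive only when the source
    cluster is already on PostgreSQL 17 or later.
    """
    if source_major >= 17:
        return []
    return [slot.name for slot in slots]


# Illustrative rows, as if fetched with LOGICAL_SLOTS_SQL.
example_slots = [
    Slot("cdc_orders", "pgoutput", "trading", True),
    Slot("cdc_ledger", "pgoutput", "ledger", True),
]
affected = slots_needing_backfill(15, example_slots)
```

Running a check like this per cluster gives a list of CDC consumers to notify and backfill before the upgrade window opens.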
For firms running large-scale Postgres clusters, Paxos’ experience offers valuable insights: address dynamic authorization systems like Vault early, plan for CDC disruptions, and prepare for IAM updates. With the right strategies, even the most complex upgrades can align with the high-availability standards demanded by modern digital infrastructure.