
My friends and I were talking about what happens when you buy something online and your payment fails halfway through. Like, does your money just disappear? How do systems make sure that doesn’t happen? We ended up going down a rabbit hole and decided to build our own two-phase commit protocol, with Rust for the coordinator and Go for the microservices.
The Basic Idea
Two-phase commit (2PC) is basically a voting system for distributed transactions: either everyone agrees to do something, or nobody does it. Think of it like picking a restaurant with friends - if anyone says no, you have to start over.
What We Built
We split it into three parts: a coordinator written in Rust (the boss that tells everyone what to do), a wallet service in Go (handles user money), and an order service also in Go (manages product inventory).
The Coordinator
The coordinator is where all the decision making happens. Here’s the core logic in Rust:
```rust
struct Coordinator {
    // ...
}
```
So how does it work? First phase: the coordinator asks everyone “can you do this transaction?” If anyone says no or doesn’t respond, we abort. Second phase: if everyone said yes, the coordinator tells them “okay, do it now.” Otherwise it’s like “never mind, forget about it.”
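Our actual coordinator is written in Rust, but the two-phase logic itself is language-independent, so here is a minimal sketch of it in Go to match the service code. The `Participant` interface, `runTransaction`, and `stubPart` are names we made up for this sketch; the real services are called over the network.

```go
package main

import "fmt"

// Participant is what the coordinator sees. The interface and method
// names here are illustrative, not our actual API.
type Participant interface {
	Prepare(txID string) bool // phase 1: "can you do this?"
	Commit(txID string)       // phase 2: "okay, do it now"
	Abort(txID string)        // phase 2: "never mind"
}

// runTransaction drives both phases: collect votes, then commit only
// if every participant voted yes; otherwise abort everywhere.
func runTransaction(txID string, parts []Participant) bool {
	for _, p := range parts {
		if !p.Prepare(txID) {
			// One "no" (or, in the real system, a timeout) aborts everyone.
			for _, q := range parts {
				q.Abort(txID)
			}
			return false
		}
	}
	for _, p := range parts {
		p.Commit(txID)
	}
	return true
}

// stubPart is a fake participant for trying the logic locally.
type stubPart struct{ vote bool }

func (s *stubPart) Prepare(string) bool { return s.vote }
func (s *stubPart) Commit(string)       {}
func (s *stubPart) Abort(string)        {}

func main() {
	fmt.Println(runTransaction("tx-1", []Participant{&stubPart{true}, &stubPart{true}}))  // true
	fmt.Println(runTransaction("tx-2", []Participant{&stubPart{true}, &stubPart{false}})) // false
}
```

The asymmetry is the whole point: committing requires unanimity, aborting requires only one vote (or one silence).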
The Microservices
The microservices do the actual work. Here’s part of our wallet service:
1 | type WalletService struct { |
When Things Go Wrong
The interesting part is when stuff breaks, which happens all the time in distributed systems:
We tested what happens when services crash mid-transaction, when network connections drop, and when services are super slow to respond. Turns out distributed systems fail in really creative ways.
The Downsides
Two phase commit solves the consistency problem, but it’s not free. Everyone has to wait for the coordinator’s decision (blocking), there’s a ton of messages going back and forth (network overhead), and if the coordinator dies, everything just stops. That single point of failure is pretty brutal.
Deploying on the Cloud
We put this on Google Cloud Platform with separate VMs for each service. That’s when we learned that network latency is real and partial failures are everywhere.
Testing This Was Tricky
Testing distributed systems is way trickier than testing regular programs. Everything happens at once, and things fail in weird, timing-dependent ways.
Some Takeaways
Rust’s ownership model turned out to be really helpful for managing complex distributed state. Go’s goroutines made handling multiple transactions at once pretty straightforward.
The biggest lesson, though: what works perfectly on localhost breaks in weird and unexpected ways once you put it on actual infrastructure. Networks are slow, networks fail, and we had to think carefully about timing and partial failures in every test we wrote.
Building this from scratch really helped us understand what’s happening under the hood in production systems. Distributed systems are hard, but at least now I get why payment systems are so complicated.
The code is on GitHub if you want to check it out. The README is in Norwegian though, since we wrote it for a class project.