
My friends and I were talking about what happens when you buy something online and your payment fails halfway through. Like, does your money just disappear? How do systems make sure that doesn’t happen? We ended up going down a rabbit hole and decided to build our own two-phase commit protocol. We used Rust for the coordinator and Go for the microservices, which turned out to be a pretty good combination.
What Two-Phase Commit Actually Does
Two-phase commit (2PC) is basically a voting system for distributed transactions: either everyone agrees to do something, or nobody does it. It’s like when you’re trying to pick a restaurant with friends: if anyone vetoes the choice, you start over.
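In protocol terms, that boils down to two rounds of messages between a coordinator and its participants. Something like this, where the names are just illustrative:

```rust
// The messages that flow between coordinator and participants.
// Names are illustrative; a real implementation also carries transaction IDs,
// payloads, and retry/logging machinery.
enum CoordinatorMessage {
    Prepare, // phase 1: "can you do this transaction?"
    Commit,  // phase 2: everyone voted yes, so go ahead
    Abort,   // phase 2: someone voted no (or never answered), so undo everything
}

enum ParticipantVote {
    Yes,
    No,
}
```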
What We Built
We made three main pieces:
- Coordinator (Rust): The boss that tells everyone what to do
- Wallet Service (Go): Handles user money
- Order Service (Go): Manages product inventory
The Coordinator Code
The coordinator is where all the decision-making happens. Here’s roughly what its state looks like in Rust:

```rust
use std::collections::HashMap;

struct Coordinator {
    // Addresses of the participating services (the wallet and order services).
    participants: Vec<String>,
    // State of each in-flight transaction, keyed by transaction ID.
    transactions: HashMap<u64, TxState>,
}

enum TxState {
    Preparing,
    Committed,
    Aborted,
}
```
It works in two phases:
Phase 1 (Prepare): “Hey everyone, can you do this transaction?” If anyone says no or doesn’t respond, we abort.
Phase 2 (Commit): If everyone said yes, “Okay everyone, do it now.” Otherwise, “Never mind, forget about it.”
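Here’s a simplified sketch of that decision logic. The `Participant` trait stands in for the RPC calls to the Go services, and the names are illustrative rather than our exact code:

```rust
enum Vote {
    Yes,
    No,
}

// Stand-in for a network client talking to one of the Go services.
trait Participant {
    fn prepare(&mut self, tx_id: u64) -> Vote;
    fn commit(&mut self, tx_id: u64);
    fn abort(&mut self, tx_id: u64);
}

// Returns true if the transaction committed, false if it aborted.
fn run_two_phase_commit(tx_id: u64, participants: &mut [Box<dyn Participant>]) -> bool {
    // Phase 1: ask everyone to prepare. A single "no" (or, in the real system,
    // a timeout) means the whole transaction aborts.
    let all_yes = participants
        .iter_mut()
        .all(|p| matches!(p.prepare(tx_id), Vote::Yes));

    // Phase 2: broadcast the decision to every participant.
    for p in participants.iter_mut() {
        if all_yes {
            p.commit(tx_id);
        } else {
            p.abort(tx_id);
        }
    }
    all_yes
}
```

The property that matters is that no participant ever commits unless every participant voted yes in phase 1.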
The Go Microservices
The microservices do the actual work. Here’s roughly what the wallet service’s state looks like in Go:

```go
type WalletService struct {
    mu       sync.Mutex
    balances map[string]int64 // user ID -> balance, in cents
    reserved map[string]int64 // transaction ID -> amount held aside during prepare
}
```

During prepare, the money gets reserved rather than spent; the commit or abort in phase 2 either finalizes the deduction or releases the hold.
When Things Go Wrong
The interesting part is when stuff breaks, which happens a lot in distributed systems. We tested scenarios like these (the timeout handling is sketched right after the list):
- Services crashing mid-transaction
- Network connections dropping
- Services being super slow to respond
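All three of those look the same from the coordinator’s side: a vote that never shows up in time. The standard move is to treat a missing or late vote as a “no”. Here’s a minimal sketch of that idea, assuming each prepare call runs on its own thread and reports back over a channel (names are illustrative, not our exact code):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Collect one prepare vote per participant; any vote that doesn't arrive
// before the timeout counts as a "no", so the transaction aborts.
fn collect_votes(num_participants: usize, timeout: Duration) -> bool {
    let (tx, rx) = mpsc::channel();

    for i in 0..num_participants {
        let tx = tx.clone();
        thread::spawn(move || {
            // Stand-in for the real network call to a service's prepare endpoint.
            let vote = send_prepare(i);
            let _ = tx.send(vote);
        });
    }

    for _ in 0..num_participants {
        match rx.recv_timeout(timeout) {
            Ok(true) => {}             // a "yes" vote
            Ok(false) => return false, // an explicit "no" vote
            Err(_) => return false,    // timed out or the sender died: treat as "no"
        }
    }
    true
}

// Hypothetical stand-in; in the real system this is an HTTP call to a Go service.
fn send_prepare(_participant: usize) -> bool {
    true
}
```

The subtle part is that a timeout doesn’t tell you whether the service crashed before or after doing its part, which is exactly why prepare only reserves resources instead of spending them.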
Why 2PC Isn’t Perfect
Two-phase commit solves the consistency problem, but it comes with costs:
Blocking: A participant that has voted yes is stuck holding its reservation until it hears the coordinator’s decision (sketched below)
Network overhead: Lots of messages back and forth
Single point of failure: If the coordinator dies, everything stops
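The blocking problem is easiest to see from the participant’s side: once a service has voted yes, it can’t commit on its own (someone else might have voted no) and it can’t abort on its own (the coordinator might already have told others to commit). All it can do is wait. A sketch of that stuck state, with illustrative names:

```rust
use std::thread;
use std::time::{Duration, Instant};

enum Decision {
    Commit,
    Abort,
}

// After voting yes, a participant can only wait. If the coordinator is down,
// `poll_coordinator` keeps returning None and the reservation stays locked:
// that's the blocking problem and the single point of failure in one loop.
fn wait_for_decision(poll_coordinator: impl Fn() -> Option<Decision>) -> Decision {
    let started = Instant::now();
    loop {
        if let Some(decision) = poll_coordinator() {
            return decision;
        }
        // We can complain, but we can't safely decide on our own without
        // risking disagreement with the other services.
        eprintln!("still waiting for the coordinator after {:?}", started.elapsed());
        thread::sleep(Duration::from_millis(500));
    }
}
```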
Deploying on the Cloud
We put this on Google Cloud Platform with separate VMs for each service. That’s when we learned that network latency is real and partial failures are everywhere.
Testing Distributed Stuff Is Hard
Testing distributed systems is way trickier than testing regular programs because everything happens at once and things fail in weird ways.
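One thing that helps is writing participants as plain in-process fakes, so a “crash” is just a flag you flip instead of something you hope happens at the right millisecond. A rough sketch of what that kind of test can look like (illustrative, not our actual test code):

```rust
#[derive(Default)]
struct FakeParticipant {
    fail_prepare: bool, // flip this to simulate a crash or a "no" vote in phase 1
    committed: bool,
    aborted: bool,
}

impl FakeParticipant {
    fn prepare(&mut self) -> bool {
        !self.fail_prepare
    }
    fn commit(&mut self) {
        self.committed = true;
    }
    fn abort(&mut self) {
        self.aborted = true;
    }
}

// The property we actually care about: if anyone fails to prepare, nobody commits.
fn run(participants: &mut [FakeParticipant]) {
    let all_yes = participants.iter_mut().all(|p| p.prepare());
    for p in participants.iter_mut() {
        if all_yes {
            p.commit();
        } else {
            p.abort();
        }
    }
}

#[test]
fn aborts_when_one_participant_fails_to_prepare() {
    let mut participants = vec![
        FakeParticipant::default(),
        FakeParticipant { fail_prepare: true, ..Default::default() },
    ];
    run(&mut participants);
    assert!(participants.iter().all(|p| !p.committed));
    assert!(participants.iter().all(|p| p.aborted));
}
```

The same kind of fake covers the slow-responder case too: sleep inside prepare and check that the coordinator’s timeout turns it into an abort.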
What We Learned
Building this taught us a bunch of stuff:
Rust is great for this: The ownership model really helps when you’re dealing with complex distributed state management.
Go makes concurrency easy: Goroutines made handling multiple transactions at once pretty straightforward.
Networks fail constantly: What works fine on localhost breaks in all sorts of creative ways when you put it on real infrastructure.
Testing is crucial: You have to think really carefully about timing and all the ways things can go wrong.
The whole project was a good reminder that distributed systems are hard, but building them from scratch really helps you understand what’s happening under the hood in production systems.
If you want to check out the code, it’s all on GitHub. Fair warning: the README is in Norwegian because we wrote it for a class project.