Two-Phase Commit Protocol in Rust and Go

My friends and I were talking about what happens when you buy something online and your payment fails halfway through. Like, does your money just disappear? How do systems make sure that doesn’t happen? We ended up going down a rabbit hole and decided to build our own two-phase commit protocol. I used Rust for the coordinator and Go for the microservices.

The Basic Idea

Two-phase commit (2PC) is basically a voting system for distributed transactions. Either everyone agrees to do something, or nobody does it. Think of it like picking a restaurant with friends - if anyone says no, you have to start over.

What We Built

We split it into three parts: a coordinator written in Rust (the boss that tells everyone what to do), a wallet service in Go (handles user money), and an order service also in Go (manages product inventory).

The Coordinator

The coordinator is where all the decision making happens. Here’s the core logic in Rust:

use std::io::Write;
use std::net::TcpStream;

struct Coordinator {
    wallet_conn: TcpStream,
    order_conn: TcpStream,
}

impl Coordinator {
    fn prepare_phase(&mut self, transaction: Transaction) -> Result<bool, Error> {
        // Phase 1: send the transaction to every participant and ask for a vote.
        self.wallet_conn.write_all(&transaction.serialize())?;
        self.order_conn.write_all(&transaction.serialize())?;

        // Collect the votes. Any I/O error becomes an early return via `?`,
        // which we treat the same as a "no" vote.
        let wallet_vote = self.wallet_conn.read_response()?;
        let order_vote = self.order_conn.read_response()?;

        Ok(wallet_vote == READY && order_vote == READY)
    }

    fn commit_phase(&mut self) -> Result<(), Error> {
        // Phase 2: everyone voted READY, so tell them all to go ahead.
        self.wallet_conn.write_all(COMMIT_MSG)?;
        self.order_conn.write_all(COMMIT_MSG)?;
        Ok(())
    }
}

So how does it work? First phase: the coordinator asks everyone “can you do this transaction?” If anyone says no or doesn’t respond, we abort. Second phase: if everyone said yes, the coordinator tells them “okay, do it now.” Otherwise it’s like “never mind, forget about it.”
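The decision rule itself is tiny. Here's that rule as a standalone Go sketch with the network stripped out (the `Vote` type and names are made up for illustration, not lifted from our actual code):

```go
package main

import "fmt"

// Vote is what each participant answers during the prepare phase.
type Vote int

const (
	VoteReady Vote = iota // participant can commit
	VoteAbort             // participant refuses (or timed out)
)

// decide implements the 2PC rule: commit only if every single
// participant voted READY; any other answer aborts the whole thing.
func decide(votes []Vote) string {
	for _, v := range votes {
		if v != VoteReady {
			return "ABORT"
		}
	}
	return "COMMIT"
}

func main() {
	fmt.Println(decide([]Vote{VoteReady, VoteReady})) // prints: COMMIT
	fmt.Println(decide([]Vote{VoteReady, VoteAbort})) // prints: ABORT
}
```

The asymmetry is the whole point: committing requires unanimity, aborting requires only one dissenter.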

The Microservices

The microservices do the actual work. Here’s part of our wallet service:

import (
    "database/sql"
    "errors"
)

type WalletService struct {
    db *sql.DB
}

// handlePrepare runs inside an open *sql.Tx: it checks the balance and
// deducts the amount, but the change only becomes permanent when the
// coordinator later tells us to commit.
func (ws *WalletService) handlePrepare(tx *sql.Tx, userId int, amount float64) error {
    var balance float64
    err := tx.QueryRow("SELECT balance FROM wallets WHERE user_id = ?", userId).Scan(&balance)
    if err != nil {
        return err
    }

    if balance < amount {
        return errors.New("insufficient funds")
    }

    _, err = tx.Exec("UPDATE wallets SET balance = balance - ? WHERE user_id = ?", amount, userId)
    return err
}
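Prepare is only half the story: the wallet service also has to hold that open transaction until the coordinator's verdict arrives, then commit or roll back. Here's a sketch of those two handlers; the `txn` struct and pending map are simplifications for illustration (the real service calls `Commit`/`Rollback` on the open `*sql.Tx` directly):

```go
package main

import "fmt"

// txn stands in for the open *sql.Tx from handlePrepare. Using plain
// function fields keeps the sketch free of a real database.
type txn struct {
	Commit   func() error
	Rollback func() error
}

// WalletService keeps every prepared-but-undecided transaction open,
// keyed by transaction id, until the coordinator's verdict arrives.
type WalletService struct {
	pending map[string]txn
}

// handleCommit finishes the held transaction, making the balance
// deduction from the prepare phase permanent.
func (ws *WalletService) handleCommit(txID string) error {
	tx, ok := ws.pending[txID]
	if !ok {
		return fmt.Errorf("unknown transaction %q", txID)
	}
	delete(ws.pending, txID)
	return tx.Commit()
}

// handleAbort rolls the held transaction back, undoing the deduction.
func (ws *WalletService) handleAbort(txID string) error {
	tx, ok := ws.pending[txID]
	if !ok {
		return fmt.Errorf("unknown transaction %q", txID)
	}
	delete(ws.pending, txID)
	return tx.Rollback()
}

func main() {
	ws := &WalletService{pending: map[string]txn{
		"tx42": {
			Commit:   func() error { fmt.Println("wallet: committed tx42"); return nil },
			Rollback: func() error { fmt.Println("wallet: rolled back tx42"); return nil },
		},
	}}
	ws.handleCommit("tx42") // prints: wallet: committed tx42
}
```

Note that each transaction is deleted from the map before the verdict is applied, so a duplicate COMMIT or ABORT message gets an "unknown transaction" error instead of running twice.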

When Things Go Wrong

The interesting part is when stuff breaks, which happens all the time in distributed systems:

We tested what happens when services crash mid-transaction, when network connections drop, and when services are super slow to respond. Turns out distributed systems fail in really creative ways.

The Downsides

Two-phase commit solves the consistency problem, but it’s not free. Everyone has to wait for the coordinator’s decision (blocking), there are a lot of messages going back and forth (network overhead), and if the coordinator dies after the prepare phase, participants that already voted READY are stuck holding their locks until it comes back. That single point of failure is pretty brutal.

Deploying on the Cloud

We put this on Google Cloud Platform with separate VMs for each service. That’s when we learned that network latency is real and partial failures are everywhere.

Testing This Was Tricky

Testing distributed systems is way trickier than regular programs. Everything happens at once and things fail in weird ways:

#[test]
fn test_node_failure_during_prepare() {
    let mut coordinator = Coordinator::new();
    let transaction = Transaction::new(1, 100.0); // user_id, amount

    // Simulate node failure by closing the order service's connection
    coordinator.order_conn.shutdown(Shutdown::Both).unwrap();

    assert!(matches!(
        coordinator.prepare_phase(transaction),
        Err(Error::Timeout)
    ));
}

Some Takeaways

Rust’s ownership model turned out to be really helpful for managing complex distributed state. Go’s goroutines made handling multiple transactions at once pretty straightforward.

The biggest thing though: what works perfectly on localhost breaks in weird and unexpected ways once you put it on actual infrastructure. Networks are slow, networks fail, and we had to think really carefully about timeouts, message ordering, and all the ways things could go wrong when testing.

Building this from scratch really helped us understand what’s happening under the hood in production systems. Distributed systems are hard, but at least now I get why payment systems are so complicated.

The code is on GitHub if you want to check it out. The README is in Norwegian though, since we wrote it for a class project.