Your BFT Protocol Will Break in Production

Sythe Labs Team

An Overview of Byzantine Fault Tolerant Consensus

The General Problem

Byzantine Fault Tolerant (BFT) consensus is a class of algorithms for reaching agreement in a distributed system where any node may behave adversarially. The problem derives from the Byzantine Generals Problem, which models the behavior of potentially malicious parties within a distributed network. These systems are designed to be mathematically secure, typically with security proofs that are rigorously evaluated via the peer review process. Contemporary BFT protocols deployed in Web3 environments rely on several security assumptions which, if violated, can compromise the network. A non-exhaustive list is as follows:

  1. Honest Majority: An honest majority is a security assumption treated similarly to cryptographic security (though conceptually distinct). It stipulates that, so long as a majority of quorum members are “honest”, meaning they participate faithfully in the protocol and do not act against its security, the protocol is safe from tampering by malicious (i.e. Byzantine) parties. In practice, most BFT protocols require more than two-thirds of nodes to be honest.
  2. Cryptographic Security of Messages: This is a casual way of saying that messages are signed by the sender, typically using some type of key pair, with each node guarding its private key to prevent message spoofing.
  3. Equivocation Resistance: The protocol should detect, and ideally penalize, nodes that equivocate, where equivocation is the act of sending different messages to different parties, effectively pretending to be honest and dishonest at the same time.
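These assumptions translate into concrete numeric thresholds. A minimal sketch, assuming the classic n ≥ 3f + 1 bound (where f is the number of tolerated Byzantine nodes):

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 still holds."""
    return (n - 1) // 3

def quorum_size(n: int) -> int:
    """Votes needed so that any two quorums intersect in at least
    one honest node; equal to 2f + 1 when n = 3f + 1."""
    return n - max_byzantine_faults(n)

# A 4-node network tolerates one Byzantine node and needs 3 votes per quorum.
assert max_byzantine_faults(4) == 1
assert quorum_size(4) == 3
```

This is why small quorums are fragile: a 4-node deployment is one compromised key away from losing its safety margin entirely.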

The HotStuff Consensus Protocol

HotStuff is a modern BFT consensus protocol that streamlines the process of reaching agreement across nodes in a network. Unlike traditional BFT protocols that require multiple all-to-all rounds of voting, HotStuff introduces a pipelined three-phase commit approach, with the phases being prepare, pre-commit, and commit. HotStuff is a so-called leader-based BFT protocol, meaning that in each round (or view, in the literature), the elected leader is solely responsible for the progression of the protocol. This has a number of challenges which we'll get into later. At the time of publication, HotStuff achieved novel performance characteristics: linear communication complexity, and quorum in a single round of communication (assuming successive honest leaders). For this reason, a deployment of HotStuff can scale linearly with the number of nodes in the network, enabling strong performance for large distributed networks. HotStuff2 is an improvement to the original protocol; both hinge on the so-called “chain rule”. We wrote a bit of documentation that fully characterizes the HotStuff behavior, which you can find here. HotStuff2 simply satisfies the chain rule with a two-chain instead of a three-chain.
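To make the phase structure concrete, here is a heavily simplified, non-networked sketch of a single view. The message shapes and helper names are ours, not from the paper, and real HotStuff pipelines these phases across consecutive views rather than running them in a loop:

```python
from dataclasses import dataclass

PHASES = ["prepare", "pre-commit", "commit"]

@dataclass
class Vote:
    view: int
    phase: str
    block_hash: str

def run_view(view: int, block_hash: str, votes_by_phase: dict, quorum: int) -> bool:
    """Walk the three phases in order; each phase needs a quorum of matching
    votes (a stand-in for the quorum certificate the leader would assemble)."""
    for phase in PHASES:
        matching = [v for v in votes_by_phase.get(phase, [])
                    if v.view == view and v.block_hash == block_hash]
        if len(matching) < quorum:
            return False  # missing a QC: the view times out, a new leader takes over
    return True  # the block is decided
```

The key property to notice is that a view either gathers a certificate at every phase or fails entirely; a stalled leader stalls the whole view.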

Well-read readers may also know about DAG-based consensus protocols, which get around some of the stated issues with leader-heavy BFT. In fact, one of our team members built the first-ever production instance of the Sailfish protocol, but we'll leave that for another time.

At Sythe Labs, we've cut our teeth not only building novel, state-of-the-art BFT consensus protocols, but also deriving new and interesting ways to break them. Our blend of deep technical experience and a combined two decades in the security industry allows us to find bugs that no other team can.

Classes of Failures

Overview

We'll talk about three common failures that we've seen over and over when building, debugging, and pentesting various consensus implementations:

  1. Signature Replay
  2. Deferred Validation
  3. Implementation Errors

We'll elucidate each of these as we go, but the common thread is that all of them typically come about not from a lack of understanding of the protocol itself, but from an overall lack of exposure to safety-critical programming. Paranoia is key here, and the relative newness of Web3 has created an environment where developers from non-traditional backgrounds are now in charge of critical pieces of infrastructure. That is how some of these issues have shown up despite the developers being otherwise very skilled. What we're trying to say is: these types of bugs can happen to you, no matter how good you are, and that's okay.

Unique Failures in BFT Consensus

For many protocols and chains, downtime is existential. If there's a situation in which your entire network can halt, you are likely totally owned. From lost user trust, to lost funds, to a compromise of your quorum, this is as severe as remote code execution. Something to bear in mind here is that web3 consensus differs from web2. In a web2 environment, downtime is common, and while not ideal, it usually does not destroy your business (unless it happens constantly). In web3, the threat model is completely different, and as security engineers we must be keenly aware of these types of issues on top of the standard problems that can occur in any software environment.

Some implementation details are needed before we can understand the issue in full. In HotStuff, a proposal's origin is checked via signatures, so each leader must sign their proposal. That way, we always know that a received proposal, or any signed message, is genuine. These public keys are stored in a distributed hash table (DHT) which is treated as a shared source of truth as to who is in the quorum at a given time.
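A sketch of that origin check, with HMAC standing in for the asymmetric signature scheme (BLS/Ed25519) a real deployment would use, and a plain dict standing in for the DHT; all names here are illustrative:

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    # Toy HMAC stand-in; a real deployment verifies against the proposer's
    # public key rather than sharing a symmetric secret.
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_proposal(proposal: dict, keys_by_node: dict) -> bool:
    """Look up the proposer's key (the DHT's role) and check the signature."""
    key = keys_by_node.get(proposal["proposer"])
    if key is None:
        return False  # unknown proposer: not in the quorum at this time
    return hmac.compare_digest(proposal["sig"], sign(key, proposal["payload"]))
```

Note that this only establishes *who* signed the bytes, not *when* or *for which view*; that distinction is exactly where the trouble starts.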

If we consider that blocks are built on top of earlier blocks (forming a chain of blocks) and that we check each block's predecessor against the data we expect (skipping this check would itself be an issue), we can imagine several cases that, if handled carelessly, lead to halts and DoS attacks. So, let's dive in.

Situation #1: Signature Replay

Consider what could happen if a previously valid signature can be re-sent and go undetected. What would happen?

You might think that including a temporal constraint like the view number in the signed proposal data makes everything fine, but that is not necessarily true: how that data is handled at verification time matters just as much as how the signature is formed. Validation checks, especially for cryptographic signatures, need to occur at each point of evaluation.
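One way to bind the temporal context is to sign over a domain-separated encoding that includes the view number, so a signature produced in view 7 cannot verify in view 9. A minimal sketch (the encoding and domain tag are our choices, and HMAC again stands in for the real scheme):

```python
import hashlib
import hmac
import struct

def signing_bytes(view: int, payload: bytes) -> bytes:
    # Domain tag + fixed-width view number + payload: replaying the same
    # payload under a different view yields different bytes, so the old
    # signature no longer verifies.
    return b"HOTSTUFF_PROPOSAL" + struct.pack(">Q", view) + payload

def sign(key: bytes, view: int, payload: bytes) -> bytes:
    return hmac.new(key, signing_bytes(view, payload), hashlib.sha256).digest()
```

Binding is necessary but not sufficient: the verifier must actually reconstruct these bytes from its own expected view, not from attacker-supplied fields.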

Suppose, for example, that we're running an implementation of chained HotStuff. We might use a BLS aggregate signature to ensure that all parties signed off on a message (this is bog standard from the paper). The paper does not stipulate the verification of such critical data, however; that is left as an implementation detail for the engineer. Leader-heavy BFT is complex for a number of reasons, chief among them the need to perform concurrent operations and a large number of state updates. Suppose a certificate check is missing on one of these steps. Since the pipelined HotStuff model relies on running multiple views at the same time, completing them as they finish, a malicious entity could initiate a proposal for a new view (say, 3 views in the future) that uses a replayed signature. A single missing check could then cancel the other threads in the pipeline, since a much newer proposal has arrived, and the protocol's behavior upon receipt of a valid proposal is to assume the data is honest (it carries a valid aggregate signature, after all). The result is a halt in the network for as long as the malicious party can fill slots in the pipeline.

We've seen this happen before in production. A BFT implementation receives a proposal. Encoded in the message is a signature over the contents of the block, where the commitment to the block is computed from the block contents. Crucially, the temporal information (the view number) is not included in the block contents, which contain only the transactions. Instead, the view information lives in the header, which is not covered by the signature over the commitment to the block data. Because of this, an enterprising party can wait for an empty block (very likely for home-rolled consensus at low volumes), keep that valid payload signature, and assign any view number they'd like in the header. This enables two nasty attacks:

  1. The byzantine party can get any node slashed by spamming messages while pretending to be them.
  2. The byzantine party can spam ever-increasing view numbers and drive consensus at a pace where views will keep timing out, halting progression.
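The flaw can be modeled end-to-end in a few lines. This is a toy reconstruction (HMAC standing in for the real signature scheme, field names ours), not code from any specific codebase; the bug is that the signature covers only the body, so the attacker reuses it under any header:

```python
import hashlib
import hmac

def sign_body_only(key: bytes, body: bytes) -> bytes:
    return hmac.new(key, body, hashlib.sha256).digest()

def naive_check(msg: dict, key: bytes) -> bool:
    # Bug: the header (and its view number) is never covered by the signature.
    return hmac.compare_digest(msg["sig"], sign_body_only(key, msg["body"]))

leader_key = b"leader-secret"
# An honest, empty block from view 7.
honest = {"header": {"view": 7}, "body": b"", "sig": sign_body_only(leader_key, b"")}

# The attacker replays the empty-block signature under an inflated view number.
forged = {"header": {"view": 7_000_000}, "body": b"", "sig": honest["sig"]}
assert naive_check(forged, leader_key)  # accepted: views can be driven far ahead
```

Once the forged view number is accepted, either attack above follows: the message looks like the victim's equivocation, or it drags consensus toward a view it can never fill.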

Let's go over the two common failure paths we've seen for this from an implementation standpoint. We call these the passive voter and the incompetent leader.

Passive Voter

The passive voter is a node that doesn’t perform adequate validation when receiving proposals. It either votes blindly on any valid-looking proposal or locks itself into the first one it sees, without checking whether it matches expected data for the current view.

pending_proposal = receive_pending_proposal()
# Make sure this is for a new view
if pending_proposal.view >= current_view + 1:
    # Vote without checking if the message was signed by the leader
    vote(pending_proposal)
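For contrast, a hardened version of the voter, sketched with hypothetical injected helpers (`expected_leader`, `verify`, `vote`) to show the checks that were missing:

```python
def safe_vote(pending: dict, current_view: int, expected_leader, verify, vote) -> None:
    """Only vote when the proposal targets exactly the next view, names the
    right leader, and carries a valid signature over view-bound data.
    All three callables are hypothetical stand-ins for protocol plumbing."""
    if pending["view"] != current_view + 1:
        return  # stale or far-future proposal: ignore, don't reset local state
    leader = expected_leader(pending["view"])
    if pending["proposer"] != leader:
        return  # not from the leader this view elected
    if not verify(leader, pending["view"], pending["payload"], pending["sig"]):
        return  # signature doesn't bind this leader to this view's data
    vote(pending)
```

The ordering matters: the cheap structural checks run first, and the signature check uses the locally derived leader and view rather than trusting the message's own fields.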

Incompetent Leader

The incompetent leader is one that can be tricked into locking onto a specific view. This happens when it accepts the first proposal that appears to come from the “correct” leader, which an attacker can fake by replaying an old, validly signed proposal from that leader in a previous view. This is also a common flaw, as a signature check might be the only validation a leader performs (as described above).

proposal = receive_proposal()

# Contrived, but actually happens!
if validate_signature(proposal.payload):

    # Stop prior pipeline steps up to the newly received view
    stop_prior_pipeline_steps(current_view)

    # Uh oh, we didn't check the proposal's view number!
    view = proposal.header.view_number
    
    # Drive consensus forward, flushing old valid state, and taking new invalid state.
    process_proposal(view, proposal)

Mitigation Strategies

Each node should independently derive the expected proposal from its own state and expected view number, then compare it with the received proposal before checking the signature. From there, the signature should be validated as coming from the leader (or other valid party), and all signed data should include a nonce or other non-repeated value.
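That derive-then-compare flow, plus replay tracking, can be sketched as follows (field names and the `verify` callable are hypothetical):

```python
seen_nonces: set = set()

def accept_proposal(received: dict, state: dict, verify) -> bool:
    """Compare the proposal against locally derived expectations, reject
    replays via the nonce, and only then pay for signature verification."""
    if received["view"] != state["current_view"] + 1:
        return False  # not the view this node expects next
    if received["parent"] != state["locked_block"]:
        return False  # doesn't extend the chain we are locked on
    if received["nonce"] in seen_nonces:
        return False  # signature material reused: replay
    if not verify(received):
        return False  # signature check last: it's the most expensive step
    seen_nonces.add(received["nonce"])
    return True
```

A production node would also bound and garbage-collect the nonce set (e.g. per view window), which we omit here.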

Situation #2: Deferred Validation

A major issue that can lead to a complete network halt involves proposals that accept arbitrary data and validate it only after the proposal has passed the vote, deferring validation until all nodes have reached agreement and are preparing to build on that block in the blockchain.

For example, consider a case where the proposer can set the L1 block from which it fetched the latest depositor list. Now think about what happens if a proposer sets that block to a very high number, and the number is neither limited nor validated. This causes problems for the next leader, who must propose on top of the malicious proposal, because we might only validate that our parent's latest fetched L1 block is a finalized L1 block. This can halt the entire network outright. Worse, if the protocol requires that each new proposal's latest fetched L1 block be larger than the previous one (to ensure the chain always picks up new depositors), well-behaved operators cannot proceed until L1 actually reaches that extremely large block number, or until a full-scale upgrade.
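The fix is to bound the proposer-supplied reference at vote time, before the quorum forms. A sketch with illustrative field names:

```python
def validate_l1_ref(proposed_l1_block: int, parent_l1_block: int,
                    latest_finalized_l1: int) -> bool:
    """Reject L1 references that are unfinalized or regress the chain."""
    if proposed_l1_block > latest_finalized_l1:
        return False  # points at a block L1 has not finalized yet
    if proposed_l1_block < parent_l1_block:
        return False  # must be monotonically non-decreasing across proposals
    return True
```

Every voter runs this check independently; a proposal carrying an absurd L1 height simply never gathers a quorum.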

This is largely a programming or implementation error, not a failure of the protocol, but it’s very easy to miss. In leader-heavy BFT, especially when integrating third-party subsystems such as a data availability layer, you might have a lot of moving parts in your protocol that you need to reconcile. As a result, your nodes may be voting a lot. Lower-stakes information might get missed in some of these vote paths, leading to the above issue. In general, all data in all consensus-critical paths (and, ideally, all other paths as well) must be exhaustively validated to avoid such malicious input cases.

Mitigation Strategies

Proposal data must always be validated by each node, including before the quorum has been reached on a particular proposal. No proposal should ever pass voting if it includes information that hasn't been sanity checked by each voter. This is decidedly non-trivial, especially in circumstances where the nodes are in view-sync and could legitimately be 10 or 100 views behind their peers.

Situation #3: Implementation Errors

As experienced engineers, we feel very strongly that one of the critical distinctions between a junior and a senior engineer is the recognition that every codebase is likely insecure in one way or another. While Rust and other memory-safe abstractions are all the rage, they will not save you from this, nor will AI, or even a good spec. At the end of the day, code is run overwhelmingly more than it is read, and as your engineering team scales beyond a single engineer, the ability to keep all parts of the codebase known by all parties shrinks exponentially. It is imperative to recognize that every line of code added, or dependency included, could be a potential CVE lying in wait for a sophisticated attacker to exploit. In this section, we'll talk about one of the most common sources of problems we've encountered in leader-heavy BFT, and in systems programming in general: race conditions.

What is a race condition?

A race condition is a situation in concurrent execution where the outcome depends on the unsynchronized ordering of two execution paths. Concretely, suppose you have an array and two threads that operate on it, one reading values and the other writing them. Without any synchronization primitives (mutex, semaphore, etc.), the code will behave non-deterministically, leading to potential memory access violations on the reader thread. A reader might be holding a reference to some data in the array while the writer mutates it: the writer could delete the data you're referencing (use-after-free, a very common bug), or change the data underneath you partway through some conditional logic (TOCTOU, another very common bug). These are all potential exploits that can affect your code when doing asynchronous operations, which, in the case of BFT, is the modus operandi if you want something remotely performant.
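A concrete illustration: the classic lost-update race on a shared counter, and the lock that removes it. Python's `threading.Lock` plays the role of the synchronization primitive described above:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        counter += 1  # read-modify-write: another thread can interleave here

def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:  # the lock makes the read-modify-write atomic
            counter += 1

# With the lock held around every update, two threads of 100_000 increments
# always total exactly 200_000; the unsafe version can silently lose updates.
threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 200_000
```

In a memory-unsafe language the same interleaving escalates from a wrong count to corrupted memory, which is why these bugs are exploit primitives, not just correctness nits.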

BFT Centralized State

In the pipelined model of HotStuff, the state storage mechanism is typically centralized, as multiple parts of the protocol (voting, proposing, etc.) need access to the same information as it becomes available. On top of that, the pipelined model has other dependencies that can take a while to arrive (a lagging voter or proposer). Because of this, the protocol is eventually consistent, meaning that, given enough time (theoretically infinite), it will eventually resolve all the states it was waiting on.

Such centralized god-states can cause trivial halting issues: any improper update, for example updating the internal decide chain with old data after a newer chain has already formed, can completely halt consensus for a single node, as it would appear to be equivocating by proposing a different chain history. Such race conditions can also be triggered via fuzzing. In our experience, it's easy to join a node to the network in a local deployment and condition it to repeat every message 10 times. This results in a lot of extra data over the wire, and a lagging node can trivially be pushed to the point of triggering something invalid. These bugs can be disgusting to track down, and it's important to have regular evaluations by security experts on these paths, as there's not a lot you can do to mitigate them (unless you have the budget for something like Antithesis).

Mitigation

Clear documentation of code paths that touch shared state is a good first step. Well-intentioned design, isolation of all critical flows, and effective use of synchronization primitives and checks can go a long way. As a general design rule, isolate critical code into well-tested modules that can be exhaustively validated, exposing only narrow APIs into their state so that other code cannot pollute the small surface area. Lastly, non-determinism is the enemy: any testing that minimizes sources of ambiguity is crucial to maintaining the stability of the code.
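The "small, well-tested surface" advice can look like this in practice: shared consensus state owned by a class whose only mutator enforces monotonicity internally, so a late thread cannot roll the decide chain backwards (a sketch, not a full implementation):

```python
import threading

class DecideChain:
    """Owns the decide chain; callers never touch the list directly."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._chain = []  # decided view numbers, strictly increasing

    def try_extend(self, view: int) -> bool:
        """Atomically append. Stale (out-of-order) updates are rejected,
        not applied, closing the 'old data overwrites new chain' race."""
        with self._lock:
            if self._chain and view <= self._chain[-1]:
                return False
            self._chain.append(view)
            return True

    def tip(self):
        """Snapshot of the latest decided view, taken under the lock."""
        with self._lock:
            return self._chain[-1] if self._chain else None
```

Because the invariant lives inside the class, every caller, on every thread, gets the same guarantee; no vote or proposal path can forget the check.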

Final Thoughts

These issues are subtle but devastating in practice. They tend to emerge not during initial implementation but as systems evolve, integrate, and operate under load or adversarial conditions. It is imperative to have security personnel in your corner that can catch mistakes, and keep your codebase in a healthy condition. Above all else, though, is testing. Testing everything you can, every scenario, every edge case, can go an extremely long way in providing a robust product that your customers can rely on.

Interested in leveling up the security of your protocol? Our team has audited everything from Nitro and OP, to custom consensus protocols like HotStuff and Sailfish, to a number of web3 contracts for DEXes and stake tables. We can help make sure that you're always at the top of your game. Reach out today.