Yesterday Never Dies: Securing Stateless Systems Against Downgrade Attacks

Posted on 9/24/25 by Brian Esler, Backend Engineer at Turnkey (follow on X)

As discussed in the Turnkey Whitepaper, Turnkey’s application of Trusted Execution Environments (“TEEs”) provides enormous security benefits to our customers, and is in fact one of the core differentiators that sets us apart from the competition. However, the very statelessness that makes TEEs such a powerful and secure primitive also poses its own set of security challenges. Chief among these is a vulnerability to downgrade attacks, wherein a malicious adversary with privileged access attempts to trick a TEE by replaying an outdated, historical Organization in place of the current revision.

In this post, we’ll trace the evolution of Turnkey’s defense against downgrade attacks—from the crude beginnings of our Refresher system to the present-day elegance of our Merkle tree solution—exploring what worked, what didn’t, and what we learned along the way.

But First: Understanding the Threat Model

Before moving on, let’s establish an appropriate working definition for what we mean when we talk about a downgrade attack:

downgrade attack: An attack in which a stateful input is maliciously reverted to a historical state in order to elicit desired output.

In our case, the “stateful input” in question is the Organization. In essence, this represents a full and complete snapshot of an Organization. Among other things, the Organization contains collections of Users as well as their public keys, and also a collection of Policies which govern what types of Activities the Organization allows. The Organization is also timestamped and cryptographically signed by the TEE which produced it (the Notarizer), which means that its authenticity can be cryptographically verified.

Now, let’s consider one of the core flows in Turnkey’s backend: policy evaluation.

No Downgrade Attack - No Defense

In this flow, the Coordinator retrieves the most recent revision of an Organization and passes it along to the Policy Engine to determine whether a given Activity should be allowed to proceed. The Policy Engine determines that, based on the Policies present in the Organization, this Activity should in fact be denied, and thus correctly elicites a DENIED response. In short, things are working as intended (no funny business here!).

Now, let’s demonstrate what a downgrade attack by introducing a malicious Coordinator (i.e. one that is executing compromised code):

Downgrade Attack - No Defense

NOTE: In order to understand the scenario above, it is important to understand Turnkey’s threat model. Specifically: TEEs (like our Policy Engine) are treated as “trusted” components due to their verifiablity, but all other components (like our Coordinator) are treated as “untrusted”. This is why we must consider scenarios like the one above, wherein one of our “untrusted” components becomes compromised, despite the fact that such a scenario would necessitate an attacker with privileged access to our infrastructure. For more information, see the Threat model section of the Turnkey Whitepaper.

In this scenario, our compromised Coordinator deliberately retrieves an outdated Organization instead of the most recent one, and submits it to the Policy Engine. The Policy Engine determines that, based on the Policies of the old Organization, this Activity should be allowed, and thus an incorrect APPROVED response is returned.

Age-Based Repudiation: Our First Solution

Fortunately, Turnkey was aware of this attack vector from day one, and even the earliest versions of our product have had a robust defense against it. Our initial defense was simple but effective: We’ll have our TEEs enforce a “maximum age” requirement on each and every Organization, and reject any Organization exceeding this age.

Continuing with the policy evaluation flow discussed above, we can clearly see how this would be effective in preventing downgrade attacks:

Downgrade Attack - Max Age Defense

Huzzah! Our Policy Engine correctly flags the old Organization as being “too old”, and thus successfully thwarts an attempted downgrade attack.

But hang on… Let’s imagine now that, a few days later, the following scenario occurs:

No Downgrade Attack - Max Age Defense

*Gasp*: Our Policy Engine has incorrectly flagged the most recent revision of an Organization as being “too old”, despite the fact that it was the right Organization for the job.

Whatever value we choose for our “maximum age” threshold will be completely arbitrary with respect to the organic evolution of an Organization: While one Organization may change state every hour, another Organization may remain unchanged for days, months, or even years. It isn’t safe to make assumptions about the cadence at which an Organization evolves, because this is based on factors entirely outside of our control.

That is, unless we introduce a Service solely responsible for ensuring that, at any given moment, each and every legitimate Organization fulfills our “maximum age” requirement.

Enter–The Refresher

The Refresher

The sole purpose of our Refresher Service was to constantly iterate through our datastore, locate Organizations nearing the “maximum age” threshold, and then “refresh” them by using a Notarizer TEE. The Refresh operation itself functions as you might expect: The timestamp of the Organization is updated, a new Signature is applied, and the resulting payload is returned.¹

Functionally speaking, this solution is perfectly adequate: By ensuring that all valid Organizations in our datastore do not exceed the “maximum age” threshold, our Refresher has made it safe to enforce said “maximum age” in our TEEs. It would seem that we’ve solved the problem of downgrade attacks once and for all!

… But How Does it Scale?

Simply put: Not well at all. Unfortunately, the workload of our Refresher scales O(n) with respect to the number of Organizations in our database. While this poor scalability was acceptable when n was of the order of 10k or 100k, it became less so as n exceeded 1m. As n began numbering in the 10’s of millions, the Refresher system became a serious burden: At this point, the vast majority of Notarizer requests were Refresh operations, as opposed to Notarize operations (which represent actual Organization updates)!

We decided our system needed an overhaul.

Introducing the Merkle Tree: A Scalable Solution!

We were able to address the poor scalability of the above solution through the introduction of a useful cryptographic primitive: The Merkle tree!

The Merkle Tree

Structural Characteristics

The Merkle tree depicted above is much smaller than our production tree, but it displays the relevant structural characteristics:

Leaf Nodes
- Each Leaf is assigned a unique Organization;
- Leaf.Val is set to hash of the Organization;
- Leaf.T is set to the timestamp of the Organization;
- Each Leaf is cryptographically signed;
Parent Nodes
- The Val of each parent node is the hash of the concatenation of its children (i.e. hash(L || R));
Root
- The Val of the root is the hash of the concatenation of its children (i.e. hash(L || R));
- The Root contains the timestamp T of its last update;
- The Root is cryptographically signed;

Conceptually, we can summarize the structure of our tree as follows:

Each Leaf is a verifiable snapshot of one Organization state, while the Root is a verifiable snapshot of all Organization states.

See Appendix to learn more about some interesting and unique aspects of our Merkle tree data representation!

Putting our Merkle Tree to Work

Now that we understand the basic structure of our Merkle tree, let’s talk about how this data structure can be used to defend against downgrade attacks, and why it may represent a more scalable solution that our previous approach.

Previously, our “Age-Based Repudiation” defense against downgrade attacks boiled down to the enforcement of the following condition:

An Organization must have a sufficiently recent timestamp.

But now, with the introduction of our Merkle tree, we can introduce the following condition:

An Organization must prove inclusion in a sufficiently recent version of our Merkle tree.

To understand how this new revised condition might help us, let’s examine a specific inclusion proof from our example tree:

Single Merkle Inclusion Proof NOTE: For those less familiar with Merkle tree mechanics, you can think of a “Merkle inclusion proof” as the minimal set of leaves & nodes required to reproduce and verify the Root.

The graphic above shows the inclusion proof for Organization_8, and also displays an exciting quality which hints at the power of this solution: The timestamp of Organization_8 lags many days behind the timestamp of our Root!

To appreciate the power of this observation, let’s perform a simple thought experiment: Imagine that several days pass, during which the state of Organization_8 remains the same. In fact, let’s go one step further: Let’s imagine that none of our Organizations have changed state over this period of time, because everyone in the world is enjoying a long weekend. In this case, would our aforementioned Refresher get to similarly enjoy some much needed R&R? Absolutely not!! In fact, our poor, overworked Refresher would be grinding through the millions of unchanged Organizations to ensure that each one is nice and fresh on Monday morning.

Now let’s replay this scenario in our Merkle tree world. Imagine that, in place of our Refresher, we instead have a Merkle Tree Updater Service whose job it is to update our Merkle tree with Organization hashes as Organizations change state. What would the long weekend have looked like for our Merkle Tree Updater? Something like this:

Merkle Tree Updater - Long Weekend

While, sadly, our Merkle Tree Updater would be obliged to work over the long weekend, the only task it would need to perform is a periodic refresh of the Root to ensure it is kept sufficiently fresh. That’s it! One single piece of data!

To put this in more explicit terms, the work of our Merkle Tree Updater scales O(1) with respect to the number of Organizations in our database, which is a massive improvement over the O(n) scalability of our Refresher system!

Updating our Merkle Tree

So now that we understand the basics of how our Merkle tree works, let’s talk about Merkle tree updates. Continuing with our thought experiment, we can imagine that Monday morning has arrived, and there’s now a steady trickle of Organization state changes occurring as Turnkey customers return to work.

The good news is that Merkle tree updates are straightforward and very efficient; however, they must be performed synchronously with respect to the tree itself (i.e. you cannot perform two Merkle tree updates concurrently). This presents a design challenge: If performed in-line with our Notarization flow, these Merkle tree updates would act as a bottleneck, effectively prohibiting Turnkey from performing Notarizations concurrently.

To eliminate this bottleneck, we have designed our Merkle Tree Updater as a background process: It periodically scans our DB for Organizations which have changed state, and then performs a single, highly-efficient batch update operation. Effectively, the evolution of our Merkle tree is eventually consistent with respect to our Organizations, rather than immediately consistent.

However, in order to accommodate our “eventually consistent” Merkle tree, we must refine our conditional statement, which we previously summarized as follows:

An Organization must prove inclusion in a sufficiently recent version of our Merkle tree.

The problem with the statement above is that, immediately after an Organization is notarized, the condition above will fail until the update has “eventually” been incorporated into the Merkle tree. To account for this, we instead use a coallescing condition, with the prefix being the condition from our original “Age-Based Repudiation” system:

An Organization must have a sufficiently recent timestamp –OR– must prove inclusion in a sufficiently recent version of our Merkle tree.

The former threshold (a.k.a. the max_organization_age) must be kept larger than the maximum amount of time it might take for an Organization update to show up in our Merkle Tree, while the latter threshold (a.k.a. the max_merkle_root_age) must be kept larger than the cadence at which our Merkle tree is updated.

Bringing it All Together

The importance of defending our stateless systems against downgrade attacks has been on Turnkey’s radar since day one; however, so has the necessity of ensuring that those defense systems are able to scale with our product. While our infrastructure was able to absorb the cost of the poorly-scaling “Age-Based Repudiation” in the early days of Turnkey, the need to overhaul this system became more and more pressing with each onboarded customer. Thankfully, necessity truly is the mother of innovation, and we were ultimately able to implement a robust and highly scalable solution built around a fascinating and powerful cryptographic primitive: The Merkle tree.

I hope you’ve enjoyed this deep-dive into downgrade attack prevention half as much as I’ve enjoyed building these solutions, and that perhaps I’ve even sparked some ideas about how you can adapt similar Merkle tree designs in your own stateless systems. If that’s the case, please don’t hesitate to reach out and share your thoughts and ideas!

Appendix: Merkle Tree Data Representation

We made the decision to build our Merkle tree implementation in-house rather than using a 3rd party library, and this afforded us the opportunity to make several interesting design optimizations. Grokking these implementation details is not required to grok the overall design, which is why I’ve moved them to an appendix; however, they are both interesting and instructive!

Deferred Node Creation

The first optimization is a relatively obvious one, but worth mentioning nonetheless, as it is a prerequisite for the following optimizations: We designed our tree such that parent nodes with zero children are not written to the database, but rather exist “implicitly”.

Fixed Depth & Structure

The second optimization we chose is to have a Merkle Tree with a fixed depth and structure. Specifically: Our production Merkle tree is binary, and has a fixed depth of 35, which means it can support just over 17 billion Organizations.²

Note that using this type of Merkle tree would have been prohibitively difficult without the aforementioned Deferred Node Creation, as it would have required pre-populating our datastore with over 34 billion null nodes. However, with deferred node creation in place, we only needed to populate our datastore with a single node: The empty Root!

Implicit Relational Structure

If you’ve gotten this far, you’re in for a treat, because this one is a real gem!

If you’re considering what the DB schema might look like for a Merkle tree node in, say, a SQL datastore, you might be tempted to model the relationships explicitly (e.g. with each node having parent, left, and right FK’s). However, given the aforementioned fixed depth and structure of our tree, we can instead encode these relationships directly into our indexing scheme!

Merkle Tree Indexing Scheme

As you can see in the graphic above, we’ve chosen to assign integer ID’s to our nodes, and we’ve adopted a left-to-right, top-to-bottom scheme. Given this choice of indexing scheme, and also taking advantage of the fact that this is a binary tree, we can find the parent_id for a given child_id via an elegant bit-shifting operation:

parent_id = child_id >> 1

In English: The ID of the parent is the ID of the child bit shifted to the right. Try it out if you don’t believe me!

But this isn’t just elegance for the sake of elegance. It actually provides some very real advantages:

Reduced memory footprint due to elimination of FK fields.
Lightning-fast proof generations which leverage bit-shift operations rather than joining across FK’s.
DB implementation agnosticism thanks to the fact that we’re not explicitly modeling relationships.