
Building a Drift Detection Engine with Upstash

Menelaos Kotsollaris, Staff Software Engineer at Realtor.com (Guest Author)

In this article, we will go through the steps of building a basic drift detection engine, utilizing the power of Upstash for our online remote state, and building the necessary functions (in TypeScript) for managing the local state computation. Let's get started!

Introduction

One challenge when managing infrastructure as code is drift. Drift management has been a hot topic in the Infrastructure As Code (IAC) area ever since IAC rose in popularity. While IAC frameworks provide the ability to initialize infrastructure, the actual state of that infrastructure is dynamic and can diverge from the original declarations.

Note that while "Drift Management" is most commonly applied to the IAC domain, similar ideas have been showing up in many other areas. For instance, Vercel's Turborepo engine appears to rely on a comparable approach to validate artifacts and invalidate those that are no longer "useful" or no longer active.

In this article, we will build a drift management engine using TypeScript.

User Acceptance Criteria:

  • Persist state remotely
  • Compute the difference between local and remote state
  • Resolve discrepancies in either the remote or the local state

Persisting state remotely with Upstash

One tremendous benefit of Upstash is the ease of getting started: you effectively get both a Redis database and an HTTP API URL for it. This is essential, as we can create local functions that persist state over HTTP while following the Redis command set, imitating a CRUD API:

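import axios from "axios";

// URL, getHeaders() and getAllRemoteKeys() are defined elsewhere in the project.
const getAllRemoteState = async (): Promise<string[]> => {
  const keys: string[] = await getAllRemoteKeys();
  const enabledResources: string[] = [];
  for (const currKey of keys) {
    // The Upstash REST API answers GET {URL}/get/{key} with { result: value | null }
    const { data } = await axios.get(`${URL}/get/${currKey}`, getHeaders());
    // Keep only the keys that still hold a value remotely
    if (data.result !== null) {
      enabledResources.push(currKey);
    }
  }
  return enabledResources;
};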
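
The helpers `getAllRemoteKeys` and `getHeaders` are not shown above. As a minimal sketch, assuming the Upstash REST conventions of a bearer token in the `Authorization` header and a JSON response of the form `{ "result": ... }`, they could be assembled from pieces like the ones below. The names `keysUrl` and `unwrapResult` are hypothetical, and the token is passed in explicitly here rather than read from the environment.

```typescript
// Shape of an Upstash REST response: either a result or an error message.
interface UpstashResponse<T> {
  result?: T;
  error?: string;
}

// Request config in the shape axios expects (assumes a REST token is available).
const getHeaders = (token: string) => ({
  headers: { Authorization: `Bearer ${token}` },
});

// Hypothetical helper: URL that runs the Redis KEYS command over REST.
const keysUrl = (baseUrl: string, pattern = "*"): string =>
  `${baseUrl}/keys/${encodeURIComponent(pattern)}`;

// Hypothetical helper: return the result, or throw if the response is an error.
const unwrapResult = <T>(data: UpstashResponse<T>): T => {
  if (data.error !== undefined || data.result === undefined) {
    throw new Error(data.error ?? "Upstash response had no result");
  }
  return data.result;
};
```

With these in place, `getAllRemoteKeys` could reduce to a GET on `keysUrl(URL)` followed by `unwrapResult<string[]>(data)`. Keep in mind that `KEYS` walks the whole keyspace, which is fine for a small set of resources; `SCAN` is the usual choice for larger ones.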

Persisting state locally

For this use case, we will use local JSON files. Each JSON file stores the metadata of one resource, and the filename acts as its primary key:

example.json:

{
  "name": "example",
  "description": "this is an example",
  "expires": "March 2023"
}

This resource was given a name, a description and an expiry date. The expiry date is a particularly useful attribute, as in the future we could define automatic deprecation policies based on resource under-utilization. For instance, if this resource falls short of our expected usage, it could be flagged and then removed from the system.
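
To keep malformed files out of the state, the metadata can be validated when it is read. The following is a minimal sketch that mirrors the fields of example.json; the names `ResourceMetadata`, `isResourceMetadata` and `parseResource` are assumptions made for illustration.

```typescript
// Metadata stored in each resource file, mirroring example.json.
interface ResourceMetadata {
  name: string;
  description: string;
  expires: string; // free-form text, e.g. "March 2023"
}

// Type guard: true only if the value carries all three string fields.
const isResourceMetadata = (value: unknown): value is ResourceMetadata => {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.name === "string" &&
    typeof record.description === "string" &&
    typeof record.expires === "string"
  );
};

// Parse the raw contents of a resource file, rejecting malformed blobs.
const parseResource = (json: string): ResourceMetadata => {
  const parsed: unknown = JSON.parse(json);
  if (!isResourceMetadata(parsed)) {
    throw new Error("Invalid resource metadata");
  }
  return parsed;
};
```

The `expires` field is kept as a plain string in this sketch; an expiry policy would likely want a machine-readable format such as an ISO 8601 date.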

At a later stage, our files will start looking like this:

resources
-- data1.json
-- data2.json
-- ...
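
Since the filename is the primary key, the local state boils down to the list of JSON filenames in the resources directory. A small sketch of that mapping follows; `toResourceKeys` is a hypothetical helper, and it assumes the filenames were obtained beforehand, for example with Node's `fs.promises.readdir`.

```typescript
// Map file names such as "data1.json" to resource keys such as "data1".
// Entries that are not JSON files are ignored; keys are sorted for stable output.
const toResourceKeys = (fileNames: string[]): string[] =>
  fileNames
    .filter((file) => file.endsWith(".json"))
    .map((file) => file.slice(0, -".json".length))
    .sort();
```

Keeping this step as a pure function means the key derivation can be tested without touching the file system.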

Drifting local versus remote

We have set up both local and remote state, and now we are ready to start detecting drift between them. Our drift formula is something along these lines:

DRIFT = ABS(REMOTE - LOCAL)

where REMOTE and LOCAL are the number of resources in each state. DRIFT should be equal to 0 at all times. There are 2 possible cases when DRIFT != 0:

  • case A: remote contains more resources than local
  • case B: local contains more resources than remote
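
The two cases can be told apart by comparing resource keys rather than counts. The sketch below is one possible way to do that, with the hypothetical names `DriftReport` and `computeDrift`. Comparing keys also catches the situation where both sides hold the same number of resources but under different names, which a pure count would miss.

```typescript
interface DriftReport {
  onlyRemote: string[]; // case A: present remotely, missing locally
  onlyLocal: string[]; // case B: present locally, missing remotely
  drift: number; // total number of mismatched keys
}

const computeDrift = (local: string[], remote: string[]): DriftReport => {
  const localSet = new Set(local);
  const remoteSet = new Set(remote);
  const onlyRemote = remote.filter((key) => !localSet.has(key));
  const onlyLocal = local.filter((key) => !remoteSet.has(key));
  return { onlyRemote, onlyLocal, drift: onlyRemote.length + onlyLocal.length };
};
```

A report with `drift === 0` means the two states agree on every key.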

Let's see what to do in these cases. The two scenarios below both fall under case A:

CASE 1: remote contains more resources than local due to out-of-sync git

This can usually happen when a developer is working on a branch, and hasn't pulled the latest changes from the trunk. While the trunk has been updated with the latest resources, the local branch hasn't pulled all the recent changes, resulting in discrepancies with the local state. A way to visualize this:

# Branch | Checksum | Resource #

* 040cc7c     [CURRENT RESOURCES]=6
| * d9e8b5e   [CURRENT RESOURCES]=6
| * 5b98a10   [CURRENT RESOURCES]=6
|/            [CURRENT RESOURCES]=5
|             [CURRENT RESOURCES]=5

In this case, we don't modify the remote state, as it is already up to date. The local state should remain as-is until the latest commit is pulled from the trunk.
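
One way to encode this rule is a small decision function that refuses to touch the remote while the branch is behind the trunk. This is only a sketch: `SyncContext`, `SyncDecision` and `decideSync` are hypothetical names, and `commitsBehindTrunk` is assumed to be computed elsewhere, for instance from `git rev-list --count HEAD..origin/main` (the branch name is an assumption too).

```typescript
// Inputs for the decision; commitsBehindTrunk is assumed to be computed elsewhere.
interface SyncContext {
  commitsBehindTrunk: number;
  localKeys: string[];
  remoteKeys: string[];
}

type SyncDecision =
  | { action: "in-sync" }
  | { action: "pull-first"; reason: string }
  | { action: "reconcile"; extraRemoteKeys: string[] };

const decideSync = (ctx: SyncContext): SyncDecision => {
  const local = new Set(ctx.localKeys);
  const extraRemoteKeys = ctx.remoteKeys.filter((key) => !local.has(key));
  if (extraRemoteKeys.length === 0) return { action: "in-sync" };
  // A stale branch explains the extra remote keys: leave the remote untouched.
  if (ctx.commitsBehindTrunk > 0) {
    return {
      action: "pull-first",
      reason: `branch is ${ctx.commitsBehindTrunk} commit(s) behind trunk`,
    };
  }
  // The branch is current, so the extra keys are reported for reconciliation.
  return { action: "reconcile", extraRemoteKeys };
};
```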

CASE 2: remote contains more resources than local due to resource removal

Say our git branches are in sync, but we find that there are more resources available remotely than locally. That indicates that a resource has been removed locally. In this scenario, we want to remove the resource from the remote so that it matches the local state. Here's how we can do so:

const computeToBeDeleted = async (directoryPath = "") => {
  const localResources = await getLocalResources(directoryPath);
  const remoteResources = await getAllRemoteResourceKeys();
  const result = remoteResources.filter((x) => !localResources.includes(x));
  return result;
};

This function computes the resources that need to be deleted from the remote state. Note that it returns an array, which we later iterate over to send requests to the Upstash Redis server, removing the corresponding keys from the Redis DB.
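
The removal step itself is not shown in this article. As a hedged sketch, the keys returned above could be removed with a single Redis `DEL` command, which accepts multiple keys. The example assumes the Upstash REST API's option of POSTing a command as a JSON array; the transport is injected through the hypothetical `PostCommand` type so the sketch is not tied to axios or fetch, and `buildDeleteCommand` and `deleteRemoteKeys` are illustrative names.

```typescript
// A Redis command expressed as a JSON array, e.g. ["DEL", "data1", "data2"].
type RedisCommand = [string, ...string[]];

// Hypothetical transport: sends the command to the database and returns the parsed reply.
type PostCommand = (command: RedisCommand) => Promise<{ result: number }>;

const buildDeleteCommand = (keys: string[]): RedisCommand => ["DEL", ...keys];

// Deletes the given keys and resolves to the number of keys actually removed.
const deleteRemoteKeys = async (
  keys: string[],
  post: PostCommand,
): Promise<number> => {
  if (keys.length === 0) return 0; // DEL needs at least one key
  const { result } = await post(buildDeleteCommand(keys));
  return result;
};
```

In practice `post` would wrap an HTTP POST to the database URL with the authorization header, and `keys` would be the output of `computeToBeDeleted`.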

The Terraform drift management approach

When it comes to managing drift, few tools compare to the power of Terraform. There have been detailed blogs, like this one, on how Terraform manages drift, and it's worth looking into the Terraform model. A few key takeaways:

  1. Terraform state is similar to the "local" state of our example.
  2. terraform refresh and terraform plan are commands that calculate state on demand.
  3. terraform apply applies the configured changes, after reporting which resources would be added or deleted.

In our example, we tied the apply mechanism directly to commits being merged into the trunk. While this is straightforward, there might be cases where treating the trunk as "pristine" does not work, which is why Terraform decouples its commands from any predefined branch mechanism.

Many other IAC platforms have similar concepts for managing drift, including AWS CloudFormation, Microsoft Azure, and so on.

The idea of coupling drift to branches could be appealing for teams that do trunk-based development, but if you don't fall into this category, a more on-demand, API-driven approach might work much better for you.
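
To illustrate what an on-demand flow could look like for our engine, here is a rough sketch that separates planning from applying, loosely inspired by the plan/apply split described above. It is not how Terraform is implemented. All names (`PlannedChange`, `plan`, `describePlan`, `Executor`, `apply`) are hypothetical, and the sketch assumes the local state is the source of truth.

```typescript
type PlannedChange = { action: "add" | "delete"; key: string };

// Plan: compute what would change remotely, without performing any side effects.
const plan = (localKeys: string[], remoteKeys: string[]): PlannedChange[] => {
  const local = new Set(localKeys);
  const remote = new Set(remoteKeys);
  const additions = localKeys
    .filter((key) => !remote.has(key))
    .map((key) => ({ action: "add" as const, key }));
  const deletions = remoteKeys
    .filter((key) => !local.has(key))
    .map((key) => ({ action: "delete" as const, key }));
  return [...additions, ...deletions];
};

// Render the plan for human review before anything is applied.
const describePlan = (changes: PlannedChange[]): string[] =>
  changes.map((change) => `${change.action === "add" ? "+" : "-"} ${change.key}`);

// Hypothetical executor: how a key is actually added to or removed from the remote.
interface Executor {
  add(key: string): Promise<void>;
  remove(key: string): Promise<void>;
}

// Apply: execute a previously computed plan, one change at a time.
const apply = async (changes: PlannedChange[], executor: Executor): Promise<void> => {
  for (const change of changes) {
    if (change.action === "add") await executor.add(change.key);
    else await executor.remove(change.key);
  }
};
```

Because `plan` has no side effects, it can be exposed through an API or CLI and run at any time, while `apply` stays behind whatever approval step a team prefers.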

Summary

In this article, we built the bare bones of a drift management engine. By using controls for both remote and local states, we were able to compute the desired state from either side and add or remove resources accordingly. Something to highlight is how easy it was to handle the remote computations by using Upstash. While we could have used other databases within AWS (e.g. DynamoDB), having a RESTful API in under 30 seconds was very handy for getting started.

In the drift management space, certain players, like HashiCorp's Terraform, are leading the way. Drift management mechanisms could also be applied in other domains: the general concept of differentiating remote and local state is very close to cache invalidation, remote caching and similar problems. I feel that the industry is just at the beginning of realizing how powerful some of these concepts are, with Vercel's Turborepo implementation being one example. My bet is that there will be a steady increase of vendors utilizing such solutions in the near future, which makes the space very exciting to be part of!

Resources