José Manuel Requena Plens

A Vault File That Fails Closed: Encrypt-then-MAC on an MCU

How a hardware password manager authenticates every vault file before it decrypts: encrypt-then-MAC, verify-before-decrypt, and fail-closed reads.

A password vault has one job at rest: hand back exactly the bytes you stored, or refuse. Not “probably,” not “best effort” — a vault that returns almost your password, or that behaves differently when a file has been tampered with, is worse than no vault at all. On a microcontroller, where the encrypted files live on a flash chip an attacker can desolder and read, “refuse” has to be the default for everything that isn’t provably authentic.

This post is about the file format I use in a hardware password manager I’m building (not yet released) — an ESP32-class device that stores credentials on its internal flash. The whole design rests on one rule that is surprisingly easy to get wrong: never decrypt a byte you haven’t authenticated first. Get the order wrong and you’ve built a padding oracle. Get it right and the vault fails closed.

TL;DR — What this format does

  • Every vault file is an encrypt-then-MAC envelope: [ver][iv][hmacTag][cipher]. The tag authenticates a context prefix (ver ‖ recordType ‖ slotId ‖ generation) ahead of the IV and ciphertext.
  • On read, the tag is recomputed and compared in constant time before any decryption — verify-before-decrypt.
  • Binding slot, type, and a freshness counter makes a relocated, retyped, or rolled-back file fail the MAC — even against an attacker with raw flash access but no PIN.
  • The freshness counters live in meta.bin, which is itself authenticated; mutations are crash-safe (stage → commit → promote).
  • Encryption and authentication use separate keys; every reject path produces no plaintext and wipes its scratch.

Quick glossary (if crypto isn't your background)
  • Plaintext · ciphertext: the readable data (your password) and its scrambled, unreadable form once encrypted.
  • Encrypt · key: scramble data with a key (a secret) so only someone holding the key can reverse it. Here the key comes from your PIN.
  • Block cipher (AES) · padding: AES encrypts in fixed 16-byte chunks; padding is the filler bytes that top up the last chunk.
  • MAC · HMAC · tag: a tamper-evident seal. HMAC uses a key to compute a tag (a short fingerprint); change a single byte and the tag stops matching.
  • IV (initialization vector): random bytes, a fresh one per write, so encrypting the same data twice gives different results.
  • KDF · PBKDF2: a function that “stretches” a weak secret (a PIN) into a key, deliberately slow to frustrate guessers.
  • Offline brute force: trying every combination on your own hardware, with a stolen copy of the data and no attempt limit.
  • slot: each numbered place where the vault stores one credential.

Decrypt-first is a footgun

First, the threat. This is a device you can hold in your hand, and the credentials live on a flash chip soldered to the board. An attacker who steals it isn’t limited to typing PINs at the screen: they can desolder the flash and read it on a bench programmer, or dump it over a debug port, and walk away with the raw ciphertext. From then on the attack is offline — no lockout, no rate limit, all the time in the world. The threat model for the firmware treats raw flash readout as a real possibility (mitigated in production by the ESP32’s own flash encryption), so the file format has to assume the attacker is holding the exact bytes it wrote and is free to modify them and feed them back.

That’s what makes decrypt-first dangerous. The intuitive design is the wrong one: you have ciphertext, you want plaintext, so you decrypt — and only then, maybe, you check whether the result “looks right.” To see why that order is fatal, you have to look at what happens inside.

Start with the basics: AES is a block cipher — it only knows how to work on fixed-size chunks, 16 bytes each. Since your data almost never lands on an exact multiple of 16, the last chunk is topped up with filler bytes — the padding — until it fills the block. PKCS#7 does this by a simple rule: if 5 bytes are missing, append five 0x05 bytes; if 11 are missing, eleven 0x0B. When decrypting, the very last thing the algorithm does is look at that padding and check it’s well-formed.
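A minimal sketch of that padding rule may make it concrete. The helper name pkcs7Pad is hypothetical and this is not the vault's code, only the PKCS#7 rule as described above. One detail the prose skips: when the data already fills its last block, a whole extra block of padding is appended, so the final byte always states a pad length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative helper (hypothetical name): append PKCS#7 padding so the
// result is a whole number of blocks. Aligned input gets one full extra
// block, so the pad length in the last byte is never ambiguous.
std::vector<uint8_t> pkcs7Pad(const std::vector<uint8_t>& data,
                              std::size_t blockSize = 16) {
    assert(blockSize > 0 && blockSize < 256);  // pad value must fit in one byte
    const std::size_t padLen = blockSize - (data.size() % blockSize);  // 1..blockSize
    std::vector<uint8_t> out(data);
    out.insert(out.end(), padLen, static_cast<uint8_t>(padLen));
    return out;
}
```

Eleven bytes of data come back as sixteen, ending in five 0x05 bytes; sixteen bytes come back as thirty-two, ending in sixteen 0x10 bytes.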

Now the part that makes it dangerous. AES in CBC mode chains the blocks: each ciphertext block is mixed (with an XOR) into the next one’s decryption. Think of a row of dominoes — nudge one and you move its neighbor. That hands the attacker a lever: tamper with one ciphertext block and you control, byte by byte, what comes out of the block after it — including that final padding the decryptor is about to inspect.
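The lever can be shown without any real cipher. In CBC decryption each plaintext block is the raw block-cipher output XORed with the previous ciphertext block. The sketch below models only that last XOR step; cbcUnchain and the Block alias are hypothetical names, and rawDecrypted stands in for the AES output, which the attacker never needs to know.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Block = std::array<uint8_t, 16>;

// Final step of CBC decryption for one block: P_i = D_k(C_i) XOR C_{i-1}.
// rawDecrypted stands in for D_k(C_i); the block cipher is out of scope here.
Block cbcUnchain(const Block& rawDecrypted, const Block& prevCipher) {
    Block plain{};
    for (std::size_t i = 0; i < plain.size(); ++i) {
        plain[i] = static_cast<uint8_t>(rawDecrypted[i] ^ prevCipher[i]);
    }
    return plain;
}
```

Flip one bit in the previous ciphertext block and exactly that bit flips in the recovered plaintext block, whatever the key is. The tampered block itself decrypts to garbage, but the block after it changes in a way the attacker fully controls.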

And here’s the trap: whether the padding is valid is something the system gives away. If the attacker can tell a “bad padding” failure from any other (a different error message, a log line, or simply a reply that takes a hair longer), they’ve got an oracle: a box they can ask one yes/no question per try, and it always answers truthfully. It’s like a chess opponent who, without meaning to, winces whenever your move hurts them: that tell, repeated, is enough to reconstruct the game. With that single “is the padding valid?” bit, asked enough times (at most 256 guesses per byte), they peel the plaintext apart one byte at a time, without ever holding the key.

That is the padding oracle attack, first described by Serge Vaudenay in 2002, and it has been breaking real protocols for two decades: Lucky Thirteen against TLS in 2013, POODLE against SSL 3.0 in 2014.

Moxie Marlinspike distilled the lesson into The Cryptographic Doom Principle: “if you have to perform any cryptographic operation before verifying the MAC on a message you’ve received, it will somehow inevitably lead to doom.” Decryption is a cryptographic operation. So the MAC check has to come first.

Three ways to combine a cipher and a MAC

Encrypting solves confidentiality — nobody can read the secret — but not integrity — knowing nobody tampered with it along the way. That’s the MAC’s job. Think of it as the wax seal on an old letter: the sender stamps the seal with a signet only they hold (the key), and the recipient checks it’s still intact before opening; if someone intercepted and rewrote the letter, the seal no longer matches and it’s discarded unread. HMAC is that seal in cryptographic form: with a key it produces a short tag from the message, and changing a single byte invalidates it.

Authenticated encryption therefore needs both pieces — a cipher (for confidentiality) and a MAC (for integrity) — but there are exactly three ways to bolt them together, and they are not equally safe. The canonical analysis is Bellare and Namprempre’s Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm (J. Cryptology, 2008), reinforced for secure channels by Hugo Krawczyk’s The Order of Encryption and Authentication for Protecting Communications.

The three generic compositions
Composition | What it does | Notably used by | Verdict
Encrypt-and-MAC (E&M) | MAC the plaintext, encrypt the plaintext, send both | SSH | Not generically secure: the MAC can leak plaintext, and you must decrypt to verify
MAC-then-Encrypt (MtE) | MAC the plaintext, then encrypt plaintext+MAC together | TLS (CBC ciphersuites) | You must decrypt before you can check: the doom path, and the source of Lucky Thirteen
Encrypt-then-MAC (EtM) | Encrypt the plaintext, then MAC the ciphertext | IPsec | Generically secure. The MAC gate-keeps the ciphertext, so you verify without decrypting

Encrypt-then-MAC is the only one where you can authenticate the message without touching the cipher — the tag is computed over the ciphertext, so checking it requires no decryption. That property is exactly what kills the padding oracle: a tampered file never reaches the decrypt path. So that is what the vault uses.

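Put the two orderings side by side and the doom is obvious. The composition isn’t an academic taxonomy; it’s a decision about which step runs first when an attacker hands you bytes. With MAC-then-encrypt, that first step is decryption, padding check included. With encrypt-then-MAC, it is the tag check, and a file that fails it goes no further.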

The envelope: [ver][iv][hmacTag][cipher]

Every encrypted file in the vault (each credential, each TOTP secret, the listing index) is one self-contained envelope. Picture a postal envelope: the outside carries, in the clear, just enough to handle and verify it (the version, the IV, and the tag that acts as the seal), and the encrypted letter rides inside. They all share the same fixed prefix: a 1-byte version, a 16-byte IV, and a 32-byte HMAC-SHA256 tag, followed by the ciphertext.

The on-disk layout above is the whole file — but the tag authenticates a little more than the file holds. Ahead of the IV and ciphertext, the firmware prepends a 7-byte authenticated context prefix and HMACs the lot (over SHA-256) with the file’s MAC sub-key:

CPP vault_envelope.h — the authenticated message
// authPrefix = ver(1) ‖ recordType(1) ‖ slotId(1) ‖ generation(4, LE)
// authMsg    = authPrefix ‖ iv ‖ cipher
std::array<uint8_t, kEnvelopeContextSize + kIvSize + MaxCipher> message{};
size_t n = writeEnvelopeContext(message.data(), type, slot, generation); // 7 B
std::memcpy(message.data() + n, iv, kIvSize);        n += kIvSize;
std::memcpy(message.data() + n, cipher, cipherLen);  n += cipherLen;

VaultCrypto::hmacSha256(macKey, macKeyLen, message.data(), n, tagOut.data());
VaultCrypto::secureWipe(message.data(), message.size());

That context prefix is never written to the file — the on-disk version byte stays 0x01 and the layout doesn’t change a single byte. recordType and slotId come from where the file lives (which path, which slot); generation comes from the authenticated meta.bin. So the tag proves that this exact ciphertext, in this exact slot, of this exact type, at this exact revision belongs here, untouched — not just that someone who knew the key produced some ciphertext. Why those three extra fields earn their keep is its own section; first, the basics of writing and reading one.

The IV (initialization vector) is the 16 random bytes that seed CBC’s chaining for the first block; without it, two records with the same first block would encrypt to the same first ciphertext block, leaking equality. The vault draws a fresh IV from the ESP32 hardware RNG on every write, so saving the same password twice produces two completely different ciphertexts, and an attacker reading flash can’t even tell that two slots hold the same value. The IV isn’t secret (it’s stored in the clear, right there in the envelope), but it must be unpredictable and unique per write, and authenticating it under the tag stops anyone from quietly substituting one.
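As a rough illustration of the on-disk layout, here is a sketch that splits a raw file into its four fields. Everything named here (EnvelopeView, splitEnvelope, the k…Len constants) is hypothetical, and the sizes are assumptions read off the prose: a 1-byte version, a 16-byte IV, a 32-byte tag, and a ciphertext that is a whole number of 16-byte blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed field sizes (not taken from the firmware headers).
constexpr std::size_t kVerLen    = 1;
constexpr std::size_t kIvLen     = 16;
constexpr std::size_t kTagLen    = 32;
constexpr std::size_t kHeaderLen = kVerLen + kIvLen + kTagLen;  // 49
constexpr std::size_t kBlockLen  = 16;

// Non-owning view of the four envelope fields (hypothetical type).
struct EnvelopeView {
    uint8_t        ver;
    const uint8_t* iv;
    const uint8_t* tag;
    const uint8_t* cipher;
    std::size_t    cipherLen;
};

// Splits [ver][iv][hmacTag][cipher]. Returns false on any structural problem
// and leaves an empty view behind. It authenticates nothing: the tag check
// still has to run before the ciphertext is used.
bool splitEnvelope(const uint8_t* file, std::size_t size, EnvelopeView& out) {
    out = EnvelopeView{};
    if (file == nullptr || size < kHeaderLen + kBlockLen) return false;
    const std::size_t cipherLen = size - kHeaderLen;
    if (cipherLen % kBlockLen != 0) return false;  // CBC needs whole blocks
    out.ver       = file[0];
    out.iv        = file + kVerLen;
    out.tag       = file + kVerLen + kIvLen;
    out.cipher    = file + kHeaderLen;
    out.cipherLen = cipherLen;
    return true;
}
```

The minimum size includes one ciphertext block because PKCS#7 always pads, so even an empty record encrypts to 16 bytes.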

Writing a record: encrypt, then MAC

The write path is encrypt-then-MAC in the literal order of its name. The plaintext is first serialized to a packed binary record (more on that later), then encrypted with AES-256-CBC under the encryption sub-key, then the tag is computed over the ciphertext under the MAC sub-key, and finally the four parts are written out:

CPP save() — the encrypt-then-MAC write path (trimmed)
// Encrypt-then-MAC: encrypt the record, then authenticate the ciphertext.
VaultCrypto::encrypt(keys.enc.data(), plainBuf.data(), plainLen,
                     iv.data(), cipherBuf.data(), &cipherLen);
computeTag(keys, id, generation, iv.data(), cipherBuf.data(), cipherLen, tag);

// Persist [ver][iv][hmacTag][cipher].
const uint8_t ver = kCredFormatVersion;
f.write(&ver, kCredVerSize);
f.write(iv.data(), kIvSize);
f.write(tag.data(), kCredHmacTagSize);
f.write(cipherBuf.data(), cipherLen);

// Every transient buffer is wiped before this function returns.
VaultCrypto::secureWipe(cipherBuf.data(), cipherBuf.size());
VaultCrypto::secureWipe(iv.data(), iv.size());
VaultCrypto::secureWipe(tag.data(), tag.size());

Note the wiping. The plaintext, the keys, and the scratch buffers all hold secret material, and on a device that can be powered down and probed, leaving them in RAM is a liability. Every buffer is cleared with a zeroize primitive that the compiler is not allowed to optimize away — mbedtls_platform_zeroize on the device, which exists precisely because a plain memset can be elided as a “dead store.”
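One way to make "wiped before this function returns" hold on every exit path is a small scope guard. This is a sketch under assumptions, not the vault's code: ScopedWipe and wipeBytes are hypothetical names, and the volatile-pointer loop is a portable stand-in for a real primitive such as mbedtls_platform_zeroize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for a real zeroize primitive: writes go through a volatile
// pointer so the compiler treats each store as observable.
inline void wipeBytes(void* p, std::size_t n) {
    volatile uint8_t* bytes = static_cast<volatile uint8_t*>(p);
    for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
}

// Hypothetical RAII guard: wipes the buffer whenever the scope is left,
// whether by a normal return or an early reject.
class ScopedWipe {
public:
    ScopedWipe(void* p, std::size_t n) noexcept : p_(p), n_(n) {}
    ~ScopedWipe() { wipeBytes(p_, n_); }
    ScopedWipe(const ScopedWipe&) = delete;
    ScopedWipe& operator=(const ScopedWipe&) = delete;

private:
    void*       p_;
    std::size_t n_;
};
```

Declaring a guard right after each scratch buffer means an early return cannot skip the wipe, which is the property the write path above maintains by hand.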

Reading a record: verify, then decrypt

This is the section the whole format exists for. When the vault loads a credential, it does five things in a strict order, and any failure at any step returns nothing:

CPP load() — verify-before-decrypt, fail-closed (trimmed)
// Fail-closed: wipe the caller's record up front, so EVERY reject path
// (bad size, version, MAC, decrypt, or decode) leaves no residue behind.
out.wipe();

// 1. Structural checks: plausible size, ciphertext is a whole number of blocks.
if (fileSize < kCredMinFileSize || fileSize > kCredMaxFileSize) return false;
const size_t cipherLen = fileSize - kCredHeaderSize;
if ((cipherLen % kAesBlockSize) != 0)                           return false;

// 2. Read [ver][iv][storedTag][cipher]; reject an unknown format version.
//    ... (reads omitted) ...
if (ver != kCredFormatVersion)                                  return false;

// 3. Verify-before-decrypt: recompute the tag and compare in CONSTANT TIME.
computeTag(keys, id, generation, iv.data(), cipherBuf.data(), cipherLen, expectedTag);
if (!VaultCrypto::constantTimeEqual(expectedTag.data(), storedTag.data(),
                                    kCredHmacTagSize)) {
    // MAC mismatch → tampered or corrupt. Wipe scratch, produce no plaintext.
    return false;
}

// 4. Tag verified — only now is it safe to decrypt.
VaultCrypto::decrypt(keys.enc.data(), iv.data(), cipherBuf.data(),
                     cipherLen, plainBuf.data(), &plainLen);

// 5. Decode the packed plaintext with a strict, fail-closed codec.
return decodeCredential(plainBuf.data(), plainLen, out);

Two details make this safe rather than merely sequential.

First, the constant-time compare. Imagine a combination lock that gave a little click, and took a hair longer to answer, each time you got a digit right: guessing it blind would stop being hopeless, because the lock itself keeps whispering “warmer, warmer.” An ordinary byte comparison leaks exactly that hint. “Constant-time” here means something concrete: the comparison does the same amount of work regardless of the data, so its duration tells an observer nothing about the secret. A naïve memcmp does the opposite: it returns the instant it finds the first differing byte. Feed it a guessed tag and the time it takes to say “no” reveals how many leading bytes you got right: a tag matching the first 3 bytes returns measurably later than one that’s wrong at byte 0. An attacker who can measure that turns tag forgery into a byte-at-a-time search, ~256 tries per byte (about 8,192 for a 32-byte tag) instead of 2^256 for the whole tag. The vault never uses memcmp on secret-dependent data; it uses a branch-free comparison that XOR-accumulates every byte difference into one value and only checks that value at the very end, with the same number of operations whether the tag first differs at byte 0 or byte 31:

CPP What a constant-time compare boils down to
// The vault calls Mbed TLS's mbedtls_ct_memcmp; this is the idea it implements:
bool constantTimeEqual(const uint8_t* a, const uint8_t* b, size_t len) {
    volatile uint8_t diff = 0;            // 'volatile' stops the optimizer
    for (size_t i = 0; i < len; i++)      // always touch EVERY byte
        diff |= a[i] ^ b[i];
    return diff == 0;                     // one check, at the very end
}

The vault doesn’t hand-write that loop — it calls Mbed TLS’s mbedtls_ct_memcmp, the library’s constant-time comparison primitive, which does exactly this. If you’ve never seen why this matters, Coda Hale’s A Lesson In Timing Attacks is the classic five-minute explanation; BearSSL’s constant-time notes go deeper.
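To see the leak rather than take it on faith, the sketch below instruments both styles of comparison to count how many bytes each one examines. The function names are hypothetical, and the step count is only a model of running time, not a hardened primitive; production code should keep calling the library's compare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Early-exit compare, instrumented: returns how many bytes it looked at.
// The count stands in for the duration an attacker would measure.
std::size_t earlyExitSteps(const uint8_t* a, const uint8_t* b, std::size_t len) {
    std::size_t steps = 0;
    for (std::size_t i = 0; i < len; ++i) {
        ++steps;
        if (a[i] != b[i]) break;  // the leak: stop at the first mismatch
    }
    return steps;
}

// Accumulating compare, instrumented the same way: always scans all bytes.
std::size_t fullScanSteps(const uint8_t* a, const uint8_t* b, std::size_t len,
                          bool& equal) {
    uint8_t diff = 0;
    std::size_t steps = 0;
    for (std::size_t i = 0; i < len; ++i) {
        ++steps;
        diff = static_cast<uint8_t>(diff | (a[i] ^ b[i]));
    }
    equal = (diff == 0);
    return steps;
}
```

A guess that is wrong at byte 0 costs the early-exit version one step, and a guess that is wrong at byte 3 costs it four; that difference is the "warmer" signal. The accumulating version reports 32 steps for a 32-byte tag in both cases.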

Second, fail-closed. The record is wiped before anything is read, so there is no code path — not a bad size, not a bad version, not a MAC mismatch, not a decrypt failure, not a malformed record — that can leave a partially-populated credential in the caller’s buffer. The function either returns true with a complete record or false with nothing.

What the tag really binds: slot, type, and freshness

A tag over ver ‖ iv ‖ cipher proves the ciphertext is genuine — but it says nothing about three things the format also needs to promise: which slot a file belongs to, what kind of record it is, and how recent it is. An attacker with raw flash write (a programmer or the debug port) but no PIN can turn each silence into an attack — and each maps to something mundane: dropping a letter in the wrong mailbox (wrong slot), relabeling a box so it passes for another kind (wrong type), or sliding last month’s receipt back in as if it were today’s (a stale revision):

Three gaps a ciphertext-only tag leaves open
Attack (raw flash write, no PIN) | Why it worked
Cross-slot substitution: copy cred_05.bin over cred_03.bin | Both sealed with the same key, so the relocated file verified and served slot 5’s secret as slot 3 (one site’s password typed into another’s form)
Cross-type confusion: drop a credential file onto a TOTP path | Same key and version byte; only the record decoder’s shape check stood in the way, not an authenticated discriminator
Selective rollback: restore an older, authentic file for one slot | Nothing recorded how fresh a file should be, so a stale-but-genuine envelope (a password you just rotated away) still verified

The fix binds the missing context into the tag itself. This is exactly the role of associated data in an AEAD scheme — context that must be authenticated but not encrypted — except here it’s folded directly into the encrypt-then-MAC tag rather than passed to a single-pass cipher. The 7-byte prefix is centralized in one place, so every record module — and the re-key path — computes it bit-identically:

CPP vault_envelope.h — writeEnvelopeContext (one centralized prefix)
// ver(1) ‖ recordType(1) ‖ slotId(1) ‖ generation(4, little-endian)
inline size_t writeEnvelopeContext(uint8_t* out, EnvelopeRecordType type,
                                   uint8_t slot, uint32_t generation) {
    out[0] = kEnvelopeVersion;            // 0x01 — unchanged on disk
    out[1] = static_cast<uint8_t>(type);  // Credential / Totp / Index
    out[2] = slot;                        // the slot this file belongs to
    out[3] =  generation        & 0xFF;   // per-slot freshness counter,
    out[4] = (generation >> 8)  & 0xFF;   //   little-endian
    out[5] = (generation >> 16) & 0xFF;
    out[6] = (generation >> 24) & 0xFF;
    return kEnvelopeContextSize;          // 7
}

So a relocated file carries the wrong slotId, a retyped file the wrong recordType, and a rolled-back file an old generation. In each case the tag the firmware recomputes no longer matches the one on disk, so the read fails closed before a single byte is decrypted: the same verify-before-decrypt gate, guarding three more promises.
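A small worked example shows why each of those three changes reaches the tag. The sketch rebuilds the 7-byte prefix for a few (type, slot, generation) triples; the RecordType values are made up for illustration, since the post does not give the real enum values, and makeContext is a hypothetical helper mirroring the layout above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical values; the firmware's real enum values are not shown.
enum class RecordType : uint8_t { Credential = 1, Totp = 2, Index = 3 };

using Context = std::array<uint8_t, 7>;

// Same layout as the prefix above: ver, recordType, slotId, generation (LE).
Context makeContext(RecordType type, uint8_t slot, uint32_t generation) {
    return Context{
        0x01,
        static_cast<uint8_t>(type),
        slot,
        static_cast<uint8_t>(generation & 0xFF),
        static_cast<uint8_t>((generation >> 8) & 0xFF),
        static_cast<uint8_t>((generation >> 16) & 0xFF),
        static_cast<uint8_t>((generation >> 24) & 0xFF),
    };
}
```

Moving a file from slot 5 to slot 3 changes byte 2, retyping it changes byte 1, and rolling it back changes bytes 3 to 6. Because the HMAC input starts with these bytes, any one of those differences yields a different expected tag, so the stored tag no longer matches.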

The root of trust: authenticating meta.bin

Every check so far has leaned on something already being trustworthy — the keys, the freshness counters. That trust has to bottom out somewhere: in a root of trust, the one piece you don’t get to question, because if it were forged everything built on top would inherit the lie. Here that piece is meta.bin.

The generation counters have to live somewhere, and that somewhere has to be tamper-evident — otherwise an attacker would simply lower a counter to make a stale file look current. They live in meta.bin: the small header the device reads first, holding the salts, the PIN verifier, and a per-slot generation table — all sealed under its own HMAC tag.

Because the metadata is authenticated, the unlock sequence becomes a careful ladder: nothing in the file is trusted until the tag checks out, and a wrong PIN derives a different master key — so the metadata tag and the PIN verifier both miss at once.

An old or format-mismatched meta.bin — say a formatVer 0x01 from an earlier version — simply fails to authenticate, and the vault is re-provisioned. There is no migration and no dual-read: one on-flash format, ever. (The device isn’t in production, so there’s no real data to migrate.)

Crash-safe anti-rollback

Selective rollback is a replay attack — the attacker re-presents an authentic-but-stale file — and the standard defense is a freshness value: a monotonic counter the verifier expects to only ever increase, so a replayed older record fails the check. But a freshness counter is only as good as its update story. Every mutation — add, edit, or delete — bumps the slot’s generation, and the record is rewritten under the new value. That creates a torn-write hazard: lose power between writing the record (under generation N+1) and writing meta.bin (which records N+1), and the two disagree. Resolve that wrong and you either brick a valid slot or quietly re-open the rollback you just closed.

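The fix is to pick one single, indivisible moment that “counts,” like the signature on a contract: before the pen lifts nothing is binding, after it everything is. The firmware makes the meta.bin swap that moment. The new meta.bin is written to a temp file, then atomically renamed into place, so a reader always sees the old meta.bin or the new one, never a torn half-write. A mutation therefore runs in three steps: stage the record sealed under generation N+1, commit by swapping in the meta.bin that records N+1, then promote the staged record over the canonical one. Recovery is deterministic around the commit.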

On the next boot, recoverPendingMutation recomputes the staged record’s tag at the current meta.gen[type][slot]. If it verifies, the commit happened (the crash landed after the rename) → finish the promote. If it doesn’t, the commit never happened → discard the orphan and keep the canonical record under the old generation. No window bricks a slot or silently re-opens rollback, and each crash window has its own native test. Because a delete bumps the generation too, a deleted entry can’t be resurrected by replaying its old file.
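The boot-time decision can be modeled in a few lines. This is a toy sketch with hypothetical names (StagedRecord, decideRecovery, verifiesAt), not the firmware's recovery routine. In the real format the generation a record was sealed under is not stored anywhere; it is implied by whether the HMAC verifies at a given generation. The model replaces that MAC check with a plain comparison.

```cpp
#include <cassert>
#include <cstdint>

enum class Recovery { NothingPending, Promote, Discard };

// Toy model of a staged record: reduced to the generation it was sealed under.
struct StagedRecord {
    bool     present;
    uint32_t sealedGeneration;
};

// Stand-in for "recompute the tag at metaGeneration and compare".
bool verifiesAt(const StagedRecord& s, uint32_t metaGeneration) {
    return s.present && s.sealedGeneration == metaGeneration;
}

// The rule described above: a staged record that verifies at the current
// metadata generation means the commit landed, so finish the promote.
// Otherwise the commit never happened, so drop the orphan.
Recovery decideRecovery(const StagedRecord& staged, uint32_t metaGeneration) {
    if (!staged.present) return Recovery::NothingPending;
    return verifiesAt(staged, metaGeneration) ? Recovery::Promote
                                              : Recovery::Discard;
}
```

With the slot at generation 7 and a record staged under 8, a crash before the rename leaves the metadata at 7 and the staged file is discarded; a crash after the rename leaves it at 8 and the staged file is promoted. Either way the slot ends up with exactly one record that matches the committed counter.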

The honest boundary: open vs _secure

Binding generation closes selective rollback — reverting one record while the rest of the vault moves on. It does not, on the open (developer-flashable) builds, close a full snapshot rollback: revert the entire vault — meta.bin and every record together — to an earlier, internally consistent state, and the counters revert with it, so nothing looks out of place. Catching that needs anti-rollback state living outside the rewritable flash, which is exactly what the hardened _secure builds add.

Which rollback each build catches
Rollback variant | Open builds | _secure builds
Selective: one record reverted, meta.bin moves on | Closed: fails the generation-bound MAC | Closed
Full snapshot: meta.bin + every record reverted together | Not closed: the whole set is self-consistent | Closed: Secure Boot + Flash Encryption + NVS anti-rollback

I’d rather state that boundary plainly than oversell it. The generation counter is defense-in-depth for the non-secure builds against the far more practical selective attack; the whole-vault time-warp is a hardware problem, solved in hardware on the builds that opt into burning eFuses.

Two keys from one master: key separation

The envelope uses two different keys, one to encrypt and one to MAC, and that is deliberate (the same instinct that says your house key, car key, and mailbox key should be three different keys). The security argument for encrypt-then-MAC assumes the encryption and authentication keys are independent; reusing one key for both jobs voids the guarantee and invites cross-protocol mischief. So the vault derives separate sub-keys from a single master key, each tagged with a distinct domain-separation label, following long-standing key-separation guidance. NIST SP 800-57 Part 1 puts it plainly: “a single key should be used for only one purpose”:

CPP vault_keys — distinct labels per purpose
// One PBKDF2-derived master key fans out into three single-purpose sub-keys.
//   encKey      = HMAC(masterKey, "vault-enc")              -> AES-256-CBC key
//   macKey      = HMAC(masterKey, "vault-mac" || hmacSalt)  -> file HMAC key
//   pinVerifier = HMAC(masterKey, "vault-pin")              -> unlock check
VaultCrypto::hmacSha256(masterKey.data(), masterKey.size(),
                        kEncKeyLabel, kEncLabelLen, out.enc.data());
deriveMacKey(masterKey, hmacSalt, out.mac);  // label || per-vault hmacSalt

The third sub-key, the PIN verifier, is how the device checks the PIN without storing it: unlock derives the master key from the entered PIN, computes the verifier, and compares it — again in constant time — against the stored value. A wrong PIN simply produces a different verifier, with no hint about how wrong it was. (The verifier sits in the meta.bin header alongside the salts; the PIN itself is never written anywhere.)

The point of the split is that each key can do exactly one thing. If the same key both encrypted files and authenticated them, a clever attacker could try to make ciphertext and tags interact, and the clean security proof for encrypt-then-MAC, which assumes the two keys are independent, would simply no longer apply. Keeping them separate means a compromise or misuse of one capability can’t borrow another.

Defense in depth: constant-time padding + a fail-closed codec

Because the MAC is verified first, a tampered file never reaches the decryption code, which means the classic padding oracle is unreachable by construction. The padding check below is therefore defense in depth, not the primary defense — belt and suspenders for the case where authenticated-but-corrupt data still needs to be rejected cleanly. It validates PKCS#7 padding in constant time and reports a single unified failure, so it leaks nothing through logs or timing regardless:

CPP Constant-time PKCS#7 padding check, unified failure
// The last byte claims the pad length; fold every check into one mask.
uint8_t padByte    = plainOut[cipherLen - 1];
uint8_t padInvalid = 0;
padInvalid |= (padByte == 0);                  // pad length 0 is illegal
padInvalid |= (padByte > kAesBlockSize);       // pad length > block is illegal
for (size_t i = 0; i < kAesBlockSize; i++) {
    uint8_t mask = (i < padByte) ? 0xFF : 0x00;     // examine the whole block
    padInvalid |= (plainOut[cipherLen - 1 - i] ^ padByte) & mask;
}
if (padInvalid != 0) {           // one message for corruption AND bad padding
    LOG_ERROR(kTag, "Decryption failed");
    mbedtls_platform_zeroize(plainOut, cipherLen);
    return false;
}

The decrypted plaintext is then handed to an in-house binary codec, not a general-purpose parser. The decrypted bytes are the most sensitive range in the whole product, and I don’t want a JSON or CBOR library — with its surface area, its allocations, its surprises — anywhere near them. The decoder is a forward-only cursor where every read is bounds-checked before it copies, every string length is rejected if it exceeds the field’s compile-time cap, and any trailing garbage fails the record:

CPP record_codec — a bounds-checked, fail-closed reader
// need(n): are n more bytes available, without integer-overflow wraparound?
bool need(size_t n) const noexcept { return off_ + n <= len_ && off_ + n >= off_; }

// Read a length-prefixed string: reject len > field-capacity BEFORE copying.
template <size_t N>
bool readStrBody(InplaceString<N>& dst, size_t len) noexcept {
    if (len > N || !need(len)) return false;   // fail closed, no partial copy
    std::memcpy(dst.data(), p_ + off_, len);
    dst.data()[len] = '\0';                     // NUL-terminate the destination
    dst.resyncLength();
    off_ += len;
    return true;
}
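To show how those three rules (bounds check before copy, per-field cap, no trailing garbage) combine into a whole-record decode, here is a self-contained sketch. Cursor, Rec, and decodeRec are hypothetical, the two-field record is invented for the example, and plain memset stands in for a real zeroize primitive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical forward-only reader; not the vault's codec.
class Cursor {
public:
    Cursor(const uint8_t* p, std::size_t len) : p_(p), len_(len) {}

    bool readU8(uint8_t& v) {
        if (!need(1)) return false;
        v = p_[off_++];
        return true;
    }

    // Length-prefixed string into dst, which must hold cap bytes plus a NUL.
    bool readStr(char* dst, std::size_t cap) {
        uint8_t n = 0;
        if (!readU8(n)) return false;
        if (n > cap || !need(n)) return false;  // reject before copying
        std::memcpy(dst, p_ + off_, n);
        dst[n] = '\0';
        off_ += n;
        return true;
    }

    bool atEnd() const { return off_ == len_; }

private:
    // off_ never exceeds len_, so the subtraction cannot wrap.
    bool need(std::size_t n) const { return n <= len_ - off_; }

    const uint8_t* p_;
    std::size_t    len_;
    std::size_t    off_ = 0;
};

// Invented record: name then username, each at most 8 bytes.
struct Rec {
    char name[9];
    char user[9];
};

// Fail-closed decode: a complete record, or false with out cleared.
bool decodeRec(const uint8_t* buf, std::size_t len, Rec& out) {
    std::memset(&out, 0, sizeof(out));
    Cursor c(buf, len);
    const bool ok = c.readStr(out.name, 8) && c.readStr(out.user, 8) && c.atEnd();
    if (!ok) std::memset(&out, 0, sizeof(out));  // no partial record survives
    return ok;
}
```

The atEnd() check is what turns trailing bytes into a failure: a record that parses cleanly but leaves even one byte unread is rejected and cleared, the same as one that is truncated or over-long.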

Why CBC+HMAC and not AES-GCM?

A fair question. The modern default for authenticated encryption is an AEAD construction like AES-GCM, which fuses encryption and authentication into one primitive and is harder to assemble incorrectly. If I were starting a greenfield protocol, GCM (or ChaCha20-Poly1305) would be the obvious pick.

The vault uses encrypt-then-MAC with AES-256-CBC and HMAC-SHA256 for a specific, boring reason: backend uniformity. The same code has to produce byte-identical output in two very different environments — a host build (where the unit tests run, against the legacy Mbed TLS API) and the device (where Mbed TLS 4.0’s PSA Crypto API is the only public path). CBC, HMAC, and a hand-verified compare are trivially available and deterministic across both; leaning on GCM’s nonce-management rules across two backends adds a sharper footgun than the one I’m avoiding. Encrypt-then-MAC is the generically secure composition, so building AEAD out of CBC+HMAC this way is sound — it’s how IPsec (in its standard configuration) and KDBX 4 do it too.

AES-GCM (AEAD)

One primitive, one key. Encryption and authentication are fused; less to wire up by hand.

  • Modern default; secure only while every nonce is unique under a given key
  • Nonce reuse is catastrophic (it exposes the authentication key and leaks relationships between plaintexts)
  • Backend and nonce rules must match exactly across host and device

AES-CBC + HMAC (EtM)

Two primitives, two keys, explicit order. More moving parts, but each is simple and deterministic.

  • Byte-identical on legacy Mbed TLS (host) and PSA (device)
  • EtM is provably the safe composition
  • A fresh random IV per write, authenticated by the tag

One honest footnote on the implementation: PSA’s transition guide suggests psa_mac_verify over “compute a tag, then mbedtls_ct_memcmp.” The vault deliberately keeps the compute-and-compare form because it runs identically on both backends — and the comparison is still constant-time. It’s a trade of one PSA convenience for backend uniformity, made with eyes open.

Listing without decrypting everything: the O(1) index

There’s a nice side effect of treating every file as an independent envelope (the “O(1)” in the heading is just jargon for “the cost stays flat no matter how many credentials you have”). Listing the vault doesn’t mean decrypting every credential — that would be slow and would needlessly expose every secret in RAM. Instead a single encrypted index.bin, in the exact same [ver][iv][hmacTag][cipher] envelope, caches only the non-secret metadata each credential needs in a list view: its id, name, username, brand, and a couple of flags — never the password, URL, notes, or TOTP secret.

Fail closed, by construction

The cipher isn’t what makes this vault trustworthy at rest — the order of operations is. Put plainly: never trust a file until its seal checks out, and when in doubt, hand back nothing. Authenticate the slot, type, freshness, IV, and ciphertext with a separate key; verify that tag in constant time before a single byte is decrypted; authenticate the metadata that holds the freshness counters; wipe everything on the way out; and let every error path converge on the same answer: nothing. Get those right and AES-256 is almost an implementation detail.

Encrypt-then-MAC, context-bound, verify-before-decrypt, fail-closed

Binding slot, type, and a freshness counter into the tag — and authenticating the metadata that holds those counters — removes the padding-oracle class and defeats relocation, type-confusion, and selective rollback before any decryption. The residual gaps are by design: full-snapshot rollback (closed only on _secure builds) and the low entropy of a short PIN (the next post).

There’s one thing this format doesn’t solve, though. All of it protects the file — but the encryption key still ultimately comes from a short PIN, and a short PIN has very little entropy. On the device that’s fine: a brute-force guard locks out and eventually wipes the vault after a handful of wrong PINs, so online guessing dies quickly. The problem is offline. If someone desolders the flash and copies the ciphertext, the lockout never runs — they can derive keys from PIN guesses on a GPU, with no device in the loop to stop them — and a 4-digit keyspace falls in a fraction of a second. (Production flash encryption raises that bar, but the file format shouldn’t have to depend on it.) Closing that gap needs a different trick: one that mixes a secret only the original chip can produce into the key, so the exfiltrated ciphertext is useless on any other hardware. That’s the next post.

Frequently asked questions

What is encrypt-then-MAC?

Encrypt-then-MAC encrypts the plaintext first, then computes a MAC over the ciphertext, so a reader can authenticate a file before any decryption. Each vault file is an envelope, [ver][iv][hmacTag][cipher], and the HMAC tag is verified in constant time before AES runs, so a tampered file is rejected without ever being decrypted.

Why verify the MAC before decrypting?

Decrypting first exposes a padding oracle: the decrypt step inspects PKCS#7 padding and leaks timing or error differences an attacker can exploit offline. Verifying the tag first means no attacker-controlled bytes ever reach the cipher — the vault fails closed.

How does the format stop a rolled-back or relocated vault file?

The HMAC authenticates a context prefix — ver ‖ recordType ‖ slotId ‖ generation — so a file moved to another slot, retyped, or replaced with an older generation fails the tag, even against an attacker with raw flash access and no PIN.

Does encrypt-then-MAC need separate keys?

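Yes. Encryption and authentication use separate sub-keys, each derived under its own label from the PIN-derived master key. The security argument for encrypt-then-MAC assumes the two keys are independent, so reusing a single key for both the cipher and the MAC voids that guarantee and should be avoided.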