Zero-Day Discovery: Heap Overflow to Root in a Mission-Critical Communication Platform

During an authorized penetration test of enterprise IP communication infrastructure, our team uncovered a previously unknown critical vulnerability. A single unauthenticated WebSocket message — 5 KB of malformed JSON — permanently kills the management plane of the target platform. Under the right heap layout conditions, the same input achieves remote code execution as uid=0(root). This is the complete technical breakdown.

All vendor, product, and infrastructure details have been withheld pending responsible disclosure. The vulnerability is a heap buffer overflow (CVE pending) scoring CVSS 9.8 (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H). No authentication required. No user interaction required.

Where Is the Vulnerability?

Before the technical walkthrough, the question of where this vulnerability lives has a two-part answer, and both parts matter for remediation.

The vulnerable code is in libjansson — a widely-used open source C JSON parser. Its parse_value() function is recursive and allocates 80 bytes of stack per nesting level. jansson has a built-in depth limit of 2,048 levels, but that limit is unreachable on the affected platform's coroutine stack configuration. The stack overflow occurs inside jansson.

The exploitable condition is in the target broker's implementation. WebSocket sessions run on cooperative green threads (coroutines) whose stacks are allocated via malloc() — plain heap memory with no guard pages. When jansson recurses past the stack boundary, there is no hardware fault. Stack frames write silently into the adjacent heap object. That converts a recoverable crash into an exploitable heap corruption primitive.

Fix either component and the vulnerability is neutralised. Fix both and the defence is robust.

The Target: An Enterprise WebSocket Broker on Port 443

The engagement covered an enterprise IP communication platform deployed in environments where availability is a life-safety concern — the kind of system used in hospitals, airports, and industrial facilities for intercommunication, emergency broadcast, and operational management. The platform's real-time messaging layer was exposed over HTTPS and WebSocket on port 443. Clients connect, complete a subprotocol handshake, and exchange structured JSON messages — authentication, subscriptions, remote procedure calls, and events.

The critical configuration flaw: the nginx reverse proxy had no authentication check on the messaging endpoint. Any host on the network — unauthenticated, no session token, no prior handshake — could open a WebSocket connection and begin sending messages directly to the broker.

Discovery: Protocol Fuzzing and an Anomalous Crash

After mapping normal protocol flow, we ran a fuzzer generating 60+ concurrent connections with malformed payloads: truncated JSON, binary content in text frames, oversized string values, structurally malformed arrays. Most payloads generated protocol errors or were silently discarded. One produced a different result.

When the fuzzer sent a text frame consisting entirely of opening bracket characters repeated approximately 5,000 times, the broker process crashed — hard. Not a protocol error. Not a TCP reset. The broker exited without recovery. The container restart policy exhausted all five restart attempts and the service went permanently offline. Physical intervention was required to restore the management plane.

A 5 KB payload, requiring zero credentials, had permanently killed an enterprise communications system. That was the signal to dig deeper.

Crash Isolation: Finding the Minimum Threshold

We built an automated crash/recovery harness using a separate administrative interface to restart the broker between payloads. Minimum crash threshold was determined by binary search:

5,000 brackets → crash (100%)
2,049 brackets → crash (100%)
2,000 brackets → no crash
1,200 brackets → crash on 96 KB coroutine stack

The threshold was not 2,048 — a number that would have pointed immediately to a parser depth limit. It was stack-size dependent. That discrepancy was the first concrete indicator of the real root cause: a mismatch between two independently configured library parameters.
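The bisection step of such a harness can be sketched in C as follows. This is a minimal sketch, not the harness used in the engagement: crash_fn, min_crash_depth and simulated_crash are hypothetical names, and the predicate that would send a payload and restart the service is left to the caller. It also assumes that crashing is monotonic in depth for a fixed stack configuration, so each configuration needs its own search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if a payload nested `depth` levels deep
   crashes the target. A real harness would send the payload and restart
   the service; here the caller supplies the behaviour. */
typedef int (*crash_fn)(size_t depth, void *ctx);

/* Smallest depth in (lo, hi] that crashes. Assumes lo does not crash,
   hi does, and crashing is monotonic in depth. */
static size_t min_crash_depth(size_t lo, size_t hi, crash_fn crashes, void *ctx)
{
    assert(lo < hi);
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (crashes(mid, ctx))
            hi = mid;   /* threshold is at or below mid */
        else
            lo = mid;   /* threshold is above mid */
    }
    return hi;
}

/* Stand-in predicate for demonstration only: "crashes" once depth
   reaches the value pointed to by ctx. */
static int simulated_crash(size_t depth, void *ctx)
{
    return depth >= *(const size_t *)ctx;
}
```

Each probe costs one service restart, so a search between 0 and 5,000 needs about 13 restarts rather than thousands.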

Root Cause: Two Correct Defaults That Combine Catastrophically

The broker handled each WebSocket session on a Boost.Coroutine fiber — a cooperative green thread from the Boost C++ libraries. Boost.Coroutine allocates stack memory for each fiber at creation time. The broker used the standard stack allocator, which calls malloc() to obtain stack memory. Not mmap(). Not mmap() with mprotect(PROT_NONE) for a guard page. Plain heap allocation. No guard pages.

The stack size was the Boost.Coroutine default:

minimum_size() = 0x3000 = 12,288 bytes
default_size() = minimum_size × 8 = 98,304 bytes (~96 KB)

ARM64 instruction in libboost_coroutine.so:
  lsl x19, x0, #3    ; left-shift by 3 = multiply by 8

JSON parsing was handled by libjansson 2.14.1. jansson's parse_value() is a recursive descent parser — every nesting level in the input JSON adds one 80-byte stack frame:

0x4b2c:  stp x29, x30, [sp, #-80]!    ; 80-byte frame: saves FP + LR

jansson has a built-in depth guard that aborts at 2,048 levels:

0x4b50:  add x0, x0, #0x1             ; depth++
0x4b58:  cmp x0, #0x800               ; compare to 2048
0x4b5c:  b.ls 0x4b98                  ; branch if safe
; The abort path behind this check is NEVER TAKEN on the 96 KB coroutine stack: the stack runs out first

The arithmetic makes the problem clear:

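98,304 bytes available on coroutine stack
÷ 80 bytes per parse_value() recursion
= 1,228 recursion levels at most (fewer in practice: frames already on the stack when parsing starts use part of the budget)

jansson depth guard fires at: 2,048
Stack overflows at:           ~1,200 (upper bound 1,228)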

The guard is unreachable. The stack overflows first.
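The same check can be written as a small C helper, useful as a build-time or start-up sanity test whenever a recursive parser runs on a fixed-size stack. This is a sketch under stated assumptions: max_frames and guard_reachable are illustrative names, the frame size is taken from the disassembly, and the result is an upper bound because frames already on the stack are not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on how many fixed-size frames fit in stack_bytes.
   Real headroom is smaller: frames already on the stack when the
   parser is entered are not accounted for here. */
static size_t max_frames(size_t stack_bytes, size_t frame_bytes)
{
    assert(frame_bytes > 0);
    return stack_bytes / frame_bytes;
}

/* Returns 1 if a parser depth guard at guard_depth can fire before the
   stack is exhausted, 0 if the stack runs out first. */
static int guard_reachable(size_t stack_bytes, size_t frame_bytes,
                           size_t guard_depth)
{
    return guard_depth <= max_frames(stack_bytes, frame_bytes);
}
```

With the figures in this article, guard_reachable(98304, 80, 2048) returns 0 and guard_reachable(524288, 80, 2048) returns 1.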

The broker's message decoder called json_loadb(data, len, 0, &err) with flags=0, so jansson applied its built-in 2,048-level limit. The broker enforced no tighter depth limit of its own before handing data to the parser.

Binary Analysis: ARM64 Disassembly of the Overflow

; libjansson.so.4.14.1 -- parse_value() entry point
; ELF 64-bit LSB shared object, ARM AArch64, musl-linked, stripped

0x4b2c: a9bb7bfd  stp  x29, x30, [sp, #-80]!  ; ALLOCATE 80-byte frame
0x4b30: 910003fd  mov  x29, sp
0x4b34: a90153f3  stp  x19, x20, [sp, #16]     ; x19 = lex struct ptr (saved each frame)
0x4b48: f9402c00  ldr  x0, [x0, #88]           ; load lex->depth counter
0x4b50: 91000400  add  x0, x0, #0x1            ; depth++
0x4b54: f9002e60  str  x0, [x19, #88]          ; write back
0x4b58: f120001f  cmp  x0, #0x800              ; check against 2048
0x4b5c: 540001e9  b.ls 0x4b98                  ; abort path never taken on 96 KB coroutine stack

; Recursive call sites (each frame: 80 bytes):
;   0x4d70  object key-value parsing
;   0x4e28  array element parsing
;   0x4f0c  top-level dispatch

Two register values saved at every frame are deterministic across ASLR rerandomisations — they are fixed intra-library offsets, unaffected by base-address randomisation:

x30 (return address): fixed at the same offset within the JSON library on every run, every ASLR rerandomisation. This is the value that overwrites an adjacent heap pointer.
x19 (internal parser pointer): also fixed — controls subsequent vtable dereference path.

Exploitation Chain: What Happens Without Guard Pages

On a properly hardened coroutine implementation, each fiber stack is allocated via mmap() with a PROT_NONE guard page below the usable region. Stack overflow → hardware fault → clean crash. No heap corruption. Not exploitable. This is the correct behaviour.

The target uses malloc(). No guard pages. When parse_value() recurses past the stack boundary, stack frames write silently into the adjacent heap object. This converts an uncontrolled crash into a controlled heap corruption primitive. The exploitation chain:

Heap grooming: Issue requests causing the allocator to place a target object — one containing a function pointer — immediately adjacent to a newly allocated coroutine stack.
Overflow trigger: Send a WebSocket frame of 1,200+ opening bracket characters. The broker passes this to the JSON parser. parse_value() recurses 1,200 times, consuming the 96 KB stack.
Function pointer overwrite: Saved x30 from the deepest parse_value() frame overwrites the adjacent object's callback pointer. Value is deterministic across 50,000 independent runs.
Control flow hijack: Overwritten callback is invoked → execution redirected via ROP chain to system@plt.
Root execution: system() executes attacker-supplied command as uid=0(root).

Emulation Proof: ARM64 Docker with Production Firmware Binaries

We extracted the broker and its library dependencies directly from device firmware and ran them in an ARM64 Docker container (Alpine 3.19, aarch64, musl libc) via QEMU — the exact production binaries, not recompiled versions.

[*] Sending 1,200-bracket nested JSON payload...
[*] parse_value() hits stack boundary at recursion ~1,183
[+] Adjacent heap callback OVERWRITTEN: 0x71a3b5810e44 -> 0x710064657443
[+] Redirecting to system@plt
[+] system("id && whoami") called

uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon)...
EXPLOIT_SUCCESS

A second payload piped 8,076 bytes of /etc/passwd, /etc/shadow, and network configuration to a listener on the attacker machine. Callback overwrite success rate: 80%. Code execution on successful overwrite: 100%. All runs: uid=0(root).

Statistical Reliability: 150,000-Run Probability Mapping

150,000 automated exploit attempts across three depth configurations (50,000 runs each at depths 1,200, 1,250, and 1,300):

112 deterministic slots: identical overwrite value across all 50,000 runs. These contain saved JSON parser return addresses — fixed intra-library offsets, invariant to ASLR base randomisation.
80 variable slots: lower 12 bits consistent across runs (page-offset bits not randomised by ASLR). Candidates for partial-overwrite techniques.
Key observation: No ASLR bypass is needed for the overwrite value itself. ASLR bypass is only needed to determine the redirect target (the address of a suitable gadget or system@plt).

Binary Mitigations: Present but Insufficient

The affected binaries had a strong hardening profile. The vulnerability bypasses every protection except the one that was absent:

PIE + ASLR: Present. Overwrite value is an intra-library offset — constant regardless of base address randomisation. Does not prevent the attack.
NX (non-executable stack/heap): Present. Mitigated by targeting system@plt rather than injecting shellcode. Does not prevent the attack.
Full RELRO: Present. GOT is read-only. Mitigated by targeting heap-resident function pointers, not GOT entries. Does not prevent the attack.
Stack canaries: Present — on the main stack only. malloc-allocated coroutine stacks have no canaries. The overflow never touches one. Completely irrelevant to this attack.
Guard pages: Absent. This is the decisive missing protection. malloc instead of mmap. No OS-enforced boundary between coroutine stack and adjacent heap objects.

The stack canary point deserves emphasis: a hardening report showing "stack canary: YES" creates a false sense of coverage. That protection only applies to the main stack. Coroutine stacks, green threads, and fibers allocated on the heap require guard pages via mmap()+mprotect() or explicit canary logic to be protected. Neither was present.
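What "explicit canary logic" for a heap-allocated stack could look like is sketched below in C. Everything here is an assumption for illustration: the canary_stack type, the function names, and the fixed sentinel value are hypothetical and not part of Boost or the target broker. Because ARM64 stacks grow downward, the sentinel words sit at the low end of the block, where an overflow arrives first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CANARY_WORDS 8
#define CANARY_VALUE UINT64_C(0xC0DEC0DEFEEDFACE) /* illustrative constant */

typedef struct {
    void  *base;    /* start of the malloc'd block; canary words live here */
    void  *limit;   /* lowest usable stack address, just above the canary */
    size_t usable;  /* usable bytes above the canary */
} canary_stack;

/* Allocates canary words followed by `usable` bytes of stack. 0 on success. */
static int canary_stack_alloc(canary_stack *s, size_t usable)
{
    size_t pad = CANARY_WORDS * sizeof(uint64_t);
    uint64_t *p;

    assert(s != NULL);
    if (usable == 0)
        return -1;
    p = malloc(pad + usable);
    if (p == NULL)
        return -1;
    for (size_t i = 0; i < CANARY_WORDS; i++)
        p[i] = CANARY_VALUE;
    s->base = p;
    s->limit = (unsigned char *)p + pad;
    s->usable = usable;
    return 0;
}

/* Returns 1 if every canary word is untouched, 0 if any was overwritten. */
static int canary_intact(const canary_stack *s)
{
    const uint64_t *p = s->base;
    for (size_t i = 0; i < CANARY_WORDS; i++)
        if (p[i] != CANARY_VALUE)
            return 0;
    return 1;
}

static void canary_stack_free(canary_stack *s)
{
    free(s->base);
    s->base = NULL;
}
```

A scheduler would call canary_intact() at every yield or resume and abort on failure. This only detects an overflow after the fact, and a fixed sentinel can be guessed, so it is weaker than a guard page and best treated as a fallback.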

The Production Gap

The production deployment used a 512 KB coroutine stack rather than the 96 KB development default. At this size, jansson's 2,048-level depth guard fires before the main overflow path activates (512KB / 80 = 6,553 levels; guard at 2,048). However, the error unwinding path triggered by the depth guard routes through additional parsing code with a 256-byte stack frame and no depth checking of its own. Under specific error-path conditions the overflow remains triggerable. Production DoS was confirmed. Full production RCE requires physical access to calibrate heap grooming for 512 KB-aligned allocations and an information leak to anchor the redirect target.

Post-Exploitation: Why Root Here Matters

The broker ran as uid=0(root) with a shared volume from the host containing the platform's entire cryptographic trust material: master encryption key, CA private key (enabling JWT token forgery for any account on the platform without leaving authentication log entries), database superuser credentials, and all internal service-to-service authentication secrets. RCE in the broker container does not compromise a single service. It compromises the entire trust hierarchy of the platform.

Remediation

Immediate: Authenticate the Messaging Endpoint

Require a valid session token before the WebSocket upgrade completes. No authentication on a messaging broker serving critical infrastructure is an architectural error. This single change puts the vulnerability out of reach of unauthenticated network attackers. Any authenticated client can still trigger it until the parser and stack fixes below are in place.

Short-Term: Set a JSON Depth Limit

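Cap nesting at a custom maximum depth (64 levels) for all incoming messages. 64 × 80 bytes = 5,120 bytes of stack, safe on any coroutine stack larger than 8 KB. jansson's limit is the compile-time constant JSON_PARSER_MAX_DEPTH rather than a json_loadb() flag, so lowering it means rebuilding the library with a smaller value or enforcing the cap before the parser is called.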

Medium-Term: Use Protected Stack Allocation

Replace malloc()-based coroutine stack allocation with mmap() and a PROT_NONE guard page. Boost ships allocators that do this: protected_stack_allocator in Boost.Coroutine, protected_fixedsize_stack in Boost.Context and Boost.Coroutine2. Stack overflow then becomes a hardware fault: clean crash, no heap corruption, not exploitable.
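For code that manages its own fiber stacks, the guard-page layout can be sketched directly with POSIX calls. This is a minimal sketch, not Boost's implementation: the guarded_stack type and function names are hypothetical, and it assumes a downward-growing stack, so the PROT_NONE page sits at the lowest address. The fork-based probe exists only to show that a write below the usable region is stopped by the kernel.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    void  *map;      /* start of the mapping, i.e. the guard page */
    size_t map_len;  /* guard page plus usable region */
    void  *low;      /* lowest usable stack address */
    size_t usable;   /* usable bytes, rounded up to whole pages */
} guarded_stack;

/* Maps a stack of at least min_usable bytes with one inaccessible page
   below it. Returns 0 on success, -1 on failure. */
static int guarded_stack_alloc(guarded_stack *s, size_t min_usable)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page, usable, len;
    void *m;

    assert(s != NULL);
    if (ps <= 0 || min_usable == 0)
        return -1;
    page = (size_t)ps;
    usable = (min_usable + page - 1) / page * page;
    len = usable + page;
    m = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    if (mprotect(m, page, PROT_NONE) != 0) {  /* lowest page: guard */
        munmap(m, len);
        return -1;
    }
    s->map = m;
    s->map_len = len;
    s->low = (char *)m + page;
    s->usable = usable;
    return 0;
}

static int guarded_stack_free(guarded_stack *s)
{
    return munmap(s->map, s->map_len);
}

/* Demonstration only: a child process writes one byte below the usable
   region. Returns 1 if the child did not exit normally with status 0
   (the write was stopped), 0 if the write went through, -1 on error. */
static int guard_page_faults(const guarded_stack *s)
{
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile char *p = (volatile char *)s->low - 1;
        *p = 0x41;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```

The cost is one extra page of address space per fiber plus an mmap() and mprotect() call at creation, which is usually acceptable for long-lived session fibers.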

Long-Term: Pre-Parse Message Complexity

Scan incoming WebSocket messages for nesting depth before passing to the JSON parser. Reject messages exceeding 32 levels — sufficient for all legitimate message structures with margin. Eliminates the entire vulnerability class regardless of library version or stack configuration.
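One way to implement such a pre-scan is a single iterative pass that uses constant stack regardless of input. The sketch below is an assumption-laden illustration: json_depth_ok is a hypothetical name, and the scanner only tracks brackets and string boundaries. It does not validate JSON, so the real parser still has to run on anything it accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if nesting depth in buf[0..len) never exceeds max_depth,
   0 otherwise. Brackets inside string literals are ignored, and
   backslash escapes inside strings are honoured. No recursion is used. */
static int json_depth_ok(const char *buf, size_t len, size_t max_depth)
{
    size_t depth = 0;
    int in_string = 0;
    int escaped = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (in_string) {
            if (escaped)
                escaped = 0;        /* character after a backslash */
            else if (c == '\\')
                escaped = 1;
            else if (c == '"')
                in_string = 0;      /* closing quote */
            continue;
        }
        switch (c) {
        case '"':
            in_string = 1;
            break;
        case '[':
        case '{':
            if (++depth > max_depth)
                return 0;           /* too deep: reject before parsing */
            break;
        case ']':
        case '}':
            if (depth > 0)
                depth--;
            break;
        default:
            break;
        }
    }
    return 1;
}
```

A decoder would call json_depth_ok(data, len, 32) and drop the frame on a 0 result before the JSON parser is ever invoked. The scan stops at the first bracket past the limit, so a 5,000-bracket payload is rejected after 33 bytes.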

Conclusion

This vulnerability is a precise illustration of how two independently reasonable engineering decisions produce a critical security failure. libjansson's 2,048-level recursion limit is a considered defence against parser abuse. Boost.Coroutine's 96 KB default stack is a reasonable embedded default. Neither decision is obviously wrong. The failure lives in the gap between them — a gap that becomes exploitable because coroutine stacks are allocated on the heap without guard pages.

Stack canaries present on the main stack say nothing about coroutine stack protection. ASLR enabled says nothing about deterministic intra-library offsets at overflow boundaries. Defence in depth requires reviewing the interaction between library defaults across the entire stack — not just per-library hardening checklists. In environments where availability is a life-safety concern, that cross-library review is not optional.
