$title =

How I Debugged the Wrong Driver and Still Found the Real Bug

;

$content = [

I didn’t set out to fix anything.

I just kept pulling on a thread until the sweater confessed.

This is a story about debugging, hubris, community correction, and a Wi-Fi driver that answered a serious systems question with vibes.


Act 1: Three Days of Correct Debugging, Wrong Hardware

For three days, I debugged a Linux Wi-Fi driver with focus, discipline, and total confidence.

Unfortunately, I was debugging the wrong hardware.

I bought a D-Link Wi-Fi adapter for pentest work. USB ID 2001:3321. I looked it up, found it listed as the DWA-182 with an RTL8821AU chipset, and got to work. It was like mistaking a raccoon for a cat because both have opinions.

The 8821au driver loaded. An interface appeared. Things looked promising.

Then nothing worked.

RTW: EFUSE is empty efuse_Addr-0 efuse_data=ff
RTW: EEPROM ID(0x0) is invalid!!
RTW: VID = 0x5678, PID = 0x1234
RTW: ERROR invalid mac addr:00:00:00:00:00:00
RTW: rtl8812au_hal_init in 20ms
RTW: ERROR rtw_hal_init: fail

EFUSE reads returned garbage. The chip reported a VID/PID of 0x5678/0x1234 — clearly wrong. MAC address: all zeros. Registers responding, but not meaningfully.

I was running in a VM with USB passthrough. The obvious conclusion: timing issues. The EFUSE controller’s ready bit never set. VM USB latency was breaking the chip’s internal read cycle.

So I fixed it. Extended timeouts. Added delays. Forced EFUSE access enable. Increased array bounds for the 8-endpoint device. Fixed autopm reference counting. Added debug instrumentation to every init stage.

Three sessions. Patches totaling 200+ lines. Thorough analysis of every failure point:

Init Stage             Result
rtw_hal_power_on       SUCCESS
InitLLTTable8812A      FAIL
FirmwareDownload8812   FAIL
PHY_MACConfig8812      SUCCESS
PHY_BBConfig8812       SUCCESS
PHY_RFConfig8812       FAIL

The methodology was sound. The patches were valid. The analysis was thorough.

The hardware identification was wrong.

I was speaking the wrong dialect to the chip, and it was responding in fluent nonsense.


Act 2: The Community Catches It

At some point, a GitHub reviewer did the digital equivalent of gently putting a hand on my shoulder and saying:

> “Hey buddy… that’s not the chip you think it is.”

I walked over to the adapter sitting on my desk. Flipped it over. Read the label I’d never bothered to check:

M/N DWA-X1850A1

Three days of debugging. Wrong chip. Wrong driver. Wrong register map. Wrong firmware.

  • Model: DWA-X1850
  • Chipset: RTL8832AU
  • Driver: rtw89, not rtw88, not rtl8821au, not whatever fever dream I’d been chasing

The “EFUSE timing issues” weren’t timing issues. The chip was returning garbage because I was sending RTL8821AU commands to RTL8832AU silicon. The “VM USB passthrough limitation” was the driver trying to initialize memory that didn’t exist in the layout it expected.

Everything I observed was real. Everything I concluded was wrong.

Here’s the important part though:

> The methodology was sound. The patches were valid. The analysis was thorough. The hardware identification was wrong.

That distinction matters. Being wrong isn’t the problem. Being wrong sloppily is.


Act 3: The Real Bug — return 42;

Once the correct driver stack was in place (rtw89 USB), the adapter worked. Monitor mode, packet capture, injection — all functional.

Then I ran hcxdumptool:

errors during runtime (mostly caused by a broken driver)
Packets: 4
Errors: 6

Four packets in 35 seconds. Six driver errors. Something was still wrong.

Buried in the USB TX path was this masterpiece:

return 42; /* TODO some kind of calculation? */

A TODO had been promoted to production, put on a fancy tie, and was lying to mac80211 with a straight face.

This function exists to answer a simple, critical question: “How many packets can you accept right now?”

mac80211 uses that answer to apply TX backpressure. When the driver returns an honest number, mac80211 throttles submission when resources are exhausted. When it returns a hardcoded constant, mac80211 keeps submitting forever.

The USB subsystem was drowning in URBs that completed slower than they arrived.

return 42; is not flow control. It’s vibes-based networking.

The fix: Atomic counters tracking in-flight URBs per TX channel.

// Submit path: increment before submit
atomic_inc(&rtwusb->tx_inflight[txch]);
ret = usb_submit_urb(urb, GFP_ATOMIC);
if (ret)
    atomic_dec(&rtwusb->tx_inflight[txch]); // rollback on failure

// Completion path: decrement once per completed URB, success or error
atomic_dec(&rtwusb->tx_inflight[txch]);

// Query path: return remaining capacity
inflight = atomic_read(&rtwusb->tx_inflight[txch]);
return RTW89_USB_MAX_TX_URBS_PER_CH - inflight;

Five patches. Accounting correctness. Race condition fixes. Proper backpressure signaling. I stress-tested this until the adapter qualified for emotional damages.

Results:

Driver    Duration   Packets   Errors
Stock     35s        4         6
Patched   45s        840       0

210x the packets (840 vs 4), or roughly 163x the packet rate once the longer patched run is accounted for. Zero errors. Zero underflow. Zero overflow. Zero kernel panics. Clean teardown under load.

The sweater confessed.


Act 4: Proving It Wasn’t the Hardware

At this point I had a working adapter, but a nagging question: was the D-Link actually good now, or just less broken?

I needed a control group. The Alfa AWUS036AXML happened to be sitting on my desk — different chipset (MT7921U), in-kernel driver, no patching required. If the D-Link still underperformed against a known-working adapter, the fix wasn’t complete.

So I ran them through the same gauntlet.

The Alfa had its own quirks. It occasionally drifted back to channel 10 when capture tools initialized. It has documented incompatibilities with aireplay-ng (mdk4 works fine). And iw reports its TX power as “3 dBm” — a cosmetic kernel bug, not actual transmit power. Every adapter has baggage.

But here’s what mattered: the patched D-Link performed. hcxdumptool ran clean. 5 GHz capture worked where the Alfa threw errors. Injection worked with the full aircrack-ng suite.

The D-Link wasn’t bad hardware. It was just being gaslit by its own driver.

Once return 42; stopped lying to mac80211, the adapter did exactly what it was supposed to do.


What I Actually Learned

  1. Read the label. A USB VID:PID can be reused across products, so a database lookup is not proof of what is inside the case. Physically checking the model number takes five seconds.

  2. “Sort of working” is a red flag. Partial functionality usually means the wrong driver, not a buggy one. An EFUSE that reads back garbage should trigger a hardware-identity check before any patching.

  3. Community review catches what you miss. Two GitHub reviewers identified my hardware in minutes. I’d stared at it for days.

  4. The real bug was waiting behind the fake one. If I’d given up after the misidentification, I’d never have found the return 42; that was actually worth fixing. Sometimes the bug isn’t a race condition or undefined behavior — sometimes it’s just a TODO that grew legs, got promoted, and quietly wrecked everything while smiling.

  5. Methodology survives premise failure. The debugging approach from Sessions 1-3 wasn’t wasted — it’s exactly how I found and validated the TX flow control fix once I had the right hardware. Being wrong isn’t the problem. Being wrong sloppily is.


The patches: Lucid-Duck/tx-resources-flow-control

The comparison data: Lucid-Duck/wifi-pentest-comparisons


The debugging was real. The fix was real. The first three days just happened to be practice.

];

$date =

;

$category =

;

$author =

;

$previous =

;

$next =

;