Truck Bee SSH Tunnel Diagnostic Report

Date: 2026-03-22
Subject: SSH Reverse Tunnel Relay Failure Analysis
Device: Hivemapper Bee Dashcam (Intel Keembay ARM64)
Serial: dashcam-4A928016A02C1046


1. Executive Summary

The SSH reverse tunnel from the Truck Bee to Lucy establishes successfully but fails to relay any data through the tunnel. The SSH banner is never received when connecting via the tunnel, despite the tunnel showing as connected. Local SSH on the Bee works perfectly. This is a data relay failure, not a connection establishment issue.

Primary Root Cause (suspected): The Bee's OpenSSH client does not relay TCP data through the -R reverse tunnel. The underlying cause is unconfirmed; the candidates are:

  1. Socket-activated sshd interaction with the relay (ruled out as the sole cause, see 3.2)
  2. Possible OpenSSH version incompatibility (embedded/minimal build)
  3. Kernel network stack quirks on the Intel Keembay platform

Workaround Available: HTTP agent API already deployed at /data/adacam/agent.py on port 8080 (tested locally, not yet verified through the tunnel).


2. Network Topology

2.1 Bee Network Interfaces

| Interface | IP Address | Role | State |
|-----------|------------|------|-------|
| wlp1s0f0 | 192.168.0.10/24 | WiFi AP (hostapd) | UP |
| wlp1s0f1 | 192.168.0.155/24 | WiFi Client (zerocool) | UNSTABLE |
| br0 | 192.168.197.55/28 | USB Bridge | DOWN |
| wwan0 | (none) | LTE Modem | DOWN (no SIM) |
| lo | 127.0.0.1/8 | Loopback | UP |

2.2 Home Network

| Device | IP | Role |
|--------|----|------|
| OPNsense | 192.168.0.1 | Router/DHCP |
| Lucy | 192.168.0.5 | Server (tunnel endpoint) |
| Bee AP | 192.168.0.10 | Factory AP subnet |
| Bee Client | 192.168.0.155 | DHCP from zerocool |

2.3 The Dual 192.168.0.0/24 Problem

Critical Issue: Both WiFi interfaces are on 192.168.0.0/24:

  • wlp1s0f0 (AP): 192.168.0.10/24 - factory default, used for phone config
  • wlp1s0f1 (client): 192.168.0.155/24 - DHCP from home router

This can create asymmetric routing, because the kernel holds two connected routes for the same prefix:

  • Outbound to Lucy: via wlp1s0f1 (correct)
  • Return from Lucy: arrives on wlp1s0f1, but the Bee's replies may leave via wlp1s0f0

Fix: Add host route before tunnel:

ip route add 192.168.0.5/32 dev wlp1s0f1
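
Because `ip route add` fails when the route already exists, and because it helps to confirm which interface the kernel actually selects for Lucy, the fix can be wrapped in a check. The following is a minimal sketch, not part of the deployed setup: the function names `route_dev` and `ensure_host_route` are hypothetical, and it assumes the Bee's `ip` supports `route get` and `route replace` (iproute2 does; a BusyBox build may differ).

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not from the Bee image.

# Print the interface named after "dev" in one line of `ip route get` output,
# e.g. "192.168.0.5 dev wlp1s0f1 src 192.168.0.155 uid 0" prints "wlp1s0f1".
route_dev() {
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Make sure traffic to host $1 leaves via interface $2 (needs root).
ensure_host_route() {
    host=$1
    want=$2
    current=$(route_dev "$(ip route get "$host" 2>/dev/null | head -n 1)")
    if [ "$current" != "$want" ]; then
        # "replace" succeeds whether or not a route for the host exists.
        ip route replace "$host/32" dev "$want"
    fi
}
```

On the Bee this might be invoked as `ensure_host_route 192.168.0.5 wlp1s0f1` just before the tunnel starts.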

3. SSH Configuration Analysis

3.1 Socket-Activated SSHD

The Bee uses socket-activated SSH via systemd, not a persistent daemon:

sshd.socket (from session log):

[Socket]
ListenStream=22
Accept=yes

sshd@.service:

[Service]
ExecStart=-/usr/sbin/sshd -i $SSHD_OPTS
StandardInput=socket
KillMode=process

Key Points:

  • Accept=yes = systemd accepts connections and spawns sshd per-connection
  • sshd -i = inetd mode (reads from stdin/socket, not network)
  • No persistent sshd process exists until a connection arrives
  • ListenStream=22 with no IP = port-only form; systemd listens on the IPv6 wildcard ([::]:22), which by default also accepts IPv4 connections on all interfaces

3.2 Why Socket Activation May Cause Relay Issues

In inetd mode, sshd expects:

  1. Socket already connected (passed via systemd)
  2. stdin/stdout wired to the socket
  3. No explicit network listen/accept

When the Bee's SSH client does -R 2222:localhost:22:

  1. Lucy binds 127.0.0.1:2222
  2. Connection arrives at Lucy:2222
  3. Bee's SSH client opens new connection to localhost:22
  4. sshd.socket spawns sshd@.service
  5. New sshd process gets the socket via StandardInput

The question: Does the relay work correctly with socket-activated sshd?

Testing showed: Even a standalone sshd on port 2223 (bypassing socket activation) still failed with banner timeout. This rules out socket activation as the sole cause.


4. Reverse Tunnel Analysis

4.1 Tunnel Configuration

Current bee-tunnel.service:

ssh -i /data/ssh/bee_tunnel_key \
    -R 2222:localhost:22 \
    -N -o StrictHostKeyChecking=no \
    root@192.168.0.5

Lucy's sshd_config:

AllowTcpForwarding no
Match Group root
    AllowTcpForwarding yes
GatewayPorts no

  • GatewayPorts no → tunnel binds to 127.0.0.1:2222 only (correct)
  • Root user has forwarding enabled (correct)

4.2 What We Tested

| Test | Tunnel Target / Method | Result |
|------|------------------------|--------|
| 1 | -R 2222:192.168.0.10:22 | Banner timeout |
| 2 | -R 2222:localhost:22 | Banner timeout |
| 3 | -R 2222:localhost:2223 (standalone sshd) | Banner timeout |
| 4 | Local ssh -p 22 root@localhost from Bee | WORKS |
| 5 | Raw TCP test Lucy→2222 | Zero bytes received |

4.3 The Smoking Gun

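Test 5 is the strongest evidence: a raw TCP connection to Lucy:2222 connects but receives zero bytes from the relay. An SSH server sends its version banner as soon as a connection opens, without waiting for client input, so a working relay would deliver at least that line. This indicates:

  • The tunnel establishes correctly (Lucy accepts connections on port 2222)
  • The Bee's SSH client appears to accept the relay request (the connection is not refused)
  • The relay does not forward ANY data

Hairpin routing is unlikely, because targeting localhost (tests 2 and 3) fails the same way. Socket activation is ruled out as the sole cause, because the standalone sshd (test 3) also fails. The fault lies somewhere in the relay path itself.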
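
To make the raw TCP observation repeatable, the first line read from the forwarded port can be classified automatically. This is a rough sketch under stated assumptions: the function names `classify_first_line` and `probe_port` are hypothetical, and it assumes `nc` and `timeout` are installed on Lucy. The `sleep` keeps stdin open because some `nc` variants exit as soon as stdin reaches EOF.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Classify the first line received from a forwarded port.
classify_first_line() {
    case "$1" in
        SSH-2.0-*|SSH-1.99-*) echo "ssh-banner" ;;  # relay delivered the sshd banner
        "")                   echo "no-data" ;;     # matches the failure seen in test 5
        *)                    echo "other-data" ;;  # something answered, but not sshd
    esac
}

# Connect to host $1 port $2, wait up to $3 seconds, classify what arrives.
probe_port() {
    first=$(sleep "$3" | timeout "$3" nc "$1" "$2" 2>/dev/null | head -n 1 | tr -d '\r')
    classify_first_line "$first"
}
```

On Lucy, `probe_port 127.0.0.1 2222 5` would be expected to print `no-data` while the relay is broken.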


5. WiFi Instability

5.1 Symptoms

  • wlp1s0f1 keeps dropping connection
  • Tunnel dies, requires manual restart
  • Static IP 192.168.0.155 assigned (valid_lft forever)
  • Signal drops intermittently

5.2 Cause

  • Truck parked far from house router
  • Wall/distance attenuation
  • Marvell mwifiex driver on embedded platform

5.3 Mitigations

  1. Move truck closer (Cobb's plan when Abby leaves)
  2. Add Restart=always to bee-tunnel.service
  3. Consider WiFi repeater or mesh node in garage
  4. LTE failover when SIM is inserted

6. Potential Root Causes

6.1 OpenSSH Version/Build Issues

Evidence:

  • Bee runs minimal embedded Yocto distro (meta-intel-ese)
  • OpenSSH version unknown but likely stripped/minimal build
  • Embedded builds often disable features to save space

Hypothesis: The Bee's OpenSSH may have a bug or missing feature in reverse tunnel relay code.

6.2 Kernel Network Stack

Platform:

  • Intel Keembay (ARM64)
  • Kernel 5.10.32-intel-standard
  • Custom patches for VPU/AI accelerators

Hypothesis: Intel's kernel modifications may affect TCP socket handling, especially for relayed connections.

6.3 TCP Memory/Buffer Corruption

Hypothesis: Under load (map-ai, depthai_gate, odc-api all running), the system may have memory pressure affecting socket buffers.

Counter-evidence: Same failure even with services stopped (CPU at 16-20%).

6.4 MTU/Fragmentation

Hypothesis: WiFi MTU mismatches could cause packet fragmentation that breaks the relay.

Not yet tested.
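
One way to test the MTU hypothesis is to ping Lucy with fragmentation prohibited and a payload that exactly fills the suspected MTU. The sketch below is illustrative only: the function names are hypothetical, and `-M do` is an iputils `ping` option that a BusyBox `ping` on the Bee may not provide.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# ICMP echo payload that exactly fills an IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send 3 pings to host $1 sized for MTU $2 with the "don't fragment" bit set.
# Failures at a given size suggest the path MTU is smaller than $2.
probe_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For example, `probe_mtu 192.168.0.5 1500` sends 1472-byte payloads; if that fails while a smaller MTU value succeeds, fragmentation becomes a more plausible suspect.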


7. Fix Proposals (Ranked)

Rank 1: HTTP Agent API (READY NOW)

Status: Already deployed and locally tested
Path: /data/adacam/agent.py
Port: 8080

# On Bee:
python3 /data/adacam/agent.py &
ip route add 192.168.0.5/32 dev wlp1s0f1
ssh -i /data/ssh/bee_tunnel_key -R 2222:localhost:8080 -N root@192.168.0.5 &

# Test from Lucy:
curl -H 'X-Agent-Key: bee-agent-sulkta-2026' http://127.0.0.1:2222/status

Pros:

  • HTTP relay should work (simpler protocol)
  • Already built with /shell, /landmarks endpoints
  • Doesn't depend on SSH banner handshake

Cons:

  • Need to verify HTTP relay works through broken tunnel

Rank 2: Alternative Tunnel Tools

Options:

  • chisel - HTTP-based tunnel, battle-tested
  • bore - Simple TCP relay, Rust-based
  • rathole - High-performance Rust tunnel

# Example with chisel:
# Lucy:
chisel server --port 8080 --reverse

# Bee:
chisel client 192.168.0.5:8080 R:2222:localhost:22

Pros:

  • Avoids OpenSSH relay entirely
  • HTTP-based tunnels more firewall-friendly

Cons:

  • Need to cross-compile for ARM64 or find pre-built binary
  • /opt is read-only, must use /data/

Rank 3: Debug OpenSSH Relay

Steps:

  1. Get OpenSSH version: ssh -V
  2. Strace the relay: strace -f ssh -R 2222:localhost:22 ...
  3. tcpdump the relay traffic
  4. Check /var/log/ for sshd errors
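
When the tunnel is started with `ssh -vvv ... 2> logfile`, the client's debug output can narrow down where the relay stops. The helper below is a hedged sketch: `summarise_ssh_log` is a hypothetical name, and the message fragments it searches for are those printed by common OpenSSH clients; their exact wording may differ in the Bee's build, which is still unidentified.

```shell
#!/bin/sh
# Sketch only: message wording is assumed to match mainstream OpenSSH.

# Summarise an `ssh -vvv` client log (path in $1) captured on the Bee.
summarise_ssh_log() {
    log=$1
    if grep -q 'remote port forwarding failed' "$log"; then
        echo "listener-refused"       # Lucy could not bind the port (stale listener?)
    elif grep -q 'connect_to .* failed' "$log"; then
        echo "target-connect-failed"  # Bee could not reach the forward target
    elif grep -q 'client_request_forwarded_tcpip' "$log"; then
        echo "channel-opened"         # a forwarded connection reached the Bee
    elif grep -q 'remote forward success' "$log"; then
        echo "forward-idle"           # forward accepted, no connection seen yet
    else
        echo "unknown"
    fi
}
```

A result of `forward-idle` while a client is connected to Lucy:2222 would suggest the connection never reaches the Bee, whereas `channel-opened` with no banner points at the Bee side of the relay.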

Pros:

  • Fixes root cause

Cons:

  • Time-consuming, may be unfixable without rebuilding OpenSSH

Rank 4: LTE Failover

When SIM inserted:

  • wwan0 gets public IP (or CGNAT IP)
  • Can tunnel over LTE instead of WiFi
  • More stable than WiFi at distance

Cons:

  • Requires SIM card
  • Metered connection
  • CGNAT may complicate things

Rank 5: Physical WiFi Fix

Options:

  • Move truck to Abby's spot (closer to house)
  • Install WiFi repeater in garage
  • Run ethernet to driveway (impractical)

8. Step-by-Step Fix Procedures

8.1 Test HTTP Agent via Tunnel

Prerequisites:

  • Bee WiFi connected (wlp1s0f1 up)
  • Phone SSH session to Bee available as fallback

Steps:

# 1. On Lucy (via OpenClaw): kill old listeners BEFORE starting the tunnel.
#    Running this after step 4 would kill the new tunnel's listener.
fuser -k 2222/tcp 2>/dev/null

# 2. On Bee (via phone SSH):
cd /data/adacam
python3 agent.py &
echo "Agent started on :8080"

# 3. Add host route:
ip route add 192.168.0.5/32 dev wlp1s0f1

# 4. Start HTTP tunnel:
ssh -i /data/ssh/bee_tunnel_key \
    -R 2222:localhost:8080 \
    -N -o StrictHostKeyChecking=no \
    root@192.168.0.5 &
echo "Tunnel PID: $!"

# 5. On Lucy (via OpenClaw): test the API
curl -s -H 'X-Agent-Key: bee-agent-sulkta-2026' http://127.0.0.1:2222/status

Expected Result:

{"ok": true, "time": 1774201402.7530055}
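
The check against the expected result can be scripted so the test yields a clear pass or fail. This is a small sketch with hypothetical function names (`agent_ok`, `check_agent`); it assumes only that a healthy agent returns a body containing `"ok": true`, as in the expected result above.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Succeed if the response body in $1 reports "ok": true.
agent_ok() {
    printf '%s' "$1" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Fetch URL $1 with agent key $2 and succeed only on a healthy response.
check_agent() {
    body=$(curl -s --max-time 10 -H "X-Agent-Key: $2" "$1") || return 1
    agent_ok "$body"
}
```

On Lucy, `check_agent http://127.0.0.1:2222/status bee-agent-sulkta-2026 && echo PASS || echo FAIL` would report whether the HTTP relay works.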

8.2 Deploy chisel (if HTTP relay also fails)

# 1. On Lucy - download and start server:
# Save the archive under its own name, then unpack it to the install path.
wget -O chisel_1.9.1_linux_amd64.gz https://github.com/jpillora/chisel/releases/download/v1.9.1/chisel_1.9.1_linux_amd64.gz
gunzip -c chisel_1.9.1_linux_amd64.gz > /usr/local/bin/chisel
chmod +x /usr/local/bin/chisel
chisel server --port 8080 --reverse --auth "sulkta:bee2026"

# 2. On Bee - download ARM64 binary:
cd /data/adacam
wget -O chisel_1.9.1_linux_arm64.gz https://github.com/jpillora/chisel/releases/download/v1.9.1/chisel_1.9.1_linux_arm64.gz
gunzip -c chisel_1.9.1_linux_arm64.gz > chisel
chmod +x chisel

# 3. Start chisel client:
./chisel client --auth "sulkta:bee2026" 192.168.0.5:8080 R:2222:localhost:22 &

9. Verification Tests

Test 1: Verify Agent API Locally

# On Bee:
curl -s -H 'X-Agent-Key: bee-agent-sulkta-2026' http://localhost:8080/status
# Expected: {"ok": true, ...}

Test 2: Verify Tunnel Establishment

# On Lucy:
ss -tlnp | grep 2222
# Expected: LISTEN 0 128 127.0.0.1:2222 *:*

# On Bee:
ps aux | grep ssh
# Expected: ssh -R 2222:... process running

Test 3: Verify Data Flow

# On Lucy:
curl -v -H 'X-Agent-Key: bee-agent-sulkta-2026' http://127.0.0.1:2222/status
# Watch for:
# - Connection established
# - Data received (HTTP response)
# - If timeout: relay still broken

Test 4: Raw TCP Test

# On Lucy:
timeout 5 nc 127.0.0.1 2222
# Type some garbage, see if anything comes back
# Or use:
echo "test" | timeout 5 nc 127.0.0.1 2222

Test 5: SSH Debug (if SSH relay ever works)

ssh -vvv -p 2222 root@127.0.0.1
# Watch for:
# - SSH-2.0-OpenSSH_X.X banner
# - Key exchange
# - Authentication

10. Summary & Recommendations

Immediate Action

  1. Move truck closer when Abby leaves (fixes WiFi stability)
  2. Test HTTP agent via tunnel - may work even if SSH relay doesn't
  3. If HTTP works: done, use agent API for all Bee operations

Fallback Plan

  1. If HTTP also fails through tunnel: deploy chisel
  2. chisel uses a different relay mechanism and should bypass the suspected OpenSSH bug

Long-Term

  1. Update bee-tunnel.service with Restart=always and route pre-command
  2. Consider persistent HTTP tunnel instead of SSH relay
  3. When SIM inserted: configure LTE failover tunnel
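
The first item could look roughly like the unit below, printed by a shell function so it can be reviewed before installation. This is a sketch under several assumptions that are not confirmed by the report: the binary paths `/sbin/ip` and `/usr/bin/ssh`, the keepalive values, the restart delay, and the function name `render_bee_tunnel_unit` are all illustrative. `ExitOnForwardFailure=yes` makes ssh exit when Lucy cannot bind port 2222, so that `Restart=always` retries instead of leaving a tunnel with no listener.

```shell
#!/bin/sh
# Sketch only: paths and timing values are assumptions, adjust for the Bee image.

# Print a candidate bee-tunnel.service to stdout.
render_bee_tunnel_unit() {
    cat <<'EOF'
[Unit]
Description=Reverse tunnel from Bee to Lucy (sketch)
After=network-online.target
Wants=network-online.target

[Service]
# Leading "-" lets the service start even if the route command fails.
ExecStartPre=-/sbin/ip route replace 192.168.0.5/32 dev wlp1s0f1
ExecStart=/usr/bin/ssh -i /data/ssh/bee_tunnel_key \
    -R 2222:localhost:22 -N \
    -o StrictHostKeyChecking=no \
    -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=15 -o ServerAliveCountMax=3 \
    root@192.168.0.5
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}
```

The output could be redirected to a unit file in a writable systemd directory (location on the Bee not verified), followed by `systemctl daemon-reload` and `systemctl enable --now bee-tunnel.service`.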

Data Recovery

  • Detection files are in /data/recording/landmarks/ (1.7MB)
  • Use HTTP agent /landmarks endpoint to retrieve
  • Or direct copy via phone → home WiFi → Lucy

Appendix A: Key Files on Bee

| Path | Purpose |
|------|---------|
| /data/adacam/agent.py | HTTP agent API |
| /data/adacam/config.json | Agent config |
| /data/ssh/bee_tunnel_key | SSH key for tunnel |
| /data/recording/landmarks/ | Detection files |
| /data/recording/pics/ | Frame images |
| /data/odc-api.db | SQLite (no detections!) |

Appendix B: Key Services on Bee

| Service | Status | Notes |
|---------|--------|-------|
| map-ai | enabled | Runs AI inference |
| depthai_gate | enabled | Camera interface |
| odc-api | enabled | Node.js REST API (can be killed) |
| redis | enabled | Required by map-ai |
| hostapd | enabled | WiFi AP |
| bee-tunnel | N/A | Needs to be created |

Appendix C: OpenClaw Commands

# Connect to Bee via Lucy jump (when routing is clean):
ssh -J root@192.168.0.5 root@192.168.0.10

# Or via reverse tunnel (if it worked):
ssh -p 2222 root@127.0.0.1