scan binary

Detects cryptographic assets in compiled binaries, firmware images, and container archives without requiring source code. Use it to audit third-party binaries, embedded firmware, shipped artifacts, or anything else for which source is unavailable.

Usage

qtz-discovery-cli scan binary <path> [flags]

# Scan a single firmware image
qtz-discovery-cli scan binary ./firmware.bin

# Scan a Java archive and emit JSON findings
qtz-discovery-cli scan binary ./app.jar --format json

# Scan a directory of binaries, skip large debug files
qtz-discovery-cli scan binary /usr/lib --max-file-size 32 --exclude "*.debug"

# Scan with AI-driven deep analysis
qtz-discovery-cli scan binary ./firmware.bin --llm --llm-quality deep

# Dry-run AI analysis — estimate scope without spending
qtz-discovery-cli scan binary ./firmware.bin --llm --dry-run

# Write CycloneDX CBOM to file
qtz-discovery-cli scan binary ./release/ --format cyclonedx --output cbom.json

Supported Formats

The scanner auto-detects format by magic bytes and transparently decompresses nested archives.

Format	Notes
ELF (Linux, Android, embedded)	Symbol table, dynamic imports, DT_NEEDED chains
PE / COFF (Windows, UEFI)	Import table, wide strings (UTF-16LE), .NET CLR metadata
Mach-O / Fat Mach-O (macOS, iOS)	Dylib load commands, universal binary slices
JAR / WAR / AAR / APK	ZIP-expanded; each entry scanned recursively
TAR, tar.gz, tar.xz, tar.bz2	Auto-decompressed; entries scanned
DEB packages	data.tar.* extracted and scanned
RPM packages	CPIO payload extracted and scanned
Android Boot / Sparse Images	Kernel and ramdisk extracted
CPIO archives	Entries expanded and scanned
Broadcom TRX firmware	Kernel and rootfs partitions extracted
U-Boot uImage	Payload extracted
Apple XAR (.pkg)	Entries expanded and scanned
Windows Cabinet (.cab)	MSZIP/uncompressed extraction
gzip, bzip2, xz, zstd, LZ4	Transparently decompressed; stacked compression supported
Java .class	Constant pool parsed for class/method names
Dalvik DEX (Android)	String pool analyzed
WASM	Structured name sections parsed
BEAM (Erlang/Elixir)	Atom table extracted
Python .pyc	Marshal stream analyzed
Lua bytecode	Prototype strings analyzed
Intel HEX (.hex, .ihex)	Decoded to raw bytes before analysis
Motorola SREC	Decoded to raw bytes before analysis
UPX-packed binaries	Detected; dynamic unpacking attempted only for top-level files (archive depth 0)
Raw / unknown firmware	Pattern matching against raw bytes

Archives are unpacked recursively up to 8 layers deep (zip-bomb protection). Files exceeding `--max-file-size` are skipped with an informational advisory.
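The magic-byte detection and the depth limit on stacked compression can be illustrated with a minimal Python sketch. This is not the scanner's implementation: the function names `sniff_format` and `peel_compression` are hypothetical, only a handful of formats are covered, and only the compressors available in the Python standard library are handled. The magic-byte values themselves are the publicly documented ones for each format.

```python
import bz2
import gzip
import lzma

MAX_DEPTH = 8  # mirrors the documented 8-layer limit

# (magic prefix, label, decompressor) for stream compressors in the stdlib.
COMPRESSION_MAGIC = [
    (b"\x1f\x8b", "gzip", gzip.decompress),
    (b"BZh", "bzip2", bz2.decompress),
    (b"\xfd7zXZ\x00", "xz", lzma.decompress),
]

# (magic prefix, label) for a few executable and container formats.
# 0xCAFEBABE is shared by Java .class files and Fat Mach-O binaries,
# so a real detector needs a second look at the header to tell them apart.
CONTAINER_MAGIC = [
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),
    (b"PK\x03\x04", "zip"),
    (b"\x00asm", "wasm"),
    (b"\xca\xfe\xba\xbe", "java-class-or-fat-macho"),
]


def sniff_format(data: bytes) -> str:
    """Return a format label based on leading magic bytes, or 'raw'."""
    for magic, label, _ in COMPRESSION_MAGIC:
        if data.startswith(magic):
            return label
    for magic, label in CONTAINER_MAGIC:
        if data.startswith(magic):
            return label
    return "raw"


def peel_compression(data: bytes, max_depth: int = MAX_DEPTH):
    """Strip stacked compression layers, refusing to go past max_depth."""
    layers = []
    while True:
        for magic, label, decompress in COMPRESSION_MAGIC:
            if data.startswith(magic):
                break
        else:
            # No compression magic matched: this is the innermost payload.
            return data, layers
        if len(layers) >= max_depth:
            raise ValueError(f"more than {max_depth} nested layers")
        data = decompress(data)
        layers.append(label)
```

For example, an ELF payload wrapped in bzip2 and then gzip is peeled in two steps, after which `sniff_format` reports `elf`. A production scanner would also cap the decompressed size, since a depth limit alone does not bound memory use.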

Detection Passes

Three static analysis passes run in parallel on each binary:

Pass	What it finds	Confidence
STATIC — strings & constants	Extracts printable strings and matches them against 149+ crypto patterns (library version strings, algorithm names, import paths, PEM headers). Also matches byte constants against the known AES S-box, SHA IVs, DES S-box, and EC curve parameters.	high (medium for string matches in bytecode formats)
STRUCT — symbol tables	Parses ELF symbol/dynamic tables, PE import tables, Mach-O LC_LOAD_DYLIB commands, .NET CLR metadata, and JVM constant pools. Direct function references produce the highest-confidence findings.	confirmed
DEPS — library chains	Traces PQC library dependency chains from ELF DT_NEEDED, Mach-O dylib loads, .NET imports, and JVM constant pools. Detects liboqs, ML-KEM, ML-DSA, and other post-quantum libraries.	confirmed
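As a rough illustration of what the STATIC pass describes, the sketch below extracts printable ASCII runs and checks them against a few regular expressions, then searches the raw bytes for the first row of the AES S-box. The three patterns and the names `printable_strings` and `scan_static` are illustrative assumptions, not the tool's actual pattern set or API; the S-box bytes are the standard published values.

```python
import re

# First 16 bytes of the standard AES S-box.
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")

# A tiny, illustrative pattern set (the real scanner documents 149+).
STRING_PATTERNS = {
    "openssl-version": re.compile(r"OpenSSL \d+\.\d+\.\d+"),
    "pem-header": re.compile(r"-----BEGIN [A-Z0-9 ]+-----"),
    "pqc-library": re.compile(r"liboqs"),
}


def printable_strings(data: bytes, min_len: int = 6):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def scan_static(data: bytes, min_len: int = 6):
    """Return (pass, label, confidence) tuples for string and constant hits."""
    findings = []
    for text in printable_strings(data, min_len):
        for label, pattern in STRING_PATTERNS.items():
            if pattern.search(text):
                findings.append(("STATIC", label, "high"))
    # Byte-constant check: look for the S-box prefix anywhere in the image.
    if AES_SBOX_PREFIX in data:
        findings.append(("STATIC", "aes-sbox", "high"))
    return findings
```

The default `min_len` of 6 matches the documented default of `--min-string-len`; raising it reduces noise from short accidental ASCII runs at the cost of missing short identifiers.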

Flags

Analysis control

Flag	Default	Description
`--strings`	true	Enable string-extraction pass
`--symbols`	true	Enable symbol-table pass (ELF / PE / Mach-O)
`--min-string-len`	6	Minimum printable-string run length for extraction
`--max-file-size`	128	Skip files larger than this many MiB
`--exclude`	—	Glob patterns to skip (e.g. `*.debug,vendor/*`)
`--dynamic-timeout`	`10s`	Per-binary wall-clock timeout for dynamic unpacking (UPX, encrypted blobs)

AI analysis (requires portal connection)

Flag	Default	Description
`--llm`	false	Enable AI-driven semantic analysis (requires `--server`)
`--llm-quality`	`auto`	Analysis depth: `auto`, `fast`, `deep`, or `chain`
`--scan-budget`	—	Max USD for AI analysis (e.g. `2.50`; 0 = unlimited)
`--dry-run`	false	Estimate AI analysis scope without executing (requires `--llm`)
`--llm-max-strings`	150	Max crypto-relevant strings sent to AI (0 = unlimited)
`--llm-max-symbols`	500	Max symbol/import names sent to AI (0 = unlimited)
`--llm-max-api-calls`	100	Max emulation API log entries sent to AI (0 = unlimited)

Finding Confidence Levels

Confidence	Source	What it means
`confirmed`	STRUCT and DEPS passes	Direct function reference or dylib load; the binary references this algorithm or library directly
`high`	STATIC pass	Strong string match or known byte-constant pattern — very likely present
`medium`	STATIC pass (bytecode)	String match in interpreted bytecode (DEX, .class, BEAM)
`low`	Fallback	Weak signal; review manually
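The four levels form an ordered scale, which makes it straightforward to triage findings by a minimum confidence. The sketch below assumes each finding is a dictionary with a `confidence` key; that shape is a hypothetical stand-in, since the JSON output schema is not described here, and `filter_by_confidence` is an illustrative helper rather than part of the CLI.

```python
# Ordered from weakest to strongest, following the table above.
CONFIDENCE_ORDER = ["low", "medium", "high", "confirmed"]


def filter_by_confidence(findings, minimum="high"):
    """Keep findings whose confidence is at or above the given minimum.

    `findings` is assumed to be a list of dicts with a 'confidence' key
    (hypothetical shape, not a documented schema).
    """
    floor = CONFIDENCE_ORDER.index(minimum)
    return [
        f for f in findings
        if CONFIDENCE_ORDER.index(f["confidence"]) >= floor
    ]


sample = [
    {"algorithm": "AES", "confidence": "confirmed"},
    {"algorithm": "SHA-256", "confidence": "high"},
    {"algorithm": "RSA", "confidence": "medium"},
    {"algorithm": "DES", "confidence": "low"},
]
strong = filter_by_confidence(sample, minimum="high")
```

Here `strong` retains only the AES and SHA-256 entries; the `medium` and `low` entries are the ones the table suggests reviewing manually.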

Exit Codes

Code	Meaning
`0`	Success (findings may or may not be present)
`1`	Error (I/O failure, unreadable path, etc.)
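Because exit code `0` is returned whether or not findings are present, a pipeline that should fail on findings has to inspect the scan output itself rather than the exit code. A minimal sketch of that logic follows; it uses only the flags documented on this page, and the helper names and the idea of counting "blocking" findings are assumptions made for illustration.

```python
import subprocess


def build_scan_command(target, output, fmt="json"):
    """Assemble the argv for a scan using only documented flags."""
    return [
        "qtz-discovery-cli", "scan", "binary", target,
        "--format", fmt,
        "--output", output,
    ]


def run_scan(target, output):
    """Run the scan and return its exit code (requires the CLI on PATH)."""
    return subprocess.run(build_scan_command(target, output), check=False).returncode


def ci_status(returncode, blocking_findings):
    """Map the exit code plus a caller-supplied finding count to a CI verdict."""
    if returncode != 0:
        return "error"  # exit code 1: I/O failure, unreadable path, etc.
    return "fail" if blocking_findings > 0 else "pass"
```

How `blocking_findings` is counted depends on the output format and on local policy, for example the number of findings at or above a chosen confidence level.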

CI/CD Example

- name: Crypto scan — release binary
  run: |
    qtz-discovery-cli scan binary ./dist/my-service-linux-amd64 \
      --format sarif \
      --output binary-results.sarif

- name: Upload SARIF to GitHub Security
  if: always()
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: binary-results.sarif
    category: qtz-binary

Firmware Example

# Scan a compressed firmware image — archives are decompressed automatically
qtz-discovery-cli scan binary router-fw-v2.4.1.tar.gz --format json --output fw-findings.json

# View a summary
qtz-discovery-cli report summary fw-findings.json