- Building an LLM agent that drives Ghidra/IDA/radare2 to answer reverse-engineering questions autonomously
- Task decomposition and tool-use strategies for multi-step binary analysis (unpacking, deobfuscation, semantic recovery)
- Benchmarking agent performance against human reverse engineers on curated CTF-style targets
- Failure-mode analysis: where do agents loop, hallucinate, or give up?
Please contact Sebastian Schrittwieser.