In a seminar today, we will unveil Rendezvous, a search engine for code. Built by Wei-Ming Khoo, it will analyse an unknown binary, parse it into functions, index them, and compare them with a library of code harvested from open-source projects.
As time goes on, the programs we need to reverse engineer get ever larger, so we need better tools. Yet most code nowadays is not written from scratch, but cut and pasted. Programmers are not an order of magnitude more efficient than a generation ago; it’s just that we have more and better libraries to draw on nowadays, and a growing shared heritage of open software. So our idea is to reframe the decompilation problem as a search problem, and harness search-engine technology to the task.
As with a text search engine, Rendezvous uses a number of different techniques to index a target binary, some of which are described in this paper, along with the main engineering problems. As well as reverse engineering suspicious binaries, code search engines could be used for many other purposes such as monitoring GPL compliance, plagiarism detection, and quality control. On the dark side, code search can be used to find new instances of disclosed vulnerabilities. Every responsible software vendor or security auditor should build one. If you’re curious, here is the demo.