workerthread 7 years ago

It seems to me that the input data for the study is "just" the final report from the hackers to the customer. The academic researchers (who are presumably nowhere near the hackers' level of expertise) then annotate and categorize the conceptual tasks behind each word and sentence in the report.

It seems to me that this introduces a lot of bias into the input data, depending on the annotators' knowledge. They try to account for this by using multiple (n=7) annotators, but I doubt that is enough.

Two questions come to my mind:

1) What level of detail do the final reports contain? I have procured and read a few pen testing reports myself, and the level of technical detail seemed too low to infer the hour-by-hour activities of the hackers in any meaningful way. It would be nice if the paper explained what those reports actually contained.

2) I wonder what it would take to get the hackers themselves to keep a diary/journal of their hour-by-hour activities. That would remove a lot of noise from the input data.

bitexploder 7 years ago

Their technique (open coding) is pretty interesting.

There is a level of abstraction below that (more real) which is greatly helpful in breaking DRM and finding other weird bits of code: running the code in an instrumented qemu, for example. For me it is much more accessible and productive than SMT solver tooling.
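To make that concrete, here is a minimal sketch of what "instrumented qemu" can look like: a TCG plugin that logs every executed instruction. This uses QEMU's stable plugin API (qemu-plugin.h); the plugin and binary names are just placeholders. Build it as a shared object and load it with something like `qemu-x86_64 -plugin ./libtrace.so ./target_binary`.

    /* Minimal instruction-trace plugin for QEMU's TCG plugin API.
     * A sketch, not production code. */
    #include <stdio.h>
    #include <qemu-plugin.h>

    QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

    /* Runs once per executed instruction; udata holds the
     * disassembly string attached at translation time. */
    static void insn_exec(unsigned int vcpu_index, void *udata)
    {
        fprintf(stderr, "%s\n", (const char *)udata);
    }

    /* Runs whenever QEMU translates a new basic block:
     * register an exec callback on every instruction in it. */
    static void tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        size_t n = qemu_plugin_tb_n_insns(tb);
        for (size_t i = 0; i < n; i++) {
            struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
            /* qemu_plugin_insn_disas() allocates; leaked here for brevity. */
            char *disas = qemu_plugin_insn_disas(insn);
            qemu_plugin_register_vcpu_insn_exec_cb(insn, insn_exec,
                                                   QEMU_PLUGIN_CB_NO_REGS,
                                                   disas);
        }
    }

    QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                               const qemu_info_t *info,
                                               int argc, char **argv)
    {
        qemu_plugin_register_vcpu_tb_trans_cb(id, tb_trans);
        return 0;
    }

From a trace like that you can diff executions, spot decryption loops, and find where a protection scheme makes its pass/fail decision, all without modeling anything symbolically.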

Generally their attack tree is very nice, but they sweep a lot under the rug in their dynamic analysis section. This is a short paper, though, so no fault there. Their tree is a concise outline for anyone looking to methodically RE any black-box thing, not just DRM and protection schemes.