Posted on 2019-02-28 by Steve Esling
As we’ve discussed in our previous two Ex Machina articles, one of the goals in our machine learning efforts is to use artificial intuition as an aid in the construction of our signatures. This was previously illustrated by examining the importance of features in GB and RF algorithms, both of which are supervised techniques. However, both also require a training data set, which must be collected, annotated, and put together by humans; they’re good for finding patterns related to good or bad macros as a whole, but don’t get much more fine-grained than that. Obviously, this is going to lead to very general signatures. Even with very large training sets, the smaller details behind why an ML process yielded a positive or negative result are lost within simpler heuristics. When it comes to designing more specific signatures tailored to locating different kinds of malware, these heuristics are not enough.