Plyara: Parsing YARA rules with Python

Posted on 2018-07-06 by Ryan Shipp

Plyara is a Python lexer and parser for YARA rules. You can use it to build your own tools around YARA rules: analyzing or performing bulk operations on a large corpus, parsing rule content for display, writing a linter, or anything else you might think of.

InQuest has worked with several other contributors to put together a new, community-maintained version of plyara with tons of bug fixes and improvements.

History and Contributors

Plyara was created in 2014 by Christian Buia, who continued to maintain and improve the tool for a few years. As the project gained traction, several other people began contributing. In 2017, three different developers (rholloway, utkonos, and Adam Trask) made significant changes to the source in their respective forks. Plyara was also published to PyPI by another developer, msm.

As we at InQuest were building ThreatKB, our YARA and C2 knowledge base workflow tool (still in a pre-release state), we came across plyara and modified it slightly to fit our needs, just as others had before us. While testing against our rulesets we found a few breaking bugs in the parsing logic, and while looking for fixes we stumbled upon the newer forks, where some of the issues had already been resolved. Rather than fork the project yet again and track down and merge all the changes alone, we asked the existing contributors for help and set up an official plyara GitHub organization where everyone could work together to improve this awesome project.

The new plyara GitHub repository includes the changes from the three forks mentioned above, updated tests and documentation, and a few extra bug fixes from InQuest developers. Check it out, and feel free to open an issue or submit a pull request if you have any problems!

Installation

You will need Python (2.7 or 3.x) and pip installed on your computer. Then install plyara from PyPI:

pip install plyara

Usage

There are two basic ways to use plyara: import the library directly for use in Python projects, or use the provided command-line tool to parse and convert rules to JSON. Some usage examples for each are provided below. Check out the full plyara documentation for more information.

Library

You can use plyara to parse rules from a string or file into a dictionary representation.

>>> import plyara
>>> parser = plyara.Plyara()
>>> mylist = parser.parse_string('rule MyRule { strings: $a="1" \n condition: false }')
>>>
>>> import pprint
>>> pprint.pprint(mylist)
[{'condition_terms': ['false'],
  'raw_condition': 'condition: false',
  'raw_strings': 'strings: $a="1" \n',
  'rule_name': 'MyRule',
  'start_line': 1,
  'stop_line': 2,
  'strings': [{'name': '$a', 'value': '"1"'}]}]
>>>

Plyara extracts rule imports, includes, meta information, strings, conditions, comments, tags, and more, and provides easy programmatic access to everything. You can also use the included rebuild_yara_rule function to transform the parsed dictionary data structure back into a valid YARA rule.
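
Because the parser returns plain dictionaries, simple checks can be written without any further plyara calls. Here is a minimal sketch of a linter pass, assuming only the dictionary keys shown in the example output above. Note that lint_rule is a hypothetical helper, not part of plyara, and the sample data is copied from that output rather than produced by a live parse. The sketch also assumes that a rule without a meta section simply has no 'metadata' key, as appears to be the case in the output above.

```python
# Sample input: the list-of-dicts shape shown in the parse_string example.
# (Copied by hand from the post's output; not generated by calling plyara.)
parsed_rules = [
    {
        "condition_terms": ["false"],
        "raw_condition": "condition: false",
        "raw_strings": 'strings: $a="1" \n',
        "rule_name": "MyRule",
        "start_line": 1,
        "stop_line": 2,
        "strings": [{"name": "$a", "value": '"1"'}],
    }
]


def lint_rule(rule):
    """Hypothetical helper: return warning messages for one parsed rule."""
    warnings = []

    # Assumption: 'metadata' is absent (or empty) when the rule has no meta section.
    if not rule.get("metadata"):
        warnings.append("no metadata")

    # Flag strings whose name never appears among the condition terms.
    # Simplified on purpose: ignores "them", wildcards such as $a*, and
    # count/offset references such as #a or @a.
    terms = set(rule.get("condition_terms", []))
    for string in rule.get("strings", []):
        if string["name"] not in terms:
            warnings.append("unused string " + string["name"])

    return warnings


for rule in parsed_rules:
    for message in lint_rule(rule):
        print("{}: {}".format(rule["rule_name"], message))
```

For the sample rule this reports both a missing meta section and the unused string $a. A real linter would need to handle the condition syntax more carefully, but the point is that the parsed structure makes this kind of check a few lines of ordinary Python.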

Command Line

The command-line tool prints valid JSON output when parsing rules.

Take this example rule from the YARA documentation:

rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        thread_level = 3
        in_the_wild = true
    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
    condition:
        $a or $b or $c
}

If you run it through plyara, it parses the rule and prints out the JSON representation for easy use in other tools:

$ plyara example.yar
[
    {
        "condition_terms": [
            "$a",
            "or",
            "$b",
            "or",
            "$c"
        ],
        "metadata": {
            "description": "This is just an example",
            "in_the_wild": "true",
            "thread_level": "3"
        },
        "raw_condition": "condition:\n        $a or $b or $c\n",
        "raw_meta": "meta:\n        description = \"This is just an example\"\n        thread_level = 3\n        in_the_wild = true\n    ",
        "raw_strings": "strings:\n        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}\n        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}\n        $c = \"UVODFRYSIHLNWPEJXQZAKCBGMT\"\n    ",
        "rule_name": "silent_banker",
        "start_line": 1,
        "stop_line": 13,
        "strings": [
            {
                "name": "$a",
                "value": "{6A 40 68 00 30 00 00 6A 14 8D 91}"
            },
            {
                "name": "$b",
                "value": "{8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}"
            },
            {
                "name": "$c",
                "value": "\"UVODFRYSIHLNWPEJXQZAKCBGMT\""
            }
        ],
        "tags": [
            "banker"
        ]
    }
]
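
Since the output is ordinary JSON, other tools can consume it with nothing more than a JSON parser. As a rough sketch, the snippet below loads an abbreviated copy of the output above with Python's standard json module and builds a tag-to-rule-name index. The index_by_tag function is a hypothetical helper written for this example, and the embedded string stands in for whatever the tool actually printed.

```python
import json
from collections import defaultdict

# Abbreviated copy of the JSON shown above (raw_* and strings fields omitted).
# In practice this text would come from the command-line tool's output.
cli_output = """
[
    {
        "condition_terms": ["$a", "or", "$b", "or", "$c"],
        "metadata": {
            "description": "This is just an example",
            "in_the_wild": "true",
            "thread_level": "3"
        },
        "rule_name": "silent_banker",
        "start_line": 1,
        "stop_line": 13,
        "tags": ["banker"]
    }
]
"""


def index_by_tag(rules):
    """Hypothetical helper: map each tag to the names of the rules carrying it."""
    index = defaultdict(list)
    for rule in rules:
        # Untagged rules are assumed to have no 'tags' key, so default to empty.
        for tag in rule.get("tags", []):
            index[tag].append(rule["rule_name"])
    return dict(index)


rules = json.loads(cli_output)
tag_index = index_by_tag(rules)
print(tag_index)
```

One detail worth noticing in the output above: the metadata values are all strings ("3", "true"), so a consumer that needs the numeric or boolean value has to convert it.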

Thanks

A huge thank you to Christian and all the other contributors, who have been super helpful in updating plyara!
