Omnibus: Automating OSINT Collection

Posted on 2018-08-16 by Adam Swanda

Open Source Intelligence (OSINT) is data collected from publicly available sources that is meant to be used in the context of intelligence. A great deal of data, combined with analysis by trained professionals, can be turned into actionable intelligence. This intelligence is used to enhance cyber security investigations, provide insight into adversary infrastructure and operators, give context to threat actor profiling, or understand a complex scenario.

When performing threat investigations, OSINT is a crucial resource, commonly used by analysts to enrich their data or gather new information on indicators found during their research. Manual collection of this information, however, can be a long, tedious, and costly process, especially if you need to perform the same collection tasks against dozens or hundreds of data points. On top of the information collection itself, analysts need a way to organize the gathered data so that it can be easily accessed, queried, and understood afterwards.

This is where InQuest Labs' new project Omnibus comes into play. Omnibus is our new open-source Python application that provides the means to collect OSINT information from dozens of public sources through built-in modules, store the collected data in a searchable manner, and automatically extract new artifacts found in the modules' results for further inspection.

An Omnibus is defined as a volume containing several novels or other items previously published separately, and that is exactly what Omnibus aims to be for Open Source Threat Intelligence collection, research, and artifact management. With an easy to use interface, users can add new artifacts to investigate, group the artifacts together into manageable sessions, and run a wide variety of modules to enrich the data from their OSINT sources.

We would be remiss not to mention that there are already many great projects in this same field, specifically SpiderFoot and DataSploit. As we follow in the footsteps of giants, we greatly appreciate the work they've done to bring OSINT analysis to the open-source community and to put a spotlight on the discipline, so that people like us can come along and build something new that suits our needs and hopefully helps your mission as well.

Omnibus is developed for Python 2.7, and the beta version is available for download today, fully open-source under the MIT license, on our GitHub page at https://github.com/InQuest/omnibus. It has been tested on Mac OS X, Ubuntu 16.04, and Ubuntu 18.04. For a detailed installation guide, please read the installation.rst file located in the docs directory of the repository. Further documentation is also available within this same directory.

We at InQuest Labs genuinely love community contributions so please tell us how we can improve this project. Please note that Omnibus is still in a beta stage and there may be some bugs we're still working out. Feel free to create Issues; submit Pull Requests for new modules, features, or bug fixes. We're more than happy to hear from you! 

Overview

Before we dive into the details of Omnibus, let's start off with some basic terminology that will be used throughout this blog and within the application itself.

Vocabulary

  • Artifact: Any supported item created in Omnibus to be investigated or stored. An Artifact’s primary key is its name: inquest.net for an FQDN artifact, 8.8.8.8 for an IP artifact, etc.
    • Child Artifact: Artifact automatically created from an executed module’s collected data. Each child database record has a pointer to its parent and the module it originated from.
  • Module: Python plug-in that performs an arbitrary task against an Artifact such as data collection or enrichment.
  • Machine: Group of modules specific to an Artifact type, executed with a single command, that runs against a supplied Artifact. Meant to quickly gather all available information on an artifact.
  • Session: Temporary cache of artifacts created after starting the Omnibus CLI. Organizes multiple Artifacts related to a single case or investigation. Each Artifact in a session is given an ID to quickly retrieve the Artifact value from the cache. Commands and modules can be executed against an Artifact by providing its name or corresponding Session ID.

Technology Stack

Omnibus was built using several supporting systems, though the entire code base is pure Python. 

Internally, Omnibus creates the Artifact data model as a Python Class which maps directly to how the data is stored within MongoDB. An example of the schema, detailing the required fields and the dnsresolve and nmap modules is available on our GitHub page for review and suggestions. 
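To make the idea of a class that maps onto a MongoDB document concrete, here is a minimal sketch. It is not the actual Omnibus schema: the class layout and exact field names are assumptions based on the fields this post mentions (name, type, subtype, source, creation time, module results under a data key, and parent/children links).

```python
import time


class Artifact(object):
    """Hypothetical artifact model; field names are illustrative assumptions."""

    def __init__(self, name, type_, subtype=None, source=None, parent=None):
        self.name = name          # primary key, e.g. "inquest.net"
        self.type = type_         # e.g. "host", "email", "hash"
        self.subtype = subtype    # e.g. "fqdn" or "ipv4", when applicable
        self.source = source      # where the artifact came from
        self.parent = parent      # name of the parent artifact, if any
        self.children = []        # names of automatically created children
        self.data = {}            # module name -> that module's results
        self.created = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    def to_document(self):
        """Return a plain dict in the shape a MongoDB document would take."""
        return {
            "name": self.name,
            "type": self.type,
            "subtype": self.subtype,
            "source": self.source,
            "parent": self.parent,
            "children": list(self.children),
            "data": dict(self.data),
            "created": self.created,
        }
```

Because every attribute is a plain string, list, or dict, the output of to_document() could be handed to a MongoDB driver without further conversion.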

Tech Overview:

  • MongoDB: Stores artifacts and enriched artifact data. The database contains the following collections:
    • email
    • hash
    • host
    • user
    • btc
  • Redis: Used to create the Omnibus session cache. Each time omnibus-cli.py exits, the Redis cache is wiped but the artifacts remain stored in MongoDB. (1)
  • Python: The entire codebase is developed in Python, including the modules. We ❤️ Python at InQuest Labs. (2)

(1) The next version of Omnibus will support "cases" which will essentially act like persistent sessions so you can continue your work after exiting the application. Cases will also offer the means to take notes, add references, and more easily view artifact relationships.

(2) This project has been a pet of mine for a long time and originally began as a folder of disparate scripts. Those individual scripts eventually came together into Omnibus - hence the name. Because of some older code and concepts, Omnibus is still using Python 2.7 as opposed to 3.x.  At some point in the future it will either be ported to Python 3 or at the least provide cross-compatibility.

High-Level Interaction

Users start the interactive command line application and create a session, where they can then add the artifacts they wish to investigate. Collection and enrichment modules can be executed against one or more of these artifacts. The resulting data for each artifact, along with additional metadata, is stored in MongoDB for later access and searching. If a module identifies any valid artifact type in its discovered data, that newly discovered artifact will be stored in the database with a reference to the module and parent artifact it originated from.

Currently supported artifacts include:

  • IPv4 Addresses
  • Domains
  • File Hashes
  • Email Addresses
  • User names
  • Bitcoin Addresses

As Omnibus continues to grow, more artifact types will be supported such as URLs, subdomains, and more. 

Each artifact type has a number of modules that can be run to query external sources and enrich the original artifact. The help menu explains which artifacts can be used with which modules, but don't worry if you mix them up at first. Omnibus validates the artifact against the module before the module is executed, so only supported artifact types ever reach a given module. Until you get the hang of it, you might just be nicely told that the specified artifact isn't supported by the module.
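The validation step described above, and the Machine concept from the vocabulary section, can be sketched with a simple lookup table. This is an illustration only: the MODULE_TYPES table, the function names, and the "hibp" module name are hypothetical, and the type assignments are assumptions rather than the real Omnibus registry.

```python
# Hypothetical registry: module name -> artifact types it accepts.
MODULE_TYPES = {
    "dnsresolve": {"host"},
    "nmap": {"host"},
    "virustotal": {"host", "hash"},
    "whois": {"host"},
    "hibp": {"email"},  # assumed name for a HaveIBeenPwned module
}


def can_run(module, artifact_type):
    """True if the module is known and accepts this artifact type."""
    return artifact_type in MODULE_TYPES.get(module, set())


def run_module(module, artifact_name, artifact_type):
    """Validate first; return a friendly message instead of raising."""
    if not can_run(module, artifact_type):
        return "[!] %s does not support %s artifacts" % (module, artifact_type)
    return "[*] running %s against %s" % (module, artifact_name)


def machine(artifact_type):
    """List every module that applies to the given artifact type."""
    return sorted(m for m, types in MODULE_TYPES.items() if artifact_type in types)
```

With a table like this, a machine is just "run everything that machine(artifact_type) returns", and an unsupported combination produces a message rather than a traceback.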

Using Omnibus

omnibus-cli.py is the main application that's used to create, manage, and interact with any items you wish to investigate. This script can also perform some additional tasks, such as creating reports, managing sessions, viewing your API keys, and more, but we'll get to that a little later.

Below you can see the main interface to Omnibus. We've loaded the interactive CLI with the --debug argument (so full tracebacks are shown if errors occur), entered help to view all available commands, then dove deeper into the help contents of the session and new commands.

Command Line Interface

All help <topic> commands provide detail on what exactly the command does, and many will provide more detailed information along with example usage as seen in the screenshot.

To get started adding artifacts, simply use the new command. This adds the artifact to the appropriate MongoDB collection with the bare minimum of information if it's the first time you've interacted with this specific artifact.

    omnibus >> new <artifact name>

Once entered, Omnibus automatically determines what type and subtype the artifact is, then creates a MongoDB document in the associated collection. 

This automated artifact identification is done across the application. For example, modules like the virustotal module must differentiate between an IP address, a domain, or a file hash in order to query the correct endpoint. In this scenario, as long as you run the virustotal module with any of those accepted artifact types, the module will automatically determine the correct type and API endpoint it needs to hit.
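As a rough illustration of how this kind of type detection can work, the sketch below classifies a string into one of the collection names listed earlier (email, hash, host, user, btc). The regular expressions, the subtype labels, and the detect_type function are assumptions made for this example; the real Omnibus logic may differ, and the patterns are deliberately simple.

```python
import re

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
IPV4_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")
BTC_RE = re.compile(r"^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$")
FQDN_RE = re.compile(
    r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$", re.I
)


def detect_type(value):
    """Return (type, subtype) for an artifact string; subtype may be None."""
    value = value.strip()
    if EMAIL_RE.match(value):
        return ("email", None)
    match = IPV4_RE.match(value)
    if match and all(int(octet) <= 255 for octet in match.groups()):
        return ("host", "ipv4")
    if HEX_RE.match(value) and len(value) in HASH_LENGTHS:
        return ("hash", HASH_LENGTHS[len(value)])
    if BTC_RE.match(value):
        return ("btc", None)
    if FQDN_RE.match(value):
        return ("host", "fqdn")
    # Anything else is treated as a user name in this sketch.
    return ("user", None)
```

A module such as virustotal could then branch on the returned pair to choose between its IP, domain, and file hash endpoints.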

This auto-identification process goes along with one of the main goals I had for Omnibus while developing it: ease of use and simple access to data. A user should be able to simply start the application, add artifacts of interest, and run commands to retrieve the data they want.

Interactive CLI & Artifacts

Most cyber security OSINT investigations begin with one or more technical indicators, such as an IP address or email address. After searching and analyzing, relationships begin to form and you can pivot by connected data points. As previously mentioned, these data points are called Artifacts within Omnibus and represent an item you wish to investigate.

To kick off an investigation, start the CLI by simply executing omnibus-cli.py and ensure the etc/apikeys.json file has all your required keys saved. Within this file there is a key for each service that requires an API key. Add your keys as the values and save the file and you’re good to go.
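If you want to check your key file before starting the CLI, a small script along these lines can help. It assumes the file is a flat JSON object mapping each service name to its key, which matches the description above but has not been verified against the repository; load_api_keys is a hypothetical helper, not part of Omnibus.

```python
import json


def load_api_keys(path="etc/apikeys.json"):
    """Load the key file and report which services still have empty values.

    Assumes a flat JSON object such as {"shodan": "...", "virustotal": ""}.
    """
    with open(path) as handle:
        keys = json.load(handle)
    missing = sorted(name for name, value in keys.items() if not value)
    return keys, missing
```

Calling load_api_keys() from the repository root returns the parsed keys plus a sorted list of services whose values are still empty.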

The omnibus-cli.py script provides an interactive command line for you to add and track multiple artifacts, execute modules against MongoDB-stored artifacts or one-off artifacts you don't wish to store, add sources to artifacts, export reports, and more.

Within the CLI you can get a list of all commands by using the help command. For each individual command, typing help and the name of the command will print the help information for that specific command.

Some common commands to remember are:

Command             Usage
session             start a new artifact cache
cat artifact-name   pretty-print an artifact's stored data
cat apikeys         view your stored API keys
open file-name      load text file list of artifacts into Omnibus
new artifact-name   create a new artifact and add it to MongoDB & your session
ls                  view all artifacts in your session
wipe                clear your current session
modules             show a list of all available modules

The command new followed by an artifact will create that artifact within your Omnibus session and store a record of that artifact within MongoDB. This record holds the artifact name, type, subtype (when applicable), module results, creation time, and source. 

The screenshot below shows the creation of a new session, adding a new artifact of inquest.net, displaying the current artifacts within our session, and then using the artifact’s session ID to display a truncated version of its database record. Also shown is the data key which contains the results of previously executed modules nmap and dnsresolve.

Artifact Results

This command line script attempts to mimic some common Linux console commands for ease of use. Omnibus provides commands such as cat, shown above, to show information about an artifact, rm to remove an artifact from the database, ls to view currently cached artifacts, and so on.

Sessions

Omnibus sessions provide a quick and easy way to investigate multiple hosts without having to manually keep track of all the artifacts you are working with or re-type the artifact name for every module execution. Within the Omnibus command line, sessions can be started manually using the session command. If not started manually, a session will be created the first time you use the new command.

It should be noted that you do not need to have an active session to execute a module against an artifact. As long as you ensure a module’s argument is a valid artifact type, and that the module accepts that artifact type, you can run a module in a one-off fashion. 

Artifacts are added to the session cache as a dictionary with an integer as the key and the artifact name as the value. For example, the first artifact (e.g., deadbits.org) would have a key of "1" and a value of deadbits.org. The second artifact added will have a key of "2" and so on.

After an artifact is added to the session, you can interact with it by simply providing the key instead of typing the entire name. Still using the "deadbits.org" artifact, to get the WHOIS records you would simply run whois 1. This is the same as running whois deadbits.org.
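The session behavior described above can be modeled with a small in-memory class. This is a stand-in for illustration only: Omnibus keeps its session in Redis, and the Session class, its method names, and the choice to reset IDs on wipe are assumptions.

```python
class Session(object):
    """In-memory stand-in for the session cache (Omnibus itself uses Redis)."""

    def __init__(self):
        self._items = {}   # session ID (as a string) -> artifact name
        self._next_id = 1

    def add(self, name):
        """Store an artifact name and return its new session ID."""
        key = str(self._next_id)
        self._items[key] = name
        self._next_id += 1
        return key

    def resolve(self, arg):
        """Map a session ID to its artifact name; pass other values through."""
        return self._items.get(str(arg), arg)

    def wipe(self):
        """Clear the session; restarting IDs at 1 is an assumption."""
        self._items.clear()
        self._next_id = 1
```

With this in place, a command handler can call resolve() on its argument first, so that whois 1 and whois deadbits.org end up operating on the same artifact name.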

Sessions are also how you can add a source to your artifacts. Once an artifact is added, you can use the source command to update its source value in MongoDB. By default, this command will update the most recently added artifact, but you can specify an artifact ID as the second argument to add a source to a pre-existing artifact.

Source is meant as a quick way to track where a piece of information came from. The source can be any arbitrary string you'd like such as a URL, an individual’s name, blog title, etc. Sources are also automatically created for child artifacts upon creation using the name of the module that generated them as the source value.

Commands:

  • source view <session id|artifact name>
  • source add <session id|artifact name> 

Modules

The application is designed in a modular fashion to be easily extendable. The "modules" directory contains several built-in Python scripts that query public data sources or otherwise run some task against specified artifacts. Each module's results are printed to the console as formatted, syntax-highlighted JSON.

As modules are executed against an artifact, new artifacts will automatically be created from the module's resulting data. For example, running the DNS resolution module will create new IPv4 or FQDN artifacts for the returned DNS records. These automatically created artifacts are linked to the original artifact by the children key of the schema. Each child artifact also has a reference to its parent artifact and to the module that created it. This makes it simple to view any given artifact and see parent-child relationships in both directions, or to search for artifacts by their parent or source module.
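The parent-child linking can be sketched as follows. To keep the example short it only extracts IPv4 addresses, and it stores artifacts as plain dicts; the extract_children function, the field names, and the choice to store child names under children are assumptions for illustration, not the project's actual code.

```python
import json
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_children(parent, module, results):
    """Create child records for IPv4 addresses found in a module's results.

    Each child points back to its parent and names the module as its source;
    the parent's "children" list is extended with the new child names.
    """
    candidates = set(IPV4_RE.findall(json.dumps(results)))
    children = []
    for value in sorted(candidates):
        if value == parent["name"]:
            continue  # do not link an artifact to itself
        if any(int(octet) > 255 for octet in value.split(".")):
            continue  # looks like an address but is not a valid one
        children.append({
            "name": value,
            "type": "host",
            "subtype": "ipv4",
            "parent": parent["name"],
            "source": module,
        })
    parent.setdefault("children", []).extend(c["name"] for c in children)
    return children
```

Storing the link on both sides is what allows lookups in either direction: from a parent to its children, and from a child back to its parent and originating module.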

Below is the list of provided modules at the time of writing this article:

  • Blockchain.info
  • Censys
  • ClearBit
  • CSIRTG
  • Cymon
  • DNS resolution
  • DShield (SANS ISC)
  • Full Contact
  • Geolocation
  • GitHub username search
  • HackedEmails.com
  • HaveIBeenPwned.com
  • Hurricane Electric
  • IPInfo
  • IPVoid
  • Keybase username lookup
  • NMap scanner
  • OTX (AlienVault)
  • PassiveTotal
  • PGP Key Search
  • RSS reader 
  • Shodan
  • ThreatCrowd
  • Twitter
  • URLVoid
  • VirusTotal
  • WHOIS
  • WhoisMind

Important: Every time you run a module against a pre-existing/stored artifact, the MongoDB document will be updated to reflect the newly discovered information. This means that if you run the same module twice and it returns different results each time, the artifact record will only have saved the most recent result.
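The overwrite behavior is easy to see with a plain dictionary standing in for the MongoDB document. The store_result helper and the data layout are assumptions for illustration; with a driver such as pymongo, the same effect would roughly correspond to an update_one call using a $set on the module's key.

```python
import copy


def store_result(document, module, result):
    """Save a module's result, replacing any earlier result for that module."""
    document.setdefault("data", {})[module] = result
    return document


doc = {"name": "inquest.net"}
store_result(doc, "dnsresolve", {"A": ["192.0.2.10"]})

# Keep a copy first if the earlier result matters to your investigation.
snapshot = copy.deepcopy(doc["data"]["dnsresolve"])

store_result(doc, "dnsresolve", {"A": ["192.0.2.99"]})
```

After the second call, doc only holds the newer record, while snapshot still holds the earlier one.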

Future Releases

As development of Omnibus is ongoing, please follow the GitHub repository to ensure you are using the latest version and getting the benefits of any newly released modules.

Some upcoming features to look forward to include the addition of subdomains and URL artifacts, support for cases, integration with the InQuest Labs project ThreatIngestor, and import/export modules.
