There are plenty of blacklists available online. Building detection on blacklists alone often leads to high false-positive rates, which hurts quality, increases workload and makes alert investigation more difficult. The primary reason is the lack of context. Context allows analysts to focus on what’s important and pivot from collected data in order to find more indicators and create better detection rules. Let’s explore how to hunt with Open Source Intelligence and command line fu to find evil and enhance detection with pattern matching rules.
Create the IOC list
For the sake of this example I’ve decided to use this list, which includes IP addresses, domains and most importantly: context! Context should be part of every IOC list that you create. It doesn’t matter if the list is built from known traffic patterns, OSINT research or a tip-off. Even though there might be additional overhead, having context will pay off in the long run.
List format example:
Let’s start by extracting all the domains:
For those who don’t feel comfortable with the command line:
- from list select all the lines excluding lines with
/with new line character
- replace strings in the file (
- remove space character
- select lines that contain
- sort list and print unique entries results save to
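The same steps can be sketched in Python. This is only an illustration: the list layout (one indicator per line, context after a comma, comment lines starting with #) and the sample entries are assumptions, not the format of the actual list.

```python
import ipaddress


def extract_domains(lines):
    """Return sorted, unique domains from IOC list lines (assumed layout)."""
    domains = set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        # Assumed layout: indicator first, context after the first comma.
        indicator = line.split(",", 1)[0].strip().lower()
        # Undo a common defanging style such as "evil[.]example".
        indicator = indicator.replace("[.]", ".")
        try:
            ipaddress.ip_address(indicator)
            continue  # it is an IP address, not a domain
        except ValueError:
            pass
        if "." in indicator:
            domains.add(indicator)
    return sorted(domains)


# Hypothetical entries, for illustration only.
sample = [
    "# indicator, context",
    "203.0.113.7, hypothetical C&C IP",
    "bad-gate.example, hypothetical EK gate",
    "Evil[.]example, hypothetical landing page",
    "bad-gate.example, duplicate entry",
    "",
]

for domain in extract_domains(sample):
    print(domain)
```

Keeping the result in a set and sorting at the end gives the same effect as sorting the list and printing unique entries.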
Extracted list of domains (part removed for brevity):
Now, we can use our list to search for evil in the proxy logs:
For each line in the domains-IOC file, the snippet searches the proxy log file for a corresponding entry.
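A comparable search can be sketched in Python. It assumes the whitespace-separated log layout seen in the sample log lines later in this post (status, cache action, method, host, IP, port, path, referer, upstream); the IP addresses below are placeholders and `find_hits` is a hypothetical helper name.

```python
def host_matches(host, iocs):
    """True if the host, or any parent domain of it, is in the IOC set."""
    labels = host.lower().split(".")
    return any(".".join(labels[i:]) in iocs for i in range(len(labels)))


def find_hits(log_lines, ioc_domains):
    """Yield proxy log lines whose host field matches an IOC domain."""
    iocs = {d.strip().lower() for d in ioc_domains if d.strip()}
    for line in log_lines:
        fields = line.split()
        # Assumed layout: the requested host is the 4th field.
        if len(fields) < 4:
            continue
        if host_matches(fields[3], iocs):
            yield line


# Log lines modelled on the examples in this post (placeholder IPs).
sample_log = [
    "200 TCP_NC_MISS GET UnknownCompromisedWebsite.com 192.0.2.10 80 / - DIRECT",
    "200 TCP_NC_MISS GET static.matthewsfyi.com 192.0.2.20 80 "
    "/k?tstmp=3600039285 hxxp://UnknownCompromisedWebsite.com/ DIRECT",
]

for hit in find_hits(sample_log, ["static.matthewsfyi.com"]):
    print(hit)
```

Matching on the host field rather than on the whole line avoids counting a request as a hit merely because an IOC domain appears in its referer.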
If your SIEM solution or other detection platform gives you access to a backend that stores historic data about network connections, you should consider yourself a lucky analyst. Let’s assume this backend allows you to use raw SQL queries. The body of such an SQL query can easily be pre-generated on the command line:
Now the only thing left is to add a header with a SELECT statement and a table name. This can save a lot of time, especially if your list contains hundreds of entries!
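The query generation can also be sketched in Python. The table name `proxy_logs` and the column name `host` are hypothetical, and an in-memory SQLite database stands in for the real backend. The `?` placeholder style is what SQLite uses; other database drivers may expect a different one.

```python
import sqlite3


def build_query(domains, table="proxy_logs", column="host"):
    """Build a parameterised SELECT for the given domains.

    `table` and `column` are hypothetical names; adjust them to your schema.
    """
    domains = list(domains)
    if not domains:
        raise ValueError("need at least one domain")
    placeholders = ", ".join("?" for _ in domains)
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, domains


# In-memory stand-in for the historic connection store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proxy_logs (host TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO proxy_logs VALUES (?, ?)",
    [("bad-gate.example", "/k"), ("benign.example", "/")],
)

sql, params = build_query(["bad-gate.example", "evil.example"])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Passing the domains as parameters instead of pasting them into the query text keeps a malformed list entry from breaking the statement.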
Let’s assume you found malicious domains in your proxy logs - for instance a hit on a Sweet Orange gate:
This is where context kicks in:
With this information, one click later the analyst can review the chain of events.
In this case the redirect to the EK page most probably did not happen, as there were no hits in the proxy logs for the following URLs:
However, with the context available an analyst knows exactly what to look for and how to determine whether the activity was successful or not. It is definitely worth mentioning that EK gates tend to stay active for a long time, which quite often leads to interesting findings such as new compromised domains or landing pages. Just follow this example:
200 TCP_NC_MISS GET UnknownCompromisedWebsite.com 18.104.22.168 80 / - DIRECT
200 TCP_NC_MISS GET static.matthewsfyi.com 22.214.171.124 80 /k?tstmp=3600039285 hxxp://UnknownCompromisedWebsite.com/ DIRECT
Analysis of the HTTP Referer field reveals a previously unknown compromised domain.
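This pivot can be automated. The sketch below assumes the same whitespace-separated layout as the two log lines above, with the referer in the 8th field and `-` meaning no referer. The IP addresses are placeholders and `referer_hosts` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def referer_hosts(log_lines, ioc_domains):
    """Return referring hosts seen on requests to known-bad domains."""
    iocs = {d.lower() for d in ioc_domains}
    found = set()
    for line in log_lines:
        fields = line.split()
        # Assumed layout: host is field 4, referer is field 8.
        if len(fields) < 8 or fields[3].lower() not in iocs:
            continue
        referer = fields[7]
        if referer == "-":
            continue
        # The log lines in this post are defanged (hxxp); restore the scheme.
        if referer.lower().startswith("hxxp"):
            referer = "http" + referer[4:]
        host = urlsplit(referer).hostname
        if host and host not in iocs:
            found.add(host)
    return sorted(found)


sample_log = [
    "200 TCP_NC_MISS GET UnknownCompromisedWebsite.com 192.0.2.10 80 / - DIRECT",
    "200 TCP_NC_MISS GET static.matthewsfyi.com 192.0.2.20 80 "
    "/k?tstmp=3600039285 hxxp://UnknownCompromisedWebsite.com/ DIRECT",
]

print(referer_hosts(sample_log, ["static.matthewsfyi.com"]))
```

Every host returned this way is a candidate for the IOC list, ideally stored together with the gate it led to as context.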
The above queries allowed analysts not only to identify connections to bad domains but, more importantly, to use context to confirm or deny whether the traffic was indeed malicious and resulted in successful exploitation. That’s pretty cool! But hey, there’s more! Bad guys quite often use different C&C infrastructure for the same type of malware. It is far easier to change the C&C for a given malware sample than to re-code its communication function or method. This means that the malware will use a similar communication pattern when connecting to a different C&C.
Enter pattern matching!
URL pattern matching is an effective way to detect the same type of network communication even though bad guys use different IPs and domains. For instance, let’s take a closer look at the Cryptowall C&C.
Detection could be achieved with the following regular expression:
The expression matches .php? followed by:
- one letter
- equals sign
- between twelve and sixteen alphanumeric characters
This works both for real-time detection and for hunting on historic data.
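As a rough illustration, the pattern described above can be written out and tested in Python. The expression below is reconstructed from the description only and may differ from the original rule. The trailing lookahead, which stops a longer alphanumeric run from matching on its first sixteen characters, is an added assumption, and the sample URLs are made up.

```python
import re

# ".php?" + one letter + "=" + 12 to 16 alphanumeric characters.
# The negative lookahead (an assumption) rejects runs longer than 16.
CRYPTOWALL_LIKE = re.compile(
    r"\.php\?[A-Za-z]=[A-Za-z0-9]{12,16}(?![A-Za-z0-9])"
)


def looks_like_cnc(url):
    """True if the URL contains the described query-string pattern."""
    return CRYPTOWALL_LIKE.search(url) is not None


# Hypothetical URLs, for illustration only.
candidates = [
    "hxxp://c2.example/img3.php?x=abc123def456gh",   # 14 characters
    "hxxp://shop.example/index.php?id=42",           # two-letter key
    "hxxp://blog.example/a.php?q=short1",            # value too short
    "hxxp://blog.example/a.php?q=" + "a" * 17,       # value too long
]

for url in candidates:
    print(looks_like_cnc(url), url)
```

Because the rule keys on the URL shape and not on the host, the same function can be applied to live traffic or replayed over archived proxy logs.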