Analyst’s Handbook - Hunting with basic OSINT and command line fu

There are plenty of blacklists available online. Building detection on blacklists alone often leads to high false positive rates, which lowers alert quality, increases workload and makes alert investigation more difficult. The primary reason is the lack of context. Context allows analysts to focus on what’s important and pivot from collected data in order to find more indicators and create better detection rules. Let’s explore how to hunt with Open Source Intelligence and command line fu to find evil and enhance detection with pattern matching rules.

Create the IOC list

For the sake of this example I’ve decided to use this list, which includes IP addresses, domains and, most importantly, context! Context should be part of every IOC list that you create. It doesn’t matter if the list is built from known traffic patterns, OSINT research or a tip-off. Even though there might be additional overhead, having context will pay off in the long run.

List format example:

182.149.13,port 80,eturnstartsikkanese. and returnstartsikkanese.innovativeapplicationsblog.com,,Angler EK (2nd attempt from 2015-01-22),2015-01-22
182.149.13,port 80,governat.richdadradio.co.uk,,Angler EK (3rd attempt from 2015-01-22),2015-01-22
126.97.209,port 80,jyjhsvgkpeni0g.com,POST /,ET TROJAN Bedep Checkin Response,2015-01-22
138.25.107,port 80,drain.diskant.co.uk,POST /news.php,ET TROJAN Fareit/Pony Downloader Checkin 2,2015-01-22
224.126.19,port 80,asop83uyteramxop.com,,ET MALWARE Fun Web Products Spyware User-Agent (FunWebProducts),2015-01-22
69.233.133,port 8080,ipsalomenatep58highwayroad.biz:8080,,ET TROJAN Ursnif Checkin,2015-01-22
99.6.187,port 8080,79.99.6.187:8080,,ETPRO TROJAN Win32/Injector.BOIK Downloader Checkin,2015-01-22
40.64.218,port 80,n1hxftesfm3n4333ah61xnf.ajanshizmeti.com,,Windigo group Nuclear EK,2015-01-23
44.135.8,port 80,camhogger.com,,Compromised website,2015-01-23
226.180.82,port 80,instanthold.gq,,Nuclear EK,2015-01-23

Let’s start by extracting all the domains:

$ cat suspicious-ip-addresses-and-domains.txt | egrep -v '#' |
  cut -d ',' -f3 |
  tr '/' '\n' |
  sed 's/ and returnstartsikkanese\.//' |
  sed 's/ and /\n/' |
  sed 's/8085and/8085\n/' |
  sed 's/www\.//' |
  tr -d ' ' |
  sed 's/\[23characters\]\.//' |
  egrep . |
  sort -n | uniq > domains-IOC

For those who don’t feel comfortable with the command line:

from the list select all the lines, excluding lines with the # character
keep only the third comma-separated field, which holds the domain
replace / with the new line character \n
replace strings in the file (sed s/MatchString/ReplaceWithThisString/)
remove space characters
drop empty lines (egrep . keeps every line that contains at least one character)
sort the list, print unique entries and save the results to the domains-IOC file

Extracted list of domains (part removed for brevity):

brianpekarchuk.com
burdiacs.com
burtander.com
butterflymedia.az
californiainsuranceco.com
callproc.com
camhogger.com
canadahalalec.com
cannedseniordogfood.com
captainblowdri.com
caracolassn.com
cast.autolistprofits.net
celebrityvalley.com
chcoa.com
cityep.net
themoviespoiler.com

Hunting

Now, we can use our list to search for evil in the proxy logs:

$ while read -r line; do egrep -i "$line" proxy-logs-20150130.log ; done < domains-IOC

For each line in the domains-IOC file the above code snippet will search for a corresponding entry in the proxy log file.
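The loop starts one egrep per indicator and treats every dot in a domain as a regex wildcard, so camhogger.com would also match camhoggerXcom. A minimal alternative sketch, assuming a grep that supports the POSIX -F and -f options; hunt_domains is a hypothetical helper name, not something from the original post:

```shell
# hunt_domains IOC_FILE LOG_FILE
# Hypothetical helper: print every log line that contains any indicator.
#   -F  treat each indicator as a fixed string, so "." only matches a dot
#   -i  ignore case, as in the loop above
#   -f  read the indicators from IOC_FILE, one per line
# Assumes IOC_FILE has no empty lines (an empty pattern matches every line);
# the "egrep ." step used to build domains-IOC already guarantees that.
hunt_domains() {
  grep -iF -f "$1" -- "$2"
}
```

Usage would look like: hunt_domains domains-IOC proxy-logs-20150130.log. The log is read once instead of once per indicator, which matters when the list grows to hundreds of entries.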

If your SIEM solution or other detection platform allows you to access a backend that stores historic data about network connections, you should consider yourself a lucky analyst. Let’s assume this backend allows you to use raw SQL queries. The body of such a SQL query can easily be pre-generated on the command line:

$ cat domains-IOC | sed 's/.*/db_column_name="&" OR/'

db_column_name="brianpekarchuk.com" OR
db_column_name="burdiacs.com" OR
db_column_name="burtander.com" OR
db_column_name="butterflymedia.az" OR
db_column_name="californiainsuranceco.com" OR
db_column_name="callproc.com" OR
db_column_name="camhogger.com" OR
db_column_name="canadahalalec.com" OR
db_column_name="cannedseniordogfood.com" OR
db_column_name="captainblowdri.com" OR
db_column_name="caracolassn.com" OR
db_column_name="cast.autolistprofits.net" OR
db_column_name="celebrityvalley.com" OR
db_column_name="chcoa.com" OR
db_column_name="cityep.net" OR

Now the only thing left is to add a header with a SELECT statement and a table name, and to drop the trailing OR from the last line. This can be extremely useful and time saving, especially if your list contains hundreds of entries!
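Putting the header, the generated conditions and the final clean-up together could look like the sketch below. build_query, the table name and the column name are all hypothetical placeholders for your own schema. Note that this sketch uses single quotes around the values, which is what standard SQL expects for string literals; keep double quotes if your backend wants them.

```shell
# build_query IOC_FILE TABLE COLUMN
# Hypothetical helper: TABLE and COLUMN are placeholders for your own schema.
# Assumes the indicators contain no quote, "/" or "&" characters.
build_query() {
  printf 'SELECT * FROM %s WHERE\n' "$2"
  # grep . drops empty lines; the first sed expression wraps each domain in
  # a condition, the second turns the trailing OR on the last line into ";".
  grep . "$1" | sed -e "s/.*/  $3 = '&' OR/" -e '$s/ OR$/;/'
}
```

For example, build_query domains-IOC proxy_logs host prints a complete statement that can be pasted into the query console.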

Let’s assume you found malicious domains in your proxy logs - for instance a hit on a Sweet Orange gate:

200 TCP_NC_MISS GET themoviespoiler.com 50.62.217.1 80 / http://www.google.com/url?url=http://www.themoviespoiler.com/&rct=j&frm=1&q=&esrc=s&sa=U&ei=KDnZVMbIN4eQyQTd8oKwCA&ved=0CBUQFjAA&usg=AFQjCNFJ3uEfi7Djoan_rZE88d15OulS9g DIRECT
200 TCP_NC_MISS GET static.matthewsfyi.com 50.87.151.146 80 /k?tstmp=3600039285 http://www.themoviespoiler.com/ DIRECT

This is where context kicks in:

$ egrep 50.87.151 suspicious-ip-addresses-and-domains.txt 
50.87.151.146,port 80,static.matthewsfyi.com,,Redirect pointing to Sweet Orange EK,2015-02-09

With this information, one click later an analyst can review the chain of events.
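The lookup can also be done for every hit at once. The following is only a sketch under a few assumptions: the list has six comma-separated fields (IP, port, domain, request, context, date) as in the format example, the context itself contains no commas, and the domain field holds a single domain. Entries that need the clean-up steps shown earlier will simply not match. hits_with_context is a hypothetical helper name.

```shell
# hits_with_context IOC_LIST PROXY_LOG
# Hypothetical helper: for every listed domain that appears in the proxy
# log, print the domain, its context and the date, separated by tabs.
hits_with_context() {
  grep -v '#' "$1" |
  while IFS=, read -r ip port domain request context date; do
    domain=${domain%%:*}            # drop a trailing :port, if any
    [ -n "$domain" ] || continue    # skip entries without a domain
    # -F: fixed-string match, -i: ignore case, -q: only report success
    if grep -qiF -- "$domain" "$2"; then
      printf '%s\t%s\t%s\n' "$domain" "$context" "$date"
    fi
  done
}
```

Called as hits_with_context suspicious-ip-addresses-and-domains.txt proxy-logs-20150130.log, it gives the analyst the context for each hit without a separate lookup per indicator.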

In this case the redirect to the EK page most probably did not happen, as there were no hits in the proxy logs for the following URLs:

h.useditems.ca:8085
k.vidihut.com:8085

However, with the context available an analyst knows exactly what to look for and how to determine whether the activity was successful or not. It is definitely worth mentioning that EK gates tend to stay active for longer periods, which in turn quite often leads to interesting findings like new compromised domains or landing pages. Just follow this example:

200 TCP_NC_MISS GET UnknownCompromisedWebsite.com 1.1.1.1 80 / - DIRECT
200 TCP_NC_MISS GET static.matthewsfyi.com 50.87.151.146 80 /k?tstmp=3600039285 hxxp://UnknownCompromisedWebsite.com/ DIRECT

Analysis of the HTTP Referer field reveals a previously unknown compromised domain.
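This pivot is easy to script. The sketch below assumes the whitespace-separated proxy log layout shown in the examples, with the requested host in field 4 and the referer in field 8; adjust the field numbers for your own logs. new_referers is a hypothetical helper name.

```shell
# new_referers GATE IOC_FILE LOG_FILE
# Hypothetical helper: list hosts that referred traffic to GATE and are not
# yet present in IOC_FILE (one domain per line, without a www. prefix).
new_referers() {
  # Field 4 is the requested host, field 8 the referer (assumed layout).
  awk -v gate="$1" '$4 == gate { print $8 }' "$3" |
    # Strip the scheme, then everything from the first "/" or ":", then a
    # leading "www."; finally drop empty and "-" (no referer) entries.
    sed -e 's#^[a-z]*://##' -e 's#[/:].*$##' -e 's#^www\.##' -e '/^-*$/d' |
    # -v -x -F: keep only hosts that do not exactly match a known indicator.
    grep -vixF -f "$2" |
    sort -u
}
```

For the gate above that would be: new_referers static.matthewsfyi.com domains-IOC proxy-logs-20150130.log. Anything it prints is a candidate compromised website worth a closer look.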

Improving detection

The above queries allowed analysts not only to identify connections to bad domains but, more importantly, to use context to confirm or deny whether the traffic was indeed malicious and resulted in successful exploitation. That’s pretty cool! But hey, there’s more! Bad guys quite often use different C&C infrastructure for the same type of malware. Basically, it is far easier to change the C&C for a given malware sample than to re-code the communication function/method. This means that malware will use a similar communication pattern when connecting to a different C&C.

Enter pattern matching!

URL pattern matching is an effective way to detect the same type of network traffic communication even though bad guys use different IPs/domains. For instance let’s take a closer look at Cryptowall C&C.

$ egrep Crypto suspicious-ip-addresses-and-domains.txt | cut -d ',' -f4 | sort -n | uniq

[Image: Cryptowall requests]

Detection could be achieved with the following regular expression:

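$ egrep '\.php\?[a-z]=[a-z0-9]{12,16}($|[^a-z0-9])' http-logs.txt

The above expression will match .php? followed by:

one lowercase letter
equals sign
minimum twelve to maximum sixteen lowercase letters or digits
end of line or any other character, which is what enforces the maximum of sixteen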

[Image: Cryptowall - matched requests]

This works both for real-time detection and for hunting on historic data.
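Combining the pattern with the IOC list shows which matching hosts are not known yet. This is a sketch only: it assumes the proxy log layout from the hunting examples (requested host in field 4) and a pattern of one lowercase letter, an equals sign and twelve to sixteen lowercase letters or digits. new_cnc_hosts is a hypothetical helper name.

```shell
# new_cnc_hosts IOC_FILE LOG_FILE
# Hypothetical helper: list hosts whose requests match the URL pattern but
# which are not yet present in IOC_FILE.
new_cnc_hosts() {
  # ($|[^a-z0-9]) makes the value end after at most sixteen characters.
  grep -E '\.php\?[a-z]=[a-z0-9]{12,16}($|[^a-z0-9])' "$2" |
    awk '{ print $4 }' |       # requested host (assumed log layout)
    grep -vixF -f "$1" |       # drop hosts that exactly match a known IOC
    sort -u
}
```

Running new_cnc_hosts domains-IOC proxy-logs-20150130.log would then print candidate C&C hosts to review and, once confirmed, add to the list together with their context.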

Happy Hunting!

Update

I’ve noticed the original list is no longer available on www.malware-traffic-analysis.net. You can grab a copy of the extracted domains list from GitHub if you want to play with it.

dfir it!

responding to incidents with candied bacon
