Metamorphic Malware

Introduction

Metamorphic or self-modifying code is an advanced technique used by virus and malware authors which enables their malicious program to rewrite itself in a way that the code remains functionally equivalent but looks different each time it is executed. This characteristic prevents antivirus software from detecting the malware using static signatures and makes reverse engineering more difficult.

This article provides an overview of metamorphic malware, explains how it differs from polymorphic malware, and shows how it can be implemented by looking at how The Mental Driller of the 29A virus group structures metamorphic engines [1].

Metamorphism vs Polymorphism

The term polymorphism in regard to malware or virus development has little to do with its use in object-oriented programming. Instead, polymorphic malware changes its signature by using a mutation engine that generates a new decryption routine for each generation of the virus. After the initial virus body is decrypted, the decryption routine is mutated and the virus body as well as the decryptor are encrypted again, before being linked together for the next iteration [2]. The downside of this approach is that the decrypted virus body never changes and can therefore be detected and flagged by antivirus software, which is why metamorphic engines came into existence.

Metamorphism can be defined as body-polymorphism [3]. Instead of having a constant virus body, a metamorphic engine is able to create new generations of the virus which are functionally equivalent but look different for every new infection. It’s not just the decryptor that mutates, but everything else in the code as well. This can be achieved by using three mutation techniques:

  • Compression: The virus body is compressed by combining multiple assembly instructions into a single one that does the same thing. This step is necessary to keep the malware from growing too large.
  • Permutation: Instructions are permuted by changing their order or by replacing them with equivalent instructions.
  • Expansion: Single instructions are rewritten into multiple instructions that do the same thing, and dead code that adds no functionality is inserted. This technique is only used if compression is present, since the virus code would grow uncontrollably otherwise.

While permutation is present in most metamorphic engines, the other two techniques are used less often because the compression code is difficult to implement. According to The Mental Driller, the Accordion model (compression and expansion) in combination with permutation yields absolute metamorphism, which means that the code skeleton changes but the algorithm does not [1].
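To make the phrase "functionally equivalent but looks different" concrete, here is a minimal sketch. It uses a made-up toy instruction set (Python tuples, not real x86), a hypothetical `run` interpreter and a hash as a stand-in for a static signature. None of these names come from the MetaPHOR article; the two variants were written by hand for illustration.

```python
import hashlib

def run(program):
    """Interpret a toy program and return the final register state."""
    regs = {"a": 0, "b": 0}
    for op, dst, src in program:
        # The source operand is either a register name or an immediate value.
        value = regs[src] if src in regs else src
        if op == "mov":
            regs[dst] = value
        elif op == "add":
            regs[dst] += value
        elif op == "sub":
            regs[dst] -= value
        elif op == "xor":
            regs[dst] ^= value
        else:
            raise ValueError(f"unknown op: {op}")
    return regs

def signature(program):
    """Stand-in for a static signature: a hash over the textual encoding."""
    return hashlib.sha256(repr(program).encode()).hexdigest()

# Two hand-written variants that compute the same result in different ways.
variant_1 = [("mov", "a", 0), ("add", "a", 5)]
variant_2 = [("xor", "a", "a"), ("mov", "b", 5), ("add", "a", "b"), ("sub", "b", "b")]

same_behaviour = run(variant_1) == run(variant_2)
same_signature = signature(variant_1) == signature(variant_2)
```

Both variants leave the registers in the same state, yet their hashes differ, which is the reason a signature over the code bytes alone cannot match every generation.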

The Metamorphic Engine

The metamorphic engine is the core part of a metamorphic virus and is responsible for mutating the virus body. It takes up more than 90% of the virus code, with the remaining part being the infector itself. The structure of the engine was described by The Mental Driller in 2002 but can still be applied to modern metamorphic malware. It incorporates the mutation techniques mentioned above in the shrinker, permutator and expander components and further includes a disassembler and an assembler.

Structure of metamorphic code

The following sections provide a brief overview of each component of the engine. In particular, this article focuses on MetaPHOR [4] (metamorphic permutating high-obfuscating reassembler) by The Mental Driller, also known as W32/Simile [5]. The replication mechanism of the virus is out of scope for this blog post.

Disassembler

In order to mutate itself, the malware first needs to disassemble itself into a pseudo-assembly language that is defined by the malware author. The Mental Driller suggests basing this pseudo-assembly language on x86 opcodes in order to simplify the handling. A second task of the disassembler is decoding jmp and call instructions. A memory buffer is used to store the already-disassembled code, while two tables are used to store pointers to the disassembled code and the destinations of jmp, call and similar instructions. While this article does not cover the disassembler’s algorithm in detail, it eventually achieves the following goals:

  • Eliminating permutation and permutation jumps
  • Eliminating unreachable code
  • Decoding the program into a pseudo-assembly language
  • Substituting labels with pointers to table entries
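The last goal, substituting labels with pointers to table entries, is a general disassembler technique and can be sketched on a toy listing. The tuple format, the mnemonic set and the function name `build_label_table` are assumptions made for illustration and are not taken from MetaPHOR.

```python
def build_label_table(code):
    """Replace jump/call target addresses with indices into a label table.

    `code` is a list of (address, mnemonic, operand) tuples in a toy format.
    Returns (instructions, labels): every jump/call operand becomes
    ("label", i), and labels[i] is the position of the target instruction
    inside `instructions`.
    """
    position_of = {addr: i for i, (addr, _, _) in enumerate(code)}
    labels = []      # label index -> instruction position
    label_of = {}    # target address -> label index
    instructions = []
    for _addr, mnemonic, operand in code:
        if mnemonic in ("jmp", "jz", "call"):
            if operand not in label_of:
                label_of[operand] = len(labels)
                labels.append(position_of[operand])
            operand = ("label", label_of[operand])
        instructions.append((mnemonic, operand))
    return instructions, labels

# Hypothetical toy listing; the addresses are arbitrary.
code = [
    (0x00, "mov", ("eax", 1)),
    (0x05, "jmp", 0x0A),
    (0x07, "nop", None),
    (0x0A, "call", 0x10),
    (0x0F, "ret", None),
    (0x10, "ret", None),
]
instructions, labels = build_label_table(code)
```

Once operands refer to table entries instead of raw addresses, later passes can change instruction sizes or positions and only the table has to be updated.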

Shrinker

After the code has been disassembled, the shrinker takes care of compressing known pairs or triplets of instructions into a single one. This is mainly done to reduce the size of the malware and to undo the results of the expander. The program needs to define a table of possible combinations that can be compressed or even eliminated.

Original Instruction    Compressed Instruction    Explanation
--------------------    ----------------------    -----------
MOV reg, reg            NOP                       Nothing happens (the register is copied to itself)
XOR reg, reg            MOV reg, 0                Sets the register to 0
SUB reg, reg            MOV reg, 0                Sets the register to 0

As can be seen in the table above, the shrinker also aims to keep the number of distinct instructions to a minimum, which is why both the xor and the sub instruction are compressed to a mov instruction.

Of course, the shrinker not only replaces single instructions but also sets of instructions. A small excerpt is shown below.

Original Instructions                    Compressed Instruction
---------------------                    ----------------------
MOV addr, reg and PUSH addr              PUSH reg
MOV addr2, addr1 and MOV addr3, addr2    MOV addr3, addr1
MOV reg, val and ADD reg, reg2           LEA reg, [reg2 + val]

When a matching pair or triplet is found, the shrinker replaces the first instruction with the compressed one and overwrites the following instructions with NOP instructions, successfully compressing it.
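This kind of table-driven rewriting is the same idea as peephole optimization in compilers. Below is a minimal sketch over a toy tuple representation, covering the three single-instruction rules and one of the pair rules listed above. The representation, the `("mem", name)` operand form and all function names are assumptions for illustration, not MetaPHOR's actual data structures.

```python
NOP = ("nop",)

def shrink_single(ins):
    """Apply the single-instruction rules to one toy instruction."""
    if len(ins) == 3:
        op, dst, src = ins
        if op == "mov" and dst == src:
            return NOP                   # MOV reg, reg -> NOP
        if op in ("xor", "sub") and dst == src:
            return ("mov", dst, 0)       # XOR/SUB reg, reg -> MOV reg, 0
    return ins

def is_mem(operand):
    """Memory operands are modelled as ("mem", name) in this sketch."""
    return isinstance(operand, tuple) and operand[0] == "mem"

def shrink_pair(first, second):
    """Return the compressed instruction for a known pair, else None."""
    # MOV addr, reg ; PUSH addr -> PUSH reg
    # (as in the table; assumes addr is only a scratch location)
    if (first[0] == "mov" and is_mem(first[1])
            and second[0] == "push" and second[1] == first[1]):
        return ("push", first[2])
    return None

def shrink(code):
    """Rewrite singles, then replace matching pairs with instruction + NOP."""
    code = [shrink_single(ins) for ins in code]
    for i in range(len(code) - 1):
        merged = shrink_pair(code[i], code[i + 1])
        if merged is not None:
            code[i] = merged      # first slot holds the compressed form
            code[i + 1] = NOP     # second slot is overwritten with NOP
    return code

example = [
    ("xor", "eax", "eax"),
    ("mov", ("mem", "tmp"), "ebx"),
    ("push", ("mem", "tmp")),
    ("mov", "ecx", "ecx"),
]
shrunk = shrink(example)
```

The output keeps the same length as the input because matched instructions are overwritten with NOP rather than deleted, mirroring the behaviour described above.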

Permutator

The permutator is responsible for changing the order of instructions in the code, which is done by shuffling. By leaving in the nop instructions from the shrinker in this step, the permutator will derive a more random code structure, since the nop instructions are included in the permutation, but removed in further steps. The permutator can also be combined with other forms of metamorphism, such as substitution, which replaces instructions with equivalent ones.

Expander

At this point, the code has been compressed and permuted, which means that the code is now smaller and has a different structure. Now, the expander basically does the opposite of the shrinker: it recursively replaces single instructions randomly with matching single, pair or triplet instruction sets. Of course, a control variable is put in place to prevent the code from growing too much. Apart from the obvious expansion, the expander has additional characteristics, like register translation, which means that the registers are shuffled and never the same in different generations.

The expander is normally only implemented if the shrinker is implemented as well, as the code would grow uncontrollably without a compression mechanism.

Assembler

Last but not least, the assembler is responsible for converting the mutated pseudo-assembly language back into machine code that the processor can understand. The assembler fixes jmp, call and similar instructions, changes registers and fixes instruction lengths, before reassembling the code into the target processor language.

After the assembler has done its job, the metamorphic engine has successfully mutated the virus body and the hard part begins: the debugging. Debugging metamorphic code is problematic, since the mutated code is obfuscated and difficult to understand.

This marks the end of the components that The Mental Driller has described in his article about MetaPHOR. It has to be said that these are not the only metamorphic techniques that exist and that there are of course countless other ways to implement a metamorphic engine.

Detection

Despite its sophistication, metamorphic malware is by no means undetectable. Possible detection techniques include:

  • Behavioral Analysis
  • Heuristic-based Detection
  • Emulation or VM-Sandboxing
  • Geometric Detection [6]
  • Machine Learning [7]

According to researcher Philippe Beaucamps, who thoroughly studied and analyzed The Mental Driller’s MetaPHOR, the virus can be detected using statistical analysis of the code and monitoring of the memory during program execution. This is because the virus compresses itself into a form that is similar between generations. Furthermore, The Mental Driller has not implemented evasion techniques that protect against behavioral analysis [8]. However, the researcher also states that antivirus companies could be swiftly overtaken if refined metamorphic techniques were to be used by virus authors and malware developers.
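As a rough illustration of what a statistical comparison on the defender's side could look like, the sketch below compares opcode frequency histograms of toy instruction lists with cosine similarity. This is an assumed, simplified stand-in and not the method from the cited paper; the tuple format, the sample data and the function names are made up for the example.

```python
from collections import Counter
from math import sqrt

def opcode_histogram(code):
    """Count opcodes in a toy instruction list, ignoring NOP padding."""
    return Counter(ins[0] for ins in code if ins[0] != "nop")

def cosine_similarity(h1, h2):
    """Cosine similarity of two opcode histograms (0.0 if either is empty)."""
    dot = sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())
    norm = (sqrt(sum(v * v for v in h1.values()))
            * sqrt(sum(v * v for v in h2.values())))
    return dot / norm if norm else 0.0

# Hypothetical samples: two reordered generations and one unrelated program.
generation_a = [("mov", "eax", 0), ("push", "ebx"), ("nop",), ("add", "eax", 1)]
generation_b = [("push", "ebx"), ("nop",), ("nop",), ("mov", "eax", 0), ("add", "eax", 1)]
unrelated = [("call", "f"), ("ret",), ("call", "g")]

related_score = cosine_similarity(opcode_histogram(generation_a),
                                  opcode_histogram(generation_b))
unrelated_score = cosine_similarity(opcode_histogram(generation_a),
                                    opcode_histogram(unrelated))
```

In this toy data the two generations differ only in ordering and padding, so their histograms match, while the unrelated program shares no opcodes with them. Real detectors would need far more robust features than raw opcode counts.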

Conclusion

Summing up, metamorphism is an advanced and highly sophisticated technique used by malware authors to evade static detection through signature-based antivirus software. Due to the complexity of the metamorphic engine, the code is difficult to implement, debug and reverse engineer, resulting in a shortage of research and well-developed practical examples. The Mental Driller showcased the generic structure of a metamorphic engine in his article about MetaPHOR, consisting of a disassembler, shrinker, permutator, expander and assembler. By combining the permutator with the so-called “Accordion” model, MetaPHOR achieved what the author called absolute metamorphism. The result is a virus that is functionally equivalent but looks different each time it infects a new host.


  1. Metamorphism in practice or “How I made MetaPHOR and what I’ve learnt”: https://vxug.fakedoma.in/archive/VxHeaven/lib/vmd01.html

  2. Metamorphic Malware and Obfuscation: A Survey of Techniques, Variants, and Generation Kits: https://onlinelibrary.wiley.com/doi/10.1155/2023/8227751

  3. Advanced Code Evolution Techniques and Computer Virus Generator Kits: https://www.informit.com/articles/article.aspx?p=366890&seqNum=6

  4. MetaPHOR source code: https://github.com/mal-project/win32.MetaPHOR

  5. Simile: https://en.wikipedia.org/wiki/Simile_(computer_virus)

  6. Hunting for Metamorphic: https://harrisonwl.github.io/assets/courses/malware/spring2017/papers/HuntingMetamorphic.pdf

  7. Understanding Metamorphic Code: https://dev.to/khairuaqsara/understanding-metamorphic-code-3g7

  8. Advanced Metamorphic Techniques in Computer Viruses: https://inria.hal.science/inria-00338066/document