This blog is about Java (advanced Java topics like Reflection, Byte Code transformation, Code Generation), Maven, Web technologies, Raspberry Pi and IT in general.

Saturday, 6 September 2014

Class Transformation with ASM

Have you ever asked yourself how class transformation works? Great, then you are reading the right blog post ;-)

Not everybody knows what class transformation is, so I will explain that first and then show how it works.

What is class transformation?

There is a quite simple answer to this question: the Java bytecode gets modified in some way. But let me explain it in more detail. When you compile a Java file, a class file is generated. The class file represents the Java source file as Java bytecode, a compact binary format meant for execution by the JVM. The methods consist of Java opcodes. The JVM executes these opcodes sequentially; most of them push values onto or pop values from the operand stack. Class transformation means that the Java bytecode which represents a Java class is modified. Opcodes can be inserted or removed. But not only the opcodes inside a method can be modified. Everything can be changed: any program can be transformed into anything else, as long as the result is still valid Java bytecode. Otherwise the Java bytecode verifier will reject the class when it is loaded.
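To make "a class file is a binary format" a bit more tangible, here is a minimal sketch that reads only the first eight bytes of a class file: the magic number 0xCAFEBABE followed by the minor and major version (major version 52 corresponds to Java 8). The class name ClassFileHeader and its methods are made up for this illustration. It uses only the JDK and assumes that Object.class can be read as a resource, which should be the case on common JDKs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration: parses the first 8 bytes of a class file.
class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int magic, int minorVersion, int majorVersion) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Class files are big-endian: u4 magic, u2 minor_version, u2 major_version.
    static ClassFileHeader parse(byte[] classFile) {
        ByteBuffer buffer = ByteBuffer.wrap(classFile); // ByteBuffer is big-endian by default
        int magic = buffer.getInt();
        int minor = buffer.getShort() & 0xFFFF;
        int major = buffer.getShort() & 0xFFFF;
        return new ClassFileHeader(magic, minor, major);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the class file of java.lang.Object is readable as a resource.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                throw new IOException("Object.class is not readable as a resource");
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            ClassFileHeader h = parse(header);
            System.out.printf("magic=%08X major=%d minor=%d%n", h.magic, h.majorVersion, h.minorVersion);
        }
    }
}
```

Everything after these eight bytes (constant pool, fields, methods with their opcodes) is what a class transformation actually rewrites.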

Where is class transformation used?

Why would you transform a class if you could just write the Java class the way you need it? If you can accomplish your work without class transformation, then don't use class transformation. Just write the Java code accordingly. I think the most common use of class transformation is to instrument Java code at runtime. Imagine you have a big program with performance problems and there are no performance tools like VisualVM or JProfiler. What would you do to find the methods which take long to execute?
You would have to insert code into each method to measure its execution duration. If there are thousands of methods this would be quite boring work, especially since you need to remove the code again for production and probably add it again for the next analysis. With class transformation you can automate exactly this boring work. You don't write the duration measurement code in your Java files. Instead you read the existing class files and insert the needed opcodes into each method of each class.
Profilers that instrument code work like this. Java also allows already loaded classes to be modified, with some restrictions. So these tools get the binary code of the classes, rewrite the classes on a binary level, and Java loads the modified classes. Then the tools can generate analyses and pretty diagrams from the data collected by the instrumented classes.
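The JDK hook for this kind of runtime instrumentation is the java.lang.instrument package. The following is only a sketch: the class names InstrumentationSketch and CountingTransformer are invented for this example, and the transformer does not rewrite anything, it just counts the classes it is shown. A real tool would call install(...) from the premain method of a Java agent (started with -javaagent) and rewrite classfileBuffer, for example with ASM.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; only the java.lang.instrument types are real JDK API.
class InstrumentationSketch {

    static final AtomicInteger seenClasses = new AtomicInteger();

    // An agent's premain(String, Instrumentation) method would call this.
    static void install(Instrumentation instrumentation) {
        instrumentation.addTransformer(new CountingTransformer());
    }

    static final class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // The JVM hands over the bytes of every class before it is defined.
            seenClasses.incrementAndGet();
            // A profiler would return rewritten bytes here.
            // Returning null tells the JVM to keep the class unchanged.
            return null;
        }
    }
}
```

Returning null instead of a copy of the buffer is the documented way to say "no transformation performed", so this transformer is safe to register even though it does nothing useful yet.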

ASM

ASM is a great library for transforming classes. It consists of three main parts:
  • ClassReader: reads a binary class
  • ClassWriter: writes a binary class
  • ClassVisitor: transforms the binary class; the ClassReader calls the visit methods of your implementation
To create class transformations you need at least a rudimentary understanding of the Java opcodes and of how a stack-based language works.
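If "stack-based language" sounds abstract, a toy interpreter may help. This is not the JVM and not part of ASM; TinyStackMachine and its three instructions are invented for this sketch. It only shows the principle that instructions take their operands from a stack and push their result back, which is also how opcodes like iconst, imul and iadd behave.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a stack machine; all names are made up for illustration.
class TinyStackMachine {

    enum Op { PUSH, ADD, MUL }

    static final class Insn {
        final Op op;
        final int operand; // only used by PUSH

        Insn(Op op, int operand) {
            this.op = op;
            this.operand = operand;
        }
    }

    static Insn push(int value) { return new Insn(Op.PUSH, value); }
    static Insn add() { return new Insn(Op.ADD, 0); }
    static Insn mul() { return new Insn(Op.MUL, 0); }

    // Executes the instructions one after another and returns the top of the stack.
    static int run(Insn... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn insn : program) {
            switch (insn.op) {
                case PUSH:
                    stack.push(insn.operand);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalStateException("unknown instruction " + insn.op);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // 2 + 3 * 4 written in stack order: push 2, push 3, push 4, mul, add
        System.out.println(run(push(2), push(3), push(4), mul(), add()));
    }
}
```

Note that there are no parentheses and no operator precedence: the order of the instructions alone decides what is computed. The same is true for the instructions you emit with ASM.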

There are tools which show you the bytecode of any class and generate the ASM code that creates this class. So you don't need to write all the opcodes by hand. Just write a Java class which should be the result of your transformation, then look at the generated code and adapt it to your needs. In theory that sounds very easy. But I still had to read the ASM documentation and the Java opcode documentation to make even a quite simple transformation work.

Example

This example, which can be found completely on GitHub, does two things.
  • Wraps all static Logger variables
  • Logs all method calls
For more details read the code. It's heavily documented, and reading it makes more sense than anything else I could write here. Have fun! :-)

The code on GitHub, with syntax highlighting, yeah: https://github.com/rseiler/concept-class-transformation-with-asm/blob/master/src/main/java/at/rseiler/concept/Main.java


package at.rseiler.concept;

import org.objectweb.asm.*;
import org.objectweb.asm.commons.AdviceAdapter;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.lang.reflect.Method;
import java.util.logging.Logger;

import static org.objectweb.asm.Opcodes.ASM4;
import static org.objectweb.asm.Opcodes.INVOKESTATIC;

/**
 * A demo how to do byte code transformation with ASM.
 * 
 * The program will load the HelloWorld class file and manipulate the byte code:
 * 
 * 1. Wraps static {@link Logger} into the {@link LoggerWrapper#logger(Logger)}
 * 2. Adds at the beginning of each method a call to {@link MethodLogger#log(String, Object...)}
 * 
 * 1.
 * private static final Logger logger1 = Logger.getLogger(HelloWorld.class.getName());
 * will be transformed into:
 * private static final Logger logger1 = LoggerWrapper.logger(Logger.getLogger(HelloWorld.class.getName()));
 * 
 * 2.
 * public String foo(String arg) {
 * return bar("foo", arg);
 * }
 * will be transformed into:
 * public String foo(String arg) {
 * MethodLogger.log("foo", arg);
 * return bar("foo", arg);
 * }
 * 
 * 
 * You shouldn't rely on the ASM version packed into the JDK for production code!
 * If a new Java version ships, it could contain a new version of ASM (or remove ASM), which will break your code.
 * Therefore you must repackage ASM into your own namespace, to prevent version conflicts, and ship it with your library.
 * 
 * Because this is non-production code and I am lazy I didn't do it.
 * 
 * IMPORTANT: If you try to run the program on a JVM other than JDK 8 it will probably fail.
 *
 * @author reinhard.seiler@gmail.com
 */
public class Main {

    public static void main(String[] args) throws Exception {
        // creates the ASM ClassReader which will read the class file
        ClassReader classReader = new ClassReader(new FileInputStream(new File("HelloWorld.class")));
        // creates the ASM ClassWriter which will create the transformed class
        ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        // creates the ClassVisitor to do the byte code transformations
        ClassVisitor classVisitor = new MyClassVisitor(ASM4, classWriter);
        // reads the class file and apply the transformations which will be written into the ClassWriter
        classReader.accept(classVisitor, 0);

        // gets the bytes from the transformed class
        byte[] bytes = classWriter.toByteArray();
        // writes the transformed class to the file system - to analyse it (e.g. javap -verbose)
        new FileOutputStream(new File("HelloWorld$$Transformed.class")).write(bytes);

        // inject the transformed class into the current class loader
        ClassLoader classLoader = Main.class.getClassLoader();
        Method defineClass = ClassLoader.class.getDeclaredMethod("defineClass", String.class, byte[].class, int.class, int.class);
        defineClass.setAccessible(true);
        Class helloWorldClass = (Class) defineClass.invoke(classLoader, null, bytes, 0, bytes.length);

        // creates an instance of the transformed class
        Object helloWorld = helloWorldClass.newInstance();
        Method hello = helloWorldClass.getMethod("hello");
        // calls the hello method
        hello.invoke(helloWorld);
    }

    private static class MyClassVisitor extends ClassVisitor {

        public MyClassVisitor(int i, ClassVisitor classVisitor) {
            super(i, classVisitor);
        }

        public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
            if (cv == null) {
                return null;
            }

            MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
            // <clinit> defines the static block in which the assignment of static variables happens.
            // E.g. private static final Logger logger = Logger.getLogger(HelloWorld.class.getName());
            // The assignment of the logger variable happens in <clinit>.
            if ("<clinit>".equals(name)) {
                return new StaticBlockMethodVisitor(mv);
            } else {
                // all other methods (static and non-static)
                return new MethodLogger(mv, access, name, desc);
            }
        }

        class StaticBlockMethodVisitor extends MethodVisitor {
            StaticBlockMethodVisitor(MethodVisitor mv) {
                super(ASM4, mv);
            }

            @Override
            public void visitFieldInsn(int opcode, String owner, String name, String desc) {
                // checks for: putstatic // Field *:Ljava/util/logging/Logger;
                if ("Ljava/util/logging/Logger;".equals(desc)) {
                    // adds before the putstatic opcode the call to LoggerWrapper#logger(Logger) to wrap the logger instance
                    super.visitMethodInsn(INVOKESTATIC, "at/rseiler/concept/LoggerWrapper", "logger", "(Ljava/util/logging/Logger;)Ljava/util/logging/Logger;");
                }
                // do the default behaviour: add the putstatic opcode to the byte code
                super.visitFieldInsn(opcode, owner, name, desc);
            }
        }

        class MethodLogger extends AdviceAdapter {

            private final int access;
            private final String name;
            private final String desc;

            protected MethodLogger(MethodVisitor mv, int access, String name, String desc) {
                super(ASM4, mv, access, name, desc);
                this.access = access;
                this.name = name;
                this.desc = desc;
            }

            @Override
            protected void onMethodEnter() {
                // checks if the method is static.
                // For an instance method "this" is stored in local variable 0 (aload_0) and the arguments in local variables 1, 2, ...
                // But there is no "this" for a static method call. Therefore the arguments are stored in local variables 0, 1, ...
                // If we want to access the arguments we need to differentiate between static and non-static method calls.
                boolean isStatic = (access & ACC_STATIC) > 0;

                int length = Type.getArgumentTypes(desc).length;

                // pushes the method name on the stack
                super.visitLdcInsn(name);
                // pushes the count of arguments on the stack
                // could be optimized if we would use iconst_0, iconst_1, ..., iconst_5 for 0 to 5.
                super.visitIntInsn(BIPUSH, length);
                // creates an object array with the count of arguments
                super.visitTypeInsn(ANEWARRAY, "java/lang/Object");

                // stores the arguments in the array
                for (int i = 0; i < length; i++) {
                    // duplicates the reference to the array. Because the AASTORE opcode consumes the stack element with the reference to the array.
                    super.visitInsn(DUP);
                    // could be optimized
                    super.visitIntInsn(BIPUSH, i);
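                    // puts the value of the current argument on the stack
                    // (ALOAD only works for reference-type arguments; primitive arguments would need
                    // ILOAD, LLOAD, ... plus boxing, and long/double arguments occupy two local variable slots)
                    super.visitVarInsn(ALOAD, i + (isStatic ? 0 : 1));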
                    // stores the value of the current argument in the array
                    super.visitInsn(AASTORE);
                }

                // calls the MethodLogger#log(String, Object...) method with the corresponding arguments - which we created just before
                super.visitMethodInsn(INVOKESTATIC, "at/rseiler/concept/MethodLogger", "log", "(Ljava/lang/String;[Ljava/lang/Object;)V");
            }
        }

    }

}
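The transformed bytecode calls LoggerWrapper.logger(Logger) and MethodLogger.log(String, Object...). The real classes are in the GitHub repository. The following is only a guess at what such helpers could look like, so that the injected calls have something to land on. The method signatures match the descriptors used in the visitMethodInsn calls; the holder class TransformationTargetsSketch, the describe(...) helper and all method bodies are assumptions.

```java
import java.util.Arrays;
import java.util.logging.Logger;

// Hypothetical stand-ins for the helper classes the transformed code calls.
class TransformationTargetsSketch {

    static final class LoggerWrapper {
        // Descriptor: (Ljava/util/logging/Logger;)Ljava/util/logging/Logger;
        // Assumption: simply reports the wrapping and returns the original logger.
        static Logger logger(Logger original) {
            System.out.println("wrapping logger " + original.getName());
            return original;
        }
    }

    static final class MethodLogger {
        // Builds a readable description such as foo[arg].
        static String describe(String methodName, Object... args) {
            return methodName + Arrays.toString(args);
        }

        // Descriptor: (Ljava/lang/String;[Ljava/lang/Object;)V
        static void log(String methodName, Object... args) {
            System.out.println("called " + describe(methodName, args));
        }
    }
}
```

Because log takes varargs, the injected bytecode has to build the Object[] itself, which is exactly what the ANEWARRAY, DUP and AASTORE instructions in onMethodEnter() do.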

Sunday, 29 June 2014

Explanation how CGLIB proxies work


In my last blog entry I explained how proxy based mock frameworks work and created my own demo mock framework based on CGLIB. Now I will explain exactly how the CGLIB proxy works, because then you will understand why these things don't work with a proxy:
  • intercept static method calls
  • intercept private method calls
  • intercept final method calls
  • build a proxy for a final class


ASM

CGLIB is based on ASM. ASM is a library for creating class files on the fly, at bytecode level. So it's very low level and you need a good understanding of bytecode to use it. But it abstracts the bytecode, and generating bytecode with ASM is much more comfortable than writing the bytes yourself, because it takes care of the common structures like classes, constructors, method definitions and so on. One of the most helpful features is that it takes care of the constant pool. The constant pool holds many elements and you need to adapt it whenever you change anything. E.g. if a new method is added then you have to add the needed elements to the constant pool and increase the constant pool size. The labels in ASM are quite nice, too, and so are the constants for the opcodes. After the complete class is generated, you have the bytecode of the class. This bytecode is passed to the ClassLoader, which loads the class, and then the class can be used.


What the generated CGLIB proxy looks like

Now you know how the generation of the proxy works. The question is what the generated proxy looks like. Actually it's quite simple. The proxy class extends the original class. All methods which should be intercepted by the proxy are overridden. These methods call the MethodInterceptor::intercept method with the corresponding parameters. If the proxy callback is null then the super method is called. Here is the code snippet for a simple echo(String) method:
public class at.rseiler.concept.mock.Foo$$EnhancerByCGLIB$$c0699eac
  extends at.rseiler.concept.mock.Foo
  implements org.mockito.cglib.proxy.Factory
{
  private static final org.mockito.cglib.proxy.MethodProxy CGLIB$echo$1$Proxy;
  private static final java.lang.reflect.Method CGLIB$echo$1$Method;
  // ...
  private org.mockito.cglib.proxy.MethodInterceptor CGLIB$CALLBACK_0;
  // ...

 
  public final String echo(String arg0) {
    if (CGLIB$CALLBACK_0 == null) {
      CGLIB$BIND_CALLBACKS(this);
    }

    if (CGLIB$CALLBACK_0 != null) {
      // MethodInterceptor::intercept(Object obj, Method method, Object[] args, MethodProxy proxy)
      Object obj = CGLIB$CALLBACK_0.intercept(this, CGLIB$echo$1$Method, new Object[] {arg0}, CGLIB$echo$1$Proxy);

      // corresponds to the checkcast opcode: throws a ClassCastException if obj is not a String
      return (String) obj;
    }

    return super.echo(arg0);
  }
  private static final void CGLIB$BIND_CALLBACKS(java.lang.Object object) {
    /* compiled code */
  }

  // ...

}
Now you should understand the limits of the proxy (see above). The limits exist because the proxy is a subclass of the original class. Therefore it's not possible to extend a final class or override final methods, and so on. Those are just basic Java limitations.
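The subclass idea can be imitated by hand without any code generation, which makes the limits easy to see. In this sketch the names HandWrittenProxyDemo, Interceptor and FooProxy are invented, and the Interceptor interface is a simplified stand-in for CGLIB's MethodInterceptor, not its real signature.

```java
// Hand-written imitation of a subclass-based proxy; all names are hypothetical.
class HandWrittenProxyDemo {

    // Simplified stand-in for a method interceptor callback.
    interface Interceptor {
        Object intercept(Object self, String methodName, Object[] args);
    }

    static class Foo {
        public String echo(String text) {
            return text;
        }

        public final String finalEcho(String text) {
            return text;
        }
    }

    static class FooProxy extends Foo {
        private final Interceptor callback;

        FooProxy(Interceptor callback) {
            this.callback = callback;
        }

        @Override
        public String echo(String text) {
            if (callback != null) {
                return (String) callback.intercept(this, "echo", new Object[]{text});
            }
            return super.echo(text);
        }

        // finalEcho(String) cannot be overridden here: the compiler rejects it.
        // Private and static methods are not dispatched virtually, so a subclass
        // cannot intercept calls to them either.
    }

    public static void main(String[] args) {
        Foo foo = new FooProxy((self, method, arguments) -> "intercepted " + method);
        System.out.println(foo.echo("hello"));      // goes through the interceptor
        System.out.println(foo.finalEcho("hello")); // runs the original method
    }
}
```

A generated proxy is in the same position as FooProxy: whatever the Java language forbids a subclass to do, the proxy cannot do either.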
With code generation it's not possible to work around these limits. Sure, it would be possible to generate a proxy class without any relationship to the original class. Then it would be possible to copy the whole class and insert the MethodInterceptor::intercept calls. This new class could remove all final keywords and make private methods public. The problem is that you can't call any method of this class without reflection, because you don't have the type information of the "proxy" class at compile time, since the class is generated at runtime. Further, you can't pass the "proxy" object to any constructor or method that expects the original type, because it's not a subclass.
The only way to break these limits is to do bytecode transformations of the original class before the ClassLoader loads the class: before the class is loaded, replace all private keywords with public keywords and remove all final keywords. To do so you need to create your own ClassLoader which performs the bytecode transformation. I don't know the code of PowerMock, but I assume it works roughly like this.
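A ClassLoader that transforms bytecode before defining a class could be sketched as follows. This is not PowerMock's implementation. TransformingClassLoader is a made-up name, it handles exactly one target class, and the transformation is passed in as a function from bytes to bytes. In a real tool that function would be where a library such as ASM strips the final flags.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.UnaryOperator;

// Hypothetical class loader: rewrites the bytes of one class before defining it.
class TransformingClassLoader extends ClassLoader {

    private final String targetClassName;
    private final UnaryOperator<byte[]> transformer;

    TransformingClassLoader(ClassLoader parent, String targetClassName, UnaryOperator<byte[]> transformer) {
        super(parent);
        this.targetClassName = targetClassName;
        this.transformer = transformer;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(targetClassName)) {
            // Everything else is loaded the normal way (parent first).
            return super.loadClass(name, resolve);
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                // Do NOT ask the parent for the target class: define it here from transformed bytes.
                byte[] transformed = transformer.apply(readClassBytes(name));
                loaded = defineClass(name, transformed, 0, transformed.length);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    // Reads the original class file through the normal resource lookup.
    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

One consequence of this design: the class defined by this loader is a different Class object than the one the parent loader would define, so it can only be used via reflection or through types that both loaders share.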
Finally, here is the bytecode of the echo(String) method, disassembled with javap -verbose, from which I created the Java code above:
public final java.lang.String echo(java.lang.String);
flags: ACC_PUBLIC, ACC_FINAL

Code:
stack=7, locals=2, args_size=2
0: aload_0
1: getfield      #37                 // Field CGLIB$CALLBACK_0:Lorg/mockito/cglib/proxy/MethodInterceptor;
4: dup
5: ifnonnull     17
8: pop
9: aload_0
10: invokestatic  #41                 // Method CGLIB$BIND_CALLBACKS:(Ljava/lang/Object;)V
13: aload_0
14: getfield      #37                 // Field CGLIB$CALLBACK_0:Lorg/mockito/cglib/proxy/MethodInterceptor;
17: dup
18: ifnull        45
21: aload_0
22: getstatic     #64                 // Field CGLIB$echo$1$Method:Ljava/lang/reflect/Method;
25: iconst_1
26: anewarray     #66                 // class java/lang/Object
29: dup
30: iconst_0
31: aload_1
32: aastore
33: getstatic     #68                 // Field CGLIB$echo$1$Proxy:Lorg/mockito/cglib/proxy/MethodProxy;
36: invokeinterface #53,  5           // InterfaceMethod org/mockito/cglib/proxy/MethodInterceptor.intercept:(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;Lorg/mockito/cglib/proxy/MethodProxy;)Ljava/lang/Object;
41: checkcast     #55                 // class java/lang/String
44: areturn
45: aload_0
46: aload_1
47: invokespecial #62                 // Method at/rseiler/concept/mock/Foo.echo:(Ljava/lang/String;)Ljava/lang/String;
50: areturn

Friday, 20 June 2014

Explanation how proxy based Mock Frameworks work


Have you ever wondered how these Mockito lines work?
Foo foo = Mockito.mock(Foo.class);
Mockito.when(foo.echo("foo")).thenReturn("foo");
Yes, or you are interested now? Great, then you should read this article. Otherwise you are probably lost and I can't help you. Sorry.

The first important thing to know is that there are two types of frameworks.
  1. the proxy based mock frameworks: Mockito, EasyMock, jMock, ...
  2. the mock frameworks based on bytecode manipulation: PowerMock, ...
There is a big difference between these two concepts.
  1. proxy based frameworks are much easier to implement, but they are more restricted in the features they can support.
  2. bytecode manipulation: the name should tell you everything you need to know about it: it's based on "very dark magic". It can break on major Java releases. So be careful if you start to use such frameworks, because they could prevent you from upgrading your Java version. PowerMock, for example, builds on top of Javassist, a framework which makes bytecode manipulation simpler.
In this article I will explain only how proxy based mock frameworks like Mockito work, because this kind of mock framework is quite easy to understand. Knowing how they work and where their limits are will probably help you to use those frameworks. So you will never try to do anything which is technically impossible.

What is a Proxy?

A proxy is just an object which will be used instead of the original object. If a method of the proxy object is called then the proxy object can decide what it will do with this call:
  • delegate it to the original object
  • handle the call itself
Proxies can be used to implement some kind of permission system. The proxy checks if the user is allowed to call the method and if the user doesn't have the permission then it throws an exception.
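Such a permission check could be sketched with java.lang.reflect.Proxy like this. The names PermissionProxyDemo, Document, InMemoryDocument and restrict are invented for the example, and "permission" is reduced to a set of allowed method names, which is an assumption made to keep the sketch short.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;

// Hypothetical example of a proxy that either delegates a call or rejects it.
class PermissionProxyDemo {

    interface Document {
        String read();

        void delete();
    }

    static final class InMemoryDocument implements Document {
        boolean deleted;

        @Override
        public String read() {
            return "content";
        }

        @Override
        public void delete() {
            deleted = true;
        }
    }

    // Wraps the target in a proxy which only lets the allowed methods through.
    static Document restrict(Document target, Set<String> allowedMethods) {
        return (Document) Proxy.newProxyInstance(
                Document.class.getClassLoader(),
                new Class<?>[]{Document.class},
                (proxy, method, args) -> {
                    if (!allowedMethods.contains(method.getName())) {
                        // the proxy handles the call itself
                        throw new SecurityException("not allowed: " + method.getName());
                    }
                    try {
                        // the proxy delegates the call to the original object
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        // rethrow what the original method threw
                        throw e.getCause();
                    }
                });
    }
}
```

Usage: restrict(document, Collections.singleton("read")) returns a Document on which read() works and delete() throws a SecurityException. Note that toString(), hashCode() and equals() on the proxy also go through the handler, so in this sketch they are rejected unless they are in the allowed set.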

A proxy doesn't require an instance of an interface/class if the proxy handles all method invocations itself.  
Mockito.mock(Foo.class) is now easily explained. This code just creates a proxy object for the Foo class.

Limits of a Proxy

There are a few important restrictions to the proxies. It's not possible to:
  • intercept static method calls
  • intercept private method calls
  • intercept final method calls
  • build a proxy for a final class
If you want to understand these limitations then read my other blog entry: Explanation how CGLIB proxies work
Another restriction is that you always have to create the proxy explicitly. So it's not possible to say that all Foo instances created with new Foo() should automatically be wrapped into a proxy object. With PowerMock such things are possible. 

How to create a Proxy

If you look into the Java API you will find the java.lang.reflect.Proxy class. 
/**
 * Returns an instance of a proxy class for the specified interfaces
 * that dispatches method invocations to the specified invocation
 * handler.
 *
 * @param   loader the class loader to define the proxy class
 * @param   interfaces the list of interfaces for the proxy class to implement
 * @param   h the invocation handler to dispatch method invocations to
 */
Proxy.newProxyInstance(ClassLoader loader, Class<?>[] interfaces, InvocationHandler h)
It's quite simple to use. To showcase the usage of this proxy we just build our own simple mock framework - you will find the complete source code at the end of the article. We create the static mock method which returns the proxy object.
public class Mock {
  public static <T> T mock(Class<T> clazz) {
    MockInvocationHandler invocationHandler = new MockInvocationHandler();
    T proxy = (T) Proxy.newProxyInstance(Mock.class.getClassLoader(), new Class[]{clazz}, invocationHandler);
    return proxy;
  }
}


This creates a proxy object for the clazz and redirects all calls to the MockInvocationHandler which looks like this:

private static class MockInvocationHandler implements InvocationHandler {
  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    return null;
  }
}
So each call to a method of the proxy object will return null. At this stage we have a proxy object on which we can call methods. 

How to dynamically define the Proxy?

Now the question is how to set up the proxy to return something other than just null. We need to do something like: Mock.when(foo.echo("foo")).thenReturn("foo")

But how does this work? To understand it you have to analyze this one line of code very carefully and think about what it actually does. 

What does the when() method actually get?

In our example it just gets null. In the first step we created the foo object, which is a proxy that always returns null. The when() method doesn't get the method echo() with a parameter. foo.echo("foo") is an ordinary method call: a call to the proxy which returns null, and that null value is passed to the when() method. That's it. 

How does the thenReturn() method work?

You have learned that the when() method just got null. So how can the thenReturn() call work? Think a few minutes about it. It's nothing special. It can be called at most "a little trick".

The solution is simple: with static variables in which the state is stored.
  • in the MockInvocationHandler we store the method and the arguments of the last call
  • in the Mock class - Mock.when(foo.echo("foo")).thenReturn("foo") - we store the reference to the MockInvocationHandler which was called last
These two steps happen in the invoke() method of the proxy object, in the MockInvocationHandler. The when() method doesn't have any logic. When thenReturn() is called we store the return value for the remembered MockInvocationHandler together with its last method and arguments. If the proxy is called again then it will return the stored return value (if the same method gets called with the same arguments).

Basically that's it

I hope you could follow so far. I will summarize it again in other words:
  • create a proxy
  • if a proxy method is called then remember which method was called. This proxy method call is normally located inside the Mock.when() call, even though it has no relationship to the when() method.
  • if thenReturn(value) is called, store the "value" for the remembered method.
  • the proxy returns the "value" if the method is called again with the same arguments.
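The four steps above can be put together in a few lines. This is a condensed sketch and not the code from the repository linked at the end. The names MiniMock, RecordingHandler, Stubbing, Stub and Echo are made up, it only supports interfaces, and it is not thread safe because the "last called" state lives in a static variable.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini mock framework illustrating the static-state trick.
class MiniMock {

    // static state: the handler of the proxy that was called last
    private static RecordingHandler lastCalledHandler;

    @SuppressWarnings("unchecked")
    static <T> T mock(Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, new RecordingHandler());
    }

    // The argument is only the return value of the proxy call; it is ignored.
    // Assumes that a proxy method was called right before (otherwise there is nothing to stub).
    static <T> Stubbing<T> when(T ignoredResult) {
        return new Stubbing<>(lastCalledHandler);
    }

    static final class Stubbing<T> {
        private final RecordingHandler handler;

        private Stubbing(RecordingHandler handler) {
            this.handler = handler;
        }

        void thenReturn(T value) {
            handler.stubLastCall(value);
        }
    }

    private static final class Stub {
        final Method method;
        final Object[] args;
        final Object value;

        Stub(Method method, Object[] args, Object value) {
            this.method = method;
            this.args = args;
            this.value = value;
        }
    }

    private static final class RecordingHandler implements InvocationHandler {
        private final List<Stub> stubs = new ArrayList<>();
        private Method lastMethod;
        private Object[] lastArgs;

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            // remember who was called, and with what
            lastCalledHandler = this;
            lastMethod = method;
            lastArgs = args;
            for (Stub stub : stubs) {
                if (stub.method.equals(method) && Arrays.equals(stub.args, args)) {
                    return stub.value;
                }
            }
            return null;
        }

        void stubLastCall(Object value) {
            stubs.add(new Stub(lastMethod, lastArgs, value));
        }
    }

    // Example interface to mock.
    interface Echo {
        String echo(String text);
    }
}
```

Usage mirrors the Mockito lines from the beginning: Echo foo = MiniMock.mock(Echo.class); MiniMock.when(foo.echo("foo")).thenReturn("foo"); afterwards foo.echo("foo") returns "foo", while any other argument still returns null. Returning null from the handler only works for methods with a reference return type; a method returning a primitive would fail with a NullPointerException in this sketch.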

The genius behind this is

The very simple API, which makes the whole thing look very nice. Another great decision is that the when() method uses generics so that the thenReturn() method is type safe.

But what about classes?

The current solution only works for interfaces, because java.lang.reflect.Proxy only supports interfaces and not classes. So we need another mechanism to create the proxy. We have to dig a little bit deeper and finally come to CGLIB, the Code Generation Library. So we are back to magic. But it's by far not as dangerous as bytecode manipulation. We just use bytecode generation to create the proxy, which shouldn't fail. In fact many tools use CGLIB (e.g. Spring, Hibernate).

Creating a proxy with CGLIB isn't any more difficult than with Java's Proxy class.
public static <T> T mock(Class<T> clazz) {
  Enhancer enhancer = new Enhancer();
  enhancer.setSuperclass(clazz);
  enhancer.setCallback(new MockMethodInterceptor());
  return (T) enhancer.create();
}

private static class MockMethodInterceptor implements MethodInterceptor {
  @Override
  public Object intercept(Object obj, Method method, Object[] args, MethodProxy proxy) throws Throwable {
    return null;
  }
}
That's it. Everything else stays the same.
With this kind of proxy you can easily build something like
Foo foo = new Foo();
foo = Mockito.spy(foo);
Mockito.when(foo.echo("foo")).thenReturn("bar");
You can create a proxy for a real object. All calls will be delegated to the real object, except if the method call is redefined. With CGLIB this is very easy to build. If you want to know more just take a look at the source code. 

Source Code

A simple mock framework to demonstrate how the proxy based mock frameworks work. With two implementations:
  • based on the java.lang.reflect.Proxy.
  • based on CGLIB. The CGLIB mock also implements the spy method.
Check out the code from https://github.com/rseiler/concept-of-proxy-based-mock-frameworks