Classworking toolkit: Generics with ASM
Find out how to access generic type information from Java 5 code using the ASM bytecode framework
Level: Introductory
Dennis Sosnoski (dms@sosnoski.com), Java and XML consultant, Sosnoski Software Solutions Inc.
07 Feb 2006
Java™ 5 generics provide information that's useful for many classworking applications. Although Java reflection can be used to get generics information for loaded classes, the requirement that classes be loaded into the JVM can be a major drawback. In this article, classworking guru Dennis Sosnoski shows how the ASM Java bytecode manipulation framework offers flexible access to generics information without going through the Java classloading process. Along the way, he looks deeper into the representation of generics in the binary class format.
Generics information from Java 5 programs can be very helpful in understanding program data structures. Last time, I showed how you can use runtime reflection to access generics information. This reflection approach works well if you're only interested in getting information from classes you're loading into the JVM. However, sometimes you may want to modify classes before loading them, or you might want to just investigate data structures without loading the classes at all. In these cases, reflection won't work for you -- reflection uses the JVM's class structures as the source of information, so it can only work with classes that have been loaded by the JVM.
To access generics information without loading classes into the JVM, you need a way of reading the generics information stored inside the binary class representation. In some prior articles, I've shown how the ASM classworking library provides a very clean interface for reading and writing binary classes. In this article, I'll show how you can use ASM both to retrieve the raw generics information out of class files and to interpret the generics in a useful manner. Before digging into the ASM details, I'll start off with a look at how generics information is actually encoded into the binary classes.
The designers of the generics specification needed to add typing information to Java binary classes that could be used by the Java compiler. Fortunately, the Java platform already had a mechanism built into the binary class format that could be used for this purpose. This mechanism is the attribute structure, which basically allows all kinds of information to be associated with a class itself or with the methods, fields, and other components of a class. Certain kinds of attribute information are defined by the JVM specification, but the original designers of the Java language made the wise choice to leave the set of possible attributes open for extension both by later versions of the specification and by users designing their own custom attributes.
Generics information is stored in a new standard attribute: the Signature attribute. This attribute is a simple text value that encodes the generics information for a class, field, or method (local variables use the same text format, but their signatures are recorded in the separate LocalVariableTypeTable attribute). The updated Java 5 JVM specification (see Resources for a link to the Java 5 changes page) spells out the full syntax of the signature text values. I'm not going to try to cover all that here, but I'll run through a quick introduction to signatures later in this section. First though, I'll give some necessary background with a look at the internal form of class names and the field and method descriptors used by the JVM.
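To make the attribute mechanism concrete, here is a minimal sketch that walks the class file layout with plain JDK streams and collects every Signature attribute it meets. The SignatureDump class and its method names are hypothetical, invented for this illustration. The sketch skips everything except the constant pool text entries and the attribute tables, and it is no substitute for a real class parser such as ASM.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of ASM): lists the Signature attributes in a class file. */
final class SignatureDump {

    /** Returns one "owner -> signature" entry per Signature attribute found. */
    static List<String> dump(InputStream classFile) {
        try (DataInputStream in = new DataInputStream(classFile)) {
            List<String> found = new ArrayList<>();
            skip(in, 8);                                  // magic number, minor and major version
            String[] utf8 = readConstantPool(in);
            skip(in, 6);                                  // access flags, this_class, super_class
            skip(in, 2 * in.readUnsignedShort());         // superinterface indexes
            readMembers(in, utf8, "field", found);
            readMembers(in, utf8, "method", found);
            readAttributes(in, utf8, "class", found);
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int count) throws IOException {
        in.readFully(new byte[count]);
    }

    /** Keeps only the Utf8 entries; every other constant is skipped by size. */
    private static String[] readConstantPool(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        String[] utf8 = new String[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;    // text, including attribute names
                case 5: case 6: skip(in, 8); i++; break;  // long and double take two slots
                case 15: skip(in, 3); break;
                case 7: case 8: case 16: case 19: case 20: skip(in, 2); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4); break;
                default: throw new IOException("Unknown constant pool tag " + tag);
            }
        }
        return utf8;
    }

    /** Fields and methods share one layout: flags, name, descriptor, attributes. */
    private static void readMembers(DataInputStream in, String[] utf8, String kind,
                                    List<String> found) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            skip(in, 2);                                  // access flags
            String name = utf8[in.readUnsignedShort()];
            skip(in, 2);                                  // descriptor index
            readAttributes(in, utf8, kind + " " + name, found);
        }
    }

    private static void readAttributes(DataInputStream in, String[] utf8, String owner,
                                       List<String> found) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            String attributeName = utf8[in.readUnsignedShort()];
            int length = in.readInt();
            if ("Signature".equals(attributeName)) {
                found.add(owner + " -> " + utf8[in.readUnsignedShort()]);
            } else {
                skip(in, length);                         // an attribute this sketch ignores
            }
        }
    }

    public static void main(String[] args) {
        // Reads the bytes of a JDK class file as a resource and prints its signatures.
        InputStream in = Object.class.getResourceAsStream("/java/util/ArrayList.class");
        dump(in).forEach(System.out::println);
    }
}
```

Members without a Signature attribute simply produce no entry, which matches the rule that only generic declarations carry one.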
Classes in the Java platform are always from some package. When you reference a class name in Java source code, you may or may not actually include the package qualification as part of the name. You're always allowed to include the package qualification (as in java.lang.String), but you can drop it as a convenience if the class is from the java.lang package or has been imported into the source file. The form of the class name that includes the package qualification is called the "fully qualified" class name.
Inside the actual binary class, class names are always specified with a package. The format of the names is a little different from the fully qualified class names used in Java source code, though, using forward slashes ('/') in place of periods ('.'). For example, in the case of the String class, the internal form of the name is java/lang/String. If you try to print or view a class file as text, you'll generally see many strings of this type, each a reference to some class.
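The mapping between the two name forms is a plain character substitution, as this small sketch shows. InternalNames is a hypothetical helper name chosen for the example. The input is assumed to be a name in the form returned by Class.getName() for a non-array class.

```java
/** Hypothetical helper: converts between source-style and internal class names. */
final class InternalNames {

    /** java.lang.String becomes java/lang/String. */
    static String toInternal(String qualifiedName) {
        return qualifiedName.replace('.', '/');
    }

    /** java/lang/String becomes java.lang.String. */
    static String toQualified(String internalName) {
        return internalName.replace('/', '.');
    }

    public static void main(String[] args) {
        System.out.println(toInternal(String.class.getName()));
        System.out.println(toQualified("java/lang/String"));
    }
}
```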
Class references in this internal form are used as part of field and method descriptors. A field descriptor specifies the exact type of a field defined within a class. The representation used depends on whether the field is a simple object type, a simple primitive type, or an array type. For simple object types, the representation uses a leading 'L', followed by the internal form of the object class name, and terminated by a trailing ';'. For primitive types, the representation uses a single letter code for each type (such as 'I' for an int and 'Z' for a boolean). For array types, the representation adds a leading '[' as a modifier to the field descriptor for the array item type (which can itself be an array type). Table 1 gives some samples for each type of field descriptor, along with the equivalent Java source code declaration:
Table 1. Field descriptor examples
Descriptor | Source Code |
Ljava/lang/String; | String |
I | int |
[Ljava/lang/Object; | Object[] |
[Z | boolean[] |
[[Lcom/sosnoski/generics/FileInfo; | com.sosnoski.generics.FileInfo[][] |
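The rules behind Table 1 are simple enough to decode by hand. The following sketch, built around a hypothetical FieldDescriptors helper, turns a field descriptor back into source notation. It always prints the fully qualified class name, so it yields java.lang.String where the table shows the shorter String.

```java
/** Hypothetical helper: turns a field descriptor into Java source notation. */
final class FieldDescriptors {

    static String toSource(String descriptor) {
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;                                   // one '[' per array dimension
        }
        StringBuilder source = new StringBuilder(elementName(descriptor.substring(dims)));
        for (int i = 0; i < dims; i++) {
            source.append("[]");
        }
        return source.toString();
    }

    /** Decodes the part of the descriptor that follows any array markers. */
    private static String elementName(String element) {
        if (element.startsWith("L") && element.endsWith(";")) {
            // Object type: strip the 'L' and ';', then switch to source-style separators.
            return element.substring(1, element.length() - 1).replace('/', '.');
        }
        switch (element) {
            case "B": return "byte";
            case "C": return "char";
            case "D": return "double";
            case "F": return "float";
            case "I": return "int";
            case "J": return "long";
            case "S": return "short";
            case "Z": return "boolean";
            default: throw new IllegalArgumentException("Bad descriptor element: " + element);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSource("[Ljava/lang/Object;"));
        System.out.println(toSource("[[Lcom/sosnoski/generics/FileInfo;"));
    }
}
```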
A method descriptor just combines field descriptors to specify the parameter types and return type of a method. The format for a method descriptor is easy to understand. It always starts with an open parenthesis, followed by the field descriptors for the parameters (all run together), followed by a close parenthesis, and ends with the return type (or 'V' if the return type is void). Table 2 gives a few examples of method descriptors, along with the equivalent Java source code declaration (note that the method names and parameter names are not part of the method descriptor, so I've just used placeholder names for these):
Table 2. Method descriptor examples
Descriptor | Source Code |
(Ljava/lang/String;)I | int mmm(String x) |
(ILjava/lang/String;)V | void mmm(int x, String y) |
(I)Ljava/lang/String; | String mmm(int x) |
(Ljava/lang/String;)[C | char[] mmm(String x) |
(ILjava/lang/String;[[Lcom/sosnoski/generics/FileInfo;)V | void mmm(int x, String y, FileInfo[][] z) |
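Because the parameter descriptors are run together with no separators, a reader has to find the end of each one from its first characters. Here is a minimal sketch of that splitting step. MethodDescriptors is a hypothetical helper, and it assumes well-formed input beyond one basic check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a method descriptor into its parts. */
final class MethodDescriptors {

    /** Splits a run of field descriptors such as "ILjava/lang/String;". */
    static List<String> split(String descriptors) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < descriptors.length()) {
            int end = pos;
            while (descriptors.charAt(end) == '[') {
                end++;                                // array markers belong to the item type
            }
            if (descriptors.charAt(end) == 'L') {
                end = descriptors.indexOf(';', end);  // object types run to the next ';'
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated class name: " + descriptors);
                }
            }
            parts.add(descriptors.substring(pos, end + 1));
            pos = end + 1;
        }
        return parts;
    }

    /** Field descriptors between the parentheses. */
    static List<String> parameters(String methodDescriptor) {
        return split(methodDescriptor.substring(1, methodDescriptor.indexOf(')')));
    }

    /** Everything after the close parenthesis, "V" for void. */
    static String returnType(String methodDescriptor) {
        return methodDescriptor.substring(methodDescriptor.indexOf(')') + 1);
    }

    public static void main(String[] args) {
        String descriptor = "(ILjava/lang/String;[[Lcom/sosnoski/generics/FileInfo;)V";
        System.out.println(parameters(descriptor) + " returns " + returnType(descriptor));
    }
}
```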
Now that you've seen field and method descriptors, you're ready to hear about signatures. The signature format extends the idea of field and method descriptors to include generic type information. Unfortunately, the complexity of generic types (including all the possible variations of bounds and such) means that signatures cannot be described as simply as the descriptors. The grammar for signatures (supplied in the JVM specification changes for Java 1.5, chapter 4) includes 21 separate productions. Rather than go through the whole set, I'll just provide some examples for now, which I'll expand on in the next section.
Listing 1 shows portions of the source code from one of the data structure classes used in the last article, along with the corresponding signature strings. In this case, the class is not itself a parameterized type, but the fields and methods use parameterized java.util.List types:
Listing 1. Simple signature example
|
Because the class is not a parameterized type, no signature is added to the binary class representation for the class itself. However, signatures are present for both the fields and methods that use parameterized types. The m_files field signature identifies it as a List of type FileInfo, while the m_directories signature says it's a List of type DirInfo. Likewise, the getDirectories() method signature says it returns a List of type DirInfo, while the getFiles() signature says it returns a List of type FileInfo.
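Reading a field signature as "a List of type FileInfo" can be mechanized with a small recursive decoder. The sketch below is a simplified illustration rather than a full implementation of the signature grammar, and SignatureText is a hypothetical name. It handles class types with type arguments, wildcards, type variables, arrays, and primitive codes, and it is tried on the m_files signature Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;.

```java
/** Hypothetical helper: renders a field signature in Java source notation. */
final class SignatureText {
    private final String sig;
    private int pos;

    private SignatureText(String sig) {
        this.sig = sig;
    }

    static String toSource(String fieldSignature) {
        return new SignatureText(fieldSignature).type();
    }

    /** Decodes one type starting at the current position. */
    private String type() {
        char c = sig.charAt(pos++);
        switch (c) {
            case '[': return type() + "[]";
            case 'L': return classType();
            case 'T': {                               // type variable: name up to ';'
                int end = sig.indexOf(';', pos);
                String name = sig.substring(pos, end);
                pos = end + 1;
                return name;
            }
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            default: throw new IllegalArgumentException("Unexpected '" + c + "' in " + sig);
        }
    }

    /** Copies the class name, expanding any type argument lists on the way. */
    private String classType() {
        StringBuilder out = new StringBuilder();
        while (true) {
            char c = sig.charAt(pos++);
            if (c == ';') {
                return out.toString();
            } else if (c == '/') {
                out.append('.');
            } else if (c == '<') {
                out.append('<');
                arguments(out);
                out.append('>');
            } else {
                out.append(c);
            }
        }
    }

    private void arguments(StringBuilder out) {
        boolean first = true;
        while (sig.charAt(pos) != '>') {
            if (!first) {
                out.append(", ");
            }
            first = false;
            char c = sig.charAt(pos);
            if (c == '*') {
                pos++;
                out.append('?');                      // unbounded wildcard
            } else if (c == '+') {
                pos++;
                out.append("? extends ").append(type());
            } else if (c == '-') {
                pos++;
                out.append("? super ").append(type());
            } else {
                out.append(type());
            }
        }
        pos++;                                        // consume the closing '>'
    }

    public static void main(String[] args) {
        System.out.println(toSource("Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;"));
    }
}
```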
Looks easy so far, doesn't it? Now check out Listing 2, which gives a simple parameterized class definition and the corresponding signature strings:
Listing 2. Parameterized class signature example
|
Because the Listing 2 class is a parameterized type, the class signature needs to be present in the binary class representation. The text of the signature is long compared with the source code, but not too difficult to understand when you realize that all the optional components of a type parameter that are left out in the source code are included in the signature. The first part of the signature (within the angle brackets '<...>') is just the list of type parameter definitions for the class. These each take the form of a type parameter name followed by the field descriptors for the class bound and interface bounds (if any) of the type. Each field descriptor is preceded by a ':' character. Because the Listing 2 source code doesn't specify any bounds for the class type parameters, the only bound present for each is the default class bound of java.lang.Object.
The second part of the class signature (following the closing angle bracket) gives the superclass and superinterface (if any) signatures. In the Listing 2 case, no superclass is specified, so the signature gives the superclass as just java.lang.Object. A superinterface is specified, as Iterable<T>. It shows up in the signature pretty much as you'd expect to see, except that where the source code has just "<T>," the signature uses "<TT;>." The reason is that the signature needs to distinguish between class names and type variable names; the leading "T" identifies what follows as a type variable name, while the trailing ';' just marks the end of the name.
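The two-part layout of a class signature can be pulled apart with a short scanner. In the sketch below, ClassSignatures is a hypothetical helper, and the sample signature in main() is an assumption modeled on the description above (two unbounded type parameters named T and U, an implicit Object superclass, and Iterable<T> as the superinterface), not text copied from Listing 2. Malformed input is not handled gracefully.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a class signature into type parameters and supertypes. */
final class ClassSignatures {

    /** Returns the index just past the type signature that starts at pos. */
    static int endOfType(String sig, int pos) {
        while (sig.charAt(pos) == '[') {
            pos++;                                    // array dimensions
        }
        char kind = sig.charAt(pos);
        if (kind != 'L' && kind != 'T') {
            return pos + 1;                           // single-letter primitive code
        }
        int depth = 0;
        for (int i = pos; ; i++) {
            char c = sig.charAt(i);
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            } else if (c == ';' && depth == 0) {
                return i + 1;                         // the ';' that closes this type
            }
        }
    }

    /** Names declared inside the leading angle brackets, if any. */
    static List<String> typeParameters(String classSig) {
        List<String> names = new ArrayList<>();
        if (classSig.charAt(0) != '<') {
            return names;
        }
        int pos = 1;
        while (classSig.charAt(pos) != '>') {
            int colon = classSig.indexOf(':', pos);
            names.add(classSig.substring(pos, colon));
            pos = colon;
            while (classSig.charAt(pos) == ':') {     // every bound follows a ':'
                pos++;
                if (classSig.charAt(pos) != ':') {    // "::" means the class bound is empty
                    pos = endOfType(classSig, pos);
                }
            }
        }
        return names;
    }

    /** Superclass signature first, then any superinterface signatures. */
    static List<String> superTypes(String classSig) {
        int pos = 0;
        if (classSig.charAt(0) == '<') {
            int depth = 0;
            do {                                      // skip the type parameter section
                char c = classSig.charAt(pos++);
                if (c == '<') {
                    depth++;
                } else if (c == '>') {
                    depth--;
                }
            } while (depth > 0);
        }
        List<String> types = new ArrayList<>();
        while (pos < classSig.length()) {
            int end = endOfType(classSig, pos);
            types.add(classSig.substring(pos, end));
            pos = end;
        }
        return types;
    }

    public static void main(String[] args) {
        // Assumed sample, not taken from the article's listing.
        String sig = "<T:Ljava/lang/Object;U:Ljava/lang/Object;>"
                + "Ljava/lang/Object;Ljava/lang/Iterable<TT;>;";
        System.out.println(typeParameters(sig));
        System.out.println(superTypes(sig));
    }
}
```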
The field and method signatures from Listing 2 use the same type variable format seen in the superinterface signature, but aside from that, they don't show anything new.
As I've explained in some earlier articles in this series (see Resources for links), ASM uses a visitor approach to working with binary class representations. This visitor approach is bidirectional: You can parse an existing class, resulting in a sequence of calls to your handler visitor methods for the components of the class, or you can make the same sort of sequence of calls to the visitor methods of a class writer to generate a binary class representation. This parser/writer symmetry makes ASM especially convenient for situations where you're only modifying certain aspects of a class -- you can base your handler for the class parser events on a class writer, only overriding the base writer handling for the events you want to change. Both the parser (or reader) and writer are also very usable as stand-alone components.
ASM 2.X provides full support for Java 5 JVM changes, including reading and writing signatures. The basic handling of signatures is automatic, using values passed directly to the appropriate visitor methods. In addition, ASM 2.X adds support for parsing the (sometimes complex) signature string encoding to interpret the details of the signature. In keeping with the basic ASM philosophy, the same interface can also be used for a writer to generate signature strings on demand. In this section, I'll show how ASM handles both the basic signature-as-a-text-blob case and the detailed parse.
The signature-as-a-text-blob handling in ASM is built directly into the basic class, field, and method visitor calls. Listing 3 shows the relevant methods from the org.objectweb.asm.ClassVisitor interface:
Listing 3. Class, field, and method visitor methods
|
Each of the visitor methods shown takes a signature string as a parameter. If the corresponding class, field, or method is not generic, a null value is passed when calling the method.
Listing 4 shows the signature-related methods in action. Here I've implemented a visitor class using the org.objectweb.asm.commons.EmptyVisitor class as a base, so that I only need to override the methods I want to use. The supplied method implementations just print out the signature information for the class as a whole and the descriptor and signature information for each field and method seen in the class. The bottom of Listing 4 shows the output generated when this visitor is used with the full version of the Listing 1 DirInfo class:
Listing 4. Signature-related methods in action
|
Besides working with signatures as strings, ASM also supports working with signatures at the detail level. The org.objectweb.asm.signature.SignatureReader class parses a signature string, generating a sequence of calls to an org.objectweb.asm.signature.SignatureVisitor interface. The org.objectweb.asm.signature.SignatureWriter class implements the visitor interface, building up a signature string from a sequence of visitor method calls.
The detailed level interface is unfortunately somewhat complex, but that's because of the complexity of the signature definitions rather than any poor handling in the ASM code. The SignatureVisitor interface shows this complexity, defining 16 separate method calls that may be involved in processing a signature. Of course, most signatures will only use a small portion of these methods.
To illustrate the ASM detailed signature handling, I'll show the methods called by parsing some of the signatures discussed earlier in this article. Listing 5 gives a partial listing of the TraceSignatureVisitor class I wrote for this purpose, along with an AnalyzeSignaturesVisitor to drive the signature processing. When an instance of AnalyzeSignaturesVisitor is used as the visitor for a class, it creates a SignatureReader for each signature found, passing an instance of the TraceSignatureVisitor class as the target for the signature component visitor calls. The SignatureReader call used for parsing the signature depends on the form of the signature: For class and method signatures, the appropriate method is just accept(); for field signatures, use the acceptType() call.
Listing 5. Analyzing signatures
|
Listing 6 shows the output generated when the AnalyzeSignaturesVisitor class is used to visit the DirInfo class from Listing 1:
Listing 6. DirInfo code and signatures analysis
|
The first block of output lines in Listing 6 shows the visitor methods called in analyzing the m_files signature, Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;. The first method called is visitClassType("java/util/List"), giving the base class for the field. Then visitTypeArgument("=") says that an actual type is being supplied for the type parameter of the current class (java.util.List), and visitClassType("com/sosnoski/generics/FileInfo") says that the actual type is based on com.sosnoski.generics.FileInfo. Finally, the first call to visitEnd() closes the open FileInfo class signature, and the second closes the open List class signature.
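The order of these calls falls out naturally from a recursive descent over the signature text. The sketch below is a JDK-only stand-in that records event names modeled on the SignatureVisitor methods discussed here. It does not call ASM and is not ASM's parser. SignatureTrace is a hypothetical class, it covers only field type signatures, and it rejects inner class types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in: records visitor-style events for a field signature. */
final class SignatureTrace {
    private final String sig;
    private final List<String> events = new ArrayList<>();
    private int pos;

    private SignatureTrace(String sig) {
        this.sig = sig;
    }

    static List<String> trace(String fieldSignature) {
        SignatureTrace tracer = new SignatureTrace(fieldSignature);
        tracer.parseType();
        return tracer.events;
    }

    private void parseType() {
        char c = sig.charAt(pos++);
        if (c == '[') {
            events.add("visitArrayType()");
            parseType();                              // the array item type follows
        } else if (c == 'T') {
            int end = sig.indexOf(';', pos);
            events.add("visitTypeVariable(" + sig.substring(pos, end) + ")");
            pos = end + 1;
        } else if (c == 'L') {
            parseClassType();
        } else {
            events.add("visitBaseType(" + c + ")");
        }
    }

    private void parseClassType() {
        int start = pos;
        while (sig.charAt(pos) != '<' && sig.charAt(pos) != ';') {
            pos++;
        }
        events.add("visitClassType(" + sig.substring(start, pos) + ")");
        if (sig.charAt(pos) == '<') {
            pos++;
            while (sig.charAt(pos) != '>') {
                parseTypeArgument();                  // each argument opens a nested type
            }
            pos++;                                    // consume '>'
        }
        if (sig.charAt(pos) != ';') {
            throw new IllegalArgumentException("Inner class types are not handled: " + sig);
        }
        pos++;
        events.add("visitEnd()");                     // closes the class type opened above
    }

    private void parseTypeArgument() {
        char c = sig.charAt(pos);
        if (c == '*') {
            pos++;
            events.add("visitTypeArgument()");        // unbounded wildcard, no nested type
            return;
        }
        if (c == '+' || c == '-') {
            pos++;                                    // bounded wildcard marker
        } else {
            c = '=';                                  // plain argument: an exact type
        }
        events.add("visitTypeArgument(" + c + ")");
        parseType();
    }

    public static void main(String[] args) {
        trace("Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;").forEach(System.out::println);
    }
}
```

Run on the m_files signature, the recorded events come out in the same order as the calls described above: the outer class type, the type argument marker, the nested class type, and then one visitEnd() per open class type.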
As you might guess from looking at the sequence of visitor method calls, some of these calls effectively open a new context for an embedded type signature component. The methods in the SignatureVisitor interface that return a SignatureVisitor instance all have this effect. The interface instance returned by the method call (which may be the same as the instance being called, as in the Listing 5 code, but can be different) is then used for processing the embedded type signature. It's easy to change the Listing 5 code to show the nesting of child signatures with indenting, and the file download has the code with this change. Rather than go through the code in detail here, I'll just show what comes out. Listing 7 gives the (partial) generated output from the indenting version of the code when run on the Listing 2 PairCollection parameterized class:
Listing 7. PairCollection code and signatures analysis
|
The Listing 7 output shows how nested type definitions are used within the parsed signatures. In the case of the class signature, the nesting even goes two levels deep -- the class signature includes an interface signature that the class must implement, and the interface signature includes a type argument signature (which is just the type variable "T" in this case).
Going further with ASM generics
In this article, I've gone through the basics of how generics information is stored in the binary class representation and how it can be accessed using ASM. Next month, I'm going to finish my coverage of generics with a recursive data structure analyzer built around ASM. Starting from an initial class, the analyzer chains through all referenced classes, handling substitution of generic types as it goes. The end result is a data structure that reflects all the information you can deduce through the use of generics.
Description | Name | Size | Download method |
---|---|---|---|
Source code | j-cwt02076-source.zip | 11KB | FTP |
Resources
Learn
- "Introduction to generic types in JDK 5.0" (developerWorks, 2004): Find out the basics of generic types in this tutorial by Brian Goetz.
- "Generics in the Java Programming Language" (Sun Microsystems, 2004): A tutorial introduction to generics by Gilad Bracha, the principal architect of generic type support in the Java language.
- "Java theory and practice: Generic gotchas" (developerWorks, 2005): Brian Goetz gives practical tips and insights on working with generics.
- "Classworking toolkit: ASM classworking" (developerWorks, 2005): Dennis Sosnoski compares classworking with ASM to other bytecode libraries.
- "Classworking toolkit: Annotations with ASM" (developerWorks, 2005): Dennis Sosnoski demonstrates using annotations to trigger on-load code modifications with ASM.
- JVM Changes for JDK 1.5 (Sun Microsystems, 2005): Gives the links for the JVM specification updates required by Java 5 (about halfway down the page). See the chapter 4 changes for details of the signature attribute.
- Classworking toolkit: Check out the other columns in the series by Dennis Sosnoski.
- Java programming dynamics: Series author Dennis Sosnoski takes you on a tour of the Java class structure, reflection, and classworking.
- The Java technology zone: Find articles about every aspect of Java programming.
Get products and technologies
- ASM: Get the ASM Java bytecode manipulation framework.
Discuss
- developerWorks blogs: Get involved in the developerWorks community.