Messing about with Clang

I’ve been reading the Smalltalk-80 blue book (pdf) recently, and started to wonder what a Smalltalk style object browser for Objective-C would look like. Not just from the perspective of presenting the information that makes up Objective-C classes in novel ways (though this is something I’ve discussed with Saul Mora at great length in the past). What would an object browser in which the compiler is an object, so you can define and manipulate classes in real time, look like?

Well, the first thing you’d need to do is to turn the compiler into an object. I decided to see whether I could see what the compiler sees, using the clang compiler front-end library.

Wait, clang library? Clang’s a command-line tool, isn’t it? Well yes, but it and the whole of LLVM are implemented as a collection of reusable C++ classes. Clang then has a stable C interface wrapping the C++, and this is what I used to produce this browser app. This isn’t the browser I intend to write; this is the one I threw away to learn about the technology.

Objective-Browser

Clang is a stream parser, and as with any other stream there are two ways to deal with source files: event-driven[*], in which you let the parser run and get callbacks from it when it sees interesting things, or document-based[*], in which you let the parser build up a document object model (a tree, in this case) and then visit its nodes to learn about the data.

[*] Computer scientists probably call these things something else.
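
To make the two styles concrete, here is a minimal sketch on a toy "parser" that only splits text into words. It is plain C (so it also compiles as Objective-C) and has nothing to do with clang itself; every name in it (scan_words, word_document, build_document) is made up for illustration. The scanner is event-driven, and the document-based version is built on top of it by a callback that grows a model.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Event-driven style: the scanner calls back once per word it sees. */
typedef void (*word_callback)(const char *start, size_t length, void *context);

void scan_words(const char *text, word_callback on_word, void *context) {
    assert(text != NULL && on_word != NULL);
    const char *p = text;
    while (*p != '\0') {
        while (*p != '\0' && isspace((unsigned char)*p)) p++;   /* skip gaps */
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;  /* consume a word */
        if (p > start) on_word(start, (size_t)(p - start), context);
    }
}

/* Document-based style: the caller receives a finished model to inspect. */
enum { MAX_WORDS = 16, MAX_WORD_LENGTH = 31 };

typedef struct {
    size_t count;
    char words[MAX_WORDS][MAX_WORD_LENGTH + 1];
} word_document;

/* The bridge between the two styles: a callback that grows the model. */
void append_word(const char *start, size_t length, void *context) {
    word_document *document = context;
    if (document->count == MAX_WORDS) return;                /* model is full */
    if (length > MAX_WORD_LENGTH) length = MAX_WORD_LENGTH;  /* truncate long words */
    memcpy(document->words[document->count], start, length);
    document->words[document->count][length] = '\0';
    document->count++;
}

word_document build_document(const char *text) {
    word_document document = { 0 };
    scan_words(text, append_word, &document);
    return document;
}
```

Calling build_document("parse this file") would hand back a model holding three words, while calling scan_words directly lets the caller react to each word as it goes by without storing anything.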

Being perverse, I’m going to use the event-driven parser to build a parallel data model in Objective-C. First, I need to adapt the clang library to Objective-C, so that the compiler is an Objective-C object. Here’s my parser interface:

#import <Foundation/Foundation.h>

@protocol FZAClassParserDelegate;

@interface FZAClassParser : NSObject

@property (weak, nonatomic) id <FZAClassParserDelegate>delegate;

- (id)initWithSourceFile: (NSString *)implementation;
- (void)parse;

@end

The -parse method is the one that’s interesting (I presume…) so we’ll dive into that. It actually farms the real work out to an operation queue:

#import <clang-c/Index.h>

//...

- (void)parse {
    __weak id parser = self;
    [queue addOperationWithBlock: ^{ [parser realParse]; }];
}

- (void)realParse {
#warning Pass errors back to the app
    @autoreleasepool {
        CXIndex index = clang_createIndex(1, 1);
        if (!index) {
            NSLog(@"fail: couldn't create index");
            return;
        }
        CXTranslationUnit translationUnit = clang_parseTranslationUnit(index, [sourceFile fileSystemRepresentation], NULL, 0, NULL, 0, CXTranslationUnit_None);
        if (!translationUnit) {
            NSLog(@"fail: couldn't parse translation unit");
            clang_disposeIndex(index);
            return;
        }
        CXIndexAction action = clang_IndexAction_create(index);

That’s the setup code, which gets clang ready to start reading through the file. The reading itself is done by this function:

        int indexResult = clang_indexTranslationUnit(action,
                                                     (__bridge CXClientData)self,
                                                     &indexerCallbacks,
                                                     sizeof(indexerCallbacks),
                                                     CXIndexOpt_SuppressWarnings,
                                                     translationUnit);

This is the important part. As is usual for a C callback API, clang takes a context pointer (the second argument), which in this case is the parser object. It also takes a collection of callback pointers, which I’ll show next, after showing that the objects created in this method need cleaning up.

        clang_IndexAction_dispose(action);
        clang_disposeTranslationUnit(translationUnit);
        clang_disposeIndex(index);
        (void) indexResult;
    }
}
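
The context-pointer idiom used in that call can be seen in isolation in the following sketch. It is plain C with invented names (visit_declarations, parser_state and count_declaration are not part of clang): the "library" function carries an opaque pointer around without looking at it, and only the callback knows what type to cast it back to, which is the role the __bridge casts play in the parser.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque context type, playing the role of clang's CXClientData. */
typedef void *client_data;

/* Callback signature: the library hands the context back untouched. */
typedef void (*declaration_callback)(client_data context, const char *name);

/* Stand-in for the library: it knows nothing about the caller's types. */
void visit_declarations(const char *const *names, size_t count,
                        declaration_callback on_declaration,
                        client_data context) {
    assert(on_declaration != NULL);
    for (size_t i = 0; i < count; i++) {
        on_declaration(context, names[i]);
    }
}

/* Caller-side state, playing the role of the parser object. */
typedef struct {
    size_t declarations_seen;
    const char *last_name;
} parser_state;

/* The callback converts the context back to the type only the caller knows. */
void count_declaration(client_data context, const char *name) {
    parser_state *state = context;
    state->declarations_seen++;
    state->last_name = name;
}
```

The caller passes the address of its own parser_state as the last argument, and reads the counters back once visit_declarations returns.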

There’s a structure called IndexerCallbacks defined in Index.h. This class’s instance of that structure contains functions that call through to methods on the parser’s delegate:

int abortQuery(CXClientData client_data, void *reserved);
void diagnostic(CXClientData client_data,
                CXDiagnosticSet diagnostic_set, void *reserved);
CXIdxClientFile enteredMainFile(CXClientData client_data,
                                CXFile mainFile, void *reserved);
CXIdxClientFile ppIncludedFile(CXClientData client_data,
                               const CXIdxIncludedFileInfo *included_file);
CXIdxClientASTFile importedASTFile(CXClientData client_data,
                                   const CXIdxImportedASTFileInfo *imported_ast);
CXIdxClientContainer startedTranslationUnit(CXClientData client_data,
                                            void *reserved);
void indexDeclaration(CXClientData client_data,
                      const CXIdxDeclInfo *declaration);
void indexEntityReference(CXClientData client_data,
                          const CXIdxEntityRefInfo *entity_reference);

static IndexerCallbacks indexerCallbacks = {
    .abortQuery = abortQuery,
    .diagnostic = diagnostic,
    .enteredMainFile = enteredMainFile,
    .ppIncludedFile = ppIncludedFile,
    .importedASTFile = importedASTFile,
    .startedTranslationUnit = startedTranslationUnit,
    .indexDeclaration = indexDeclaration,
    .indexEntityReference = indexEntityReference
};

int abortQuery(CXClientData client_data, void *reserved) {
    @autoreleasepool {
        FZAClassParser *parser = (__bridge FZAClassParser *)client_data;
        if ([parser.delegate respondsToSelector: @selector(classParserShouldAbort:)]) {
            return [parser.delegate classParserShouldAbort: parser];
        }
        return 0;
    }
}

// …
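
Passing the size of the callback structure along with its address is a common C idiom: it lets a library accept tables from clients built against an older, shorter version of the structure, and treat missing entries as "not interested". The sketch below shows one way a library could use that size. It is an illustration under that assumption with invented names (callbacks, run_indexer), not a description of what libclang does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table, much smaller than IndexerCallbacks. */
typedef struct {
    int  (*should_abort)(void *context);
    void (*found_declaration)(void *context, const char *name);
} callbacks;

/* Copies the client's table into an empty one, so entries the client did not
   supply (or did not know about) stay NULL and are skipped.
   Returns the number of names processed before stopping. */
size_t run_indexer(const callbacks *client_callbacks, size_t callbacks_size,
                   void *context, const char *const *names, size_t count) {
    callbacks table = { NULL, NULL };
    if (client_callbacks != NULL) {
        size_t usable = callbacks_size < sizeof table ? callbacks_size : sizeof table;
        memcpy(&table, client_callbacks, usable);
    }
    size_t processed = 0;
    for (size_t i = 0; i < count; i++) {
        /* Ask the client whether to stop before each declaration. */
        if (table.should_abort != NULL && table.should_abort(context)) break;
        if (table.found_declaration != NULL) table.found_declaration(context, names[i]);
        processed++;
    }
    return processed;
}

/* Example client: stops once it has seen as many names as its limit allows. */
typedef struct { size_t seen; size_t limit; } limit_state;

int abort_at_limit(void *context) {
    const limit_state *state = context;
    return state->seen >= state->limit;
}

void note_declaration(void *context, const char *name) {
    (void)name;                       /* only counting, the name is unused */
    limit_state *state = context;
    state->seen++;
}
```

The abort callback here plays the same part as abortQuery above: the library polls it, and a non-zero answer ends the run early.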

Internally clang creates its own threads, so the callback functions wrap delegate messages in @autoreleasepool so that the delegate doesn’t have to worry about this.

Of course, the delegate still needs to understand clang data structures: this is where the real work is done. Here’s the delegate that’s used to build the data model used in the browser app:

#import <Foundation/Foundation.h>
#import "FZAClassParserDelegate.h"

@class FZAClassGroup;

@interface FZAModelBuildingParserDelegate : NSObject <FZAClassParserDelegate>

- (id)initWithClassGroup: (FZAClassGroup *)classGroup;

@end

The FZAClassGroup class is just somewhere to put all the data collected by parsing the file: in a real IDE, this might represent a project, a translation unit, a framework or something else. Anyway, it has a collection of classes. The parser adds classes to that collection, and methods and properties to those classes:

@implementation FZAModelBuildingParserDelegate {
    FZAClassGroup *group;
    FZAClassDefinition *currentClass;
}

- (id)initWithClassGroup:(FZAClassGroup *)classGroup {
    if ((self = [super init])) {
        group = classGroup;
    }
    return self;
}

- (void)classParser:(FZAClassParser *)parser foundDeclaration:(CXIdxDeclInfo const *)declaration {
    const char * const name = declaration->entityInfo->name;
    if (name == NULL) return; //not much we could do anyway.
    NSString *declarationName = [NSString stringWithUTF8String: name];

We’ve now got a named declaration, but a declaration of what?

    switch (declaration->entityInfo->kind) {
        case CXIdxEntity_ObjCProtocol:
        {
            currentClass = nil;
            break;
        }
        case CXIdxEntity_ObjCCategory:
        {
            const CXIdxObjCCategoryDeclInfo *categoryInfo = 
            clang_index_getObjCCategoryDeclInfo(declaration);
            NSString *className = [NSString stringWithUTF8String: categoryInfo->objcClass->name];
            FZAClassDefinition *classDefinition =[group classNamed: className];
            if (!classDefinition) {
                classDefinition = [[FZAClassDefinition alloc] init];
                classDefinition.name = className;
                [group insertObject: classDefinition inClassesAtIndex: [group countOfClasses]];
            }
            currentClass = classDefinition;
            break;
        }
        case CXIdxEntity_ObjCClass:
        {
            FZAClassDefinition *classDefinition =[group classNamed: declarationName];
            if (!classDefinition) {
                classDefinition = [[FZAClassDefinition alloc] init];
                classDefinition.name = declarationName;
                [group insertObject: classDefinition inClassesAtIndex: [group countOfClasses]];
            }
            currentClass = classDefinition;
            break;
        }

I’m ignoring protocols, but recognising that methods declared in a protocol shouldn’t go onto any particular class. Similarly, I’m adding methods found in categories to the class on which that category is defined: real Smalltalk browsers keep the categories, but for this prototype I decided to skip them. I’m using the fact that this is a prototype to justify having left the duplicate code in place, above :-S.

So now we know what class we’re looking at, we can start looking for methods or properties defined on that class:

        case CXIdxEntity_ObjCClassMethod:
        case CXIdxEntity_ObjCInstanceMethod:
        {
            FZAMethodDefinition *method = [[FZAMethodDefinition alloc] init];
            method.selector = declarationName;
            if (declaration->entityInfo->kind == CXIdxEntity_ObjCClassMethod)
                method.type = FZAMethodClass;
            else
                method.type = FZAMethodInstance;
            [currentClass insertObject: method inMethodsAtIndex: [currentClass countOfMethods]];
            break;
        }
        case CXIdxEntity_ObjCProperty:
        {
            FZAPropertyDefinition *property = [[FZAPropertyDefinition alloc] init];
            property.title = declarationName;
            [currentClass insertObject: property inPropertiesAtIndex: [currentClass countOfProperties]];
            break;
        }
        default:
            break;
    }
}

And that’s “it”. The result of collecting all of these callbacks is a tree:

ClassGroup -> Class -> [Method, Property]
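
Stripped of the Cocoa collections, the bookkeeping in that delegate is a find-or-create lookup plus a "current class" pointer that later method declarations attach to. The following plain C sketch shows that shape, with the find-or-create step factored into one function instead of being duplicated. The names (class_group, class_named, found_declaration) and the fixed-size storage are assumptions made for the example, not the app's real model classes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_CLASSES = 8, MAX_NAME = 63 };

typedef enum { ENTITY_PROTOCOL, ENTITY_CLASS, ENTITY_CATEGORY, ENTITY_METHOD } entity_kind;

typedef struct {
    char name[MAX_NAME + 1];
    size_t method_count;
} class_definition;

typedef struct {
    class_definition classes[MAX_CLASSES];
    size_t class_count;
    class_definition *current_class;   /* NULL while inside a protocol */
} class_group;

/* Find-or-create, shared by the class and category cases. */
class_definition *class_named(class_group *group, const char *name) {
    assert(group != NULL && name != NULL);
    for (size_t i = 0; i < group->class_count; i++) {
        if (strcmp(group->classes[i].name, name) == 0) return &group->classes[i];
    }
    if (group->class_count == MAX_CLASSES) return NULL;   /* group is full */
    class_definition *created = &group->classes[group->class_count++];
    size_t length = strlen(name);
    if (length > MAX_NAME) length = MAX_NAME;             /* truncate long names */
    memcpy(created->name, name, length);
    created->name[length] = '\0';
    created->method_count = 0;
    return created;
}

/* One event from the parser. For a class, name is the class name; for a
   category, it is the class being extended; otherwise it is ignored. */
void found_declaration(class_group *group, entity_kind kind, const char *name) {
    switch (kind) {
        case ENTITY_PROTOCOL:
            group->current_class = NULL;
            break;
        case ENTITY_CLASS:
        case ENTITY_CATEGORY:
            group->current_class = class_named(group, name);
            break;
        case ENTITY_METHOD:
            /* Methods seen inside a protocol have no class to land on. */
            if (group->current_class != NULL) group->current_class->method_count++;
            break;
    }
}
```

The explicit NULL check in the method case stands in for what Objective-C gives the delegate for free: a message sent to a nil currentClass is silently dropped.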

I define a tree-ish interface for all of these classes, by adding categories that define the same methods:

@interface FZAMethodDefinition (TreeSupport)

- (NSInteger)countOfChildren;
- (NSString *)name;
- (id)childAtIndex: (NSInteger)index;
- (BOOL)isExpandable;

@end

@implementation FZAMethodDefinition (TreeSupport)

- (NSInteger)countOfChildren {
    return 0;
}

- (BOOL)isExpandable {
    return NO;
}

- (id)childAtIndex:(NSInteger)index {
    return nil;
}

- (NSString *)name {
    switch (self.type) {
        case FZAMethodClass:
            return [@"+" stringByAppendingString: self.selector];
            break;
        case FZAMethodInstance:
            return [@"-" stringByAppendingString: self.selector];
            break;
        default:
            return [@"?" stringByAppendingString: self.selector];
            break;
    }
}

@end
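
The point of giving every model class the same four methods is that the view code can walk the tree without caring what kind of node it holds. C has no categories, so the sketch below swaps in a different technique, a table of function pointers per node type, to show the same idea. All names are hypothetical, and deriving "expandable" from the child count is a simplification of this sketch rather than what the categories above do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { METHOD_CLASS, METHOD_INSTANCE } method_type;

typedef struct { method_type type; const char *selector; } method_definition;

typedef struct {
    const char *name;
    const method_definition *methods;
    size_t method_count;
} class_node;

/* The shared "tree-ish" interface: one table of functions per node type. */
typedef struct {
    size_t (*count_of_children)(const void *node);
    const void *(*child_at_index)(const void *node, size_t index);
    void (*copy_name)(const void *node, char *buffer, size_t size);
} tree_support;

/* Leaf: a method has no children and is shown as "+selector" or "-selector". */
size_t method_count_of_children(const void *node) { (void)node; return 0; }
const void *method_child_at_index(const void *node, size_t index) {
    (void)node; (void)index;
    return NULL;
}
void method_copy_name(const void *node, char *buffer, size_t size) {
    const method_definition *method = node;
    char prefix = method->type == METHOD_CLASS ? '+' : '-';
    snprintf(buffer, size, "%c%s", prefix, method->selector);
}

/* Branch: a class's children are its methods. */
size_t class_count_of_children(const void *node) {
    return ((const class_node *)node)->method_count;
}
const void *class_child_at_index(const void *node, size_t index) {
    const class_node *cls = node;
    return index < cls->method_count ? &cls->methods[index] : NULL;
}
void class_copy_name(const void *node, char *buffer, size_t size) {
    snprintf(buffer, size, "%s", ((const class_node *)node)->name);
}

const tree_support method_tree_support = {
    method_count_of_children, method_child_at_index, method_copy_name
};
const tree_support class_tree_support = {
    class_count_of_children, class_child_at_index, class_copy_name
};

/* In this sketch a node is expandable exactly when it has children. */
int is_expandable(const tree_support *support, const void *node) {
    assert(support != NULL);
    return support->count_of_children(node) > 0;
}
```

An outline view only ever needs a node pointer and its tree_support table, which is the same contract the categories give an Objective-C outline data source.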

And, well, that’s it. libclang could be the kernel of a thousand visualizers, browsers and editors for C-derived languages; the start of one is outlined above.

About Graham

I make it faster and easier for you to create high-quality code.

5 Responses to Messing about with Clang

  1. Josh says:

    Can you give a link to where to get the clang compiler front-end library as from what I’ve seen the website only details the command-line tool.

    Also an example project or full code snippets would be very helpful to people not used to using C functions!

    Thanks for the great post!

  2. Karsten says:

    It would be even better if an object browser like the ones in current Smalltalk implementations actually existed and didn’t depend on .m files any more. It could instead generate .m files itself from the classes and then feed these files to the compiler. That would also simplify the compiler a lot, as its scope would only be methods and not huge C files.

    Karsten

  3. Graham says:

    Hi Josh, an example project will need to wait until the code is production-ready ;-). You can see the Doxygen documentation to the C interface here: http://clang.llvm.org/doxygen/group__CINDEX.html. It’s already on your Mac (assuming you’re on a Mac), you just need to add libclang.dylib to your project. Alternatively if you build clang from source you’ll get an up to date version of the library.

  4. Josh says:

    Thanks Graham!

  5. Alexis Gallagher says:

    F-Script is a Smalltalk-like scripting language built on the Objective-C runtime, and it provides a Smalltalk-style class browser, object browser, as well as code injection and introspection facilities for running Cocoa apps:

    http://www.fscript.org/

    Nutron provides something similar based on Nu, a lisp-style wrapper over Objective-C:

    http://itfrombit.github.com/nutron/
