Monday, September 19, 2011

Syntactic Tartness for Macro Expansion

One thing that is very natural for Smalltalk image-based programming is to programmatically assemble source and install it into the very same running program. I've been using the RB Change framework to do just that with something I'm working on lately.

To piece together the appropriate source, it's desirable to start with a template string and then fill in the variables. VisualWorks and Squeak (maybe some other Smalltalks use this same approach?) can expand template strings with macro substitution. The template for a setter method might look something like

    '<1s>: anObject
        <1s> := anObject'

To fill out those fields, you send messages like expandMacrosWith:, expandMacrosWith:with:, expandMacrosWith:with:with:, and expandMacrosWithArguments:.
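
As a concrete illustration, here is a minimal workspace sketch of those messages in use. It assumes the usual macro behavior, where <Ns> inserts the Nth argument (which must itself be a string) and <Np> inserts that argument's printString. The variable name 'width' and the second template are made up for the example.

```smalltalk
| template setterSource description |

"Expand the setter template for a hypothetical instance variable named 'width'."
template := '<1s>: anObject
	<1s> := anObject'.
setterSource := template expandMacrosWith: 'width'.
"setterSource now reads:
width: anObject
	width := anObject"

"<2p> goes through printString, so a non-string argument such as a number is fine there."
description := '<1s> has <2p> slots' expandMacrosWith: 'buffer' with: 16.
"description is 'buffer has 16 slots'"
```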

I am no fan of this API. First, it is too verbose. When I'm looking at template substitution, I don't want a bunch of other longish selectors. They dilute the information I'm trying to glean as I piece together the template and what's being substituted.

Secondly, I find it doesn't scale well when evolving the code. It's common that I start with a simple template in the first cut of code. Something like
    'Hello <1s>' expandMacrosWith: aName

But as I refactor and discover more requirements, the need to add parameters arises. As long as I only add two more parameters, I can just use the variants with the additional with: keywords.
    '<1s> ^self <2s> <3p>'
        expandMacrosWith: aVariableName
        with: aBasicAccessingMethod
        with: sizeof + 1

As soon as I go to 4 fields though, I have to change gears and use the expandMacrosWithArguments: API and build the sequence myself. It could be argued that it's best to just always start with this version, but it's the very longest of the selectors. If you're using VisualWorks, and don't have the language syntax for array construction (e.g. {statement. statement. statement.}), then it's even funner: you can use the Array with:with:with:with: expression, but if you need to move to 5 fields, then you've got to change your code all around again.
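
To make that gear change concrete, here is a small sketch of a four-field expansion. The argument values ('width', 'unsignedLongAt:', 4 and 9) are placeholders invented for the example, and the <Np> fields are assumed to insert each argument's printString.

```smalltalk
| source |

"Four fields, so the arguments are handed over as a single Array.
 Array with:with:with:with: is used here because it needs no brace syntax;
 in a dialect that has braces, { 'width'. 'unsignedLongAt:'. 4. 9 } would do."
source := '<1s>At: anOffset
	^self <2s> anOffset * <3p> + <4p>'
		expandMacrosWithArguments: (Array
			with: 'width'
			with: 'unsignedLongAt:'
			with: 4
			with: 9).

"source now reads:
widthAt: anOffset
	^self unsignedLongAt: anOffset * 4 + 9"
```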

Over the years, I've tried a couple of different experiments to make this all something I liked a little better. They've involved interesting binary selectors (e.g. "%") and proxy objects, or at least fun with doesNotUnderstand: messages. This time I thought I'd try something a little different. I wanted something that correlated well with the numbered fields, but was uber-terse as well.

So I went with the shortest selector that could possibly work: _1:, _1:_2:, _1:_2:_3:, etc. Written in selector shorthand like that, it's pretty ugly. When actually used in code though, it improves some:
    '<1s>At: anOffset
        ^self <2s> anOffset * <3p> + <4p>'
            _1: aVariableName
            _2: aBasicAccessingMethod
            _3: aByteSize
            _4: sizeof + 1

and
    'inspector<1s>Field
        %<inspectorFields>
        ^Array with: (Tools.Trippy.DerivedAttribute
            label: ''<2s>''
            valueBlock: [self <2s>])'
            _1: upperVariableName
            _2: aVariableName

and
    '<1s>
        ^(0 to: <2p>) collect: [:n | self <3s> <4p> + (n * <5p>)]'
            _1: aVariableName
            _2: anInteger - 1
            _3: aBasicAccessingMethod
            _4: sizeof + 1
            _5: aByteSize
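
Selectors like these need not be more than thin wrappers. The following is a minimal sketch of how they might be written, not necessarily how the real ones are: it assumes they are added as extension methods on String, that they simply forward to expandMacrosWithArguments:, and that the dialect's compiler accepts underscores in selectors (a setting in some dialects). The Class>>selector headers are only the usual display convention, not compilable source.

```smalltalk
"Hypothetical extension methods on String. Each one gathers its arguments
 into an Array and hands them to expandMacrosWithArguments:."

String>>_1: first
	^self expandMacrosWithArguments: (Array with: first)

String>>_1: first _2: second
	^self expandMacrosWithArguments: (Array with: first with: second)

String>>_1: first _2: second _3: third
	^self expandMacrosWithArguments: (Array with: first with: second with: third)
```

Longer variants would follow the same pattern. Once the Array with: family runs out of keywords, the array has to be built another way, for example by filling an Array new: 5 slot by slot.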

I don't dare call this syntactic sugar. The use of the underscores is too ugly to be sugary. It's a sort of bittersweet thing, thus the "tart" label.

It does solve two problems nicely. It is very terse. You see a minimal amount of "scaffolding" getting the job done, and are free to spend more time looking at the template and the substitutions. One could get that though by using inlined arrays with a shorter selector, something like:
    '<1s>
        ^(0 to: <2p>) collect: [:n | self <3s> <4p> + (n * <5p>)]' macro: {
            aVariableName.
            anInteger - 1.
            aBasicAccessingMethod.
            sizeof + 1.
            aByteSize.}

One thing you lose with this, though, is the strong association between each substitution and its field. In the former example, when I glance at the template and see field 4, and then want to know what's being substituted, I see it instantly. I just find the 4 in the selector below. Without that direct link, I have to scan the array linearly to find it.

I haven't used this _syntax enough to decide if its usefulness would overcome its ugliness, but it was interesting to play with it and discover the visual readability aspect. I'll likely use it on some more non-production stuff to get a better feel.

3 comments:

  1. I think the key for solving this problem is to get rid of the out-of-context expansion. That is, instead of using an array with indices you directly inline the values into the string using some special construct:

    '{aVariableName} ^ self {aBasicAccessingMethod} {sizeof + 1}' expandMacros

    '{aVariableName} ^ (0 to: {anInteger - 1}) collect: [ :n | self {aBasicAccessingMethod} {sizeof + 1} + (n * {aByteSize})]' expandMacros

    #expandMacros can be trivially implemented by evaluating the expressions inside the string at run-time in the definition context (http://www.squeaksource.com/evaluablestrings.html); or by compiling the strings at compile-time. The latter has the advantage that it is very efficient, that it is checked for syntax errors at compile-time, and that the tools (browser, debugger, senders, references) continue to work. In fact, the example CUInterpolateExample in Helvetia does exactly that (http://bit.ly/helvetia-download).

  2. Yes Lukas, I should have said up front that we'll assume we're going the template route here. I like string interpolation, but I think there are devils in the details. At least, in the context of I18n. Which is a bit disingenuous of me to argue here, since my above use cases are about string generation, not I18n. I'm kind of torn personally on it. There are times when I personally like the template approach better. And there are times when I like the interpolations better. For example, in the above, I find the (0 to: {anInteger - 1}) collect: a bit misleading at first glance. For any expression that grows much larger than a simple variable name, or is easily confused with the literal parts of the string, the separation of presentation and state is harder to discern.

  3. You could only allow variables in your template and use a DSL like

    '{variableName} ^ self {accessingMethod} {size}' with
    variableName: aVariableName;
    accessingMethod: aBasicAccessingMethod;
    size: sizeof + 1;
    expand

    for expansion. I think this greatly improves readability. I would try to avoid indices at all cost.

    In the particular example of code generation I would try to avoid using strings altogether: see Section 4.3.1, 4.3.2, and 4.3.3 in http://scg.unibe.ch/archive/papers/Reng09bLanguageShootout.pdf with increasing order of preference :-)
