Jun 29, 2014

Another sed surprise

I found myself in a situation where I wanted to play back recently downloaded videos from a YouTube playlist. Because the file names had space characters in them, a normal invocation of xargs wouldn't have helped: by default it splits its input on whitespace. Instead I'd have to somehow separate the file names with null characters, and use xargs -0 to ask it to split input at null characters instead of whitespace.

The plan was to get the list of the most recently downloaded files and replace the newline characters with null characters to form a single string. As usual, I turned to sed and composed this stress-free one-liner:

ls -r | tail -n 10 | sed '${s/\n/\x00/g};N' | xargs -0 mplayer

The intention was to keep appending the input lines to the pattern space, then substitute all newline characters with null characters and print the output. Of course I expected this to work, but instead, I could see mplayer playing only the latest file downloaded! This meant that sed was emitting only the last of the input lines, which was a bit surprising.

After many tries, I realized that I had taken sed's processing mechanism for granted yet again. The way sed works is that it reads a line of input into the pattern space and executes the entire program on it. When the program finishes, it prints the pattern space, clears it, reads the next line from input and restarts the cycle. This meant that the pattern space was cleared before the next cycle began. I.e. the N command at the end of the program did append the next line to the pattern space, but the result was printed and cleared immediately as the cycle restarted! As a result, by the time it got to the substitution, only the last line remained in the pattern space.

What I had to do instead was to prevent the cycle from restarting by adding a loop:

ls -r | tail -n 10 | sed ':a;${s/\n/\x00/g};N;b a' | xargs -0 mplayer

Here, I define a label a at the beginning of the program. After appending the next line to the pattern space, I jump back to the label instead of letting the cycle restart. This ensures that the pattern space has all the lines appended by the time we get to the substitution on the last line. The substitution then replaces all newline characters with null characters, and xargs gives mplayer all the files I intended.
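A minimal sketch of the difference between the two programs, assuming GNU sed. The three-line input is made up, and | stands in for the null character so that the result is visible on a terminal:

```shell
# Hypothetical input standing in for the file list.
lines() { printf 'one\ntwo\nthree\n'; }

# Without the loop: the cycle ends right after N, so the pattern space is
# printed and cleared before the substitution ever sees an embedded newline.
no_loop=$(lines | sed '${s/\n/|/g};N')

# With the loop: "b a" jumps back to the label instead of ending the cycle,
# so the lines pile up until the last one has been read.
with_loop=$(lines | sed ':a;${s/\n/|/g};N;b a')

printf 'without loop:\n%s\n' "$no_loop"
printf 'with loop:\n%s\n' "$with_loop"   # one|two|three
```

Note that this relies on GNU sed printing the pattern space when N runs out of input; other sed implementations may exit without printing anything at that point.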

Another lesson learned was to use tr instead, which requires far fewer keystrokes:

ls -r | tail -n 10 | tr '\n' '\0' | xargs -0 mplayer

For me, tr is proving more and more useful for carrying out trivial substitutions like this. Maybe I should give it the attention it deserves.
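Here is a small sketch of why the null separators matter. The two file names are made up, and printf stands in for mplayer so that the arguments can be inspected:

```shell
# Two hypothetical file names containing spaces.
names() { printf 'my video 1.mp4\nmy video 2.mp4\n'; }

# Plain xargs splits on whitespace: each name falls apart into three arguments.
split_count=$(names | xargs -n 1 printf '%s\n' | wc -l | tr -d ' ')

# tr turns every newline (the final one included) into a null character,
# and xargs -0 then hands each name over as a single argument.
whole=$(names | tr '\n' '\0' | xargs -0 -n 1 printf '[%s]\n')

echo "arguments without -0: $split_count"   # 6
echo "$whole"
```

The second echo prints each name intact inside its own pair of brackets.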

Jun 8, 2014

Conditional assignment in GNU make

GNU Make provides a feature to conditionally assign variables using the ?= operator. I.e. the operator assigns the variable the given value only if it's not defined yet. This is commonly (although mistakenly, as we'll see) used to accommodate the user's preference to pass Make variables through the command line. For that purpose, it's not uncommon to find conditional assignments in top-level Makefiles, beaming with the author's generosity.

# User is the king
BOSS ?= me

sandwich:
	@echo Made sandwich for $(BOSS)

The intention is quite clear: If the user says they're boss, let them be; otherwise I'm the boss. The author then expects the Makefile to be used as:

$ make BOSS=$USER sandwich

And this prints:

Made sandwich for jeenu

So this works as expected; but for a different reason!

If one were to skim the Make documentation to discover the ?= operator and what it does, they'd be left with the first impression that they have to use this operator in Makefiles to let the user have a say. Actually that's not the case. The subtlety is hidden deeper in the documentation, in the part on variable overriding.

What happens is that when a user assigns a value to a Make variable from the command line (through arguments of the form X=Y), that assignment overrides all assignments from within the Makefile. It's worth noting that assignment at the command line isn't the only override that the user can specify; users can also specify the command line option -e so that environment variables of the same name override those in the Makefile. When either method of override is employed, no matter what kind of operator is used, normal assignments to the variable from within the Makefile have no effect, and are ignored! In fact, the documentation is quite clear on the behaviour of the ?= operator, which is the same as:

ifeq ($(origin FOO), undefined)
  FOO = bar
endif

This means that the assignment takes effect only if the variable is undefined. When the variable is assigned from the command line, or is inherited from the environment, it's no longer undefined, so the conditional assignment is skipped. And with a command line assignment (or an environment variable under -e), the Makefile behaves as if all ordinary assignments to that variable were removed! To revisit the first example above, it worked not because the ?= operator was used, but because of the override that happens regardless. In fact, even a normal assignment using the := operator would have sufficed for assigning a default value.
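To check that claim, here is a minimal sketch that swaps ?= for := in the sandwich example. It assumes GNU make is installed as make; the scratch directory is made up, and $(info) is used instead of a recipe just to keep the Makefile short:

```shell
# Scratch Makefile for the experiment (hypothetical location).
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
# A normal assignment, no ?= in sight
BOSS := me
$(info Made sandwich for $(BOSS))
all: ;
EOF

default_out=$(make -s -f "$dir/Makefile")
user_out=$(make -s -f "$dir/Makefile" BOSS=jeenu)
rm -rf "$dir"

echo "$default_out"   # Made sandwich for me
echo "$user_out"      # Made sandwich for jeenu
```

The command line assignment wins even though the Makefile never used ?=.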

So what then is the use of the ?= operator? Well, it still serves the purpose of conditional assignment, for example in sub-Makefiles included from the top-level one for cooperative usage. It also lets a plain environment variable (without -e) supply the value, which a normal assignment would overwrite. But the author should still bear in mind that any assignment the user makes through the command line, or through the environment with -e, renders the ordinary assignments in the Makefile futile.
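One case where ?= does behave differently from a normal assignment is a plain environment variable without -e. A small sketch, assuming GNU make; the variable names PLAIN and COND are invented for the experiment:

```shell
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
PLAIN = from-makefile
COND ?= from-makefile
$(info PLAIN=$(PLAIN) COND=$(COND))
all: ;
EOF

# Environment only: the normal assignment wins, the conditional one yields.
env_out=$(PLAIN=from-env COND=from-env make -s -f "$dir/Makefile")

# Environment with -e: both Makefile assignments are ignored.
env_e_out=$(PLAIN=from-env COND=from-env make -s -e -f "$dir/Makefile")
rm -rf "$dir"

echo "$env_out"     # PLAIN=from-makefile COND=from-env
echo "$env_e_out"   # PLAIN=from-env COND=from-env
```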

It's a bit dull to have user assignments override the author's intention. However, it's not the end of the world. GNU Make provides the override directive, so that the author can have the last laugh. Assignments done with the override directive take precedence over anything that comes from outside the Makefile. In other words, with this directive, the author gets the final say as to what the variable's value is. One ought to use the override directive at the first instance of initialization, and on any later assignment to the same variable, because later assignments not marked override are ignored.

override FOO = bar
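A quick sketch of the directive in action, assuming GNU make. The origin builtin reports where the final value came from:

```shell
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
override FOO = bar
$(info FOO=$(FOO) origin=$(origin FOO))
all: ;
EOF

# The user tries to set FOO on the command line, but the Makefile wins.
override_out=$(make -s -f "$dir/Makefile" FOO=user)
rm -rf "$dir"

echo "$override_out"   # FOO=bar origin=override
```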

Another nifty use is to honor the user's assignment by appending it to an existing variable rather than letting it overwrite the variable. Often it's useful for the user to specify additional C flags for compilation (optimization level, for example). Using the override directive, the user's preference can be appended to the default C flags as below. Without the override directive, the user's assignment completely replaces the C flags, possibly breaking the build.

override CFLAGS := -g $(CFLAGS)
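The two behaviours can be compared side by side. This is a sketch assuming GNU make, with two throwaway Makefiles whose names are made up:

```shell
dir=$(mktemp -d)

# With override: the default -g survives and the user's flags are appended.
cat > "$dir/with.mk" <<'EOF'
override CFLAGS := -g $(CFLAGS)
$(info CFLAGS=$(CFLAGS))
all: ;
EOF

# Without override: the assignment is ignored once the user sets CFLAGS.
cat > "$dir/without.mk" <<'EOF'
CFLAGS := -g $(CFLAGS)
$(info CFLAGS=$(CFLAGS))
all: ;
EOF

with_out=$(make -s -f "$dir/with.mk" CFLAGS=-O2)
without_out=$(make -s -f "$dir/without.mk" CFLAGS=-O2)
rm -rf "$dir"

echo "$with_out"      # CFLAGS=-g -O2
echo "$without_out"   # CFLAGS=-O2
```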

To summarize, the user is free to override any variable in the Makefile through a command line argument or, with -e, through the environment. If the author absolutely can't let the user override a variable, they should use the override directive with variable initialization. Conditional assignment alone isn't always the best solution for accommodating user preferences, and such usage should account for user overrides.

May 17, 2014

Boolean check in GNU make

The other day I found a way to flip a boolean value in GNU make. One thing taken as a given then was that the value being flipped was in fact a boolean. That is, a 0 or 1, and nothing else. After all, my solution there works only if the value is a boolean. So I have to verify that it's indeed the case.

As with the earlier problem, there is no make builtin that can do this alone. Shell, again, is an option, but I was inclined to get this done using builtins alone. So I decided to take another stab at it:

define assert_boolean
$(and $(patsubst 0,,$($(1))),$(patsubst 1,,$($(1))),$(error $(1) must be boolean))
endef

my_boolean    := 0
not_a_boolean := foo

$(call assert_boolean,my_boolean)       # All is well
$(call assert_boolean,not_a_boolean)    # Flags error

It's a little involved, but I'll try to break it down.

The snippet above is written as a canned sequence of commands using the define directive. The define directive, although it bears a semblance to a macro definition in C, actually declares a recursively expanded variable in make. This is the same kind of variable that you declare by assignment using the = (not :=) operator. The only practical and convenient distinction when using define is that the value of the variable can span multiple lines, until it's terminated with endef. This type of variable stores its value verbatim and does not evaluate it in place. Evaluation happens only when the variable is referenced, and in the referencing context. That means the value of the variable is taken in as a mere list of characters provided on the right hand side, and nothing on the right hand side, the $() dereferences in particular, gets evaluated at the point of declaration.
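The deferred evaluation is easy to observe. In this sketch (assuming GNU make; the variable names greeting, snapshot and audience are made up), audience is assigned only after the other two variables are declared:

```shell
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
# Recursively expanded: $(audience) is stored verbatim, not evaluated here.
define greeting
hello:$(audience)
endef

# Simply expanded: $(audience) is evaluated right now, while still empty.
snapshot := hello:$(audience)

audience = world

$(info deferred=[$(greeting)])
$(info immediate=[$(snapshot)])
all: ;
EOF

expansion_out=$(make -s -f "$dir/Makefile")
rm -rf "$dir"

echo "$expansion_out"
# deferred=[hello:world]
# immediate=[hello:]
```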

Then there's the call builtin. Simply put, what the call builtin does is evaluate a variable, but with a twist: it lets us treat the variable as if it were a function (or a macro for that matter), to which we can pass arguments. Arguments passed to the variable are referred to by their position, like $(1), $(2) etc. Once called this way, all the call builtin does is substitute the parameters in the variable's value and evaluate it. I.e. the $() indirections get evaluated. So $(call foo) is exactly the same as $(foo), except that the former lets us pass arguments if we choose to.
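A short sketch of call with and without arguments, assuming GNU make. The variable pair is invented for the illustration:

```shell
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
pair = [$(1)|$(2)]

# With arguments, $(1) and $(2) are filled in by position.
$(info $(call pair,a,b))
# Without arguments, call behaves like a plain reference.
$(info $(call pair))
$(info $(pair))
all: ;
EOF

call_out=$(make -s -f "$dir/Makefile")
rm -rf "$dir"

echo "$call_out"
# [a|b]
# [|]
# [|]
```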

The meat of the snippet has two patsubsts, each substituting a single 0 or 1 in the given variable with an empty string. The and builtin does a logical AND of its arguments, evaluating them left to right and stopping at the first empty one. So if the variable is a boolean, one of the two patsubsts yields an empty string, making that input to and false (empty), and the error never gets evaluated. If, on the other hand, it's not a boolean, neither patsubst yields an empty string, so both inputs to and are true (non-empty), and the error fires to complain.

Finally, we assert that the value stored in a variable is a boolean by calling the assert_boolean variable (or function), and passing it the name of the variable whose value we want to assert, say, my_boolean. Once the call is made, all the $(1)s in assert_boolean are replaced by the string my_boolean. The patsubst builtins are in turn passed the value of my_boolean because of the $() indirection. And in the end, the error builtin will tell us if the value is not a boolean. Woof!
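To try the assertion against a few values, here is a sketch that feeds them in from the command line. It assumes GNU make 3.81 or later (for the and builtin); the variable name flag and the helper check are made up for the test:

```shell
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
define assert_boolean
$(and $(patsubst 0,,$($(1))),$(patsubst 1,,$($(1))),$(error $(1) must be boolean))
endef

$(call assert_boolean,flag)
all: ;
EOF

# Hypothetical helper: run make with flag set to the given value and
# report whether the assertion let it through.
check() {
    if make -s -f "$dir/Makefile" flag="$1" >/dev/null 2>&1; then
        echo ok
    else
        echo rejected
    fi
}

r0=$(check 0)
r1=$(check 1)
rfoo=$(check foo)
rm -rf "$dir"

echo "0: $r0, 1: $r1, foo: $rfoo"   # 0: ok, 1: ok, foo: rejected
```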

So, there. I have it!