[Configuration Management] [Building with Cook]

How make/cook works:
===================

1) Parse Makefile
   - Evaluate expressions
   - instantiate rules
   - conditional preprocessing - include makefile fragments
2) Build dependency graph
3) Walk dependency graph
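
The three phases above can be sketched in a few lines of Python. This is a toy model, not how any real make or cook is implemented: the rule table "rules", the integer timestamps in "mtime" and both function names are made up for illustration.

```python
# Toy model of the parse / build graph / walk graph cycle.
# All names are hypothetical; timestamps are plain integers.

def build_graph(rules):
    """Step 2: map every target to the list of things it depends on."""
    return {target: list(deps) for target, (deps, _recipe) in rules.items()}

def walk(target, rules, graph, mtime, log):
    """Step 3: depth-first walk; rebuild a target if a dependency is newer."""
    for dep in graph.get(target, []):
        walk(dep, rules, graph, mtime, log)
    if target not in rules:
        return  # a plain source file: nothing to do
    deps, recipe = rules[target]
    newest_dep = max((mtime.get(d, 0) for d in deps), default=0)
    if target not in mtime or mtime[target] < newest_dep:
        log.append(recipe)              # stand-in for running the recipe
        mtime[target] = newest_dep + 1  # pretend it was rebuilt just now

# Step 1 would produce this table by parsing the makefile.
rules = {
    "p":      (["p.o", "liba.a"], "link p"),
    "p.o":    (["p.c"], "compile p.c"),
    "liba.a": (["a.o"], "archive liba.a"),
    "a.o":    (["a.c"], "compile a.c"),
}
mtime = {"p.c": 1, "a.c": 1}
log = []
walk("p", rules, build_graph(rules), mtime, log)
print(log)
```

Running the walk a second time with the same tables records nothing, because every target is then newer than its dependencies.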

Presenting cook:

1) Parse Makefile
   - Evaluate expressions
   - instantiate rules
   - instantiate functions
   - allow built-in function to trigger steps 2) and 3) while still doing 1).
   - conditional preprocessing - include makefile fragments which may be built on the fly.

2) Build dependency graph
   - evaluate conditions for selecting which rules to use

3) Walk dependency graph
   - allow pre-op and unconditional post-op code (i.e. code that is executed even when
     the rule's target is up to date)


Source Tree Anatomy:
===================

Classic "Gospel":
----------------

  src   <= may contain a make include file, say "master.mk".
   |    <= may contain a script/makefile to recursively enter subdirectories
   |
   |____lib  <= may contain a script/makefile to recursively enter subdirectories
   |     |
   |     |___a  <= contains a makefile to build liba.a, probably including "master.mk"
   |     |___b  <=  "        "        "         libb.a
   |     |___ ...
   |
   |____bin  <= may contain a script/makefile to recursively enter subdirectories
         |
         |___p  <= contains a makefile to build program "p", probably including "master.mk"
         |___ ...


This approach probably evolved because people start out with some
small subset of the whole and assemble the pieces later. People are
generally fearful of make, and don't want to touch a makefile once
it's working - it's easier to write a script to run every makefile
than to have a global make policy...

Drawbacks of this approach:

 o Make will end up parsing the same rules over and over again. This
   becomes particularly apparent when you actually have the
   centralized "master.mk", which is re-read for every subdirectory.

 o Make will not know of dependencies between "projects". This will
   either lead to the habit of doing complete "clean" rebuilds very
   often or having to run the same build multiple times until
   everything is resolved, or of having to hardwire the traversal
   path into scripts... All of these methods fail to take advantage
   of the inherent capability of make to traverse dependency graphs.

Advantages: none, really, as we will see...

Non-Recursive make:
------------------

  src   <= Contains _the_ Makefile
   |
   |____lib  <= no script or makefile here
   |     |
   |     |___a  <= contains a fragment included by _the_ Makefile upstairs
   |     |___b  <=  "        "        "         "
   |     |___ ...
   |
   |____bin  <= no script or makefile here 
         |
         |___p  <= contains a fragment included by _the_ Makefile upstairs
         |___ ...

In this approach, all rules, files and dependencies are expressed
relative to the top of the source tree.

Advantages:
 
 o The Makefile is parsed only once. The dependency graph is built
   once and traversed once. This alone can greatly reduce the time
   to do a complete build.

 o Make now sees the complete picture. This not only allows you to
   express the dependencies between components, but it also gives
   make more freedom to schedule parallel builds.

 o It is now much more natural to encourage a global policy for the
   structure of the source tree. The local makefile fragments now
   only contain info that differentiates, and no common preamble or
   other "bureaucratic junk" is required. Ideally, no local fragments
   are needed at all...

Drawbacks:

 o Some care must be exercised in writing that top level
   makefile. Some make programs might not support the right
   constructs to deal with the issues - which is why I use
   "cook"... (see below)

Some Useful "cook" idioms
=========================

"cook" is a rewritten "make". The syntax has been cleaned up and a
couple of crucial features added:

Pattern rules in make:
----------------------

The typical pattern rule in make looks like this:

  %.o: %.c
         $(CC) -o $@ $< -c

In some makes, % can match a complete path, in others it can't. If you
need to decompose the path, you are forced to use complex macros, and
even then you can rarely get what you want.

Pattern rules in cook:
----------------------

The same rule in cook (assuming the source tree structure above),
looks somewhat like this:

  src/%1/%2/%.o: src/%1/%2/%.c
  {
     [cc] -o [target] src/%1/%2/%.c -c
  }

Note how I can decompose my path using multiple wildcards. Also note
how I can use the wildcard in the body of the rule. This decomposition
allows me, for example, to write rules where the derived objects are
not in the same directory as the sources, like this:

  src/%1/%2/DO/%.o: src/%1/%2/%.c
  {
     [cc] -o [target] src/%1/%2/%.c -c
  }

"cook" has a special wildcard (%0) which sort of works like the clearcase
"..." wildcard. I can therefore have a policy where the sources for a
particular binary are distributed over a subtree:

  src/%1/%2/DO/%0%.o: src/%1/%2/%0%.c
  {
     [cc] -o [target] src/%1/%2/%0%.c -c
  }

%0 is either the empty string, or a path terminated with a "/".
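
To make the wildcard decomposition concrete, here is a small Python sketch that translates such a pattern into a regular expression and then fills in the ingredient pattern. It is only an approximation under stated assumptions: %1..%9 and % are each taken to match exactly one path component, %0 matches the empty string or directories ending in "/", and the function names are invented for this example. cook's real matcher may differ in detail.

```python
import re

def cook_pattern_to_regex(pattern):
    """Translate a cook-style pattern into a compiled regex (simplified)."""
    parts, seen, i = [], set(), 0
    while i < len(pattern):
        if pattern[i] != "%":
            parts.append(re.escape(pattern[i]))
            i += 1
            continue
        if i + 1 < len(pattern) and pattern[i + 1].isdigit():
            name, i = "w" + pattern[i + 1], i + 2   # %0 .. %9
        else:
            name, i = "stem", i + 1                 # plain %
        if name in seen:
            parts.append("(?P=%s)" % name)          # must repeat same text
        elif name == "w0":
            parts.append("(?P<w0>(?:[^/]+/)*)")     # "" or "dir/dir/"
        else:
            parts.append("(?P<%s>[^/]+)" % name)    # one path component
        seen.add(name)
    return re.compile("".join(parts) + r"\Z")

def expand(template, groups):
    """Fill the wildcards of `template` with the captured values."""
    def repl(m):
        return groups["w" + m.group(1)] if m.group(1) else groups["stem"]
    return re.sub(r"%(\d?)", repl, template)

target_re = cook_pattern_to_regex("src/%1/%2/DO/%0%.o")
m = target_re.match("src/lib/a/DO/x/y/foo.o")
print(expand("src/%1/%2/%0%.c", m.groupdict()))
```

The same pair of functions also handles the case where %0 is empty, for example a target of "src/bin/p/DO/main.o".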

Variables and Namespace in make:
-------------------------------

In make, you can assign values to names via := or =. I do not know
which makes allow which set of characters to be part of a name, but I
would expect that "/" may not be in a name.

Depending on your make variant, the following may be legal, or maybe
not:

  $(OBJ_IN_$(DIR))

It is also not clear when the resolution takes place (which is why
there are two assignment operators), in particular when variables are
used in a rule like this:

  $(DIR)/libx.a: $(OBJ_IN_$(DIR))

Variables and Namespace in cook:
-------------------------------

In cook, any character may be used in a name. This allows you to use pathnames
in variable names, so that you can say things like:

obj_in_src/lib/a = src/lib/DO/a_1.o src/lib/DO/a_2.o ...;

and you can have a pattern rule like this:

src/lib/%/DO/lib%.a: [obj_in_src/lib/%]
{
  [ar] [target] [obj_in_src/lib/%];
}

Note how the right side of the rule is evaluated after the left side
is instantiated.
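
The order of evaluation can be mimicked in Python with a dictionary whose keys contain path names. This is only an analogy: the table "variables", the regex "LIB_RULE" and the function "ingredients_for" are hypothetical names, and the object paths are copied from the assignment above.

```python
import re

# Variables are a mapping from names to word lists; because any character
# is allowed in a name, a path can be part of the key.
variables = {
    "obj_in_src/lib/a": ["src/lib/DO/a_1.o", "src/lib/DO/a_2.o"],
    "obj_in_src/lib/b": ["src/lib/DO/b_1.o"],
}

# Stand-in for the left side "src/lib/%/DO/lib%.a".
LIB_RULE = re.compile(r"src/lib/(?P<stem>[^/]+)/DO/lib(?P=stem)\.a\Z")

def ingredients_for(target):
    """Instantiate the left side first, then evaluate the right side."""
    m = LIB_RULE.match(target)
    if m is None:
        return None  # the rule does not apply to this target
    # Only now is the variable name known, so only now can it be looked up.
    return variables["obj_in_src/lib/" + m.group("stem")]

print(ingredients_for("src/lib/a/DO/liba.a"))
```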

Building and Including Makefile Fragments:
-----------------------------------------

Most makes nowadays support an include statement. Unfortunately,
very few makes (GNU make being one) know how to build an include file
before including it, and only cook gives you control over the timing
of that build.

GNU make's (and cook's default) process for dealing with built include
files is as follows:


  repeat
    initialize (forgetting everything from any previous passes);
    parse makefile, including all fragments if they exist;
    build dependency graph;
    walk dependency graph for the included fragments;
  until no fragment needed to be rebuilt.
  build main dependency graph;
  walk main dependency graph;

The problem with this process is that it will include fragments
_before_ checking whether they are up to date. This may lead to failures
when the information in those fragments is incorrect, and may even
lead to build cycles...

Fortunately, "cook" has a function that can be used to trigger the
building and walking of the dependency graph. Therefore, you can say:

  #if [cook [fragments]]
  #include [fragments]
  #endif

Currently, the [cook ...] function just returns success or
failure. Soon, it will actually return the list of targets it was able
to build, so you can simply say:

  #include [cook [fragments]]

There actually is a neat trick you can use in "dumb" makes to achieve
a similar goal:

  all: $(FRAGMENTS)
       cat step2_preamble $(FRAGMENTS) > Makefile_step2
       $(MAKE) -f Makefile_step2

Recursive make isn't totally useless...

Auto-discovery - or - How to Avoid Writing Makefiles
===================================================

There is this gospel about how makefiles "are part of the source" and
"document" the dependencies in the source. This is nice in theory. In
practice, nobody really wants to read makefiles and use them as the
source of wisdom. More often than not, having to duplicate information
which is readily available elsewhere (e.g. in the structure of the
source tree, in the #include's of the source files etc...) is tedious
and error prone.

Then there is the problem of differing styles and abilities between
developers. While some will write good makefiles, most will use a
"good enough" approach.

Finally, the most important dependencies, i.e. the global dependencies
between the work of different developers, often remain unwritten.

Six Steps to a Complete Dependency Graph:
----------------------------------------

This system assumes a C/C++ style code base. Most other languages end
up being simpler...

1) Build File Manifests
2) Determine the DO's from the file manifests
3) Include Fragments in subdirectories for customization
4) Collect #include dependencies
5) Collect library dependencies
6) Determine link lines

Step 1: Build File Manifests
----------------------------

This is by far the most important step. Once you know the complete
path to all your source files, you can compute the paths to your
DO's (derived objects), and you can compute which set of DO's get
assembled into a particular library or executable.

The easiest way to get the file manifest is to run "find". This is,
unfortunately, also the slowest. I wish that "cleartool find" would
use some indexing scheme so that it could run faster, but nothing
really beats reading from a flat file - so let's use make-errr cook to
build these manifest files.

The basic idea is this:

  /* This is part of my main makefile */

  %0DO/Whatto.cook: %0.
  {
     [perl] generate_manifest.pl %0. > [target];
  }

  #include [cook DO/Whatto.cook]

  /* I now have a variable called "files_under_." to play with, e.g.: */
  cpp_src = [match_mask %0%.cpp [files_under_.]];
  ...

The generate_manifest.pl script will look into the directory passed in
as argument and create output that looks like this (the following is
an original, taken directly from my source tree):

  files_in_./ =
  /*** BEGIN FILES ***/
    'install'
    'release.notes'
    'Howto.cook'
  /*** END FILES ***/
    ;
   
  dirs_in_./ =
  /*** BEGIN DIRS ***/
    'src'
    'httpd'
    'Tests'
  /*** END DIRS ***/
    ;
   
  #if [count [dirs_in_./]]
  manifest_files_in_./ = [addsuffix /DO/dynamic/SunOS5.6/Whatto.cook [dirs_in_./]];
  #if [not [cook [manifest_files_in_./]]]
  fail;
  #endif
  #include [manifest_files_in_./]
  #endif
   
  /* collect the work done by the manifests included above. */
  files_under_./ = [files_in_./]
    [files_under_src/]
    [files_under_httpd/]
    [files_under_Tests/]
    ;
  dirs_under_./ = [dirs_in_./]
    [dirs_under_src/]
    [dirs_under_httpd/]
    [dirs_under_Tests/]
    ;

Note how it will recursively descend the source tree, collecting all
the files and directory names on the way back up...
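
The "collect on the way back up" idea can be sketched in Python as follows. This is not generate_manifest.pl: it computes everything in one process instead of writing one Whatto.cook per directory, it descends into every subdirectory rather than only those with a DO subtree, and the function name "manifest" is an assumption of this sketch.

```python
import os

def manifest(top):
    """Return (files_under, dirs_under) for `top`, relative to `top`."""
    files_in, dirs_in = [], []
    for name in sorted(os.listdir(top)):
        if os.path.isdir(os.path.join(top, name)):
            dirs_in.append(name)
        else:
            files_in.append(name)

    # Start with what lives directly in this directory ...
    files_under, dirs_under = list(files_in), list(dirs_in)
    for d in dirs_in:
        sub_files, sub_dirs = manifest(os.path.join(top, d))
        # ... and add the work done by the sub-manifests on the way back up.
        files_under += [d + "/" + f for f in sub_files]
        dirs_under += [d + "/" + s for s in sub_dirs]
    return files_under, dirs_under

if __name__ == "__main__":
    files, dirs = manifest(".")
    print(len(files), "files in", len(dirs), "directories")
```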

Note also that I actually use a 2 level DO tree structure, so that I
can support multiple variants and architectures in the same
tree. Actually, the DO subtree is used in two different ways:

  1) It provides the skeleton which will be populated by the DO's
  2) It provides a means for determining whether the directory should
     even be "seen" by the build system. Obviously, if the appropriate
     branch of the DO subtree doesn't exist, then nothing should be
     built there.

This allows me to fine-tune which directories a build will actually
traverse... Another way for me to determine this is simply by
restricting the tree from the command line - what the build doesn't
see doesn't get built - it's that easy, and here is how I actually did
it in my real life top level "Howto.cook" (i.e. makefile) file:

  /* you can restrict the vision of "cook" by saying:
   *
   *     cook top=<dir1>,<dir2>,<dir3>...
   */
  
  #ifndef top
  top = . ; /* if unspecified, start at the top of the source tree */
  #endif
  
  tops = [split , [top]];
  tops = [fromto %0%/ %0% [tops]]; /* remove trailing / */
  
  manifest_files = [addsuffix /[objdir]/Whatto.cook [tops]];
  
  /*--- manifest anywhere ----------------------------------------*/
  #include [build]/rules/manifest
  
  #if [print Checking Manifests]
  #endif
  #if [not [cook [manifest_files]]]
  fail;
  #endif
  
  #include [manifest_files]

Caveats:
-------

Manifests may end up being out of date. I haven't yet figured out the
exact dependencies I need to put into the rule to avoid this. One
problem is that some of the dependencies may only be known _after_ I
read the manifest - or I need to generate special manifest dependency
files...

It turns out that it is easier to write a script that will traverse
the source tree and analyze the manifest files, deleting those which
are incorrect - that is the reason for those /*** BEGIN... ***/
comments in the manifest example above.

This script needs to be run whenever people change the structure of
the DO trees, augmenting or reducing visibility...
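
A checker of this kind might look roughly like the Python sketch below. It is not the original script: it only compares the names between the BEGIN/END marker comments with the current directory listing, and it assumes that a directory called "DO" is never listed in a manifest (the "ignore" parameter). Both function names are hypothetical.

```python
import os
import re

def listed(text, kind):
    """Names quoted between the BEGIN/END marker comments for `kind`."""
    m = re.search(r"/\*\*\* BEGIN %s \*\*\*/(.*?)/\*\*\* END %s \*\*\*/"
                  % (kind, kind), text, re.S)
    return sorted(re.findall(r"'([^']*)'", m.group(1))) if m else None

def manifest_is_current(manifest_path, directory, ignore=("DO",)):
    """True if the manifest lists exactly what is in `directory` now."""
    with open(manifest_path) as f:
        text = f.read()
    names = [n for n in os.listdir(directory) if n not in ignore]
    is_dir = lambda n: os.path.isdir(os.path.join(directory, n))
    files = sorted(n for n in names if not is_dir(n))
    dirs = sorted(n for n in names if is_dir(n))
    return listed(text, "FILES") == files and listed(text, "DIRS") == dirs
```

A driver would walk the tree and remove every manifest for which manifest_is_current() returns False, so that the next build regenerates it.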

Step 2: Determine DO pathnames:
------------------------------

This is really nothing more than an extension of your good ol'

  OBJ = $(SRC:.c=.o)  # or whatever your make accepts

I'll just dump an extract of my top level Howto.cook file here. You'll
get the idea (I hope):

  /*--- get the real source files --------------------------------*/
  all_c_src       = [match_mask [src]/%2/%3/%0%[c_suffix]       [all_files]];
  all_cpp_src     = [match_mask [src]/%2/%3/%0%[cpp_suffix]     [all_files]];
  all_java_src    = [match_mask [java]/%3/%0%[java_suffix]      [all_files]];
  all_class_src   = [match_mask [java]/%3/%0%[class_suffix]     [all_files]];
  all_bison_src   = [match_mask [src]/%2/%3/%0%[bison_suffix]   [all_files]];
  all_bisonpp_src = [match_mask [src]/%2/%3/%0%[bisonpp_suffix] [all_files]];
  all_flex_src    = [match_mask [src]/%2/%3/%0%[flex_suffix]    [all_files]];
  all_esql_src    = [match_mask [src]/%2/%3/%0%[esql_suffix]    [all_files]];
  all_h_src       = [match_mask [src]/%2/%3/%0%[header_suffix]  [all_files]];
  
  /*--- get the names of the derived header files ----------------*/
  all_bison_h   = [fromto [src]/%2/%3/%0%[bison_suffix]
                          [src]/%2/%3/%0[objdir]/%[bison_infix][header_suffix] [all_bison_src]];
  all_bisonpp_h = [fromto [src]/%2/%3/%0%[bisonpp_suffix]
                          [src]/%2/%3/%0[objdir]/%[bison_infix][header_suffix] [all_bisonpp_src]];
  all_gen_h     = [all_bison_h] [all_bisonpp_h];
  
  /*--- get the names of the object files ------------------------*/
  all_c_obj     = [fromto [src]/%2/%3/%0%[c_suffix]
                          [src]/%2/%3/%0[objdir]/%[obj_suffix]     [all_c_src]];
  all_cpp_obj   = [fromto [src]/%2/%3/%0%[cpp_suffix]
                          [src]/%2/%3/%0[objdir]/%[obj_suffix]     [all_cpp_src]];
  all_flex_obj  = [fromto [src]/%2/%3/%0%[flex_suffix]
                          [src]/%2/%3/%0[objdir]/[flex_prefix]%[obj_suffix] [all_flex_src]];
  all_bison_obj = [fromto [src]/%2/%3/%0%[bison_suffix]
                          [src]/%2/%3/%0[objdir]/%[bison_infix][obj_suffix] [all_bison_src]];
  all_bisonpp_obj = [fromto [src]/%2/%3/%0%[bisonpp_suffix]
                            [src]/%2/%3/%0[objdir]/%[bison_infix][obj_suffix] [all_bisonpp_src]];
  all_esql_obj  = [fromto [src]/%2/%3/%0%.pc
                          [src]/%2/%3/%0[objdir]/%[obj_suffix]     [all_esql_src]];
  all_obj = [all_c_obj]
            [all_cpp_obj]
            [all_flex_obj]
            [all_bison_obj]
            [all_bisonpp_obj]
            [all_esql_obj];

  etc... etc...
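
Stripped of the wildcards, each of these "fromto" calls does one thing: it moves a file into the object directory next to it and swaps the suffix. A Python sketch of that mapping, with hypothetical parameter names and "DO" assumed as the object directory, could read as follows. Unlike the patterns above, it does not check that the file sits at least two levels below the source root.

```python
import posixpath

def object_path(source, objdir="DO", src_suffix=".c", obj_suffix=".o"):
    """Map dir/name.c to dir/<objdir>/name.o; None for other files."""
    if not source.endswith(src_suffix):
        return None
    directory, name = posixpath.split(source)
    stem = name[: -len(src_suffix)]
    return posixpath.join(directory, objdir, stem + obj_suffix)

all_c_src = ["src/lib/a/x.c", "src/lib/a/sub/y.c"]
all_c_obj = [object_path(s) for s in all_c_src]
print(all_c_obj)
```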

I then determine the top level targets like this:

  /*--- determine top level targets  -----------------------------*/
  all_build_dirs = [match_mask [src]/%2/%3 [tops] [all_dirs]];
  all_lib_dirs = [match_mask [lib]/%3 [all_build_dirs]];
  remaining_build_dirs = [filter_out [lib]/%3 [all_build_dirs]];
  all_java_dirs = [match_mask [java]/%3 [remaining_build_dirs]];
  all_bin_dirs = [filter_out [java]/%3 [remaining_build_dirs]];
  
  function make_lib_target /* lib_dir */ =
  {
    return [fromto [lib]/%3
                   [lib]/%3/[objdir]/[lib_prefix]%3[defined-or-default
                     lib_suffix_[@1]/ [lib_suffix]]
                   [@1]];
  }
  all_lib_targets = [repeat make_lib_target [all_lib_dirs]];

  etc... etc...  

Collecting the set of objects which make up a library or a program
works like this:

  /*--- bunch together depends and objects by top level targets --*/
  dirs = [all_build_dirs];
  loop {
    if [not [count [dirs]]] then loopstop;
    dir = [head [dirs]];
    dirs = [tail [dirs]];
    objects_in_[dir] = [match_mask [dir]/%0[objdir]/%[obj_suffix]    [all_obj]];
  }
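
In a language with dictionaries the same grouping is a one-liner. The sketch below uses a plain prefix test, which is looser than the match_mask pattern in the loop above, and the function name is made up.

```python
def objects_by_dir(build_dirs, all_obj):
    """For every build directory, collect the objects that live below it."""
    return {d: [o for o in all_obj if o.startswith(d + "/")]
            for d in build_dirs}

groups = objects_by_dir(
    ["src/lib/a", "src/bin/p"],
    ["src/lib/a/DO/x.o", "src/lib/a/sub/DO/y.o", "src/bin/p/DO/main.o"])
print(groups["src/lib/a"])
```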

I admit the loop construct is a little awkward, which is why I used
the "repeat" function above, which looks like this
(just to show off some of cook's capabilities):

  /*--- emulate a foreach loop -----------------------------------*/
  function repeat =
  {
	@2 = ;
	@3 = [tail [arg]];
	loop
	{
		if [not [count [@3]]] then
			loopstop;
		@4 = [head [@3]];
		@3 = [tail [@3]];
		/* run the named function on this argument */
		@2 = [@2] [[@1] [@4]];
	}
	return [@2];
  }

Step 3: Do local customizations:
-------------------------------

For this, we use the namespace trick again.

In a fragment stored as, say, "src/lib/a/Howto.cook", we simply say:

  cpp_flags_in_src/lib/a = -g /* or whatever */;

and the compile rule does something like:

  src/%1/%2/DO/%.o: src/%1/%2/%.cpp
  {
    [cpp] [cpp_flags_in_src/%1/%2] -o [target] src/%1/%2/%.cpp;
  }

The only slight drawback of this method is that the directory name is
hard coded into the assignment, and needs to be changed if we move
that directory. Therefore, I run "src/lib/a/Howto.cook" through a
little perl script which appends the directory name to the end of
every variable name that is assigned a value. So my original
"src/lib/a/Howto.cook" would simply say:

  cpp_flags = -g;

and my script would "localize" this to:

  cpp_flags_in_src/lib/a = -g;
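
The core of such a localization step fits in a few lines. The sketch below is in Python rather than perl and is not the original script: it only recognizes assignments that start at the beginning of a line, and the names "ASSIGNMENT" and "localize" are invented here.

```python
import re

# A variable name at the start of a line, followed by "=".
ASSIGNMENT = re.compile(r"^(\s*)([A-Za-z_][\w.-]*)(\s*=)", re.M)

def localize(text, directory):
    """Append "_in_<directory>" to every variable that is assigned a value."""
    return ASSIGNMENT.sub(
        lambda m: m.group(1) + m.group(2) + "_in_" + directory + m.group(3),
        text)

print(localize("cpp_flags = -g;\n", "src/lib/a"), end="")
```

Lines without an assignment pass through unchanged, which leaves room for a separate pass that handles other statements.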

Most customizations are themselves pretty standard, since they often
deal with well described problems, for example: "I want to link to
oracle" or "I want to use Roguewave" or something...

Having the localization script allows me to define shortcuts for this, so
that the actual "src/lib/a/Howto.cook" may simply contain something like:

  use roguewave;  /* add settings to use roguewave */
  use dll;        /* add settings to build a shared object */

The details of this may be interesting, but will somewhat exceed the
scope of this presentation. Here's a hint: This is what my "roguewave"
module looks like:

  rw_dir  = [thirdparty]/[unix]/new_rogue;
  rw_external_lib = [rw_dir]/lib/libtls4d.so;
  rw_cpp_I_flags = -I[rw_dir];

("use rw;" => hack off the rw_, glue on the directory and dump it)

The bottom line is that I use the name space trick to customize build
rules as a function of where they are used.

Step 4: Determine include dependencies:
--------------------------------------

"makedepend" is an old hat. Nevertheless it can be immensely improved upon.

The classic "makedepend" will parse the output of the C/C++
preprocessor and track down which files got included. This is
extremely wasteful, especially in "modern" C++ programs which tend to
include thousands of header files, often the same ones over and over
again for every single .cpp file. It often seems to me that the
makedepend done this way is slower than the compile.

Cook has a feature it stole from "jam", another excellent make-like
product (which is unfortunately hardwired for building C/C++, so...),
called the "cascade dependency".

When you say, for example:

  cascade x.c = x.h;

You are in fact saying that "everything which needs x.c also needs
x.h". Therefore, when you write a dependency like:

  x.o: x.c;

"cook" will automatically generate the dependency:

  x.o: x.h;

This is great news because now, instead of running the preprocessor,
we can run a specialized little C program which will only scan a file
for #include's, but not actually include them. We use this to generate
a cascade which essentially says: "if you need me, you will need all
the files I'm including".
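
A rough Python sketch of both halves follows: a superficial scanner that only looks for #include lines, and the expansion that a cascade table implies. It is not cook's implementation and not the C program mentioned above. The names "scan_includes" and "needs" are hypothetical, and include paths are used as written, without searching any -I directories.

```python
import re

INCLUDE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]', re.M)

def scan_includes(text):
    """Superficial scan: list the names in #include lines, nothing more."""
    return INCLUDE.findall(text)

def needs(direct_deps, cascade):
    """Expand direct dependencies with everything the cascades drag in."""
    result, todo = [], list(direct_deps)
    while todo:
        f = todo.pop(0)
        if f in result:
            continue            # already handled (also guards against loops)
        result.append(f)
        todo.extend(cascade.get(f, []))
    return result

sources = {
    "x.c": '#include "x.h"\nint main(void) { return 0; }\n',
    "x.h": '#include "y.h"\n#  include <z.h>\n',
    "y.h": "",
}
cascade = {name: scan_includes(text) for name, text in sources.items()}
print(needs(["x.c"], cascade))
```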

If I generate such a cascade for every file (.c _and_ .h), I end up
with the same dependency graph, only that I used the extremely fast
"superficial" scans.

Two problems with the "superficial scan":

1) #ifdef OPTION
   #include <x>
   #else
   #include <y>
   #endif

2) the header file doesn't exist.

Let's do (2) first, because it will help solve (1).

If a header file doesn't exist, it's either because it needs to be
generated, or because it is probably never actually included, as in (1). In the
latter case, it's totally OK to ignore the dependency. In the former
case, well, we need to put in the dependency so that we trigger the
generation of the header file. Since we already computed all DO paths,
we simply pass a list of to-be-generated header files to the
superficial scan program, thereby telling it to register these
dependencies. All others will be ignored.

This also solves half of (1). If a header file is included but doesn't
exist, then we'll get a compile error anyway, so not having a
dependency on it doesn't hurt. If the header file _does_ exist, then
the worst that will happen is that DO's depending on the file
containing the reference will be rebuilt unnecessarily. In practice,
this is a negligible penalty compared to the huge gain in speed from
the superficial scan method.
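
The decision of which includes to register can be written down as a small Python function. This is a sketch under assumptions: the include directories are searched in order, "existing" and "generated" are sets of paths relative to the top of the tree, and the example paths are invented.

```python
def resolve(include, include_dirs, existing, generated):
    """Return the path a dependency should be registered on, or None."""
    for d in include_dirs:
        candidate = d + "/" + include
        if candidate in existing or candidate in generated:
            return candidate
    return None  # neither present nor to be generated: ignore the include

include_dirs = ["src/lib", "src/bin"]
existing = {"src/lib/a/a.h"}
generated = {"src/lib/b/DO/parse.tab.h"}
print(resolve("a/a.h", include_dirs, existing, generated))
```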

Since I know of no "make" that has cascade dependencies, make users
will have to bite the bullet and use the preprocessor...

There is another subtle interaction between generated header files and
the possibility of operating within a "restricted view" of the source
tree. The problem happens when a generated header file is included
from a different directory - e.g. the generated header file is a
"published" API to a library. If you do a restricted build of the
client of such an API, then you will not be able to know that the
header file is a generated header file, since your build doesn't "see"
the source of that header file. The consequence is that the
superficial scanner will conclude that the header file just doesn't
exist, and since it's not on the list of generated header files, will
fail to create a dependency.

This doesn't immediately pose a problem, but if at a later stage you
do a build with full visibility, the faulty makefile fragment may not
be regenerated...

The solution to this problem is to introduce the "docking header file"
whenever you have a generated header file used outside of the
directory in which it lives. This docking header file will always
exist, thereby ensuring that the outside makefile fragments will have
a dependency to it, while the inside makefile fragment will take care
of the dependency to the generated header file. This is how to do it:

  #define cook_str(s)  # s
  #define cook_string(s) cook_str(s)

  #include cook_string(OBJDIR/__FILE__)

If you compile this with -DOBJDIR=DO/dynamic/SunOS5.5, the
preprocessor will do the right thing - and many thanks to the $#%^%$
ANSI committee for that #$%#$%^ "feature"...

... Obviously, the generated header file will end up in the DO
subtree, right where it belongs...

Step 5: Library dependencies:
----------------------------

In my world, we compile with "-Isrc/lib -Isrc/bin", which means that
developers will need to say "#include <library/header.h>" - which,
besides helping to reduce naming conflicts also neatly documents which
library is being used.

Therefore, I can simply scan through all the makefile fragments
generated by the superficial scanner, lop off the "src/lib" from the
beginning of every dependency, and conclude that the next entry in the
path is the name of a library being used by that file.
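
In Python, lopping off the prefix and taking the next path entry might look like this. The function name and the "src/lib/" default are assumptions of the sketch.

```python
def libraries_used(dependencies, lib_root="src/lib/"):
    """Return the library names implied by a list of header dependencies."""
    libs = []
    for dep in dependencies:
        if not dep.startswith(lib_root):
            continue  # not a library header, e.g. something under src/bin
        name = dep[len(lib_root):].split("/", 1)[0]
        if name not in libs:
            libs.append(name)
    return libs

print(libraries_used(["src/lib/a/a.h", "src/lib/b/sub/b.h",
                      "src/bin/p/p.h", "src/lib/a/x.h"]))
```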

Obviously, this assumes that libraries are named after the directory
that contains their source code - but anything else would be cruel and
unusual punishment of the build Meister...

Again, I can use the cascade technique for the dependencies of one
library to another.

For the library dependencies of a program, I use "real" dependencies,
since the program is a real build target which needs to be re-linked
if (static) libraries change.

One note about cyclic dependencies between libraries. In my world,
those are not allowed to exist. The easiest way to get rid of cyclic
dependencies is to make a big library out of all the components of the
cycle. The best way is to encourage developers to reorganize the
source tree whenever it becomes necessary (e.g. to avoid a cycle), and
to support them in the process. Shops which are scared to branch
directories need not apply.

Step 6: Link lines:
------------------

Almost there... I find it amazing that 9 months before the year 2000,
we still don't know how to write a two-pass linker. At least we don't
need to do "lorder | tsort" anymore... tsort hmmm.... well there is
still some life left in that little utility.

"cook" does provide a variable called "need" whenever it executes the
body of a rule. This variable contains the list of all dependencies of
the current target. Unfortunately it doesn't have a "pairs" or a
"tsort" function which would sort the entries in the "need" variable -
therefore I need to do this "by hand".

I use a script that will read the library dependency fragment of the
program I want to link.  This script will recursively descend through
all the other library dependency fragments, generating the dependency
pairs in such a way that I can feed them into the "tsort" utility. The
output of that utility is the link line.
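
The same ordering can be sketched in Python with the standard library's graphlib module (Python 3.9 or later) instead of the external "tsort" utility. The function name "link_order" and the in-memory table "lib_deps" stand in for the library dependency fragments and are assumptions of this sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def link_order(program_libs, lib_deps):
    """Order libraries so that each one precedes the libraries it uses."""
    graph, todo = {}, list(program_libs)
    while todo:  # descend through the library dependency tables
        lib = todo.pop()
        if lib not in graph:
            graph[lib] = set(lib_deps.get(lib, ()))
            todo.extend(graph[lib])
    # static_order() yields dependencies first; a one-pass linker wants
    # them last, so reverse the result.
    return list(reversed(list(TopologicalSorter(graph).static_order())))

lib_deps = {"a": ["b"], "b": ["c"], "c": []}
print(" ".join("-l" + lib for lib in link_order(["a"], lib_deps)))
```

A cycle between libraries makes static_order() raise graphlib.CycleError, which fits a policy under which such cycles are not allowed to exist.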

Grand Finale:
------------

So, what does my "all:" target look like? Here's from my original
Howto.cook file again:

  /*--- if there are not executables, do libraries or objects ----*/
  if [count [all_bin_targets]] then {
    all: [all_bin_targets];
  } else if [count [all_lib_targets]] then {
    all: [all_lib_targets];
  } else {
    all: [all_obj];
  }

References:
===========

http://www.tip.net.au/~millerp/rmch/recu-make-cons-harm.html
   Recursive Make Considered Harmful

http://www.tip.net.au/~millerp/cook.html
   Dependency Maintenance Tool

Christian Goetze
Last modified: Fri Feb 19 16:43:04 PST 1999