8. Linking

It's important to understand a little about the ML assembler in order to see what the linker does. Probably the easiest way to think of the ML assembler is as a tool that turns your assembler source code into literal strings of bytes, just like an ASCII message. The difference is that each string is built by ML from its knowledge of how each instruction needs to be coded in binary form (machine code). But rather than making each instruction a separate string of its own, ML turns a whole sequence of instructions into a single literal string.

How does ML know when to start or stop a string, then? The simple answer is: whenever you change memory segments. When you enter a line with the .code directive, the assembler starts a new literal string. If you then give the .data directive, the assembler terminates the earlier string and starts a new one. These strings go into the object file I mentioned earlier. It's the job of the linker to read these object files and put these strings together into something meaningful when building the final program.
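As a rough illustration of that idea, here is a toy model in Python. It is only a sketch of the description above, not of how ML is really implemented: the function name split_into_chunks and the list-of-lines input format are made up for this example, and the instruction bytes are supplied already encoded.

```python
# Toy model only: real ML output is an OMF object file, not a Python list.
def split_into_chunks(lines):
    """Collect bytes into one chunk per run of lines in the same segment."""
    chunks = []                      # list of (segment directive, bytearray)
    current = None
    for line in lines:
        if isinstance(line, str):    # a directive such as ".code" or ".data"
            current = (line, bytearray())
            chunks.append(current)   # a directive always starts a new chunk
        else:                        # already-encoded bytes for one source line
            if current is None:
                raise ValueError("bytes seen before any segment directive")
            current[1].extend(line)
    return chunks

source = [
    ".code",
    bytes([0xB4, 0x09]),             # mov ah, 9
    bytes([0xCD, 0x21]),             # int 21h
    ".data",
    b"Hi$",
    ".code",
    bytes([0xB4, 0x4C]),             # mov ah, 4Ch
]
chunks = split_into_chunks(source)
for name, data in chunks:
    print(name, data.hex(" "))
```

Notice that going back to .code the second time starts a third chunk rather than reopening the first one. Joining chunks that belong to the same segment is left to the linker.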

It's now time to discuss a little of what the linker actually does for us. Understanding this will help you write your programs well.

Linking with LINK

One of the tasks the linker handles is collecting up these strings that the ML assembler has written into the object file(s) and putting them into some order. For this part of its task, you can imagine that the linker has a tablet of blank pages it can write on. When it starts up, there is nothing on them. Just empty pages. Each of these pages can be labeled at the top with the name of a memory segment. No two pages can have the same name. Each page is "very long" so it can hold everything the linker needs to put there.

The linker then grabs up the first object file mentioned (if there is more than one, it often doesn't matter what order they are given) and starts to read through it. Each string is labeled with a memory segment name within which it belongs. The linker searches the pages it has and sees if that memory segment is already mentioned at the top of any one of them. If not, the linker selects the next available blank page and heads it with the name of this newly discovered segment, placing the name at the top where it can be easily found. Then, the linker simply appends the literal string of bytes it found in the object file on this page. Think of it as a kind of accountant's page, with ruled lines -- one per byte. The linker (acting as an accountant) simply takes each byte in sequence and places its value on the next line, numbering each line as it goes. The first line with the first byte is numbered 0. It continues this process as it reads the first object file. Then it closes that file and opens the next one and repeats this for each string there, too.
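The page-filling process just described can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each object file is represented as a plain list of (segment name, bytes) pairs, and link_pages is a hypothetical name chosen for this example. It ignores alignment, fixups, and everything else a real linker has to deal with.

```python
def link_pages(object_files):
    """Append every chunk to the page named after its segment."""
    pages = {}                       # segment name -> bytearray ("page")
    for chunks in object_files:      # read the object files in the order given
        for segment, data in chunks:
            # Head a fresh blank page with this name if we have not seen it yet.
            page = pages.setdefault(segment, bytearray())
            page.extend(data)        # next free line on the page, counted from 0
    return pages

# Two made-up object files, each holding chunks for both segments.
obj1 = [("_TEXT", b"\xB4\x09\xCD\x21"), ("_DATA", b"Hello")]
obj2 = [("_DATA", b" out"), ("_TEXT", b"\xB4\x4C\xCD\x21")]

pages = link_pages([obj1, obj2])
for name, page in pages.items():
    print(name, len(page), page.hex(" "))
```

The line number the accountant writes next to each byte is simply that byte's index within the page, which is why the first byte ends up at offset 0.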

And so it goes. In the end, the linker will have one or more of these partly filled pages at hand. At that point, the linker simply writes each of these pages, one at a time, to the final executable program file. There are, however, some other details that complicate the linker's life a little.

For example, sometimes one of those literal strings in the object file will also have a special notation telling the linker that it can't simply be appended to the memory segment, but has to be placed exactly on that page, starting at a given row. In cases like this, the linker may need to keep track of an empty 'hole' on the page. It can fill this hole, if it wants, with other strings that don't have to be placed in an exact spot and are short enough to fit into that hole.
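Here is a minimal Python sketch of exact placement. The helper name place_at and the choice of zero as the filler byte are assumptions made for this example. The numbers are borrowed from the lesson02.obj printout later in this section, where the code string is 0Bh bytes long and must start at offset 0100h. The sketch does not try to reuse the hole for other strings.

```python
def place_at(page, offset, data, fill=0x00):
    """Write data at an exact offset, padding the page with fill bytes if needed."""
    end = offset + len(data)
    if len(page) < end:
        # Grow the page; anything between the old end and `offset` is the hole.
        page.extend(bytes([fill]) * (end - len(page)))
    page[offset:end] = data
    return page

code = bytes.fromhex("BA 00 00 B4 09 CD 21 B4 4C CD 21")
page = place_at(bytearray(), 0x100, code)

print(hex(len(page)))        # total length of the page after placement
print(page[0x100:].hex(" ")) # the code, sitting right after the hole
```

The page ends up 010Bh bytes long (0100h bytes of hole plus 0Bh bytes of code), which agrees with the length recorded for _TEXT in that printout.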

Another complication happens when one of these literal strings isn't quite so... literal. It might have a spot with a few consecutive bytes that are marked as 'unknown' and need to be adjusted by the linker. These are usually spots reserved for a memory address (segment or offset part) that wasn't known to ML when it assembled the code, but which the linker will eventually be able to figure out as it finishes reading all the object files. In these cases, ML places a note there telling the linker the name of the label whose value it didn't know, and leaves it to the linker to 'fix up' later. To fix these, the linker must keep a separate list of the places that need adjusting. In the normal course of adding strings to the various pages in its book (memory segments), the linker also writes down any labels (named byte entries) it sees as it processes them. Once it has finished reading all the strings from all the object files, the linker goes back to its list of "fix ups," looks up each referenced label, and patches in the right bytes where they belong.
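The fix-up pass can be modeled like this. It is a simplified sketch, not the real record format: the fix-up list, the label name msg, and its final address of 010Ch are all assumed values for illustration, and the sketch simply overwrites the two placeholder bytes with a 16-bit little-endian offset.

```python
import struct

def apply_fixups(page, fixups, symbols):
    """Patch a 16-bit little-endian address into each recorded location."""
    for location, label in fixups:
        address = symbols[label]                 # known once all files are read
        struct.pack_into("<H", page, location, address)
    return page

# mov dx, ????  /  mov ah, 9  /  int 21h  (the two zero bytes are the unknown)
code = bytearray([0xBA, 0x00, 0x00, 0xB4, 0x09, 0xCD, 0x21])
fixups = [(1, "msg")]           # "bytes 1 and 2 need the address of msg"
symbols = {"msg": 0x010C}       # assumed final offset, for illustration only

apply_fixups(code, fixups, symbols)
print(code.hex(" "))
```

After the pass, the bytes following the BA opcode read 0C 01, which is 010Ch stored low byte first, as the 8086 expects.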

Also, it's possible to tell the linker you want to make a composite page from one or more of the other named memory segment pages. This is what I think of as a GROUP page.
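One way to picture a composite page is as the member pages laid end to end, with each member's starting row written down. The Python sketch below assumes word (2-byte) alignment for every member and uses a hypothetical helper named build_group. How a real linker orders and aligns the members depends on details not covered here.

```python
def build_group(pages, members, align=2):
    """Lay the member pages end to end and record where each one starts."""
    group = bytearray()
    starts = {}
    for name in members:
        while len(group) % align:    # pad up to the next aligned row
            group.append(0)
        starts[name] = len(group)
        group.extend(pages[name])
    return group, starts

pages = {
    "_TEXT": bytes(0x10B),                   # stand-in for 010Bh bytes of code
    "_DATA": b"Hello out there!\r\n$",       # 13h bytes of data
}
group, starts = build_group(pages, ["_TEXT", "_DATA"])
for name, start in starts.items():
    print(name, hex(start))
```

With these sizes, _TEXT starts at row 0 and _DATA lands at row 010Ch, one padding byte past the end of _TEXT. Addresses inside the composite page are counted from its first row, not from the start of each member.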

Let's look at a printout of the object file, lesson02.obj:

000000 THEADR  lesson02.asm
000011 LNAMES
    Name  1: ''
    Name  2: '_DATA'
    Name  3: 'DGROUP'
    Name  4: '_TEXT'
    Name  5: 'DATA'
    Name  6: 'CODE'
000033 SEGDEF 1 : _TEXT           WORD  PUBLIC  Class 'CODE'     Length: 010b
00003D SEGDEF 2 : _DATA           WORD  PUBLIC  Class 'DATA'     Length: 0013
000047 GRPDEF Group: DGROUP
    Segment: _TEXT
    Segment: _DATA
000050 LEDATA  Segment: _TEXT          Offset: 0100  Length: 000B
    0000: BA 00 00 B4 09 CD 21 B4  4C CD 21                  ......!.L.!
000062 FIXUPP
    FixUp: 001  Mode: Seg  Loc: Offset16    Frame: GI[1]   Target: SI[2]
00006B LEDATA  Segment: _DATA          Offset: 0000  Length: 0013
    0000: 48 65 6C 6C 6F 20 6F 75  74 20 74 68 65 72 65 21   Hello out there!
    0010: 0D 0A 24                                           ..$
000085 MODEND(Main Module)   Frame: GI[1]   Target: SI[1], 0100h

This is what the internals of an object file might look like (and do look like, in this case). If you look in there, you'll see two entries marked LEDATA. These are those 'strings' I have been talking about. In one case, it's code from the code segment we wrote. In the other case, it's the actual string we wanted to display on the screen. But they are written into the object file as two separate strings. Note that the linker page name (the memory segment, in other words) is mentioned in each LEDATA header: one is _TEXT and the other is _DATA. Those are the named pages, so to speak, where those strings go. There is also a "FIXUPP" record. This is one of those entries designed to cause the linker to 'patch up' one of the LEDATA strings. See what you can figure out by reading this. Compare it with what you can see in the listing file generated by ML. It's not necessarily important, but it's fun.
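If you would like a head start on reading the printout, the short Python sketch below decodes the two LEDATA strings. The hex digits are copied from the dump above. The variable names are made up for this example, and treating fix-up location 001 as a byte index into the _TEXT string is my reading of the FIXUPP line.

```python
text_hex = "BA 00 00 B4 09 CD 21 B4 4C CD 21"
data_hex = ("48 65 6C 6C 6F 20 6F 75 74 20 74 68 65 72 65 21 "
            "0D 0A 24")

code = bytes.fromhex(text_hex)       # fromhex ignores the spaces
message = bytes.fromhex(data_hex)

# The lengths should match the LEDATA headers: 000B and 0013.
print(hex(len(code)), hex(len(message)))

# The data string is plain ASCII ending in CR, LF and a '$' terminator.
print(repr(message.decode("ascii")))

# Fix-up location 001: the two placeholder bytes right after opcode BA.
print(code[1:3].hex(" "))
```

In the code string, BA is the opcode for mov dx with a 16-bit immediate, B4 is mov ah with an 8-bit immediate, and CD is int. So the two zero bytes after BA are the spot waiting for the linker to fill in the address of the message.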

Summary

When you write your assembly code, you can choose to place code, constants, or variables into different memory segments that the linker will manage for you. Later, as you learn more about the use of segments in the assembler, keep in mind what I've said here about the linker. The main thing to remember is that different segments are like different pages of paper. You can instruct the assembler and the linker to place code on any of them you want, any time you want, in any file you want. They will do the work of making sure that things get put where you say to put them. (In fact, when you use a .model directive with one of the simpler memory models, such as tiny or small, the assembler automatically creates a code segment called _TEXT for you.)

For very simple programs, you'll probably use just two segments, like the _TEXT and _DATA segments shown above. Your code will get placed onto the _TEXT page and your constants and variables will get placed onto the _DATA page. For a lot of useful programs, that's all you really need.

Last updated: Thursday, July 08, 2004 22:11