Master 512 Forum by Robin Burton
Beebug Vol. 10 No. 4 August / September 1991

Following on quite neatly from the previous Forum we're digging into the dark mysteries of batch files again. This month's article was prompted by a PC-using friend asking how such a short batch file could generate enough files to fill a directory in a test I outlined a couple of issues ago. There's also something new later which could double the value of your year's BEEBUG membership fee.

OK, perhaps batch files aren't dark and mysterious, but obviously some DOS users aren't too familiar with the lesser used functions like parameter substitution, 'IF variable == variable' and 'SHIFT'. I thought we should take a look.

DEFINING THE JOB

You may recall that my test required the (over) filling of a directory with files. Naturally I wanted to do it in the easiest manner possible, which in my book means the least amount of effort, particularly in terms of manual entry.

My test was of a hard disc partition, but the purpose here is investigating batch files, so floppy users needn't 'switch off', but should join in too (and run times using 800K discs aren't all that much longer than for a winchester).

Using floppies, I suggest you format a fresh disc and use the root directory to ensure maximum speed (these batch runs are pretty heavy going, even for a winchester). If you use a hard disc for these jobs, create a test directory in the root and use that.

N.B. DO NOT use the root directory of your winchester. If it fills to capacity DOS Plus (in the 512, at least) definitely doesn't like it – judging by my tests it can even damage the partition. This is not a problem in sub-directories, which appear to expand as required without practical limits.

In all cases it's best to run these tests in an EMPTY directory or they may not produce the correct results, because the order in which new files are created in the directory is important.

When you delete a file in DOS (unlike the DFS for example) the remaining directory entries are NOT concatenated. In DOS all deleted file entries in the directory stay precisely where they are until new files need to use those entries again. In an 'old', well used directory therefore, the gaps left by deleted files can be all over the place.

THE DIRECT APPROACH

How to go about it? The starting point is an empty directory, so the first job is to create one file as the source for the copy operation. Also, for quickness, the source file should be kept short, not more than a cluster long in fact.

The easiest way to produce a suitable file is to use 'COPY CON <filename>', then press Ctrl-Z followed immediately by Return, which will create a one byte file. For our tests, call this first file 'A', for reasons that will become clear as we proceed.

Now let's see if we can make the system do most of the work. Probably the most obvious strategy would be to copy the first file to another, then to copy both resulting files again and so on, doubling the number produced each time. In this way, after only a few commands, the number of files will be growing rapidly. However, the most obvious method isn't the best in this case and we can do much better.

I know that an 800K floppy only holds 192 directory entries, but ignore that for now. Concentrate on the fact that our real interest is the batch file itself, not filling the directory.

One difficulty with any approach to producing lots of files automatically is inventing suitable names that can be used with wildcards (the aim being to copy all existing files every time) without producing duplicate filenames. This would at least waste time and could cause a failure in other circumstances.

Using a bit of imagination it can be done. Starting with file 'A' we could for example do this, with these results:

Command             New files
COPY ??????? A*     7
COPY ??????? B*     28
COPY ??????? C*     84
COPY ??????? D*     210

It's crude, but it works! Only four copy commands and we have a total of 329 new files. For the job to be run repeatedly it only needs to be saved as a batch file containing commands like the above. If you find the results surprising and want to try these commands you can do so either in a batch file, or more simply by entering them manually in sequence. The results are precisely the same in both cases. If you weren't expecting so many files from so few commands the process is fully explained later.

We could be satisfied with this and get on with the job. However, all that can be learned from it is the power of DOS wildcard copy commands, but not much about batch files, so let's look further for a more flexible and elegant technique. This should produce at least as many files as the example above, ideally more, from the same or fewer manual entries.

Before we move on, there are a couple of points to note about the commands in the list above. Notice that the source filenames are specified as seven '?'s, rather than one '*'. This is necessary for two reasons. First, and the important one here, if a '*' is used instead, the instant result is a failure, because the first command will immediately attempt to copy the first source file to itself.

Even if that wasn't a problem, you'd find as the job progressed that a '*' would include existing filenames of eight characters, so a lot of copies (the sixteenth and every eighth one following for each new letter) would produce duplicate names and waste time. This point becomes more relevant in the next batch file.
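Why seven question marks exclude eight character names can be sketched in a few lines of Python. This is only a model, assuming DOS Plus matches wildcards in the traditional DOS way, against a space-padded eight character name field and three character extension field; the function names are invented for the illustration.

```python
def fcb_fields(spec):
    """Split 'NAME.EXT' into space-padded 8- and 3-character fields."""
    name, _, ext = spec.partition(".")
    return name.ljust(8)[:8], ext.ljust(3)[:3]


def matches(pattern, filename):
    """True if filename matches a pattern built from literals and '?' only.

    Assumption: '?' matches any character, including the padding space,
    and every other pattern character must match exactly.
    """
    pat = "".join(fcb_fields(pattern))
    target = "".join(fcb_fields(filename))
    return all(p == "?" or p == t for p, t in zip(pat, target))


print(matches("???????", "AAAAAAA"))   # True: seven characters, selected
print(matches("???????", "AAAAAAAA"))  # False: eighth character is not blank
print(matches("???????", "TEST.BAT"))  # False: extension is not blank
```

The eighth position of the pattern is a padding space, so only names whose eighth character is also blank, that is names of seven characters or fewer, can qualify.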

MORE AUTOMATION

A better approach is the batch file below (call it TEST.BAT). When you run it, you'll see that essentially it does the same job as the previous file, but this time the job is totally automated. We don't even need to create the first source file, because that's included in the job too. How this job works is explained below in some detail so that you can follow what each of the commands does. You'll need to think about these commands more than the previous ones, because there's a lot of character substitution going on.

COPY %0.BAT %1
:START
COPY ??????? %1*
IF %1 == D EXIT
COPY %1 %2
SHIFT
GOTO START

Again, seven question marks are used so that eight character names aren't copied. This file (more lines but about the same amount of typing) is called with a number of parameters, rather than having values fixed in the file, so it's much more flexible, and the file will never grow no matter how many sets of files we want to produce. To exactly mimic our first batch file's actions therefore, we'd call it with the command:

TEST A B C D

You'll notice the '%1' and '%2' in the file; these are the parameter identifiers and this is how they work. In this command line entry the first string is 'TEST', and the batch variable %0 is always set to the batch filename (minus extension) by DOS automatically. The second parameter is 'A', so this value is substituted for '%1', 'B' is used for %2, 'C' for %3 (if it's specified) and so on.

The first line copies '%0.BAT' (i.e. TEST.BAT) to %1 (A) and this therefore creates our first source file for us. Since the batch file's extension prevents it qualifying as a source file in this job, it can be copied to the test directory each time without interfering with the run.

Next, line 2 is passed over the first time through because it's a label. The next executable line, line 3, says, "For every occurrence of any filename of up to seven characters in length with no file extension, copy that file to another, giving it a name starting with the current value of %1 followed by the characters in the source filename."

As the job starts '%1' evaluates to 'A', so the first execution of line 3 copies every occurrence of '???????' (file 'A' only, the first time) to a new file called %1 plus whatever '???????' represents, creating in this case, file 'AA'.

Now this next bit is probably the most difficult part to understand, both in this batch file and the first one, because there's no equivalent situation in BBC micro filing system commands. Immediately following the first execution of line 3 we have two files in the directory. One of these, file 'A', has been processed, the second, file 'AA', has not.

Batch file commands are interpreted and evaluated in real time, just like manual ones, and because line 3 says "For EVERY occurrence of name ??????? copy that file...." the command processor immediately applies the command to the only file remaining unprocessed in the directory, our new file 'AA'.

Parameter %1 still equates to 'A', so file 'AA' is duly copied to file 'AAA'. This then becomes the only unprocessed file, so it's copied to 'AAAA' and so on until all (seven character) source filenames have been processed. Input to line 3 is then exhausted, so a new situation arises. Since no qualifying files in the directory remain uncopied, execution passes to the next command.
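The chain of copies described above can be modelled in Python to check the figures quoted for the first batch file. This is a sketch of the behaviour as the text describes it, not of DOS Plus itself: it assumes new files join the end of an initially empty directory and are picked up later in the same pass. The function name copy_prefix is invented for the illustration.

```python
def copy_prefix(directory, letter):
    """Model 'COPY ??????? <letter>*' as described in the text.

    directory is a list of extension-less names in creation order.
    Names created during the pass are appended, so they are reached
    (and copied in turn) before the pass ends.
    Returns the number of files created.
    """
    created = 0
    i = 0
    while i < len(directory):
        name = directory[i]
        if len(name) <= 7:               # '???????' skips eight-character names
            new_name = letter + name     # prefix the letter to the source name
            if new_name not in directory:
                directory.append(new_name)
                created += 1
        i += 1
    return created


files = ["A"]                            # the one-byte source file
counts = [copy_prefix(files, letter) for letter in "ABCD"]
print(counts)      # [7, 28, 84, 210]
print(len(files))  # 330, i.e. 329 new files plus the original 'A'
```

Under these assumptions the model reproduces the 7, 28, 84 and 210 new files listed for the four manual COPY commands.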

Ignore line 4 for the present (we'll come back to it) and look at line 5. At this stage %1 = 'A' and %2 = 'B', the second and third parameters from our original command entry. File 'A' is therefore copied to a new file called 'B'. The sixth line, 'SHIFT', then tells the batch file processor to move all parameters left by one place, so the old %1 (A) becomes the new %0, the old %2 (B) becomes the new %1, the old %3 (C), even though it's not referenced directly in our file, becomes the new %2 and so on. This operation takes place for as many parameters as were supplied, up to the maximum length of a command line.

The reason for the 'SHIFT' command is that you can only specify parameter names from %0 to %9 in batch files. However, because SHIFT operates on all parameters supplied in the original command line, any which were initially out of reach (the 10th, 11th, 12th, etc.) or simply not referred to (in our case %3, %4 etc.) can be accessed after an appropriate number of 'SHIFT's. Remember though, this is strictly a one way process. The old %0 value is permanently lost after every SHIFT. In other words, our original parameters, evaluated the first time through by the batch processor as:

%0 %1 %2 %3 %4
TEST A B C D

become:

%0 %1 %2 %3
A B C D
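The same bookkeeping can be written as a short Python sketch. It only illustrates the description above, with the parameters held in a list whose first entry stands for %0; the names shift and visible are invented for the example.

```python
def shift(params):
    """One SHIFT: every parameter moves left and the old %0 is discarded."""
    return params[1:]


def visible(params):
    """The values a batch file can name directly, %0 to %9."""
    return {f"%{n}": value for n, value in enumerate(params[:10])}


params = "TEST A B C D".split()
print(visible(params))
# {'%0': 'TEST', '%1': 'A', '%2': 'B', '%3': 'C', '%4': 'D'}

params = shift(params)
print(visible(params))
# {'%0': 'A', '%1': 'B', '%2': 'C', '%3': 'D'}
```

With more than ten entries on the command line, the eleventh and later ones fall outside what visible() returns until enough shifts have moved them into the %0 to %9 window.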

Following the 'SHIFT', execution is then returned to line 2 by 'GOTO START' in the last line, so the whole process begins again, this time copying file 'A' to 'BA', 'AA' to 'BAA', and so on up to 'BAAAAAAA', then all the new 'B' files follow too. After this the parameters are again shifted left by one place and the process can continue for as long as is required.

There are a couple of extra points to note before we leave the explanations. First, I've used single alpha character parameter entries here for obvious reasons, but batch parameters can be anything you like in the real world, so long as they're valid for the purpose intended. Also, the filenames used could include extensions, and either could be made up of parameters, wildcards, hard coded values or any combination of any of these.

Next, back to line 4 for a moment. In this file the line:

IF %1 == D EXIT

ends the run when parameter %1 becomes equal to 'D', because the 'EXIT' command is then executed. All you need to do is to change the 'D' to another character to make the run longer or shorter. Just make sure the value used in the test is the last parameter you want the file to process. Note also that the '==' is not a misprint; there must be two of them.

Finally, remember that upper case has been used here purely for clarity. In batch files you MUST ensure that the case is consistent between any test and the variables which will be supplied. Batch file '==' tests are case sensitive, so 'D' and 'd' are NOT the same so far as the batch file processor is concerned. Whichever case a batch file test specifies, the variables compared must be the same case or the test will fail. It's obviously simplest to stick to lower case throughout, as is usual for DOS commands and filenames.

MORE DEVIOUS

Back to operations. This batch file does much the same as the first one, and it's much more flexible too. We've also satisfied our second requirement, less manual entry, because the only thing that now increases, regardless of how many files we might want to generate, is the number of parameters and they're only one character each.

However, compare the results of the two jobs and you'll find there's another difference between them. For %1 = 'A' you'll get the same eight filenames as before, 'A' to 'AAAAAAAA', but when %1 = 'B' things change. In the second version of the file, from set 'B' onwards, apparently similar operations produce different results. In the first file, 'A' to 'D' produce 330 files, but this time the same range produces almost 500!

Why more files? Actually I've already mentioned the reason, but you might have missed its significance. Look at line 5 again. It seems innocent enough, but this simple step creates the first file of the next set before that set is processed. However, the result is an increase in the number of files in the next set by the total number of all files in all preceding sets.

Similar operations could have been included in the first file, but in that case they would literally have doubled the number of lines in the job. Since every extra set in the first file already requires an extra command line, this would make it twice as bad – definitely a backward step if economy of effort is our aim.

In set 'B' this small change adds 8 files (the total of set A) giving 36 'B' files (8+7+6+5+4+3+2+1) instead of the 28 of the first job. Things get more complicated after that though. If that simple calculation were applied to set 'C', i.e. the sum of numbers 1 to 44 (8+36), you'd expect 990 files, yet only about an eighth of this number are produced, so what's happening? And why am I being vague about the totals? Wait and see.

Two factors are at work. First, for each succeeding set of files an ever growing number of existing filenames are eight characters long and can't be used again. Secondly, in each succeeding set of files, an increasing number of new filenames will have fewer unused characters than in previous sets, so (proportionally) fewer new files can be created from them.
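The figures quoted for sets 'A' and 'B', and the naive estimate of 990, can be checked with a small Python model of TEST.BAT. As before this is a sketch under stated assumptions rather than a reproduction of DOS Plus: the directory starts empty, new files are appended and reached within the same pass, and the IF test is taken to name the last parameter supplied. The function names are invented for the example.

```python
from collections import Counter


def copy_prefix(directory, letter):
    """Model 'COPY ??????? <letter>*': copy every name of up to seven
    characters, including names created earlier in the same pass."""
    i = 0
    while i < len(directory):
        new_name = letter + directory[i]
        if len(directory[i]) <= 7 and new_name not in directory:
            directory.append(new_name)
        i += 1


def run_test_bat(letters):
    """Model 'TEST <letters...>', ending the run at the last letter given."""
    directory = [letters[0]]                   # line 1 creates the first file
    for current, following in zip(letters, letters[1:] + [None]):
        copy_prefix(directory, current)        # line 3: the wildcard copy
        if following is None:                  # line 4: last parameter reached
            break
        directory.append(following)            # line 5, then SHIFT and GOTO
    return directory


files = run_test_bat(["A", "B"])
per_set = Counter(name[0] for name in files)
print(per_set["A"], per_set["B"])              # 8 36
print(sum(range(1, 9)), sum(range(1, 45)))     # 36 990
```

The only difference from a model of the first batch file is the single append for line 5, which seeds each new set with its own one-letter file before the wildcard copy runs.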

By the way, I never ran my tests with more than five parameters, but if you have a hard disc and want to try, good luck! I'd guess with six parameters you'd have to leave the machine running for maybe 12 hours, with eight it would be more like a week (assuming it didn't crash).

SOMETHING EXTRA

If you are wondering why I didn't list more complete results from this job it's because we're having a competition.

Prizes of a hundred pounds worth of Essential Software's 512 products are up for grabs. There's a new package too, so existing users needn't feel left out. If you're unfamiliar with the product range consult BEEBUG Vol.8 No.5, Vol.9 Nos.1 & 10, the September '90 and January '91 BBC Acorn User, or The 512 Technical Guide from Dabs Press. Alternatively, send an S.A.E. to the address below for an up to date list.

Each of the first five correct entries out of the hat after the closing date will win a £20.00 voucher which can be spent on any Essential Software products (including the memory expansion).

To enter you must work out the number of files which would be generated by the second batch file for each file set from 'A' to 'H' (in alphabetic order) and say whether or not the number of new files produced by any individual set (A to Z in alpha order) would reach a million. If yes, which one is it? There are no tricks, so ignore all practical limitations.

One entry only per BEEBUG member please, and you must include your name, address and your BEEBUG membership number. To give overseas members sufficient time to enter, the closing date is not until Friday, September 20th.

Winners will be notified directly and their names plus the answers will be published in the November issue of BEEBUG.

Please send entries/product list enquiries to Essential Software, *******