Files
Peter Suber, Computer Science, Earlham College

You are used to working with files. When you write a TP program, you typically make three files: the *.pas file of source code, the *.bak file or back-up copy of the source code, and the compiled *.exe file for running. You are used to seeing directories of files, loading files into TP, editing files, saving files, copying files, and deleting files.

Files are a data type like arrays or records. You can write a TP program that creates, loads, edits, saves, copies, and deletes files, just as you can write a TP program that manipulates arrays or records.

As a data type, a file is a series of components. The components must be of the same type. A string is a series of components of type char. An array is a series of components of any type (provided it is the same type for each). A file is like an array except that its size need not be stated at the time of declaration. A file can be as large as your operating system and physical disk permit.

Of course, files also differ from arrays in that they are stored on the disk. They survive the shutting down of the computer. This is the chief reason for using files.

The components of a file can be of any type except files. A file of records is a common type. A file of characters is so common that TP and standard Pascal give it a special name: text. If a variable is declared to be of type text, then it is a file of characters.
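The declarations described above might look like the following sketch. The record type StudentRec and all the variable names are invented for illustration; nothing here opens or writes a file.

```pascal
program FileTypes;

type
  StudentRec = record          { hypothetical record type }
    Name  : string[30];
    Grade : integer;
  end;

var
  Notes    : text;                 { a text file: a file of characters }
  Roster   : file of StudentRec;   { a file of records }
  Readings : file of real;         { a file of real numbers }

begin
  { Declarations only: no file is opened or written here. }
end.
```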

Technical digression you may skip for now. A file also differs from an array in that arrays permit random access and in general files permit only serial access. Random access allows us to jump to the nth item in an array without scrolling through all the preceding items first. Data types that only support serial access require us to scroll through the first n items to reach the nth. In standard Pascal, no kind of file permits random access. By contrast, in TP a file of records does permit random access. A text file, however, supports only serial access.
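For the curious, here is a minimal sketch of random access in TP, using the non-standard Seek procedure on a file of records. It relies on Assign, Rewrite, Reset, and Close, which are explained later in this hand-out. The file name ROSTER.DAT and the record type are assumptions made for the example.

```pascal
program SeekDemo;

type
  StudentRec = record          { hypothetical record type }
    Name  : string[30];
    Grade : integer;
  end;

var
  Roster : file of StudentRec;
  S      : StudentRec;
  i      : integer;

begin
  assign(Roster, 'ROSTER.DAT');    { hypothetical file name }

  { Write five records serially. }
  rewrite(Roster);
  for i := 1 to 5 do
  begin
    S.Name  := 'Student';
    S.Grade := 70 + i;
    write(Roster, S);
  end;
  close(Roster);

  { Jump straight to one record without reading the ones before it. }
  reset(Roster);
  seek(Roster, 3);     { components are numbered from 0, so this is the fourth }
  read(Roster, S);
  writeln('Grade in the fourth record: ', S.Grade);
  close(Roster);
end.
```

Seek would not compile if Roster were declared as text, since text files support only serial access.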

As with strings, TP has non-standard ways of dealing with files that have definite advantages over the standard methods. This hand-out is a summary of TP's non-standard file features.

*
  1. In standard Pascal, every program header contains the tag "(input, output)" after the program identifier. For example,

    program SampleProgram (input, output);
    begin
    {yeah, yeah}
    end.

    "Input" and "output" here are program parameters that pass the names of the files used by the program to the computer's operating system.

    But, you say, if the program parameters are file names, what files are represented by "input" and "output"? Counter-intuitive as it may sound, Pascal treats the keyboard as a file from which data are received, as if from a disk, and treats the screen as a file to which data are sent, as if to a disk. "Input" is the name for standard input, or the default input file, which is usually the keyboard. "Output" is the name for standard output, or the default output file, which is usually the screen. If you want to take input from another source, like a disk file, or send output to another destination, like a printer or a disk, then you can create new file variables. In standard Pascal, you'd have to list all those file variables in the program parameter list in the program header.

    While TP uses "input" and "output" as the names of the standard input and output files (keyboard and screen), TP does not support program parameters. This means that you need not include them. It also means that if you do include them, they will have no effect. However, they will not interfere with compilation. So you might include them anyway if you wish to port your code to other Pascal environments.

  2. The job performed by program parameters in standard Pascal still has to be performed. The computer's operating system must be told which Pascal file variables are associated with which files on the physical disk. To accomplish this job, TP introduces the non-standard built-in procedure, assign. If we've already declared a file variable, GradeFile, then we can write:

    assign(GradeFile, 'B:Grades.97');

    Assign takes two parameters. The first is of type file. The second is of type string. The second, string parameter is the same string of characters you'd use in a DOS command at a DOS prompt to affect that file. The string that names the file may include the drive letter and subdirectory in which the file is located. In fact, if the file is not on the logged drive, the file name must include that extra information or else your program will think the file doesn't exist. (Since capitalization within file names is irrelevant to DOS, it is irrelevant here.)

    Perhaps the most confusing aspect of TP files for new programmers is the difference between the file variable and the file name that are used as arguments to Assign. The file variable is of a file type, such as text. The file name is of type string. Inside your program you refer to the file through the file variable, just as you refer to the values of other variables by using the variable identifier as its name. The file name string is how DOS refers to the file. The Assign procedure establishes the link between the name you'll use inside your program and the name DOS uses when physically manipulating the disk.

    Since the file name string is how DOS will refer to the file, it must conform to the DOS requirements for file names. See the appendix to this hand-out (below) for those requirements.

    Assign is more convenient than the program parameters of standard Pascal in at least one respect. A program parameter must be "hard-wired" into the source code and can only be changed by editing and re-compiling the program. However, the second, string parameter to Assign can be changed during a run of the program. For example,

    write('What file do you want to edit? ');
    readln(FileWanted); {of type string}
    assign(GradeFile,FileWanted);

    The second, string parameter of Assign can be any expression that evaluates to a string that names a file: a quoted name, a variable of type string, a series of string operators on string operands, or a function of type string. So, for example, you can treat the drive, file name, and extension separately, and concatenate their string representations at the last minute.
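    As a small sketch of that last idea, the program below keeps the drive, name, and extension in separate string variables and joins them just before calling Assign. The variable names Drive, BaseName, and Ext are invented for the example.

```pascal
program BuildName;

var
  Drive, BaseName, Ext : string;   { hypothetical pieces of the file name }
  FileWanted           : string;
  GradeFile            : text;

begin
  Drive    := 'B:';
  BaseName := 'Grades';
  Ext      := '97';

  { Concatenate the pieces into one file name string. }
  FileWanted := Drive + BaseName + '.' + Ext;

  { Link the file variable to that name. Nothing is opened yet. }
  assign(GradeFile, FileWanted);
  writeln('GradeFile now refers to ', FileWanted);
end.
```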

    Before you can do anything with any file in TP, you must declare a file variable and use Assign to assign to it a file name (string) known to DOS. The only exceptions are "input" and "output" which are pre-assigned.

    If the string does not in fact name a file (for example, it says "B:Grades.97" when there is no such file on drive B:), Assign itself will not complain, but the program will crash with a run-time error as soon as you try to open the file with Reset. I wrote a procedure for your library file to test whether a file exists; hence it will tell you whether a file name (string) is safe to assign to a file variable and open.
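    A test of this kind might be sketched as follows. This is not the library procedure itself, only one common way to write such a test in TP: switch off automatic I/O checking with the {$I-} directive, try to Reset the file, and inspect IOResult. The function name Exists is an assumption.

```pascal
program ExistsDemo;

var
  FileWanted : string;

{ Hypothetical helper: true if FileName can be opened for reading. }
function Exists(FileName : string) : boolean;
var
  F : text;
begin
  assign(F, FileName);
  {$I-}                { turn off automatic I/O checking }
  reset(F);
  {$I+}                { turn it back on }
  if IOResult = 0 then
  begin
    close(F);          { the file opened, so close it again }
    Exists := true
  end
  else
    Exists := false
end;

begin
  write('What file do you want to edit? ');
  readln(FileWanted);
  if Exists(FileWanted) then
    writeln(FileWanted, ' exists.')
  else
    writeln(FileWanted, ' was not found.')
end.
```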

  3. TP uses Reset and Rewrite exactly as standard Pascal does. Therefore I'll let Cooper explain how they work.

    In TP, Read, Readln, Write, and Writeln as applied to files differ from their counterparts in standard Pascal only in minute, insignificant ways. So for them too I'll leave you with Cooper.

  4. TP adds the non-standard procedure Close to close files that have been opened with Reset or Rewrite. If you've opened GradeFile, then

    close(GradeFile);

    will close it again. As with Reset and Rewrite, the argument to this procedure should be of type file, not of type string.

    If you do not close opened files before the program ends, then some data you thought you had written to the file might not actually be there when you reopen it. This can be a nasty bug, and very hard to trace. A good way to avoid it is simply to close every file at the end of every procedure that opens one.
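    The habit of closing a file in the same procedure that opens it can be sketched like this. The procedure name SaveGrades, the file name GRADES.TXT, and the sample data are all invented for the example.

```pascal
program RoundTrip;

var
  GradeFile : text;
  Line      : string;

{ Opens the file, writes to it, and closes it before returning. }
procedure SaveGrades;
begin
  rewrite(GradeFile);
  writeln(GradeFile, 'Ada 95');
  writeln(GradeFile, 'Blaise 88');
  close(GradeFile);
end;

begin
  assign(GradeFile, 'GRADES.TXT');   { hypothetical file name }
  SaveGrades;

  { Reopen the file and confirm that the first line is really there. }
  reset(GradeFile);
  readln(GradeFile, Line);
  writeln('First line on disk: ', Line);
  close(GradeFile);
end.
```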

    Writes to the file are sent first to a temporary memory buffer. The contents of that buffer are not actually written to the file until the buffer is flushed. Only three events will flush the buffer:

    1. The command to close the file: close(GradeFile);
    2. The command to flush the buffer: flush(GradeFile);
    3. Filling the buffer. (By default the buffer for a text file holds 128 bytes; SetTextBuf can give a file a larger one.)

    Prove this to yourself by executing this code:

    writeln(GradeFile, 'One line.');
    writeln(GradeFile, 'Another line.');
    writeln(GradeFile, 'And yet another.');

    Notice that the panel light for the logged drive does not glow during the execution of these statements. They are not enough to cause the disk to spin. But when

    close(GradeFile);

    executes, then the drive light will finally glow.
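    If you want data on the disk before the file is closed, Flush can be called in the middle of a run. The sketch below assumes a log file named RUN.LOG, which is invented for the example.

```pascal
program FlushDemo;

var
  LogFile : text;

begin
  assign(LogFile, 'RUN.LOG');        { hypothetical file name }
  rewrite(LogFile);

  writeln(LogFile, 'Starting a long computation.');
  flush(LogFile);    { send the buffered line to the disk now }

  { ... a long computation would go here ... }

  writeln(LogFile, 'Finished.');
  close(LogFile);    { flushes whatever is still in the buffer }
end.
```

    If the program crashed during the long computation, the first line would already be in the file, because it was flushed.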

  5. TP does not support standard Pascal's "file window". The standard procedures Get and Put, which use the file window, are also not supported in TP.

  6. TP has many non-standard built-in procedures and functions for making files easier to handle. Those not mentioned so far are conveniences, not necessities. We won't cover them in the course; if you are interested, see the manual. ChDir, Erase, GetDir, IOResult, MkDir, Rename, and RmDir apply to all types of files. Append, Flush, SeekEof, SeekEoln, and SetTextBuf apply to text files. FilePos, FileSize, Seek, and Truncate apply to all files except text files. FindFirst, FindNext, GetFAttr, and SetFAttr apply to directories of files.

  7. In standard Pascal, the end-of-line (EOL) marker is read as a (single) space. But in fact, two commands typically terminate a line of text:

    1. a "carriage return" (CR) that sends the cursor back to the left margin, and
    2. a "line feed" (LF) that advances to the next line.

    In TP, the EOL marker consists of these two characters, a CR and a LF. They are read separately, the CR first and the LF second.

    In standard Pascal, a CR is read as a space; in TP it is read as a CR, which is ASCII character #13. (The space is ASCII character #32.) In standard Pascal, after reading a CR, you can expect to read the first character of the next line; in TP you can expect to read the LF.

    In both standard Pascal and TP the built-in boolean function, EOLN, returns the value true when the next character to be read is the CR that terminates a line, and false otherwise. So when EOLN is true and we advance one character, then in standard Pascal the next character to be read is the first character of the next line; but in TP it is the LF character.

    It is as if in TP the CR terminated a line, and the LF began the next line.

    Incidentally, in TP the LF is read as ASCII character #10, and the end-of-file (EOF) marker is read as ASCII character #26.
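    You can watch the two EOL characters go by with a sketch like the one below. It writes a two-line file and then reads it back one character at a time, printing the ASCII number of each. The file name EOLTEST.TXT is invented. Under TP on DOS, the numbers for each line's letters should be followed by 13 and then 10; a Pascal compiler on another operating system may store line ends differently.

```pascal
program ShowEol;

var
  F  : text;
  Ch : char;

begin
  assign(F, 'EOLTEST.TXT');    { hypothetical file name }

  { Create a small two-line text file. }
  rewrite(F);
  writeln(F, 'ab');
  writeln(F, 'cd');
  close(F);

  { Read every character, including the EOL characters. }
  reset(F);
  while not eof(F) do
  begin
    read(F, Ch);
    write(ord(Ch), ' ');       { print the ASCII number of the character }
  end;
  writeln;
  close(F);
end.
```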

  8. In standard Pascal, the EOF marker cannot be read without crashing. When the EOF function says you are about to read it, you should stop reading new characters. In TP, you can read the EOF marker without crashing.

    Similarly, in standard Pascal, attempts to read past the EOF marker cause a crash. In TP, you can issue the commands to read past the EOF marker without crashing. Physically, however, you will not read past the marker; instead, you will read the marker itself again and again. This relief from crashing is most welcome.

  9. In standard Pascal, the only way to read a text file is character by character. This requires two loops: one to try the next character in case we are not at the end of the file, and another nested inside it to try the next character in case we are not at the end of the line. TP supports this method and, if you want your code to be portable, you should adopt it.
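    The two-loop method can be sketched as follows. The program first creates a small file so that it has something to read; the file name CHARS.TXT and its contents are invented. For brevity the sketch still uses TP's Assign and Close; in standard Pascal the file would be named through the program parameters instead.

```pascal
program CharByChar;

var
  F  : text;
  Ch : char;

begin
  assign(F, 'CHARS.TXT');      { hypothetical file name }

  { Create a small file to read. }
  rewrite(F);
  writeln(F, 'First line');
  writeln(F, 'Second line');
  close(F);

  reset(F);
  while not eof(F) do          { outer loop: one pass per line }
  begin
    while not eoln(F) do       { inner loop: one pass per character }
    begin
      read(F, Ch);
      write(Ch);               { echo the character to the screen }
    end;
    readln(F);                 { step past the end-of-line marker }
    writeln;
  end;
  close(F);
end.
```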

    However, TP's non-standard string package provides a much easier way to read text files: not character by character, but line by line. In short, read each line into a string variable. The EOL marker tells TP where to end the string. Use "readln" instead of "read" so that, after picking up the whole line, TP also steps past the EOL marker and is ready to read the next line. One advantage is that only one loop is needed, not two (one iteration per line, not one per line and another nested loop iterating once per character within the line). Another advantage is that for most "text crunching" applications, manipulating strings is easier than manipulating characters. It's also more natural for most applications to think of a text file as a collection of lines than as a collection of characters.

    Standard Pascal supports the line-by-line approach, but only with the awkward implementation of strings available in standard Pascal. For most programmers, it is not worth the trouble.
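    The TP line-by-line method might look like this sketch, which counts the lines in a file and finds the length of the longest one. The file name SAMPLE.TXT and its contents are invented; the program creates the file first so that it has something to read.

```pascal
program LineByLine;

var
  F       : text;
  Line    : string;
  Count   : integer;
  Longest : integer;

begin
  assign(F, 'SAMPLE.TXT');     { hypothetical file name }

  { Create a small file to read. }
  rewrite(F);
  writeln(F, 'short');
  writeln(F, 'a somewhat longer line');
  close(F);

  Count   := 0;
  Longest := 0;
  reset(F);
  while not eof(F) do          { a single loop: one pass per line }
  begin
    readln(F, Line);           { read the whole line into a string }
    Count := Count + 1;
    if length(Line) > Longest then
      Longest := length(Line);
  end;
  close(F);

  writeln(Count, ' lines; the longest has ', Longest, ' characters.');
end.
```

    Remember that a TP string holds at most 255 characters, so longer lines would be cut short by this method.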

Appendix: File Names in MS-DOS

Remember that DOS file names can have two parts: the name proper and an extension. If you use both, separate them by a period (with no spaces on either side of the period). The name proper is necessary; the extension is optional. For example, these are all valid strings usable in Assign as file names.

MyData                  name without extension
MyData.doc              name, extension
B:MyData.doc            drive, name, extension
B:\work\MyData.doc      drive, subdirectory, name, extension

The name proper may be up to 8 characters. The extension may be up to 3 characters.

File names may contain any of the letters, upper or lower case. DOS is not case sensitive in reading file names. For DOS, "MyData" is the same file name as "mydata" and "MYDATA".

File names may contain any of the digits, 0..9. But they may use only the following punctuation marks and special symbols:

$ % ' - @ { } ~ ` ! # ( ) &

The following, otherwise legal file names are reserved by DOS for other purposes, so you cannot use them: aux, clock$, com, con, lpt, lst, nul, and prn. Input and output are not in this category. They are default file names for Turbo Pascal, unknown to DOS; you may override the default by assigning those names to files of your own creation. However, DOS will not allow you to assign the names aux etc. to your own files.


This file is an electronic hand-out for the course, Programming and Problem Solving.

Peter Suber, Department of Philosophy, Earlham College, Richmond, Indiana, 47374, U.S.A.
peters@earlham.edu. Copyright © 1997, Peter Suber.