Discussion:
OS/8 ASCII file definition?
Martin Eberhard
2020-09-03 03:30:39 UTC
Can someone point me to DEC documentation for OS/8's ASCII file structure? How are ASCII characters packed into the file? Three 8-bit characters packed into two 12-bit words? Two 6-bit ASCII characters per 12-bit word? Something else? Is there any header information at the beginning of the file? A ^Z at the end? Must be padded with nulls after the ^Z?

I can't seem to find this documentation anywhere.

Thanks!

Martin E.
Thomas Moss
2020-09-03 05:25:06 UTC
Post by Martin Eberhard
Can someone point me to DEC documentation for OS/8's ASCII file structure? How are ASCII characters packed into the file? Three 8-bit characters packed into two 12-bit words? Two 6-bit ASCII characters per 12-bit word? Something else? Is there any header information at the beginning of the file? A ^Z at the end? Must be padded with nulls after the ^Z?
I can't seem to find this documentation anywhere.
Martin,

The Software Support Manual for OS/8 explains this; see Appendix A.
http://bitsavers.org/pdf/dec/pdp8/os8/DEC-S8-OSSMB-A-D_OS8_v3ssup.pdf

A good write-up which covers the above and some other cases can also be found here:
https://retrocomputing.stackexchange.com/questions/5840/how-did-the-pdp-8-handle-strings/5842

Regards,
-Tom


***@sdf.org
SDF Public Access UNIX System - https://sdf.org
Martin Eberhard
2020-09-03 16:33:15 UTC
Thanks! I read that appendix and somehow missed that bit.

That appendix has weird typos, like this:

"3. ASCII and Binary files are terminated by a CTRL/Z code (ASCII 232). In binary files, a CTRL/Z code data rather than end-af-file."

What does that last sentence mean, in connection to the first one???

Fortunately, I already understand the binary file format!
Best,
Martin
Dennis Boone
2020-09-03 18:40:28 UTC
Post by Martin Eberhard
"3. ASCII and Binary files are terminated by a CTRL/Z code (ASCII 232). In binary files, a CTRL/Z code data rather than end-af-file."
What does that last sentence mean, in connection to the first one???
Presumably "In binary files, a CTRL/Z code is data, rather than an
end-of-file."

De
Martin Eberhard
2020-09-03 19:31:48 UTC
Post by Dennis Boone
Post by Martin Eberhard
"3. ASCII and Binary files are terminated by a CTRL/Z code (ASCII 232). In binary files, a CTRL/Z code data rather than end-af-file."
What does that last sentence mean, in connection to the first one???
Presumably "In binary files, a CTRL/Z code is data, rather than an
end-of-file."
De
Yes, that is part of what the 2nd sentence says.
But look closely at the first sentence. That sentence says "...Binary files are terminated by a CTRL/Z code"
While the (contradicting) second sentence says "In binary files, a CTRL/Z code data *rather than* end-af-file."
So you see, the first sentence says that binary files ARE terminated with a CTRL/Z, while the second sentence says that a CTRL/Z is NOT the termination of a binary file. At most one of these two sentences is correct.
Do you see what I mean?
Vincent Slyngstad
2020-09-03 19:49:11 UTC
Post by Martin Eberhard
Post by Dennis Boone
Post by Martin Eberhard
"3. ASCII and Binary files are terminated by a CTRL/Z code (ASCII 232). In binary files, a CTRL/Z code data rather than end-af-file."
What does that last sentence mean, in connection to the first one???
Presumably "In binary files, a CTRL/Z code is data, rather than an
end-of-file."
De
Yes, that is part of what the 2nd sentence says.
But look closely at the first sentence. That sentence says "...Binary files are terminated by a CTRL/Z code"
While the (contradicting) second sentence says "In binary files, a CTRL/Z code data *rather than* end-af-file."
So you see, the first sentence says that binary files ARE terminated with a CTRL/Z, while the second sentence says that a CTRL/Z is NOT the termination of a binary file. At most one of these two sentences is correct.
Do you see what I mean?
In a BIN format (.BN) file, a 0232 (^Z, mark parity) is indeed an EOF.
The file is binary, in the sense that it isn't text, and thus 8 bits
are significant, not 7.

In other binary files (.SV, etc.) a byte of 0232 has no particular
meaning without further context.

A 0032 byte is just data in all the binary contexts I know about. I'm
not even sure that OS/8 will reliably recognize 0032 as an EOF for text
files!

Perhaps the capital B was meant to signify .BN?

Vince
Ian Schofield
2020-09-07 15:43:41 UTC
Dear All,

Just to complicate things a bit more!!!
0232 ^Z (SUBS) is the terminator used by OS/8
for ASCII files. This ^Z is included
in the file on a storage device.

0232 within a .BN file is not usually seen within
the file. .BN files are enclosed in 0200 leader trailer
and again the terminating character is ^Z ...
after the trailer is included in the file.

Now, 032 is a legal character (6 bits) within a .BN file and
would be read as such and the binary read would continue
until 0232 is received or, 0200 trailer. If the 0232 precedes
the trailer, you will get an error.
Finally,
an 032 character is still interpreted as EOF in ASCII /A copy mode.

Told you it was a bit involved!!!

Regards, Ian.
K. Krause
2020-09-16 12:42:24 UTC
This is a post that I meant to send to the newsgroup, but by mistake I sent it
directly to Vince.
He also responded directly to me, and noted that FOCAL69
is not OS/8 software. Regardless of this, my experiments with reading
text files with FOCAL and PIP may be of interest.
Most programs I have used don't look at bit 7, regardless of whether they are
pre-OS/8 papertape software or later OS/8 software. My theory is that
PDP-8 users in the papertape era used not only specially configured TTYs with
the parity bit set to mark, but also existing TTYs with real parity or
parity fixed to space. To avoid problems with parity, software authors
wisely set parity (bit 7) to 1 directly after reading the data register
of the serial line.
So this convention was eventually carried over to OS/8.
I remember one exception: PIP10, which is an OS/8 utility, wants bit 7 set.
...
Post by Vincent Slyngstad
Post by Martin Eberhard
Do you see what I mean?
In a BIN format (.BN) file, a 0232 (^Z, mark parity) is indeed an
EOF. The file is binary, in the sense that it isn't text, and thus 8
bits, are significant, not 7.
That's right. OS/8-ASCII means, bit 7 is set. So 032 is not ctl-Z.
ctl-Z is 232!
Try to transfer a .TX-file with KERMIT in binary mode to your modern
system and open that with an editor. KERMIT in text mode strips bit 7 off.

If you punch out a .BN-file with PIP, at the end of the file, after the
checksum, PIP punches out 200 eight times, and after this 232.
So evidently the ABSLDR (and PIP) don't look for ctl-Z, they look for
bit 7 set.

On the other hand most PDP-8 programs which want text input (FOCAL,
BASIC, EDIT) accept ASCII-text which was produced on a standard TTY,
with bit 7 NOT set.
But if you list out this text, then bit 7 is set.
Post by Vincent Slyngstad
In other binary files (.SV, etc.) a byte of 0232 has no particular
meaning without further context.
That's possible, because .SV-files are block oriented. The words inside
a block are pure binary in the range 0000 - 7777.
Post by Vincent Slyngstad
A 0032 byte is just data in all the binary contexts I know about.
I'm not even sure that OS/8 will reliably recognize 0032 as an EOF for
text files!
Recognizing 032 is not a matter for OS/8; it depends on the
particular application.
I tried:

First I prepared a piece of papertape on my TTY. Normally punching text
on the TTY has bit 7 set.
With a little trick I eliminated the bit 7 over the whole text.
The text is:
01.10 TYPE "HALLO"
<032>
01.20 TYPE FSQT(2)
<032>
GARBAGE AFTER CTL_Z
<032>

The two lines are valid FOCAL-code. After each line comes a "wrong"
ctl-Z <032>.

Reading this tape without bit 7 with FOCAL69 does the following:
01.10 is accepted and stored in FOCAL.
The first <032> stops the read-in and produces error 03.28 (illegal
expression). The second line 01.20 is ignored (not stored) and the
third line (GARBAGE ...) is interpreted as G-Command and FOCAL
executes line 01.10 and types "HALLO".
Punching this 1-line program on the TTY has bit 7 set.

Now the test with OS/8:
TTY is console and I type:

.R PIP
*T1.TX<TTY:

That means, PIP is running in text mode.
The effect is: the first line (01.10 ...) goes to T1.TX with bit 7
set. And the first 032 is recognized by pip as ctl-Z and stored as
EOF (232) in T1.TX.

Next experiment:

.R PIP
*T2.TX<TTY:/B

which means PIP is running in binary mode.
T2.TX contains a number of <200> and exactly one <232> as EOF,
because PIP in bin mode waits for leading <200> from input device.

If I write input routines for serial lines, the first thing I do
is mask off bit 7, so that it makes no difference whether bit 7 is set
or not across different terminals (TTY or glass terminal).


Klemens
steve...@gmail.com
2020-09-17 00:36:56 UTC
Post by K. Krause
FOCAL69
is not OS/8-software.
That is correct. FOCAL-69 predates OS/8 by a couple of years. On the other hand, there is a version of FOCAL called UW Focal that does run inside of OS/8. So if you do want to run Focal programs inside of OS/8, get UW Focal. If you can't find it anywhere else, I have it. Also, I believe it's a DECUS program and the user guide is in the DECUS write up.
Post by K. Krause
My theory is, that
PDP-8 users at papertape-time used not only special configured TTYs with
parity bit set to mark, they also used existing TTYs with real parity or
parity fixed to space. To avoid problems with parity, software authors
wisely set parity (bit 7) to 1 directly after reading the data register
of the serial line.
Having been a PDP-8 developer from that era, your theory is incorrect. TTYs (as in ASR-33 and variant Teletypes) did not use either Mark or Space parity. If so, paper tape loaders like RIM Loader and BIN Loader would not work. TTYs of that era were set to full 8 bit, no automatic parity. It's just that some keyboards did force the high bit set and some forced the high bit clear (independent of the paper tape reader) so your code had to either mask off the high bit or force it to be set if you were trying to interpret ASCII characters. I've done it both ways.
Post by K. Krause
So evidently the ABSLDR (and PIP) don't look for ctl-Z, they look for
bit 7 set.
ABSLDR, BIN Loader, and RIM Loader all pay attention to the high bit. RIM Loader treats the high bit set as leader-trailer regardless of the remaining bits. ABSLDR and BIN Loader treat 200 as leader-trailer (interpreting the last 2 characters before the trailer as the checksum). ABSLDR and BIN Loader also treat 3x0 as a field setting to signal loading into memory field x (defaulting to field 0). So a tape with:

200
200
200
120
000
077
077
310
120
000
066
066
<cksum1>
<cksum2>
200
200

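would load 7777 into address 2000 of Field 0 and 6666 into address 2000 of Field 1 (the origin frames 120 000 carry the address halves 20 and 00; an origin of 0200 would be punched as 102 000). Without the 3x0 trick, ABSLDR and BIN Loader would only be able to load into Field 0.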
Post by K. Krause
If I write input routines for serial lines, the first thing I do,
is masking of bit 7, to have not the difference (bit 7 set or not set),
with different terminals (TTY or glass-terminal).
When you are coding in PAL, yes. As I said, either mask the high bit off completely or force it set.
K. Krause
2020-09-17 09:16:05 UTC
Post by ***@gmail.com
Post by K. Krause
FOCAL69
is not OS/8-software.
That is correct. FOCAL-69 predates OS/8 by a couple of years.
On the other hand, there is a version of FOCAL called UW Focal that
does run inside of OS/8. So if you do want to run Focal programs inside of OS/8, get UW Focal. If you can't find it anywhere else, I have it.
I have all sorts of FOCAL, also for OS/8. I don't have a special
interest in FOCAL. I use it, especially the papertape version, to show
young people what fits in 4K of memory:
A language interpreter, floating point library with sin, cos, log and
sqrt, the program itself with variables and a text editor.
Post by ***@gmail.com
Post by K. Krause
My theory is, that
PDP-8 users at papertape-time used not only special configured TTYs with
parity bit set to mark, they also used existing TTYs with real parity or
parity fixed to space. To avoid problems with parity, software authors
wisely set parity (bit 7) to 1 directly after reading the data register
of the serial line.
Having been a PDP-8 developer from that era, your theory is incorrect.
Maybe, but there were other computers in that era that also used
TTYs. And certainly there were TTYs in use in remote applications via
modems. In that environment it would make sense to have real parity
to detect transmission errors.
Post by ***@gmail.com
TTYs (as in ASR-33 and variant Teletypes) did not use either Mark or Space
parity. If so, paper tape loaders like RIM Loader and BIN Loader would not
work. TTYs of that era were set to full 8 bit, no automatic parity. It's
just that some keyboards did force the high bit set and some forced the
high bit clear (independent of the paper tape reader) so your code had to
either mask off the high bit or force it to be set if you were trying to
interpret ASCII characters. I've done it both ways.
That's exactly the same that I wrote.

A look in the TTY manual shows that there were two types of keyboards: a
parity keyboard and a non-parity keyboard. The manual says that the parity
keyboard produces even parity, and the non-parity keyboard always produces
the parity bit set to mark. Maybe the parity keyboards were more
expensive, so that most people bought TTYs with no parity.
The rest of the machine doesn't care about parity. Bit 7 is a bit like the
others from the point of view of the PTP and PTR.
Post by ***@gmail.com
ABSLDR, BIN Loader, and RIM Loader all pay attention to the high bit.
RIM Loader treats the high bit set as leader-trailer regardless of the
remaining bits. ABSLDR and BIN Loader treat 200 as leader-trailer
(interpreting the last 2 characters before the trailer as the checksum).
That's right, but you don't produce RIM and BIN tapes on the keyboard of
a TTY. These files are generated by either PAL or an RIM generator. So
in this case, it doesn't matter if your TTY has even or no parity.

Klemens
William Cattey
2020-09-18 03:24:21 UTC
The convention on the PDP-8 was "mark parity" for ASCII text characters.
When I moved from PDP-8 OS/8 land to UNIX land, I found it odd that ASCII character tests didn't have bit 7 set.

When I joined up with the PiDP-8/i software project, one of the first tools I wrote was a helper to translate between OS/8 ASCII with mark parity and POSIX with "space parity":

https://tangentsoft.com/pidp8i/file?name=src/misc/ptp2txt.c&ci=tip

/*
* Program to convert between POSIX ASCII text files
* and the output of OS/8 PIP to the Paper Tape Punch.

* The OS/8 paper tape punch format is:
*
* leader: a bunch of ASCII NUL chars to be ignored.
* ASCII with the 8th bit set, CR+LF line endings.
* trailer: a bunch of ASCII NUL chars to be ignored.

* This program can be used as a filter from stdin to stdout or
* it will create a new file with name ending in .txt if going to
* POSIX text or .ptp if going to OS/8 PIP Paper Tape format.

* If the program is called with the name "txt2ptp" then
* LTCOUNT (default 100) bytes of leader is prepended to the
* output file and LTCOUNT bytes of leader are appended.
* The 8th bit of every output character is set, and LF-only
* input is turned into CR+LF output. CR+LF input is passed
* as-is.

* If called by any other name, the ASCII NUL character is
* ignored anywhere in the file, and the 8th bit is cleared.
* Line endings are untouched in this case.

* This program helps work around the issue that the
* OS/8 Paper Tape reader handler assumes the last
* character in the buffer is junk, so that when you send
* a plain text file into PDP-8 SIMH OS/8 with the
* ptr device, the last character is lost.
*/

/*
* Author: Bill Cattey
* License: The SIMH License:

* Copyright © 2015-2017
* by Bill Cattey et. ux. William Cattey et. ux. Poetnerd

* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject
* to the following conditions:

* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.

* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS LISTED ABOVE BE
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
* IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.

* Except as contained in this notice, the names of the authors
* above shall not be used in advertising or otherwise to promote
* the sale, use or other dealings in this Software without
* prior written authorization from those authors.
*/

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libgen.h>

#define BLOCK_SIZE 256
#define TO_PTP 1
#define TO_TXT 2
#define LTCHAR '\0'
/* PIP ASCII mode adds rubout after control chars so we strip them out too. */
#define RUBOUT '\377'
#define LTCOUNT 100

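/* global variable: ltbuf, the leader/trailer block written by make_lt.
   Declared as char because make_lt writes it out one byte per element. */
char global_ltbuf[LTCOUNT];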


void make_txt (FILE *fpin, FILE *fpout)
{
    int inchar;
    int read_ct, n;
    char *obuffp;
    char ibuff[BLOCK_SIZE], obuff[BLOCK_SIZE];

    while ((read_ct = fread (ibuff, sizeof(char), BLOCK_SIZE, fpin))) {
        obuffp = obuff;
        for (n = 0; n < read_ct; n++) {
            inchar = *(ibuff + n);
            /* Drop leader/trailer NULs and the rubouts PIP adds. */
            if (inchar == LTCHAR || inchar == RUBOUT) continue;
            /* Clear the 8th bit. */
            *obuffp++ = inchar & 0177;
        }
        fwrite (obuff, sizeof(char), obuffp - obuff, fpout);
    }
}

/* We could just create an empty buffer and output it,
   but this is better if for some reason LTCHAR changes. */
void init_ltbuf (void)
{
    int n;

    for (n = 0; n < LTCOUNT; n++) {
        global_ltbuf[n] = LTCHAR;
    }
}

void make_lt (FILE *fpout)
{
    fwrite (global_ltbuf, sizeof(char), LTCOUNT, fpout);
}

void make_ptp (FILE *fpin, FILE *fpout)
{
    /* inchar starts at NUL so the EOF test below is defined
       even when the input is empty. */
    int inchar = '\0', prior = '\0';
    int read_ct, n;
    char *obuffp;
    char ibuff[BLOCK_SIZE];
    /* Every \n might add a \r to the output.
       Worst case is obuff doubles in size. */
    char obuff[2*BLOCK_SIZE];

    make_lt (fpout);

    while ((read_ct = fread (ibuff, sizeof(char), BLOCK_SIZE, fpin))) {
        obuffp = obuff;
        for (n = 0; n < read_ct; n++) {
            inchar = *(ibuff + n);
            /* Turn a bare LF into CR+LF; leave existing CR+LF alone. */
            if (inchar == '\n' && prior != '\r') {
                *obuffp++ = (char)('\r' | 0200);
            }
            /* Set the 8th bit on every output character. */
            *obuffp++ = inchar | 0200;
            prior = inchar;
        }
        fwrite (obuff, sizeof(char), obuffp - obuff, fpout);
    }
    /* If we don't already have an EOF, add one. */
    if (inchar != '\032') {
        fwrite ("\232", sizeof(char), 1, fpout);
    }
    make_lt (fpout);
}


void process_file (char *fname, int flag)
{
    FILE *fpin, *fpout;

    char *ofname;
    char *fend;

    if (flag == TO_PTP) fend = ".ptp";
    else fend = ".txt";

    ofname = malloc (((strlen (fname) + strlen(fend)) * sizeof (char)) + 1);
    if (ofname == NULL) {
        printf ("Out of memory building output name for %s. Skipping.\n",
                fname);
        return;
    }
    strcpy (ofname, fname);
    strcat (ofname, fend);

    /* printf ("Filename is: %s.\n", ofname); */

    if ((fpin = fopen (fname, "r")) == NULL) {
        printf ("Open of input file %s failed with status %d. Skipping.\n",
                fname, errno);
        free (ofname);
        return;
    }
    if ((fpout = fopen (ofname, "w")) == NULL) {
        printf ("Open of output file %s failed with status %d. Skipping.\n",
                ofname, errno);
        fclose (fpin);
        free (ofname);
        return;
    }
    if (flag == TO_PTP)
        make_ptp (fpin, fpout);
    else
        make_txt (fpin, fpout);

    fclose (fpin);
    fclose (fpout);
    free (ofname);
}

int main (int argc, char *argv[])
{
    int i, flag;

    /* The name the program was invoked under selects the direction. */
    if (strcmp (basename (argv[0]), "txt2ptp") == 0) {
        /* printf ("Flag is TO_PTP"); */
        flag = TO_PTP;
        init_ltbuf ();
    }
    else {
        flag = TO_TXT;
    }

    if (argc == 1) {
        /* No file arguments: act as a filter from stdin to stdout. */
        if (flag == TO_PTP) make_ptp (stdin, stdout);
        else make_txt (stdin, stdout);
    }
    else {
        for (i = 1; i < argc; i++) {
            process_file (argv[i], flag);
        }
    }
    return 0;
}
K. Krause
2020-09-18 07:44:13 UTC
Post by William Cattey
The convention on the PDP-8 was "mark parity" for ASCII text characters.
When I moved from PDP-8 OS/8 land to UNIX land, I found it odd that ASCII character tests didn't have bit 7 set.
When I joined up with the PiDP-8/i software project, one of the first
tools I wrote was a helper to translate between OS/8 ASCII with mark
Oh boy!
.... much code dropped ... ;-)

why not:
tr "\201-\376" "\001-\176" < OS8ASC.TX > unix.txt

and to strip away the <CR>-Character from the OS8-Text:
tr -d "\015" < OS8ASC.TX > unix.txt

or both together:
tr "\201-\376" "\001-\176" < OS8ASC.TX | tr -d "\015" > posix.txt

Maybe there is a construct in tr, which does both in one process
without a pipe.
btw: Kermit does this translation in ASCII-mode in both directions
automatically. :-)

Klemens
Josef Moellers
2020-09-18 11:20:48 UTC
Post by K. Krause
Post by William Cattey
The convention on the PDP-8 was "mark parity" for ASCII text characters.
When I moved from PDP-8 OS/8 land to UNIX land, I found it odd that ASCII character tests didn't have bit 7 set.
When I joined up with the PiDP-8/i software project, one of the first
tools I wrote was a helper to translate between OS/8 ASCII with mark
Oh boy!
.... much code dropped ... ;-)
  tr "\201-\376" "\001-\176" < OS8ASC.TX > unix.txt
  tr -d "\015" < OS8ASC.TX > unix.txt
  tr "\201-\376" "\001-\176" < OS8ASC.TX | tr -d "\015" > posix.txt
Maybe there is a construct in tr, which does both in one process
without a pipe.
Yes, there is: "-d" deletes only those characters in SET1 which are not
translated, so
tr -d "\201-\376\015" "\001-\176" < OS8ASC.TX > posix.txt
should do the trick (tested with a DOS format text file with lower case
letters to a UN*X format text file with upper case letters:
tr 'a-z\015' 'A-Z' < DOS > UNIX)

Josef
