How to Access Files in IBM S370/MVS VB Format


Some files in IRSS's Data Archive are binary copies of files originally written on older IBM mainframe systems running the MVS operating system in a form known as "variable length, blocked records" or RECFM=VB files. Such files have names ending in ".vb" in the Data Archive. Because of their special format, these files cannot be read by most statistical software on UNIX, Windows or Macintosh systems. However, they can be read by SAS® using some specialized code on any of these platforms.

You must download the file to your computing platform in binary mode using FTP before attempting to read it with SAS. An ASCII (text) mode transfer alters bytes in the file and leaves it unreadable.

Two SAS data step statements require special modifications in order to read .vb files -- the INFILE and INPUT statements.

INFILE Statement

You must add RECFM=S370VB LRECL=32768 to the INFILE statement after the name of the file you are trying to read. For example, the following INFILE statement would read the physician billing data from the Medicare Current Beneficiary Study contained in the file "phybill.data.vb" in your home directory on a UNIX system.

INFILE "~/phybill.data.vb"   RECFM=S370VB LRECL=32768 ;

The following INFILE statement would read the same data from a file named "phybill.vb" in a subdirectory on the C: drive of a Windows PC.

INFILE "C:\health\phybill.vb"   RECFM=S370VB LRECL=32768 ;
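
Before writing a full INPUT statement, it can help to confirm that SAS is splitting the file into records as expected. The following is a minimal sketch, not part of the original instructions: it reads the first five records of the UNIX example file and writes each record's length and raw contents to the SAS log. The variable name reclen is an arbitrary choice made for this illustration.

```sas
/* Sketch: peek at the first five records without creating a data set. */
DATA _NULL_ ;
   INFILE "~/phybill.data.vb" RECFM=S370VB LRECL=32768
          OBS=5 LENGTH=reclen ;      /* reclen is a hypothetical variable name */
   INPUT ;                           /* load the record into the input buffer  */
   PUT "Record " _N_ "has " reclen "bytes" ;
   LIST ;                            /* dump the raw record to the SAS log     */
RUN ;
```

If the reported lengths agree with the record lengths given in the file's documentation, the INFILE options are probably correct. On a UNIX or Windows system the LIST output will usually appear in hexadecimal, because the data are still in EBCDIC at this point.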


INPUT Statement

Because of the special format of a .vb file, you cannot read it with the conventional variable name list, column position or input format techniques you usually employ on an INPUT statement. Instead, every data field must be read with a specialized input format. While there are many possibilities, the most commonly used input formats are shown in Table 1.

TABLE 1
Commonly Used Input Formats for
Reading Data Fields in IBM/MVS RECFM=VB Files

Input Format Name    Used for Reading                   Yields SAS Variable Type
-----------------    -------------------------------    ------------------------
$EBCDICw.            Character Fields                   Character
S370FFw.             Numeric Character Field            Numeric
S370FIBw.            IBM/MVS Integer Binary Field       Numeric
S370FPDw.            IBM/MVS Packed Decimal Field       Numeric
S370FZDw.            IBM/MVS Zoned Decimal Field        Numeric

Note: the small w stands for the width of the field in columns (bytes). For example, $EBCDIC4. reads four columns of characters.
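
To see what each input format in Table 1 does without needing a data file, the following sketch applies them to short hexadecimal constants with the INPUT function. The byte values are made-up illustrations, not values taken from any file in the Data Archive, and the variable names are arbitrary.

```sas
/* Sketch: apply each Table 1 input format to a hand-made byte string. */
DATA _NULL_ ;
   ch  = INPUT('C1C2C3C4'x, $EBCDIC4.) ;  /* EBCDIC letters A B C D           */
   ff  = INPUT('F1F2'x,     S370FF2.) ;   /* EBCDIC digits "12"               */
   fib = INPUT('0100'x,     S370FIB2.) ;  /* two-byte integer binary, 256     */
   fpd = INPUT('123C'x,     S370FPD2.) ;  /* packed decimal, +123             */
   fzd = INPUT('F1F2C3'x,   S370FZD3.) ;  /* zoned decimal, +123              */
   PUT ch= ff= fib= fpd= fzd= ;           /* write the results to the SAS log */
RUN ;
```

Note that the two-byte packed decimal field holds three digits plus a sign, which is why the width of a packed field is smaller than the number of digits it contains.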

Here's an example of using some of these formats along with the column pointer (i.e. @n) to read some fields from the Medicare Current Beneficiary Study billing data on a UNIX platform.

DATA  billing ;
   INFILE  "~/phybill.data.vb"  RECFM=S370VB LRECL=32768  ;
   INPUT @1  RIC $EBCDIC1.  REFYEAR S370FF2.  BASEID $EBCDIC8.
             VERSION $EBCDIC1.  (HPROCDT FROMDT THRUDT) ($EBCDIC6.)
             CWFLOC $EBCDIC1.  ACCRDT $EBCDIC6.  ACCRNO S370FPD2.  DISPCD $EBCDIC2. ;
RUN ;

As you can see, many character variables are read with the $EBCDICw. input format. The second variable (REFYEAR) is read as a two-digit number with the S370FF2. format, and one variable (ACCRNO) is read with the packed decimal (S370FPD2.) format. Read the documentation supplied with these files very carefully to determine which format is needed for each data field.
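
If only a few fields are needed, the column pointer can jump directly to each one. The sketch below is an illustration only: the starting columns are computed by adding up the field widths in the example INPUT statement above, on the assumption that the fields are contiguous from column 1. They are not taken from the file's documentation and should be checked against it. The data set name billing_subset is hypothetical.

```sas
/* Sketch: read three fields by explicit column position. */
DATA billing_subset ;                        /* hypothetical data set name */
   INFILE "~/phybill.data.vb" RECFM=S370VB LRECL=32768 ;
   INPUT @4  BASEID  $EBCDIC8.               /* columns 4-11  */
         @13 HPROCDT $EBCDIC6.               /* columns 13-18 */
         @38 ACCRNO  S370FPD2. ;             /* columns 38-39 */
RUN ;

/* Print the first ten observations to check the values by eye. */
PROC PRINT DATA=billing_subset (OBS=10) ;
RUN ;
```

Printing a handful of observations is a quick way to spot a wrong column position or input format, since misread fields tend to show up as missing values or implausible numbers.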