How to Access Social Science and Health Data Files
Using the SAS FTP Access Method

You can read data files stored on IRSS's Data Archive FTP site directly into a SAS® program running on any computer that supports FTP and is connected to the UNC campus network. This includes ATN's central UNIX statistical server (statapps.unc.edu) and any Windows or Macintosh desktop computer in an office or a computer laboratory.

SAS® runs the FTP session, transfers the file, and makes it appear as if it were being read on the local machine. Users of ATN's statapps statistical server may find this method more convenient than waiting for file migrations from that machine's mass storage system, because all files at IRSS's FTP site are online at all times. While this method can in theory work for PCs using PPP network connections over a telephone modem, it will usually succeed only for data files no larger than 2 megabytes, and even then it can be quite slow.

Reading data in this manner is slower than reading data from a hard drive that is directly attached to a computing platform. However, it is still reasonably rapid. For example, reading the 1990 Texas PUMS file (1.27 million records and over 296 megabytes) from a drive attached to UNC's UNIX statistical server (statapps.unc.edu) can take between 25 and 90 seconds, depending on how many people are using the machine. Reading the same file from statapps using SAS with the FTP access method takes between six and ten minutes. This illustrates the difference in processing time for extremely large files. Most files under 50 megabytes take less than 60 seconds to read from statapps.

The following example code illustrates how to read a raw data file, a SAS export format file and an SPSS portable format file. You may wish to copy one or more of these examples and run them on your computing platform as a first step in learning this technique.

Reading a Raw Data File

Note the use of the FTP keyword on the FILENAME statement and the other keywords which provide login information and the location of the file for the FTP session. The DEBUG keyword is optional but can provide useful diagnostic information if there is a problem with the FTP session.

*-----------------------------------------------------------*
*  Example of accessing an IRSS Data Archive data file from *
*  ftp.irss.unc.edu using SAS(R) FTP access method on a     *
*  FILENAME statement. Login as "anonymous" and use e-mail  *
*  address as the password.                                 *
*                                                           *  
*  This will work from any SAS session on a                 *
*  platform with FTP and access to the Internet like        *
*  statapps.unc.edu or a PC on the UNC campus net.          *
*                                                           *
*  All Data Archive files are below the /pub/irss           *
*  subdirectory.                                            *
*                                                           *
*  Some poll files are available to anyone. Most files      *
*  are only available to UNC-CH faculty, students & staff.  *
*  See www.irss.unc.edu/data_archive/accessing.files.html.  *
*                                                           *
*  Prepared by: Ken Hardy  2/10/1998                        *
*-----------------------------------------------------------*;

FILENAME  fileref    FTP          /* Must use the FTP access method     */
 '/pub/irss/roper/yank8348/data'  /* Complete path and file name, NEVER */
                                  /* use the .gz file extension         */
 HOST='ftp.irss.unc.edu'          /* Address of IRSS FTP site           */
 USER='anonymous'                 /* Login as anonymous                 */
 PASS='userid@email.address'      /* PW is E-mail address               */
 LRECL= 93                        /* Required if record length > 256    */
 DEBUG ;                          /* Useful/optional diagnostic info    */
DATA ;
INFILE  fileref  ;
INPUT  @9 region 1.  @11 areatype 1.  @18 familyok 1.  @21 famties 1.  ;
LABEL  region   = 'Census Region of USA'
       areatype = 'Metro-Suburb-Nonmetro'
       familyok = 'How well is family doing?'
       famties =  'Importance of family ties' ;

run;
PROC FREQ ;
  TABLES  (familyok famties)*region / NOROW NOPCT CHISQ ;
run;
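
To see how long a read over FTP takes from your own machine, one option is to time a pass through the file without building a data set. The following is only a sketch: it reuses the file and login values from the example above, and the fileref name timeref and the variables start and elapsed are made up for illustration. The timing is approximate because the clock starts on the first iteration of the DATA step.

```sas
FILENAME  timeref    FTP          /* Hypothetical fileref name          */
 '/pub/irss/roper/yank8348/data'  /* Same file as the example above     */
 HOST='ftp.irss.unc.edu'
 USER='anonymous'
 PASS='userid@email.address'
 LRECL= 93 ;

DATA _NULL_ ;                     /* Read records, create no data set   */
  RETAIN start ;
  IF _N_ = 1 THEN start = DATETIME() ;  /* Start clock on first pass    */
  INFILE  timeref  END=last ;
  INPUT ;                         /* Read each record, keep no fields   */
  IF last THEN DO ;
    elapsed = DATETIME() - start ;      /* Seconds since the first pass */
    PUT 'Records read: ' _N_  ' Elapsed seconds: ' elapsed 8.1 ;
  END ;
RUN ;
```

The SAS log also reports the real time used by each step, so the PUT statement is a convenience and not a requirement.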


Reading a SAS XPORT Format Dataset

Note the addition of the LIBNAME statement, which uses the same name as the FILENAME statement and carries the XPORT keyword. The RCMD keyword on the FILENAME statement can be used to ensure a binary transfer; it does not appear in the example below.

*----------------------------------------------------------*
*  Example of accessing an IRSS Data Archive SAS export    *
*  file from ftp.irss.unc.edu using SAS(R) FTP access      *
*  method on a FILENAME statement. Login as "anonymous"    *
*  and use e-mail address as the password.                 *
*                                                          *
*  This will work from any SAS session on a                *
*  platform with FTP and access to the Internet like       *
*  statapps.unc.edu or a PC on the UNC campus net.         *
*                                                          *
*  All Data Archive files are below the /pub/irss          *
*  subdirectory.                                           *
*                                                          *
*  Some poll files are available to anyone. Most files     *
*  are only available to UNC-CH faculty, students & staff. *
*  See www.irss.unc.edu/data_archive/accessing.files.html. *
*                                                          *
*  This example gets a SAS library containing both a data  *
*  file and a CNTLOUT format data set for creating formats.*
*                                                          *
*  Prepared by: Ken Hardy  2/10/1998                       *
*----------------------------------------------------------*;

FILENAME  fileref    FTP              /* Must use the FTP access method   */
 '/pub/irss/harris/s9708/sas.export' /* Complete path and file name, NEVER */
                                     /* use the .gz file extension         */
 HOST='ftp.irss.unc.edu'             /* Address of IRSS FTP site           */
 USER='anonymous'                    /* Login as anonymous                 */
 PASS='userid@email.address'         /* PW is E-mail address               */
 DEBUG ;                             /* Useful/optional diagnostic info    */
LIBNAME fileref XPORT ;               /* Links SAS library to FTP         */
LIBNAME library ''    ;               /* Location for format catalog      */
PROC FORMAT CNTLIN=fileref.formats library=library ;
run;
DATA library.s9708    ;
SET fileref.s9708 ;
run;
PROC FREQ ;
run;
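
Before copying anything, it can help to list what the transport library contains. The following is a minimal sketch, assuming the XPORT engine can list members through the FTP fileref in the same way it reads them in the example above. It reuses that example's FILENAME and LIBNAME statements unchanged and adds only a PROC CONTENTS step.

```sas
FILENAME  fileref    FTP
 '/pub/irss/harris/s9708/sas.export' /* Same file as the example above   */
 HOST='ftp.irss.unc.edu'
 USER='anonymous'
 PASS='userid@email.address' ;
LIBNAME fileref XPORT ;               /* Links SAS library to FTP         */
PROC CONTENTS DATA=fileref._ALL_ ;    /* List every member and variable   */
RUN ;
```

If the library is as described in the example's header, the listing would include the s9708 and formats members.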


Reading an SPSS Portable Format File

Note that the name on the FILENAME and LIBNAME statements must be identical and that the SPSS keyword must appear on the LIBNAME statement.

*--------------------------------------------------------------*
*  Example of accessing an IRSS Data Archive SPSS portable file*
*  from ftp.irss.unc.edu using SAS(R) FTP access method  on a  *
*  FILENAME statement. Login as "anonymous" and use e-mail     *
*  address as the password.                                    *
*                                                              *
*  This will work from any SAS session on a                    *
*  platform with FTP and access to the Internet like           *
*  statapps.unc.edu or a PC on the UNC campus net.             *
*                                                              *
*  All Data Archive files are below the /pub/irss              *
*  subdirectory.                                               *
*                                                              *
*  Some poll files are available to anyone. Most files         *
*  are only available to UNC-CH faculty, students & staff.     *
*  See www.irss.unc.edu/data_archive/accessing.files.html.     *
*                                                              *
*  Prepared by: Ken Hardy  2/10/1998                           *
*--------------------------------------------------------------*;

FILENAME  fileref    FTP                  /* Must use the FTP device type    */
 '/pub/irss/roper/gss7296/spss.portable'  /* Complete path and file name, NEVER */
                                          /* use the .gz file extension         */
 HOST='ftp.irss.unc.edu'                  /* Address of IRSS FTP site           */
 USER='anonymous'                         /* Login as anonymous                 */
 PASS='userid@email.address'              /* PW is E-mail address               */
 DEBUG ;                                  /* Useful/optional diagnostic info    */
LIBNAME  fileref  SPSS ;                  /* SPSS engine                     */
PROC FREQ DATA=fileref._first_ ;
  TABLES  race ;
run;
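
To check which variables the portable file holds before running any tables, a PROC CONTENTS step can be pointed at the same member name. This is a sketch under the same assumptions as the example above: the FILENAME and LIBNAME statements are repeated from it, and only the PROC CONTENTS step is new.

```sas
FILENAME  fileref    FTP
 '/pub/irss/roper/gss7296/spss.portable'  /* Same file as the example above */
 HOST='ftp.irss.unc.edu'
 USER='anonymous'
 PASS='userid@email.address' ;
LIBNAME  fileref  SPSS ;                  /* SPSS engine                    */
PROC CONTENTS DATA=fileref._first_ ;      /* List variables and labels      */
RUN ;
```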