Software Input Formats


 

Before a statistical package can perform any analysis, the raw data must be described to it. Large data sets are almost always stored in external files, in one of several formats: FIXED FIELD, BLANK-DELIMITED, or COMMA-DELIMITED, each with or without variable names in the first line.

Different statistical packages read external data files in different formats.

SOFTWARE                PLATFORM / OPERATING SYSTEM
                        UNIX                  WIN/DOS               MAC
SAS                     Fixed Field Data      Fixed Field Data      Fixed Field Data
                        Blank-delimited Data  Blank-delimited Data  Blank-delimited Data
                        Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
                        SAS Xport File        SAS Xport File        SAS Xport File
                        SAS Cport File        SAS Cport File        SAS Cport File
SPSS                    Fixed Field Data      Fixed Field Data      Fixed Field Data
                        Blank-delimited Data  Blank-delimited Data  Blank-delimited Data
                        Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
                        SPSS Portable File    SPSS Portable File    SPSS Portable File
Intercooled STATA 7     Stata 4-5             Stata 4-5             Stata 4-5
                        Stata 7               Stata 7               Stata 7
                        Fixed Field Data      Fixed Field Data      Fixed Field Data
                        Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
                        Blank-delimited Data  Blank-delimited Data  Blank-delimited Data
STATA 7 SE              Excel                 Excel                 Excel
                        Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
SPREADSHEET             Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
                        Tab-delimited Data    Tab-delimited Data    Tab-delimited Data
RATS, TSP               Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
                        Blank-delimited Data  Blank-delimited Data  Blank-delimited Data
S-PLUS                  Blank-delimited Data  Blank-delimited Data  Blank-delimited Data
                        Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
SHAZAM                  Fixed Field Data      Fixed Field Data      Fixed Field Data
                        Blank-delimited Data  Blank-delimited Data  Blank-delimited Data
ARCINFO, MAPINFO etc.   Comma-delimited Data  Comma-delimited Data  Comma-delimited Data
                        .dbf File             .dbf File             .dbf File


An Example of FIXED FIELD DATA

        A. Without Variable Names
109991 122222 421222 2471
109991 331423 436214 2761
109991 121322 421213 2591
109991 121022 421410 1731
12999112110611319999 1722
129991 92 82210292 8 3111
B. With Variable Names in first line (comma delimited or blank delimited)

PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP         
109991 122222 421222 2471
109991 331423 436214 2761
109991 121322 421213 2591
109991 121022 421410 1731
12999112110611319999 1722
129991 92 82210292 8 3111
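
As a rough illustration of what "fixed field" means, the following Python sketch slices one of the records above by column position. The column ranges are the ones used in the SPSS, Stata and SAS commands later in this document; the function name parse_fixed and the use of None for an all-blank field are assumptions made only for this example.

```python
# Column layout (1-based, inclusive), copied from the column ranges
# used in the SPSS/Stata/SAS commands in this document.
LAYOUT = [
    ("PROVP", 1, 2), ("CMAPUMFP", 3, 5), ("HHCLASSP", 6, 6), ("HTYPEP", 7, 8),
    ("UNITSP", 9, 9), ("HHINCP", 10, 11), ("EFSTATP", 12, 12), ("EFSIZEP", 13, 13),
    ("CFSTATP", 14, 15), ("CFSIZEP", 16, 16), ("PRESCHP", 17, 17), ("MSCFINCP", 18, 18),
    ("CFINCP", 19, 20), ("HHSTATP", 21, 22), ("AGEP", 23, 24), ("SEXP", 25, 25),
]


def parse_fixed(line):
    """Slice one fixed-field record into a dict of integers.

    An all-blank field becomes None (an assumption of this sketch).
    """
    record = {}
    for name, start, end in LAYOUT:
        field = line[start - 1:end].strip()
        record[name] = int(field) if field else None
    return record


record = parse_fixed("109991 122222 421222 2471")
print(record["PROVP"], record["CMAPUMFP"], record["AGEP"], record["SEXP"])
# prints: 10 999 47 1
```

Because each value is located by position alone, a blank inside a field (such as the leading blank of a one-digit HTYPEP) does no harm, which is not true of the delimited formats below.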


An Example of COMMA-DELIMITED DATA

        A. Without Variable Names

10,999,1, 1,2,22,2,2, 4,2,1,2,22, 2,47,1
10,999,1, 3,3,14,2,3, 4,3,6,2,14, 2,76,1
10,999,1, 1,2,13,2,2, 4,2,1,2,13, 2,59,1
10,999,1, 1,2,10,2,2, 4,2,1,4,10, 1,73,1
12,999,1,12,1,10,6,1,13,1,9,9,99, 1,72,2
12,999,1, 9,2, 8,2,2,10,2,9,2, 8, 3,11,1
B. With Variable Names in First Line
              
PROVP,CMAPUMFP,HHCLASSP,HTYPEP,UNITSP,HHINCP,EFSTATP,EFSIZEP,CFSTATP,CFSIZEP,PRESCHP,MSCFINCP,CFINCP,HHSTATP,AGEP,SEXP
10,999,1, 1,2,22,2,2, 4,2,1,2,22, 2,47,1
10,999,1, 3,3,14,2,3, 4,3,6,2,14, 2,76,1
10,999,1, 1,2,13,2,2, 4,2,1,2,13, 2,59,1
10,999,1, 1,2,10,2,2, 4,2,1,4,10, 1,73,1
12,999,1,12,1,10,6,1,13,1,9,9,99, 1,72,2
12,999,1, 9,2, 8,2,2,10,2,9,2, 8, 3,11,1
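
To show how the two variants differ for a program reading them, here is a minimal Python sketch that parses comma-delimited text with or without a header line. The function name read_comma_delimited is hypothetical, the sample is shortened to the first four variables, and the fallback names V1, V2, ... simply mirror the SPSS examples in this document.

```python
import csv
import io


def read_comma_delimited(text, has_header=True):
    """Parse comma-delimited text into a list of dicts of integers."""
    # skipinitialspace drops the padding blank in values such as ", 1".
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        # No header line: name the fields V1, V2, ... by position.
        names = [f"V{i}" for i in range(1, len(rows[0]) + 1)]
    return [{n: int(v) for n, v in zip(names, row)} for row in rows]


# Shortened sample: only the first four variables of the file above.
sample = """PROVP,CMAPUMFP,HHCLASSP,HTYPEP
10,999,1, 1
12,999,1,12
"""
records = read_comma_delimited(sample)
print(records[1]["HTYPEP"])  # prints: 12
```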


An Example of BLANK-DELIMITED DATA

A. Without Variable Names

               
10 999 1  1 2 22 2 2  4 2 1 2 22  2 47 1
10 999 1  3 3 14 2 3  4 3 6 2 14  2 76 1
10 999 1  1 2 13 2 2  4 2 1 2 13  2 59 1
10 999 1  1 2 10 2 2  4 2 1 4 10  1 73 1
12 999 1 12 1 10 6 1 13 1 9 9 99  1 72 2
12 999 1  9 2  8 2 2 10 2 9 2  8  3 11 1
B. With Variable Names in First Line
               
PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP
10 999 1  1 2 22 2 2  4 2 1 2 22  2 47 1
10 999 1  3 3 14 2 3  4 3 6 2 14  2 76 1
10 999 1  1 2 13 2 2  4 2 1 2 13  2 59 1
10 999 1  1 2 10 2 2  4 2 1 4 10  1 73 1
12 999 1 12 1 10 6 1 13 1 9 9 99  1 72 2
12 999 1  9 2  8 2 2 10 2 9 2  8  3 11 1
NOTE: A data file that contains blanks as valid codes (or as missing data fields) cannot be made into a blank-delimited file unless every such blank is first recoded to a non-blank character, for example a digit from 0-9.
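
The problem described in the note can be seen in a small Python sketch. The three-field records are hypothetical, and the choice of 9 as the recode value is only an assumption; any code that cannot be confused with real data would do.

```python
def split_blank_delimited(line):
    """Split a blank-delimited record; a run of blanks counts as one separator."""
    return line.split()


# Hypothetical three-field records; in the second, the middle field is blank.
complete = "10 47 1"
missing = "10    1"

print(len(split_blank_delimited(complete)))  # 3 fields
print(len(split_blank_delimited(missing)))   # 2 fields: the blank field has vanished


def recode_blank(field, width, code="9"):
    """Replace an all-blank fixed-width field with a numeric code (assumed: 9s)."""
    return field if field.strip() else code * width


# Recode the blank field first, then write the record blank-delimited.
fields = ["10", "  ", "1"]
print(" ".join(recode_blank(f, len(f)) for f in fields))  # prints: 10 99 1
```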


SPSS Commands for Reading Different Data Formats

SPSS Command for Reading Fixed Field File Without Variable Names
data list file='C:\myfile\data.txt' / V1 1-2 V2 3-5 V3 6-6 V4 7-8 V5 9-9 V6 10-11 V7 12-12 V8 13-13 V9 14-15 V10 16-16 V11 17-17 V12 18-18 V13 19-20 V14 21-22 V15 23-24 V16 25-25.
SPSS Command (for MSWindows only) for Reading Fixed Field File With Variable Names (comma or blank separated)
data list file='C:\myfile\dataname.txt' skip = 1 / PROVP 1-2 CMAPUMFP 3-5 HHCLASSP 6-6 HTYPEP 7-8 UNITSP 9-9 HHINCP 10-11 EFSTATP 12-12 EFSIZEP 13-13 CFSTATP 14-15 CFSIZEP 16-16 PRESCHP 17-17 MSCFINCP 18-18 CFINCP 19-20 HHSTATP 21-22 AGEP 23-24 SEXP 25-25.
SPSS Command for Reading Comma Delimited File Without Variable Names
data list list(',') file='C:\myfile\data.csv' / V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16.
SPSS Command (for MSWindows only) for Reading Comma Delimited File with Variable Names in First Line
data list list(',') file='C:\myfile\dataname.csv' skip = 1 / PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP.
SPSS Command for Reading Blank-Delimited File Without Variable Names
data list list(' ') file='C:\myfile\data.dat' / V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16.
SPSS Command (for MSWindows only) for Reading Blank-Delimited File with Variable Names in First Line
data list list(' ') file='C:\myfile\dataname.dat' skip = 1 / PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP.
NOTE: SPSS for MSWindows treats two adjoining blanks as separating two different variables. A blank-delimited data file to be read by SPSS for Windows should therefore not contain more than one blank between two adjoining variables.
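
One way to prepare such a file is to squeeze every run of blanks down to a single blank before handing it to SPSS. The Python sketch below is only an illustration: the function name squeeze_blanks is hypothetical, and it assumes that no field in the file is itself blank (otherwise see the note on blank-delimited data above).

```python
def squeeze_blanks(line):
    """Collapse every run of blanks to a single blank and trim both ends."""
    return " ".join(line.split())


print(squeeze_blanks("10 999 1  1 2 22 2 2  4 2 1 2 22  2 47 1"))
# prints: 10 999 1 1 2 22 2 2 4 2 1 2 22 2 47 1
```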


SPSS Commands for Writing Different Data Formats

SPSS Commands for Writing SPSS System File
save outfile='[path/fn]'.
execute.
finish.
SPSS Commands for Writing SPSS Export File
export outfile='[path/fn]'.
execute.
finish.
SPSS Commands for Writing Comma Delimited Data File
write outfile='[path/fn]'/ PROVP ',' CMAPUMFP ',' HHCLASSP ',' HTYPEP ',' UNITSP ',' HHINCP ',' EFSTATP ',' EFSIZEP ',' CFSTATP ',' CFSIZEP ',' PRESCHP ',' MSCFINCP ',' CFINCP ',' HHSTATP ',' AGEP ',' SEXP.
execute.
finish.
SPSS Commands for Writing Blank Delimited Data File
write outfile='[path/fn]'/ PROVP ' ' CMAPUMFP ' ' HHCLASSP ' ' HTYPEP ' ' UNITSP ' ' HHINCP ' ' EFSTATP ' ' EFSIZEP ' ' CFSTATP ' ' CFSIZEP ' ' PRESCHP ' ' MSCFINCP ' ' CFINCP ' ' HHSTATP ' ' AGEP ' ' SEXP.
execute.
finish.
SPSS Commands for Writing Fixed-field Format Data File
write outfile='[path/fn]' table/ PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP.
execute.
finish.


SHAZAM Commands to Read Different Data Formats

The standard format for a SHAZAM data file requires that the file be prepared as a plain text file with numbers separated by spaces or commas. Free format is allowed. That is, there are no constraints on column position.

Note that a comma is treated as a separator. Therefore the number 12,560 will be interpreted as two numbers: '12' and '560'. For correct interpretation by SHAZAM commas in numeric data should be removed. This can be done in an editor with a global edit change.
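
Where an editor's global change is awkward, the same clean-up can be sketched in Python. This is only an illustration under a strong assumption: the file is blank-delimited, so every comma in it is a thousands separator. It must not be run on a comma-delimited file, where it would merge neighbouring fields such as 10,999. The names THOUSANDS_COMMA and strip_thousands_commas are hypothetical.

```python
import re

# A comma that sits between a digit and a group of exactly three digits.
THOUSANDS_COMMA = re.compile(r"(?<=\d),(?=\d{3}(?!\d))")


def strip_thousands_commas(text):
    """Remove thousands separators from numbers in blank-delimited text."""
    return THOUSANDS_COMMA.sub("", text)


print(strip_thousands_commas("12,560 3,100,250 47"))
# prints: 12560 3100250 47
```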

In general there must be no descriptive information and no special characters of any kind embedded in the data file (an exception to this is when the FORMAT command is used). Data documentation can be placed as a header to the file or at the very end of the file.

Spreadsheet data files can be used with one of the following methods:
Convert the spreadsheet to a plain text file (an ASCII file) by using the Save As ... option from the File menu.
Save the spreadsheet in DIF format. DIF files can be loaded with the SHAZAM READ command.
Microsoft Excel XLS files can be read by SHAZAM.
Instructions are available.

SHAZAM Command for Reading Blank-Delimited or Comma-Delimited File Without Variable Names

samp 1 6
read (C:\myfile\data.dat) V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16/ nobyvar list
SHAZAM Command for Reading Blank-Delimited or Comma-Delimited File with Variable Names in First Line
samp 1 6
read (C:\myfile\data.dat) PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP/ nobyvar list skiplines=1


STATA Commands to Read Different Data Formats

Stata can read ASCII datasets but not binary datasets. If the dataset to be read into Stata is in binary format or the "internal" format of another software package, you must either translate it into ASCII or use some other program for conversion. One solution to converting Excel datasets to Stata format is to obtain a data-translation package. You could obtain Stat/Transfer from StataCorp or DBMSCopy from Conceptual Systems.

Alternatively Microsoft Excel (.XLS) files can be read directly by STATA. Instructions are available.

Note: Stata 4-5 truncates variable labels to 16 characters, and value labels to 8 characters.

STATA Command for Reading Fixed Field File

Fixed field data can be read into Stata by two commands: "Infile" and "Infix". Most people think "Infix" is easier to use for reading fixed-format data, but "Infile" has more features.

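Syntax for "Infile"

For fixed-format data, "Infile" takes the column layout from a dictionary file (with .dct extension) rather than from the command line. A dictionary for the example file, saved for instance as C:\myfile\data.dct, would contain:

dictionary using "C:\myfile\data.txt" {
_column(1) PROVP %2f
_column(3) CMAPUMFP %3f
_column(6) HHCLASSP %1f
_column(7) HTYPEP %2f
_column(9) UNITSP %1f
_column(10) HHINCP %2f
_column(12) EFSTATP %1f
_column(13) EFSIZEP %1f
_column(14) CFSTATP %2f
_column(16) CFSIZEP %1f
_column(17) PRESCHP %1f
_column(18) MSCFINCP %1f
_column(19) CFINCP %2f
_column(21) HHSTATP %2f
_column(23) AGEP %2f
_column(25) SEXP %1f
}

The data are then read inside Stata with:

. infile using "C:\myfile\data.dct"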

Syntax for "Infix"

There are two ways to use "Infix". One is to type the specifications that describe how to read the fixed format data on the command line:

. infix PROVP 1-2 CMAPUMFP 3-5 HHCLASSP 6-6 HTYPEP 7-8 UNITSP 9-9 HHINCP 10-11 EFSTATP 12-12 EFSIZEP 13-13 CFSTATP 14-15 CFSIZEP 16-16 PRESCHP 17-17 MSCFINCP 18-18 CFINCP 19-20 HHSTATP 21-22 AGEP 23-24 SEXP 25-25 using "C:\myfile\data.txt"

The other is to type the specifications into a file (with .dct extension) and then, inside Stata, type

. infix using "C:\myfile\data.dct"

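STATA Command for Reading Tab/Comma Delimited Data File with variable names in the first line
. insheet using "C:\myfile\data.txt"
STATA Command for Reading Blank/Tab/Comma Delimited Data File without variable names in the first line
. insheet PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP using "C:\myfile\data.txt"
or, for blank-delimited (free-format) data,
. infile PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP using "C:\myfile\data.txt"
NOTE: "insheet" expects tab or comma separators. "infile" with a variable list reads free-format data but does not skip a line of variable names, and "infile using" without a variable list expects a dictionary file. With "infile", an alphabetic (string) variable name has to be preceded by a string storage type such as str10.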


SAS Commands to Read Different Data Formats


The DATA step in SAS names the SAS data set being created (the DATA statement), tells SAS where the data may be found (the INFILE statement, or a FILENAME statement followed by an INFILE statement, if the data are in an external file; a CARDS statement if the data are instream), and describes the setup and format of the raw data (the INPUT statement). The INPUT statement actually initiates the reading of the data; the INFILE, CARDS and DATA statements simply tell SAS where to look for it and what to name it.

SAS Commands for Reading Fixed Field File Without Variable Names

libname lib '[path]';
data lib.test;
infile 'C:/myfile/data.txt';
input PROVP 1-2 CMAPUMFP 3-5 HHCLASSP 6-6 HTYPEP 7-8 UNITSP 9-9 HHINCP 10-11 EFSTATP 12-12
EFSIZEP 13-13 CFSTATP 14-15 CFSIZEP 16-16 PRESCHP 17-17 MSCFINCP 18-18 CFINCP 19-20 HHSTATP 21-22 AGEP 23-24 SEXP 25-25;
run;
proc print data=lib.test;
run;
OR
filename in 'C:/myfile/data.txt';
libname lib '[path]';
data lib.test;
infile in;
input PROVP 1-2 CMAPUMFP 3-5 HHCLASSP 6-6 HTYPEP 7-8 UNITSP 9-9 HHINCP 10-11 EFSTATP 12-12
EFSIZEP 13-13 CFSTATP 14-15 CFSIZEP 16-16 PRESCHP 17-17 MSCFINCP 18-18 CFINCP 19-20 HHSTATP 21-22 AGEP 23-24 SEXP 25-25;
run;
proc print data=lib.test;
run;
Note: Fixed Field Data Input (Column Input): Points to Remember
SAS Commands for Blank Delimited Data File (List Format) Without Variable Names
libname temp '[path]';
data temp.test;
infile 'C:/myfile/data.dat';
input PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP;
run;
proc print data=temp.test;
run;
OR
filename in 'C:/myfile/data.dat';
libname temp 'C:/myfile';
data temp.test;
infile in;
input PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP;
run;
proc print data=temp.test;
run;
NOTE: List Format has the drawback of being unable to skip fields. If you are interested in the fifth field of data, you must read the preceding four as well.

SAS Command for Comma Delimited Data File Without Variable Names

libname temp '[path]';
data temp.test;
infile 'c:\myfile\data.csv' dlm=',';
input PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP;
run;
proc print data=temp.test;
run;
OR
filename in 'c:\sushil\data.csv';
libname temp '[path]';
data temp.test;
infile in dlm=',';
input PROVP CMAPUMFP HHCLASSP HTYPEP UNITSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP HHSTATP AGEP SEXP;
run;
proc print data=temp.test;
run;
SAS Commands to read Permanent SAS Datasets
a) If the dataset does not require modification (calculation, subsetting and so forth), it can be used directly in the PROC statement's DATA= option.

b) But if the dataset must be modified before it is used in a PROC, read it in a DATA step with a SET statement. SET performs functions similar to those of the INPUT statement. INPUT moves raw data into memory, then reads the raw data, translates it into SAS variables and moves it into another area in memory, possibly writing it to a dataset. The SET statement performs these data movement tasks for SAS datasets, both temporary and permanent.

libname temp1 '[path]';
data temp1.test1;
set temp.test (keep=PROVP CMAPUMFP HHCLASSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP);
run;
proc print data=temp1.test1;
run;

SAS Commands to read SAS Transport file in SAS/Windows

libname trans xport '/.../sinc/sinc96.tpt'; /* sets up a library reference (trans) to the file sinc96.tpt; the xport engine assumes a file written in XPORT format */
proc [someproc] data=trans.sinc; /* to use the file, refer to it through the library reference just created */


TSP Commands for Reading Different Data Formats

TSP Commands for Reading Comma or Blank Delimited Data Files
Read (File='c:\sushil\data.txt') PROVP CMAPUMFP HHCLASSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP;


RATS Commands for Reading Different Data Formats


RATS Commands for Reading Comma or Blank Delimited Data Files
open data c:\sushil\data.dat
data (format=free,org=obs) /
PROVP CMAPUMFP HHCLASSP HHINCP EFSTATP EFSIZEP CFSTATP CFSIZEP PRESCHP MSCFINCP CFINCP


S-Plus Commands for Reading Different Data Formats

S-Plus Commands for Reading Blank Delimited Data Files
scan("c:\\sushil\\data.dat",
+ what=list(PROVP=0, CMAPUMFP=0, HHCLASSP=0, HHINCP=0, EFSTATP=0, EFSIZEP=0, CFSTATP=0, CFSIZEP=0, PRESCHP=0, MSCFINCP=0, CFINCP=0))
Note: By default the SCAN function reads numeric data, with variables separated by blanks. The type of each variable is given by the value that follows its name in the list: VAR="" for a string variable and VAR=0 for a numeric variable. Backslashes in the file name must be doubled inside the quoted string.
S-Plus Commands for Reading Comma Delimited Data Files
scan("c:\\sushil\\data.dat", sep=",",
+ what=list(PROVP=0, CMAPUMFP=0, HHCLASSP=0, HHINCP=0, EFSTATP=0, EFSIZEP=0, CFSTATP=0, CFSIZEP=0, PRESCHP=0, MSCFINCP=0, CFINCP=0))


Html document by Sushil Kumar, Data Library Service, University of Toronto. Comments and corrections to <dlsg@chass.utoronto.ca>. Created: 03/2001. Last updated: 2004/09/09