Overview

Five sets of flat files are built from the CRSP Stock and Index databases. Subscribers have long requested alternate formats of our databases so that they can automate or streamline their processes and ingest CRSP data more easily. The flat files are intended to serve those purposes.

  • SIZ – 1925 US Stock and Indexes
  • SAZ – 1925 US Stock
  • SXZ – 1962 US Stock and Indexes
  • S6Z – 1962 US Stock
  • SFZ – 1925 US Indexes

Stock Files                             SAS                     ASCII
Delist – Daily Return Information       sfz_del.sas7bdat        sfz_del.dat
Delist – Monthly Return Information     sfz_mdel.sas7bdat       sfz_mdel.dat
Distributions                           sfz_dis.sas7bdat        sfz_dis.dat
Index Membership *                      sfz_mbr.sas7bdat        sfz_mbr.dat
Name History                            sfz_nam.sas7bdat        sfz_nam.dat
NASDAQ History                          sfz_ndi.sas7bdat        sfz_ndi.dat
Portfolio Membership – Daily *          sfz_portd.sas7bdat      sfz_portd.dat
Portfolio Membership – Monthly *        sfz_portm.sas7bdat      sfz_portm.dat
Security Header Information             sfz_hdr.sas7bdat        sfz_hdr.dat
Shares History                          sfz_shr.sas7bdat        sfz_shr.dat
Time Series – Daily Primary             sfz_dp_dly.sas7bdat     sfz_dp_dly.dat
Time Series – Daily Secondary           sfz_ds_dly.sas7bdat     sfz_ds_dly.dat
Time Series – Monthly                   sfz_mth.sas7bdat        sfz_mth.dat

Index Files                             SAS                     ASCII
Index Header                            sfz_indhdr.sas7bdat     sfz_indhdr.dat
Rebalance *                             sfz_rb.sas7bdat         sfz_rb.dat
Time Series – Daily Index               sfz_dind.sas7bdat       sfz_dind.dat
Time Series – Monthly Index             sfz_mind.sas7bdat       sfz_mind.dat

* Available in 1925 and 1962 US Stock and Indexes and in 1925 US Indexes, but NOT available in 1925 and 1962 US Stock.

Comparisons between the legacy CRSPAccess Stock and Index databases and the CRSP flat files will reveal the differences outlined below.

Derived Data

The flat files provide all of the underlying data needed to generate derived data items, but do not include the derived items themselves. Examples of derived data that TSQuery and ts_print can generate but that are not found in the flat files include excess and cumulative returns, as well as last non-missing, recent, and previous-period versions of time series data items.
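As an illustration of deriving such items from the underlying data, the sketch below computes cumulative and excess returns from a short list of period returns. The function names, the sample values, and the formulas (compounding for cumulative returns, simple subtraction of a benchmark for excess returns) are assumptions chosen for illustration. TSQuery and ts_print may define these items differently.

```python
from itertools import accumulate


def cumulative_returns(returns):
    """Compound period returns: product of (1 + r) minus 1, up to each period.

    This is an assumed definition, not necessarily the one ts_print uses.
    """
    growth = accumulate((1.0 + r for r in returns), lambda a, b: a * b)
    return [g - 1.0 for g in growth]


def excess_returns(returns, benchmark):
    """Period return minus the benchmark return for the same period (assumed)."""
    return [r - b for r, b in zip(returns, benchmark)]


# Hypothetical sample data, not taken from any CRSP file.
stock = [0.10, -0.05, 0.02]
index = [0.04, -0.01, 0.01]

cum = cumulative_returns(stock)
exc = excess_returns(stock, index)
print(cum)
print(exc)
```

In practice the inputs would come from the return columns of the daily or monthly time series files, with missing values handled before compounding.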

Double Precision

CRSPAccess stored most items, including the commonly used Price (PRC), Ask or High Price (ASKHI), and Bid or Low Price (BIDLO), as 4-byte floating point numbers. The SAS format defaults to storing those items as 8-byte (double precision) floating point numbers. For the majority of historical values, this makes no difference. However, the conversion to SAS’s double precision makes more apparent a known issue of phantom precision related to decimal pricing. For example, IBM’s price of $138.46 on 11/27/2015 will show as $138.46001 if displayed with five decimal places. The ASCII version, which outputs only seven significant digits in scientific notation, will show just 1.384600E+02, with no phantom precision.
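The effect can be reproduced outside of SAS. The sketch below is only an illustration: it uses Python's struct module to round 138.46 to a 4-byte float and then widen it to an 8-byte float, which is assumed to mirror the conversion described above.

```python
import struct

# 138.46 is the price quoted in the text; it has no exact binary representation.
price = 138.46

# Round-trip through a 4-byte float to mimic the legacy storage width,
# then read the result back as Python's native 8-byte float.
widened = struct.unpack("<f", struct.pack("<f", price))[0]

five_decimals = f"{widened:.5f}"  # display with five decimal places
seven_digits = f"{widened:.6E}"   # seven significant digits, scientific notation

print(five_decimals)  # 138.46001
print(seven_digits)   # 1.384600E+02
```

The extra digits are not new information. They are the nearest 4-byte approximation of 138.46 shown at a precision the original storage never had.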

CRSP is re-engineering our back office to correct the phantom precision and increase the real precision throughout the Stock and Indexes flat files in the coming year.

9-Character CUSIP

Subscribers have long requested the addition of 9-character CUSIPs to our databases. The flat files include them in the name history and header files.
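The text does not say how the ninth character is derived. Assuming it is the standard CUSIP check digit, and assuming an older record holds only the first eight characters, the following sketch shows how the two forms could be reconciled. The function names are hypothetical and the code is not part of any CRSP tool.

```python
def cusip_check_digit(cusip8):
    """Return the check digit for an 8-character CUSIP base.

    Uses the standard CUSIP modulus-10 "double-add-double" scheme.
    """
    if len(cusip8) != 8:
        raise ValueError("expected exactly 8 characters")
    total = 0
    for position, ch in enumerate(cusip8.upper(), start=1):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch in "*@#":
            value = 36 + "*@#".index(ch)     # *=36, @=37, #=38
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if position % 2 == 0:
            value *= 2                       # double every second character
        total += value // 10 + value % 10    # add the digits of the value
    return (10 - total % 10) % 10


def to_cusip9(cusip8):
    """Append the check digit to an 8-character CUSIP base."""
    return cusip8.upper() + str(cusip_check_digit(cusip8))


print(to_cusip9("45920010"))  # 459200101
```

A check of this kind can also flag 9-character values whose last character does not match the computed digit.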

CRSP Total Market

The investable CRSP Total Market Index is included in the index files associated with the stock and index database. Daily price-only and total returns, levels, and counts are included.

Volume Differences

The introduction of double precision allows monthly volumes to be stored in the same unit (1 share) as daily volumes, rather than in units of 100 shares. Monthly volumes in the flat files are therefore one hundred times the values in legacy CRSPAccess. The change also allowed four missing daily volumes for Citigroup in 2009 and 2010 to be replaced with actual values in excess of two billion, and the corresponding monthly volumes to be recalculated.

KYPERMNO  CALDT       New VOL        MCALDT      New MVOL
70519     08/05/2009  2,674,463,281  08/31/2009  22,798,732,177
70519     12/17/2009  3,772,638,437  12/31/2009  15,021,795,593
70519     12/18/2009  2,813,697,156
70519     12/07/2010  3,267,829,406  08/31/2009  13,427,190,606
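The unit change and the size of the restored values can be checked with a few lines of arithmetic. The sketch below makes two assumptions not stated in the text: that the legacy limit was that of a 32-bit signed integer (which would explain why values above two billion could not be stored), and that a legacy monthly volume converts to shares by simple multiplication. The function name and the legacy sample value are hypothetical.

```python
INT32_MAX = 2**31 - 1       # 2,147,483,647 (assumed legacy storage limit)
EXACT_DOUBLE_MAX = 2**53    # a double represents every integer up to this exactly


def legacy_mvol_to_shares(mvol_hundreds):
    """Convert a legacy monthly volume (units of 100 shares) to single shares."""
    return mvol_hundreds * 100


# New daily volumes for KYPERMNO 70519, copied from the table above.
new_daily_volumes = [2_674_463_281, 3_772_638_437, 2_813_697_156, 3_267_829_406]

exceeds_int32 = [v > INT32_MAX for v in new_daily_volumes]
exact_as_double = [v < EXACT_DOUBLE_MAX and int(float(v)) == v
                   for v in new_daily_volumes]

# Hypothetical legacy monthly value, for illustration only.
shares = legacy_mvol_to_shares(1_234_567)

print(exceeds_int32)
print(exact_as_double)
print(shares)
```

When comparing flat-file monthly volumes against legacy CRSPAccess output, the same multiplication by 100 applies before the values are matched.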

Missing Values

In CRSPAccess, missing values were indicated by defined non-null values (e.g., -99 for returns and 0 for prices). In the flat files, missing values are represented by null values.

In the SAS files, CRSP uses the default SAS missing value, displayed as a period (“.”).
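Code that compares legacy output with the flat files may need to map the old sentinel values onto nulls first. The following is a minimal sketch that covers only the two examples given above (-99 for returns and 0 for prices). The item keys "ret" and "prc" and the function name are hypothetical, and legacy CRSPAccess may define other codes that are not handled here.

```python
# Sentinels taken from the two examples in the text; the keys are hypothetical.
LEGACY_MISSING = {
    "ret": -99.0,  # legacy missing-return code
    "prc": 0.0,    # legacy missing-price code
}


def legacy_to_null(item, value):
    """Return None when a legacy value is the missing-value sentinel for its item."""
    sentinel = LEGACY_MISSING.get(item)
    if sentinel is not None and value == sentinel:
        return None
    return value


# Hypothetical legacy records as (item, value) pairs.
legacy_rows = [("ret", 0.0123), ("ret", -99.0), ("prc", 138.46), ("prc", 0.0)]
cleaned = [legacy_to_null(item, value) for item, value in legacy_rows]
print(cleaned)  # [0.0123, None, 138.46, None]
```

Items without an entry in the table pass through unchanged, so a return of exactly 0 is not mistaken for a missing price.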

In the coming year, CRSP will be further examining all missing value conventions.