- Extract file overview
- Codebook
- Tabulation geography metadata
- Table metadata
- Data dictionary (variable metadata)
- Data files
Extract file overview

You will receive your extract as a .zip file, which you can open with your favorite unzipping utility. Inside the .zip file, you will find four metadata files and one or more data files.
The metadata file names begin with "ihgis" and your extract number.
Data file names consist of:
- Two-letter country code
- Year
- Dataset type code (e.g., "ag" for agricultural census, "pop" for population census)
- Three-character table code
- Hierarchical level code (g0 is national, g1 is the largest subnational units, etc.)
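
If you prefer to inspect an extract programmatically, the naming rule above is enough to separate metadata files from data files. The following is a minimal Python sketch, not an official IHGIS tool. The extract number "0001", the placeholder data file name, and the helper name split_extract_members are all hypothetical, and the demo builds a tiny in-memory .zip rather than reading a real extract.

```python
import io
import zipfile


def split_extract_members(archive):
    """Split archive members into (metadata, data) lists.

    Relies only on the rule that metadata file names begin with "ihgis".
    """
    names = [n for n in archive.namelist() if not n.endswith("/")]
    metadata = [n for n in names if n.rsplit("/", 1)[-1].startswith("ihgis")]
    data = [n for n in names if n not in metadata]
    return metadata, data


# Build a stand-in extract in memory. All member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in (
        "ihgis0001_codebook.txt",
        "ihgis0001_geog.csv",
        "ihgis0001_tables.csv",
        "ihgis0001_datadict.csv",
        "example_data_file.csv",  # placeholder, not a real IHGIS file name
    ):
        archive.writestr(name, "")

with zipfile.ZipFile(buffer) as archive:
    metadata_files, data_files = split_extract_members(archive)

print("metadata:", metadata_files)
print("data:", data_files)
```

To use this on a real extract, pass the path of your downloaded .zip file to zipfile.ZipFile in place of the in-memory buffer.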
Codebook
The ihgisXXXX_codebook.txt file is a human-readable summary of the contents of your extract. It includes basic information about the datasets, tables, and variables, as well as the recommended citation for IHGIS.
The other metadata files are provided as comma-separated values files to facilitate importing them into statistical packages or other software tools.
Tabulation geography metadata

The ihgisXXXX_geog.csv file provides the name of each tabulation geography that is included in one or more tables in your extract.
Table metadata
The ihgisXXXX_tables.csv file provides detailed metadata for each table in your extract. The fields in this file consist of:
- dataset, table, and dataset_table: Codes for the dataset, the data table, and a concatenation in which the two codes are separated by a period.
- title: Title of the table.
- table_num: Designation of the table in the source document.
- table_universe: Entities considered in the table. For percents and ratios, the universe refers to the denominator.
- tabulation_geogs: Tabulation geographies for which the table is available.
- tabulation_geog_labels: Names of the tabulation geographies.
- source_pub_eng: Title of the document or document series in which the table was originally published. It may be a translation into English of the original native-language title.
- country: The country the table describes.
- footnotes: Any footnotes present for the table. (May not be present.)

Data dictionary (variable metadata)
The ihgisXXXX_datadict.csv file provides detailed metadata for the variables (columns) in the tables in your extract. This information is key to interpreting the data in the data files. The fields in this file consist of:
- dataset, table: Codes matching the file name of the data file containing the listed variables.
- table_var: Codes providing the link to the column headers in the data files.
- label: Description of the variable represented in the corresponding column in the data files, i.e., the column header.
- data_year: Year represented by the data in a given column, which may be different from the year of the dataset. For example, a table may describe population growth over time, with population counts from several years prior to the census.
- universe: Describes the scope of who or what is covered by the variable. For example, data on marital status or economic activity may only cover persons over a certain age. For percents and ratios, the universe refers to the denominator.
- agg_method: Arithmetic operation used to aggregate information from individual census responses to calculate the summary values in the table. The most common aggregation methods are count and percent.
- agg_detail: Additional aggregation details necessary to fully describe how the variable was calculated. For example, aggregation details may include units of measurement, numerators and denominators of ratios, or scaling factors.
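
One way to use the data dictionary is to build a lookup from table_var codes to labels for a single table. The following is a minimal Python sketch using only the field names listed above. The sample rows, the dataset code "XX2010pop", and the helper name labels_for_table are hypothetical stand-ins; a real ihgisXXXX_datadict.csv will contain different values.

```python
import csv
import io

# Hypothetical stand-in for an ihgisXXXX_datadict.csv file.
SAMPLE_DATADICT = """dataset,table,table_var,label,data_year,universe,agg_method,agg_detail
XX2010pop,AAA,AAA001,Total population,2010,All persons,count,
XX2010pop,AAA,AAA002,Population ten years earlier,2000,All persons,count,
XX2010pop,AAB,AAB001,Married persons,2010,Persons aged 15 and over,count,
"""


def labels_for_table(handle, dataset, table):
    """Map table_var codes to labels for one dataset/table pair."""
    reader = csv.DictReader(handle)
    return {
        row["table_var"]: row["label"]
        for row in reader
        if row["dataset"] == dataset and row["table"] == table
    }


labels = labels_for_table(io.StringIO(SAMPLE_DATADICT), "XX2010pop", "AAA")
for code, label in labels.items():
    print(f"{code}: {label}")
```

Filtering on both dataset and table avoids assuming that table_var codes are unique across every table in an extract.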

Data files
Each data file contains data from a table in the source document for one tabulation geography. (In cases where the source document included separate tables for subnational geographic units, those tables have been combined into nation-wide data files.)
GISJOIN codes provide the link between rows of data and polygons in the GIS boundary files. You may join data files to shapefiles in a GIS package using the GISJOIN field in both files.
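
Outside a GIS package, the same join can be approximated by matching rows on the GISJOIN field. The following is a minimal Python sketch of that key-based join. The sample data rows, the area_km2 attribute, and the helper name join_on_gisjoin are hypothetical; reading an actual shapefile requires a GIS library, which this sketch does not cover.

```python
import csv
import io

# Hypothetical stand-in for a data file.
SAMPLE_DATA = """GISJOIN,g0,g1,AAA001
XX001,Exampleland,North,1200
XX002,Exampleland,South,800
"""

# Hypothetical stand-in for the attribute table of a boundary file.
SAMPLE_BOUNDARY_ATTRS = [
    {"GISJOIN": "XX001", "area_km2": 10.5},
    {"GISJOIN": "XX002", "area_km2": 7.25},
    {"GISJOIN": "XX003", "area_km2": 3.0},  # no matching data row
]


def join_on_gisjoin(data_rows, boundary_rows):
    """Attach data rows to boundary records that share a GISJOIN code.

    Returns (joined_records, unmatched_gisjoin_codes).
    """
    by_code = {row["GISJOIN"]: row for row in data_rows}
    joined, unmatched = [], []
    for feature in boundary_rows:
        match = by_code.get(feature["GISJOIN"])
        if match is None:
            unmatched.append(feature["GISJOIN"])
        else:
            joined.append({**feature, **match})
    return joined, unmatched


data_rows = list(csv.DictReader(io.StringIO(SAMPLE_DATA)))
joined, unmatched = join_on_gisjoin(data_rows, SAMPLE_BOUNDARY_ATTRS)
print(f"{len(joined)} joined, unmatched codes: {unmatched}")
```

Reporting the unmatched codes makes it easier to notice boundary units that have no data in a given table.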
The next set of columns (g0, g1, g2…) provides the names of the geographic units and their parent units.
The remaining columns provide the actual data. The codes in the header row (e.g., AAA001) correspond to variable descriptions in the codebook and data dictionary.
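
The column layout described above can be detected from the header row alone. The following is a minimal Python sketch that separates the geographic name columns (g0, g1, g2…) from the data columns. The sample rows and the helper name classify_columns are hypothetical, and the sketch assumes that name columns are exactly "g" followed by digits.

```python
import csv
import io
import re

# Hypothetical stand-in for a data file.
SAMPLE_DATA = """GISJOIN,g0,g1,AAA001,AAA002
XX001,Exampleland,North,1200,48.5
XX002,Exampleland,South,800,51.5
"""

# Assumption: geographic name columns are "g" followed by one or more digits.
GEOG_NAME_COLUMN = re.compile(r"g\d+")


def classify_columns(header):
    """Return (geographic name columns, data columns) from a header row."""
    geog = [c for c in header if GEOG_NAME_COLUMN.fullmatch(c)]
    data = [c for c in header if c != "GISJOIN" and c not in geog]
    return geog, data


reader = csv.DictReader(io.StringIO(SAMPLE_DATA))
geog_columns, data_columns = classify_columns(reader.fieldnames)
print("name columns:", geog_columns)
print("data columns:", data_columns)
```

The data column codes found this way can then be looked up in the table_var field of the data dictionary.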
