CBD(5)                       FILE FORMATS                       CBD(5)

NAME
     cbd - vector map data format for mapbrowser and mapwriter

DESCRIPTION
     This is a file format for compressed binary map database
     (*.cbd) files, designed to encode features from a vector map
     database.  Map data in this format can be used by the
     mapbrowser and mapwriter programs to browse and render raster
     map images from the vector data.

     This format encodes digital line graph (DLG) data in
     latitude-longitude coordinates into a compressed format
     designed by Brian Reid and slightly modified by Steve Putz.
     It was designed to efficiently encode map data such as that
     from the World Data Bank II.  The cbd files used by
     mapbrowser are 7-9 times smaller than the original
     uncompressed data files.

   World Data Bank II Database
     The CIA World Data Bank II database is available from the
     U.S. Government and is in the public domain.  The map
     database is divided into continents (North America, South
     America, Europe, Africa, and Asia), and within continents it
     is divided into "Coastlines, Islands, and Lakes" (cil files),
     "Rivers" (riv files), "International political boundaries"
     (bdy files) and "National political boundaries" (pby files).
     Each file is divided into several thousand "segments", and
     each segment is divided into some number of "strokes".

     The original CIA World Data Bank II data is a series of COBOL
     records specifying 5,719,617 individual vectors, which occupy
     about 130 megabytes of disk space.  (This format is
     apparently compatible with a package called GS-CAM.)  The
     "cbd" files used by the mapbrowser program are compressed
     binary encodings of this data that collectively occupy about
     15 megabytes of disk space.  The "cbd" files are produced
     from the original data by the ciamap program.

   U.S. GeoData
     The Earth Science Information Centers (ESIC) distribute
     digital cartographic/geographic data files produced by the
     U.S. Geological Survey (USGS) as part of the National Mapping
     Program.  1:2,000,000-scale DLG data for the United States is
     available on CD-ROM.
     The "Graphic" format on the CD-ROM is the same GS-CAM format
     used by the World Data Bank II database.

   Data File Format
     The .cbd encoding divides a database into several files.
     Each file contains a header record, a series of segments and
     strokes, and a segment index (the segment dictionary).  A
     separate mapset file describes the relationships of the files
     in a database.

MAPBROWSER               Last change: 7 June 1993

   File Header Format
     The original format had a 40-byte header consisting of just
     the first five integer values shown below plus 20 unused
     bytes.  The modified format used by mapbrowser adds eight
     additional integer fields for a total of 52 bytes (thirteen
     4-byte fields).

          struct cbdhead {
              long magic;       /* Magic number */
              long dictaddr;    /* Offset of segment dictionary in file */
              long segcount;    /* Number of segments in file */
              long segsize;     /* Size of segment dictionary (bytes) */
              long segmax;      /* Size of largest segment's strokes, (bytes/2) */
              /* the following apply to CBD_MAGIC2 only */
              long maxlat, minlat, maxlong, minlong; /* Bounding box of map */
              long features;    /* bits indicate feature "ranks" present */
              long scale_shift; /* bits to shift coordinate data */
              long lat_offset;  /* latitude offset */
              long lng_offset;  /* longitude offset */
          };

     The first 4 bytes in Brian's original cbd files contain
     0x20770002 (CBD_MAGIC).  For the extended format, the first
     word should contain 0x20770033 (CBD_MAGIC2).

          #define CBD_MAGIC     0x20770002
          #define CBD_HEADSIZE1 40  /* size of old header */
          #define CBD_MAGIC2    0x20770033
          #define CBD_HEADSIZE2 (sizeof(struct cbdhead))

     To support efficient skipping of an entire file, the extended
     format header includes coordinates defining the bounding box
     of all vectors in the file and a bit mask indicating which
     feature codes are present in the file.

     The scale_shift value (if non-zero) is an exponent for
     scaling the coordinate data by a power of two.
     A negative scale_shift allows coordinates to be represented
     at a resolution finer than integer seconds of
     latitude/longitude.  The latitude/longitude offsets are added
     to the vector data after scaling.  The scale and offsets are
     also applied to the bounding box coordinates in the file
     header and segment dictionary (I think).

   Segment Header Format
     Each segment begins with a segment header, followed by
     data-compressed strokes.

          struct seghead {
              BIT32 orgx, orgy; /* Origin of first stroke in segment */
              BIT32 id;         /* Segment identifier serial number */
              BIT16 nstrokes;   /* How many strokes in the segment follow */
          };

     The data-compression scheme uses integer seconds to represent
     latitude/longitude coordinates (shifted and scaled as
     indicated in the file header), and stores each stroke as a
     [dx,dy] offset from the previous point.  If dy will fit into
     8 bits and dx into 7 bits, then the entire [dx,dy] is stored
     in a 16-bit field with bit 0x4000 turned on as a flag.  If
     either value is too large for that scheme, then both are
     stored as 32-bit values, with the 0x40000000 bit turned off
     (even in negative numbers) in the first of them.  A reader
     can therefore tell the two forms apart by testing the flag
     bit.

          #define MAX8y ((short) 0x7F) /* largest signed 8-bit value */
          #define MAX8x ((short) 0x3F) /* largest signed 7-bit value */
          #define SHORTFLAG 0x4000     /* flag saying this is a short stroke */
          #define SHORTBYTE 0x40       /* flag saying this is a short stroke */

   Segment Dictionary Format
     The segment dictionary sits at the end of the file, is
     pointed to by the dictaddr value in the file header, and
     points to the segment headers via the absaddr values.
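
     As a rough sketch of how these pointers chain together, the
     file offset of a given dictionary entry can be derived from
     the header fields.  The sketch assumes that dictionary
     entries are fixed-size and stored back to back, so that each
     occupies segsize / segcount bytes; the format description
     does not state this.  The function name cbd_dict_entry_offset
     is hypothetical.

```c
/*
 * Hypothetical helper: file offset of dictionary entry i (0-based),
 * or -1 if the arguments are inconsistent.
 * ASSUMPTION: entries are fixed-size and packed back to back,
 * starting at dictaddr, so each one is segsize / segcount bytes.
 */
long cbd_dict_entry_offset(long dictaddr, long segcount, long segsize, long i)
{
    if (dictaddr < 0 || segcount <= 0 || segsize <= 0)
        return -1;
    if (segsize % segcount != 0)   /* entries would not be uniform */
        return -1;
    if (i < 0 || i >= segcount)
        return -1;
    return dictaddr + i * (segsize / segcount);
}
```

     Seeking to the returned offset and reading one entry yields
     the absaddr of the corresponding segment header.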

          struct segdict {
              BIT32 segid;    /* Segment identifier serial number */
              BIT32 maxlat, minlat, maxlong, minlong; /* Bounding box of strokes */
              BIT32 absaddr;  /* Address in file of segment header */
              BIT16 nbytes;   /* # bytes of strokes that follow */
              BIT16 rank;     /* Type of feature this segment draws */
          };

     The bounding box information in each segment dictionary entry
     allows segments that lie entirely outside the area being
     drawn (clipped segments) to be skipped quickly.  The rank
     value is a positive integer feature code indicating which map
     feature type is represented by the segment.  The feature
     codes are normally defined in the mapset file for the
     database.

   Feature Codes
     The feature code (rank) indicates what each segment depicts.
     The assignment of codes can differ between databases but must
     be consistent among the cbd files within a database.  The
     following are the feature codes used in the World Data Bank
     II database:

     In "Boundary" files:
          01   Demarcated or delimited boundary
          02   Indefinite or in Dispute
          03   Other line of separation of sovereignty on land

     In "Coast, Islands and Lakes" files:
          01   Coast, islands and lakes that appear on all maps
          02   Additional major islands and lakes
          03   Intermediate islands and lakes
          04   Minor islands and lakes
          06   Intermittent major lakes
          07   Intermittent minor lakes
          08   Reefs
          09   Salt pans -- major
          10   Salt pans -- minor
          13   Ice Shelves -- major
          14   Ice Shelves -- minor
          15   Glaciers

     In "Rivers" files:
          01   Permanent major rivers
          02   Additional major rivers
          03   Additional rivers
          04   Minor rivers
          05   Double lined rivers
          06   Intermittent rivers -- major
          07   Intermittent rivers -- additional
          08   Intermittent rivers -- minor
          10   Major canals
          11   Canals of lesser importance
          12   Canals -- irrigation type

FILES
     cbdmap.h            Header file for compressed binary map
                         database (*.cbd) files.
     ciamap.c            Program for converting from GS-CAM format
                         to cbd format.
     /import/mapbrowser  Location of mapbrowser software and data
                         at Xerox PARC.
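
EXAMPLES
     The stroke encoding described under Segment Header Format can
     be sketched in C as follows.  This is an illustration under
     stated assumptions, not the code used by mapbrowser or
     ciamap.  It assumes that multi-byte values are stored most
     significant byte first; that a short stroke carries dx in its
     first byte (sign in bit 0x80, flag in bit 0x40, low six bits
     in 0x3F) and dy in its second byte; and that in a long stroke
     dx precedes dy and the cleared 0x40000000 bit is restored
     from the sign bit.  None of these details is spelled out in
     this page, so cbdmap.h and ciamap.c remain the authority.
     The function name cbd_decode_stroke is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SHORTBYTE 0x40  /* short-stroke flag, tested on the first byte */

/* Read a 32-bit value, most significant byte first (ASSUMED byte order). */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reinterpret a 32-bit pattern as a two's-complement signed value. */
static int32_t to_signed32(uint32_t u)
{
    if (u & 0x80000000u)
        return (int32_t)(-(int64_t)(~u & 0xFFFFFFFFu) - 1);
    return (int32_t)u;
}

/*
 * Hypothetical helper: decode one stroke from buf (len bytes available).
 * Stores the deltas in *dx and *dy and returns the number of bytes
 * consumed (2 for a short stroke, 8 for a long one), or 0 if buf is
 * too short to hold the stroke.
 */
size_t cbd_decode_stroke(const unsigned char *buf, size_t len,
                         int32_t *dx, int32_t *dy)
{
    if (len < 2)
        return 0;

    if (buf[0] & SHORTBYTE) {
        /* ASSUMED short layout: byte 0 holds dx as 7 bits wrapped
         * around the flag bit, byte 1 holds dy as a signed 8-bit value. */
        int32_t x = buf[0] & 0x3F;          /* low six bits of dx */
        if (buf[0] & 0x80)
            x -= 0x40;                      /* sign bit set: dx is negative */
        *dx = x;
        *dy = (buf[1] & 0x80) ? (int32_t)buf[1] - 0x100 : (int32_t)buf[1];
        return 2;
    }

    if (len < 8)
        return 0;

    /* ASSUMED long layout: 32-bit dx followed by 32-bit dy. */
    uint32_t ux = be32(buf);
    if (ux & 0x80000000u)
        ux |= 0x40000000u;                  /* restore the cleared flag bit */
    *dx = to_signed32(ux);
    *dy = to_signed32(be32(buf + 4));
    return 8;
}
```

     A reader would start from the segment origin (orgx, orgy) and
     add each decoded [dx,dy] in turn, nstrokes times, before
     applying the scale and offsets from the file header.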

BUGS
     The format described here includes extensions of Brian Reid's
     original cbd format necessary to support features of the
     mapbrowser program.  Files in this format are unfortunately
     not compatible with Brian's netmap program.

AUTHOR
     Original format by Brian Reid
     Modified by Steve Putz

SEE ALSO
     mapset(5), mapbrowser(1), mapwriter(1)