General Bible Format Tagging Specification
Purpose
This specification is intended as an aid to preparing Bible Texts for use with various Bible search programs. While it is possible to parse and use these files directly, these files may be further processed by a Bible search program by indexing and/or converting to another format. Conversions to other formats should be relatively simple. This format is designed to leave most of the detailed formatting up to the Bible search program, which may be running in DOS, Windows, or some other platform.
Caution
Since this format is still under development, some things might change, especially with respect to footnote and cross reference handling. There are still some things that are a bit ambiguous in this specification, and which may be clarified later with a reference implementation of a GBF reader. Check the master copy of this document before writing programs to take advantage of Bibles distributed in this format. This document was last updated 2 January 1998 (4 Tevet 5758).
General Format
General Bible Format (GBF) files are plain ASCII text files, with lines limited to no more than 254 characters. Lines end with the DOS standard CR-LF pair of characters. GBF files consist of a series of tokens, which may be tags, words, spaces, or punctuation. These tokens may not cross line boundaries. When read, a line ending is treated as a single space, except when it immediately follows the <CM> token, in which case it is ignored. It is up to the program displaying the text or converting it to another format to insert soft line breaks at the appropriate places. For character values greater than 127, the U.S. Windows ANSI character set is assumed. GBF readers targeting DOS or Unix should convert characters in this range to the closest equivalents on those platforms.
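The token and line-ending rules above can be sketched in Python roughly as follows. This is only an illustration, not a reference reader: the names TOKEN_RE, tokenize_line, and join_lines are hypothetical, and the exact split between words and punctuation is an assumption, since this specification does not define it precisely.

```python
import re

# Assumed token shapes: a tag ("<" ... ">"), a word, a run of whitespace,
# or a single punctuation character.
TOKEN_RE = re.compile(r"<[^<>]*>|\w+|\s+|[^\w\s]")


def tokenize_line(line):
    """Split one line (without its CR-LF) into tokens."""
    return TOKEN_RE.findall(line)


def join_lines(lines):
    """Tokenize a sequence of lines into one token list.

    A line ending becomes a single space token, unless the line
    ends with the <CM> token, in which case it is dropped.
    """
    tokens = []
    for raw in lines:
        line_tokens = tokenize_line(raw.rstrip("\r\n"))
        tokens.extend(line_tokens)
        if not (line_tokens and line_tokens[-1] == "<CM>"):
            tokens.append(" ")
    return tokens
```

For example, join_lines(["a<CM>\r\n", "b\r\n"]) yields the tokens a, <CM>, b, and one trailing space.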
Tags
There are several kinds of tags in GBF files, indicated by the first character after the "<" character. These include: file header tags, text type tags, file tail tags, font attribute tags, paragraph attribute tags, informational tags, sync marks, special characters, etc. All tags start with the less-than symbol (<) followed by two characters that identify the tag, and end with the greater-than symbol (>). Tag identifiers are case sensitive, and may be extended in the future to include additional letters, numbers, and other characters. Some tags take a parameter (like sync marks). This parameter is inserted just before the ending ">". Unrecognized tags (not in this specification) should be ignored by GBF readers aimed at this version of the GBF specification (since they were probably added in a later version of this specification). In tags with start and stop versions, the second letter of the tag is upper case in the start tag, and lower case in the stop tag.
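As an illustration of the tag layout described above, the following Python sketch splits a tag token into its two-character identifier and optional parameter. The names Tag and parse_tag are hypothetical. The is_stop flag only carries meaning for tags that actually have start and stop versions.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    ident: str     # two-character, case-sensitive identifier, e.g. "SV" or "Fi"
    param: str     # text between the identifier and ">", possibly empty
    is_stop: bool  # True when the second character is a lower-case letter


def parse_tag(token):
    """Return a Tag for a token such as "<SV16>", or None if it is not a tag."""
    if len(token) < 4 or not (token.startswith("<") and token.endswith(">")):
        return None
    body = token[1:-1]
    ident, param = body[:2], body[2:]
    return Tag(ident, param, ident[1].islower())
```

A reader could look up ident in its own table of known tags and skip the token when the lookup fails, which matches the rule that unrecognized tags are ignored.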
Text Type Tags
The text type tags form a mutually exclusive group. Each one specifies the logical contents of the following text and implies a default set of character and paragraph settings. The actual fonts and paragraph attributes assigned to each of these kinds of text are determined by the displaying program, preferably with some input from the user.
File header tags are presented in the following order, starting at the beginning of the file.
Type | Tag | Purpose |
File header start | <H0vv> | GBF file version information. (vv = 00 for initial version). This must be the first token of the file, if present. This is where a shift to Unicode may happen in the future. Any text before this tag is ignored. |
Long translation title | <H1> | Set the long title of the translation for printing at the beginning and for long window titles or headings. Should precede the main text of the Bible (e.g., The Holy Bible, God's Living Word Translation; or American Standard Version). |
Translation abbreviation | <H2> | Set the short translation title or abbreviation (e.g., GLW, ASV, NASB 1995, NJB, Amplified, etc.) |
Copyright notice (short) | <H3> | Short copyright notice (minimal) if copyrighted or something like "KJV text is in the Public Domain." |
Copyright notice (long) | <H4> | Copyright and permissions notice (full text). The copyright notice ends when the next field (probably <TI>) starts. |
File body tags are used as necessary to mark the type of content of the text.
Type | Tag | Purpose |
Apocrypha | <BA> | Text of the Apocrypha/Deuterocanonical books |
Commentary | <BC> | Commentary (not normally used in Bible texts but in separate files with sync marks). |
Introduction to translation | <BI> | Notes to the reader, translation history, etc. |
New Testament Text | <BN> | Text of the 27 books of the New Testament |
Old Testament Text | <BO> | Text of the 39 books of the Old Testament |
Book Preface | <BP> | Ancient or recent preface to a book of the Bible or Apocrypha |
File tail tags must be in the following order:
Type | Tag | Purpose |
Check value | <ZW> | Check value (SHA-1 hash of all lines from the beginning of the file to just before this line). Followed by 32 hexadecimal digits. GBF readers should reject any GBF file that fails this validation check. The hash value may optionally be followed by a DSS digital signature. |
Digital Signature | <ZX> | DSS Signature of file from <H0vv> to before <ZW> |
Registered user ID. | <ZY> | The user ID, name, organization, and check value of the registered user are encoded in this section. |
End of file | <ZZ> | Stop reading here. |
Text type tags that come in start and stop pairs:
Type | Start Tag | Stop Tag | Purpose |
Psalm Book Title | <TB> | <Tb> | Mark the beginnings of the 5 books of Psalms |
Comment | <TC> | <Tc> | Ignore text in this section for any display or conversion use. |
Hebrew Title | <TH> | <Th> | Hebrew titles of psalms. The Hebrew title should come right after the sync marker for verse 1. |
Section header | <TS> | <Ts> | Translator's or publisher's section headers |
Book title | <TT> | <Tt> | Full title of the current book as it is to be displayed, e.g. "The Good News According to John" |
Font Attributes
Text attribute tag pairs are inserted into the text as necessary to indicate text attributes like italics. These may or may not be properly represented in all Bible viewers due to platform limitations. All of these text attributes are assumed to be off at the beginning of each chapter unless the start tag is explicitly repeated after the chapter sync mark.
Attribute | Start tag | Stop tag |
Bold | <FB> | <Fb> |
Small Caps | <FC> | <Fc> |
Italics | <FI> | <Fi> |
Font name | <FNname> | <Fn> |
Old Testament quote | <FO> | <Fo> |
Red (words of Jesus) | <FR> | <Fr> |
Superscript | <FS> | <Fs> |
Underline | <FU> | <Fu> |
Subscript | <FV> | <Fv> |
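One way a reader might track these attributes is sketched below in Python. It is a sketch under stated assumptions: the names FONT_START and style_runs are hypothetical, the input is assumed to be an already tokenized list, and the font name tag (<FNname>) is left out for brevity. All attributes are cleared at each chapter sync mark (<SC...>), as described above.

```python
# Start-tag identifiers mapped to illustrative attribute names (names are assumptions).
FONT_START = {
    "FB": "bold", "FC": "small_caps", "FI": "italics", "FO": "ot_quote",
    "FR": "red", "FS": "superscript", "FU": "underline", "FV": "subscript",
}


def style_runs(tokens):
    """Pair each non-tag token with the set of attributes active at that point."""
    active = set()
    runs = []
    for tok in tokens:
        if tok.startswith("<") and tok.endswith(">"):
            ident = tok[1:3]
            if ident == "SC":
                active.clear()  # attributes are off at the start of each chapter
            elif ident in FONT_START:
                active.add(FONT_START[ident])
            elif ident[1:].islower() and ident.upper() in FONT_START:
                active.discard(FONT_START[ident.upper()])
            continue  # tags themselves produce no output
        runs.append((tok, frozenset(active)))
    return runs
```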
Paragraph Attributes
These tags describe attributes of paragraphs, like justification, indenting, etc. All of these attributes are assumed to be in the default state (non-indented prose, left justified, direction left-to-right) at the beginning of each chapter unless the appropriate start tag is repeated after the chapter sync mark. Justification tags are mutually exclusive, as are direction tags.
Attribute | Start Tag | Stop Tag | Comment |
Direction left-to-right (default) | <DL> | | European & American languages |
Direction right-to-left | <DR> | | Hebrew, Arabic, etc. |
Direction top-to-bottom | <DT> | | Mandarin, etc. |
Justify Center | <JC> | | Useful for titles. |
Justify Full | <JF> | | Might map to left justification. |
Justify Left | <JL> | | The default method of justification. |
Justify Right | <JR> | | For "selah" |
Indented quote | <PI> | <Pi> | Indented extended quotation. |
Poetry | <PP> | <Pp> | Describes poetry (like in Psalms). |
Information Tags
These tags provide various additional pieces of information about the text that are intended to be displayed on demand with a right mouse click, in a separate window, or at the bottom of a page. The start and stop tags indicate a range of words over which the footnote or reference applies. The sequence indicators shown as "seq" below are used to match the start and stop tags in case of overlapping references. Each one is a short string of numbers or letters that is guaranteed to be unique in the range of text it covers.
Type | Start Tag | Stop Tag | Purpose |
Text with an embedded footnote. | <RB> | <RF> | The text between <RB> and <RF> is further described or has a comment or translator's note between <RF> and <Rf>. The text between <RB> and <RF> may be marked as having a hyperlink for a footnote pop-up, or may be marked with more conventional superscript indicators in printed text. |
Footnote text | <RF> | <Rf> | Embedded footnote text is between the <RF> and <Rf> tags. <RF> may or may not be preceded by <RB>. |
Parallel Passage | <RPseq Book ch:vs> | <Rpseq> | Book is a number or abbreviation without spaces. |
Cross reference | <RXseq Book ch:vs> | <Rxseq> | Book is a number or abbreviation without spaces. |
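As a rough illustration of the embedded footnote tags, the Python sketch below separates footnote text (between <RF> and <Rf>) from the main text. The name split_footnotes is hypothetical, and the input is assumed to be an already tokenized list. The <RB> marker is left in the body so that a display program can still decide how to mark the annotated words.

```python
def split_footnotes(tokens):
    """Return (body_tokens, footnotes), where footnotes is a list of strings."""
    body, notes, current = [], [], None
    for tok in tokens:
        if tok == "<RF>":
            current = []              # start collecting footnote text
        elif tok == "<Rf>":
            if current is not None:
                notes.append("".join(current))
            current = None            # back to main text
        elif current is not None:
            current.append(tok)
        else:
            body.append(tok)
    return body, notes
```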
Word information tags:
Type | Tag | Purpose |
Strong's Greek Number | <WGnnn> | Ordinal number of Greek lexicon entry. |
Strong's Hebrew Number | <WHnnn> | Ordinal number of Hebrew lexicon entry for previous word. |
Interlinear translation | <WIword(s)> | words to be placed under the current word. |
Original language word information | <WTxxxx> | xxxx are one or more characters with specific meanings that apply to the previous word's tense, gender, number, etc.: A = aorist, P = plural, S = singular [to be expanded] |
Sync Marks
Verse sync marks identify the current book, chapter, and verse. Each kind of sync mark is required at the beginning of the section (book, chapter, or verse) that it identifies. Sync marks may optionally be repeated within a section. If no specific number is specified in a sync mark, the number is assumed to be one more than the previous sync mark of the same kind.
Kind | Tag | Example or comment |
Book | <SBxxx> | For John, <SBJohn> or <SB67> |
Chapter | <SCxxx> | For chapter 3, <SC3> or (if the last chapter was chapter 2), <SC> |
Verse | <SVxxx> | For verse 16 (following verse 15), <SV16> or <SV> |
Date | <SDmmdd> | For April 30, <SD0430>. Not normally used in Bible texts, but may be in commentaries arranged by daily reading. If mmdd are omitted, assume next day. |
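The implied-increment rule for sync marks can be sketched in Python as follows. The name track_sync is hypothetical. This specification does not say whether the verse counter restarts when a new chapter begins, so this sketch follows the stated rule literally and never resets a counter on its own. Date marks are skipped.

```python
def track_sync(tags):
    """Return the (book, chapter, verse) position after each book, chapter, or verse mark."""
    book, chapter, verse = None, 0, 0
    positions = []
    for tag in tags:
        kind, param = tag[1:3], tag[3:-1]
        if kind == "SB":
            if param:
                # A book marker is either a number or a name/abbreviation.
                book = int(param) if param.isdigit() else param
            elif isinstance(book, int):
                book += 1
        elif kind == "SC":
            chapter = int(param) if param else chapter + 1
        elif kind == "SV":
            verse = int(param) if param else verse + 1
        else:
            continue  # not a book, chapter, or verse sync mark
        positions.append((book, chapter, verse))
    return positions
```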
Book markers may be either numeric or an English name or abbreviation as follows. The STEP # is not used in this format, but is listed for reference in conversions.
Old Testament Book Name | Abbreviations | STEP # | Book Number |
Genesis | Ge, Gn | 1 | 1 |
Exodus | Ex | 2 | 2 |
Leviticus | Lev, Lv | 3 | 3 |
Numbers | Nu | 4 | 4 |
Deuteronomy | De, Dt | 5 | 5 |
Joshua | Jos | 6 | 6 |
Judges | Judg, Jdg | 7 | 7 |
Ruth | Ru | 8 | 8 |
1 Samuel | 1 Sa, 1Sa | 9 | 9 |
2 Samuel | 2 Sa, 2Sa | 10 | 10 |
1 Kings | 1 Ki, 1Ki | 11 | 11 |
2 Kings | 2 Ki, 2Ki | 12 | 12 |
1 Chronicles | 1 Ch, 1Ch | 13 | 13 |
2 Chronicles | 2 Ch, 2Ch | 14 | 14 |
Ezra | Ezr | 15 | 15 |
Nehemiah | Ne | 16 | 16 |
Esther | Es | 17, 95, 100 | 17 |
Job | Job | 18 | 18 |
Psalms | Ps | 19, 91, 92 | 19 |
Proverbs | Pr | 20 | 20 |
Ecclesiastes | Ec | 21 | 21 |
Song of Solomon | Song, Sol, SS | 22 | 22 |
Isaiah | Isa | 23 | 23 |
Jeremiah | Je | 24 | 24 |
Lamentations | La | 25 | 25 |
Ezekiel | Eze | 26 | 26 |
Daniel | Da | 27, 89 | 27 |
Hosea | Ho | 28 | 28 |
Joel | Joe | 29 | 29 |
Amos | Am | 30 | 30 |
Obadiah | Ob | 31 | 31 |
Jonah | Jon | 32 | 32 |
Micah | Mi | 33 | 33 |
Nahum | Na | 34 | 34 |
Habakkuk | Hab | 35 | 35 |
Zephaniah | Zep | 36 | 36 |
Haggai | Hag | 37 | 37 |
Zechariah | Zec | 38 | 38 |
Malachi | Mal | 39 | 39 |
Apocrypha Book name | Abbreviation | STEP # | Book # |
Tobit | Tob | 69, 93, 98 | 40 |
Judith | Judi, Jdt | 70 | 41 |
Esther (Greek) | GrkEs | 71 (additions only); 88 (Complete) | 42 |
Wisdom | Wis | 72 | 43 |
Sirach | Sir | 73, 94, 99 | 44 |
Baruch | Bar | 74, 90 | 45 |
Letter of Jeremiah | Let | 75 | 46 |
Prayer of Azariah and the Song of the Three Jews | Azar | 76 | 47 |
Susanna | Sus | 77 | 48 |
Bel and the Dragon | Bel | 78 | 49 |
1 Maccabees | 1Mac | 79, 96 | 50 |
2 Maccabees | 2Mac | 80, 97 | 51 |
1 Esdras | 1Esd | 81 | 52 |
Prayer of Manasseh | Man | 82 | 53 |
Psalm 151 | P1 | 86 | 54 |
3 Maccabees | 3Mac | 84 | 55 |
2 Esdras | 2Esd | 83, 87 | 56 |
4 Maccabees | 4Mac | 85 | 57 |
New Testament Book Name | Abbreviation | STEP # | Book # |
Matthew | Mat, Mt | 40 | 64 |
Mark | Mar, Mk | 41 | 65 |
Luke | Lu, Lk | 42 | 66 |
John | Joh | 43 | 67 |
Acts | Ac | 44 | 68 |
Romans | Ro, Rm | 45 | 69 |
1 Corinthians | 1 Co, 1Co | 46 | 70 |
2 Corinthians | 2 Co, 2Co | 47 | 71 |
Galatians | Ga | 48 | 72 |
Ephesians | Ep | 49 | 73 |
Philippians | Phili, Php | 50 | 74 |
Colossians | Col | 51 | 75 |
1 Thessalonians | 1 Th, 1Th | 52 | 76 |
2 Thessalonians | 2 Th, 2Th | 53 | 77 |
1 Timothy | 1 Ti, 1Ti | 54 | 78 |
2 Timothy | 2 Ti, 2Ti | 55 | 79 |
Titus | Tit | 56 | 80 |
Philemon | Phile, Phm | 57 | 81 |
Hebrews | He | 58 | 82 |
James | Ja | 59 | 83 |
1 Peter | 1 Pe, 1Pe | 60 | 84 |
2 Peter | 2 Pe, 2Pe | 61 | 85 |
1 John | 1 Jo, 1Jo | 62 | 86 |
2 John | 2 Jo, 2Jo | 63 | 87 |
3 John | 3 Jo, 3Jo | 64 (14 vs.); 67 (15 vs.) | 88 |
Jude | Jude | 65 | 89 |
Revelation | Re | 66; 68 (18 vs. in Ch. 12) | 90 |
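A book marker lookup based on these tables could be sketched in Python as below. Only a handful of rows are included to keep the example short, and the names BOOKS, LOOKUP, and book_number are hypothetical. Removing spaces before matching, so that "1 Sa" and "1Sa" compare equal, and matching case-sensitively are both assumptions of this sketch.

```python
# Book number -> full name and abbreviations (a small subset of the tables above).
BOOKS = {
    1: ("Genesis", "Ge", "Gn"),
    9: ("1 Samuel", "1 Sa", "1Sa"),
    19: ("Psalms", "Ps"),
    67: ("John", "Joh"),
    90: ("Revelation", "Re"),
}

LOOKUP = {}
for number, names in BOOKS.items():
    LOOKUP[str(number)] = number
    for name in names:
        LOOKUP[name.replace(" ", "")] = number


def book_number(marker):
    """Map a book marker (number, name, or abbreviation) to its book number, or None."""
    return LOOKUP.get(marker.replace(" ", ""))
```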
Special Character Tags
These tags indicate just a single character in the text.
Meaning | Tag | Comment |
ASCII character | <CAxx> | xx is a hexadecimal value |
> | <CG> | Literal greater-than sign. |
< | <CT> | Literal less-than sign. |
End of paragraph | <CM> | Ends paragraph or line of poetry. In prose, may cause blank line and/or indent. |
End of Line | <CL> | Ends line without ending paragraph -- used in the first line of a poetic couplet. |
Unicode character | <CUxxxx> | xxxx is a hexadecimal value. |
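The single-character tags might be decoded as in the following Python sketch. The name decode_special is hypothetical. Treating <CAxx> values as Windows code page 1252 is an assumption based on the "U. S. Windows ANSI" statement under General Format. <CM> and <CL> are layout decisions rather than characters, so they are left to the caller here.

```python
def decode_special(tag):
    """Return the character for a special character tag, or None for other tags."""
    ident, param = tag[1:3], tag[3:-1]
    if ident == "CG":
        return ">"
    if ident == "CT":
        return "<"
    if ident == "CA":
        # xx is a hexadecimal byte value; cp1252 is assumed for values above 127.
        return bytes([int(param, 16)]).decode("cp1252", errors="replace")
    if ident == "CU":
        # xxxx is a hexadecimal Unicode code point.
        return chr(int(param, 16))
    return None
```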
External Note and Highlight Files
External note and highlight files may either be prepared commentaries or notations made by the individual Bible student as he or she studies the Bible using a Bible search program. When a note and highlight file is open along with a Bible text, the notes are made available to the reader by a pop-up mechanism or a separate window, or printed at the bottom of a page as footnotes. The highlights are applied to the Bible text background as it is displayed. The note and highlight files are plain ASCII text, using the same tags as the Bible text for font and paragraph characteristics, plus the following:
Meaning | Tag | Comment |
Start of note file | <HN> | This tag must be first. |
Note | <NNref-ref> | The text following is a note pertaining to the Bible text included in the references indicated. References are of the form Book ch:vs wd, where Book may be an abbreviation (without spaces) or a number, and ch and vs are numbers. The number wd is the number of the word within the verse. If wd is omitted, then the whole verse is assumed. If vs is omitted, then the whole chapter is assumed. If ch is omitted, then the whole book is assumed. If the second reference is omitted, then the range is assumed to cover only the first reference. The beginning and ending words are considered part of the range. |
Color | <NC rrr ggg bbb ref-ref> | Background highlight color expressed as three numbers from 0 to 255 for red (rrr), green (ggg), and blue (bbb) covering the reference indicated. The reference is interpreted just like it is for the note tag. For example, to highlight all of John 3:16 in green, the tag would be <NC 0 255 0 John 3:16>. To highlight "In the beginning" in John 1:1 with a shade of greenish blue, the tag would be <NC 0 64 255 John 1:1 1-John 1:1 3>. |
End of file | <ZZ> | Last token of the file. Anything after this token is ignored. |
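The reference and color syntax described above can be sketched in Python as follows. The names REF_RE, parse_ref, and parse_highlight are hypothetical. The sketch assumes that a hyphen never appears inside a book name or abbreviation, so the first hyphen separates the two references.

```python
import re

# Book, then optional "ch", optional ":vs", optional " wd".
REF_RE = re.compile(r"^(\S+)(?:\s+(\d+)(?::(\d+))?(?:\s+(\d+))?)?$")


def parse_ref(text):
    """Parse "Book ch:vs wd" into (book, ch, vs, wd); omitted parts are None."""
    match = REF_RE.match(text.strip())
    if match is None:
        return None
    book, ch, vs, wd = match.groups()
    return (book, *(int(x) if x else None for x in (ch, vs, wd)))


def parse_highlight(tag):
    """Parse a color tag into ((r, g, b), start_ref, end_ref)."""
    parts = tag[3:-1].split(None, 3)
    r, g, b = (int(p) for p in parts[:3])
    first, _, second = parts[3].partition("-")
    start = parse_ref(first)
    # With no second reference, the range covers only the first reference.
    end = parse_ref(second) if second else start
    return (r, g, b), start, end
```

Applied to the two examples in the table, the first gives a start and end of ("John", 3, 16, None), and the second gives word 1 through word 3 of John 1:1.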