STEPBible Data Repository CC BY 4.0

Data created initially by Tyndale House Cambridge, now curated by www.STEPBible.org
(The code for www.STEPBible.org is also on an open licence.)

This licence allows…

This public licence allows you to:

Include any part of STEPBible-Data in any software or publications without requesting permission
(Though we’d love to hear from you about your project when you make it available.)
Make changes to the data and record the differences
You can make corrections or report possible errors to be checked at STEPBibleATgmail.com
Any changes made to data should be recorded and made available to subsequent users.
Refer others to this repository as the source of the data.
Updates or corrections are easier to implement when the data is distributed from a single source.
You are welcome to make a mirror, so long as it is kept up-to-date and has a link back here.

And you should:

Credit it to “STEP Bible” linked to www.STEPBible.org

STEPBible is…

A Charitable Incorporated Organisation registered in the UK #1193950 run by Bible scholars and computer enthusiasts, as well as members who help to decide priorities.
The datasets are based on work by scholars at Tyndale House - an international Biblical Studies research institute in Cambridge, UK (see www.TyndaleHouse.com)

The repository aims to provide reliable and freely usable data for studying the Bible without any denominational or doctrinal bias. Much of the data is based on other publicly licensed sources, and has been compared with non-public sources so that differences can be checked by Tyndale scholars. Corrections and proposed updates are welcomed - please send them to STEPBibleATgmail.com for checking.

Datasets available

The data is available as downloadable tab-separated text files (see notes on the data format below). The following datasets are already posted:

Bible modules for OSIS Sword software Bibles in the same format as Crosswire modules which can be used in any Sword-compatible software.
TTESV - Translators Tags for ESV
Tags for Greek & Hebrew Extended Strongs (compatible with original Strongs) for the translated text of the ESV.
TAHOT - Translators Amalgamated Hebrew OT
The Leningrad codex based on Westminster via OpenScriptures, corrected from colour scans, with full morphological and semantic tags for all words, prefixes and suffixes. Semantic tags using the disambiguated Strongs are backwardly compatible with simple Strongs tags and include all affixes (as defined in TBESH). Morphology is based on ETCBC converted to the format of OS (similar to Westminster) with different morphology for Ketiv/Qere when needed. LXX additions included as Hebrew from BHS/BHK apparatus.
TAGNT - Translators Amalgamated Greek NT in Github Data or Sheets
Greek text that includes all the words in NA27/28, TR and other major editions (SBLGNT, Treg, Byz, WH, THGNT). Each word is marked with the editions that contain it, positional variants, and meaning variants. All words and meaning variants are tagged lexically (disambiguated Strong linked to LSJ) and morphologically (Robinson based on Tauber with missing details) plus context-sensitive translations. Punctuation is based on THGNT with spellings from NA28 or other editions for words not in NA27/28.
TBESH - Translators Brief lexicon of Extended Strongs for Hebrew in Github data or Sheets
Abridged BDB linked to extended Strongs (compatible with OpenScriptures and backwardly compatible with original Strongs)
TBESG - Translators Brief lexicon of Extended Strongs for Greek in Github data or Sheets
Brief definitions for all Greek Bible words (NT, LXX, Apoc, & variants) using corrected Abbott-Smith when available, completed with other similar definitions. Backwardly compatible with original Strongs.
TFLSJ - Translators Formatted full LSJ Bible lexicon up to G5624 in Github data or Sheets with Extra entries
Full LSJ entries for all Bible words (NT, LXX, Apoc & variants), formatted for easy reading (all bibliographic data hidden as hover-text) linked to extended Strongs (backwardly compatible with original Strongs).
TIPNR - Translators Individualised Proper Names with all References in Github data or Sheets
Every proper noun in the Bible, linked to all Hebrew & Greek forms of that name and separated into individual people & places & things. Each form of the names includes exhaustive refs where that individual is named. Each person has data on their parents, partners, siblings and offspring, and places have geolocation (based on OpenBible). Every individual has a description in brief, short and article length (created by Claude 3 AI).
TVTMS - Translators Versification Traditions with Methodology for Standardisation: Eng+Heb+Lat+Grk+Others in Github data or Sheets
All the versification differences in the OT traditional texts in Hebrew, Latin and Greek, and NT early versification, compared with English standard (defined by NRSV which is virtually identical to KJV). Bible translations have an almost infinite variety of versifications because they may follow (for example) Latin in several sections, Hebrew in a few and English most of the time. The Methodology provides simple rules for every section, such as “if this chapter has 29 verses, it is using Greek versification”. Using this, a whole Bible can be reversified according to English or traditional Hebrew or Greek or Latin versification, or compared with Bibles using that versification.
TEHMC - Translators Expansion of Hebrew Morphology Codes
Hebrew morphology codes with expanded explanations in terms of parsing, meaning and example. The codes are based on OpenScriptures, which is similar to the Westminster code system used in BibleWorks and other commercial software. They include extra codes occurring in STEPBible data which distinguish sequential perfectives, gentilics, gender/location for personal pronouns, and non-Jussive/Cohortative as well as Jussive/Cohortative & possibly-Jussive/Cohortative forms.
TEGMC - Translators Expansion of Greek Morphology Codes
Greek morphology codes with expanded explanations in terms of parsing, meaning and example. The codes are based on Robinson, developed for the Majority text and used in most open-source texts. They include extra codes occurring in STEPBible data which distinguish persons in possessive and reflexive pronouns, 2nd forms of verbs, and distinctions between deponent forms and ambiguous passive/middle.

Datasets coming

The following datasets are still being finished and/or checked. If you see data that you need which isn’t yet available, please contact us and perhaps you can become part of the checking process.

TAGOT - Translators Amalgamated Greek OT
LXXo (oldest texts from Rahlfs), LXXn (newer texts from Swete), LXXe (ecclesiastic texts based on Apostolic Bible which used Sixtine, Aldine and Complutensian texts). All tagged to LSJ using disambiguated Hebrew numbers for names only in the OT, both backwardly compatible Strongs. Morphology based on CCAT which is extended to text that isn’t in Rahlfs.
TFBDB - Translators Formatted full BDB lexicon
Full BDB formatted for easy reading (all bibliographic data hidden as hover-text) linked to extended Strongs (compatible with OpenScriptures and backwardly compatible with original Strongs)
TOTMM - Translators OT Manuscripts and Meanings
Translation, Hebrew form and witnesses for each variant that affects the meaning of the text, as determined by Barthélemy’s UBS committee. Also, alternate meanings found in standard translations. Shown as alternate renderings of a base text (ESV).
TNTMM - Translators NT Manuscripts and Meanings
Translation, Greek form and witnesses up to 400 AD for each variant that affects the meaning of the text, as determined by the UBS apparatus. Also, alternate meanings found in standard translations. Shown as alternate renderings of a base text (ESV).
TBCWG - Translators Biblical Concept Word Groups
Concepts (expanded from Unfolding Word) describing Biblical background and usage (created by Claude 3 AI) for non-overlapping groups of disambiguated Hebrew & Greek words in synonym groups with distinctions (described by Claude 3 AI).

Data format

Data is in plain unicode text (UTF-8) with fields separated by tabs, so that they can be loaded into any text editor or spreadsheet.

To open in a spreadsheet (e.g. Excel): In Github, click on the file, then “Download”, then Save (Ctrl+S) to your drive. In Excel, “Browse” for it using “All Files” (not “All Excel Files”) and open it. When asked, select “Unicode UTF8”, “Delimited”, “Tab”, “General”.
By default, datasets are one-line records, so a Record ends with a NewLine, and each line has identical fields.
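As a rough illustration of the one-line record layout, the following Python sketch reads tab-separated lines into lists of fields. The helper name `read_records` and the sample content are invented for this example and are not taken from any STEPBible file; the actual column layout of each dataset should be checked in the file itself.

```python
import csv
import io

def read_records(stream):
    """Yield each non-blank line of a tab-separated stream as a list of fields."""
    # QUOTE_NONE keeps any quotation marks in the data as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if fields:  # csv.reader returns an empty list for a blank line
            yield fields

# Invented sample with three fields per record. A real file would be opened
# with open(path, encoding="utf-8", newline="") instead of io.StringIO.
sample = io.StringIO(
    "Gen.1.1\tword one\t\"quoted\" gloss\n"
    "Gen.1.2\tword two\tplain gloss\n"
)

for record in read_records(sample):
    print(len(record), record)
```

Disabling quote handling is a deliberate choice in this sketch: the files are described as plain tab-separated text, so a quotation mark inside a field is treated as data and not as a field delimiter.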
Some datasets have multi-line records. Records are separated by a line starting with “$”. The first line is a Header with fields that apply to each subsequent subRecord line. SubRecord lines all start with a tab.
For example, in the ProperNames dataset, the first line is a header with information about the type (individual, place, title etc) and other data. These details apply to each of the subsequent subRecords which contain fields for the specific tag, Hebrew/Greek, translation, and the list of references. So the Header effectively contains fields which belong to each of its subRecords and would be identical for each of them if they were included on each line.
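The multi-line layout described above could be read along these lines. This is a sketch under stated assumptions: the function names `parse_multiline` and `flatten`, the dictionary keys, and the sample lines are all hypothetical, and any line that is neither a “$” separator, a header, nor a tab-indented subRecord is simply skipped.

```python
def parse_multiline(lines):
    """Group lines into records of the form {'header': [...], 'subrecords': [[...], ...]}."""
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("$"):
            current = None  # separator: the next content line starts a new record
        elif not line.strip():
            continue  # ignore blank lines
        elif line.startswith("\t"):
            if current is None:
                raise ValueError("subRecord line found before any header")
            # Drop the leading tab, then split the remaining fields.
            current["subrecords"].append(line[1:].split("\t"))
        elif current is None:
            current = {"header": line.split("\t"), "subrecords": []}
            records.append(current)
        # Assumption: other non-tab lines inside a record are skipped in this sketch.
    return records

def flatten(records):
    """Repeat the header fields in front of every subRecord, giving one-line records."""
    for record in records:
        for sub in record["subrecords"]:
            yield record["header"] + sub

# Invented sample data, not copied from the ProperNames dataset.
sample = [
    "$ ----\n",
    "NameA\tperson\n",
    "\ttag1\tform1\tGen.1.1\n",
    "\ttag2\tform2\tMat.1.1\n",
    "$ ----\n",
    "NameB\tplace\n",
    "\ttag3\tform3\t2Ki.5.12\n",
]

for row in flatten(parse_multiline(sample)):
    print(row)
```

The `flatten` step mirrors the point made above: the header fields are treated as if they had been written out on each subRecord line.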
Hebrew glyphs are separated and normalised in the order:
consonant; sin/shin dot; dagesh; vowel; metheg/raphe; accents
Glyphs NOT used for Hebrew include:
װ ױ ײ ﭏ ײַ שׁ שׂ שּׁ שּׂ אַ אָ אּ בּ גּ דּ הּ וּ זּ טּ יּ ךּ כּ לּ מּ נּ סּ ףּ פּ צּ קּ רּ שּ תּ וֹ בֿ כֿ פֿ ﬠ ﬡ ﬢ ﬣ ﬤ ﬥ ﬦ ﬧ ﬨ
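For anyone preparing their own Hebrew text for comparison with this data, the sketch below shows one possible way to approximate the mark order listed above in Python. It is not the repository's own normalisation code. The assignment of code points to each class in `mark_rank` is an assumption made for this example, and the function names are hypothetical.

```python
import unicodedata

def mark_rank(ch):
    """Rank a combining mark by the order: sin/shin dot, dagesh, vowel, metheg/raphe, accent."""
    cp = ord(ch)
    if cp in (0x05C1, 0x05C2):                   # shin dot, sin dot
        return 1
    if cp == 0x05BC:                             # dagesh
        return 2
    if 0x05B0 <= cp <= 0x05BB or cp == 0x05C7:   # vowel points
        return 3
    if cp in (0x05BD, 0x05BF):                   # metheg, raphe
        return 4
    if 0x0591 <= cp <= 0x05AF:                   # accents (assumed range)
        return 5
    return 6                                     # any other combining mark

def normalise_hebrew(text):
    """Split precomposed glyphs, then reorder the marks that follow each base character."""
    decomposed = unicodedata.normalize("NFD", text)
    out, marks = [], []
    for ch in decomposed:
        if unicodedata.combining(ch):
            marks.append(ch)
        else:
            out.extend(sorted(marks, key=mark_rank))  # flush marks of the previous base
            marks = []
            out.append(ch)
    out.extend(sorted(marks, key=mark_rank))
    return "".join(out)

# U+FB2C is the single glyph "shin with dagesh and shin dot".
result = normalise_hebrew("\uFB2C")
print([f"U+{ord(c):04X}" for c in result])
```

NFD only separates glyphs that have a canonical decomposition. Forms without one, such as the ligatures U+05F0 to U+05F2, pass through unchanged and would need separate handling.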
Greek glyphs are normalised to include only:
; · . , ᾽ ά ά ὰ ᾷ ᾷ ἀ ά ᾴ ᾄ ᾅ Ἆ Ἅ Ἃ ᾍ Ἀ Ἀ ἁ Ἁ ἄ ἄ Ἄ Ἄ ἅ ἂ ἂ ἅ ἃ ἃ ᾶ ᾳ ἆ ἆ έ έ ὲ ἐ έ Ἓ Ἐ Ἐ ἑ Ἑ ἔ Ἔ ἒ ἕ ἕ Ἒ Ἕ Ἕ ἓ ἓ ή ή ὴ ῇ ῇ ἠ ή ῇ ᾑ Ἥ Ἣ Ἠ Ἠ ἡ Ἡ ἤ ἤ Ἤ Ἤ ἢ ἢ ἥ ἥ Ἢ Ἢ ἣ ἣ ᾖ ᾖ ᾗ ᾗ ᾗ ῆ ῃ ῄ ῄ ἦ ἦ Ἦ Ἦ ἧ ἧ ᾐ ᾐ ᾑ ᾔ ᾔ ί ί ὶ ϊ ΐ ΐ ΐ ῒ ῒ ἰ ἲ ί Ἰ Ἰ ἱ Ἱ ἴ ἴ Ἴ Ἴ ἵ ἵ Ἵ Ἵ ἳ ἳ ῖ ἶ ἶ ἷ ἷ ό ό ὸ ὀ ό Ὀ Ὀ ὁ Ὁ ὄ ὄ Ὄ Ὄ ὅ ὅ ὂ ὂ Ὅ ὃ ὃ Ὃ Ὃ Ὃ ῥ ῤ Ῥ ̔Ρ ύ ύ Ύ ὺ ϋ ΰ ΰ ΰ ῢ ῢ ὐ ὑ ύ Ὕ Ὗ Ὑ ὔ ὔ ὒ ὒ ὕ ὕ ὓ ὓ ῦ ὖ ὖ ὗ ὗ ώ ώ ὼ ῷ ῷ ὠ ὣ Ὢ ᾯ Ὠ ὡ Ὡ ὤ ὤ Ὤ ὢ ὢ ὥ ὥ Ὥ Ὥ ᾦ ᾧ ᾧ Ὧ ᾯ ᾯ ῶ ῳ ῴ ῴ ὦ ὦ Ὦ ὧ ὧ ὧ ᾠ ᾠ ς
Glyphs NOT used for Greek include:
; ‘ ᾿ ` ῾ ’ ‘ ‛ ′ ΄ ʹ̛̀́̓̒̓̔̕ ʹ ʻ ʼ ʽ ʾ ʿ ˈ ˊ ˋ ‘ ` ´ o ά ά ὰ ᾷ ἀ Ἀ ἁ Ἁ ἄ Ἄ ἅ ἂ ἃ ᾶ ᾳ ἆ έ έ ὲ ἐ Ἐ ἑ Ἑ ἔ Ἔ ἕ Ἕ ἓ ή ῇ ή ὴ ῇ ἠ Ἠ ἡ Ἡ ἤ Ἤ ἢ ἥ Ἢ ἣ ᾗ ῆ ῃ ῄ ἦ Ἦ ᾖ ἧ ᾐ ᾑ ᾔ ί i ί ὶ ϊ ΐ ῒ ἰ Ἰ ἱ Ἱ ἴ Ἴ ἵ Ἵ ἳ ῖ ἶ ἷ ό ό ὸ ὀ Ὀ ὁ Ὁ ὄ Ὄ ὅ ὂ Ὅ ὃ Ὃ ῥ Ῥ ύ ύ ὺ ϋ ΰ ῢ ὐ ὑ Ὑ ὔ ὕ ὒ ὓ ῦ ὖ ὗ ώ ὼ ῷ ὠ ὡ Ὡ ὤ Ὤ ὢ ὥ Ὥ ᾦ ᾧ ᾯ ῶ ῳ ῴ ὦ Ὦ ὧ Ὧ ᾠ ϛ
Bible reference abbreviations are based on UBS with slightly different formatting:
References are e.g. Gen.1.10-12; 1Ki.2.4,5; Phm.2; Job.1.3–2.4;
OT: Gen, Exo, Lev, Num, Deu, Jos, Jdg, Rut, 1Sa, 2Sa, 1Ki, 2Ki, 1Ch, 2Ch, Ezr, Neh, Est, Job, Psa, Pro, Ecc, Sng, Isa, Jer, Lam, Ezk, Dan, Hos, Jol, Amo, Oba, Jon, Mic, Nam, Hab, Zep, Hag, Zec, Mal,
Apoc: Tob, Jdt, EsG, Wis, Sir, Bar, LJe, S3Y, Sus, Bel, 1Ma, 2Ma, 3Ma, 4Ma, 1Es, 2Es, Man, Ps2, Oda, PsS, Alternate MSS: JsA, JdB, TbS, SsT, DnT, BlT,
NT: Mat, Mrk, Luk, Jhn, Act, Rom, 1Co, 2Co, Gal, Eph, Php, Col, 1Th, 2Th, 1Ti, 2Ti, Tit, Phm, Heb, Jas, 1Pe, 2Pe, 1Jn, 2Jn, 3Jn, Jud, Rev
(OT+NT are all based on the first 3 characters, except: Jdg, Sng, Ezk, Jol, Nam, Mrk, Jhn, Php, Phm, Jas, 1Jn, 2Jn, 3Jn)
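The abbreviation lists above can be used to check references mechanically. The Python sketch below splits a semicolon-separated list and validates the book code of each reference. The names `split_refs` and `parse_ref` are hypothetical, and the chapter and verse part is returned as an unparsed string because the full grammar for ranges and lists is not spelled out here.

```python
OT = ("Gen Exo Lev Num Deu Jos Jdg Rut 1Sa 2Sa 1Ki 2Ki 1Ch 2Ch Ezr Neh Est Job "
      "Psa Pro Ecc Sng Isa Jer Lam Ezk Dan Hos Jol Amo Oba Jon Mic Nam Hab Zep "
      "Hag Zec Mal").split()
APOC = ("Tob Jdt EsG Wis Sir Bar LJe S3Y Sus Bel 1Ma 2Ma 3Ma 4Ma 1Es 2Es Man "
        "Ps2 Oda PsS").split()
ALT_MSS = "JsA JdB TbS SsT DnT BlT".split()
NT = ("Mat Mrk Luk Jhn Act Rom 1Co 2Co Gal Eph Php Col 1Th 2Th 1Ti 2Ti Tit "
      "Phm Heb Jas 1Pe 2Pe 1Jn 2Jn 3Jn Jud Rev").split()
BOOKS = set(OT) | set(APOC) | set(ALT_MSS) | set(NT)

def split_refs(text):
    """Split a semicolon-separated reference list, dropping empty pieces."""
    return [part.strip() for part in text.split(";") if part.strip()]

def parse_ref(ref):
    """Return (book, location) for one reference; location is left as a string."""
    book, sep, location = ref.partition(".")
    if not sep or book not in BOOKS:
        raise ValueError(f"unrecognised reference: {ref!r}")
    return book, location

# The example references given above.
for ref in split_refs("Gen.1.10-12; 1Ki.2.4,5; Phm.2; Job.1.3–2.4;"):
    print(parse_ref(ref))
```

Checking the book code against a fixed set catches common slips such as `Mar` for `Mrk` or `Joh` for `Jhn`, which fall among the exceptions to the first-three-characters rule.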

Error reporting

Please report all errors at Feedback@STEPBible.org. See Current reported errors.