striplog.lexicon module#
A vocabulary for parsing lithologic or stratigraphic descriptions.
- copyright
2015 Agile Geoscience
- license
Apache 2.0
- class striplog.lexicon.Lexicon(params)#
Bases:
object
A Lexicon is a dictionary of ‘types’ and regex patterns.
Most commonly you will just load the default one.
- Parameters
params (dict) – The dictionary to use. For an example, refer to the default lexicon in defaults.py.
- property categories#
Lists the categories in the lexicon, except the optional categories.
- Returns
A list of strings of category names.
- Return type
list
- classmethod default()#
Makes the default lexicon, as provided in defaults.py.
- Returns
The default lexicon.
- Return type
Lexicon
- expand_abbreviations(text)#
Parse a piece of text and replace any abbreviations with their full word equivalents. Uses the lexicon.abbreviations dictionary to find abbreviations.
- Parameters
text (str) – The text to parse.
- Returns
The text with abbreviations replaced.
- Return type
str
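The idea of abbreviation expansion can be sketched in plain Python. This is a minimal standalone illustration, not striplog's implementation: the `abbreviations` table and the `expand` helper are hypothetical, and only the standard library `re` module is used.

```python
import re

# Hypothetical abbreviation table; a real lexicon would supply its own.
abbreviations = {"sst": "sandstone", "lst": "limestone", "gry": "grey"}

def expand(text, abbreviations):
    """Replace whole-word abbreviations with their full equivalents."""
    def swap(match):
        word = match.group(0)
        # Look up case-insensitively; leave unknown words untouched.
        return abbreviations.get(word.lower(), word)
    return re.sub(r"\b\w+\b", swap, text)

result = expand("Gry sst with lst", abbreviations)
print(result)
```

Matching on whole words keeps an abbreviation such as `sst` from being replaced inside a longer word.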
- find_synonym(word)#
Given a string and a dict of synonyms, returns the ‘preferred’ word. Case insensitive.
- Parameters
word (str) – A word.
- Returns
The preferred word, or the input word if not found.
- Return type
str
Example
>>> syn = {'snake': ['python', 'adder']}
>>> find_synonym('adder', syn)
'snake'
>>> find_synonym('rattler', syn)
'rattler'
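A case-insensitive synonym lookup of this kind could be sketched as a reverse mapping from each synonym to its preferred word. The standalone two-argument function below is an assumption made for illustration; it is not the method's actual source.

```python
def find_synonym(word, synonyms):
    """Return the preferred word for `word`, or `word` itself if unknown."""
    # Invert the mapping so each synonym points at its preferred word.
    reverse_lookup = {
        alt.lower(): preferred
        for preferred, alternatives in synonyms.items()
        for alt in alternatives
    }
    return reverse_lookup.get(word.lower(), word)

syn = {"snake": ["python", "adder"]}
print(find_synonym("ADDER", syn))
print(find_synonym("rattler", syn))
```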
- find_word_groups(text, category, proximity=2)#
Given a string and a category, finds and combines words into groups based on their proximity.
- Parameters
text (str) – Some text.
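category (str) – The lexicon category to search; its entries are regex strings.
proximity (int) – How close matches must be to be combined. Defaults to 2.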
- Returns
list. The combined strings it found.
Example
COLOURS = [r"red(?:dish)?", r"grey(?:ish)?", r"green(?:ish)?"]
s = 'GREYISH-GREEN limestone with RED or GREY sandstone.'
find_word_groups(s, COLOURS) --> ['greyish green', 'red', 'grey']
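Proximity-based grouping might be sketched as follows. This standalone function takes a list of patterns directly rather than a category name, and it assumes that proximity means the maximum number of characters between the end of one match and the start of the next; both are assumptions made for illustration, not a description of striplog's internals.

```python
import re

def find_word_groups(text, patterns, proximity=2):
    """Find pattern matches and merge neighbours that sit close together."""
    regex = re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)
    groups = []      # merged strings found so far
    last_end = None  # end offset of the previous match
    for m in regex.finditer(text):
        word = m.group(0).lower()
        if last_end is not None and m.start() - last_end <= proximity:
            groups[-1] += " " + word  # close enough: extend the last group
        else:
            groups.append(word)       # too far away: start a new group
        last_end = m.end()
    return groups

COLOURS = [r"red(?:dish)?", r"grey(?:ish)?", r"green(?:ish)?"]
s = "GREYISH-GREEN limestone with RED or GREY sandstone."
groups = find_word_groups(s, COLOURS)
print(groups)
```

Here "GREYISH" and "GREEN" are one character apart, so they merge, while "RED" and "GREY" are separated by " or " and stay separate.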
- classmethod from_json_file(filename)#
Load a lexicon from a JSON file.
- Parameters
filename (str) – The path to a JSON dump.
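Loading a dictionary from a JSON dump can be sketched with the standard library alone. The `load_params` helper and the sample content are hypothetical; how the loaded dictionary is turned into a Lexicon is not shown here.

```python
import json
import os
import tempfile

def load_params(filename):
    """Read a JSON file and return its contents as a dict."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical lexicon-like content, written to a temporary file for the demo.
params = {"lithology": ["sandstone", "limestone"], "colour": ["red", "grey(?:ish)?"]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    json.dump(params, tmp)

loaded = load_params(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(loaded)
```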
- get_component(text, required=False, first_only=True)#
Takes a piece of text representing a lithologic description for one component, e.g. “Red vf-f sandstone” and turns it into a dictionary of attributes.
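Turning a description into a dictionary of attributes could be sketched by searching the text once per category and keeping the first match. The `lexicon` dictionary, its category names, and its patterns are hypothetical, and this standalone function does not reproduce the `required` or `first_only` options.

```python
import re

# Hypothetical lexicon: category names mapped to lists of regex patterns.
lexicon = {
    "colour": [r"red(?:dish)?", r"grey(?:ish)?"],
    "grainsize": [r"vf-f", r"fine", r"coarse"],
    "lithology": [r"sandstone", r"limestone"],
}

def get_component(text, lexicon):
    """Return a dict holding the first match found for each category."""
    component = {}
    for category, patterns in lexicon.items():
        regex = re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)
        match = regex.search(text)
        if match:
            component[category] = match.group(0).lower()
    return component

component = get_component("Red vf-f sandstone", lexicon)
print(component)
```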
- parse_description(text)#
Parse a single description into component-like dictionaries.
- split_description(text)#
Split a description into parts, each of which can be turned into a single component.
- exception striplog.lexicon.LexiconError#
Bases:
Exception
Generic error class.