striplog.lexicon module

A vocabulary for parsing lithologic or stratigraphic descriptions.

copyright

2015 Agile Geoscience

license

Apache 2.0

class striplog.lexicon.Lexicon(params)

Bases: object

A Lexicon is a dictionary of ‘types’ and regex patterns.

Most commonly you will just load the default one.

Parameters

params (dict) – The dictionary to use. For an example, refer to the default lexicon in defaults.py.
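As a rough illustration of the "types and regex patterns" idea, the sketch below builds a small dictionary of that shape and checks which patterns match a piece of text. The category names and patterns are hypothetical and are not copied from defaults.py, which remains the authoritative example.

```python
import re

# Hypothetical, cut-down lexicon parameters: each key is a 'type'
# (category) and each value is a list of regex patterns for it.
# These entries are illustrative and are not taken from defaults.py.
params = {
    'lithology': [r'sandstone', r'limestone', r'shale'],
    'colour': [r'red(?:dish)?', r'grey(?:ish)?', r'green(?:ish)?'],
}

# For each category, keep the patterns that match somewhere in the text.
text = 'Greyish limestone'
matched = {
    category: [p for p in patterns if re.search(p, text, flags=re.IGNORECASE)]
    for category, patterns in params.items()
}
```

Here `matched` records, per category, which patterns were found in the text.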

property categories

Lists the categories in the lexicon, except the optional categories.

Returns

A list of strings of category names.

Return type

list

classmethod default()

Makes the default lexicon, as provided in defaults.py.

Returns

The default lexicon.

Return type

Lexicon

expand_abbreviations(text)

Parse a piece of text and replace any abbreviations with their full word equivalents. Uses the lexicon.abbreviations dictionary to find abbreviations.

Parameters

text (str) – The text to parse.

Returns

The text with abbreviations replaced.

Return type

str
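The general technique can be sketched with a plain dictionary and a regex substitution. This is a minimal stand-alone sketch, not the library's implementation: the `expand` helper and the abbreviation table are hypothetical, and the real method may treat case and punctuation differently.

```python
import re

# Hypothetical abbreviation table; the lexicon's own table may differ.
abbreviations = {'sst': 'sandstone', 'lst': 'limestone', 'gry': 'grey'}

def expand(text, abbreviations):
    """Replace whole-word abbreviations with their full equivalents."""
    def replace(match):
        word = match.group(0)
        # Look up case-insensitively; leave unknown words untouched.
        return abbreviations.get(word.lower(), word)
    return re.sub(r'\b\w+\b', replace, text)

expanded = expand('Gry sst with minor lst.', abbreviations)
```

Note that this sketch returns the replacement in the case stored in the table, so the capital in 'Gry' is lost.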

find_synonym(word)

Given a word, returns the ‘preferred’ word from the lexicon’s synonyms. Case insensitive.

Parameters

word (str) – A word.

Returns

The preferred word, or the input word if not found.

Return type

str

Example

>>> syn = {'snake': ['python', 'adder']}
>>> find_synonym('adder', syn)
'snake'
>>> find_synonym('rattler', syn)
'rattler'
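One way to get this behaviour is a reverse look-up table from each synonym to its preferred word. The sketch below is a stand-alone function under that assumption; `preferred_word` is a hypothetical name and takes the synonyms dictionary explicitly, unlike the method, whose signature takes only the word.

```python
def preferred_word(word, synonyms):
    """Return the preferred word for `word`, or `word` itself if none is found."""
    # Build a reverse look-up table: synonym -> preferred word.
    reverse = {
        synonym.lower(): preferred
        for preferred, group in synonyms.items()
        for synonym in group
    }
    # Lower-case the query so the look-up is case insensitive.
    return reverse.get(word.lower(), word)

syn = {'snake': ['python', 'adder']}
known = preferred_word('Adder', syn)
unknown = preferred_word('rattler', syn)
```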

find_word_groups(text, category, proximity=2)

Given a string and a category, finds and combines words into groups based on their proximity.

Parameters
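  • text (str) – Some text.

  • category (str) – The lexicon category to search; its regex patterns are matched against the text.

  • proximity (int) – How close together matches must be to be combined. Defaults to 2.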

Returns

The combined strings it found.

Return type

list

Example

COLOURS = [r"red(?:dish)?", r"grey(?:ish)?", r"green(?:ish)?"]
s = 'GREYISH-GREEN limestone with RED or GREY sandstone.'
find_word_groups(s, COLOURS) --> ['greyish green', 'red', 'grey']
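The proximity idea can be sketched as a stand-alone function that takes the regex patterns directly. This is an assumption-laden illustration, not the library's code: `group_by_proximity` is a hypothetical name, proximity is taken to mean the number of characters between one match and the next, and joined words are separated by a single space.

```python
import re

def group_by_proximity(text, patterns, proximity=2):
    """Combine adjacent matches separated by at most `proximity` characters."""
    regex = re.compile(r'\b(?:' + '|'.join(patterns) + r')\b',
                       flags=re.IGNORECASE)
    groups = []
    last_end = None
    for match in regex.finditer(text):
        word = match.group(0).lower()
        if last_end is not None and match.start() - last_end <= proximity:
            # Close enough to the previous match: extend that group.
            groups[-1] += ' ' + word
        else:
            # Too far away (or the first match): start a new group.
            groups.append(word)
        last_end = match.end()
    return groups

COLOURS = [r'red(?:dish)?', r'grey(?:ish)?', r'green(?:ish)?']
s = 'GREYISH-GREEN limestone with RED or GREY sandstone.'
found = group_by_proximity(s, COLOURS)
```

With these inputs, 'GREYISH' and 'GREEN' are one character apart and are combined, while 'RED' and 'GREY' are four characters apart and stay separate.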

classmethod from_json_file(filename)

Load a lexicon from a JSON file.

Parameters

filename (str) – The path to a JSON dump.
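The sketch below shows a JSON round trip with the standard library only. It assumes the dump is a single JSON object mapping category names to lists of patterns; that layout and the entries are assumptions, and the keys a real lexicon file needs are not shown here.

```python
import json
import os
import tempfile

# Hypothetical lexicon contents, for illustration only.
params = {'lithology': ['sandstone', 'limestone'], 'colour': ['red', 'grey']}

# Write the dictionary to a temporary JSON file...
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(params, f)
    filename = f.name

# ...then read it back into a plain dictionary.
with open(filename) as f:
    loaded = json.load(f)

# Clean up the temporary file.
os.remove(filename)
```

A dictionary loaded this way has the same shape as the `params` argument described for the class.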

get_component(text, required=False, first_only=True)

Takes a piece of text representing a lithologic description for one component, e.g. “Red vf-f sandstone”, and turns it into a dictionary of attributes.
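A minimal sketch of the text-to-attributes idea follows, assuming a lexicon that maps each category to a list of regex patterns. The `component_from_text` helper and the lexicon entries are hypothetical. The sketch keeps only the first match per category; whether that corresponds to the first_only parameter is an assumption.

```python
import re

# Hypothetical lexicon: category name -> list of regex patterns.
lexicon = {
    'colour': [r'red(?:dish)?', r'grey(?:ish)?'],
    'grainsize': [r'vf-f', r'coarse'],
    'lithology': [r'sandstone', r'limestone'],
}

def component_from_text(text, lexicon):
    """Return {category: first matching word} for the categories found."""
    component = {}
    for category, patterns in lexicon.items():
        regex = re.compile('|'.join(patterns), flags=re.IGNORECASE)
        match = regex.search(text)
        if match:
            # Store the matched text in lower case.
            component[category] = match.group(0).lower()
    return component

component = component_from_text('Red vf-f sandstone', lexicon)
```

Categories with no match are simply left out of the resulting dictionary.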

parse_description(text)

Parse a single description into component-like dictionaries.

split_description(text)

Split a description into parts, each of which can be turned into a single component.
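Splitting can be sketched as a regex split on a set of delimiter words and punctuation. The splitters below and the `split_parts` helper are hypothetical; the lexicon's own splitting rules may differ.

```python
import re

# Hypothetical splitter words and punctuation, for illustration only.
SPLITTERS = [r'\bwith\b', r'\band\b', r';', r',']

def split_parts(text, splitters=SPLITTERS):
    """Split a description on any splitter and drop empty parts."""
    regex = re.compile('|'.join(splitters), flags=re.IGNORECASE)
    parts = (part.strip() for part in regex.split(text))
    return [part for part in parts if part]

parts = split_parts('Grey sandstone with minor red shale, and coal')
```

The word boundaries in the patterns keep the 'and' inside 'sandstone' from being treated as a splitter. Each resulting part could then be turned into one component.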

exception striplog.lexicon.LexiconError

Bases: Exception

Generic error class.