Parse a description into components

This notebook requires striplog version 0.8.8 or later.

import striplog

striplog.__version__

We have some text:

text = "wet silty fine sand with tr clay"

To read this with striplog, we need to define a Lexicon. A Lexicon is a dictionary-like object full of regular expressions. It acts as a bridge between an unstructured description like this one and the dictionary-like Component objects that striplog works with. The Lexicon also holds abbreviations, which are used to expand abbreviated text, such as cuttings descriptions, into full words.
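The following minimal pure-Python sketch makes the abbreviation idea concrete by expanding abbreviations that appear as whole words. It is not striplog's implementation. The `ABBREVIATIONS` table and the `expand_abbreviations` function are hypothetical names used only for illustration. The sketch assumes that abbreviations are matched as whole words, regardless of case.

```python
import re

# Hypothetical abbreviation table for illustration only; 'tr' -> 'trace'
# mirrors the lexicon entry used below. Keys are lowercase.
ABBREVIATIONS = {'tr': 'trace', 'f': 'fine', 'ss': 'sandstone'}


def expand_abbreviations(text, abbreviations):
    """Replace whole-word abbreviations with their expansions in one pass."""
    # Longest keys first; word boundaries keep 'tr' from matching inside 'strata'.
    keys = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, keys)) + r')\b',
                         flags=re.IGNORECASE)
    # Look up the lowercased match so 'Tr' and 'TR' expand too.
    return pattern.sub(lambda m: abbreviations[m.group(1).lower()], text)


print(expand_abbreviations("wet silty f sand with tr clay", ABBREVIATIONS))
# wet silty fine sand with trace clay
```

The substitution happens in a single pass with one combined pattern. As a result, an expansion is never re-scanned, so one abbreviation cannot accidentally trigger another.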

A Lexicon to read only this text might look like:

from striplog import Lexicon

lex_dict = {
    'lithology': ['sand', 'clay'],
    'grainsize': ['fine'],
    'modifier':  ['silty'],
    'amount':    ['trace'],
    'moisture':  ['wet', 'dry'],
    'abbreviations': {'tr': 'trace'},
    'splitters': ['with'],
    'parts_of_speech': {'noun': ['lithology'],
                        'adjective': ['grainsize', 'modifier', 'moisture'],
                        'subordinate': ['amount'],
                       }
}

lexicon = Lexicon(lex_dict)
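Conceptually, each category in the lexicon is a list of regex patterns. Reading a piece of text then means checking which categories have a pattern that matches it. The sketch below shows this idea for a single part of a description. `match_part` and `LEXICON` are hypothetical names. The rule of keeping the first pattern that matches in each category is an assumption, not necessarily what striplog does internally.

```python
import re

# A cut-down, hypothetical lexicon: category -> list of regex patterns.
LEXICON = {
    'lithology': ['sand', 'clay'],
    'grainsize': ['fine'],
    'modifier':  ['silty'],
    'amount':    ['trace'],
    'moisture':  ['wet', 'dry'],
}


def match_part(part, lexicon):
    """Return {category: word} using the first pattern that matches per category."""
    found = {}
    for category, patterns in lexicon.items():
        for pattern in patterns:
            # Wrap each pattern in word boundaries so 'sand' does not match 'sandy'.
            m = re.search(r'\b(?:' + pattern + r')\b', part, flags=re.IGNORECASE)
            if m:
                found[category] = m.group(0).lower()
                break
    return found


print(match_part("wet silty fine sand", LEXICON))
# {'lithology': 'sand', 'grainsize': 'fine', 'modifier': 'silty', 'moisture': 'wet'}
```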

Now we can parse the text with it:

from striplog import Interval

Interval._parse_description(text, lexicon=lexicon, max_component=3, abbreviations=True)
[Component({'lithology': 'sand', 'grainsize': 'fine', 'modifier': 'silty', 'moisture': 'wet'}),
 Component({'lithology': 'clay', 'amount': 'trace'})]
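The whole round trip can be approximated in plain Python in three steps:

1. Expand abbreviations.
2. Split the text on splitter words.
3. Match each part against the lexicon, keeping at most `max_component` parts.

This is a simplified sketch under those assumptions, not striplog's `_parse_description`. It ignores parts of speech entirely, and every name in it is hypothetical.

```python
import re

# Hypothetical cut-down lexicon mirroring lex_dict above (not striplog's code).
LEX = {
    'lithology': ['sand', 'clay'],
    'grainsize': ['fine'],
    'modifier':  ['silty'],
    'amount':    ['trace'],
    'moisture':  ['wet', 'dry'],
}
ABBREVIATIONS = {'tr': 'trace'}
SPLITTERS = ['with']


def parse_description(text, max_component=1, abbreviations=False):
    """Expand abbreviations, split into parts, then match each part."""
    if abbreviations:
        for abbr, full in ABBREVIATIONS.items():
            text = re.sub(r'\b' + re.escape(abbr) + r'\b', full, text, flags=re.I)
    # Split on any splitter that stands as a whole word.
    splitter = r'\b(?:' + '|'.join(SPLITTERS) + r')\b'
    parts = [p.strip() for p in re.split(splitter, text, flags=re.I) if p.strip()]
    components = []
    for part in parts[:max_component]:
        comp = {}
        for category, patterns in LEX.items():
            for pattern in patterns:
                m = re.search(r'\b(?:' + pattern + r')\b', part, flags=re.I)
                if m:  # keep the first pattern that matches in this category
                    comp[category] = m.group(0).lower()
                    break
        components.append(comp)
    return components


text = "wet silty fine sand with tr clay"
print(parse_description(text, max_component=3, abbreviations=True))
# [{'lithology': 'sand', 'grainsize': 'fine', 'modifier': 'silty', 'moisture': 'wet'}, {'lithology': 'clay', 'amount': 'trace'}]
print(parse_description(text, max_component=3))
# [{'lithology': 'sand', 'grainsize': 'fine', 'modifier': 'silty', 'moisture': 'wet'}, {'lithology': 'clay'}]
```

In this sketch, the second part loses its amount when abbreviations are not expanded, because 'tr' does not match the `trace` pattern.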

But a hand-built Lexicon like this is tedious to create and maintain. So instead of defining a Lexicon from scratch, we'll modify the default one:

# Make and expand the lexicon.
lexicon = Lexicon.default()

# Add moisture words (or could add as other 'modifiers').
lexicon.moisture = ['wet(?:tish)?', 'dry(?:ish)?']
lexicon.parts_of_speech['adjective'] += ['moisture']

# Add the comma as component splitter.
lexicon.splitters += [', ']
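The moisture entries are regular expressions. `(?:...)` is a non-capturing group, and the trailing `?` makes it optional, so `wet(?:tish)?` matches both 'wet' and 'wettish'. Below is a quick standalone check using Python's `re` module. It assumes the patterns are applied as whole words, and the `find_moisture` helper is hypothetical.

```python
import re

# The moisture patterns added above: (?:...) is a non-capturing group,
# and the trailing ? makes the suffix optional.
MOISTURE = ['wet(?:tish)?', 'dry(?:ish)?']


def find_moisture(text):
    """Return every whole-word moisture term found in text, in order."""
    pattern = r'\b(?:' + '|'.join(MOISTURE) + r')\b'
    return re.findall(pattern, text, flags=re.IGNORECASE)


print(find_moisture("wettish clay, dry sand, wet silt, drying mud"))
# ['wettish', 'dry', 'wet']
```

'drying' is not matched, because the word boundary after 'dry' fails.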

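Parsing with this yields almost the same results as before. The clay component loses its amount, presumably because abbreviations=True is not passed this time, so 'tr' is not expanded to 'trace'…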

Interval._parse_description(text, lexicon=lexicon, max_component=3)
[Component({'lithology': 'sand', 'modifier': 'silty', 'grainsize': 'fine', 'moisture': 'wet'}),
 Component({'lithology': 'clay'})]

…but now we can parse a wider range of descriptions:

Interval._parse_description("Coarse sandstone with minor limestone", lexicon=lexicon, max_component=3)
[Component({'lithology': 'sandstone', 'grainsize': 'coarse'}),
 Component({'lithology': 'limestone', 'amount': 'minor'})]
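The `max_component` argument caps how many components are returned. The minimal sketch below splits on several splitter patterns, including the `', '` added above, and then truncates the result. `split_parts` is a hypothetical helper, and the exact splitting rules striplog uses may differ.

```python
import re

# Hypothetical splitter list: the word 'with' plus the ', ' added above.
SPLITTERS = [r'\bwith\b', r', ']


def split_parts(text, splitters, max_component):
    """Split text on any splitter pattern and keep at most max_component parts."""
    parts = re.split('|'.join(splitters), text, flags=re.IGNORECASE)
    parts = [p.strip() for p in parts if p.strip()]
    return parts[:max_component]


desc = "coarse sandstone with minor limestone, tr shale"
print(split_parts(desc, SPLITTERS, max_component=3))
# ['coarse sandstone', 'minor limestone', 'tr shale']
print(split_parts(desc, SPLITTERS, max_component=1))
# ['coarse sandstone']
```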

striplog does all this parsing internally when you use the Striplog.from_descriptions() class method.

Have fun!