Parse a description into components
This notebook requires striplog version 0.8.8 or later.
import striplog
striplog.__version__
'unknown'
We have some text:
text = "wet silty fine sand with tr clay"
To read this with striplog, we need to define a Lexicon. This is a dictionary-like object full of regular expressions, which acts as a bridge between this unstructured description and the dictionary-like Component object which striplog wants. The Lexicon also contains abbreviations for converting abbreviated text, like cuttings descriptions, into expanded words.
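To make the abbreviation step concrete, here is a minimal sketch in plain Python. It is not striplog's implementation; the function name expand_abbreviations and the extra 'v' entry are assumptions made up for illustration. It only shows the general idea of swapping whole-word abbreviations for their expansions before any matching happens.

```python
import re

# Hypothetical stand-in for the 'abbreviations' entry of a lexicon.
abbreviations = {'tr': 'trace', 'v': 'very'}

def expand_abbreviations(text, abbreviations):
    """Replace whole-word abbreviations with their expanded form."""
    # \b anchors stop 'tr' from matching inside a longer word.
    pattern = r'\b(' + '|'.join(map(re.escape, abbreviations)) + r')\b'

    def replace(match):
        return abbreviations[match.group(0).lower()]

    return re.sub(pattern, replace, text, flags=re.IGNORECASE)

expanded = expand_abbreviations("wet silty fine sand with tr clay", abbreviations)
print(expanded)
```

The word boundaries matter here: without them, a short abbreviation would also be replaced when it appears inside a longer word.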
A Lexicon to read only this text might look like:
from striplog import Lexicon
lex_dict = {
    'lithology': ['sand', 'clay'],
    'grainsize': ['fine'],
    'modifier': ['silty'],
    'amount': ['trace'],
    'moisture': ['wet', 'dry'],
    'abbreviations': {'tr': 'trace'},
    'splitters': ['with'],
    'parts_of_speech': {
        'noun': ['lithology'],
        'adjective': ['grainsize', 'modifier', 'moisture'],
        'subordinate': ['amount'],
    },
}
lexicon = Lexicon(lex_dict)
Now we can parse the text with it:
from striplog import Interval
Interval._parse_description(text, lexicon=lexicon, max_component=3, abbreviations=True)
[Component({'lithology': 'sand', 'grainsize': 'fine', 'modifier': 'silty', 'moisture': 'wet'}),
Component({'lithology': 'clay', 'amount': 'trace'})]
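For intuition about how a result like this can arise, the following toy parser splits the text on the splitter word and then tags each part with the first matching pattern per category. It is a simplified sketch, not how striplog works internally: toy_parse and toy_lexicon are hypothetical names, it returns plain dicts rather than Component objects, it ignores parts_of_speech, and it assumes abbreviations have already been expanded.

```python
import re

# Toy lexicon mirroring the categories of lex_dict (hypothetical, plain dict).
toy_lexicon = {
    'lithology': ['sand', 'clay'],
    'grainsize': ['fine'],
    'modifier': ['silty'],
    'amount': ['trace'],
    'moisture': ['wet', 'dry'],
}
splitters = ['with']

def toy_parse(text, lexicon, splitters):
    """Split text on splitter words, then tag each part by category."""
    split_pattern = r'\b(?:' + '|'.join(map(re.escape, splitters)) + r')\b'
    components = []
    for part in re.split(split_pattern, text):
        component = {}
        for category, patterns in lexicon.items():
            for pattern in patterns:
                # Each lexicon entry is treated as a regular expression.
                match = re.search(r'\b(?:' + pattern + r')\b', part)
                if match:
                    component[category] = match.group(0)
                    break
        if component:
            components.append(component)
    return components

parsed = toy_parse("wet silty fine sand with trace clay", toy_lexicon, splitters)
print(parsed)
```

Each part of the description becomes one dictionary, which is roughly the role a Component plays in the real output.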
But a Lexicon like this is a bit of a pain to make and maintain. So instead of defining a Lexicon from scratch, we’ll modify the default one:
# Make and expand the lexicon.
lexicon = Lexicon.default()
# Add moisture words (or could add as other 'modifiers').
lexicon.moisture = ['wet(?:tish)?', 'dry(?:ish)?']
lexicon.parts_of_speech['adjective'] += ['moisture']
# Add the comma as component splitter.
lexicon.splitters += [', ']
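The moisture entries are regular expressions, so each one can cover several spellings. As a quick check that uses only Python's re module (the helper matches_any is a hypothetical name, and this says nothing about how striplog itself applies the patterns), we can see which words the two patterns accept:

```python
import re

# The same two patterns assigned to lexicon.moisture.
moisture_patterns = ['wet(?:tish)?', 'dry(?:ish)?']

def matches_any(word, patterns):
    """Return True if the whole word matches at least one pattern."""
    return any(re.fullmatch(p, word) for p in patterns)

for word in ['wet', 'wettish', 'dry', 'dryish', 'damp']:
    print(word, matches_any(word, moisture_patterns))
```

The (?:...)? group makes the suffix optional without creating a capture group, so one entry handles both the base word and its "-ish" form.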
Parsing with this yields nearly the same result as before (this call omits abbreviations=True, and the clay component comes back without an amount)…
Interval._parse_description(text, lexicon=lexicon, max_component=3)
[Component({'lithology': 'sand', 'modifier': 'silty', 'grainsize': 'fine', 'moisture': 'wet'}),
Component({'lithology': 'clay'})]
…but we can parse more things now:
Interval._parse_description("Coarse sandstone with minor limestone", lexicon=lexicon, max_component=3)
[Component({'lithology': 'sandstone', 'grainsize': 'coarse'}),
Component({'lithology': 'limestone', 'amount': 'minor'})]
striplog does all this parsing internally when you use the Striplog.from_descriptions() class method.
Have fun!