Data Structures - novoid/lazyblorg GitHub Wiki
Here is a description of the most important and recurring data structures used in the Python source.
In the source code, the following things might be of interest:
- /lib/orgparser.py → parse_orgmode_file(…): the main routine of the Org-mode parser
- /lib/htmlizer.py → sanitize_and_htmlize_blog_content(…): the main routine of the HTMLization process
- http://orgmode.org/worg/dev/org-syntax.html
- OLD: list of Org Mode elements: http://article.gmane.org/gmane.emacs.orgmode/67871
- Take a look at an Org-mode test-file (for unit testing) containing all implemented Org-mode syntax elements
Org elements: from ox-ascii.el (Org-mode)
| Org Element | [fn:earmarked] | [fn:lowprio] | implemented since | [fn:internalrepresentation] | HTML5 |
|---|---|---|---|---|---|
| external hyperlinks | <2014-01-30 Thu> | a | |||
| internal links | <2014-03-03 Mon> | a | |||
| bold | <2014-01-30 Thu> | b | |||
| center-block | x | ||||
| clock | x | ||||
| code | <2014-01-30 Thu> | code | |||
| drawer | x | ||||
| dynamic-block | x | ||||
| entity | |||||
| example-block | <2014-01-30 Thu> | [‘example-block’, ‘name or None’, [u’first line’, u’second line’]] | FIXXME | ||
| example “colon-block” | <2014-08-10 Sun> | [‘colon-block’, False, [u’first line’, u’second line’]] | pre | ||
| export-block | x | ||||
| export-snippet | x | ||||
| fixed-width | x | ||||
| footnote-definition | x | ||||
| footnote-reference | x | ||||
| headline | <2014-01-30 Thu> | [‘heading’, {‘level’: 3, ‘title’: u’my title’}] | section+header+h1 | ||
| horizontal-rule | <2014-01-31 Fri> | [‘hr’] | (ignored and only interpreted to mark end of standfirst) | ||
| inline-src-block | x | ||||
| inlinetask | x | ||||
| inner-template | x | ||||
| italic | x | ||||
| item | |||||
| keyword | x | ||||
| latex-environment | <2014-01-30 Thu> | [fn:pypandoc] [‘latex-block’, ‘name or None’, [u’first line’, u’second line’]] | |||
| latex-fragment | x | ||||
| line-break | x | ||||
| link | x | ||||
| paragraph | <2014-01-30 Thu> | [‘par’, u’line1’, u’line2’] | p | ||
| plain-list | x | [‘list-itemize’, [u’first line’, u’second line’]] | ul+li | ||
| plain-text | <2014-01-30 Thu> | see: paragraph | |||
| planning | x | ||||
| quote-block | <2014-01-30 Thu> | [‘quote-block’, ‘name or None’, [u’first line’, u’second line’]] | blockquote | ||
| quote-section | ? | ||||
| radio-target | x | ||||
| section | <2014-01-30 Thu> | [‘heading’, {‘title’: u’Sub-heading foo’, ‘level’: 3}] | h2, h3, … | ||
| special-block | x | ||||
| src-block | <2014-01-30 Thu> | [‘src-block’, ‘name or None’, [u’first line’, u’second line’]] | pre | ||
| statistics-cookie | x | ||||
| strike-through | x | ||||
| subscript | x | ||||
| superscript | x | ||||
| table | x | [fn:pypandoc] | |||
| table-cell | x | ||||
| table-row | x | ||||
| target | |||||
| template | x | ||||
| timestamp | x | ||||
| underline | x | ||||
| verbatim | x | pre | |||
| verse-block | <2014-01-30 Thu> | [‘verse-block’, ‘name or None’, [u’first line’, u’second line’]] | pre | ||
| html-block | <2014-01-30 Thu> | [‘html-block’, ‘name or None’, [u’first line’, u’second line’]] | pre (if no #+NAME: then insert directly!) | ||
| tsfile-links | <2017-06-17 Sat> | [‘cust_link_image’, u’2017-03-11T18.29.20 Stars.jpg’, {u’width’: u’300’, u’alt’: u’Stars in a Tree’, u’align’: u’right’}] | figure, img + attributes, figcaption | ||
| the rest | [fn:pypandoc] |
NOTE: OrgParser uses “par” for anything it cannot interpret as something else.
[fn:earmarked] Planned to be implemented soon (or at all :-)
[fn:lowprio] This feature is low on my personal development list (may take some time or might never get implemented)
[fn:pypandoc] This element gets converted using pypandoc (and additional sanitizing)
[fn:internalrepresentation] usually in list: blog_data['id-of-entry']['content']
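To illustrate the internal representation listed in the table, here is a minimal sketch of how such a content list could be walked by looking at the first item of each element. The function `describe_element` and the mapping `BLOCK_TAGS` are hypothetical names made up for this illustration; they are not part of lazyblorg, and the real HTMLization in /lib/htmlizer.py may work differently.

```python
# Hypothetical mapping from element kind to HTML tag,
# following the HTML5 column of the table above.
BLOCK_TAGS = {
    'colon-block': 'pre',
    'src-block': 'pre',
    'verse-block': 'pre',
    'quote-block': 'blockquote',
}


def describe_element(element):
    """Return a short description of one content element (illustration only)."""
    if element == '\n':
        return 'paragraph break'
    kind = element[0]
    if kind == 'par':
        # ['par', 'line1', 'line2']: all remaining items are text lines
        return 'p with %d line(s)' % len(element[1:])
    if kind == 'heading':
        # ['heading', {'level': 3, 'title': 'my title'}]
        return 'heading level %d: %s' % (element[1]['level'], element[1]['title'])
    if kind == 'hr':
        return 'horizontal rule'
    if kind == 'list-itemize':
        return 'ul with %d item(s)' % len(element[1])
    if kind in BLOCK_TAGS:
        # [kind, name or None, [lines]]
        name, lines = element[1], element[2]
        return '%s (%s) with %d line(s)' % (BLOCK_TAGS[kind], name, len(lines))
    return 'unhandled: %s' % kind


content = [
    ['par', 'This is the Org-mode content'],
    '\n',
    ['heading', {'level': 3, 'title': 'Another aspect'}],
    ['src-block', None, ['first line', 'second line']],
    ['list-itemize', ['first line', 'second line']],
]
descriptions = [describe_element(element) for element in content]
```

Dispatching on `element[0]` keeps the sketch close to the list layout shown in the table; anything not covered falls through to the `unhandled` branch.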
- Blocks: (beginning with BEGIN_)
For a complete list of content elements, please take a look at id:implemented-org-elements (above) FIXXME
blog_data is a Python list containing one dictionary entry per blog entry:
- FIXXME: add examples of:
- category
- other additional data
blog_data = \
    [{'level': 2,    ## number of asterisks
      'title': u'This is a blog entry about foo',
      'usertags': [u'tag1', u'tag2'],
      'autotags': {'language': 'english'},
      'id': u'lazyblorg-example-entry',    ## ID from PROPERTIES-drawer
      'finished-timestamp-history': [datetime1, datetime2, datetime3],
      'latestupdateTS': datetime,    ## most current time-stamp that changed (or overwrote) heading to DONE
      'firstpublishTS': datetime,    ## oldest time-stamp that changed heading to DONE
      'created': datetime,
      'content': [['par', u'This is the Org-mode content'],    ## 'par': paragraph containing anything that is not defined like tables, ...
                  '\n',    ## change of paragraph
                  ['heading', {'level': 3, 'title': u'Another aspect'}],
                  ['html-block', 'its name or None', [u'first line', u'second line', u'', u'last line']],
                  ['list-itemize', [u'first line', u'second line']],
                  ['cust_link_image', u'2017-03-11T18.29.20 Stars.jpg', {u'width': u'300', u'alt': u'Stars in a Tree', u'align': u'right'}]
                  ]    #FIXXME: further elements
      }]

Thus:
blog_data[0].keys()
## ... results in:
# ['title',
# 'latestupdateTS',
# 'firstpublishTS',
# 'created',
# 'usertags',
# 'content',
# 'finished-timestamp-history',
# 'level',
# 'id']
blog_data[0]['content'] ## -> list of elements of content
# [['text', u'This is the Org-mode content'],
# ['heading', {'level': 3, 'title': u'Another aspect'}],
# ['list-itemize', [u'first line', u'second line']],
# ['table', u'FIXXME: followed by this table data'],
#  ['image', u'FIXXME: followed by this image']]

Example:
>>> metadata
{u'2013-08-22-testid': {'title': u"This is the title",
                        'latestupdateTS': datetime.datetime(2013, 8, 22, 21, 6),
                        'firstpublishTS': datetime.datetime(2013, 8, 22, 21, 6),
                        'checksum': 'b757f8478bffd6c70a474f213d6520de',
                        'created': datetime.datetime(2013, 8, 22, 21, 6)},
 u'2013-02-12-lazyblorg-example-entry': {'latestupdateTS': datetime.datetime(2013, 2, 14, 19, 2),
                                         'checksum': '24af2246a5121e829a0dbbd6e2425c15',
                                         'created': datetime.datetime(2013, 2, 12, 10, 58)}}
Keys of the dict: IDs of the entries:
>>> metadata.keys()
[u'2013-08-22-testid', u'2013-02-12-lazyblorg-example-entry']
One entry with key=ID holds a dict with the following entries:
- 'title': string containing the title of the blog entry
- 'latestupdateTS': datetime.datetime(2013, 8, 22, 21, 6)
  - most recent time-stamp from the LOGBOOK drawer which marked going to a final state
- 'checksum': 'b757f8478bffd6c70a474f213d6520de'
  - MD5 checksum of: [title, tags, finished_timestamp_history, content]
- 'created': datetime.datetime(2013, 8, 22, 21, 6)
  - datetime object of the CREATED property from the PROPERTIES drawer
  - [ ] FIXXME: why not the first CLOSED time-stamp?
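The checksum entry can be illustrated with a short sketch. It assumes that the four fields are put into a list, serialized with `repr()` and encoded as UTF-8 before hashing; how lazyblorg actually serializes the list is not stated here, so digests produced by this sketch will most likely not match the real ones. The helper name `entry_checksum` and the use of the 'usertags' key for "tags" are assumptions as well.

```python
import hashlib
from datetime import datetime


def entry_checksum(entry):
    """Return an MD5 hex digest over the fields listed above.

    ASSUMPTION: the list is serialized with repr() and encoded as UTF-8;
    lazyblorg itself may serialize the data differently.
    """
    payload = [entry['title'],
               entry['usertags'],  # assumed to be the "tags" field
               entry['finished-timestamp-history'],
               entry['content']]
    return hashlib.md5(repr(payload).encode('utf-8')).hexdigest()


entry = {
    'title': 'This is the title',
    'usertags': ['tag1', 'tag2'],
    'finished-timestamp-history': [datetime(2013, 8, 22, 21, 6)],
    'content': [['par', 'This is the Org-mode content']],
}

old_checksum = entry_checksum(entry)
entry['content'].append(['par', 'An added paragraph'])
new_checksum = entry_checksum(entry)
entry_changed = old_checksum != new_checksum
```

Comparing a stored checksum with a freshly computed one is a cheap way to detect that an entry changed since the last run, which is presumably why the checksum is kept in the metadata.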
Example:
CLOSED: [2014-01-31 Fri 14:02]
:LOGBOOK:
- State "DONE" from "DONE" [2014-02-01 Sat 18:42]
- State "DONE" from "" [2014-01-30 Thu 14:02]
:END:
:PROPERTIES:
:CREATED: [2014-01-28 Tue 14:02]
:ID: 2014-01-27-lb-tests
:END:
What happens with the various time-stamps?
- each LOGBOOK entry that sets the state to DONE:
  - is added to entry['finished-timestamp-history'] (which is a list)
  - overwrites entry['latestupdateTS'] if it is newer than the old one
    - entry['latestupdateTS'] is therefore the most recent LOGBOOK entry of setting to DONE
  - overwrites entry['firstpublishTS'] if it is older than the old one
- CREATED:
  - entry['created']
- CLOSED:
  - ignored
- ID-timestamp:
  - ignored
After parsing the entry from above:
- entry['created'] = [2014-01-28 Tue 14:02]
- entry['latestupdateTS'] = [2014-02-01 Sat 18:42]
  - note that entry['timestamp'] was renamed to entry['latestupdateTS'] on 2017-02-12
- entry['firstpublishTS'] = [2014-01-30 Thu 14:02]
  - Oldest entry of entry['finished-timestamp-history'] is the publication time-stamp!
- entry['finished-timestamp-history'] = [2014-02-01 Sat 18:42] and [2014-01-30 Thu 14:02]
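The time-stamp rules above can be sketched as follows. The helper `apply_done_timestamp` is a hypothetical name for this illustration and not a function of the parser; it only mirrors the described behaviour for DONE time-stamps from the LOGBOOK drawer and uses the values of the example entry.

```python
from datetime import datetime


def apply_done_timestamp(entry, done_ts):
    """Record one LOGBOOK 'DONE' time-stamp on entry (hypothetical helper)."""
    # every DONE time-stamp is kept in the history list
    entry.setdefault('finished-timestamp-history', []).append(done_ts)
    # the newest DONE time-stamp wins for latestupdateTS
    if 'latestupdateTS' not in entry or done_ts > entry['latestupdateTS']:
        entry['latestupdateTS'] = done_ts
    # the oldest DONE time-stamp wins for firstpublishTS
    if 'firstpublishTS' not in entry or done_ts < entry['firstpublishTS']:
        entry['firstpublishTS'] = done_ts


# CREATED comes from the PROPERTIES drawer; CLOSED and the ID date are ignored
entry = {'created': datetime(2014, 1, 28, 14, 2)}

# the two LOGBOOK lines of the example, in the order they appear in the drawer
for done_ts in (datetime(2014, 2, 1, 18, 42), datetime(2014, 1, 30, 14, 2)):
    apply_done_timestamp(entry, done_ts)
```

Because both comparisons are applied to every DONE time-stamp, the result does not depend on the order of the LOGBOOK lines.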
The dict format of entries_timeline_by_published is:
- dict with year (int) as key, value = list of 12 MONTH
  - MONTH: list of 28-31 DAY
    - DAY: list of 0 to many entry-IDs
for year in sorted(entries_timeline_by_published.keys()):
    for month in enumerate(entries_timeline_by_published[year], start=0):
        # month = tuple(index, list of days)
        for day in enumerate(month[1], start=0):
            # day = tuple(index, list of IDs)
            for blogentry in day[1]:
                print(str(year) + '-' + str(month[0]) + '-' + str(day[0]) + " has entry: " + str(blogentry))

See Utils.__add_entry_to_entries_timeline_by_published(…) for how it is populated.
See utils_test.py > test_entries_timeline_by_published_functions(…) for how it is tested.
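As a rough illustration of the year → month → day → IDs layout, the following sketch builds such a structure from scratch. The helper `add_to_timeline` is hypothetical and is not the Utils method mentioned above; it assumes that the month and day lists are indexed from 0 (January is index 0, the first day of a month is index 0), which matches the `start=0` enumeration in the loop but is otherwise a guess.

```python
import calendar
from datetime import datetime


def add_to_timeline(timeline, entry_id, published):
    """Add entry_id under the year/month/day of published (hypothetical helper).

    ASSUMPTION: month and day lists are indexed from 0.
    """
    if published.year not in timeline:
        # one list per month, holding one (initially empty) list per day
        timeline[published.year] = [
            [[] for _ in range(calendar.monthrange(published.year, month)[1])]
            for month in range(1, 13)
        ]
    days = timeline[published.year][published.month - 1]
    days[published.day - 1].append(entry_id)


timeline = {}
add_to_timeline(timeline, '2014-01-27-lb-tests', datetime(2014, 1, 30, 14, 2))
```

Pre-allocating all days of a year keeps lookups simple (plain indexing, no key checks) at the cost of many empty lists.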