from IPython.display import HTML,Markdown,display
from mistletoe import markdown
from fastcore.test import test_eq
My Heading
Highlight
def Highlight(
match
):
Base token class.
Token has two subclasses:
* block_token.BlockToken, for all block-level tokens. A block-level token is text which occupies the entire horizontal width of the “page” and is offset from the surrounding sibling blocks by line breaks.
* span_token.SpanToken, for all span-level (or inline-level) tokens. A span-level token appears inside the flow of the text lines without any surrounding line break.
Custom __repr__ methods in subclasses: The default __repr__ implementation outputs the number of child tokens (from the attribute children) if applicable, and the content attribute if applicable. To include additional attributes in the __repr__ output, set the class attribute repr_attributes to a tuple containing the attribute names to be output.
Superscript
def Superscript(
match
):
Subscript
def Subscript(
match
):
Emoji
def Emoji(
match
):
FootnoteRef
def FootnoteRef(
match
):
FootnoteEntry
def FootnoteEntry(
match
):
Base class for block-level tokens. Recursively parse inner tokens.
Naming conventions:
* lines denotes a list of (possibly unparsed) input lines, and is commonly used as the argument name for constructors.
* BlockToken.children is a list with all the inner tokens (thus if a token has a children attribute, it is not a leaf node; if a token calls span_token.tokenize_inner, it is the boundary between span-level tokens and block-level tokens).
* BlockToken.start takes a line from the document as argument, and returns a boolean representing whether that line marks the start of the current token. Every subclass of BlockToken must define a start function (see block_tokenizer.tokenize).
* BlockToken.read takes the rest of the lines in the document as an iterator (including the start line), and consumes all the lines that should be read into this token. By default it stops at an empty line.
Note that BlockToken.read does not have to return a list of lines. Because its return value is passed directly into the token constructor, it can return any relevant parsing information, sometimes even ready-made tokens. See block_tokenizer.tokenize.
If BlockToken.read returns None, the read result is ignored, but the token class is responsible for resetting the iterator to a previous state. See block_tokenizer.FileWrapper.get_pos, block_tokenizer.FileWrapper.set_pos.
Attributes:
* children (list): inner tokens.
* line_number (int): starting line (1-based).
AutoLink
def AutoLink(
match
):
Strikethrough
def Strikethrough(
match
):
AttrLink
def AttrLink(
match
):
FencedDiv
def FencedDiv(
result
):
opening_tag
def opening_tag(
line
):
Parse line as an opening HTML tag and return (tag, attrs). Self-closing tags and plain text return (None, {}).
TagExtractor
def TagExtractor(
):
Find tags and other markup and call handler functions.
Usage:
    p = HTMLParser()
    p.feed(data)
    …
    p.close()
Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). If convert_charrefs is True the character references are converted automatically to the corresponding Unicode character (and self.handle_data() is no longer split in chunks), otherwise they are passed by calling self.handle_entityref() or self.handle_charref() with the string containing respectively the named or numeric reference as the argument.
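The checks that follow suggest that opening_tag reports only genuine opening tags and ignores self-closing tags and plain text. A minimal sketch of how an HTMLParser subclass could produce that behaviour, using only the standard library. FirstTagParser and first_opening_tag are hypothetical names chosen for illustration; this is not necessarily how TagExtractor or opening_tag are implemented.

```python
from html.parser import HTMLParser

class FirstTagParser(HTMLParser):
    """Hypothetical helper: remembers the first opening tag it sees."""
    def __init__(self):
        super().__init__()
        self.found_tag, self.found_attrs = None, {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; keep only the first tag
        if self.found_tag is None:
            self.found_tag, self.found_attrs = tag, dict(attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are deliberately ignored
        pass

def first_opening_tag(line):
    "Return (tag, attrs) for the first opening tag in `line`, else (None, {})."
    p = FirstTagParser()
    p.feed(line)
    p.close()
    return p.found_tag, p.found_attrs

print(first_opening_tag('<div class="x">'))
```

Overriding handle_startendtag matters here: the default implementation forwards self-closing tags to handle_starttag, which would make `<br/>` look like an opening tag.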
test_eq(opening_tag('<div>'), ('div', {}))
test_eq(opening_tag('<div class="x">'), ('div', {'class':'x'}))
test_eq(opening_tag('<br/>'), (None, {}))
test_eq(opening_tag('<img src="a.png"/>'), (None, {}))
test_eq(opening_tag('plain text'), (None, {}))
test_eq(opening_tag('<svg xmlns="http://www.w3.org/2000/svg">'), ('svg', {'xmlns':'http://www.w3.org/2000/svg'}))
test_eq(opening_tag('<div markdown="1">'), ('div', {'markdown':'1'}))
LenientHtmlBlock
def LenientHtmlBlock(
result
):
Block-level HTML token. This is a leaf block token with a single child of type span_token.RawText, which holds the raw HTML content.
ExtendedHtmlRenderer
def ExtendedHtmlRenderer(
*args, **kw
):
HTML renderer class.
See mistletoe.base_renderer module for more info.
def render_md(md): return HTML(markdown(md, ExtendedHtmlRenderer))
render_md("Here's a sentence with a footnote[^1] and another[^2].\n\n[^1]: First note.\n[^2]: Second note.")
render_md("~~strikethrough~~ and **bold ~~strikethrough~~**")
strikethrough and bold strikethrough
render_md("Check out https://fast.ai and http://example.com for more info!")
Check out https://fast.ai and http://example.com for more info!
render_md("Here's a sentence with a link to <http://www.example.org>.")
Here's a sentence with a link to http://www.example.org.
render_md("""Here's some code:
```
https://fast.ai
~~strikethrough~~
:smile:
H~2~O
```
And outside the block: https://fast.ai ~~strikethrough~~ :smile: H~2~O""")
Here's some code:
https://fast.ai
~~strikethrough~~
:smile:
H~2~O
And outside the block: https://fast.ai strikethrough 😊 H2O
render_md("H~2~O and E=mc^2^")
H2O and E=mc2
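To make the subscript and superscript rules concrete, here is a rough standalone sketch using plain regular expressions. The patterns are assumptions inferred from the examples on this page (content without whitespace, delimited by single `~` or `^`), and sub_sup is a hypothetical helper rather than the token classes documented above.

```python
import re

# Assumed patterns: content may not contain whitespace or the delimiter itself.
# The lookarounds keep `~~strikethrough~~` from being read as a subscript.
SUB_RE = re.compile(r'(?<!~)~([^\s~]+)~(?!~)')
SUP_RE = re.compile(r'\^([^\s^]+)\^')

def sub_sup(text):
    "Wrap ~x~ in <sub> and ^x^ in <sup>, leaving everything else untouched."
    text = SUB_RE.sub(r'<sub>\1</sub>', text)
    return SUP_RE.sub(r'<sup>\1</sup>', text)

print(sub_sup('H~2~O and E=mc^2^'))
```

In a token-based parser, precedence between tokens typically does the job that the lookarounds do here.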
test_md2 = """
- [x] Completed task
- [ ] Incomplete task
- Regular item"""
render_md(test_md2)
- Completed task
- Incomplete task
- Regular item
render_md("==highlighted== and :smile: :rocket: :heart:")
highlighted and 😊 🚀 ❤️
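The emoji behaviour shown here (known shortcodes replaced, unknown ones left alone) can be illustrated with a small lookup table. This is only a sketch: the EMOJI table and replace_shortcodes are hypothetical, and the real renderer presumably relies on a much larger mapping.

```python
import re

# Tiny illustrative table; a real renderer would use a far larger mapping
EMOJI = {'smile': '😊', 'rocket': '🚀', 'heart': '❤️'}
SHORTCODE_RE = re.compile(r':([a-z0-9_+-]+):')

def replace_shortcodes(text):
    "Replace :name: with its emoji; unknown names are left exactly as written."
    return SHORTCODE_RE.sub(lambda m: EMOJI.get(m.group(1), m.group(0)), text)

print(replace_shortcodes(':smile: :rocket: :unknown:'))
```

Because the pattern excludes whitespace, text such as `:sm ile:` never matches and passes through unchanged.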
parse_attrs
def parse_attrs(
text
):
Convert an attribute block such as {#id .class key="value"} into an HTML attribute string.
parse_attrs('{#my-id .class1 .class2 width="50%" height="200 px"}')
' id="my-id" class="class1 class2" width="50%" height="200 px"'
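One way to get output of this shape is a single regular expression with one alternative per attribute kind. The sketch below is an assumption about how such a function could work, not the actual parse_attrs source; ATTR_RE and attrs_to_html are hypothetical names.

```python
import re

# One alternative per attribute kind: #id, .class, or key="value"
ATTR_RE = re.compile(r'#(?P<id>[\w-]+)|\.(?P<cls>[\w-]+)|(?P<key>[\w-]+)="(?P<val>[^"]*)"')

def attrs_to_html(text):
    # Hypothetical helper, not the library's parse_attrs
    ident, classes, pairs = None, [], []
    for m in ATTR_RE.finditer(text.strip().strip('{}')):
        if m['id']: ident = m['id']
        elif m['cls']: classes.append(m['cls'])
        else: pairs.append((m['key'], m['val']))
    # Emit id first, then merged classes, then key="value" pairs in source order
    out = ''
    if ident: out += f' id="{ident}"'
    if classes:
        joined = ' '.join(classes)
        out += f' class="{joined}"'
    for k, v in pairs: out += f' {k}="{v}"'
    return out

print(attrs_to_html('{#my-id .class1 .class2 width="50%" height="200 px"}'))
```

Collecting classes into a list before emitting them is what lets `.first #myid .second` come out as a single `class="first second"` attribute regardless of where the id appears.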
ExtendedHtmlRenderer.render_heading
def render_heading(
token
):
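Render a heading, moving a trailing {...} attribute block (id, classes, key="value" pairs) onto the heading tag. An empty {} is left in the heading text.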
<style>
.important { background-color: yellow; font-weight: bold; }
</style>
render_md("# My Heading {#intro .important}")
render_md('This [link](http://www.example.org){target="_blank"} opens in a new tab.')
This link opens in a new tab.
<style>
[class*="callout-"] { border-left: 4px solid var(--clr); background: var(--bg);
padding: 0.8em 1em; border-radius: 4px; margin: 0.5em 0; }
.callout-note { --clr: #4a9eff; --bg: #f0f7ff; }
.callout-warning { --clr: #f0ad4e; --bg: #fff8f0; }
.callout-tip { --clr: #5cb85c; --bg: #f0fff0; }
.callout-important { --clr: #d9534f; --bg: #fff0f0; }
.columns { display: flex; gap: 1em; }
.column { flex: 1; }
</style>
render_md("""::: {.callout-warning .prose}
This is a **note** with *formatting*.
- Item 1
- Item 2
:::""")
This is a note with formatting.
Item 1
Item 2
render_md(""":::: {.columns}
::: {.column}
**Left** column
:::
::: {.column}
**Right** column
:::
::::""")
Left column
Right column
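The nesting in the columns example can be sketched with a simple line scanner. This assumes, as in Pandoc-style fenced divs, that an opening fence is a run of three or more colons followed by attributes, and that a colon-only line closes the innermost open div. divs_to_html is a hypothetical helper that handles only `.class` attributes and does not render the Markdown inside the divs, so it is far simpler than FencedDiv itself.

```python
import re

OPEN_RE = re.compile(r'^:{3,}\s*\{(.*)\}\s*$')   # e.g. "::: {.column}"
CLOSE_RE = re.compile(r'^:{3,}\s*$')             # e.g. ":::"

def divs_to_html(md):
    "Turn fenced-div lines into <div> tags, leaving all other lines as-is."
    out, depth = [], 0
    for line in md.splitlines():
        if (m := OPEN_RE.match(line)):
            # Only .class attributes are handled in this sketch
            classes = ' '.join(re.findall(r'\.([\w-]+)', m.group(1)))
            out.append(f'<div class="{classes}">')
            depth += 1
        elif CLOSE_RE.match(line) and depth:
            # A colon-only line closes the innermost open div
            out.append('</div>')
            depth -= 1
        else:
            out.append(line)
    return '\n'.join(out)

print(divs_to_html(':::: {.columns}\n::: {.column}\nLeft\n:::\n::: {.column}\nRight\n:::\n::::'))
```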
Tests
def rend(c): return ExtendedHtmlRenderer().render(Document(c))
def test_render(a,b): return test_eq(rend(a), b)
def test_render_p(a,b): return test_render(a, f'<p>{b}</p>\n')
test_render_p(':sm ile:', ':sm ile:')
test_render_p('.', '.')
test_render_p(':unknown:', ':unknown:')
test_render_p(':smile:', '😊')
test_render_p('H~ 2~O', 'H~ 2~O')
test_render_p('H~2~O', 'H<sub>2</sub>O')
test_render_p('E=mc^ 2^', 'E=mc^ 2^')
test_render_p('E=mc^2^', 'E=mc<sup>2</sup>')
test_render_p('~~no space~~', '<del>no space</del>')
test_render_p('**~~nested~~**', '<strong><del>nested</del></strong>')
test_render_p('==~~double~~==', '<mark><del>double</del></mark>')
html = rend('```\nhttps://fast.ai\n```')
assert 'href' not in html and 'https://fast.ai' in html
html = rend('https://fast.ai')
assert '<a href="https://fast.ai">https://fast.ai</a>' in html
html = rend('[link](https://fast.ai)')
assert html.count('href') == 1
test_render_p('[^1]', '<sup><a href="#fn-1" id="fnref-1">[1]</a></sup>')
html = rend('- [x] done\n- [ ] todo\n- regular')
assert 'checked' in html and html.count('checkbox') == 2
test_render_p('[link](http://example.com){target="_blank"}', '<a href="http://example.com" target="_blank">link</a>')
test_render_p('[link](http://example.com){#my-id .cls}', '<a href="http://example.com" id="my-id" class="cls">link</a>')
test_render('# Heading {#intro}', '<h1 id="intro">Heading</h1>\n')
test_render('## Test {.important}', '<h2 class="important">Test</h2>\n')
test_render('### Multi {.c1 .c2}', '<h3 class="c1 c2">Multi</h3>\n')
html = rend('# Full {#id1 .cls data-level="1"}')
assert 'id="id1"' in html and 'class="cls"' in html and 'data-level="1"' in html
test_render('# Plain', '<h1>Plain</h1>\n')
test_render('## Empty {}', '<h2>Empty {}</h2>\n')
html = rend('# Order {.first #myid .second key="val"}')
assert 'id="myid"' in html and 'class="first second"' in html and 'key="val"' in html
test_render('# Spaces {data-value="hello world"}', '<h1 data-value="hello world">Spaces</h1>\n')
test_render('<details markdown="1">\n<summary>test</summary>\n\n```json\n{"a": 1}\n```\n</details>',
'<details markdown="1">\n<summary>test</summary><pre><code class="language-json">{"a": 1}\n</code></pre></details>\n\n')
html = rend('::: {.callout-note}\nHello **world**\n:::')
assert '<div class="callout-note">' in html and '<strong>world</strong>' in html
html = rend('::: {#box .a .b}\nInner\n:::')
assert 'id="box"' in html and 'class="a b"' in html and 'Inner' in html
html = rend(':::: {.columns}\n::: {.column}\nLeft\n:::\n::: {.column}\nRight\n:::\n::::')
assert html.count('class="column"') == 2 and 'class="columns"' in html
html = rend('<div>\n\n indented after blank\n</div>')
assert '<pre>' not in html and 'indented after blank' in html
html = rend('<div>\n\n<p>child</p>\n\n <p>indented child</p>\n</div>')
assert '<pre>' not in html and 'indented child' in html
html = rend('<svg width="10" height="10">\n <circle cx="5" cy="5" r="4"/>\n</svg>')
assert '<pre>' not in html and '<circle' in html
html = rend('Before\n\n<div>\n\n indented\n</div>\n\nAfter')
assert '<pre>' not in html and 'Before' in html and 'After' in html
html = rend('<details markdown="1">\n<summary>test</summary>\n\n```json\n{"a": 1}\n```\n</details>')
assert '<pre><code class="language-json">' in html and '<details markdown="1">' in html and '</details>' in html
html = rend('<div markdown="1">\n\n**bold** and *italic*\n</div>')
assert '<strong>bold</strong>' in html and '<em>italic</em>' in html
html = rend('<div>\n\n**not bold**\n</div>')
assert '<strong>' not in html and '**not bold**' in html
test_eq(rend("`<details>`"), '<p><code><details></code></p>\n')