from toolslm.md_hier import *
from IPython.display import Markdown
Markdown Hierarchy Parser
The md_hier module provides utilities for parsing markdown documents and converting them into structured hierarchical dictionaries. This is particularly useful for processing documentation, extracting sections, or navigating complex markdown files programmatically.
Overview
The module provides a main function and a supporting class:
- create_heading_dict: Creates a nested dictionary structure matching the markdown hierarchy
- HeadingDict: A dictionary-like object that also stores the markdown text content
The function handles code blocks properly by ignoring headings that appear within fenced code blocks.
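To make the code-block handling concrete, here is a minimal sketch of one way to detect headings while skipping fenced code blocks. It is not the library's implementation; `find_headings` and `FENCE` are hypothetical names used only for this illustration.

```python
import re

# Built programmatically so the sample below can contain a code fence
FENCE = "`" * 3

def find_headings(text):
    """Return (level, title) pairs, skipping lines inside fenced code blocks."""
    headings, in_fence = [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on each opening or closing fence
            continue
        if in_fence:
            continue  # a '#' comment inside a code block is not a heading
        m = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if m:
            headings.append((len(m.group(1)), m.group(2)))
    return headings

doc = "\n".join([
    "# Introduction",
    "## Installation",
    FENCE + "bash",
    "# Install the package",
    "pip install our-package",
    FENCE,
    "## Configuration",
])
print(find_headings(doc))
```

The `# Install the package` comment sits between the two fence lines, so it is skipped and only the three real headings are reported.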
create_heading_dict
create_heading_dict (text, rm_fenced=True)
Create a nested dictionary structure from markdown headings.
sample_md = """
# Introduction
Welcome to our documentation.
## Getting Started
Follow these steps to begin.
### Installation
Run the following command:
```bash
# Install the packackge
pip install our-package
```
### Configuration
Set up your config file.
## Advanced Usage
For advanced users only.
# Appendix
Additional resources."""
result = create_heading_dict(sample_md)
print("Available sections:")
for key in result.keys(): print(f"  {key}")
print(f"\nRoot document has {len(result.text)} characters of text")
Available sections:
Introduction
Appendix
Root document has 328 characters of text
You can access any section’s content via the text attribute:
print(result['Introduction']['Getting Started']['Installation'].text)
### Installation
Run the following command:
```bash
# Install the packackge
pip install our-package
```
Notice how parent sections contain all their child content in their text attribute:
print(result['Introduction']['Getting Started'].text[:200] + "…")
## Getting Started
Follow these steps to begin.
### Installation
Run the following command:
```bash
# Install the packackge
pip install our-package
```
### Configuration…
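The outputs above suggest that a section's text runs from its heading line up to the next heading of the same or a higher level. A minimal sketch of that rule follows; `section_text` is a hypothetical helper, not part of md_hier, and for brevity it does not handle fenced code blocks.

```python
import re

def section_text(lines, start):
    """Return the heading at lines[start] plus everything up to the
    next heading of equal or higher level (fewer or equal '#' marks)."""
    level = len(re.match(r"^(#+)", lines[start]).group(1))
    end = len(lines)
    for i in range(start + 1, len(lines)):
        m = re.match(r"^(#+)\s", lines[i])
        if m and len(m.group(1)) <= level:
            end = i  # a sibling or ancestor heading closes the section
            break
    return "\n".join(lines[start:end])

lines = [
    "# Introduction",
    "Welcome.",
    "## Getting Started",
    "Follow these steps.",
    "### Installation",
    "Run the installer.",
    "### Configuration",
    "Set up your config file.",
    "## Advanced Usage",
    "For advanced users only.",
]
print(section_text(lines, 2))  # Getting Started, including both subsections
print(section_text(lines, 4))  # Installation only
```

Deeper headings never end a section, which is why a parent's text includes all of its children.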
create_heading_dict creates a nested dictionary structure that mirrors the markdown hierarchy. Each heading becomes a dictionary key containing its subheadings.
- Nested structure: Creates a tree-like dictionary hierarchy
- Navigation friendly: Easy to traverse programmatically
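- Code block filtering: With rm_fenced=True (the default), fenced code blocks are ignored when detecting headings, so a # comment inside a code block does not become a section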
Let’s see the nested structure:
result = create_heading_dict(sample_md)
print("Structure:")
print(f"Root keys: {list(result.keys())}")
print(f"Introduction subkeys: {list(result['Introduction'].keys())}")
print(f"Getting Started subkeys: {list(result['Introduction']['Getting Started'].keys())}")
print(f"\nType of result: {type(result)}")
print(f"Type of subsection: {type(result['Introduction'])}")
print(f"Has text attribute: {hasattr(result, 'text')}")
Structure:
Root keys: ['Introduction', 'Appendix']
Introduction subkeys: ['Getting Started', 'Advanced Usage']
Getting Started subkeys: ['Installation', 'Configuration']
Type of result: <class 'toolslm.md_hier.HeadingDict'>
Type of subsection: <class 'toolslm.md_hier.HeadingDict'>
Has text attribute: True
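The type output shows that every level, including the root, is the same dictionary-like type with a text attribute. One simple way to get that behaviour is a dict subclass. The sketch below is an assumption for illustration only: `TextDict` is a hypothetical stand-in, and the real HeadingDict may be implemented differently.

```python
class TextDict(dict):
    """Hypothetical stand-in: a dict that also carries a `text` attribute."""
    def __init__(self, text="", sections=None):
        super().__init__(sections or {})
        self.text = text

# Build a small tree by hand, innermost section first
install = TextDict("### Installation\nRun the installer.")
started = TextDict("## Getting Started\nFollow these steps.\n" + install.text,
                   {"Installation": install})
intro = TextDict("# Introduction\nWelcome.\n" + started.text,
                 {"Getting Started": started})
root = TextDict(intro.text, {"Introduction": intro})

print(list(root.keys()))
print("Introduction" in root)
print(root["Introduction"]["Getting Started"]["Installation"].text)
```

Because the stand-in subclasses dict, key lookup, membership tests, and iteration all work unchanged, and the text attribute rides along on each node.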
Benefits
This approach provides the best of both worlds:
Structure Navigation: Navigate the document hierarchy naturally using dictionary keys
- result['Introduction']['Getting Started']
- Check section existence with 'section' in result
- Iterate through subsections with result.keys()
Content Access: Get the actual markdown text at any level
- result.text: entire document
- result['Introduction'].text: section with all subsections
- result['Introduction']['Getting Started']['Installation'].text: specific subsection only
Use Cases:
- Documentation processing: Extract specific sections while preserving formatting
- Content analysis: Analyze document structure and section lengths
- Template generation: Build navigation interfaces from document structure
- Section extraction: Pull out individual sections with their complete content
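As an illustration of the content-analysis and navigation use cases, the sketch below walks a nested structure and reports each section's depth and text length. `outline` and `Node` are hypothetical names; `Node` is a hand-built stand-in so the example runs without toolslm. The same walk should apply to any nested mapping whose values expose a text attribute, as the examples above indicate the result of create_heading_dict does.

```python
class Node(dict):
    """Hypothetical stand-in for a dict-like section with a `text` attribute."""
    def __init__(self, text="", sections=None):
        super().__init__(sections or {})
        self.text = text

def outline(tree, depth=0):
    """Yield (depth, title, text_length) for each section, depth-first."""
    for title, sub in tree.items():
        yield depth, title, len(sub.text)
        yield from outline(sub, depth + 1)

tree = Node("", {
    "Introduction": Node("# Introduction\nWelcome.", {
        "Getting Started": Node("## Getting Started\nFollow these steps."),
    }),
    "Appendix": Node("# Appendix\nAdditional resources."),
})

rows = list(outline(tree))
for depth, title, n in rows:
    # Indent by depth to produce a simple table of contents
    print("  " * depth + f"- {title} ({n} chars)")
```

The generator keeps traversal separate from presentation, so the same rows can feed a navigation menu or a section-length report.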