Markdown Hierarchy Parser

Parse markdown documents into hierarchical dictionaries for easy navigation and access
from toolslm.md_hier import *
from IPython.display import Markdown

The md_hier module provides utilities for parsing markdown documents and converting them into structured hierarchical dictionaries. This is particularly useful for processing documentation, extracting sections, or navigating complex markdown files programmatically.

Overview

The module provides a main function and supporting class:

  • create_heading_dict: Creates a nested dictionary structure matching the markdown hierarchy
  • HeadingDict: A dictionary-like object that also stores the markdown text content

The function ignores headings that appear inside fenced code blocks, so a shell comment beginning with "#" in a fenced bash sample is not mistaken for a heading.
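One way to get this behavior is to track whether the current line sits inside a fence while scanning the document. The sketch below is illustrative only and is not taken from md_hier: find_headings, HEADING_RE and FENCE are hypothetical names, and the fence marker is built by string multiplication purely so the sample can be displayed inside a fenced block.

```python
import re

FENCE = "`" * 3                                  # the three-backtick fence marker
HEADING_RE = re.compile(r'^(#{1,6})\s+(.+?)\s*$')  # ATX heading: 1-6 '#' then a title

def find_headings(text):
    """Return (level, title) pairs for headings outside fenced code blocks."""
    headings, in_fence = [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence              # each fence line opens or closes a block
            continue
        if in_fence:
            continue                             # skip everything inside the fence
        m = HEADING_RE.match(line)
        if m:
            headings.append((len(m.group(1)), m.group(2)))
    return headings

doc = "\n".join([
    "# Setup",
    "",
    FENCE + "bash",
    "# Install the package",
    "pip install our-package",
    FENCE,
    "",
    "## Next steps",
])
print(find_headings(doc))  # [(1, 'Setup'), (2, 'Next steps')]
```

The comment line inside the bash block is skipped, so only the two real headings are reported.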


create_heading_dict

 create_heading_dict (text, rm_fenced=True)

Create a nested dictionary structure from markdown headings.

sample_md = """
# Introduction

Welcome to our documentation.

## Getting Started

Follow these steps to begin.

### Installation

Run the following command:

```bash
# Install the packackge
pip install our-package
```

### Configuration

Set up your config file.

## Advanced Usage

For advanced users only.

# Appendix

Additional resources."""
result = create_heading_dict(sample_md)
print("Available sections:")
for key in result.keys(): print(f"  {key}")
print(f"\nRoot document has {len(result.text)} characters of text")
Available sections:
  Introduction
  Appendix

Root document has 328 characters of text

You can access any section’s content via the text attribute:

print(result['Introduction']['Getting Started']['Installation'].text)
### Installation

Run the following command:

```bash
# Install the packackge
pip install our-package
```

Notice how parent sections contain all their child content in their text attribute:

print(result['Introduction']['Getting Started'].text[:200] + "…")
## Getting Started

Follow these steps to begin.

### Installation

Run the following command:

```bash
# Install the packackge
pip install our-package
```

### Configuration…

create_heading_dict creates a nested dictionary structure that mirrors the markdown hierarchy. Each heading becomes a dictionary key containing its subheadings.

  • Nested structure: Creates a tree-like dictionary hierarchy
  • Navigation friendly: Easy to traverse programmatically
  • Code block filtering: Ignores fenced code blocks when detecting headings; the blocks themselves remain in each section's text

Let’s see the nested structure:

result = create_heading_dict(sample_md)
print("Structure:")
print(f"Root keys: {list(result.keys())}")
print(f"Introduction subkeys: {list(result['Introduction'].keys())}")
print(f"Getting Started subkeys: {list(result['Introduction']['Getting Started'].keys())}")

print(f"\nType of result: {type(result)}")
print(f"Type of subsection: {type(result['Introduction'])}")
print(f"Has text attribute: {hasattr(result, 'text')}")
Structure:
Root keys: ['Introduction', 'Appendix']
Introduction subkeys: ['Getting Started', 'Advanced Usage']
Getting Started subkeys: ['Installation', 'Configuration']

Type of result: <class 'toolslm.md_hier.HeadingDict'>
Type of subsection: <class 'toolslm.md_hier.HeadingDict'>
Has text attribute: True
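The nesting shown above can be reproduced with a stack of open sections. The following is a simplified sketch, not the md_hier implementation: Section is a hypothetical stand-in for HeadingDict, build_tree is a hypothetical name, and fenced code blocks are deliberately not handled so the logic stays short.

```python
import re

class Section(dict):
    """Hypothetical stand-in for HeadingDict: a dict that also carries text."""
    def __init__(self, text=''):
        super().__init__()
        self.text = text

def build_tree(text):
    """Build nested Section objects from ATX headings (fences not handled)."""
    lines = text.splitlines()
    heads = []                                   # (line index, level, title)
    for i, line in enumerate(lines):
        m = re.match(r'^(#{1,6})\s+(.+?)\s*$', line)
        if m:
            heads.append((i, len(m.group(1)), m.group(2)))
    root = Section(text)
    stack = [(0, root)]                          # (level, node); root acts as level 0
    for k, (start, level, title) in enumerate(heads):
        # A section runs until the next heading at the same or a shallower level
        end = next((i for i, lvl, _ in heads[k + 1:] if lvl <= level), len(lines))
        node = Section('\n'.join(lines[start:end]).strip())
        while stack[-1][0] >= level:             # close siblings and deeper sections
            stack.pop()
        stack[-1][1][title] = node               # attach to the nearest open parent
        stack.append((level, node))
    return root

doc = "# A\n\nintro\n\n## B\n\nbody\n\n## C\n\n# D\n"
tree = build_tree(doc)
print(list(tree.keys()))         # ['A', 'D']
print(list(tree['A'].keys()))    # ['B', 'C']
print(repr(tree['A']['B'].text)) # '## B\n\nbody'
```

Because each section's text spans every line up to the next heading of equal or shallower level, a parent's text naturally includes the text of its children.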

Benefits

This approach provides the best of both worlds:

Structure Navigation: Navigate the document hierarchy naturally using dictionary keys

  • result['Introduction']['Getting Started']
  • Check section existence with 'section' in result
  • Iterate through subsections with result.keys()
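When a path may not exist, the membership check can be wrapped in a small helper so that a missing heading yields None instead of an exception. This is a minimal sketch: get_section is a hypothetical helper, not part of md_hier, and the plain nested dict is a stand-in for the result of create_heading_dict so that the snippet runs on its own.

```python
def get_section(tree, *path):
    """Follow a sequence of heading titles; return None if any one is missing."""
    node = tree
    for title in path:
        if title not in node:
            return None
        node = node[title]
    return node

# Stand-in structure mirroring the sample document's headings
structure = {
    'Introduction': {
        'Getting Started': {'Installation': {}, 'Configuration': {}},
        'Advanced Usage': {},
    },
    'Appendix': {},
}

found = get_section(structure, 'Introduction', 'Getting Started')
print(list(found.keys()))                                   # ['Installation', 'Configuration']
print(get_section(structure, 'Introduction', 'Missing'))    # None
```

The helper relies only on membership tests and key lookup, so it should apply to any dictionary-like hierarchy.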

Content Access: Get the actual markdown text at any level

  • result.text - entire document
  • result['Introduction'].text - section with all subsections
  • result['Introduction']['Getting Started']['Installation'].text - specific subsection only

Use Cases:

  • Documentation processing: Extract specific sections while preserving formatting
  • Content analysis: Analyze document structure and section lengths
  • Template generation: Build navigation interfaces from document structure
  • Section extraction: Pull out individual sections with their complete content
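As an illustration of the navigation use case, a table of contents can be produced by walking the hierarchy recursively. The sketch below assumes nothing beyond keys() and key lookup; toc_lines is a hypothetical helper, and the plain nested dict stands in for a parsed document so that the example is self-contained.

```python
def toc_lines(tree, depth=0):
    """Return indented bullet lines, one per heading, in document order."""
    lines = []
    for title in tree.keys():
        lines.append(f"{'  ' * depth}- {title}")
        lines.extend(toc_lines(tree[title], depth + 1))  # recurse into subsections
    return lines

# Stand-in structure mirroring the sample document's headings
structure = {
    'Introduction': {
        'Getting Started': {'Installation': {}, 'Configuration': {}},
        'Advanced Usage': {},
    },
    'Appendix': {},
}

print('\n'.join(toc_lines(structure)))
# - Introduction
#   - Getting Started
#     - Installation
#     - Configuration
#   - Advanced Usage
# - Appendix
```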