xml source

from copy import deepcopy
JSON to XML
json_to_xml
json_to_xml (d:dict, rnm:str)
Convert d to XML.
| | Type | Details |
|---|---|---|
| d | dict | JSON dictionary to convert |
| rnm | str | Root name |
| Returns | str | |
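One way to picture this conversion is a recursive walk over the dictionary: each key becomes a child element, each list element is wrapped in an `<item>` tag (as in the example that follows), and scalar values become element text. The sketch below is a hypothetical standard-library version built on `xml.etree.ElementTree`. It is not toolslm's implementation, `json_to_xml_sketch` is an illustrative name, and `ET.indent` assumes Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

def json_to_xml_sketch(d: dict, rnm: str) -> str:
    "Hypothetical: dict keys become child tags, list items become <item> tags, scalars become text."
    def build(parent, val):
        if isinstance(val, dict):
            for k, v in val.items(): build(ET.SubElement(parent, k), v)
        elif isinstance(val, list):
            for v in val: build(ET.SubElement(parent, 'item'), v)
        else:
            parent.text = str(val)
    root = ET.Element(rnm)
    build(root, d)
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding='unicode')  # str, no XML declaration

a = dict(surname='Howard', firstnames=['Jeremy', 'Peter'],
         address=dict(state='Queensland', country='Australia'))
out = json_to_xml_sketch(a, 'person')
print(out)
```

Because lists have no natural tag name of their own, any such mapping has to invent one; `<item>` is simply the convention shown in the output of `json_to_xml`.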
JSON doesn’t map as nicely to XML as the data structure used in fastcore.xml does, but for simple XML trees it can be convenient. For example:
a = dict(surname='Howard', firstnames=['Jeremy','Peter'],
         address=dict(state='Queensland',country='Australia'))
hl_md(json_to_xml(a, 'person'))

<person>
<surname>Howard</surname>
<firstnames>
<item>Jeremy</item>
<item>Peter</item>
</firstnames>
<address>
<state>Queensland</state>
<country>Australia</country>
</address>
</person>

Including documents
Notebooks
nbp = Path('00_xml.ipynb')
nb = dict2obj(nbp.read_json())
cells = nb.cells
cell = cells[-1]
cell

{ 'cell_type': 'code',
'execution_count': {},
'id': '1e9ee5c1',
'metadata': {'time_run': '2025-12-24T11:44:29.900555+00:00'},
'outputs': [],
'source': ['#|hide\n', '#|eval: false\n', 'from nbdev.doclinks import nbdev_export\n', 'nbdev_export()']}

get_mime_text
get_mime_text (data)
Get text from MIME bundle, preferring markdown over plain
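A MIME bundle here is the `data` dict of a notebook output, keyed by MIME type. A minimal sketch of the preference order might look like the following, assuming the bundle stores text either as a string or as a list of lines (the nbformat spec allows both). `get_mime_text_sketch` is hypothetical, and returning an empty string when neither type is present is an assumption, not documented behavior.

```python
def get_mime_text_sketch(data: dict) -> str:
    "Hypothetical: prefer text/markdown, fall back to text/plain, else return ''."
    for mime in ('text/markdown', 'text/plain'):
        if mime in data:
            txt = data[mime]
            return ''.join(txt) if isinstance(txt, list) else txt  # join multiline lists
    return ''

bundle = {'text/plain': ['<IPython.core.display.Markdown object>'],
          'text/markdown': ['**bold**\n', 'text']}
print(get_mime_text_sketch(bundle))  # prints the markdown, not the plain-text repr
```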
cell2out
cell2out (o)
Convert single notebook output to XML format
for o in cell.outputs: print(to_xml(cell2out(o)))

cell2xml
cell2xml (cell, out=True)
Convert notebook cell to concise XML format
cell2xml(cell)

<code><source>#|hide
#|eval: false
from nbdev.doclinks import nbdev_export
nbdev_export()</code>

cell2xml(cell, out=False)

<code>#|hide
#|eval: false
from nbdev.doclinks import nbdev_export
nbdev_export()</code>

nb2xml
nb2xml (fname=None, nb=None, out=True)
Convert notebook to XML format
nbsml = deepcopy(nb)
del nbsml.cells[2:]  # keep only the first two cells
print(nb2xml(nb=nbsml))

<notebook><code><source>#|default_exp xml</code><md><source># xml source</md></notebook>
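To make the target shape concrete, here is a hypothetical sketch that builds the same `<notebook>`/`<code>`/`<md>` nesting from raw notebook JSON, roughly the shape that `cell2xml(cell, out=False)` produces for a single cell above. It is not nb2xml's implementation: `cell_to_xml_sketch` and `nb_to_xml_sketch` are illustrative names, every non-code cell becomes `<md>`, outputs are ignored, no `<source>` tag is emitted, and nothing is escaped.

```python
import json

def cell_to_xml_sketch(cell: dict) -> str:
    "Hypothetical: wrap a cell's joined source in <code> or <md>; outputs are ignored."
    src = cell['source']
    if isinstance(src, list): src = ''.join(src)  # nbformat allows a list of lines
    tag = 'code' if cell['cell_type'] == 'code' else 'md'  # raw cells also become <md> here
    return f'<{tag}>{src}</{tag}>'

def nb_to_xml_sketch(nb: dict, max_cells=None) -> str:
    "Hypothetical: concatenate the (optionally truncated) cells inside a <notebook> element."
    return '<notebook>' + ''.join(cell_to_xml_sketch(c) for c in nb['cells'][:max_cells]) + '</notebook>'

nb_json = json.loads('''{"cells": [
  {"cell_type": "code", "source": ["#|default_exp xml"], "outputs": []},
  {"cell_type": "markdown", "source": ["# xml source"]},
  {"cell_type": "code", "source": ["from copy import deepcopy"], "outputs": []}
]}''')
out = nb_to_xml_sketch(nb_json, max_cells=2)
print(out)  # <notebook><code>#|default_exp xml</code><md># xml source</md></notebook>
```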
Documents
According to Anthropic, “it’s essential to structure your prompts in a way that clearly separates the input data from the instructions”. They recommend using something like the following:
Here are some documents for you to reference for your task:
<documents>
<document index="1">
<source>
(URL, file name, hash, etc)
</source>
<document_content>
(the text content)
</document_content>
</document>
</documents>

We will create some small helper functions to make it easier to generate context in this format. We use <src> instead of <source> to avoid a conflict with the HTML <source> tag. Although the format is based on Anthropic’s recommendation, it’s likely to work well with other models too.
We’ll use a doctype named tuple to store each (src, content) pair.
Since Anthropic’s example shows newlines before and after each tag, we’ll do the same.
to_xml(Src('a'))

'<src>a</src>'

to_xml(Document('a'))

'<document>a</document>'
mk_doctype
mk_doctype (content:str, src:Optional[str]=None)
Create a doctype named tuple
| | Type | Default | Details |
|---|---|---|---|
| content | str | | The document content |
| src | Optional | None | URL, filename, etc; defaults to md5(content) if not provided |
| Returns | namedtuple | | |
This is a convenience wrapper to ensure that a doctype has the needed information in the right format.
doc = 'This is a "sample"'
mk_doctype(doc)

doctype(src='\n47e19350\n', content='\nThis is a "sample"\n')
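A minimal sketch of the same idea uses `collections.namedtuple` and `hashlib.md5`. `mk_doctype_sketch` is a hypothetical name, and truncating the digest to 8 hex characters is an assumption based only on the length of `47e19350` above; the sketch is not claimed to reproduce that exact value.

```python
import hashlib
from collections import namedtuple
from typing import Optional

doctype = namedtuple('doctype', ['src', 'content'])

def mk_doctype_sketch(content: str, src: Optional[str] = None) -> doctype:
    "Hypothetical: default src to a short md5 digest of content; pad both fields with newlines."
    if src is None:
        src = hashlib.md5(content.encode('utf-8')).hexdigest()[:8]  # 8 hex chars: an assumption
    return doctype(src=f'\n{src}\n', content=f'\n{content}\n')

d = mk_doctype_sketch('This is a "sample"')
named = mk_doctype_sketch('And another one', 'doc.txt')
print(d)
```

Padding with newlines here, rather than when rendering, means every later helper can drop the fields straight into tags and get the "newline before and after" layout for free.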
mk_doc
mk_doc (index:int, content:str, src:Optional[str]=None, **kwargs)
Create an ft format tuple for a single doc in Anthropic’s recommended format
| | Type | Default | Details |
|---|---|---|---|
| index | int | | The document index |
| content | str | | The document content |
| src | Optional | None | URL, filename, etc; defaults to md5(content) if not provided |
| kwargs | VAR_KEYWORD | | |
| Returns | tuple | | |
We can now generate XML for one document in the suggested format:
mk_doc(1, doc, title="test")

<document index="1" title="test"><src>
47e19350
</src><document-content>
This is a "sample"
</document-content></document>

docs_xml
docs_xml (docs:list[str], srcs:Optional[list]=None, prefix:bool=True, details:Optional[list]=None, title:str=None)
Create an XML string containing docs in Anthropic’s recommended format
| | Type | Default | Details |
|---|---|---|---|
| docs | list | | The content of each document |
| srcs | Optional | None | URLs, filenames, etc; each one defaults to md5(content) if not provided |
| prefix | bool | True | Include Anthropic’s suggested prose intro? |
| details | Optional | None | Optional list of dicts with additional attrs for each doc |
| title | str | None | Optional title attr for Documents element |
| Returns | str | | |
Putting it all together, we have our final XML format:
docs = [doc, 'And another one']
srcs = [None, 'doc.txt']
print(docs_xml(docs, srcs))

Here are some documents for you to reference for your task:
<documents><document index="1"><src>
47e19350
</src><document-content>
This is a "sample"
</document-content></document><document index="2"><src>
doc.txt
</src><document-content>
And another one
</document-content></document></documents>
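The overall layout can be reproduced with plain string formatting. The following is a hypothetical sketch (`doc_sketch`, `docs_xml_sketch` and `PREFIX` are illustrative names, not toolslm's API) that mirrors the output above: 1-based `index`, extra keyword arguments as attributes, newline-padded `<src>` and `<document-content>`, and the optional prose prefix. As a choice of this sketch, attribute values are escaped with `html.escape` while content is left as-is (matching the unescaped `"sample"` above); the `details` and `title` options are omitted.

```python
import hashlib, html
from typing import Optional

PREFIX = 'Here are some documents for you to reference for your task:\n'

def doc_sketch(index: int, content: str, src: Optional[str] = None, **attrs) -> str:
    "Hypothetical: one <document> with newline-padded src and content, extra kwargs as attributes."
    if src is None: src = hashlib.md5(content.encode('utf-8')).hexdigest()[:8]  # assumed default
    extra = ''.join(f' {k}="{html.escape(str(v), quote=True)}"' for k, v in attrs.items())
    return (f'<document index="{index}"{extra}><src>\n{src}\n</src>'
            f'<document-content>\n{content}\n</document-content></document>')

def docs_xml_sketch(docs: list, srcs: Optional[list] = None, prefix: bool = True) -> str:
    "Hypothetical: number docs from 1 and wrap them in <documents>, optionally after the prose intro."
    srcs = srcs or [None] * len(docs)
    body = ''.join(doc_sketch(i, d, s) for i, (d, s) in enumerate(zip(docs, srcs), start=1))
    return (PREFIX if prefix else '') + f'<documents>{body}</documents>'

out = docs_xml_sketch(['This is a "sample"', 'And another one'], [None, 'doc.txt'])
print(out)
```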
Context creation
Now that we can generate Anthropic’s XML format, let’s make it easy for a few common cases.
File list to context
For generating XML context from files, we’ll just read them as text and use the file names as src.
read_file
read_file (fname, out=True, max_size=None)
Read file content, converting notebooks to XML if needed
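The size check behind the `[Skipped: ...]` marker in the `files2ctx` example below might look like this. It is a hypothetical sketch: `read_file_sketch` handles plain text only (no notebook conversion and no `out` flag), and the marker wording is copied from that example's output.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

def read_file_sketch(fname: Union[str, Path], max_size: Optional[int] = None) -> str:
    "Hypothetical: return a skip marker instead of the text when the file exceeds max_size bytes."
    p = Path(fname)
    if max_size is not None and p.stat().st_size > max_size:
        return f'[Skipped: {p.name} exceeds {max_size} bytes]'
    return p.read_text()

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / 'big.txt'
    f.write_text('x' * 200)                      # 200 bytes of ASCII
    skipped = read_file_sketch(f, max_size=120)  # '[Skipped: big.txt exceeds 120 bytes]'
    head = read_file_sketch(f)[:5]               # 'xxxxx' (no limit applied)
print(skipped)
```

Checking `stat().st_size` before reading means oversized files are never loaded into memory, and the marker keeps the document in the context so the model still knows the file exists.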
files2ctx
files2ctx (fnames:list[typing.Union[str,pathlib.Path]], prefix:bool=True, out:bool=True, srcs:Optional[list]=None, title:str=None, max_size:int=None)
Convert files to XML context, handling notebooks
| | Type | Default | Details |
|---|---|---|---|
| fnames | list | | List of file names to add to context |
| prefix | bool | True | Include Anthropic’s suggested prose intro? |
| out | bool | True | Include notebook cell outputs? |
| srcs | Optional | None | Labels to use as src instead of fnames |
| title | str | None | Optional title attr for Documents element |
| max_size | int | None | Skip files larger than this (bytes) |
| Returns | str | | XML for LM context |
fnames = ['samples/sample_core.py', 'samples/sample_styles.css']
hl_md(files2ctx(fnames, max_size=120))

Here are some documents for you to reference for your task:
<documents><document index="1"><src>
samples/sample_core.py
</src><document-content>
[Skipped: sample_core.py exceeds 120 bytes]
</document-content></document><document index="2"><src>
samples/sample_styles.css
</src><document-content>
.cell { margin-bottom: 1rem; }
.cell > .sourceCode { margin-bottom: 0; }
.cell-output > pre { margin-bottom: 0; }
</document-content></document></documents>

Folder to context
folder2ctx
folder2ctx (folder:Union[str,pathlib.Path], prefix:bool=True, out:bool=True, include_base:bool=True, title:str=None, max_size:int=100000, recursive:bool=True, symlinks:bool=True, file_glob:str=None, file_re:str=None, folder_re:str=None, skip_file_glob:str=None, skip_file_re:str=None, skip_folder_re:str=None, func:callable=<function join>, ret_folders:bool=False, sort:bool=True, exts:list|str=None)
Convert folder contents to XML context, handling notebooks
| | Type | Default | Details |
|---|---|---|---|
| folder | Union | | Folder name containing files to add to context |
| prefix | bool | True | Include Anthropic’s suggested prose intro? |
| out | bool | True | Include notebook cell outputs? |
| include_base | bool | True | Include full path in src? |
| title | str | None | Optional title attr for Documents element |
| max_size | int | 100000 | Skip files larger than this (bytes) |
| recursive | bool | True | Search subfolders? |
| symlinks | bool | True | Follow symlinks? |
| file_glob | str | None | Only include files matching glob |
| file_re | str | None | Only include files matching regex |
| folder_re | str | None | Only enter folders matching regex |
| skip_file_glob | str | None | Skip files matching glob |
| skip_file_re | str | None | Skip files matching regex |
| skip_folder_re | str | None | Skip folders matching regex |
| func | callable | join | Function to apply to each matched file |
| ret_folders | bool | False | Return folders, not just files |
| sort | bool | True | Sort files by name within each folder |
| exts | list \| str | None | Only include files with these extensions |
| Returns | str | | XML for LM context |
print(folder2ctx('samples', prefix=False, file_glob='*.py'))

<documents><document index="1"><src>
samples/sample_core.py
</src><document-content>
import inspect
empty = inspect.Parameter.empty
models = 'claude-3-opus-20240229','claude-3-sonnet-20240229','claude-3-haiku-20240307'
</document-content></document></documents>
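The filtering parameters are easiest to understand on a small tree. The following hypothetical sketch (`match_files_sketch` is not part of toolslm) applies `file_glob`, `exts` and `skip_file_re` using `fnmatch` and `re`. Matching the patterns against the bare file name is an assumption, and the sketch simply sorts the whole result list, whereas the real function takes many more options and sorts within each folder. Treat it only as an illustration of how the filters combine.

```python
import fnmatch, re, tempfile
from pathlib import Path
from typing import Optional

def match_files_sketch(folder, file_glob: Optional[str] = None, skip_file_re: Optional[str] = None,
                       exts=None, recursive: bool = True) -> list:
    "Hypothetical: keep files by glob and extension, drop names matching skip_file_re."
    if isinstance(exts, str): exts = [exts]
    paths = Path(folder).rglob('*') if recursive else Path(folder).glob('*')
    res = []
    for p in paths:
        if not p.is_file(): continue                                      # skip directories
        if file_glob and not fnmatch.fnmatch(p.name, file_glob): continue
        if exts and p.suffix.lstrip('.') not in exts: continue            # 'py', not '.py'
        if skip_file_re and re.search(skip_file_re, p.name): continue
        res.append(p)
    return res

with tempfile.TemporaryDirectory() as tmp:
    for name in ['core.py', '_private.py', 'styles.css', 'sub/util.py', 'README.md']:
        f = Path(tmp) / name
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text('')
    found = sorted(p.relative_to(tmp).as_posix()
                   for p in match_files_sketch(tmp, exts=('md', 'py'), skip_file_re='^_'))
    found_glob = sorted(p.name for p in match_files_sketch(tmp, file_glob='*.py', recursive=False))
print(found)  # ['README.md', 'core.py', 'sub/util.py']
```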
repo2ctx
repo2ctx (owner:str, repo:str, ref:str=None, prefix:bool=True, out:bool=True, include_base:bool=True, title:str=None, max_size:int=100000, recursive:bool=True, symlinks:bool=True, file_glob:str=None, file_re:str=None, folder_re:str=None, skip_file_glob:str=None, skip_file_re:str=None, skip_folder_re:str=None, func:callable=<function join>, ret_folders:bool=False, sort:bool=True, exts:list|str=None)
Convert GitHub repo to XML context without cloning
| | Type | Default | Details |
|---|---|---|---|
| owner | str | | GitHub repo owner |
| repo | str | | GitHub repo name |
| ref | str | None | Git ref (branch/tag/sha); defaults to repo’s default branch |
| prefix | bool | True | Include Anthropic’s suggested prose intro? |
| out | bool | True | Include notebook cell outputs? |
| include_base | bool | True | Include full path in src? |
| title | str | None | Optional title attr for Documents element |
| max_size | int | 100000 | Skip files larger than this (bytes) |
| recursive | bool | True | Search subfolders? |
| symlinks | bool | True | Follow symlinks? |
| file_glob | str | None | Only include files matching glob |
| file_re | str | None | Only include files matching regex |
| folder_re | str | None | Only enter folders matching regex |
| skip_file_glob | str | None | Skip files matching glob |
| skip_file_re | str | None | Skip files matching regex |
| skip_folder_re | str | None | Skip folders matching regex |
| func | callable | join | Function to apply to each matched file |
| ret_folders | bool | False | Return folders, not just files |
| sort | bool | True | Sort files by name within each folder |
| exts | list \| str | None | Only include files with these extensions |
| Returns | str | | XML for LM context |
print(repo2ctx('answerdotai', 'toolslm', exts=('md','py'), skip_file_re='^_', prefix=False, out=False)[:330])

<documents title="GitHub repository contents from answerdotai/toolslm at ref 'main' (filters applied: exts: md, py | skip_file_re: ^_)"><document index="1"><src>
CHANGELOG.md
</src><document-content>
# Release notes
<!-- do not remove -->
## 0.3.8
### New Features
- Add `repo2ctx` ([#52](https://github.com/AnswerDotAI/toolsl
After you install toolslm, folder2ctx becomes available from the command line.
!folder2ctx -h

usage: folder2ctx [-h] [--recursive] [--symlinks] [--file_glob FILE_GLOB]
                  [--file_re FILE_RE] [--folder_re FOLDER_RE]
                  [--skip_file_glob SKIP_FILE_GLOB]
                  [--skip_file_re SKIP_FILE_RE]
                  [--skip_folder_re SKIP_FOLDER_RE] [--func FUNC]
                  [--ret_folders] [--sort] [--exts EXTS] [--prefix] [--out]
                  [--include_base] [--title TITLE] [--max_size MAX_SIZE]
                  folder

CLI to convert folder contents to XML context, handling notebooks
positional arguments:
folder Folder name containing files to add to
context
options:
-h, --help show this help message and exit
--recursive search subfolders (default: False)
--symlinks follow symlinks? (default: False)
--file_glob FILE_GLOB Only include files matching glob
--file_re FILE_RE Only include files matching regex
--folder_re FOLDER_RE Only enter folders matching regex
--skip_file_glob SKIP_FILE_GLOB Skip files matching glob
--skip_file_re SKIP_FILE_RE Skip files matching regex
--skip_folder_re SKIP_FOLDER_RE Skip folders matching regex,
--func FUNC function to apply to each matched file
(default: <function join>)
--ret_folders return folders, not just files (default:
False)
--sort sort files by name within each folder
(default: False)
--exts EXTS
--prefix Include Anthropic's suggested prose intro?
(default: False)
--out Include notebook cell outputs? (default:
False)
--include_base Include full path in src? (default: False)
--title TITLE Optional title attr for Documents element
--max_size MAX_SIZE Skip files larger than this (bytes)