According to Anthropic, “it’s essential to structure your prompts in a way that clearly separates the input data from the instructions”. They recommend using something like the following:
Here are some documents for you to reference for your task:

<documents>
<document index="1">
<source>
(URL, file name, hash, etc)
</source>
<document_content>
(the text content)
</document_content>
</document>
</documents>
We will create some small helper functions to make it easier to generate context in this format, although we'll use <src> instead of <source> to avoid a conflict with the HTML tag of that name. The format is based on Anthropic's recommendation, but it's likely to work well with other models too.
URL, filename, etc; defaults to md5(content) if not provided
Returns
namedtuple
Exported source
def mk_doctype(content:str,  # The document content
               src:Optional[str]=None  # URL, filename, etc; defaults to `md5(content)` if not provided
              ) -> namedtuple:
    "Create a `doctype` named tuple"
    if src is None: src = hashlib.md5(content.encode()).hexdigest()[:8]
    return doctype(_add_nls(str(src).strip()), _add_nls(content.strip()))
This is a convenience wrapper to ensure that a doctype has the needed information in the right format.
doc = 'This is a "sample"'
mk_doctype(doc)
doctype(src='\n47e19350\n', content='\nThis is a "sample"\n')
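The default `src` and the newline wrapping can be reproduced with the standard library alone. The following is a minimal sketch, not the library's code: the `doctype` namedtuple, the `_add_nls` helper and the `default_src` function are stand-ins whose behaviour is assumed from the signature and output shown above.

```python
import hashlib
from collections import namedtuple

# Stand-in for the library's `doctype` (assumed field names, taken from the output above)
doctype = namedtuple('doctype', ['src', 'content'])

def _add_nls(s: str) -> str:
    "Wrap `s` in newlines (assumed behaviour, inferred from the output above)"
    return f'\n{s}\n'

def default_src(content: str) -> str:
    "Hypothetical helper: first 8 hex characters of the MD5 digest of `content`"
    return hashlib.md5(content.encode()).hexdigest()[:8]

doc = 'This is a "sample"'
dt = doctype(_add_nls(default_src(doc)), _add_nls(doc.strip()))
print(dt)
```

Because the fallback `src` is derived from the content, the same text always gets the same identifier, which keeps references to a document stable across calls.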
Create an ft format tuple for a single doc in Anthropic’s recommended format
| | Type | Default | Details |
|---|---|---|---|
| index | int | | The document index |
| content | str | | The document content |
| src | Optional | None | URL, filename, etc; defaults to md5(content) if not provided |
| kwargs | | | |
| Returns | tuple | | |
Exported source
def mk_doc(index:int,  # The document index
           content:str,  # The document content
           src:Optional[str]=None,  # URL, filename, etc; defaults to `md5(content)` if not provided
           **kwargs
          ) -> tuple:
    "Create an `ft` format tuple for a single doc in Anthropic's recommended format"
    dt = mk_doctype(content, src)
    content = Document_content(NotStr(dt.content))
    src = Src(NotStr(dt.src))
    return Document(src, content, index=index, **kwargs)
We can now generate XML for one document in the suggested format:
mk_doc(1, doc, title="test")
<document index="1" title="test"><src>47e19350</src><document-content>This is a "sample"</document-content></document>
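For readers without the library installed, the same element shape can be approximated with Python's built-in `xml.etree.ElementTree`. This is only a sketch: `doc_element` is a hypothetical name, and unlike the source (which wraps text in `NotStr`), ElementTree escapes `<`, `>` and `&` in text content.

```python
import xml.etree.ElementTree as ET

def doc_element(index: int, content: str, src: str, **attrs) -> ET.Element:
    "Hypothetical helper: build one <document> with <src> and <document-content> children"
    # Extra keyword arguments become attributes, mirroring `**kwargs` in `mk_doc`
    el = ET.Element('document', {'index': str(index), **{k: str(v) for k, v in attrs.items()}})
    ET.SubElement(el, 'src').text = src
    ET.SubElement(el, 'document-content').text = content
    return el

xml = ET.tostring(doc_element(1, 'This is a "sample"', '47e19350', title='test'),
                  encoding='unicode')
print(xml)
```

If documents contain markup that should reach the model unescaped, plain string formatting may be a better fit than ElementTree.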
Create an XML string containing docs in Anthropic’s recommended format
| | Type | Default | Details |
|---|---|---|---|
| docs | list | | The content of each document |
| srcs | Optional | None | URLs, filenames, etc; each one defaults to md5(content) if not provided |
| prefix | bool | True | Include Anthropic's suggested prose intro? |
| details | Optional | None | Optional list of dicts with additional attrs for each doc |
| Returns | str | | |
Exported source
def docs_xml(docs:list[str],  # The content of each document
             srcs:Optional[list]=None,  # URLs, filenames, etc; each one defaults to `md5(content)` if not provided
             prefix:bool=True,  # Include Anthropic's suggested prose intro?
             details:Optional[list]=None  # Optional list of dicts with additional attrs for each doc
            ) -> str:
    "Create an XML string containing `docs` in Anthropic's recommended format"
    pre = 'Here are some documents for you to reference for your task:\n\n' if prefix else ''
    if srcs is None: srcs = [None]*len(docs)
    if details is None: details = [{}]*len(docs)
    docs = (mk_doc(i+1, d, s, **kw) for i,(d,s,kw) in enumerate(zip(docs,srcs,details)))
    return pre + to_xml(Documents(docs))
Putting it all together, we have our final XML format:
docs = [doc, 'And another one']
srcs = [None, 'doc.txt']
print(docs_xml(docs, srcs))
Here are some documents for you to reference for your task:
<documents><document index="1"><src>
47e19350
</src><document-content>
This is a "sample"
</document-content></document><document index="2"><src>
doc.txt
</src><document-content>
And another one
</document-content></document></documents>
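The `details` parameter is not exercised in the example above, so here is a rough stdlib-only approximation that shows how per-document attributes, default sources and the optional prefix fit together. `simple_docs_xml` and `PREFIX` are hypothetical names, and attribute values are inserted without escaping, which is an assumption that only holds for simple values.

```python
import hashlib
from typing import Optional

PREFIX = 'Here are some documents for you to reference for your task:\n\n'

def simple_docs_xml(docs: list[str], srcs: Optional[list] = None,
                    prefix: bool = True, details: Optional[list] = None) -> str:
    "Stdlib-only approximation of the format shown above (not the library's implementation)"
    if srcs is None: srcs = [None] * len(docs)
    if details is None: details = [{} for _ in docs]
    parts = []
    # Document indices start at 1, matching the output above
    for i, (d, s, kw) in enumerate(zip(docs, srcs, details), start=1):
        if s is None: s = hashlib.md5(d.encode()).hexdigest()[:8]
        attrs = ''.join(f' {k}="{v}"' for k, v in kw.items())
        parts.append(f'<document index="{i}"{attrs}><src>\n{s}\n</src>'
                     f'<document-content>\n{d.strip()}\n</document-content></document>')
    return (PREFIX if prefix else '') + '<documents>' + ''.join(parts) + '</documents>'

out = simple_docs_xml(['This is a "sample"', 'And another one'],
                      srcs=[None, 'doc.txt'],
                      details=[{'title': 'first'}, {'title': 'second'}])
print(out)
```

Note that `zip` stops at the shortest input, so if `srcs` or `details` is shorter than `docs`, the trailing documents are silently dropped in this sketch.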
Context creation
Now that we can generate Anthropic’s XML format, let’s make it easy for a few common cases.
File list to context
For generating XML context from files, we’ll just read them as text and use the file names as src.
def files2ctx(
    fnames:list[Union[str,Path]],  # List of file names to add to context
    prefix:bool=True  # Include Anthropic's suggested prose intro?
)->str:  # XML for LM context
    fnames = [Path(o) for o in fnames]
    contents = [o.read_text() for o in fnames]
    return docs_xml(contents, fnames, prefix=prefix)
Here are some documents for you to reference for your task:

<documents><document index="1"><src>
samples/sample_core.py
</src><document-content>
import inspect
empty = inspect.Parameter.empty
models = 'claude-3-opus-20240229','claude-3-sonnet-20240229','claude-3-haiku-20240307'
</document-content></document><document index="2"><src>
samples/sample_styles.css
</src><document-content>
.cell { margin-bottom: 1rem; }
.cell > .sourceCode { margin-bottom: 0; }
.cell-output > pre { margin-bottom: 0; }
</document-content></document></documents>
@delegates(globtastic)
def folder2ctx(
    folder:Union[str,Path],  # Folder name containing files to add to context
    prefix:bool=True,  # Include Anthropic's suggested prose intro?
    **kwargs  # Passed to `globtastic`
)->str:  # XML for Claude context
    fnames = globtastic(folder, **kwargs)
    return files2ctx(fnames, prefix=prefix)
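The file-gathering step can be tried in isolation with `pathlib`. The sketch below uses `Path.rglob` as a stand-in for `globtastic`, whose filtering options are not shown here, and `read_docs` is a hypothetical helper that returns the contents and source names a function like `docs_xml` would consume.

```python
import tempfile
from pathlib import Path

def read_docs(folder: Path, pattern: str = '*.py') -> tuple[list[str], list[str]]:
    "Hypothetical helper: return (contents, srcs) for files under `folder` matching `pattern`"
    fnames = sorted(folder.rglob(pattern))  # sorted for a deterministic document order
    contents = [f.read_text() for f in fnames]
    srcs = [str(f.relative_to(folder)) for f in fnames]
    return contents, srcs

# Build a throwaway folder with one matching and one non-matching file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'a.py').write_text('x = 1\n')
    (root / 'b.css').write_text('.cell { margin: 0; }\n')
    contents, srcs = read_docs(root, '*.py')

print(srcs, contents)
```

Using paths relative to the folder as `src` keeps the context compact while still telling the model where each document came from.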