from playwrightnb import *
from html.parser import HTMLParser
PlaywrightNB
PlaywrightNB provides a few quality-of-life helpers for interactive use of the wonderful Playwright library. It’s likely to be of particular interest to folks using Jupyter.
Install
pip install playwrightnb
Overview
playwrightnb provides two main functions: read_page_async(url) and read_page(url). They are identical except that the first is async.
Both return a tuple of the main page’s HTML contents and a dict mapping iframe IDs to their HTML contents. They handle JavaScript and other trickiness largely automatically, but you can pass a pause parameter (in milliseconds) if you need to insert a manual wait. You can also pass a timeout (also in milliseconds).
For instance, the Dyalog APL help information is provided inside an iframe that’s dynamically loaded by JS, but we are able to read it directly:
sh_url = 'https://help.dyalog.com/19.0/#UserGuide/Installation%20and%20Configuration/Shell%20Scripts.htm'
cts,iframes = read_page(sh_url)
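To make the return shape and the sync/async pairing concrete, here is a minimal sketch built on a stub. The names fake_read_page_async and fake_read_page, the default values for pause and timeout, and the sample HTML are all hypothetical; this is not how playwrightnb is implemented, only an illustration of a blocking function that mirrors an async one and returns a (main HTML, iframe dict) tuple.

```python
import asyncio

async def fake_read_page_async(url, pause=50, timeout=5000):
    """Hypothetical stand-in for an async page reader; it does no network I/O."""
    # pause is given in milliseconds, while asyncio.sleep expects seconds.
    await asyncio.sleep(pause / 1000)
    # timeout is accepted only to mirror the documented parameters; the stub ignores it.
    main_html = f"<html><body><iframe id='topic'></iframe><!-- {url} --></body></html>"
    iframes = {"topic": "<h2>Shell Scripts</h2>"}  # iframe ID -> iframe HTML
    return main_html, iframes

def fake_read_page(url, pause=50, timeout=5000):
    """Blocking wrapper with the same arguments and result as the async stub."""
    return asyncio.run(fake_read_page_async(url, pause=pause, timeout=timeout))

main_html, frames = fake_read_page("https://example.com")
print(sorted(frames))
```

Note that asyncio.run raises a RuntimeError when an event loop is already running, which is the usual situation inside a notebook; there you would await the async variant directly.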
Use h2md to convert the HTML to markdown:
print(h2md(iframes['topic'])[94:250])
## Shell Scripts
Shell scripts are typically executed from a terminal (or shell).
A script is executed by typing its name. User input is entered from the
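For a rough idea of what an HTML-to-markdown conversion involves, the sketch below uses only the standard library's html.parser. It is a toy under stated assumptions: toy_h2md and HeadingToMarkdown are hypothetical names, only headings and paragraphs are handled, and nothing here reflects how h2md itself works.

```python
from html.parser import HTMLParser

BLOCKS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}

class HeadingToMarkdown(HTMLParser):
    """Toy converter: headings become '#' lines, paragraphs become plain text."""

    def __init__(self):
        super().__init__()
        self.parts = []     # finished markdown blocks
        self.prefix = None  # markdown prefix of the open block, None when outside one
        self.buf = []       # text collected for the open block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            # <h2> maps to '## ', <p> gets no prefix.
            self.prefix = "" if tag == "p" else "#" * int(tag[1]) + " "
            self.buf = []

    def handle_data(self, data):
        if self.prefix is not None:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCKS and self.prefix is not None:
            text = " ".join("".join(self.buf).split())  # collapse whitespace
            if text:
                self.parts.append(self.prefix + text)
            self.prefix = None

def toy_h2md(html):
    """Hypothetical helper: convert a small HTML fragment to markdown text."""
    parser = HeadingToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.parts)

sample = ("<h2>Shell Scripts</h2>"
          "<p>Shell scripts are typically executed from a terminal (or shell).</p>")
print(toy_h2md(sample))
```

Links, lists, tables and nested blocks are ignored here; a real converter has to deal with all of them.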
If you want to grab a particular element using a CSS selector, use url2md to read the page, find the selector, and convert the result to markdown. E.g., for accessing Discord’s JS-rendered docs:
url = 'https://discord.com/developers/docs/interactions/application-commands'
sel = '.page-content-scrolling-container'
md = url2md(url, sel)
print(md[856:1215])
Application commands are native ways to interact with apps in the Discord client. There are 3 types of commands accessible in different interfaces: the chat input, a message's context menu (top-right menu or right-clicking in a message), and a user's context menu (right-clicking on a user).
## Application Command Object
###### Application Command Naming
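The "find the selector" step can be pictured with another standard-library sketch. It assumes well-formed HTML and supports only a single class selector; ClassText and text_in_class are hypothetical names, the sample page is made up, and this is not url2md's implementation.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class ClassText(HTMLParser):
    """Collect the text inside the first element carrying a given class."""

    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.depth = 0  # 0 = outside the target, >0 = nesting depth inside it
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif not self.text and self.cls in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)

def text_in_class(html, cls):
    """Hypothetical helper: text content of the first element with class `cls`."""
    parser = ClassText(cls)
    parser.feed(html)
    parser.close()
    return " ".join(t.strip() for t in parser.text if t.strip())

sel = ".page-content-scrolling-container"
page = ('<nav>menu</nav>'
        '<div class="page-content-scrolling-container">'
        '<h2>Application Command Object</h2><p>Native ways<br>to interact</p>'
        '</div><footer>x</footer>')
print(text_in_class(page, sel.lstrip(".")))
```

The navigation and footer text are dropped because only content nested inside the matching element is collected.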
If you don’t need JS rendering or other fanciness, use get2md instead, which uses httpx.get instead of Playwright.