apidoc: Stop parsing docutils trees with regexps on its pseudo-XML
Motivation:
This commit started as a simple change: I wanted to replace:
`<type> <IRI>`
with:
``<type> <IRI>``
Unfortunately, this syntax looks too much like XML for its own good,
so it was stripped by the process_paragraph method, because it reads
the docutils pseudo-XML representation and strips every tag it doesn't
know about.
(I'm saying pseudo-XML because my poor <type> <IRI> string was not
escaped with XML entities, so it was in fact indistinguishable from
actual XML tags.)
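To illustrate the failure mode, here is a minimal sketch of tag stripping on an unescaped pseudo-XML string. It is not the actual process_paragraph code; `KNOWN_TAGS`, `strip_unknown_tags` and the sample string are hypothetical names made up for this example.

```python
import re

# Hypothetical set of tags the stripper knows how to handle.
KNOWN_TAGS = {"paragraph", "literal"}


def strip_unknown_tags(pseudo_xml):
    """Drop every <tag> or </tag> whose name is not in KNOWN_TAGS."""

    def keep_or_drop(match):
        return match.group(0) if match.group(1) in KNOWN_TAGS else ""

    return re.sub(r"</?(\w+)[^>]*>", keep_or_drop, pseudo_xml)


# The literal's content is not entity-escaped, so "<type>" and "<IRI>"
# look exactly like tags to the regexp and are removed with the rest.
pseudo_xml = "<paragraph>Expects <literal><type> <IRI></literal></paragraph>"
stripped = strip_unknown_tags(pseudo_xml)
print(stripped)
```

The literal ends up containing only the space that separated the two placeholders.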
Changes:
Therefore, this commit stops using the XML-like string representation of docutils trees, and visits tree nodes directly instead. Conveniently, this code already runs inside a node visitor, so we can reuse it simply by recursing through the whole tree instead of stopping the recursion as soon as we see a known node (i.e. the visitors previously visited only nodes very close to the root).
This means that we needed to add methods to handle each node type
and produce its ReST output. And since we don't have a global view
anymore, we need to return the produced ReST instead of appending
directly to self.data["description"], because handlers of parent
nodes may need to re-indent their children's output.
This results in cleaner code (and also closer to what we expect from a visitor transformer), so it's a win too.
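The idea of returning ReST from each handler can be sketched as follows. This is a simplified illustration, not the code from this commit: `Node` is a hypothetical stand-in for docutils nodes, and `RestWriter` and its handlers are made-up names.

```python
import textwrap
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a docutils node."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class RestWriter:
    """Each handler returns ReST, so parent handlers can post-process it."""

    def visit(self, node):
        return getattr(self, "visit_" + node.tag)(node)

    def visit_children(self, node):
        return "".join(self.visit(child) for child in node.children)

    def visit_text(self, node):
        return node.text

    def visit_literal(self, node):
        return "``" + self.visit_children(node) + "``"

    def visit_paragraph(self, node):
        return self.visit_children(node) + "\n\n"

    def visit_bullet_list(self, node):
        return self.visit_children(node)

    def visit_list_item(self, node):
        # Re-indent the children's output, then swap the first line's
        # two-space indent for the bullet marker.
        indented = textwrap.indent(self.visit_children(node), "  ")
        return "- " + indented[2:]


tree = Node("bullet_list", children=[
    Node("list_item", children=[
        Node("paragraph", children=[
            Node("text", "Expects "),
            Node("literal", children=[Node("text", "<type> <IRI>")]),
        ]),
        Node("paragraph", children=[Node("text", "Returns a list.")]),
    ]),
])
rest = RestWriter().visit(tree)
print(rest)
```

Because `visit_list_item` receives its children's output as a string, it can indent the second paragraph under the bullet, which would not be possible if the children had already appended to a shared buffer.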
This has some other nice side-effects:
- our custom role code is now neatly restricted to visit_problematic, so it can't overflow, because docutils runs visit_problematic with only the role's string as child
- it detects unexpected nodes, such as title_reference nodes, which are usually produced when accidentally using single backquotes instead of double backquotes to wrap inline code (it happens a lot when one is used to Markdown)
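One way such detection can work is to fail loudly when no handler exists for a node type. The sketch below assumes that approach; `StrictWriter` and `UnexpectedNode` are hypothetical names, and node types are reduced to plain strings for brevity.

```python
class UnexpectedNode(Exception):
    """Raised when a node type has no dedicated handler."""


class StrictWriter:
    def visit(self, tag, text):
        handler = getattr(self, "visit_" + tag, None)
        if handler is None:
            raise UnexpectedNode("no handler for node type %r" % tag)
        return handler(text)

    def visit_literal(self, text):
        return "``" + text + "``"


writer = StrictWriter()
ok = writer.visit("literal", "<type> <IRI>")

# Single backquotes yield a title_reference node in docutils, for which
# this sketch deliberately defines no handler.
try:
    writer.visit("title_reference", "<type> <IRI>")
    error_message = None
except UnexpectedNode as exc:
    error_message = str(exc)
print(error_message)
```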
Test Plan:
I checked most of the endpoints' documentation, and it's visually identical.
Migrated from D5971 (view on Phabricator)