Parsing & serialization
From a &str
use sup_xml::{parse_str, ParseOptions};
let doc = parse_str("<r/>", &ParseOptions::default())?;

parse_str is the simplest entry point: input must be valid UTF-8
(Rust enforces that on &str). For documents whose declared encoding
isn’t UTF-8, use parse_bytes.
From bytes (any encoding)
use sup_xml::{parse_bytes, ParseOptions};
// Encoding auto-detected from the XML declaration / BOM / WHATWG sniff.
let doc = parse_bytes(include_bytes!("doc.xml"), &ParseOptions::default())?;
// Or with explicit options
let opts = ParseOptions {
    recovery_mode: true,
    skip_inter_element_whitespace: true,
    ..Default::default()
};
let doc = parse_bytes(include_bytes!("doc.xml"), &opts)?;

Supported encodings out of the box: UTF-8, UTF-16 (BE/LE), UTF-32, ASCII,
and any encoding label in the
WHATWG encoding spec
(Shift_JIS, EUC-JP, GB18030, Windows-125x, ISO-8859-x, …) when the
full-encodings feature is on. See the
character encodings guide for the full matrix.
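
To make the BOM step of auto-detection concrete, here is a minimal sketch using only the standard library. The function name `sniff_bom` is hypothetical and this is not sup_xml's implementation; it only illustrates why the longer UTF-32 marks have to be tested before the UTF-16 ones.

```rust
/// Encoding named by a leading byte-order mark, plus the BOM length in bytes.
/// Returns None when the input starts with no recognised BOM.
/// (Hypothetical helper for illustration; not part of sup_xml.)
fn sniff_bom(bytes: &[u8]) -> Option<(&'static str, usize)> {
    // UTF-32 is checked first: the UTF-32LE BOM (FF FE 00 00) begins with
    // the UTF-16LE BOM (FF FE), so the shorter test would shadow it.
    if bytes.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        Some(("UTF-32BE", 4))
    } else if bytes.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        Some(("UTF-32LE", 4))
    } else if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(("UTF-8", 3))
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(("UTF-16BE", 2))
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some(("UTF-16LE", 2))
    } else {
        None
    }
}
```

A caller would skip the reported number of bytes before decoding. When no BOM is present, detection has to fall back to the XML declaration, which this sketch does not attempt.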
Common ParseOptions
| Field | Default | Effect |
|---|---|---|
| namespace_aware | false | Resolve xmlns:/xmlns= declarations into qualified names (set true for namespaced documents) |
| recovery_mode | false | Continue past non-fatal errors; surface them via recovered_errors() |
| skip_inter_element_whitespace | false | Drop whitespace between element tags (“ignorable” per XML 1.0 §2.10) |
| max_entity_expansion_bytes | 1_000_000 | Defuse billion-laughs attacks; raise for trusted large entities |
| max_element_depth | 256 | Stack-overflow defence against pathologically nested input |
| external_resolver | None | Opt-in for DTD / external-entity loads (see security model) |
| load_external_dtd | false | Fetch + parse external DTD subsets when a resolver is present |
| validating | false | Run DTD validation alongside well-formedness |
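
As an illustration of what a limit like max_element_depth guards against, the sketch below tracks nesting depth over a toy tag scanner and stops as soon as a cap is exceeded. The function `max_depth_within` is hypothetical and deliberately naive (it does not handle comments, CDATA, or `>` inside attribute values); it is not how sup_xml enforces the limit.

```rust
/// Scan `xml` and return the deepest element nesting seen, or Err(depth)
/// as soon as an element would sit deeper than `max_depth`.
/// (Hypothetical toy scanner for illustration; not part of sup_xml.)
fn max_depth_within(xml: &str, max_depth: usize) -> Result<usize, usize> {
    let mut depth = 0usize; // number of currently open elements
    let mut deepest = 0usize;
    let mut rest = xml;
    while let Some(lt) = rest.find('<') {
        let after_lt = &rest[lt + 1..];
        let gt = match after_lt.find('>') {
            Some(i) => i,
            None => break, // unterminated tag: a real parser reports an error
        };
        let tag = &after_lt[..gt];
        if tag.starts_with('/') {
            // End tag: one level closes.
            depth = depth.saturating_sub(1);
        } else if !tag.starts_with('?') && !tag.starts_with('!') {
            // Start tag or self-closing tag: the element sits one level down.
            let element_depth = depth + 1;
            if element_depth > max_depth {
                return Err(element_depth); // bail out before descending further
            }
            deepest = deepest.max(element_depth);
            if !tag.ends_with('/') {
                depth = element_depth; // only a start tag stays open
            }
        }
        rest = &after_lt[gt + 1..];
    }
    Ok(deepest)
}
```

Because the check runs before the scanner descends, the cost of a pathologically nested document is bounded by the cap rather than by the input.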
Serialization
use sup_xml::{serialize_to_string, serialize_formatted, serialize_with, SerializeOptions};
// Compact (one line, no inter-element whitespace)
let xml: String = serialize_to_string(&doc);
// Pretty-printed (newlines + two-space indent)
let pretty: String = serialize_formatted(&doc);
// Full control over the XML declaration, indentation, and HTML mode
let opts = SerializeOptions {
    write_xml_decl: true,
    format: true,
    indent: "    ".to_string(), // four-space indent
    ..Default::default()
};
let xml: String = serialize_with(&doc, &opts);

Output round-trips byte-stable through parse_* → serialize_* for
inputs that don’t carry redundant whitespace or alternate
attribute-quote / numeric-character-reference encodings. (The XML spec
allows several valid representations of the same document; we
normalise to the canonical one.)
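
One concrete case of "several valid representations" is the numeric character reference: `&#65;`, `&#x41;`, and a literal `A` all denote the same character, so a serializer has to pick one. The sketch below decodes such references with the standard library only. `decode_char_ref` is a hypothetical helper, it does not check XML's full Char production, and it makes no claim about which form sup_xml emits.

```rust
/// Decode one numeric character reference such as "&#65;" or "&#x41;".
/// Returns None for anything that is not a well-formed numeric reference.
/// (Hypothetical helper for illustration; not part of sup_xml.)
fn decode_char_ref(reference: &str) -> Option<char> {
    let body = reference.strip_prefix("&#")?.strip_suffix(';')?;
    // XML allows only a lowercase 'x' as the hexadecimal marker.
    let (digits, radix) = match body.strip_prefix('x') {
        Some(hex) => (hex, 16),
        None => (body, 10),
    };
    // Reject empty bodies and signs such as "+41", which from_str_radix accepts.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    let code = u32::from_str_radix(digits, radix).ok()?;
    // from_u32 rejects surrogates and values above U+10FFFF.
    char::from_u32(code)
}
```

Two inputs that differ only in how a character is spelled decode to the same value, which is why they cannot both survive a byte-stable round trip.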
Streaming — SAX-style events
Two readers process XML as an event stream instead of building a DOM, so
you control how much state to retain. Both surface the same BytesEvents;
they differ in where the bytes come from.
In memory — XmlBytesReader
When the whole document is already in memory (a &[u8]), XmlBytesReader
is a zero-copy SAX reader over it — same parser core as the DOM path, no
tree allocation:
use sup_xml::{XmlBytesReader, BytesEvent};
let mut r = XmlBytesReader::from_bytes(b"<r><a/><b>hi</b></r>")?;
loop {
    match r.next()? {
        BytesEvent::Eof => break,
        BytesEvent::StartElement(t) => println!("<{}>", String::from_utf8_lossy(t.name())),
        BytesEvent::EndElement(t) => println!("</{}>", String::from_utf8_lossy(t.name())),
        BytesEvent::Text(t) => {
            let bytes = t.as_bytes();
            if !bytes.iter().all(u8::is_ascii_whitespace) {
                println!(" text: {:?}", String::from_utf8_lossy(bytes));
            }
        }
        _ => {}
    }
}

Tag names and text expose borrowed byte slices (&[u8]) into the input
buffer, so the inner loop allocates only when it explicitly converts
(e.g. String::from_utf8_lossy(t.as_bytes())). That’s what lets the
streaming benches hit 3+ GB/s on hot fixtures —
see the performance reference.
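
It may help to see exactly when the conversion step allocates. In the standard library, `String::from_utf8_lossy` returns a `Cow<str>` that borrows the input when the bytes are already valid UTF-8 and allocates only when it has to substitute U+FFFD, or when you call `into_owned`. The two helper functions below are hypothetical names for this sketch and use no sup_xml API.

```rust
use std::borrow::Cow;

/// True when viewing `bytes` as text needed no new String:
/// from_utf8_lossy borrows the input if it is already valid UTF-8.
fn converts_without_allocating(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Borrowed(_))
}

/// Copy the text into an owned String that can outlive the input buffer.
/// Invalid sequences are replaced with U+FFFD.
fn to_owned_text(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

For printing or comparing inside the event loop, the borrowed form is enough; `to_owned_text` is the shape to use only for values that must be kept after the loop moves on.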
Larger than memory — XmlByteStreamReader
For documents too large to hold in memory, XmlByteStreamReader pulls
from any io::Read (a file, socket, decompressing reader, stdin…)
through a rolling buffer, so peak memory stays bounded by the buffer
size — roughly constant no matter how large the input. Drive it with
next_event:
use std::fs::File;
use sup_xml::{XmlByteStreamReader, BytesEvent, DEFAULT_BUFFER_SIZE};
let file = File::open("catalog.xml")?;
let mut reader = XmlByteStreamReader::new(file, DEFAULT_BUFFER_SIZE)?;
let mut titles = Vec::new();
loop {
    match reader.next_event()? {
        BytesEvent::Eof => break,
        // Each event borrows the rolling buffer and is valid only until
        // the next pull, so copy out what you need before looping.
        BytesEvent::Text(t) => titles.push(String::from_utf8_lossy(t.as_bytes()).into_owned()),
        _ => {}
    }
}

next_event yields the same BytesEvents as XmlBytesReader::next; the
borrow checker ties each event to &mut self, so you must consume it
before pulling the next (the same zero-copy contract as quick-xml’s
read_event). Peak memory is ~2× the buffer size — DEFAULT_BUFFER_SIZE
is 10 MB; pass HUGE_BUFFER_SIZE (1 GB) or a custom size to
XmlByteStreamReader::new if a single atomic token (an element name
or attribute value — text content is not atomic and splits across events)
exceeds the buffer. Streaming is UTF-8 only.
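
The borrowing contract described above can be reproduced in a few lines of standard-library Rust. The sketch below is a much simpler stand-in, not XmlByteStreamReader itself: the hypothetical `ChunkReader` hands out raw chunks rather than XML events, but it shows the same two properties, namely memory bounded by a fixed buffer and a returned slice that borrows `&mut self` until the next pull.

```rust
use std::io::{self, Read};

/// Hypothetical fixed-buffer reader for illustration; not part of sup_xml.
struct ChunkReader<R: Read> {
    source: R,
    buf: Vec<u8>, // allocated once; its size bounds peak memory
}

impl<R: Read> ChunkReader<R> {
    fn new(source: R, buffer_size: usize) -> Self {
        ChunkReader { source, buf: vec![0; buffer_size] }
    }

    /// Pull the next chunk. The slice borrows the internal buffer, so the
    /// borrow checker rejects any use of it after the next call.
    fn next_chunk(&mut self) -> io::Result<Option<&[u8]>> {
        let n = self.source.read(&mut self.buf)?;
        if n == 0 {
            Ok(None) // end of input
        } else {
            Ok(Some(&self.buf[..n]))
        }
    }
}

/// Copy every chunk out before pulling the next one.
fn collect_chunks<R: Read>(source: R, buffer_size: usize) -> io::Result<Vec<String>> {
    let mut reader = ChunkReader::new(source, buffer_size);
    let mut out = Vec::new();
    while let Some(chunk) = reader.next_chunk()? {
        // into_owned detaches the data from the rolling buffer.
        out.push(String::from_utf8_lossy(chunk).into_owned());
    }
    Ok(out)
}
```

Holding a `chunk` across a second `next_chunk` call fails to compile, which is the same guarantee that forces you to consume each event before requesting the next.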
For async sources (network sockets, async file handles), use the
tokio feature and parse_async instead — see the
async guide.
Working with the parsed tree
use sup_xml::{parse_str, Document};
let doc: Document = parse_str("<r a='1'><b/></r>", &Default::default())?;
let root = doc.root();
println!("root: <{}>", root.name());
for attr in root.attributes() {
    println!(" @{}={:?}", attr.name(), attr.value());
}
for child in root.children() {
    println!(" child: <{}>", child.name());
}

The DOM is the same Document type that XPath, XSLT, Schematron, and
serde-de all operate on, so no translation between modes — parse once,
query / transform / serialize from the same structure.
Common parse errors
parse_str / parse_bytes return Result<Document, XmlError>.
XmlError carries the source location of the failure (line /
column) and a structured error code (XmlErrorKind) for switching
on the failure cause without parsing the message string. Recovery
mode lets you walk past the failure and collect a list of recovered
errors via reader.recovered_errors() — see the
recovery guide.
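
The value of a structured error code is easiest to see in a small, self-contained sketch. Every name below (`ErrorKind`, its variants, `ParseError`, `is_retryable`, `report`) is invented for illustration and is not sup_xml's XmlError or XmlErrorKind; check the crate's API reference for the real variants and accessors.

```rust
/// Hypothetical stand-in for a structured error code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    UnexpectedEof,
    MismatchedTag,
    DepthLimitExceeded,
}

/// Hypothetical error carrying a kind plus a source location.
#[derive(Debug)]
struct ParseError {
    kind: ErrorKind,
    line: u32,
    column: u32,
    message: String,
}

/// Decide how to react by matching on the kind, never on `message`,
/// so rewording the message cannot break the caller.
fn is_retryable(err: &ParseError) -> bool {
    match err.kind {
        // Truncated input may parse once more bytes arrive.
        ErrorKind::UnexpectedEof => true,
        ErrorKind::MismatchedTag | ErrorKind::DepthLimitExceeded => false,
    }
}

/// Format a location-prefixed line for logs.
fn report(err: &ParseError) -> String {
    format!("{}:{}: {:?}: {}", err.line, err.column, err.kind, err.message)
}
```

Because the `match` has no wildcard arm, adding a new variant becomes a compile error at every decision point instead of a silently mishandled case.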