
Parsing & serialization

From a &str

use sup_xml::{parse_str, ParseOptions};
let doc = parse_str("<r/>", &ParseOptions::default())?;

parse_str is the simplest entry point — input must be valid UTF-8 (Rust enforces that on &str). For documents whose declared encoding isn’t UTF-8, use parse_bytes.

From bytes (any encoding)

use sup_xml::{parse_bytes, ParseOptions};
// Encoding auto-detected from the XML declaration / BOM / WHATWG sniff.
let doc = parse_bytes(include_bytes!("doc.xml"), &ParseOptions::default())?;
// Or with explicit options
let opts = ParseOptions {
    recovery_mode: true,
    skip_inter_element_whitespace: true,
    ..Default::default()
};
let doc = parse_bytes(include_bytes!("doc.xml"), &opts)?;

Supported encodings: UTF-8, UTF-16 (BE/LE), UTF-32, and ASCII out of the box; with the full-encodings feature enabled, also any encoding label in the WHATWG Encoding Standard (Shift_JIS, EUC-JP, GB18030, Windows-125x, ISO-8859-x, …). See the character encodings guide for the full matrix.
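Byte-order-mark detection is the first step of that auto-detection. The standalone sketch below follows the BOM table in XML 1.0 Appendix F. It is not sup_xml's detector: `Sniffed` and `sniff_bom` are hypothetical names, and the sketch stops at the BOM without reading the encoding declaration or applying WHATWG sniffing.

```rust
/// Result of inspecting the first bytes of a document (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Sniffed {
    Utf8Bom,
    Utf16Be,
    Utf16Le,
    Utf32Be,
    Utf32Le,
    /// No BOM: XML defaults to UTF-8 unless a declaration says otherwise.
    NoBom,
}

/// Guesses the encoding from a byte-order mark (XML 1.0 Appendix F).
pub fn sniff_bom(input: &[u8]) -> Sniffed {
    // The 4-byte UTF-32 marks are checked first because the UTF-32LE BOM
    // begins with the same two bytes (FF FE) as the UTF-16LE BOM.
    match input {
        [0x00, 0x00, 0xFE, 0xFF, ..] => Sniffed::Utf32Be,
        [0xFF, 0xFE, 0x00, 0x00, ..] => Sniffed::Utf32Le,
        [0xEF, 0xBB, 0xBF, ..] => Sniffed::Utf8Bom,
        [0xFE, 0xFF, ..] => Sniffed::Utf16Be,
        [0xFF, 0xFE, ..] => Sniffed::Utf16Le,
        _ => Sniffed::NoBom,
    }
}
```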

Common ParseOptions

Field | Default | Effect
namespace_aware | false | Resolve xmlns:/xmlns= declarations into qualified names (set true for namespaced documents)
recovery_mode | false | Continue past non-fatal errors; surface them via recovered_errors()
skip_inter_element_whitespace | false | Drop whitespace between element tags (“ignorable” per XML 1.0 §2.10)
max_entity_expansion_bytes | 1_000_000 | Defuse billion-laughs attacks; raise for trusted large entities
max_element_depth | 256 | Stack-overflow defence against pathologically nested input
external_resolver | None | Opt-in for DTD / external-entity loads; see the security model
load_external_dtd | false | Fetch and parse external DTD subsets when a resolver is present
validating | false | Run DTD validation alongside well-formedness checking
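To see why a byte cap like max_entity_expansion_bytes defuses billion-laughs input, consider the hypothetical model below (`Part`, `ExpandError`, and `expand` are illustrative names, not sup_xml API). Each entity expands to literals and references to other entities, and expansion aborts as soon as the output would pass the budget, so a tiny document whose references multiply ten-fold per level is rejected long before it reaches full size.

```rust
use std::collections::HashMap;

/// One piece of an entity's replacement text (hypothetical model).
pub enum Part {
    /// Literal characters.
    Lit(&'static str),
    /// A reference to another general entity, i.e. `&name;`.
    Ref(&'static str),
}

#[derive(Debug, PartialEq)]
pub enum ExpandError {
    BudgetExceeded,
    Undefined(String),
}

/// Expands entity `name` into `out`, failing once `out` would exceed
/// `budget` bytes. Assumes no reference cycles (XML forbids them; a real
/// parser checks for them separately).
pub fn expand(
    name: &str,
    entities: &HashMap<&str, Vec<Part>>,
    budget: usize,
    out: &mut String,
) -> Result<(), ExpandError> {
    let parts = entities
        .get(name)
        .ok_or_else(|| ExpandError::Undefined(name.to_string()))?;
    for part in parts {
        match part {
            Part::Lit(text) => {
                // Check before writing, so memory never exceeds the budget.
                if out.len() + text.len() > budget {
                    return Err(ExpandError::BudgetExceeded);
                }
                out.push_str(text);
            }
            Part::Ref(target) => expand(target, entities, budget, out)?,
        }
    }
    Ok(())
}
```

Three levels of ten-fold nesting already turn a few declarations into 3,000 bytes; the classic attack stacks nine such levels, which is why the check runs during expansion rather than after it.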

Serialization

use sup_xml::{serialize_to_string, serialize_formatted, serialize_with, SerializeOptions};
// Compact (one line, no inter-element whitespace)
let xml: String = serialize_to_string(&doc);
// Pretty-printed (newlines + two-space indent)
let pretty: String = serialize_formatted(&doc);
// Full control over the XML declaration, indentation, and HTML mode
let opts = SerializeOptions {
    write_xml_decl: true,
    format: true,
    indent: "    ".to_string(), // four-space indent
    ..Default::default()
};
let xml: String = serialize_with(&doc, &opts);
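For a feel of what format and indent control, here is a minimal pretty-printer over a hypothetical `Node` type (not sup_xml's Document). It assumes the common convention that each nesting level repeats the indent string once and childless elements are written self-closed; it omits text and attributes to stay short, and sup_xml's exact layout rules may differ.

```rust
/// Stand-in element type for this sketch; not sup_xml's `Document`.
pub struct Node {
    pub name: String,
    pub children: Vec<Node>,
}

/// Appends `node` to `out`, one element per line, repeating `indent`
/// once per nesting level.
pub fn pretty(node: &Node, indent: &str, depth: usize, out: &mut String) {
    let pad = indent.repeat(depth);
    if node.children.is_empty() {
        // Childless elements are written self-closed.
        out.push_str(&format!("{}<{}/>\n", pad, node.name));
    } else {
        out.push_str(&format!("{}<{}>\n", pad, node.name));
        for child in &node.children {
            pretty(child, indent, depth + 1, out);
        }
        out.push_str(&format!("{}</{}>\n", pad, node.name));
    }
}
```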

Output round-trips byte-stable through parse_* → serialize_* for inputs that don’t carry redundant whitespace or alternate attribute-quote / numeric-character-reference encodings. (The XML spec allows several valid representations of the same document; we normalise to the canonical one.)
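One way to arrive at a single canonical spelling is to re-escape every parsed attribute value the same way, regardless of how the input quoted or escaped it. The sketch below assumes double quotes and a fixed escape set, similar in spirit to Canonical XML; `escape_attr` and `write_attr` are hypothetical helpers, and sup_xml’s exact output rules may differ. Tab, newline, and carriage return become character references because a parser would otherwise normalise them to spaces on re-reading (attribute-value normalisation, XML 1.0 §3.3.3).

```rust
/// Escapes an attribute value for output inside double quotes.
/// `a='x"y'` and `a="x&quot;y"` parse to the same value, so both
/// serialize identically after this step.
pub fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '"' => out.push_str("&quot;"),
            // Preserve whitespace that attribute-value normalisation
            // would otherwise turn into spaces.
            '\t' => out.push_str("&#9;"),
            '\n' => out.push_str("&#10;"),
            '\r' => out.push_str("&#13;"),
            _ => out.push(c),
        }
    }
    out
}

/// Writes `name="value"` with the value escaped canonically.
pub fn write_attr(name: &str, value: &str) -> String {
    format!("{}=\"{}\"", name, escape_attr(value))
}
```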

Streaming — SAX-style events

Two readers process XML as an event stream instead of building a DOM, so you control how much state to retain. Both surface the same BytesEvents; they differ in where the bytes come from.

In memory — XmlBytesReader

When the whole document is already in memory (a &[u8]), XmlBytesReader is a zero-copy SAX reader over it — same parser core as the DOM path, no tree allocation:

use sup_xml::{XmlBytesReader, BytesEvent};
let mut r = XmlBytesReader::from_bytes(b"<r><a/><b>hi</b></r>")?;
loop {
    match r.next()? {
        BytesEvent::Eof => break,
        BytesEvent::StartElement(t) =>
            println!("<{}>", String::from_utf8_lossy(t.name())),
        BytesEvent::EndElement(t) =>
            println!("</{}>", String::from_utf8_lossy(t.name())),
        BytesEvent::Text(t) => {
            let bytes = t.as_bytes();
            if !bytes.iter().all(u8::is_ascii_whitespace) {
                println!(" text: {:?}", String::from_utf8_lossy(bytes));
            }
        }
        _ => {}
    }
}

Tag names and text expose borrowed byte slices (&[u8]) into the input buffer, so the inner loop allocates only when it explicitly copies data out (e.g. String::from_utf8_lossy(t.as_bytes()).into_owned(); from_utf8_lossy on its own borrows when the bytes are valid UTF-8). That’s what lets the streaming benches hit 3+ GB/s on hot fixtures; see the performance reference.
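The zero-copy idea can be shown with a deliberately tiny tokenizer. This is a toy, not sup_xml's parser: it handles only attribute-free markup with no comments, CDATA, or entity references, and `Ev` and `Toy` are hypothetical names. Every event it yields is a sub-slice of the input, so the iterator itself never allocates.

```rust
/// Toy events that borrow from the input buffer instead of copying it.
#[derive(Debug, PartialEq)]
pub enum Ev<'a> {
    Start(&'a [u8]),
    End(&'a [u8]),
    Empty(&'a [u8]),
    Text(&'a [u8]),
}

/// Toy tokenizer over attribute-free, comment-free XML.
pub struct Toy<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Toy<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        Toy { input, pos: 0 }
    }
}

impl<'a> Iterator for Toy<'a> {
    type Item = Ev<'a>;

    fn next(&mut self) -> Option<Ev<'a>> {
        // Copy the `&'a [u8]` out so returned slices carry lifetime 'a.
        let input: &'a [u8] = self.input;
        let rest = &input[self.pos..];
        if rest.is_empty() {
            return None;
        }
        if rest[0] != b'<' {
            // Text runs until the next '<' or the end of input.
            let len = rest.iter().position(|&b| b == b'<').unwrap_or(rest.len());
            self.pos += len;
            return Some(Ev::Text(&rest[..len]));
        }
        // Tag: find the closing '>'; a truncated tag ends iteration.
        let close = rest.iter().position(|&b| b == b'>')?;
        self.pos += close + 1;
        let inner = &rest[1..close];
        Some(if let Some(name) = inner.strip_prefix(b"/") {
            Ev::End(name)
        } else if let Some(name) = inner.strip_suffix(b"/") {
            Ev::Empty(name)
        } else {
            Ev::Start(inner)
        })
    }
}
```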

Larger than memory — XmlByteStreamReader

For documents too large to hold in memory, XmlByteStreamReader pulls from any io::Read (a file, socket, decompressing reader, stdin…) through a rolling buffer, so peak memory is tied to the buffer size rather than the input size and stays roughly constant however large the document is. Drive it with next_event:

use std::fs::File;
use sup_xml::{XmlByteStreamReader, BytesEvent, DEFAULT_BUFFER_SIZE};
let file = File::open("catalog.xml")?;
let mut reader = XmlByteStreamReader::new(file, DEFAULT_BUFFER_SIZE)?;
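let mut titles: Vec<String> = Vec::new();
let mut in_title = false;
loop {
    match reader.next_event()? {
        BytesEvent::Eof => break,
        BytesEvent::StartElement(t) if t.name() == b"title" => {
            in_title = true;
            titles.push(String::new());
        }
        BytesEvent::EndElement(t) if t.name() == b"title" => in_title = false,
        // Each event borrows the rolling buffer and is valid only until
        // the next pull, so copy out what you need before looping. Text
        // may arrive as several events, so append to the current title.
        BytesEvent::Text(t) if in_title => {
            if let Some(title) = titles.last_mut() {
                title.push_str(&String::from_utf8_lossy(t.as_bytes()));
            }
        }
        _ => {}
    }
}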
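The comment inside that loop (each event is valid only until the next pull) is the standard Rust lending pattern: a method that hands out a slice of an internal buffer through `&mut self`. The hypothetical `ChunkReader` below uses plain byte chunks instead of BytesEvents to show why data must be copied out before the next call; it is not how XmlByteStreamReader is implemented.

```rust
use std::io::{self, Read};

/// Refills one internal buffer on every call (hypothetical type).
pub struct ChunkReader<R> {
    src: R,
    buf: Vec<u8>,
}

impl<R: Read> ChunkReader<R> {
    pub fn new(src: R, size: usize) -> Self {
        ChunkReader { src, buf: vec![0; size] }
    }

    /// Returns the next chunk, or `None` at end of input. The slice
    /// borrows `self`, so it cannot outlive the next call.
    pub fn next_chunk(&mut self) -> io::Result<Option<&[u8]>> {
        let n = self.src.read(&mut self.buf)?;
        Ok(if n == 0 { None } else { Some(&self.buf[..n]) })
    }
}

/// Copies every chunk out before pulling the next one.
pub fn collect_owned<R: Read>(src: R, size: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut reader = ChunkReader::new(src, size);
    let mut owned = Vec::new();
    while let Some(chunk) = reader.next_chunk()? {
        // `to_vec` makes an owned copy, so the borrow of `reader` ends
        // here. Storing `chunk` itself would be rejected by the borrow
        // checker at the next `next_chunk` call.
        owned.push(chunk.to_vec());
    }
    Ok(owned)
}
```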

next_event yields the same BytesEvents as XmlBytesReader::next. The borrow checker ties each event to &mut self, so you must finish with an event before pulling the next one (the same zero-copy contract as quick-xml’s read_event). Peak memory is about 2× the buffer size. DEFAULT_BUFFER_SIZE is 10 MB; if a single atomic token exceeds the buffer, pass HUGE_BUFFER_SIZE (1 GB) or a custom size to XmlByteStreamReader::new. Element names and attribute values are atomic; text content is not, and may be split across several Text events. Streaming is UTF-8 only.
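The interaction between buffer size and atomic tokens can be sketched with std::io::Read alone. The hypothetical `count_tags` below keeps at most `max_buf` bytes of unfinished input: each read is appended, every complete `<...>` tag is consumed, and only a partial tag is carried into the next round. A tag longer than the buffer can never complete, so it is reported as an error, while arbitrarily long text passes through. It allocates a read chunk plus a carry buffer of `max_buf` bytes each, echoing the ~2× figure above, but it is an illustration rather than XmlByteStreamReader's implementation, and it assumes `>` never appears inside attribute values or comments.

```rust
use std::io::{self, Read};

/// Counts `<...>` tags in a stream using a bounded rolling buffer.
pub fn count_tags<R: Read>(mut src: R, max_buf: usize) -> io::Result<usize> {
    // Carry buffer for unfinished input, never longer than `max_buf`.
    let mut buf: Vec<u8> = Vec::with_capacity(max_buf);
    // Scratch space for a single read.
    let mut chunk = vec![0u8; max_buf];
    let mut count = 0;
    loop {
        let room = max_buf - buf.len();
        if room == 0 {
            // One tag fills the whole buffer and still has no '>'.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "token larger than buffer",
            ));
        }
        let n = src.read(&mut chunk[..room])?;
        if n == 0 {
            return Ok(count); // end of input; a dangling partial tag is ignored
        }
        buf.extend_from_slice(&chunk[..n]);

        // Consume complete tags; track where unconsumed input starts.
        let mut consumed = 0;
        loop {
            match buf[consumed..].iter().position(|&b| b == b'<') {
                // Only text remains: drop it.
                None => {
                    consumed = buf.len();
                    break;
                }
                Some(open) => {
                    let start = consumed + open;
                    match buf[start..].iter().position(|&b| b == b'>') {
                        Some(close) => {
                            count += 1;
                            consumed = start + close + 1;
                        }
                        // Partial tag: carry it into the next read.
                        None => {
                            consumed = start;
                            break;
                        }
                    }
                }
            }
        }
        buf.drain(..consumed);
    }
}
```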

For async sources (network sockets, async file handles), use the tokio feature and parse_async instead — see the async guide.

Working with the parsed tree

use sup_xml::{parse_str, Document};
let doc: Document = parse_str("<r a='1'><b/></r>", &Default::default())?;
let root = doc.root();
println!("root: <{}>", root.name());
for attr in root.attributes() {
    println!(" @{}={:?}", attr.name(), attr.value());
}
for child in root.children() {
    println!(" child: <{}>", child.name());
}

The DOM is the same Document type that XPath, XSLT, Schematron, and serde deserialization all operate on, so there is no translation between modes: parse once, then query, transform, and serialize the same structure.
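Because every consumer walks the same tree, many queries reduce to a recursive traversal. The sketch below uses a hypothetical `Elem` type rather than sup_xml's node types, and collects elements by name in document order, roughly what the XPath expression //name selects when evaluated from the root.

```rust
/// Stand-in element type for this sketch (not sup_xml API).
pub struct Elem {
    pub name: String,
    pub attrs: Vec<(String, String)>,
    pub children: Vec<Elem>,
}

impl Elem {
    pub fn new(name: &str, attrs: &[(&str, &str)], children: Vec<Elem>) -> Self {
        Elem {
            name: name.to_string(),
            attrs: attrs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
            children,
        }
    }

    /// Looks up an attribute value by name.
    pub fn attr(&self, key: &str) -> Option<&str> {
        self.attrs.iter().find(|(k, _)| k == key).map(|(_, v)| v.as_str())
    }

    /// Depth-first, document-order search that includes `self`.
    pub fn descendants_named<'a>(&'a self, name: &str, out: &mut Vec<&'a Elem>) {
        if self.name == name {
            out.push(self);
        }
        for child in &self.children {
            child.descendants_named(name, out);
        }
    }
}
```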

Common parse errors

parse_str / parse_bytes return Result<Document, XmlError>. XmlError carries the source location of the failure (line and column) and a structured error code (XmlErrorKind), so you can switch on the failure cause without parsing the message string. With recovery_mode enabled, the parser continues past non-fatal errors and records them; retrieve the list via recovered_errors(), as described in the recovery guide.
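Reporting a line and column usually means converting the byte offset where parsing failed. A minimal sketch, assuming 1-based lines and columns counted in characters (sup_xml's own column convention is not specified here); `line_col` is a hypothetical helper and panics if `offset` does not fall on a char boundary.

```rust
/// Converts a byte offset into 1-based (line, column), counting the
/// column in characters rather than bytes.
pub fn line_col(input: &str, offset: usize) -> (usize, usize) {
    let before = &input[..offset];
    let line = before.matches('\n').count() + 1;
    // Byte index where the current line starts.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() + 1;
    (line, col)
}
```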