Description
The "GMarkup" parser is intended to parse a simple markup format
that's a subset of XML. This is a small, efficient, easy-to-use
parser. It should not be used if you expect to interoperate with
other applications generating full-scale XML, and must not be used if you
expect to parse untrusted input. However, it's very
useful for application data files, config files, etc. where you
know your application will be the only one writing the file.
Full-scale XML parsers should be able to parse the subset used by
GMarkup, so you can easily migrate to full-scale XML at a later
time if the need arises.
GMarkup is not guaranteed to signal an error on all invalid XML;
the parser may accept documents that an XML parser would not.
However, XML documents that are not well-formed (a weaker condition
than being valid; see the XML specification for definitions of
these terms) are not considered valid GMarkup documents.
Simplifications to XML include:
Only UTF-8 encoding is allowed
No user-defined entities
Processing instructions, comments and the doctype declaration
are "passed through" but are not interpreted in any way
No DTD or validation
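The markup format does support:
Elements
Attributes
5 standard entities: &amp; &lt; &gt; &quot; &apos;
Character references
Sections marked as CDATA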
Functions
g_markup_escape_text ()
gchar *
g_markup_escape_text (const gchar *text,
gssize length);
Escapes text so that the markup parser will parse it verbatim.
Less than, greater than, ampersand, etc. are replaced with the
corresponding entities. This function would typically be used
when writing out a file to be parsed with the markup parser.
Note that this function doesn't protect whitespace and line endings
from being processed according to the XML rules for normalization
of line endings and attribute values.
Note also that this function will produce character references in
the range of &#x1; ... &#x1f; for all control characters
except for tab, newline and carriage return. The character
references in this range are not valid XML 1.0, but they are
valid XML 1.1 and will be accepted by the GMarkup parser.
Returns
a newly allocated string with the escaped text
g_markup_printf_escaped ()
gchar *
g_markup_printf_escaped (const char *format,
...);
Formats arguments according to format, escaping all string and
character arguments in the fashion of g_markup_escape_text().
This is useful when you want to insert literal strings into
XML-style markup output, without having to worry that the
strings might themselves contain markup.
Returns
newly allocated result from formatting
operation. Free with g_free().
Since: 2.4
g_markup_vprintf_escaped ()
gchar *
g_markup_vprintf_escaped (const char *format,
va_list args);
Formats the data in args according to format, escaping all
string and character arguments in the fashion of
g_markup_escape_text(). See g_markup_printf_escaped().
Returns
newly allocated result from formatting
operation. Free with g_free().
Since: 2.4
g_markup_parse_context_new ()
GMarkupParseContext *
g_markup_parse_context_new (const GMarkupParser *parser,
GMarkupParseFlags flags,
gpointer user_data,
GDestroyNotify user_data_dnotify);
Creates a new parse context. A parse context is used to parse
marked-up documents. You can feed any number of documents into
a context, as long as no errors occur; once an error occurs,
the parse context can't continue to parse text (you have to
free it and create a new parse context).
g_markup_parse_context_parse ()
gboolean
g_markup_parse_context_parse (GMarkupParseContext *context,
const gchar *text,
gssize text_len,
GError **error);
Feed some data to the GMarkupParseContext.
The data need not be valid UTF-8; an error will be signaled if
it's invalid. The data need not be an entire document; you can
feed a document into the parser incrementally, via multiple calls
to this function. Typically, as you receive data from a network
connection or file, you feed each received chunk of data into this
function, aborting the process if an error occurs. Once an error
is reported, no further data may be fed to the GMarkupParseContext;
all errors are fatal.
Returns
FALSE if an error occurred, TRUE on success
g_markup_parse_context_get_position ()
void
g_markup_parse_context_get_position (GMarkupParseContext *context,
gint *line_number,
gint *char_number);
Retrieves the current line number and the number of the character on
that line. Intended for use in error messages; there are no strict
semantics for what constitutes the "current" line number other than
"the best number we could come up with for error messages."
g_markup_parse_context_get_element ()
const gchar *
g_markup_parse_context_get_element (GMarkupParseContext *context);
Retrieves the name of the currently open element.
If called from the start_element or end_element handlers this will
give the element_name as passed to those functions. For the parent
elements, see g_markup_parse_context_get_element_stack().
Returns
the name of the currently open element, or NULL
Since: 2.2
g_markup_parse_context_get_element_stack ()
const GSList *
g_markup_parse_context_get_element_stack
(GMarkupParseContext *context);
Retrieves the element stack from the internal state of the parser.
The returned GSList is a list of strings where the first item is
the currently open tag (as would be returned by
g_markup_parse_context_get_element()) and the next item is its
immediate parent.
This function is intended to be used in the start_element and
end_element handlers where g_markup_parse_context_get_element()
would merely return the name of the element that is being
processed.
Returns
the element stack, which must not be modified
Since: 2.16
g_markup_parse_context_push ()
void
g_markup_parse_context_push (GMarkupParseContext *context,
const GMarkupParser *parser,
gpointer user_data);
Temporarily redirects markup data to a sub-parser.
This function may only be called from the start_element handler of
a GMarkupParser. It must be matched with a corresponding call to
g_markup_parse_context_pop() in the matching end_element handler
(except in the case that the parser aborts due to an error).
All tags, text and other data between the matching tags are
redirected to the subparser given by parser. user_data is used
as the user_data for that parser. user_data is also passed to the
error callback in the event that an error occurs. This includes
errors that occur in subparsers of the subparser.
The end tag matching the start tag for which this call was made is
handled by the previous parser (which is given its own user_data),
which is why g_markup_parse_context_pop() is provided to allow "one
last access" to the user_data provided to this function. In the
case of error, the user_data provided here is passed directly to
the error callback of the subparser and g_markup_parse_context_pop()
should not be called. In either case, if user_data was allocated
then it ought to be freed from both of these locations.
This function is not intended to be directly called by users
interested in invoking subparsers. Instead, it is intended to be
used by the subparsers themselves to implement a higher-level
interface.
As an example, consider a simple parser that counts the number of
tags encountered. To allow this parser to be easily used as a
subparser, it provides a small interface: one function that pushes
the subparser and one that pops it and returns the count. The host
parser calls these from its start_element and end_element handlers.
Since: 2.18
g_markup_parse_context_pop ()
gpointer
g_markup_parse_context_pop (GMarkupParseContext *context);
Completes the process of a temporary sub-parser redirection.
This function exists to collect the user_data allocated by a
matching call to g_markup_parse_context_push(). It must be called
in the end_element handler corresponding to the start_element
handler during which g_markup_parse_context_push() was called.
You must not call this function from the error callback; the
user_data is provided directly to the callback in that case.
This function is not intended to be directly called by users
interested in invoking subparsers. Instead, it is intended to
be used by the subparsers themselves to implement a higher-level
interface.
Since: 2.18
g_markup_parse_context_unref ()
void
g_markup_parse_context_unref (GMarkupParseContext *context);
Decreases the reference count of context. When its reference count
drops to 0, it is freed.
Since: 2.36
g_markup_collect_attributes ()
gboolean
g_markup_collect_attributes (const gchar *element_name,
const gchar **attribute_names,
const gchar **attribute_values,
GError **error,
GMarkupCollectType first_type,
const gchar *first_attr,
...);
Collects the attributes of the element from the data passed to the
GMarkupParser start_element function, dealing with common error
conditions and supporting boolean values.
This utility function is not required to write a parser but can save
a lot of typing.
The element_name, attribute_names, attribute_values and error
parameters passed to the start_element callback should be passed
unmodified to this function.
Following these arguments is a list of "supported" attributes to collect.
It is an error to specify multiple attributes with the same name. If any
attribute not in the list appears in the attribute_names array then an
unknown attribute error will result.
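Each supported attribute is given as three arguments: a
GMarkupCollectType, the attribute name, and a pointer that receives
the collected value. The GMarkupCollectType field specifies the type
of collection to perform and whether the attribute must appear or is
optional. The attribute name is simply the name of the attribute to
collect. The pointer should be of the appropriate type (see the
descriptions under GMarkupCollectType) and may be NULL if a
particular attribute is to be allowed but ignored. The list is
terminated with G_MARKUP_COLLECT_INVALID.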
This function deals with issuing errors for missing attributes
(of type G_MARKUP_ERROR_MISSING_ATTRIBUTE), unknown attributes
(of type G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE) and duplicate
attributes (of type G_MARKUP_ERROR_INVALID_CONTENT) as well
as parse errors for boolean-valued attributes (again of type
G_MARKUP_ERROR_INVALID_CONTENT). In all of these cases FALSE
will be returned and error will be set as appropriate.
Returns
TRUE if successful
Since: 2.16
Types and Values
enum GMarkupError
Error codes returned by markup parsing.
G_MARKUP_ERROR
#define G_MARKUP_ERROR g_markup_error_quark ()
Error domain for markup parsing.
Errors in this domain will be from the GMarkupError enumeration.
See GError for information on error domains.
enum GMarkupParseFlags
Flags that affect the behaviour of the parser.
GMarkupParseContext
typedef struct _GMarkupParseContext GMarkupParseContext;
A parse context is used to parse a stream of bytes that
you expect to contain marked-up text.
See g_markup_parse_context_new(), GMarkupParser, and so
on for more details.
struct GMarkupParser
struct GMarkupParser {
  /* Called for open tags <foo bar="baz"> */
  void (*start_element)  (GMarkupParseContext *context,
                          const gchar         *element_name,
                          const gchar        **attribute_names,
                          const gchar        **attribute_values,
                          gpointer             user_data,
                          GError             **error);

  /* Called for close tags </foo> */
  void (*end_element)    (GMarkupParseContext *context,
                          const gchar         *element_name,
                          gpointer             user_data,
                          GError             **error);

  /* Called for character data */
  /* text is not nul-terminated */
  void (*text)           (GMarkupParseContext *context,
                          const gchar         *text,
                          gsize                text_len,
                          gpointer             user_data,
                          GError             **error);

  /* Called for strings that should be re-saved verbatim in this same
   * position, but are not otherwise interpretable. At the moment
   * this includes comments and processing instructions.
   */
  /* text is not nul-terminated. */
  void (*passthrough)    (GMarkupParseContext *context,
                          const gchar         *passthrough_text,
                          gsize                text_len,
                          gpointer             user_data,
                          GError             **error);

  /* Called on error, including one set by other
   * methods in the vtable. The GError should not be freed.
   */
  void (*error)          (GMarkupParseContext *context,
                          GError              *error,
                          gpointer             user_data);
};
Any of the fields in GMarkupParser can be NULL, in which case they
will be ignored. Except for the error function, any of these callbacks
can set an error; in particular the G_MARKUP_ERROR_UNKNOWN_ELEMENT,
G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE, and G_MARKUP_ERROR_INVALID_CONTENT
errors are intended to be set from these callbacks. If you set an error
from a callback, g_markup_parse_context_parse() will report that error
back to its caller.
enum GMarkupCollectType
A mixed enumerated type and flags field. You must specify one type
(string, strdup, boolean, tristate). Additionally, you may optionally
bitwise OR the type with the flag G_MARKUP_COLLECT_OPTIONAL.
It is likely that this enum will be extended in the future to
support other types.