mirror of
https://github.com/cookiengineer/audacity
synced 2025-05-05 06:09:47 +02:00
234 lines
8.5 KiB
XML
234 lines
8.5 KiB
XML
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
|
|
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
|
|
<chapter id="raptor-parsers">
|
|
<title>Parsers in Raptor (syntax to triples)</title>
|
|
|
|
<section id="raptor-parsers-intro">
|
|
<title>Introduction</title>
|
|
|
|
<para>This section describes the parsers that can be compiled into
|
|
Raptor and their features. The exact parsers supported may vary
|
|
by different builds of raptor and can be queried at run-time by
|
|
use of the
|
|
<link linkend="raptor-parsers-enumerate"><function>raptor_parsers_enumerate</function></link>
|
|
and
|
|
<link linkend="raptor-syntaxes-enumerate"><function>raptor_syntaxes_enumerate</function></link>
|
|
functions</para>
|
|
|
|
<para>The optional features that may be set on parsers can also
|
|
be queried at run-time iwth the
|
|
<link linkend="raptor-features-enumerate"><function>raptor_features_enumerate</function></link>
|
|
function.</para>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-grddl">
|
|
<title>GRDDL parser (name <literal>grddl</literal>)</title>
|
|
<para>A parser for the
|
|
<ulink url="http://www.w3.org/TR/2007/PR-grddl-20070716/">Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</ulink>,
|
|
W3C Proposed Recommendation of 2007-07-16 which allows reading XHTML
|
|
and XML as RDF triples by using profiles in the document that declare
|
|
XSLT transforms from the XHTML or XML content into RDF/XML or other
|
|
RDF syntax which can then be parsed.</para>
|
|
|
|
<para>The GRDDL parser is rather complex and different from the other
|
|
parsers in that it retrieves URIs, reads HTML documents (possibly
|
|
with errors), transforms the documents with XSLT and turns the result
|
|
into a single graph. The default configuration of the GRDDL parser
|
|
also reads microformats (hcard, hcalendar) and follows <link>
|
|
tags that point to RDF/XML. Parts of the GRDDL process can be
|
|
altered by configuration, which are describe below.
|
|
</para>
|
|
|
|
<para>The URIs that are processed during GRDDL operations can be checked
|
|
and skipped if required using a handler set with the
|
|
<link linkend="raptor-parser-set-uri-filter"><function>raptor_parser_set_uri_filter()</function></link>
|
|
function. If the handler returns non-0, the URI is rejected.
|
|
This uses
|
|
<link linkend="raptor-www-set-uri-filter"><function>raptor_www_set_uri_filter()</function></link>
|
|
internally.
|
|
</para>
|
|
|
|
<para>If the value of feature
|
|
<link linkend="RAPTOR-FEATURE-WWW-TIMEOUT:CAPS"><literal>RAPTOR_FEATURE_WWW_TIMEOUT</literal></link>
|
|
if set to a number >0, it is used as the timeout in seconds
|
|
for retrieving of URIs during GRDDL processing.
|
|
This uses
|
|
<link linkend="raptor-www-set-connection-timeout"><function>raptor_www_set_connection_timeout()</function></link>
|
|
internally.
|
|
</para>
|
|
|
|
<para>The hardcoded support for hcard and hcalendar
|
|
microformats can be disabled by setting parser feature
|
|
<link linkend="RAPTOR-FEATURE-MICROFORMATS:CAPS"><literal>RAPTOR_FEATURE_MICROFORMATS</literal></link>
|
|
to 0
|
|
or using
|
|
<link linkend="raptor-set-parser-strict"><function>raptor_set_parser_strict()</function></link>
|
|
with a value of 1.
|
|
</para>
|
|
|
|
<para>The GRDDL parser by default will try an XML parser on the
|
|
content followed by a lax HTML parser. This can be disabled by
|
|
setting parser feature
|
|
<link linkend="RAPTOR-FEATURE-HTML-TAG-SOUP:CAPS"><literal>RAPTOR_FEATURE_HTML_TAG_SOUP</literal></link>
|
|
to 0
|
|
or using
|
|
<link linkend="raptor-set-parser-strict"><function>raptor_set_parser_strict()</function></link>
|
|
with a value of 1.
|
|
</para>
|
|
|
|
<para>The GRDDL parser by default will try to look for an HTML
|
|
<link> tag that points to RDF/XML. This can be disabled by
|
|
setting parser feature
|
|
<link linkend="RAPTOR-FEATURE-HTML-LINK:CAPS"><literal>RAPTOR_FEATURE_HTML_LINK</literal></link>
|
|
to 0
|
|
or using
|
|
<link linkend="raptor-set-parser-strict"><function>raptor_set_parser_strict()</function></link>
|
|
with a value of 1.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-guess">
|
|
<title>Guess parser (name <literal>guess</literal>)</title>
|
|
<para>
|
|
This is a special parser that picks the actual parser to use based
|
|
on the content type, the content bytes or the content identifier. The
|
|
content name can be either from a local file or from a URI.
|
|
</para>
|
|
|
|
<para>If the protocol that delivered the content (such as HTTP)
|
|
provided a <emphasis>Content Type</emphasis> (aka MIME Type) then
|
|
this will be the primary means for identifying th ecotnent.
|
|
</para>
|
|
|
|
<para>The secondary means to identify the content are the bytes of
|
|
the content (if available), otherwise the content identifier is used,
|
|
which is the least reliable.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-ntriples">
|
|
<title>N-Triples parser (name <literal>ntriples</literal>)</title>
|
|
|
|
<para>A parser for the
|
|
<ulink url="http://www.w3.org/TR/rdf-testcases/#ntriples">N-Triples</ulink>
|
|
syntax as used by the
|
|
<ulink url="http://www.w3.org/2001/sw/RDFCore/">W3C RDF Core working group</ulink>
|
|
for the <ulink url="http://www.w3.org/TR/rdf-testcases/">RDF Test Cases</ulink>.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-rdfa">
|
|
<title>RDFa parser - (name <literal>rdfa</literal>)</title>
|
|
<para>
|
|
A parser for the
|
|
<ulink url="http://www.w3.org/TR/2008/CR-rdfa-syntax-20080620/">RDFa</ulink>
|
|
syntax, W3C Candidate Recommendation 20 June 2008 which allows reading XHTML
|
|
and XML as RDF triples by interpreting attributes on elements to
|
|
describe which ones have RDF semantics. This is implemented via
|
|
<ulink url="http://rdfa.digitalbazaar.com/librdfa/">librdfa</ulink>
|
|
linked inside Raptor, written by Manu Sporny of Digital Bazaar,
|
|
and licensed with the same license as Raptor.
|
|
</para>
|
|
|
|
<para>
|
|
This parser is beta quality and passes all but 4 of the RDFa tests as
|
|
of Raptor 1.4.18.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-rdfxml">
|
|
<title>RDF/XML parser - default (name <literal>rdfxml</literal>)</title>
|
|
<para>
|
|
A parser for the standard
|
|
<ulink url="http://www.w3.org/TR/rdf-syntax-grammar/">RDF/XML syntax</ulink>
|
|
as revised by the
|
|
<ulink url="http://www.w3.org/2001/sw/RDFCore/">W3C RDF Core working group</ulink>.</para>
|
|
|
|
<para>This is the default parser in Raptor.</para>
|
|
|
|
<para>Features of this parser:</para>
|
|
<itemizedlist>
|
|
<listitem><para>Fully handles the <ulink url="http://www.w3.org/TR/rdf-syntax-grammar/">RDF/XML syntax updates</ulink> for <ulink url="http://www.w3.org/TR/xmlbase/">XML Base</ulink>, <literal>xml:lang</literal>, RDF datatyping and Collections.</para></listitem>
|
|
|
|
<listitem><para>Handles all RDF vocabularies such as <ulink url="http://www.foaf-project.org/">FOAF</ulink>, <ulink url="http://www.purl.org/rss/1.0/">RSS 1.0</ulink>, <ulink url="http://dublincore.org/">Dublin Core</ulink>, <ulink url="http://www.w3.org/TR/owl-features/">OWL</ulink>, <ulink url="http://usefulinc.com/doap">DOAP</ulink></para></listitem>
|
|
|
|
<listitem><para>Handles <literal>rdf:resource</literal> / <literal>resource</literal> attributes</para></listitem>
|
|
|
|
<listitem><para>Uses <ulink url="http://expat.sourceforge.net/">expat</ulink> and/or (GNOME) <ulink url="http://xmlsoft.org/">libxml</ulink> XML parsers as available or required</para></listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-rss-tag-soup">
|
|
<title>RSS Tag Soup parser (name <literal>rss-tag-soup</literal>)</title>
|
|
|
|
<para>A parser for the multiple XML RSS formats that use the elements
|
|
such as <literal>channel</literal>, <literal>item</literal>,
|
|
<literal>title</literal>, <literal>description</literal>
|
|
in different ways.
|
|
This includes support for the Atom 1.0 syndication format defined in IETF
|
|
<ulink url="http://www.ietf.org/rfc/rfc4287.txt">RFC 4287</ulink>
|
|
</para>
|
|
|
|
<para>The parser attempts to turn the input into
|
|
<ulink url="http://www.purl.org/rss/1.0/">RSS 1.0</ulink>
|
|
RDF triples in the RSS 1.0 model of a syndication feed.
|
|
This includes triples for RSS Enclosures.
|
|
</para>
|
|
|
|
<para>
|
|
True <ulink url="http://www.purl.org/rss/1.0/">RSS 1.0</ulink> when
|
|
wanted to be used as a full RDF vocabulary, is best parsed by the
|
|
RDF/XML parser (name <literal>rdfxml</literal>).
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-trig">
|
|
<title>TRiG parser (name <literal>trig</literal>)</title>
|
|
|
|
<para>A parser for the
|
|
<ulink url="http://www.wiwiss.fu-berlin.de/suhl/bizer/TriG/Spec/">TriG - Turtle with Named Graphs</ulink>
|
|
syntax.
|
|
</para>
|
|
|
|
<para>The parser is alpha quality and may not support the entire TRiG
|
|
specification.</para>
|
|
|
|
</section>
|
|
|
|
|
|
<section id="parser-turtle">
|
|
<title>Turtle Terse RDF Triple Language parser (name <literal>turtle</literal>)</title>
|
|
|
|
<para>A parser for the
|
|
<ulink url="http://www.dajobe.org/2004/01/turtle/">Turtle Terse RDF Triple Language</ulink>
|
|
syntax, designed as a useful subset of
|
|
<ulink url="http://www.w3.org/DesignIssues/Notation3">Notation 3</ulink>.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
</chapter>
|
|
|
|
<!--
|
|
Local variables:
|
|
mode: sgml
|
|
sgml-parent-document: ("raptor-docs.xml" "book" "part")
|
|
End:
|
|
-->
|