mirror of
				https://github.com/cookiengineer/audacity
				synced 2025-10-26 07:13:49 +01:00 
			
		
		
		
	
		
			
				
	
	
		
			213 lines
		
	
	
		
			7.9 KiB
		
	
	
	
		
			XML
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
<chapter id="raptor-parsers">
<title>Parsers in Raptor (syntax to triples)</title>

<section id="raptor-parsers-intro">
<title>Introduction</title>

<para>This section describes the parsers that can be compiled into
Raptor and their features.  The exact set of parsers supported varies
between builds of Raptor and can be queried at run-time with the
<link linkend="raptor-parsers-enumerate"><function>raptor_parsers_enumerate</function></link>
and
<link linkend="raptor-syntaxes-enumerate"><function>raptor_syntaxes_enumerate</function></link>
functions.</para>

<para>The optional features that may be set on parsers can also
be queried at run-time with the
<link linkend="raptor-features-enumerate"><function>raptor_features_enumerate</function></link>
function.</para>

</section>


<section id="parser-grddl">
<title>GRDDL parser (name <literal>grddl</literal>)</title>
<para>A parser for
<ulink url="http://www.w3.org/TR/2007/PR-grddl-20070716/">Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</ulink>,
W3C Proposed Recommendation of 2007-07-16.  GRDDL allows XHTML
and XML to be read as RDF triples: profiles in the document declare
XSLT transforms that turn the XHTML or XML content into RDF/XML or
another RDF syntax, which can then be parsed.</para>

<para>The GRDDL parser is rather complex and differs from the other
parsers in that it retrieves URIs, reads HTML documents (possibly
with errors), transforms the documents with XSLT and turns the result
into a single graph.  The default configuration of the GRDDL parser
also reads microformats (hcard, hcalendar) and follows &lt;link&gt;
tags that point to RDF/XML.  Parts of the GRDDL process can be
altered by configuration, as described below.
</para>

<para>The URIs that are processed during GRDDL operations can be checked
and skipped if required using a handler set with the
<link linkend="raptor-parser-set-uri-filter"><function>raptor_parser_set_uri_filter()</function></link>
function.  If the handler returns non-0, the URI is rejected.
This uses
<link linkend="raptor-www-set-uri-filter"><function>raptor_www_set_uri_filter()</function></link>
internally.
</para>
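
<para>The return convention of such a handler can be illustrated with a
minimal sketch.  It is not Raptor code: the function name
<literal>allow_only_prefix</literal> is hypothetical, and the URI is passed
as a plain C string (an assumption made so that the sketch compiles without
the library), whereas a real handler receives a Raptor URI object.  Only the
rule that 0 accepts and non-0 rejects is taken from the text above.</para>
<programlisting><![CDATA[
```c
#include <string.h>

/* Hypothetical URI filter handler (illustration only).
 * user_data carries the URI prefix that is allowed; the URI is a plain
 * string here rather than a Raptor URI object.
 * Returns 0 to accept the URI and non-0 to reject it. */
int
allow_only_prefix(void *user_data, const char *uri_string)
{
  const char *allowed_prefix = (const char *)user_data;

  if(!uri_string || !allowed_prefix)
    return 1; /* nothing to compare: reject */

  /* Accept only URIs that start with the allowed prefix */
  if(!strncmp(uri_string, allowed_prefix, strlen(allowed_prefix)))
    return 0;

  return 1; /* any other URI is rejected and would be skipped */
}
```
]]></programlisting>
<para>With a prefix of <literal>http://example.org/</literal> as the user
data, a transform at <literal>http://example.org/profile.xsl</literal> would
be accepted and a URI on any other host would be rejected.</para>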

<para>If the feature
<link linkend="RAPTOR-FEATURE-WWW-TIMEOUT:CAPS"><literal>RAPTOR_FEATURE_WWW_TIMEOUT</literal></link>
is set to a number greater than 0, that value is used as the timeout
in seconds for retrieving URIs during GRDDL processing.
This uses
<link linkend="raptor-www-set-connection-timeout"><function>raptor_www_set_connection_timeout()</function></link>
internally.
</para>

<para>The hardcoded support for hcard and hcalendar
microformats can be disabled by setting parser feature
<link linkend="RAPTOR-FEATURE-MICROFORMATS:CAPS"><literal>RAPTOR_FEATURE_MICROFORMATS</literal></link>
to 0
or using
<link linkend="raptor-set-parser-strict"><function>raptor_set_parser_strict()</function></link>
with a value of 1.
</para>

<para>By default the GRDDL parser tries an XML parser on the
content and then falls back to a lax HTML parser.  The fallback can
be disabled by setting parser feature
<link linkend="RAPTOR-FEATURE-HTML-TAG-SOUP:CAPS"><literal>RAPTOR_FEATURE_HTML_TAG_SOUP</literal></link>
to 0
or using
<link linkend="raptor-set-parser-strict"><function>raptor_set_parser_strict()</function></link>
with a value of 1.
</para>

<para>By default the GRDDL parser looks for an HTML
&lt;link&gt; tag that points to RDF/XML.  This can be disabled by
setting parser feature
<link linkend="RAPTOR-FEATURE-HTML-LINK:CAPS"><literal>RAPTOR_FEATURE_HTML_LINK</literal></link>
to 0
or using
<link linkend="raptor-set-parser-strict"><function>raptor_set_parser_strict()</function></link>
with a value of 1.
</para>
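
<para>The relationship between the three features above and strict mode
can be modelled in a few lines.  This is a sketch and not Raptor's internal
structure: the type <literal>grddl_options</literal> and both functions are
hypothetical names.  It also assumes that a strict value of 0 turns the
features back on, which the text above does not state.</para>
<programlisting><![CDATA[
```c
/* Hypothetical model of the three GRDDL-related parser features
 * (illustration only, not a Raptor data structure). */
typedef struct {
  int microformats;   /* models RAPTOR_FEATURE_MICROFORMATS */
  int html_tag_soup;  /* models RAPTOR_FEATURE_HTML_TAG_SOUP */
  int html_link;      /* models RAPTOR_FEATURE_HTML_LINK */
} grddl_options;

/* Defaults as documented: all three behaviours are enabled */
grddl_options
grddl_options_default(void)
{
  grddl_options o = { 1, 1, 1 };
  return o;
}

/* Models the documented effect of strict mode: a value of 1 turns
 * all three features off.  Re-enabling them for a value of 0 is an
 * assumption of this sketch. */
void
grddl_options_set_strict(grddl_options *o, int is_strict)
{
  o->microformats = !is_strict;
  o->html_tag_soup = !is_strict;
  o->html_link = !is_strict;
}
```
]]></programlisting>
<para>Setting one feature to 0 changes only that behaviour, while strict
mode switches off all three at once.</para>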

</section>


<section id="parser-guess">
<title>Guess parser (name <literal>guess</literal>)</title>
<para>
This is a special parser that picks the actual parser to use based
on the content type, the content bytes or the content identifier.  The
content identifier can be the name of a local file or a URI.
</para>

<para>If the protocol that delivered the content (such as HTTP)
provided a <emphasis>Content Type</emphasis> (also known as a MIME Type),
it is the primary means of identifying the content.
</para>

<para>The secondary means of identifying the content is the bytes of
the content, if available.  Otherwise the content identifier is used,
which is the least reliable.
</para>
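
<para>The order of precedence described above can be sketched as follows.
This is an illustration and not Raptor's guessing logic: the function name
<literal>guess_name_sketch</literal>, the MIME types, the byte signatures and
the file suffixes are all assumptions chosen for the example.  Only the
parser names and the ordering (content type, then bytes, then identifier)
come from this chapter.</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if string s ends with suffix, otherwise 0 */
static int
ends_with(const char *s, const char *suffix)
{
  size_t s_len = strlen(s);
  size_t suffix_len = strlen(suffix);

  return s_len >= suffix_len &&
         !strcmp(s + s_len - suffix_len, suffix);
}

/* Hypothetical sketch of the guess order.  Returns a parser name,
 * or NULL when nothing matches.  The tables are deliberately tiny
 * and crude; they are assumptions, not Raptor's rules. */
const char *
guess_name_sketch(const char *mime_type,
                  const char *buffer, size_t len,
                  const char *identifier)
{
  /* 1. Content type: the primary means of identification */
  if(mime_type) {
    if(!strcmp(mime_type, "application/rdf+xml"))
      return "rdfxml";
    if(!strcmp(mime_type, "text/turtle"))
      return "turtle";
  }

  /* 2. Content bytes: the secondary means, when available */
  if(buffer && len) {
    if(len >= 5 && !memcmp(buffer, "<?xml", 5))
      return "rdfxml";
    if(len >= 7 && !memcmp(buffer, "@prefix", 7))
      return "turtle";
  }

  /* 3. Content identifier: the least reliable, used last */
  if(identifier) {
    if(ends_with(identifier, ".rdf"))
      return "rdfxml";
    if(ends_with(identifier, ".nt"))
      return "ntriples";
    if(ends_with(identifier, ".ttl"))
      return "turtle";
    if(ends_with(identifier, ".trig"))
      return "trig";
  }

  return NULL;
}
```
]]></programlisting>
<para>In this sketch a Content Type of <literal>text/turtle</literal> wins
even if the identifier ends in <literal>.nt</literal>, which mirrors the
point that the identifier is the least reliable signal.</para>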

</section>


<section id="parser-ntriples">
<title>N-Triples parser (name <literal>ntriples</literal>)</title>

<para>A parser for the
<ulink url="http://www.w3.org/TR/rdf-testcases/#ntriples">N-Triples</ulink>
syntax as used by the
<ulink url="http://www.w3.org/2001/sw/RDFCore/">W3C RDF Core working group</ulink>
for the <ulink url="http://www.w3.org/TR/rdf-testcases/">RDF Test Cases</ulink>.
</para>
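
<para>To show the shape of the syntax, the following sketch splits one
N-Triples line into its subject, predicate and object terms.  It is not
Raptor's parser: the function names are hypothetical, the terms are copied
with their delimiters intact, escapes are not decoded, and the whitespace
rules of the grammar are not fully enforced.</para>
<programlisting><![CDATA[
```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies one term starting at *p into out (capacity out_len) and
 * advances *p past it.  Returns 0 on success, non-0 on error. */
static int
read_term(const char **p, char *out, size_t out_len)
{
  const char *start;
  const char *end;
  size_t n;

  while(**p == ' ' || **p == '\t')
    (*p)++;
  start = *p;

  if(*start == '<') {
    /* URI reference: runs to the closing '>' */
    end = strchr(start, '>');
    if(!end)
      return 1;
    end++;
  } else if(*start == '"') {
    /* Literal: runs to the closing unescaped '"' */
    end = start + 1;
    while(*end && *end != '"') {
      if(*end == '\\' && end[1])
        end++; /* skip the escaped character */
      end++;
    }
    if(!*end)
      return 1;
    end++;
    if(*end == '@') {
      /* optional language tag such as @en or @en-gb */
      end++;
      while(islower((unsigned char)*end) ||
            isdigit((unsigned char)*end) || *end == '-')
        end++;
    } else if(end[0] == '^' && end[1] == '^' && end[2] == '<') {
      /* optional datatype URI */
      end = strchr(end, '>');
      if(!end)
        return 1;
      end++;
    }
  } else if(start[0] == '_' && start[1] == ':') {
    /* Blank node such as _:a */
    end = start + 2;
    while(isalnum((unsigned char)*end))
      end++;
  } else
    return 1;

  n = (size_t)(end - start);
  if(n >= out_len)
    return 1;
  memcpy(out, start, n);
  out[n] = '\0';
  *p = end;
  return 0;
}

/* Splits "subject predicate object ." into three buffers, each of
 * capacity term_len.  Returns 0 on success, non-0 on error. */
int
split_ntriples_line(const char *line, char *subject, char *predicate,
                    char *object, size_t term_len)
{
  const char *p = line;

  if(read_term(&p, subject, term_len) ||
     read_term(&p, predicate, term_len) ||
     read_term(&p, object, term_len))
    return 1;

  /* The subject cannot be a literal; the predicate must be a URI */
  if(subject[0] == '"' || predicate[0] != '<')
    return 1;

  /* A triple ends with '.' after optional whitespace */
  while(*p == ' ' || *p == '\t')
    p++;
  return *p != '.';
}
```
]]></programlisting>
<para>Each line of N-Triples is one complete triple, which is why a
line-at-a-time approach like this is possible for the syntax.</para>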

</section>


<section id="parser-rdfxml">
<title>RDF/XML parser - default (name <literal>rdfxml</literal>)</title>
<para>
A parser for the standard
<ulink url="http://www.w3.org/TR/rdf-syntax-grammar/">RDF/XML syntax</ulink>
as revised by the
<ulink url="http://www.w3.org/2001/sw/RDFCore/">W3C RDF Core working group</ulink>.</para>

<para>This is the default parser in Raptor.</para>

<para>Features of this parser:</para>
<itemizedlist>
<listitem><para>Fully handles the <ulink url="http://www.w3.org/TR/rdf-syntax-grammar/">RDF/XML syntax updates</ulink> for <ulink url="http://www.w3.org/TR/xmlbase/">XML Base</ulink>, <literal>xml:lang</literal>, RDF datatyping and Collections.</para></listitem>

<listitem><para>Handles all RDF vocabularies such as <ulink url="http://www.foaf-project.org/">FOAF</ulink>, <ulink url="http://www.purl.org/rss/1.0/">RSS 1.0</ulink>, <ulink url="http://dublincore.org/">Dublin Core</ulink>, <ulink url="http://www.w3.org/TR/owl-features/">OWL</ulink> and <ulink url="http://usefulinc.com/doap">DOAP</ulink>.</para></listitem>

<listitem><para>Handles <literal>rdf:resource</literal> / <literal>resource</literal> attributes.</para></listitem>

<listitem><para>Uses <ulink url="http://expat.sourceforge.net/">expat</ulink> and/or (GNOME) <ulink url="http://xmlsoft.org/">libxml</ulink> XML parsers as available or required.</para></listitem>

</itemizedlist>

</section>


<section id="parser-rss-tag-soup">
<title>RSS Tag Soup parser (name <literal>rss-tag-soup</literal>)</title>

<para>A parser for the multiple XML RSS formats that use elements
such as <literal>channel</literal>, <literal>item</literal>,
<literal>title</literal> and <literal>description</literal>
in different ways.
This includes support for the Atom 1.0 syndication format defined in IETF
<ulink url="http://www.ietf.org/rfc/rfc4287.txt">RFC 4287</ulink>.
</para>

<para>The parser attempts to turn the input into
<ulink url="http://www.purl.org/rss/1.0/">RSS 1.0</ulink>
RDF triples in the RSS 1.0 model of a syndication feed.
This includes triples for RSS Enclosures.
</para>

<para>
True <ulink url="http://www.purl.org/rss/1.0/">RSS 1.0</ulink>, when
it is to be used as a full RDF vocabulary, is best parsed by the
RDF/XML parser (name <literal>rdfxml</literal>).
</para>

</section>


<section id="parser-trig">
<title>TRiG parser (name <literal>trig</literal>)</title>

<para>A parser for the
<ulink url="http://www.wiwiss.fu-berlin.de/suhl/bizer/TriG/Spec/">TriG - Turtle with Named Graphs</ulink>
syntax.
</para>

<para>The parser is alpha quality and may not support the entire TRiG
specification.</para>

</section>


<section id="parser-turtle">
<title>Turtle Terse RDF Triple Language parser (name <literal>turtle</literal>)</title>

<para>A parser for the
<ulink url="http://www.dajobe.org/2004/01/turtle/">Turtle Terse RDF Triple Language</ulink>
syntax, designed as a useful subset of
<ulink url="http://www.w3.org/DesignIssues/Notation3">Notation 3</ulink>.
</para>

</section>


</chapter>

<!--
Local variables:
mode: sgml
sgml-parent-document: ("raptor-docs.xml" "book" "part")
End:
-->