mirror of
				https://github.com/cookiengineer/audacity
				synced 2025-10-26 07:13:49 +01:00 
			
		
		
		
	
		
			
				
	
	
		
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" 
 | |
|                "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
 | |
| <chapter id="tutorial-parsing" xmlns:xi="http://www.w3.org/2003/XInclude">
 | |
| <title>Parsing syntaxes to RDF Triples</title>
 | |
| 
 | |
| <section id="tutorial-parsing-intro">
 | |
| <title>Introduction</title>
 | |
| 
 | |
| <para>
 | |
| The typical sequence of operations to parse is to create a parser
 | |
| object, set various callbacks and features, start the parsing, send
 | |
| some syntax content to the parser object, finish the parsing and
 | |
| destroy the parser object.</para>
 | |
| 
 | |
| <para>Several parts of this process are optional, including actually
 | |
| using the triple results; parsing without using them is useful as a
 | |
| syntax check.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| <section id="tutorial-parser-create">
 | |
| <title>Create the Parser object</title>
 | |
| 
 | |
| <para>The parser can be created directly from a known name such as
 | |
| <literal>rdfxml</literal> for the W3C Recommendation RDF/XML syntax:
 | |
| <programlisting>
 | |
|   raptor_parser* rdf_parser;
 | |
| 
 | |
|   rdf_parser = raptor_new_parser("rdfxml");
 | |
| </programlisting>
 | |
| or the name can be discovered from an <emphasis>enumeration</emphasis>
 | |
| as discussed in <link linkend="tutorial-querying-functionality">Querying Functionality</link>.
 | |
| </para>
 | |
| 
 | |
| <para>The parser can also be created by identifying the syntax by a
 | |
| URI, specifying the syntax by a MIME Type, providing an identifier for
 | |
| the content such as filename or URI string or giving some initial
 | |
| content bytes that can be used to guess.
 | |
| Using the
 | |
| <link linkend="raptor-new-parser-for-content"><function>raptor_new_parser_for_content()</function></link>
 | |
| function, all of these can be given as optional parameters, using NULL
 | |
| or 0 for undefined parameters.  The constructor will then use as much of
 | |
| this information as possible.
 | |
| </para>
 | |
| <programlisting>
 | |
|   raptor_parser* rdf_parser;
 | |
| </programlisting>
 | |
| 
 | |
| <para>Create a parser for content with the MIME Type for RDF/XML,
 | |
| <literal>application/rdf+xml</literal>:
 | |
| <programlisting>
 | |
|   rdf_parser = raptor_new_parser_for_content(NULL, "application/rdf+xml", NULL, 0, NULL);
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>Create a parser that can read a syntax identified by the URI
 | |
| for Turtle <literal>http://www.dajobe.org/2004/01/turtle/</literal>,
 | |
| which has no registered MIME Type at this date:
 | |
| <programlisting>
 | |
|   syntax_uri = raptor_new_uri((const unsigned char*)"http://www.dajobe.org/2004/01/turtle/");
 | |
|   rdf_parser = raptor_new_parser_for_content(syntax_uri, NULL, NULL, 0, NULL);
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>Create a parser that recognises the identifier <literal>foo.rss</literal>:
 | |
| <programlisting>
 | |
|   rdf_parser = raptor_new_parser_for_content(NULL, NULL, NULL, 0, (const unsigned char*)"foo.rss");
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>Create a parser that recognises the content in <emphasis>buffer</emphasis>:
 | |
| <programlisting>
 | |
|   rdf_parser = raptor_new_parser_for_content(NULL, NULL, buffer, len, NULL);
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>Any of the constructor calls can return NULL if no matching
 | |
| parser could be found, or the construction failed in another way.
 | |
| </para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-features">
 | |
| <title>Parser features</title>
 | |
| 
 | |
| <para>There are several options that can be set on parsers, called
 | |
| <emphasis>features</emphasis>.  The exact list of features can be
 | |
| found via
 | |
| <link linkend="tutorial-querying-functionality">Querying Functionality</link>
 | |
| or in the API reference for 
 | |
| <link linkend="raptor-set-feature"><function>raptor_set_feature()</function></link>.  (This should be properly called <function>raptor_parser_set_feature()</function> as
 | |
| it only applies to <literal>raptor_parser</literal> objects).
 | |
| </para>
 | |
| 
 | |
| <para>Features are integer enumerations of the
 | |
| <link linkend="raptor-feature"><type>raptor_feature</type></link> enum and have values
 | |
| that are either integers (often acting as booleans) or strings.
 | |
| The two functions that set features are:
 | |
| <programlisting>
 | |
|   /* Set an integer (or boolean) valued feature */
 | |
|   raptor_set_feature(rdf_parser, feature, 1);
 | |
| 
 | |
|   /* Set a string valued feature */
 | |
|   raptor_set_feature_string(rdf_parser, feature, "abc");
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| There are also two corresponding functions for reading the values of parser
 | |
| features:
 | |
| <link linkend="raptor-get-feature"><function>raptor_get_feature()</function></link>
 | |
| and
 | |
| <link linkend="raptor-get-feature-string"><function>raptor_get_feature_string()</function></link>
 | |
| taking the feature enumeration parameter and returning the integer or string
 | |
| value respectively.
 | |
| </para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-set-triple-handler">
 | |
| <title>Set RDF triple callback handler</title>
 | |
| 
 | |
| <para>The main reason to parse a syntax is to get RDF triples
 | |
| returned and this is done by a callback function which is called
 | |
| with parameters of a user data pointer and the triple itself.
 | |
| The handler is set with
 | |
| <link linkend="raptor-set-statement-handler"><function>raptor_set_statement_handler()</function></link>
 | |
| as follows:
 | |
| <programlisting>
 | |
|   void
 | |
|   triples_handler(void* user_data, const raptor_statement* triple) 
 | |
|   {
 | |
|     /* do something with the triple */
 | |
|   }
 | |
| 
 | |
|   raptor_set_statement_handler(rdf_parser, user_data, triples_handler);
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>It is optional to set a handler function for triples; parsing
 | |
| without one still has some uses, such as just counting triples or validating a syntax.
 | |
| </para>
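<para>The user data pointer is what lets a handler keep state between
calls.  The following is a minimal sketch of a counting handler using
only the C standard library; the <literal>triple_counter</literal> type and
the function name are hypothetical, and the second parameter is declared
as <literal>const void*</literal> as a stand-in so the sketch compiles
without Raptor.  In a real program it would be declared as
<literal>const raptor_statement*</literal>.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user data: a running total of the triples seen so far. */
typedef struct {
  unsigned long count;
} triple_counter;

/* Same shape as the triples handler shown above.  ASSUMPTION: the second
 * parameter is a stand-in; real code would use const raptor_statement*. */
void
count_triples_handler(void* user_data, const void* triple)
{
  triple_counter* counter = (triple_counter*)user_data;

  assert(counter != NULL);
  (void)triple; /* the triple content is not needed just to count */
  counter->count++;
}
```

<para>With the parameter type changed as described, the address of a
<literal>triple_counter</literal> would be passed as the
<emphasis>user_data</emphasis> argument when the handler is set, and
read back after parsing has finished.</para>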
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-set-error-warning-handlers">
 | |
| <title>Set fatal error, error and warning handlers</title>
 | |
| 
 | |
| <para>There are several other callback handlers that can be set
 | |
| on parsers.  These can be set any time before parsing starts.
 | |
| Errors and warnings from parsing can be returned with functions
 | |
| that all take a callback of type
 | |
| <link linkend="raptor-message-handler"><type>raptor_message_handler</type></link>
 | |
| and signature:
 | |
| <programlisting>
 | |
| void
 | |
| message_handler(void *user_data, raptor_locator* locator, 
 | |
|                 const char *message)
 | |
| {
 | |
|   /* do something with the message */
 | |
| }
 | |
| </programlisting>
 | |
| receiving the user data given, associated location information
 | |
| as a <link linkend="raptor-locator"><type>raptor_locator</type></link>
 | |
| and the error/warning message itself.  The <emphasis>locator</emphasis>
 | |
| structure contains full information on the details of where in the
 | |
| file or URI the message occurred.
 | |
| </para>
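<para>A message handler usually turns the location and message into one
line of text.  The sketch below shows one way to do that formatting with
only the C standard library.  The function name is hypothetical, and it
takes the file name, line and column as plain arguments; it is an
assumption here that a real handler would read these from the
<type>raptor_locator</type> it is given and that negative values mean
<quote>not known</quote>, so check the locator reference before relying
on that.</para>

```c
#include <stdio.h>

/* Format "file:line:column: message" into out, leaving out the parts
 * that are unknown.  ASSUMPTION: a negative line or column means the
 * value is not available.  Returns what snprintf() returns. */
int
format_located_message(char* out, size_t out_size,
                       const char* file, int line, int column,
                       const char* message)
{
  const char* name = file ? file : "(unknown)";

  if(line < 0)
    return snprintf(out, out_size, "%s: %s", name, message);
  if(column < 0)
    return snprintf(out, out_size, "%s:%d: %s", name, line, message);
  return snprintf(out, out_size, "%s:%d:%d: %s", name, line, column, message);
}
```

<para>Keeping the formatting separate from the handler makes it easy to
share between the fatal error, error and warning handlers, which then
differ only in where they send the resulting text.</para>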
 | |
| 
 | |
| <para>The fatal error, error and warning handlers are all set with
 | |
| similar functions that take a handler as follows:
 | |
| <programlisting>
 | |
|   raptor_set_fatal_error_handler(rdf_parser, user_data, fatal_handler);
 | |
| 
 | |
|   raptor_set_error_handler(rdf_parser, user_data, error_handler);
 | |
| 
 | |
|   raptor_set_warning_handler(rdf_parser, user_data, warning_handler);
 | |
| </programlisting>
 | |
| <caution>The program will terminate
 | |
| with <function>abort()</function> if the fatal error handler returns.
 | |
| </caution>
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-set-id-handler">
 | |
| <title>Set the identifier creator handler</title>
 | |
| 
 | |
| <para>Identifiers are created in some parsers by generating them
 | |
| automatically or via hints given in a syntax.  Raptor can customise this
 | |
| process using a user-supplied identifier handler function.
 | |
| For example, in RDF/XML, generated blank node identifiers and those
 | |
| specified with <literal>rdf:nodeID</literal> are passed through this
 | |
| process.  Setting a handler allows the identifier generation mechanism to be
 | |
| fully replaced.  A lighter alternative is to use
 | |
| <link linkend="raptor-set-default-generate-id-parameters"><function>raptor_set_default_generate_id_parameters()</function></link>
 | |
| to adjust the default algorithm for generated identifiers.
 | |
| </para>
 | |
| 
 | |
| <para>It is used as follows:
 | |
| <programlisting>
 | |
|   raptor_generate_id_handler id_handler;
 | |
| 
 | |
|   raptor_set_generate_id_handler(rdf_parser, user_data, id_handler);
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>The <emphasis>id_handler</emphasis> takes the following signature:
 | |
| <programlisting>
 | |
| unsigned char*
 | |
| generate_id_handler(void* user_data, raptor_genid_type type,
 | |
|                     unsigned char* user_id) {
 | |
|    /* return a new generated ID based on user_id (optional) */
 | |
| }
 | |
| </programlisting>
 | |
| where the
 | |
| <link linkend="raptor-genid-type"><type>raptor_genid_type</type></link>
 | |
| provides extra information on the identifier being created and
 | |
| <emphasis>user_id</emphasis> an optional user-supplied identifier,
 | |
| such as the value of a <literal>rdf:nodeID</literal> in RDF/XML.
 | |
| </para>
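<para>As an illustration of what such a handler might compute, the
following sketch builds a new identifier either from a counter or from a
user-supplied identifier.  It uses only the C standard library; the
function name, the <literal>g_</literal> and <literal>u_</literal>
prefixes and the counter argument are all hypothetical.  Who owns and
frees the returned string and the <emphasis>user_id</emphasis> inside a
real handler is not stated here and should be taken from the API
reference.</para>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated identifier, or NULL on allocation failure.
 * Generated IDs look like "g_1", "g_2", ...; IDs based on a
 * user-supplied value look like "u_<user_id>".  The two prefixes keep
 * the two kinds from ever colliding.  (Hypothetical naming scheme.) */
unsigned char*
make_blank_node_id(unsigned long* counter, const unsigned char* user_id)
{
  size_t size;
  char* id;

  if(user_id) {
    size = strlen((const char*)user_id) + 3; /* "u_" + user_id + NUL */
    id = (char*)malloc(size);
    if(!id)
      return NULL;
    snprintf(id, size, "u_%s", (const char*)user_id);
  } else {
    size = 32; /* "g_" + decimal counter + NUL; ample for unsigned long */
    id = (char*)malloc(size);
    if(!id)
      return NULL;
    snprintf(id, size, "g_%lu", ++(*counter));
  }
  return (unsigned char*)id;
}
```

<para>A handler with the signature shown above could keep the counter in
its <emphasis>user_data</emphasis> and delegate to a helper like this
one.</para>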
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-set-namespace-handler">
 | |
| <title>Set namespace declared handler</title>
 | |
| 
 | |
| <para>Raptor can report when namespace prefix/URIs are declared
 | |
| while parsing a syntax, such as those in XML, RDF/XML or Turtle.
 | |
| A handler function can be set to receive these declarations using
 | |
| the namespace handler method.
 | |
| <programlisting>
 | |
|   raptor_namespace_handler namespaces_handler;
 | |
| 
 | |
|   raptor_set_namespace_handler(rdf_parser, user_data, namespaces_handler);
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>The <emphasis>namespaces_handler</emphasis> takes the following signature:
 | |
| <programlisting>
 | |
| void
 | |
| namespaces_handler(void* user_data, raptor_namespace *nspace) {
 | |
|   /* do something with the namespace */
 | |
| }
 | |
| </programlisting>
 | |
| <note>This may be called multiple times with the same namespace,
 | |
| if the namespace is declared inside different XML sub-trees.
 | |
| </note>
 | |
| </para>
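<para>Because the same declaration can arrive more than once, an
application that wants each prefix/URI pair only once has to remove
duplicates itself.  The sketch below shows one simple way, using a
fixed-size table and only the C standard library.  The
<literal>ns_table</literal> type, its limits and the function name are
hypothetical, and the prefix and URI are taken as plain strings; a real
handler would first have to extract them from the
<type>raptor_namespace</type> it receives.</para>

```c
#include <string.h>

#define NS_TABLE_MAX 32   /* hypothetical limit on distinct declarations */
#define NS_FIELD_MAX 256  /* hypothetical limit on prefix / URI length */

typedef struct {
  size_t count;
  char prefix[NS_TABLE_MAX][NS_FIELD_MAX];
  char uri[NS_TABLE_MAX][NS_FIELD_MAX];
} ns_table;

/* Record a prefix/URI pair.  Returns 1 if it was new, 0 if it had been
 * seen before, and -1 if it cannot be stored.  A NULL prefix (as for a
 * default namespace) is stored as the empty string. */
int
ns_table_add(ns_table* table, const char* prefix, const char* uri)
{
  size_t i;

  if(!prefix)
    prefix = "";
  if(!uri)
    uri = "";
  if(strlen(prefix) >= NS_FIELD_MAX || strlen(uri) >= NS_FIELD_MAX)
    return -1;

  for(i = 0; i < table->count; i++) {
    if(!strcmp(table->prefix[i], prefix) && !strcmp(table->uri[i], uri))
      return 0;
  }
  if(table->count == NS_TABLE_MAX)
    return -1;

  strcpy(table->prefix[table->count], prefix);
  strcpy(table->uri[table->count], uri);
  table->count++;
  return 1;
}
```

<para>The table compares the pair, not just the prefix, so a prefix that
is rebound to a different URI in another XML sub-tree is still
recorded.</para>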
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parse-strictness">
 | |
| <title>Set the parsing strictness</title>
 | |
| <para>
 | |
| <link linkend="raptor-set-parser-strict"><function>raptor_set_parser_strict()</function></link>
 | |
| allows setting of the parser strictness flag.  The default is lax parsing,
 | |
| which accepts older or deprecated syntax forms but may generate warnings.  Setting
 | |
| the flag to non-0 (true) causes parser errors to be generated in these cases.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-content">
 | |
| <title>Provide syntax content to parse</title>
 | |
| 
 | |
| <para>The operation of turning syntax into RDF triples has several
 | |
| alternatives from functions that do most of the work starting from a
 | |
| URI to functions that allow passing in data buffers.</para>
 | |
| 
 | |
| <note>
 | |
| <title>Parsing and MIME Types</title> 
 | |
| The MIME Type of the retrieved content is not used to choose
 | |
| a parser unless the parser is of type <literal>guess</literal>.
 | |
| The guess parser will send an <literal>Accept:</literal> header
 | |
| for all known parser syntax MIME Types (if a URI request is made)
 | |
| and based on the response, including the identifiers used,
 | |
| pick the appropriate parser to execute.  See
 | |
| <link linkend="raptor-guess-parser-name"><function>raptor_guess_parser_name()</function></link>
 | |
| for a full discussion of the inputs to the guessing.
 | |
| </note>
 | |
| 
 | |
| 
 | |
| <section id="parse-from-uri">
 | |
| <title>Parse the content from a URI (<link linkend="raptor-parse-uri"><function>raptor_parse_uri()</function></link>)</title>
 | |
| 
 | |
| <para>The URI is resolved and the content read from it and passed to
 | |
| the parser:
 | |
| <programlisting>
 | |
|   raptor_parse_uri(rdf_parser, uri, base_uri);
 | |
| </programlisting>
 | |
| The <emphasis>base_uri</emphasis> is optional (can be
 | |
| <literal>NULL</literal>) and will default to the
 | |
| <emphasis>uri</emphasis>.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="parse-from-www">
 | |
| <title>Parse the content of a URI using an existing WWW connection (<link linkend="raptor-parse-uri-with-connection"><function>raptor_parse_uri_with_connection()</function></link>)</title>
 | |
| 
 | |
| <para>The URI is resolved using an existing WWW connection (for
 | |
| example a libcurl CURL handle) to allow for any existing
 | |
| WWW configuration to be reused.  See
 | |
| <link linkend="raptor-www-new-with-connection"><function>raptor_www_new_with_connection</function></link>
 | |
| for full details of how this works.   The content is then read from the
 | |
| result of resolving the URI:
 | |
| <programlisting>
 | |
|   raptor_parse_uri_with_connection(rdf_parser, uri, base_uri, connection);
 | |
| </programlisting>
 | |
| The <emphasis>base_uri</emphasis> is optional (can be
 | |
| <literal>NULL</literal>) and will default to the
 | |
| <emphasis>uri</emphasis>.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="parse-from-filehandle">
 | |
| <title>Parse the content of a C <literal>FILE*</literal> (<link linkend="raptor-parse-file-stream"><function>raptor_parse_file_stream()</function></link>)</title>
 | |
| 
 | |
| <para>Parsing can read from a C STDIO file handle:
 | |
| <programlisting>
 | |
|   stream = fopen(filename, "rb");
 | |
|   if(stream) { /* only parse if the file could be opened */
 | |
|     raptor_parse_file_stream(rdf_parser, stream, filename, base_uri);
 | |
|     fclose(stream);
 | |
|   }
 | |
| </programlisting>
 | |
| This function can take an optional <emphasis>filename</emphasis> which
 | |
| is used in locator error messages.
 | |
| The <emphasis>base_uri</emphasis> may be required by some parsers
 | |
| and if <literal>NULL</literal> will cause the parsing to fail.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="parse-from-file-uri">
 | |
| <title>Parse the content of a file URI (<link linkend="raptor-parse-file"><function>raptor_parse_file()</function></link>)</title>
 | |
| 
 | |
| <para>Parsing can read from a URI known to be a <literal>file:</literal> URI:
 | |
| <programlisting>
 | |
|   raptor_parse_file(rdf_parser, file_uri, base_uri);
 | |
| </programlisting>
 | |
| This function requires that the <emphasis>file_uri</emphasis> is
 | |
| a file URI, that is 
 | |
| <literal>raptor_uri_uri_string_is_file_uri( raptor_uri_as_string( file_uri) )</literal>
 | |
| must be true.
 | |
| The <emphasis>base_uri</emphasis> may be required by some parsers
 | |
| and if <literal>NULL</literal> will cause the parsing to fail.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="parse-from-chunks">
 | |
| <title>Parse chunks of syntax content provided by the application  (<link linkend="raptor-start-parse"><function>raptor_start_parse()</function></link> and <link linkend="raptor-parse-chunk"><function>raptor_parse_chunk()</function></link>)</title>
 | |
| 
 | |
| <para>
 | |
| <programlisting>
 | |
|   raptor_start_parse(rdf_parser, base_uri);
 | |
|   while(/* not finished getting content */) {
 | |
|     unsigned char *buffer;
 | |
|     size_t buffer_len;
 | |
|     /* obtain some syntax content in buffer of size buffer_len bytes */
 | |
|     raptor_parse_chunk(rdf_parser, buffer, buffer_len, 0);
 | |
|   }
 | |
|   raptor_parse_chunk(rdf_parser, NULL, 0, 1); /* no data and is_end = 1 */
 | |
| </programlisting>
 | |
| The <emphasis>base_uri</emphasis> argument to 
 | |
| <link linkend="raptor-start-parse"><function>raptor_start_parse()</function></link>
 | |
| may be required by some parsers
 | |
| and if <literal>NULL</literal> will cause the parsing to fail.
 | |
| </para>
 | |
| 
 | |
| <para>On the last
 | |
| <link linkend="raptor-parse-chunk"><function>raptor_parse_chunk()</function></link>
 | |
| call, or after the loop is ended, the <literal>is_end</literal>
 | |
| parameter must be set to non-0.  Content can be passed with the
 | |
| final call.  If no content is present at the end (such as in
 | |
| some kind of <quote>end of file</quote> situation), then a 0-length
 | |
| buffer_len or NULL buffer can be used.</para>
 | |
| 
 | |
| <para>The minimal case is an entire parse in one chunk as follows:</para>
 | |
| <programlisting>
 | |
|   raptor_start_parse(rdf_parser, base_uri);
 | |
|   raptor_parse_chunk(rdf_parser, buffer, buffer_len, 1); /* is_end = 1 */
 | |
| </programlisting>
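<para>The loop above leaves open how the content is obtained.  As one
possibility, the sketch below reads a C <literal>FILE*</literal> in
fixed-size chunks and follows the same calling pattern: data chunks
with <literal>is_end</literal> set to 0, then a final empty call with
<literal>is_end</literal> set to 1.  It uses only the C standard
library.  The <literal>chunk_sink</literal> callback type, the
<literal>chunk_stats</literal> type and both function names are
hypothetical; the sink takes the place of the parser call so the sketch
can be compiled and tried without Raptor, and it is assumed to return
non-0 on failure.</para>

```c
#include <stdio.h>

/* Hypothetical callback with the same argument pattern as the chunk
 * call above: user data, buffer, length and an is_end flag.
 * ASSUMPTION: it returns non-0 on failure. */
typedef int (*chunk_sink)(void* user_data, const unsigned char* buffer,
                          size_t len, int is_end);

/* Read stream to the end in fixed-size chunks, passing each one to the
 * sink.  Returns 0 on success and non-0 on a read or sink failure. */
int
feed_stream_in_chunks(FILE* stream, chunk_sink sink, void* user_data)
{
  unsigned char buffer[4096];
  size_t len;

  while((len = fread(buffer, 1, sizeof(buffer), stream)) > 0) {
    if(sink(user_data, buffer, len, 0))
      return 1;
  }
  if(ferror(stream))
    return 1;

  /* no more content: final call with no data and is_end = 1 */
  return sink(user_data, NULL, 0, 1) ? 1 : 0;
}

/* Demonstration sink that only records what it was given. */
typedef struct {
  size_t bytes;
  int end_calls;
} chunk_stats;

int
counting_sink(void* user_data, const unsigned char* buffer,
              size_t len, int is_end)
{
  chunk_stats* stats = (chunk_stats*)user_data;

  (void)buffer;
  stats->bytes += len;
  if(is_end)
    stats->end_calls++;
  return 0;
}
```

<para>In an application, the sink would be a small wrapper that casts
the user data back to the parser object and forwards the other three
arguments to the chunk parsing function.</para>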
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="restrict-parser-network-access">
 | |
| <title>Restrict parser network access</title>
 | |
| 
 | |
| <para>
 | |
| Parsing can cause network requests to be performed, especially
 | |
| if a URI is given as an argument such as with
 | |
| <link linkend="raptor-parse-uri"><function>raptor_parse_uri()</function></link>.
 | |
| There may also be indirect requests, such as with the
 | |
| GRDDL parser, which retrieves URIs depending on the results of
 | |
| initial parse requests.  The application may not want some of these
 | |
| URIs to be fetched, or may need to filter them; this can be done in
 | |
| three ways.
 | |
| </para>
 | |
| 
 | |
| <section id="tutorial-filter-network-with-feature">
 | |
| <title>Filtering parser network requests with feature <link linkend="RAPTOR-FEATURE-NO-NET:CAPS"><literal>RAPTOR_FEATURE_NO_NET</literal></link></title>
 | |
| <para>
 | |
| The parser feature 
 | |
| <link linkend="RAPTOR-FEATURE-NO-NET:CAPS"><literal>RAPTOR_FEATURE_NO_NET</literal></link>
 | |
| can be set with
 | |
| <link linkend="raptor-set-feature"><function>raptor_set_feature()</function></link>
 | |
| and forbids all network requests.  There is no customisation with
 | |
| this approach; for that, see the URI filter in the next section.
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
|   rdf_parser = raptor_new_parser("rdfxml");
 | |
| 
 | |
|   /* Disable internal network requests */
 | |
|   raptor_set_feature(rdf_parser, RAPTOR_FEATURE_NO_NET, 1);
 | |
| </programlisting>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-filter-network-www-uri-filter">
 | |
| <title>Filtering parser network requests with <link linkend="raptor-www-set-uri-filter"><function>raptor_www_set_uri_filter()</function></link></title>
 | |
| <para>
 | |
| The
 | |
| <link linkend="raptor-www-set-uri-filter"><function>raptor_www_set_uri_filter()</function></link>
 | |
| function allows setting of a filtering function to operate on all URIs
 | |
| retrieved by a WWW connection.  This connection can be used in
 | |
| parsing when operated by hand.
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
| void write_bytes_handler(raptor_www* www, void *user_data, 
 | |
|                          const void *ptr, size_t size, size_t nmemb)
 | |
| {
 | |
|   raptor_parser* rdf_parser=(raptor_parser*)user_data;
 | |
|   raptor_parse_chunk(rdf_parser, (unsigned char*)ptr, size*nmemb, 0);
 | |
| }
 | |
| 
 | |
| int uri_filter(void* filter_user_data, raptor_uri* uri) {
 | |
|   /* return non-0 to forbid the request */
 | |
| }
 | |
| 
 | |
| int main(int argc, char *argv[]) { 
 | |
|   ...
 | |
| 
 | |
|   rdf_parser = raptor_new_parser("rdfxml");
 | |
|   www = raptor_www_new();
 | |
| 
 | |
|   /* filter all URI requests */
 | |
|   raptor_www_set_uri_filter(www, uri_filter, filter_user_data);
 | |
| 
 | |
|   /* make WWW write bytes to parser */
 | |
|   raptor_www_set_write_bytes_handler(www, write_bytes_handler, rdf_parser);
 | |
| 
 | |
|   raptor_start_parse(rdf_parser, uri);
 | |
|   raptor_www_fetch(www, uri);
 | |
|   /* tell the parser that we are done */
 | |
|   raptor_parse_chunk(rdf_parser, NULL, 0, 1);
 | |
| 
 | |
|   raptor_www_free(www);
 | |
|   raptor_free_parser(rdf_parser);
 | |
| 
 | |
|   ...
 | |
| }
 | |
| 
 | |
| </programlisting>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-filter-network-parser-uri-filter">
 | |
| <title>Filtering parser network requests with <link linkend="raptor-parser-set-uri-filter"><function>raptor_parser_set_uri_filter()</function></link></title>
 | |
| 
 | |
| <para>
 | |
| The
 | |
| <link linkend="raptor-parser-set-uri-filter"><function>raptor_parser_set_uri_filter()</function></link>
 | |
| function allows setting of a filtering function to operate on all URIs that
 | |
| the parser sees.  This operates on the internal raptor_www object
 | |
| used inside parsing to retrieve URIs, similar to that described in
 | |
| the <link linkend="tutorial-filter-network-www-uri-filter">previous section</link>.
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
|   int uri_filter(void* filter_user_data, raptor_uri* uri) {
 | |
|     /* return non-0 to forbid the request */
 | |
|   }
 | |
| 
 | |
|   rdf_parser = raptor_new_parser("rdfxml");
 | |
|   raptor_parser_set_uri_filter(rdf_parser, uri_filter, filter_user_data);
 | |
| 
 | |
|   /* parse content as normal */
 | |
|   raptor_parse_uri(rdf_parser, uri, base_uri);
 | |
| </programlisting>
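<para>The body of the filter is left to the application.  One common
policy is an allow-list of URI prefixes, sketched below with only the C
standard library.  The function name and its parameters are
hypothetical, and it works on a plain URI string; a real filter with
the signature above would first have to obtain the string form of the
<type>raptor_uri</type> it is given.  The return value follows the
convention in the comment above: non-0 forbids the request.</para>

```c
#include <string.h>

/* Return 0 if uri_string starts with one of the allowed prefixes and
 * non-0 (forbid) otherwise.  A NULL string is always forbidden. */
int
uri_string_is_forbidden(const char* uri_string,
                        const char* const* allowed_prefixes,
                        size_t n_prefixes)
{
  size_t i;

  if(!uri_string)
    return 1;

  for(i = 0; i < n_prefixes; i++) {
    size_t len = strlen(allowed_prefixes[i]);
    if(!strncmp(uri_string, allowed_prefixes[i], len))
      return 0; /* matched an allowed prefix: permit */
  }
  return 1; /* nothing matched: forbid */
}
```

<para>Prefixes should end at a clear boundary, such as the
<literal>/</literal> after a host name.  A prefix of
<literal>http://example.org</literal> without the trailing slash would
also match <literal>http://example.org.evil.test/</literal>.</para>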
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-filter-network-parser-timeout">
 | |
| <title>Setting timeout for parser network requests with feature <link linkend="RAPTOR-FEATURE-WWW-TIMEOUT:CAPS"><literal>RAPTOR_FEATURE_WWW_TIMEOUT</literal></link></title>
 | |
| 
 | |
| <para>If the feature
 | |
| <link linkend="RAPTOR-FEATURE-WWW-TIMEOUT:CAPS"><literal>RAPTOR_FEATURE_WWW_TIMEOUT</literal></link>
 | |
| is set to a number greater than 0, it is used as the timeout in seconds
 | |
| for retrieving URIs during parsing (primarily for GRDDL).
 | |
| This uses
 | |
| <link linkend="raptor-www-set-connection-timeout"><function>raptor_www_set_connection_timeout()</function></link>
 | |
| internally.
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
|   rdf_parser = raptor_new_parser("grddl");
 | |
| 
 | |
|   /* set internal URI retrieval maximum time to 5 seconds */
 | |
|   raptor_set_feature(rdf_parser, RAPTOR_FEATURE_WWW_TIMEOUT , 5);
 | |
| </programlisting>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-static-info">
 | |
| <title>Querying parser static information</title>
 | |
| 
 | |
| <para>
 | |
| These methods return information about the constructed parser
 | |
| implementation corresponding to the information available
 | |
| via <link linkend="raptor-syntaxes-enumerate"><function>raptor_syntaxes_enumerate()</function></link>
 | |
| for all parsers.
 | |
| </para>
 | |
| 
 | |
| <para><link linkend="raptor-get-name"><function>raptor_get_name()</function></link> returns the parser syntax name,
 | |
| <link linkend="raptor-get-label"><function>raptor_get_label()</function></link>
 | |
| the long label for the parser and
 | |
| <link linkend="raptor-get-mime-type"><function>raptor_get_mime_type()</function></link>
 | |
| the primary MIME Type for the parser (there may be others that the parser
 | |
| will accept but this is the main one).
 | |
| </para>
 | |
| 
 | |
| <para><link linkend="raptor-parser-get-accept-header"><function>raptor_parser_get_accept_header()</function></link>
 | |
| returns a string that would be sent in an HTTP
 | |
| request <code>Accept:</code> header for the syntaxes accepted by this
 | |
| parser only. 
 | |
| </para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-runtime-info">
 | |
| <title>Querying parser run-time information</title>
 | |
| 
 | |
| <para>
 | |
| <link linkend="raptor-get-locator"><function>raptor_get_locator()</function></link>
 | |
| returns the <link linkend="raptor-locator"><type>raptor_locator</type></link>
 | |
| for the current position in the input stream.  The <emphasis>locator</emphasis>
 | |
| structure contains full information on the details of where in the
 | |
| file or URI the current parser has reached.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-abort">
 | |
| <title>Aborting parsing</title>
 | |
| 
 | |
| <para>
 | |
| <link linkend="raptor-parse-abort"><function>raptor_parse_abort()</function></link>
 | |
| allows the current parsing to be aborted, at which point no further
 | |
| triples will be passed to callbacks and the parser will attempt to
 | |
| return control to the application.  This is most useful when called
 | |
| inside a handler function, which allows the application to decide to stop
 | |
| an active parse.
 | |
| </para>
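<para>A typical case is a handler that stops the parse once it has seen
enough triples.  The sketch below shows that decision logic with only
the C standard library.  The <literal>triple_limit</literal> type and
the function names are hypothetical, and the abort is reached through a
function pointer stand-in so the sketch compiles without Raptor; in a
real handler that call would be the abort function on the parser
object, and the second handler parameter would be
<literal>const raptor_statement*</literal>.</para>

```c
#include <stddef.h>

/* Hypothetical user data for a handler that stops after `limit` triples. */
typedef struct {
  unsigned long seen;
  unsigned long limit;
  void (*abort_parse)(void* parser); /* stand-in for the real abort call */
  void* parser;                      /* passed back to abort_parse */
} triple_limit;

/* Count each triple and request an abort once the limit is reached. */
void
limited_triples_handler(void* user_data, const void* triple)
{
  triple_limit* state = (triple_limit*)user_data;

  (void)triple;
  state->seen++;
  if(state->seen >= state->limit && state->abort_parse)
    state->abort_parse(state->parser);
}

/* Demonstration stand-in: records the abort request in an int flag. */
void
record_abort(void* parser)
{
  *(int*)parser = 1;
}
```

<para>Since no further triples are passed to callbacks after the abort,
the handler does not need its own guard against being called again.</para>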
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-destroy">
 | |
| <title>Destroy the parser</title>
 | |
| 
 | |
| <para>
 | |
| To tidy up, delete the parser object as follows: 
 | |
| <programlisting>
 | |
|   raptor_free_parser(rdf_parser);
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| 
 | |
| <section id="tutorial-parser-example">
 | |
| <title>Parsing example code</title>
 | |
| 
 | |
| <example id="raptor-example-rdfprint">
 | |
| <title><filename>rdfprint.c</filename>: Parse an RDF/XML file and print the triples</title>
 | |
| <programlisting>
 | |
| <xi:include href="rdfprint.c" parse="text"/>
 | |
| </programlisting>
 | |
| 
 | |
| <para>Compile it like this:
 | |
| <screen>
 | |
| $ gcc -o rdfprint rdfprint.c `raptor-config --cflags` `raptor-config --libs`
 | |
| </screen>
 | |
| and run it on an RDF file as:
 | |
| <screen>
 | |
| $ ./rdfprint raptor.rdf
 | |
| _:genid1 &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://usefulinc.com/ns/doap#Project&gt; .
 | |
| _:genid1 &lt;http://usefulinc.com/ns/doap#name&gt; "Raptor" .
 | |
| _:genid1 &lt;http://usefulinc.com/ns/doap#homepage&gt; &lt;http://librdf.org/raptor/&gt; .
 | |
| ...
 | |
| </screen>
 | |
| </para>
 | |
| 
 | |
| </example>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </chapter>
 | |
| 
 | |
| 
 | |
| <!--
 | |
| Local variables:
 | |
| mode: sgml
 | |
| sgml-parent-document: ("raptor-docs.xml" "book" "part")
 | |
| End:
 | |
| -->
 |