RGui -> Help -> R functions(text)
htmlParse(file, ignoreBlanks = TRUE, handlers = NULL, replaceEntities = FALSE,
          asText = FALSE, trim = TRUE, validate = FALSE, getDTD = TRUE,
          isURL = FALSE, asTree = FALSE, addAttributeNamespaces = FALSE,
          useInternalNodes = TRUE, isSchema = FALSE, fullNamespaceInfo = FALSE,
          encoding = character(),
          useDotNames = length(grep("^\\.", names(handlers))) > 0,
          xinclude = TRUE, addFinalizer = TRUE, error = htmlErrorHandler,
          isHTML = TRUE, options = integer(), parentFirst = FALSE)

xmlSchemaParse(file, asText = FALSE, xinclude = TRUE, error = xmlErrorCumulator())
file: The name of the file containing the XML contents. This can contain ~, which is expanded to the user’s home directory. It can also be a URL. See
ignoreBlanks: logical value indicating whether text elements made up entirely of white space should be included in the resulting ‘tree’.
handlers: Optional collection of functions used to map the different XML nodes to R objects. Typically, this is a named list of functions, and a closure can be used to provide local data. This provides a way of filtering the tree as it is being created in R, adding or removing nodes, and generally processing them as they are constructed in the C code.
In a recent addition to the package (version 0.99-8), if this is specified as a single function object, we call that function for each node (of any type) in the underlying DOM tree. It is invoked with the new node and its parent node. This applies to regular nodes and also comments, processing instructions, CDATA nodes, etc. So this function must be sufficiently general to handle them all.
replaceEntities: logical value indicating whether to substitute entity references with their text directly. This should be left as FALSE. The text still appears as the value of the node, but there is more information about its source, allowing the parse to be reversed with full reference information.
asText: logical value indicating that the first argument, ‘file’, should be treated as the XML text to parse, not the name of a file. This allows the contents of documents to be retrieved from different sources (e.g. HTTP servers, XML-RPC, etc.) and still use this parser.
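As a minimal sketch of the text-parsing behaviour described above, and assuming the XML package is installed, the following parses an HTML string held in memory instead of a file. The names `html_text`, `doc` and `paragraphs` are illustrative only.

```r
library(XML)

# HTML content held in an R string rather than in a file on disk.
html_text <- "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"

# asText = TRUE tells the parser that the first argument is the document
# itself, not a file name or URL.
doc <- htmlParse(html_text, asText = TRUE)

# Extract the text of every <p> node with an XPath query.
paragraphs <- xpathSApply(doc, "//p", xmlValue)
print(paragraphs)
```

The same call works for content fetched by some other means (for example, an HTTP client), as long as the result is passed as a single character string.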
trim: whether to strip white space from the beginning and end of text strings.
validate: logical indicating whether to use a validating parser, in other words whether to check the contents against the DTD specification. If this is true, warning messages will be displayed about errors in the DTD and/or document, but parsing will proceed unless a terminal error is encountered. This is ignored when parsing an HTML document.
getDTD: logical flag indicating whether the DTD (both internal and external) should be returned along with the document nodes. This changes the return type. This is ignored when parsing an HTML document.
isURL: indicates whether the
asTree: this only applies when one passes a value for the
addAttributeNamespaces: a logical value indicating whether to return the namespace in the names of the attributes within a node or to omit them. If this is
useInternalNodes: a logical value indicating whether to call the converter functions with objects of class
If this argument is
This is ignored when parsing an HTML document.
isSchema: a logical value indicating whether the document is an XML schema (
fullNamespaceInfo: a logical value indicating whether to provide the namespace URI and prefix on each node or just the prefix. The latter (
This is ignored when parsing an HTML document.
encoding: a character string (scalar) giving the encoding for the document. This is optional as the document should contain its own encoding information. However, if it doesn’t, the caller can specify this for the parser. If the XML/HTML document does specify its own encoding, that value is used regardless of any value specified by the caller. (That’s just the way it goes!) So this is to be used as a safety net in case the document does not have an encoding and the caller happens to know the actual encoding.
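A hedged illustration of the safety-net use described above: the fragment below carries no encoding declaration of its own, so the caller supplies one. The sample string and variable names are assumptions made for the example, and the result of `print` may vary with the platform's locale.

```r
library(XML)

# A fragment with no encoding declaration; the caller happens to know
# that the text is UTF-8.
html_text <- "<html><body><p>caf\u00e9</p></body></html>"

# Supply the encoding explicitly, since the document does not state it.
doc <- htmlParse(html_text, asText = TRUE, encoding = "UTF-8")

value <- xpathSApply(doc, "//p", xmlValue)
print(value)
```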
useDotNames: a logical value indicating whether to use the newer format for identifying general element function handlers with the ‘.’ prefix, e.g. .text, .comment, .startElement. If this is
xinclude: a logical value indicating whether to process nodes of the form
addFinalizer: a logical value indicating whether the default finalizer routine should be registered to free the internal xmlDoc when R no longer has a reference to this external pointer object. This is only relevant when
error: a function that is invoked when the XML parser reports an error. When an error is encountered, this is called with 7 arguments. See
If parsing completes and no document is generated, this function is called again with only one argument, which is a character vector of length 0. This gives the function an opportunity to report all the errors and raise an exception rather than doing this when it sees the first one.
This function can do what it likes with the information. It can raise an R error or let the parser continue and potentially find further errors.
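The sketch below shows one way such a function might be written: it records whatever the parser reports and lets parsing continue instead of raising an R error. It assumes the first of the arguments passed to the handler carries the message text; the names `messages` and `collect` and the deliberately sloppy sample markup are illustrative, not part of the package.

```r
library(XML)

# Storage for anything the parser reports.
messages <- character()

# Accept any number of arguments, since the handler may be called with
# seven arguments per error or with a single zero-length vector.
collect <- function(...) {
  args <- list(...)
  if (length(args) > 0) {
    # Assumption: the first argument holds the message text.
    messages <<- c(messages, as.character(args[[1]]))
  }
  invisible(NULL)  # return quietly so the parser can carry on
}

# Markup with an unclosed <b> and an unknown <foo> tag.
bad_html <- "<html><body><p>unclosed <b>bold</p><foo>x</foo></body></html>"
doc <- htmlParse(bad_html, asText = TRUE, error = collect)

# The document is still available for querying.
print(xpathSApply(doc, "//p", xmlValue))
print(messages)
```

A handler that simply ignores its arguments, `function(...) {}`, is a common way to silence parser messages for untidy HTML.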
The default value of this argument supplies a function that cumulates the errors
If this is
isHTML: a logical value that allows this function to be used for parsing HTML documents. This causes validation and processing of a DTD to be turned off. This is currently experimental so that we can implement
options: an integer value or vector of values that are combined (OR’ed) together to specify options for the XML parser. This is the same as the
parentFirst: a logical value for use when we have handler functions and are traversing the tree. This controls whether we process the node before processing its children, or process the children before their parent node.