class Crawler implements Countable, IteratorAggregate
Crawler eases navigation of a list of \DOMNode objects.
protected | $uri |
__construct(mixed $node = null, string $uri = null, string $baseHref = null) | ||
string | getUri() Returns the current URI. | |
string | getBaseHref() Returns base href. | |
clear() Removes all the nodes. | ||
add(DOMNodeList|DOMNode|array|string|null $node) Adds a node to the current list of nodes. | ||
addContent(string $content, string|null $type = null) Adds HTML/XML content. | ||
addHtmlContent(string $content, string $charset = 'UTF-8') Adds HTML content to the list of nodes. | ||
addXmlContent(string $content, string $charset = 'UTF-8', int $options = LIBXML_NONET) Adds XML content to the list of nodes. | ||
addDocument(DOMDocument $dom) Adds a \DOMDocument to the list of nodes. | ||
addNodeList(DOMNodeList $nodes) Adds a \DOMNodeList to the list of nodes. | ||
addNodes(array $nodes) Adds an array of \DOMNode instances to the list of nodes. | ||
addNode(DOMNode $node) Adds a \DOMNode instance to the list of nodes. | ||
Crawler | eq(int $position) Returns a node given its position in the node list. | |
array | each(Closure $closure) Calls an anonymous function on each node of the list. | |
Crawler | slice(int $offset = 0, int $length = null) Slices the list of nodes by $offset and $length. | |
Crawler | reduce(Closure $closure) Reduces the list of nodes by calling an anonymous function. | |
Crawler | first() Returns the first node of the current selection. | |
Crawler | last() Returns the last node of the current selection. | |
Crawler | siblings() Returns the sibling nodes of the current selection. | |
Crawler | nextAll() Returns the next sibling nodes of the current selection. | |
Crawler | previousAll() Returns the previous sibling nodes of the current selection. | |
Crawler | parents() Returns the parent nodes of the current selection. | |
Crawler | children() Returns the child nodes of the current selection. | |
string|null | attr(string $attribute) Returns the attribute value of the first node of the list. | |
string | nodeName() Returns the node name of the first node of the list. | |
string | text() Returns the node value of the first node of the list. | |
string | html() Returns the first node of the list as HTML. | |
array|Crawler | evaluate(string $xpath) Evaluates an XPath expression. | |
array | extract(array $attributes) Extracts information from the list of nodes. | |
Crawler | filterXPath(string $xpath) Filters the list of nodes with an XPath expression. | |
Crawler | filter(string $selector) Filters the list of nodes with a CSS selector. | |
Crawler | selectLink(string $value) Selects links by name or alt value for clickable images. | |
Crawler | selectImage(string $value) Selects images by alt value. | |
Crawler | selectButton(string $value) Selects a button by name or alt value for images. | |
Link | link(string $method = 'get') Returns a Link object for the first node in the list. | |
Link[] | links() Returns an array of Link objects for the nodes in the list. | |
Image | image() Returns an Image object for the first node in the list. | |
Image[] | images() Returns an array of Image objects for the nodes in the list. | |
Form | form(array $values = null, string $method = null) Returns a Form object for the first node in the list. | |
setDefaultNamespacePrefix(string $prefix) Overloads a default namespace prefix to be used with XPath and CSS expressions. | ||
registerNamespace(string $prefix, string $namespace) | ||
static string | xpathLiteral(string $s) Converts string for XPath expressions. | |
DOMElement|null | getNode(int $position) | |
int | count() | |
ArrayIterator|DOMElement[] | getIterator() | |
array | sibling(DOMElement $node, string $siblingDir = 'nextSibling') |
mixed | $node | A Node to use as the base for the crawling |
string | $uri | The current URI |
string | $baseHref | The base href value |
Returns the current URI.
string |
Returns base href.
string |
Removes all the nodes.
Adds a node to the current list of nodes.
This method uses the appropriate specialized add*() method based on the type of the argument.
DOMNodeList|DOMNode|array|string|null | $node | A node |
InvalidArgumentException | when node is not the expected type |
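The dispatch described above can be sketched as follows. This is an illustration rather than the library's own test case: it assumes symfony/dom-crawler is installed through Composer, and the sample markup and variable names are made up for the example. Each crawler is fed from a single document.

```php
<?php
// Sketch only: assumes symfony/dom-crawler is installed through Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$dom = new \DOMDocument();
$dom->loadXML('<root><item>A</item><item>B</item></root>');
$items = $dom->getElementsByTagName('item');

// A \DOMNodeList argument is handled like addNodeList().
$fromList = new Crawler();
$fromList->add($items);

// A single \DOMNode argument is handled like addNode().
$fromNode = new Crawler();
$fromNode->add($items->item(0));

// A string argument is handled like addContent().
$fromString = new Crawler();
$fromString->add('<html><body><p>Hello</p></body></html>');

// An unsupported type is rejected with an InvalidArgumentException.
$rejected = false;
try {
    (new Crawler())->add(42);
} catch (\InvalidArgumentException $e) {
    $rejected = true;
}

printf(
    "%d %d %d %s\n",
    count($fromList),
    count($fromNode),
    count($fromString),
    var_export($rejected, true)
);
```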
Adds HTML/XML content.
If the charset is not set via the content type, it is assumed to be UTF-8, with ISO-8859-1 as a fallback; ISO-8859-1 is the default charset defined by the HTTP 1.1 specification.
string | $content | A string to parse as HTML/XML |
string|null | $type | The content type of the string |
Adds HTML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
If you want to get parsing errors, enable internal errors via libxml_use_internal_errors(true), then read them with libxml_get_errors(). Be sure to clear the errors with libxml_clear_errors() afterward.
string | $content | The HTML content |
string | $charset | The charset |
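A minimal sketch of the error-collection pattern described above, assuming symfony/dom-crawler is installed through Composer. The malformed markup is invented for the example, and whether libxml reports anything for it depends on the libxml version, so the sketch only prints whatever was collected.

```php
<?php
// Sketch only: assumes symfony/dom-crawler is installed through Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Buffer libxml errors instead of discarding them; remember the old setting.
$previous = libxml_use_internal_errors(true);

$crawler = new Crawler();
$crawler->addHtmlContent('<html><body><p>Unclosed <madeup>tag</p></body></html>');

// Read the buffered errors, if any were reported.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

Restoring the previous setting keeps the change local to this call site, which matters when other code in the same process relies on the default libxml behavior.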
Adds XML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
If you want to get parsing errors, enable internal errors via libxml_use_internal_errors(true), then read them with libxml_get_errors(). Be sure to clear the errors with libxml_clear_errors() afterward.
string | $content | The XML content |
string | $charset | The charset |
int | $options | Bitwise OR of the libxml option constants. LIBXML_PARSEHUGE is dangerous; see http://symfony.com/blog/security-release-symfony-2-0-17-released |
Adds a \DOMDocument to the list of nodes.
DOMDocument | $dom | A \DOMDocument instance |
Adds a \DOMNodeList to the list of nodes.
DOMNodeList | $nodes | A \DOMNodeList instance |
Adds an array of \DOMNode instances to the list of nodes.
array | $nodes | An array of \DOMNode instances |
Adds a \DOMNode instance to the list of nodes.
DOMNode | $node | A \DOMNode instance |
Returns a node given its position in the node list.
int | $position | The position |
Crawler |
Calls an anonymous function on each node of the list.
The anonymous function receives the position and the node wrapped in a Crawler instance as arguments.
Example:
$crawler->filter('h1')->each(function ($node, $i) {
return $node->text();
});
Closure | $closure | An anonymous function |
array | An array of values returned by the anonymous function |
Slices the list of nodes by $offset and $length.
int | $offset | |
int | $length |
Crawler |
Reduces the list of nodes by calling an anonymous function.
To remove a node from the list, the anonymous function must return false.
Closure | $closure | An anonymous function |
Crawler |
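As an illustration of the rule above, the following sketch keeps every other list item. It assumes symfony/dom-crawler is installed through Composer. The closure is assumed to receive the node (wrapped in a Crawler) and its position, as with each(); the markup is made up.

```php
<?php
// Sketch only: assumes symfony/dom-crawler is installed through Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler(
    '<html><body><ul><li>one</li><li>two</li><li>three</li><li>four</li></ul></body></html>'
);

// Keep the items at even positions (0, 2, ...); returning false drops a node.
$kept = $crawler->filterXPath('//li')->reduce(function (Crawler $node, $i) {
    return 0 === $i % 2;
});

// Collect the text of the remaining nodes.
$texts = $kept->each(function (Crawler $node) {
    return $node->text();
});

print_r($texts);
```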
Returns the first node of the current selection.
Crawler |
Returns the last node of the current selection.
Crawler |
Returns the sibling nodes of the current selection.
Crawler |
InvalidArgumentException | When current node is empty |
Returns the next sibling nodes of the current selection.
Crawler |
InvalidArgumentException | When current node is empty |
Returns the previous sibling nodes of the current selection.
Crawler |
InvalidArgumentException | When current node is empty |
Returns the parent nodes of the current selection.
Crawler |
InvalidArgumentException | When current node is empty |
Returns the child nodes of the current selection.
Crawler |
InvalidArgumentException | When current node is empty |
Returns the attribute value of the first node of the list.
string | $attribute | The attribute name |
string|null | The attribute value or null if the attribute does not exist |
InvalidArgumentException | When current node is empty |
Returns the node name of the first node of the list.
string | The node name |
InvalidArgumentException | When current node is empty |
Returns the node value of the first node of the list.
string | The node value |
InvalidArgumentException | When current node is empty |
Returns the first node of the list as HTML.
string | The node html |
InvalidArgumentException | When current node is empty |
Evaluates an XPath expression.
Since an XPath expression might evaluate to either a simple type or a \DOMNodeList, this method will return either an array of simple types or a new Crawler instance.
string | $xpath | An XPath expression |
array|Crawler | An array of evaluation results or a new Crawler instance |
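The two return shapes can be sketched as follows, assuming symfony/dom-crawler is installed through Composer. The markup is invented; the expression is evaluated once per node in the crawler, so a crawler holding one node yields a one-element array for a scalar expression.

```php
<?php
// Sketch only: assumes symfony/dom-crawler is installed through Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><p>a</p><p>b</p><p>c</p></body></html>');

// A scalar expression yields an array of simple values.
$counts = $crawler->evaluate('count(//p)');

// A node-set expression yields a new Crawler.
$paragraphs = $crawler->evaluate('//p');

var_dump(is_array($counts), $paragraphs instanceof Crawler, count($paragraphs));
```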
Extracts information from the list of nodes.
You can extract attributes and/or the node value (_text).
Example:
$crawler->filter('h1 a')->extract(array('_text', 'href'));
array | $attributes | An array of attributes |
array | An array of extracted values |
Filters the list of nodes with an XPath expression.
The XPath expression is evaluated in the context of the crawler, which is considered as a fake parent of the elements inside it. This means that a child selector "div" or "./div" will match only the div elements of the current crawler, not their children.
string | $xpath | An XPath expression |
Crawler |
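The context rule above is easiest to see with nested elements. The sketch below assumes symfony/dom-crawler is installed through Composer; the ids and markup are made up for the illustration.

```php
<?php
// Sketch only: assumes symfony/dom-crawler is installed through Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler(
    '<html><body><div id="outer"><div id="inner"></div></div></body></html>'
);

// Narrow the selection to the outer div only.
$outer = $crawler->filterXPath('//div[@id="outer"]');

// "div" is matched against the nodes held by the crawler itself,
// so it selects the outer div rather than its child.
$self = $outer->filterXPath('div');

// One step further down reaches the nested div.
$child = $outer->filterXPath('div/div');

// "//div" covers the crawler's nodes and all of their descendants.
$all = $outer->filterXPath('//div');

printf("%s %s %d\n", $self->attr('id'), $child->attr('id'), count($all));
```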
Filters the list of nodes with a CSS selector.
This method only works if you have installed the CssSelector Symfony Component.
string | $selector | A CSS selector |
Crawler |
RuntimeException | if the CssSelector Component is not available |
Selects links by name or alt value for clickable images.
string | $value | The link text |
Crawler |
Selects images by alt value.
string | $value | The image alt |
Crawler | A new instance of Crawler with the filtered list of nodes |
Selects a button by name or alt value for images.
string | $value | The button text |
Crawler |
Returns a Link object for the first node in the list.
string | $method | The method for the link (get by default) |
Link | A Link instance |
InvalidArgumentException | If the current node list is empty or the selected node is not an instance of DOMElement |
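A short sketch of obtaining a Link, assuming symfony/dom-crawler is installed through Composer. The URL, markup, and link text are placeholders; the crawler is given a current URI in its constructor so that the relative href can be resolved.

```php
<?php
// Sketch only: assumes symfony/dom-crawler is installed through Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// The second constructor argument is the current URI (placeholder value).
$crawler = new Crawler(
    '<html><body><a href="page2">Next</a></body></html>',
    'http://example.com/dir/'
);

// Select the anchor by its text, then wrap the first match in a Link.
$link = $crawler->selectLink('Next')->link();

printf("%s %s\n", $link->getMethod(), $link->getUri());
```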
Returns an array of Link objects for the nodes in the list.
Link[] | An array of Link instances |
InvalidArgumentException | If the current node list contains non-DOMElement instances |
Returns an Image object for the first node in the list.
Image | An Image instance |
InvalidArgumentException | If the current node list is empty |
Returns an array of Image objects for the nodes in the list.
Image[] | An array of Image instances |
Returns a Form object for the first node in the list.
array | $values | An array of values for the form fields |
string | $method | The method for the form |
Form | A Form instance |
InvalidArgumentException | If the current node list is empty or the selected node is not an instance of DOMElement |
Overloads a default namespace prefix to be used with XPath and CSS expressions.
string | $prefix |
string | $prefix | |
string | $namespace |
Converts string for XPath expressions.
Escaped characters are: quotes (") and apostrophes (').
Examples:
echo Crawler::xpathLiteral('foo " bar');
//prints 'foo " bar'
echo Crawler::xpathLiteral("foo ' bar");
//prints "foo ' bar"
echo Crawler::xpathLiteral('a\'b"c');
//prints concat('a', "'", 'b"c')
string | $s | String to be escaped |
string | Converted string |
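One plausible use of this method is embedding an arbitrary string in an XPath expression passed to filterXPath(). The sketch below assumes symfony/dom-crawler is installed through Composer; the title and markup are invented for the example.

```php
<?php
// Sketch only: assumes symfony/dom-crawler is installed through Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// A value containing both an apostrophe and double quotes.
$title = 'O\'Reilly "Books"';

$crawler = new Crawler(
    '<html><body><a href="/a">O\'Reilly "Books"</a><a href="/b">Other</a></body></html>'
);

// Quote the value so it can be placed safely inside the expression.
$literal = Crawler::xpathLiteral($title);
$matches = $crawler->filterXPath(sprintf('//a[. = %s]', $literal));

printf("%s => %d match(es)\n", $literal, count($matches));
```

Building the expression with sprintf() and a quoted literal avoids hand-written escaping, which fails as soon as the value mixes both kinds of quote.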
int | $position |
DOMElement|null |
int |
ArrayIterator|DOMElement[] |
DOMElement | $node | |
string | $siblingDir |
array |
© 2004–2017 Fabien Potencier
Licensed under the MIT License.
https://api.symfony.com/4.1/Symfony/Component/DomCrawler/Crawler.html