search_simplify($text, $langcode = NULL)
Simplifies and preprocesses text for searching.
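Processing steps:
- HTML entities are decoded to UTF-8.
- Text is lowercased and diacritics are removed.
- search_invoke_preprocess() is called so that external processors can handle words.
- CJK character runs are expanded with search_expand_cjk() when the index.overlap_cjk setting in search.settings is enabled.
- Punctuation is removed or replaced with spaces, depending on where it occurs (see the code for details).
- Each word is truncated by _search_index_truncate() (the code comment gives a 50-character limit).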
Parameters
string $text: Text to simplify.
string|null $langcode: Language code for the language of $text, if known.

Return value
string: Simplified and processed text.
function search_simplify($text, $langcode = NULL) {
  // Decode entities to UTF-8
  $text = Html::decodeEntities($text);

  // Lowercase
  $text = Unicode::strtolower($text);

  // Remove diacritics.
  $text = \Drupal::service('transliteration')->removeDiacritics($text);

  // Call an external processor for word handling.
  search_invoke_preprocess($text, $langcode);

  // Simple CJK handling
  if (\Drupal::config('search.settings')->get('index.overlap_cjk')) {
    $text = preg_replace_callback('/[' . PREG_CLASS_CJK . ']+/u', 'search_expand_cjk', $text);
  }

  // To improve searching for numerical data such as dates, IP addresses
  // or version numbers, we consider a group of numerical characters
  // separated only by punctuation characters to be one piece.
  // This also means that searching for e.g. '20/03/1984' also returns
  // results with '20-03-1984' in them.
  // Readable regexp: ([number]+)[punctuation]+(?=[number])
  $text = preg_replace('/([' . PREG_CLASS_NUMBERS . ']+)[' . PREG_CLASS_PUNCTUATION . ']+(?=[' . PREG_CLASS_NUMBERS . '])/u', '\1', $text);

  // Multiple dot and dash groups are word boundaries and replaced with space.
  // No need to use the unicode modifier here because 0-127 ASCII characters
  // can't match higher UTF-8 characters as the leftmost bit of those are 1.
  $text = preg_replace('/[.-]{2,}/', ' ', $text);

  // The dot, underscore and dash are simply removed. This allows meaningful
  // search behavior with acronyms and URLs. See unicode note directly above.
  $text = preg_replace('/[._-]+/', '', $text);

  // With the exception of the rules above, we consider all punctuation,
  // marks, spacers, etc, to be a word boundary.
  $text = preg_replace('/[' . Unicode::PREG_CLASS_WORD_BOUNDARY . ']+/u', ' ', $text);

  // Truncate everything to 50 characters.
  $words = explode(' ', $text);
  array_walk($words, '_search_index_truncate');
  $text = implode(' ', $words);

  return $text;
}
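The punctuation rules in the function body can be tried outside Drupal with a rough standalone sketch. Everything here is an assumption made for illustration: simplify_sketch() is a hypothetical name, the Unicode properties \p{N}, \p{P}, \p{L} and \p{M} stand in for Drupal's PREG_CLASS_NUMBERS, PREG_CLASS_PUNCTUATION and Unicode::PREG_CLASS_WORD_BOUNDARY (which are defined differently), and a plain mb_substr() stands in for _search_index_truncate(). The diacritics, search_invoke_preprocess() and CJK steps are left out, and the mbstring extension is assumed to be available.

```php
<?php

/**
 * Hypothetical, simplified stand-in for search_simplify(); not Drupal code.
 */
function simplify_sketch(string $text, int $maxWordLength = 50): string {
  // Decode HTML entities to UTF-8, then lowercase.
  $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
  $text = mb_strtolower($text, 'UTF-8');

  // Join groups of digits separated only by punctuation into one piece.
  // \p{N} and \p{P} are approximations of Drupal's character classes.
  $text = preg_replace('/(\p{N}+)\p{P}+(?=\p{N})/u', '\1', $text);

  // Runs of two or more dots or dashes are word boundaries.
  $text = preg_replace('/[.-]{2,}/', ' ', $text);

  // Remaining dots, underscores and dashes are simply removed.
  $text = preg_replace('/[._-]+/', '', $text);

  // Anything that is not a letter, mark or number becomes a single space
  // (an approximation of the word-boundary class).
  $text = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', ' ', $text);

  // Truncate each word to the maximum length.
  $words = array_map(
    function (string $word) use ($maxWordLength): string {
      return mb_substr($word, 0, $maxWordLength, 'UTF-8');
    },
    explode(' ', $text)
  );

  return implode(' ', $words);
}
```

With these stand-in classes, simplify_sketch('20/03/1984') and simplify_sketch('20-03-1984') both give '20031984', which is why a search for one date format can match the other. Likewise 'U.S.A.' becomes 'usa' and 'end...start' becomes 'end start'. Results from the real function may differ wherever Drupal's character classes differ from the Unicode properties used here.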
© 2001–2016 by the original authors
Licensed under the GNU General Public License, version 2 and later.
Drupal is a registered trademark of Dries Buytaert.
https://api.drupal.org/api/drupal/core!modules!search!search.module/function/search_simplify/8.1.x