How to add the "nofollow" attribute to all external links in MODx Evolution

Adding the "nofollow" attribute to some (or all) external links is one of the most common tasks on a website. As the number of links grows, it becomes impractical to track them by hand without missing a few. Furthermore, the site may contain user-generated content (comments, for example), where manual tracking is a complete waste of time.

To automate this process, we will write a plugin that adds the rel="nofollow" attribute to all external links before the page is sent to the user. We will also add the ability to skip links to domains that are on a white list.

First of all, create a new plugin and call it, say, "NoFollow". Next, go to the "System Events" tab and check the OnWebPagePrerender event. This event suits our case because at this stage we can use the $modx->documentOutput property, which contains the already parsed page output.

Let's write the initial code. The plugin should find all occurrences of the <a> tag and extend each one with the rel="nofollow" attribute. This can be done with regular expressions.

if ($modx->event->name == 'OnWebPagePrerender')
{
    $content = $modx->documentOutput;

    // Collect all link tags on the page to $matches array
    preg_match_all(
        "/<a [^>]*?href=[\"\'](.*?)[\"\'][^>]*>.*?<\/a>/im",
        $content,
        $matches
    );

    if (!empty($matches[0]))
    {
        foreach ($matches[0] as $key => $tag) {
            // Add "nofollow" attribute to the beginning of a link tag
            // and replace occurrences of old tag with new one
            $new_tag = preg_replace("/^<a/i", '<a rel="nofollow"', $tag);
            $content = str_replace($tag, $new_tag, $content);
        }

        // Set the new content to document output
        $modx->documentOutput = $content;
    }
}
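
To see what the regular expression actually captures, it can help to run it outside MODx. The following is a minimal standalone sketch, not part of the plugin; the sample markup in $html is made up for illustration.

```php
<?php
// Standalone sketch: the sample markup below is made up for illustration.
$html = '<p><a href="http://example.com">External</a> and '
      . "<a class='nav' href='page.html'>Internal</a></p>";

// Same pattern as in the plugin code
preg_match_all(
    "/<a [^>]*?href=[\"\'](.*?)[\"\'][^>]*>.*?<\/a>/im",
    $html,
    $matches
);

// $matches[0] holds the full <a>...</a> tags,
// $matches[1] holds the corresponding href values
foreach ($matches[0] as $key => $tag) {
    echo $matches[1][$key], ' <= ', $tag, PHP_EOL;
}
```

The first capture group is what makes the href value available separately from the whole tag, which becomes useful once the plugin has to decide link by link.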

Save the plugin and load a page that contains some links. If you look at the page source, you should see that the rel="nofollow" attribute has been added to every <a> tag.

This is the initial page code I prepared for the test:

<a href="http://example.com">External link</a>
<a href="page.html">Internal link 1</a>
<a href="http://deniskomlev.com/page.html">Internal link 2</a>
<a href="http://domain-1.com">Friendly site 1</a>
<a href="https://www.domain-2.org/contacts.html">Friendly site 2</a>
<a rel="nofollow" href="http://domain-3.info">Already nofollowed</a>

After refreshing the page with the plugin enabled, the code was transformed into the following:

<a rel="nofollow" href="http://example.com">External link</a>
<a rel="nofollow" href="page.html">Internal link 1</a>
<a rel="nofollow" href="http://deniskomlev.com/page.html">Internal link 2</a>
<a rel="nofollow" href="http://domain-1.com">Friendly site 1</a>
<a rel="nofollow" href="https://www.domain-2.org/contacts.html">Friendly site 2</a>
<a rel="nofollow" rel="nofollow" href="http://domain-3.info">Already nofollowed</a>

This is partially good, but notice that the change was applied to all links, including those that point to internal pages and those that already have the nofollow attribute. Our code is very basic at this point and does no additional analysis of the links. It just adds rel="nofollow" to every link tag, and nothing more. It's time to fix that.

Let's teach our plugin to follow these rules:

  • skip internal links, that is, links that do not point to external addresses;
  • skip links that already contain the nofollow attribute;
  • skip links to domains included in the white list.

For the last rule we have to add a plugin parameter and pass it a comma-separated list of the domains on our white list. Go to the "Configuration" tab of the plugin editing screen and add the following code to the "Plugin configuration" text box:

&whitelist=Friendly domains;text;

Press the "Update parameter display" button and type a comma-separated list of friendly domains into the text box that appears next to the new "Friendly domains" parameter. Don't include the "www" prefix in the domain names.

[Image: Plugin configuration]
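
The parameter value reaches the plugin code as a plain string in the $whitelist variable, so it has to be split into an array before use. Here is a minimal sketch of that step on its own; the value assigned to $whitelist is a hypothetical example of what might be typed into the "Friendly domains" box.

```php
<?php
// Hypothetical value, as it might be typed into the "Friendly domains" box
$whitelist = 'domain-1.com, domain-2.org';

// Drop spaces first, then split on commas
$whitelist = explode(',', str_replace(' ', '', $whitelist));

print_r($whitelist);
```

Removing the spaces before splitting means that "domain-1.com, domain-2.org" and "domain-1.com,domain-2.org" give the same result.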

After that, replace the plugin code with the following:

if ($modx->event->name == 'OnWebPagePrerender')
{
    $content = $modx->documentOutput;

    // Collect all link tags on the page to $matches array
    preg_match_all(
        "/<a [^>]*?href=[\"\'](.*?)[\"\'][^>]*>.*?<\/a>/im",
        $content,
        $matches
    );

    if (!empty($matches[0]))
    {
        // Get the list of friendly domains as array
        if (!empty($whitelist)) {
            $whitelist = explode(',', str_replace(' ', '', $whitelist));
        }
        else {
            $whitelist = array();
        }

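        // Add own domain to whitelist (without the "www" prefix, so that it
        // matches the "www"-stripped comparison below)
        $site_url = parse_url($modx->config['site_url']);
        if (isset($site_url['host'])) {
            $whitelist[] = preg_replace("/^www\./i", '', $site_url['host']);
        }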

        foreach ($matches[0] as $key => $tag) {
            // Get and parse the value of "href" attribute
            $href = trim($matches[1][$key]);
            $url_info = parse_url($href);

            // Skip non-external links (if the link destination is not
            // beginning with "http://" or "https://")
            if (!isset($url_info['scheme']) ||
                !in_array($url_info['scheme'], array('http', 'https'))) {
                continue;
            }

            // Skip already nofollowed links (if the link tag has an occurrence
            // of the rel attribute with the "nofollow" value)
            if (preg_match("/ rel=[\"\']nofollow[\"\']/i", $tag)) {
                continue;
            }

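            // Skip links without a host component (nothing to compare)
            if (!isset($url_info['host'])) {
                continue;
            }

            // Skip the domains included in the white list (regardless of "www")
            $domain = preg_replace("/^www\./i", '', $url_info['host']);
            if (in_array($domain, $whitelist)) {
                continue;
            }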

            // Add "nofollow" attribute to the beginning of a link tag
            // and replace occurrences of old tag with new one
            $new_tag = preg_replace("/^<a/i", '<a rel="nofollow"', $tag);
            $content = str_replace($tag, $new_tag, $content);
        }

        // Set the new content to document output
        $modx->documentOutput = $content;
    }
}
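
The external/internal decision rests on what parse_url() returns for each href. The following standalone sketch prints the scheme and host for a few values: the first three come from the test page in this article, and the last one (a protocol-relative URL) is an extra case added here for illustration.

```php
<?php
$hrefs = array(
    'http://example.com',
    'page.html',
    'https://www.domain-2.org/contacts.html',
    '//example.com/page.html', // extra case, not on the test page
);

foreach ($hrefs as $href) {
    $url_info = parse_url($href);
    $scheme = isset($url_info['scheme']) ? $url_info['scheme'] : '(none)';
    $host   = isset($url_info['host'])   ? $url_info['host']   : '(none)';
    echo $href, ' => scheme: ', $scheme, ', host: ', $host, PHP_EOL;
}
```

A relative href such as page.html has neither scheme nor host, which is why the scheme check is enough to classify it as internal. A protocol-relative href has a host but no scheme, so the same check leaves it untouched as well; whether that is acceptable depends on your content.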

Now refresh the test page and look at the output. In my case it looks like this:

<a rel="nofollow" href="http://example.com">External link</a>
<a href="page.html">Internal link 1</a>
<a href="http://deniskomlev.com/page.html">Internal link 2</a>
<a href="http://domain-1.com">Friendly site 1</a>
<a href="https://www.domain-2.org/contacts.html">Friendly site 2</a>
<a rel="nofollow" href="http://domain-3.info">Already nofollowed</a>

As you can see, the nofollow attribute has been added only to the first link. Why? Because the second and third links point to internal pages (the second has no "http://" or "https://" prefix, and the third uses our site's own domain, which was automatically added to the white list); the fourth and fifth point to domains that were manually included in the white list; and the last one already has the nofollow attribute. So now the plugin works exactly as expected.
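
If you want to experiment with the three rules without a MODx installation, they can be wrapped in a plain function. The sketch below is only an illustration under a few assumptions: add_nofollow() is a hypothetical helper name, it requires PHP 5.3 or later for the closure, and it swaps the preg_match_all()/str_replace() loop for preg_replace_callback() that matches only the opening <a ...> tag.

```php
<?php
// Hypothetical helper, not part of MODx: applies the article's three rules
// to an HTML string. $site_host should be given without the "www" prefix.
function add_nofollow($html, $site_host, array $whitelist = array())
{
    $whitelist[] = $site_host;

    return preg_replace_callback(
        "/<a [^>]*?href=[\"\'](.*?)[\"\'][^>]*>/i",
        function ($m) use ($whitelist) {
            $url_info = parse_url(trim($m[1]));

            // Rule 1: skip links without an http(s) scheme or without a host
            if (!isset($url_info['scheme'], $url_info['host']) ||
                !in_array($url_info['scheme'], array('http', 'https'))) {
                return $m[0];
            }
            // Rule 2: skip links that already carry rel="nofollow"
            if (preg_match("/ rel=[\"\']nofollow[\"\']/i", $m[0])) {
                return $m[0];
            }
            // Rule 3: skip whitelisted domains (ignoring a leading "www.")
            $domain = preg_replace("/^www\./i", '', $url_info['host']);
            if (in_array($domain, $whitelist)) {
                return $m[0];
            }
            // Otherwise add the attribute right after "<a"
            return preg_replace("/^<a/i", '<a rel="nofollow"', $m[0]);
        },
        $html
    );
}

$html = '<a href="http://example.com">External link</a>' . "\n"
      . '<a href="page.html">Internal link 1</a>' . "\n"
      . '<a href="https://www.domain-2.org/contacts.html">Friendly site 2</a>';

echo add_nofollow($html, 'deniskomlev.com', array('domain-1.com', 'domain-2.org'));
```

Because each opening tag is rewritten in place by the callback, identical tags that appear several times on a page are handled one by one rather than through a global string replacement.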

However, keep in mind that the nofollow attribute does not guarantee that links will not be followed or that they will not influence your site's ranking in search engines. Another way to hide links from indexing is to route them through a specially designed internal page that accepts the external site URL in a query parameter and redirects to that site. We will cover how to do this in one of the next articles.

