Backlink acquisition methodology

In SEO, backlinks are a long-standing topic. We often talk about the three pillars of SEO and cite authority in reference to backlinks. Today we are going to see that authority is not just that, and that improving it is not only a question of acquiring backlinks. However, the choice of a backlink always has an impact, and there are better ways to go about it than simply buying the most expensive link.

How can you optimize the effectiveness of your link acquisition?

1. Definition of authority

a. PageRank and its variants

PageRank is a formula that has seen many changes since it was first created and patented.

Initially, PageRank was a formula for ranking pages according to the incoming links they receive.

Each link contributes to the target's PageRank: its value is a share of the source page's PageRank, divided by the number of outgoing links on that source page.
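To make this concrete, here is a minimal sketch of the original damped PageRank iteration on a toy link graph (the graph, the 0.85 damping factor and the function names are illustrative assumptions, not Google's production code):

```python
# Minimal sketch of the original PageRank iteration on a toy graph.
# The graph, damping factor and iteration count are illustrative assumptions.

def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each page to the list of pages it links to."""
    pages = list(graph)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each source shares its PageRank equally among its outgoing links.
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in graph.items()
                if targets and page in targets
            )
            # The (1 - damping) part is the "teleportation" share spread over all pages.
            new_ranks[page] = (1 - damping) / n + damping * incoming
        ranks = new_ranks
    return ranks


toy_graph = {
    "home": ["article-a", "article-b"],
    "article-a": ["home"],
    "article-b": ["home", "article-a"],
}
print(pagerank(toy_graph))
```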

Since then, other factors have been added to the calculation:

– The inverse of the distance between pages, calculated from the embeddings* of each URL: a notion of thematic PageRank comes into play.

(* embedding: a semantic embedding is a representation of words as vectors for natural language processing, encoding the meaning of each word as a series of numbers. Thanks to embeddings, we can do arithmetic with words: Paris – France + Poland = Warsaw. A small sketch of this vector arithmetic follows the list below.)

– A devaluation based on the results of an anti-spam algorithm (if the page does not pass the algorithm, it transmits nothing).

– The location of the link on the page: this is the reasonable random surfer model: the earlier a link appears among a page's links, the more PageRank it transmits.
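Here is the small word-arithmetic sketch referred to in the footnote above. It assumes the gensim package and a pretrained GloVe model (the model name is an example; any reasonable word-embedding model behaves similarly):

```python
# Sketch of the "Paris - France + Poland ≈ Warsaw" analogy with pretrained vectors.
# Requires the gensim package; the model name below is an illustrative choice.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads the model on first run

# most_similar adds the "positive" vectors and subtracts the "negative" ones.
result = vectors.most_similar(positive=["paris", "poland"], negative=["france"], topn=3)
print(result)  # expected to rank "warsaw" at or near the top
```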

By producing a PageRank metric that incorporates all these factors, we obtain a value that is consistent with the actual rankings.

A few months ago, Google's internal API documentation was leaked. Analysis of this documentation revealed a new kind of PageRank, the latest version of the algorithm: PageRank NearestSeed (NS). This PageRank seems somewhat inspired by Yahoo!'s TrustRank, since it takes into account a pool of reference sites and implies a boost if you are mentioned by these sites, or sit at a short link distance from them. (Forbes, for example, was one of these sites before it fell out of favor for misuse – a blatant example of human intervention: "If Forbes doesn't play the game, we take it off the list".)

Be careful: if you base your assessment of a site’s authority solely on Trust, you’re shooting yourself in the foot, as you’re not taking into account all the factors that have an impact.

b. Factors affecting transmission

Let’s talk a little about the factors I mentioned earlier that affect transmission:

– Theming: undoubtedly one of the most complex factors to manipulate. Here we're talking about finding sources whose embeddings are close to our content... or bringing our content closer to the embeddings of the existing sources. It's a delicate game, because bringing two pieces of content closer together without distorting the individual message of each (semantic rapprochement and duplication are not the same thing) is painstaking work, and you need to work with a good copywriter who understands that it isn't "just" a matter of adding words.

– Link location on the page: the easiest factor to act on: move the link towards the top of the page's HTML code for maximum transmission and you're done. A large number of outgoing links also reduces transmission. In the reasonable random surfer model, the number of outgoing links is implicitly taken into account through a higher weight on the first links, decreasing down to the last one; beyond roughly the 300th link, the transmitted value is almost zero. Google communicates a limit of 200 links per page, but you should remain reasonable (a hundred or so outgoing links is a reasonable figure). A small sketch of this position weighting follows this list.

– Antispam: the most difficult factor to modify. What defines spam from an engine's point of view? A poor vocabulary relative to the topic of the content, repetition of n-grams across a page, and so on. From an algorithmic point of view, an anti-spam classifier relies on a list of URLs labelled as spam to learn patterns, and flags pages that exhibit those patterns at levels similar to the training dataset. The practice of keyword stuffing, for example, is easily identified by such an algorithm.

– Neighborhood of important sites: this is undoubtedly the most complicated point: you need to be able to identify the sites whose links are particularly effective (more so than one would reasonably expect).
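Here is the position-weighting sketch announced above. The exponential decay and its rate are assumptions chosen for illustration; the point is simply that earlier links receive a larger share and that links beyond the ~300th transmit next to nothing:

```python
# Illustrative sketch: a position-dependent weight for each outgoing link,
# decreasing with its rank in the page and near zero past ~300 links.
# The exponential decay and its rate are assumptions, not Google's formula.
import math

def position_weight(link_index, decay=0.015):
    """link_index: 0 for the first link in the HTML source, 1 for the second, etc."""
    return math.exp(-decay * link_index)

def transmitted_share(source_pagerank, link_index, total_links, decay=0.015):
    """Share of the source's PageRank sent through one link, its weight normalized
    against the weights of all outgoing links on the page."""
    weights = [position_weight(i, decay) for i in range(total_links)]
    return source_pagerank * position_weight(link_index, decay) / sum(weights)

# The 1st link of a 100-link page transmits far more than the 300th of a 400-link page.
print(transmitted_share(1.0, 0, 100))
print(transmitted_share(1.0, 299, 400))
```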

c. Natural traffic

Still in the Google document leaks, and coupled with information that can be gathered from the antitrust trial pitting Google against the US Department of Justice (DOJ), authority includes PageRank as well as traffic (via NavBoost, a Twiddler, i.e. a post-ranking SERP adjustment, which uses clicks) and the notion of natural visits to a URL (notably via social networks). Another way of boosting the authority of your pages is therefore to also work on their natural traffic (through community management, optimizing SERP CTR or sending newsletters, for example).

However, while traffic tracking may come as a surprise, at least on this scale for SEOs, it doesn’t make backlink-based methods obsolete: all the criteria used for ranking are useful for improving positions, and to be the best positioned, you need to be the best in all the active criteria.

2. Find link sources

a. Identification

There are several ways to find potential sources of backlinks:

– Access a link seller’s catalog

– Produce your own sites

– Interact with sites you find on your own

We’re going to focus on the first and last elements here – producing your own sites is a different challenge, but if you take on board the recommendations given here, you may find some interesting suggestions for your site network. After all, the logic behind acquiring backlinks is pretty much the same for any site.

Suppose you get a link seller's catalog: you'll find lots of sites, usually listed under a human categorization, along with plenty of reference metrics. If you want to use these categories and metrics to filter the list and reduce your workload, you're free to do so.

Suppose instead that you contact potential partners on your own: you are of course free to rebuild a list of metrics using your favorite tools, or even to ask these would-be partners for some information, but the really important metrics are the following:

– Traffic (as we saw in the previous section, natural traffic is important). You can also couple this information with the possibility of relaying the article on social networks, and the number of followers the site claims.

– A PageRank metric: PageRank is transmitted at URL level, but the Leaks also indicate a notion of PageRank at domain level. Surprising, but helpful.

– A Trust metric: not to be taken as the actual value of a site's URLs, but rather as a tie-breaker to help you choose between several options. Caution: Trust here should only bring your PageRank-based choices closer to what NearestSeed PageRank would reward. Furthermore, some popular metrics labelled "Trust" are not actually Trust, so avoid making Trust the decisive metric.

I don’t recommend filtering the list by human categories, as this is the best way to miss out on great linking opportunities. Human categories make no sense from a search engine point of view.

b. Source filtering

Now that you've identified a list of potential sources, you can start filtering. The goal here is to avoid wasting time and, within the field of possibilities, to keep only the best candidates.

Items to be filtered:

Maintenance of outbound links: when you acquire a backlink, you buy a link placed on a page. Unless the site has changed ownership, the links already present were paid for by other customers, so check whether they are still in place. You want a partner who maintains links, because losing a link means losing the associated authority. Likewise, check that outgoing links are not nofollow: a nofollow link transmits no authority to you.

Spam rate: Using metrics that take spam rate into account, we can identify sites that exceed the average web spam rate. In France, this rate is between 20% and 30%. Beyond that, you can forget about the site.

AI content: without being a problem in itself, a large amount of content visibly produced by generative tools is a clue that the content is churned out quickly, with minimal involvement from the seller. The key is to check whether the content delivers interesting information. Content with no added value, AI-generated or not, is content that may not perform well. It's a filter you can take into account to maximize the chances of performance, but it's not a sine qua non condition. A simple visit to the site and a read through a few articles is enough to judge.

Internal PageRank outbound rate: a site that makes its living from selling links will tend to make many more outbound links than normal. This increases its dilution rate, which on average sits between 3% and 10%. A site that exceeds this dilution rate is easily identified as a link-selling site, which falls within Google's definition of spam, and its outbound links are likely to be of little value. To measure a site's dilution rate, crawl the site and calculate how much internal PageRank flows out through its external links (a sketch of this calculation follows this list). Since all of a site's outbound links contribute to dilution, keep the sites that stay within the average range.

Correct internal linking of article pages: if you've got this far, you've identified a site that is within the average spam rate, whose content is decent, and you've calculated the internal PageRank distribution after crawling the site. All that remains is to check whether the article pages you've read are properly linked within the site: their depth must not exceed 5 (ideally below 3), and they should rank in the top half of the site's internal URLs by internal PageRank.
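Here is the dilution-rate sketch announced above. It assumes you already have, from your crawl, the internal PageRank and the outgoing-link counts of each page; the input format and the even split of PageRank between links are simplifying assumptions:

```python
# Sketch: estimate a site's dilution rate (share of internal PageRank flowing
# out through external links) and compare it to the 3%-10% band.
# The input format is an assumption; in practice it comes from your crawl data.

def dilution_rate(pages):
    """pages: dict url -> {"pr": internal PageRank,
                           "internal_links": int, "external_links": int}"""
    diluted = 0.0
    total = 0.0
    for data in pages.values():
        out_links = data["internal_links"] + data["external_links"]
        total += data["pr"]
        if out_links:
            # PageRank is assumed to be split evenly between all outgoing links here.
            diluted += data["pr"] * data["external_links"] / out_links
    return diluted / total if total else 0.0

crawl = {
    "/": {"pr": 0.40, "internal_links": 20, "external_links": 1},
    "/articles/a": {"pr": 0.35, "internal_links": 10, "external_links": 2},
    "/articles/b": {"pr": 0.25, "internal_links": 8, "external_links": 4},
}
rate = dilution_rate(crawl)
print(f"dilution rate: {rate:.1%}", "OK" if 0.03 <= rate <= 0.10 else "out of range")
```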

Items that do not need to be filtered:

The overall theme of the site: the final transmission happens from URL to URL, and human categorization makes no sense to a search engine. Filtering by category removes related subjects from the list simply because they sit in another category.

Sites that don't have a high overall authority: it's not the site as a whole that interests us, it's the page that's going to link to us. If you drop a site specializing in your theme in favor of a generalist medium, chances are you'll lose out.

Sites that don't have a high ratio of Trust to PageRank metrics: this is common practice, but there's no point in comparing two different metrics with each other. You should compare a metric for one URL or site with the same metric for another site.

c. Prioritize

Once you've refined your list of potential acquisitions by removing all those that don't match your backlink policy (or the previous recommendations), you need to prioritize them to decide the order in which your budget should be spent.

There is only one real purpose here: making sure the value of each link matches your budget. It is a final step that may filter out about half of your list, through a selection based on source and target semantics (again, we're not talking about human categorization).

Evaluating the value of the link is essential to prioritize your purchases. You will need several elements:

– The approximate location of the outgoing link: authority metrics for a URL similar to the one that will carry the link are sufficient; ideally a themed, anti-spam-filtered PageRank.

– The text of the page from which the link originates: unless you're obtaining a link on an already existing URL, or supplying the text yourself, you'll need to estimate it, either by taking an existing page similar to the text you hope to obtain, or by taking an average semantic embedding of the entire site (if it isn't a generalist site).

– The text of the link's target page: just as important as the rest, if not more so: the URL you want to push also has text, and the closer it matches the content that mentions it, the better. Here too you need to embed your content, using the same method as for the previous element.

For the embedding method, I recommend using LLM embeddings – there's no better method available to the general public today. Next, you need a formula to evaluate each link: take the inverse of the distance between the source and target embeddings and combine it with the PageRank metric of the source URL. You obtain a single value for the simulated link, which you can compare with the other simulated links and sort in descending order.
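As a hedged illustration of such a formula, the sketch below scores each candidate link as the source URL's PageRank metric multiplied by the inverse of the embedding distance between source and target texts. The exact combination is an assumption, and the embedding vectors are assumed to be precomputed (for instance with the method sketched later in section 3.b):

```python
# Sketch: score a candidate backlink from the source URL's PageRank metric and
# the semantic proximity between source and target embeddings.
# The combination formula is an illustrative assumption, not a known Google formula.
import numpy as np

def cosine_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def link_value(source_pagerank, source_embedding, target_embedding):
    # Inverse of the distance, damped by +1 so identical texts don't divide by zero.
    proximity = 1.0 / (1.0 + cosine_distance(source_embedding, target_embedding))
    return source_pagerank * proximity

# candidates: (name, PageRank metric of the source URL, source embedding, target embedding)
candidates = [
    ("media-article", 0.72, [0.1, 0.8, 0.3], [0.2, 0.7, 0.4]),
    ("partner-blog", 0.55, [0.9, 0.1, 0.1], [0.2, 0.7, 0.4]),
]
scored = sorted(
    ((name, link_value(pr, src, tgt)) for name, pr, src, tgt in candidates),
    key=lambda item: item[1],
    reverse=True,
)
print(scored)
```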

For your backlink acquisitions, you then go down the list from the highest value to the lowest, buying each link as long as your budget allows.
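And a short sketch of the budget pass itself, going down the list in descending value and buying each link while the budget allows (names, scores and prices are placeholders):

```python
# Sketch: spend the budget greedily, from the highest-value simulated link down.
# Prices and scores are illustrative placeholders.

def plan_purchases(scored_links, budget):
    """scored_links: list of (name, value, price) tuples."""
    plan, remaining = [], budget
    for name, value, price in sorted(scored_links, key=lambda l: l[1], reverse=True):
        if price <= remaining:
            plan.append(name)
            remaining -= price
    return plan, remaining

links = [("media-article", 0.69, 350.0), ("partner-blog", 0.37, 120.0), ("niche-site", 0.52, 200.0)]
print(plan_purchases(links, budget=500.0))
```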

3. Acquisition

a. Terms and conditions of acquisition

When negotiating the link, you should aim for a link that lasts over time. The "lifetime" aspect of the link (meaning "as long as the seller runs the site") is more interesting, because losing a link means losing what that link gave you.

When you obtain a lifetime link, you must above all ensure that the link and its context do not change.

You may be tempted to ask for a link with an optimized anchor, but be careful not to exceed a reasonable optimized-anchor rate, beyond which Penguin may penalize you. Depending on the topic, an acceptable optimized-anchor rate varies between 3% and 5%, with a commonly accepted average between 4.5% and 5%. The maximum is not a target but a risk; still, an optimized anchor is a signal Google can use and should therefore be considered.
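As a rough sketch, here is how you might monitor your optimized-anchor rate against the 3-5% band mentioned above. The keyword-based classification is deliberately naive and the thresholds come from this paragraph, not from Google:

```python
# Sketch: compute the share of optimized (exact-match) anchors in a backlink profile
# and compare it to the 3%-5% band discussed above. The classification is naive.

def optimized_anchor_rate(anchors, money_keywords):
    """anchors: list of anchor texts; money_keywords: terms considered 'optimized'."""
    optimized = sum(
        1 for anchor in anchors
        if any(keyword in anchor.lower() for keyword in money_keywords)
    )
    return optimized / len(anchors) if anchors else 0.0

profile = ["brand name", "https://example.com", "click here", "cheap red widgets", "brand name"]
rate = optimized_anchor_rate(profile, money_keywords={"red widgets"})
print(f"optimized anchor rate: {rate:.1%}")  # 20.0% here, well above the 3-5% band
```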

However, the Google Leaks teach us something rather important: optimized anchors are only taken into account when the link comes from the top 30% of sites in the index by PageRank metrics. So don't bother building backlinks with an optimized anchor if the source site doesn't have a (themed, anti-spam) PageRank metric above 66/100.

Below that threshold, the link is still used, but the anchor is not. It is actually counterproductive to get a link with an optimized anchor from a site below this threshold: your optimized-anchor ratio would increase without any real benefit.

You therefore have 2 points to watch out for during acquisition:

– Keeping the link and its context.

– Where relevant, obtain an optimized anchor; otherwise keep a non-optimized anchor (brand anchors aside).

b. Semantic consistency

To maximize the effect of PageRank transmission via the theming of a link, there are actually two possibilities: either you bring the source content closer to the target content, or you do the opposite.

It's much simpler to adjust the source content, but when your target doesn't rank and could get much better backlinks by changing its text, you need to accept that the text currently online isn't the one that will benefit from inbound links. In such cases, be open-minded enough to consider replacing the current content (or moving it, if it's essential to your user journey) with content more in line with the source articles.

The method for reconciling the semantics of two pages is to compute the embeddings of their textual content and then the inverse of the semantic distance between the two embeddings obtained.

That said, the accuracy of an embedding depends very much on the model used to calculate it, which is why an LLM-based model will be more effective.
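A minimal sketch of this measurement, assuming the sentence-transformers package and a general-purpose model (the model name is an example; a stronger LLM-based embedding model would follow the same pattern):

```python
# Sketch: embed the source and target texts and compute the inverse of their
# semantic distance. The model name is an illustrative assumption.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

source_text = "Article on the seller's site that will carry the link..."
target_text = "Your landing page content that the link will point to..."

source_vec, target_vec = model.encode([source_text, target_text])
cosine = float(np.dot(source_vec, target_vec) /
               (np.linalg.norm(source_vec) * np.linalg.norm(target_vec)))
distance = 1.0 - cosine
proximity = 1.0 / (1.0 + distance)  # inverse of the (damped) semantic distance
print(f"cosine similarity: {cosine:.3f}, proximity score: {proximity:.3f}")
```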

4. After acquisition, follow-up

a. Non-regression tests

Monitoring the stability and preservation of a link over time is essential: the disappearance of an important link removes the transmission of authority it provided. That said, the benefits of creating a link outweigh the drawbacks of its deletion: by creating a link, you allow Googlebot to discover the URL, which can get it indexed, and deleting the link does not undo that indexing, since the robot already knows the page.

The consequences of deleting a link vary according to the link’s actual impact:

– Either nothing happens

– Or you lose authority and your ranking is affected

It all depends on how important the link is to your ranking, but generally speaking, losing links is not a good signal, because if you lose important links and your competitors don’t, you lose out in the rankings.

To avoid, or at least limit, these unfortunate incidents, set up non-regression tests on the most important links, so that you are warned when action is required: contacting the provider to correct an error, replacing a deleted link, etc.

For a non-regression test on backlinks to be useful, it must take into account several criteria linked to the source URL: is it crawlable? Is it indexable? Has the context or location of the backlink changed? Has a nofollow been added to the link? Has the link been removed? Has the page been removed?
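A hedged sketch of such a test, using the requests and BeautifulSoup libraries, is shown below. It only covers the criteria listed above, and the URLs are placeholders:

```python
# Sketch: non-regression check for one backlink, covering the criteria listed above:
# is the source reachable, indexable, does the link still exist, is it nofollow?
# Uses the requests and beautifulsoup4 packages.
import requests
from bs4 import BeautifulSoup

def check_backlink(source_url, target_url):
    report = {"crawlable": False, "indexable": True, "link_present": False, "nofollow": False}
    try:
        response = requests.get(source_url, timeout=10)
    except requests.RequestException:
        return report  # page unreachable: time to act
    report["crawlable"] = response.status_code == 200
    soup = BeautifulSoup(response.text, "html.parser")

    robots = soup.find("meta", attrs={"name": "robots"})
    if robots and "noindex" in (robots.get("content") or "").lower():
        report["indexable"] = False

    for a in soup.find_all("a", href=True):
        if a["href"].rstrip("/") == target_url.rstrip("/"):
            report["link_present"] = True
            report["nofollow"] = "nofollow" in (a.get("rel") or [])
            break
    return report

print(check_backlink("https://partner.example/article", "https://www.example.com/landing-page"))
```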

Ideally, you also want to track the authority of the source page, as this authority naturally decreases over time (the source site will publish other articles, remove or rework certain URLs, and potentially transmit less internal PageRank to this page, which may itself lose backlinks unless new ones keep arriving). Using a tool that regularly refreshes its metrics for each URL makes it possible to define a floor value below which action is needed to restore the backlink's value, or to assess whether the investment is still worthwhile in the absence of an alternative.

b. Search engine memory

From a search engine’s point of view, keeping a history of changes to a URL comes at a huge cost. It involves storing several versions of a URL in memory, which amounts to storing the entire web several times over.

However, pages change and evolve, and in order to combat spamming practices, it’s important to be aware of these changes. That’s why Google only keeps track of the last 20 changes to a page. In other words, the last 20 changes to a page’s code, text and authority.

Why is this important?

Google has patented an algorithm specifically designed to combat SEO practices: Transition Rank. This algorithm works in 2 parts:

– Detection

– The quarantine phase

Detection: when a URL is suspected of being spam through actions such as semantic optimization or link acquisition, it enters a Transition Rank phase: a phase lasting from a few hours to 3 months, during which an arbitrary position lower than the initial one is assigned to the page. If, during this period, the page is re-optimized, it again enters a Transition Rank phase and loses positions once again.

The only way to avoid Transition Rank is not to touch the content or linking of a page for 3 months.

5. Optimize your multi-level strategy

a. Push URLs that generate good backlinks

When you get a backlink, especially on a site that promotes the brand or the site, such as a media outlet, you want to be able to amplify the effect of that backlink. To do this, you can adopt a strategy of acquiring links to URLs that link to you. In this way, you can:

– Indirectly receive authority from sites that are perhaps a little distant from the final target’s theme

– Optimize backlinks obtained from the same sources as your competitors

– Revitalize a historical backlink that is losing PageRank

b. The link wheel

One approach, based on circulating PageRank in loops and known to increase bot traffic on your URLs, is to exploit outbound links in the hope that the target will in turn point, directly or indirectly, to one of the pages that link to you.

By applying this method, it is possible to multiply the PageRank obtained by a page by a factor of up to 2.6.

This method is commonly known as the link wheel, and while it's well known to SEOs, it's also well known to Google, which has taken steps to limit its impact. There is no guarantee that any particular configuration or adaptation of this technique will have an effect. Since Google actively limits it, it's worth putting real effort into making the URLs that make up this "wheel" distinct: it's enough for the pages in the wheel to fail the anti-spam algorithm for the technique to stop working, or for them to be considered as belonging to the same entity (same domain, IP, AS (autonomous system), or association with the same Google account via GA, GSC or other Google services) for the links passing through these pages to be devalued.

One solution: communicate with sites upstream and downstream of your URLs on the link path.

6. How many links should I get?

a. Understanding what a link represents

A link is a vote of confidence that one URL gives to another. This principle comes from Eugene Garfield's Science Citation Index. The difference is that each page has a different transmission capacity depending on its number of outgoing links and the semantic distance, and each backlink transmits differently depending on its position in the list of outgoing links.

How does Google use a link?

Because a backlink is used to calculate PageRank, and the Googlebot uses it to discover new pages, Google uses links to assess the likelihood of the next page being interesting.

What if the next page is not interesting?

PageRank is teleported to all web URLs. From a practical point of view, the Googlebot “jumps” to another page in its list of URLs to visit, and from a mathematical point of view, the part of the PageRank not transmitted via the backlink is attributed uniformly to all the pages in the index.

Will PageRank be sent in the same way regardless of the source page?

No. In fact, there are three types of source URL:

– The URL that does not pass the anti-spam algorithm: this page transmits nothing to its link targets, as if its outgoing links were nofollow. In this case, PageRank is teleported to all URLs on the web, so there is no major advantage or disadvantage to receiving a link from a spam page.

– The ordinary URL that passes the anti-spam algorithm: this page transmits the PageRank it receives to its targets normally. No more, no less.

– The URL of a "seed" site: since the PageRank used in production is PageRank NS, bear in mind that some pages on the web "always" transmit PageRank to their targets, which are ordinary pages.

Given how PageRank Nearest Seed is calculated, you need to evaluate the distance, in number of links, from a seed page to assess a page's actual PageRank. This means tracing link sources back to find the nearest seed, or working out where you can obtain a link, directly or indirectly, from a likely seed.
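As a hedged sketch of this reasoning, here is a breadth-first search from a seed set along a link graph, measuring how many links separate a page from the nearest seed (the graph and the seed set are illustrative; Google's actual seed list is not public):

```python
# Sketch: distance in links from the nearest "seed" page, via breadth-first search.
# The link graph and the seed set are illustrative assumptions.
from collections import deque

def nearest_seed_distance(links, seeds, page):
    """links: dict source -> list of targets. Returns the minimum number of links
    to follow from any seed to reach `page`, or None if unreachable."""
    distances = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if current == page:
            return distances[current]
        for target in links.get(current, []):
            if target not in distances:
                distances[target] = distances[current] + 1
                queue.append(target)
    return distances.get(page)

web = {
    "seed-media": ["big-blog"],
    "big-blog": ["niche-site"],
    "niche-site": ["your-page"],
}
print(nearest_seed_distance(web, seeds={"seed-media"}, page="your-page"))  # 3
```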

b. Assessing the competition

How many links do you need to outrank the competition? This means assessing the value you need to do better than the others. As Google is a ranking engine, you will need to assess the authority value of your competitors’ links on the targeted SERP in order to do better than them, for equivalent semantic value.

What do you need for each competing page?

– The list of incoming links

– A PageRank value for the URL

– PageRank at domain level

– A reliable Trust value at URL level

From the list of inbound links, you can calculate two other metrics:

– The number of referring sites for a page

– The strength of these links

From all these metrics, you can identify the acceptable average, maximum and minimum for each URL ranking on the SERP. This should give you a pretty accurate idea of what your page is missing to be considered important enough to join the list of ranking URLs.
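A small sketch of this benchmark, assuming you have exported these metrics for each ranking URL (metric names and values are placeholders):

```python
# Sketch: min / average / max of competitors' link metrics on a target SERP,
# and the gap between your page and the average. Values are placeholders.
from statistics import mean

serp_competitors = [
    {"url": "competitor-1", "referring_pages": 42, "referring_sites": 18, "pagerank": 0.61},
    {"url": "competitor-2", "referring_pages": 55, "referring_sites": 25, "pagerank": 0.66},
    {"url": "competitor-3", "referring_pages": 23, "referring_sites": 12, "pagerank": 0.52},
]
my_page = {"referring_pages": 15, "referring_sites": 7, "pagerank": 0.47}

for metric in ("referring_pages", "referring_sites", "pagerank"):
    values = [c[metric] for c in serp_competitors]
    print(
        f"{metric}: min={min(values)}, avg={mean(values):.2f}, max={max(values)}, "
        f"gap to average={mean(values) - my_page[metric]:.2f}"
    )
```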

If you are already in the average range of what’s acceptable for your page to rank on the target SERP, the remaining actions and levers to activate are user interaction, traffic and traffic location.

7. How can I make the most of the links I have obtained?

a. Internal linking

When you receive links from outside your site, Google stores them in a specific database and processes and manages them there. When you create links within a site (i.e. internal links), Google stores them in another database, and performs specific calculations for these links. This means that the value of a link from a site other than yours is different from the value of an internal link. This also means that certain practices sanctioned for external links are not necessarily sanctioned internally.

However, good internal linking is still important, as it makes it easier for robots to discover the pages you highlight on your site. With properly executed internal linking, you can push a section of your site that is essential to your business.

Optimizing internal PageRank allows you to channel the benefit of incoming PageRank to the URLs of your choice, regardless of which page on your site actually receives the external links.

b. Outbound links, a strategic interest

As we saw earlier with the link wheel technique, outbound links can be exploited to give PageRank a bit of a boost. From a broader point of view, outbound links matter because they let PageRank circulate to other URLs on the web, most often in the same thematic area, and since all URLs on the web are connected, more often than not within the same themes, making an outbound link also helps bring PageRank back home.

PageRank should be thought of not as a stock but as a flow. Thinking you can hoard PageRank by adding nofollow or refusing to make outbound links is pointless, because in the case of a dead end, PageRank teleports to the rest of the web (this is one of the ways spider traps are avoided).

However, outbound links are monitored: as I said earlier, the dilution rate (outbound from internal PageRank), i.e. the percentage of internal PageRank attributed to outside the site, must be between 3% and 10% to appear natural. Above 10%, the site is making too many outbound links; below 3%, it’s more likely to be a “dead end”.

As we've seen, there are many ways to optimize a page's authority and move it up in Google's search results. If you've been paying attention to this article, you'll have understood that PageRank has evolved considerably since the 1996 formula, and that, logically enough, Google also relies on user behavior (traffic, interaction with the page, reading time, etc.) for a notion of authority that goes beyond the mere number of backlinks.

Sources:

Yahoo TrustRank:

Title: Combating Web Spam with TrustRank 
Authors: Zoltan Gyöngyi, Hector Garcia-Molina, Jan Pedersen
https://www.vldb.org/conf/2004/RS15P3.PDF

Web Spam:

Title: Detecting spam web pages through content analysis 
Authors: Alexandros Ntoulas, Marc Najork, Mark Manasse, Dennis Fetterly
https://dl.acm.org/doi/10.1145/1135777.1135794

Transition Rank:

Title: Changing a rank of a document by applying a rank transition function 
Author: Ross Koningstein
https://patents.google.com/patent/US8924380B1/en

PageRank Nearest Seeds:

Title: Producing a ranking for pages using distances in a web-link graph
Author: Nissan Hajaj
https://patentimages.storage.googleapis.com/b8/dc/3e/3b5b32f8acd264/US9165040.pdf

Latent Semantic Model:

Title: Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
Authors: Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf

Blog and other resources:

Title: Google’s PageRank: does it still matter in 2024
Authors: Anastasia Osypenko, Olena Karpova
https://seranking.com/blog/pagerank/

Title: Page Rank NS (NearestSeeds)
Author: NC
https://lutforpro.com/seo-glossary/pagerank-ns/

Title: Navboost: What the docs tell us and how to use it for your SEO
Authors: Anna Postol, Olena Karpova
https://seranking.com/blog/navboost/