Machine Learning: How does it impact SEO?


So, in the previous post we discussed what Machine Learning is. In this post we’ll go over how machine learning is impacting the way search engines (more precisely Google) work. How are they using machine learning (e.g. RankBrain) to deliver the best search results to their audience?

Without claiming that Google itself divides things up this way, I want to split the impact into two subdomains:

  • How Google processes your search query and tries to understand intent.
  • How Google designs SERPs that are relevant to the search query.

There’s a lot more to talk about than just these two subjects, but they are the main ones. I’ll explain both by giving you a brief overview of how Google performed each action before and after RankBrain. Let’s-a-go!

Processing Search Queries

With DMOZ closing recently, we’ve got a throwback to the early internet browsing behavior before search engines were a thing. The more pages were being made and thrown on the internet, the harder it became to just find what you were looking for. So people tried to solve this problem. People tried to ‘organise the internet’.

Categorising pages

Just as people were used to organising everything in those days, they started to gather the most important websites and put them into folders. You have a website about your local soccer team? We’ll put that here:

ALL -> Sports -> Soccer -> Teams -> Europe -> Belgium -> KMSK Deinze

This way DMOZ, at its closing time, had categorized a stunning 3,861,210 websites into 1,031,722 categories across 90 languages. To do this, they had a team of 91,929 editors.

DMOZ Sports

This became an increasingly hard task, considering the enormous volume of websites going live on the internet every hour of the day. We needed a new, easier way to find the web page you were looking for.

Search engines based on query/document matching

google processing search queries

Go ahead, type in anything you want.

Why not let people type in the thing they’re looking for and return all the pages that contain the exact search term?

That’s where search engines started. Matching exact search queries to documents. If I had a document online that has the title ‘Coffee Machine’ and I used the phrase ‘coffee machine’ a lot in the document, it would be a very relevant result for the search query ‘coffee machine’.

There are a lot of different ways to determine the relevance of a document considering a search term. Consider just the following possibilities:

  • Keyword Usage: Is the document using this query? How many times does it use it (in absolute / relative terms)?
  • Term Frequency x Inverse Document Frequency (TF*IDF): This method takes into account the commonality of a word used in the query. If we’re looking for ‘great guitars’, the word ‘great’ will be more common, so the word ‘guitars’ will be more important to determine the relevance.
  • Co-occurrence: Assuming you have a lot of data, you could check which words frequently co-occur with the search query. For example: If a document is about ‘guitar lessons’, it will probably mention ‘chords’, ‘frets’, ‘notes’ and other relevant words. A document containing these co-occurring words (measured across documents) will be considered more relevant.
  • Topic Modeling (e.g. LDA): This is where it gets tough. Notice that co-occurrence doesn’t imply the words are relevant. Topic modeling is a collection of methods for determining which words are related to each other. For example, the words ‘up’ and ‘down’ are related to each other. They are both related to ‘elevators’, but they are also related in a totally different way to ‘manic depression’. Topic modeling uses vectors to determine how words are related. There is an awesome post from 2010 on the Moz blog about LDA and how it’s correlated to rankings. It also visually explains the previous topics.
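To make the TF*IDF idea from the list above a bit more tangible, here is a minimal sketch in Python. The tiny corpus, the document names and the function names are all made up for illustration, and the scoring is a textbook simplification, not how Google or any real search engine implements it.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real index holds billions of documents.
DOCS = {
    "shop": "great guitars and great amps for sale",
    "review": "a great movie with a great cast",
    "recipe": "a great recipe for a quick dinner",
}

def tokenize(text):
    return text.lower().split()

def inverse_document_frequency(term, docs):
    # Terms that appear in fewer documents get a higher weight.
    containing = sum(1 for text in docs.values() if term in tokenize(text))
    return math.log(len(docs) / containing) if containing else 0.0

def tf_idf_score(query, doc_text, docs):
    tokens = tokenize(doc_text)
    counts = Counter(tokens)
    score = 0.0
    for term in tokenize(query):
        tf = counts[term] / len(tokens)  # relative term frequency
        score += tf * inverse_document_frequency(term, docs)
    return score

def rank(query, docs):
    # Highest TF*IDF score first.
    scored = {name: tf_idf_score(query, text, docs) for name, text in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

In this toy corpus the word ‘great’ shows up in every document, so its weight drops to zero and only ‘guitars’ decides which document ranks first for the query ‘great guitars’.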

This works great but has two downsides:

  • Exact search query usage: Matching documents to search queries doesn’t take search intent into account. This means that two different search queries, having the same intent, will have two different results. Also: misspellings are a big issue.
  • Manual topic modeling: The topic modeling used is mostly based on human, non-automated work. This requires an enormous amount of work and a large number of editors. (DMOZ, anyone? 😉 )

Search engines using machine learning

What is needed is a machine learning system that learns how words, topics and concepts relate to each other. We need Artificial Intelligence to make search engines understand the questions we are asking so they can give us the correct answer.

I’ve found this great talk from Hang Li (Huawei Technologies), who presented his view on how to use machine learning for advanced query / document matching. The main problem being: how to adapt to natural language (synonyms, misspellings, similar queries <-> same intent,…)?

If you don’t want to watch the full video, the main aspects are here:

Hang speaks about matching the keywords and concepts on different levels:

  • Term: Comparable to the query/document matching. If a document uses the term ‘NY’ a lot, it’s probably relevant for the search term ‘NY’.
  • Phrase: Just like before but on the level of phrases. Term-level matching ‘hot’ and ‘dog’ will not necessarily give you the documents that are relevant to the phrase ‘hot dog’.
  • Word Sense: This is where it starts to get interesting. On this level of matching, we need to be connecting similar word senses. The system should know that ‘NY’ is actually ‘New York’, and that someone searching for ‘utube’ probably is looking for ‘YouTube’.
  • Topic: Even further we should be able to match the topics of the queries being used. If we can link ‘microsoft office’ to ‘powerpoint’, ‘excel’, … and other relevant terms, this gives us an extra layer to determine relevancy of a document.
  • Structure: On this level, we should be able to get the intent of the search, no matter how it is formulated. So the structure of the language should be understood. The system should ask ‘What is/are the most defining part/s of this search?’

So the way this works from a ‘Query Understanding’-standpoint:

search query understanding machine learning

  1. The searcher enters the query ‘michael jordan berkele’, which contains a typo.
  2. On a term level, the spelling error is corrected. So ‘berkele’ is interpreted as ‘berkeley’.
  3. On a phrase level ‘michael jordan’ is identified as being a phrase.
  4. On the sense level there are similar queries like ‘michael l. jordan’ or just ‘michael jordan’.
  5. Importantly, on a topic level, the system recognizes the topic as being ‘machine learning’. If ‘Berkeley’ wasn’t in the query, there would have been confusion on the topic as ‘Michael Jordan’ is obviously also a very famous former basketball player.
  6. On a structure level it becomes clear that Michael Jordan is the main phrase of importance. It’s not Berkeley.
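The first few levels of the walkthrough above can be sketched roughly like this. It is a toy illustration only: the vocabulary, the phrase list and the topic hints are hand-made assumptions, whereas the whole point of the talk is that a real system learns these from data. The spelling correction uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical, hand-made lookup tables; a real system would learn these.
VOCABULARY = ["michael", "jordan", "berkeley", "basketball", "chicago"]
KNOWN_PHRASES = {("michael", "jordan")}
TOPIC_HINTS = {"berkeley": "machine learning", "chicago": "basketball"}

def correct_terms(query):
    # Term level: replace each word with its closest vocabulary entry, if any.
    corrected = []
    for word in query.lower().split():
        matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
        corrected.append(matches[0] if matches else word)
    return corrected

def find_phrases(terms):
    # Phrase level: keep adjacent word pairs that form a known phrase.
    return [pair for pair in zip(terms, terms[1:]) if pair in KNOWN_PHRASES]

def guess_topic(terms):
    # Topic level: use a context word to disambiguate the phrase.
    for term in terms:
        if term in TOPIC_HINTS:
            return TOPIC_HINTS[term]
    return None

def understand(query):
    terms = correct_terms(query)
    return {"terms": terms, "phrases": find_phrases(terms), "topic": guess_topic(terms)}
```

Calling `understand("michael jordan berkele")` corrects the typo, recognises ‘michael jordan’ as a phrase and lands on ‘machine learning’ as the topic. Without ‘berkeley’ in the query, this sketch simply returns no topic at all.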

Looking at it from the other side, we have a similar process:

So when both the query and document can be understood on these levels, the system can start matching the search query intent to the most relevant documents. Hang goes further into this process, but this first part explains a lot about the task that’s been given to machine learning.

This process of bringing machine learning into understanding language and search intent has come a long way. Google uses TensorFlow to have machines learn language. Through a massive input of language data, it can build its own knowledge by understanding vectorial correlations between words or phrases. There’s little doubt that this technology is part of RankBrain.
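To give a feel for what ‘vectorial correlations between words’ could mean, here is a small sketch that compares word vectors using cosine similarity. The three-dimensional vectors are invented by hand for this example; real word vectors are learned from huge amounts of text and have far more dimensions. Nothing here reflects how RankBrain actually works.

```python
import math

# Made-up vectors for illustration only; real embeddings are learned from data.
WORD_VECTORS = {
    "ny": [0.9, 0.1, 0.0],
    "new york": [0.85, 0.15, 0.05],
    "guitar": [0.0, 0.2, 0.95],
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point in the same direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors):
    # Return the other word whose vector is closest to the given word.
    others = {w: v for w, v in vectors.items() if w != word}
    return max(others, key=lambda w: cosine_similarity(vectors[word], others[w]))
```

With these toy numbers, ‘ny’ ends up much closer to ‘new york’ than to ‘guitar’, which is the kind of relation a system needs for word-sense matching.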

So from a query-processing standpoint, machine learning is helping query/document matching by developing its own understanding of language.

Ranking search results

As said earlier, search engines have two main objectives: first, understanding the search intent to match the right pages; then, ranking all the matched pages so the most useful one is highest in the list.

Once we have decided which pages are probably relevant to the searcher’s intent, we’ll have to make a guess on which page is the best one to rank first. And there are a lot of factors being used to do that. But as you might have learned from the previous post in this series, all these possibilities become too hard to handle correctly for every search. And that’s where machine learning and things like RankBrain come into play.

So let’s see how we could rank pages.

Pages ranked based on query / document matching

Plain and simple. We let the matching algorithm run and define scores based on the on-page relevance of the document. The document with the highest score gets ranked first.

Although simple, this is not the best way to do this as it is an easy-to-trick system. Once you know how the query / document matching is done, you’ll be able to design a document that is very relevant according to the algorithm, but not for the user.

Pages ranked based on a set of manually weighted factors

The second thing to do is add extra factors that help decide whether a page is relevant, and then manually set the weight each of these factors should have when ranking the search results. There are a lot of factors:

  • Page level: query / document matching score, links to the page, linking C-blocks to the page, …
  • Domain level: overall topical relevance, links to the domain, quality of content, …
  • Search level: branded search on this topic, …
  • User level: has visited this website before, visits video content regularly, …
  • Device level: what device is used, how’s the internet connection, …

Problem is, different searches will need different weighting in factors. And that’s more than any man can do…
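As a rough sketch of what ‘manually weighted factors’ could look like, the snippet below ranks two imaginary pages with a hand-set weight per factor. The factor names, values and weights are all invented for this example; they are not Google’s.

```python
# Hypothetical factor weights, set by hand; the numbers are invented.
MANUAL_WEIGHTS = {"match_score": 0.5, "page_links": 0.3, "domain_links": 0.2}

# Hypothetical pages with factor values already normalised to the 0..1 range.
PAGES = {
    "keyword-stuffed page": {"match_score": 1.0, "page_links": 0.1, "domain_links": 0.1},
    "trusted guide": {"match_score": 0.7, "page_links": 0.8, "domain_links": 0.9},
}

def weighted_score(factors, weights):
    # Sum of each factor value multiplied by its hand-set weight.
    return sum(weights[name] * factors.get(name, 0.0) for name in weights)

def rank_pages(pages, weights):
    # Highest weighted score first.
    return sorted(pages, key=lambda name: weighted_score(pages[name], weights), reverse=True)
```

With these weights the trusted guide outranks the keyword-stuffed page, while ranking on `match_score` alone would flip the order. The catch is that somebody has to pick those weights, and the right mix differs from search to search.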

Pages ranked based on machine learning

Not only does Google have the necessary information on query / document matching, incoming links to the domain and the page, overall relevance and power of the domain… It also gathers information on how well the search results are working. It measures click-through rate, bounce rate, etc…

For example, if you perform a search and get a search results page, there are a couple of things that can happen. Suppose you don’t click the first result. Why in hell would you not click the first result? The list of possible answers is endless.

  • You’ve already visited this domain in the past and didn’t like it.
  • The search result is not relevant to your particular situation.
  • You think this website is for older people.
  • You don’t like the way the meta description is written.

Everything from user profile (demographics, interests, …) to on- or off-page factors (domain, meta title, …) can be in play. It is too much for a manually updated algorithm to get all these factors right. But given enough data (that is, enough searches), a self-learning algorithm could do the job.

It can work its way back from the results (‘What is the page that people clicked and probably had a good user experience?’) to define how the different algorithm factors should be weighted.
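Working back from results to weights can be illustrated with a very small logistic regression trained by gradient descent. This is only a sketch under heavy assumptions: the click log is invented, there are just two factors, and I’m not claiming this is the method Google or RankBrain uses.

```python
import math

# Invented click log: (match_score, link_score) per result, and whether the
# result was clicked and seemed to satisfy the searcher (1) or not (0).
CLICK_LOG = [
    ((0.9, 0.1), 0),
    ((0.8, 0.2), 0),
    ((0.6, 0.9), 1),
    ((0.7, 0.8), 1),
    ((0.9, 0.7), 1),
    ((0.5, 0.1), 0),
]

def predict(weights, bias, features):
    # Estimated probability that the searcher is happy with this result.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def learn_weights(log, steps=1000, learning_rate=0.5):
    # Start with all weights at zero and nudge them after every example.
    weights = [0.0] * len(log[0][0])
    bias = 0.0
    for _ in range(steps):
        for features, clicked in log:
            error = predict(weights, bias, features) - clicked
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias
```

In this made-up log, satisfied clicks go to the results with a high link score, so the learned weight for `link_score` ends up larger than the one for `match_score` without anyone setting it by hand.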


Machine Learning & Digital Marketing: What is Machine Learning?

RankBrain, Programmatic Buying, Artificial Intelligence, Real Time Bidding, Algorithm Updates… Digital marketing these days is all about big words and the math behind them. How is machine learning actually impacting digital marketing?

That’s what I’m exploring in this series on ‘Machine Learning & Digital Marketing’. Although I’m not a machine learning expert, I’m trying to give you an insight on how the practice itself is changing the way we do (digital) marketing today and how we will do it in the future. In the next episodes, we’ll be covering SEO, SEA, Media Buying and Analytics. But first, in this intro, let’s take a look at machine learning.

What is machine learning?

First things first! You’ve probably already heard about these 3 terms:

  • Artificial Intelligence
  • Machine Learning
  • Deep Learning

It’s good to know that there’s a difference between those 3 terms. In fact, Nvidia wrote a great blog about this subject. In short:

Artificial Intelligence is human intelligence exhibited by machines. Machine Learning is an approach to achieve artificial intelligence. Deep Learning is a technique for implementing machine learning.

For example, with what is called “narrow AI” we can ask a machine to do a very specific task, like ‘beating a human at chess’ or ‘given a certain word, returning the most relevant page of a website’. Notice how the AI doesn’t need to understand Alexis de Tocqueville’s view on democracy. It doesn’t need to mimic the human brain, just do what is needed to perform the task at hand.

Artificial Intelligence: The art of beating a human at chess

There are lots of ways to make a computer beat a human at chess:

Source: Maarten van den Heuvel @

  • Ask expert chess players for their strategy and implement it as a combination of ‘If this then that’-rules.
  • Gather data on every chess game between two humans. For every situation, plan out the possible actions and the probability of winning the game for each action. Let the system always choose the action that gives the highest probability of winning the game.

You might have noticed that there’s a problem with these two solutions. If the data the AI is based on is static, the AI becomes very predictable. Even though it might beat humans a few times, once the human figures out how the AI makes its decisions, the AI should never win a game again, because the human can then develop a counter-tactic against the ‘highest probability’ choices. The AI will not be changing its strategy. So we need a new way.

Machine Learning: The art of beating a human at chess again and again

The new way would be something like this:

  • Let the computer play millions of games and gather data on winning probabilities for every action in every possible situation. Make it constantly learn to adjust its own choices, including as many hand-written parameters as you can imagine.

This last part, machine learning, ensures that the AI will be able to keep beating humans in chess in the future. Keep in mind that it will need to make mistakes and lose games to be able to learn how to win them. It will not go on a winning streak of 100% starting at its first win. (There is a very good life lesson in this paragraph. 😉 )

To get back to our example, the chess computer might learn that the hand-written parameter of ‘randomness’ is important. If it doesn’t want to be perfectly predictable, the AI might want to sometimes pick ‘the second highest probability move’ to challenge the human’s processing capacity. But excellence will be in the balance. It should not lower its chances of success by too much.
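Here is a minimal sketch of that idea for one single board position: keep win statistics per move, update them after every game, and sometimes pick the second-best move on purpose. The class name, the `exploration` parameter and the optimistic default for untried moves are my own assumptions. A real chess engine is far more involved than this.

```python
import random

class MoveLearner:
    """Keeps win statistics per move and updates them after every game."""

    def __init__(self, moves, exploration=0.1, rng=None):
        self.wins = {move: 0 for move in moves}
        self.plays = {move: 0 for move in moves}
        self.exploration = exploration  # hand-written 'randomness' parameter
        self.rng = rng or random.Random()

    def win_rate(self, move):
        # Unplayed moves get an optimistic estimate so they are tried at least once.
        return self.wins[move] / self.plays[move] if self.plays[move] else 1.0

    def choose(self):
        ranked = sorted(self.wins, key=self.win_rate, reverse=True)
        # Occasionally pick the second-best move to stay less predictable.
        if len(ranked) > 1 and self.rng.random() < self.exploration:
            return ranked[1]
        return ranked[0]

    def record(self, move, won):
        # Learn from the outcome of a finished game, win or lose.
        self.plays[move] += 1
        self.wins[move] += 1 if won else 0
```

Setting `exploration` to 0 gives you the fully predictable ‘always the highest probability’ player from before, while a high value throws away too many winning chances. The balance sits somewhere in between.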

Another example:

Artificial Intelligence: Simulating a game of football

The one thing that started my interest in AI is gaming. Most of all sim(ulation) gaming. For example (and I’m sorry non-football lovers) the game Football Manager.

Football Manager AI

The amount of hours I played this game…

It essentially mimics the game of football being played in the real world, with excellent precision. The game seems simple:

  • You have a club with a group of players, each of them having their own set of abilities. For example: Scott Davidson, a central defender at the Scottish League Two club Stirling Albion:
    Scott Davidson Stirling
  • When playing a match, Scott is put in a line-up, combined with some high level strategy decisions that will guide his decision making:
  • In-game these players are constantly making decisions. For example, Henderson in this case gets the ball and should decide what to do:

    Artificial Intelligence Football Manager

    Run down the flank? Pass the ball? To whom?

And this is when it becomes interesting: Henderson has to make a decision, which is based on his parameters (Vision, Anticipation, Decisions…), tactical guidelines (‘Pass Shorter’, ‘Take No Risk’) and many more factors. Once he has made his decision, the execution of his action is also based on factors like his parameters (Passing, Technique, Dribbling, …), the pitch quality, his fitness level…

Machine Learning: Keeping the game interesting to play

This would be (and for most people: is 😀 ) a very boring game if there was one tactic that would win every game. The thing with the game is that, given certain limitations, the ‘other coaches’ are adapting their tactic to what you are doing.

This ensures that you’ll have to keep changing your tactics to keep winning games. Makes it a very frustrating game at times, but in essence, makes this game endlessly playable (and some of us do…).

So, now we know what Artificial Intelligence and Machine Learning are, what’s this deep learning thing?

Deep Learning: Mimicking neural networks

Now for the most abstract part of this. What deep learning does is very close to how we think our brains work, namely through neural networks:

Artificial Neural Network

A very simple Artificial Neural Network – Source:

There is a certain amount of input that is divided over different nodes. This input is transformed in different hidden layers of nodes. The amazing thing is that the connected nodes pass their ‘transformed input’, together with a weighting of their own input (considering the output), to the next layer.
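A forward pass through such a network can be sketched in a few lines of plain Python. The layer sizes and weights below are invented, and the sigmoid activation is just one common choice. In a real network the weights are learned from training data instead of being typed in.

```python
import math

def sigmoid(x):
    # Squashes any number into the 0..1 range.
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # Each node sums its weighted inputs, adds a bias and applies the activation.
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    # Feed the output of each layer into the next one.
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Invented weights: 2 inputs -> 3 hidden nodes -> 1 output node.
LAYERS = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
```

Calling `network_forward([1.0, 0.0], LAYERS)` pushes two input values through the three hidden nodes and returns a single output value between 0 and 1.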

Given the rising processing capacities and math innovations that science has produced in recent years, we are capable of doing ‘sort of what the brain does’ on a smaller scale.

Dr. Pete Meyers actually explained this in a brilliantly simple way at MozCon 2016:

The way a neural network works is: We have these [layers of] inputs and we have these [layers of] outputs we want to achieve. […] So we’re trying to put something in between that can model that [input to output]. […] We put in this data to train it, but then the machine itself can handle new inputs that’s never seen before.

So actually, by letting the machine learn backwards from the output to the input, we create Artificial Intelligence that processes new input into the desired output. This allows us (bearing in mind the quality of the training data, processor capacity…) to build better data processing tools than our mind is consciously able to build by hand. That’s crazy.

And this is impacting, and will keep impacting, the world in general and digital marketing in particular. In the next episodes we will be discussing the impact this has on SEO, SEA, Media Buying and Analytics. If you have any other ideas on this, be sure to let me know!