What is a Generative Search Engine? Can we Game it?

Search engines have existed for 30 years and have undeniably had an incredible impact on information access and, more broadly, on human relationships through the highlighting of information. The concept behind engines such as Google is straightforward: they match websites to information needs.

In a sense, these technological tools are still in their infancy. They operate by associating objects with keywords, without truly understanding or mastering the information they process.

The recent emergence of large language models, also known as foundational models, has transformed how we access information by equipping “the machine” with the ability to engage in fluid and even seamless dialogue with humans.

Unlike traditional search engines, LLMs (such as ChatGPT and Claude) synthesize the structure of natural language and the information contained within their training datasets. This capability enables them to present information in a conversational manner while understanding the importance of context in determining what information to prioritize.

As illustrated in the figure below, this approach allows users to intuitively navigate information, refining their informational needs through interactive exchanges.

But you’ll notice that I added a small asterisk in the diagram: updating the database is highly challenging and costly. As a result, the datasets are not always up to date. It can be estimated that LLMs update their data monthly, or even less frequently. Moreover, a growing number of website owners are banning AI bots, meaning that some LLMs lose access to part of the general knowledge (content available only on those sites).

The fact is, to provide responses that meet human quality standards, an LLM must continuously access the web to retrieve the best content. However, LLMs cannot process the entire web every time they need to answer a query. New solutions must be developed to address this limitation!

What is a Generative Search Engine?


I can finally answer 😉 New solutions must be developed, and we already have the solution in the ecosystem: search engines continue to play a vital role in prioritizing and organizing web data, but their end-user can also be the LLM!

To answer a query, Generative Search Engines (GSE)  provide a LLM written synthesis of  prioritized information, sourced from search engines that deliver relevant URLs for the given query. The scheme below is giving a clear view of how things are done.

Of course the search engine index must be up-to-date at any moment, but this is something that search engines have been doing fine since decades, so it’s not an issue.
SearchGPT, AI overviews (Google), BingChat (now copilot), Perplexity, all these platforms are examples of generative search engines.

Generative search engines perform a form of curation by synthetizing information into authoritative answers. This curation amplifies their responsibility: they don’t just reflect the web; they actively shape understanding and decision-making. Much like Asimov’s robots were bound by laws to protect humans, Marc Najork from Google has outlined five key principles these systems must adhere to:

  1. Reliability: Generative systems must prioritize authoritative sources and avoid hallucinations.
  2. Transparency: Every answer should provide clear citations, enabling users to find primary sources.
  3. Unbiasedness: These systems must avoid amplifying biases.
  4. Diversity: On controversial issues, they must present a balanced range of perspectives.
  5. Accessibility: Their answers must be comprehensible to users, adapting to their language and level of understanding.

These principles may be necessary to align the innovative aspect of GSE with ethical obligations.

Can we Manipulate Generative Search Engines?

This is the main question of interest for the whole SEO sector. And the answer is, of course, yes. Generative search engines (GSEs) can indeed be manipulated, even if, for now, such efforts remain in their early days.

There is however a growing body of research on the manipulation of LLMs, as well as Generative Search Engines (GSEs). The article by Nestass, Debenetti, and Tramèr introduces the concept of Preference Manipulation Attacks, a type of attack targeting LLMs. The goal is to manipulate the model into prioritizing content differently by combining several techniques: prompt injection, black-hat SEO strategies, and LLM persuasion.

What’s particularly fascinating about their approach is that it introduces true negative SEO—actively pushing competitors down in visibility—and the results are both measurable and remarkably clear. They tested their attack on perplexity, Bing copilot, claude 3 and GPT4. For instance they achieved a x2.5 in visibility for a targeted product by Bing copilot.
Don’t bother replicating it, the authors disclosed the threat to the operators and it’s now fixed.

Another paper entitled “GEO: Generative Engine Optimization” by Aggarwal et al. introduces the acronym GEO and presents  a new approach to help content creators enhance their visibility in Generative Search Engines (GSEs). The paper mentions GSEs, like BingChat – copilot or Perplexity, but we can easily figure that the approach is also good to go for AI overviews (the former SGE) or for searchGPT. The goal of their approach is to boost content citations and impressions.
The SEO methodology they propose is clearly not rocket science, but is effective:

  • Cite Sources: Adds reliable citations to improve credibility and boost ranking.
  • Quotation Addition: Incorporates relevant quotes to make content authoritative.
  • Statistics Addition: Introduces data-driven evidence to enhance visibility.
  • Fluency Optimization: Improves content readability and coherence for better engagement.

The table below, coming straight from the paper, is showing examples of optimizations that have been done by the researchers.

These methods collectively showed up to 40% improvement in visibility in their study, marking a shift from SEO to GEO.

If I had to summarize what you need to know about Generative Engine Optimization (GEO), it’s fairly straightforward because the field isn’t deeply explored yet. Essentially, GEO techniques revolve around exploiting the core nature of LLMs: their semantic understanding and how they summarize information. Another approach is more technical, involving methods like prompt injection, cloaking, and similar tactics.

Beyond that, the landscape is very similar to SEO. Generative Search Engines (GSEs) are, by design, black boxes. Just as SEO works to influence traditional search engines such as Google with only theoretical knowledge of how they work, GEO manipulates the inputs to the “engine” to influence its outputs.

Like SEO, the goal of this adversarial game is simple: capture more visibility than your competitors, which ultimately translates to revenue. In short, the tools and targets may have evolved, but the game remains the same.