LLMs don’t index web pages the way search engines do. Search engines rely on crawlers and indexes, while LLM-based systems use a more complex, multi-step process: data acquisition, content processing, indexing and retrieval, and synthesis.

Here is a breakdown of how LLMs determine what is on a URL and when to show it to someone who submits an inquiry.

Data Acquisition

  • LLM systems collect data in several ways: by crawling raw HTML with specialized bots, by drawing on indexes compiled by search engines such as Google, and by accessing content directly through APIs or plugins.
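
To make the raw HTML part concrete, here is a minimal sketch of how a bot might reduce a fetched page to plain text. It uses only Python's standard library. The class name TextExtractor, the function extract_text, and the sample page are hypothetical, and real crawlers do far more (rendering JavaScript, respecting robots.txt, removing navigation and ads).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(raw_html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical page, used only for illustration.
SAMPLE_HTML = """<html><head><title>Pricing</title><style>p { color: red; }</style></head>
<body><h1>Our Plans</h1><script>var tracked = true;</script>
<p>Plans start at $10 per month.</p></body></html>"""

print(extract_text(SAMPLE_HTML))
```

The takeaway for site owners is that whatever survives this kind of stripping is what the system has to work with, so important information should live in the page text itself.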

Content Processing

  • The collected data is processed in three steps. “Chunking” breaks large pages into smaller blocks. Tokenization converts the text into numerical tokens. Embedding (or vectorization) transforms those tokens into numerical vectors that capture their semantic meaning, placing similar ideas close together.
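
The three steps can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular LLM works: production systems use subword tokenizers and learned neural embedding models, while this sketch hashes whole words and counts them, so it captures word overlap rather than true semantic meaning. The function names and the default sizes are all hypothetical.

```python
import math
import re
import zlib


def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reached the end of the text
        start += size - overlap
    return chunks


def tokenize(text, vocab_size=1000):
    """Map each lowercase word to an integer ID (toy stand-in for a tokenizer)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [zlib.crc32(w.encode("utf-8")) % vocab_size for w in words]


def embed(token_ids, dim=64):
    """Turn token IDs into a unit-length count vector (toy stand-in for an embedding)."""
    vec = [0.0] * dim
    for token_id in token_ids:
        vec[token_id % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


page = "Plans start at ten dollars per month and include email support"
for chunk in chunk_words(page, size=6, overlap=2):
    vector = embed(tokenize(chunk))
    print(chunk, "->", len(vector), "dimensions")
```

The overlap between chunks is a common design choice: it keeps a sentence that straddles a boundary from losing its context in both chunks.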

Indexing & Retrieval

  • The numerical vectors, also called “embeddings”, are stored with their associated metadata in specialized databases designed for fast searching. When a user asks the LLM a question, the system converts the query into a vector and searches this database for the most semantically similar content “chunks”.
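
A minimal sketch of that lookup, assuming cosine similarity as the measure of "semantically similar" (a common choice, though systems vary). The three-dimensional vectors, the URLs, and the INDEX list are made up for illustration. Real embeddings have hundreds or thousands of dimensions, and real vector databases use approximate search rather than scoring every entry.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical index: each entry keeps the vector plus metadata about its source.
INDEX = [
    {"url": "https://example.com/pricing", "text": "Plans start at $10 per month.",
     "vector": [0.9, 0.1, 0.0]},
    {"url": "https://example.com/hours", "text": "We are open Monday to Friday.",
     "vector": [0.1, 0.9, 0.1]},
    {"url": "https://example.com/returns", "text": "Returns are accepted within 30 days.",
     "vector": [0.0, 0.2, 0.9]},
]


def search(query_vector, index, top_k=2):
    """Return the top_k (score, entry) pairs, most similar first."""
    scored = [(cosine(query_vector, entry["vector"]), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Pretend this vector came from embedding the question "How much does it cost?"
for score, entry in search([1.0, 0.0, 0.1], INDEX):
    print(round(score, 3), entry["url"])
```

Storing the URL next to each vector is what later lets the answer point back to the page it came from.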

Synthesis

  • Retrieved “chunks” are re-ranked for relevance, and the LLM uses them, along with its “internal knowledge”, to generate a synthesized answer that cites sources. “Synthesized” simply means the answer combines pieces from multiple sources.
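
Here is a rough sketch of the re-rank and cite step. It does not call an LLM. It only shows how retrieved chunks might be re-ordered and packed into a prompt with numbered sources. The keyword-overlap scoring is a hypothetical stand-in for the learned re-ranking models real systems use, and the function names, prompt wording, and sample chunks are assumptions.

```python
import re


def words(text):
    """Lowercase word set used for the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(question, chunks):
    """Order chunks by how many words they share with the question."""
    question_words = words(question)
    return sorted(chunks,
                  key=lambda c: len(question_words & words(c["text"])),
                  reverse=True)


def build_prompt(question, chunks, max_sources=2):
    """Assemble a prompt that asks for an answer citing numbered sources."""
    top = rerank(question, chunks)[:max_sources]
    lines = ["Answer the question using only the numbered sources and cite them like [1].", ""]
    for number, chunk in enumerate(top, start=1):
        lines.append(f"[{number}] {chunk['url']}: {chunk['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Hypothetical retrieved chunks.
CHUNKS = [
    {"url": "https://example.com/hours", "text": "We are open Monday to Friday."},
    {"url": "https://example.com/about", "text": "Our plan is to grow the team."},
    {"url": "https://example.com/pricing", "text": "The basic plan costs 10 dollars per month."},
]

print(build_prompt("How much does the basic plan cost?", CHUNKS))
```

Notice that each chunk arrives on its own, without the rest of the page, which is one reason self-contained, clearly worded passages tend to be easier to quote.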

Keys for Positioning Your Site to Be Referenced by LLMs

  • On-page content should be well written, clearly structured, and in plain language. Frequently Asked Questions (FAQs) and high-value internal linking are excellent for helping LLMs extract reliable information from your web pages.
  • Include Metadata – Title tags, headings, and schema data provide context to LLMs.
  • LLMs use the text surrounding a sentence or paragraph on your web page to understand its meaning.
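
As one example of schema data, FAQs can be marked up with the schema.org FAQPage type in JSON-LD. The sketch below generates that markup with Python's standard json module. The function names and the sample question are hypothetical, and whether a given LLM system actually reads this markup is an assumption on my part, not something any of them guarantee.

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


def faq_script_tag(pairs):
    """Wrap the JSON-LD in a script tag ready to paste into a page."""
    # Escape "</" so answer text can never close the script element early.
    body = faq_jsonld(pairs).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical FAQ entry.
FAQS = [("Do you offer refunds?", "Yes, within 30 days of purchase.")]

print(faq_script_tag(FAQS))
```

The questions and answers in the markup should match the text visible on the page, so both readers and machines see the same information.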

Create Excellent Content

Optimizing for LLMs has been a major topic of discussion lately, but the keys, in my opinion, are still creating excellent content and making sure your website performs at a high level. LLMs are developing rapidly, so my focus has been on helping clients optimize in the areas where traditional SEO and LLMs overlap. The process might change in the future, but right now I think this approach will bring the best optimization results.