
Updated: Jan 8, 2026


Technical SEO is the foundation your website is built on, ensuring that search engines can find, crawl, and understand your content. Use the below technical SEO checklist to audit essential areas including crawlability, indexing, page speed, and site security.


Don't forget that a comprehensive technical SEO audit begins with understanding the site's context. Tailor the scope of your audit to the website in question. For example, an hreflang audit should be included as part of a technical audit for an international site with different language versions, but not for a single-language website.


Also, note that not all of your findings will be equally problematic. A high number of 404s after intentionally removing outdated content is to be expected, but an unexplainable increase in 404s should be investigated further.




XML Sitemap

An XML sitemap is a file that helps search engines understand your website structure and crawl it. It lists the pages on your site, their priority, when they were last modified, and how frequently they are updated. Usually, the pages are categorized by topic, post, product, etc.


You'll probably check the sitemap at the beginning of a technical audit. Find the sitemap of any site by adding /sitemap.xml after the domain, for example,

https://datachai.com/sitemap.xml. If the site has multiple sitemaps, check /sitemap_index.xml instead.


Register your sitemap with Google Search Console, which also includes several tools to check technical SEO metrics such as mobile responsiveness and page speed. The Sitemaps report shows which submitted URLs Google has processed, giving you the insight you need to keep the sitemap in a 1:1 match with the URLs you add to the site.


A sitemap is actually optional: if your site's internal linking structure connects all pages efficiently, search engines can discover everything without one. For large sites, however, having a sitemap is best practice.


If your site has a sitemap, best practice is to include only URLs that return 200 OK, and to keep a 1:1 match between the URLs in the sitemap and the URLs on the site. 4xx and 5xx URLs, orphaned pages, and parameterized URLs should be removed.
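
For reference, a minimal sitemap.xml looks like the following. The blog URL and dates are placeholders, not real pages on datachai.com.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://datachai.com/</loc>
    <lastmod>2026-01-08</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://datachai.com/blog/technical-seo-checklist</loc>
    <lastmod>2026-01-06</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>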


Server Response Code and Redirects

Bulk check status codes with this Google Apps Script:

// Get the HTTP server response status code for a URL
function getStatusCode(url) {
  var options = {
    muteHttpExceptions: true,
    followRedirects: false // report 3xx codes instead of silently following redirects
  };
  var statusCode;
  try {
    statusCode = UrlFetchApp.fetch(url, options).getResponseCode().toString();
  } catch (error) {
    // Some failures throw instead; pull the status code out of the error message
    var match = error.toString().match(/returned code (\d\d\d)\./);
    statusCode = match ? match[1] : 'Error';
  }
  return statusCode;
}

// Scrape a page with a regular expression, to work around the IMPORTXML limit
function importRegex(url, regexInput) {
  var output = '';
  var response = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  if (response) {
    var html = response.getContentText();
    if (html.length && regexInput.length) {
      var match = html.match(new RegExp(regexInput, 'i'));
      if (match) {
        output = match[1]; // first capture group
      }
    }
  }
  Utilities.sleep(1000); // throttle requests to stay within fetch quotas
  return output;
}

Then, use RegEx redirects if you need to bulk redirect multiple source URLs to the same destination, or the .htaccess file for smaller-scale redirects. However, if your site is hosted on WordPress, be careful about using .htaccess because it will be deprecated in PHP 7.4 and subsequent versions. WP Engine suggests alternatives such as using RegEx redirects directly in WordPress or managing redirects in Yoast SEO Premium. Finally, if you are completely removing a page, orphan it and then serve a 410 so that Google can remove it more quickly.
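
For illustration, here is what both approaches can look like in an Apache .htaccess file. This is a minimal sketch; the paths are hypothetical and the syntax assumes mod_alias is enabled.

# One-off 301 redirect for a single moved page
Redirect 301 /old-page/ https://www.datachai.com/new-page/

# RegEx rule: redirect everything under /archive/ to a single destination
RedirectMatch 301 ^/archive/.* https://www.datachai.com/blog/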


Canonicals

Even if you don't have multiple parameter-based URLs for each page, different versions of your pages using https, http, www, .html, etc. can quickly add up. That's where the rel=canonical tag comes in, allowing you to manage duplicate content by specifying the canonical, or preferred, version of your page. It flags the duplicates and tells Google to consolidate their ranking signals on the canonical URL, so your page won't be disadvantaged.


If you are using a CMS like Wix or Squarespace, your web hosting service might automatically add canonical tags with the clean URL.

<link rel="canonical" href="https://www.datachai.com"/>

robots.txt

The robots.txt file, also called the robots exclusion protocol or standard, is a text file that tells search engines which pages to crawl or not crawl. You can see the robots.txt file for any website by adding /robots.txt to the end of the domain. For example, https://www.datachai.com/robots.txt


Search engines check robots.txt before crawling a site, so a disallowed page will not be crawled. Note that disallowing a URL does not guarantee it stays out of the index; if other pages link to it, Google may still index the URL without crawling it, so use a noindex meta tag (on a crawlable page) when you need to keep it out of search results.
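
For reference, a minimal robots.txt might look like the following. The disallowed path is a hypothetical example; only block directories you genuinely don't want crawled.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://datachai.com/sitemap.xml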


Crawl Budget

Crawl budget is the number of pages Google crawls and indexes on a website within a given timeframe. If your site has more pages than its crawl budget covers, Googlebot may not crawl and index the remainder, which can negatively affect your rankings.


Performing regular log file analysis can provide insights into how Googlebot (and other web crawlers and users) crawl your website, giving you the information you need to optimize the crawl budget. If your site is large and has crawl budget issues, you can adjust the crawl rate via Search Console.


JavaScript SEO

Developers tend to use JavaScript to create animations, interactive forms, and content elements that respond to user actions, making websites more dynamic, engaging and user-friendly. However, websites that have been developed with JavaScript frameworks such as React, Angular, or Vue.js face unique SEO challenges. These days, almost all websites use JavaScript in some form, making JavaScript SEO an essential component of technical SEO.

 

What is JavaScript SEO?

JavaScript SEO focuses on ensuring that websites built with JavaScript are easily crawled, understood, and indexed by search engines. JavaScript itself does not inherently hurt SEO. In fact, it's often used to make websites more user-friendly which is a good thing for SEO. The problem arises with client-side rendering, where browsers use JavaScript to dynamically load content, enabling rich user-interactivity but potentially slowing down initial load times and negatively affecting SEO.
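
A quick way to spot client-side rendering issues is to compare the raw HTML response with what you see in the rendered page. Below is a minimal sketch in Google Apps Script, reusing the UrlFetchApp pattern from the status code script above; the URL and phrase in the example comment are hypothetical.

// Returns true if the phrase already appears in the raw HTML,
// i.e. before any client-side JavaScript has run
function isInRawHtml(url, phrase) {
  var html = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();
  return html.toLowerCase().indexOf(phrase.toLowerCase()) !== -1;
}

// Example: isInRawHtml('https://datachai.com/', 'technical seo');
// If this returns false but the text is visible in the browser,
// that content depends on client-side rendering.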


How to Implement SEO-Friendly JavaScript

In 2018, Google announced dynamic rendering, a technique where you switch between client-side rendered content and pre-rendered content for certain user agents, allowing you to deliver the full client-side rendered experience to users while getting as much content as possible to crawlers like Googlebot. 



However, Google has since updated their documentation to clarify that dynamic rendering is a workaround and not a long-term solution for problems with JavaScript-generated content. Google recommends using server-side rendering, static rendering, or hydration instead.


Core Web Vitals

In May 2020, Google introduced core web vitals as the newest way to measure user experience on a webpage. About one year later, in June 2021, Google began rolling out an algorithm update that uses core web vitals as ranking factors.


There are three core web vitals:

  • Largest Contentful Paint (LCP)

  • First Input Delay (FID)

  • Cumulative Layout Shift (CLS)


The idea is that core web vitals point to a set of user-facing metrics related to page speed, responsiveness, and stability, which should help SEOs and web developers improve overall user experience. Let's look at each of the core web vitals in more depth.


Largest Contentful Paint (LCP)

Largest contentful paint (LCP) is the time from when the page begins loading, to when the largest text block or image element is rendered. The idea is to measure perceived pagespeed by estimating when the page’s main contents have finished loading.


Of course, lower (faster) scores are better. In general, LCP <2.5s is considered to be good, and >4s should be improved.


LCP is one of the more difficult core web vitals to troubleshoot because there are many factors that could cause slow load speed. Some common causes are slow server response time, render-blocking JavaScript or CSS, or the largest content resource being too heavy.

Note that if the largest text block or image element changes while the page is loading, the most recent candidate is used to measure LCP. Also, if it's difficult to pass core web vitals in your industry (e.g., most sites in your vertical have graphics-heavy pages), keep in mind that your pages are compared against your close competitors.
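
If the LCP element is a hero image, one option is to give the browser an early, high-priority hint for it. This is a hedged sketch; the file path is a placeholder and browser support for fetchpriority varies.

<!-- Preload the hero image so it is requested before render-blocking resources finish -->
<link rel="preload" as="image" href="/images/hero.webp" fetchpriority="high">

<!-- Mark the LCP image as high priority and give it explicit dimensions -->
<img src="/images/hero.webp" width="1200" height="630" fetchpriority="high" alt="Hero image">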


First Input Delay (FID)

First input delay (FID) is the time from when a user first interacts with your site, to when the browser can respond. Only single interactions count for FID, such as clicking on a link or tapping a key. Continuous interactions that have different performance constraints are excluded, such as scrolling or zooming.


FID of <100ms is generally considered good, while >300ms should be improved. If your FID score is high, it could be because the browser's main thread is overloaded with JavaScript. To reduce FID, try optimizing JavaScript execution time and reducing the impact of third-party code.
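
One common way to keep the main thread responsive is to break long JavaScript tasks into smaller chunks and yield between them. A minimal sketch, assuming a hypothetical processItem callback and items array:

// Process a large array in small chunks, yielding to the main thread
// between chunks so user input can be handled promptly
function processInChunks(items, processItem, chunkSize) {
  var index = 0;
  function runChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      processItem(items[index]);
    }
    if (index < items.length) {
      setTimeout(runChunk, 0); // yield before the next chunk
    }
  }
  runChunk();
}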


Cumulative Layout Shift (CLS)

Cumulative layout shift (CLS) is about visual stability on a webpage. Instead of a time measurement, CLS is measured by a layout shift score, which is a cumulative score of all unexpected layout shifts within the viewport that occur during a page’s lifecycle. The layout shift score is the product of impact fraction and distance fraction. Impact fraction is the area of the viewport that the unstable element takes up, and distance fraction is the greatest distance that the unstable element moves between both frames, divided by the viewport’s largest dimension.
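
As an illustrative example: if an unstable element occupying 50% of the viewport shifts down by 25% of the viewport height, its impact fraction is about 0.75 (the combined area it covers before and after the shift) and its distance fraction is 0.25, giving a layout shift score of 0.75 × 0.25 = 0.1875.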


A CLS below 0.1 is considered good, and above 0.25 is generally considered poor. Common causes of poor CLS include images or ads with undefined dimensions, resources loaded asynchronously, and DOM elements dynamically added above existing content. The best practice is to always include size attributes for your images and videos.
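
As a hedged illustration of reserving space, declare image dimensions and give asynchronously loaded slots a fixed height (the file path and class name are hypothetical):

<!-- Explicit width and height let the browser reserve the space before the image loads -->
<img src="/images/chart.png" width="800" height="450" alt="Chart">

<!-- Reserve a fixed slot for an ad or embed that is injected later -->
<style>
  .ad-slot { min-height: 250px; }
</style>
<div class="ad-slot"><!-- ad script injects content here --></div>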


How to Measure Core Web Vitals

Core web vitals are incorporated into many Google tools that you probably already use, such as Search Console, Lighthouse, and PageSpeed Insights. In addition, a new Chrome extension called Web Vitals is now available to measure the core web vitals in real time.


User experience and page speed often depend on the user’s connection environment and settings as well. Every time a page is loaded, LCP, FID, and CLS will be slightly different. Your site has a pool of users that make up a distribution - some people see the pages fast, others see it slower.

For the purpose of core web vitals, Google measures what the 75th percentile of users see. This and other concepts are discussed in a recent episode of Search Off the Record.
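
If you want to collect the same kind of field data from your own visitors, one option is Google's open-source web-vitals JavaScript library. This is a minimal sketch; the CDN URL and version are assumptions, so check the library's documentation for the current API before using it.

<script type="module">
  // Log LCP, FID, and CLS for the current visitor
  import {onLCP, onFID, onCLS} from 'https://unpkg.com/web-vitals@3?module';

  onLCP(function (metric) { console.log('LCP', metric.value); });
  onFID(function (metric) { console.log('FID', metric.value); });
  onCLS(function (metric) { console.log('CLS', metric.value); });
</script>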


On the topic of page speed, it's also best practice to adopt a fast DNS provider, minimize HTTP requests by reducing CSS, scripts, and plugins, and compress pages by optimizing images and minifying critical code, especially above the fold.


Troubleshooting

Google offers several solutions if you suspect a bug, such as when your core web vitals numbers are poor even though testing shows the site performing well. Join the web-vitals-feedback Google group and email list to provide feedback, which Google may consider when modifying the metrics going forward.


If you're looking for individual support, file a bug with the core web vitals template on crbug.com. This involves some technical work -- for example, you may need to write JavaScript that uses a PerformanceObserver to demonstrate the issue.


Note that the core web vitals explained above are included in Google's page experience signal. Of course, core web vitals are not the only user experience metrics to focus on. Other web vitals such as total blocking time (TBT), first contentful paint (FCP), speed index (SI), and time to interactive (TTI) are non-core web vitals. As Google continuously improves its understanding of user experience, it will update the web vitals regularly. As of November 2021, Google was already preparing two new vitals metrics -- smoothness and overall responsiveness.


On the topic of responsiveness, Google has been giving higher rankings to mobile-friendly websites since April 2015. At the same time, they released the mobile-friendly testing tool to help SEOs ensure that they would not lose rankings after this algorithm update. Also look into AMP (Accelerated Mobile Pages), an open-source framework that aims to speed up the delivery of mobile pages via AMP HTML.


Website Security

Securing your website is first and foremost about protecting sensitive data and preventing cyberattacks, but did you know that it’s also an important factor in SEO strategy? Search engines like Google prioritize user experience, and site security is one of the key elements of a positive user experience. 


SSL (Secure Sockets Layer)

SSL (secure sockets layer) is a security technology that creates an encrypted link between a web server and a browser. You can tell whether a website is using SSL because the URL will start with https (hypertext transfer protocol secure) rather than http. In 2014, Google announced that they were looking for "https everywhere" and that websites using SSL would get a ranking boost. Google Chrome now displays warnings anytime a user visits a site that does not use SSL. These days, most website builders such as Wix include SSL by default. If not, you should manually install an SSL certificate on your website.


In November 2025, the Google security team announced that Chrome will make https the default by October 2026, meaning users will have to give permission before any non-secure site can load.


Web Application Firewall (WAF) 

A Web Application Firewall (WAF) is a security solution that acts as a barrier between your website and malicious traffic. WAFs work by analyzing web traffic, identifying potential threats, and blocking potentially harmful requests that could exploit vulnerabilities in your website’s code or server configuration, before they reach the web server. Most modern WAF solutions like Cloudflare are available as cloud-based services, and can be easily integrated with your website. There are several ways that WAFs can enhance your website security, from an SEO perspective.


Protect Against Cyber Attacks and Threats

WAFs are designed to protect your website from a variety of cyber threats such as SQL injection, DDoS (distributed denial of service), cross-site scripting (XSS), and other types of Layer 7 attacks. Websites that are regularly attacked are at risk of slow load times and downtime, which lead to high bounce rates and short sessions and tend to hurt SEO. WAFs protect your website and ensure that it remains fast and responsive, thereby improving its user experience and SEO ranking.


Although web crawlers from search engines are essential for indexing, not all bots are friendly. Malicious bots can scrape your content, flood your site with fake traffic, or try to access sensitive data. This wastes server resources and, more importantly, could lead to security vulnerabilities. With a WAF, you can set up specific rules to block or challenge suspicious bots, ensuring that only legitimate users and crawlers can access your site, thus protecting both your website’s integrity and SEO rankings.


Enhance Page Speed and Mobile Optimization

A lesser-known benefit of using a WAF is that it can also boost site performance. Some WAFs include features like rate limiting, bot filtering, and traffic caching, all of which can reduce the load on your server and speed up your website. Since page speed is a ranking factor for Google, having a WAF that optimizes traffic and blocks unnecessary requests can lead to better performance and improved user experience.


Trust Signals

If Google detects malware or other threats on your website, it may flag it as unsafe, leading to a drop in SEO. WAFs can prevent this from happening by detecting and blocking malicious traffic before it reaches your site.


In certain industries, data protection regulations such as GDPR require websites to maintain a high level of security. By using a WAF, you demonstrate your commitment to maintaining a safe online environment, to both users and search engines. Websites with strong security measures are more likely to be trusted by users, leading to higher engagement rates, longer sessions, and ultimately, better SEO performance.


Schema

Schema, also called structured data markup, enhances search results through the addition of rich snippets. This allows you to display details like star ratings, product prices, or event dates directly in the SERP. Adding schema by itself is not a direct ranking factor, but it is recommended by Google and can indirectly help improve rankings and increase page views.


FAQ schema example below:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What kinds of companies have you worked with in the past?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "I have worked with companies across B2B technology, marketing and advertising, healthcare, lifestyle, and other industries, including both international and local businesses."
    }
  },{
    "@type": "Question",
    "name": "What size websites have you worked on for SEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "I’ve worked on global multilingual websites with over 3 million monthly visitors, as well as smaller businesses and startups with a few hundred monthly visitors."
    }
  }]
}
</script>

Add Schema via WordPress Plugin

If your website is hosted on WordPress, use the Schema plugin to add structured markup to your pages. This plugin uses JSON-LD, which is recommended by Google and also supported by Bing.


Add Schema Manually

If your site isn't hosted on WordPress or you prefer not to rely on a plugin, you can manually add schema with a few more steps. Schema is usually added to the page header, although it's possible to add it to the body or footer as well. Some recent WordPress themes include dedicated text blocks for adding schema to the body.


Note that adding schema through Google Tag Manager is not recommended. If you use Google Tag Manager, the structured data will be hidden within a container, making it difficult for Google's algorithms to read and give it appropriate weight.


To add schema manually, first use a tool like Merkle's Schema Markup Generator to create the baseline markup. Although this is often fine to use as is, you can also paste the baseline markup into a text editor and continue to edit and customize the structured data for each page.


If you are using Sublime Text, go to View > Syntax > JavaScript > JSON to set your syntax appropriately.


Finally, insert any additional properties that were not available in the generator as needed.


Add html Strings to Schema

Basic HTML strings can be added to schema, for example, if you'd like to include a bulleted list or hyperlink. The important thing to remember is that double quotes inside the JSON value must be escaped; the simplest approach is to use single quotes for HTML attributes instead.


Google Search displays the following HTML tags; all other tags are ignored: <h1> through <h6>, <br>, <ol>, <ul>, <li>, <a>, <p>, <div>, <b>, <strong>, <i>, and <em>
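
For example, an Answer with embedded HTML might look like the snippet below. The URL is a hypothetical placeholder; note the single-quoted href inside the double-quoted JSON string.

{
  "@type": "Answer",
  "text": "<p>Services include:</p><ul><li><a href='https://www.datachai.com/services'>SEO audits</a></li><li>Content strategy</li></ul>"
}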


Validate Schema

Use Google's Rich Results Test to make sure that your schema markup is being read properly, and the Schema Markup Validator (the successor to Google's Structured Data Testing Tool) to see all of the structured data on the page. The final step is to request indexing for the page you added markup to, via Google Search Console. Within a few days, you should see your markup under the Enhancements section in the sidebar.


Note that in June 2021, Google limited FAQ rich results to a maximum of 2 per snippet, so your snippet real estate may be a bit smaller. If you have 3 or more FAQs marked up, Google will show the 2 that are most relevant to the search query.


Log File Analysis

The log file is your website's record of every request made to your server. It includes important information such as the URL of the requested page, the HTTP status code, the client IP address, the timestamp, the user agent making the request, the request method (GET/POST), and the referrer.
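
For reference, a single request in an Apache-style combined log looks something like this (the IP address, path, and timestamp are made up):

66.249.66.1 - - [08/Jan/2026:10:15:32 +0000] "GET /blog/technical-seo HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"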


Log file analysis provides insights into how Googlebot (and other web crawlers and users) are crawling your website. The log file analysis will help you answer important technical questions such as:

  • How frequently is Googlebot crawling your site?

  • How is the crawl budget being allocated?

  • How often are new and updated pages being crawled?


You can identify where the crawl budget is being used inefficiently, such as unnecessarily crawling static or irrelevant pages, and make improvements accordingly.


Obtain the Log File

The log file is stored on your web server and can be accessed via your server control panel's file manager, the command line, or an FTP client (recommended).


The server log file is commonly found in the following locations.

  • Apache: /var/log/access_log

  • Nginx: logs/access.log

  • IIS: %SystemDrive%\inetpub\logs\LogFiles


Tools and Software

Convert your .log file to a .csv and analyze it in Microsoft Excel or Google Sheets, or use an online log file analyzer such as SEMRush or Screaming Frog Log File Analyser. The best log file analyzer will depend on your website and what tools you might already be using for technical SEO.
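
If you'd rather script the analysis than work in a spreadsheet, the sketch below counts Googlebot requests per URL from Apache-style combined log lines like the example above. It's a minimal, hypothetical helper; note that user agents can be spoofed, so rigorous analysis should also verify crawler IPs via reverse DNS.

// Count Googlebot requests per URL from an array of combined-format log lines
function countGooglebotHits(logLines) {
  var pattern = /"(?:GET|POST) (\S+) HTTP[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"/;
  var counts = {};
  logLines.forEach(function (line) {
    var match = line.match(pattern);
    if (match && match[2].indexOf('Googlebot') !== -1) {
      counts[match[1]] = (counts[match[1]] || 0) + 1;
    }
  });
  return counts;
}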


Limitations

Performing regular log file analysis can be extremely useful for technical SEO, but there are some limitations. Page access and actions that occur via cache memory, proxy servers, and AJAX will not be reflected in the log file. If multiple users access your website with the same IP address, it will be counted as only one user. On the other hand, if one user uses dynamic IP assignment, the log file will show multiple accesses, overestimating the traffic count.

Updated: Jan 6, 2026


Spamdexing, or search engine spam, is the practice of using black-hat SEO tactics such as keyword stuffing or hidden text and links to manipulate search engine results. Obviously, spamdexing violates Google's webmaster guidelines, and if you are caught, your site may face a Google penalty and deindexing. Keyword stuffing may have worked 20 years ago, but now in 2026, search engines like Google are continually improving their algorithms to identify spamdexing.


How to Prevent Spamdexing on Your Website

To prevent spamdexing, focus on implementing technical SEO best practices, maintaining robust website security, and monitoring your backlinks. As I have previously published blog posts on how to do a technical SEO audit and enhance website security, this post will focus on backlinks.


What is a Toxic Backlink?

Backlinks are generally a positive factor in SEO. If a user finds your website's content informative, they might link to it from their own site, and that backlink is a vote of confidence in your content. Toxic backlinks, however, display signs of black-hat tactics: low domain authority, mirrored pages, duplicate content, spammy websites, and a low visible-text-to-HTML ratio.


SEOs should monitor for toxic backlinks by setting up Google Alerts and using the links report in Google Search Console or a backlink analysis tool. SEMrush and Moz subscriptions include a free backlink checker to help you work efficiently and spot toxic backlinks quickly.


Even if you are sticking to white hat SEO, you may receive toxic backlinks and get penalized for them, regardless of whether or not you were knowingly complicit.


Google's Algorithm for Spam Detection

Unlike some other SEO factors, Google is very clear about their position and approach to toxic backlinks. Having too many toxic backlinks pointing to your website can weaken your website's SEO or even get you penalized from search engines.


Seasoned SEOs may remember Google's Penguin, an algorithm released in 2012 to catch toxic backlinks. Penguin as a standalone algorithm has since been sunset and effectively replaced by SpamBrain, Google's current spam-detection system, designed to find and remove low-quality websites from the SERPs. An algorithmic penalty from SpamBrain can result in a significant drop in rankings and organic traffic, or even complete de-indexing of your site.


The webspam team at Google also reviews link profiles manually. Manual review could be triggered for several reasons, including but not limited to:

  • Spam report from a competitor

  • Algorithmic penalty triggered a manual review

  • Your business is in a competitive niche that Google actively monitors


Google usually does not notify you directly for algorithmic penalties. For manual penalties, you'd receive a notification in Google Search Console.

How to Disavow Toxic Backlinks

The disavow links tool is a feature in Google Search Console that lets website owners tell Google to ignore toxic backlinks pointing to their site, preventing them from negatively impacting search rankings. You can disavow a specific link, subdomain, or entire domain.

# Two pages to disavow
http://spam.example.com/stuff/comments.html
http://spam.example.com/stuff/paid-links.html

# One domain to disavow
domain:shadyseo.com

Despite some rumors that Google will follow Bing in completely removing the disavow tool, it is still live as of January 2026.



John Mueller has confirmed that in most cases, it's not necessary to use the disavow tool. As described in the previous section, Google is already pretty good at ignoring link spam, and many SEOs have reported no significant change from using the disavow tool preventatively. Cyrus Shepard even ran an experiment where he disavowed every link to his site, and nothing happened.


However, if Google notifies you about a manual link-based penalty, you can try to recover by disavowing the toxic backlinks. You can also file a spam report with Google, providing details of the attack.


Note that when you disavow links, it means that you are requesting Google to disregard the toxic links to your domain. Similar to rel=canonical, it's a suggestion rather than a directive. Although disavows are accepted in most cases, Google is not obligated to honor them.

My not-so-surprising SEO prediction for 2026 is that organic CTR will continue to decline. Google will continue to refine its AI overviews to appear for a broader range of search queries. Parallel to this, Gen AI models like ChatGPT will further evolve through advanced training. 


Users will have access to increasingly accurate and relevant answers via both AI Overviews and Gen AI. Even without technical advancements, the growing integration of AI tools into the user journey will, by itself, foster greater user trust and reliance over time, further contributing to the shift away from traditional SERPs. The combined effect is an increase in zero-click searches, and decrease in organic CTR. 


The obvious way to adapt for this change is to optimize for AIO and GEO, not just SEO. Following my previous post on LLM search optimization, this post discusses how AI Overviews will change the future of SEO. 


The Zero-Click Search

Although Google has previously claimed otherwise, the undeniable trend is that organic CTRs have been hitting all-time lows ever since the introduction of AI Overviews. This is true outside of the US as well -- a survey of 320 Japanese digital marketing professionals confirmed a CTR drop attributed to AI Overviews, and revealed that 90% of Japanese SEO professionals are now rethinking their marketing strategies.


Screenshot of Google search for the query "text mining"

The above screenshot is a Google search for the query "text mining." The AI overview provides a basic high-level definition, followed by more detailed sections such as how it works, key techniques, and why it's important. Users seeking a basic definition of text mining would have gotten what they were looking for without ever leaving the SERPs, so we can see why the zero-click search is becoming more prevalent. While the AI overview cites authoritative sources such as IBM and Wikipedia, the CTR of those pages is likely diminished by the comprehensive AI Overview. According to Search Engine Land, data from 20,000+ queries showed that AI Overview citations consistently have low CTR, matching the performance of organic results ranked at position 6 or lower.


The range of search queries triggering AI Overviews also continues to increase. According to a November 2025 study by ahrefs, AI Overviews now appear for 21% of Google search queries. However, the prevalence of AI Overviews varies greatly from topic to topic. AI Overviews could be triggered for as many as 60% or as few as 1% of search queries, depending on the topic cluster.


In the early days, AI Overviews mostly appeared for generic search queries like "text mining", but they are now appearing on some transactional queries as well, where lead generation and conversions are concentrated. However, at the end of 2025, Google pulled back slightly on AI Overview coverage. If you are monitoring AI Overview coverage of target keywords for your website, you may have experienced this fluctuation first-hand.


How To “Rank” In AI Overviews


bar chart of percentage of pages cited in AI Overviews by organic ranking position

ahrefs did a study on what factors, especially rankings, affect whether a page gets cited in AI Overviews. Their findings revealed a notable correlation between strong SEO and AI Overview visibility.


Pages ranking in the top 10 showed a moderate positive correlation (0.347) with appearing in AI Overview citations. An even stronger correlation (0.445) was found between a page's organic rank and its position within the AI Overview, indicating that higher-ranking pages are not only more likely to be cited, but also tend to be cited nearer the top.


bar chart of factors that correlate with brand appearance in AI Overviews

This aligns closely with my real-life SEO experience: clients with strong SEO are consistently achieving greater AI Overview visibility compared to their competitors.


However, the relationship between organic rankings and AI Overviews is not a direct 1:1 correlation. AI Overviews factor in something beyond traditional SEO, most likely brand and web mentions across a wide range of authoritative sites, which are also strong signals for credibility and context.


I arrive at the same conclusion as in my previous post on LLM visibility. SEO still matters, now more than ever. Both traditional SEO and AI Overviews operate on the same Google platform, and there is a large overlap in the authority signals used. Continue to follow the SEO best practices and E-E-A-T, to future-proof your brand visibility across all of Google's services, from the classic blue hyperlinks to AI Overview's dynamic responses.

