
Posted: Sept. 29, 2021

Updated: Feb. 22, 2024


Technical SEO is the foundation your website is built on. It ensures search engines can find, crawl, and understand your content. The guide below walks you through the essential checks for a complete technical SEO audit.


1. XML Sitemap

An XML sitemap is a file that helps search engines understand your website's structure and crawl it. It lists the pages on your site, their relative priority, when they were last modified, and how frequently they're updated. Usually, the pages are grouped by type: page, post, product, etc.
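
For illustration, here's a minimal sitemap with a single URL entry (the URL and values are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/post/my-article</loc>
    <lastmod>2024-02-22</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>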


You'll probably check the sitemap at the beginning of a technical audit. Find any site's sitemap by typing /sitemap.xml after the root URL, for example, https://datachai.com/sitemap.xml. If the site has multiple sitemaps, use /sitemap_index.xml


Register your sitemap with Google Search Console, which includes several tools to check technical SEO metrics such as mobile optimization and page speed. The Search Console sitemaps report shows which submitted URLs Google has discovered, giving you the insight needed to keep the sitemap in a 1:1 ratio with the URLs actually on the site.


Ideally, your site has an internal linking structure that connects all pages efficiently, in which case a sitemap is technically optional. For large sites, however, a sitemap is best practice.


If your site has a sitemap, best practice is to include only URLs that return 200 OK, with a 1:1 match between the URLs in the sitemap and the URLs on the site. Remove 4xx and 5xx URLs, orphaned pages, and parameterized URLs.


2. Server Response Code and Redirects

Bulk check server response codes with this Google Apps Script:

// Get the HTTP server response status code for a URL
function getStatusCode(url) {
  var options = {
    'muteHttpExceptions': true,
    'followRedirects': false
  };
  var statusCode;
  try {
    statusCode = UrlFetchApp.fetch(url, options).getResponseCode().toString();
  } catch (error) {
    // Fall back to parsing the status code out of the error message
    statusCode = error.toString().match(/returned code (\d\d\d)\./)[1];
  } finally {
    return statusCode;
  }
}

// Work around the IMPORTXML limit in Google Sheets by fetching with UrlFetchApp
function importRegex(url, regexInput) {
  var output = '';
  var fetchedUrl = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  if (fetchedUrl) {
    var html = fetchedUrl.getContentText();
    if (html.length && regexInput.length) {
      var match = html.match(new RegExp(regexInput, 'i'));
      if (match) output = match[1]; // first capture group
    }
  }
  Utilities.sleep(1000); // throttle requests
  return unescapeHTML(output);
}

// Helper assumed by importRegex: decode common HTML entities
function unescapeHTML(str) {
  return str.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>')
            .replace(/&quot;/g, '"').replace(/&#39;/g, "'");
}
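
Once saved via Extensions > Apps Script in Google Sheets, both functions can be called as custom formulas. For example, with URLs in column A:

=getStatusCode(A2)
=importRegex(A2, "<title>(.*?)</title>")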

Then, use RegEx redirects if you need to bulk redirect multiple source URLs to the same destination, or the .htaccess file for smaller-scale redirects. However, if your site is hosted on WordPress, be careful about using .htaccess because it will be deprecated in PHP 7.4 and subsequent versions. WP Engine suggests alternatives such as using RegEx directly in WordPress or managing redirects in Yoast SEO Premium. Finally, if you are completely removing a page, orphan it and then serve a 410 so that Google can remove it from the index more quickly.
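
As a sketch of what those rules might look like in an Apache .htaccess file (the paths and domain are placeholders):

# 301-redirect everything under /old-blog/ to a single destination
RedirectMatch 301 ^/old-blog/.*$ https://www.example.com/blog/

# Serve a 410 Gone for a permanently removed page
Redirect 410 /discontinued-page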


3. JavaScript SEO

JavaScript is increasingly used across the web. Sites built with JavaScript frameworks such as React, Angular, or Vue.js face SEO challenges that differ from sites built with a CMS such as WordPress: content rendered client-side may not be immediately visible to crawlers, since Google renders JavaScript in a second wave of indexing. If you have trouble getting Google to crawl or index your site, you can troubleshoot with the URL Inspection tool in Google Search Console, which shows the page as Google rendered it.



4. Page Speed

On the topic of page speed, other best practices include using a fast DNS provider and minimizing HTTP requests by keeping CSS stylesheets, scripts, and plugins to a minimum. You can also compress web pages by reducing image file sizes and cleaning up the code, especially for above-the-fold content. PageSpeed Insights is a free tool to check your page speed, and it also provides specific recommendations on how to improve it.


5. robots.txt

The robots.txt file, also called the robots exclusion protocol or standard, is a text file that tells search engines which pages to crawl or not crawl. You can see the robots.txt file for any website by adding /robots.txt to the end of the domain, for example, https://www.datachai.com/robots.txt.


Search engines check robots.txt before crawling a site, so a disallowed page won't be crawled. Note that robots.txt controls crawling, not indexing: a disallowed page can still appear in search results if other sites link to it.
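
A minimal robots.txt might look like this (the disallowed path is just an example):

User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.datachai.com/sitemap.xml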


6. Crawl Budget

Crawl budget is the number of pages Google crawls and indexes on a website within a given timeframe. If your pages exceed your site's crawl budget, Googlebot will not index the remainder, which can negatively affect your rankings.


Performing regular log file analysis can provide insights into how Googlebot (and other web crawlers and users) crawl your website, giving you the information needed to optimize the crawl budget. If your site is large and has crawl budget issues, you can adjust the crawl rate via Search Console.


7. SSL (Secure Sockets Layer)

SSL (Secure Sockets Layer) is a security technology that creates an encrypted link between a web server and a browser. It's easy to tell whether a website uses SSL: the URL will start with https, not http. In 2014, Google announced that they want to see HTTPS everywhere, and websites using SSL receive a slight ranking boost. Google Chrome now displays warnings whenever a user visits a site that does not use SSL.


These days, most top website builders such as Wix include SSL by default. If not, simply install an SSL certificate on your website.


8. Canonical Link Element

Even if you don't have multiple parameter-based URLs for each page, different versions of your pages (https vs. http, www vs. non-www, with or without .html, etc.) can quickly add up. That's where the rel=canonical tag comes in, letting you manage duplicate content by specifying the canonical, or preferred, version of your page. This flags the duplicates and tells Google to consolidate the ranking signals onto the canonical URL, so your page won't be disadvantaged.


If you are using a CMS like Wix or Squarespace, your web host might automatically add canonical tags with the clean URL. For example, my homepage already has one:

<link rel="canonical" href="https://www.datachai.com"/>

9. Schema

Schema, also called structured data markup, enhances search results through the addition of rich snippets. For example, you can add star ratings or prices for your products. Schema by itself is not a direct ranking factor, but it is recommended by Google and can indirectly help improve rankings and increase page views.


Add Schema via WordPress Plugin

If your website is hosted on WordPress, you can use the Schema plugin to add structured markup to your pages. This plugin uses JSON-LD, which is recommended by Google and also supported by Bing.


Add Schema Manually

If your site isn't hosted on WordPress or you prefer not to rely on a plugin, you can manually add schema with a few more steps. Schema is usually added to the page header, although it's possible to add it to the body or footer as well. Some recent WordPress themes include dedicated text blocks for adding schema to the body.


Note that adding schema through Google Tag Manager is not recommended. If you use Google Tag Manager, the structured data will be hidden within a container, making it difficult for Google's algorithms to read and give it appropriate weight.


To add schema manually, first use a tool like Merkle's schema markup generator to produce the baseline markup. Although this is already fine to use as is, you can paste the baseline markup into a text editor and continue to edit and customize the structured data for each page.


If you are using Sublime Text, go to View > Syntax > JavaScript > JSON to set your syntax appropriately.


Finally, insert any additional properties that the generator doesn't offer, as needed.
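
For illustration, a minimal Article markup produced this way might look like the following (the headline, dates, and author are placeholders):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Technical SEO Checklist",
  "datePublished": "2021-09-29",
  "dateModified": "2024-02-22",
  "author": {
    "@type": "Person",
    "name": "Author Name"
  }
}
</script>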


Add html Strings to Schema

You can add certain basic HTML strings to your schema markup, for example, if you'd like to include a bulleted list or hyperlink. The important thing to remember is to escape the double quotes when writing HTML: simply replace them with single quotes.


Google Search displays the following HTML tags; all other tags are ignored: <h1> through <h6>, <br>, <ol>, <ul>, <li>, <a>, <p>, <div>, <b>, <strong>, <i>, and <em>
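
Putting this together, an FAQ answer containing HTML might look like the following. Note the single quotes inside the HTML attributes, so the JSON string's double quotes aren't broken (the question and answer text are placeholders):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What does technical SEO cover?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "<p>Among other things:</p><ul><li><a href='https://www.datachai.com'>crawling</a></li><li>indexing</li></ul>"
    }
  }]
}
</script>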


Validate Schema

Use Google's Rich Results Test to make sure that your schema markup is being read properly, and the Schema Markup Validator (the successor to the Structured Data Testing Tool) to see all of the structured data on the page. The final step is to request indexing for the page you added markup to, via Google Search Console. Within a few days, you should see your markup under the Enhancements sidebar.


Note that in June 2021, Google limited FAQ rich results to a maximum of 2 per snippet, so your snippet real estate may be a bit smaller. If you have 3 or more FAQs marked up, Google will show the 2 that are most relevant to the search query.


10. Log File Analysis

The log file is your website's record of every request made to your server. Each entry includes important information such as the requested URL, HTTP status code, timestamp, client IP address, user agent making the request, request method (GET/POST), and referrer.


By performing log file analysis, you can gain insights into how Googlebot (and other web crawlers and users) crawl your website. Log file analysis will help you answer important technical questions such as:

  • How frequently is Googlebot crawling your site?

  • How is the crawl budget being allocated?

  • How often are new and updated pages being crawled?


You can identify where the crawl budget is being used inefficiently, such as unnecessarily crawling static or irrelevant pages, and make improvements accordingly.


Obtain the Log File

The log file is stored on your web server, and you can access it via your server control panel's file manager, the command line, or an FTP client (recommended).


The server log file is commonly found in the following locations.

  • Apache: /var/log/access_log

  • Nginx: logs/access.log

  • IIS: %SystemDrive%\inetpub\logs\LogFiles


Tools and Software

You can convert your .log file to a .csv and analyze it in Microsoft Excel or Google Sheets, or use an online log file analyzer such as SEMRush or Screaming Frog Log File Analyser. The best log file analyzer will depend on your website and what tools you might already be using for technical SEO.


Limitations

Performing regular log file analysis can be extremely useful for technical SEO, but there are some limitations. Page access and actions that occur via cache memory, proxy servers, and AJAX will not be reflected in the log file. If multiple users access your website with the same IP address, it will be counted as only one user. On the other hand, if one user uses dynamic IP assignment, the log file will show multiple accesses, overestimating the traffic count.


11. Core Web Vitals

In May 2020, Google introduced core web vitals as the newest way to measure user experience on a webpage. About one year later, in June 2021, Google began rolling out an algorithm update that uses core web vitals as ranking factors.


There are three core web vitals:

  • Largest Contentful Paint (LCP)

  • First Input Delay (FID)

  • Cumulative Layout Shift (CLS)


The idea is that core web vitals point to a set of user-facing metrics related to page speed, responsiveness, and visual stability, which should help SEOs and web developers improve overall user experience. Let's look at each of the core web vitals in more depth.


Largest Contentful Paint (LCP)

Largest contentful paint (LCP) is the time from when the page begins loading to when the largest text block or image element is rendered. The idea is to measure perceived page speed by estimating when the page's main content has finished loading.


Of course, lower (faster) scores are better. In general, LCP <2.5s is considered to be good, and >4s should be improved.


LCP is one of the more difficult core web vitals to troubleshoot because many factors can cause slow load speed. Some common causes are slow server response time, render-blocking JavaScript or CSS, or the largest content resource being too heavy.

Note that if the largest text block or image element changes while the page is loading, the most recent candidate is used to measure LCP. Also, if it's difficult to pass core web vitals in your industry (e.g., most corporate sites have graphics-heavy pages), keep in mind that your pages are compared against your close competitors.


First Input Delay (FID)

First input delay (FID) is the time from when a user first interacts with your site, to when the browser can respond. Only single interactions count for FID, such as clicking on a link or tapping a key. Continuous interactions that have different performance constraints are excluded, such as scrolling or zooming.


FID of <100ms is generally considered good, while >300ms should be improved. If your FID score is high, it could be because the browser's main thread is overloaded with JavaScript. You can reduce FID by addressing issues such as JavaScript execution time and the impact of third-party code.


Cumulative Layout Shift (CLS)

Cumulative layout shift (CLS) is about visual stability on a webpage. Instead of a time measurement, CLS is measured by a layout shift score, which is a cumulative score of all unexpected layout shifts within the viewport that occur during a page’s lifecycle. The layout shift score is the product of impact fraction and distance fraction. Impact fraction is the area of the viewport that the unstable element takes up, and distance fraction is the greatest distance that the unstable element moves between both frames, divided by the viewport’s largest dimension.
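
As a worked example using these definitions: if an unstable element occupying 50% of the viewport shifts down by 25% of the viewport height, the union of its start and end positions covers 75% of the viewport, so the impact fraction is 0.75; the distance fraction is 0.25; and the layout shift score is 0.75 × 0.25 = 0.1875.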


A score below 0.1 is considered good, and above 0.25 is generally considered poor. Common causes of poor CLS include images or ads with undefined dimensions, resources loaded asynchronously, or DOM elements dynamically added above existing content. The best practice is to always include size attributes for your images and videos.


How to Measure Core Web Vitals

Core web vitals are incorporated into many Google tools that you probably already use, such as Search Console, Lighthouse, and PageSpeed Insights. In addition, a new Chrome extension called Web Vitals is now available to measure the core web vitals in real time.
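
If you want to collect these metrics from real users in the field, one option is Google's open-source web-vitals JavaScript library. A minimal sketch, assuming the web-vitals npm package is installed (v3+, where the functions are named onCLS/onFID/onLCP; earlier versions used getCLS, etc.):

import { onCLS, onFID, onLCP } from 'web-vitals';

// Log each core web vital once it's ready to be reported
onCLS(console.log);
onFID(console.log);
onLCP(console.log);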


User experience and page speed also depend on each user's connection environment and settings. Every time a page is loaded, LCP, FID, and CLS will be slightly different. Your site has a pool of users that make up a distribution: some people see the pages fast, others see them slower.

For the purposes of core web vitals, Google measures what the 75th percentile of users sees. This and other concepts are discussed in a recent episode of Search Off the Record.


Troubleshooting

Google offers several avenues if you suspect a bug, such as when your core web vitals numbers are poor but your site has been tested and shown to be performing well. You can join the web-vitals-feedback Google group and email list to provide feedback, which Google will consider when modifying the metrics going forward.


If you're looking for individual support, you can file a bug with the core web vitals template on crbug.com. You'll need to do some technical work - for example, write a little JavaScript with a performance observer that demonstrates the issue.
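
As a rough sketch of what that might look like for CLS, you could log layout-shift entries from the browser console:

// Observe layout shifts, including any buffered before the observer started
new PerformanceObserver(function (list) {
  for (const entry of list.getEntries()) {
    // CLS ignores shifts that happen within 500ms of user input
    if (!entry.hadRecentInput) {
      console.log('layout shift value:', entry.value, entry.sources);
    }
  }
}).observe({ type: 'layout-shift', buffered: true });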


Note that the core web vitals explained above are part of Google's page experience signal. Of course, core web vitals are not the only user experience metrics to focus on. Other web vitals such as total blocking time (TBT), first contentful paint (FCP), speed index (SI), and time to interactive (TTI) are non-core web vitals. As Google continuously improves its understanding of user experience, it will update the web vitals regularly. As of November 2021, Google is already preparing two new vitals metrics -- smoothness and overall responsiveness.


12. Mobile UX

Finally, Google has been giving higher rankings to websites with a responsive or mobile site since April 2015. At the same time, they released the mobile-friendly testing tool to help SEOs ensure that they would not lose rankings after this algorithm update. Also look into AMP (Accelerated Mobile Pages), an open-source framework that aims to speed up the delivery of mobile pages via AMP's HTML code.

This post will explain how to schedule a Python script to run daily using Windows Task Scheduler or cron jobs, allowing you to automate tasks with Python on both Windows and Mac.


Windows Task Scheduler

  1. Open the Windows Task Scheduler GUI

  2. Actions > Create Task

  3. In the General tab, give your scheduled task a name. If you change the Security options from 'Run only when user is logged on' to 'Run whether user is logged on or not', the script will also run when the computer is sleeping. If the computer is powered off, the script will not run, and it will not catch up on missed executions when it is later powered on.

  4. In the Actions tab, click New...

  5. Action: Start a program

  6. Program/script: the location of the python executable on your computer, for ex. C:\Users\81701\AppData\Local\Microsoft\WindowsApps\python.exe. To find the location, press Win + R to open the Run dialog, type cmd to open the command prompt, and then type where python.

  7. Add arguments: the name of your python file, for ex. yourfile.py

  8. Start in (optional): the folder containing your python file, for ex. C:\Users\81701\python

Lastly, trigger your script execution by navigating to the Triggers tab and clicking New... For a daily run, set the trigger to Daily and choose a start time.
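
Alternatively, you can create an equivalent task from the command line with schtasks. A sketch using the example paths above (the task name is a placeholder):

schtasks /Create /TN "DailyPythonScript" /SC DAILY /ST 10:00 /TR "C:\Users\81701\AppData\Local\Microsoft\WindowsApps\python.exe C:\Users\81701\python\yourfile.py"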

Cron Jobs on Mac (crontab)

  1. Open a terminal window. 

  2. Type the following command to edit the crontab file: crontab -e. This will open the crontab file in the default text editor, usually vi or vim. 

  3. Press the i key to enter Insert mode. 

  4. If there are a bunch of ~ characters down the left side, leave them alone. In vi, they mark lines beyond the end of the file rather than actual content. 

  5. Type in the cron job entry, for example 0 10 * * MON /path/to/python /path/to/script.py to run every Monday at 10:00 (see the example entry after this list for a daily schedule). On Mac, the path to the python interpreter is usually /usr/bin/python3. To get the script's file path, right-click the python file in Finder, press the Option key, then choose 'Copy [filename] as Pathname.' Note that some cloud storage services, like OneDrive, might restrict executing files directly from their synced folders; in that case, move the file to a local directory before scheduling it. 

  6. Press the Esc key to exit Insert mode. 

  7. Type :wq to save and exit 

  8. : character puts vi in command mode 

  9. w command is for writing (saving) the file 

  10. q command is for quitting vi 
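
For reference, a daily entry with output captured to a log file might look like this (the cron fields are, in order: minute, hour, day of month, month, day of week):

0 10 * * * /usr/bin/python3 /path/to/script.py >> /tmp/script.log 2>&1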

 

Debugging Crontab Jobs

Q: How to check that the cronjob was created successfully? 

A: To check the crontab entries that you have created, open Terminal again and use the crontab -l command. 

 

Q: What if the computer is powered off at the scheduled time?  

A: If the computer is sleeping or powered off, the cron daemon will not be able to execute the job. Cron jobs will not catch up on missed executions if you later log in. If you need to run a task even when the computer is powered off, use a cloud-based data platform such as Databricks 

 

Q: What if the script encounters an error?  

A: Terminal will show a "You have mail" notification. 

  1. Run mail to see the details 

  2. Type t to read the entire message. 

  3. Type delete * to delete all terminal mails 

  4. Type q to quit mail 

 

Q: How to delete an existing cronjob?  

A:

  1. Open a terminal window. 

  2. Type the following command to edit the crontab file: crontab -e 

  3. Delete the line that contains the cronjob entry. In vi or vim, navigate to the line and press dd to delete it. 

  4. Type :wq to save and exit 


Q: Other ways to schedule a python script to run daily?

A: If the machine can't be guaranteed to be powered on at the scheduled time, a cloud-based platform such as Databricks (mentioned above) can run the job on a schedule instead.


Posted: July 16, 2021

Updated: Feb. 18, 2024


While you technically can hard-code analytics and conversion tracking tags on your site, it's generally better to use Google Tag Manager (GTM) or an alternative such as Tealium or Mixpanel. There is no direct SEO advantage to using GTM, but depending on the number of tags you have, GTM may provide an indirect boost by improving page speed.


Install Google Tag Manager

These days, most websites and apps include multiple marketing and analytics codes such as Google Analytics, Google Ads, Mixpanel, and Facebook Pixel, to name a few. GTM works well even with non-Google products. Now, you can clean up all of those separate third-party tracking codes, and just copy and paste one instead - the GTM container code.


Google previously recommended placing the container code immediately before the opening body tag, but the guidance has since changed. GTM has split the container into two parts: the first goes in the head, and the second in the body.


GTM provides you with the exact code to copy and paste to your website. You can find the installation code along the top navigation under Admin → Container Settings → Install Google Tag Manager. For sites hosted on WordPress, use the Insert Headers and Footers plugin or insert the code directly from wp-admin/theme-editor.php, or Appearance > Theme File Editor.


The first part is a script tag containing a JavaScript function that loads your container onto the page. It creates a new script tag and sets its source to your container's URL. This should be placed immediately inside the head to optimize tracking: the higher up in the page the snippet is, the sooner it loads.


The second part is a noscript tag, and it's actually optional. This is just a backup that allows you to track users without JavaScript, so it's not important in most cases. It tells the browser: if the user does not have JavaScript enabled, render an iframe version of the container on the page.
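
For reference, the standard two-part container snippet looks like the following, where GTM-XXXXXXX is a placeholder for your container ID:

<!-- Google Tag Manager (immediately inside <head>) -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-XXXXXXX');</script>
<!-- End Google Tag Manager -->

<!-- Google Tag Manager (noscript, immediately after the opening <body>) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-XXXXXXX"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<!-- End Google Tag Manager (noscript) -->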


DataLayer is used to feed data to GTM such as clicks, form submissions, purchases, user ID, login method, etc. The best practice is to declare dataLayer above the container snippet; otherwise, the container code can overwrite dataLayer.
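
For example, to initialize dataLayer before the container loads (the keys and values here are hypothetical; use whatever your tags expect):

<script>
  window.dataLayer = window.dataLayer || [];
  window.dataLayer.push({
    'userId': '12345',       // hypothetical value
    'loginMethod': 'email'   // hypothetical value
  });
</script>
<!-- GTM container snippet goes below this point -->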


Form Submission Tracking

Here are several methods for form submission tracking with GTM, from easiest to most complicated. Since the feasibility of these methods depends on how your site is coded, I created a flowchart on Canva that shows the relevant tracking methods for each situation. A detailed explanation follows.



First, if your users are redirected to a confirmation page after the form submission, you can simply track page views for that page. Otherwise, you'll need to dig deeper into the page's source code to find some variable that GTM can recognize. At this point, if you are tracking Google Ads conversions, remember to add All Elements in addition to All Pages as a second trigger for your Conversion Linker tag.


Otherwise, if the form displays a unique confirmation message upon submission (ex. "Thank you for contacting us!"), you can use the Element Visibility trigger type in GTM.


Finally, if the element has an ID, it's easier to use Selection Method: ID. Otherwise, use CSS selector, which is explained later in this post. Also in the trigger configuration settings, regardless of which selection method you choose, check Observe DOM changes and change Minimum Percent Visible to 1 percent.


Button Click Tracking

If your form has neither a confirmation page nor a confirmation message, you'll need to track button clicks in GTM. This time, we'll be using the Click - All Elements trigger type. Several variables can be used for this method, but Click ID or Click Classes is probably the most common. Generally, Click ID might be more reliable, but it depends on your dev environment.


To see which variables are available for you, use preview & debug mode in GTM or check dataLayer in the Console. Note that Click Element cannot be used for this method, but it is explained in the following section on CSS selectors.


CSS Selectors

If none of the methods above have worked, it's time to use CSS selectors. You'll need a CSS selector in some of the following situations.

  1. You are using the Element Visibility trigger and Selection Method: ID is not available.

  2. There are no unique variables available for tracking button clicks using Click - All Elements trigger type.

  3. The button is made up of multiple elements.

We've already seen situations 1 and 2, so I will just elaborate about 3, using the same example of the contact form on my website. Here is the source code for the "Send" button.

<div id="comp-k0kz041v" aria-disabled="false" class="_2UgQw">
	<button aria-disabled="false" data-testid="buttonElement" class="_1fbEI">
		<span class="_1Qjd7">SEND</span>
	</button>
</div>

The background and text elements are separate. However, in GTM we want to treat them the same, because we want to track form submissions regardless of whether the user clicked precisely on the "Send" text or anywhere else within the button.


This is possible with CSS selectors, which allow you to write complex conditions for selecting certain elements on a website. To implement this method, start by creating a Click - All Elements trigger, but this time choose Click Element matches CSS selector for the variable. For this example, either of the following CSS selectors should work.

 #comp-k0kz041v, #comp-k0kz041v *
 ._2UgQw, ._2UgQw *

Deciphering the above CSS selectors:

  • # indicates id; . indicates class

  • Comma indicates OR condition

  • * is wildcard

The first CSS selector in plain English means, element with ID comp-k0kz041v or any child (or child of child, etc.) of that element. The second one means, element with class _2UgQw, or any child of that element. Note that spaces will break your CSS selector, so you can simply replace any spaces within the ID or class with a period.


This is a basic setup example, but here is an expanded list from W3schools of CSS selectors for reference: CSS Selector Reference


Trigger Groups

By default, if you add multiple triggers to a single tag in GTM, they are treated as an OR function: the tag fires as soon as any one trigger's condition is met. To treat multiple triggers as an AND function, use Trigger Groups, which were introduced in GTM in March 2019. Trigger Groups are useful if you have multiple forms with the same confirmation message, or multiple buttons with the same class.


Debugging

There are several ways to check whether your GTM tags are set up and firing correctly. First, check your website's source code and make sure the GTM script is implemented by searching for gtm.js. If that wasn't the problem, you can check the other solutions below.


Preview & Debug Mode

As the name suggests, preview & debug (P&D) mode in GTM allows you to preview your site and check which tags are firing and what data is sent to third-party platforms. Note that P&D was updated in October 2020 to shift from third-party cookies to first-party storage, so guides or videos published before then will look different.


As a freelancer or agency-side marketer, you might not have access to the client's GTM account. Even as an in-house specialist, there might be a different department that controls GTM. However, there are several methods to verify GTM even if you can't access the account.


Tag Assistant Legacy

Tag Assistant Legacy is a Chrome extension by Google and an extremely useful tool for testing and debugging. You can use it to troubleshoot GTM, as well as Google Ads, Analytics, and DoubleClick implementations.


Test Conversions

You can also simulate a test conversion - the way to do this depends on your specific conversion event. For example, if the conversion event that you want to test is pageviews on GA, just open the page in another tab and check GA. If you aren't sure because there are other pageviews, you can add a test utm such as:

/?utm_source=test&utm_medium=test&utm_campaign=test

In some PPC platforms like Facebook, you can just visit the confirmation page to manually simulate a conversion, and confirm it in the conversion events page. For Google Ads conversions, add the following parameter to the end of your URL to simulate a click.

?gclid=test&wc_clear=true

Custom Code

Setting up tracking between Google services like Analytics and Ads is pretty straightforward, and the built-in variables should be enough to get the job done. But if you're using GTM to connect your website to third-party platforms like Pardot or HubSpot, you'll need to add custom HTML or JavaScript via new user-defined variables. Note that custom JavaScript must be wrapped in an anonymous function with a return value.
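
For example, a custom JavaScript variable that returns the text of the page's first h1 might look like this (a sketch; GTM calls the anonymous function and uses its return value):

function () {
  // Return the first <h1> text on the page, or undefined if there isn't one
  var heading = document.querySelector('h1');
  return heading ? heading.innerText : undefined;
}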
