Search engine optimization (SEO) is a digital technique used to improve a website or page’s visibility so that it organically ranks higher in search engine results. It often combines technical configuration, content creation, and link acquisition, with the goal of improving relevance for a searcher’s query and intent. Toronto SEO has continued to grow in popularity and become one of the most popular digital marketing channels.
With custom metrics that expose some new, never-before-seen insights, we have analyzed more than eight million homepages across the web, comparing our findings to those from 2021 and, in some instances, from 2020. Note: Our data, particularly from Lighthouse and the HTTP Archive, is limited to just websites’ homepages, not sitewide crawls.
Read on for more about how search engine-friendly the web is.
Crawling and Indexing are the backbone of what Google and other search engines ultimately display on their search results pages. Without them, ranking simply cannot happen.
The first step in the process is discovering web pages via crawling. While numerous pages are crawled, fewer of them are actually indexed, which is essentially stored and categorized in a search engine’s database. Based on a searcher’s query, matching indexed pages are then served.
This section deals with the state of the web, as it pertains to bots crawling and indexing websites. What directives are sites giving search engines bots? What are sites doing to ensure Google serves the right page and not a near duplicate in search results?
Let’s explore the web and some of the factors that impact crawlability and indexability.
Robots.txt (Status Codes)
Robots.txt search engine breakdown
The robots.txt file instructs bots, including search engine crawlers, where they can and cannot go, meaning what they can or cannot crawl.
There has been a nominal increase in the percentage of sites whose robots.txt files return a 200 status code in 2022 compared to 2021. A year ago, 81% and 81.9% of robots.txt files on desktop and mobile sites, respectively, returned a 200 status code. Now in 2022, 81.5% of robots.txt files for desktop sites return a 200 status code while 82.4% of mobile sites return the same.
Concurrently, there was just a small reduction in the percentage of robots.txt files returning a 404 status code in 2022 compared to 2021. Last year, 17.3% of robots.txt files for desktop sites returned a 404 while 16.5% of mobile sites’ robots.txt files returned that status code. In 2022, it’s just 16.5% for desktop and 15.8% for mobile sites’ robots.txt files that are returning a 404 status code.
Much like 2021, the remaining status codes in 2022 have a minimal number of robots.txt files.
Note: The above data does not indicate how well optimized a robots.txt file is. Even a file returning a 200 status code can contain directives that are perhaps not in the best interest of a site’s overall health.
As expected, the overwhelming majority of robots.txt files were quite small, weighing between 0-100 KB.
Google’s maximum size for a robots.txt file is 500 KiB. Any directives found after the file reaches that limit are ignored by the search engine. A very small number of robots.txt files fall into that category. Specifically, just 0.005% of both desktop and mobile sites contain a robots.txt file that is above Google’s limit (which is consistent with 2021’s data). In cases where the file size exceeds the limit, Google recommends consolidating directives.
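As a rough illustration of the size limit described above, the following Python sketch splits a robots.txt body at 500 KiB. The function name is hypothetical, and cutting at an exact byte offset is a simplifying assumption about how the limit is applied.

```python
# 500 KiB, the limit cited in the text above.
GOOGLE_ROBOTS_TXT_LIMIT_BYTES = 500 * 1024


def split_at_google_limit(robots_txt: bytes) -> tuple[bytes, bytes]:
    """Return (bytes within the limit, bytes past the limit).

    Anything in the second element would fall after the limit and,
    per the text above, would be ignored.
    """
    return (
        robots_txt[:GOOGLE_ROBOTS_TXT_LIMIT_BYTES],
        robots_txt[GOOGLE_ROBOTS_TXT_LIMIT_BYTES:],
    )
```

A non-empty second element suggests the file is a candidate for consolidating directives.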
Most websites today (74.9% of desktop and 76.1% of mobile) do not indicate a specific user-agent within the robots.txt file, meaning the directives in the file apply to all user-agents. This is consistent with the data from 2020 when 74% of desktop robots.txt files and 75.2% of mobile robots.txt files did not specify a particular user-agent.
Interestingly, Bingbot did not make the top 10 most specified user-agents. As for SEO tools, much like in 2021, both Majestic’s and Ahrefs’ bots are in the top 5 most specified user-agents this year, while Semrush’s bot rounds out the top 15 most specified user-agents.
In terms of search engines, Googlebot leads the pack with 3.3% of robots.txt files specifying the user-agent while Bingbot comes in at 2.5%. Interestingly, there was nearly a full percentage point difference in 2021 between mobile site robots.txt files and desktop files specifying Bingbot. Such is not the case in 2022 where the data is essentially uniform.
Of note, Yandexbot was specified in just 0.5% of robots.txt files in 2021. Now in 2022, there has been a six-fold increase, with 3% of files specifying Yandexbot.
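To make the difference between a catch-all group and a user-agent-specific group concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the example.com URLs are invented for illustration, and Python's parser may not match every detail of how a given search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for all crawlers, one for Googlebot only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler with its own group follows that group only.
googlebot_drafts = parser.can_fetch("Googlebot", "https://example.com/drafts/post")
googlebot_admin = parser.can_fetch("Googlebot", "https://example.com/admin/")

# A crawler without its own group falls back to the "*" group.
other_admin = parser.can_fetch("SomeOtherBot", "https://example.com/admin/")
```

In this sketch the Googlebot group replaces the catch-all group for Googlebot rather than adding to it, so /admin/ is blocked only for crawlers that fall back to the "*" rules.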
Index If Embedded
In January 2022, Google introduced a new robots tag called indexifembedded. The tag offers control over indexation when content is embedded in an iframe on a page, even when a noindex tag has been applied.
Let’s start by determining the percentage of pages for which the new tag is possibly applicable.
Data shows 4.1% of pages contain an <iframe> element. Of those pages, 76% of them had the iframe noindexed, making them a potential use case for the new indexifembedded tag.
However, a minuscule percentage of sites have adopted the indexifembedded robots tag. In fact, the tag can be found on just 0.015% of pages surveyed.
Of the pages that have adopted the indexifembedded tag, 98.3% of them implemented it in the header while 66.3% are using the HTML.
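The two placements mentioned above (an HTTP header and the HTML) can be checked with a short sketch like the one below. It assumes the directive appears either in a robots or googlebot meta tag or in an X-Robots-Tag header value; the function and class names are hypothetical, and the header parsing handles only a leading "googlebot:" prefix.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in {"robots", "googlebot"}:
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def header_directives(x_robots_tag: str) -> set[str]:
    """Split an X-Robots-Tag value into directives (simplified)."""
    value = x_robots_tag.strip()
    if value.lower().startswith("googlebot:"):
        value = value[len("googlebot:"):]
    return {d.strip().lower() for d in value.split(",") if d.strip()}


def uses_indexifembedded(html: str, x_robots_tag: str = "") -> bool:
    """True if the directive appears in the HTML or in the header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives | header_directives(x_robots_tag)
    return "indexifembedded" in found
```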
Invalid Head Elements
The <head> element serves as the container for a page’s metadata. From an SEO point of view, a page’s title tag and meta description reside within the <head> element, as do robots meta tags.
Not all elements, however, belong in the <head>. Should Google come across an invalid element in the page’s <head>, it assumes that it has reached the end of the <head> and will not discover the rest of its contents.
Our data from 2022 shows 12.7% of desktop pages and 12.6% of mobile pages contain an invalid element in the <head>.
The most misapplied element to the <head> by far is the <img> element. It is incorrectly placed within the <head> on 9.7% of mobile pages and 9.9% of desktop pages.
The <div> element is the only other misapplied element to appear within the <head> on more than 3% of the pages in our 2022 dataset. It is incorrectly applied to the <head> on 3.5% of desktop pages and 3.9% of mobile pages.
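A simple way to spot this problem on a single page is to scan the raw markup for tags inside the <head> that do not belong there. The sketch below uses Python's html.parser for that. The set of valid head elements is an assumption based on the elements commonly documented as allowed in the <head>, and the class name is hypothetical.

```python
from html.parser import HTMLParser

# Assumed set of elements that are valid inside <head>.
VALID_HEAD_TAGS = {
    "title", "meta", "link", "script", "style", "base", "noscript", "template",
}


class HeadAuditor(HTMLParser):
    """Record every start tag found inside <head> that is not in the valid set."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in VALID_HEAD_TAGS:
            self.invalid.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def invalid_head_elements(html: str) -> list[str]:
    auditor = HeadAuditor()
    auditor.feed(html)
    return auditor.invalid
```

Any metadata that follows the first reported element (a robots meta tag, for example) is at risk of not being discovered, as described above.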
Canonical tag usage
HTML vs HTTP
Raw vs. rendered canonical tags
Canonical tags are traditionally used when defining duplicate content pages and to help search engines prioritize. They are a snippet of HTML code (rel=”canonical”) that allows webmasters to define to the search engine which page is the “preferred” version. They are not directives, and instead act as a “hint.” Therefore, search engines such as Google determine their own canonical version of the page, based on how useful they believe the page is for the user. Canonical tags can also be used to consolidate other signals such as links, as well as to simplify tracking metrics and better manage syndicated content.
We see from the data that canonical tags usage has increased over the years. In 2019, 48.3% of mobile pages used canonicals. In 2020, it went up to 53.6%. In 2021, this grew even further to 58.5%. And in 2022, the percentage of mobile pages using canonicals has increased to 60.6%.
Mobile has a higher percentage of canonical attribution than desktop (60.6% vs. 58.7%), which is likely a direct result of single use URLs on mobile. Since the data set in this chapter is limited to homepages, it’s fair to assume this is the reason for the higher canonical attribution on mobile. According to Google’s guidelines, having a separate mobile site is not recommended.
There are two ways of implementing canonical tags: in the HTML of a page (a link element in the <head>) or through HTTP headers.
The most common usage across both desktop and mobile is through HTML at 58.6% and 60.4%, respectively. This is probably due to the ease of implementation. While one requires basic HTML knowledge, the other method (through HTTP headers) requires a more technical skillset.
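The two implementation methods can be read back with a sketch along these lines: one function looks for a link element with rel="canonical" in the HTML, the other looks for a rel="canonical" entry in an HTTP Link header value. Function names and example URLs are hypothetical, and the header regex is a simplification rather than a full Link header parser.

```python
import re
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Capture the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        if "canonical" in (attr.get("rel") or "").lower().split():
            self.href = attr.get("href")


def canonical_from_html(html: str):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.href


# Matches e.g.  <https://example.com/page>; rel="canonical"
LINK_CANONICAL = re.compile(
    r'<([^>]+)>\s*;[^,]*\brel\s*=\s*"?canonical"?', re.IGNORECASE
)


def canonical_from_link_header(value: str):
    match = LINK_CANONICAL.search(value)
    return match.group(1) if match else None
```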
Compared to 2021, where raw canonical usage was 57.7% and rendered canonical usage was 58.4%, in 2022 there was some growth, with raw canonical usage reaching 59.4% and rendered canonical usage rising to 60.4%. This correlates with the growth in overall canonical use.
In 2021, there was greater attention paid to site speed and overall page experience following Google’s introduction in 2020 of the Core Web Vitals update [described more fully below]. While HTTPS as a ranking factor dates back to 2014, the increased focus on page experience since the announcement of Core Web Vitals has, in all likelihood, had an impact on the adoption of HTTPS across the web.
We see from the data how more sites are using a secure certificate (HTTPS) at the time of the crawl (taking into account expirations of these certificates). In 2021, 84.3% of desktop pages used HTTPS, and in 2022 it has gone up to 87.71%. On mobile, it’s increased from 81.2% in 2021 to 84.75% in 2022. Since the 2020 announcement of the Core Web Vitals update to the present there’s been an increase of nearly 11% on mobile and 10% on desktop.
Mobile-friendliness can be determined by looking at responsive design implementation versus dynamic serving. To identify this, we looked at the use of the viewport meta tag which is commonly used in responsive design versus the vary: user-agent header to determine if a website is using dynamic serving.
Viewport Meta Tag
[Percentage of mobile pages using the viewport meta tag]
We have seen the use of the viewport meta tag grow from 91.1% of mobile pages in 2021 to currently 92%. Back in 2020, it was at 89.2%.
The vary header is an HTTP header that enables different content to be served to different users on different devices. This is known as dynamic serving and is the opposite of responsive design, which serves the exact same content, but to different devices.
Vary header usage has remained relatively unchanged for the past few years. In 2021, 12.6% of desktop and 13.4% of mobile pages used this footprint. In 2022, the data is nearly identical, with 12% for desktop and 13% for mobile.
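The two signals compared in this section can be detected for a single page roughly as follows. This is a sketch under the assumption that a viewport meta tag hints at responsive design and a Vary header listing User-Agent hints at dynamic serving; the function name and return shape are hypothetical.

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Set has_viewport when a <meta name="viewport"> tag is seen."""

    def __init__(self):
        super().__init__()
        self.has_viewport = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and (dict(attrs).get("name") or "").lower() == "viewport":
            self.has_viewport = True


def mobile_signals(html: str, headers: dict[str, str]) -> dict[str, bool]:
    """Report the responsive-design and dynamic-serving hints for one page."""
    finder = ViewportFinder()
    finder.feed(html)

    # Header names are case-insensitive, so compare in lowercase.
    vary = ""
    for name, value in headers.items():
        if name.lower() == "vary":
            vary = value
    varies_on_ua = "user-agent" in [v.strip().lower() for v in vary.split(",")]

    return {"viewport_meta": finder.has_viewport, "vary_user_agent": varies_on_ua}
```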
In 2021, 13.5% of mobile pages were not using a legible font size. Thanks to Google’s focus on user experience across all devices, more pages than ever now use a legible font size. Only 11% of mobile pages are still not using a legible font size.
[Percentage of mobile pages not using a legible font size]
Core Web Vitals was a particularly hot topic in SEO throughout 2021 following Google announcing the roll out of its Page Experience update that June. We have seen a continued interest this year, with more sites paying attention to their CWV performance.
Core Web Vitals are a series of standardized metrics that can help developers and SEOs to better understand how a user is experiencing a page. The main metrics are Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS).
All three of these metrics are critical to user experience and the stability of a web page.
The data for Core Web Vitals is sourced from the Chrome User Experience Report (CrUX). The report comes from a public dataset of real (opted-in) users, and is sourced from millions of websites (as opposed to lab data, which is simulated).
On mobile, 39% of sites now pass CWV, which is up from 29% in 2021 and just 20% in 2020. And while 92% of sites currently pass FID, many site owners are struggling with LCP, which has a pass rate of 51%.
On desktop, we see an astounding 100% of sites passing FID, though they similarly struggle to pass LCP and CLS. Notably, more sites are passing CWV on desktop (43%) than on mobile (39%).
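For readers who want to see what "passing" means in practice, here is a minimal sketch. It assumes the three metrics are LCP, FID, and CLS, that each is evaluated at the 75th percentile of real-user data, and that the "good" thresholds are 2.5 seconds, 100 milliseconds, and 0.1, respectively. The function name is hypothetical.

```python
# Assumed "good" thresholds, applied to 75th-percentile field values.
LCP_GOOD_MS = 2500   # Largest Contentful Paint, milliseconds
FID_GOOD_MS = 100    # First Input Delay, milliseconds
CLS_GOOD = 0.1       # Cumulative Layout Shift, unitless score


def passes_core_web_vitals(p75_lcp_ms: float, p75_fid_ms: float, p75_cls: float) -> bool:
    """A site passes only if all three metrics are within their thresholds."""
    return (
        p75_lcp_ms <= LCP_GOOD_MS
        and p75_fid_ms <= FID_GOOD_MS
        and p75_cls <= CLS_GOOD
    )
```

Because all three conditions must hold at once, a site with an excellent FID can still fail overall on LCP alone, which is consistent with the pass rates above.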
Lazy loading is a technique that defers the loading of non-critical elements on a web page until the point in which they are needed. This can help with the reduction of page weight, as well as conserve bandwidth and system resources. Eager loading is when related entities are simultaneously loaded and fetched all at once.
When looking solely at iFrames, we see lazy loading is preferred far more than eager loading, with 4.08% of iFrames being lazy loaded versus 0.37% of iFrames being eager loaded.
This is particularly interesting since browser-level lazy loading for iFrames has become standardized in Chrome. The standardization of the loading attribute, which does not need to be set explicitly to lazy or eager, is likely why data shows 94.4% of iFrames do not specify either value.
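The breakdown above can be reproduced for a single page by tallying the loading attribute on each iframe. The sketch below is an illustration only; the class name and the "unspecified" bucket label are assumptions.

```python
from collections import Counter
from html.parser import HTMLParser


class IframeLoadingCounter(HTMLParser):
    """Tally iframes by loading attribute: lazy, eager, or unspecified."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        value = (dict(attrs).get("loading") or "").strip().lower()
        # Anything other than the two known values counts as unspecified.
        bucket = value if value in {"lazy", "eager"} else "unspecified"
        self.counts[bucket] += 1


def iframe_loading_counts(html: str) -> Counter:
    counter = IframeLoadingCounter()
    counter.feed(html)
    return counter.counts
```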
When looking for relevancy signals, search engines look at the content on a web page. There are various on-page SEO elements that can affect rankings and/or appearance on the SERPs (Search Engine Results Pages).
Meta Data (Has Page Title & Meta Description)
For the second year in a row, 98.8% of desktop and mobile pages had <title> elements. Also in 2022, 71% of desktop and mobile homepages had <meta name=”description”> tags, a 0.1% decrease from last year.
The <title> element is an on-page ranking factor that provides a strong hint regarding page relevance and may appear on the SERP. In August 2021, Google started rewriting more websites’ titles in their search results. A month later, after a tremendous amount of feedback, Google refined how it generates titles.
Meta Data (Page Title Words Average)
Meta Data (Page Title Characters Average)
These stats remain unchanged from last year. Note: Titles on homepages tend to be shorter than those used on deeper pages.
The <meta name=”description”> tag does not directly impact rankings. However, it may appear as the page description on the SERP and influence click-through rate.
Meta Data (Meta Description Words Average)
Meta Data (Meta Description Characters Average)
For the most part, these stats are relatively unchanged from last year.
Heading elements (<h1>, <h2>…) are important parts of a page’s structure since they help organize the content on the page. Heading elements are not a direct ranking factor, but they can help Google better understand the content found on the page.
The trends around implementation of headings by type in 2022 closely match those from 2021, with just a few small differences. For example, 71.9% of mobile pages utilized an h2 in 2021 while 73.02% did in 2022.
Another trend that has carried over is the discrepancy in usage between the h1 and h2. While 72.7% of desktop pages implement an h2, only 65.8% use an h1 (with similar numbers reflected on mobile).
Although there is no definitive explanation for this, one possible reason is that the h1 is often placed or used above the content. While an h1 is generally not essential for the natural flow of text, not having an h2 could result in excessively long and unstructured content.
Overall, much like 2021’s stats, there are relatively few empty H elements found on pages. Additionally, there is little discrepancy between the desktop and mobile data.
There is divergence, however, with the h1. While 65.8% of pages contained an h1 element, 58.5% contained a non-empty h1 element. That’s a 7.3 percentage point difference. Contrast that with the h2, which has just a 1.5 percentage point difference. As noted in the 2021 Web Almanac, this may be a result of the many websites that wrap logo-images in the h1 element on homepages.
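The gap between "contains an h1" and "contains a non-empty h1" can be illustrated with a short sketch that counts both. It treats an h1 as empty when it holds no text of its own, so an h1 that wraps only a logo image counts as empty. The class name is hypothetical.

```python
from html.parser import HTMLParser


class H1Counter(HTMLParser):
    """Count all h1 elements and those that contain visible text."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.non_empty = 0
        self._in_h1 = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.total += 1
            self._in_h1 = True
            self._text = []

    def handle_data(self, data):
        if self._in_h1:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            if "".join(self._text).strip():
                self.non_empty += 1
            self._in_h1 = False


def count_h1(html: str) -> tuple[int, int]:
    """Return (total h1 elements, non-empty h1 elements)."""
    counter = H1Counter()
    counter.feed(html)
    return counter.total, counter.non_empty
```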
The primary purpose of the alt attribute on the <img> element is accessibility. Alt attributes also help search engines rank specific assets in image search.
What we found:
How user agents prioritize the rendering and displaying of images is affected by the loading attribute applied to <img> elements. This implementation can impact user experience and performance time, with possible effects on both SEO success and conversions.
While content length is not a ranking factor, it is still valuable to assess how many words a page contains on average.
Let’s begin with the number of words found on the page once it has been rendered.
The median desktop page in 2022 contains 421 words. This is quite close to the 425 words found in 2021. However, this is still a big leap percentage-wise from what we found back in 2020 when 402 words were found on the median desktop page. Whatever the cause was in 2021 for the uptick in rendered word count, it appears to have remained through 2022.
Similarly, the median mobile page in 2022 contains 366 rendered words, which is also close to the figure from 2021. For context, desktop pages contain more words than mobile pages: the median desktop page contains 15% more words than the median mobile page. This is significant since Google some years ago adopted a mobile-first index, and content not found on the mobile version of a page runs the risk of not being indexed by the search engine.
Much like the rendered word count, there is a minimal difference between the data in 2022 compared to what was found in 2021. For example, the median desktop page’s raw word count is 363 words in 2022 compared to 369 words in 2021. And the median mobile page’s raw word count is 318 words in 2022, which is slightly less than the 321 words found in 2021.
Here, too, mobile pages contain fewer words than desktop pages. The median mobile page contains a raw word count that is 12.39% less than desktop. As noted above, this is significant because of Google’s mobile-first indexing.
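The two desktop-versus-mobile comparisons use different baselines, which is easy to miss. The sketch below recomputes both from the median word counts quoted above; the function names are hypothetical.

```python
def percent_more(larger: float, smaller: float) -> float:
    """How much bigger `larger` is, relative to `smaller`."""
    return (larger - smaller) / smaller * 100


def percent_less(smaller: float, larger: float) -> float:
    """How much smaller `smaller` is, relative to `larger`."""
    return (larger - smaller) / larger * 100


# Rendered words: desktop (421) relative to mobile (366).
rendered_gap = percent_more(421, 366)

# Raw words: mobile (318) relative to desktop (363).
raw_gap = percent_less(318, 363)
```

The rendered figure works out to roughly 15% and the raw figure to roughly 12.4%, in line with the percentages given in the text.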
Implementing Structured Data has gained greater attention as rich results on the Google SERP have become more prominent.
The implementation of structured data in the HTML of a page has continually increased. In 2021, 41.8% of desktop pages and 42.5% of mobile pages used structured data. In 2022, it’s risen to 44% of desktop pages and 45.1% of mobile pages that have structured data within their HTML.
This reflects a 5.3% and 6% increase on desktop and mobile pages, respectively. Two possible explanations for greater adoption could be that a number of Content Management Systems have added automatic structured data markup to their pages, as well as the aforementioned prominence that structured data has played in Google SERPs.
Structured data can be implemented through various ways on a given page. However, JSON-LD, which aligns with Google’s own recommendation for implementation, is by far the most popular format.
Compared to 2021’s figures, 2022’s data shows a nominal increase in implementation via JSON-LD and a slight decrease when implementing structured data with microdata. These numbers bear out in particular on mobile. In 2021, 60.5% of mobile pages used JSON-LD to implement structured data. The number of mobile pages in 2022 using JSON-LD for adding structured data is up 2.3% to 61.9%. Conversely, 36.9% of mobile pages in 2021 utilized structured data with microdata. That number fell 4.3% in 2022 to 35.3%.
There is strong correlation between the most popular types of schema found on homepages in 2021 and 2022.
As noted in previous editions of the Web Almanac, WebSite, SearchAction, and WebPage are among the most common schema types on homepages, and SearchAction is what powers the Sitelinks Search Box [see chart above].
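To show what JSON-LD markup of this kind looks like, here is a minimal sketch that builds a WebSite object with a SearchAction and wraps it in a script tag. The property layout follows the commonly documented pattern for the Sitelinks Search Box as we understand it; the function name and the example.com URLs are placeholders, so check current documentation before relying on it.

```python
import json


def sitelinks_searchbox_jsonld(site_url: str, search_url_template: str) -> str:
    """Return a <script> tag holding WebSite + SearchAction JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": site_url,
        "potentialAction": {
            "@type": "SearchAction",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": search_url_template,
            },
            "query-input": "required name=search_term_string",
        },
    }
    return (
        '<script type="application/ld+json">'
        + json.dumps(data, indent=2)
        + "</script>"
    )


snippet = sitelinks_searchbox_jsonld(
    "https://example.com/",
    "https://example.com/search?q={search_term_string}",
)
```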
When comparing 2021 to 2022, there has been a significant increase in the adoption of the most popular schemas across the board. In fact, every noted schema type has experienced an increase in adoption in 2022. Among the most notable are the schema for BreadcrumbList, which has risen 22.8% since 2021, and ImageObject, which is up 12.3%.
In terms of implementing the most popular schemas, there are relatively tiny differences in percentages between desktop and mobile pages.
You can read more about structured data in our dedicated chapter.
Search engines use links to discover new pages and to pass PageRank, which helps determine the importance of pages. Links also act as a reference from one page to another (presumably relevant) page.
Anchor text, which is the clickable text used in a link, helps search engines to understand the content of the linked page. Lighthouse has a test to check whether the hyperlinked anchor text is useful and/or contextual, or if it’s generic and/or non-descriptive, such as “learn more” or “click here.” In 2022, 15% and 17% of the tested links on mobile and desktop, respectively, did not have descriptive anchor text, which is a missed opportunity from an SEO perspective and bad for accessibility.
Internal links are links to other pages on the same website. Much like last year, 2022’s figures suggest pages have fewer links on their mobile versions compared to their desktop counterparts.
The median number of internal links is now 16% higher on desktop than mobile, at 56 and 48 links, respectively. It’s likely a result of developers minimizing the navigation menus and footers on mobile for ease of use on smaller screens.
According to CrUX data, the 1,000 most popular websites have more outgoing internal links than less popular sites, a total of 137 links on desktop versus 106 on mobile. That’s more than two times higher than the median. This may be attributed to the use of mega-menus on larger sites that generally have more pages.
External links are links to pages on a different website. The data, which has been consistent for the past few years, points to there being fewer external links on the mobile versions of pages compared to the desktop versions. Despite Google rolling out mobile-first indexing a few years ago, websites have not brought their mobile versions to parity with their desktop counterparts.
In September of 2019, Google introduced attributes that allow publishers to classify links as being sponsored or user-generated content. These attributes are in addition to rel=nofollow, which was previously introduced in 2005. The newer attributes, rel=ugc and rel=sponsored, add additional information to the links.
Not much has changed in terms of the adoption of the newer attributes, with rel=ugc appearing on 0.4% of desktop and mobile pages, and rel=sponsored appearing on 0.5% of desktop and 0.4% of mobile pages in 2022.
rel=”dofollow” once again appeared on more pages than rel=”ugc” and rel=”sponsored”. While this is technically not a problem, Google ignores rel=”follow” and rel=”dofollow” because, despite their inclusion, they are not actually official attributes.
rel=”nofollow”, which is a real attribute, was found in 2022 on 29.5% of mobile pages, which is 1.2% less than last year. Google treats nofollow as a “hint,” meaning the search engine can choose whether or not to respect the attribute.
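The distinction between recognized and ignored rel values can be sketched as a small classifier. The grouping below follows the text above (nofollow, ugc, and sponsored are recognized; follow and dofollow are ignored). The function name is hypothetical, and accepting both spaces and commas as separators is an assumption.

```python
import re

RECOGNIZED = {"nofollow", "ugc", "sponsored"}   # link attributes described above
IGNORED = {"follow", "dofollow"}                # not official, so ignored


def classify_rel(rel_value: str) -> dict[str, list[str]]:
    """Sort the tokens of a rel attribute into recognized, ignored, and other."""
    tokens = [t for t in re.split(r"[\s,]+", rel_value.lower()) if t]
    result = {"recognized": [], "ignored": [], "other": []}
    for token in tokens:
        if token in RECOGNIZED:
            result["recognized"].append(token)
        elif token in IGNORED:
            result["ignored"].append(token)
        else:
            result["other"].append(token)
    return result
```

Values such as noopener land in "other" because they are unrelated to how search engines qualify a link.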
AMP has been a controversial topic since its launch in 2015, with Toronto SEO consultants debating whether or not it had a direct impact on rankings. Google later released this statement (below) in its documentation for additional clarification:
“While AMP itself isn’t a ranking factor, speed is a ranking factor for Google Search. Google Search applies the same standard to all pages, regardless of the technology used to build the page.”
Google Search Central
The future of AMP appears to be changing ever since the launch of Core Web Vitals. A main reason for previously implementing AMP, aside from improving page speed, was that it was necessary for inclusion in Top Carousels. In 2021, Google updated its requirements and outlined that any page (AMP or non-AMP) is now eligible to appear in Top Carousels.
Desktop usage has dipped from 0.09% in 2021 to 0.07% in 2022 while mobile usage is down from 0.22% to 0.19% over the same time period.
Hreflang tags help Google and other search engines understand what the main language is on a given page. They are primarily used in international SEO campaigns when several different languages are used across different versions of a website.
Currently, 9.6% of sites use hreflang tags on desktop while 8.9% use them on mobile. This is a slight increase from 2021 when 9.0% of sites used hreflang tags on desktop and 8.4% implemented them on mobile.
The most popular hreflang tag in 2022 is “en” [English], which accounts for 5.4% usage on desktop and 4.7% on mobile. Those percentages are approximately the same as the year before.
After x-default, which is the “fallback” version (and the second most common to be adopted), the hreflang tags for French, German and Spanish are the next most frequently used.
The three different ways to implement hreflang tags are via the <head>, link headers, or XML sitemaps. Note: Since this data looks solely at homepages, XML sitemaps are not included.
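For the <head> method, hreflang annotations are link elements with rel="alternate" and an hreflang attribute. The sketch below collects them from a page's HTML; the class name and example URLs are hypothetical, and link headers and XML sitemaps are not covered.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Map each hreflang value to its alternate URL."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        lang = attr.get("hreflang")
        href = attr.get("href")
        if "alternate" in rels and lang and href:
            self.alternates[lang.lower()] = href


def hreflang_alternates(html: str) -> dict[str, str]:
    parser = HreflangParser()
    parser.feed(html)
    return parser.alternates
```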
While Google tends to use hreflang tags, other search engines such as Bing prefer the content-language attribute. This can be implemented using two methods: an HTTP response header or an HTML tag.
[Content language usage graph]
HTTP server response is the most popular implementation method of content-language in 2022, with 8.27% of mobile sites using this and 8.82% of desktop sites. However, this has seen a decline in adoption on mobile compared to 2021 when 9.3% of mobile sites used it. Conversely, desktop has seen a slight increase compared to 2021 when 8.7% of sites used it.
HTML, on the other hand, has 2.98% adoption on desktop in 2022 and 3.01% adoption on mobile. Here, there’s also a decline in mobile usage compared to 2021 when 3.3% of mobile sites used the HTML tag.
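Both content-language placements can be read from a single response with a sketch like this one. It assumes the HTML method is a meta tag with http-equiv="content-language" and the HTTP method is a Content-Language response header; the function name and return keys are hypothetical.

```python
from html.parser import HTMLParser


class MetaContentLanguage(HTMLParser):
    """Capture the content of <meta http-equiv="content-language">."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() == "content-language":
            self.value = attr.get("content")


def content_languages(html: str, headers: dict[str, str]) -> dict:
    """Return the language declared by each method, or None if absent."""
    parser = MetaContentLanguage()
    parser.feed(html)

    # Header names are case-insensitive, so compare in lowercase.
    header_value = None
    for name, value in headers.items():
        if name.lower() == "content-language":
            header_value = value

    return {"http_header": header_value, "html_meta": parser.value}
```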
Content Language (HTTP header & HTML)
Much like patterns in our data from 2019, 2020, and 2021, the majority of sites analyzed are showing small, yet consistent, improvement when it comes to various fundamentals of SEO, including having indexable and crawlable pages.
The data also points to an increased focus on performance elements such as Core Web Vitals, with 39% of sites currently having passing scores compared to just 20% in 2020 when the update was first announced. This seems to indicate that sites are now paying more attention to Google’s guidance. Still, more work needs to be done across the web.
For instance, new elements that have been introduced, such as the indexifembedded tag, have been slow to gain adoption. This underscores both the need for continually implementing best practices, as well as the opportunity for how much growth there still is in SEO, search engine friendliness, and the state of the web in general.