Index

  • Introduction
  • Three types of SPAM techniques
  • Web SPAM Taxonomy
  • What are the signs that make a page SPAM in the eyes of Google?
  • Conclusion

Introduction

Matt Cutts in an old video about SPAM

Web spam refers to techniques that attempt to manipulate search engine ranking algorithms in order to boost the position of certain pages in the SERPs.

At best, spammers lure users to their websites for undeserved advertising revenue. At worst, they host malicious content and try to install malware on the visitor’s computer.

Three types of SPAM techniques

Spammers generally use three types of techniques to get a higher ranking in search engine results. These techniques are classified into:

  • Link-based SPAM techniques.
    • URL spamming or “spamdexing”: the deliberate manipulation of search engine indexes. It involves a variety of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of indexed resources in ways that are incompatible with the purpose of the indexing system.
      • For example, some porn sites in the past indexed their pages for words that had nothing to do with porn, thereby receiving additional traffic.
    • Link spamming: another well-known technique, which also covers anchor-text spamming. Search engines consider not only the number of links but also the anchor text, since this is one of the most important ranking signals. This category obviously also includes inserting low-quality links on third-party pages (forums, comments, guestbooks, etc.) to boost the value of the linked pages, as well as the most nefarious hack-and-drop techniques.
  • SPAM concealment techniques, in which non-obvious, invisible methods are used to improve the ranking of a web page. These techniques are more problematic, and search engines tend to treat them more carefully.
    • Hiding content: outdated techniques in which keywords and links are hidden when the browser renders the page. The most common approach uses color schemes that make the elements invisible to the user (for example, text the same color as the background).
    • Cloaking: identifying a search engine crawler and showing the spider a different version of the page than the one served to the end user.
    • Redirects: the page automatically redirects human visitors, but not search engine spiders.
  • Content-based SPAM techniques
    • Term spamming: manipulating elements such as the page’s title tag, meta description, or meta keywords. As most of us know, two of these three tags have been abused to the point that most modern search engines no longer use them as ranking signals at all.
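To make the cloaking item above concrete, here is a minimal Python sketch (all names and strings are invented examples) of the User-Agent trick cloakers rely on; read in reverse, it is also what a detector looks for when it compares fetches made with different User-Agents:

```python
# Minimal sketch of how cloaking works, for illustration only: the server
# inspects the User-Agent header and serves crawlers a different page
# than human visitors. All names and strings here are invented examples.

CRAWLER_TOKENS = ("googlebot", "bingbot", "duckduckbot")

def is_crawler(user_agent: str) -> bool:
    """Naive crawler check based on User-Agent substrings."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def serve_page(user_agent: str) -> str:
    # A cloaking site returns keyword-stuffed HTML to the spider
    # and the real (often low-value) page to everyone else.
    if is_crawler(user_agent):
        return "<html>keyword-stuffed version for the spider</html>"
    return "<html>version shown to human users</html>"
```

Real crawlers also verify themselves via reverse DNS, which is exactly why naive User-Agent cloaking is easy for search engines to catch.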

Web SPAM Taxonomy

For Google, it is important to show only quality results so as not to lose the trust of its users (who also click on its advertisements). For this reason, to combat the growing spam problem, Google created a dedicated team. The head (and public face) of this team was a gentleman named Matt Cutts (no longer a Google employee).

Spam pages erode trust in search engine results. They waste not only visitors’ time but also considerable search engine resources. For this reason, spam detection methods have been proposed to reduce the negative effects of these pages, and some of them work well.

The Web Spam Taxonomy paper, published in 2004 by Zoltán Gyöngyi and Hector Garcia-Molina at Stanford University, classifies web spam techniques and their detection methods. Here are a couple of excerpts from the original text:

… All types of actions intended to boost ranking, without improving the true value of a page, are considered spamming…

… Any deliberate human action that is meant to trigger an unjustifiably favorable relevance or importance for some web page, considering the page’s true value…

Web Spam Taxonomy – March 14, 2004

I think we can all agree: spam sucks! Anyone who uses the web knows how frustrating it is to land on a page that looks promising in search results but ends up being useless when you visit it.

That said, there are plenty of web pages that aren’t spam but look like it. Professionals notice these things right away; after all, we spend our lives browsing websites, and with time the eye becomes expert. Regular users may not notice the same things that Google (or an SEO) does. In this short guide I try to summarize which factors to take care of so that your web pages are not considered spam (or low quality).

Are you an SEO? Then I’m sure you know exactly what the Google Webmaster Guidelines say about spam. If you’re just dabbling in SEO, here’s the short answer: don’t try to fool search engines with hidden text, hidden links, keyword stuffing, duplicate content, or doorway pages.

As you read this article, ask yourself, “Have I or anyone working on my website ever used these practices?”

What are the signs that make a page SPAM in the eyes of Google?

Hidden text or links

Although this practice is as old as the hills, there are still people who hide text and links in HTML pages. It goes without saying how trivial, useless, and dangerous this practice is.

Do you want to fool Google? Know that millions of spammers have been trying every day since Google was born. Trust me, you won’t have an easy time.
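As a toy illustration of the same-color trick, here is a naive Python check that flags inline styles where the text color matches the background color. Real hidden text uses external CSS, off-screen positioning, or zero font sizes, so this regex only catches the crudest case:

```python
import re

# Flags inline styles where the text color equals the background color,
# e.g. white-on-white text. Purely a demo: real pages hide text via CSS
# files, absolute positioning, or tiny fonts, which this misses.
SAME_COLOR = re.compile(
    r"color\s*:\s*(#[0-9a-f]{6})\s*;\s*background(?:-color)?\s*:\s*\1",
    re.IGNORECASE,
)

def has_hidden_text_hint(html: str) -> bool:
    return bool(SAME_COLOR.search(html))
```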

Sneaky redirects

Meta redirects, unexpected JavaScript redirects, and cross-domain redirects are not things Google looks on favorably. Sending users where you want them to go (rather than where they expected to land) is deceptive.
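A rough sketch of what spotting the most obvious of these patterns in raw HTML might look like; real detection (Google’s included) is far more sophisticated, and these two regexes only catch the crudest cases:

```python
import re

# Flags the two most blatant "sneaky redirect" patterns in raw HTML:
# a meta refresh tag and a JavaScript location assignment. Only a demo.
META_REFRESH = re.compile(
    r"<meta[^>]+http-equiv=[\"']?refresh[\"']?", re.IGNORECASE
)
JS_REDIRECT = re.compile(
    r"(?:window\.)?\blocation\b(?:\.href)?\s*=", re.IGNORECASE
)

def find_redirect_hints(html: str) -> list[str]:
    hints = []
    if META_REFRESH.search(html):
        hints.append("meta refresh")
    if JS_REDIRECT.search(html):
        hints.append("javascript redirect")
    return hints
```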

Excess of keywords

Writing keywords to hit a given “keyword density” makes no sense: Google does not use keyword frequency to rank results. On the contrary, overusing a term makes the text unnatural and hard to read; Google notices, and your page will not achieve good organic results.

This practice is also very old; it was used against search engines far less advanced than Google is today.
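A toy density check makes the point. The 3% threshold below is an arbitrary illustration, not a value any search engine uses; as said above, Google does not rank by keyword frequency, so treat this purely as a readability smell test:

```python
import re

# Toy keyword-density check. The 3% threshold is an arbitrary example,
# not a value any search engine uses: Google does not rank by keyword
# frequency, so this only serves as a readability smell test.

def keyword_density(text: str, keyword: str) -> float:
    words = re.findall(r"[\w']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def looks_stuffed(text: str, keyword: str, threshold: float = 0.03) -> bool:
    return keyword_density(text, keyword) > threshold
```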

Abusing PPC Ads

Trying to monetize a site is not wrong, but exceeding a “reasonable” limit is. A site with too many banners is counterproductive:

  • It makes it difficult for the user to enjoy the content
  • It makes the site confusing
  • It makes the site slow to load
  • It erodes the reader’s trust in the page

In my career I have seen many websites lose traffic simply because of an excess of ads on their pages. Once the ads were removed, the sites regained their original traffic. The secret is to find the right trade-off between revenue and traffic.

Copied content

Copying content from other sites has never been a forward-looking strategy.

Copied content does not rank, and sometimes it is not even indexed. Why should Google spend time and resources indexing information it already has? If your website’s content is copied, Google will label the site as low quality, and it will hardly get any organic traffic.

Feed with PPC ads

Beware of filling your RSS feed with too many ads; it’s not a good signal to send to Google.

Doorway pages

What is a doorway page? Doorway pages are low-quality pages (or groups of pages, or entire websites) optimized to rank for specific keywords, acting as a doorway between users and content. Typically, they offer little value to visitors and serve the sole purpose of ranking for individual keywords.

You know those websites with many pages like this:

  • Milan websites
  • Monza websites
  • Varese websites
  • Bergamo websites
  • … all the same pages where only the city name changes …

… and maybe the agency is based in Rome! Do you see how they are trying to fool both Google and users? If I search for “Pavia websites”, clearly I am looking for an agency or a webmaster in Pavia! This nonsense is better avoided, also as a matter of professionalism and credibility.

Automatically generated pages

In the past, I and many other colleagues have run tests and experiments trying to rank websites with fully auto-generated content.

Do you know what happened?

Nothing.

https://it.sistrix.com/vino-online.it

Maybe for a few days the website receives traffic, maybe even significant volumes, but as soon as Google’s algorithms finish their analysis, the site’s organic traffic collapses.

Time is money? Then avoid wasting it on these activities: people have been trying for years without much success, and in the future it will only get harder.

Fake search pages with PPC ads

There are, and have been, many portals with automatically generated pages, for example built by indexing user searches and stuffing the results with ads. Pages with no content or quality are worthless in Google’s eyes and lower the overall quality of the website. Google is getting better and better at catching these tricks.

Fake blogs with PPC ads

Sometimes I still see blogs full of copied or low-quality content, created with the sole intent of filling them with ads. Do you think the owners got rich? If they had spent that time doing something constructive and of real quality, they would probably be better off today, both inside and out.

Websites that use image hotlinking

Six years ago I described what image hotlinking is and how to protect yourself from it. Today there are still websites that “steal” images from other sites hoping to drive traffic from Google Images. Besides being incorrect, this practice doesn’t even bring results. You only risk angering the owner of the site you steal the images from.

If the webmaster you steal from is shrewd, he could even automatically replace the images you hotlink with others… A word to the wise is enough.

Blank pages

Let’s start from the concept that for Google a blank page is worth zero.

In a series of numbers, a zero lowers the average, do you agree?

If your site has two pages, one blank page worth zero and one excellent page worth 10, your site is worth 5 overall.

Remove the blank page and the site is worth 10.

A concept as simple as it is ignored.

The quality of a website, calculated as the average of the value of its pages
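The averaging idea can be written down as a toy model; the 0–10 page scores are invented for illustration, and Google’s real quality evaluation is certainly not this simple:

```python
# Toy model of the averaging idea: site quality as the mean of per-page
# scores. The 0-10 scores are invented; Google's actual quality signals
# are far more complex than a simple average.

def site_quality(page_scores: list[float]) -> float:
    if not page_scores:
        return 0.0
    return sum(page_scores) / len(page_scores)

print(site_quality([0, 10]))  # the blank page drags the site down to 5.0
print(site_quality([10]))     # remove it and the site scores 10.0
```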

Parked domains

Do you think buying expired domains that have backlinks and redirecting them to your primary site is brilliant? It has worked in rare cases in the past.

Today you risk wasting the money you spend buying the domain. Keep in mind that most SEOs I know believe that:

  • the purchase of an expired domain
  • simultaneous modification of hosting, WHOIS, DNS
  • setting up a 301
  • a radical change in the purpose of the website

together reset the SEO value of the domain, and I fully agree with them. I wouldn’t consider buying an expired domain an immediate solution to any ranking issue.

A site without social channels and without contacts

When I visit a site and find no references to the company, the office, the address, a phone number, or social networks… maybe only an email address from one of the many free providers (e.g., Gmail)… then many doubts arise. If you don’t want to be found, you don’t convey trust.

Conclusion

As we have seen, spam is one of the worst enemies of Google and its users. Google invests a lot of money every year in the fight against spam, and over the last 10 years it has made great strides. Its algorithms can now detect spam content automatically with far more precision than in the past.

So if, despite all these warnings, you still want to start a career as a spammer, you’ll need to be really good; otherwise you risk wasting a lot of time and money.