Reddit Spam Traffic


Massive Internet Attack Floods the World with Fake Data

Since November 30, Reddit has been at the center of an attack that impacts millions of top domains (most of the Internet). While Reddit appears at first glance to be the perpetrator, it is actually the victim. This behind-the-scenes scheme, run from Russia, generates huge amounts of fake traffic: as much as 10% of all Internet traffic.
It is not caught by Google Analytics, so it results in phony web traffic statistics and flawed reports, which is the main issue people are complaining about. It is not mentioned in any media, as far as I know. The attack, even though massive, looks rudimentary. I will explain the details shortly. It is launched either by a hacker playing some old tricks on a new scale (probably in collusion with a few Russian ISPs), or by professional criminals testing some devices, doing a rehearsal, testing how far they can go before being detected, or trying to distract us from a far more nefarious but smaller-scale attack taking place at the same time.
At this point, this ongoing attack is a nightmare mostly for web analysts, webmasters, and some data scientists, though any data scientist worth her salt should be able to precisely identify the fake traffic and thus correct the phony numbers. Such attacks occurred in the past (from other countries), but this one is the biggest that I have ever seen. Users visiting the websites impacted by the fake traffic won't notice anything: it is happening behind the scenes. It is not a DoS (denial of service) attack that hits a few domains with highly concentrated traffic to knock them down; instead, it sends smaller traffic volumes (per targeted domain) to millions of websites. If it were 10 times bigger, I imagine that many websites would go offline, though. The perpetrator is clever enough to keep his scheme alive (avoiding being blocked) by not hitting too hard. Or maybe he has reached his limit in terms of available bandwidth.

How is Reddit involved?
The fake (non-human) clicks come with a fake referrer. Initially, on November 30, the referrer was lifehacĸer.com, as if the traffic was coming from that domain, but in fact the traffic was manufactured by a robot, not by real humans. In the last section, we discuss source code that can generate such fake traffic, faking both the browser and the referrer field, so that when the victim checks his web traffic statistics, the top referral is now a fake. Typically, hackers who plant fake referrer domains use their own domain, as a way to generate free traffic: if dozens of millions of fake referrers are planted across millions of sites, you would expect many web analysts and webmasters to check out the referral domain that suddenly seems to be generating such a big proportion of their traffic. At least this is the way this scheme has been used in the past.
Note that in the case of lifehacĸer.com (the domain used by the fraudsters on November 30), the letter k is not actually an ASCII k; it is a look-alike character (as rendered here, ĸ, U+0138, which closely resembles the Cyrillic к). Compare the two versions: lifehacĸer.com (with the look-alike character) and lifehacker.com (with a k). The fraudster tried to exploit this confusion.
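The look-alike character is easy to expose programmatically. Below is a minimal sketch in Python, assuming you have a referrer domain and a trusted domain to compare it against; the function names `non_ascii_chars` and `is_lookalike` are hypothetical, and the look-alike test is a crude heuristic rather than a full homoglyph check.

```python
import unicodedata


def non_ascii_chars(domain):
    """List (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in domain
        if ord(ch) > 127
    ]


def is_lookalike(domain, trusted):
    """Crude heuristic: same length as the trusted domain, and every
    position either matches or holds a non-ASCII character."""
    if domain == trusted or len(domain) != len(trusted):
        return False
    return all(a == b or ord(a) > 127 for a, b in zip(domain, trusted))


# The spoofed domain, written with an escape so the odd character is explicit.
spoofed = "lifehac\u0138er.com"

print(non_ascii_chars(spoofed))
print(is_lookalike(spoofed, "lifehacker.com"))
```

Printing the Unicode name, rather than trusting how the character looks on screen, is the point of the exercise: two domains that render almost identically can differ at the code-point level.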
Starting on the second day, and still today, the domain being used changed from lifehacĸer.com to reddit.com. Indeed, the full URL planted in millions of web logs suddenly became
as if Reddit suddenly started to spam the whole Internet. Yet the traffic still originated from the same Russian locations, using the same (possibly fake) browser, Safari version 9. Interestingly, the Reddit link in question is the only article (besides this very article) talking about the attack. So the hacker decided to plant fake Reddit referrers in web log files across the world. Doing so could get Reddit blacklisted by Google, as Google's algorithms could conclude that Reddit is using black hat SEO tricks to boost its traffic, something that typically gets a website blocked on Google. If, instead of Reddit, the hacker planted fake referrers using thousands of different domains, he could get many websites blocked on Google. Is that the plan? Probably not.
Why is this traffic not blocked? How to deal with this attack?
It will eventually be blocked, though it tends to adapt to blocking, and usually comes back in a slightly different form. It is not filtered out by Google Analytics, which means that the hacker, via the fake clicks, is able to trigger the JavaScript code found on all web pages that use Google Analytics for tracking and analytic reporting purposes. Google Analytics explicitly filters out very little traffic, if any. By design, it excludes most robots only because robots typically do not trigger the JavaScript code found on web pages. But this one does, so the hacker must have gone the extra mile to add this feature to his web robot.
Are Alexa.com statistics also impacted by this robot? Alexa did not update its website rankings for several days, which is unusual. It did update the numbers on December 1, but now all the numbers are off. My guess is that this is not related to the lifehacĸer.com attack but to some change in the way Alexa ranks websites that happened to coincide with the attack. For instance, Alexa could have added many subdomains to its list of websites, or used a different time frame (3 months rather than the last 30 days) to compute the ranks, which would explain why so many websites suddenly have a significantly worse rank.
It is easy to block the fake traffic at the web server (Apache) level (click here for details). And as always, the most robust traffic metric for your website is the number of new members, assuming you are able to detect and reject sign-ups from spammers and other undesirable people or robots. In this attack, no (fake) new members are being added. But the number of sessions, pageviews, and even (to a lesser extent) users are impacted.
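The same blocking decision can be prototyped offline against existing logs before touching the server configuration. The sketch below, in Python, assumes logs in Apache's "combined" format and a hypothetical blocklist containing only the look-alike domain; the names `LOG_PATTERN`, `BLOCKED_REFERRER_HOSTS`, `parse_line`, and `is_blocked` are illustrative, and the sample log lines are made up.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumption: referrer hosts to reject. Extend with whatever your logs show.
BLOCKED_REFERRER_HOSTS = {"lifehac\u0138er.com"}


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_blocked(line, blocked_hosts=BLOCKED_REFERRER_HOSTS):
    """True if the line's referrer host is on the blocklist."""
    record = parse_line(line)
    if record is None:
        return False
    host = (urlsplit(record["referrer"]).hostname or "").lower()
    return host in blocked_hosts


# Made-up sample lines (documentation IP ranges), for illustration only.
spam = ('203.0.113.7 - - [01/Dec/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
        '"http://lifehac\u0138er.com/" "Mozilla/5.0 (Macintosh) Safari/9.0"')
clean = ('198.51.100.4 - - [01/Dec/2016:10:00:05 +0000] "GET / HTTP/1.1" 200 512 '
         '"https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0) Chrome/54.0"')

print(is_blocked(spam), is_blocked(clean))
```

Matching on the referrer host rather than the full URL keeps the rule robust to changes in the path, but it is too blunt for a legitimate site such as Reddit, where blocking the whole host would also discard real visitors.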
What are the hacker's motivations? Why is the attack so rudimentary? 
The attack is not carried out by a data scientist, or if it is, it must be by a very dumb one: it is so easy to identify the fake traffic, based on location, browser, and referrer. It is as if the attacker wants you to discover the fake traffic and the extent of the attack, yet he is smart enough to keep it going and avoid being blocked. He is probably not acting alone.
The hacker must have a database of millions of websites (the victims) with some indication of traffic volume for each website. Indeed, websites with lots of traffic are hit harder in terms of total number of fake clicks, but not so much in terms of proportion of fake traffic. Such lists of websites are easy to come by (I have my own, based on years of web scraping) and some of them are even public. Quantcast used to publish such a list for the top one million websites. You can still find it here, but it is clearly outdated: many of the target websites (victims) that I checked were not on that list, despite their traffic volume.
As for the motivations for this unusual attack, I don't know. It could be to prove that the attacker is smarter than Google Analytics (in some ways, he is). Obviously, anyone carrying out an attack must use the dumbest possible technique that will work, to avoid revealing advanced tricks to the people trying to catch or block him. If it works even though it is rudimentary, so be it; it is good news for the hacker. That said, there is some level of sophistication in it, but from a software engineering rather than a statistical engineering point of view. For instance, it must be deployed in some distributed environment to successfully generate so many clicks in so little time. But the algorithm that does that is actually a textbook example of how MapReduce works. From a statistical engineering point of view, you could not design something dumber than that, though. Yet I imagine that the hacker will add a bit of statistical engineering in his next release. Or use a botnet instead.
Interestingly, I've found an article entitled "A Russian Trump fan is celebrating by hacking Google Analytics", though this could just be another piece of fake news.
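Identifying the fake traffic from these three signals, and correcting the session count accordingly, can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the `Session` record, the country code, the browser label, the referrer hosts in the suspect sets, and the sample data are all hypothetical, and real analytics exports will use different field names.

```python
from dataclasses import dataclass


@dataclass
class Session:
    country: str        # two-letter country code (assumed field)
    browser: str        # browser name and major version (assumed field)
    referrer_host: str  # host part of the referrer (assumed field)


# Assumed signatures, loosely following the traffic described in the text.
SUSPECT_COUNTRIES = {"RU"}
SUSPECT_BROWSERS = {"Safari 9"}
SUSPECT_REFERRERS = {"lifehac\u0138er.com", "reddit.com"}


def is_fake(session):
    """Flag a session only when location, browser, and referrer all match."""
    return (
        session.country in SUSPECT_COUNTRIES
        and session.browser in SUSPECT_BROWSERS
        and session.referrer_host in SUSPECT_REFERRERS
    )


def corrected_count(sessions):
    """Number of sessions left after removing the flagged ones."""
    return sum(1 for s in sessions if not is_fake(s))


# Made-up sample data, for illustration only.
sessions = [
    Session("RU", "Safari 9", "reddit.com"),
    Session("US", "Chrome 54", "google.com"),
    Session("RU", "Firefox 50", "yandex.ru"),
    Session("RU", "Safari 9", "lifehac\u0138er.com"),
]

print(len(sessions), corrected_count(sessions))
```

Requiring all three signals to match is deliberately conservative: it avoids discarding genuine Russian visitors or genuine Reddit referrals that match only one criterion.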
Source code to plant fake referrers
The source code below is very basic: while it plants fake referrers, it does not trigger the JavaScript code used by Google Analytics to track traffic. Click here for more details. It is also only one of many different ways to achieve the same result, and clearly the hacker did not use such a script here: otherwise we would likely see tons of (fake, simulated) browsers associated with the attack, not just Safari version 9.
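A minimal sketch of the idea in Python (not the author's original listing): the referrer and the browser are just HTTP request headers chosen by the client, so a server log records whatever the client claims. The function name `build_request`, the target URL, the referrer, and the user-agent string are all placeholders; the request is only constructed here, not sent.

```python
import urllib.request


def build_request(url, referrer, user_agent):
    """Build (but do not send) a GET request with client-chosen headers."""
    return urllib.request.Request(
        url,
        headers={
            "Referer": referrer,       # HTTP spells the header "Referer"
            "User-Agent": user_agent,  # claimed browser, not verified by anyone
        },
    )


# Placeholder values: a local test server and an arbitrary referrer.
req = build_request(
    "http://localhost:8000/",
    "http://example.com/some-page",
    "Mozilla/5.0 (Macintosh) AppleWebKit (KHTML, like Gecko) Version/9.0 Safari",
)

# Show what a server would log if the request were sent.
print(req.header_items())
```

Sending it to a server you own, with `urllib.request.urlopen(req)`, would make the claimed referrer and browser appear in that server's access log. Because nothing executes the page's JavaScript, such a request would not show up in Google Analytics, consistent with the limitation noted above.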
