Web scraping has been a breakthrough for many businesses around the globe. Because it’s a cheap, automated way to collect online data, it is used by startups and mature organizations alike. However, using bots still raises questions. Is it actually legal to scrape, collect, and process online data that we don’t own? In this article, we explain where the boundaries are.
Are bots legal?
Web scraping relies on bots. Although it’s hard to imagine the Internet without them, bots rarely inspire enthusiasm. They are commonly associated with fraudulent activities, shady data collection practices, abusive e-mail campaigns, and bans on social media. Well, that’s all true.
However, web scraping has nothing to do with “bad bots”, that is, those used to conduct harmful activities such as online fraud, data theft, stealing intellectual property, or spam. On the contrary, “good bots” exist and can help businesses grow. Scraping bots can help to automate price comparison, build databases, reach an audience, or generate quality leads. When a carefully targeted campaign brings value to potential customers and does no harm to competitors, it’s difficult to call it harassment.
To read more about the benefits of using web scraping, check out our article: 5 Ideas of How To Use Web Scraping For Business.
How to scrape data legally?
So, is web scraping legal? Can we scrape online data without limitations? We wish, but sadly – it’s not that simple.
It’s a fact that big companies commonly use data scraping bots for their own gain. However, they also tend to protect their own data from being scraped by other players. After all, if you found out that someone was using bots against you for competitive advantage, you would probably not be entirely fine with it.
Generally speaking, the question of web scraping is usually not only whether it is legal, but also whether it is ethical. To put it simply, as long as scraping is not carried out for harmful purposes, such as stealing data, reusing content, or sending spam, you can usually assume it is legal. Remember that the technology itself is only a tool, and it depends on us how we use it and what for. However, it’s a two-way street. Even though you don’t want to be scraped by competitors, you need to be aware that in many cases it is perfectly legal for them to do so.
So how can you tell when using bots is fine? Where’s the thin line between scraping for gain and harassing competitors? We will discuss it in a second.
When is web scraping illegal?
Although we already know that web scraping is not illegal in itself, it can become illegal depending on how you do it and what you use it for. There are a few cases in which using data scraping bots might be illegal, so let’s take a look at them.
Scraping non-public content
If the data is displayed for public consumption, there’s generally nothing illegal in copying it to a file on your computer. However, if you’re trying to scrape information that was not meant to be seen by the public, that’s a different thing. Accessing protected data without authorization can violate the CFAA (Computer Fraud and Abuse Act), the American law on unauthorized access to computer systems. So, don’t cross that line, and make sure that you are not breaking into a source of sensitive data.
Scraping data is not illegal in itself, but re-publishing it on your website can be. Publicly available data might be protected by copyright, which means it cannot be used for just any purpose. In fact, re-using content that is not openly licensed, no matter if scraped automatically or copy-pasted manually, generally infringes copyright. So, make sure that your text is unique and that you have the right to use it within your channels.
Authorized ways of using online data can also be set out in robots.txt. It’s a file placed at the root of a website that tells bots which parts of the site they may crawl or scrape, and sometimes how often. If the file prohibits scraping, the safest way to proceed is to ask the website owner for official permission.
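As a rough illustration of the robots.txt check described above, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the bot name "example-bot", and the example.com URLs are all made up for the example; a real bot would load the file from the target site (for instance with set_url() and read()) instead of parsing a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, hard-coded so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt lets this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "example-bot" is an assumed bot name, not a real crawler.
print(is_allowed(parser, "example-bot", "https://example.com/products/"))       # True
print(is_allowed(parser, "example-bot", "https://example.com/private/report"))  # False
print(parser.crawl_delay("example-bot"))                                        # 5
```

Note that a check like this only tells you what the site owner asks of bots. It does not replace reading the Terms of Service or asking for permission.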
Abusing Terms of Service
Besides robots.txt, most websites have their own Terms of Service (ToS). So, before scraping a particular website, you should check that the ToS allow it. If they say nothing about web scraping being prohibited, you are generally safe. Otherwise, you should get permission in writing.
Exceeding crawl rate
Websites are made for humans. This is why most bots try to imitate human behavior, so that they don’t exceed a reasonable crawl rate by scraping too intensively. On the other hand, advanced, human-like bots tend to process much more data and can significantly slow down the site. So, use bots that help you strike the right balance: keep the crawl rate at a reasonable level and avoid overloading the server.
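One simple way to keep the crawl rate reasonable is to enforce a minimum pause between requests. The sketch below is only an illustration: the PoliteThrottle class, the interval value, and the example URLs are assumptions, and the actual page download is left as a comment.

```python
import time


class PoliteThrottle:
    """Enforces a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_request_at = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds waited."""
        waited = 0.0
        if self._last_request_at is not None:
            elapsed = self._clock() - self._last_request_at
            remaining = self.min_interval_s - elapsed
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_request_at = self._clock()
        return waited


# Hypothetical usage: the interval is kept tiny so the demo finishes quickly.
# In practice, pick a delay that matches the site's Crawl-delay or its ToS.
throttle = PoliteThrottle(min_interval_s=0.05)
for url in ["https://example.com/page/1", "https://example.com/page/2"]:
    throttle.wait()
    # The page download would go here.
    print("ready to fetch", url)
```

A fixed pause is the most basic approach. Whatever strategy you choose, the goal is the same: the target server should never see more traffic from your bot than it can comfortably handle.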
Examples of legal and illegal web scraping
To give you a better understanding of the legal issues around web scraping, let’s look at a few cases in which web scraping is used in either the right or the wrong way.
For example, bots can be used to search for YouTube video titles or descriptions. These can be legally scraped, downloaded, and saved into a file. However, the videos cannot be re-posted on our own site, as this would be a copyright violation.
To give you another example, a web crawler is able to scrape the names of users that are publicly available on social media. However, you can’t log in to their Facebook or LinkedIn accounts to obtain protected data, as it’s not permitted by the rules of the service.
What other tools can companies use to protect their data from being scraped? For example, they can implement CAPTCHA verification. They can also use “rate-throttling”, which stops a single client from downloading too many web pages at once. This helps to keep out malicious bots that could overload the site.
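To make “rate-throttling” more concrete, here is a minimal sketch of one common approach, a sliding-window limiter that counts recent requests per client. The class name, the limits, and the client address are assumptions made for illustration; production sites usually rely on their web server, CDN, or framework for this.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per client within the last window_s seconds."""

    def __init__(self, max_requests, window_s, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = self._clock()
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical limit: 3 requests per minute for each client address.
limiter = SlidingWindowLimiter(max_requests=3, window_s=60)
results = [limiter.allow("203.0.113.7") for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

A server would typically answer a rejected request with an error such as HTTP 429 (Too Many Requests) rather than serve the page.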
Legal case: LinkedIn vs HiQ
Although the cases in which web scraping is legal are fairly well defined, some companies still try to protect their data by legal means. A well-known example is the court dispute between LinkedIn and HiQ.
HiQ is a company that scrapes public information from LinkedIn profiles to provide businesses with data and insights on employees. As LinkedIn decided to launch a similar tool themselves, in 2017 they sent HiQ a cease-and-desist letter over unauthorized data collection, and the dispute went to court. Long story short, in September 2019 the US Court of Appeals for the Ninth Circuit ruled that scraping public profiles was unlikely to violate the CFAA, and LinkedIn was ordered to stop applying blocking measures against HiQ.
This case suggests that data which is publicly available and not copyrighted can generally be scraped. However, it still cannot be used for unlimited commercial purposes.
To sum up, web scraping is generally legal unless you use it unethically. It’s just a tool that is not harmful in itself but can become illegal when used with the wrong intention. However, if you have any doubts, you can always ask a lawyer for professional advice.
If you need to scrape data for business and receive it in a processed, ready-to-use form, feel free to contact us. We will analyze your case and let you know what kind of data you can scrape and what is recommended in your case.