Can businesses access any public data on the web for their own use?

A federal judge’s recent injunction barring LinkedIn from blocking bots that access its public data could turn out to be a landmark moment for the commercial use of data on the net. It was a nuanced interpretation of the law, one that keeps up with the rapid webification of the real world. The lawsuit itself is still in its early stages, but the direction of the arguments bodes well for clarifying the legal gray area around internet data and its commercial use.

Before we go into the questions of what is ethical, legal and practical regarding public data on the net, let us quickly clarify a few basic questions so that we are all on the same page.

What is public data on the internet?

The easy definition covers sites that are explicitly offered for public use, such as opendata.gov and wikipedia.org, or any site that operates under open licenses like the Open Data Commons. But in the real world, any website or source that does not require registration and lets itself be indexed by search engines is also deemed public data. This holds notwithstanding the terms & conditions on the site (which are usually templates and copy-paste jobs) that impose some limitations in the fine print. This is the big gray area that the LinkedIn injunction touched upon.

Can public data be used for commercial purposes?

The key question here is whether the public information is protected by copyright or otherwise falls under the ambit of intellectual property. If it is, the data clearly cannot be used for commercial purposes by others. But if the information is just factual data, then it does not get any IP protection.

The second question relates to individual privacy. More and more governments are enacting laws to protect the private details of individuals, and rightly so. However, if the information is about businesses and is publicly accessible, then it is deemed to be open. It can be argued that this is true for the public activities of individuals too. There are more nuances here, but for the moment this will do.

Can public data be accessed through electronic means (a.k.a. web scraping)?

The only contention against scraping is that it can put web servers under stress and affect the experience of the site's primary users. With technology improvements, most high-end sites have an elastic server design that can handle these additional traffic spikes, even when they come from (non-malicious) bots. However, there is certainly a cost element to this, and for the webmaster there is also the nuisance of website analytics being skewed by this unwelcome traffic.
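To make the server-load concern concrete, here is a minimal Python sketch of how a well-behaved bot might decide what to fetch and how slowly. It rests on assumptions the article itself does not state: that the site publishes a robots.txt file, and that honoring its `Disallow` and `Crawl-delay` lines is a reasonable baseline for good behavior. The robots.txt content, the `example-bot` name and the `example.com` URLs are all hypothetical.

```python
import urllib.robotparser

# Hypothetical robots.txt content, kept inline so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_policy(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text into a policy object the bot can query."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetches(parser, user_agent, urls, default_delay=1.0):
    """Return (url, delay_seconds) pairs for the URLs the site allows.

    The delay is the site's Crawl-delay if it declares one; otherwise the
    bot falls back to its own default pause (an assumed value).
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    return [(url, delay) for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    policy = build_policy(SAMPLE_ROBOTS_TXT)
    candidates = [
        "https://example.com/jobs",
        "https://example.com/private/members",
    ]
    for url, delay in plan_fetches(policy, "example-bot", candidates):
        print(f"fetch {url}, then wait {delay}s before the next request")
```

A real crawler would sleep for the returned delay between requests and identify itself with a stable user-agent string, which also makes it easier for the webmaster to filter bot traffic out of their analytics.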

Coming back to the case: it was certainly counterintuitive that the start-up HiQ Labs sued LinkedIn. Here is a company that openly declares it accesses data from LinkedIn through anonymous, automated scraping, and its use of the data could arguably be detrimental to LinkedIn members’ interests. In this scenario, LinkedIn blocking its activity seemed natural and fair. However, HiQ questioned the very basis for treating that data as non-open, and asserted the public’s right to access it through bots under a legal framework.

On the publicness of the data, the argument was that it can’t be a crime to visit a public website, and that anyone who types a URL into a browser should have access to the public information on the site. Prof. Laurence Tribe of Harvard University, arguing against LinkedIn, raised an interesting point: “If you exclude someone from sites like LinkedIn, Facebook and Twitter, you are excluding them from the modern version of the town square”. This is an important and valid argument because these social media sites are the conquerors in the winner-takes-all, ‘network effect’ fuelled monopolists’ game. In the wise words of Uncle Ben, ‘With great power comes great responsibility’, and so these social media sites cannot claim ownership of the town square.

The use of data for commercial purposes by a third party becomes a moot point once the data is deemed public. However, in this case, the judge also raised the question of whether LinkedIn is actually indulging in anti-competitive practices by restricting other businesses’ access to its public data. The other point noted was that LinkedIn could showcase only three actual member complaints about third parties accessing data, out of its hundreds of millions of members, including the fifty million who had signed up for no broadcasting of their information. This calls into question the basic premise that members who published their data expect confidentiality, especially when LinkedIn and the other social media sites themselves reserve the right to use member data for commercial purposes and actively do so.

The more interesting aspect of this case is the injunction against the blocking of bots. So far, the general narrative has been that sites are within their rights to restrict access by automated bots, even if the bots follow ethical practices. But the narrative needs to be broader and include the right to public access of data, in keeping with the direction of an inclusive and open internet for every netizen of the world. So when some sites actively block traffic based on some criteria, or even go as far as feeding wrong information to bots as an offensive defense play, they could be the bad guys in the eyes of the law.

There is a lot more to be done on this subject, such as on individuals’ privacy and security. It can be a tricky subject, and it will continue to evolve. However, enabling easier access to the enormous wealth of data available on the web can certainly help businesses leapfrog in their digital transformation and contribute to the greater public good. A legal framework for sharing the data in an equitable way, either by making APIs available without restriction or by charging bots a reasonable access fee with built-in guidelines for access, could be a good start.

The current laws, and the habit of seeing the new web world through the prism of earlier social and business models, are flawed. It is a fact that law and ethics will always find it tough to keep pace with the speed of technological advancement. In this case, the arguments were based on a 1980s act, the Computer Fraud & Abuse Act (CFAA), while the reality of the internet has transformed enormously over the last 30 years. It will be important to have more conversations and to modify the legal frameworks to suit current realities and technological advancements. From that angle, the HiQ vs LinkedIn case is certainly a positive development.

  Karthik Karunakaran

  CEO & Co-founder
