Tech Meets Typecast – A Scary Encounter
A decade-old practice that’s still a taboo topic.
Almost every company scrapes data on the internet – whether you like it or not. There. We said it.
There are plenty of reasons that a young individual just entering the workplace should aspire to work in scraping, machine learning or analytics. It’s the most sought after and most glamorous job profile of the millennium. It’s a booming industry filled with endless possibilities.
Web scraping technology has come to define a large part of our tech culture over the past 20 years. And, of course, it’s only getting bigger.
However, before entering the industry, an up-and-comer might understandably have some concerns about where web scraping is headed, and about whether the larger-than-life debate surrounding it is as black and white as we have been taught to believe.
Scraping technology, and the companies that offer and practice it, have come to be defined by corruption and greed as the industry has grown over the years. Should we be worried that scraping is a 21st-century dirty affair? Should we be concerned about its apparent soullessness as more and more companies look for ways to eat up our information? Certainly.
But let’s not allow these problems to define the whole rather than its individual parts. When that happens, we have a stereotype.
The Screen Scraping Debate
No industry is immune to the effects of wise and experienced judgment. Likewise, screen scraping isn’t the same as sports.
Screen scraping is the process of collecting screen display data from one application and translating it so that another application can display it. This is normally done to capture data from a legacy application in order to display it using a more modern user interface.
Screen scraping IS A VERY LEGITIMATE technique used to translate screen data from one application to another. We’d like to emphasise here that it’s not the same as content scraping, which is the use of manual or automatic means to harvest content from a website without the approval of the website owner.
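To make the screen scraping definition concrete, here is a minimal Python sketch of the legacy-to-modern translation step. Everything in it is an assumption for illustration: the sample screen, the field positions in FIELD_MAP, and the function names scrape_screen and to_modern_view are hypothetical and do not come from any particular product.

```python
import json

# Hypothetical legacy terminal screen with fixed-width fields (an assumption).
LEGACY_SCREEN = "\n".join([
    "ACCOUNT INQUIRY",
    f"NAME: {'JANE DOE':<15}ACCT: {'00012345':<8}",
    f"BALANCE: {'1,204.50':>12} STATUS: {'OPEN':<6}",
])

# Assumed layout: field name -> (row index, start column, end column).
FIELD_MAP = {
    "name": (1, 6, 21),
    "account": (1, 27, 35),
    "balance": (2, 9, 21),
    "status": (2, 30, 36),
}


def scrape_screen(screen, field_map):
    """Cut each field out of the screen text by its row and column range."""
    rows = screen.split("\n")
    return {
        field: rows[row][start:end].strip()
        for field, (row, start, end) in field_map.items()
    }


def to_modern_view(record):
    """Translate the raw strings into types a newer interface can consume."""
    view = dict(record)
    view["balance"] = float(record["balance"].replace(",", ""))
    return view


if __name__ == "__main__":
    raw = scrape_screen(LEGACY_SCREEN, FIELD_MAP)
    print(json.dumps(to_modern_view(raw), indent=2))
```

Because the fields are located purely by position, any change to the legacy screen layout would break the mapping, which is one reason this kind of integration has a reputation for fragility.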
The debate surrounding “screen-scraping” only gets bigger every day.
Take the European Union for example. Its feud with screen scraping continues as Member States prepare for the impending Second Payment Services Directive (“PSD2”). In this context, screen scraping is the practice in which third-party Payment Initiation Service Providers (“PISPs”) and Account Information Service Providers (“AISPs”) access a client’s bank accounts using the client’s own credentials in order to perform a service.
The EU then approached the Fast IDentity Online (“FIDO”) Alliance, a consortium of over 250 organizations collaborating on and developing industry best practices in online authentication, for a solution. The Alliance wrote to the European Commission (“EC”), commenting on key issues and suggesting that endorsing screen scraping as a “fallback” is problematic and not acceptable.
Stepping out of the European Union example, when one probes into the debate, one thing is very clear.
For every organization, business, and industry around the world, the leading concern with screen scraping is security.
Screen Scraping – Not A Dirty Secret Anymore
Screen capture methods can’t seem to shake the negative stereotype or the pesky ‘scraping’ label.
It’s not that screen scraping is inherently bad; it has just had a rough start. In the heyday of middleware development, the technology was immature, so many middleware applications struggled with it. Updates, changes to display settings, and other system modifications would cause the systems to malfunction or not function at all. There is a big hazy cloud of confusion and misunderstanding following the technology everywhere it goes!
For example, many people think Optical Character Recognition (OCR) is synonymous with screen scraping, but it’s just one technology used in the process.
OCR is the technology that reads the text captured from an active application window. The OCR technology in use today is vastly improved and bears little resemblance to its predecessors.
Service providers still fight the perception that screen scraping is old technology or simply does not work well. The truth is that modern solutions have come a long way and the negative stereotypes just don’t apply. It’s definitely not your father’s screen scraping.
Silicon Valley: A Powerful Spectator
The debate over screen scraping has long pitted banks against fintech companies, but it has also recently caught the attention of another powerful interest group: Silicon Valley.
With companies like Amazon and Google starting to weigh in, it has without a doubt become one of the most important policy debates to closely observe this year.
An advocacy group that represents Amazon, Apple, Google, Intuit, and PayPal has stated, “We think it’s really important that access to bank account data not be blocked. It’s up to the consumer to decide what technology they want to use and what level of privacy and security they want.”
This was followed up by banks raising concerns over screen scraping, citing worries about cybersecurity and privacy. In 2016 alone, several large banks were accused of denying screen scrapers entry into their systems.
You can swipe open your smartphone and say, ‘Assistant, start my coffee!’
It will do that.
And you can say, ‘Assistant, what is my bank balance?’ and it will also do that.
As younger people get really comfortable with receiving personal money management, banking, and retirement advice not from a human being in an office but through digital means, one can imagine that there’s an audience for something like this.
Beyond consumer rights and privacy issues, financial data is also a business asset that banks and tech companies will likely jockey over for many years to come.
In the bleakest of scenarios for financial institutions, large tech firms could use their access to financial data to eat banks’ lunch.
So far, the scraping debate has pitted only middlemen against banks and financial institutions. If Silicon Valley enters in force, it might tip the scales.
We live in a world—and deal with markets—increasingly driven by data. Consumers and companies throughout the globe generate massive amounts of data at any given moment. Internet searches, mobile phone clicks, website profile information, e-commerce transactions, and basically any other action that can be quantified digitally make up the basis of “Big Data.”
Big Data is a complex issue—different firms and individuals have different access to different sources of data and want to use that data in different ways. This complexity means that the legality of some methods of culling and using Big Data remains unclear.
Legal departments know this, which is why some of the largest companies in the world use scraping services like Import.io, Scrapeworks, Mozenda, etc., to convert the web into structured data for use in their businesses.
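What “converting the web into structured data” means in practice can be sketched in a few lines of Python. This is only an illustration using the standard library’s html.parser; it is not how any of the services named above work, and the sample HTML, the TableScraper class, and the table_to_records function are hypothetical.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Turn an HTML table with a header row into a list of dictionaries."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment, standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in table_to_records(SAMPLE_HTML):
        print(record)
```

The output is a list of plain records keyed by column name, which is the kind of structured form a business can load into a spreadsheet or database.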
It’s easy to imagine a multitasking, all-knowing evolution of Siri or Amazon’s Alexa armed with millions of data points, including transactional data, credit score history, and upcoming mortgage payments.
But the bottom line is this.
Scraping is just automated access, and everyone does it.