Expect to see more online data scraping, thanks to a misinterpreted court ruling

A US appeals court, in a case involving LinkedIn, recently ruled that data scraping of publicly viewable information does not violate the Computer Fraud and Abuse Act.

This decision (ZDNet has a story on it) has both a reality component and a perception component. In reality, the ruling is delightfully narrow and unlikely to have much legal impact. The perception component, however, is where corporate webmasters and their IT colleagues are likely to get major headaches. The same goes for corporate marketers (though most of them deserve it).

The reality: The decision does not say that competitive web scraping is legal. It simply says that such scraping does not violate that specific law. It might violate other criminal laws and certainly some civil laws, but the panel ruled only on what was presented to it, as it should.

But most people's perception, encouraged by misleading headlines claiming that the court has given all scraping a legal green light, is that the practice is now legal and that scrapers can proceed aggressively. Although the court said no such thing, it is easy to predict that this perception will drive an increase in scraping.

How much of an increase? Probably not a big one. Why? Because the kind of people who steal content by scraping are not exactly law-abiding. It's not as if a lot of vendors wanted to scrape but wisely held off until the courts ruled on the legality of scraping.

That said, the misreading of this decision will likely encourage existing scrapers to scrape far more aggressively.

What can and should IT do? Because these are generally publicly viewable pages, the options are limited: there are few technical methods of blocking scrapers that won't also cause problems for the site visitors the company actually wants.

Years ago, I ran a media outlet that was making a big push into premium content, meaning readers would have to pay for select premium stories. We ran into a problem: we couldn't allow people to freely share premium content, because we needed people to buy those subscriptions.

So we blocked copy-and-paste and specifically prevented anyone from saving the pages as PDFs. But that meant these pages could not be printed either. (Saving to PDF is actually printing to PDF, so blocking PDF downloads meant blocking all printing.) Within a few hours, new premium subscribers were loudly complaining that they had paid for access and should be able to print pages to read at home or on a train. After several subscribers threatened to cancel their paid subscriptions, we gave up and restored the ability to print. (And our fears were confirmed: PDFs of our premium content started showing up everywhere.)

Fighting scraping presents a similar dilemma. And most site operators will quickly conclude that simply accepting scrapers is probably the best course of action.

Going back to the LinkedIn case, I would argue that relying on the Computer Fraud and Abuse Act at all was a deeply flawed argument by LinkedIn. A better argument, though perhaps just as unlikely to succeed, would have been copyright infringement.

The specifics of LinkedIn's situation make even that argument difficult. Unlike media outlets (such as Computerworld), LinkedIn doesn't pay to create its content. The overwhelming majority of the content being scraped is what LinkedIn users write themselves, for free. Can LinkedIn claim with a straight face that it rightfully owns all the information in my CV, which I have posted on my LinkedIn page?

If LinkedIn paid me to post comments, messages and work-history details, it could perhaps assert ownership. But that's not what it does.

Then again, do users expect the content they post on LinkedIn to appear only on LinkedIn? More precisely, do they have any realistic expectation that it will stay there? Like many journalists, I've often gone to a LinkedIn page to verify a source's biographical information or someone's professional background for a column or story I'm writing. Would anyone question my right to do so?

And where exactly should the line be drawn on what constitutes scraping? Citing one job title? How about four of a person's previous titles, or 10? Or does it take information about more than 100 people? This is a problem, because if LinkedIn decides not to care about small-scale uses of its data, it undermines its ability to go after larger ones.

This is where the public-space argument comes in. If I post something sensitive about myself in a public forum on a major discussion site, do I have any reasonable expectation of privacy? (Actually, I might, because no one cares what I think, but I digress.) If I had wanted something to remain private, I would not have made it public.

One of the most interesting uses of LinkedIn for journalists is examining the details of someone's experience. Why? Because we know that a lot of coders and other technical staff overshare, detailing what they've done on projects for their employer, including a lot of highly sensitive information about the systems they've worked on, the apps their employer has purchased and even unannounced security vulnerabilities they fixed.

The only recourse is that their employers could fire them for leaking inside information. But the coder who posted it has no expectation of privacy. It was his choice.

In short, I think we can all look forward to more scraping and content theft, and IT will sadly find that there really isn't much it can do to stop it.

Expect to see more data recovery online, thanks to a misconstrued court ruling