(Warning: possibly mind-numbingly boring and long legal content to follow. And I’m not even giving legal advice here.)
When I posted regarding indexing and search engine audiences and objectives, I mentioned that I could provide a short exposition of how copyright law sees search engines. Only Paula Henry had the courage to ask a lawyer to discuss a legal topic in her comment on that post. With only one taker, I thought maybe I’d pass on the chance to explain, but then I got to thinking about how this might fit in with later posts.
My knee-jerk instinct when folks asked, “Is it fair use if Google indexes listings on an IDX site?” was to say, “Of course. We love web search engine indexing.” As it happens, when I began to think over how the prevailing case law treats web search engines in the copyright context, I started thinking I may have been a little hasty.
So, here it is. Kelly v. Arriba Soft Corp. (2003), from the Ninth Circuit Court of Appeals (the federal court that hears appeals from California and several other western states), is the leading case on copyright fair use in the search engine context.
Facts from Kelly
(For you lawyers, you can find the Kelly opinion at 336 F.3d 811 (9th Cir. 2003).)
Plaintiff Leslie Kelly was a professional photographer. Some of his images appeared on his web site and on other sites to which he had licensed them. Arriba Soft Corp. operated an Internet image search engine. (Its name has changed, and it’s now available at http://www.ditto.com/.) Arriba’s web crawler cached full-sized copies of the images onto Arriba’s server. (See the discussion of indexing and caching in the previous post.) It then created ‘thumbnails’ (smaller, low-res versions) of them. When visitors searched for images, Arriba displayed thumbnail versions of the matching images. According to the court:
by clicking on the ‘Source’ link or the thumbnail from the results page, the site produced two new windows on top of the Arriba page. The window in the forefront contained solely the full-sized image. This window partially obscured another window, which displayed a reduced-size version of the image’s originating web page. Part of the Arriba web page was visible underneath both of these new windows.
All of this detail, it turns out, was not necessary to the court’s decision, as the court addressed only the question of Arriba’s display of the thumbnail images. A technicality in legal procedure prevented the court from considering the question of displaying the larger-scale images.
Kelly never gave Arriba permission to copy his images and objected when he discovered the copying. Based on his complaint, Arriba took the images down and blocked indexing of the sites where his photos appeared, but he sued anyway. Kelly claimed violations of his exclusive rights to display, reproduce, and distribute his copyrighted works. The trial court granted Arriba Soft’s motion for summary judgment (basically dismissing the case), both with regard to display of the thumbnails and the full-sized images, based on the fair use doctrine. Kelly appealed to the Ninth Circuit.
The Kelly court’s basic rules of copyright law
Here are the basic rules of copyright law, as the Kelly court laid them out in its opinion.
- The owner of a copyright in a work has the exclusive right to reproduce, distribute, and publicly display the work.
- For Kelly to establish that Arriba Soft infringed his copyrights by reproducing his work, he had to show that he owned the copyrights and that Arriba copied the works. (As to the thumbnails, Arriba conceded that it had copied the images and that Kelly owned the copyrights in them.)
- If the plaintiff establishes that the defendant has copied the plaintiff’s copyright-protected work, the defendant may escape liability if it can show that the copying is “fair use.”
- Fair use is an exception to the general rule of copyright law, designed to permit creativity and productive output that rigid application of the copyright law might otherwise impede.
- Whether a use is a “fair use” depends on the weight given to and interaction between four factors. The court can consider other factors, but it must consider the four identified in the Copyright Act.
- The four “fair use factors” are 1) the purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes; 2) the nature of the copyrighted work; 3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4) the effect of the use on the potential market for or value of the copyrighted work.
Factor one: Purpose and character of the defendant’s copying
First, just because Arriba’s use was commercial, it did not necessarily lose on this factor. Quoting an earlier U.S. Supreme Court opinion, the Ninth Circuit said the central purpose of this factor is to determine
whether the new work merely supersede[s] the objects of the original creation, or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message; it asks, in other words, whether and to what extent the new work is transformative. (BNL’s emphasis)
The Ninth Circuit said that the “more transformative the new work, the less important the other factors… become.” As we shall see, this is an important observation. The court compared Kelly’s purpose in the works (“intended to inform and to engage the viewer in an aesthetic experience”) with Arriba’s (intended “as a tool to help index and improve access to images on the internet and their related web sites”), and found Arriba’s copying to be highly transformative. The court found that this factor weighed “slightly” in favor of Arriba, with the transformative use outweighing its commercial nature.
Factor two: Nature of copyrighted work
Some works warrant greater protection under copyright laws than others. For example, works of art generally get higher degrees of protection than compilations of facts. Because photographs of the kind that Kelly created are close to the “core of intended copyright protection,” the court found this factor was slightly in Kelly’s favor.
Factor three: Amount and substantiality of the portions used
Generally, the more of the original work one copies, the more likely this factor will weigh in the plaintiff’s favor. But where, as here, copying the whole image is necessary for the transformative use, the court found this factor did not weigh in favor of either party.
Factor four: Effect of the use on potential market or value of the copyrighted work
This factor considers the extent of market harm caused by the defendant’s use and the likely impact on the market for the original work that defendant’s conduct would have if engaged in by many others.
The key here is that the thumbnail images on Arriba’s site (the only ones considered in this opinion) were not in any way a substitute for any use that Kelly was making of his own images. They were too low-quality to be of use in any size larger than a thumbnail. The court noted, though, that this might not always be the case with photos. (In fact, in a later opinion, the Ninth Circuit would acknowledge that this factor might weigh against the defendant if it indexed images from a source that made a business of marketing images in thumbnail form.) It also noted that low-quality copies of other types of media (e.g., text, audio, etc.) might not always result in a win for defendants on this factor, either.
The court determined that this factor went for Arriba.
The tally of factors favors Arriba
The court noted that one factor went slightly for Kelly (factor two), one was neutral (factor three), and two went for Arriba (factors one and four) – the tally thus favored Arriba. This approach is interesting for two reasons: First, the court seemed to use the transformative nature of the use (factor one) to make amount and substantiality (factor three) a ‘push’; second, the court’s analysis of the effect of the use (factor four) found Arriba’s transformative use did not compete with Kelly’s uses.
In fact, I think the very transformative nature of Arriba’s use was the deciding “factor” in three out of four factors in the fair use test.
Is the current dust-up like Kelly?
Remember, my knee-jerk instinct when folks asked, “Is it fair use if Google indexes listings on an IDX site?” was to say, “Of course. We love web search engine indexing.” But think about the Kelly court’s focus on the transformative nature of Arriba’s use and compare it to the IDX search engine situation. If a web search engine indexes the contents of an IDX search engine, is it transforming the content? Isn’t it just copying the whole thing, possibly slightly transforming it, and then replicating the utility of the original? Don’t three of the four factors now go against the defendant/web search engine?
I’m curious what you think. (You don’t need to be a lawyer to weigh in on this.)
Copyright licenses and ROBOTS.TXT
But even if a web search engine indexing an IDX search engine is not fair use, I think the web search engine has one more defense – permission. In copyright law, as in real estate, giving someone permission to use your property is called “granting a license.” (The term is less common in real estate law than in intellectual property – but it has a long history in ‘dirt law.’)
Having a license is a defense to a copyright infringement claim. If you say I copied your work, and I can prove you gave me permission, you lose your copyright infringement claim against me.
Licenses can be implied: that is, the copyright-holder can behave in such a way as to suggest that she is granting a license (even if she never actually says, “You may copy this”). Generally, she can revoke an implied license at any time.
I expect that if an MLS sued Google over indexing an IDX site, Google would claim that the IDX sites it indexed had given it an implied license to copy, for at least two reasons:
- By creating static pages of listings and detailed sitemaps, the IDX site was “index-fishing,” the only point of which is to be indexed. Thus, the site owner cannot complain when the indexing actually happens.
- Any web designer worth his/her salt knows that you can put a "ROBOTS.TXT" file at the root of a web site to keep search engines away from all or part of it (a minimal example follows this list). If the owners of these IDX sites did not want them indexed, they could easily put the file in place, expressly revoking any implied license.
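To make that second point concrete, here is a minimal sketch of what such a file might contain. The paths are hypothetical; a real IDX site would tailor the directives to its own URL structure:

    # Hypothetical /robots.txt for an IDX site that does not want its listing pages indexed
    User-agent: *
    Disallow: /listings/
    Disallow: /idx/

Replacing the Disallow lines with a single “Disallow: /” would keep compliant crawlers out of the whole site. A page-level alternative is a robots meta tag in the HTML head of each listing page, e.g. <meta name="robots" content="noindex, nofollow" />, which is the “no-index tag” that comes up in the comments below. Well-behaved crawlers like Googlebot honor both mechanisms, though neither is technically enforceable against a crawler that chooses to ignore them.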
There’s only one problem: Just like in real property, one cannot grant any more rights in intellectual property than one actually has. The broker who displays listings on an IDX site has a license from other listing brokers and the MLS to display their listings on the displaying broker’s IDX site. But the broker does not have the right to license that content to other sites, including web search engines.
But listing brokers can do what they want, right?
Here’s the key (and I think this is responsive, in a roundabout way, to a comment Victor Lund posted a few days back): The broker operating an IDX site could certainly give web search engines a license to copy its own listings. In fact, I don’t think that NAR’s IDX policy (or the MLS rules promulgated under it) prevents a listing brokerage from doing whatever it likes with its own listings.
Consequently, if a listing broker wants to put up an IDX site and index fish using its own listings as bait, I don’t think any MLS is likely to complain. I don’t think that would require an NAR policy or rule change at all. Just don’t create static pages and sitemaps for the listings of your competitors.
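For readers who have never looked at one, here is a minimal, hypothetical example of the kind of sitemap entry an “index-fishing” IDX site might publish for a static listing page (the URL is invented for illustration):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example-broker.com/listings/123-main-st</loc>
      </url>
    </urlset>

A file like this hands crawlers a map straight to every listing page, which is why the “you invited the indexing” argument has some force when the listings are the broker’s own – and why it gets uncomfortable when the sitemap points to competitors’ listings.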
-Brian
Matt Cohen says
Brian, I would go a bit further than your statement that "By creating static pages of listings and detailed sitemaps…the site owner cannot complain when the indexing actually happens." – I believe that by placing information on the Internet (not just in a sitemap) without a restrictive robots.txt file, one is implicitly allowing a search engine to index the information. Since brokers
Brian N. Larson says
Matt: I think what you are suggesting is that by permitting their listings to be included in the IDX program, listing brokers are permitting them to be displayed on other brokers' IDX sites and, BY IMPLICATION, permitting them to be indexed by search engines because they are appearing on other brokers' sites.
I agree with you if, after acquiring an awareness of how these things work (e.g., by
mwurzer says
Brian asked: "If a web search engine indexes the contents of an IDX search engine, is it transforming the content? Isn't it just copying the whole thing, possibly slightly transforming it, and then replicating the utility of the original? Don't three of the four factors now go against the defendant/web search engine?"
My answer is from your earlier post, which
Brian N. Larson says
@Mike: I'll accept your proposition to the extent that it talks about Trulia and Google as they stand today, and I think I agree with it in principle. But that does not really help us draw the line from a policy standpoint. In other words, what if Google becomes a little less "conduitty" and Trulia becomes a little less "destinationy"? At what point does either cross the
mwurzer says
Issues of misappropriation always are fact specific, and, yes, the facts can change, which also would change the conclusion. That doesn't mean, however, that we need to reach the conclusion before the facts change and shoot a shotgun to try to kill a mosquito. You'll most likely miss the mosquito and destroy what you want to preserve.
Though I disagree with both the
Matt Cohen says
Michael – I think you've more or less summed up my position – "If you're putting listings on the web, that means playing with search engines." We can put restrictions on search engines, but that means either putting explicit restrictions on what is put on the web via IDX or segregating it in some way so that a policy / robots.txt file would apply. I'm not saying here that we
mwurzer says
Matt, though we might agree on the basics, I'm stating my opinion that the core purpose of the IDX policy contradicts requiring a no-index tag, and you're leaving the question open. Brokers have a choice now through the opt-out, and that should be the mode of deciding this issue, not a blanket restatement of the IDX policy to require a no-index tag.
Brian N. Larson says
@Mike and Matt: Your comments do a great job of foreshadowing my next post, which is focused on the strategic purposes of IDX. In short Mike, I think Matt is right to leave the issue "open" as you say, because the industry has never adopted a statement as to the strategic purpose for IDX. I think all of us have ideas about what it is/might be — but the brokers whose listings appear
Matt Cohen says
Brian – you've nailed it – once we agree on the purpose and change the rules (that's IF we change the rules), then we can expect broker site owners to comply, and search engines to follow rules such as robots.txt/noindex. It will just be important to make sure that broker sites are not somehow disadvantaged relative to others. I don't believe purpose and policy discussion will
Brian N. Larson says
I'm glad at least the two of you are reading. I hope some brokers and MLS manager-types will chime in, too. I know some of them are reading…
mwurzer says
I look forward to your next post, Brian, and I'll foreshadow my comments:
Whether express or not, the purpose of IDX is and has been for a long time to allow competing brokers to put listings on the web. I also suggest that "on the web" means links and links mean search engines. Anything less than "on the web" means off the web. Also, as I've said already
Fred the Realtor & Lawyer wannabe says
I read it all! Thanks. Not dry.
I am glad you brought up the Robots.txt defense. One might question whether the courts would allow a defense that the copyright holder should have opted out of indexing.
Let's assume there was no sitemap and no robots.txt.
Just like with spammers, the copyright holder can say "sure I didn't opt out with robots.txt, but when
Brian N. Larson says
@Fred: You're right, it's really unlikely anyone is going to sue Google (or other web search engines) in this situation. But my reason for doing the legal analysis was to show that what Google is doing is probably not illegal. Things might be different, though, if aggregator search engines like Trulia or Zillow started indexing IDX sites, based upon the analysis I provided. In that case, I
Rob Hahn says
Brian – I think the series is wonderful. Thanks for putting in the work.
Something I'm curious about — and you as the IP attorney would be the go-to guy on — is the difference between the Kelly case (which raises somewhat more traditional IP issues) and the case of listings.
A listing, after all, is more or less a compilation of facts about a property. It isn't
Victor Lund says
I found this article today that gives step-by-step directions on how to pretend to be a Google bot – hinting that anyone can fake out a website and scrape data:
http://www.addictivetips.com/internet-tips/access-any-website-or-forum-without-registering/
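To illustrate the point (with a hypothetical command, not taken from the linked article): the "faking" is nothing more exotic than sending a different User-Agent header, e.g.

    curl -A "Googlebot/2.1 (+http://www.google.com/bot.html)" http://www.example-broker.com/listings/123-main-st

which is exactly why robots.txt depends on crawlers behaving honestly rather than on any technical enforcement.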
IndyAgent says
Brian –
I read this several times and sincerely appreciate your insight. I, too, would like to hear from more brokers and boards.
The whole topic is of great interest to me, not only because I was thrown into the firestorm, but also because I see this as a huge issue facing agents and brokers.
I have a few thoughts here:
If a broker is not aware he is
Little Broker with a Big Site says
mwurzer nailed it:
IDX is all about putting listings on the web through competing broker sites. If you're putting listings on the web, that means playing with search engines. Period. The policy is just fine as it is. Brokers can opt out if they want, but complaining about indexing when the purpose of IDX is to put the listings on the web is wrong.
Exactly.