We’ve considered this issue in seven posts:
- Part I: What’s the beef?
- Part II: Search engine scope and data sources
- Part III: Search engines indexing, audiences, and their objectives
- Part IV: “IDX index fishing”
- Part V: Is search engine indexing legal?
- Part VI: Purpose of IDX and broker expectations
- Part VII: Miscellaneous items
I promised that breaking this topic up would keep me from five-page blog posts, but I lied. In this last scheduled marathon post on this topic, we’ll look at some solutions folks have proposed and see whether they resolve the situation.
Interests to advance
A good solution to this problem should advance several interests, and I’ll use these to evaluate the potential solutions below:
1. Listing Broker Expectations: Establishing reasonable expectations in listing brokers about how their listings may be used on the IDX sites of their competitors. This does not mean accepting brokers’ current expectations, because in many cases, those expectations may result from lack of knowledge. Rather, it means telling the brokers, by rule, education, or a combination, what to expect. Any use beyond those expectations should be considered misuse.
2. Supporting Broker Web Efforts: Supporting efforts of brokers to be the ‘go-to’ sites on the Internet for consumers looking for real estate. This means giving brokers the tools to attract and retain consumers to and on broker sites. I think this is even more important than the first factor: after all, through education and clear rule-making, we can change listing broker expectations. But nothing MLSs do is going to alter the expectations of consumers on the web. We need to equip brokers to face the realities of the web.
3. Technology Tools: Exploiting technology that prevents obvious misuses of data at the source, if there are such tools. This means placing responsibility with the displaying broker (the one hosting the IDX site) to prevent obvious ‘scraping’ efforts, to the extent they can be distinguished from permissible uses.
4. Legal Tools: Giving listing brokers and MLSs legal tools necessary to go after those who misappropriate and misuse the MLS data. Once the data is out of the hands of the broker operating the IDX site, neither that broker nor MLS has physical means to prevent misuse of the data. So listing brokers and MLSs need other tools to go after ‘the bad guys.’
Of course, the challenge will likely be balancing listing broker expectations and the efforts of site-hosting brokers to attract and retain consumers by all legal and ethical means. To balance those interests, I wish we had a little more info on these two points:
First, what are the types and frequencies of these searches on search engines? How often, for example, does a consumer go to Google and type: “Nancy Smith” if she’s looking for a listing of Nancy’s? How often does a consumer go to Google and type “123 Elm Street” when she has driven by a listing? If consumers only infrequently use these “long-tail” searches, permitting them may not be critical to the success of broker SEO efforts.
Second, if we were to talk through how IDX index fishing works with a group of listing brokers, would they be distressed? I’ve argued that listing brokers would be justified if they felt IDX index fishing frustrates their expectations about how IDX works. But no one has demonstrated that listing brokers as a group actually care. Am I wrong about that?
I don’t list clarity as an interest, because clarity is essential: we must be clear about what is required, what is permitted, and what is forbidden. Rules that are not clear do not provide a good basis for conduct. As far as clarity is concerned, I don’t think anyone has satisfactorily defined the distinction between legitimate indexing by web search engines (entities not in the real estate business that index the whole web and promptly pass consumers off to matching sites) and “indexing” that is really scraping by a real estate player. If we permit ‘search engines’ without clarifying what we mean by ‘search engine,’ anyone can scrape data from IDX sites all over the country, put it up on a national ‘real estate search engine,’ and attempt to monetize the resulting traffic by selling leads back, etc.
So, what are our options? Well, NAR’s current rule reads this way:
Participants must protect IDX information from misappropriation by employing reasonable efforts to monitor and prevent “scraping” or other unauthorized accessing, reproduction, or use of the MLS database. (Section 18.2.2 of NAR’s model rules, NAR Handbook on Multiple Listing Policy, 2009 ed.)
Option A. No change – Allow each MLS to interpret the rules
If NAR does not change the text of Section 18.2.2 of its model rules, I expect MLSs will continue to interpret the section as they do now, with some holding that web search engines are ‘scrapers’ and others that they are not.
The current rule has a clarity problem: The first sentence imposes on the broker a duty to prevent misconduct that is beyond the sphere of the broker’s control (“monitor and prevent … unauthorized … reproduction, or use”). A broker can be asked to prevent scraping of her site, using commercially available tools to monitor access to the web site to detect scraping and terminate access from IP addresses that engage in it. (This may not ultimately prove successful, but it may at least make scraping more difficult.) But she can hardly be expected to police uses by a third party unless the way the third party ‘scrapes’ the site gives the broker notice of some kind of problem.
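As a rough illustration of the kind of monitoring described above, here is a minimal Python sketch that flags IP addresses making an unusually large number of requests in a short window. The function name, the thresholds, and the (timestamp, IP) input format are all assumptions made for the example; commercial tools are more sophisticated, and a determined scraper can spread requests across many addresses.

```python
from collections import defaultdict, deque


def find_heavy_clients(requests, max_requests=100, window_seconds=60):
    """Return the set of IPs that exceed max_requests within any sliding window.

    `requests` is an iterable of (timestamp_in_seconds, ip) tuples,
    sorted by timestamp. Both thresholds are illustrative, not recommended values.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in requests:
        window = recent[ip]
        window.append(ts)
        # Drop timestamps that have aged out of the window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(ip)
    return flagged
```

A site operator could feed this from the web server’s access log and then block or throttle the flagged addresses. Note that this only covers detecting access to the site; it does nothing about what a third party later does with the data, which is the part of the rule a broker cannot realistically police.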
It’s also unclear in that we don’t know whether web search engines are making authorized use or unauthorized use of the MLS data.
It’s unlikely NAR will allow local MLSs to make these decisions. A key reason is that brokers who function in multiple markets will not like it if two adjacent MLSs take opposite views. The arguments those brokers make about the difficulties this would pose for them might be overstated, but NAR will hear the arguments nevertheless. NAR has attempted over the last several years to make IDX rules more uniform, and I expect they will respond on this issue in kind.
Option B. All MLSs adopt the MIBOR approach
NAR could require all MLSs to adopt the MIBOR approach, prohibiting indexing by search engines. Then I’d recommend changing section 18.2.2 of the model rules to read:
Participants must make commercially reasonable efforts to prevent the gathering of MLS data from their IDX sites by automated means, including techniques commonly described as “scraping,” “spidering,” or “web-crawling.” For purposes of this Section, indexing by search engines constitutes “web-crawling.”
By the way, I don’t think this approach would prevent listing brokers from doing IDX index fishing with their own listings. Generally, NAR and MLS policies do not restrict how a listing broker displays her own listings. So, in this case, even if MLS says “you can’t allow search engines to index IDX listings,” a broker could permit her own listings to be indexed on her own web site.
Thus, there are many who would claim the big listing brokerages were driving this approach. I think that it’s reasonable for a listing broker to say, however, “If someone searches for my listing agent or the address of one of my listings on Google, the result should be a link to my site or to a neutral site that I have authorized to display my listings, not to my competitor.” The listing broker who agreed to take part in IDX to allow other brokers to advertise her listings may never have contemplated that her listings would be used as search engine bait.
As for our interests, this approach does a good job of setting listing broker expectations because it’s clear what is permitted and what is not. As this approach does not require making a distinction between good scraping and bad scraping, it does not require nuances in interpretation. There are also technology tools that permit web sites to prevent almost all spidering, web-crawling, etc. (rather than requiring the server to attempt to distinguish between good bots and bad bots), so it supports our third interest. As for legal tools, this approach provides none, but adopting Option E in conjunction with it could change that.
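One widely used tool of this kind is a robots.txt file, which asks crawlers to stay out of part of a site. The post does not name a specific tool, so treat this as an assumed example; the /idx/ path and the example.com URLs are hypothetical. The sketch below uses Python’s standard robots.txt parser to show how a compliant crawler would read a rule that blocks IDX pages while leaving the rest of the site open.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a broker site that keeps IDX pages out of indexes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /idx/
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a crawler that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Keep in mind that robots.txt is a request, not an enforcement mechanism: the major search engines honor it, but a scraper is free to ignore it, so it would need to be paired with server-side blocking to stop bad actors.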
This approach fails to support broker web efforts, because non-broker sites will be able to take advantage of web search engines, but most broker sites will not. This is only one of our four interests weighing against this approach, but frankly, I think it’s critically important. Do we really want to say that brokers with IDX sites cannot use site development techniques that are common across many industries and that are otherwise legal and ethical? Do we want to permit aggregators to use tools that we do not allow to brokers?
Option C. NAR policy committee approach from May 2009 (or modified form)
NAR could adopt the policy proposal it considered in May, which essentially permits all indexing by search engines. This is how that change would have left Section 18.2.2 of the model rules:
Participants must protect IDX information from unauthorized uses. This requirement does not prohibit indexing of IDX sites by search engines.
This also suffers from clarity problems similar to the ones in the current policy (Option A). Here’s language I’d propose instead:
Participants must make commercially reasonable efforts to prevent the gathering of MLS data from their IDX sites by automated means, including techniques commonly described as “scraping,” “spidering,” or “web-crawling,” except when those techniques are used by web search engines. A “web search engine” is [define here].
(Note that I have not defined what a “web search engine” is or what data uses typify a “web search engine.” This is not a trivial problem. If you have a sense of how to define it, post a comment.)
It is essential that the policy language define “search engine.” As we concluded previously, any database that a consumer can search that displays the results online is a “search engine.” Trulia, Google, and Realtor.com are search engines. In fact, almost any site likely to misappropriate IDX data is likely to do so by presenting it in the form of a “search engine.” When evaluating this approach, I’ll assume it comes with a good definition of “web search engine.”
This approach should establish clear listing broker expectations (though I think we’d have to educate listing brokers about how this works, what they might see, and what their own options are for using the same techniques). It also supports broker web efforts, essentially allowing IDX to compete with the likes of Realtor.com, etc. It does not really provide any legal tools for stopping bad actors, but adopting Option E in conjunction with it would help.
This approach is a mixed bag on the technology front. The broker hosting an IDX site might be able to detect that someone is scraping the site, and as long as the scraper is not posing as a web search engine, we could probably expect the broker operating the site to take steps to prevent the scraping. But it would be difficult for the broker to know if a malicious scraper is posing as a legitimate web search engine; in other words, the displaying broker’s “commercially reasonable” efforts are unlikely to prevent a determined scraper.
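One partial check a site can run against impostors is a reverse-then-forward DNS lookup: look up the host name for the visitor’s IP address, confirm it belongs to a known search engine domain, then confirm that host name resolves back to the same IP. The Python sketch below is a minimal version under stated assumptions: the list of trusted domain suffixes is illustrative only, the function name is made up for the example, and the lookups can be swapped out for testing.

```python
import socket

# Assumed allow-list of crawler host name suffixes; a real rule would need
# its own agreed list of what counts as a "web search engine."
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None,
                        trusted_suffixes=TRUSTED_SUFFIXES):
    """Return True only if `ip` maps to a trusted host name that maps back to `ip`."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(trusted_suffixes):
            return False
        # The forward lookup defeats scrapers who fake their reverse DNS record.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures are treated as "not verified."
        return False
```

This only tells the broker whether a visitor really is the crawler it claims to be. It does not settle which crawlers ought to be on the trusted list, which is the definitional problem the rule itself has to solve.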
This approach will not be without its detractors: I think listing brokers do not expect that searching “Nancy Smith,” a listing agent with ABC Realty, on Google, will result in a match to one of Nancy’s listings being displayed on XYZ Realty’s IDX site. I know there are those who discount this listing broker concern as a symptom of a failure to compete on the web. But IDX, after all, is a grant of permission by listing brokers for other brokers to display their listings. They are entitled to have a say about how the listings will be displayed. If you want to do funky stuff with other brokers’ listings, you can do it behind a password on your VOW (though web search engines should not be able to index VOWs). Might big listing brokers pull out of IDX if NAR adopts this option? I’ll save that question for a post on the strategic interplay between IDX and VOWs.
On the other hand, if the information I’ve shared in previous posts (and comments of others associated with them) is correct, listing brokers can readily adopt the techniques of “IDX index fishing,” and widespread adoption of these techniques will make them less effective and, presumably, less distressing to listing brokers. Perhaps the way to address listing brokers’ legitimate concerns is to educate them about how IDX index fishing works, and about how they can make use of it themselves.
Option D. Permit indexing, but limit fields that can be indexed
NAR could adopt an approach that permits indexing generally, but prohibits it on a few key fields to address the listing broker concerns about the example I gave under Option C. The rule might look like this:
Participants must take commercially reasonable steps to prevent the gathering of MLS data from their IDX sites by automated means, including techniques commonly described as “scraping,” “spidering,” or “web-crawling,” except when those techniques are used by web search engines. A “web search engine” is [define here]. If a Participant permits or encourages web search engines to index the Participant’s IDX site, the Participant must not expose the following fields in the listing display provided to web search engines: name of listing broker, name of listing agent (or non-principal broker), street address of listing [Others?].
Generally, this approach has the same virtues and vices as Option C, except this one does address a narrow concern of listing brokers seen in the example I gave under Option C (a search for a listing agent’s name shows a link to the agent’s listing on a different broker’s web site).
This approach does create a sort of paradox, though. Most MLSs have rules requiring that an IDX display include the listing broker’s name. It seems odd to require an IDX site to withhold the listing broker’s name when displaying that broker’s listing to a search engine. Essentially, I think this rule would require the broker to provide a different display to search engines than it provides to consumers. That makes me uneasy, though I can’t say exactly why.
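To make the “different display” concrete, here is a minimal Python sketch of what the rule would ask an IDX site to do: build one view of a listing for consumers and a reduced view for search engine crawlers. The field names and the function are hypothetical, chosen only to mirror the three fields named in the draft rule; how a site decides that a visitor is a search engine is a separate problem.

```python
# Hypothetical field names mirroring the fields named in the draft rule.
WITHHELD_FROM_CRAWLERS = {"listing_broker", "listing_agent", "street_address"}


def listing_view(listing, for_search_engine):
    """Return the listing fields to render for this kind of visitor."""
    if not for_search_engine:
        # Consumers see the full IDX display, including the listing broker's name.
        return dict(listing)
    # Crawlers get the same listing with the restricted fields removed.
    return {k: v for k, v in listing.items() if k not in WITHHELD_FROM_CRAWLERS}
```

The sketch shows the paradox plainly: the consumer view must carry the listing broker’s name under most MLS rules, while the crawler view of the same page must omit it.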
Option D.5? One possibility arises from the fact that many MLSs do not require that listing agent name be displayed on IDX (only listing broker name). By prohibiting display of listing agent name in IDX, MLSs would prevent the problem in the “Nancy Smith” example above, but without having separate display rules for search engines and consumers. This option probably does not work in states that require the listing agent to be identified on advertising.
(BTW, though this suggestion comes from a post by Mike, I don’t support a suggestion he made in the same post that it would be better to limit the fields available in IDX to alter the impact of search engine indexing. I’ll take that up in a post on strategic interplay of IDX and VOWs in coming weeks.)
What I would probably do…
So, if NAR said to me, “We’re going to take action on this, what do you recommend?”, I’d probably say this:
- Define what we mean by ‘web search engine,’ identify the benign uses they make of listing data, and incorporate those descriptions into the rules.
- Say that broker IDX sites may allow and even encourage indexing by web search engines.
- If brokers are particularly miffed by the “Nancy Smith” example, MLSs can prohibit display of listing agent in IDX (as long as state law does not require it).
- Educate all brokers about how site indexing works and about technology options to allow them to take advantage of it.
- Develop a good model TOU and invite the MLSs to promulgate it to brokers; two key terms would make MLS a third-party beneficiary and would allow ‘web search engine’ use but not any other commercial use of the listing data.
I’d choose permitting indexing over prohibiting it, because I think we have to let brokers have the tools they need to compete on the real web. Listing brokers who don’t like the results can employ the same tools. I don’t like Option D, but as I say, I can’t quite put my finger on what there is about it that makes me uneasy.
I would stress the need to define exactly what constitutes benign “search engine indexing.”
I would also stress the need to put together some education to help MLSs and brokers. NAR has CRT, which is a great resource, but NAR still tends to make knee-jerk decisions and then leave local MLSs to deal with the implications. It’s downright idiotic if NAR does not produce material that explains how this works in layman’s terms.