Indexing

Search engines can return results quickly because they ‘index’ the content in the database. When a consumer searches a broker’s IDX site for a listing number that she saw in a print ad, the displaying broker’s search engine finds the matching listing immediately. Without an index, the search engine would have to do the equivalent of checking every listing record in turn, asking, “Is this record #1234567?” Even at computer speeds, that can take a long time on a large database. The index lets the search engine skip that scan.
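To make the difference concrete, here is a minimal sketch in Python. The records and field names are made up for illustration, and a real database would use its own index structures rather than a plain dictionary, but the idea is the same: build the index once, then jump straight to the record.

```python
# Hypothetical listing records; the field names are illustrative only.
listings = [
    {"listing_number": "1234565", "bedrooms": 2},
    {"listing_number": "1234566", "bedrooms": 4},
    {"listing_number": "1234567", "bedrooms": 3},
]

def find_by_scan(records, listing_number):
    """No index: check every record in turn, asking 'Is this the one?'"""
    for record in records:
        if record["listing_number"] == listing_number:
            return record
    return None

def build_index(records):
    """Build the index once: a mapping from listing number to its record."""
    return {record["listing_number"]: record for record in records}

def find_by_index(index, listing_number):
    """With an index: look the record up directly, without scanning."""
    return index.get(listing_number)

index = build_index(listings)
```

Both `find_by_scan(listings, "1234567")` and `find_by_index(index, "1234567")` return the same record. The scan does work proportional to the number of listings, while the dictionary lookup takes roughly the same time no matter how many listings there are.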

IDX and MLS consumer-facing search engines are usually indexed based on fields of data: listing number, bedrooms, bathrooms, etc. This kind of indexing is common in relational databases. It works well because the visitor to an IDX or MLS site can usually answer questions about particular data fields: for example, how many bedrooms and bathrooms, or what listing price range.
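A rough sketch of field-based indexing, again in Python with invented records: each field gets its own index from a value to the set of listings that have that value, and a multi-field search intersects those sets. This is an illustration of the concept, not how any particular IDX or MLS vendor implements it.

```python
from collections import defaultdict

# Hypothetical listings; the field names and values are illustrative only.
listings = [
    {"listing_number": "1001", "bedrooms": 3, "bathrooms": 2},
    {"listing_number": "1002", "bedrooms": 3, "bathrooms": 1},
    {"listing_number": "1003", "bedrooms": 4, "bathrooms": 2},
]

def build_field_index(records, field):
    """Map each value of `field` to the set of listing numbers that have it."""
    index = defaultdict(set)
    for record in records:
        index[record[field]].add(record["listing_number"])
    return index

bedrooms_index = build_field_index(listings, "bedrooms")
bathrooms_index = build_field_index(listings, "bathrooms")

def search(bedrooms, bathrooms):
    """Answer 'how many bedrooms, how many bathrooms?' by intersecting indexes."""
    return bedrooms_index.get(bedrooms, set()) & bathrooms_index.get(bathrooms, set())
```

Here `search(3, 2)` finds only listing 1001. A price-range question would typically need a sorted index (such as a B-tree) rather than exact-match sets, which is one reason relational databases offer more than one index type.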

Web search engines like Google work differently, indexing whole web pages. They note things like the frequency of a word on the page, its proximity to other words, and much more, using complicated indexing algorithms. That’s why you get much better results when you provide multiple words in your search. “Hepburn movie leopard” is more likely to get the answer you are looking for than “leopard,” or “leopard movie.” (The answer I wanted was “Bringing Up Baby.”) The web search engine finds pages that have more of your words on them. Proximity of the search terms matters, too. That’s why when you search “marcy holmes,” a page about the Marcy-Holmes Neighborhood Association comes up more readily than a page that refers to Marcy Smith’s affection for Sherlock Holmes.
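Here is a toy version of that idea, using the “marcy holmes” example. It is emphatically not Google’s algorithm; it is a minimal sketch of an ‘inverted index’ that records where each word appears on each page, with a made-up scoring rule (number of query words found, plus a bonus when they sit close together). The page texts and the scoring formula are my own assumptions.

```python
import re
from collections import defaultdict
from itertools import product

# Two hypothetical pages, echoing the "marcy holmes" example.
pages = {
    "neighborhood": "The Marcy-Holmes Neighborhood Association meets monthly",
    "fan": "Marcy Smith has a great affection for Sherlock Holmes",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(pages):
    """Map each word to {page_id: [positions where the word appears]}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(page_id, []).append(position)
    return index

def score(index, page_id, terms):
    """Toy score: count of matched terms, plus a bonus for closeness."""
    position_lists = [index[t][page_id] for t in terms
                      if page_id in index.get(t, {})]
    matched = len(position_lists)
    if matched < 2:
        return float(matched)
    # Smallest distance spanning one occurrence of each matched term.
    span = min(max(combo) - min(combo) for combo in product(*position_lists))
    return matched + 1.0 / (1 + span)

def search(index, query):
    """Return ids of pages containing any query term, best score first."""
    terms = tokenize(query)
    candidates = {page_id for t in terms for page_id in index.get(t, {})}
    return sorted(candidates, key=lambda p: score(index, p, terms), reverse=True)

index = build_inverted_index(pages)
results = search(index, "marcy holmes")
```

Both pages contain both words, but `results` puts the neighborhood page first because “marcy” and “holmes” are adjacent there and eight words apart on the other page.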

It’s important to understand that for Google to perform the sophisticated types of indexing it does, it must ‘cache’ (or retain a copy of) pretty much every web page that it indexes. This copying of copyright-protected content from web sites might raise copyright law issues, but the federal courts have generally held that search engine indexing of the kind that Google does qualifies for treatment as ‘fair use’ under the copyright laws. (It’s not clear whether ALL search indexing would meet the ‘fair use’ test, which is a multi-factor balancing test. Some cases have found that Google’s indexing of images, for example, might not be fair use if it impairs the copyright-holder’s ability to commercialize its work. Comment if you want more of a copyright law exposition on the search engine issue.)

But how you index your search engine may depend a great deal on your target audiences and their objectives.

Search engine audiences and their objectives

Search engines can be targeted at broad audiences or narrow ones; and they can be ‘destination’ sites or ‘conduit’ sites. A search engine can be designed to redirect visitors to other sites that are the source of the indexed information – a conduit site – or it can be designed to present the information directly to the visitor without resort to other sites – a destination site. Search engines vary in their audiences and objectives:

  • Google is a conduit site with a broad audience. Web users of all kinds visit looking for information on topics of all kinds, and Google redirects them to the sites that are the sources of its information. (This is a generalization – Google does offer services that don’t work quite this way.)
  • IDX search engines are destination sites with a moderately targeted audience. They attempt to attract and retain the interest of consumers interested in real estate in the geographic area(s) where the broker practices (still a pretty broad group); they do not ‘refer’ visitors to listing brokers’ web sites, but instead attempt to build relationships with them through rich content. (More on this in another post.)
  • Real estate aggregator search engines generally have the same audience as IDX search engines (though perhaps with expanded geography). The extent to which they are destination or conduit sites varies greatly. For example, visit Realtor.com and try to get to the listing broker’s web site from the average listing. Compare that to Trulia or Google Base. (I presume it gets easier to get to the listing broker’s site if she has paid Realtor.com for some kind of ‘enhanced’ service.)
  • MLS consumer-facing search engines have roughly the same audience (and scope, as we noted before) as IDX search engines, but they vary in the extent to which they are destination sites. The Houston Association of REALTORS site, HAR.com, for example, has very high traffic numbers, probably because it is designed to keep consumers on the site. (Again, try to get to the listing broker’s page for the listing you are viewing on HAR.com.) Other sites may function more as conduits.
  • The online legal research service Westlaw is a destination site with a relatively narrow audience. It indexes court opinions (among other things), principally for use by lawyers who are performing legal research. Westlaw’s goal is to sell the password-only index itself (and certain ancillary services). It does not redirect its customers to the web sites of the courts whose opinions it indexes; it has cached, indexed, and annotated copies of all the opinions and delivers the whole kit and caboodle.

Next, we’ll look at some SEO efforts of brokers operating IDX sites to enhance their appearance on web search engines like Google, including the practice I call “IDX index fishing.”

-Brian

Comments

  1. I would like more of a copyright law exposition on the search engine issue. I find this quite fascinating and understand why you had to break it down into small posts. However, I am one who likes to read the end of the story first 🙂

    On the other hand, you keep me coming back.

  2. Looking forward to the ongoing posts in this series. As I have mentioned in my earlier posts, Google scrapes photos, agent descriptions and other expressly forbidden information from the IDX compilation and displays it on its website. Indeed, this is nearly exactly the same information that Trulia displays before linking to the Agent, Broker, or MLS's compliant website.