How Much of a Page Does Yahoo Search Index?

Discussion in 'Search Engine Optimization' started by ovi, Oct 4, 2004.

  1. ovi

    ovi Guest

    One of the lesser-discussed facets of Web searching is the spidering limits of search engines. Even if a search engine is a full-text engine, it may not index the entirety of a given page if the page is too large. In Google’s case the limit is 101K for HTML pages (its spider will only index the first 101K of an HTML Web page; search Google for aardvark apple zither zephyr filetype:html and look at the file sizes of the results) and unknown for PDF pages. (I can’t determine the limit; if you look at tinyurl.com/4px8n, you’ll see that about two-thirds of the pages listed in the TOC are available in Google’s HTML version. 300K limit? 500K?)

    I knew that Yahoo had a larger index limit, but I didn’t know how large. I learned earlier this week that Yahoo’s limit is the first 150K of a Web page, while its PDF indexing limit is 500K.
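
    As a rough illustration of what a size limit like this means in practice, here is a minimal Python sketch. It assumes that "K" means 1024 bytes and that a spider simply keeps the leading bytes of a page, neither of which is confirmed for either engine; the function names and the sample page are hypothetical.

```python
# Reported HTML limits from the post above; "K" is ASSUMED to mean 1024 bytes.
REPORTED_HTML_LIMITS = {"google": 101 * 1024, "yahoo": 150 * 1024}


def indexed_portion(page: bytes, limit: int) -> bytes:
    """Return the leading bytes a spider with this size limit would keep.

    Assumption: the spider truncates at a plain byte offset.
    """
    return page[:limit]


def term_within_limit(page: bytes, term: str, limit: int) -> bool:
    """True if the term occurs entirely inside the first `limit` bytes."""
    return term.encode("utf-8") in indexed_portion(page, limit)


if __name__ == "__main__":
    # Hypothetical page: 120K of filler followed by a marker word.
    page = b"x" * (120 * 1024) + b" zephyr"
    for engine, limit in REPORTED_HTML_LIMITS.items():
        print(engine, term_within_limit(page, "zephyr", limit))
```

    Under these assumptions, a word sitting 120K into a page falls outside the reported 101K limit but inside the reported 150K one, which is why a search for words found only deep in large pages can return different results on the two engines.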

    … this is what I’m told, anyway. However, I’m finding something interesting. If you search Yahoo for aardvark apple zither zephyr originurlextension:html (originurlextension: is Yahoo’s gawdawful syntax for filetype:; I’m told they’ll be fixing it soon. Propburgers to Greg Notess of searchengineshowdown.com for educating me about it), you’ll find that file sizes are listed with the search results, and the sizes listed are well over 150K – I see page sizes of over 800K listed here! At least one of the pages listed, at 173K, appears from its cache to be fully indexed (the headers, footers, and copyright disclaimers are all in place – it doesn’t look “cut off”), and the cached copy, pasted into a text editor, weighs in at well over 200K.

    The bottom line is that Yahoo indexes far more of each HTML page than Google does; if you’re running searches that tend to focus on large pages (like word-list searches that might point you to dictionaries), try Yahoo first.

    I took this article from searchenginejournal.com.
  2. blogginginc

    blogginginc New Member Webmaster

    I wonder if anyone will ever try to read your thread full of copy-paste spam.
