Thursday, November 18, 2010

Week 11-Readings

Web Search Engines:  Parts One and Two--

There was so much good information in these two articles.  They were a little hard to find, so I'm glad I persevered.  After reading both I have one question:  are faster searches really what the average searcher is looking for?  I Googled "Beagle" and got 9,980,000 hits in 0.15 seconds.  How many of those hits contain information that I will be able to use?  How many are duplicates, and how many have nothing to do with beagles?  Would the average searcher be willing to wait longer for searches that are better?  I've waited several minutes for a YouTube video to load.  If I needed reliable, valid information about beagles, I'd be willing to wait a little while if it were going to save me time on the other end when I didn't have to comb my way through 9.9 million hits.

The Deep Web and the BrightPlanet Project--

The statistics presented in this paper are staggering--the deep web is 400 to 550 times larger than the surface web, there are 550 billion documents in the deep web compared to one billion in the surface web, and 95% of the deep web is information that is available to the public for free.  And these statistics are from 2001.  Nearly 10 years later, have technologies been developed that allow the general public access to the deep web?  If not, why not?

OAI Protocol for Metadata Harvesting--

"No one service provider can serve the needs of the entire public, hence user group-specific service providers have become the norm...These communities of interest are significant not only because they have adopted the protocol for a specific domain but also because they have developed additional standards, tools, and metadata scchemas to use along with the OAI protocol--much as the originators of the protocol had hoped."

It seems my questions about the deep web have been partially answered by the OAI Protocol for Metadata Harvesting.  It's interesting that an application originally designed for one use is being put to a similar use in many other communities.  I wonder whether, as these projects grow, they will become more or less useful as disparate vocabularies make aggregating metadata difficult.  Controlled vocabularies are one way to avoid this problem, but who decides which vocabulary is the right one?
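
For anyone who, like me, wanted to see what "harvesting" actually looks like in practice, here is a minimal sketch of an OAI-PMH request in Python.  The repository address is a made-up placeholder; only the verb and metadataPrefix parameters come from the protocol itself, and oai_dc (unqualified Dublin Core) is the one metadata format every repository is required to support.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint -- substitute any OAI-PMH-compliant base URL.
BASE_URL = "https://example.org/oai"

# ListRecords asks the repository for full records in a given metadata format;
# oai_dc is the unqualified Dublin Core format every repository must support.
url = BASE_URL + "?verb=ListRecords&metadataPrefix=oai_dc"

with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

namespaces = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Print the Dublin Core title of each harvested record.
for record in tree.findall(".//oai:record", namespaces):
    title = record.find(".//dc:title", namespaces)
    if title is not None:
        print(title.text)
```

Selective harvesting by set or date range, and the resumption tokens used to page through large result lists, all layer on top of this same request-and-parse pattern, which is part of why so many communities of interest have been able to build their own tools around the protocol.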

10 comments:

  1. Melissa, I also enjoyed the search engine articles. To answer your question, I don't think that "faster" search engines are what the general web user wants. At least, it's not what I want. I want a search engine that is intelligent enough to narrow my search down to relevant results. I am not interested in a search engine that can end my search in 0.15 seconds, but gives me a zillion results that I can't use. In the future, I think Google needs to address this problem.

  2. Hello Melissa,

    I'm right there with you regarding Google's "faster searches." What is the point of having millions of results in less than a second if nothing is relevant to you? At the very least, "duplicates" are removed, but there are many times when I will do a search for something in particular and find half a dozen sites citing the same resource, linking the same documents, and oftentimes having the same (or VERY similar) text. If that were cleaned up a bit, it could be rather interesting. . .

    Which brings us to the metadata note you make: who decides what vocabulary should be used? Also, even if that idea were to ever come to pass, wouldn't we still have the same problem of a "fast" search causing hours of digging? Just curious to see if anyone else reached that same conclusion. . .

  3. I was also curious about whether we are, today, any closer to public access to the Deep Web. Is it possible for, say, a librarian to access the Deep Web for a client? It worries me now that we apparently are not finding quality information on the Surface Web. We teach people how to critically evaluate Web sites; we warn them about bogus information; yet we are still unable to get them to the best information.

  4. I genuinely think that faster searching is really what the general public wants. Personally, I use Google to look things up on Wikipedia or to get to a website I can't quite recall the URL for more than I use it for real searching, so I'd be happy to see it get as instant as possible.

    Quality of resources and speed are not mutually exclusive.

  5. Melissa, I agree with you on the issue of waiting longer for better results! I would definitely rather wait longer if the search engine produced the most relevant sites. In the long run we would save time (and sanity!) because we would not have to browse so many results.

  6. Hey, like you I was also struck by the statistics behind the deep web. Especially since this article deals with numbers now nearly a decade old, one can only imagine what today's situation is like. It would be interesting to see whether our coverage has improved or declined in the meantime.

  7. I totally agree with everyone who chose "quality" over "speed" or "quantity" when it comes to search engines. Even when results are "sorted by relevancy," I am often left dumbfounded at how irrelevant most results are. I think, however, that speed may be important for people just looking to 'skim' for information.

  8. Yes, those articles were great; I passed them on to several interested parties. As to the selection of a metadata language, I guess that will reside with the creator of the information. We are an idiosyncratic bunch, hence the flexibility of the OAI provisions; we can therefore opt for what works best for us.

  9. The first IT class that I took in college had us do an assignment where we had to complete a search in under a certain time. I wondered then what the point was, and I have yet to figure it out. Quality is the most important thing for the user, and I know I'd be willing to wait 4.6 seconds longer for it.

  10. I also agree with everyone that waiting a "few seconds" longer for better results would actually be more ideal for me. I feel, though, that Google knows about this issue: while their searches are super fast, sometimes the results are not the best. I wonder if they have done research (and I'm sure they have) to find that most people are looking for something within the first few results, so speed and the top ten links are what matter most. The next few sets of results are not really important, as the majority of individuals do not look at them. Isn't that in some way part of the whole new Google "instant" idea, the first ten results faster?
