Yesterday, I was treated to a lovely dinner at a favorite restaurant near the harbor in Boston. My husband and I shared a spectacular bottle of wine and an elegantly prepared French meal. The lighting was low, and the restaurant was filled with couples hunched toward each other, engaged in rapt conversation. We were raptly conversing as well. About what, you might ask?
My husband was trying to remember a search engine that he was particularly fond of P.G. (Pre-Google). He couldn’t remember the name, but he remembered why he liked it: it would rank results by the number and quality of the hits to similar pages. I suggested he “google” it or “wikipedia” it when we got back home (when DID those proper nouns become verbs?) and so he did.
The engine he was trying to recall was Direct Hit, the brainchild of a Boston-based company founded in the mid-to-late 1990s. The engine relied on the searching activity of internet users (back then in the low millions) and then applied the company’s patented algorithms, which measured how long searchers spent on similar pages, how searchers clicked through pages, where pages were ranked in the original search results lists, etc. My husband had remembered, correctly, that Direct Hit was purchased by Ask Jeeves around 2000.
So, this got me thinking: how did internet search evolve to the point where “google” is an action verb? Where is search “going” and what will our next action verb relative to internet searching be?
I did rely heavily on Wikipedia for this entry, as I don’t have sufficient time to delve into this massive topic in the manner it deserves. First, some background. The term “Internet” cropped up around 1974 to describe a system of copper wire, fiber optic cables, wireless technology and other bridges forming a network of computers. This network offers the means to exchange data through use of a standard protocol or common language via “packet switching.” It is an external network of internal networks. Although its development spanned the better part of 30 years, the Internet was not opened to commercial interests until 1988, with the connection to MCI Mail.
The term “Internet” is distinguished from the often-substituted term “World Wide Web” or WWW. The WWW was not invented until around 1989 and was not publicized until the early 1990s. This seems amazing to me in retrospect: the WWW has been in the public domain for fewer than 20 years, but has become so heavily ingrained in our lives that it is hard to imagine life without it.
The WWW is a system of interlinked hypertext documents accessed via the Internet, typically viewed through a browser. My image is of a vast community of houses and businesses and libraries, linked by streets, electrical and cable wires and plumbing. I won’t get into the technical process here, but the “how” is fascinating.
Picture of First Web Server Used by Tim Berners-Lee
Information is stored on a computer, or web server, that runs a program that can converse with the WWW via the internet. Once the hardware network and the software means of connecting information to that network were established, the immediate need for a road-map or “Zagat’s List” of the desired information arose. In other words, how the heck do we find the house, business or library where the desired information is stored?
Enter the search engine. First and foremost, search engines are the tools employed to search the WWW and find the information lurking in the “nodes” along the vine. They are distinguished from web directories (lists compiled and maintained by humans) in that they employ technology or algorithms, along with some human input, to retrieve information. Before 1993, information was manually incorporated into a list of web servers, edited and hosted by the father of the WWW, Tim Berners-Lee. Needless to say, as the quantity of information attaching to the network exploded, humans could not keep up with the indexing task.
The first tool for collecting this information arose in 1990 and was called Archie. Archie downloaded directory listings into a searchable database of file names but did not index the contents of the files themselves. Gopher came next and offered an indexing system, which two new programs, Veronica and Jughead, could search in a very rudimentary way. In 1994, the floodgates opened with Lycos, followed quickly by AltaVista, Dogpile and other popular engines.
I copied this incomplete timeline of search engines and the year they were launched from Wikipedia:
In 1998, the juggernaut Google came to be. Today, several hundred million queries run through its gates each day. What makes Google so special? Google uses PageRank, a patented system, to rank pages in a perceived hierarchy of importance to the human searcher. PageRank uses an algorithm that assigns a numerical “weight” to each page in the set of hyperlinked documents available to the search engine. A page’s score is a weighted sum of the PageRanks of the pages linking to it. Google calls the system “democratic” in that it relies on “votes,” or links to a result, cast by the people who create web pages, with greater weight afforded to links from more important pages. The more “important” a page appears from its inbound links, the higher its rank in the results. Google is not the only engine using this type of ranking: others include Ask.com and Teoma.
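For the curious, the “weighted sum of votes” idea can be sketched in a few lines of Python. This is only a toy illustration of the textbook version of the calculation, not Google’s actual implementation. The function name `pagerank`, the tiny four-page web in `example_links`, the damping factor of 0.85 and the fixed number of passes are all my own assumptions for the sake of the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    The damping factor (0.85) and iteration count are illustrative
    assumptions, not values taken from this post.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score on each pass.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to,
                # so a vote from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    scores = pagerank(example_links)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In this made-up web, page c collects the most votes and comes out on top, and page a comes second because its single inbound link is from the highly ranked c. That is the “greater weight afforded to more important links” at work.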
Where is this all going? Web 3.0 and context searching are coming to a browser near you! Whereas Web 2.0 has been about networking the people on the Web and employing their resources through social connectivity and collaboration, Web 3.0 will be about “intelligent” web applications built on natural language processing and machine-based learning and reasoning. The “intelligent web” will attempt to tailor online searching to more closely match our needs. Now, that sent shivers up my spine as my mind’s eye pictured HAL from 2001!
As we edge closer to true open flow of information and true artificial intelligence, applications and search engines likely will recede into the background. Picture Second Life, where the on-line world starts to mimic the real world and the line between them blurs. Web 2.0, which employs old standbys like Google and Yahoo, is characterized by information overload, while Web 3.0 will offer greater control over the overload.
Searching will be semantic and contextual and will use the meaning in language when compiling lists of recommended sources. Instead of just searching keywords, semantic search will permit computers to check the context of the words, offering more relevant results. While there are a lot of bells and whistles to the concept, the underlying premise is a shift in information packaging to take it from a form created and understood by humans to a form more easily digested by computers. The semantic web is an effort, spearheaded by none other than WWW daddy Berners-Lee, to take the “human” out. Machines will talk to machines and, as Berners-Lee describes it, “[t]he intelligent agents that people have touted for ages will finally materialize.” HTML will be a thing of the past and a new language will arise that will fundamentally change the nature of search. Your specific needs, preferences and wants will live in a “cloud” up in the WWW-sky, available to you at a moment’s notice in a vertical channel.
O.k., back to reality. Do you want to try semantic searching? Hakia offers a search engine that employs the concept. There is no statistical ranking, a la Google, at Hakia. Hakia’s results focus on quality rather than popularity. In other words, we have moved from a democracy to a benign dictatorship. Hakia proclaims that its results must satisfy three criteria: they come from credible websites; they represent the most recent available information; and they remain absolutely relevant to the query. Results are mined by concept and meaning match, information that is virtually invisible to a Google search.
I was able to capture a portion of the screen from a search on “intellectual property law”:
Although not terribly clear from my shot, this page shows results in the main window, with tabs or filters for credible sites, news, images and “meet others.” The page itself shows bits of the “all” results, “credible sites”, “news results”, as well as user-generated content. Like Google, there are the ever-present “sponsored links”. The page also offers a link to the law “gallery.” The law “gallery” page contains an overview of the concept of “law” with legal definitions of “law”, image search results, basic information and FAQ and topical entries.
This is just one of the new breed of search engines pushing search closer to Web 3.0 sensibilities. Judging by how fast “Google” became a verb in our lexicon, a year from now we might well be “hakia-ing,” or using some other inscrutable verb, when we look for content on the Web. No matter what the verb becomes, search is evolving and I am excited to see where it is all leading. I will definitely keep you posted on searching and researching on the Web as I learn more!