I was asked earlier today to find something that Google couldn’t find, at least not for free. So what did I do? I did the deep dive, of course.
I haven’t touched on this topic recently here in the Studio, so the time is ripe. I am talking about the “deep web”, the “invisible web” of data and documents hosted on the Internet that the traditional search bots and crawlers of Google and its ilk can’t seem to index. It is estimated that the invisible web is 500 times larger than the searchable portion of the Web, which we all know is pretty freaking big to begin with. Sometimes, you won’t be able to find what you are looking for using traditional search engines, so what do you do? You use some tricks to access those hidden databases, of course – with that much more material to draw on, your odds of finding the goods go up considerably.
If you are looking for a search engine tuned to deep web searching, check out this great list (over 100!) broken down by topic, curated by the Online Education Database.
If you are looking for information that is more geared towards the legal profession, you can do no better than this great list of invisible web resources over at LLRX curated by Marcus Zillman.
Wondering what I was looking for? A current list of legislation across the 50 states pertaining to medical malpractice, particularly tort damages caps. I found it in a database maintained by the National Conference of State Legislatures. I didn’t have to pay a cent for it. Thanks, guys! Sorry, Westlaw.
Confounded in your efforts to locate the unlocatable? Google not working for you? Perhaps you need to dig a little bit deeper. Pipl is a deep web search service that focuses on finding individuals. Google searches web pages and Pipl searches the “deep” or “invisible” web – that part of the Web that is hidden for the most part from standard search engines. Stuff like documents in on-line databases. There is as much as 500 times as much information lurking in the deep web as there is floating on the surface. When it comes to people, the best information is generally found in such “unsearchable” documents and not on web pages. Using advanced language analysis and algorithms, Pipl can extract facts, contact details and other information from profiles, directories, scientific publications, court records and other sources.
The search box asks for the person’s full name, email, username or phone number. The information retrieved is not private – it is public information that is simply hard to get to due to its particular web form. If you are concerned about your own information, you can request to have your Pipl Profile removed from their site by clicking here. While their automatic removal is disabled, you will be given the email to manually request removal.
Of course I searched myself. And I found a staggering amount of information. That is not entirely surprising given the amount of time I spend on the web and the number of profiles I have filled out. That said, Pipl managed to tie a lot of disparate information about me into one page at their site. Needless to say, Pipl is pretty powerful.
If you want to stalk, er, search for that missing someone, give Pipl a try. You never know what you might find.
It’s been a while since I talked about getting to “hidden” web documents. I figured it was about time to hit it up again – I like search and just love a good mystery.
The Deep Web (also known as the Invisible Web), for those unfamiliar, is the huge expanse of resources lurking below the reach of traditional search engines. Google’s minions cannot access content protected by passwords, or unfamiliar document extensions, or privately stored information. Over half of the estimated amount of Web content out there is attributed to this relatively untapped Deep Web.
I was prompted by the good people at MakeUseOf, my favorite tech-for-dummies web site. They just ran this great article compiling some of the current Deep Web diving tools (link here). The tools include Infomine (link here), the product of a consortium of libraries that taps stuff stored in databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles, directories of researchers, and other resources. The WWW Virtual Library (link here) is a venerable collection started by Web Daddy Tim Berners-Lee. Intute (link here) is UK-based and university sponsored, with topical content and human-curated links. Cool add – Intute has 60 free online tutorials on how to improve your internet search skills! Complete Planet (link here) also organizes by topic, promising to uncover hidden web content with advanced search filters. Infopedia (link here) should be considered as a curated alternative to Wikipedia – it accesses encyclopedias, almanacs and other reference materials. DeepPeep (link here) offers a deep but transient look at forms across a limited spectrum of subjects. IncyWincy (link here) is a metasearch engine for the Deep Web, with the ability to set alerts. DeepWeb Tech (link here) offers access to five search engines as well as plug-ins, for medicine, science and business information. Scirus (link here) meets your Deep Web scientific needs. TechXtra (link here) is all about the math.
Go. Search. Find!
Way back, almost a year ago, I posted about a paper presented by Marcus Zillman on deep web research. Deep web research involves getting below the surface layer of web pages to documents stored on-line with extensions such as .pdf, .doc, .xls, .ppt, .ps and other more esoteric extensions. These extensions and this type of searching are particularly applicable to business research, as companies tend to store their information in this manner.
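For the tinkerers, here is a minimal sketch of how you might put those extensions to work. It assumes a search engine that honors the `filetype:` operator (Google does for common document types) and simply builds the query for you. The function names and the example search terms are my own inventions for illustration, not anything from Mr. Zillman's paper.

```python
from urllib.parse import urlencode


def build_filetype_query(terms, extensions):
    """Combine search terms with one filetype: filter per extension."""
    # Normalize so ".pdf", "pdf" and " PDF " are all treated the same.
    cleaned = [ext.strip().lstrip(".").lower() for ext in extensions]
    cleaned = [ext for ext in cleaned if ext]
    if not cleaned:
        return terms
    filters = " OR ".join(f"filetype:{ext}" for ext in cleaned)
    # Parentheses keep the OR group separate from the search terms.
    if len(cleaned) > 1:
        return f"{terms} ({filters})"
    return f"{terms} {filters}"


def build_search_url(query, base="https://www.google.com/search"):
    """URL-encode the query and attach it to a search endpoint (assumed here)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Hypothetical business research query using the extensions named above.
    query = build_filetype_query("annual report", [".pdf", ".doc", ".xls", ".ppt", ".ps"])
    print(query)
    print(build_search_url(query))
```

Paste the printed URL into your browser and you should get back documents rather than web pages. This only reaches files a crawler has already indexed, so treat it as a first pass before moving on to the specialized databases.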
Mr. Zillman has done it again – check out his list of deep web resources for 2010 published on LLRX.com. His comprehensive list includes articles with background information, tools, resources on the semantic web, presentations, pertinent blogs and lots of other great links. You don’t have to “dig deep” to find what you might be looking for with Mr. Zillman’s help!