Deep Web Research for 2010

Way back, almost a year ago, I posted about a paper presented by Marcus Zillman on deep web research. Deep web research involves getting below the surface layer of web pages to documents stored on-line with extensions such as .pdf, .doc, .xls, .ppt, .ps and other more esoteric extensions. These extensions and this type of searching are particularly applicable to business research, as companies tend to store their information in this manner.

Mr. Zillman has done it again – check out his list of deep web resources for 2010 published on His comprehensive list includes articles with background information, tools, resources on the semantic web, presentations, pertinent blogs and lots of other great links. You don’t have to “dig deep” to find what you might be looking for with Mr. Zillman’s help!


Can Legal Information Be Semanticized?

You may have noticed that it has been unusually quiet in the Studio over the past week. Extreme travel and excessive work have taken their toll on my blogging output. However, it is time, once again, to buckle down and plumb the depths of law, technology and cutting-edge research and writing tools.

Absence apparently makes the heart grow fonder, as my reader greeted me with an interesting article over at by Dr. Adam Zachary Wyner on legal ontologies and how they may “spin a semantic web”.  I have heard it said before that legal information is too broad and deep and resistant to organization to be put under the semantic knife, or more aptly “blanket.”  I was heartened to see that Dr. Wyner disagreed (sort of) with this position.

First, I commend Dr. Wyner on providing an excellent explanation of what ontologies, knowledge management and taxonomies are and how they interrelate. I learned a few new concepts reading through his summary.

Next, the problem for those of us anxiously awaiting semantic treatment for case and statutory law is indeed threefold: there is little consistency in how these materials are presented, a wide variety of search needs must be satisfied, and a sheer volume of material is produced every year. Dr. Wyner sees this conundrum as well. Add to that the fact that most text marking, the process by which information is overlaid so that a computer can “read” the text and calculate an answer to a semantic search, would have to be performed by humans in the legal arena, and the desired result begins to look like somewhat of an impossible dream.

However, Dr. Wyner correctly points out that there are some categories of information that are clearly defined across cases and statutes that could certainly be marked for organization. For example, in the case law context, case headings, party information, result, even statements of issues can be tagged, organized and converted to computer-readable form. Dr. Wyner even suggests means for treating the information: through Semantic MediaWikis, as a learning tool for researchers and law students, and through treatment by large-scale publishers, government agencies and courts prior to publication. Dr. Wyner sounds as enthusiastic as I feel about the possibilities of incorporating this layer of information and what such treatment could bring to the legal researcher’s table.
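To make the idea of tagging a case heading a little more concrete, here is a minimal sketch in Python of what converting one heading to computer-readable form might look like. It is my own illustration, not anything taken from Dr. Wyner’s article: the caption, the field names, the pattern and the tag_caption function are all hypothetical, and the pattern assumes one tidy caption layout that real case law rarely follows.

```python
import json
import re

# Assumed (hypothetical) caption layout:
#   "<plaintiff> v. <defendant>, <citation> (<court> <year>)"
# Real captions vary widely, which is exactly the consistency problem
# discussed above; this pattern only handles the simple layout.
CAPTION = re.compile(
    r"^(?P<plaintiff>.+?) v\. (?P<defendant>.+?), "
    r"(?P<citation>[^()]+?) \((?P<court>.*?)\s*(?P<year>\d{4})\)$"
)


def tag_caption(caption: str) -> dict:
    """Split a case caption into labeled fields, or raise ValueError."""
    match = CAPTION.match(caption.strip())
    if match is None:
        raise ValueError(f"caption does not fit the assumed layout: {caption!r}")
    fields = match.groupdict()
    fields["year"] = int(fields["year"])  # store the year as a number
    return fields


# A made-up caption used purely for illustration.
record = tag_caption("Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)")
print(json.dumps(record, indent=2))
```

Once a heading is broken into labeled fields like these, a program can sort, filter or link cases by party, court or year instead of treating the caption as one undifferentiated string. Anything that does not fit the assumed layout is rejected rather than guessed at, which is where the human marking discussed above would still come in.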

I am reluctant to get too excited about the Big Two jumping onto the semantic bandwagon and spearheading the effort to “semanticize” the vast amounts of legal information in their stables. After all, where is their incentive? Nonetheless, I can’t help but smile a bit at Dr. Wyner’s excited undertone and the thought that bringing cases and statutes into the 21st century may not be as impossible a dream as Don Quixote tilting at windmills.
