Tuesday, November 24, 2009

Assignment #6

Here is the link to my webpage for Assignment #6.

It isn't particularly fancy, as this was my first time doing HTML since high school, but I'm pretty sure it has all it needs. :)

Saturday, November 21, 2009

Week 11 reading notes

Even though I was signed in to the Pitt Library website, I kept getting prompted to pay for each of the David Hawking articles, so I wasn't able to read them.

Shreeves, S. L., Habing, T. O., Hagedorn, K., & Young, J. A. (2005). Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.

“The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has been widely adopted since its initial release in 2001. Initially developed as a means to federate access to diverse e-print archives through metadata harvesting (Lagoze & Van de Sompel, 2003), the protocol has demonstrated its potential usefulness to a broad range of communities. According to the Experimental OAI Registry at the University of Illinois Library at Urbana–Champaign (UIUC) (Experimental OAI Registry at UIUC, n.d.), there are currently over 300 active data providers using the production version (2.0) of the protocol from a wide variety of domains and institution types. Developers of both open source and commercial content management systems (such as D-Space and CONTENTdm) are including OAI data provider services as part of their products.”

“The OAI world is divided into data providers or repositories, which traditionally make their metadata available through the protocol, and service providers or harvesters, who completely or selectively harvest metadata from data providers, again through the use of the protocol (Lagoze & Van de Sompel, 2001).”

“As the OAI community has matured, and especially as the number of OAI repositories and the number of data sets served by those repositories has grown, it has become increasingly difficult for service providers to discover and effectively utilize the myriad repositories. In order to address this difficulty the OAI research group at UIUC has developed a comprehensive, searchable registry of OAI repositories (Experimental OAI Registry at UIUC, n.d.).”
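
Since the article never shows what the protocol actually looks like on the wire, I sketched a tiny harvester for myself. This is only my own illustration of the idea (the repository URL is a placeholder; real endpoints are listed in registries like the UIUC one mentioned above), not code from the article:

```python
# Minimal OAI-PMH harvesting sketch (my own illustration, not from the article).
# The base URL passed in is a placeholder for a real repository endpoint.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Yield (identifier, title) pairs by walking ListRecords responses."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # selective harvesting by set
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            tree = ET.parse(resp)
        for record in tree.iter(OAI + "record"):
            header = record.find(OAI + "header")
            yield header.findtext(OAI + "identifier"), record.findtext(".//" + DC + "title")
        # Large result sets are paged with resumption tokens.
        token = tree.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Example (placeholder endpoint):
# for oai_id, title in harvest("http://example.org/oai"):
#     print(oai_id, title)
```

That seems to be what the "service provider" side of the protocol boils down to: plain HTTP requests with a verb parameter, and XML responses you page through with resumption tokens.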

Bergman, M. K. (2001). The deep Web: Surfacing hidden value. Journal of Electronic Publishing, 7(1). http://www.press.umich.edu/jep/07-01/bergman.html

“Traditional search engines can not "see" or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden.
The deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a "one at a time" laborious way to search. BrightPlanet's search technology automates the process of making dozens of direct queries simultaneously using multiple-thread technology and thus is the only search technology, so far, that is capable of identifying, retrieving, qualifying, classifying, and organizing both "deep" and "surface" content.”
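
To make the "multiple-thread" direct querying idea concrete for myself, here is a rough sketch of the general pattern: posting the same query to several searchable databases at once instead of crawling static pages. The source list and form parameter names are completely made up; this is not BrightPlanet's actual technology, just my own toy version of the idea.

```python
# Toy sketch of fanning one query out to several "deep Web" search forms in
# parallel (my own illustration; the endpoints and parameter names are made up).
from concurrent.futures import ThreadPoolExecutor
import urllib.parse
import urllib.request

# Hypothetical searchable databases: (name, search form URL, query parameter name)
SOURCES = [
    ("source-a", "http://example.org/db-a/search", "q"),
    ("source-b", "http://example.net/db-b/query", "term"),
]

def direct_query(source, query, timeout=10):
    """POST one query to one database's search form and return the raw result page."""
    name, url, param = source
    data = urllib.parse.urlencode({param: query}).encode()
    with urllib.request.urlopen(url, data=data, timeout=timeout) as resp:
        return name, resp.read()

def federated_query(query):
    """Send the query to every source at the same time rather than one at a time."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(direct_query, source, query) for source in SOURCES]
        return [future.result() for future in futures]

# results = federated_query("metadata harvesting")
```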

• “Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.
• The deep Web contains 7,500 terabytes of information compared to nineteen terabytes of information in the surface Web.
• The deep Web contains nearly 550 billion individual documents compared to the one billion of the surface Web.
• More than 200,000 deep Web sites presently exist.
• Sixty of the largest deep-Web sites collectively contain about 750 terabytes of information — sufficient by themselves to exceed the size of the surface Web forty times.
• On average, deep Web sites receive fifty per cent greater monthly traffic than surface sites and are more highly linked to than surface sites; however, the typical (median) deep Web site is not well known to the Internet-searching public.
• The deep Web is the largest growing category of new information on the Internet.
• Deep Web sites tend to be narrower, with deeper content, than conventional surface sites.
• Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web.
• Deep Web content is highly relevant to every information need, market, and domain.
• More than half of the deep Web content resides in topic-specific databases.
• A full ninety-five per cent of the deep Web is publicly accessible information — not subject to fees or subscriptions.”

“It has been said that what cannot be seen cannot be defined, and what is not defined cannot be understood. Such has been the case with the importance of databases to the information content of the Web. And such has been the case with a lack of appreciation for how the older model of crawling static Web pages — today's paradigm for conventional search engines — no longer applies to the information content of the Internet.”

“The sixty known, largest deep Web sites contain data of about 750 terabytes (HTML-included basis) or roughly forty times the size of the known surface Web. These sites appear in a broad array of domains from science to law to images and commerce. We estimate the total number of records or documents within this group to be about eighty-five billion.

Roughly two-thirds of these sites are public ones, representing about 90% of the content available within this group of sixty. The absolutely massive size of the largest sites shown also illustrates the universal power function distribution of sites within the deep Web, not dissimilar to Web site popularity or surface Web sites. One implication of this type of distribution is that there is no real upper size boundary to which sites may grow.”
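
Side note to myself (not from the article): the "power function distribution" he mentions is just a power law, which is why a few enormous sites hold most of the content and there is no natural cap on how big a site can get.

```latex
% My own illustrative note, not Bergman's: under a power-law size distribution,
% the fraction of deep Web sites larger than some size s decays polynomially,
% so a handful of very large sites dominate and no characteristic maximum exists.
P(S > s) \propto s^{-\alpha}, \qquad \alpha > 0
```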

“Directed query technology is the only means to integrate deep and surface Web information. The information retrieval answer has to involve both "mega" searching of appropriate deep Web sites and "meta" searching of surface Web search engines to overcome their coverage problem. Client-side tools are not universally acceptable because of the need to download the tool and issue effective queries to it. Pre-assembled storehouses for selected content are also possible, but will not be satisfactory for all information requests and needs. Specific vertical market services are already evolving to partially address these challenges. These will likely need to be supplemented with a persistent query system customizable by the user that would set the queries, search sites, filters, and schedules for repeated queries.”
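
The "persistent query system customizable by the user" at the end of that passage is easiest for me to picture as a saved-query record. The sketch below is entirely my own invention (all of the field names are hypothetical); it just captures the four things the passage says the user would set: queries, search sites, filters, and schedules.

```python
# Sketch of a user-customizable "persistent query" record of the kind the
# article describes (entirely my own invention; the field names are hypothetical).
from dataclasses import dataclass, field
from typing import List

@dataclass
class PersistentQuery:
    query: str                                         # the search terms to re-run
    sites: List[str]                                   # which deep/surface sources to hit
    filters: List[str] = field(default_factory=list)   # e.g. date or language limits
    schedule: str = "weekly"                           # how often to repeat the query

# Example saved query:
watchlist = PersistentQuery(
    query="OAI-PMH metadata harvesting",
    sites=["http://example.org/db-a/search", "surface-web metasearch"],
    filters=["lang:en"],
    schedule="daily",
)
```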

Wednesday, November 18, 2009

Week 10 reading notes

Mischo, W. (2005, July/August). Digital libraries: Challenges and influential work. D-Lib Magazine, 11(7/8). http://www.dlib.org/dlib/july05/mischo/07mischo.html

“Effective search and discovery over open and hidden digital resources on the Internet remains a problematic and challenging task. The difficulties are exacerbated by today's greatly distributed scholarly information landscape. This distributed information environment is populated by silos of: full-text repositories maintained by commercial and professional society publishers; preprint servers and Open Archive Initiative (OAI) provider sites; specialized Abstracting and Indexing (A & I) services; publisher and vendor vertical portals; local, regional, and national online catalogs; Web search and metasearch engines; local e-resource registries and digital content databases; campus institutional repository systems; and learning management systems.”

“For years, information providers have focused on developing mechanisms to transform the myriad distributed digital collections into true "digital libraries" with the essential services that are required to make these digital libraries useful to and productive for users. As Lynch and others have pointed out, there is a huge difference between providing access to discrete sets of digital collections and providing digital library services (Lynch, 2002). To address these concerns, information providers have designed enhanced gateway and navigation services on the interface side and also introduced federation mechanisms to assist users through the distributed, heterogeneous information environment. The mantra has been: aggregate, virtually collocate, and federate. The goal of seamless federation across distributed, heterogeneous resources remains the holy grail of digital library work.”

Paepcke, A., et al. (2005, July/August). Dewey meets Turing: Librarians, computer scientists, and the Digital Libraries Initiative. D-Lib Magazine, 11(7/8). http://www.dlib.org/dlib/july05/paepcke/07paepcke.html

“In 1994 the National Science Foundation launched its Digital Libraries Initiative (DLI). The choice of combining the word digital with library immediately defined three interested parties: librarians, computer scientists, and publishers. The eventual impact of the Initiative reached far beyond these three groups. The Google search engine emerged from the funded work and has changed working styles for virtually all professions and private activities that involve a computer.”

“For computer scientists NSF's DL Initiative provided a framework for exciting new work that was to be informed by the centuries-old discipline and values of librarianship. The scientists had been trained to use libraries since their years of secondary education. They could see, or at least imagine how current library functions would be moved forward by an injection of computing insight.

Digital library projects were for many computer scientists the perfect relief from the tension between conducting 'pure' research and impacting day-to-day society. Computing sciences are called on to continually generate novelty. On the other hand, they experience both their own desire, as well as funders' calls for deep impact on society and neighboring scientific fields. Work on digital libraries promised a perfect resolution of that tension.”

“For librarians the new Initiative was promising from two perspectives. They had observed over the years that the natural sciences were beneficiaries of large grants, while library operations were much more difficult to fund and maintain. The Initiative would finally be a conduit for much needed funds.

Aside from the monetary issues, librarians who involved themselves in the Initiative understood that information technologies were indeed important to ensure libraries' continued impact on scholarly work. Obvious opportunities lay in novel search capabilities, holdings management, and instant access. Online Public Access Catalogs (OPACS) constituted the entirety of digital facilities for many libraries. The partnership with computer science would contribute the expertise that was not yet widely available in the library community.”

“The coalition between the computing and library communities had been anchored in a tacit understanding that even in the 'new' world there would be coherent collections that one would operate on to search, organize, and browse. The collections would include multiple media; they would be larger than current holdings; and access methods would change. But the scene would still include information consumers, producers, and collections. Some strutting computer scientists predicted the end of collection gatekeeping and mediation between collections and their consumers; librarians in response clarified for their sometimes naive computing partners just how much key information is revealed in a reference interview. But other than these maybe occasionally testy exchanges, the common vision of better and more complete holdings prevailed.

The Web not only blurred the distinction between consumers and producers of information, but it dispersed most items that in the aggregate should have been collections across the world and under diverse ownership. This change undermined the common ground that had brought the two disciplines together.”

Lynch, Clifford A. "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age" ARL, no. 226 (February 2003): 1-7.

The link provided in the syllabus didn’t work, so I had to look up the article through the Pitt Library search - https://sremote.pitt.edu/bm~doc/,DanaInfo=www.arl.org+br226ir.pdf

“The development of institutional repositories emerged as a new strategy that allows universities to apply serious, systematic leverage to accelerate changes taking place in scholarship and scholarly communication, both moving beyond their historic relatively passive role of supporting established publishers in modernizing scholarly publishing through the licensing of digital content, and also scaling up beyond ad-hoc alliances, partnerships, and support arrangements with a few select faculty pioneers exploring more transformative new uses of the digital medium.”

“In my view, a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”

“At the most basic and fundamental level, an institutional repository is a recognition that the intellectual life and scholarship of our universities will increasingly be represented, documented, and shared in digital form, and that a primary responsibility of our universities is to exercise stewardship over these riches: both to make them available and to preserve them. An institutional repository is the means by which our universities will address this responsibility both to the members of their communities and to the public. It is a new channel for structuring the university's contribution to the broader world, and as such invites policy and cultural reassessment of this relationship.”

“To summarize, institutional repositories can facilitate greatly enhanced access to traditional scholarly content by empowering faculty to effectively use the new dissemination capabilities offered by the network.”

“An institutional repository can fail over time for many reasons: policy (for example, the institution chooses to stop funding it), management failure or incompetence, or technical problems. Any of these failures can result in the disruption of access, or worse, total and permanent loss of material stored in the institutional repository. As we think about institutional repositories today, there is much less redundancy than we have had in our systems of print publication and libraries, so any single institutional failure can cause more damage.”

“I believe that institutional repositories will promote progress in the development and deployment of infrastructure standards in a variety of difficult or neglected areas…
-Preservable Formats
-Identifiers
-Rights Documentation and Management”