Digital Library Technical Infrastructure Task Force
Building the Digital Library Environment at the University of Kansas Lawrence
Working Document, ver. 1.0
Report to the Digital Library Executive Group
November 10, 2000

Rick Clement
Wes Hubert
John Miller
Jerry Niebaum
Beth Forrest Warner

Table of Contents
Executive Summary
Introduction
Conceptual Foundations of the KU Digital Library
Implementation of the KU Digital Library
Strategies
Functions
Roles and Responsibilities
Architecture Components and Standards
Component Selection
Strategic Components
Local Repositories
Resource Naming Services
Object Classes and Services
Standards for Content and Metadata
Archiving
Interface Considerations
Interoperability Considerations
Conceptual Architecture Model
Supported Software Tools
Support Issues
Project Selection and Prioritization
Selection Guidelines
Process
On-going Services Evaluation
Recommendations
Appendix A: Charge to the Task Force
Appendix B: KU Digital Library Mission and Goals
Appendix C: Object Classes & Behaviors Background Information
Appendix D: Textual Markup Languages Background Information
Appendix E: Tools Background Information
Appendix F: Archiving Background Information
Appendix G: User Interaction Scenarios

Table of Figures
Figure 1: Digital Library Functions
Figure 2: Strategic Components Diagram
Figure 3: Conceptual Architecture Model

Acknowledgements / Disclaimer
While many websites and articles were reviewed, the Task Force would especially like to acknowledge the following institutions for providing information that helped shape the discussions of the Task Force and this document:
California Digital Library
Digital Library Federation
Harvard University
Library of Congress
University of Arizona
University of California-Berkeley
University of Michigan
Thank you. Any mistakes or misinterpretations of information are solely the responsibility of the Task Force Report authors.

Executive Summary
The meaning of the term "digital library" is less transparent than one might expect. The University of Kansas Digital Library Executive Group and Digital Library Advisory Group were formed in 1999 to begin the process of defining and shaping the concept of the digital library in the KU environment. As KU prepares to step into the digital library arena, it is helpful to shape a common understanding of the concept by referring to our Digital Library Vision, Mission, and Principles statements, available on the KU Digital Library Initiatives website (http://kudiglib.ukans.edu).

In May 2000, the Digital Library Executive Group formed the Digital Library Technical Infrastructure Task Force. This report is a result of Task Force discussions on the technical issues involved in creating a digital library environment for the University of Kansas. Given the volatile nature of technology in general, and digital libraries in particular, this report should be viewed as a framework for technical implementation. Key points from the report are summarized below.

Roles and Responsibilities
There are many roles in a digital library program, and there is seldom a one-to-one relationship between these roles and individual people.
These roles fall into several broad categories, including Management, Requirements Analysis & Design, Core Technical Support, User Support, and Legal and Policy Issues Support. The digital library team must have a balance of skills across a variety of roles, with individual staff members often wearing several hats at once. Each of these roles is critical to the success of the initiative.

Architectural Components and Standards
One of the hallmarks of a library is the ability to provide coherence and context for access to disparate collections of information resources. This is a critical principle to carry forward into the digital environment, and it is the characteristic that distinguishes digital libraries from simple collections of links to electronic objects. Several factors are critical to providing a coherent, contextual environment for digital resources:
Use of standards for creating objects
Use of standards for describing resources (for access)
A common methodology for access to object types
An understanding of object behaviors and user interactions

A successful digital library environment is not an isolated, self-sufficient entity that exists and operates apart from the rest of the information and technology environment of the institution. It is critically dependent on, and must work within, the resources and decisions made in many areas, including networking, computing support, information resource support, and information technology policies. Access to and use of information resources for the University depends upon a solid foundation that provides an exceptional network and computing services infrastructure.
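The report does not say how object classes would be implemented. As an illustration only, the following Python sketch shows what "a common methodology for access to object types" can mean in practice: every object class shares one interface (describe and behaviors), so a single access routine can handle any class. The class names TextObject and ImageObject, their behavior lists, and identifiers such as ku:text/0001 are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class DigitalObject(ABC):
    """Hypothetical base class: every object class shares identity and metadata."""
    identifier: str
    metadata: dict = field(default_factory=dict)

    @abstractmethod
    def behaviors(self) -> list[str]:
        """Names of the user interactions this class supports."""

    def describe(self) -> dict:
        # Common behavior for all classes: expose descriptive metadata for access.
        return {"identifier": self.identifier, **self.metadata}


@dataclass
class TextObject(DigitalObject):
    # Hypothetical behaviors for a text class.
    def behaviors(self) -> list[str]:
        return ["view page", "search full text"]


@dataclass
class ImageObject(DigitalObject):
    # Hypothetical behaviors for an image class.
    def behaviors(self) -> list[str]:
        return ["view thumbnail", "zoom", "view full size"]


def summarize(objects: list[DigitalObject]) -> list[str]:
    # One access routine works for every class because they share an interface.
    return [f"{o.describe()['identifier']}: {', '.join(o.behaviors())}" for o in objects]


if __name__ == "__main__":
    items = [TextObject("ku:text/0001", {"title": "Annual report"}),
             ImageObject("ku:image/0042", {"title": "Campus map"})]
    for line in summarize(items):
        print(line)
```

Adding a new object class then means implementing the shared interface, not writing new access code for each collection.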
A key factor in providing a coherent, unified environment is the use of standards, whenever possible, for the many different aspects of storing and accessing digital information, including standards for:
interoperability
data format
resource identification
resource description
data archiving

Building upon this basic institutional infrastructure, a successful digital library environment should provide additional architectural components, including:

Local Repositories
The conceptual model of a repository as a set of services and related facilities may be defined to include:
A datastore containing digital objects (content and metadata) created by, or under the auspices of, the KU-DLI
Services necessary to satisfy requests for objects residing in the datastore smoothly
Services for effective long-term management of objects in the datastore

Datastore facilities and services may be either centralized or distributed, with a hybrid approach being the most likely and practical. Not only remote resources but, potentially, local resources hosted on non-DL servers may be included as well. While these may be included from an access perspective only, they should meet certain standards for persistent access before being placed under the KU-DL umbrella.

Resource Naming Services and Standards
The impermanence of object names over time is both a hallmark of the networked environment and the bane of efforts to create persistent access to digital resources. To facilitate the use of digital object names within the KU digital library environment, the KU-DLI should:
adopt the concept of name resolution services
develop and provide a name resolution server for the campus
develop and provide a set of services that permit the KU community to create and manage names for their digital objects
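The name resolution concept described above can be sketched in a few lines. This is a toy, in-memory Python model under stated assumptions, not a model of any particular resolution scheme: names take the hypothetical form "authority/local-id", only registered naming authorities may create names, and an object's location can change while its name stays the same. The class NameResolver, the authority "ku.maps", and the example.edu URLs are all hypothetical.

```python
class NameResolver:
    """Toy, in-memory name resolution service (hypothetical; illustration only)."""

    def __init__(self) -> None:
        self._authorities: set[str] = set()
        self._locations: dict[str, str] = {}

    def register_authority(self, authority: str) -> None:
        # A naming authority (e.g., a campus unit) owns a namespace prefix.
        self._authorities.add(authority)

    def create_name(self, authority: str, local_id: str, location: str) -> str:
        if authority not in self._authorities:
            raise KeyError(f"unknown naming authority: {authority}")
        name = f"{authority}/{local_id}"
        if name in self._locations:
            raise ValueError(f"name already assigned: {name}")
        self._locations[name] = location
        return name

    def update_location(self, name: str, new_location: str) -> None:
        # Objects can move between servers; the persistent name stays the same.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_location

    def resolve(self, name: str) -> str:
        return self._locations[name]


if __name__ == "__main__":
    resolver = NameResolver()
    resolver.register_authority("ku.maps")
    name = resolver.create_name("ku.maps", "1856-03", "http://server1.example.edu/maps/1856-03.tif")
    resolver.update_location(name, "http://server2.example.edu/archive/1856-03.tif")
    print(name, "->", resolver.resolve(name))
```

The design point is that users and citations refer only to the name; when an object moves, only the resolver's entry changes, which is why the naming authority's responsibility for keeping entries valid matters.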
In addition to developing naming conventions, the KU-DLI must develop policies and processes to create organizational naming authorities and to outline the responsibilities for maintaining the validity of names over time.

Content and Metadata Standards
Standards in the creation of objects and metadata allow common storage, access, and management processes to be used and economies of scale to be realized for the institution. The use of recommended standards should be required for objects and metadata created locally under the auspices of the KU Digital Library and stored in the central object / metadata repositories. These same standards should be strongly encouraged for objects and metadata created outside the control or coordination of the KU-DL and stored / accessed remotely (either on or off campus). Objects not adhering to published KU-DLI standards should be accepted for long-term management only under exceptional circumstances. Specific content and metadata standards are outlined in detail in the report.

Archiving
The mission of digital archiving initiatives is to preserve the integrity of objects and ensure their persistence. This seemingly simple statement, however, raises a wide range of questions, most of which focus on standards, responsibility for archiving, technical strategies, the connection between archiving and access, and economic models. Preservation in the digital world is not exclusively a matter of the longevity of storage media; the viability of digitized files depends much more on the life expectancy of the access system. Institutions must prepare to migrate digitized resources from one generation of technology to subsequent generations.
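One common technique for checking that migration has preserved the integrity of objects is fixity checking: record a cryptographic digest of each object at deposit and compare it after every copy or migration. The Python sketch below illustrates that technique with SHA-256. It is an assumption-based illustration, not a method the report prescribes, and it covers only bit-level integrity, not the format and access-system obsolescence the report also raises. The function names and sample object names are hypothetical.

```python
import hashlib


def build_manifest(store: dict[str, bytes]) -> dict[str, str]:
    """Record a SHA-256 digest for each object at deposit time."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in store.items()}


def verify(store: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return names of objects that are missing or no longer match the manifest."""
    problems = []
    for name, digest in manifest.items():
        data = store.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != digest:
            problems.append(name)
    return sorted(problems)


if __name__ == "__main__":
    # Hypothetical objects keyed by persistent name.
    original = {"ku.maps/1856-03": b"TIFF bytes ...", "ku.text/0001": b"<TEI>...</TEI>"}
    manifest = build_manifest(original)
    migrated = dict(original)
    migrated["ku.text/0001"] = b"<TEI>..</TEI>"  # simulate damage during migration
    print(verify(migrated, manifest))  # ['ku.text/0001']
```

Running the same verification on a schedule, not only at migration time, also catches silent media decay.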
The use of digital technologies from a preservation perspective requires a deep and longstanding institutional commitment to long-term access, the full integration of the technology into information management procedures and processes, and significant leadership in developing appropriate definitions and standards for digital preservation.

Services and Policies
Tying the various individual components together are the services offered through the digital library environment. These are the tools and processes whose ultimate goal is to provide accurate, seamless, and "transparent" access across the various repositories and systems for the user community. Services and policies initially provided for the digital library environment should include, for example:
object repository registration processes and guidelines
metadata advisory and creation processes and guidelines
naming convention guidelines and services
object creation advisory and creation processes and guidelines
common navigation processes
common tool sets

How these technical, service, and policy components come together as a coherent whole is illustrated by the following diagram of a conceptual architectural model for the digital library environment:

Selection and Prioritization
The number of meritorious projects proposed will always far outweigh the resources available to address them. To ensure that resources are invested wisely in digitizing and managing the most significant and useful materials at the lowest possible cost, without placing the institution at legal or social risk, a mechanism for selecting projects and assigning them priority needs to be developed at the outset, in consultation with the primary stakeholders.
The following guidelines are proposed as an initial set of selection criteria for projects being considered for implementation within the KU-DLI. The project:
is consistent with the mission and goals of the institution and fits within the strategic focus of the KU-DLI
has clearly defined goals and outcomes
responds to known campus needs
is collaborative in nature, leverages resources, and supports partnerships on campus or with other institutions, groups, or individuals with similar interests
shows a positive cost/benefit analysis over the short and long term
takes advantage of available outside funding, or positions the KU-DLI to obtain outside funding
enhances the diversity of resources available or the audience served, within the supported technical infrastructure and standards
facilitates innovation in teaching and the curriculum
facilitates innovation in research
facilitates innovation in new modes of scholarly communication
facilitates creative, innovative, and interesting concepts and approaches within the technical architecture framework
utilizes or provides the opportunity to build on hardware and software solutions existing within the University
preserves and enables continuing access to significant rare or unique collections
supports selection and/or creation of information products in disciplines or sub-disciplines where KU is recognized as a leader or is pre-eminent [note: this would not be determined solely by rankings]
enhances curricular development in emerging areas (e.g., indigenous studies)
maximizes economies of scale
uses the University's fiscal, human, and infrastructure (space, hardware, etc.) resources effectively
is consistent with KU Libraries' scholarly communications and collection development and management principles
enhances the capability of KU's systems to support user self-sufficiency
benefits a significant number of users
follows the technical architecture parameters of the KU-DLI and provides digital objects that meet the highest technical standards the institution can afford

To apply these guidelines / criteria to selecting and prioritizing projects, a selection process must be established. This process should include an objective method of applying the criteria. In addition, it should involve a number of campus stakeholders, including the Digital Library Executive Group and the Digital Library Advisory Group, in both developing and approving the process. Selection and prioritization will need to follow somewhat different criteria and process paths depending on the source of the materials and the parties involved.

Implications for Resources
As with any project, the vision can quickly overwhelm the resources available. To create a successful digital library environment for the University, it is critical that the initial scope of the initiative be carefully defined in order to make the best use of the resources available.

Recommendations
Based on the background information and discussion presented in this report, the Digital Library Technical Infrastructure Task Force recommends the following actions to facilitate the establishment of a robust digital library environment for the University of Kansas Lawrence campus:
Adopt the Principles statement as the basis for central Digital Library support for the University of Kansas Lawrence campus.
Adopt the Implementation Strategies for the Digital Library Goals as the initial implementation framework for the DLI.
Adopt the conceptual architecture model for the KU-DL.
Recognize that implementation of this model will require a phased, iterative approach whose dimensions will be determined by the scope of the KU-DLI.
Adopt the concept of local repositories for the storage and management of local digital resources.
Appoint a group to develop a detailed plan and bring it to the DLEG for approval.
Adopt the concept of Name Resolution Services for the naming of and access to resources in the digital library.
Appoint a group to select a name resolution service scheme and develop a detailed implementation plan.
Adopt the concept of Object Classes and Services as a methodology for standardizing the treatment of and interaction with digital library resources.
Adopt the guidelines and process for component/service/tool selection for the DLI.
Adopt the concept of a software toolkit and support levels and the criteria for inclusion.
Specific recommendations/phasing will be determined by the scope of the KU-DLI and by decisions on other recommendations in this report.
Adopt the basic recommendations for content format standards for locally produced objects and commercially purchased/licensed resources, whenever possible.
Adopt the basic recommendations for metadata standards for resources to be included in the KU-DL.
Appoint a group to select/develop metadata tagset definitions (minimal levels at least) and crosswalks, and procedures for creating and maintaining KU-DL metadata.
Determine the scope of KU's archiving / preservation commitments for digital resources.
Adopt the project selection guidelines and process.
Appoint a group to develop detailed procedures for submission and selection of projects.
Select initial project(s) for implementation.
Following selection, appoint a group to develop detailed task plans, timelines, and resource needs.
Approve the use of existing, and/or purchase of new, equipment and software for the initial DL implementation once the DLI scope and initial projects are determined.
Obtain funding for start-up costs.
Appoint a group to develop service evaluation guidelines and process as the DLI progresses.

Introduction
The meaning of the term "digital library" is less transparent than one might expect. The words conjure up images of cutting-edge computer and information science research. They are invoked to describe what some assert to be radically new kinds of practices for the management and use of information. And they are used to replace earlier references to "electronic" and "virtual" libraries.

In order to help shape their common understanding of the concept, members of the Digital Library Federation crafted a working definition of digital libraries:

Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.

This is a full definition by any measure, and a good working definition because it is broad enough to comprehend other uses of the term. [...] other definitions focus on one or more of the features included in the DLF definition, while ignoring or de-emphasizing the rest. For example, the term "digital library" may refer simply to the notion of collection, without reference to its organization, intellectual accessibility, or service attributes. This is the particular sense that seems to be in play when we hear the World Wide Web described as a digital library. But the words might refer as well to the organization underlying the collection, or, even more specifically, to the computer-based system in which the collection resides. The latter sense is most clearly in use in the National Science Foundation's Digital Library Initiative.
Yet again, institutions may be characterized as digital libraries to distinguish them from digital archives when the intent is to call attention to the differences in the nature of their collections.

While the concept of digitized resources in libraries has been in evidence for several decades, the current use of the term "digital library" stems from the federally funded (NSF/ARPA/NASA) Digital Libraries Initiative in 1994. Since then, many excellent digital library efforts have sprung up across the country, and around the world, that have helped refine the concepts, practices, systems, and tools that go into building a digital library.

The University of Kansas is now preparing to step into the digital library arena. Commercially produced digital resources have been made available through the Library for a number of years. An increasing appetite within the university community for access to more digital resources, and an interest in digitizing locally held materials, have highlighted the need for a cooperative approach to designing and supporting a coherent digital library environment for the University. In response to this need, the Digital Library Executive Group and the Digital Library Advisory Group were formed in 1999.

In May 2000, the Digital Library Executive Group formed the Digital Library Technical Infrastructure Task Force with the following charge:

The Digital Library Technical Infrastructure Task Force is charged to develop specific recommendations and plans for the KU Digital Library Technical Infrastructure (the collection of common systems and services that make it possible to store, organize, and access digital materials):
Develop architectural principles and standards for KU shared digital collections. These should be consistent with relevant industry, Kansas Digital Library, and other University standards, provide a framework that facilitates the creation of integrated systems, and provide the flexibility to foster innovations in scholarly communication. After DLEG approval, submit to the State Architecture process for consideration.
Identify appropriate elements for the development of an integrated digital library system which accommodates all formats: metadata, full-text, images, numeric data, geospatial data, etc.
Develop principles and specifications for the identification, evaluation, selection, and implementation of online tools and services for sharing, accessing, manipulating, integrating, and archiving electronic scholarly content in all forms.
Recommend an appropriate strategy for moving campus-developed tools to the KU Digital Library.
Recommend a process for prioritizing services to be implemented.
Develop principles and guidelines for the type and frequency of evaluation of tools and services, both before and after they are implemented.
The Task Force will present their report and work plan to the DLEG by September 1, 2000. The plan should present specific recommendations, task plans, and timelines for accomplishment.

This report is a result of Task Force discussions on the technical issues involved in creating a digital library environment for the University of Kansas and draws on draft Digital Library Executive Group documents defining the overall programmatic direction for the KU Digital Library. While the Task Force focused its efforts on the Lawrence campus, the Initiatives will be coordinated with the KU Medical Campus whenever appropriate. Given the volatile nature of technology in general and digital libraries in particular, this report should be viewed as a framework for technical implementation.
While specific technical recommendations are based on current industry standards and best practices, these will undoubtedly evolve as the industry and the initiative mature.

Conceptual Foundations of the KU Digital Library
Although this Task Force concentrated primarily on defining the technical aspects of KU's Digital Library, it is critical to have a shared understanding of the basic philosophical and conceptual foundations upon which these technical discussions and recommendations are based. These foundation documents are available on the KU Digital Library Initiatives website (http://kudiglib.ukans.edu).

Implementation of the KU Digital Library

Strategies
The goals of the KU Digital Library Initiatives (KU-DLI) will be achieved through a variety of implementation strategies. The KU-DLI goals are restated below, with specific implementation strategies outlined for each.

Goal 1: Develop digital collections, expanding over time in number and scope, created from the conversion to digital form of documents contained in our and other libraries and archives, and from the incorporation of holdings already in electronic form.
Identify needs and initiate projects to acquire or create appropriate digital resources
Identify existing digital projects / resources on campus and incorporate them into the framework of the Digital Library to the extent possible
Encourage the capture and centralized management of primary university scholarly data, including resources born digitally. As appropriate, manage resources in conjunction with other University programs, such as the Data Warehousing Initiative and records management efforts.
Implement and/or develop scalable systems for handling classes of materials
Provide advisory services and training for campus users in standards-based methods and procedures for creating and managing digital materials
Provide a means for identification of resources associated with the KU-DLI

Goal 2: Establish a collaborative management structure to coordinate and guide the implementation and ongoing maintenance of the digital library collections; to set policy regarding participation, funding, development and access; to encourage and facilitate broad involvement; and to address issues of policy and practice that may inhibit full citizen access.
Provide a management focus and technical infrastructure for digital information projects on campus
Provide central coordinating and management services for campus projects
Provide and encourage use of centralized server and storage services
Provide and encourage use of centrally supported developer tools and services
Work with the campus community to identify and prioritize project areas for digital collections
Actively seek partnerships with campus units and departments, the University Press of Kansas, and regional institutions, centers, and organizations and, in doing so, market IS's expertise and information resources
Encourage greater interoperability and integration of digital collections across campus and within the state, consistent with national and international initiatives
Initiate and participate in consortial activities as appropriate
Provide universal and open access to the campus and beyond to the extent technically feasible and within appropriate licensing provisions
Develop policies and guidelines, as appropriate, for the management of and access to the Digital Library

Goal 3: Develop a funding strategy that addresses the need for support from both public and private sources to provide the means to launch new initiatives.
Advocate investment consistent with mission:
adequate staff for managing and supporting the KU-DLI
funding for collections, both acquired and created
adequate technical infrastructure funding and support
Develop economic models to support a sustainable digital library environment for the campus
Explore, identify, and secure external resources (e.g., funding) and expertise to support the Digital Library:
work with various campus funding and fund-raising groups, as appropriate, to secure contributions
pursue grant opportunities

Goal 4: Form selection guidelines that will accommodate local initiatives and projects; and ensure that the digital library collections comprise a significant corpus of materials. Provide for coordination with the Kansas Digital Library.
Develop criteria and processes for project and resource selection.
Work in cooperation with KUMC and the Kansas Digital Library to shape standards, processes, and initiatives that meet the needs of both the University and the state as a whole.

Goal 5: Adopt common standards and best practices to ensure full informational capture; guarantee universal accessibility and interchangeability; simplify retrieval and navigation; and facilitate archiving and enduring access.
Provide guidance for organization of campus electronic information resources, especially in support of distance learning
Incorporate appropriate tools and practices from the larger technical environment for creating, managing, accessing, and analyzing content
Emphasize and encourage the use of standards and/or best practices, as appropriate, for the creation and management of digital resources
Provide a common interface to the KU-DL to the extent it is technically feasible and legally possible
Apply appropriate technical standards consistent with established and emerging standards and best practices (e.g., metadata, file formats, search protocols, networking/telecommunications) for compatibility and interoperability (e.g., minimize the number of interfaces, software applications, etc.)
Coordinate the use of appropriate technical standards and best practices with KUMC and the Kansas Digital Library
Provide a secure environment for access (I/A/A) and for resources
Facilitate long-term access to digital collections by developing and maintaining a content migration / preservation strategy
Provide support facilities for resource management and tools:
Recommend centralized server support
Provide guidelines for distributed local servers that participate as resource repositories
Provide basic support for a recommended DL toolkit, conversion services, and metadata creation and management

Goal 6: Establish an ongoing and comprehensive evaluation program.
Monitor and evaluate national and international efforts toward developing criteria for digital library service evaluation
Develop evaluation criteria appropriate for assessment by the University and external funding agencies
Work with the University community through the Digital Library Advisory Group, campus surveys, usage monitoring, etc. to evaluate the success of the KU-DLI

Functions
Looking at digital library definitions broadly, the digital library can be distilled into three essential aspects:
Selected and managed digital collections
Schema for organization and access
Supporting infrastructure and services
It is instructive to extrapolate these aspects into their basic functional components: Access, Mediation, Collection Development, and Archiving.

Figure 1: Digital Library Functions

By building upon these functional foundations, it becomes possible to further define the roles and responsibilities and the technical infrastructure needed to develop a coherent digital library environment for the campus.

Roles and Responsibilities
There are many roles in a digital library program, and there is seldom a one-to-one relationship between these roles and individual people.
These roles fall into several broad categories, including Management, Requirements Analysis & Design, Core Technical Support, User Support, and Legal and Policy Issues Support. The digital library team must have a balance of skills across a variety of roles, with individual staff members often wearing several hats at once. Each of these roles is critical to the success of the initiative.

Management

User sponsorship
User sponsors are usually project initiators. Their role is to help make, and then support, key project scope decisions. The user sponsor(s) will work closely with program coordinator(s) to gain support and resources for the project.

Program coordination
The program coordination role is to work closely with the user sponsor(s) to achieve success. Primary responsibilities of program coordination include:
educating University leadership and the campus community at large about the application and impacts of digital libraries
gaining economic support for the program
leading the process of identifying and prioritizing applications
defining budgets and schedules
working with project managers to ensure project success
monitoring industry trends and identifying emerging technologies and standards that should be adopted
representing the KU-DLI to the broader library, IT, and academic communities
promoting the work and resources of the DL to the University and beyond

Project management
Project management is responsible for day-to-day direction of project tasks and activities, including resource coordination, status tracking, and communication of project progress and issues, working closely with the staff and users involved in the project.
Project management responsibilities include:
create, manage, and adjust project plans
define the overall architecture and set standards
evaluate and select hardware platforms
evaluate and select networking facilities
evaluate and select middleware

Requirements Analysis & Design
User requirements analysis
This role is responsible for leading user requirements definition activities and then representing those requirements as the digital library environment is developed. During the scope phase of the project, the user requirements analyst's role is to collect, consolidate, organize, and prioritize the needs and problems the user community presents. The objective is to create a set of requirements that ensures the project is a success from the users' perspective, not just a technical success.
Technical architecture
This role is responsible for designing and overseeing the technical infrastructure and security strategy that support the digital library, and for providing the overall cohesiveness that ensures the components will fit together. Close coordination with staff who manage other IT infrastructures is important.
Content specialization / source analysis
The content specialist / source data analyst role is responsible for reviewing source objects in a variety of formats (digital or non-digital) and from various sources (locally created/maintained, remote access), and recommending appropriate conversion and/or integration processes. The analyst also works with end-user application specialists to develop tools and access systems.
Metadata development / management
Metadata modeling and management is the process of determining the metadata element requirements and the processes for collecting, organizing, and maintaining the metadata. This role is involved in developing / acquiring the digital library's metadata management system(s).
Repository design and management
This role is responsible for designing, implementing, and maintaining standards and procedures for metadata and digital object repositories. Responsibilities include development of object naming conventions, deposit procedures, metadata registry procedures, and migration/archiving methods.

Core Technical Support
Conversion/integration format expertise
The format expertise role is responsible for working with the content analyst(s) to convert and/or integrate objects into the digital library. Personnel in this role are expert in the appropriate format standards and in standards/best practices for creating and processing content.
Quality assurance
The quality assurance (QA) analyst ensures that the data loaded into the digital library is accurate. This role identifies and resolves potential data errors, and performs all QA tasks necessary for application development.
Database administration and physical database design
Database administration is the function of applying formal guidelines and tools to manage the information resource. The database administrator (DBA) is often responsible for day-to-day operational support of the database, ensuring data integrity, database availability, and performance. This role can be split into design and production roles.
The DBA typically performs these tasks:
creates, and modifies as necessary, the physical database structure
evaluates and selects database management software
runs load scripts to handle database loading
monitors query and database performance, and query repetitions
tunes the database for performance by analyzing response-time problems and how the database structure can be modified to make it run faster
performs backup and restore operations as necessary
administers user access and security
monitors database capacity
creates proactive monitoring and preventive action systems to avoid outages
Programming - middleware, applications support
The middleware applications and desktop tools used for querying, reporting, online analytical processing, or object manipulation are as important as the database itself. This role implements / creates and maintains these types of end-user applications and tools. This role must:
determine which of several different implementation strategies makes the most sense in a specific environment, and why
evaluate software
follow design and specification guidelines
develop, test, and document applications
Dataloading
Programmers are needed to construct and automate the data staging and load processes. Primary responsibilities can include:
developing and implementing acquired data re-engineering software
programming data acquisition and transformation processes
developing and documenting test plans
automating the load process
maintaining, updating, and monitoring acquisition and loading
searching for causes of incompatibility
Operations support / backup
This role provides basic 24x7 system support and system and data backup services. Personnel in this role are responsible for initial problem response / resolution and for referring more complex system problems as appropriate. They are also responsible for implementing backup schedules and recovery procedures.
R&D, testing, experimentation
This role is responsible for researching, testing, documenting, demonstrating, and proposing new technologies for potential application by the DLI.
Training
Staff must be educated on the new technologies, standards, and procedures, as well as existing system capabilities, data content, and end user applications. This role typically develops initial course materials and delivers them on an ongoing basis.

User Support
Consulting / advising
Personnel in this role are knowledgeable in the basics of various end user tools, data conversion standards and processes, metadata systems, etc. and are able to respond to or refer user questions as appropriate.
Training
End users must be educated on the system capabilities, data content, and end user applications. This role typically develops the initial education course materials, as well as delivers the education on an ongoing basis.
Technical support
These specialists are involved in early stages of the digital library to perform resource and capacity planning. During product selection, they ensure compatibility with the existing technical environment. Once technology has been selected, they are involved in the installation and configuration of the new components.

Legal and Policy Issues Support
This role is responsible for developing policy and procedures for rights procurement and IP management; documentation; and monitoring and responding to intellectual property concerns and access issues over the life of the program.

Architecture Components and Standards
One of the hallmarks of a library is the ability to provide coherence and context for access to disparate collections of information resources. This is a critical principle to carry forward into the digital environment and a distinguishing characteristic that separates digital libraries from simple collections of links to electronic objects.
There are several critical factors involved in being able to provide a coherent, contextual environment for digital resources:
Use of standards for creating objects
Use of standards for describing resources (for access)
A common methodology for access to object types
An understanding of object behaviors and user interactions

Component Selection
In order to build this environment, a number of components must be evaluated and brought together into an overall architecture. The specific recommendations for components, services, and tools listed in this report should be viewed as a starting point rather than an exhaustive list that restricts what can be done or used over the long term. These initial recommendations are weighted toward those specific areas where standards are already developed or industry best practices have emerged. As the KU-DL grows and the environment matures, these initial recommendations will undoubtedly evolve. However, in general, the primary criteria for including a tool, component, or service in the future should be:
Is there a demonstrated, user-driven need?
Does it support an accepted standard or process within the KU-DL infrastructure that is not already adequately supported?
Is it required to support interoperability with a critical University, state, or regional cooperative effort?
Is adequate support available for its use?
Incremental changes in components and tools should be determined and documented within the general day-to-day management process of the KU-DL, with regular reports to the Digital Library Executive Group. Changes that could affect the integrity or focus of the KU-DL should be approved by the Digital Library Executive Group in consultation with DL managers and technical staff.
Based on strategic considerations, selection guidelines, and current digital library community developments, the initial components needed to build a digital library environment are described below.
Strategic Components
A successful digital library environment is not an isolated, self-sufficient entity that exists and operates apart from the rest of the information and technology environment of the institution. It is critically dependent on, and must work within, the resources and decisions made in many areas, including networking, computing support, information resource support, and information technology policies. A simplified model of the tiers of dependency is shown in Figure 2.
Figure 2. Strategic Components Diagram
Access to and use of information resources for the University depends upon a solid foundation that provides an exceptional network and computing services infrastructure. Two critical examples of this dependency are:
Bandwidth
An excellent networking infrastructure is critical to the support and delivery of digital resources to the user community. Given the realities of the digital resource environment, information resources will be distributed across physical locations rather than centralized. This distributed nature emphasizes the need for a universally high level of network support, since not only will users be accessing resources in a distributed fashion, but the resources themselves will be served from a variety of locations. Without a solid delivery system, even the best set of information resources and discovery tools will frustrate users and appear inadequate.
Access Management
One objective of the Digital Library Initiatives is to make materials available to as wide an audience as possible. However, licensing restrictions or other considerations may require limiting access to some material available through the digital library to users specifically associated with the University. Access management for network-accessible resources has several components: user authentication, user profiling or authorization, and resource-specific access protocols.
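The division of labor among these access-management components, and the privacy rule that usage records must not link a person to the resources they use, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the in-memory CREDENTIALS, PROFILES, and POLICY tables stand in for institution-wide services such as a campus credential check and an LDAP-accessible profile directory, and the resource names are invented. It shows the separation of concerns only, not how credentials should actually be stored or checked.

```python
from typing import Optional

# Hypothetical stand-ins for institution-wide services: a campus credential
# check and an LDAP-style profile directory. Illustration only; real systems
# never store plain-text passwords.
CREDENTIALS = {"jhawk": "s3cret"}
PROFILES = {"jhawk": {"affiliation": "ku"}}

# Hypothetical policy: privileges granted for each profile characteristic.
POLICY = {
    "ku": {"read:public", "read:licensed"},
    "guest": {"read:public"},
}


def authenticate(user: str, password: str) -> Optional[str]:
    """'Are you who you say you are?' Returns the identity, or None."""
    if user in CREDENTIALS and CREDENTIALS[user] == password:
        return user
    return None


def authorize(identity: Optional[str], operation: str) -> bool:
    """'Are you permitted to do what you have requested?'"""
    # Unauthenticated requests fall back to the hypothetical "guest" profile.
    profile = PROFILES.get(identity, {}) if identity is not None else {}
    affiliation = profile.get("affiliation", "guest")
    return operation in POLICY.get(affiliation, set())


def request(user: str, password: str, resource: str, operation: str, usage_log: list) -> bool:
    """Decide a request; the usage log records the resource, never the person."""
    allowed = authorize(authenticate(user, password), operation)
    usage_log.append({"resource": resource, "operation": operation, "allowed": allowed})
    return allowed


log: list = []
print(request("jhawk", "s3cret", "ku-dl:maps/0001", "read:licensed", log))  # True
print(request("visitor", "guess", "ku-dl:maps/0001", "read:licensed", log))  # False
print(log)  # no entry contains a user name
```

Keeping authentication and authorization as separate functions mirrors the recommendation that both be institutional services the Digital Library calls, rather than logic built into the library itself.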
Authentication is the process of validating the user identity associated with a request to perform a given operation on a specified resource. Authorization, on the other hand, is the process that associates a given set of privileges with an authenticated identity. Privileges are generally assigned on the basis of a user profile, made up of characteristics associated with the authenticated identity. In combination, the two processes answer the questions "Are you who you say you are?" and "Are you permitted to do what you have requested?"
Authentication and authorization services should not be specific to the Digital Library technical infrastructure; rather, the Digital Library should make use of mechanisms developed for the institution as a whole. For example, services such as certificates and LDAP-accessible profile directories could be created and maintained at the institutional level and applied in the Digital Library environment.
Network-accessible resources are made available through a set of access protocols and mechanisms (e.g., URLs, cookies, login scripts, session IDs) specific to each resource or resource class. While these mechanisms can enhance use of digital library services, neither they nor authentication/authorization (A/A) services should maintain a record linking an individual's identity to the resources accessed, in order to protect the individual's privacy.
Building upon these basic underpinnings, a solid information services infrastructure is needed. Technical activities for the information infrastructure center on developing a collection of common systems and services that make it possible to reliably store, organize, and access digital objects. The adequacy of the services and support, hardware and software support, standards, tools, training, archiving, etc. will determine how fully the user community is able to use the resources provided.
Once the infrastructure is in place, it must be populated with resources that address the needs of the user community. Resources should be added in such a way that they address long-term access requirements as well as immediate needs.
Finally, tying it all together, applications and environments must be built to provide access to these resources. A variety of access and analysis tools, specialized interfaces for diverse user community needs, and both general access and personalized spaces are all components of this tier.

Local Repositories
The KU Digital Library Initiatives has among its goals the creation of enduring KU-specific digital resources, authored by, available to, and supported by KU faculty, staff, students, and organizational units. Within this context, the conceptual model of a repository as a set of services and related facilities may be defined. These services and facilities include:
A datastore containing digital objects (content and metadata) created by, or under the auspices of, the KU-DLI
Services necessary to satisfy requests for objects residing in the datastore smoothly
Services for effective long-term management of objects in the datastore
Repository services and facilities can be classed into three basic types or layers:
Core services and facilities that are integral to the basic functionality of the digital library
Specialized services and facilities that are not part of the basic functionality of the digital library but are regularly supported as services available to users, most likely for an additional fee
Customized services and facilities that are developed and/or customized for users under an appropriate fee structure for this value-added effort
Services and facilities offered under the KU-DL must satisfy the needs of a recognized segment of the KU-DL user community and should be fairly narrowly defined, at least initially.
These repository services and facilities must be further defined within the business and service models for their support and long-term continuation within the KU environment. However, the common base technical infrastructure should support:
Persistent storage
Persistent access and retrieval
Persistent object names
Wide availability, as appropriate within copyright or other legal constraints
Access control (authentication / authorization, rights management)
Datastore facilities and services may be either centralized or distributed, with a hybrid approach being the most likely and practical. Not only remote resources will be included, but potentially local resources hosted on non-DL servers as well. While these may be included from an access perspective only, certain standards for persistent access should be met for materials to be included under the KU-DL umbrella. Specific policies and guidelines should be developed to address:
Standards for content and metadata for deposited or accessed objects
Standards for naming objects within the KU-DL environment
Requirements for depositing objects within the institutional repository
Requirements for managing these objects
Other considerations will include development of economic models and incentives for participation in the repository by other institutional units.

Resource Naming Services
The impermanence of object names over time is both a hallmark of the Web and the bane of efforts to provide persistent access to digital resources. The current method of discovering and locating resources on the Web relies on allocating an identifier to every resource. At present, these identifiers are primarily Uniform Resource Locators (URLs), which are allocated according to the location of the resource. Although URLs have served the combined purpose of identifying a resource and describing its location for some time now, they are not a satisfactory means of uniquely identifying a digital resource.
The URL simply points to the current location of the resource. If a resource is moved to a new location, the previous URL is no longer useful. A persistent and unique identifier, specific to a given digital resource, that preserves access to that resource regardless of its location is necessary for supporting a long-term digital library.
Names are persistent, location-independent identifiers for network-accessible resources. Names are preferable to URLs because they can be used without regard to the physical location of the resource. A Uniform Resource Name (URN) is a standard, persistent, and unique identifier for digital resources. In a sense, URNs are analogous to call numbers on library books. A call number identifies a book but does not tell you where it can be physically found without a guide to the stacks. In the network environment, this guide is a name resolution server. If the library needs to rearrange the books, only the stack guide needs to be updated, not the call number on every book. Similarly, when digital objects need to be moved to a different machine or directory, only the name resolution service needs to be updated.
Several persistent naming schemas have been developed, including:
Handles (CNRI)
URNs
DOIs
PURLs (OCLC)
To facilitate the use of digital object names within the KU digital library environment, the KU-DLI should adopt the concept of name resolution services; develop and provide a name resolution server for the campus; and develop and provide a set of services that permit the KU community to create and manage names for their digital objects.
Unfortunately, there is as yet no common practice or agreement on the use of the various naming schemas. Creating an enduring model and service for digital objects will require naming services that can be used now and migrated as commonly used standards emerge in the future.
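The call-number analogy can be made concrete with a toy name resolution service in Python. It is a sketch under stated assumptions: the mapping lives in an in-memory dictionary, and the name ku-dl:maps/0001 and the example.edu URLs are hypothetical. A production service built on Handles, PURLs, or a URN resolver would add persistent storage, replication, and control over who may register or move names.

```python
class NameResolver:
    """A toy name resolution service: persistent name -> current location.

    Like the guide to the stacks, only this table changes when an object
    moves; the name given to users stays the same.
    """

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, name: str, url: str) -> None:
        # A name is assigned once and never reused for another object.
        if name in self._locations:
            raise ValueError(f"name already assigned: {name}")
        self._locations[name] = url

    def move(self, name: str, new_url: str) -> None:
        # Relocating an object updates only the resolver entry.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name: str) -> str:
        # Raises KeyError for names the service has never registered.
        return self._locations[name]


resolver = NameResolver()
resolver.register("ku-dl:maps/0001", "http://server1.example.edu/maps/0001.tif")
resolver.move("ku-dl:maps/0001", "http://server2.example.edu/archive/maps/0001.tif")
print(resolver.resolve("ku-dl:maps/0001"))
# http://server2.example.edu/archive/maps/0001.tif
```

Anyone who cited ku-dl:maps/0001 before the move still reaches the object afterward, which is exactly the property a bare URL lacks.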
A potential model that should be explored further within the KU context is under development at the Office for Information Systems, Harvard University Library (HUL). The HUL model involves a system of naming domains called namespaces. A namespace defines a set of rules under which names can be formulated. Every namespace has a unique identifier (e.g., hdl for Handles). Names created within a given namespace, by definition, do not conflict with names created in another namespace, because the namespace identifier is always part of the name and differentiates it. A name has three components: a namespace identifier (nid), a separator character (:), and the namespace-specific string (nss, or name) itself. Syntactically, the form of the name is:
nid:nss
Within the namespace, additional rules should be defined that permit name creation and maintenance to be easily distributed within the organization, depending, of course, on the rules governing access to the repositories. These namespace rules depend on the concepts of naming authorities and authority paths. Authority paths are used to define compartmentalized subsets of the 'nid' namespace in which names are created. Thus, the full syntax for a name under this scheme would be:
nid:authority-path:resource-name
Since this model allows for both central and decentralized management of objects in the namespace, in addition to developing naming conventions, policies and processes must be developed to create organizational naming authorities and to outline their responsibilities for maintaining the validity of names over time.

Object Classes and Services
Early digital resource development has sometimes been referred to as the age of "digital incunabula." Faced with a lack of standards for resource creation and navigation, developers took many unique approaches to moving resources into the digital arena.
While these efforts were invaluable in exploring a range of possibilities, they often presented obstacles to combining a variety of resources in a broader context. By defining and adhering to standards for resource creation, some of these obstacles can be removed.
Along with unique methods of resource creation, a number of approaches to providing access to these resources have been tried. A common approach has been to predefine collections of materials and build tailored systems to access them. While this approach works adequately for small numbers of resources and systems, it becomes less useful as the numbers grow and users are required to select and individually search many sites, each with its own idiosyncrasies.
An emerging trend is to provide a more generalized approach to classes of objects, based on the common characteristics of those objects and of user interactions with them. One hypothetical class / behavior schema would divide objects into the following categories (see Appendix C: Object Classes & Behaviors Background Information):
Bibliographic
Text (Monographs, Reference, Journals, Dictionary)
Images (Bi-tonal, Continuous tone, Video)
Audio
Geospatial
Numeric
These classes and subclasses have common object and/or user behaviors associated with them that can be drawn on to develop common interfaces and sets of access and manipulation services. By standardizing the systems and tools designed for object classes, development effort can be reused across collections, management can be more efficient, and the access environment can be made more coherent for users.

Standards for Content and Metadata
A key factor in providing a coherent, unified environment is the use of standards, whenever possible, for the many different aspects of storing and accessing digital information, including standards for interoperability, data format, resource identification, resource description, and data archiving.
Standards in the creation of objects and metadata allow common storage, access, and management processes to be used and economies of scale to be realized for the institution. Common information search and retrieval protocols provide a more easily mastered retrieval environment for users. Archival migration of objects can be made easier by the use of standards during object and metadata creation.
The use of recommended standards should be required for objects and metadata created locally under the auspices of the KU Digital Library and stored in the central object / metadata repositories. These same standards should be strongly encouraged for objects and metadata created outside the control or coordination of the KU-DL and stored / accessed remotely (either on or off campus). Objects not adhering to published KU-DLI standards should be accepted for long-term management only under exceptional circumstances.
Initial recommended standards for the KU Digital Library include:
Content: Supported Formats and Recommended Standards
Images
The creation of image files is not an end in itself; images are generally created to enhance access to institutional resources, extend the reach of institutional collections, provide digital surrogates or replacements for original resources, and perhaps minimize institutional costs through collaborative efforts.
Before creating digital images, several questions must be answered regarding the intended lifetime of the object, the intended use, and the anticipated audiences. Key among these questions is whether the intent is to create a digital replacement for the original object (if the original is not born-digital) or a usable surrogate (based on specified, documented criteria). The requirements for digitization, and for long-term management, are quite different depending on the answer.
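Because the replacement-versus-surrogate decision largely determines scanning resolution and bit depth, it also drives storage requirements. The Python sketch below works through that arithmetic for a letter-size (8.5 x 11 inch) page. It assumes raw, uncompressed pixel data with no file headers or row padding; actual TIFF masters stored with lossless LZW compression will usually be smaller, by an amount that depends on the content.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel dimensions of a scan of a width x height inch original at `ppi`."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw pixel data size in bytes, rounded up; ignores headers and row padding."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


# Letter-size page (8.5 x 11 inches) at three resolution / bit-depth combinations.
for ppi, bits in [(600, 1), (400, 8), (300, 24)]:
    w, h = pixel_dimensions(8.5, 11, ppi)
    size_mb = uncompressed_bytes(w, h, bits) / 1_000_000
    print(f"{ppi} ppi, {bits}-bit: {w} x {h} px, {size_mb:.1f} MB")
# 600 ppi, 1-bit: 5100 x 6600 px, 4.2 MB
# 400 ppi, 8-bit: 3400 x 4400 px, 15.0 MB
# 300 ppi, 24-bit: 2550 x 3300 px, 25.2 MB
```

The 1-bit scan is the smallest of the three despite having the highest resolution, because each pixel needs only a single bit; a color master of the same page is roughly six times larger before compression.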
It is important to note that different standards / best practices are emerging for different source materials: text images should be treated differently from visual images such as photographs, for instance. While standards for archival/preservation images are not yet set, community best practices are emerging. In addition to choosing the scanning resolution and bit depth, it is important to choose a lossless compression process to ensure that no information is lost between the original and subsequent files.
Source of images / use:
Master: TIFF using LZW compression (lossless)
  Textual images: 600 ppi for 1-bit (black & white); 400 ppi for 8-bit (grayscale); 300 ppi for 24-bit (color, RGB)
  Photographs: 5000 lines (8-bit grayscale); 5000 lines (24-bit color, RGB)
  Maps / Plans / Oversized: 300 ppi (8-bit grayscale); 300 ppi (24-bit color, RGB)
Previewing / Viewing (dynamic generation whenever possible):
  JPEG (continuous tone)
  GIF (multiple levels depending on use)
  PDF (600 dpi)
  MrSID (wavelet compression)
Audio
  Downloadable files: WAV (Microsoft format)
  Streaming files: RealAudio
Video
  Moderate-resolution downloadable files: image size 320x240 pixels; frame rate 30 fps; data rate ca. 1.2 Mbit/s (ca. 150 KB/s); compression MPEG-1; format mpg
  Low-resolution downloadable files: image size 160x120 pixels; color depth 24 bits/pixel; data rate ca. 100 KB/s; format QuickTime (Apple Computer format); file extension mov
  Streaming video: RealVideo
Text (marked-up; see Supported Markup Languages)
  ASCII
  Unicode
Supported Markup Languages and Recommended Standards
  SGML (archival content)
  XML (archival and display content)
  HTML (display content)
  XHTML (archival and display, when standards are finalized and display mechanisms are available)
  MathML (archival and display content)

Metadata
Simply put, metadata is data about data.
In order to convey this data in a meaningful way, three elements are needed:
semantics, or meaning, as defined by a community to meet specific needs
syntax, a systematic arrangement of data elements that facilitates the exchange and use of metadata among various applications
structure, the formal arrangement of the syntax with the goal of consistent representation of the semantics
Probably one of the more familiar examples of metadata is the library catalog record, which provides information about physical or electronic objects in, or accessible through, the library. This type of metadata has served to provide:
discovery and retrieval
identification / veracity / provenance
As the library environment has evolved in the online arena, the concept of metadata has also evolved to include additional functions such as:
rights management
interoperability
structure / viewing
preservation / longevity
This evolution of the functions that metadata serves has resulted in the definition of several basic categories of metadata, including:
Descriptive: primarily aimed at searching, discovering, and retrieving the digital object
Administrative: primarily aimed at managing, preserving, and perpetually identifying the digital object, including creation data and data that uniquely identifies a version / edition / instantiation
Structural: primarily aimed at storing and presenting the digital object, including navigation, behaviors, and use of the object
Intellectual / Rights Management: primarily aimed at controlling access to the digital object and protecting and rewarding the intellectual property rights holders
Within these categories, metadata can be hierarchical (i.e., metadata within metadata, or metadata nesting) in order to accommodate the diversity of digital objects and to propagate data with some efficiency. Metadata elements may be supplied at multiple levels.
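Supplying metadata at several levels means a lower-level object can inherit elements recorded once at a higher level, while still overriding them where it differs. A minimal Python sketch of that propagation follows; the level labels follow the Library of Congress levels (set, aggregate, primary object, intermediate object, terminal object), while the MetadataNode class and all element names and values are hypothetical illustrations rather than any published schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataNode:
    """Metadata supplied at one level of the hierarchy."""

    level: str
    metadata: dict
    parent: Optional["MetadataNode"] = None

    def effective(self) -> dict:
        """Elements inherited from all ancestors, overridden by this level's own."""
        inherited = self.parent.effective() if self.parent is not None else {}
        return {**inherited, **self.metadata}


# Hypothetical example values for each level.
collection = MetadataNode("set", {"collection": "Kansas maps", "rights": "public domain"})
aggregate = MetadataNode("aggregate", {"custodian": "Map Library"}, collection)
item = MetadataNode("primary object", {"title": "Territorial map", "rights": "restricted"}, aggregate)
view = MetadataNode("intermediate object", {"view": "page images"}, item)
tiff = MetadataNode("terminal object", {"format": "image/tiff", "bit_depth": 24}, view)

print(tiff.effective())
# {'collection': 'Kansas maps', 'rights': 'restricted', 'custodian': 'Map Library',
#  'title': 'Territorial map', 'view': 'page images', 'format': 'image/tiff', 'bit_depth': 24}
```

The collection name is recorded once at the set level yet is available at the terminal file, and the item-level rights statement overrides the set-level default.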
Levels of metadata as defined by the Library of Congress are:
Set-level: applies to a broader collection formed from aggregates that group objects by content and custodial responsibility; applies to all objects within the set
Aggregate: a group of objects organized by digital type and custodial responsibility; can be a digital collection; applies to all objects within an aggregate
Primary object: a specific item, usually the digital equivalent of a physical library item; applies to all intermediate objects
Intermediate object: a view or component of the primary object, e.g., a book (primary object) can be presented as page images (one intermediate object) and as plain text (another intermediate object); allows the gathering of digital files and metadata for the creation of presentations
Terminal object: the digital content file or files that make up the object; terminal metadata is primarily structural, e.g., size, extension, bit depth
In order to maximize the sharing and use of resources, interoperability of metadata has become increasingly important, which has led to the development of standards for metadata data structures, content element syntax, and data communication.
Standards are mutually agreed-upon statements that help control an action or product. Data standards promote the consistent recording of information and are fundamental to the efficient exchange of information. They provide the rules for structuring information, so that the data entered into a system can be reliably read, sorted, indexed, retrieved, communicated between systems, and shared. They also help protect the long-term value of the data. Standards are the work of communities and are necessary so that communities can work together.
Several metadata standards are already accepted and in wide use in the digital library community. Their development has usually centered on their primary application community and has tended to focus on specific content formats.
Some of the most commonly used standards include:
MARC / MARC21 (Machine-Readable Cataloging) http://lcweb.loc.gov/marc/
Dublin Core http://purl.org/DC/
TEI or TEI Lite DTD (Text Encoding Initiative Document Type Definition) http://www.tei-c.org/
EAD (Encoded Archival Description) http://lcweb.loc.gov/ead/
VRA Core (Visual Resources Association) http://www.gsd.harvard.edu/~staffaw3/vra/vracore3.htm
CIMI (Computer Interchange of Museum Information) http://www.cimi.org/standards/index.html
CSDGM (Content Standard for Digital Geospatial Metadata, Federal Geographic Data Committee) http://www.fgdc.gov/metadata/contstan.html
An emerging metadata standard that provides recommendations for interoperability among document archives is the Open Archives Metadata Set (oams), a component of the Santa Fe Convention. The semantics of this metadata set have purposely been kept simple in the interest of easy creation and widest applicability. The expectation is that individual archives will maintain metadata with more expressive semantics to allow more in-depth access (http://www.openarchives.org/sfc/sfc_oams.htm).
In addition to the data structures used to convey metadata, there are a number of standard resources available for determining the semantics and syntax of the metadata content.
Again, these are usually specific to an application or data-format community and include resources such as:
AACR2 (Anglo-American Cataloging Rules)
LCSH (Library of Congress Subject Headings)
AAT (Art and Architecture Thesaurus) http://shiva.pub.getty.edu/aat_browser/
CDWA (Categories for the Description of Works of Art) http://www.getty.edu/gri/standard/cdwa/
TGN (Getty Thesaurus of Geographic Names) http://shiva.pub.getty.edu/tgn_browser/
ULAN (Union List of Artist Names) http://shiva.pub.getty.edu/ulan_browser/
The advent of the Internet and the exponential growth in electronic resources have increased users' demand for the ability to search across many different metadata structures simultaneously. Access to the universe of online resources has now become the goal of many institutions that manage information resources. This has motivated institutions either to convert their metadata to a more readily accessible format or to provide a single interface to search many heterogeneous databases at the same time.
Whether the plan is to design search interfaces or to convert data to a new standard, the first step is to analyze the information elements in each database and correlate the discrete information fields in the different databases that have the same or similar meaning. This is sometimes referred to as metadata mapping or semantic mapping. Crosswalks are the visual representations or "maps" that show these relationships. Mapping supports the ability of a search engine to query fields with the same or similar content in different databases; in other words, it supports "semantic interoperability."
Crosswalks are important not only for supporting the demand for "one-stop shopping", or cross-domain searching; they are also instrumental in converting data from one format to another that is more widely accessible. Crosswalks matter because they eliminate the need for monolithic, universally adopted standards and shift the focus to flexibility and interoperability. Examples of existing bi-directional crosswalks include:
- Dublin Core to MARC
  - Nordic Metadata Project
  - National Library of the Netherlands projects
  - OCLC CORC
- Dublin Core to GILS
- Dublin Core to CSDGM
- CSDGM to MARC
- GILS to MARC
- MARC to SGML

Regardless of the apparent benefits of crosswalks, some essential principles to bear in mind when considering them include:
- Crosswalks fill a fundamental need. This is especially true among the various descriptive metadata formats and standards; it is essential to be able to migrate constructively from one to another.
- Granularity and specificity of content designation are crucial.
- Conversions are never perfect; converted records have problems:
  - Complex vs. simple schemes: some data and content designation may be lost
  - Differences in semantics
  - Properties may vary, e.g. required vs. optional, repeatable vs. non-repeatable elements
  - It is often impossible to go back in the other direction and restore the original
- Use accepted standards whenever possible:
  - To make objects universally available
  - To facilitate sharing and interchange of information
  - To preserve information by making it safe from changes in hardware and software
- Use packages of administrative, structural, and descriptive metadata that fit together, i.e. complementary packages.
- The ability to work within standard encoding schemes (especially HTML, XML, and SGML) is essential, or at least the ability to migrate easily to a form that works with them.
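The caution that it is often impossible to go back in the other direction can be shown directly. In this hypothetical Python sketch, a simpler target scheme folds the distinct source elements `creator` and `contributor` into one `name` field; inverting the mapping then yields more than one candidate source for that field, so the original record cannot be restored. The field names and the `FORWARD` table are illustrative assumptions, not drawn from any specific published crosswalk.

```python
from collections import defaultdict

# Hypothetical many-to-one crosswalk: two distinct source elements share a
# single target field in a simpler scheme.
FORWARD = {
    "creator": "name",
    "contributor": "name",
    "title": "title",
    "date": "date",
}


def invert(mapping):
    """Build the reverse mapping: target field -> set of possible sources."""
    reverse = defaultdict(set)
    for source, target in mapping.items():
        reverse[target].add(source)
    return dict(reverse)


def ambiguous_targets(mapping):
    """Return target fields that cannot be mapped back to a unique source."""
    return sorted(t for t, sources in invert(mapping).items() if len(sources) > 1)


print(ambiguous_targets(FORWARD))       # ['name']
print(sorted(invert(FORWARD)["name"]))  # ['contributor', 'creator']
```

Running a check like `ambiguous_targets` over a proposed crosswalk is one simple way to flag, before conversion, which fields will not survive a round trip.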
Metadata Recommendations for the KU Digital Library
Based on the background information above, the Task Force recommends the following:
- Metadata (all categories) must be provided, at least minimally, for resources created and/or managed by the KU-DLI.
- Metadata should use an accepted standard whenever possible. Standards initially supported by the KU-DLI will include:
  - MARC
  - Dublin Core
  - TEI
  - EAD
  - VRA
  - CSDGM
  - oams
- Dublin Core descriptive elements, as applicable, should set the minimum standard.
- A crosswalk should exist between the metadata format and DC or oams.
- A crosswalk should exist between the metadata format and MARC, to the extent possible, for secondary or auxiliary access via the Library's ILMS, either as a file on the ILMS or via data interchange protocols such as Z39.50.
- Data formats should be definable within, and work with, SGML and XML.
- Administrative metadata elements should be developed for use by the KU-DL based on current best practices. These elements will differ somewhat according to the format of the content. However, at a minimum, administrative data elements should include:
  - Information that identifies where the object resides
  - Information that identifies related objects, their relationship, and where they reside
  - Information describing the creation process
- Structural metadata elements should be developed for use by the KU-DL based on current best practices. These elements will differ somewhat according to the format and navigational requirements of the content. However, at a minimum, structural data elements should include:
  - Information needed to view, print, or otherwise render the object
  - Information needed to navigate to and from the object
- Intellectual / Rights Management metadata elements should be developed for use by the KU-DL based on current best practices. These elements will differ somewhat according to the format of the content.
  However, at a minimum, Intellectual / Rights Management data elements should include:
  - Information to determine the owner
  - Information to determine the copyright / access category
  - Information to understand and control copying / distribution
  - Information to understand and control display / transmission
  - License terms and dates
- Where standards do not currently exist, or where the metadata standard is complex enough that defining a minimal element set is desirable, appoint small task forces / committees to define and/or develop the minimal element sets for each category of metadata (descriptive, administrative, structural, and intellectual / rights management) based on current community best practices.

Information Retrieval Protocols
In conjunction with standard metadata, standard information retrieval protocols can provide the effect of homogeneous access in a heterogeneous environment and allow more standardized interface development.

HyperText Transfer Protocol (HTTP)
HTTP allows retrieval of resources from the Web. This protocol is supported by standard browsers, including Netscape and Internet Explorer.

Z39.50 Server and Client
Protocol-based (NISO standard) client/server software that provides centralized search and retrieval of physically and/or logically distributed collections. It is used primarily for searching databases of MARC records and can be used in conjunction with HTTP. KU currently uses the Z39.50 protocol via Voyager. Several freeware implementations are available. For more information see: http://lcweb.loc.gov/z3950/agency/.

Dienst
Protocol-based server software that provides centralized searching of physically and/or logically distributed collections. For detailed information see: http://www.cs.cornell.edu/cdlrg/dienst/protocols/DienstProtocol.htm.
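Returning to the metadata recommendations above: the minimum element sets for administrative, structural, and intellectual / rights management metadata lend themselves to a mechanical check when an object is added to a repository. The following Python sketch assumes a flat dictionary of metadata. The element names (such as `location` or `rights_owner`) are hypothetical placeholders for whatever minimal tagsets the proposed committees define, and the sample record is invented.

```python
# Hypothetical minimal element sets, one per metadata category, paraphrasing
# the Task Force's minimum requirements. Real element names would come from
# the minimal tagsets defined by the appointed committees.
MINIMUM_ELEMENTS = {
    "administrative": ["location", "related_objects", "creation_process"],
    "structural": ["rendering_info", "navigation_info"],
    "rights": [
        "rights_owner",
        "copyright_category",
        "copy_distribution",
        "display_transmission",
        "license_terms",
    ],
}


def missing_elements(metadata):
    """Return {category: [missing names]} for absent or empty elements."""
    missing = {}
    for category, elements in MINIMUM_ELEMENTS.items():
        absent = [name for name in elements if not metadata.get(name)]
        if absent:
            missing[category] = absent
    return missing


# Hypothetical record for a scanned map.
record = {
    "location": "/dl/maps/0001.tif",
    "related_objects": ["/dl/maps/0001-thumb.gif"],
    "creation_process": "scanned from original",
    "rendering_info": "image/tiff",
    "navigation_info": "",  # present but empty, so it counts as missing
    "rights_owner": "University of Kansas",
}
missing = missing_elements(record)
for category, names in missing.items():
    print(category, names)
```

Descriptive metadata is left out of this sketch because the recommendations already point to Dublin Core as its minimum standard.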
Archiving
The mission of digital archiving initiatives is to preserve the integrity of objects and ensure their persistence. This seemingly simple statement, however, raises a wide range of questions, most of which focus on standards, responsibility for archiving, technical strategies, the connection between archiving and access, and economic models.

Preservation in the digital world is not exclusively a matter of the longevity of storage media. The viability of digitized files depends much more on the life expectancy of the access system. Archives must prepare to migrate digitized resources from one generation of technology to the next. Using digital technologies for preservation requires a deep and longstanding institutional commitment to long-term access, full integration of the technology into information management procedures and processes, and significant leadership in developing appropriate definitions and standards for digital preservation.

The use of standards is one strategy that can help preserve the integrity of, and access to, digital information. Adherence to standards facilitates the transfer of information between hardware and software platforms as technologies evolve, and can also help ensure best practice in the management of digital information. However, while adherence to standards will assist in preserving access to digital information, it must be recognized that technological standards themselves are evolving rapidly.

There are different levels of standards with varying degrees of authority. The Guidelines on Best Practices for Using Electronic Information define the following three levels of standards:
- De facto standards: standards that are commonly accepted by the marketplace; they are established by common practice or dominant market share.
- Publicly available specifications (PAS): standards developed when "several leading firms on the market join together in a consortium to define an interface standard", for example, specifications produced by the Internet Engineering Task Force.
- De jure standards: standards that are formally established by law, or by a recognized standards-setting body such as the International Organization for Standardization (ISO).

A number of problems associated with using standards to preserve access to digital information have been noted. The eLib Standards Guidelines lists four of the main problems:
- Several versions of a standard may be in use, with earlier versions no longer compatible;
- Suppliers may offer "value added" versions of standards in their implementations;
- A standard that is not well specified may be implemented differently in different software; or
- Some standards may have more features than are likely to be used, resulting in different subsets being used in different implementations.

In Digital Archive Models, Abby Smith argues that because information technology is still evolving, it is difficult to define long-term standards. Jeff Rothenberg also argues against standards as a long-term solution; however, he concedes that "converting digital documents into standard forms, and migrating to new standards if necessary, may be a useful interim approach while a true long-term solution is being developed".

In pursuing effective long-term archiving strategies for digital libraries, the KU-DLI should consider incorporating the concept of digital masters into the creation of its online systems whenever possible. Because digitization resources are scarce and scanned materials are often fragile, returning to the original object for a second or subsequent digitization effort is often unwise. Consequently, digital surrogates should be created with high levels of information content from the outset.
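One concrete practice that supports the goal of preserving the integrity of a digital master is to record a fixity value (a cryptographic digest of the file's bytes) when the master is created, store it with the object's administrative metadata, and re-check it after every copy or migration. The use of SHA-256 and the helper names below are assumptions for illustration; the Task Force does not specify a checksum method.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 hex digest to serve as the master's fixity value."""
    return hashlib.sha256(data).hexdigest()


def fixity_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the same digest for a file on disk, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """True if the bytes still match the fixity value recorded at creation."""
    return fixity(data) == recorded


# Hypothetical workflow: record fixity alongside the master's metadata.
master = b"sample bytes standing in for a high-resolution master image"
record = {"object": "maps/0001-master.tif", "sha256": fixity(master)}

copy_after_migration = bytes(master)  # a faithful copy
damaged_copy = master[:-1] + b"X"     # one byte altered

print(verify(copy_after_migration, record["sha256"]))  # True
print(verify(damaged_copy, record["sha256"]))          # False
```

A digest detects change but cannot repair it, so it complements, rather than replaces, the backup and off-site snapshot procedures discussed under Support Issues.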
In addition to capturing content, it is critical to capture adequate and accurate metadata of all types for the object as it is created. One school of thought holds that the long-term archival/migration function is best served by a data representation that binds certain metadata closely to the bits representing the content (encapsulation). However, metadata representations appropriate for manipulation and long-term retention may not always be appropriate for real-time delivery. In other words, the form of an archival digital object may need to differ from the form of a digital object used to support access. This consideration, together with the migration issues noted above, needs to be balanced in arriving at an implementation strategy.

In developing an archiving policy, it is critical to distinguish collection levels for resources contained in or accessed through the KU Digital Library. Within this collection context, policies should be developed that outline the institutional level of commitment to archiving and preservation. Characteristics that may be considered when committing to archival maintenance for digital collections include:
- perceived usefulness of the content
- perceived life-span of the original object
- availability of the content, or of copies of the object, elsewhere
- uniqueness of the content or object
- commitment by another institution to archive the content or object

Interface Considerations
Overlaying the functional aspects of the digital library is the interface patrons use to access, navigate, and manipulate the resources made available to them. In developing the interface for the KU Digital Library, several aspects should be considered:
- common vs. specialized interfaces (general web delivery vs. specialized clients)
- discovery tools (searching, browsing, etc.)
- display and navigation aspects (object class behaviors)
- manipulation / analysis tools
- ADA considerations (federal guidelines for web accessibility)

Guidelines must be developed to present as consistent and coherent a look and feel for the KU Digital Library as possible.

Interoperability Considerations
In addition to local considerations for developing the KU Digital Library, there are also state-wide and regional interoperability issues. Just as no single library can physically hold all the information resources its users need and must rely on external resources, the Digital Library must be able to provide avenues to externally available resources. Interoperability with state, regional, and national systems will maximize the resources available to the KU community.

Conceptual Architecture Model
Tying the individual content and metadata repositories together are the services offered through the digital library environment. These are the tools and processes whose ultimate goal is to provide accurate, seamless, and 'transparent' access across the various repositories and systems for the user community. Services and policies initially provided for the digital library environment should include object repository registration processes and guidelines, metadata advisory and creation processes and guidelines, naming convention guidelines and services, object creation advisory and creation processes and guidelines, common navigation processes, common tool sets, etc.
Bringing these technical, service, and policy components together into a coherent whole is illustrated by the following conceptual architecture model for the digital library environment:

Figure 3: Conceptual Architecture Model

Supported Software Tools
(See also Appendix E: Tools Background Information)
While the goal of the Digital Library is to develop a seamless, coherent view of the information environment at the University of Kansas, the reality is that no single system available today provides all the capabilities and tools needed to create that view. The current state of the art is to develop a toolkit that operates within the guidelines of the local DL infrastructure and for which the University can provide at least basic user support. The Task Force discussed a variety of tools, drawing heavily on UC Berkeley's Tools for Building the Digital Library.

The software tools listed in this section should be viewed as a starting point rather than an exhaustive list that restricts what can be used. For example, in a number of cases multiple tools perform a similar task or service, and not all of them are included here. This initial list is weighted toward tools that are either already available and supported on campus, or that we feel need to be acquired to fill a specific initial gap. As the KU-DL grows and the available tools mature, this list will undoubtedly evolve. The primary criteria for including a tool in the future will be whether it supports an accepted standard or process within the KU-DL infrastructure that is not already supported by a current tool, and whether KU staff can provide support for its use.

Server Tools

Database Management

Oracle
Oracle is a good general-purpose commercial database management system, already used for a variety of applications on campus.
It will provide a good platform for central server support of, and access to, locally created digital resources in most cases. Reference: www.oracle.com

Access
For smaller applications, Microsoft Access will provide sufficient capabilities. It can also be used as a front-end to Oracle databases. Reference: www.microsoft.com

Filemaker Pro
Because of its cross-platform capability, Filemaker Pro will be supported. It is used for existing applications that are likely to be incorporated into the Digital Library, and it provides ODBC access to external databases. Reference: www.filemaker.com

FoxPro
While existing FoxPro applications may be incorporated into the Digital Library, FoxPro will not be encouraged for new development because its features are not sufficiently distinct from those of other supported tools. Reference: http://msdn.microsoft.com/vfoxpro/

Web Server
- Apache

Search & Retrieval Resources and Services

XPAT
The XPAT engine is an SGML- and XML-aware search engine based on OT5, previously marketed by OpenText. XPAT provides excellent support for word and phrase searching, indexing of SGML and XML elements and attributes, fast retrieval, and open systems integration. Reference: http://www.umdl.umich.edu/dlxs/

Languages
- Perl
- Java
- Javascript
- Tcl/Tk

Web Development Tools
- Cold Fusion
- PHP

Utilities

Tif2gif
A utility for dynamically converting TIFF images to GIF. Reference: http://kalex.engin.umich.edu/tif2gif/

Other

OCR Software
Several OCR packages may be needed, depending on the specific needs of the project(s); suitability will vary with the age/type of the original and the level of accuracy required.
Client Tools

Browsers
- Netscape
- Explorer

Markup Language Editors
- AuthEd or equivalent for SGML
- Near & Far or equivalent for XML
- Dreamweaver for HTML
- (editor to be selected later) for MathML

Viewers/Data Manipulation Tools
- Acrobat Viewer
- ArcView / ArcIMS
- DjVu
- MS Media Player
- Panorama
- QuickTime
- RealAudio
- RealVideo

Support Issues

Server-side
Because the digital library server represents the university libraries to a wide range of both campus and external users, server support should be as comprehensive as funding permits. The Computer Center machine room is staffed 24 hours a day, seven days a week, and has conditioned power and high-speed network connectivity. This combination makes it an ideal location for the digital library server.

Software Versions
As with all production software on central servers, it will be necessary to strike a balance between providing the most current software versions available and maintaining a completely stable environment. Some server-side changes may affect client-side browser extensions, so testing of server-side software changes should include access from as wide a variety of clients as possible.

Hardware Levels
Until specific projects have been selected for the digital library, and even after selection, until prototype or development systems have been implemented, it will be impossible to determine specific hardware requirements for the server. The recommended course of action, therefore, is to begin implementation using existing hardware and then assess hardware requirements from that base. As a general rule, hardware should be selected for compatibility with existing servers. This will simplify operational support and make it possible to migrate applications from one server to another to meet changing load requirements.

Media / Backup
Backup for the server should use procedures and media compatible with existing machine room servers for disaster recovery purposes.
In addition to these backups, digital library projects may include procedures for saving and storing off-site periodic snapshots of materials for archival purposes or for storage in alternative systems.

Client-side
Users should reasonably expect to be able to access the digital library using any system suitable for general Internet access and Web browsing. Because viewing some digital library content is likely to require specialized client applications, the digital library server should be designed to let users easily download and install such extensions.

Software Versions
Wherever content-specific extensions or viewers are required for client systems, the digital library site should clearly identify not only the software required but also the range of compatible versions. The latest version tested for compatibility should be available for download from the server or via a link to the appropriate third-party distribution site.

Hardware Levels
Digital library projects should be designed to permit retrieval and viewing from the lowest level of hardware consistent with the materials being provided; however, content should not be compromised because of client-side hardware limitations. In some cases it may be desirable to provide alternative materials for basic client hardware while making higher quality content available for systems with more capable hardware and/or higher bandwidth network connections.

User Support
While it will not be possible to provide comprehensive user support for all digital library clients, appropriate links should be included to let users report problems and submit questions through online forms within the system. In addition, information about external contact points should be provided for those unable to use the internal facilities.

Project Selection and Prioritization

Selection Guidelines:
The number of meritorious projects proposed will always far outweigh the resources available to address them.
To ensure that resources are invested wisely in digitizing and managing the most significant and useful materials at the lowest possible cost, without placing the institution at legal or social risk, a mechanism for selecting and prioritizing projects needs to be developed at the outset, in consultation with the primary stakeholders. The following guidelines are proposed as an initial set of selection criteria for projects being considered for implementation within the KU-DLI. A project should:
- be consistent with the mission and goals of the institution and fit within the strategic focus of the KU-DLI
- have clearly defined goals and outcomes
- respond to known campus needs
- be collaborative in nature, leverage resources, and support partnerships on campus or with other institutions, groups, or individuals with similar interests
- show a positive cost/benefit analysis over the short and long term
- take advantage of available outside funding, or position the KU-DLI to obtain outside funding
- enhance the diversity of resources available or the audience served, within the supported technical infrastructure and standards
- facilitate innovation in teaching and the curriculum
- facilitate innovation in research
- facilitate innovation in new modes of scholarly communication
- facilitate creative, innovative, and interesting concepts and approaches within the technical architecture framework
- use, or provide the opportunity to build on, hardware and software solutions existing within the University
- preserve and enable continuing access to significant rare or unique collections
- support selection and/or creation of information products in disciplines or sub-disciplines where KU is recognized as a leader or is pre-eminent [note: this would not be determined solely by rankings]
- enhance curricular development in emerging areas (e.g., indigenous studies)
- maximize economies of scale
- use the University's fiscal, human, and infrastructure (space, hardware, etc.) resources effectively
- be consistent with KU Libraries' scholarly communications and collection development and management principles
- enhance the capability of KU's systems to support user self-sufficiency
- benefit a significant number of users
- follow the technical architecture parameters of the KU-DLI and provide digital objects that meet the highest technical standards the institution can afford

Process
To apply these guidelines / criteria to selecting and prioritizing projects, a selection process must be established. This process should include an objective method of applying the criteria. The National Park Service has developed an excellent process for selecting materials that should be considered as a basis for a similar selection and prioritization process at KU. A number of campus stakeholders, including the Digital Library Executive Group and the Digital Library Advisory Group, should be involved in both developing and approving the process. Selection and prioritization will need to follow somewhat different criteria and process paths depending on the source of the materials and the parties involved:
- Commercial resources
  - Local load
  - Remote access
- Local resources
  - Locally created
  - Consortial

On-going Services Evaluation
On-going evaluation of Digital Library services will be critical to the success of the Initiative in meeting user needs. While many of the criteria used to determine the success of traditional library and computing services will apply, new criteria and processes must be developed to measure the new kinds of services available. The Task Force recommends appointing a new Task Force, composed of Library and ACS staff and members of the DLAG, to develop criteria and a review mechanism for evaluating KU Digital Library effectiveness, consistent with emerging initiatives elsewhere.
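The Process discussion above calls for an objective method of applying the selection criteria. One common approach, offered here only as a hypothetical sketch, is a weighted score: reviewers rate each proposal against each criterion, and the ratings are combined with agreed weights to rank proposals. The criterion keys, weights, 0-3 rating scale, and proposal names are assumptions. Actual weights would be set by the stakeholder groups named above, and this is not a reproduction of the National Park Service process the report cites.

```python
# Hypothetical weights for a subset of the selection guidelines.
WEIGHTS = {
    "mission_fit": 3,
    "campus_need": 2,
    "collaboration": 1,
    "unique_collection": 2,
    "users_benefited": 2,
}
MAX_RATING = 3  # reviewers rate each criterion from 0 (not met) to 3 (fully met)


def score(ratings):
    """Return a 0-100 weighted score for one proposal's criterion ratings."""
    for criterion, rating in ratings.items():
        if criterion not in WEIGHTS:
            raise KeyError(f"unknown criterion: {criterion}")
        if not 0 <= rating <= MAX_RATING:
            raise ValueError(f"rating out of range for {criterion}: {rating}")
    earned = sum(weight * ratings.get(c, 0) for c, weight in WEIGHTS.items())
    possible = sum(WEIGHTS.values()) * MAX_RATING
    return round(100 * earned / possible, 1)


def rank(proposals):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, score(ratings)) for name, ratings in proposals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reviewer ratings for two proposals.
proposals = {
    "Proposal A": {"mission_fit": 3, "campus_need": 2, "collaboration": 3,
                   "unique_collection": 3, "users_benefited": 1},
    "Proposal B": {"mission_fit": 2, "campus_need": 3, "collaboration": 1,
                   "unique_collection": 0, "users_benefited": 3},
}
for name, s in rank(proposals):
    print(name, s)
```

A missing rating counts as zero, which penalizes incomplete reviews. A committee might instead prefer to reject incomplete rating sheets outright; that choice is a policy decision, not a technical one.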
Recommendations
Based on the background information and discussion presented in this report, the Digital Library Technical Infrastructure Task Force recommends the following actions to facilitate the establishment of a robust digital library environment for the University of Kansas Lawrence campus:
- Adopt the Principles statement as the basis for central Digital Library support for the University of Kansas Lawrence campus.
- Adopt the Implementation Strategies for the Digital Library Goals as the initial implementation framework for the DLI.
- Adopt the guidelines and process for component/service/tool selection for the DLI.
- Adopt the concept of local repositories for the storage and management of local digital resources.
  - Appoint a group to develop, and bring to the DLEG for approval, a detailed plan containing:
    - standards for inclusion of content and metadata in the repository
    - naming standards for objects
    - management requirements and procedures
- Adopt the concept of Name Resolution Services for the naming of, and access to, resources in the digital library.
  - Appoint a group to select a name resolution service scheme and develop a detailed implementation plan.
- Adopt the concept of Object Classes and Services as a methodology for standardizing the treatment of, and interaction with, digital library resources.
- Adopt the basic recommendations for content format standards for locally produced objects and commercially purchased/licensed resources, whenever possible.
- Adopt the basic recommendations for metadata standards for resources to be included in the KU-DL.
  - Appoint a group to select/develop metadata tagset definitions (at least at minimal levels) and crosswalks, and procedures for creating and maintaining KU-DL metadata.
- Determine the scope of KU's archiving / preservation commitments for digital resources.
- Adopt the conceptual architecture model for the KU-DL.
  - Recognize that implementing this model will require a phased, iterative approach whose dimensions will depend on the scope of the KU-DLI as determined by the DLEG.
- Adopt the concept of a software toolkit and support levels, and the criteria for inclusion. Specific recommendations/phasing will be determined by the scope of the KU-DLI and by decisions on other recommendations in this report.
- Adopt the project selection guidelines and process.
  - Appoint a group to develop detailed procedures for submission and selection of projects.
- Appoint a group to develop service evaluation guidelines and process as the DLI progresses.
- Select initial project(s) for implementation. Following selection, appoint a group to develop detailed task plans, timelines, and resource needs (Library support, ACS support, other). The suggested list of initial projects includes:
  - IMLS grant with State Historical Society
  - Art History slides project
  - Spencer Research Library and Museums projects
  - GIS grant
  - Humanities Data and Social Sciences Data projects
  - Current Library e-resources
  - Kansas Collection (L. Nelson)
- Approve the use of existing equipment and software, and/or the purchase of new equipment and software, for the initial DL implementation once the DLI scope and initial projects are determined.
- Obtain funding for start-up costs.
Appendix A: Charge to the Task Force

Digital Library Technical Infrastructure Task Force
May 10, 2000

Established by: Vice Chancellor for Information Services and Chair, Digital Library Executive Committee (Marilu Goodyear)

Members:
- Assistant Vice Chancellor for Information Services and Coordinator of Digital Library (Jerry Niebaum)
- Assistant Dean for Information Technology-Libraries (John Miller)
- Assistant to the Vice Chancellor for Information Services (Beth Warner)
- Assistant Special Collections Librarian, Digital Projects (substitute, Rick Clement)
- Associate Director, Academic Computing Services (Wes Hubert)

Effective Dates: May 15 to September 1, 2000

Background: The Digital Library Executive Group is responsible for coordinating the development of the KU Digital Library Program. A major aspect of this development is defining the basic infrastructure, i.e. the architecture, standards, guidelines, services, and staff requirements for the KU Digital Library. In addition, the development of digital collections depends on the availability of appropriate tools that fit within the overall architectural framework and meet the approved standards. Guiding principles are needed for the development of these high-quality tools and services. There is also a need to reduce duplication of tool development and to leverage existing initiatives.

Charge: The Digital Library Technical Infrastructure Task Force is charged to develop specific recommendations and plans for the KU Digital Library Technical Infrastructure, the collection of common systems and services that make it possible to store, organize, and access digital materials:
- Develop architectural principles and standards for KU shared digital collections. These should be consistent with relevant industry, Kansas Digital Library, and other University standards, provide a framework that facilitates the creation of integrated systems, and provide the flexibility to foster innovations in scholarly communication.
  After DLEG approval, submit them to the State Architecture process for consideration.
- Identify appropriate elements for the development of an integrated digital library system that accommodates all formats: metadata, full-text, images, numeric data, geospatial data, etc.
- Develop principles and specifications for the identification, evaluation, selection, and implementation of online tools and services for sharing, accessing, manipulating, integrating, and archiving electronic scholarly content in all forms.
- Recommend an appropriate strategy for moving campus-developed tools to the KU Digital Library.
- Recommend a process for prioritizing services to be implemented.
- Develop principles and guidelines for the type and frequency of evaluation of tools and services, both before and after they are implemented.

The Task Force will present its report and work plan to the DLEG by September 1, 2000. The plan should present specific recommendations, task plans, and timelines for accomplishment.

Appendix B: KU Digital Library Mission and Goals

The KU Digital Library Initiative's Vision
Along with its national and international peers, the University of Kansas enters the twenty-first century faced with the challenge of supporting teaching and scholarly endeavors in the digital age. As part of an overall university strategy, the University of Kansas Digital Library Initiative creates a comprehensive networked information environment that supplements existing Library services and resources, which stand at the core of the University's research, teaching, and public service missions, and that enables knowledge generation, access, and use for the University community.
Creating this environment requires attention to the staff and systems that organize digital collections and make them available, as well as to economic and policy structures, which must be viewed anew in the context of new business models that scale with the exponential growth in digital information and are compatible with state, national, and global digital library efforts.
Mission
The KU-DLI is a collaborative effort that leverages the University's institutional resources by providing a common, standardized architecture and infrastructure that support the specialized staff and tools needed to select, create, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of digital works. It focuses on projects that support the teaching and research of faculty, and the learning and research of students, by providing a wide range of electronic resources as well as a toolkit that enables scholars to easily incorporate, manipulate, and edit selections from organized collections of digital resources directly into creative works. In addition, it provides access to a wide range of electronic resources for citizens of the state and beyond.
The University of Kansas Digital Library Initiative (KU-DLI) facilitates the process of scholarly creation by involvement in all phases of the information transfer cycle. While not necessarily the exclusive performer of these activities, the KU-DLI, in all cases, adds value, in partnership with others, through every step of the cycle.
Goals
1. Develop digital collections, expanding over time in number and scope, created by converting to digital form documents held in our own and other libraries and archives, and by incorporating holdings already in electronic form.
   Objectives
   - Identify projects that demonstrate the value and use of digital collections and that match KU faculty expertise and/or collection strengths.
   - Support the development of these digital collections.
2. Establish a collaborative management structure to coordinate and guide the implementation and ongoing maintenance of the digital library collections; to set policy regarding participation, funding, development, and access; to encourage and facilitate broad involvement; and to address issues of policy and practice that may inhibit full citizen access.
   Objectives
   - Seek advice and counsel from faculty and administrators on campus as to the appropriate mechanism for the management structure.
   - Hold regular meetings of the Digital Library Executive Group and the Advisory Group.
   - Review structures from other universities.
3. Develop a funding strategy that addresses the need for support from both public and private sources to provide the means to launch new initiatives.
   Objectives
   - Beginning in July 2000, start a funding stream from local sources for building the digital library.
   - Plan for overall leadership and staffing of the program.
4. Form selection guidelines that accommodate local initiatives and projects and ensure that the digital library collections comprise a significant corpus of materials. Provide for coordination with the Kansas Digital Library.
5. Adopt common standards and best practices to ensure full informational capture, guarantee universal accessibility and interchangeability, simplify retrieval and navigation, and facilitate archiving and enduring access.
   Objectives
   - Appoint a Task Force to produce technical standards and guidelines (see Task Force Charge).
6. Establish an ongoing and comprehensive evaluation program to study how scholars, students of all levels, and citizens everywhere use the digital library collections for research, learning, discovery, and collaboration; how such usage compares with that of traditional libraries and other sources of information; how digital libraries affect the mission, economics, staffing, and organization of libraries and other institutions; and how to design systems that encourage access by individuals representing a broad spectrum of interests.
June 6, 2000 (Draft)

Appendix C: Object Classes & Behaviors Background Information
DLPS Classes: Abstracts
See also the DLPS Classes Table and Behaviors Table.
Bibliographic
The Bibliographic class includes a range of descriptive metadata, from the minimal (cf. Wing) to the expansive (cf. MARC records or information extracted from the TEI Header). Entities in this class have a relatively flat structure (i.e., no nesting) and a high degree of relational organization in the way that they use controlled vocabularies or name authorities. Most instances within the Bibliographic class are collections of bibliographic citations, but they may also be collections of entries describing entities other than books or journal articles (e.g., coins, photographs, ostraca, or other objects). The primary behaviors of the class are relatively limited, focusing on searching, display, and management (e.g., collecting together citations of primary interest from a much larger collection of valid citations). Entities in the Bibliographic class are typically displayed as brief or more expansive citations, with and without field labels, and in different formats suitable for different purposes (e.g., downloading).
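The display variants described above (brief versus expansive citations, with or without field labels, plus a download format) amount to rendering one record several ways. The following is a minimal Python sketch, not DLPS code: the field names, the BRIEF_FIELDS and FULL_FIELDS lists, the render helpers, and the sample record are all hypothetical.

```python
from typing import Dict, List

# Hypothetical citation record: a flat mapping of field name to value.
Record = Dict[str, str]

BRIEF_FIELDS = ["author", "title", "date"]
FULL_FIELDS = ["author", "title", "publisher", "place", "date", "subject"]


def render(record: Record, fields: List[str], labels: bool) -> str:
    """Render the selected fields of a record, skipping fields it lacks."""
    parts = []
    for name in fields:
        value = record.get(name)
        if not value:
            continue
        parts.append(f"{name.capitalize()}: {value}" if labels else value)
    # Labeled displays put one field per line; unlabeled ones read as a single citation.
    return "\n".join(parts) if labels else ". ".join(parts) + "."


def render_tab_delimited(record: Record, fields: List[str]) -> str:
    """A download-friendly format: one tab-separated line per record, blanks kept."""
    return "\t".join(record.get(name, "") for name in fields)


if __name__ == "__main__":
    rec = {"author": "Doe, Jane", "title": "An Example Title", "date": "1999"}
    print(render(rec, BRIEF_FIELDS, labels=False))  # Doe, Jane. An Example Title. 1999.
    print(render(rec, BRIEF_FIELDS, labels=True))
    print(render_tab_delimited(rec, FULL_FIELDS))
```

Because the structure is flat, each display is just a different choice of fields and separators over the same record.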
While display is simple compared with that of texts, users have come to expect more sophisticated methods for navigating and managing large numbers of results, e.g., sorting on different indexes, or using a "shopping cart" model to collect information together for subsequent use.
Bitonal Images
For DLPS, Bitonal Images are captured as part of book and journal conversion. Metadata for these Bitonal Images are relatively limited, consisting of pagination, sequence, feature (e.g., whether the page contains an illustration or an index), and administrative metadata such as resolution and method of capture. All Bitonal Image metadata in DLPS are managed by inserting the data in the associated page reference (even though the metadata frequently originate in local databases). Navigation focuses on movement through the body of pages, with mechanisms such as "Next" and "Previous," or "Go to page [number]." Display of these images, typically captured at 600 dpi, is handled by tif2gif, a tool that dynamically renders the image at a variety of resolutions. Instances of this class are online both as independent resources in their own right and as a step in the process of further conversion.
Continuous Tone Images
Continuous Tone Images are image files with a bit depth greater than one, with associated bibliographic or descriptive information as well as more complex administrative metadata. The bit depth of these images will typically be 8 bits (for grayscale) or 24 bits (for color). The features of the source material that must be captured dictate the bit depth of the image. For example, the complex features of papyri and art images cannot be captured with bitonal methods, and are thus captured at a high bit depth and high resolution.
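A worked calculation shows why the choice of bit depth matters for storage. The sketch below computes raw, uncompressed sizes for a hypothetical 8.5 x 11 inch page scanned at 600 dpi at 1, 8, and 24 bits per pixel. The page size, the use of 600 dpi for every depth, and the function name are assumptions for illustration; real files are usually compressed and therefore smaller.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw size in bytes of a scanned image before any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Pad each row of pixels up to a whole number of bytes, as many raster formats do.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    for label, depth in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
        size = uncompressed_bytes(8.5, 11, 600, depth)
        print(f"{label:9} {depth:2}-bit: {size:,} bytes (~{size / 1_000_000:.1f} MB)")
```

Run as a script, it reports about 4.2 MB, 33.7 MB, and 101.0 MB respectively: the 24-bit color scan is roughly 24 times larger than the bitonal one (slightly less, because of the byte padding on bitonal rows).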
Currently, the associated descriptive information is the primary means of locating these images in the DLPS systems designed to support them, but browsing mechanisms are also made available to users. (Descriptive metadata also varies from minimal, e.g., only a title, to rich, with complex vocabularies for subject and theme analysis.) Administrative metadata, and in particular metadata about the capture and conversion process, is especially critical for Continuous Tone Images, because the processes for migration and even subsequent conversion depend heavily on that information. Of special importance in the DLPS systems designed to support Continuous Tone Images are methods for navigating high-resolution images (e.g., pan and zoom) and methods for using images in a variety of contexts (e.g., comparing images or creating small collections for study).
Dictionary
The Dictionary class is devoted to a type of reference work whose specific characteristics (unlike those of the more general Reference class) dictate a type of organization and retrieval that is similar across most members of the class. DLPS work focuses primarily on historical dictionaries, but other dictionaries (e.g., the American Heritage Dictionary) also fall into this category. Despite many similarities, members of the class also exhibit significant variation. The TEI Guidelines comment that "because the structure of dictionary entries varies widely both among and within dictionaries, the simplest way for an encoding scheme to accommodate the entire range of structures actually encountered is to allow virtually any element to appear virtually anywhere in a dictionary entry." Despite this, they continue, "[i]t is clear ...
that strong and consistent structural principles do govern the vast majority of conventional dictionaries, as well as many or most entries even in more exotic dictionaries; ideally, a set of encoding guidelines should capture these structural principles" (Chapter 12, Print Dictionaries, http://www.hti.umich.edu/bin/tei-search-idx?type=HTML&rgn=DIV1&byte=1014434). The DLPS Dictionary class is based closely on the TEI DTD for dictionaries and attempts both to capture those structural principles and to embrace the wide variety of dictionary entries.
Hybrid
The Hybrid class is probably not a class at all, as we have said, but an instance that draws from multiple classes. What characteristic would the Hybrid have that is not found in another class? For example, encyclopedias frequently include significant pictorial or multimedia information, and are thus "hybrids" of text and image, but they are defined primarily by their membership in the Reference class. This class will probably be dropped, and we will instead focus our energies on documenting the way that objects in our separate systems "talk" to each other and draw in information from one another. Even if Hybrid is not a class, hybrid objects draw upon a set of behaviors and relationships. We expect that the relationships can be standardized or cataloged to some extent, and this process of standardization will be reflected in the documentation of behaviors or other aspects of DLPS architecture.
HyperBibliography
This class currently has two instances in DLPS, both deriving from Frances McSparran's concept for these large, complex bibliographies. One is her HyperBibliography (http://www.hti.umich.edu/mec/hyperbib/), found in the Middle English Compendium (http://www.hti.umich.edu/mec/).
The other is Maria Bonn's HyperBibliography to American Poetry (http://www.hti.umich.edu/english/amverse/hyperbib.html). Both differ from members of the Bibliographic class in significant ways. Both contain internal structure, with groupings of types of information such as linguistic information, editions associated with works, or related electronic resources. Both also contain considerable repetition in some areas of the entry: for example, in listing associated editions, each entry may include references to dozens of works, with each reference found and maintained as a discrete element within the entry. The high degree of structure and repetition, as well as their specialized purposes and the associated specialized systems, has led us to maintain this as a discrete class, separate from the Bibliographic, Dictionary, and Reference classes, all of which share traits with the HyperBibliography class.
Journals
The Journals class is characterized by a specific and regular structure, typically reflecting convenient chronological divisions, along with strong management tools for the journal's editor(s). This class, like the Reference class, is a specialized instance of the Text class. The Journals class is in the process of being elaborated.
In it, we expect to support:
- Articles in different formats, including:
  - SGML/XML
  - Rigorously specified HTML with consistent metadata
  - Loosely specified HTML with consistent metadata
  - PDF with consistent metadata
- Strong management tools, including the ability for editors to:
  - Add articles
  - Remove articles (with backup protection on DLPS's end)
  - Replace articles (with backup protection on DLPS's end)
  - Change articles in at least the first two categories of formats (with backup protection on DLPS's end)
  - Manage descriptive metadata (including additions, removals, and changes of bibliographic records)
  - Manage subscriptions, including the ability for designated representatives of the journal to:
    - Add individual subscribers
    - Add institutional subscribers
    - Renew or remove either of the above
Numeric
The Numeric class is under development in DLPS, and as of April 1999 only exploratory work had been done. We expect this class to focus on resources that use numeric information to describe phenomena primarily for statistical purposes. Familiar examples include the Census, but the range of resources is extraordinarily broad, including ongoing and one-time studies of income, health, consumer behavior, crime (and punishment), elections, and many other subjects. This class explicitly excludes numeric data that are not used for statistical purposes: geographic data (used in geographic information systems) and scientific data (used with a variety of analytical tools) are organized and used differently, and will eventually need classes of their own as bodies of publicly accessible data begin to be collected and made available. We expect the DLPS Numeric systems to support a wide variety of applications, some of which users may not recognize as statistical applications.
For example, popular government data such as those found in the Statistical Abstract of the United States should be mounted so that commonly used tables can simply be selected for display. At the other end of the spectrum, we hope to support complex statistical operations through access to high-end computers, in concert with other UM and consortial efforts such as NPACI. An important middle ground between these two extremes will be mechanisms that support common statistical operations on a wide variety of data, as well as data retrieval and extraction, so that users may perform statistical analysis in another (probably local) environment.
Reference
The Reference class, a specialized instance of the Text class, is characterized by significant and regular structure and by fairly narrowly defined uses. The most common member of this class is the encyclopedia, but we have also included examples such as almanacs and essay collections such as those found in Pictures of Record (PoR). The resource is likely to be organized into major thematic sections, frequently into more specific thematic sub-sections, and then into articles. The articles may or may not be subdivided, and may or may not be authored by named individuals (cf. PoR essays or EB11 articles). Users are unlikely to consult the resource in order to read an entire thematic section or sub-section, but may use this structure to browse to a particular article by way of a descriptive title. The editors or authors of the resources use this combination of descriptive titles and thematic organization in lieu of subject analysis. Readers will frequently consult the resource to ascertain a "fact" or piece of information. DLPS result displays therefore focus less on KWIC (keyword-in-context) displays and more on bringing the user to the desired heading or sub-heading for subsequent review.
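Bringing the reader to a heading rather than to a KWIC snippet can be sketched as a search that returns the heading path of each matching unit in a section > sub-section > article hierarchy. This is a minimal Python illustration under stated assumptions: the Node structure, the find_headings function, and the sample almanac content are hypothetical, and matching is a simple case-insensitive substring test rather than real indexing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One unit of a reference work: a thematic section, a sub-section, or an article."""
    heading: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def find_headings(node: Node, term: str, path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the heading path of every unit whose heading or own text contains `term`."""
    here = path + (node.heading,)
    hits: List[Tuple[str, ...]] = []
    if term.lower() in f"{node.heading} {node.text}".lower():
        hits.append(here)
    for child in node.children:  # depth-first, so hits come back in document order
        hits.extend(find_headings(child, term, here))
    return hits


# Hypothetical almanac: thematic sections containing articles.
SAMPLE_WORK = Node("Almanac", children=[
    Node("Geography", children=[
        Node("Rivers", text="The Example River is the longest river in this sample."),
        Node("Mountains", text="Peaks above 4,000 meters."),
    ]),
    Node("History", children=[
        Node("Exploration", text="Early expeditions followed the Example River upstream."),
    ]),
])

if __name__ == "__main__":
    for hit in find_headings(SAMPLE_WORK, "example river"):
        print(" > ".join(hit))
```

Unlike a KWIC display, the result tells the reader where to go (here, Almanac > Geography > Rivers and Almanac > History > Exploration) rather than showing the matching words in context.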
The Reference class is distinct from the Text class by virtue of its predictability (in organization and purpose), and from the Journals class by virtue of its largely static nature: it is a fixed object that seldom changes, or changes in total through the release of a new edition.
Text
The Text class consists primarily of monographic material (books and pamphlets), but also of material such as journals, especially when converted from print and not subject to ongoing work (see Journals). Whether current publishing or historical, and whether composed or edited for electronic distribution or for print, these works are all:
- extended text, typically of prose, verse, or drama, and (significantly) combinations of these
- typically with a high degree of structure
- frequently drawn together in large groups or collections that we call "libraries"
Although materials in the Reference class may become part of such a collection, we tend to think of members of the Text class as having a less predictable structure and application. While the typical uses of these materials have much to do with the behaviors we apply to them (e.g., members of the Text class are often read at length, whereas a member of the Reference class is more typically consulted for information), we focus here primarily on the great variability of their organization and the relatively large bodies of material that are assembled. These two factors lead us to treat their organizational characteristics more generically. In an important way, this class more than any other serves as a base class from which new classes grow or can be defined. The Text class documentation treats some of this process of emerging classes more fully.
[Price-Wilkin, J., DLPS, University of Michigan, 1999. http://docs.umdl.umich.edu/docs/arch/classes/abstracts.html]

Appendix D: Textual Markup Languages Background Information
SGML (http://www-tei.uic.edu/orgs/tei/sgml/teip3sg/index.html#TOC)
Standard Generalized Markup Language (SGML) is an international standard for the definition of device-independent, system-independent methods of representing texts in electronic form. More exactly, SGML is a metalanguage, that is, a means of formally describing a language, in this case, a markup language. There are three characteristics of SGML which distinguish it from other markup languages:
- its emphasis on descriptive rather than procedural markup;
- its document type concept; and
- its independence of any one system for representing the script in which a text is written.
A descriptive markup system uses markup codes which simply provide names to categorize parts of a document. By contrast, a procedural markup system defines what processing is to be carried out at particular points in a document. In SGML, the instructions needed to process a document for some particular purpose (for example, to format it) are sharply distinguished from the descriptive markup which occurs within the document. Usually, they are collected outside the document in separate procedures or programs. With descriptive instead of procedural markup, the same document can readily be processed by many different pieces of software, each of which can apply different processing instructions to those parts of it which are considered relevant.
Secondly, SGML introduces the notion of a document type, and hence a document type definition (DTD). Documents are regarded as having types, just as other objects processed by computers do. The type of a document is formally defined by its constituent parts and their structure.
If documents are of known types, a special purpose program (called a parser) can be used to process a document claiming to be of a particular type and check that all the elements required for that document type are indeed present and correctly ordered. More significantly, different documents of the same type can be processed in a uniform way. Programs can be written which take advantage of the knowledge encapsulated in the document structure information, and which can thus behave in a more intelligent fashion.
A basic design goal of SGML was to ensure that documents encoded according to its provisions should be transportable from one hardware and software environment to another without loss of information. The two features discussed so far both address this requirement at an abstract level; the third feature addresses it at the level of the strings of bytes (characters) of which documents are composed. SGML provides a general purpose mechanism for string substitution, that is, a simple machine-independent way of stating that a particular string of characters in the document should be replaced by some other string when the document is processed. One obvious application for this mechanism is to ensure consistency of nomenclature; another, more significant one, is to counter the notorious inability of different computer systems to understand each other's character sets, or of any one system to provide all the graphic characters needed for a particular application, by providing descriptive mappings for non-portable characters. The strings defined by this string-substitution mechanism are called entities.
EAD (http://lcweb.loc.gov/ead/)
The Encoded Archival Description (EAD) Document Type Definition (DTD) is an application of the Standard Generalized Markup Language (SGML) for archival finding aids. Traditionally, finding aids have been printed listings and descriptions of the contents of archival collections.
The use of SGML and the EAD DTD now allows these documents to be encoded in a standard machine-readable form so that they can be made available electronically through the World Wide Web. A growing number of archival repositories have been experimenting with the EAD DTD to create Internet-accessible finding aids. Development of the EAD DTD began with a project initiated by the University of California, Berkeley, Library in 1993. The goal of the Berkeley project was to investigate the desirability and feasibility of developing a nonproprietary encoding standard for machine-readable finding aids such as inventories, registers, indexes, and other documents created by archives, libraries, museums, and manuscript repositories to support the use of their holdings. The Berkeley project developed requirements for the encoding standard, which included the following criteria:
1) ability to present extensive and interrelated descriptive information found in archival finding aids,
2) ability to preserve the hierarchical relationships existing between levels of description,
3) ability to represent descriptive information that is inherited by one hierarchical level from another,
4) ability to move within a hierarchical informational structure, and
5) support for element-specific indexing and retrieval.
ISO 12083 (http://www.xmlxperts.com/12083.htm)
This International Standard presents a reference document type definition (DTD) which facilitates the authoring, interchange, and archiving of a variety of publications. This document type definition is deliberately general: it provides a set of building blocks for the structuring of books, articles, serials, and similar publications in print and electronic form. The standard is intended to provide a document architecture to facilitate the creation of various application-specific document type definitions.
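To make EAD criteria 2 and 3 above (preserving hierarchy and inheriting descriptive information between levels) concrete, the sketch below walks a simplified finding-aid fragment with Python's standard xml.etree.ElementTree module and lets each level of description inherit a date from its nearest dated ancestor. The fragment borrows EAD element names (archdesc, did, unittitle, unitdate, dsc, c01, c02) but is not a complete or valid EAD instance; the collection, the component_dates function, and the date-inheritance rule are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Simplified fragment using EAD element names; NOT a complete or valid EAD document.
FINDING_AID = """
<archdesc level="collection">
  <did>
    <unittitle>Example Family Papers</unittitle>
    <unitdate>1850-1920</unitdate>
  </did>
  <dsc>
    <c01 level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c02 level="file">
        <did><unittitle>Letters, folder 1</unittitle><unitdate>1861-1865</unitdate></did>
      </c02>
      <c02 level="file">
        <did><unittitle>Letters, folder 2</unittitle></did>
      </c02>
    </c01>
  </dsc>
</archdesc>
"""

# Elements treated as levels of description in this sketch.
COMPONENT_TAGS = {"archdesc", "c01", "c02", "c03"}


def _walk(elem: ET.Element, titles: List[str], inherited_date: Optional[str],
          out: List[Tuple[str, Optional[str]]]) -> None:
    did = elem.find("did")
    title = did.findtext("unittitle") if did is not None else None
    own_date = did.findtext("unitdate") if did is not None else None
    # A level without its own date inherits the nearest ancestor's date.
    date = own_date or inherited_date
    path = titles + [title or "(untitled)"]
    out.append((" / ".join(path), date))
    # Child components sit either directly under this element or inside <dsc>.
    for container in [elem, *elem.findall("dsc")]:
        for child in container:
            if child.tag in COMPONENT_TAGS:
                _walk(child, path, date, out)


def component_dates(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """List every level of description with its own or inherited date."""
    out: List[Tuple[str, Optional[str]]] = []
    _walk(ET.fromstring(xml_text.strip()), [], None, out)
    return out


if __name__ == "__main__":
    for path, date in component_dates(FINDING_AID):
        print(f"{path}: {date}")
```

Here "Letters, folder 2" states no date of its own and so reports the collection's 1850-1920 range, while "Letters, folder 1" keeps its own 1861-1865. Only the date is inherited, to keep the sketch short.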
XML
The Extensible Markup Language (XML) is a subset of the Standard Generalized Markup Language (SGML) intended to make it more usable for distributing materials on the World Wide Web. XML differs from SGML primarily in simplifying the sometimes intimidating formalisms of SGML, so that an XML parser is simple enough to embed in even lightweight software, including Web browsers. It differs from HTML primarily in allowing the user to specify new tags, marking types of elements not foreseen in the HTML specification, and in making it possible for common off-the-shelf browsers and other software to handle such user-defined element types usefully.
XML describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents. XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure. A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. The XML specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
The design goals for XML are:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
HTML (http://www.w3.org/MarkUp/)
To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (from HyperText Markup Language). HTML gives authors the means to:
- Publish online documents with headings, text, tables, lists, photos, etc.
- Retrieve online information via hypertext links, at the click of a button.
- Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
- Include spreadsheets, video clips, sound clips, and other applications directly in their documents.
HTML is a non-proprietary format based upon SGML, and can be created and processed by a wide range of tools, from simple plain-text editors, in which you type it in from scratch, to sophisticated WYSIWYG authoring tools. HTML uses tags such as