Digital Preservation (GSLIS 752) Spring Session 2012 Syllabus (v. 1.5.1)
Instructor: Fred Grevin
Email: grevinf@earthlink.net
Phone: 917-902-2462
Meetings: by appointment

Course Description
We will examine the nature and characteristics of digital resources and explore the need to preserve them within a variety of contexts, including corporations, research institutions, government, libraries and archives. We will consider how to assess and mitigate the risks to digital resources. We will evaluate the current state of digital preservation and identify those problems that have solutions, those that have solutions under development, and those that have not yet been addressed.

This syllabus will be revised as the semester progresses. Each revised syllabus will be emailed to the students directly. It is each student's responsibility to ensure he/she has the revised version (see version number on title page and in file name).

Course Objectives
Upon successfully completing this course, students will:
Understand the need for preserving digital resources in various contexts.
Demonstrate a general knowledge of the technical requirements for digital preservation.
Understand the risks faced by digital resources, how to assess and mitigate them, and how to apply that understanding to specific situations requiring short-term or long-term retention.
Understand the functional requirements of a digital preservation system.
Know how to communicate an understanding and knowledge of digital preservation needs to executives and funders.
Course Schedule
Classes will be held on Thursdays, from 6:40 p.m. through 9:15 p.m. Each class session will have a 15-minute break. The class will begin at 6:40 p.m. sharp.

Grading
Class participation: 20%
Assignments: 40%

Basis of Grading
Grading policy: work defined as good (in which the stated objective of the assignment or project is met, no more and no less) will typically be graded as 83 to 86.9 points. This translates to a letter grade of B. Students wishing to achieve a higher score will have to produce work that is qualitatively superior to that defined as good. Results count most; effort counts less. Unless otherwise stated, quantity will not result in a better grade.

Students are expected to participate actively in class discussions. Failure to participate actively in class discussions, as well as excessive lateness and unexcused absences, will be penalized by a reduction in grade.

Assignments and readings are listed in this syllabus. There is no textbook; most of the readings are available online at no cost (see List of Readings at end of syllabus). Assignments may be individual or group assignments (see Description of Assignments & Final Course Project at end of syllabus).

The final course project is a group exercise (typically, groups of five to six students each). Each group shall have a Team Leader who shall function as a project manager. The instructor will assign students to groups, and appoint Team Leaders. For details, see the Description of Assignments & Final Course Project.
97412238.doc
Course Description
Definition of digital preservation: ensuring future generations have access to digital objects.
Assignment 2: Referring back to Assignment 1, define your email records as authentic and trustworthy, or not. Justify your choice.
A Memory of Webs Past, Ariel Bleicher, IEEE Spectrum, March 2011. Read all.
AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship (2012), AIMS Work Group. Read Foreword (pp. i-viii) and Introduction (pp. 1-2).
Digital Preservation Tutorials: File Naming (videos). Digital Preservation Education for North Carolina Employees. View all.
Visualizing Digital Preservation Workflows, by Bill LeFurgy (March 8th, 2012). Read all.
Life Cycle Models for Digital Stewardship, by Bill LeFurgy (February 21st, 2012). Read all.
PDF File Migration to PDF/A: Technical Considerations, Frank L. Walker, et al. Read all.
Thursday 12 April: Spring Recess, no class
Closing citation:
Information lasts only so long as someone cares about it. The conclusion I've come to...., after several decades of careful consideration, is that there is no set of hardware and software standards existing today, nor any likely to come along, that will provide any reasonable level of confidence that the stored information will still be accessible (without unreasonable levels of effort) decades from now. (Ray Kurzweil)
List of Readings
The readings are listed in class reading order.
Classes 1 and 2
A Canticle for Leibowitz, by Walter M. Miller Jr. Any edition is fine (the book has been in print since 1959). Public libraries have multiple copies.
Preserving Digital Information: Final Report and Recommendations (1996), by the Task Force on Archiving of Digital Information. Source: http://www.oclc.org/research/activities/past/rlg/digpresstudy/default.htm (URL verified 2012-01-28).
Class 2
The Digital Divide: Assessing Organisations' Preparations for Digital Preservation (2010), Pauline Sinclair, Planets. http://www.planets-project.eu/publications/?search[0]=9, under the heading Market Survey White Paper and Survey Analysis, posted on 11th May 2010 (URL verified 2012-01-28).
Data Storage: From the Floppy Disk to the Cloud, Paul Thurrott, Windows IT Pro (2012-01-24). http://www.windowsitpro.com/article/storage/data-storage-floppy-disk-cloud-142021 (URL verified 2012-01-28).
Data Preservation at LEP, Holzner et al, arXiv (2009). http://arxiv.org/abs/0912.1803v1 (download on upper right of page). (URL verified 2012-01-28).
Classes 3 and 4
The digital signature dilemma (2006), Jean-François Blanchette. http://polaris.gseis.ucla.edu/blanchette/papers/annals.pdf (URL verified 2012-01-28).
Authenticity in a Digital Environment (2000-05), Council on Library and Information Resources. Read the two articles: Archival Authenticity in a Digital Age, Peter B. Hirtle, and Authenticity in Perspective, Abby Smith. www.clir.org/pubs/reports/pub92/pub92.pdf (URL verified 2012-01-28).
Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment, by Anne J. Gilliland-Swetland (February 2000). From http://www.clir.org/pubs/reports/pub89/pub89.pdf (URL verified 2012-01-28).
Uniform Electronic Legal Material Act, National Conference of Commissioners on Uniform State Laws. http://www.law.upenn.edu/bll/archives/ulc/apselm/UELMA_Final_2011.htm (URL verified 2012-01-28).
ABA should pause before backing digital-only laws, Tonda Rush, WisLawJournal.com (2012-01-26). http://wislawjournal.com/2012/01/26/aba-should-pause-before-backing-digital-only-laws/ (URL verified 2012-01-28).
Authentication of Primary Legal Materials and Pricing Options, State of California, Office of Legislative Counsel (2011-12). http://www.mnhs.org/preserve/records/legislativerecords/docs_pdfs/CA_Authentication_WhitePaper_Dec2011.pdf (URL verified 2012-02-19).
Classes 5 and 6
Technology Watch Report 04-01: The Open Archival Information System Reference Model: Introductory Guide Brian F. Lavoie 2004 (http://www.dpconline.org/publications/technology-watch-reports) (URL verified 2012-01-28).
ISO Reference Model For an Open Archival Information System (OAIS), Tutorial Presentation, Sawyer et al (2003). nssdc.gsfc.nasa.gov/nost/isoas/presentations/oais_tutorial_200210.ppt (URL verified 2012-01-28).
Digital Preservation with Special Reference to the Open Archival Information System (OAIS) Reference Model: An Overview, Sibsankar Jana et al (2009). http://academic.research.microsoft.com/Paper/2064447 (URL verified 2012-01-28).
ERPANET OAIS Training Seminar Report (2003). http://www.erpanet.org/events/2002/copenhagen/ERPANET%20OAIS%20Training%20Seminar%20Report_final.pdf (URL verified 2012-01-28).
Reference Model for an Open Archival Information System, CCSDS 650.0-B-1 Blue Book, Issue 1 (January 2002), Consultative Committee for Space Data Systems. http://public.ccsds.org/publications/AllPubs.aspx (URL verified 2012-01-28; the publications are listed by number, so look for CCSDS 650.0-B-1, a little more than halfway down the page). NOTE: this document has been published as an International Standard: ISO 14721:2003 Space data and information transfer systems -- Open archival information system -- Reference model.
Towards an Open Source Repository and Preservation System. Recommendations on the Implementation of an Open Source Digital Archival and Preservation System and on Related Software Development, Bradley et al, UNESCO (2007). http://portal.unesco.org/ci/en/files/24700/11824297751towards_open_source_repository.doc/towards_open_source_repository.doc (URL verified 2012-01-28).
The DOI System: Introductory Overview, The DOI Foundation (2011-10-03). http://www.doi.org/overview/sys_overview_021601.html (URL verified 2012-01-28).
Classes 7 and 8
Parsimonious preservation: preventing pointless processes!, Tim Gollins, The National Archives (UK), 2009. Read all 4 pages. http://www.nationalarchives.gov.uk/documents/parsimonious-preservation.pdf (URL verified 2012-01-28).
The Digital Dilemma: Strategic Issues in Archiving and Accessing Digital Motion Picture Materials, Academy of Motion Picture Arts and Sciences, 2007. http://www.oscars.org/science-technology/council/projects/digitaldilemma/ (URL verified 2012-01-28).
Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years (2002-07), Kenneth Thibodeau, in CLIR Conference Proceedings The State of Digital Preservation: An International Perspective, pp. 4-31. http://www.clir.org/pubs/abstract/pub107abst.html (choose PDF). (URL verified 2012-01-28).
Thirteen Ways of Looking at...Digital Preservation (2004), Brian Lavoie. http://www.dlib.org/dlib/july04/lavoie/07lavoie.html (URL verified 2012-01-28).
A Memory of Webs Past, Ariel Bleicher, IEEE Spectrum, March 2011. http://spectrum.ieee.org/telecom/internet/a-memory-of-webs-past/0 (URL verified 2012-01-28).
AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship (2012), AIMS Work Group. http://www2.lib.virginia.edu/aims/whitepaper/AIMS_final.pdf (URL verified 2012-01-28).
Digital Preservation Tutorials: File Naming (videos). Digital Preservation Education for North Carolina Employees. http://digitalpreservation.ncdcr.gov/tutorials.html. View all. (URL verified 2012-01-28).
Visualizing Digital Preservation Workflows, by Bill LeFurgy (March 8th, 2012). http://blogs.loc.gov/digitalpreservation/2012/03/visualizing-digital-preservation-workflows/ (URL verified 2012-03-12).
Life Cycle Models for Digital Stewardship, by Bill LeFurgy (February 21st, 2012). http://blogs.loc.gov/digitalpreservation/2012/02/life-cycle-models-for-digital-stewardship/ (URL verified 2012-03-12).
Class 9
Statement of David A. Powner, Director Information Technology Management Issues, United States Government Accountability Office (GAO Report GAO-10-222T). Testimony Before the Subcommittee on Information Policy, Census, and National Archives, Committee on Oversight and Government Reform, House of Representatives, November 5, 2009: Progress and Risks in Implementing its Electronic Records Archive Initiative. http://www.gao.gov/products/GAO-10-222T (URL verified 2012-03-04).
Classes 10 and 11
Assessing the Durability of Formats in a Digital Preservation Environment (2004-11), Andreas Stanescu. http://www.dlib.org/dlib/november04/stanescu/11stanescu.html (URL verified 2012-01-28).
Defining File Format Obsolescence: A Risky Journey, David Pearson, Colin Webb, International Journal of Digital Curation, Vol 3, No 1 (2008). http://www.ijdc.net/index.php/ijdc/article/view/76 (URL verified 2012-01-28).
Content Categories, a sub-section of Sustainability of Digital Formats: Planning for Library of Congress Collections. http://www.digitalpreservation.gov/formats/content/content_categories.shtml (URL verified 2012-01-28).
PDF File Migration to PDF/A: Technical Considerations, Frank L. Walker, et al (2006). archive.nlm.nih.gov/pubs/ceb2007/2007020.pdf (URL verified 2012-01-28).
Classes 12 and 13
Understanding Metadata, National Information Standards Organization (2004). http://www.niso.org/publications/press/UnderstandingMetadata.pdf (URL verified 2012-02-03).
Metacrap: Putting the torch to seven straw-men of the meta-utopia, Cory Doctorow (2001). http://www.well.com/~doctorow/metacrap.htm (URL verified 2012-02-03).
Metadata Encoding and Transmission Standard: Primer and Reference Manual, v. 1.6 revised, 2010, Digital Library Federation. http://www.loc.gov/standards/mets/mets-schemadocs.html (URL verified 2012-01-28).
PREMIS Data Dictionary for Preservation Metadata, version 2.1 (2011-01), PREMIS Editorial Committee. http://www.loc.gov/standards/premis/ (URL verified 2012-01-28).
Class 14
What is Digital Curation? (a subsection of The Value of Digital Curation, 2010). Digital Curation Centre Web site. http://www.dcc.ac.uk/digital-curation/what-digital-curation (URL verified 2012-01-28).
Historical context and the information age: the Diaspora of Holocaust archives, Raymund Schütz (2011). Provided by instructor.
Archivists, Curators, and Museum Technicians, Bureau of Labor Statistics (BLS) Occupational Outlook Handbook (2010-11 Ed.). http://www.bls.gov/oco/ocos065.htm (URL verified 2012-02-06).
Data Curation in Climate and Weather: Transforming Our Ability to Improve Predictions through Global Knowledge Sharing, Clifford A. Jacobs, National Science Foundation, Steven J. Worley, National Center for Atmospheric Research, The International Journal of Digital Curation, Issue 2, Volume 4 (2009). www.ijdc.net/index.php/ijdc/article/viewFile/119/122 (URL verified 2012-01-28).
Data Curation Program Development in U.S. Universities: The Georgia Institute of Technology Example, Tyler O. Walters, Associate Director, Technology and Resource Services, Library and Information Center, Georgia Institute of Technology, The International Journal of Digital Curation, Issue 3, Volume 4 (2009). www.ijdc.net/index.php/ijdc/article/viewFile/136/153 (URL verified 2012-01-28).
Class 15
United States Code, Title 17, Chapter 1. http://www.copyright.gov/title17/circ92.pdf (URL verified 2012-01-28).
Case: Lowry's Reports, Inc. v. Legg Mason, Inc. http://www.internetlibrary.com/cases/lib_case520.cfm and http://www.wlf.org/upload/062705LUPK.pdf (URLs verified 2012-01-28).
The Orphan Wars, James Grimmelmann, EDUCAUSE Review, Volume 47 No. 1, January/February 2012. http://www.educause.edu/EDUCAUSE+Review/EDUCAUSEReviewMagazineVolume47/TheOrphanWars/244410 (URL verified 2012-01-28).
Notes from Matthew Hovey at Weird Kid: when the students are comparing before and after for differences, they should make sure to look at the full email headers. Often, this meta information is as important as the content. Importing into Thunderbird is a little tricky in that you can't use Thunderbird's Import feature; it's buggy. The Emailchemy user manual outlines a different process. If people are looking to preserve data, I usually recommend keeping the email in EML format for reasons of portability and robustness. Emailchemy writes the files in folders sorted as they were in the email client and names the files so that they are easily sorted by timestamp. Most desktop OSs can index and search these files easily, and the files are easily opened and rendered by most email clients. The one downside is attachments -- they don't get indexed because they are stored in base64 MIME parts.

In your paper (5-10 pages), discuss the following:
1. The ease or difficulty of installing and configuring the Emailchemy program.
2. The ease or difficulty of importing your email into Emailchemy.
3. The ease or difficulty of converting your email to the EML format.
4. The ease or difficulty of importing your email into the Mozilla Thunderbird email client.
5. The detectable differences, if any, between the messages and attachments in their original formats and as displayed by the Mozilla Thunderbird email client.
6. Your conclusions on the effectiveness and the efficiency of preserving email through this type of software.

For the purposes of this assignment, effectiveness is defined as how completely the Emailchemy software preserved the email messages with minimal detectable differences between the messages and attachments in their original formats and as displayed by the Mozilla Thunderbird email client. Again for the purposes of this assignment, efficiency is defined as the amount of time and effort you had to invest in the project, in proportion to the volume of email preserved AND its effectiveness.

Assignment 4: Prepare an outline, with some illustrative details, of the final course assignment for review and discussion in class on Thursday 26 April. This is a group assignment (groups of five to six students each). Because each group will be required to work together on a single document, it is recommended that students have access to some form of collaborative Web-based tool, such as Google Docs. Students will be assigned to groups and team leaders appointed by the instructor.
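For the before-and-after comparison in the email conversion assignment, the notes stress looking at the full headers and point out that attachments sit in base64 MIME parts. The sketch below is one possible way to make that comparison repeatable, and it is not part of Emailchemy or Thunderbird. It uses only Python's standard `email` and `hashlib` modules; the function names `summarize_eml` and `compare` are hypothetical, and it assumes both versions of a message are available as EML files. Header order and repeated identical headers are ignored, which is a simplification.

```python
import hashlib
from email import message_from_bytes, policy


def summarize_eml(raw: bytes) -> dict:
    """Collect every header and a SHA-256 digest of each decoded attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Header names are case-insensitive, so normalize them to lower case.
    headers = [(name.lower(), str(value)) for name, value in msg.items()]
    attachments = {}
    for part in msg.iter_attachments():
        # decode=True undoes the base64 transfer encoding of the MIME part.
        payload = part.get_payload(decode=True) or b""
        name = part.get_filename() or "(unnamed)"
        attachments[name] = hashlib.sha256(payload).hexdigest()
    return {"headers": headers, "attachments": attachments}


def compare(before: dict, after: dict) -> dict:
    """Report headers lost or added and attachments whose bytes differ."""
    b_headers, a_headers = set(before["headers"]), set(after["headers"])
    return {
        "headers_lost": sorted(b_headers - a_headers),
        "headers_added": sorted(a_headers - b_headers),
        "attachments_changed": sorted(
            name
            for name, digest in before["attachments"].items()
            if after["attachments"].get(name) != digest
        ),
    }
```

To use it, read each file as bytes (for example with `pathlib.Path(...).read_bytes()`), pass the results through `summarize_eml`, and hand both summaries to `compare`. Empty lists in the result mean no difference was detected by this particular check; differences in how a mail client renders a message still need to be judged by eye.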
Project Description (draft)
It is for a company we'll call Awesome Oil & Gas (AOG). AOG is a small company with about 300 employees in its head office in Calgary, Alberta. It is owned by a large US oil company that also has interests in Houston, TX. AOG is profitable, drilling exploratory and producing wells in Alberta and Saskatchewan.

AOG has an active document management (DM) group, but there is no formal Records Management in place. DM consists primarily of two functions. First, they operate a central file room where employees request records, in particular well files. A well file holds ALL information about a particular land location (even if there have been multiple wells at that location). A small, recently-created well file is a couple of inches thick. An old, active well file can be a few feet thick with information going back prior to the mid-1950s. This information is not sent to offsite storage because engineers and geophysicists look at historical information regularly in order to make decisions about what to do next. The second function of DM is scanning. They scan company documents (95% of them are financial records, such as Accounts Payable). They're a happy bunch, feeling like they're doing something to push the company into the 21st century.

One Spring day with a hint of promise in the air, AOG purchases another oil and gas company called Pretty Good Oil & Gas (PGOG). It is common for a purchased company to send its well files to the purchaser as soon as possible, and it is no different in this case. However, DM learns very quickly that PGOG was terrible at managing their well files: documents in the file are loose, unordered, mixed up, and generally a big fat mess. High up the chain of command it was decided that it would be good if they were scanned, since technology solves all problems. Of course, if you scan a big mess, you get a big scanned mess.

The files were scanned by a third-party vendor, who pleaded with AOG to organize the files somewhat first. The pressure from high up to scan quickly prevailed, however, and so the vendor scanned each well file as one large PDF (some as large as 40 MB in size). A descriptive filename was given to each PDF and AOG stored them on a network drive for employees to look at when they needed. Because AOG wanted to utilize the PGOG assets as quickly as possible, the files got accessed right away, whether in their electronic version or in their paper version in the file room.

Cries of anger arose quickly from the business units, where employees had great difficulties using the files. It impacted the bottom line, and so another project quickly developed to clean up the well files. The project took on two dimensions. The first dimension was to organize the paper file into industry-accepted patterns that the company was used to. That meant sorting them into 12 specific groups of documents, and then in reverse chronological order. Some in the DM group knew well files very well and were able to accomplish this task.

The second dimension was to organize the electronic counterparts. One would think that it'd be easier just to abandon the electronic files, but a combination of political will (in order to save face) and younger engineers who saw the promise in an electronic version was enough for the digital re-organization to proceed. Two people were hired to re-organize the electronic versions, add new material, and make them comparable to the paper versions. Instead of one big PDF, each well file consisted of several PDF documents indexed by land location, document type, and date. The files were imported into an electronic document management database (EDMS) and employees were given access and search tools to find the documents.
Thus peace returned to AOG. While older employees still demanded the paper well file, younger engineers commented that in some cases they could do in half an hour what would have taken most of the day if the electronic files had not existed. Issues for analysis in the final course project: TBD
Course Calendar
February 2012
Thursday 2: Class 1: Introduction. Assignment 1 begins.
Thursday 9: Class 2: The extent and nature of the problem.
Thursday 16: Class 3: Archival theory and diplomatics; Authenticity, Integrity and Trust. Assignment 1 due before class.
Thursday 23: Class 4: Archival theory and diplomatics; Authenticity, Integrity and Trust, continued. Assignment 2 begins.

March 2012
Thursday 1: Class 5: A functional framework. Assignment 2 due before class. Guest lecturer: Nicholas Webb
Thursday 8: Class 6: A functional framework, continued. Assignment 3 begins.
Thursday 15: Class 7: Preservation strategies
Thursday 22: Class 8: Preservation strategies, continued.
Thursday 29: Class 9: Guest presentation (Skype): Fynnette Eaton

April 2012
Thursday 5: Class 10: Data formats
Thursday 12: Spring Recess, no class
Thursday 19: Class 11: Data formats, continued. Assignment 3 is due before class. Assignment 4 begins. Guest presentation (Skype): Wayne Hoff
Thursday 26: Class 12: Metadata. Assignment 4 due before class.

May 2012
Thursday 3: Class 13: Metadata, continued. Guest lecturer: Rebecca Guenther.
Thursday 10: Class 14: Data curation
Thursday 17: Class 15: Digital preservation and the law
Thursday 24: Class 16: presentation of final course project. Due before class.