Prepublication version: please consult with the author before citing.

Bringing Digital Data Management into Methods Courses

American Anthropological Association

Linguistic Anthropology module

Arienne M. Dwyer

version 2016-06-10 (revision history)

Author note: The ppt/rtf formats required here by the American Anthropological Association do not meet sustainability requirements for current best data practices.

Aim: To introduce best practices in data management for researchers in linguistic anthropology.

Timeline: This unit can serve either as a one- or two-week standard university course or a short-term (e.g. 1.5 day) intensive workshop.

Target audience in any country: (Post-)Graduate students before and after data collection; post-doctoral researchers; early- and mid-career faculty; community members; collaborative projects.

Table of Contents

Course Guide

Course Content

1 Ensuring the future of your data and avoiding catastrophe

2 The basics: Working with data

3 What are our responsibilities?

4 Archiving and re-use of data

5 Making the most of your data (Optional additional unit)

Acknowledgements

References

Appendices

Course Guide

This course guide may be used as a course introduction.

  1. Data management is crucial for good research. Scholarship creates a range of data forms that require converting, analyzing, storing, and sharing. As scholars, we have a responsibility to make sure that data endure into the future in accessible formats. These data are gathered by researchers (often from participants), and usually receive input from many people. We are therefore also responsible for the ethical, legal, and intellectual property issues arising from these data, including proper attribution/anonymization and adherence to the conventions and laws of relevant locales. Archiving and sharing the output of research in print and online venues (a.k.a. publication) requires attending to best practices in data management. Most research funders now require a data management plan, so that your results and data are enduring and public.

  1. Beyond scope: This course does not cover methods of obtaining grants or human subjects permission. It is not a tutorial on intellectual property or other national or international laws. It is also not a guide to digitization or format conversion. Some further resources on these topics are found in the references and appendices; some will need to be added.

  1. Data management and archiving begin at research design, not after the data are collected.

  1. Ethics begins at research design: we have a responsibility to plan and carry out research in partnership with a community; to ensure that the work benefits that community, as well as our institutions, our funders, and ourselves; and to ensure that our work conforms to local moral and ethical practices, institutional regulations, and national and international laws.

  1. Data types in linguistic anthropology: Linguistic anthropological methods reflect the interdisciplinary nature of the sub-discipline. They overlap with qualitative and quantitative methods in linguistics (e.g. documentary linguistics, sociolinguistics, discourse analysis, cognitive tasks) and cultural anthropology (participant observation, interviewing, surveying, and reflexivity). Linguistic anthropologists are likely to work with human subjects and to generate data that include any or all of the following: audio and/or video (A/V) recordings and their transcriptions (often with translation and grammatical annotation, sometimes time-aligned); notes, sketches, and images (photographs, maps (georectified or not), and diagrams); spatial data; artifacts and other physical data; websites, blogs, emails, or other Internet-based communication; ultrasound and MRI data; word lists, grammatical paradigms, sentences, and grammaticality judgments of them; and texts, which may include printed, handwritten, or electronic texts, questionnaires, surveys, and inscriptions; as well as metadata about these primary research data. These data may be digital or non-digital, structured or unstructured.

  1. Pretty good practice is good enough. Good practices are not out of reach; don't let “best practices” or this course keep you from learning pretty good practice (EMELD 2006).

  1. Key data management practices are common to all anthropologists. To maximize access to and ethical use of anthropological data, the AAA advocates unified guidelines for data management areas in common among all sub-disciplines. Data management methods common to all anthropologists can be summarized in four points:

  1. Data should be put into an enduring format;

  2. Data should be discoverable via metadata;

  3. Data should be archived; and

  4. Data gathering, archiving, and dissemination should be fully consultative and with the permission of involved participants.

Course Content

The course sessions present an overview of the key issues in data management via a tip-of-the-iceberg approach. They follow the digital data workflow from research design and project planning, through data creation, management, and analysis, to preservation, reusability, and publication. Each of the five numbered units (four main units and one optional unit) can constitute one or more class sessions; the contents can be covered in more or less detail depending on available time. Each unit requires participants to come up with examples from their own experience and apply that unit's concepts to those examples. Ideally, the instructor (and/or a future version of this course) would also provide use cases for each unit.

1 Ensuring the future of your data and avoiding catastrophe

Aim: To provide an overview of the issues; each bullet is relevant, no matter what the project. Each of these introductory issues is revisited later in the course. Bullet points can be exemplified by the instructor.


Planning data management entails negotiation with research subjects and/or a community, about:

See Unit 3 and the Appendices below.


Exercise: Create a first draft of a Data Management Plan (1-2 pp.) and answer the two reflection questions.

Reflection questions

2 The basics: Working with data

Aim: To introduce the many data types and formats, and to describe the minimum a researcher must do to create enduring data. Part two (regarding software) will need regular updating.

Unit 2, Part One: Data and metadata

Projects that create digital data during research (“in the field”) may need immediate storage for large files (e.g. A/V recordings and their associated metadata). Data Management Plans (DMPs) describe each “field” and archival data type, and follow best practices for storing originals and altered versions.

“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” (NISO 2004).

Instructor: provide examples of data and metadata from discourse analysis, documentary ethnolinguistics, and language socialization.
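One way to make the NISO definition concrete is a small metadata record stored next to a recording. The following is a minimal sketch, not a prescribed schema: the keys borrow Dublin Core element names, while the sample values, the REQUIRED list, and the helper functions missing_fields and to_sidecar are all hypothetical and would be replaced by whatever your archive asks for.

```python
import json

# Hypothetical metadata for one field recording. The keys follow Dublin Core
# element names; every value below is invented for illustration.
record = {
    "title": "Interview with a village elder, session 1",
    "creator": "Researcher name",
    "contributor": ["Speaker A (pseudonym)"],
    "date": "2016-06-10",
    "language": "und",  # ISO 639-3 code; "und" means undetermined
    "format": "WAV (uncompressed audio)",
    "rights": "Access restricted; consult the community before re-use",
    "identifier": "project-20160610-001.wav",
}

# Assumed minimum set of fields; an archive may require more or fewer.
REQUIRED = ("title", "creator", "date", "language", "format", "rights")


def missing_fields(rec):
    """Return the required keys that are absent or empty in rec."""
    return [key for key in REQUIRED if not rec.get(key)]


def to_sidecar(rec):
    """Serialize the record as JSON text for a plain-text sidecar file."""
    return json.dumps(rec, indent=2, ensure_ascii=False, sort_keys=True)


print(missing_fields(record))
print(to_sidecar(record))
```

Keeping the record in a plain-text file beside the recording means the description travels with the data and stays readable without special software.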


Unit 2, Part Two: Tools (software)

Specific tools rapidly become obsolete; these will need regular updating. In any case, open-source tools that allow maximal re-use are preferable.

3 What are our responsibilities?

Aim: To discuss researchers' ethical and legal responsibilities and Intellectual Property Rights. It's best to confront these issues during project planning, well before an IRB application. Attention to ethics equals good data. Also emphasized are the limits of Open Access: full consultation with communities forms the basis for solid data management plans and sharing arrangements that align with community norms.



4 Archiving and re-use of data

  1. Why should I archive?

  1. Data care (backup and data protection):

  1. Key archives for linguistic anthropologists

  1. Archival Concepts:

  1. Mobilization: Re-using your outputs
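For the data care topic above (backup and data protection), one common check is to confirm that a backup copy is identical to the original by comparing checksums. The sketch below assumes two folders with the same layout; the function names sha256_of, build_manifest, and compare are illustrative and do not refer to any particular archive's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(original, backup):
    """Return relative paths that are missing from, or differ in, the backup."""
    return sorted(
        name for name, checksum in original.items()
        if backup.get(name) != checksum
    )
```

A typical use would be `compare(build_manifest("fieldwork"), build_manifest("/mnt/backup/fieldwork"))`, where both paths are placeholders; an empty list means every original file has an identical copy in the backup.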


5 Making the most of your data (Optional additional unit)

Five separate topics are outlined below; their only commonality is that they go beyond the introductory level. Each topic awaits further development.

  1. Using Regular Expressions (RegEx) to convert data into new forms

  1. Working collaboratively at great distance (remote data access; collaboration environments, ...)

  1. Making data and websites accessible to people of all abilities and backgrounds (e.g. people who are colorblind, hearing- or sight-impaired, multilingual, non-English-speaking, or elderly), and to people with slow internet connections. See the W3C's Web Accessibility Initiative recommendations.

  1. Establishing your own digital archive (optional topic if there's interest)

  1. Planning for data re-use
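As an illustration of the first topic above (using regular expressions to convert data into new forms), the sketch below turns lexicon entries written with backslash field markers into tab-separated rows. The sample entries, the marker names (lx, ps, ge), and the functions parse_entries and to_tsv are assumptions made for this example; real project files will differ.

```python
import re

# Invented sample entries: one field per line as "\marker value",
# with a blank line between entries.
RAW = """\
\\lx kitap
\\ps n
\\ge book

\\lx su
\\ps n
\\ge water
"""

# One field: backslash, marker name, spaces or tabs, then the value.
FIELD = re.compile(
    r"^\\(?P<marker>\w+)[ \t]+(?P<value>.*?)[ \t]*$", re.MULTILINE
)


def parse_entries(text):
    """Split text on blank lines and turn each block into a dict."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {m["marker"]: m["value"] for m in FIELD.finditer(block)}
        if fields:
            entries.append(fields)
    return entries


def to_tsv(entries, columns=("lx", "ps", "ge")):
    """Render entries as tab-separated lines with a header row."""
    rows = ["\t".join(columns)]
    rows += ["\t".join(e.get(c, "") for c in columns) for e in entries]
    return "\n".join(rows)


print(to_tsv(parse_entries(RAW)))
```

The same pattern-then-rebuild approach applies to other conversions, such as normalizing dates or renaming fields; keep the original file untouched and write the converted form to a new one.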



Acknowledgements

This module has benefitted from and incorporated the specific comments of Philip Cash Cash, Jenny Cashman, Fatimah Williams Castro, Sara Gonzales, Candace Greene, Jared Lyle, Christine Mallison, Ricardo Punzalan, Thurka Sangaramoorthy, and Stephanie Simms. Naturally, the current author is responsible for any errors or infelicities, and thanks the AAA and the U.S. N.S.F. for their sponsorship and guidance of this effort.


References

Websites mentioned and/or linked in this document do not necessarily represent the views of the author or the American Anthropological Association. Commercial websites mentioned and/or linked here are intended as examples, and do not represent the endorsement of the author or the AAA.

Dwyer, Arienne M. 2006. Ethics and Practicalities of Cooperative Fieldwork and Analysis. In Gippert, Jost, Ulrike Mosel, and Nicolaus Himmelmann, eds. Fundamentals of Language Documentation: A Handbook. Berlin: Mouton de Gruyter, pp. 31-66. Web preprints in English and Spanish.

EMELD [Electronic Metastructure for Endangered Languages Data]. 2006. Working Group 1 report on Collecting Primary Texts. (Marianna Di Paolo, Gary Holton, Susan Smith, Arienne Dwyer, Steve Moran, Doug Whalen, Julia Good Fox, and Barbara Need.) Web.

Flanders, Julia and Trevor Muñoz. 2016. An Introduction to Humanities Data Curation. Web.

IPinCH [Intellectual Property Issues in Cultural Heritage Project]. 2016. Web.

IPinCH. 2016. Factsheet: Traditional Knowledge. Web.

IPinCH [Intellectual Property Issues in Cultural Heritage Project]. 2015. Think Before You Appropriate. Things to know and questions to ask in order to avoid misappropriating Indigenous cultural heritage. Simon Fraser University: Vancouver. Web.

Levine, Melissa. 2016. Policy, Practice and Law. In Flanders and Muñoz. Web.

Library of Congress. 2013. Sustainability of Digital Formats. Web.

Library of Congress. 2013. Recommended formats. Web.

Newman, Paul. 2007. Copyright Essentials for Linguists. Language Documentation and Conservation 1.1. Web.

Nichols, George, Catherine Bell, Rosemary Coombe, John R. Welch, Brian Noble, Jane Anderson, Kelly Bannister, and Joe Watkins. 2010. Intellectual Property Issues in Heritage Management Part 2: Legal Dimensions, Ethical Considerations, and Collaborative Research Practices. Cultural Heritage Management 3.1:117-147.

NISO [National Information Standards Organization] 2004. Understanding Metadata. Web.

OLAC [Open Language Archives Community]. 2008. Metadata. Web.

Simons, Gary F. 2006. Ensuring that digital data last: The priority of archival form over working form and presentation form. SIL Electronic Working Papers 2006-003. Web.

Stanford University Libraries. Best Practices for File Formats. Web.

UNESCO. 2003. Best practices on Indigenous Knowledge. Management of Social Transformations Programme. Web.

Van den Eynden, V., and L. Bishop. 2014. Incentives and motivations for sharing research data, a researcher’s perspective. A Knowledge Exchange Report. Web.

van Driem, George. 2016. Endangered Language Research and the Moral Depravity of Ethics Protocols. Language Documentation and Conservation 10:243-252. [pdf]

W3C [World Wide Web Consortium]. 2016. Data on the Web Best Practices. Latest published version. Web.

W3C [World Wide Web Consortium]. 2016. Web Accessibility Initiative. Web.


Appendices

General Resources for all anthropologists

Resources specific to linguistic anthropologists

Revision history:

Feedback on this document is welcome.

Optimization of this module will require regular updating.

The Bringing Digital Data Management into Methods Courses: Linguistic Anthropology Module by Arienne M. Dwyer is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.