Codebook for SAS Dataset: TOPICS2
Dataset
- Dataset Label
- A SAS Dataset generated from minutes from NSF1448107 group at Dagstuhl event 14432
- Date Created
- 2014-11-09T11:43:30.8
- Date Last Modified
- 2014-11-09T11:43:31.0
- Number of Observations
- 25
- Number of Variables
- 8
- Encoding
- wlatin1 Western (Windows)
- Engine
- V9
__________________extended attributes___________________
- Abstract
- This dataset was created from the October 20-22 minutes of the NSF1448107 sponsored group attending Dagstuhl event 14432. Topics were generated using default settings of SAS Text Miner on the concatenated raw minutes files, one record per paragraph, for the first three days of the meeting.
- AccessRights
- Freely available, with attribution
- Contributor
- Mary Vardigan (conceptualization, equal), Sam Hume (conceptualization, equal), Sanda Ionescu (conceptualization, equal), Jay Greenfield (conceptualization, equal), Jeremy Iverson (conceptualization, equal), John Kunze (conceptualization, equal), Barry Radler (conceptualization, equal), Stuart Weibel (conceptualization, equal), Michael C. Witt (conceptualization, equal)
- Creator
- Larry Hoyle
- Description
- This dataset is intended as an example for attaching source information to a dataset and a variable.
- FundingInformation
- This dataset was created during Dagstuhl event 14432 by a group funded by NSF grant number 1448107.
- Language
- en-US
- License
- Freely available, with attribution
- ParentDatasets
- Cite.Minutes, Cite.Topics
- Permanence
- Permanent: Unchanging Content
- PublicationDate
- 2014-11-17
- Publisher
- University of Kansas
- RelatedResourceAuthor
- Jay Greenfield, Larry Hoyle, Sam Hume, Sanda Ionescu, Jeremy Iverson, John Kunze, Barry Radler, Mary Vardigan, Stuart Weibel, Michael C. Witt
- RelatedResourcePublicationDate
- 2014-11-17
- RelatedResourcePublisher
- University of Kansas
- RelatedResourceRelationship
- isDerivedFrom
- RelatedResourceTitle
- Minutes for Oct 20-22 2014 from NSF1448107 group at Dagstuhl event 14432
- ResourceType
- dataset
- SpatialCoverage
- Schloss Dagstuhl, Wadern, Germany
- Study_AnalysisUnit
- paragraphs from raw minutes files
- Study_CollectionMethodology
- Minutes were recorded as Google Docs, one for each day of the Dagstuhl workshop. All participants could edit the daily minutes file simultaneously. Minutes for 2014-10-20, 2014-10-21, and 2014-10-22 were copied from downloaded Microsoft Word files and concatenated into a single text file. This file was read into SAS and then used as input to SAS Text Miner with all default options. The Topics results table was exported as this SAS dataset.
- Study_FundingInformation
- Participant travel and accommodations funded by NSF grant 1448107
- Study_KindOfData
- SAS Text Miner Topics results table with derived topic descriptions; the dataset includes metadata in SAS extended attributes.
- Study_ProcessingDescription
- /* data read from concatenated minutes */
  filename mins "C:\DDRIVE\projects\various\DDI\NSFDearColleague\MineMinutes\Oct20_22Minutes.txt";
  libname minlib "C:\DDRIVE\projects\various\DDI\NSFDearColleague\MineMinutes";
  data minlib.minutes;
    infile mins lrecl=520 pad;
    input para $520.;
  run;
  /* Dataset CITE.TOPICS generated by SAS Text Miner from minlib.minutes; CITE.TOPICS2 then generated by: */
  data CITE.TOPICS2;
    set CITE.TOPICS;
    length TopicDescription $ 1000;
    TopicDescription = catx(" ","Topic ",_topicID," has ",_numDocs," documents and",_name," as Terms:");
  run;
- Study_Purpose
- a sample dataset for enhanced data citation
- TemporalCoverage
- 2014-10-20 to 2014-10-22
- Title
- Topics generated from minutes from NSF1448107 group at Dagstuhl event 14432
- TopicalCoverage
- Enhanced citation
- Version
- 1.0
- VersionDate
- 2014-11-17
- VersionResponsibility
- Larry Hoyle
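The read step in Study_ProcessingDescription stores each line of the concatenated minutes file as one observation of `para`, read through the `$520.` informat from records padded to 520 columns. The Python sketch below approximates that behavior for readers working outside SAS; it is not the original processing code. The function name `read_paragraphs` is hypothetical, the `cp1252` default assumes the text file matches the wlatin1 session encoding, and the sketch models only the points noted in its comments while ignoring other informat details.

```python
from pathlib import Path
from typing import List

LRECL = 520  # mirrors lrecl=520 and the $520. informat in the SAS step


def read_paragraphs(path: Path, encoding: str = "cp1252") -> List[str]:
    """Hypothetical approximation of: infile mins lrecl=520 pad; input para $520.;"""
    records: List[str] = []
    with open(path, encoding=encoding) as fh:
        for line in fh:
            value = line.rstrip("\r\n")
            value = value[:LRECL]      # assumption: text past column 520 is not kept
            value = value.lstrip(" ")  # the $w. informat left-aligns, trimming leading blanks
            # PAD fills short records with blanks; SAS ignores trailing blanks in
            # comparisons, so they are dropped here. Blank lines become "" (missing).
            records.append(value.rstrip(" "))
    return records
```

Each input line yields exactly one record, so a blank line between paragraphs still produces an (empty) observation.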
Variables
1 Variable: _displayCat
- Label
- Category
- Type: Character - Length
- 16
- Transcode
- yes
- SortedBy
- 0
2 Variable: _topicid
- Label
- Topic ID
- Type: Numeric, internal bytes
- 8
- Transcode
- yes
- SortedBy
- 0
3 Variable: _docCutoff
- Label
- Document Cutoff
- SASFormat
- 5.3
- Type: Numeric, internal bytes
- 8
- Transcode
- yes
- SortedBy
- 0
4 Variable: _termCutoff
- Label
- Term Cutoff
- SASFormat
- 5.3
- Type: Numeric, internal bytes
- 8
- Transcode
- yes
- SortedBy
- 0
5 Variable: _name
- Label
- Topic
- Type: Character - Length
- 100
- Transcode
- yes
- SortedBy
- 0
6 Variable: _numterms
- Label
- Number of Terms
- Type: Numeric, internal bytes
- 8
- Transcode
- yes
- SortedBy
- 0
7 Variable: _numdocs
- Label
- # Docs
- Type: Numeric, internal bytes
- 8
- Transcode
- yes
- SortedBy
- 0
8 Variable: TopicDescription
- Type: Character - Length
- 1000
- Transcode
- yes
- SortedBy
- 0
_____________extended attributes_________
- AccessRights
- Freely available, with attribution
- AnalysisUnit
- paragraphs
- Concept
- A label for a topic generated by SAS Text Miner, combining the topic number, the number of documents related to the topic, and the key descriptive terms for the topic.
- Contributor
- Mary Vardigan (writing – review & editing, lead)
- Creator
- Larry Hoyle
- Description
- A variable to be used with a Topics results dataset produced by SAS Enterprise Miner Text Miner. A single string combines the topic number, the number of related documents, and the key terms.
- GenerationInstruction
- TopicDescription = catx(" ","Topic ",_n_," has ",_numDocs," documents and",_name," as Terms:");
- Language
- en-US
- LevelOfMeasurement
- Nominal
- Permanence
- Permanent: Unchanging Content
- ProcessingDescription
- Computed with the following SAS assignment statement: TopicDescription = catx(" ","Topic ",_topicID," has ",_numDocs," documents and",_name," as Terms:"); _topicID, _numDocs, and _name are standard variable names in an unsorted Topics dataset saved from Enterprise Miner.
- PublicationDate
- 2014-11-14
- Publisher
- University of Kansas
- ResourceType
- Variable
- Role
- Potentially useful for topic labeling
- Title
- Topic Descriptor Combining Sequence Number, Number of Related Documents, and Terms List From A SAS Text Miner Text Topics Node Result Table
- VariableIdentifier
- TopicDescription
- Version
- 1.0
- VersionDate
- 2014-10-23
- VersionResponsibility
- Larry Hoyle
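TopicDescription is built with the SAS CATX function, which strips leading and trailing blanks from each argument, skips missing values, and joins the remaining values with the delimiter (here a single blank). The extra spaces inside literals such as "Topic " and " has " are therefore removed before joining. The Python sketch below imitates that behavior to show the shape of the resulting string; `catx` and `topic_description` are hypothetical helpers, not SAS APIs, and the sketch assumes `_topicid` and `_numdocs` hold whole numbers (SAS formats other numeric values with a BEST format, which this sketch does not reproduce).

```python
from typing import Optional, Union

Item = Optional[Union[str, int, float]]


def catx(delimiter: str, *items: Item) -> str:
    """Hypothetical approximation of SAS CATX(delimiter, item-1, ..., item-n)."""
    parts = []
    for item in items:
        if item is None:
            continue  # treat None as a SAS missing value: skipped
        if isinstance(item, float):
            if item != item:
                continue  # NaN also stands in for a numeric missing value
            # Assumption: whole numbers print without a decimal part;
            # other floats use Python's repr, which may differ from SAS output.
            text = str(int(item)) if item.is_integer() else repr(item)
        else:
            text = str(item)
        text = text.strip(" ")  # CATX removes leading and trailing blanks
        if text:
            parts.append(text)  # all-blank strings count as missing and are skipped
    return delimiter.join(parts)


def topic_description(topic_id: Item, num_docs: Item, name: Item) -> str:
    """Mirror of the assignment statement recorded for TopicDescription."""
    return catx(" ", "Topic ", topic_id, " has ", num_docs,
                " documents and", name, " as Terms:")


if __name__ == "__main__":
    # Hypothetical values; not taken from the TOPICS2 data.
    print(topic_description(1, 12, "termA,termB"))
```

For example, a topic with ID 1, 12 documents, and the hypothetical term list termA,termB yields "Topic 1 has 12 documents and termA,termB as Terms:". Because _name is at most 100 characters, the result stays well within the declared length of 1000.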
Codelists (Formats, Value Labels)
There were 0 formats defined in the SAS session which generated this documentation. Note that not all of these formats were necessarily in use by a variable.
SAS INFORMATS
SAS variables may also have an associated 'informat' which describes how the variable is to be read from a text representation. There were 0 informats defined in the SAS session which generated this documentation. Note that not all of these informats were necessarily in use by a variable.
codebook generated at 11/9/2014 11:48:43 AM (11/9/2014 05:48:43 PM UTC)