LDC2015T13_Penn_Treebank_revised.tar.zst | 6.86MB
Type: Dataset
Bibtex:
@article{,
title= {Penn Treebank Revised: English News Text Treebank LDC2015T13},
journal= {},
author= {Ann Bies and Justin Mott and Colin Warner},
year= {2015},
url= {https://doi.org/10.35111/xpjy-at91},
doi= {10.35111/xpjy-at91},
isbn= {1-58563-724-6},
dcmi= {text},
languages= {english},
language= {english},
ldc= {LDC2015T13},
abstract= {# Penn Treebank Revised: English News Text Treebank - 2015
## Metadata
* Item Name: English News Text Treebank: Penn Treebank Revised
* Author(s): Ann Bies, Justin Mott, Colin Warner
* LDC Catalog No.: LDC2015T13
* ISBN: 1-58563-724-6
* DOI: https://doi.org/10.35111/xpjy-at91
* Release Date: July 15, 2015
* Member Year(s): 2015
* DCMI Type(s): Text
* Data Source(s): newswire
* Application(s): parsing, tagging, part of speech tagging, natural language processing
* Language(s): English
* Language ID(s): eng
* License(s): LDC User Agreement for Non-Members
* Online Documentation: LDC2015T13 Documents
* Licensing Instructions: Subscription & Standard Members, and Non-Members
* Citation: Bies, Ann, Justin Mott, and Colin Warner. English News Text Treebank: Penn Treebank Revised LDC2015T13. Web Download. Philadelphia: Linguistic Data Consortium, 2015.
## Introduction
English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) annotation of Wall Street Journal (WSJ) stories. The data comprises 1,203,648 word-level tokens in 49,191 sentence-level tokens, covering all 2,312 of the original Penn Treebank WSJ files.
## Data
This release includes revised tokenization, part-of-speech, and syntactic treebank annotation intended to bring the full WSJ treebank section into compliance with the agreed-upon policies and updates implemented for current English treebank annotation specifications at LDC. Other corpora annotated under these specifications include English Web Treebank ([LDC2012T13](https://catalog.ldc.upenn.edu/LDC2012T13)), OntoNotes ([LDC2013T19](https://catalog.ldc.upenn.edu/LDC2013T19)), and English translation treebanks such as English Translation Treebank: An-Nahar Newswire ([LDC2012T02](https://catalog.ldc.upenn.edu/LDC2012T02)). English Treebank Supplemental Guidelines are included in this release.
## Samples
Please view the [treebank](https://catalog.ldc.upenn.edu/desc/addenda/LDC2015T13.tree.txt) and [tokenized](https://catalog.ldc.upenn.edu/desc/addenda/LDC2015T13.txt) samples.
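For readers who want to inspect the treebank sample programmatically, the following is a minimal sketch of reading one tree in the bracketed Penn Treebank style and listing its part-of-speech tagged words. It assumes each tree is a parenthesized expression, optionally wrapped in an unlabeled outer bracket. The sample string and the helper names `parse_tree` and `tagged_words` are hypothetical and are not taken from this release. Function tags and empty elements are treated as ordinary labels and words.

```python
import re

# Tokens are "(", ")", or any run of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse one bracketed tree into nested [label, child, ...] lists."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        # An unlabeled wrapper bracket gets an empty label.
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return [label] + children

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def tagged_words(tree):
    """Yield (word, tag) pairs from preterminal nodes, left to right."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        yield children[0], label
    else:
        for child in children:
            if isinstance(child, list):
                yield from tagged_words(child)


# Made-up example sentence, not drawn from the corpus.
SAMPLE = "( (S (NP (DT The) (NN cat)) (VP (VBD sat)) (. .)) )"
tree = parse_tree(SAMPLE)
pairs = list(tagged_words(tree))
```

Counting the pairs from `tagged_words` over every tree in a file gives a word-level token count for that file. Whether empty elements should be included in such a count depends on the counting convention, which this sketch does not settle.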
## Updates
None at this time.
},
keywords= {nlp, english, natural language, corpus, news, text, newswire, Treebank, LDC, corpora, Penn Treebank, Penn, 2015, LDC2015T13, parsing, tagging, part of speech, WSJ, PTB},
terms= {},
license= {},
superseded= {}
}