ClueWeb09_Anchors (anchor text derived from CMU's ClueWeb09 web crawl)
Djoerd Hiemstra

folder ClueWeb09_Anchors (132 files)
filepart-00131.gz 185.91MB
filepart-00130.gz 184.87MB
filepart-00129.gz 185.22MB
filepart-00128.gz 185.23MB
filepart-00127.gz 185.27MB
filepart-00126.gz 184.97MB
filepart-00125.gz 185.40MB
filepart-00124.gz 184.85MB
filepart-00123.gz 187.35MB
filepart-00122.gz 184.74MB
filepart-00121.gz 185.53MB
filepart-00120.gz 184.24MB
filepart-00119.gz 185.58MB
filepart-00118.gz 185.17MB
filepart-00117.gz 185.33MB
filepart-00116.gz 184.43MB
filepart-00115.gz 185.54MB
filepart-00114.gz 184.62MB
filepart-00113.gz 184.99MB
filepart-00112.gz 184.95MB
filepart-00111.gz 184.78MB
filepart-00110.gz 185.16MB
filepart-00109.gz 184.87MB
filepart-00108.gz 185.00MB
filepart-00107.gz 184.96MB
filepart-00106.gz 185.94MB
filepart-00105.gz 185.14MB
filepart-00104.gz 185.67MB
filepart-00103.gz 185.38MB
filepart-00102.gz 185.18MB
filepart-00101.gz 185.84MB
filepart-00100.gz 184.54MB
filepart-00099.gz 185.70MB
filepart-00098.gz 185.13MB
filepart-00097.gz 185.61MB
filepart-00096.gz 185.00MB
filepart-00095.gz 184.77MB
filepart-00094.gz 187.29MB
filepart-00093.gz 185.42MB
filepart-00092.gz 185.67MB
filepart-00091.gz 185.35MB
filepart-00090.gz 184.63MB
filepart-00089.gz 184.56MB
filepart-00088.gz 184.78MB
filepart-00087.gz 184.69MB
filepart-00086.gz 186.81MB
filepart-00085.gz 184.20MB
filepart-00084.gz 185.11MB
filepart-00083.gz 185.43MB
(remaining 83 files not shown)
Type: Dataset
Tags: web, ClueWeb, HTML, CMU, Twente, anchors, TREC

Bibtex:
@article{,
title= {ClueWeb09_Anchors (anchor text derived from CMU's ClueWeb09 web crawl)},
journal= {Technical Report TR-CTIT-10-15, Centre for Telematics and Information Technology, University of Twente, Enschede. ISSN 1381-3625},
author= {Djoerd Hiemstra},
year= {2010},
url= {http://mirex.sf.net},
license= {http://creativecommons.org/licenses/by/4.0/},
abstract= {Anchor texts extracted from ClueWeb09. See https://djoerdhiemstra.com/2010/anchor-text-for-clueweb09-category-a/},
keywords= {web, ClueWeb, HTML, CMU, Twente, anchors, TREC},
terms= {},
superseded= {}
}

