ClueWeb12_Anchors (132 files)
part-r-00131.gz | 229.08MB
part-r-00130.gz | 228.56MB
part-r-00129.gz | 228.60MB
part-r-00128.gz | 229.41MB
part-r-00127.gz | 229.25MB
part-r-00126.gz | 229.03MB
part-r-00125.gz | 230.89MB
part-r-00124.gz | 230.38MB
part-r-00123.gz | 229.58MB
part-r-00122.gz | 229.63MB
part-r-00121.gz | 230.13MB
part-r-00120.gz | 229.38MB
part-r-00119.gz | 230.19MB
part-r-00118.gz | 230.53MB
part-r-00117.gz | 229.79MB
part-r-00116.gz | 229.49MB
part-r-00115.gz | 230.18MB
part-r-00114.gz | 230.39MB
part-r-00113.gz | 230.62MB
part-r-00112.gz | 230.23MB
part-r-00111.gz | 233.36MB
part-r-00110.gz | 230.27MB
part-r-00109.gz | 228.99MB
part-r-00108.gz | 229.00MB
part-r-00107.gz | 229.11MB
part-r-00106.gz | 229.06MB
part-r-00105.gz | 230.12MB
part-r-00104.gz | 229.46MB
part-r-00103.gz | 229.89MB
part-r-00102.gz | 229.80MB
part-r-00101.gz | 230.18MB
part-r-00100.gz | 230.08MB
part-r-00099.gz | 229.97MB
part-r-00098.gz | 229.81MB
part-r-00097.gz | 230.39MB
part-r-00096.gz | 230.97MB
part-r-00095.gz | 229.91MB
part-r-00094.gz | 229.71MB
part-r-00093.gz | 228.87MB
part-r-00092.gz | 229.47MB
part-r-00091.gz | 229.07MB
part-r-00090.gz | 230.91MB
part-r-00089.gz | 230.78MB
part-r-00088.gz | 230.69MB
part-r-00087.gz | 229.19MB
part-r-00086.gz | 228.93MB
part-r-00085.gz | 229.70MB
part-r-00084.gz | 229.16MB
part-r-00083.gz | 229.78MB
Type: Dataset
Tags:
Bibtex:
@article{,
title= {ClueWeb12_Anchors (anchor text derived from CMU's ClueWeb12 web crawl)},
journal= {},
author= {Djoerd Hiemstra},
year= {2013},
url= {http://www.cs.utwente.nl/~hiemstra/2013/anchor-text-for-clueweb12.html},
license= {http://creativecommons.org/licenses/by/4.0/},
abstract= {Anchor texts extracted from ClueWeb12
https://djoerdhiemstra.com/2013/anchor-text-for-clueweb12/},
keywords= {TREC, ClueWeb, HTML, web data, anchor texts, web search, Text Retrieval Conference, University of Twente},
terms= {},
superseded= {}
}