ClueWeb12_Anchors (132 files)
part-r-00131.gz | 229.08MB |
part-r-00130.gz | 228.56MB |
part-r-00129.gz | 228.60MB |
part-r-00128.gz | 229.41MB |
part-r-00127.gz | 229.25MB |
part-r-00126.gz | 229.03MB |
part-r-00125.gz | 230.89MB |
part-r-00124.gz | 230.38MB |
part-r-00123.gz | 229.58MB |
part-r-00122.gz | 229.63MB |
part-r-00121.gz | 230.13MB |
part-r-00120.gz | 229.38MB |
part-r-00119.gz | 230.19MB |
part-r-00118.gz | 230.53MB |
part-r-00117.gz | 229.79MB |
part-r-00116.gz | 229.49MB |
part-r-00115.gz | 230.18MB |
part-r-00114.gz | 230.39MB |
part-r-00113.gz | 230.62MB |
part-r-00112.gz | 230.23MB |
part-r-00111.gz | 233.36MB |
part-r-00110.gz | 230.27MB |
part-r-00109.gz | 228.99MB |
part-r-00108.gz | 229.00MB |
part-r-00107.gz | 229.11MB |
part-r-00106.gz | 229.06MB |
part-r-00105.gz | 230.12MB |
part-r-00104.gz | 229.46MB |
part-r-00103.gz | 229.89MB |
part-r-00102.gz | 229.80MB |
part-r-00101.gz | 230.18MB |
part-r-00100.gz | 230.08MB |
part-r-00099.gz | 229.97MB |
part-r-00098.gz | 229.81MB |
part-r-00097.gz | 230.39MB |
part-r-00096.gz | 230.97MB |
part-r-00095.gz | 229.91MB |
part-r-00094.gz | 229.71MB |
part-r-00093.gz | 228.87MB |
part-r-00092.gz | 229.47MB |
part-r-00091.gz | 229.07MB |
part-r-00090.gz | 230.91MB |
part-r-00089.gz | 230.78MB |
part-r-00088.gz | 230.69MB |
part-r-00087.gz | 229.19MB |
part-r-00086.gz | 228.93MB |
part-r-00085.gz | 229.70MB |
part-r-00084.gz | 229.16MB |
part-r-00083.gz | 229.78MB |
Type: Dataset
Tags: TREC, ClueWeb, HTML, web data, anchor texts, web search, Text Retrieval Conference, University of Twente
Bibtex:
@article{,
  title = {ClueWeb12_Anchors (anchor text derived from CMU's ClueWeb12 web crawl)},
  journal = {},
  author = {Djoerd Hiemstra},
  year = {2013},
  url = {http://www.cs.utwente.nl/~hiemstra/2013/anchor-text-for-clueweb12.html},
  license = {http://creativecommons.org/licenses/by/4.0/l},
  abstract = {Anchor texts extracted from ClueWeb12 https://djoerdhiemstra.com/2013/anchor-text-for-clueweb12/},
  keywords = {TREC, ClueWeb, HTML, web data, anchor texts, web search, Text Retrieval Conference, University of Twente},
  terms = {},
  superseded = {}
}