ClueWeb12_Anchors (anchor text derived from CMU's ClueWeb12 web crawl)
Djoerd Hiemstra

folder ClueWeb12_Anchors (132 files)
filepart-r-00131.gz 229.08MB
filepart-r-00130.gz 228.56MB
filepart-r-00129.gz 228.60MB
filepart-r-00128.gz 229.41MB
filepart-r-00127.gz 229.25MB
filepart-r-00126.gz 229.03MB
filepart-r-00125.gz 230.89MB
filepart-r-00124.gz 230.38MB
filepart-r-00123.gz 229.58MB
filepart-r-00122.gz 229.63MB
filepart-r-00121.gz 230.13MB
filepart-r-00120.gz 229.38MB
filepart-r-00119.gz 230.19MB
filepart-r-00118.gz 230.53MB
filepart-r-00117.gz 229.79MB
filepart-r-00116.gz 229.49MB
filepart-r-00115.gz 230.18MB
filepart-r-00114.gz 230.39MB
filepart-r-00113.gz 230.62MB
filepart-r-00112.gz 230.23MB
filepart-r-00111.gz 233.36MB
filepart-r-00110.gz 230.27MB
filepart-r-00109.gz 228.99MB
filepart-r-00108.gz 229.00MB
filepart-r-00107.gz 229.11MB
filepart-r-00106.gz 229.06MB
filepart-r-00105.gz 230.12MB
filepart-r-00104.gz 229.46MB
filepart-r-00103.gz 229.89MB
filepart-r-00102.gz 229.80MB
filepart-r-00101.gz 230.18MB
filepart-r-00100.gz 230.08MB
filepart-r-00099.gz 229.97MB
filepart-r-00098.gz 229.81MB
filepart-r-00097.gz 230.39MB
filepart-r-00096.gz 230.97MB
filepart-r-00095.gz 229.91MB
filepart-r-00094.gz 229.71MB
filepart-r-00093.gz 228.87MB
filepart-r-00092.gz 229.47MB
filepart-r-00091.gz 229.07MB
filepart-r-00090.gz 230.91MB
filepart-r-00089.gz 230.78MB
filepart-r-00088.gz 230.69MB
filepart-r-00087.gz 229.19MB
filepart-r-00086.gz 228.93MB
filepart-r-00085.gz 229.70MB
filepart-r-00084.gz 229.16MB
filepart-r-00083.gz 229.78MB
Type: Dataset
Tags: TREC, ClueWeb, HTML, web data, anchor texts, web search, Text Retrieval Conference, University of Twente

title= {ClueWeb12_Anchors (anchor text derived from CMU's ClueWeb12 web crawl)},
journal= {},
author= {Djoerd Hiemstra},
year= {2013},
url= {},
license= {},
abstract= {Anchor texts extracted from ClueWeb12},
keywords= {TREC, ClueWeb, HTML, web data, anchor texts, web search, Text Retrieval Conference, University of Twente},
terms= {},
superseded= {}
