News Score: Score the News, Sort the News, Rewrite the Headlines

Crawling a billion web pages in just over 24 hours

tl;dr: 1.005 billion web pages, 25.5 hours, $462.

For some reason, nobody's written about what it takes to crawl a big chunk of the web in a while: the last point of reference I saw was Michael Nielsen's post from 2012. Obviously lots of things have changed since then. Most bigger, better, faster: CPUs have gotten a lot more cores, spinning disks have been replaced by NVMe solid state drives with near-RAM I/O bandwidth, network pipe widths have exploded, EC2 ha...

Read more at andrewkchan.dev
