DataComp for Language Models (DCLM): An AI Benchmark for Language Model Training Data Curation