Open Dataset Pulse · Mar 19 Daily Digest
New Biological Datasets
- 🔥 MetaOmics-10T: a foundational dataset with 25K diverse microbiome samples and 10M reads, passed...

Created by DNVR
Latest public AI dataset releases, best‑practice guides, and benchmark leaderboard updates
Key trend in massive omics data for biology AI:
SocialOmni introduces a benchmark for audio-visual social interactivity in omni models, advancing the evaluation of multimodal capabilities in social scenarios. Join the paper discussion for the latest insights.
Massive public dataset releases turbocharge genomics AI pipelines:
Rahul Sarkar's 32-minute workshop talk explores datasets and challenges in accurate graph inference from images and documents, a key topic for multimodal document understanding research. Watch on YouTube via the ICMS channel.
Rising trend in domain-specific LLM evals:
Combine foundation-model surrogates for active learning with full pipeline best practices:
FHIBE debuts as the first publicly available, consent-driven, globally diverse dataset for bias evaluation in human-centric CV tasks.
Key...
Streamlines high-dimensional bio analysis by training ML models like GateNet on labeled data for reproducible gating.
Key features:
Emerging open tools drive scalable synthetic dataset trend, skipping real-world collection:
Exciting new dataset for spike-based visual recognition:
Autonomous trucking demands enormous datasets for robust AI:
Fresh benchmark for small AI models just dropped on Hugging Face:
Key unlock for AI researchers: Purdue’s Data to Science Initiative (D2S) joins AWS Open Data Sponsorship Program, making global geospatial UAV...
Key breakthroughs in ag AI datasets:
Rising trend in multimodal evaluation datasets tackling complex scenarios:
Best practices shift: Quality curation over sheer volume for research-grade models.
Hands-on demo for synthetic data to fix gaps:
OpenFold announces a major update and a public release of training data. The OpenFold3 white paper reports competitive performance versus AlphaFold3 across a broad range of modalities and tasks, which makes it a good fit for open protein folding research.