DBbun LLC creates unique, high-quality synthetic datasets for research, analytics, and machine learning. DBbun’s datasets are completely synthetic, generated intelligently using advanced AI. DB stands for database, and bun stands for bundling many pieces of data together in one place. Each dataset is a carefully assembled mix of variables, statistics, and outcomes. For inquiries, please contact [email protected].
- Introducing DBbun: Synthetic Data for a Data-Hungry World
- From Story to Dataset: Turning Speculative Fiction Into Data
- From EMRBots (2015) to Clinically-Informed Synthetic Patient Populations (October 2025)
- How AI Turns a Neighborhood Into a 400-Year Story
- An AI-Created Vision of 2045 Health Sensing
AI development and testing — experimenting with models when real data is unavailable or restricted.
Research and education — teaching machine learning, statistics, and analytics using fully synthetic data.
Prototyping and validation — exploring ideas, workflows, and pipelines before real-world data access is granted.
Government and innovation programs — simulation, stress-testing, and evaluating analytic methods in a controlled environment.
DBbun is an active, SAM-registered U.S. small business (UEI: QY39Y38E6WG8; CAGE: 16VU3) specializing in high-fidelity synthetic data generation and configurable simulation environments. The company supports research, defense applications, and advanced AI development by providing intelligent synthetic datasets when real data is unavailable, restricted, or insufficient.
DBbun was founded in September 2025 by Uri Kartoun, a data scientist, inventor, and PhD in Intelligent Systems with over 15 years of experience in real-world evidence, predictive modeling, and large-scale data solutions at Microsoft, IBM, and Harvard/Mass General Hospital. Uri is the author of 85+ patents and has developed pioneering methods for generating and analyzing complex datasets.

During his fellowship at Harvard/Mass General Hospital, Uri created EMRBots, a non-profit project that generated synthetic EMR-like data long before generative AI became popular.
EMRBots became widely used in teaching and research.
It inspired the development of a new type of neural network.
Its popularity and impact on the scientific community laid the groundwork for DBbun.
All DBbun datasets are generated from public-domain information or fully synthetic, imaginary data, and no real patient or personally identifiable information is used. They are intended solely for research, teaching, prototyping, and analytics, and are not suitable for clinical decision support or direct patient care. Users are fully responsible for any use or application of the data.
DBbun owns patent-pending (utility) and trade-secret technologies that support its unique approach to synthetic dataset generation.