DBbun LLC creates unique, high-quality synthetic datasets for research, analytics, and machine learning. DBbun’s datasets are fully synthetic and are generated with AI from publicly available resources. "DB" stands for database, and "bun" stands for bundling many pieces of data together in one place. Each dataset is a carefully assembled mix of variables, statistics, and outcomes.
DBbun’s mission is to build an extensive, evolving library of synthetic datasets that are:
Synthetic: containing no patient or customer data.
Public-domain-based: generated only from open resources.
Responsive to demand: crafted to meet researcher, educator, and industry needs.
Cross-domain: starting with healthcare, but not limited to it.
DBbun was founded in September 2025 by Uri Kartoun, a data scientist and inventor who holds a PhD in Intelligent Systems and has over 15 years of experience in real-world evidence, predictive modeling, and large-scale data solutions. Uri is the author of 80+ patents and has developed pioneering methods for generating and analyzing complex datasets.
During his fellowship at Harvard/Mass General Hospital, Uri created EMRBots, a non-profit project that generated synthetic EMR-like data long before generative AI existed.
EMRBots became widely used in teaching and research.
It inspired the development of a new type of neural network.
Its popularity and impact on the scientific community laid the groundwork for DBbun.
Synthetic by design: Never based on real patients.
Advanced Generative AI: Transforms public scientific sources into new, high-quality datasets.
Immediate usability: Delivered in CSV/Parquet.
Commercial licensing: Straightforward terms for private, commercial, or enterprise use.
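Because datasets are delivered as CSV or Parquet, they can be read with standard tooling. The following is a minimal sketch using only Python's standard library. The column names (`record_id`, `age`, `outcome`) and the sample rows are hypothetical placeholders and are not taken from any actual DBbun dataset.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for a downloaded CSV file. The column
# names and values are illustrative only, not from a real DBbun dataset.
SAMPLE_CSV = """record_id,age,outcome
1,34,0
2,58,1
3,47,0
"""


def load_rows(handle):
    """Read CSV rows into a list of dicts, converting fields to integers."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "record_id": int(row["record_id"]),
            "age": int(row["age"]),
            "outcome": int(row["outcome"]),
        })
    return rows


def summarize(rows):
    """Return the row count, mean age, and share of positive outcomes."""
    return {
        "n": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "outcome_rate": mean(r["outcome"] for r in rows),
    }


# An in-memory string is used here so the sketch runs without any file.
rows = load_rows(io.StringIO(SAMPLE_CSV))
summary = summarize(rows)
print(summary)
```

For a file on disk, the in-memory string would be replaced by `open(path, newline="")`. Reading Parquet requires a third-party library, for example `pandas.read_parquet` with a Parquet engine such as pyarrow installed.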
Startup Companies in Stealth or Early Growth: Startups need realistic datasets to test prototypes without privacy concerns. Synthetic data is also useful for showing traction to investors or validating product pipelines.
Consulting Firms & Independent Analysts: Consultants can run proof-of-concept analyses for clients without waiting for access to sensitive real-world data. Synthetic data helps them demonstrate methods, models, or dashboards.
Educational Institutions & Instructors: Professors and trainers can use synthetic datasets for hands-on workshops. Students can safely practice machine learning, statistics, and prediction modeling.
Hackathons, Bootcamps, and Training Programs: Organizers can provide ready-to-use, realistic datasets for competitions and training exercises.
DBbun LLC owns patent-pending technologies (utility patents) covering its approach to dataset generation and packaging.
All generated data is based on public-domain sources only.
No real patient data is used.
DBbun datasets are intended for research, teaching, prototyping, and analytics.
They are not suitable for clinical decision support or direct patient care.
Users are solely responsible for how the data is applied.
Contact us using the form below with any inquiry, including requests to create new datasets tailored to specific resources.