Private user

Private user posted an update

GenDataset-Gemini3pro

Creating reliable datasets for machine learning is time-consuming and often limited by data availability, quality, and reusability. Our goal is to automate dataset creation while maintaining realism, structural accuracy, and long-term usability across multiple machine learning workflows.

To achieve this, the application uses Gemini 3 Pro as the core intelligence layer for custom AI-driven dataset generation. Users configure dataset requirements through several customizable options (as demonstrated in the video), including dataset domain, column structure, data size, data types, and contextual constraints. Gemini 3 Pro uses its reasoning and instruction-following capabilities to generate structured datasets that closely align with the user-defined specifications.

To further enhance data quality and realism, the system integrates the Kaggle API, allowing the generator to reference existing datasets across various categories. These reference datasets provide grounding patterns for feature relationships, value distributions, and schema consistency, so Gemini 3 Pro can optimize and enrich the generated data rather than relying solely on synthetic patterns.

Once generated, datasets can be downloaded in the required format. At the same time, the dataset and its metadata are stored in MongoDB, enabling versioning and traceability. Stored datasets are also reused as references for future, similar requests, reducing redundant computation and improving dataset quality over time.

Overall, Gemini 3 Pro is central to enabling scalable, intelligent, and reusable dataset generation tailored for machine learning applications.
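To make the flow described above more concrete, here is a minimal Python sketch of how a user-defined specification could be turned into a generation prompt and a versioned metadata record. It is an illustration under stated assumptions, not the project's actual code: the names `ColumnSpec`, `DatasetSpec`, `build_prompt`, `spec_fingerprint`, and `save_version` are hypothetical, a plain dict stands in for MongoDB, and no Gemini or Kaggle API calls are made.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


# Hypothetical types: the post does not describe the app's real schema.
@dataclass(frozen=True)
class ColumnSpec:
    name: str
    dtype: str            # e.g. "int", "float", "string", "date"
    constraint: str = ""  # free-text contextual constraint


@dataclass(frozen=True)
class DatasetSpec:
    domain: str
    rows: int
    columns: tuple        # tuple of ColumnSpec
    output_format: str = "csv"


def build_prompt(spec, reference_summary=""):
    """Turn a spec (plus an optional reference-dataset summary) into prompt text."""
    lines = [
        f"Generate a {spec.domain} dataset with exactly {spec.rows} rows.",
        f"Return the data as {spec.output_format} with these columns:",
    ]
    for col in spec.columns:
        detail = f"- {col.name} ({col.dtype})"
        if col.constraint:
            detail += f": {col.constraint}"
        lines.append(detail)
    if reference_summary:
        # Assumption: grounding data is passed to the model as a text summary.
        lines.append("Ground value distributions in this summary of existing data:")
        lines.append(reference_summary)
    return "\n".join(lines)


def spec_fingerprint(spec):
    """Stable hash of a spec, usable as a lookup key for similar past requests."""
    canonical = json.dumps(asdict(spec), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_version(spec, data_text, store):
    """Append a versioned record to `store`, a dict standing in for the database."""
    key = spec_fingerprint(spec)
    versions = store.setdefault(key, [])
    record = {
        "fingerprint": key,
        "version": len(versions) + 1,
        "spec": asdict(spec),
        "data": data_text,
    }
    versions.append(record)
    return record
```

In this sketch, two requests with identical specifications produce the same fingerprint, so an earlier result could be found and offered as a reference before a new generation is started. How the real application matches "similar" requests is not stated in the post.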
