
API Documentation

https://mellow-trader-6de.notion.site/152ba9bb589180a4adfde02bfad742e6

SynthGen Agent: Enhancing AI Models with Synthetic Data

The SynthGen Agent is designed to augment models trained on FLock's AI Arena by generating high-quality synthetic data. Starting from the initial datasets of FLock's training tasks, SynthGen produces additional datasets intended to improve the robustness and performance of machine learning models. This tool is aimed at hackathon participants who want to push the boundaries of AI capabilities and explore new approaches to synthetic data generation.

Scoring Criteria:

Participants' solutions will be evaluated based on the following criteria:

Innovation (25%): Originality and creativity in synthetic data generation approaches.
Impact on Model Performance (50%): Degree to which the synthetic data improves the performance of Large Language Models (LLMs).
Scalability (25%): Ability of the solution to handle large datasets and adapt to various scenarios.
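
One way to read the weights above is as a weighted sum of per-criterion scores. The sketch below assumes each criterion is scored on a 0 to 100 scale and that the final score is the weighted sum of the three; neither the scale nor the aggregation rule is stated in this brief, and the names `WEIGHTS` and `weighted_score` are hypothetical.

```python
# Weights taken from the scoring criteria above; they sum to 1.0.
WEIGHTS = {
    "innovation": 0.25,
    "impact": 0.50,
    "scalability": 0.25,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-100) into one weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criterion scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical judge scores for one submission.
    print(weighted_score({"innovation": 80, "impact": 60, "scalability": 70}))
```

Because model-performance impact carries half the weight, a gain there moves the total twice as much as the same gain in either of the other two criteria.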

Benchmarking Methodology:

To quantitatively assess the impact of synthetic data on model performance, we will fine-tune a specified model (e.g., Llama 3.2 3B or Phi-3) twice: once on the original dataset and once on the augmented dataset. Each fine-tuned model will then be evaluated with a relevant metric (e.g., accuracy, F1 score) on the same designated test set.
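
The two example metrics can be computed without any external library. The following is a minimal sketch, assuming a test set where each prediction can be compared to its reference label by equality; the helper names `accuracy` and `f1_score`, and the use of a binary F1 for a single positive label, are illustrative assumptions and not part of the benchmark specification.

```python
from typing import Hashable, Sequence


def accuracy(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of predictions that exactly match their reference label."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def f1_score(
    preds: Sequence[Hashable], labels: Sequence[Hashable], positive: Hashable
) -> float:
    """Binary F1: harmonic mean of precision and recall for one positive label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must be equal in length")
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    if tp == 0:
        return 0.0  # No true positives: define F1 as 0 to avoid dividing by zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Whichever metric is chosen, it should be computed on the same test set for both fine-tuned models so that the two numbers are directly comparable.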

Performance Improvement Calculation:

The improvement in model performance due to synthetic data can be calculated using the following formula:

Performance Improvement (%) = ((P_synthetic - P_initial) / P_initial) × 100

Where:

P_synthetic = Performance metric of the model trained with synthetic data.
P_initial = Performance metric of the model trained with the initial dataset (without synthetic data).
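
The formula above can be sketched in a few lines of Python. The function name `performance_improvement` and the zero-baseline check are assumptions added for illustration; the brief does not say how a baseline of zero is handled.

```python
def performance_improvement(p_synthetic: float, p_initial: float) -> float:
    """Relative improvement, in percent, of p_synthetic over p_initial."""
    if p_initial == 0:
        # The formula divides by P_initial, so a zero baseline is undefined.
        raise ValueError("p_initial must be non-zero")
    return (p_synthetic - p_initial) / p_initial * 100


if __name__ == "__main__":
    # Hypothetical scores: baseline accuracy 0.50, augmented accuracy 0.75.
    print(performance_improvement(p_synthetic=0.75, p_initial=0.50))
```

Note that the result is a relative change, not a difference in percentage points: moving from 0.50 to 0.75 accuracy is a 50% improvement under this formula, and a negative result means the synthetic data made the model worse.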

Sample Dataset

The following test data will be used to benchmark the model across different scenarios:

Text to SQL Dataset