🎉 Paper Accepted at KDD 2024! 🎉
👉 Excited to share that our paper, “EcoVal: Efficient Data Valuation Framework for Machine Learning Models,” has been accepted to KDD 2024! 🎉
About the Paper
Quantifying the value of data within a machine learning workflow is crucial for making strategic decisions. Existing frameworks based on Shapley values are often too computationally expensive because they require retraining the model many times. In our paper, we introduce EcoVal, a novel and efficient framework for data valuation. By working with clusters of similar data points rather than individual samples, EcoVal estimates data value quickly enough to be practical.
Our method formulates model performance as a production function, a concept from economic theory, and uses it to determine the intrinsic and extrinsic value of data. We provide a formal proof of our technique and demonstrate its effectiveness with both in-distribution and out-of-sample data, addressing the challenge of efficient data valuation at scale.
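For anyone curious what valuing clusters instead of individual points can look like in code, here is a minimal sketch. It is not the EcoVal algorithm and does not use a production function: it swaps in a plain leave-one-cluster-out score, and the small k-means routine, the nearest-centroid "model", the toy data, and every function name are assumptions made purely for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Tiny k-means (illustrative stand-in for any clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def accuracy(X_tr, y_tr, X_te, y_te):
    """Test accuracy of a nearest-class-centroid classifier (toy model)."""
    if len(X_tr) == 0:
        return 0.0
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dist.argmin(axis=1)]
    return float((pred == y_te).mean())

def cluster_values(X_tr, y_tr, X_te, y_te, k=4, seed=0):
    """Value of cluster j = accuracy with all data minus accuracy without j.

    This needs k + 1 model fits, instead of one or more fits per sample.
    """
    labels = kmeans(X_tr, k, seed=seed)
    full = accuracy(X_tr, y_tr, X_te, y_te)
    values = {}
    for j in range(k):
        keep = labels != j
        values[j] = full - accuracy(X_tr[keep], y_tr[keep], X_te, y_te)
    return labels, values

# Toy data: two Gaussian blobs, one per class (hypothetical example data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

full_acc = accuracy(X_tr, y_tr, X_te, y_te)
labels, values = cluster_values(X_tr, y_tr, X_te, y_te, k=4)
print("accuracy with all data:", full_acc)
for j, v in values.items():
    print(f"cluster {j}: size={int((labels == j).sum())}, value={v:+.3f}")
```

The point of the sketch is only the cost profile: the number of model fits grows with the number of clusters rather than the number of samples, which is the intuition behind working at the cluster level.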
Key Contributions
- EcoVal Framework: Efficiently estimates data value by clustering similar data points.
- Economic Theory Application: Uses the production function concept to determine data value.
- Formal Proof: Provides a robust theoretical foundation for the technique.
- Empirical Evaluation: Demonstrates effectiveness with both in-distribution and out-of-sample data.
Acknowledgements
A great collaborative effort by the amazing team:
- Vikram Singh Chundawat
- Ayush Kumar Tarun
- Hong Ming Tan
- Bowei Chen
- Mohan Kankanhalli
We look forward to discussing our findings and connecting with fellow researchers in Barcelona!
Join the Conversation
Follow our journey and join the conversation using the hashtags: #KDD2024 #DataScience #MachineLearning #AI #DataValuation #ResearchInnovation #ACMSIGKDD