Transparency Initiative in Artificial Intelligence Training Data from Microsoft

Microsoft has announced a research project called “Training-Time Provenance” aimed at increasing transparency and fairness in the training process of artificial intelligence models.
WHAT IS TRAINING-TIME PROVENANCE?
The research project by Microsoft aims to analyze the impact of individual data sources (such as a photo collection or text contents) used in training artificial intelligence models on the outputs. This approach relies on a technology that can be called “data contribution analytics.” Artificial intelligence models are usually trained on large amounts of diverse data sets, and the impact of each of these data on the model generally remains unclear. The system developed by Microsoft can determine how much a specific piece of data contributed to the output of artificial intelligence through mathematical and analytical methods.
RIGHTS OF DATA CONTRIBUTORS
Jaron Lanier, one of the leading figures in the Microsoft Research team, emphasizes that this project highlights a fundamental concept of “data dignity.” According to Lanier, those who contribute data that plays a significant role in the success of artificial intelligence models should be recognized and, if necessary, rewarded. For example, when an artificial intelligence model creates an image inspired by a piece of art, the original creators of the artwork can be recognized for their contributions or be subjected to a reward system based on copyright. TECHNOLOGICAL DETAILS The system developed by Microsoft is based on contribution tracking algorithms that measure the impact of each data piece used in the training process of artificial intelligence models on the outputs. These algorithms analyze the role of data sources in the outputs of artificial intelligence and determine which data is more effective on the model. Furthermore, the system allows users to see the effects of data sources through transparency reporting tools, and this information can be used as a reference for copyright payments or recognition processes. The system rates the impact of data sources on the model, enabling a fair reward or recognition mechanism to be established.
LEGAL AND ETHICAL DIMENSIONS
This innovative approach is gaining importance, especially in light of copyright lawsuits and ethical debates against artificial intelligence companies in recent times. Microsoft states that this system aims to raise ethical standards in artificial intelligence technologies and resolve legal issues. Microsoft’s move has the potential to create a fairer and more transparent system in the artificial intelligence ecosystem. The system for recognizing and rewarding contributors of training data can be beneficial not only for individual creators but also for communities and organizations that generate data on a large scale. This project is seen as a significant step towards developing artificial intelligence in a more responsible and ethical manner. Microsoft’s approach could serve as an example for other companies in the technology sector.