Data sharing networks usually have a strong industry focus, with agriculture, automated driving, manufacturing and telecom leading the way. They come to life and are operated in one of four ways:
The value of a data sharing network corresponds to the quality of its data. So how is the validity of the shared data ensured? And how are participants sanctioned for submitting bad data? These questions are especially pressing when data is provided “manually” by a participant rather than sent automatically through a verified device, and when companies have an incentive to provide false data, for example to paint themselves in a positive light or to disrupt the competition.
In practice, however, these questions around data quality often remain unanswered. Instead, many networks invest only in their participant-onboarding processes and rely on the assumption that bad apples are blocked from joining the network in the first place. The reasoning goes that if a company has been properly vetted, it can be blindly trusted to submit high-quality data. Onboarding processes can take many forms, ranging from manual, even face-to-face, procedures to state-of-the-art unified digital identity schemes (e.g. eIDAS). But even the most fine-grained identity check cannot guarantee that the network will not be poisoned with inaccurate or even plainly false data. Monitoring and sanctioning mechanisms are essential to keep data quality transparent and high. The remainder of this article presents good practices for monitoring and sanctioning processes.
Data monitoring mechanisms for B2B data sharing networks include complaints mechanisms, audits, random sampling and even monitoring of all data.
What actually constitutes false data is a separate topic: networks need to decide which checks must be passed for data to be declared valid. If benchmarking data is available, it can be used to look for outliers. Other good practices can be borrowed from e-commerce, where businesses look for patterns that might indicate fraudulent payment behaviour. For example, a shopper using multiple IP addresses, shopping from various geolocations or addresses, recurring too often within a short time frame, or having a low solvency score could all be indicators of fraud.
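The two families of checks above, benchmark-based outlier detection and e-commerce-style fraud heuristics, can be sketched in a few lines of code. This is a minimal illustration, not a prescription: the field names, thresholds and tolerance are hypothetical parameters a network would have to tune for its own domain.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    """One data submission plus the metadata used by the checks."""
    value: float
    ip_addresses: set = field(default_factory=set)
    submissions_last_hour: int = 0
    solvency_score: float = 1.0

def is_outlier(value: float, benchmark: float, tolerance: float = 0.5) -> bool:
    """Benchmark check: flag values deviating more than `tolerance`
    (here 50%) from the available benchmark figure."""
    return abs(value - benchmark) > tolerance * benchmark

def fraud_indicators(sub: Submission) -> list:
    """E-commerce-style heuristics: return the names of any fraud
    indicators the submission trips (thresholds are illustrative)."""
    indicators = []
    if len(sub.ip_addresses) > 3:
        indicators.append("multiple_ip_addresses")
    if sub.submissions_last_hour > 10:
        indicators.append("high_frequency")
    if sub.solvency_score < 0.2:
        indicators.append("low_solvency")
    return indicators
```

A network would typically combine both: a submission that passes the benchmark check but trips several fraud indicators may still warrant a manual review.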
Members of a data sharing network can flag potential violations to a centralised body, designated to handle complaints or disputes. The ISEAL Alliance, a global membership organisation for “ambitious, collaborative and transparent sustainability systems”, even mandates that all of its members have such a centralised complaints mechanism in place.
It is recommended that complaints can be submitted anonymously and free of charge: if the submission of complaints is not as frictionless as possible, the network might end up with a suspiciously clean record of zero complaints. This is the case, for example, for the Aluminium Stewardship Initiative. Depending on the nature of the complaint, complaint-handling bodies can initiate a guided dialogue between the parties, followed by a formal investigation if needed. In some cases, however, such as whistleblower scenarios, a dialogue might be counterproductive.
Even though complaints mechanisms are usually thought of as a manual process, they can to some extent be re-imagined in a technical, automated manner: network members could configure the bounds of expected data submissions for records they are familiar with. For example, in a car manufacturing data network, member A would know the boundaries of reasonable cost for manufacturing a certain type of engine. If member B submitted a cost far lower than expected, member A could be notified and automatically or manually flag the suspicious data to a complaints body.
Networks can go a step further by instating auditing bodies, which check members’ behaviour on the network and their internal processes at fixed or random intervals. Audits can be scheduled (as in the Aluminium Stewardship Initiative) or unannounced.
Similar to unannounced audits, some data submissions could randomly be checked for veracity and accuracy, where the sampling rate is a parameter to be set by the network (for example, one in every ten submissions). This can be handled by a centralised body (Aluminium Stewardship Initiative) or by the network participants themselves (GAIA-X).
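Random sampling is simple to implement; the only real decision is the sampling rate. A minimal sketch, with the "one in ten" rate from the example above as the default:

```python
import random

def should_audit(sample_rate: float = 0.1, rng=random) -> bool:
    """Decide whether a single submission is pulled for a veracity
    check. sample_rate=0.1 means roughly one in ten submissions on
    average; the network sets this parameter. Passing a seeded
    random.Random instance as `rng` makes the decision reproducible,
    e.g. for simulating the expected audit workload."""
    return rng.random() < sample_rate
```

Because the draw is independent per submission, members cannot predict which submission will be checked, which preserves the deterrent effect of unannounced audits.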
The most stringent, but also most resource-consuming option is to validate each individual datapoint submitted to the network.
There are not many networks that choose to centrally validate every data submission. The Carbon Energo.gov platform, a combined effort by the Republic of Kazakhstan and the World Bank, is an example. This online platform was launched in 2018 to handle the monitoring, reporting and verification of emission sources and greenhouse gases. At launch, only seven verification companies were accredited in Kazakhstan, with five more in the accreditation process. Since then, the project seems to have disappeared off the map.
Validating each data point is more feasible in a decentralised, technical setting. In this case, networks might need to accept a delay in transaction speed, since the validation of each transaction costs additional computation power. For example, ReBloc is a blockchain-based data-sharing platform focused on the real estate market. ReBloc has introduced an interesting decentralised validation concept (though it is not clear whether it is currently used in practice). It distinguishes between the roles of data-sharing party (Enricher) and Validator. When an Enricher submits data to the network, the data is automatically pulled into a smart contract. At the same time, a quorum of Validators with access to similar data is selected. The smart contract automatically pulls the benchmarking data from the Validator databases and compares the data points to see whether the submitted data falls within the expected pattern.
ReBloc works with monetary incentives: both the Enricher and the Validators send a stake along with their data, which they risk losing if their data or validation decision is deemed inaccurate. Enrichers are compensated for the data they sell; Validators stand to earn from their validation decisions.
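The settlement logic of such a stake-based quorum can be sketched as follows. To be clear, this is not ReBloc's actual contract code, which is not public in this detail; it is an illustrative model in which the median of the Validators' benchmark values stands in for the "expected pattern", and any party whose data or vote falls outside a tolerance band loses its stake.

```python
from statistics import median

def settle_validation(submitted_value: float,
                      validator_votes: dict,
                      enricher_stake: float,
                      validator_stake: float,
                      tolerance: float = 0.1) -> dict:
    """Settle stakes after a validation round.

    validator_votes maps validator id -> that validator's benchmark
    value for comparable data. The consensus benchmark is the median
    of the votes. Returns a payout per party: positive means the
    stake is returned, negative means it is forfeited.
    """
    benchmark = median(validator_votes.values())
    data_valid = abs(submitted_value - benchmark) <= tolerance * benchmark

    payouts = {"enricher": enricher_stake if data_valid else -enricher_stake}
    for vid, vote in validator_votes.items():
        # A validator whose own benchmark is far from consensus also
        # loses its stake, discouraging careless or dishonest votes.
        vote_accurate = abs(vote - benchmark) <= tolerance * benchmark
        payouts[vid] = validator_stake if vote_accurate else -validator_stake
    return payouts
```

Using the median rather than the mean makes the consensus value robust against a single Validator submitting an extreme benchmark to sway the outcome.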
In general, if data monitoring is decentralised, networks need to decide upfront whether it would be beneficial or harmful for members who are somehow involved with each other, for example because they operate within the same value chain or are competitors, to monitor each other.
B2B data sharing networks largely use one or more of the same sanctions to penalize members who send bad data. These are outlined below from least to most severe.
Reputation is known to be a powerful motivator. Reputation scoring has mostly been studied and applied in a B2C setting, but it is equally applicable in the B2B world, where it can function as a decentralised sanctioning method. According to a BrightLocal study, 75% of consumers say they trust a company more if it has positive reviews, whilst 60% state that negative reviews made them not want to use a business. Marketplaces such as Airbnb, eBay, Amazon, Uber and Glassdoor all rely heavily on reputation scoring mechanisms and have invested heavily in research into getting the maximum value out of their reputation rules.
In B2B data sharing, reputation scoring could take many forms. Networks that follow the classical approach of letting members rate and review each other can draw inspiration from the tech unicorns, but need to set the parameters right: should reviews be open or blind, editable and/or deletable, anonymous or identifiable, should the reviewee be allowed a response, and so on. In blockchain-based networks, each failed transaction would be logged on chain and made publicly visible to all members, which amounts to a reputation mechanism of its own.
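The on-chain variant, where the score is derived purely from logged transaction outcomes rather than reviews, can be modelled very simply. One sketch, using a hypothetical success-rate score with a neutral prior so that brand-new members neither start at zero nor at a perfect score:

```python
from collections import defaultdict

class ReputationLedger:
    """Minimal reputation tracker: every transaction outcome is logged
    (as it would be on chain) and a member's score is its success
    rate, dampened by a prior so new members start neutral at 0.5."""

    def __init__(self, prior_successes: int = 1, prior_failures: int = 1):
        self.log = defaultdict(list)  # member id -> list of outcomes
        self.prior = (prior_successes, prior_failures)

    def record(self, member: str, success: bool) -> None:
        """Append one transaction outcome to the member's public log."""
        self.log[member].append(success)

    def score(self, member: str) -> float:
        """Smoothed success rate in [0, 1]."""
        ps, pf = self.prior
        outcomes = self.log[member]
        return (sum(outcomes) + ps) / (len(outcomes) + ps + pf)
```

The prior also answers one of the parameter questions above implicitly: a single failed transaction cannot destroy a long-standing member's score, but it dominates the score of a newcomer with no track record.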
Members found guilty of submitting low-quality data (and who, for example, obtained a bad reputation score as a consequence) could be temporarily banned from the network as a second-degree sanction. Networks using certifications could withdraw the certificate from a member whilst allowing them to stay on the network (examples include the Aluminium Stewardship Initiative and the ISEAL Alliance).
Loss of membership is seen as the ultimate sanction by most networks. However, this approach is not feasible if a loss of any member significantly weakens the network as a whole.
Sanctioning can even be taken outside the boundaries of the network by relying on existing legal structures. Some networks provide Master Agreements to govern the relationship between data sender and recipient, including clauses on data usage limitations, security, confidentiality and privacy, and requiring the data sender to guarantee the accuracy and veracity of their data. If data is found to be purposefully false, the Agreement is violated, which constitutes a basis for legal action. This approach is recommended by the EU Commission in its Guidance on sharing private sector data in the European data economy and applied by the FOT-Net Automated Driving Data Sharing Framework.
Although there are some practical approaches to data validity monitoring and member sanctioning, it remains a topic that is often overlooked by B2B data sharing networks. These networks should address several questions to decide on the best approach:
General
Monitoring
Sanctions
The answers to these questions can help guide the network to the ideal approach to member accountability for the validity of data.