Risk (Data Privacy)
Paid vs “Free”
Most publicly available GenAI models are offered in both "free" and paid (premium) versions. As a rule, you should never use the "free" version of a GenAI model for anything related to your research: whatever you input into a "free" version may be used to train the model. If you asked the free version of Copilot to reword the Conclusions chapter of your thesis, for instance, you would be adding that chapter to the model's training data, essentially giving your work to Microsoft and OpenAI for use without attribution to you. The cost of the "free" version is the information you put into it. It is also important to understand that the content used to train GenAI models may itself infringe the IP rights of third parties or breach privacy laws.
UniSQ provides access to the premium version of Copilot. The key benefit of the premium version is that the information you input is not used to train the model. You can therefore ask it to reword parts of your thesis without that text being absorbed into the model.
Ethics Concerns
The responsible collection, handling, and storage of research data is of paramount importance in any research project, and there are many reasons why your research data may need to be protected from irresponsible use. Uploading your research data to any cloud-based GenAI model (paid or "free") risks:
• The transfer of the data to another country;
• The use of the data to train the model you have used, or a subsequent model;
• The exposure of data that a sufficiently powerful AI model could use to identify research participants, either now or in the future;
• The inclusion of your data in responses to prompts made by other users of the GenAI model;
• Changes in ownership and Terms & Conditions over time. A growing number of organisations develop and operate AI models; as ownership of those models changes, the Terms & Conditions attached to them will change too. You cannot assume that a promise of data security made to you this week will be kept in the future;
• The irreversible incorporation of the data into the model. Data that has become part of a GenAI model cannot be removed from it. If your Ethics Clearance requires you to give participants the option to withdraw their data from the study, you will not be able to meet that obligation once the data has been integrated into the model's training dataset.
Mitigation (Data Privacy)
GenAI can be an immensely powerful tool for data analysis, and it is conceivable that some projects cannot, or will not, be realised without AI-powered analyses. The only way you can be assured that the privacy of the analysed data will not be compromised is to conduct that analysis using a model installed and run in an environment controlled by you or the University (i.e. a model run on the University's HPC, noting that the current generation of University-supplied computers is not capable of running a local LLM effectively).
Risk (Bias)
The major GenAI models were trained on datasets dominated by Western languages, cultures, and perspectives, which biases their responses towards a Western-centric view. Some models also appear to have been deliberately configured to deliver outputs that align with the views of their company executives or the jurisdictions in which their owner organisations operate, or to obscure historical facts.
Mitigation (Bias)
Do not accept an AI output as objective truth. Think critically about what the model has delivered, and remember that, even though it has access to a vast amount of information, may interact with you in natural language, and may seem to have agency, it is only as good as the data it was trained on and the parameters within which it has been set up to operate.