Risk (Data Privacy)
Paid vs “Free”
Most publicly available GenAI models are offered in both "free" and paid (premium) versions. As a rule, you should never use the "free" version of a GenAI model for anything related to your research: whatever you input into a "free" version may be used to train the model. If you asked the free version of Copilot to reword the Conclusions chapter of your thesis, for instance, you would be adding that chapter to the model's training data, essentially giving your work to Microsoft and OpenAI for use without attribution to you. The cost of the "free" version is the information you put into it. It is also important to understand that the content used to train GenAI models may itself infringe the IP rights of third parties or breach privacy laws.
UniSQ provides access to the premium version of Copilot. The key benefit of the premium version is that the information you input is not used to train the model. You can therefore ask it to reword parts of your thesis without that text being absorbed into the model.
Ethics Concerns
The responsible collection, handling, and storage of research data is of paramount importance in any research project, and there are many reasons why your research data may need to be protected from irresponsible use. Uploading your research data to any cloud-based GenAI model (paid or "free") risks:
• The transfer of the data to another country;
• The use of the data to train the model you have used, or a subsequent model;
• The exposure of data that a sufficiently powerful AI model could use to identify research participants, either now or in the future;
• The inclusion of your data in responses to prompts made by other users of the GenAI model;
• Changes in ownership and Terms & Conditions over time. A growing number of organisations develop and operate AI models; as ownership of those models changes, the Terms & Conditions attached to them will change too. You cannot assume that a promise of data security made to you this week will be kept in the future;
• The irreversible incorporation of the data into the model. Data that has become part of a GenAI model cannot be removed from it. If your Ethics Clearance requires you to give participants the option to withdraw their data from the study, you will not be able to meet that obligation once the data has been integrated into the model's training dataset.
Mitigation (Data Privacy)
GenAI can be an immensely powerful tool for data analysis, and it is conceivable that some projects cannot, or will not, be realised without AI-powered analyses. The only way you can be assured that the privacy of the analysed data will not be compromised is to conduct that analysis using a model installed and run in an environment controlled by you or the University (i.e. a model run on the University's HPC, noting that the current generation of University-supplied computers is not capable of running a local LLM effectively).
Risk (Bias)
The major GenAI models were trained on datasets dominated by Western languages, cultures, and perspectives, which biases their responses towards a Western-centric view. Some models also appear to have been deliberately configured to deliver outputs that align with the views of their company executives or the jurisdictions in which their owner organisations operate, or to obscure historical facts.
Mitigation (Bias)
Do not accept an AI output as objective truth. Think critically about what the model has delivered, and remember that, even though it has access to a vast amount of information, may interact with you in natural language, and may seem to have agency, it is only as good as the data it was trained on and the parameters within which it has been set up to operate.