
Generative AI tools

Background information for the use of generative AI tools in education and teaching


Generative AI tools in education and teaching


Contact



Dr. Benjamin Angerer

Digitales Lehren und Lernen in der Hochschuldidaktik

Telephone: +49 521 106-87940
Room: UHG A2-118

Please note: This page is still under construction!

This page provides educational and pedagogical material regarding the use of generative AI in education and teaching. Without diving too deeply into the technical details, we briefly summarise what you should know about this technology, its effects, and its shortcomings. Additionally, we provide you with recommendations for

  1. covering the topic of generative AI in your lectures or seminars (e.g. as part of a dialogue about good scientific practice and the admissibility of tools) and
  2. helping you answer the question of whether you might want to use generative AI tools in your teaching or your studies - and if so, how and what for.

First of all, the most important recommendation: Because of their underlying manner of operation, the outputs of generative AI systems and of tools based on such systems always need to be checked for correctness and appropriateness by their users. If you're not able to perform such a check (e.g. because you lack the time or the subject-matter expertise to do so), we recommend against using generative AI technologies for that use case.

In other cases, it may be advisable to gain personal experience through experimentation in order to assess whether the use of generative AI systems is warranted for that particular use case.

To facilitate such experiments and many others, Bielefeld University has been providing its students and academic staff with BIKI, the "Bielefeld AI Interface" (AI is "KI" in German, hence the acronym), since 1 October 2024. BIKI allows the use of OpenAI's GPT language models as well as a number of open-source language models run by the GWDG, directly via the university's website. This has the advantage that there are no individual costs for you and you don't have to register with external service providers.

Background

The technology behind current text generation systems is so-called Large Language Models (LLMs). These are statistical models of very large text corpora that are able to produce plausible continuations of a given text input. LLMs gain this ability in several steps: First, they are "trained" on billions of pieces of human-written text (scanned books, excerpts of the internet, etc.). In a further step, these models are adapted in an extensive manual process of evaluating the (un)desirability of their outputs. As a more comfortable and familiar form of interaction, the user interface for such an LLM often takes the form of a dialogue-like chat, rather than simply 'continuing' the text.
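
To make "plausible continuation" more concrete, here is a minimal sketch in Python. It is emphatically not how production LLMs are implemented (they use neural networks trained on vast corpora, not lookup tables), but it illustrates the same basic principle: learn from a corpus which token tends to follow which, then extend a prompt with statistically likely next tokens.

    # Toy "language model": count which word follows which in a tiny corpus,
    # then continue a prompt by sampling statistically plausible next words.
    import random
    from collections import defaultdict

    corpus = ("the model predicts the next word and "
              "the next word follows from the corpus").split()

    follows = defaultdict(list)
    for current, nxt in zip(corpus, corpus[1:]):
        follows[current].append(nxt)

    def continue_text(prompt_word, length=5):
        """Extend a one-word 'prompt' with sampled continuations."""
        out = [prompt_word]
        for _ in range(length):
            candidates = follows.get(out[-1])
            if not candidates:
                break
            # choosing from the raw list samples proportional to frequency
            out.append(random.choice(candidates))
        return " ".join(out)

    print(continue_text("the"))  # e.g. "the next word follows from the"

The output is fluent-sounding but carries no notion of truth: the model only reproduces statistical patterns of its training text, which is exactly the property discussed below.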

The main property of LLMs, which underlies both their impressive abilities and their limitations and problems, is that their outputs are statistically similar to the data they have been trained on. Assertions made by LLMs therefore often sound very plausible, but aren't necessarily correct. By the same token, biases present in the training data can be reproduced and thereby reinforced. This can include, for instance, the overrepresentation of certain languages and written cultures, or, within academia, of certain fields, methods, authors, etc. Conversely, gaps and omissions in the training data can cause an LLM to produce unreliable outputs.

Because language models have no capacity for metacognition about what they 'know' and say, they are unable to recognise when they don't know an answer: they will always answer as if they knew, and it's up to the user to judge whether that answer is reasonable. By "fine-tuning" the models through human judgement as mentioned above, the manufacturers of LLMs also try to prevent the reproduction of toxic (e.g. racist, sexist or otherwise dehumanising) content such as might be found in training data taken from the internet. This makes such outputs less likely, but is by no means always successful.

Awareness of the data with which LLMs are trained therefore plays a major role in assessing their capabilities and risks. Although nearly all manufacturers of LLMs keep the exact composition of their training data secret, as a rule of thumb it's helpful for users to consider how well or poorly, and how diversely or one-sidedly, the subject area about which they are questioning an LLM tends to be represented on the internet as a whole.

When using generative AI tools (including BIKI), certain legal requirements must be observed. In particular, we ask you not to enter or upload any personal or copyright-protected data into these systems. When using them in teaching and studying, you may also need to take aspects of examination law into account, for which we refer you to the relevant information page of the Justiziariat Studium und Lehre (German).

The above-mentioned requirement that the output of generative AI tools must always be checked for correctness and appropriateness is not only pragmatically necessary in order to use these tools sensibly, but also follows directly from the commitment of our university and its members to good scientific practice. Bielefeld University's guidelines for safeguarding good scientific practice can be found here.

The conscious and unconscious attribution of human attributes to systems that can interact with their users in natural language is a phenomenon that has long been recognised in cognitive science and AI research. With the enhanced linguistic capabilities of current generative AI systems, these effects have become even more pronounced. This is already reflected in the vocabulary used to talk about these systems: they "learn" and "know", "the AI" "thinks" or even "hallucinates". Together with the fluent linguistic interaction typical of a generative AI chatbot, this type of language suggests to many users that they are dealing - at least to some extent - with an understanding, thinking agent. This in turn has consequences for the further assessment of these systems: a system that "hallucinates" has temporarily lost touch with reality, but is still credited with the capacity to be "in touch with reality" in principle. This aspect can even extend to legal issues: for example, a frequently cited justification for the use of copyrighted works in the training of LLMs is that they "read" the texts in a way similar to how human authors do, without the latter being accused of copyright infringement for their writing being influenced by what they previously read.

Both the training and the operation of generative AI systems are very energy-demanding. For example, a query to ChatGPT consumes around six to ten times as much energy as a Google search. The current share of AI systems in the total energy consumption of data centres is estimated at 10-20%, with an upward trend (see here). Generative AI tools should therefore not be used without careful consideration. In particular, for tasks that can already be performed well, and possibly more reliably, with other tools (search engines, spell and grammar checking, translation, etc.), we recommend using those tools instead. In this context, we also refer you to Bielefeld University's sustainability mission statement.

Education and Teaching

Used in a responsible and judicious manner, generative AI systems can be a versatile, complementary tool for education and teaching.

Depending on your level of expertise and the specifics of your subject, different applications of generative AI tools may or may not be suitable. What remains central is that you must be able to evaluate the output, make your use transparent, and take responsibility for the content of any outputs you wish to continue working with. This is most likely to be possible with forms of use where you can simply discard outputs that you find inappropriate. These could, for instance, include the following:

  • Support with idea generation
  • Generation of clearly outlined text snippets or program code drafts
  • Paraphrasing or rewriting your own texts
  • Creating drafts for teaching material, e.g.
    • Repetitive exercise material in variations
    • Plausible incorrect answers for multiple choice tests
  • Feedback or corrections for your own texts
  • Additional help with literature research (as a source of inspiration, not as a database)
  • Raw translations

Regardless of whether you use generative AI tools in your teaching yourself, we recommend that lecturers treat generative AI - with due brevity - as a subject of their teaching. Especially since Bielefeld University gives lecturers a great deal of freedom in regulating the specific use of generative AI tools, it is important for a successful course that your students understand its learning objectives and to what extent generative AI tools could help or hinder them in reaching those objectives. Ideally, you should enter into dialogue with your students and give them the opportunity to exchange ideas. Proactively addressing generative AI in your course is the best way to prevent ambiguities and later problems with the use of unauthorised tools.

ZLL is happy to support you in addressing generative AI in your teaching with its current range of workshops and other events. You are also very welcome to contact us at any time for individual support (zll@uni-bielefeld.de).

For an assessment regarding the use of generative AI tools in terms of examination law, please refer to the website of the Justiziariat Studium und Lehre (German).

If you as a lecturer are wondering whether you should adapt your examination format, we recommend that you first test some of your previous exam questions or essay topics using BIKI to get a feel for what LLMs can and cannot do in your specific case. If you find that the range of questions BIKI handles well is too large, there are several options: The most radical "solution" is to switch to examination formats in which generative AI tools cannot really be used, such as oral exams or on-site closed-book exams. However, especially in writing-intensive fields and in courses in which, for example, the independent writing of term papers is one of the essential skills to be taught, this is hardly an option. Instead, consider how the existing examination format can be adapted. If your examination regulations and module description allow it and you are not already working in this way, it may be helpful, for example, to move elements of term paper writing (topic identification, outlining, peer discussions, etc.) into your course sessions during the semester. This gives you insight into the different stages of your students' work and puts you in a better position to assess whether the final submission is consistent with them.

To make it easier for you to double-check citations, we also recommend (if this is possible for the sources commonly cited in your field) that you insist that students list a DOI for each citation. This makes it much easier for you to check (via copy & paste or by clicking) whether the cited work exists and whether its content corresponds to what is claimed.
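
If the DOIs in your field resolve via doi.org, even the existence check can be automated. Below is a minimal sketch, assuming Python with the requests library: the doi.org resolver answers a registered DOI with an HTTP redirect (status 30x) to the publisher's page, and an unknown DOI with 404. Note that this only verifies that the DOI exists, not that the cited content matches the claim.

    # Minimal sketch: check whether a DOI is registered by querying the
    # doi.org resolver. Registered DOIs redirect (HTTP 30x); unknown DOIs
    # return 404. Existence is all this verifies.
    import requests

    def doi_exists(doi: str) -> bool:
        resp = requests.head(f"https://doi.org/{doi}",
                             allow_redirects=False, timeout=10)
        return 300 <= resp.status_code < 400

    print(doi_exists("10.1000/182"))        # the DOI Handbook: registered
    print(doi_exists("10.1234/not-a-doi"))  # fabricated: expect False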

Common misconceptions

On the surface, you might think that search engines and generative AI chatbots perform the same function: they answer their users' queries. In the abstract, that's true. Concretely, however, it's important to understand the differences: search engines index existing content and web resources, whereas generative AI chatbots generate new text in response to each query, without an index entry that would allow a source to be reviewed. Although AI-generated text is statistically similar to what the system was trained on, this doesn't guarantee that every single response from a generative AI system is accurate. Furthermore, the index that a search engine searches is constantly updated, whereas language models can only generate their responses based on the texts they were trained on at some point in the past. For many information retrieval tasks, "classic" search engines therefore remain the recommended method for obtaining trustworthy and up-to-date knowledge.

The language models on which generative AI systems are based are trained on a specific data set, which the systems then use to generate their responses. Although this data set contains parts of the internet as it was at the time of training, it cannot be updated later. Some generative AI tools offer the additional feature of retrieving websites at the time of prompting and taking their content into account when generating a response (that is, the language model is prompted with the user's prompt plus the content of the website in question). However, not all generative AI tools offer this, and it only works to a limited extent, especially for longer or interactive websites.
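
In principle, such a retrieval feature works by assembling a larger prompt, not by changing the model. The following Python sketch illustrates the pattern; the generate() function is a placeholder for whichever language model API a given tool actually calls, and the HTML handling is deliberately rough.

    # Sketch of "web browsing" in a generative AI tool: fetch the page,
    # reduce it to plain text, and prepend it to the user's question.
    # The language model itself is unchanged; it simply continues this
    # larger prompt.
    import requests
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Very rough HTML-to-text conversion, enough for a sketch."""
        def __init__(self):
            super().__init__()
            self.chunks = []
        def handle_data(self, data):
            self.chunks.append(data)

    def page_text(url: str, max_chars: int = 4000) -> str:
        extractor = TextExtractor()
        extractor.feed(requests.get(url, timeout=10).text)
        # Truncation is needed because models accept only limited context,
        # one reason this feature works poorly for long websites.
        return " ".join(extractor.chunks)[:max_chars]

    def generate(prompt: str) -> str:
        # Placeholder for the tool's actual language model call.
        raise NotImplementedError("plug in a language model API here")

    def answer_with_page(question: str, url: str) -> str:
        prompt = ("Using the following web page content, answer the "
                  f"question.\n\nPage content:\n{page_text(url)}\n\n"
                  f"Question: {question}")
        return generate(prompt)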

Although a generative AI system will often give an answer when prompted for sources or evidence, such an answer is not based on an actual look-up in corresponding databases or indices. Instead, like any other response, the text is generated by the underlying language model. Depending on how frequently the respective source occurs in the training data, this answer may turn out to be correct, but it may also be incorrect; for rarely discussed or specialised publications, the latter is all the more likely. Some specialised AI tools are therefore linked to additional databases and can provide fairly reliable (though still not fully reliable) answers, but many of these solutions are primarily geared towards English-language STEM publications.

Since AI-generated texts statistically mirror the human-written texts with which the AI system was trained, the two can't be reliably distinguished by AI systems that statistically analyse them. Although individual generative AI tools frequently have idiosyncrasies in their outputs that another system could learn to recognise, these can change over time, and the false negative and false positive rates of such "recognition tools" are very high. Particularly problematic is the fact that the language use of non-native speakers and linguistic minorities is often incorrectly labelled as AI-generated.

We therefore generally advise against the use of AI recognition tools.

Recent advances in generative AI technologies are often attributed primarily to the combination of large amounts of data collected from the internet and the increased computing power of modern data centres. However, this overlooks the fact that modern AI also involves a great deal of manual labour: not only were the texts used to train the language models written by humans, but the training data sets must also be collated by humans and sifted for inappropriate content. After this training step, the model is "fine-tuned" manually, with humans "teaching" the model not to give certain undesirable responses. The often precarious conditions under which click workers carry out this work also deserve mention.

Much of the pessimism and optimism about generative AI relates less to what current systems can do than to what future systems are expected to be able to do. These expectations rest on the assumption that the problems of current systems will more or less solve themselves through the "scaling" of current technical approaches - i.e. by adding ever larger amounts of data and training ever larger models. From a technical perspective, there is currently no consensus among researchers as to how far the scaling of current methods will carry. However, given that, for example, the collection and processing of ever larger data sets isn't fully automated, but involves manual steps whose effort scales (at least) with the size of the data set, there are natural limits to such an approach. It has also been shown, for example, that larger data sets produced in the same way contain larger proportions of toxic content.
