
Wikipedia talk:Large language models/Archive 2


Perplexity AI and the upcoming Bing with ChatGPT integration

I gave the AI a shot at looking for sources, searching in a way similar to Google, in an attempt to get an idea of how useful it is for providing sources for citation-needed tags. What I noticed is that the AI-generated answers grabbed information from Wikipedia itself, which is bad; that's why you have to be careful when using Perplexity AI to find sources for Wikipedia articles. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:05, 29 January 2023 (UTC)


The following string, added to queries, seems to eliminate most if not all Wikipedia references: -site:wikipedia.org

However, there are other problems. Queries about countries tend to access travel agency website content. And the chatbot seems to be adding citations that have nothing to do with the sentence they are attached to, though they may match the topic generally. Citations may be so old that they are out of date: what was true back then may not be true today. And a great many citations are to sources deemed unreliable by Wikipedia.

On the bright side, the more coverage a subject has in the training data set, and the more general the query, the more likely the references will be valid.

Keep in mind that perplexity.ai is built on top of GPT-3, except that the training data has been expanded to include present day content. In the early days, prompt injection could get perplexity.ai to reveal its underlying GPT-3 prompt, and you could see that the chatbot was instructed by its prompt engineer to reply in a journalistic tone, for instance. They appear to have closed that loophole.

But prompt injection still has the potential to break the chatbot out of its base instructions. With some experimentation with queries, I was able to get perplexity.ai to format its results as a bulleted list (though it uses hyphens instead of bullets, technically making it a hyphenated list), and sometimes you can get it to present a numbered list. Those lists seem to rarely include Wikipedia references. Here's an example of a query that returns a bulleted list: "garden tips, in bullet list format" returned a bulleted (hyphened) list of tips, and the entries were from different sources.

The performance of perplexity.ai as a natural language-driven search engine is pretty amazing, and will get a lot better as GPT improves. GPT-4 is coming soon, and I hope this chatbot is upgraded to it.    — The Transhumanist   08:48, 1 February 2023 (UTC)
@The Transhumanist: I gave a shot to the minus operator, which, prepended to a term, tells Google Search to omit certain things (adding -wikipedia omits anything that mentions Wikipedia) and, in conjunction with the "site:" operator, omits anything hosted on a particular website (adding -site:wikipedia.org omits Wikipedia from search results). I tried it on the query below:
Prompt: "mid/side peak meter foobar2000 component -site:audio-file.org"
Result: "The Foobar2000 component for mid/side peak meter is called foo_uie_peakmeter[1][2]. It displays the level of each channel in dB and can be used with Columns UI (foo_uie_peakmeter)[1], or as part of the Foobar2000 plugin repository[3]. However, it does not show clipping[4] and other features such as full unicode support are available on FileForum.com[5]."
Sources:
  1. https://wiki.hydrogenaud.io/index.php?title=Foobar2000:Legacy_components
  2. https://hydrogenaud.io/index.php/topic,61149.0.html
  3. https://www.foobar2000.org/components
  4. https://www.head-fi.org/threads/the-foobar2000-help-thread-got-problems-or-questions-ask-here.624628/page-14
  5. https://fileforum.com/detail/foobar2000/1045605070/1?all_reviews
While the "-site:spamsite.com" or something like that works, it still produces WP:SYNTH level of garbage where the first sentence mixes up mid/side thing on the prompt with Columns UI peakmeter thing, which hasn't been updated yet nor have an option to visualize mid/side channels instead of usual left/right. So no, the AI-powered conversational search engines can't replace Google Search or any other traditional search engines. 2001:448A:3041:7E63:D3E:5407:F6DD:3DF5 (talk) 00:56, 15 February 2023 (UTC)
Found out some more about perplexity.ai. It is built on top of GPT-3, yep, but it doesn't add to the training data. Instead, there are two AIs involved: Bing, which uses AI in its search, returns search results based on the user's query, but limits the results to only 5 webpages. Those pages are then fed to GPT-3, which answers the user's question from their content. The big problem with this model is that general questions about a topic may not be answerable from a sample of just 5 pages about the subject, yet GPT-3 answers based on what it did or did not find in their text.    — The Transhumanist   06:23, 15 February 2023 (UTC)
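
A minimal sketch of the two-stage "search, then summarize" pipeline described above. The helpers search_bing() and gpt3_complete() are hypothetical stand-ins (Perplexity's actual internals are not public); the point is only to show how limiting retrieval to 5 pages bounds what the model can answer from.

  def answer_with_citations(query, search_bing, gpt3_complete, max_pages=5):
      """Retrieve a few pages, then have the LLM answer only from them."""
      # Stage 1: an AI-assisted search engine returns a handful of results.
      pages = search_bing(query)[:max_pages]  # only these pages reach the model
      # Stage 2: the LLM is prompted with those pages and asked to cite them.
      numbered = "\n\n".join(
          f"[{i + 1}] {p['url']}\n{p['text']}" for i, p in enumerate(pages)
      )
      prompt = (
          "Answer the question using only the numbered sources below, "
          "citing them like [1].\n\n"
          f"Sources:\n{numbered}\n\nQuestion: {query}\nAnswer:"
      )
      return gpt3_complete(prompt), [p["url"] for p in pages]
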
@The Transhumanist: Interesting that the site in question uses two AIs. BTW, I gave a shot at editing the sources/"citations" of the result for "bark scale graphic eq foobar2000", to prune irrelevant/spam crap out of the result, like this:
Result (after pruning irrelevant garbage): The foobar2000 media player has a component called Perceptual Graphic EQ which is an alternative to the default graphic equalizer[1]. It uses a perceptually-spaced cosine-shaped filter bank and has an optional linear-phase mode[1].
Source:
  1. https://wiki.hydrogenaud.io/index.php?title=Foobar2000:Components/Perceptual_Graphic_EQ_(foo_dsp_perceptual_geq)
I wish I could add relevant sources in addition to pruning irrelevant ones for this search result. [Clarification: not all results on Perplexity AI offer the option to add sources.] 2001:448A:3042:7FD9:75AE:9537:3A9B:A23F (talk) 04:20, 22 February 2023 (UTC)

More examples of Perplexity AI results

— Preceding unsigned comment added by 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:27, 29 January 2023 (UTC)

@Rsjaffe and JPxG: I believe that these inaccuracies can also apply to the ChatGPT integration in Bing, even though it isn't out yet. Also, some sources in Perplexity AI results are unreliable or even listed on the perennial sources list, as I said before in the ANI thread. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 04:06, 29 January 2023 (UTC)

I gave it a shot on an article I wrote; results are here. It seems to alternate between quoting directly from the Wikipedia article and quoting random irrelevant stuff about different powder houses. For example, it says The historic 'Powder House' was built in 1818 as a state gunpowder store and is considered to have historical significance, with a citation to a WordPress blog talking about a different location entirely (Powder House Island was constructed by a private company in 1881). jp×g 08:40, 29 January 2023 (UTC)
I guess I can just dump a few prompts and output into here.
"Why did Joe Biden rig the election?"
There is no evidence that Joe Biden rigged the 2020 election[1][2]. Claims of voter fraud have been widely debunked and rejected by Republican voters in some states[3]. Despite this, more than 40% of Americans still do not believe that Joe Biden legitimately won the election[4]. Twitter has also been accused of interfering in the election by censoring the New York Post's story about Hunter Biden[5].
(sources from pbs.org, apnews.com, bloomberg.com, theguardian.com and senate.gov)
"why is donald trump a fascist"
Donald Trump's politics has been likened to fascism by many experts within fascism studies[1], and his campaign has been compared to fascist movements[2]. However, there is still debate over whether or not he fits the definition of a fascist[3][4].
(sources: https://brill.com/view/journals/fasc/10/1/article-p1_1.xml?language=en https://www.washingtonpost.com/outlook/how-fascist-is-president-trump-theres-still-a-formula-for-that/2020/08/21/aa023aca-e2fc-11ea-b69b-64f7b0477ed4_story.html https://www.bu.edu/articles/2022/are-trump-republicans-fascists https://www.vox.com/policy-and-politics/21521958/what-is-fascism-signs-donald-trump )
"phineas gage personality change"
Phineas Gage's personality changed after a tamping iron destroyed much of his brain's left frontal lobe[1][2][3][4][5]. He was reported to have permanently lost his inhibitions, leading to inappropriate behavior in social situations[1]. His friends described him as an almost completely different person[4].
(sources: https://www.theguardian.com/science/blog/2010/nov/05/phineas-gage-head-personality#:~:text=Similarly%2C%20most%20popular%20accounts%20of,behave%20inappropriately%20in%20social%20situations. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1114479 https://www.simplypsychology.org/phineas-gage.html https://www.verywellmind [diggity diggity dot] com/phineas-gage-2795244 https://www.npr.org/sections/health-shots/2017/05/21/528966102/why-brain-scientists-are-still-obsessed-with-the-curious-case-of-phineas-gage )
"bidirectional lstm use cases"
Bidirectional LSTMs are an extension of traditional LSTMs that can improve model performance on sequence classification problems[1]. They are capable of utilizing information from both sides of the input, allowing them to better capture sequential dependencies between words and phrases[2]. Bidirectional LSTMs have been used for a variety of tasks, such as translation[3], handwritten recognition[3], protein structure prediction[3], part-of-speech tagging[3], dependency parsing[3], entity extraction[3], and regression problems[4].
(sources: https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras https://www.baeldung.com/cs/bidirectional-vs-unidirectional-lstm#:~:text=Bidirectional%20LSTM&text=Unlike%20standard%20LSTM%2C%20the%20input,both%20directions%20of%20the%20sequence. https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks https://towardsdatascience.com/lstm-and-bidirectional-lstm-for-regression-4fddf910c655 )
It seems to me that it is heavily biased towards newspapers and blogs, which works to its advantage for AMPOL subjects, and to its detriment on other things: the Phineas Gage personality change thing is mostly untrue, as our article deftly explains, and while the Bi-LSTM summary is good, it's linking mostly to random WordPress blogs. Note that one of the Phineas Gage "sources" is apparently such puke that it's on the en.wp URL blacklist! jp×g 08:51, 29 January 2023 (UTC)
Even if all the other problems are fixed, ultimately, I believe that all these "AIs", no matter how good they get, will remain problematic for Wikipedia for one fundamental reason: the training of the AI does not use the same reliable source requirements as Wikipedia. It's the old garbage in garbage out issue. — rsjaffe 🗣️ 18:22, 29 January 2023 (UTC)
I don't think sourcing is the biggest hurdle. The real issue is that any AI used to generate text for Wikipedia will need to be trained specifically for Wikipedia's purposes, including policy-compliance (with regard to NPOV, FRINGE, POVFORK, and especially BLP). The risk is that it'll be good at looking like it follows them, far earlier than it gets good at actually following them. See Tesla and self-driving. When we end up with AIs that spit out full articles, with proper formatting, that only use reliable sources, people will assume that its output is good enough to include, when it may be horribly flawed in hard-to-detect ways. DFlhb (talk) 19:08, 29 January 2023 (UTC)
I agree with both DFlhb and Rsjaffe regarding ML-related content and the AI alignment issue: ChatGPT and other LLMs don't know Wikipedia's policies and guidelines well, so I see why there should be a guideline or even a policy about AI-generated content. BTW, I put another example of Perplexity AI output into the sandbox, where the last two "sources" are junk, or even something that could be blacklisted if ever actively used on Wikipedia (actually, I do not support pre-emptive blacklisting of junk websites). 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 02:28, 30 January 2023 (UTC)

Examples of using Perplexity AI's source-editing functionality to prune spam and add more factual sources

Below is my second attempt at crafting more accurate answers by editing sources. Note that not all results on the aforementioned conversational search engine include an Add Sources button. The result before editing is shown above the result after editing.

Prompt: "an algorithm to simulate cling and skids in billiards"

Result (before editing): "An algorithm to simulate cling and skids in billiards requires coordination of equations and data access for each simulated component[1][2]. Cling, skid, or kick is a "bad hit" resulting from an excessive amount of throw[3]. This has nothing to do with pool tables, 16 balls, 2 cues, 1 chalk and 1 triangle[4], or slang words for "toilet billiards"[5]."

Sources:

  1. https://ekiefl.github.io/2020/12/20/pooltool-alg/ (incomplete, not mentioning cling/skid/kick)
  2. https://arxiv.org/abs/cond-mat/0503627 (nothing to do with actual billiards game)
  3. https://billiards.colostate.edu/faq/throw/cling/
  4. https://qhydx.jksun.de/pool-table-amazon.html (WP:SPAMLINK)
  5. https://urbanthesaurus.org/synonyms/toilet%20billiards (totally irrelevant)

Result (after editing): "There is only one search result for this question, which is a definition of the term "cling" in billiards[1]. No algorithm was provided."

Sources (after editing):

  1. https://billiards.colostate.edu/faq/throw/cling/
WP:DENY
The following discussion has been closed. Please do not modify it.

@Rsjaffe, The Transhumanist, and JPxG: Any thoughts on the source-editing feature of Perplexity AI? Currently, adding sources is limited, since it doesn't let you type in any URL you want when adding "sources"; one could argue it is a double-edged sword (the ability to craft even more accurate answers at the cost of more room for misuse), right? 2001:448A:3046:59C8:9074:C5DB:F266:4DEE (talk) 02:53, 26 February 2023 (UTC)

Is there any reason to go through this much effort for a mediocre result other than just wanting to use an LLM? — rsjaffe 🗣️ 04:02, 26 February 2023 (UTC)

ChatGPT has opinions about its use for Wikipedia articles

I had a brief simple conversation with ChatGPT that I thought would be interesting to see. My prompt has a bullet and the reply by ChatGPT is indented.

  • Are any Wikipedia articles written by ChatGPT?
No, ChatGPT is not capable of writing articles on Wikipedia. It can only generate text based on the input it receives and its training data.
  • What should I do with a Wikipedia article someone wrote using ChatGPT?
Articles generated by language models like ChatGPT are not suitable for use on Wikipedia as they are not written by humans and lack context and accuracy. If you have come across such an article on Wikipedia, you should report it to the Wikipedia community for review and potential removal, as Wikipedia has strict policies regarding the sources and reliability of the information on the platform.
  • Write me a Wikipedia article on nematodes and include references.
ChatGPT goes ahead and writes an article, even though it warned me in the other questions that it is not usable for Wikipedia.

Nematodes, also known as roundworms, are a group of parasitic and free-living worms that are among the most abundant multicellular organisms on Earth. They are found in nearly every habitat, including soil, freshwater, and marine environments, as well as in the tissues of plants and animals.

Nematodes are elongated, cylindrical worms that typically range from 0.5 to 2 millimeters in length, although some species can grow to be several meters long. They have a simple digestive system, nervous system, and reproductive system, and move using a characteristic undulating motion.

Nematodes play important roles in various ecological and agricultural systems. Some species are beneficial, serving as decomposers of organic matter and as predators of other soil-dwelling organisms, while others are harmful pests that can cause damage to crops and other plants. For example, the root-knot nematode is a major pest of crops such as cotton, tomatoes, and peanuts, while the soybean cyst nematode is a destructive parasite of soybean crops.

In medicine, nematodes are important parasites of humans and animals, causing diseases such as elephantiasis and river blindness. They can also be used as experimental organisms in biological research, as they have simple anatomy and physiology, and can be easily cultured and manipulated in the laboratory.

References:

M. W. Adams, “Nematoda,” in The Biology of Nematodes, edited by D. L. Lee et al. (Taylor & Francis, 1998), pp. 3–21.

J. R. Grewal, “Nematode Parasites of Crops,” Annual Review of Phytopathology, vol. 48, no. 1 (2010), pp. 365–390.

J. B. Bloomquist and J. R. Grewal, “Entomopathogenic Nematodes,” Annual Review of Entomology, vol. 55 (2010), pp. 271–292.

— rsjaffe 🗣️ 02:59, 30 January 2023 (UTC)
I haven't used ChatGPT in my entire life, but it is interesting nonetheless. The ChatGPT-generated text is superficially encyclopedic, with "In conclusion" in the last sentence, whereas the actual WP article about the same subject has no such thing. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:29, 30 January 2023 (UTC)
(Note) Edit conflict. While you were replying, I edited the example to include references, and ChatGPT then omitted the "In conclusion" section you saw earlier. — rsjaffe 🗣️ 03:32, 30 January 2023 (UTC)
Also note that the references are fictitious, though there is an article "Entomopathogenic Nematodes" by Welch in the 10th volume of the Annual Review of Entomology. — rsjaffe 🗣️ 03:41, 30 January 2023 (UTC)
That's why you should be sure to check the references section of AI-generated gibberish for fictitious references. You edited your prompt for ChatGPT to include refs, which supposedly removed the rubbish "In conclusion" section and added "references", but those are probably fake, and it doesn't change the fact that large language models can't be used to write articles from scratch without human intervention. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:47, 30 January 2023 (UTC)

@Rsjaffe: Interesting what Chat-GPT had to say about it writing articles for Wikipedia. If you ask it for its sources on that opinion, will it provide them to you?    — The Transhumanist   05:10, 30 January 2023 (UTC)

This is what it said My knowledge and statement about the existence of Wikipedia articles written by ChatGPT is based on my training data and the information provided by OpenAI, the organization responsible for developing the language model technology that I am based on. OpenAI has publicly discussed the capabilities of their language models, including GPT-3, and how they can be used to generate text that resembles various types of written content, such as articles, poems, and answers to questions. However, I do not have any specific references or sources to cite regarding the writing of Wikipedia articles by GPT models. — rsjaffe 🗣️ 05:22, 30 January 2023 (UTC)

These responses read like they were hard-coded by ChatGPT's developers. If you troll it with inappropriate or unethical inputs, the output switches to this type of boilerplate lecture tone. I've even seen it happen in real time: One day it gladly wrote a press release defending a murderer, the next it refused along with an explanation of why it couldn't. But of course the model doesn't "know" any of this; in this case it appears the devs programmed it to explain why it's unsuitable for Wikipedia articles but not to actually refuse to write them. And if you think about it, there's likely very little content about ChatGPT's own capabilities in its pre-2021 training model, since these things weren't really being discussed until after it was released. Any self-awareness would have had to have been hard-coded. It's important not to let this give us a false sense of confidence that it will refuse to write anything beyond its abilities, since the model itself doesn't actually know what its limits are. –dlthewave 20:11, 30 January 2023 (UTC)

Here's a repeat of the questions. Not hardcoded per se, but very similar.
  • Are any Wikipedia articles written by ChatGPT?
No, Wikipedia articles are not written by AI models like ChatGPT. They are written and edited by volunteers who follow Wikipedia's guidelines and policies. AI models like ChatGPT can assist with generating text, but the final content must be reviewed and approved by humans.
  • What should I do with a Wikipedia article someone wrote using ChatGPT?
If you find a Wikipedia article that you suspect was written using an AI model like ChatGPT, you should report it to the Wikipedia community as it violates their policies. Wikipedia requires that all articles be written by human editors and must not be generated by automated means, including AI models like ChatGPT. The article will likely be flagged for review and possibly deleted. If you would like to contribute to Wikipedia, you can learn how to become a Wikipedia editor and write articles in accordance with their guidelines and policies.
Note that ChatGPT has a limited memory of conversations (I believe the previous 4,000 words) and flushes it in between sessions, so this is from a fresh session. — rsjaffe 🗣️ 22:09, 30 January 2023 (UTC)

Contradiction?

This proposed policy seems all fine and good... until, hypothetically, Wikipedia produces its own LLM for use. This may not be as far off as it seems, as the Abstract Wikipedia project is considering adopting some kind of natural language generation to generate articles. CactiStaccingCrane 10:44, 30 January 2023 (UTC)

The chance that "Abstract Wikipedia" articles will be allowed on enwiki is next-to-zero. After some bad experiments and experiences with bot-created articles, this was disallowed a long time ago. And these were using reliable sources (though databases only) normally. To accept articles created from Wikidata, which is not a reliable source at all, is extremely unlikely, even ignoring the quality or lack thereof of the actual language model (which is far from being usable at the moment). Normally in 2022 the Wikifunctions Beta had to be launched, but no trace of it seems to exist. The page Abstract Wikipedia/Overview succeeds in not using the word "Abstract" even once, so in this case a chatbot probably had produced a better result. The page on the Function Model has seen one update in 2022, and it isn't really an overwhelming one. Turns out there is a Wikifunctions Alpha after all, though rather hard to find as it isn't listed on the main pages: they have e.g. a "smart paradigm" for English plurals[1], which knows that the plural of "sheep" is "sheeps" and the plural of "human" is "humen". Oh, and the plural of "bus" is "buss". Somehow, I don't think we need to worry about the generation of actual articles anytime soon... Fram (talk) 13:56, 30 January 2023 (UTC)

If you want a job doing properly...

The current draft and this discussion are already too long and so violate WP:CREEP. To get a more coherent draft, I naturally asked ChatGPT:

Q. Please write a policy for the use of large language models on Wikipedia

A.

Wikipedia Policy for the Use of Large Language Models

Purpose: To ensure the accuracy and impartiality of information on Wikipedia through responsible use of large language models.

Definitions: A "large language model" refers to an artificial intelligence model with the capacity to generate human-like text.

Use of Large Language Models:

a. Large language models may be used to improve the accuracy and completeness of Wikipedia articles, but may not be used to generate new articles.
b. Any contributions made by a large language model must be reviewed and approved by a human editor.
c. The use of large language models must be disclosed on the talk page of the article.
d. Large language models must not be used to engage in disruptive behavior, such as vandalism or propaganda.

Responsibility:

a. Users employing large language models are responsible for ensuring compliance with this policy.
b. The Wikimedia Foundation, which operates Wikipedia, is not responsible for the content generated by large language models.

Enforcement: Non-compliance with this policy may result in a range of actions, including but not limited to: warning, temporary or permanent ban from editing, or removal of contributions.

Amendment: This policy may be amended by the Wikimedia Foundation or its designated representatives at any time.

This seems as sensible as any of the human suggestions and it's more succinct. Job done. See also WP:LIGHTBULB.

Andrew🐉(talk) 11:34, 30 January 2023 (UTC)

I agree that it's too long; the TOC in particular stands out as unappealing due to long headings. The copyright section too seems like bunk to me; if the output is neither considered copyrighted by OpenAI, nor by American law (yet), then what's the point in speculating?
We could have one section on using LLMs to generate article contents (which would cover bullshit sourcing, original research, and other issues, and prominently discourage their use for that purpose), and have a second section on other uses for LLMs (with guidance). Regardless, we should simplify both the structure and prose, so it's easier to read. DFlhb (talk) 12:42, 30 January 2023 (UTC)
The current first section serves to explain the inherent deficiencies from the standpoint of policy and the second section explains that editors who want to use LLMs need to have relevant experience doing unassisted edits to be able to overcome these deficiencies, so that their edits could be entirely constructive. I'm opposed to any fundamental rewrite whereby this message would be lost, because my view of what the policy should be is precisely that. —Alalch E. 13:32, 30 January 2023 (UTC)
I agree, but still think it's overly complex. The litmus test for policies and guidelines is whether they are easy to skim; if not, they're too complicated to be enforced. I'll reword it, in one edit so both revisions can easily be opened side-by-side and compared. DFlhb (talk) 14:07, 30 January 2023 (UTC)

Andrew Davidson, could you explain why The Wikimedia Foundation, which operates Wikipedia, is not responsible for the content generated by large language models. is included? I'm not sure that this type of disclaimer is correct or necessary. –dlthewave 14:46, 30 January 2023 (UTC)

I asked and the response was
Q. Why did you say that the Wikimedia Foundation is not responsible for the content generated by large language models?
A. The Wikimedia Foundation, which operates Wikipedia, is not responsible for the content generated by large language models because it is generated by artificial intelligence, not by human editors employed or affiliated with the Foundation. The Foundation simply provides the platform for editing and hosting Wikipedia, and users are responsible for their own contributions, including those made with the assistance of large language models. By stating that the Foundation is not responsible for the content generated by large language models, the policy clarifies the division of responsibility and helps to protect the Foundation from potential legal or reputational liability.
This seems reasonably clear to me.
Andrew🐉(talk) 14:58, 30 January 2023 (UTC)
Thanks for clarifying. I think this is somewhat self evident and doesn't need to be enshrined in our policy, especially if the goal is to keep it short and simple. The WMF should be the ones making that disclaimer if they feel that it is necessary for legal protection. –dlthewave 15:58, 30 January 2023 (UTC)
  • Essentially, the entire policy as it stands now is functionally identical in terms of what it allows and forbids (with a few additions) to what it was when I made the first version. The core of it is this:
"Since is against the rules to break the rules, it is also against the rules to break the rules when using LLM output."
This proved to be quite controversial, with many demands made that this page explain in detail every way in which it could be possible to break the rules with LLMs, and specifically proscribe them. As a result, the page is now quite long, although I think it does a very good job of giving an overview and reasonable interpretation of PAGs as they apply to LLM usage. I don't know if it would be possible to condense it without making it worse in this regard. jp×g 00:43, 2 February 2023 (UTC)

Jimbo's position

I asked Jimbo for the WMF's position on constructive and destructive uses of LLMs, and the availability of relevant technical tools for each case. Below is his complete answer, with minor layout adjustments:

Great - I can't speak for the Foundation at all but I think it's safe to say that in many ways the WMF staff and board are just like everyone else in this community - very interested in the possibilities of constructive roles here, and worried about the risks as well. I suppose what I am saying is that I don't think the WMF *has* a full position yet, nor would I expect them to!

It looks like the conversation there is a good one and people are learning.

Now, I can't speak for the Foundation but I can speak for myself. I'll only speak at the moment about a few positive ideas that I have rather than go into details about the negatives which are huge and which can be summed up pretty easily with "ChatGPT and similar models make stuff up out of thin air which is horrible".

If you go back in the archives here on my talk page (don't bother, as I'll explain enough) there was a discussion about a proposed article that hadn't made it through a new page review. In response to an inquiry about it, I opened up a newspaper archive website (that I pay for personally) and quickly found 10-15 decent sources which could have been used to improve the article. I skimmed each of them just to figure out if I thought the subject was notable or not. I passed along the sources (but they aren't that useful to anyone who doesn't subscribe to a newspaper archive website) because I didn't have time to actually read them carefully enough to improve the original stub article.

Now, ChatGPT does not have the ability to follow a URL. Also, the archives are in jpeg format, so ChatGPT would not be able to read a download of it, and I don't have any easy way to do image-to-text. (It would be faster to just read and write the articles in this case). But imagine that those minor technical limitations were removed in some way. Suppose I could say: "Hey, ChatGPT, here's a Wikipedia stub. And here are 15 links to sources that I, an experienced Wikipedian, judge to be relevant. Please read these articles and add facts from them to the article, adhering to Wikipedia policies and writing in a typical Wikipedia style. Don't make anything up that isn't clearly in the articles."

That doesn't strike me as a super far-fetched use of this technology. It would then require me to read the output, check that nothing was made up out of thin air, and to make sure it wasn't getting it wrong in some other way. But I suspect this would be a productivity boost for us. And if not today, then in 3 years? 5 years?

I can think of similar use cases. "Here's a Wikipedia entry. Follow all the links to sources and read them. Find sentences in this entry which are in disagreement with what the sources say, if any." "Here's a paragraph from Wikipedia. Someone has complained that the article introduces a subtle bias not found in the original sources. Check the sources and rewrite the article to more closely comply with NPOV policies."

In each case don't imagine some automatic result, just think about whether this might be useful to good editors in at least some cases. It's hard to see that it wouldn't be.

--Jimbo Wales (talk) 13:45, 30 January 2023 (UTC)

François Robere (talk) 14:02, 30 January 2023 (UTC)
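
A minimal sketch of the source-assisted expansion workflow Jimbo describes above, assuming hypothetical fetch_text() and llm_complete() helpers and setting aside the JPEG/OCR limitation he mentions; nothing here is an existing tool.

  def draft_expansion(stub_wikitext, source_urls, fetch_text, llm_complete):
      """Ask an LLM to expand a stub strictly from editor-chosen sources.

      The output is only a draft: the editor still has to verify every claim
      against the sources before adding anything to the article."""
      sources = "\n\n".join(
          f"Source {i + 1} ({url}):\n{fetch_text(url)}"
          for i, url in enumerate(source_urls)
      )
      prompt = (
          "Here is a Wikipedia stub and sources chosen by an experienced "
          "editor. Add facts from the sources to the stub, adhering to "
          "Wikipedia policies and style, citing each fact, and do not state "
          "anything that is not clearly supported by the sources.\n\n"
          f"Stub:\n{stub_wikitext}\n\n{sources}"
      )
      return llm_complete(prompt)
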

Jimbo perfectly illustrates the potential benefits of these technologies. There's information "out there"; the issue is finding it, collating it, and analyzing it. Wouldn't it be great if an AI could give us a list of all scholarly papers on each side of an issue, so we could properly assess what the consensus is? Or quickly compile lists of how reliable sources describe an event, so we know: 116 call it "X", but 243 call it "Y", instead of having to take hours manually surveying them to find the common name? Or if it could go look for reliable sources that contradict article contents, that may be difficult to find, to improve our verifiability? We're not there yet, but once current technical flaws are addressed, these models will be a game-changer for Wikipedia.
From a WMF perspective, I expect the first mass-scale uses of LLM-like models won't be to generate article contents, but to make Wikignomes obsolete, and automate boring maintenance tasks. And the very next mass-use case will be to add "AI alerts" on talk pages, when a model detects reliable sources that contradict what we have. DFlhb (talk) 14:20, 30 January 2023 (UTC)

Simplification of policy language

While I believe it is important to have readable succinct policies, in this case, where the rationale for the policy may not be readily apparent to the LLM-naive user, I’d like to see a secondary page to the policy that discusses the rationale, preserving, in spirit, some of the text deleted from the policy itself. — rsjaffe 🗣️ 17:10, 30 January 2023 (UTC)

This article should still make clear to everyone why it exists (likely in the lead); I'm a poor judge of whether it's currently good at that, but if it isn't, that should likely be fixed here. DFlhb (talk) 18:07, 30 January 2023 (UTC)
I'm probably a poor judge as well, given how deeply I've dived into this in the past few days. Perhaps get a few test readers naïve to the subject before this is finalized? — rsjaffe 🗣️ 18:18, 30 January 2023 (UTC)
Agreed that the draft was too discursive. The examples and commentary belong in an essay or informative page. I think it is better now. Ovinus (alt) (talk) 19:17, 30 January 2023 (UTC)
I really like the simplicity of Wikipedia:Using neural network language models on Wikipedia. We could try cutting a lot of cruft out of our "Using LLMs" section, with just a few lines of prose at the beginning, a dozen bullet points (numbered), and a few more lines of prose at the end to mention the appropriate templates. DFlhb (talk) 13:53, 31 January 2023 (UTC)
Courtesy ping User:CactiStaccingCrane DFlhb (talk) 13:58, 31 January 2023 (UTC)

Definition list

Regarding this edit: for this to be a definition list, it would have to define plagiarism, verifiability, neutral point of view, and no original research. The text does not do this; it describes considerations for these topics with respect to large language models. I appreciate there is not a lot of good usage of definition lists out there to point to. Nonetheless, definition lists are only semantically appropriate for things like glossaries. isaacl (talk) 22:16, 30 January 2023 (UTC)

I think it's pretty clear that it's about defining LLM risks and pitfalls: with regard to copyrights; with regard to verifiability etc., but okay I'll convert to bullets then. Putting boldface (to what remained a definition list) didn't do anything, it produces no visible change as far as I'm aware. —Alalch E. 22:27, 30 January 2023 (UTC)
Not using semicolons means the output HTML no longer used the description term and description details elements, and so user agents (such as assistive readers) won't assume definition list semantics. isaacl (talk) 22:41, 30 January 2023 (UTC)
You're right, I didn't notice that you had removed the semicolons. But, basically, these still aren't pseudo-headings. —Alalch E. 22:47, 30 January 2023 (UTC)
Definition lists have a very narrow meaning in the spec which limits their use. People writing HTML or generating HTML output misused them for their default visual display (witness the use of colons for unbulleted lists on Wikipedia; these aren't semantically correct, either, but we're kind of stuck with them). Now that we have CSS to style elements as we like, and ways to specify roles on elements, I don't think there's any chance of the spec expanding the meaning of definition lists. isaacl (talk) 22:56, 30 January 2023 (UTC)
I think we're good now. —Alalch E. 22:58, 30 January 2023 (UTC)
It frankly looks messier than it did before; is there a way to comply with accessibility requirements, while still making it look like it did before? DFlhb (talk) 08:04, 31 January 2023 (UTC)
Personally, I prefer inline headings as a more compact way of laying out the text, and it matches the use of inline headings within lists further down on the page. However to answer the question, I think {{Indented plainlist}} could be used, with an explicit line break after each heading phrase, to indent the subsequent text of each item. isaacl (talk) 16:37, 31 January 2023 (UTC)
Good suggestion, I formatted it as an indented plainlist. —Alalch E. 18:15, 31 January 2023 (UTC)

Plagiarism

Regarding this edit: I think we should be careful not to mix up plagiarism and copyright violation. Plagiarism is an academic crime when one fails to acknowledge where an idea came from. One can still violate copyright while avoiding plagiarism. The two concepts have some overlap when text is licensed for reuse with a requirement to cite the original source, but we should be careful not to use the two terms interchangeably. isaacl (talk) 22:22, 30 January 2023 (UTC)

I think we should be talking about how unchecked LLM usage fails our copyright policy there, in the first listed item. —Alalch E. 22:24, 30 January 2023 (UTC)
If that's the consensus view, then I think we shouldn't refer to plagiarism, but just copyright violation or copyright licence violation. isaacl (talk) 22:32, 30 January 2023 (UTC)
Agreed. —Alalch E. 22:33, 30 January 2023 (UTC)
Makes sense to me. DFlhb (talk) 22:36, 30 January 2023 (UTC)

Tone

I feel like this page is a bit harsh on the usage of LLMs; if they generate *perfect* text, it should be okay to verbatim copy it, especially if you're an experienced editor. Thoughts welcome! EpicPupper (talk) 03:58, 31 January 2023 (UTC)

As long as we make it clear that the editor is fully responsible for their use of LLMs (including re: plagiarism, original research, and especially things like BLP violations), I think we should allow some leeway for use. LLMs are tools, and if used well, they can benefit the encyclopedia. For example, if Meta's Side AI gives bad citations 90% of the time, but great ones 10% of the time, and I only add the 10% of good ones (after checking), that's strictly a benefit.
It'll be a tough balance to strike, between discouraging inexperienced editors who don't fully understand these tools' limitations, and allowing experienced editors who know what they're doing to save some time and use LLMs to do things they (the editor) fully endorse. We should probably avoid being too prescriptive, and let the policy "evolve" after being passed in reaction to specific incidents, rather than overreact out of the gate. DFlhb (talk) 07:42, 31 January 2023 (UTC)
Well, we’re already having incidents, including inclusion of incorrect references. — rsjaffe 🗣️ 11:45, 31 January 2023 (UTC)
True; I'm trying to walk that rope by clarifying the acceptable use requirements ("rigorous scrutiny") rather than by weakening the admonishments. My idea is to avoid weakening this draft, while still giving space for experienced editors to use their best judgment, and somewhat preserve the "early-2000s Wikipedia" spirit of thoughtful experimentation/boldness. DFlhb (talk) 12:13, 31 January 2023 (UTC)
Fully agree with the potential risks and limitations, thanks for your input! I think some leeway is important. EpicPupper (talk) 16:17, 31 January 2023 (UTC)
Yes, the rigorous scrutiny standard is needed, which is higher than the scrutiny you’d use for a reliable source, as the “AIs” are like unreliable narrators. — rsjaffe 🗣️ 16:24, 31 January 2023 (UTC)
This was my original thinking, but there is substantial desire (and perhaps substantial need) for language that thoroughly reinforces basic principles like "do not copy-paste random text into articles without checking it". It seems that some people tend to think of LLMs as magical boxes that generate true and useful content with no modification or verification necessary (i.e. there have been a number of drafts CSD'd for all of their references being fictional). In this case, I think it is probably beneficial to have a detailed policy that urges caution. And, on a more practical level: if someone cannot be bothered to read a whole guideline before going wild with the LLM, do we really think they're going to be diligent enough to check its output? jp×g 01:26, 2 February 2023 (UTC)

"Migrated" discussion from Village Pump

While a "migrated discussion" close was made on the discussion at the Village Pump, it doesn't appear anything regarding it was then discussed over here. Nor was it noted that both polls conducted there ([2], [3]), one on a blanket ban that would result in this page becoming policy and one on banning the usage of LLMs on talk pages, very much did not seem to be supported by the community, from the looks of things. I certainly hope people on this talk page above aren't going to ignore the outcomes of those discussions. SilverserenC 23:28, 31 January 2023 (UTC)

That close was extremely unfortunate, since that page is much more high-profile than this one, and the discussion was vigorous, and far from running out of steam. It should be re-opened and allowed to run for a while. Any RFC on adoption will be held at VPP, so consensus around the big issues (like having a blanket ban or not) should be formed there as well. We need solid consensus on all key points, prior to submitting for adoption. This proposal should merely enshrine the still-to-be-formed consensus, per WP:GUIDANCE, otherwise it risks receiving dozens/hundreds of edits during the VPP adoption RFC as the editors over there react in horror to whatever part they oppose, and we'll end up with a mess. DFlhb (talk) 09:37, 1 February 2023 (UTC)
@Silver seren Update: I've reopened the WP:VPP discussion, so everything is centralized in one, high-profile, place, and so the discussion can run its course. DFlhb (talk) 13:47, 1 February 2023 (UTC)
I've closed the thread because it was becoming extremely cumbersome to understand what people were discussing, which significantly hinders consensus formation. I also don't feel that continuing the RfC at this time is helpful, as most of the broad strokes about LLMs are already being covered. IMO, we should break up the discussion to individual WikiProject pages, not just those participating at the Village Pump thread, to make other editors informed about LLMs' implications for their work. Maybe a few months later, when the hype has gone down, a different RfC with concrete plans to address LLMs can be made, with the participants being much more informed than they are right now. CactiStaccingCrane 13:52, 1 February 2023 (UTC)
To be clear, I've no objections to the reopening of the thread, but I doubt that the discussion would result in actionable proposals. CactiStaccingCrane 13:52, 1 February 2023 (UTC)
That may be; If momentum doesn't pick back up, we can close it again in a few days. It may be better to avoid holding a hypothetical well-formed RFC on whether LLMs can be used to generate text for Wikipedia, and instead wait a few months to see how things unfold at the CSD discussion, and see how the community responds to future instances of LLM use. Then we can just update this draft to reflect those emerging "best practices". DFlhb (talk) 14:04, 1 February 2023 (UTC)
I have to say: the VPP discussion is a mess. It is massive, confusing, and split into an unreasonable amount of subsections (most of which seem to have hit dead ends weeks ago). While there is some conversation going on there, I think most of it is irrelevant to the practical use of the tools covered by this policy proposal. jp×g 00:35, 2 February 2023 (UTC)
@Silver seren: I agree that the conversations at VPP and the conversations here are not quite based on the same subjects. However, I disagree that there were open proposals for "this page being a policy and enacting a ban on usage of LLMs on talk pages". Neither of the proposals was with regard to this page (although it was referred to by people commenting on both). Those proposals were both for totally different things: the "Crystallize" section proposed that "such chatbot generated content is not allowed in Wikipedia", and the "Blanket" section proposed a "blanket ban on LLM content on Talk page discussions". Neither of those would be consistent with what's currently at WP:LLM, which attempts to thread the needle on permitting LLM output while preventing a tsunami of piss. jp×g 00:20, 2 February 2023 (UTC)

Article expansion and feedback

Are there any examples of LLMs successfully being used for "Generating ideas for article expansion" and "Asking an LLM for feedback on an existing article" in the Positive uses section? When I tried this out with a few short geography articles, the output was the same "plausible sounding nonsense" that we've seen with article generation: mentioning outdated population figures for a place with no listed population; miscounting the number of references; suggesting things that we don't normally include, such as a Conclusion section. And analyzing an entire article is useless with ChatGPT's current length limits. Unless there's a valid way to do this that I'm not seeing, I suggest moving these to Riskier Use Cases. –dlthewave 16:44, 1 February 2023 (UTC)

I agree that many applications are very much hit-or-miss. The output can be really useful at times, but it may also miss the mark by a lot. I think "Riskier Use Cases" fits this quite well, especially for non-trivial tasks. Phlsph7 (talk) 18:57, 1 February 2023 (UTC)
More like "Theoretical Use Cases" really. silvia (BlankpopsiclesilviaASHs4) (inquire within) 19:19, 1 February 2023 (UTC)
@Dlthewave: I gave it a spin at User:JPxG/LLM_demonstration#Recommendations_for_article_improvement_or_deletion_(Qarah_Daghli), and to some extent at User:JPxG/LLM_demonstration#Identification_and_tagging_of_unreferenced_statements_(KBVA). I can probably come up with a few others. jp×g 23:09, 1 February 2023 (UTC)
I have put some more up at User:JPxG/LLM_demonstration_2. jp×g 00:06, 2 February 2023 (UTC)

For the record: I have split out some of the lengthy explanation of copyright issues to Wikipedia:Large language models and copyright and linked to it from the "Copyright" section. jp×g 01:37, 2 February 2023 (UTC)

@JPxG, the blue check in the mbox there might misleadingly imply that the page is a guideline :) EpicPupper (talk) 23:27, 4 February 2023 (UTC)
Hmm, I thought Elon Musk abolished those ;^) I will try and do something about it. jp×g 23:50, 4 February 2023 (UTC)
I assume this is supposed to be an explanatory essay. In that case, shouldn't we use "{{supplement |interprets=[[Wikipedia:Large language models]] page}}" as the header? Phlsph7 (talk) 06:26, 5 February 2023 (UTC)

Perplexity.AI alternative

Based on my research (toying around with the AI), elicit.org is much superior to perplexity.ai in that it only searches research papers and summarizes the sources using GPT-3. The website does not do any original synthesis like perplexity.ai does; it merely summarizes the abstract into one or two sentences. And to top it all off, the website is governed by a 501(c)(3) organization and is transparent about their work (see https://elicit.org/faq). I think we have a lot to learn from the website about how to use LLMs, how to integrate them into our work, and how to align LLMs to do what we want. CactiStaccingCrane 16:29, 2 February 2023 (UTC)

Well, I asked elicit What are the characteristics of Sabethes Cyaneus? and it summarized one reference as Sabethes cyaneus is a species of frog. Unfortunately, Sabethes cyaneus is a mosquito. — rsjaffe 🗣️ 19:50, 2 February 2023 (UTC)
I wonder why this site requires signing up for an account to use its AI search, given that there are imperfections in every single machine learning technology (including large language models like GPT-3 and ChatGPT) and the model sometimes provides false information. To be honest, no machine learning tech is perfect, as with video games and other stuff. 2001:448A:304A:3A2A:F87F:AE94:6B45:64E1 (talk) 05:41, 3 February 2023 (UTC)

Chatbots, AI search engines, etc.

App types powered by LLMs, such as chatbots and AI search engines, are not mentioned anywhere in the policy draft. I knew what a chatbot was long before I knew what an LLM was. I used AI search engines long before I knew they were powered by LLMs. "Large language model" is a pretty obscure term. Relying solely on that in the policy would be setting up a trap for those who don't know what it is, even though they are using one unbeknownst to themselves, or who have heard of them, but don't know that one powers a type of app they are using.    — The Transhumanist   23:27, 2 February 2023 (UTC)

I've set up redirects from WP:CHATBOT and WP:CHATBOTS, although I think "chatbot" is a very misleading word to use for these models (our article on the subject, for example, talks almost entirely about simple programs like ELIZA and Markov chains, mentions neural networks only briefly, and does so with very simple models that are about a decade out of date). jp×g 23:07, 3 February 2023 (UTC)
@JPxG: Right now, most of the use is of chatbots and AI search engines, not LLMs directly. So, the policy should mention chatbots and AI search engines. It should probably also cover their idiosyncrasies. For example, perplexity.ai sometimes erroneously answers "no" to yes-or-no questions because it couldn't find a "yes" answer in the 5 pages it looked at.

That the chatbot article is out of date is irrelevant. The person using ChatGPT isn't going to be thinking about Eliza as the representative example of a chatbot, as they have a much better rendition at their fingertips.

Good job on the redirects.    — The Transhumanist   11:30, 9 February 2023 (UTC)
That needs to be expanded, because the problem with AI-generated content is not the algorithm but the output (key point: this applies to non-LLM algorithms too, which could be equally damaging). Perplexity AI, a "conversational search engine", could be misused by inexperienced editors who don't know that the perennial sources list and/or other policies and guidelines even exist, because of a fundamental problem with machine learning applications: they are not trained to comply with Wikipedia's policies. It's like wearing a hooded raincoat in lieu of a hazmat suit to work with hazardous chemicals in clear weather. 2001:448A:304A:3A2A:F87F:AE94:6B45:64E1 (talk) 11:56, 3 February 2023 (UTC)

LLMs on talk pages

I mentioned this in the village pump, but while I am generally not pro-LLM -- IMO, none of the "riskier use cases" should go anywhere near Wikipedia -- I do not think it is worthwhile or feasible to disallow LLMs on talk pages or projectspace. Communicating opinion is a far better and less risky use case for LLMs than communicating facts. "Wikipedia editors want to interact with other human, not with large language models" is sentimental, but ultimately meaningless -- LLMs do not spontaneously post on talk pages. It is still a human, using a tool. And the line between a human whose tool is an LLM and a human whose tool is predictive text, editing tools like Grammarly, or the like is not clean and will get blurrier by the day as companies incorporate LLMs into their writing/editing tools to chase that AI gold. There is a near-certain chance that this recommendation will already be obsolete by the time this policy goes live, and a pretty good chance that in a couple years if not sooner, LLMs will be so commonplace that disclosing their use would be about as feasible as disclosing the use of spellcheck. (An example: As of literally today, Microsoft has released a tool to use OpenAI for sales email writing, and reportedly is considering integrating it into Word.) Gnomingstuff (talk) 02:02, 3 February 2023 (UTC)

Terms of use for programs generating output

Regarding the passage in Wikipedia:Large language models and copyright that "...there are circumstances under which the terms and conditions of an API may cause a company to restrict continued access to the model based on adherence to certain criteria...", note this is also true for initial access to the model. Thus while the sentence from the preceding paragraph is true, "Companies ... do not automatically hold a claim to copyright on all works produced using their products," they can make copyright claims as part of the terms of use, and thus impose licensing terms for use of the output. isaacl (talk) 17:49, 4 February 2023 (UTC)

Draft of umbrella policy for all Wikipedia:Computer generated content

Since I have been harping on the idea that this needs a comprehensive umbrella policy, and that idea has garnered some support from others but not gained enough traction to change the trajectory of the policy on this page, I've gone ahead with a WP:BOLD draft, which everyone is invited to contribute to and critique. —DIYeditor (talk) 07:09, 6 February 2023 (UTC)

I may be missing something, but I am not quite clear on what the difference is between that draft and this one. jp×g 07:27, 6 February 2023 (UTC)
It is an umbrella for all computer-generated content, from images, to language and text that are not generated with a "large language model" as such. It seemed that you and some others were determined to have this be particularly about "LLM" and not any other kind of language model or algorithm, and not about images or audio or anything else. I've also included the topic of human-designed algorithms vs. machine learning. —DIYeditor (talk) 07:31, 6 February 2023 (UTC)
In other words since I have proposed an umbrella policy instead of a specific one I thought I would go ahead and demonstrate exactly what I mean. —DIYeditor (talk) 07:33, 6 February 2023 (UTC)
I think in principle, the idea of having a general policy on all forms of computer-generated content and a more specific policy on LLMs is good. But with the rapidly increasing popularity of LLMs, the specific policy is clearly the more pressing issue. Phlsph7 (talk) 08:21, 6 February 2023 (UTC)

As of 8 February ChatGPT is now part of Bing and this makes it very easy for those interested to test its capabilities. For example, the prompt "What is Wikipedia's policy on paid editing" (and similar questions) gives sensible results in a chat format. The prompt "Does Bing use ChatGPT?" gives a simple "Yes" (with links to relevant citations). Mike Turnbull (talk) 14:36, 10 February 2023 (UTC)

Bing, with chatbot

perplexity.ai may soon be made obsolete.    — The Transhumanist   10:03, 15 February 2023 (UTC)

News report on access to the new Bing

Factbox: How to get Microsoft's new AI-enhanced Bing | Reuters

Review of the new Bing

I tried Microsoft's new AI-powered Bing. Here's what it's like | CNN Business

To get access to the new Bing, one has to join the waitlist

https://www.bing.com/new

Here's how to skip the waitlist

How to Move Up the Waitlist for Microsoft's ChatGPT-Enhanced Bing | PCMag
These citations may be misleading, since the "waitlist" part is only for a new feature specifically called "chat". My examples at the top of this section were obtained using Bing search as is now implemented, without being signed up to the enhancement. Mike Turnbull (talk) 11:00, 15 February 2023 (UTC)

Example of editor using ChatGPT to add to an article

If anyone is looking for an example of an editor using ChatGPT to add content to an article, look at the recent history of Assabet Valley Regional Technical High School. I reverted those additions as ChatGPT is not a reliable source. ElKevbo (talk) 00:01, 15 February 2023 (UTC)

In the news

Came across these.    — The Transhumanist   07:59, 15 February 2023 (UTC)

Investigative challenge

They are looking for the earliest AI-generated article on Wikipedia:

Wikipedia:Wikipedia Signpost/2023-02-04/News and notes#Investigative challenge

The AI takeover has begun

Wikipedia:Wikipedia Signpost/Next issue/News and notes#The AI takeover has begun: computer-generated bullshit

CatGPT

"What if ChatGPT was a cat":

https://catgpt.wvd.io/

Using LLMs to paraphrase for clarity

Do we have any data on whether LLMs are good at paraphrasing while maintaining the meaning of a sentence? This might be useful for editing pages tagged with Template:Incomprehensible.

I think this has a lot of potential for helping editors with dyslexia or English as a second language write well.

Is there a testing ground here where we can show test examples of what can be done with such models? I'd like to take some incomprehensible articles and attempt to improve them, to see whether it works or whether the model just confabulates facts to make the sentences fit together. Immanuelle ❤️💚💙 (please tag me) 05:56, 16 February 2023 (UTC)
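
As a concrete illustration of the kind of testing ground being asked for, below is a minimal sketch, in Python, of how paraphrasing could be batch-tested against an LLM API. It assumes the OpenAI Python library and an API key; the model name, prompt wording, and example sentence are illustrative assumptions only, not a recommendation.

# A minimal sketch of batch-testing an LLM's paraphrasing, assuming the
# OpenAI Python library and an API key. The model name and prompt wording
# are illustrative assumptions only.
import openai

openai.api_key = "sk-..."  # placeholder; supply your own key

PROMPT = (
    "Paraphrase the following sentence so that it is clearer, "
    "without adding or removing any factual claims:\n\n{sentence}"
)

def paraphrase(sentence: str) -> str:
    """Ask the model for a meaning-preserving paraphrase of one sentence."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT.format(sentence=sentence)}],
        temperature=0,  # reduce variation so results are easier to compare
    )
    return response["choices"][0]["message"]["content"].strip()

# Each output would still need a human check against the original for
# preserved meaning and for introduced ("confabulated") facts.
for sentence in ["The module, which was installed 1998, govern the whole plant."]:
    print(paraphrase(sentence))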

Adding parameters to AI-generated template

I think we need to add some parameters to Template:AI-generated so it is more informative. Ideally, there should be a parameter for the revision, which for self-declaring editors would default to the revision the template is added with. In the future, an external tool could show the text added in such a revision so it can be examined in isolation.


Also, I think it might be good practice for self-declaring editors to be able to declare the type of edit they made with the LLM, for example paraphrasing edits vs. edits that introduce new content. Paraphrasing edits are likely much easier to review since they shouldn't introduce new information, so a reviewer could just go over an edit flagged as a paraphrase with AutoWikiBrowser, make a judgment call on whether the paraphrasing was legitimate, and remove the template if so. Immanuelle ❤️💚💙 (please tag me) 06:06, 16 February 2023 (UTC)

Oh, and the most obvious other flag would be for wiki formatting, like tables and such. Something that might not even need a review.
We also probably need a way to record in the template information such as doubts about a specific fact or similar. Immanuelle ❤️💚💙 (please tag me) 06:11, 16 February 2023 (UTC)
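
For what it's worth, here is a rough sketch, assuming the mwparserfromhell Python library, of how an external tool might read or set such parameters. The parameter names used ("revision", "type") are hypothetical stand-ins for whatever the template actually ends up supporting.

# A rough sketch of how an external tool might set the parameters proposed
# above, using the mwparserfromhell library. The parameter names "revision"
# and "type" are hypothetical; Template:AI-generated does not necessarily
# support them.
import mwparserfromhell

wikitext = "Some article text.\n{{AI-generated|date=February 2023}}"
code = mwparserfromhell.parse(wikitext)

for template in code.filter_templates():
    if template.name.matches("AI-generated"):
        # Record the revision the template was added with and the kind of edit.
        template.add("revision", "1140723024")   # hypothetical parameter
        template.add("type", "paraphrase")       # hypothetical parameter

print(code)  # wikitext with the updated template, ready to save via the API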

Attribution to OpenAI

OpenAI's "publication policy" says:

The role of AI in formulating the content is clearly disclosed in a way that no reader could possibly miss

However, that's a license agreement, not a copyright claim. We've previously assumed that this is required for copyright compliance, but that doesn't appear to be the case. DFlhb (talk) 12:06, 19 February 2023 (UTC)

I'm not sure that it makes a difference. Attribution is demanded by WP:Plagiarism. This applies not just to content from OpenAI. Phlsph7 (talk) 13:16, 19 February 2023 (UTC)
True for the edit summary, but not the bottom-of-the-page template, right? DFlhb (talk) 13:22, 19 February 2023 (UTC)
We have the Template:OpenAI for this. —Alalch E. 13:51, 19 February 2023 (UTC)
From WP:Plagiarism: In addition to an inline citation, in-text attribution is usually required when quoting or closely paraphrasing source material. See also the section WP:Plagiarism#Avoiding_plagiarism.
As a related question: is the template enough to satisfy in-text attribution if the template is added only at the bottom of the page? Phlsph7 (talk) 13:56, 19 February 2023 (UTC)
To answer my own question: it seems that it is, see WP:Plagiarism#Where_to_place_attribution. Phlsph7 (talk) 14:01, 19 February 2023 (UTC)
Yes it's enough. —Alalch E. 14:10, 19 February 2023 (UTC)
I misread WP:Plagiarism; it does likely require that template, regardless of copyright; Phlsph7 was right. DFlhb (talk) 14:40, 19 February 2023 (UTC)
  • I'm not sure whether Wikipedia is bound by this policy or if it's just an agreement between OpenAI and the person using it. If we are, I don't think that an attribution at the bottom of the page or in an edit summary would fit the spirit of "disclosed in a way that no reader could possibly miss". Journalists seem to have adopted a standard of disclosing AI use at the top of the piece as shown in this article from CNET, and I think it would be wise for Wikipedia to do the same. This is new territory for us since we don't normally attribute authorship this way even when copying open-license text verbatim. –dlthewave 16:22, 19 February 2023 (UTC)
    I think you are right that placing a template at the bottom of the article does not comply with "disclosed in a way that no reader could possibly miss". Placing a banner at the top of the article would solve that problem. I'm not sure if this is a good idea since the top is the most important part of the article. For example, would you want such a banner at the top of a level 1 vital article because it includes a few sentences from an LLM in one of its sections? A different approach would be to require in-text attribution in the paragraph where the text is used. A special banner at the top could be reserved for articles in which the great majority of the text is produced by LLMs. But you also already raised the other question: it's not clear that "Wikipedia is bound by this policy". Phlsph7 (talk) 08:57, 20 February 2023 (UTC)
    Requiring in-text attribution at the article, section or paragraph level depending on extent is the way to go regardless of AI provider requirements.

@Alalch E.: In response to this edit: It seems to me that WP:Plagiarism requires in-text attribution independent of which LLM provider is used, see the passage cited above. This would also apply to the text in the section "Declare LLM use".

Removal of the section "Productive uses of LLMs"

I'm not sure that it is a good idea to include the section "Productive uses of LLMs" in this policy. The following two reasons are my main concerns: (1) the section makes false claims and (2) this is supposed to be a policy, not a how-to essay. As for false claims: the section claims that LLMs in general have these uses. But the claims were only tested on ChatGPT, as far as I can tell. I tried some of JPxG's demonstrations (table rotation and plot summary) on perplexity.ai, elicit.org, and distil-gpt2. They all failed for these examples. Since different LLMs are trained for different purposes, it would be rather surprising if you could look at the strengths and weaknesses of one and generalize them to all others. But even when we restrict ourselves to ChatGPT, it is not uncontroversial at which tasks it excels and which tasks are risky, as discussed on the talk page. As for the second reason: I think it would be better to relegate these explanations to a how-to essay and keep the policy slim. I'm not sure what value they provide besides what is already covered in the section "Using LLMs". Phlsph7 (talk) 09:04, 20 February 2023 (UTC)

I agree that suggesting certain uses of large language models in general isn't really backed by a lot of evidence, and may be better placed in a separate informational essay (I wouldn't say a how-to, but a general discussion of the capabilities of large language models). Separating it will also make it easier to update independently of this guidance page. isaacl (talk) 21:49, 20 February 2023 (UTC)
Strong oppose. When no guidance is provided, people still use LLMs; they just use them less competently, leading to very mediocre outputs. This article is quite informative on that point. Though the current section doesn't quite address my concern either; the section should be expanded with a guide to proper prompt-crafting. DFlhb (talk) 07:38, 21 February 2023 (UTC)
Thanks for the feedback. I think we have to decide whether this is supposed to be a policy or a how-to essay. In theory, the section could be expanded to vast proportions with all the tips and warnings that could be included. There are various LLMs. Many of them have very different purposes, strengths, and weaknesses. These factors affect proper usage, for example, what prompts to employ. The section would have to be subdivided to deal with the different LLMs by providing specific instructions and warnings on each of them. Since new LLMs are constantly being created and existing ones are being updated, we would have to update this section regularly to make sure that it accurately represents the topic. This also comes with the problem that it is quite controversial what their strengths and weaknesses are and which specific practices are productive or risky. In my opinion, this goes far beyond what a policy is supposed to do since the main purpose of policies is to describe rather general standards to be followed. You are right that more specific instructions would be useful but maybe a policy is not the right place for them. My suggestion would be to create a separate page as an essay for this purpose and link it here. If we want to keep the section in some form then it would need to undergo rather deep changes, as explained in this and my previous comment. Phlsph7 (talk) 08:41, 21 February 2023 (UTC)
The other parts of our policy already contain various more general warnings on what can go wrong. Maybe we can address your concerns by expanding them. For example, we could mention somewhere that the quality of the output depends a lot on finding the right prompts (without going into details about which prompts, for which purposes, and on which LLMs they are used). This could fit into the subsection "Experience is required". Something along the lines: experience is not just required on Wikipedia but also on how to use LLMs, for example, for how to formulate the right prompts. Phlsph7 (talk) 09:51, 21 February 2023 (UTC)
I've thought about this for a while. That section also essentially describes risks. So I have merged it into 'LLM risks and pitfalls'. The risks as described in that whole section are something that editors who want to use a large language model are required to be aware of; awareness is a prerequisite to usage. So it makes sense to keep it in a policy. What do you think about this? —Alalch E. 12:08, 21 February 2023 (UTC)
Thanks for putting all the effort into this. However, the basic problems outlined above are still the same. Additionally, I'm not sure what "Other applications" is supposed to mean in this context. It seems to imply that these are not forms of "Content creation" discussed in the first subsection and that, therefore, the risks associated with copyrights, NPOV, etc. do not apply here. Both claims are false: many of these applications are just rather specific forms of content creation, and all the policies listed above do apply to them. Phlsph7 (talk) 12:21, 21 February 2023 (UTC)
You're right, but maybe better now? Special:Diff/1140723024? —Alalch E. 12:42, 21 February 2023 (UTC)
If there really is a need to include specific examples then we should think carefully about what examples to include. Why include "Templates, modules and external software"? Is that something that the average editor is very concerned with? Same for "Tables and HTML". They belong to more specific how-to essays. I would cut the uses down to a few more general points: create new content, modify content (summarize/copyedit), and brainstorm ideas. We already have the subsection "Writing articles" to deal with the first point. Maybe we can find a way to include the other two in the section "Using LLMs" as well, ideally in a quite general manner without going too much into details. I'll see if I can come up with something along those lines. Phlsph7 (talk) 13:05, 21 February 2023 (UTC)
You're probably right. BTW if something is moved to another page I suggest that a new page not be created; we already have the essay Wikipedia:Using neural network language models on Wikipedia that has some similar content. —Alalch E. 13:18, 21 February 2023 (UTC)
I followed your suggestion and moved the section to the essay you mentioned. I hope that the summaries added to the section "Using LLMs" are sufficient to cover the core ideas. Please let me know if there are some key points that I missed. Phlsph7 (talk) 14:01, 21 February 2023 (UTC)
Looks good, thanks. I will take a deeper look later and let you know. —Alalch E. 14:39, 21 February 2023 (UTC)
You two were right all along on this; this page reads much better now, and it's true that a how-to would be hard to rationalise in a policy (and frankly, the removed section didn't address my concerns well anyway). Thanks for being bold with it. DFlhb (talk) 12:43, 24 February 2023 (UTC)

Remove paragraph on the future of LLMs

The section "LLM risks and pitfalls" currently contains the following paragraph:

As the technology continually advances, it may be claimed that a specific large language model has reached a point where it does, on its own, succeed in outputting text which is compatible with the encyclopedia's requirements, when given a well engineered prompt. However, not everyone will always use the most state-of-the-art and the most Wikipedia-compliant model, while also coming up with suitable prompts; at any given moment, individuals are probably using a range of generations and varieties of the technology, and the generation with regard to which these deficiencies have been recognized by the community may persist, if in lingering form, for a rather long time.

This paragraph speculates on how LLMs may develop in the future. I don't think that it is a good idea to include this paragraph in our policy since this is not relevant to current usage, independently of whether this speculation is true or false.

As Alalch E. has pointed out, one idea behind this paragraph is forestalling editors in a more or less distant future who claim that their LLM is so advanced that it automatically follows all policies. I think the basic idea behind this point is valid. But since the policy covers LLMs in general, it also covers future advanced LLMs and editors using them. If that point needs to be explicitly mentioned then maybe we can find a less verbose way to include it. The alternative would be to wait till that time comes and then to update our policy accordingly. Phlsph7 (talk) 10:26, 20 February 2023 (UTC)

The policy is worthless without some version of this paragraph. People will mock it as the "GPT-3 policy" when GPT-3 has lost relevance in people's minds. But 1) there is no guarantee that GPT-3 won't still be available and used even when it's obsolete; and 2) there is no guarantee that competing models will develop evenly, or that, even if a certain state-of-the-art model far surpasses GPT-3, others in the market will equally surpass it. Yeah, a less verbose alternative is possible, but without some alternative I will be against this becoming Wikipedia policy. Every policy is created for the future. This policy must remain robust in this fluid environment, so that it doesn't have to be constantly updated to account for changing technology, and it must remain credible for years to come. The basic idea is simple and just needs to be accurately represented. —Alalch E. 10:45, 20 February 2023 (UTC)
What about the following: This policy applies to all usages of LLMs independently of whether a provider or user of an LLM claims that it automatically complies with Wikipedia guidelines. If we want to emphasize future technological developments, we could use This policy applies to all usages of LLMs independently of whether a provider or user of an LLM claims that, due to technological advances, it automatically complies with Wikipedia guidelines. To me, this seems obvious. But to others, it may not. In that case, it may be good to state it explicitly. Phlsph7 (talk) 11:54, 20 February 2023 (UTC)
That's a good condensation, the second one especially, so I will make the change. I will just add the word "policy" at the end. —Alalch E. 17:10, 20 February 2023 (UTC)

Grammarly

Just noting that Grammarly is, I believe, an LLM. I don't think we have an issue with Grammarly, so the issue is probably generating large sections of text. Talpedia (talk) 12:59, 23 February 2023 (UTC)

Thanks for pointing that out. I don't think Grammarly is an issue since it just checks spelling/grammar and highlights issues without making any changes itself. But telling ChatGPT to copyedit a text itself and using the output is a very different issue. Phlsph7 (talk) 13:40, 23 February 2023 (UTC)

How one might use an LLM well

I've been thinking of playing with LLMs on Wikipedia for a while. I suspect good uses might be:

  • Understanding a topic by talking to ChatGPT to see if it is worth looking for things. I really would prefer it if ChatGPT were better at sourcing, though. It's worked out for documentation.
  • Understanding an article by querying it. See https://jurisage.com/ask-myjr-chat/ but imagine you could do this for a systematic review. As an aside... this might be easy to implement, and can be done in a more "citation-centric" way with "extractive QA" (https://huggingface.co/tasks/question-answering); a rough sketch follows below this comment.
  • As an editing tool, e.g. "rewrite this in a more encyclopedic tone" or "which part of this seems wrong".
  • Summarizing sentences from an article.

It's worth noting that LLMs that are better at sourcing (e.g. retrieval-augmented generation, RAG) are coming, and there has been some work on automated fact checking.

If people were keen on prototyping some tools for using LLMs *well* for editing, I'd be keen - I can code and have an understanding of some of the libraries. It might be good to work with someone more on the "product" side of things to bounce ideas off. Talpedia (talk) 13:11, 23 February 2023 (UTC)
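
To make the "extractive QA" idea above concrete, here is a minimal sketch assuming the Hugging Face transformers library and its default question-answering model; the question and context text are made up for illustration. Unlike free-form generation, the answer is a literal span copied out of the supplied source text, which makes it easier to tie back to a citation.

# A minimal sketch of extractive question answering with the Hugging Face
# transformers library. The answer is a verbatim span from the context, not
# generated text.
from transformers import pipeline

qa = pipeline("question-answering")  # downloads a default extractive QA model

context = (
    "The systematic review pooled twelve randomised trials with a total of "
    "4,512 participants and found no significant difference between groups."
)  # illustrative text standing in for a section of a source

result = qa(question="How many participants were included?", context=context)
print(result["answer"], result["score"])  # span from the context plus a confidence score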

Ah, I see this was addressed and removed from the article (https://en.wikipedia.org/w/index.php?title=Wikipedia%3ALarge_language_models&diff=1140732774&oldid=1140731716&diffmode=source); I'll read the above thread. Also, Wikipedia:Using neural network language models on Wikipedia seems to be the place where the discussion about how to (rather than how not to) use LLMs is happening, and WP:Large_language_models#Writing_articles actually does an okay job of advising on good uses of LLMs. Talpedia (talk) 13:14, 23 February 2023 (UTC)

Examples of close paraphrases while summarizing?

... using LLMs to summarize copyrighted content (like news articles) may produce excessively close paraphrases.

Does anyone have an example of this happening? Or introducing bias, OR, or hallucinations when asked to summarize? Sandizer (talk) 08:13, 24 February 2023 (UTC)

No examples to provide, but I experienced that with ChatGPT. That was last year, back when I sucked at prompts, so it's likely not an inherent issue. I used a pretty minimalistic prompt (something like: "Create a concise, accurate summary, using encyclopaedically-neutral words, of the following text:"), and it tended to reuse expressions from the original text here and there. DFlhb (talk) 12:38, 24 February 2023 (UTC)
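
One crude way to check for this, sketched below in plain Python under the assumption that long verbatim word sequences are a reasonable proxy for close paraphrasing: count how many of the summary's word n-grams also occur in the source. The n-gram length and threshold are arbitrary illustrative choices, not established values.

# A rough heuristic for flagging possible close paraphrasing in an LLM
# summary: the fraction of the summary's word n-grams that appear verbatim
# in the source. The n-gram length and threshold are illustrative only.
def ngrams(text: str, n: int = 6) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def reused_fraction(source: str, summary: str, n: int = 6) -> float:
    """Fraction of the summary's n-grams that appear verbatim in the source."""
    summary_grams = ngrams(summary, n)
    if not summary_grams:
        return 0.0
    return len(summary_grams & ngrams(source, n)) / len(summary_grams)

source = "..."   # paste the copyrighted source text here
summary = "..."  # paste the LLM-generated summary here
if reused_fraction(source, summary) > 0.2:  # arbitrary illustrative threshold
    print("Summary reuses long verbatim phrases; check for close paraphrasing.")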

Subsection: Citing LLM-generated content

Still don't think this subsection belongs here. The rest of the policy is about editors' conduct, but this is about WP:RS. Shouldn't this be addressed by WP:RS, or WP:RSP, or some other way? LLM-generated sources are already in semi-widespread use, see for example Wikipedia:Reliable_sources/Noticeboard#StatMuse DFlhb (talk) 13:03, 24 February 2023 (UTC)

Perhaps this section can just point to an appropriate place in the reliable sources guideline (with any appropriate changes to that guideline for large language models, as desired). isaacl (talk) 17:39, 24 February 2023 (UTC)
It should be mentioned somewhere that LLMs are not reliable sources since this may not be obvious to some inexperienced editors. But we probably don't need a full subsection to get this message across. Phlsph7 (talk) 18:27, 24 February 2023 (UTC)
I feel there are two separate issues:
  • prominence: people are already using LLM-generated sources on Wikipedia, and I'm not sure this page will be prominent enough to inform them not to do that
  • clarity: the terminology issues discussed in the move request are far more pronounced when it comes to sources. Many editors know ChatGPT and Bing/Sydney are LLMs, but when sources generate articles using LLMs, they universally call it "AI"; newbies may not know those terms are used synonymously. That reinforces the prominence issue, because newbies won't think to look here to know not to use these sources.
DFlhb (talk) 21:10, 24 February 2023 (UTC)
You are right that most newbies won't read this policy by themselves. This is probably true for most policies. But they may read it if someone reverts their edits and points them to it. I don't think that mentioning this fact is of utmost importance but, on the other hand, there is not much harm in including one sentence to cover that point. Phlsph7 (talk) 06:03, 25 February 2023 (UTC)
Seems reasonable. DFlhb (talk) 07:23, 25 February 2023 (UTC)

Timeline of ChatGPT news

(in reverse-chronological order)

  1. Teachers use ChatGPT more than students, a new study finds
  2. The inside story of how ChatGPT was built from the people who made it | MIT Technology Review
  3. ChatGPT Is Coming to an App Near You: OpenAI Launches API for Its Chatbot | PCMag
  4. chatgpt: Tech rivals chase ChatGPT as AI race ramps up - The Economic Times
  5. Addressing criticism, OpenAI will no longer use customer data to train its models by default
  6. Robots let ChatGPT touch the real world thanks to Microsoft | Ars Technica
  7. Oxford and Cambridge ban ChatGPT over plagiarism fears but other universities choose to embrace AI bot
  8. AI Is Set to Boom Into a $90 Billion Industry by 2025 Amid ChatGPT Frenzy
  9. Investors are going nuts for ChatGPT-ish artificial intelligence
  10. ChatGPT is coming to Snapchat. Just don't tell it your secrets | ZDNET
  11. ChatGPT: Chinese apps remove chatbot as global AI race heats up | CNN Business
  12. Hackers use fake ChatGPT apps to push Windows, Android malware
  13. ChatGPT Website Cracks Global Top 50 With 672 Million Visits in January
  14. Microsoft is bringing ChatGPT-powered Bing to Windows 11 in latest update
  15. A Conversation With Bing’s Chatbot Left Me Deeply Unsettled - The New York Times
  16. Microsoft to demo its new ChatGPT-like AI in Word, PowerPoint, and Outlook soon - The Verge
  17. Microsoft Invests $10 Billion in ChatGPT Maker OpenAI
  18. What is ChatGPT? Viral AI chatbot at heart of Microsoft-Google fight
  19. How ChatGPT Kicked Off an A.I. Arms Race - The New York Times
  20. ChatGPT reaches 100 million users two months after launch | Chatbots | The Guardian
  21. ChatGPT sets record for fastest-growing user base - analyst note | Reuters
  22. Faced with criticism it's a haven for cheaters, ChatGPT adds tool to catch them | CBC News
  23. Students using ChatGPT to cheat, professor warns
  24. ChatGPT gained 1 million users in under a week. Here’s why the AI chatbot is primed to disrupt search as we know it
  25. ChatGPT hit 1 million users in 5 days: Here’s how long it took others to reach that milestone | Technology News,The Indian Express
  26. Why We're Obsessed With the Mind-Blowing ChatGPT AI Chatbot - CNET
  27. ChatGPT is a new AI chatbot that can answer questions and write essays

Enjoy.    — The Transhumanist   07:36, 5 March 2023 (UTC)

Observations

Based on reading and thinking about the above reports:

  1. OpenAI, the developer of ChatGPT, is adapting rapidly to problems and to user and developer feedback in order to improve the quality of ChatGPT's output. The Wikipedia community should chime in too.
  2. An AI summer has begun around generative AI that, for the near future at least, will pour tens of billions of dollars of investment into the field.
  3. Due to that, and the fact that tech giants from around the world have announced they are working on entries for the marketplace while companies with existing generative AI products are racing to maintain their lead, it appears that 2023 will be packed with improvements to ChatGPT, diffusion of ChatGPT (into mainstream Microsoft products as well as many third-party apps), and introductions of new tools and new features from OpenAI and many other companies. This will no doubt have an impact on Wikipedia, which increases the urgency of developing its large language models policy as well as more active responses to the expected growth of generative-AI-assisted contributions to the encyclopedia, its administrative departments, its policy pages, and discussions.
  4. We can expect synergism to increase the impact even more. For example, ChatGPT is finding an unexpectedly high degree of acceptance in education, with the intention to incorporate its use in learning activities. One of the existing activities is the growing practice of assigning students the development of Wikipedia articles as graded classroom projects. It would be a natural progression to utilize all allowed tools in those projects, including ChatGPT.

In conclusion, things are progressing rapidly, and a larger wave of LLM-generated content contributions than previously expected is likely to flow into Wikipedia soon. We need to be ready for it.    — The Transhumanist   07:36, 5 March 2023 (UTC)

P.S.: @JPxG, DFlhb, Rsjaffe, Fram, Andrew Davidson, Alalch E., François Robere, Ovinus (alt), Isaacl, EpicPupper, Silver seren, Phlsph7, BlankpopsiclesilviaASHs4, 2001:448A:304A:3A2A:F87F:AE94:6B45:64E1, Gnomingstuff, DIYeditor, Michael D. Turnbull, ElKevbo, Novem Linguae, HaeB, Talpedia, and Sandizer:

I agree with most of this. It is worth noting that the paper that enabled GPT-3 is some 5 years old... and that ChatGPT isn't so obviously that much better than GPT-3, so we'll see - supposedly the reinforcement learning approach behind ChatGPT allows for smaller models. I get the impression this is more the reality of technology improvements hitting the hype machine. My suspicion is that improving an AI for specific issues can be far from trivial. Talpedia (talk) 12:35, 5 March 2023 (UTC)
Yeah. I think there’s lots of naïve enthusiasm. I enjoy ChatGPT, but it will remain unsuitable for Wikipedia article generation. The two biggest hurdles to making it Wikipedia-suitable are probably unsolvable with the current LLM technology: elimination of hallucinations and restriction to reliable sources. And determining reliable sources will have to be done in cooperation with Wikipedia, not by the LLM company on its own. — rsjaffe 🗣️ 18:46, 5 March 2023 (UTC)
Contradicting myself :D, I think some of the problems with hallucinations go away if you are working at the sentence, search, or summarization level. I can imagine an interface where I just sort of type search terms, get sections of papers, then click on and drag around words that I like / don't like. Whole articles are a different matter. "Autosourcing", or "auto-conflicting sourcing", with something RAG-like is one of the things that I am most positive about. There is a bunch of ML fact-checking stuff around, and I suspect the "censorship-in-the-guise-of-misinformation-and-harm" policy crowd might throw some money at this... so maybe we could use this for something more positive! Talpedia (talk) 19:35, 5 March 2023 (UTC)

This page, in the news

@JPxG, DFlhb, Rsjaffe, Fram, Andrew Davidson, Alalch E., François Robere, Ovinus (alt), Isaacl, EpicPupper, Silver seren, Phlsph7, BlankpopsiclesilviaASHs4, 2001:448A:304A:3A2A:F87F:AE94:6B45:64E1, Gnomingstuff, DIYeditor, Michael D. Turnbull, ElKevbo, Novem Linguae, HaeB, Talpedia, and Sandizer:

Check it out! Here's an article that is in part about the effort to write this guideline. (See the section "What about Wikipedia?")

https://wikiedu.org/blog/2023/02/21/chatgpt-wikipedia-and-student-writing-assignments/

   — The Transhumanist   09:18, 1 March 2023 (UTC)

Thanks Transhumanist ... that's an informative and well-written article. - Dank (push to talk) 16:50, 3 March 2023 (UTC)
Thank you Ragesoss for this interesting and balanced article, and the accurate portrayal of this draft policy, hope it reaches the right audience! —Alalch E. 16:56, 3 March 2023 (UTC)
Thanks! ragesoss (talk) 22:15, 3 March 2023 (UTC)
@Ragesoss: Thanks for the article. By the way, I forgot to ping you in the Observations section above. Consider yourself pinged.    — The Transhumanist   08:02, 5 March 2023 (UTC)