Wikipedia talk:Database reports

Requests: Please list any requests for reports below in a new section. Be as specific as possible, including how often you would like the report run.

Filter out template disambiguation

I think templates in Category:Template disambiguation pages should be excluded from Wikipedia:Database reports/Unused templates (filtered), since it's a subcategory of Category:Wikipedia transclusionless templates. jlwoodwa (talk) 21:34, 8 September 2024 (UTC)[reply]
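For whoever ends up touching the query: a rough, untested sketch of what that exclusion could look like against the enwiki replica (standard page/categorylinks schema assumed; the report's actual "no transclusions" logic is left out here):

-- Sketch only (untested): templates that are NOT members of
-- Category:Template disambiguation pages. The report's existing
-- "unused" (no transclusions) condition would stay as it is.
SELECT page_title
FROM page
WHERE page_namespace = 10
  AND NOT EXISTS (
      SELECT 1
      FROM categorylinks
      WHERE cl_from = page_id
        AND cl_to = 'Template_disambiguation_pages'
  )
LIMIT 10;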

It might be useful, if we're going to change that page, to convert it to use the format at User:Jonesey95/self-transcluded-templates so that regular editors (or template editors, if we want to protect the report a bit) can make changes like the above after discussion. That way, nobody has to mess with off-wiki code. – Jonesey95 (talk) 00:19, 10 September 2024 (UTC)[reply]
I have raised a PR (https://github.com/mzmcbride/database-reports/pull/141) to get HaleBot to stop updating pages containing {{nobots}}. This would enable usurping reports following on-wiki discussions without getting the two bots to overwrite each others' reports. @0xDeadbeef Can you review it? – SD0001 (talk) 19:55, 25 September 2024 (UTC)[reply]
Just to follow up, the PR has been merged and deployed. Please go ahead with {{database reports}}-ification! Legoktm (talk) 16:54, 4 October 2024 (UTC)[reply]

Dusty Articles should exclude soft redirects and potentially Set indexes

@Legoktm since the report already excludes hard redirects, it should also exclude soft redirects, which it currently does not do; for example, various Wiktionary redirects like Technical tap appear in it. This could probably be resolved by excluding pages found in Category:Wikipedia soft redirects and its subcategories.


Another issue is set index articles, which are functionally another kind of disambiguation page and often do not need edits for long periods of time, such as some surnames, geographic details, etc. I'd propose excluding these from Dusty Articles and perhaps making a sub-report exclusive to set indexes, maybe for particular categories of set indexes, as sometimes they can be genuinely overlooked (i.e. an article was created for a person with an obscure surname that does indeed already have an SIA). Anyway, I understand this is likely a much more nuanced issue to resolve than the soft redirects, so more discussion is likely needed on that matter. Akaibu (talk) 21:40, 6 October 2024 (UTC)[reply]
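A rough, untested SQL sketch of the soft-redirect exclusion described above, assuming the report selects mainspace non-redirect pages. Note that categorylinks is not recursive, so the subcategories of Category:Wikipedia soft redirects would have to be enumerated or resolved in a separate step:

-- Sketch (untested): mainspace, non-redirect pages that are not tagged
-- as soft redirects. Subcategories of Category:Wikipedia soft redirects
-- are not covered automatically and would need to be added to the IN list.
SELECT page_title
FROM page
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND NOT EXISTS (
      SELECT 1
      FROM categorylinks
      WHERE cl_from = page_id
        AND cl_to IN ('Wikipedia_soft_redirects')
  )
LIMIT 10;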

TV articles with "was"

Could we get a database report on TV show articles that have "was" in the first sentence (i.e., Name of Show was...)? MOS:TV has dictated use of "is" since forever, but I'm still finding "was"es all over the place. Ten Pound Hammer(What did I screw up now?) 20:56, 27 October 2024 (UTC)[reply]

FastilyBot is dead?

@Fastily: Wikipedia:Database reports/Transclusions of non-existent templates hasn't updated in ~two days; in fact, if you look at Special:Contributions/FastilyBot this was the very last page the bot edited before it went on hiatus (usually it makes several edits per day). Can someone restart the bot? Duckmather (talk) 00:52, 18 November 2024 (UTC)[reply]

Fastily is no longer running a bot that updates reports. I have updated that report to use the quarry query that the bot was using. It should update daily. Improvements to that page are welcome. – Jonesey95 (talk) 05:20, 19 November 2024 (UTC)[reply]

Linked misspellings enhancement request

@Legoktm and 0xDeadbeef: please update the SQL for Wikipedia:Database reports/Linked misspellings to exclude AnomieBOT-created en-dash redirects (per this discussion on my talk), as I modified my similar personal report HERE. For example, because the bot created 1941-42 Subsitute Gold Cup, your report includes one incoming link to 1941–42 Subsitute Gold Cup but that can be ignored because it was caused by that bot's edit which created more problems than it solved. Note that my report excludes it. Sorry to bother you with this work-around request, but I trust I'll have better luck asking you than I did asking the bot operator to tweak their bot code to avoid this issue. Thanks, wbm1058 (talk) 22:51, 13 December 2024 (UTC)[reply]
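For reference, a rough, untested sketch of how the redirects in question could be identified on the replica; the report's query would exclude them with a NOT EXISTS clause rather than list them, and it may need narrowing (e.g. to the already-matched titles) to run within Quarry's limits:

-- Sketch (untested): mainspace redirects whose creating revision was made
-- by AnomieBOT, per the replica's page, revision and actor tables.
SELECT p.page_title
FROM page AS p
JOIN revision AS r ON r.rev_page = p.page_id AND r.rev_parent_id = 0
JOIN actor AS a ON a.actor_id = r.rev_actor
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 1
  AND a.actor_name = 'AnomieBOT'
LIMIT 10;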

You can just take it over with {{database report}}; see a couple sections up. —Cryptic 23:22, 13 December 2024 (UTC)[reply]
I tried that back in June, and it worked for a few days, until HaleBot woke up and started working again. – wbm1058 (talk) 00:05, 14 December 2024 (UTC)[reply]
As per that section, the change to ignore pages matching /\{\{[Dd]atabase report\s*\|/ was merged in September (commit). —Cryptic 14:30, 15 December 2024 (UTC)[reply]
 Done – taken over with {{database report}}wbm1058 (talk) 11:53, 16 December 2024 (UTC)[reply]

Drafts containing mainspace categories

Can we have a report of content in draftspace containing mainspace categories? This routinely needs to be cleaned. BD2412 T 22:44, 9 January 2025 (UTC)[reply]

Wikipedia:Database reports/Drafts with categories? – Jonesey95 (talk) 04:17, 10 January 2025 (UTC)[reply]
I've just updated that query; it was ignoring categories transcluding any template, rather than just ones transcluding {{Maintenance category}} as intended. Whoops.
As mentioned in the notes to the query it was copied from, there are about two dozen categories either starting with "Drafts about " or ending with " drafts" that are neither hidden nor transclude {{maintenance category}}, and those cats should really be fixed rather than be omitted from this report. (Examples: Category:Drafts about horror films, Category:X-Men drafts.) If I list them here, are they likely to get fixed, or will I just irritate the people watching for drafts like Draft:Debanjan Pakrashy? What about ones like Category:Pages translated from German Wikipedia? —Cryptic 05:21, 10 January 2025 (UTC)[reply]
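For anyone following along, the core of such a query is roughly the following simplified, untested sketch; the hidden-category check uses the 'hiddencat' page prop, and the live query additionally skips categories transcluding {{Maintenance category}}, as described above:

-- Simplified sketch (untested): drafts filed in categories that are not
-- hidden. The real report also excludes categories that transclude
-- {{Maintenance category}}.
SELECT d.page_title AS draft, cl.cl_to AS category
FROM page AS d
JOIN categorylinks AS cl ON cl.cl_from = d.page_id
JOIN page AS c ON c.page_namespace = 14 AND c.page_title = cl.cl_to
WHERE d.page_namespace = 118
  AND NOT EXISTS (
      SELECT 1
      FROM page_props
      WHERE pp_page = c.page_id
        AND pp_propname = 'hiddencat'
  )
LIMIT 50;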

Problem with User:HaleBot at Wikipedia:List of Wikipedians by number of edits/1–1000

There is a problem regarding User:HaleBot's updating of Wikipedia:List of Wikipedians by number of edits/1–1000. According to this section, "A user name in black (unlinked) has not been used for editing in the last 30 days. This list is normally updated daily by a bot.". User:Koavf has not published an edit in more than 30 days (in fact, in 6 months), but the username still appears linked in that list. — AP 499D25 (talk) 03:04, 28 January 2025 (UTC)[reply]
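For what it's worth, the underlying check ("has this account edited in the last 30 days?") can be reproduced on Quarry with something like this sketch against the replica's user-indexed views:

-- Sketch: latest edit timestamp for a single account, using the replica's
-- revision_userindex and actor_revision views. If the result is older than
-- 30 days, the list should show the name unlinked.
SELECT MAX(rev_timestamp) AS last_edit
FROM revision_userindex
JOIN actor_revision ON actor_id = rev_actor
WHERE actor_name = 'Koavf';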

Indefinitely blocked IP report

It appears as if the report at Wikipedia:Database reports/Indefinitely blocked IPs hasn't been functioning properly since April 2024, when the list went from 78 entries down to zero, and it has been zero ever since. I double-checked the block log of a few users to see if they had simply been unblocked, but that doesn't appear to be the case, meaning there's an issue with the report itself. Can this be looked into? Thanks, VegaDark (talk) 17:37, 3 February 2025 (UTC)[reply]

I've replaced it. —Cryptic 18:28, 3 February 2025 (UTC)[reply]
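For the archives, a rough, untested sketch of what such a report presumably has to query now, assuming the newer block / block_target tables that replaced ipblocks (column names taken from the current MediaWiki schema, so treat this as an assumption):

-- Sketch (untested, schema assumed): indefinite blocks whose target is an
-- IP address or range rather than an account (accounts use bt_user).
SELECT bt_address, bl_timestamp
FROM block
JOIN block_target ON bt_id = bl_target
WHERE bl_expiry = 'infinity'
  AND bt_address IS NOT NULL
ORDER BY bl_timestamp
LIMIT 100;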

Database reports/Long stubs - request - filter to remove articles with less than 250 words "readable prose"

Greetings, Recently I was advised to stop removing stub tags from articles with "Page size" less than 250 words. Before doing "undo self" to clean up the (January 2025) mess that I made, I updated guidance at WikiProject Stub improvement What_is_a_stub? to clarify stub size and help others avoid making this same error. Now I am asking that HaleBot exclude those same small-prose-size articles from the list. Regards, JoeNMLC (talk) 18:13, 19 February 2025 (UTC)[reply]

I assume determining readable page size of an article requires some sophisticated analysis possibly beyond the capabilities of the bot. But it does appear that about half or more of the articles in this report are legit stubs per the definition you cite. ~Kvng (talk) 17:48, 24 February 2025 (UTC)[reply]
@Kvng - I'm wondering if any of the existing code for the Prosesize gadget can be incorporated into the bot? Yes, that would be a major task, but it would greatly improve the accuracy of the "Long stubs" report. Cheers, JoeNMLC (talk) 17:55, 24 February 2025 (UTC)[reply]
How about a simpler solution: Create a hidden category, maybe Category:Long stubs with short prose. We apply that manually to legit stubs per the definition. We then request to exclude articles in this category from the report. ~Kvng (talk) 17:06, 3 March 2025 (UTC)[reply]
@Kvng - Yes, that would work for the bot to exclude, so articles are not repeatedly included. Currently about 80–90 percent of the entries on the weekly report should not be there, and it is a big waste of time skipping those. When one of the articles is expanded with more prose, how would people know to remove Category:Long stubs with short prose? I do know about the setting to show all hidden categories at the bottom of articles. Would this change (if done) be included in "Tech News"? JoeNMLC (talk) 19:00, 3 March 2025 (UTC)[reply]
We could put a description and a link to the category on the Wikipedia:WikiProject Stub improvement page. Someone would have to go through them periodically and review articles that have been significantly improved. Not so different from reviewing articles listed in the report. I'm not sure anyone needs to show hidden categories. The key pieces are excluding articles in this hidden category from the report and the automatically generated listing of the members of the hidden category (Category:Long stubs with short prose). ~Kvng (talk) 19:18, 3 March 2025 (UTC)[reply]
A similar solution would be to make a configuration page linking to article titles that should be excluded from the report. This is just as easy, and completely sidesteps the issues with changing the articles themselves. (Directly excluding pages with x amount of readable prose isn't possible in a pure database report; it could conceivably be done by the bot running the regular report then fetching the text of each page on it, but that's significantly more work.) —Cryptic 19:21, 3 March 2025 (UTC)[reply]
I think it would be better to use Category:Long stubs with short prose than to have to search and update a separate list of known long stubs. ~Kvng (talk) 19:48, 3 March 2025 (UTC)[reply]
I've created the category and added 9 articles to it so the report generation can be tested.
I gather we need to modify Wikipedia:Database reports/Long stubs/Configuration. I don't know SQL, but ChatGPT recommends adding an AND NOT EXISTS clause to the existing query:
SELECT
  page_title,
  page_len
FROM
  page
  JOIN categorylinks ON cl_from = page_id
WHERE
  cl_to LIKE '%stubs'
  AND page_namespace = 0
  AND page_len > 2000
  AND NOT EXISTS (
    SELECT 1
    FROM categorylinks AS cl_exclude
    WHERE cl_exclude.cl_from = page.page_id
    AND cl_exclude.cl_to = 'Long_stubs_with_short_prose'
  )
GROUP BY
  page_title
ORDER BY
  page_len DESC
LIMIT
  1000;
~Kvng (talk) 19:59, 3 March 2025 (UTC)[reply]
@Kvng - I concur with the above. During my past working years I did have a few "close encounters" with SQL, but I have no knowledge of the WP database, coding, testing, etc. Maybe help from a Page watcher here, or the bot operator? It would be great if this can be done before Wednesday's weekly processing. Cheers, JoeNMLC (talk) 21:51, 3 March 2025 (UTC)[reply]
In the meantime, it's probably safe and productive to load up Category:Long stubs with short prose ~Kvng (talk) 22:10, 3 March 2025 (UTC)[reply]
Just FYI, I will begin at #500 of the 1,000 and work back to #1... If not tonight, then tomorrow morning. Cheers! JoeNMLC (talk) 22:44, 3 March 2025 (UTC)[reply]
Progress: completed articles 10 to 100. JoeNMLC (talk) 22:57, 4 March 2025 (UTC)[reply]
I have successfully constructed the query with Petscan. This is probably better than the report since it can be updated at will. ~Kvng (talk) 18:53, 7 March 2025 (UTC)[reply]

Losing battle

While I appreciate the effort, I think maintaining Category:Long stubs with short prose is a losing battle. You have to populate it and then remove articles as they get expanded. It's a decent amount of toil. Just count the words per article, it's not that difficult, computers are good at counting. :-) Yes, for annoying and stupid reasons you can't get the word count with SQL alone, but you can use a programming language to iterate through the list of stubs and extract a rough word count. Then you could either have the report exclude based on a word count threshold or you could include a sortable column with the word count. --MZMcBride (talk) 04:23, 6 March 2025 (UTC)[reply]

Here's a very basic script. There are likely better or smarter ways to do this, but this shows the general idea:

#!/usr/bin/env python3

import re
import requests
from bs4 import BeautifulSoup

urls = [
    'https://en.wikipedia.org/wiki/1999_Shetland_Islands_Council_election',
    'https://en.wikipedia.org/wiki/England_Open',
]

for url in urls:
    html_doc = requests.get(url).text

    soup = BeautifulSoup(html_doc, 'html.parser')

    word_count = 0
    text = ''

    # Collect paragraph text, skipping the stub notice and stripping
    # footnote markers such as [1] or [12] before the words are counted.
    for p in soup.find_all('p'):
        if p.text.find('You can help Wikipedia by expanding it.') == -1:
            text += re.sub(r'\[\d+\]', '', p.text).strip() + ' '

    print(url)

    print(text)

    print('Word count: {}'.format(len(re.findall(r'\w+', text))))
    print()

And then the output is:

$ ./venv/bin/python ./wiki_word_count.py 

https://en.wikipedia.org/wiki/1999_Shetland_Islands_Council_election
 Lewis Shand Smith
Independent Tom Stove
Independent Elections to the Shetland Islands Council were held on 6 May 1999 as part of Scottish local elections. The Liberal Democrats won 9 seats, the party's best result in a Shetland Islands Council election.  Nine seats were uncontested.  
Word count: 46

https://en.wikipedia.org/wiki/England_Open
The England Open  is a darts tournament that has been held annually since 1995.  
Word count: 14

This is for 1999 Shetland Islands Council election and England Open. This approach is not perfect, of course, but it's a decent approximation. --MZMcBride (talk) 04:55, 6 March 2025 (UTC)[reply]

@MZMcBride, thanks for the suggestion. The word count we're looking for needs to match what Wikipedia:Prosesize reports. This excludes references, lists, and tables, and I'm not sure what else. We could try to borrow source code from there or reverse engineer exactly what it is doing, but that all seems like a large project requiring ongoing maintenance and producing relatively small reward. I'm not sure who's qualified and willing to take this on. The Category:Long stubs with short prose solution discussed above has support, we've started using it, and it appears likely to meet our needs. I just need someone to show me how to test the SQL changes I've proposed above. ~Kvng (talk) 15:20, 6 March 2025 (UTC)[reply]
Observations: The current "Long stubs" wikitable is 80–90 percent incorrect. Would it be simpler/easier to disable that HaleBot task for now? Then make a single-function bot to read through the 2.3 million stubs and 1. find articles with prose size over 250 words; 2. output just the first 1,000 articles to a plain wikitable (no need to sort by size). While the Category:Long stubs with short prose approach may be a short-term fix, it is very time consuming, treats the symptom, and does not solve the actual problem. (Just my opinion.) Regards, JoeNMLC (talk) 16:12, 6 March 2025 (UTC)[reply]
I disagree with the implied assertion that the report is not valuable and I oppose the suggestion to disable the task that generates it. Editors have already worked the top portion of the list. There are more than 10-20 percent actionable articles in the lower half of the report. It would be nice if the report took readable prose into account but I disagree that the Category:Long stubs with short prose approach is very time consuming. I think it is beneficial to have actual eyes on some of these marginal stubs as, in general, there's no mechanical formula for assessments. ~Kvng (talk) 16:43, 6 March 2025 (UTC)[reply]

Amended observations: Because other editors have completed some of the 1,000 articles, I am repeating work already done. Keep the bot running; just add a tracking system for editors to communicate to others which parts of the list are completed. See the example below for details. JoeNMLC (talk) 15:06, 7 March 2025 (UTC)[reply]

WP Stub improvement progress

Below is the "Announcement panel" of a progress tracker for weekly HaleBot report Long stubs. Identifies articles "Open", "In process", and "Done". Note that when added to the bot report, this will be deleted with each new report. Ask if bot can output a new panel?

WikiProject Stub improvement – Long stubs Progress

Instructions: Un-comment the Open line below to activate In progress line for articles to check.
Articles – Status
  • 1–100 – Open
  • 101–200 – Open
  • 201–300 – Open
  • 301–400 – Open
  • 401–500 – Open
  • 501–600 – Open
  • 601–700 – Open
  • 701–800 – Open
  • 801–900 – Open
  • 901–1000 – Open

When completed, please change In progress line to Done.

Sorry this report doesn't conform to your suggested format. I've checked 1-575 in the 5 March report and all improperly marked stubs have been assessed. There are still some legitimate stubs in the middle of this range that have not yet been put into Category:Long stubs with short prose.
I suggest we move this discussion to Wikipedia_talk:WikiProject_Stub_improvement. ~Kvng (talk) 15:44, 7 March 2025 (UTC)[reply]
As of now, I'm up to #180 for adding Category:Long stubs with short prose, not doing assessing. JoeNMLC (talk) 15:56, 7 March 2025 (UTC)[reply]
Have you considered editing the report to remove pages that have been processed? Use an appropriate edit summary, and keep the table's basic format intact. That way, editors who visit the report will not have to repeat work. When the bot runs again, it will replace the page contents with an updated table. – Jonesey95 (talk) 16:38, 7 March 2025 (UTC)[reply]
Thanks for the suggestion. The table uses class="wikitable sortable static-row-numbers static-row-header-text", so the table source doesn't have the item numbers in it, which makes it difficult to find things. If things are removed, everything gets renumbered. It might be better to have the bot refresh this report more often (at least until we're done working on the backlog).
Most important I think is improving the selection criteria for the report to omit entries in Category:Long stubs with short prose. Who can help me learn how to test my proposed changes? ~Kvng (talk) 18:38, 7 March 2025 (UTC)[reply]
The item numbers won't matter if the table is kept up to date by removing processed items. As for refreshing the report more often, if you can get access to the SQL that is used to generate the report, I can convert the page to use {{database report}}. – Jonesey95 (talk) 20:35, 7 March 2025 (UTC)[reply]
I've got Petscan set up (see above #Losing battle) and this seems to remove the need for more frequent updates or report editing. The current SQL is at Wikipedia:Database_reports/Long_stubs/Configuration. My proposed improvement is also above #Losing battle. ~Kvng (talk) 23:19, 7 March 2025 (UTC)[reply]
I have updated Wikipedia:Database reports/Long stubs so that it will run every day and can be updated manually with the link at the top of the page (it takes a few minutes to run). Further improvements to the report SQL can be made on the page (please test in Quarry first). Updates to the page's display can be made using the parameters documented at {{database report}}. – Jonesey95 (talk) 23:46, 7 March 2025 (UTC)[reply]
@Jonesey95 - Thank you for these changes, i.e. daily wikitable update excluding articles with Category:Long stubs with short prose. This is most helpful! Yesterday I tagged about 10 or so articles with that cat. & today they are not within the wikitable, so I would conclude: IT'S Working. Cheers! JoeNMLC (talk) 15:23, 9 March 2025 (UTC)[reply]

Schedule of HaleBot

HaleBot updates Wikipedia:Database reports/Template categories containing articles every seven days, which makes for a very regular schedule. This leads to the situation where the cleanup is done by the same editors every time.

Legoktm or 0xDeadbeef, could you please make the schedule of the bot less regular? Say, every 181 hours, a prime of similar magnitude to 24×7=168. Or every 8 days – the same time of day would cause the same regularity issue, but it's probably easier, and a different day of the week might be enough of a variance. —⁠andrybak (talk) 11:00, 25 February 2025 (UTC)[reply]

It's doable, but I'm not entirely sure on the status of the project. SDZeroBot's {{database report}} template is probably more reliable, and it might be nice to migrate all existing reports to using that (What do you think @Legoktm?), so it might be more worthwhile to add this support to {{database report}} instead? 0xDeadbeef→∞ (talk to me) 12:05, 25 February 2025 (UTC)[reply]
Andrybak, lately I've been clearing this report, although not every time. Here and here I was late getting to it those weeks, and I think someone else had fixed all or most, so I just noted it was already done. The thing is, I watchlist that report. So unless I put it off and forget about it for a couple of days, I'm going to see the update and fix them regardless of what day it's done. Would you rather I wait and let you or someone else handle it more often? Or am I misunderstanding something? --DB1729talk 12:28, 25 February 2025 (UTC)[reply]
DB1729, basically, I want to avoid any potential of you burning out on this task :-)
The report used to be updated at a time when both of us would get to it, splitting the effort. But nowadays it has shifted to a time when I don't usually edit Wikipedia. —⁠andrybak (talk) 14:23, 25 February 2025 (UTC)[reply]
Understood. Honestly, it doesn't seem to take much time or effort for me to keep up with it. If I burn out, it won't be because of the demands of this database report:) I suppose there is some small degree of skill, knowledge, and patience required to track down those problematic transclusions, but nothing most any other editor couldn't pick up very quickly if I ever stop editing or abandon the task. That said, if varying the update schedule will help spread the work to you, or to anyone else for that matter, it's probably not a bad idea. DB1729talk 14:43, 25 February 2025 (UTC)[reply]
@Andrybak: I don't understand the problem here, and why you'd want a less regular schedule? People always complain when it isn't on a regular schedule. Legoktm (talk) 19:33, 26 February 2025 (UTC)[reply]
My understanding is that the timing of the update is a little later each week, and over several months it gradually progressed to a time and day of the week when andrybak is not available to edit. Andrybak has expressed concern that, since the task has currently fallen mostly on me, I may suffer burnout (I have addressed that concern above).
If we use 0xDEADBEEF's suggestion of {{database report}}, that may be best. We would have regularly scheduled updates, but andrybak and others would also have the option to manually update the report when they are able to edit. Does that make sense? DB1729talk 19:48, 26 February 2025 (UTC)[reply]
"less regular" is a poor choice of words, my bad. I would like the updates to be regular, but on a schedule intentionally unaligned to the day-week 24/7 cycle. —⁠andrybak (talk) 20:51, 26 February 2025 (UTC)[reply]
If someone can provide the query code, I'll be happy to update the page to use {{database report}}. I don't think there are any downsides to doing so. – Jonesey95 (talk) 17:34, 28 February 2025 (UTC)[reply]
HaleBot's source is linked from its user page. The queries are in the dbreps2/src/enwiki and dbreps2/src/general subdirectories, though finding the right file would probably be an adventure if you're not familiar with git; this particular report is here. —Cryptic 19:28, 3 March 2025 (UTC)[reply]
It's simpler than that - every report gets an autogenerated /Configure subpage with the latest source code. Legoktm (talk) 21:58, 3 March 2025 (UTC)[reply]
As someone who took the time to get familiar with Git and Rust, it's mildly disheartening that <https://github.com/mzmcbride/database-reports/pull/153/files> still isn't live. --MZMcBride (talk) 02:27, 4 March 2025 (UTC)[reply]
I have updated the report to use {{database report}}, using the SQL on the /Configure page. It currently returns "No items retrieved", which may be valid or may be an error on my part. Someone could create an intentional error to test the report. Update: the report is working; I forget how to get column headers to show properly, i.e. "Category" instead of "page_title". – Jonesey95 (talk) 15:36, 9 March 2025 (UTC)[reply]
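If {{database report}} takes its headers from the SQL result column names, as it appears to, then aliasing the column in the configuration query should be enough, e.g.:

-- Assumption: {{database report}} renders the SQL column names as table
-- headers, so an alias renames the header (generic example, not the
-- report's actual query).
SELECT page_title AS Category
FROM page
WHERE page_namespace = 14
LIMIT 5;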
Thanks Jonesey!:) DB1729talk 18:37, 9 March 2025 (UTC)[reply]

Add warning / instructions to database reports

Inexperienced users often think that the fact that database reports list a page means that something must be done. They discover for example Wikipedia:Database reports/Linked miscapitalizations and they think that a page appearing in the database reports is something bad that should be fixed. I think we should add a warning to each database report page that says something like: "The fact that a page appears in a database report does not necessarily mean that there is a problem that needs to be fixed". And maybe remind people of WP:COSMETIC/WP:AWBRULES#4/WP:COSMETICBOT. What do y'all think? Polygnotus (talk) 04:33, 20 March 2025 (UTC)[reply]

Is there a way to clear a page from a report, other than fixing whatever put it there? DB1729talk 06:48, 20 March 2025 (UTC)[reply]
Probably not at this point in time, but something like that is of course possible to implement. But if you realize that a page appearing in a database report is not a problem then there is no longer a reason to remove it, right? Polygnotus (talk) 07:03, 20 March 2025 (UTC)[reply]
Why would a page appear on a report if it does not need to be fixed? Perhaps reports with false positives need better queries. If false positives are unavoidable on a specific report, such as Wikipedia:Database reports/Linked miscapitalizations, custom text can be added to the top of the page. – Jonesey95 (talk) 15:59, 20 March 2025 (UTC)[reply]
@Jonesey95 Because not all reports are intended for people who want to fix problems. Some reports are just for people who do research or are curious about certain aspects of Wikipedia. For example, Wikipedia:Database reports/Active editors with the longest-established accounts exists but that does not mean we should get rid of those people. Polygnotus (talk) 16:06, 20 March 2025 (UTC)[reply]
Thanks for the link. With most reports, it is pretty easy to put a custom note at the top. See, for example, Wikipedia:Database reports/Uncategorized templates. – Jonesey95 (talk) 16:13, 20 March 2025 (UTC)[reply]
...But if you realize that a page appearing in a database report is not a problem then there is no longer a reason to remove it, right?
No, that's still a problem.
1) It means there is a false positive, either a misapplied rcat, or a poor query.
2) If those false positives aren't removed, then the report will eventually become very large and far less easy to use.
3) These reports are limited to 1,000 displayed items, so we wouldn't even see some important ones if they're left outside the cut. DB1729talk 16:26, 20 March 2025 (UTC)[reply]
@DB1729 Then we should decide how to implement a system to deal with false positives. Polygnotus (talk) 16:31, 20 March 2025 (UTC)[reply]
@Polygnotus: I guess my point is many editors who use these reports realize, or should realize, if they find a page that doesn't actually have an error, they should do a little investigation of what triggered the error and fix that, if possible. For example, if a linked misspelling occurs in a direct quote, you might recognize that the misspelling itself should not be fixed, but rather you should maybe use {{sic}} to solve the problem. Then the page would be excluded from the next update. Another example is if a certain spelling is an alternate spelling, rather than a misspelling. In that case, one would go to the redirect page and replace {{R from misspelling}} with {{R from alternative spelling}} to resolve the issue. DB1729talk 17:00, 20 March 2025 (UTC)[reply]
@DB1729 Maybe we should broaden the scope of this section to "Add a warning and instructions to database reports". These instructions could contain information on how to deal with false positives, and when and how to fix problems. I don't think I'm qualified but maybe you could give it a shot? Polygnotus (talk) 17:06, 20 March 2025 (UTC)[reply]
@Polygnotus:, exactly which reports do you have in mind? Are there others besides Wikipedia:Database reports/Linked miscapitalizations and Wikipedia:Database reports/Linked misspellings? A miscapitalization or a misspelling is something that should be fixed by changing to a correct spelling or a correctly capitalized form (and perhaps piping the link in the case of a direct quote). If a particular redirect is not actually a miscapitalization or misspelling, and links to that redirect do not need to be fixed, it should have {{R from alternative spelling}} or {{R from other capitalisation}} and then it will not appear in the miscapitalization/misspelling database reports. But the database reports for miscapitalization and misspellings do represent things that need to be fixed one way or another. Plantdrew (talk) 19:23, 20 March 2025 (UTC)[reply]
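For reference, the core of what the misspellings report checks is roughly the following simplified, untested sketch (the miscapitalizations report is analogous, with a different category); retagging a redirect with {{R from alternative spelling}} moves it out of the category, which is why it then drops off the report. The pagelinks join assumes the normalized schema (pl_target_id through linktarget):

-- Simplified sketch (untested): mainspace redirects in
-- Category:Redirects from misspellings that still have incoming links
-- from articles, with the number of incoming links.
SELECT r.page_title AS misspelling, COUNT(*) AS incoming_links
FROM page AS r
JOIN categorylinks ON cl_from = r.page_id
JOIN linktarget ON lt_namespace = r.page_namespace AND lt_title = r.page_title
JOIN pagelinks ON pl_target_id = lt_id AND pl_from_namespace = 0
WHERE r.page_namespace = 0
  AND r.page_is_redirect = 1
  AND cl_to = 'Redirects_from_misspellings'
GROUP BY r.page_title
LIMIT 50;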
@Plantdrew I have also encountered someone who was doing unproductive stuff based on Wikipedia:Database_reports#Long_pages_by_namespace (more specifically Wikipedia:Database reports/Long pages/Talk and Wikipedia:Database reports/Long pages/Talk (no subpages)), and there may very well be more database reports that could be improved with some information at the top. Maybe we can start with those, and if we or others notice problems with other database reports we can add warnings/instructions to those as well.
I have never really dealt with redirects (it is not really something I am interested in), and the information you just posted could be very useful to people who look at those database reports and are unsure of what to do. Would you be so kind as to add instructions to those pages? Thanks, Polygnotus (talk) 02:27, 21 March 2025 (UTC)[reply]
I'm not sure how to add instructions to the reports for misspellings and miscapitalisations. As far as I can tell, the bot will completely overwrite the existing contents of the page (including any instructions) when it updates the page. Plantdrew (talk) 20:04, 21 March 2025 (UTC)[reply]
@Jonesey95: Probably knows. Polygnotus (talk) 20:09, 21 March 2025 (UTC)[reply]
I have updated the report to use {{database report}}, which makes it easier to modify and refresh. The text at the top of the page can be modified to provide instructions or an explanation. The table is missing column headers at the moment; I'll work on that when I get a chance. – Jonesey95 (talk) 20:40, 21 March 2025 (UTC)[reply]
@Plantdrew Jonesey95 fixed it! Polygnotus (talk) 21:38, 21 March 2025 (UTC)[reply]
Ok, I've added a note there about {{R from other capitalisation}} vs {{R from miscapitalisation}}. Plantdrew (talk) 21:42, 21 March 2025 (UTC)[reply]
Thank you! Polygnotus (talk) 22:26, 21 March 2025 (UTC)[reply]

Talk pages

One thing that happens from time to time is talkspace pages ending up in mainspace categories; the most common form of this is when somebody tries to link to a category in a talk page discussion but forgets to put a colon in front of the word "category" to render it as a text link. There are also occasional cases where editors place a full copy of the article onto the talk page.

And conversely, there are also instances where somebody has placed "WikiProject [Something] articles" categories that are supposed to be on the talk page directly onto the mainspace article instead, so that instead of a talk page being filed in a mainspace category it's an article being filed in a talkspace category. I can catch pages of this type if the category also has draftspace or userspace content in it, because it will then show up in the existing draft or user reports, but that doesn't necessarily cover off all such errors.

So, similarly to the "polluted category" reports that already list categories with a mixture of mainspace and userspace or draftspace content, could one be created that lists categories mixing mainspace and talkspace content? If so, we would want it to exclude some classes of internal project category that aren't really of end-reader concern — such as pages tagged with {{Polluted category}}, {{Tracking category}}, {{Wikipedia category}} or {{Monthly clean-up category}} — so that it doesn't get permanently cluttered up with categories that don't actually need to be addressed. But we would not want it to exclude "WikiProject articles" categories, since those actually do have to be removed if they've been added to the main reader-facing article instead of the talk page.

And also, because WikiProject categories in particular can be massive and nearly impossible to manually search, such a thing should be structured more like the user report (which provides incategory search links) than the draft report (which I've asked in the past to have updated to provide incategory search links as well, only to have that request go unaddressed).

So I just wanted to ask if this would be feasible. Bearcat (talk) 17:28, 21 March 2025 (UTC)[reply]
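The core of such a report would be a single aggregation over categorylinks; a rough, untested sketch is below (the template-based exclusions and the incategory search links would be layered on top):

-- Rough sketch (untested, and expensive without further filtering):
-- categories containing both article (ns 0) and talk (ns 1) pages.
-- The exclusions for {{Polluted category}}, {{Tracking category}}, etc.
-- would need an extra templatelinks filter, omitted here.
SELECT cl_to AS category,
       SUM(p.page_namespace = 0) AS article_pages,
       SUM(p.page_namespace = 1) AS talk_pages
FROM categorylinks
JOIN page AS p ON p.page_id = cl_from
WHERE p.page_namespace IN (0, 1)
GROUP BY cl_to
HAVING article_pages > 0 AND talk_pages > 0
LIMIT 50;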