Paperless-ngx Bug: Relative 'This Year' Filter Mismatch
Hey everyone! It looks like we've stumbled upon a bit of a head-scratcher in Paperless-ngx, specifically with how the date filters are behaving. Guys, this is a really interesting one because it shows a subtle difference between setting an explicit date range and using a relative date filter. We're talking about the "this year" filter versus manually setting the date from the start of the year to today. It seems like when you use the "this year" filter, one document that should be there is going missing. Pretty wild, right? Let's dive into what's happening and why this might be an issue for some of you.
The Core Issue: "This Year" vs. Explicit Date Range
So, the main problem we're seeing here is a discrepancy in the documents displayed when you filter by date in Paperless-ngx. When you explicitly set a date range, say from January 1st, 2025, to November 8th, 2025, you get a certain set of documents. Everything looks good, and you see all the files you expect within that period. However, when you switch to the relative date filter set to "this year" (which is also 2025), one document mysteriously vanishes from the results. Since the explicit range sits entirely inside "this year," the relative filter should return at least the same documents. This isn't a small glitch; it means the filtering isn't behaving consistently, which can be super confusing and problematic when you're trying to find specific documents.
Imagine you're organizing your finances, and you know you have a crucial utility bill from April. You filter for "this year," and poof! It's gone. But if you manually set the date range, there it is. This inconsistency can lead to missed information and a general lack of trust in the filtering system. The user who reported this noticed it with a document named 2025-04-Hola...., which is missing when the "this year" filter is applied. This tells us it's not a random document disappearing; it's a specific behavior triggered by the relative date setting.
This discrepancy is particularly concerning because date filters are fundamental to managing documents. Whether you're a small business owner trying to track invoices or just an individual trying to keep your digital life in order, you need to be able to rely on these tools. The fact that a relative filter is giving different results than an explicit one suggests a potential bug in how Paperless-ngx interprets or applies these date parameters. We need to figure out why the "this year" filter might be excluding documents that fall within its intended range, especially when an explicit range for the exact same period includes them. It’s not just about a missing document; it’s about understanding the logic behind the filter and ensuring it works as expected for everyone.
Steps to Reproduce: Let's See How It Happens
Reproducing this issue seems pretty straightforward, though the reporter mentions it might be dependent on the specific documents you have. But based on the description, here’s the workflow that highlights the problem. First things first, you'll want to select a specific correspondent. This helps narrow down the search and isolate the issue. Let’s say you choose "Holaluz" as in the example. Then, you set your date filter. The key here is to compare two methods:
- Explicit Date Filter: Set the "Date From" to 1.1.25 (January 1st, 2025) and "Date To" to today's date (in the example, 8.11.25, November 8th, 2025). After applying this, check the filtered documents and, importantly, the count of documents returned. This is your baseline – what should be there.
- Relative Date Filter: Now, keep the same correspondent selected (or select it again, just to be sure). This time, set the filter to the relative option: "this year" (which would also correspond to 2025). After applying this, check the filtered selection again and compare it to the previous explicit filter. The critical observation is that you should see one document less than before.
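To make the expectation behind these two steps concrete, here's a minimal sketch that treats both filters as plain date intervals. It assumes "this year" means January 1st through December 31st of the current year; the function names are made up for illustration, and this is not how Paperless-ngx implements its filters.

```python
from datetime import date

# An inclusive (start, end) pair of dates.
Bounds = tuple[date, date]


def this_year_range(today: date) -> Bounds:
    """Assumed meaning of "this year": the full calendar year of `today`."""
    return date(today.year, 1, 1), date(today.year, 12, 31)


def in_range(day: date, bounds: Bounds) -> bool:
    """Inclusive membership test for a single date."""
    start, end = bounds
    return start <= day <= end


def covers(outer: Bounds, inner: Bounds) -> bool:
    """True if every day of `inner` also lies inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


today = date(2025, 11, 8)
explicit: Bounds = (date(2025, 1, 1), today)  # "Date From" / "Date To"
relative = this_year_range(today)             # "this year"

print(covers(relative, explicit))  # True
```

If the relative range covers the explicit one, any document matched by the explicit filter should also be matched by "this year," so a smaller result set points at the filter logic rather than at the chosen dates.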
The user pointed out that the document 2025-04-Hola.... is the one that goes missing with the relative "this year" filter. This makes the reproduction steps concrete: if you have a document from April 2025 associated with the "Holaluz" correspondent, you should be able to see this difference.
It's worth noting that the reporter suspects the behavior may depend on the specific documents involved, and that could be a hint. Perhaps there's something about the metadata, the OCR data, or even the filename of that specific missing document that interacts strangely with the relative date logic. However, the fundamental test is the comparison between the two filter types. If your setup shows the same discrepancy, then it's definitely something we need to look into further. This systematic approach helps us pinpoint exactly where the system is diverging in its behavior.
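If you want to do that comparison without eyeballing two result lists, a set difference over the document IDs from each view is enough. This is only a sketch: the helper name is hypothetical and the IDs are placeholders, not values from the report.

```python
def missing_from_relative(explicit_ids: set[int], relative_ids: set[int]) -> set[int]:
    """IDs returned by the explicit range but absent from the relative filter."""
    return explicit_ids - relative_ids


# Placeholder IDs for illustration only; they are not taken from the report.
explicit_ids = {101, 102, 103}
relative_ids = {101, 103}

print(missing_from_relative(explicit_ids, relative_ids))  # {102}
```

An empty result means both filters agree; anything else is the list of documents worth inspecting more closely.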
Under the Hood: Log Files and Clues
When bugs like this pop up, the log files are our best friends, guys! They often contain the breadcrumbs that lead us to the root cause. The provided webserver logs give us some really interesting insights into what Paperless-ngx is doing when it processes the documents in question. We can see the consumer processing two PDF files: 2025-04-HOLALUZ_681db2ab376ca0.56422550.pdf and 2025-05-HOLALUZ_688de5530f2616.84516254.pdf.
The logs show the typical workflow: parsing the PDF, detecting the MIME type, executing pre-consume scripts, and then extracting text. Notice that for both documents, Tesseract detected text, so OCRmyPDF was skipped. This is good – it means the text extraction itself is likely working correctly. We also see the generation of thumbnails and the execution of the convert command, which are standard parts of the document handling process.
Now, here’s where it gets really interesting. The logs show the creation dates being parsed from the documents: 2025-06-05 00:00:00+02:00 for one and 2025-05-08 00:00:00+02:00 for the other. These are the dates that Paperless-ngx is using internally. The logs then detail the matching of the correspondent "Holaluz" and assigning document types and storage paths. Finally, the index is updated for both documents.
The crucial part comes at the end of the consumption process for 2025-04-HOLALUZ_681db2ab376ca0.56422550.pdf. The logs explicitly state: Creation date from parse_date: 2025-05-08 00:00:00+02:00. This document, with an original filename indicating April, is being recorded with a creation date of May 8th. This is a major clue! It suggests that maybe the date parsing for this specific document isn't picking up the intended date, or perhaps the filename itself is misleading and the actual creation date is what's being parsed.
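If you'd like to pull these dates out of your own logs, a small sketch like the one below can help. It assumes only the message text quoted above ("Creation date from parse_date: ..."); whatever prefix your log lines carry (timestamp, level, module) is ignored by the search. The function name is made up for illustration.

```python
import re
from datetime import datetime

# Matches the message quoted from the log and captures "<date> <time+offset>".
PARSE_DATE_RE = re.compile(r"Creation date from parse_date: (\S+ \S+)")


def creation_dates(lines: list[str]) -> list[datetime]:
    """Return every creation date found in the given log lines."""
    found = []
    for line in lines:
        match = PARSE_DATE_RE.search(line)
        if match:
            found.append(datetime.fromisoformat(match.group(1)))
    return found


# Message part only; real log lines will have more in front of it.
sample = [
    "Creation date from parse_date: 2025-05-08 00:00:00+02:00",
    "Creation date from parse_date: 2025-06-05 00:00:00+02:00",
]

print([d.date().isoformat() for d in creation_dates(sample)])
# ['2025-05-08', '2025-06-05']
```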
Let's look closer at the document names and the dates recorded:
- 2025-04-HOLALUZ_681db2ab376ca0.56422550.pdf is recorded with creation date 2025-05-08.
- 2025-05-HOLALUZ_688de5530f2616.84516254.pdf is recorded with creation date 2025-06-05.
This seems counterintuitive. The document named with '04' (April) is registered with a May date, and the document named with '05' (May) is registered with a June date. It's possible that the date 2025-05-08 is indeed the correct creation date for the file named 2025-04-HOLALUZ_681db2ab376ca0.56422550.pdf, and the filename is just misleading. If this is the case, then when filtering for "this year" (2025), the document registered with 2025-05-08 should be included. The fact that it's missing when using the "this year" filter suggests that the relative date filter might have a subtle bug, or perhaps the date comparison logic itself is flawed for certain date formats or timezones.
The log also mentions the consumption of 2025-04-HOLALUZ_681db2ab376ca0.56422550.pdf and the resulting generated filename: 2025-05-08 Holaluz 2025-04-HOLALUZ_681db2ab376ca0.56422550.pdf. This further reinforces that the date 2025-05-08 is the one being used by Paperless-ngx for this document. The bug might lie in how the "relative this year" filter handles dates that are parsed, especially if there's a slight mismatch or if the filter logic prioritizes differently compared to an explicit range.
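The timezone idea mentioned earlier is easy to sanity-check. The sketch below is not Paperless-ngx code; it just converts a local midnight timestamp with the +02:00 offset seen in the logs to UTC and looks at which year it lands in. The January 1st timestamp is a made-up edge case, not a document from the report.

```python
from datetime import datetime, timedelta, timezone

# The +02:00 offset that appears in the logged creation dates.
LOCAL_OFFSET = timezone(timedelta(hours=2))


def year_in_utc(local: datetime) -> int:
    """Calendar year of a timezone-aware timestamp after converting to UTC."""
    return local.astimezone(timezone.utc).year


may_doc = datetime(2025, 5, 8, tzinfo=LOCAL_OFFSET)   # date from the logs
edge_doc = datetime(2025, 1, 1, tzinfo=LOCAL_OFFSET)  # hypothetical edge case

print(may_doc.astimezone(timezone.utc))  # 2025-05-07 22:00:00+00:00
print(year_in_utc(may_doc))              # 2025
print(year_in_utc(edge_doc))             # 2024
```

So a UTC conversion can push a document across a year boundary, but only for dates right at the edge of the year. For the May 8th document the shift stays inside 2025, so a timezone offset on its own wouldn't explain why it goes missing.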
System Details: What We're Working With
To help anyone trying to debug or replicate this, here's a rundown of the system setup. We're running Paperless-ngx v2.19.5. The host operating system is Ubuntu 22.04 LTS on an aarch64 architecture. The installation method is Docker with the official image, which is pretty standard these days.
The system status JSON provides even more granular details. We can see that the storage is healthy, with plenty of space available. The database is a MySQL instance, and it's reporting "OK" status with no unapplied migrations, which is great. Task management seems to be running smoothly, with Redis and Celery both in an "OK" state. The document index is up-to-date, and the classifier was last trained recently. Sanity checks are also passing.
In terms of browser, Firefox is being used, and there were no reported configuration changes that might have influenced this behavior. This is important because it suggests the issue isn't likely tied to a specific browser extension or a recent user-made configuration tweak. The fact that the system is generally healthy and up-to-date points towards a potential bug within Paperless-ngx itself, rather than an environmental problem.
Given these details, especially the version number and Docker installation, this bug could potentially affect a wide range of users who are using similar setups. It's always good practice to keep Paperless-ngx updated, but this report confirms that even with a recent version, these kinds of discrepancies can surface. The OS and architecture (aarch64) might be relevant if there's some low-level difference in how certain operations are handled, but typically, Docker abstracts most of that away. The presence of a MySQL database and the specific Celery/Redis setup are also standard components that shouldn't inherently cause date filter issues, but it's always good to have all the facts on the table.
Why This Matters: Ensuring Filter Accuracy
Guys, the reason we're digging into this is simple: accurate filtering is crucial for document management. Whether you're using Paperless-ngx for personal archiving or for business operations, you need to trust that when you ask for documents from a specific period, you get all of them. A discrepancy between an explicit date range and a relative date filter, like "this year," erodes that trust. It means that users might be missing important information without even realizing it.
Imagine you're an accountant looking for all invoices from 2025. You might use the "this year" filter, confidently thinking you've got everything. But if a document from, say, May 2025 gets excluded because of a bug in the relative filter, you could be missing a critical piece of financial data. This could lead to errors in reporting, missed deadlines, or incorrect financial assessments. The consequences can range from minor inconveniences to significant professional or financial problems.
Furthermore, this bug highlights a potential inconsistency in how Paperless-ngx handles date parsing and filtering. The logs suggest that a document whose filename points to April is being internally registered with a May date. Either way, both April and May 2025 fall inside "this year," so the document should show up in the relative filter, and it doesn't. This could indicate a flaw in the logic that builds or compares the date range for relative filters. The explicit filter, which compares directly against the start and end dates you supply, might be taking a different path that doesn't hit the same snag.
For developers and maintainers of Paperless-ngx, identifying and fixing this bug is essential for maintaining the integrity and reliability of the software. For users, it’s important to be aware of such potential issues. If you encounter similar discrepancies, reporting them with detailed logs and system information, just like in this case, is incredibly helpful. It allows the community to work together to make Paperless-ngx even better. The goal is to have a system where every filter, whether explicit or relative, works flawlessly and predictably, ensuring you always have access to the documents you need, when you need them.
What's Next?
This bug report is a fantastic starting point. The detailed steps to reproduce, the specific document involved, and the comprehensive log analysis provide the development team with a clear path to investigate. Hopefully, the clue about the date parsing (2025-05-08 for a document potentially related to April) will be key to unlocking the mystery. If you're experiencing this same issue, make sure to check the logs for similar date parsing anomalies. It’s through this kind of community collaboration and detailed reporting that we can ensure Paperless-ngx continues to be the powerful, reliable document management tool we all love. Keep up the great work, everyone!