PDF Finder 🗞 🖇
This is a simple command-line utility that lets you look for PDF documents in any directory (recursively).
I have a lot of PDF documents spread around my home directory and subfolders and I'm too unorganized to do something
about it. Instead of taking an hour to organize the files, I took 7 hours to write this program. It uses pdfinfo to
collect metadata. The same can probably be achieved with simple shell scripts (globbing combined with
gets you very very far). I chose PHP because I wanted to do something more with this (JSON API for my home network).
That part is left as an exercise for the reader.
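For comparison, here is a minimal sketch of the shell-script route mentioned above. It is not part of this repository: `find_pdfs` is a hypothetical helper, it only looks at paths (no metadata), and it assumes a `find` that supports `-iname` (GNU and BSD both do).

```shell
#!/bin/sh
# find_pdfs DIR [TERM]   (hypothetical helper, not shipped with this repo)
# List every *.pdf under DIR recursively. If TERM is given, keep only the
# paths that contain it, ignoring case. Only the path is searched here,
# not the PDF metadata.
find_pdfs() {
    dir=$1
    term=${2:-}
    if [ -n "$term" ]; then
        # -F treats TERM as a fixed string instead of a regular expression
        find "$dir" -type f -iname '*.pdf' | grep -i -F -- "$term"
    else
        find "$dir" -type f -iname '*.pdf'
    fi
}
```

Something like `find_pdfs ~/Documents python` would then roughly mimic the unfiltered path search, minus everything the metadata adds.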
There are two executables in bin/: pdf-finder.php and pdf-show-info.php.
To run it, you need Composer and PHP >= 7.4, as well as poppler-utils. Installation of poppler-utils on Ubuntu is very simple:
sudo apt update && sudo apt install poppler-utils
The scripts are marked as executable so you can easily add them to your PATH. In my case,
~/.local/bin is in my PATH, so I did the following:
ln -s /path/to/repository/bin/pdf-finder.php ~/.local/bin/pdf-finder
ln -s /path/to/repository/bin/pdf-show-info.php ~/.local/bin/pdf-show-info
Now you can just invoke them with pdf-finder and pdf-show-info.
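If the shell cannot find the commands after symlinking, the target directory is probably not in your PATH. A small check, assuming a POSIX shell; `on_path` is a made-up helper name:

```shell
#!/bin/sh
# on_path DIR   (hypothetical helper)
# Succeeds if DIR is one of the colon-separated entries in $PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   on_path "$HOME/.local/bin" || echo "~/.local/bin is not in PATH"
```

Wrapping both sides in colons keeps a partial match such as ~/.local from being mistaken for ~/.local/bin.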
The first executable,
pdf-finder.php, is used to actually find PDFs based on search terms. The first argument should
always be the directory. Filters are optional.
To find every PDF document with 'python' in its path, filename or any metadata field in the ~/Documents folder:
bin/pdf-finder.php ~/Documents python
... with 'python' in the title (metadata property):
bin/pdf-finder.php ~/Documents title=python
... with 'ritchie' in the author field and where the title property is set:
bin/pdf-finder.php ~/Documents author=ritchie title=
... with 'programming' and 'python' in the filename:
bin/pdf-finder.php ~/Documents filename=programming filename=python
Filters are based on the information supplied by the pdfinfo
command (see its man page). Dates, when given, are printed in ISO-8601
format. Common fields are listed below.
filepath (or path) is the path excluding the filename.
filename (or name) is the name of the file excluding the path.
A note on filters
About 50% of the PDF files on my computer contain usable metadata. It's almost never complete, although this depends on the source you got your files from.
path=python yields the same results as filepath=python:
path is an alias to
filepath. The same goes
name: both are aliases to
Filters are cumulative: adding more filters further restricts the output.
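To make the cumulative behaviour concrete, here is a rough shell sketch of the path-based filters only. It is not the PHP implementation: `matches_filters` is a hypothetical function, metadata keys such as title or author are deliberately left out, and case-insensitive substring matching is an assumption.

```shell
#!/bin/sh
# matches_filters FILE [FILTER...]   (hypothetical, path-based fields only)
# Succeeds only if FILE satisfies every filter, so each extra filter
# narrows the result further.
#   filepath=V / path=V  -> V must occur in the directory part
#   filename=V / name=V  -> V must occur in the file name
#   V (no key)           -> V must occur anywhere in the full path
# Matching is a case-insensitive substring test (an assumption).
matches_filters() {
    file=$1
    shift
    for filter in "$@"; do
        case $filter in
            filepath=*|path=*) field=$(dirname "$file");  value=${filter#*=} ;;
            filename=*|name=*) field=$(basename "$file"); value=${filter#*=} ;;
            *=*) echo "unsupported key in this sketch: $filter" >&2; return 2 ;;
            *)                 field=$file;               value=$filter ;;
        esac
        # The first filter that does not match rejects the file.
        printf '%s\n' "$field" | grep -q -i -F -- "$value" || return 1
    done
    return 0
}
```

With this sketch, `matches_filters /docs/Programming/python-guide.pdf filename=python path=programming` succeeds, while adding `filename=rust` makes it fail.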
Listing document info:
The second utility is basically a fancy wrapper for
pdfinfo. It takes one argument, the path to a PDF document, and
spits out a table with information about the document.
$ bin/pdf-show-info.php ~/path/to/document.pdf
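The general idea of wrapping pdfinfo can be sketched in a few lines of shell. This is only an illustration, not what pdf-show-info.php actually does: `format_pdfinfo` is a hypothetical name, the column width is arbitrary, and it assumes pdfinfo's usual "Key: value" line format.

```shell
#!/bin/sh
# format_pdfinfo   (hypothetical helper)
# Read "Key:   value" lines on stdin and print them as a simple table.
format_pdfinfo() {
    awk '
        {
            sep = index($0, ":")        # split on the first colon only
            if (sep == 0) next          # skip lines without a key
            key = substr($0, 1, sep - 1)
            value = substr($0, sep + 1)
            sub(/^[ \t]+/, "", value)   # drop the padding after the colon
            printf "| %-14s | %s\n", key, value
        }
    '
}

# Example:
#   pdfinfo ~/path/to/document.pdf | format_pdfinfo
```

Splitting on the first colon only keeps values that contain colons themselves, such as ISO-8601 timestamps, intact.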
Do as you please, as that is the beauty of open source.