Converting PDF slides to PNG then CSV to bulk import into Anki
I had a large slide deck in PDF format and wanted to add the individual slides to Anki to revise from. There is some benefit to the learning process in doing this manually, but it was more practical to study the slides first and then automate the import into Anki.
The final bash command to produce an importable CSV file for Anki looks like this:
pdftoppm my_slides.pdf my_slide \
-progress -png -f 20 -l 923 -rx 60 -ry 60 && \
cp ./*.png ~/.local/share/Anki2/my_username/collection.media/ && \
echo >| my_list.csv && \
ls *.png \
| awk '{printf("<img src=\047%s\047/>\t\tmy_tag_1 my_tag_2\n",$1)}' \
>> my_list.csv
First it uses the pdftoppm tool to create a PNG image file for each slide:
pdftoppm my_slides.pdf my_slide -progress -png -f 20 -l 923 -rx 60 -ry 60
The -f and -l options set the first and last pages to convert, because we don’t want the introductory or closing slides. The -rx and -ry options set the horizontal and vertical resolution in DPI. 60 is a relatively low value (pdftoppm defaults to 150), but it’s a high enough resolution for revision in Anki, and it keeps the image files small, which reduces the size of the media for the Anki deck.
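To get a feel for what 60 DPI means in pixels, here is a small arithmetic sketch. PDF page sizes are measured in points (72 per inch), so the pixel size is roughly points × DPI / 72. The 720 x 540 point page size below is a hypothetical 4:3 slide, not a value taken from my deck, and px_at_dpi is just a helper name made up for this example.

```shell
# Estimate the pixel size of a rendered page edge.
# pixels = points * dpi / 72 (integer arithmetic, rounded down here;
# pdftoppm's own rounding may differ by a pixel).
px_at_dpi() {
  local points=$1 dpi=$2
  echo $(( points * dpi / 72 ))
}

# Hypothetical 4:3 slide of 720 x 540 points (10 x 7.5 inches).
echo "60 DPI:  $(px_at_dpi 720 60) x $(px_at_dpi 540 60) px"
echo "150 DPI: $(px_at_dpi 720 150) x $(px_at_dpi 540 150) px"
```

For that assumed page size, 60 DPI gives 600 x 450 pixels against 1500 x 1125 at 150 DPI, which is why the files come out so much smaller.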
The script then copies every PNG file in the current directory, which includes the newly created slides, to the Anki media folder (note that this path is for Linux, and that my_username needs replacing with your Anki profile name):
cp ./*.png ~/.local/share/Anki2/my_username/collection.media/
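If you re-use this step, it may be worth checking that the media folder exists before copying, since a typo in the path would otherwise only show up as a cp error. The following is a minimal sketch of that idea; copy_to_media is a hypothetical helper name, and the example path is the same placeholder as above.

```shell
# Copy every PNG from a source directory into an Anki media folder,
# refusing to run if the media folder does not exist.
copy_to_media() {
  local src_dir=$1 media_dir=$2
  if [ ! -d "$media_dir" ]; then
    echo "media folder not found: $media_dir" >&2
    return 1
  fi
  cp -- "$src_dir"/*.png "$media_dir"/
}

# Example call (my_username is a placeholder, as in the text):
# copy_to_media . ~/.local/share/Anki2/my_username/collection.media
```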
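Next the script resets the CSV file. The >| redirection overwrites an existing file even if the shell’s noclobber option is set. Note that echo with no arguments writes a single newline, so the file starts with one blank line instead of being completely empty: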
echo >| my_list.csv
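The difference between > and >| only shows up when noclobber is switched on. The sketch below demonstrates it in a scratch directory, and also shows that : >| leaves a zero-byte file where echo >| leaves a single newline. The scratch directory and the "old rows" content are only there for the demonstration.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
echo "old rows" > my_list.csv

set -o noclobber            # plain '>' now refuses to overwrite files

# The subshell keeps the expected redirection failure contained.
if ( echo > my_list.csv ) 2>/dev/null; then
  echo "plain > overwrote the file"
else
  echo "plain > was refused"
fi

echo >| my_list.csv         # '>|' overwrites anyway; file holds one newline
wc -c < my_list.csv         # 1 byte

: >| my_list.csv            # ':' prints nothing, so the file is truly empty
wc -c < my_list.csv         # 0 bytes

set +o noclobber            # restore the default behaviour
```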
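Finally it pipes the list of PNG files through awk to produce CSV (or, strictly, TSV) rows with an HTML img tag referring to the relative image path, and adds any Anki tags we want as the third column. A literal single quote cannot appear inside the single-quoted awk program (the shell would strip it), so the quotes around the path are written as \047, awk’s octal escape for a single quote:
ls *.png \
| awk '{printf("<img src=\047%s\047/>\t\tmy_tag_1 my_tag_2\n",$1)}' \
>> my_list.csv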
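Before importing, it can help to check the row format by feeding awk a couple of sample filenames instead of a real directory listing. This sketch wraps the awk step in a hypothetical make_rows function. It writes the quotes around the image path as \047, awk's octal escape for a single quote, because a literal single quote cannot appear inside the single-quoted awk program.

```shell
# Turn one filename per input line into one Anki row:
# front (img tag) <TAB> empty back <TAB> tags.
make_rows() {
  awk '{printf("<img src=\047%s\047/>\t\tmy_tag_1 my_tag_2\n", $1)}'
}

# Sample filenames in the style pdftoppm produces for the prefix my_slide.
printf '%s\n' my_slide-020.png my_slide-021.png | make_rows
```

Each output line should contain the img tag, two tab characters, and then the two tags. Because awk splits on whitespace and only $1 is used, filenames containing spaces would be cut short, which is not a problem for the names generated here.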
It’s a good idea to give these rows a specific tag to make it easy to delete all of them later if you need to. You could also put any “answer” side you want in the second column, between the two \t characters.
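As a rough sketch of filling that second column, the back text and the tags can be passed into awk as variables with -v instead of being hard-coded. The function name make_rows_with_back and the sample back text are made up for illustration.

```shell
# Build rows with a back side and tags supplied by the caller.
# Columns: front (img tag) <TAB> back <TAB> tags.
# Note: awk -v interprets backslash escapes in the values it is given.
make_rows_with_back() {
  local back=$1 tags=$2
  awk -v back="$back" -v tags="$tags" \
    '{printf("<img src=\047%s\047/>\t%s\t%s\n", $1, back, tags)}'
}

printf '%s\n' my_slide-020.png \
  | make_rows_with_back "See lecture notes" "my_tag_1 my_tag_2"
```

The same back text is applied to every row here, so this only suits a generic answer side. Per-slide answers would need a second input source.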
Another possibility would have been to pipe the output of pdftoppm -progress into awk and produce the CSV that way, but separating the two steps out via ls makes it easier to re-use different parts of this script for other purposes.
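For comparison, here is an untested sketch of that alternative. It assumes the progress format described in the pdftoppm man page: three space-separated fields (current page, last page, filename), printed to stderr. The progress lines below are simulated, so the example runs without pdftoppm installed, and progress_to_rows is a hypothetical name.

```shell
# Turn pdftoppm -progress lines ("current last filename") into rows,
# taking the filename from the third field.
progress_to_rows() {
  awk '{printf("<img src=\047%s\047/>\t\tmy_tag_1 my_tag_2\n", $3)}'
}

# Simulated progress lines standing in for real pdftoppm output.
printf '%s\n' '20 923 my_slide-020.png' '21 923 my_slide-021.png' \
  | progress_to_rows

# With the real tool the progress lines would need redirecting from
# stderr into the pipe (assumption based on the man page). Any warnings
# on stderr would then be mixed into the rows as well:
# pdftoppm my_slides.pdf my_slide -progress -png -f 20 -l 923 -rx 60 -ry 60 2>&1 \
#   | progress_to_rows > my_list.csv
```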
Then you can import this CSV file into Anki, making sure that the HTML option is enabled.
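Recent Anki versions can also read import settings from header lines at the top of the file, according to the Anki manual's section on file headers. I have not used this in the command above, so treat the following as a sketch: it assumes the #separator, #html and #tags column headers behave as documented, and write_anki_headers is a hypothetical helper name.

```shell
# Print Anki file headers that describe the rows that follow:
# tab-separated fields, HTML allowed, tags in the third column.
write_anki_headers() {
  printf '#separator:tab\n'
  printf '#html:true\n'
  printf '#tags column:3\n'
}

# Headers first, then a sample row in the same format as the script's output.
{
  write_anki_headers
  printf "<img src='my_slide-020.png'/>\t\tmy_tag_1 my_tag_2\n"
}
```

In the script, write_anki_headers >| my_list.csv could then take the place of the echo step, so the file starts with the headers and not a blank line.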