Converting PDF slides to PNG then CSV to bulk import into Anki

I had a large slide deck in PDF format and wanted to add the individual slides to Anki to revise from. There is some benefit to the learning process in doing this manually, but it was more practical to study the slides first and then automate the import into Anki.

The final bash command to produce an importable CSV file for Anki looks like this:

pdftoppm my_slides.pdf my_slide \
  -progress -png -f 20 -l 923 -rx 60 -ry 60 && \

cp ./*.png ~/.local/share/Anki2/my_username/collection.media/ && \

echo >| my_list.csv && \

ls *.png \
  | awk '{printf("<img src=\"%s\"/>\t\tmy_tag_1 my_tag_2\n",$1)}' \
  >> my_list.csv

First it uses the pdftoppm tool to create a PNG image file for each slide:

pdftoppm my_slides.pdf my_slide -progress -png -f 20 -l 923 -rx 60 -ry 60

The -f and -l options set the first and last pages to convert, because we don’t want the introductory or closing slides. The -rx and -ry options set the horizontal and vertical resolution in DPI. 60 is relatively low (pdftoppm defaults to 150), but it’s sharp enough for revision in Anki and keeps the image files small, which reduces the size of the media for the Anki deck.
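
With a page range this large, it can be worth checking that the number of PNG files matches the range before going further. This is a rough sketch rather than part of the original command: the count_slides helper is hypothetical, and it assumes pdftoppm names its output my_slide-<number>.png.

```shell
# count_slides ROOT: hypothetical helper that counts files named ROOT-*.png
# in the current directory (assumes pdftoppm's ROOT-<number>.png naming).
count_slides() {
  set -- "$1"-*.png          # expand the glob into the positional parameters
  [ -e "$1" ] || set --      # an unmatched glob stays literal, so treat it as zero
  echo "$#"
}

first=20
last=923
expected=$(( last - first + 1 ))   # 904 pages in the -f 20 -l 923 range
actual=$(count_slides my_slide)

if [ "$actual" -ne "$expected" ]; then
  echo "expected $expected PNG files, found $actual" >&2
fi
```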

The script then copies all of the newly created PNG files to the Anki media folder (note that this path is for Linux, and that my_username needs replacing with the name of your Anki profile):

cp ./*.png ~/.local/share/Anki2/my_username/collection.media/
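
A mistyped media path makes cp fail, or copy the files somewhere unexpected, so the copy step can be wrapped in a small guard. This is a minimal sketch, not the command I actually ran: the copy_slides_to_anki function and its two arguments are hypothetical, and the path in the example call is an assumption that needs adjusting for your setup.

```shell
# copy_slides_to_anki SRC_DIR MEDIA_DIR
# Hypothetical helper: copies SRC_DIR/*.png into MEDIA_DIR, but refuses to run
# if the media folder does not exist (for example after a typo in the path).
copy_slides_to_anki() {
  src_dir=$1
  media_dir=$2
  if [ ! -d "$media_dir" ]; then
    echo "media folder not found: $media_dir" >&2
    return 1
  fi
  cp -- "$src_dir"/*.png "$media_dir"/
}

# Example call (the path is an assumption; substitute your own):
# copy_slides_to_anki . ~/.local/share/Anki2/my_username/collection.media
```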

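Next the script resets the CSV file. The >| redirection overwrites an existing file even when the shell’s noclobber option is set. Because echo prints a newline, the file starts with a single empty line rather than being completely empty: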

echo >| my_list.csv
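
Since echo prints a newline, the file it leaves behind is one byte long rather than empty. If that matters, one alternative (not what the command above uses) is to redirect the no-op builtin : instead. The sketch below compares the two in a scratch directory, with noclobber switched on to show that >| still overwrites.

```shell
# Demo in a throwaway directory; my_list.csv matches the name used above.
cd "$(mktemp -d)" || exit 1
set -o noclobber                 # a plain > would now refuse to overwrite a file

echo "old row" > my_list.csv     # allowed: the file does not exist yet
echo >| my_list.csv              # forced overwrite; leaves a single newline
echo_size=$(( $(wc -c < my_list.csv) ))

: >| my_list.csv                 # ':' prints nothing, so the file ends up empty
colon_size=$(( $(wc -c < my_list.csv) ))

echo "after echo: ${echo_size} byte(s), after colon: ${colon_size} byte(s)"
```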

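Finally it pipes the list of PNG files through awk to produce tab-separated rows (Anki accepts these as well as comma-separated ones). Each row has an HTML img tag referring to the image by file name in the first column, an empty second column, and any Anki tags we want in the third column. The quotes around the src value are written as \" because a single quote would end the single-quoted awk program:

ls *.png \
  | awk '{printf("<img src=\"%s\"/>\t\tmy_tag_1 my_tag_2\n",$1)}' \
  >> my_list.csv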
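
To see what the awk step emits without touching any real files, it can be fed a couple of sample names. This is only a small sketch: the file names are made up, and the quotes around the src value are written as \" so that they sit safely inside the single-quoted awk program.

```shell
# Sample names standing in for the output of `ls *.png` (made up for illustration).
rows=$(printf '%s\n' my_slide-020.png my_slide-021.png \
  | awk '{printf("<img src=\"%s\"/>\t\tmy_tag_1 my_tag_2\n", $1)}')

# Print the rows with each tab shown as <TAB> so the three columns are visible.
printf '%s\n' "$rows" | awk '{gsub(/\t/, "<TAB>"); print}'
# <img src="my_slide-020.png"/><TAB><TAB>my_tag_1 my_tag_2
# <img src="my_slide-021.png"/><TAB><TAB>my_tag_1 my_tag_2
```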

It’s a good idea to give these rows a specific tag so that you can find and delete all of them later if you need to. You could also put an “answer” side in the second column, between the two \t characters.
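
As a rough illustration of filling in that second column, the answer text can be passed to awk as a variable. The variable name back and its value are placeholders of my own, not something Anki requires.

```shell
# 'back' is a hypothetical placeholder for whatever answer text you want on the card.
rows_with_back=$(printf '%s\n' my_slide-020.png \
  | awk -v back="See lecture notes" \
      '{printf("<img src=\"%s\"/>\t%s\tmy_tag_1 my_tag_2\n", $1, back)}')

# Columns: image tag, answer text, tags (separated by tabs).
printf '%s\n' "$rows_with_back"
```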

Another possibility would have been to pipe the output of pdftoppm -progress into awk and produce the CSV that way, but separating the two steps out via ls makes it easier to re-use different parts of this script for other purposes.
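
For reference, that alternative might look roughly like the sketch below. I haven't used it for this deck, and it rests on an assumption about pdftoppm: that -progress writes one line per page to stderr with three space-separated fields (current page, last page, output file). The progress_to_rows function is a hypothetical name.

```shell
# progress_to_rows: reads lines of the assumed form
#   <page> <last_page> <file_name>
# on stdin and prints one Anki row per line, using the third field as the image.
progress_to_rows() {
  awk '{printf("<img src=\"%s\"/>\t\tmy_tag_1 my_tag_2\n", $3)}'
}

# Intended use (assumes the progress lines arrive on stderr, hence 2>&1;
# any warnings pdftoppm prints would end up in the pipe as well):
# pdftoppm -progress -png -f 20 -l 923 -rx 60 -ry 60 my_slides.pdf my_slide 2>&1 \
#   | progress_to_rows >> my_list.csv
```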

Then you can import this CSV file into Anki, making sure that the HTML option is enabled.

