by otter-in-a-suit on 12/25/21, 10:17 PM with 15 comments
by faizshah on 12/26/21, 1:23 PM
Where the article uses bash, I would just use Python and call any CLI tools through subprocess. It's much simpler, and I can run the scripts in a REPL, execute cells in Jupyter, or just use plain PyCharm, so it's quick and interactive.
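For illustration, here is a minimal sketch of that pattern: a small Python wrapper around subprocess in place of a bash call. The helper name `run_tool` is hypothetical, and the command is just the current Python interpreter standing in for whatever CLI tool the script would really call.

```python
import subprocess
import sys


def run_tool(args):
    """Run a CLI tool and return its stdout as text.

    check=True raises CalledProcessError on a non-zero exit code,
    roughly what `set -e` gives you in bash.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Stand-in command (an assumption): the current interpreter, so the
# sketch runs anywhere. Swap in the real tool and its arguments.
print(run_tool([sys.executable, "-c", "print('hello from a subprocess')"]))
```

Passing the arguments as a list avoids shell quoting issues, and because the output comes back as a normal string it is easy to poke at in a REPL or a notebook cell.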
Love that you included something on building a data dictionary. I'm honestly guilty of not including a good data dictionary for the source data in the past: I would just leave the output of df.describe() or df.info() at the top of the Jupyter notebook where I restructure the source data before processing it. I now think you should save a data dictionary of both the source data and the final data as a CSV, since that's more maintainable, or at least leave a comment in your script.
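A rough sketch of saving a data dictionary as a CSV might look like the following. It uses only the standard library rather than pandas, and the column names (`title`, `artist`, `play_count`), the helper names, and the inferred type labels are all made up for illustration; the `description` field is meant to be filled in by hand.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    if not values:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "str"


def build_data_dictionary(rows):
    """Return one entry per column: type, non-null count, example value."""
    columns = list(rows[0].keys()) if rows else []
    entries = []
    for col in columns:
        values = [row[col] for row in rows if row[col] not in ("", None)]
        entries.append({
            "column": col,
            "type": infer_type(values),
            "non_null": len(values),
            "example": values[0] if values else "",
            "description": "",  # to be written by a human
        })
    return entries


def write_data_dictionary(entries, fileobj):
    """Write the entries as CSV to any writable text file object."""
    fields = ["column", "type", "non_null", "example", "description"]
    writer = csv.DictWriter(fileobj, fieldnames=fields)
    writer.writeheader()
    writer.writerows(entries)


# Hypothetical source data, inlined so the sketch is self-contained.
source = io.StringIO("title,artist,play_count\nSong A,Band X,12\nSong B,,3\n")
rows = list(csv.DictReader(source))

out = io.StringIO()  # in a real script: open("data_dictionary.csv", "w", newline="")
write_data_dictionary(build_data_dictionary(rows), out)
print(out.getvalue())
```

Running the same function on the source data and on the final data gives two small CSVs that can be committed next to the script.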
Otherwise, everything else is pretty similar to what I would do. I just went to my Google Takeout and apparently all my Google Play data and songs are gone, so I guess I can't try this myself…
by progbits on 12/26/21, 1:23 PM
The article never mentioned how this showed up in the GPM app itself, which feels like a gap.
Otherwise a nice article but it reminds me why I long ago gave up on media metadata organization. So much work, so much mess...
by wodenokoto on 12/26/21, 4:50 PM
How do you parallelize a loop in bash without the output of all the echo calls getting interleaved and jumbled together?