by deca6cda37d0 on 4/19/20, 12:25 PM with 4 comments
For example, I only allow PDF files. What is the best way to validate that an uploaded file really is a PDF, and not a file in some other format that has just been renamed to .pdf? The goal is to make sure uploaded files render correctly; this is meant to catch human error.
by necovek on 4/19/20, 12:38 PM
* how much do you care about performance?
* how much do you care about safety?
The simplest and fastest approach is to check for the "%PDF-" signature at the start of the file. Refer to the open PDF specification to make sure you accept everything that should be acceptable (e.g. do you care about FDF files, which start with "%FDF-" instead?).
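A minimal sketch of that signature check, assuming the header must appear at byte 0 (some readers are more lenient about where it appears). The names `has_pdf_signature`, `check_upload` and the `allow_fdf` switch are hypothetical, chosen for illustration:

```python
import re

# PDF files begin with a header such as "%PDF-1.7" or "%PDF-2.0";
# FDF (Forms Data Format) files begin with e.g. "%FDF-1.2" instead.
_HEADER = re.compile(rb"%(PDF|FDF)-\d\.\d")


def has_pdf_signature(data: bytes, allow_fdf: bool = False) -> bool:
    """Return True if data starts with a PDF (or, optionally, FDF) header."""
    m = _HEADER.match(data)  # match() anchors at the first byte
    if m is None:
        return False
    if m.group(1) == b"FDF" and not allow_fdf:
        return False
    return True


def check_upload(path: str, allow_fdf: bool = False) -> bool:
    """Read only the first few bytes of an uploaded file and check the header."""
    with open(path, "rb") as f:
        head = f.read(16)
    return has_pdf_signature(head, allow_fdf)
```

This only reads a handful of bytes, so it is cheap, but it says nothing about whether the rest of the file is a well-formed PDF; it just catches the "wrong file renamed to .pdf" mistake.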
If you need to protect against malicious attempts rather than user errors, it gets much harder very quickly (and is theoretically impossible, since you can construct files that are valid PDFs and valid files of some other format at the same time).
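To illustrate why a header check alone cannot rule out a second format, here is a minimal sketch that builds a file starting with a PDF header that also opens as a ZIP archive. It works because ZIP readers locate the archive from its end, so leading bytes are tolerated. The PDF part is only a bare header, not a complete PDF, and the helper names are hypothetical:

```python
import io
import re
import zipfile

# A header-only check: examines nothing but the leading bytes.
PDF_HEADER = re.compile(rb"%PDF-\d\.\d")


def passes_header_check(data: bytes) -> bool:
    return PDF_HEADER.match(data) is not None


def make_pdf_zip_polyglot() -> bytes:
    """Return bytes that start like a PDF but also open as a ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("payload.txt", "not a PDF")
    # ZIP readers find the central directory by scanning from the end of the
    # file, so bytes prepended to the archive do not stop it from opening.
    return b"%PDF-1.4\n%%EOF\n" + buf.getvalue()


if __name__ == "__main__":
    data = make_pdf_zip_polyglot()
    print(passes_header_check(data))  # True
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        print(zf.read("payload.txt").decode())  # not a PDF
```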
To give another example: if you are trying to avoid being used as a media-sharing service, allowing only PDFs will not stop that, because PDF supports embedding media too. PDFs are container formats as much as anything else.
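If you want at least a cheap signal that a PDF carries embedded files or multimedia, one rough heuristic is to scan the raw bytes for the PDF name tokens typically used for them. This is a sketch under loose assumptions: the marker list is a hand-picked, non-exhaustive choice, `find_media_markers` is a hypothetical helper, and names hidden inside compressed object streams or written with `#xx` escapes will be missed, so an empty result proves nothing:

```python
from __future__ import annotations

# Name tokens that commonly indicate embedded files or multimedia in a PDF.
# Hand-picked and non-exhaustive.
MEDIA_MARKERS = (b"/EmbeddedFile", b"/RichMedia", b"/Movie", b"/Sound")


def find_media_markers(data: bytes) -> list[str]:
    """Return the marker names that appear verbatim in the raw file bytes.

    Heuristic only: names stored inside compressed object streams, or written
    with #xx escapes (e.g. /Embedded#46ile), will not be found.
    """
    return [m.decode("ascii") for m in MEDIA_MARKERS if m in data]
```

A hit is a reasonable reason to reject or flag an upload; a miss is not evidence that the file is media-free.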
The safest option is to reprocess the file and re-render only the subset of features you allow. That is also the most expensive in terms of implementation effort and CPU time, and somewhat limiting: you can't keep digital signatures, for instance.
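One possible way to re-render is to rasterize every page to images and keep only those, which throws away scripts, attachments, media and signatures along with text selectability. A sketch assuming Ghostscript (`gs`) is installed, which the thread does not mention; rebuilding a PDF from the page images is left to another tool. `rasterize_cmd` and `rasterize` are hypothetical helpers, and since Ghostscript itself parses the untrusted file, running it in a sandbox with a timeout is prudent:

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def rasterize_cmd(src: Path, out_dir: Path, dpi: int = 150) -> list[str]:
    """Build a Ghostscript command that renders every page of src to PNG."""
    return [
        "gs",
        "-dSAFER",          # restrict file operations the interpreter may perform
        "-dBATCH",          # exit when finished instead of waiting for input
        "-dNOPAUSE",        # don't pause between pages
        "-sDEVICE=png16m",  # 24-bit RGB PNG output
        f"-r{dpi}",         # output resolution in dots per inch
        f"-sOutputFile={out_dir / 'page-%03d.png'}",  # one file per page
        str(src),
    ]


def rasterize(src: Path, out_dir: Path, dpi: int = 150, timeout: float = 120) -> list[Path]:
    """Render src to page images in a fresh out_dir; raises on failure."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript ('gs') was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(src, out_dir, dpi), check=True, timeout=timeout)
    return sorted(out_dir.glob("page-*.png"))
```

Rasterizing is the bluntest form of "only the subset you allow"; a gentler variant would rewrite the PDF while dropping specific object types, at a higher implementation cost.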