python文件拷贝并校验_Django的/ Python的:如何读取文件,并验证它是音频文件?...

I'm building a web app, where users are able to upload media content, including audio files.

I've got a clean method in my AudioFileUploadForm that validates the following:

That the audio file isn't too big.

That the audio file has a valid content_type (MIME type).

That the audio file has a valid extension.

However, I'm worried about security. A user could upload a file with malicious code, and easily pass the above validations. What I want to do next is validate that the audio file is, indeed, an audio file (before it writes to disk).

How should I do this?

class UploadAudioForm(forms.ModelForm):

audio_file = forms.FileField()

def clean_audio_file(self):

file = self.cleaned_data.get('audio_file',False):

if file:

if file._size > 12*1024*1024:

raise ValidationError("Audio file too large ( > 12mb )")

if not file.content_type in ['audio/mpeg','audio/mp4', 'audio/basic', 'audio/x-midi', 'audio/vorbis', 'audio/x-pn-realaudio', 'audio/vnd.rn-realaudio', 'audio/x-pn-realaudio', 'audio/vnd.rn-realaudio', 'audio/wav', 'audio/x-wav']:

raise ValidationError("Sorry, we do not support that audio MIME type. Please try uploading an mp3 file, or other common audio type.")

if not os.path.splitext(file.name)[1] in ['.mp3', '.au', '.midi', '.ogg', '.ra', '.ram', '.wav']:

raise ValidationError("Sorry, your audio file doesn't have a proper extension.")

# Next, I want to read the file and make sure it is

# a valid audio file. How should I do this? Use a library?

# Read a portion of the file? ...?

if not ???.is_audio(file.content):

raise ValidationError("Not a valid audio file.")

return file

else:

raise ValidationError("Couldn't read uploaded file")

EDIT: By "validate that the audio file is, indeed, an audio file", I mean the following:

A file that contains data typical of an audio file. I'm worried that a user could upload files with appropriate headers, and malicious script in the place of audio data. For example... is the mp3 file an mp3 file? Or does it contain something uncharacteristic of an mp3 file?

解决方案

An alternative to the other posted answer which does header parsing. This means someone could still include other data behind a valid header.

Is to verify the whole file which costs more CPU but also has a stricter policy. A library that can do this is python audiotools and the relevant API method is AudioFile.verify.

Used like this:

import audiotools

f = audiotools.open(filename)

try:

result = f.verify()

except audiotools.InvalidFile:

# Invalid file.

print("Invalid File")

else:

# Valid file.

print("Valid File")

A warning is that this verify method is quite strict and can actually flag badly encoded files as invalid. You have to decide yourself if this is a proper method or not for your use case.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值