Random Image Scaling for Data Augmentation: Zoom and Enhance Is Finally Here

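This post is a translation of https://onezero.medium.com/zoom-and-enhance-is-finally-here-c727b3258a11. It describes how compact mirrorless cameras and A.I. super-resolution have made the "zoom and enhance" of crime dramas real, what that means for surveillance, and closes with a short note on random scaling as a data augmentation technique for training machine learning models.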

We all know the scene. Two detectives on a cop show stand in a dimly lit room filled with monitors, reviewing surveillance images. A tech guy (yes, it’s almost always a guy) queues up image after image as the detectives look on, squinting at the screen in concentration. “There’s nothing here!” one detective insists. They’re about to give up, when the other detective (our hero) shouts, “Wait!”

Everyone stops. “Zoom in there!” the detective says. The tech guy obligingly zooms in on a grainy corner of the image. “Enhance that!” the detective intones. The tech guy taps some keys, mutters something about algorithms, and suddenly the image comes into focus, revealing some tiny, significant detail. The case is cracked wide open!

This scene is a crime drama cliché so pervasive that it has inspired its own meme video with nearly a million views.

Scenes like these drive real tech people bananas, because “zoom and enhance” has always seemed like an impossible fantasy. Until now. Thanks to two recent innovations, zoom and enhance is finally here. It has the potential to radically change police surveillance, often in concerning ways — or at least help you bring back your photos from the early ’00s.

The first innovation behind real-life zoom and enhance comes from the world of photography. Until recently, photographers had two primary options for digital cameras: professional DSLRs like the Nikon D series, or cheap compact consumer cameras, like the kind you’d use for birthday or travel snapshots. DSLRs take great photos, but they’re bulky and conspicuous and can be hard to operate — not a great combo for surveillance work. Compact cameras rarely have the quality necessary for surveillance professionals.

That all began to change around 2015, with the rise of mirrorless cameras. These cameras have the tiny form factor of a compact camera, but thanks to advances in imaging chips driven in part by smartphones, they pack in the same high-quality image sensors usually found in a DSLR. Increasingly, they also borrow complex image processing software from the smartphone world, further enhancing their capabilities. And crucially, they allow for the use of professional lenses — easily the most important factor for taking high-quality photos.

The end result is a tiny camera that you can carry and use inconspicuously, while taking extremely detailed, high-resolution photos. The Q, a mirrorless camera from legendary German camera maker Leica, largely kicked off the trend. The latest Q model weighs just 1.4 pounds and takes 47-megapixel photos through an obscenely crisp lens that sees more detail than the human eye. With an ISO rating of 50,000 (15 times higher than that achieved by the fastest analog films), it can also essentially see in the dark.

The Leica Q2. Photo: Leica

Lower-priced competitors, like the Sony Alpha, have since emerged. For a few thousand dollars, a surveillance professional or police force can now purchase tiny, easy-to-use cameras that take better photos than the best professional cameras from just a few years ago. Zooming into photos taken on these cameras can sometimes feel like using zoom and enhance. The detail they capture — especially paired with modern software — is remarkable.

But combine mirrorless camera images with compressive sensing, and zoom and enhance is truly here. Compressive sensing allows you to massively enlarge an image without a major loss in quality. The tech has been around since the early 2000s, but it gained prominence in 2010 when researchers showed how it could be used to reconstruct an image of President Barack Obama using a tiny sample of randomly distributed pixels.
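
As a rough illustration of the idea, the sketch below recovers a sparse signal from far fewer random measurements than it has entries. It is a toy one-dimensional analogue, not the method used in the 2010 image demonstration: the greedy solver (orthogonal matching pursuit), the signal sizes, and every name in it are assumptions chosen for brevity.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick k columns of A that explain y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Find the column most correlated with what is still unexplained.
        correlations = np.abs(A.T @ residual)
        correlations[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Refit all selected columns together by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 200, 80, 5  # signal length, number of measurements, nonzero entries (all assumed)

# A signal with only k nonzero entries.
x = np.zeros(n)
nonzero_positions = rng.choice(n, size=k, replace=False)
x[nonzero_positions] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

# Each measurement is a random mix of the signal; there are only m < n of them.
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x

x_hat = omp(A, y, k)
max_error = float(np.max(np.abs(x_hat - x)))
```

With sizes like these, `x_hat` usually lands very close to `x`, although greedy recovery is not guaranteed for every random draw. Image reconstruction relies on the same principle, with the image assumed to be sparse in some transform domain rather than in its raw pixels.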

In 2017, Google showed how principles of compressive sensing could be combined with neural networks to reconstruct degraded or low-quality images in a process called A.I. super-resolution. The tech works by starting with sample images — often of faces or rooms — and deliberately messing them up by making them blurry, running them through a terrible JPEG compression system, and the like.

A neural network then looks at the degraded images, compares them to their high-quality counterparts, and learns how the two relate. Essentially, the network teaches itself all the ways that a digital image can degrade. Once it knows this, the process is reversed. The system is handed a low-quality or degraded image, and based on its training, it constructs a high-quality, undegraded version from scratch.
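
The data-preparation half of that recipe can be sketched in a few lines. This is a minimal sketch assuming grayscale images stored as NumPy arrays; `degrade`, the 4x factor, and the coarse quantization (a crude stand-in for compression artifacts, not real JPEG) are illustrative assumptions, not details of Google's system.

```python
import numpy as np

def degrade(high_res, factor=4, levels=16):
    """Turn a clean grayscale image into a blurry, low-resolution, coarsely quantized copy."""
    h, w = high_res.shape
    h, w = h - h % factor, w - w % factor  # crop so the size divides evenly
    blocks = high_res[:h, :w].reshape(h // factor, factor, w // factor, factor)
    low_res = blocks.mean(axis=(1, 3))     # averaging each block blurs and downsamples at once
    step = 256 / levels
    return np.floor(low_res / step) * step  # keep only `levels` distinct brightness values

rng = np.random.default_rng(0)
high_res = rng.integers(0, 256, size=(64, 64)).astype(np.float64)  # placeholder for a real photo
low_res = degrade(high_res)

# A super-resolution network would be trained to map the first element to the second.
training_pair = (low_res, high_res)
```

Repeating this over many photos yields the (degraded, original) pairs from which a network can learn to undo the damage.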

Though Google has since largely exited the field, A.I. super-resolution has taken off. Services like Big JPG allow users to upload a low-quality photograph and have it instantly upscaled 400% or more, often with minimal loss of quality. Photoshop plugins have delivered similar tech to photographers, who use it to remove blurriness and sharpen images. My A.I.-driven photography company often uses the tech to upscale digital camera photos taken in the early 2000s, allowing even these low-quality early images to meet today’s standards for use in publications.

The tech, though, is also being used for surveillance. Soon after its development, researchers began to show how super-resolution could be used to upscale low-resolution surveillance photos or frames from surveillance videos. Others focused on using the tech for targeted applications, like license plate recognition. And many groups have focused on super-resolution for facial recognition images, going so far as to develop specialized algorithms for enhancing facial images.

Several vendors have integrated these algorithms into dedicated software products. Topaz Labs, in my experience, is the most advanced. Pair its Gigapixel AI product with the output of a modern mirrorless camera, and you’ve got zoom and enhance that rivals the imagined systems on shows like CSI.

Here, for example, is a photo of a Jamba Juice restaurant in Marin County, California, taken on my Leica Q mirrorless camera.

Jamba Juice restaurant taken on a Leica Q mirrorless camera. Photos courtesy of the author.

I took this from across a street, with the palm-sized camera hanging around my neck. I then ran the photo through Topaz’s Gigapixel AI software, upscaling it 400% and using the company’s proprietary face reconstruction and sharpening algorithms.
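
To get a feel for what a 400% upscale means in pixels, here is a small calculation. It assumes the percentage applies to each side of the image and borrows the 47-megapixel figure quoted earlier for the latest Q model; the function name is hypothetical.

```python
def upscaled_megapixels(megapixels, scale_percent):
    """Pixel count after enlarging both width and height by scale_percent."""
    linear_factor = scale_percent / 100
    # Area grows with the square of the per-side factor.
    return megapixels * linear_factor ** 2

result = upscaled_megapixels(47, 400)  # 4x per side means 16x the pixels
```

Under those assumptions the output holds 16 times as many pixels as the input, which is why so much of the enlarged image has to be inferred by the software rather than captured by the sensor.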

Zooming in to full size on the enhanced image, you can see some incredible detail. Through the restaurant’s front window, you can clearly see a patron waiting in line and examining a menu.

The red box shows the region that was zoomed and enhanced in the photo below.
People are visible after applying zoom and enhance.

You can even see that he’s wearing a blue surgical mask. Great job staying safe, unknown smoothie man! Flyers posted on the door are also visible, including some of the graphics on the flyer. You can see patrons inside placing their orders.

Zooming and enhancing another part of the image, you can see the text on signs in the far background (“Jamba Curbside Pickup”) and how they’ve been attached to pillars using yellow tape. And in the far distance, you can see the mannequins in another nearby store and diners eating at outdoor tables.

The red box shows the region that was zoomed and enhanced in the photo below.
Text is visible on signs in the far background after applying zoom and enhance.

With more extreme zoom and a tweak to exposure, you can clearly make out the store’s signature Blendtec blenders on the counter inside.

The red box shows the region that was zoomed and enhanced in the photo below.
Left: Zoomed and enhanced image of blender inside the restaurant. Right: A similar model of Blendtec blender for comparison. Photo: Blendtec via PRWeb

Blender identification, of course, is not the most groundbreaking use of a new technology. But when you apply zoom and enhance in a surveillance context, things get scary fast.

Here, for example, is a photo I took of a Black Lives Matter protest in Times Square in 2016.

Black Lives Matter protest on July 7, 2016. The red box at the center left of the image is zoomed and enhanced in the photo below.

Applying zoom and enhance, you can clearly see the faces of police officers in the far back of the crowd. With facial reconstruction applied, these images would likely be good enough to find matches in a facial recognition database.

A police officer’s face from the back of the crowd is clearly visible after applying zoom and enhance. His eyes are redacted with a black bar to protect the officer’s identity.

Combining this tech with facial recognition systems like Clearview AI would make it trivial to identify large numbers of people in a crowd of protesters. A plainclothes police officer or federal agent posing as a tourist could easily walk through a crowd of protesters while snapping photos on a tiny mirrorless camera. The photos could be run through a super-resolution system, enlarging them massively and enhancing the faces present.

Individual faces could then be pulled out of the image and run through a system like Clearview’s to identify every individual by name. Police forces and other agencies are reportedly already using A.I. to identify different actions (like breaking into a vehicle or loitering) and to search surveillance images for people based on their physical descriptions. It’s unclear if any are using super-resolution yet, but undoubtedly that will come. Face reconstruction tech will likely improve as well — many faces today still come out distorted when enhanced, but facial reconstruction errors will likely diminish with time.

As the tech improves, you might not even need a mirrorless camera or other high-quality cameras. Super-resolution may ultimately become good enough to perform zoom-and-enhance functions on the low-resolution output of a traditional surveillance camera, identifying every individual in a crowd using footage from traffic cams, surveillance cameras from a store or nearby home, or even a circling drone. It could also one day be applied to photos taken on a smartphone or even the low-resolution photos displayed on social media platforms like Instagram.

As with any new surveillance technology, ensuring responsible use of zoom and enhance is a matter of establishing the right laws and policies. The Fourth Amendment of the U.S. Constitution already provides protection against searches without a warrant. Courts have weighed issues of new tech in the past — for example, looking at whether surveillance with telephoto lenses violates the Fourth Amendment. They have generally ruled that widely available tech like zoom lenses can be used in many contexts, but specialized tech like radar that sees through walls cannot.

It’s not yet clear where zoom and enhance would fall on that spectrum. The technology might be viewed as just another version of the zoom lens on a traditional camera. But given its elements of artificial intelligence, courts might find that it’s too specialized a technology to be mobilized without proper search warrants.

For now, the tech is too new for these precedents to have been established. As citizens, the best thing we can do is to be aware of its existence. If you’re at a protest or another sensitive event, assume that you’re being surveilled and photographed. Even if you don’t see someone with a professional-looking camera, authorities could still be capturing your image at a high enough quality to look you up using facial recognition and identify you by name.

We can also proactively inform lawmakers about which new technologies we’re comfortable with and which ones we’re not. Popular anger over facial recognition technologies led to a proposed bill to ban the use of this tech in policing. We need to ensure that technologies like zoom and enhance are available to law enforcement when they’re truly needed. But we also need to make sure that they’re not abused.

Much as science fiction did a good job of preparing us for space travel and computers, shows like CSI have done a good job of introducing us to the concept of zoom and enhance before it existed. But when you move beyond the imagined world of a good-guy cop fighting evil criminals, the real-world ethics of tech like zoom and enhance get blurry fast.

Translated from: https://onezero.medium.com/zoom-and-enhance-is-finally-here-c727b3258a11

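### Random scaling and matting in image data augmentation

#### Overview

Random scaling is a common image augmentation technique: the original image is enlarged or shrunk by a random factor to simulate changes in shooting distance and viewpoint, which helps make a model more robust to scale variation[^3]. Matting means extracting a target object from a complex background. During augmentation, the extracted object can be given a different background or placed in a new scene to add further variety to the dataset.

#### Implementation

Two commonly used tools and how to apply them:

1. **`torchvision.transforms` in PyTorch**

   `transforms.RandomResizedCrop`, combined with custom transform functions where needed, covers the random-scaling part.

   ```python
   import torchvision.transforms as transforms

   transform = transforms.Compose([
       # Crop a random region covering 80-100% of the image area, then resize it to 256x256.
       # `scale` is a fraction of the original area, so the upper bound should not exceed 1.0.
       transforms.RandomResizedCrop(size=(256, 256), scale=(0.8, 1.0)),
       transforms.ToTensor(),  # convert to a tensor
   ])
   ```

   For the matting part, a mask can be used to select the target region and overlay it on a new background. This usually requires extra annotation files or a pretrained segmentation model to generate the mask[^4].

2. **Albumentations**

   Albumentations is a data augmentation library focused on computer vision tasks, with rich support for composing complex transforms. For example, a similar pipeline looks like this:

   ```python
   from albumentations import Compose, RandomScale, Cutout

   augmentation_pipeline = Compose([
       RandomScale(scale_limit=0.2, p=0.7),  # random scaling within ±20%
       # Cutout blanks out small random squares. It simulates occlusion rather than
       # performing real matting, and recent Albumentations releases deprecate it
       # in favor of CoarseDropout.
       Cutout(num_holes=8, max_h_size=16, max_w_size=16, p=0.5),
   ])
   ```

   Other techniques such as Mosaic or Mixup can be added to create more varied synthetic samples.

#### Tool comparison

| Aspect | PyTorch (`torchvision`) | Albumentations |
|-----------------------------|--------------------------------------------|--------------------------------------------------------------|
| Ease of use and flexibility | Simple to learn, but limited customization | Highly flexible and configurable, with many advanced options |
| Performance | Supports GPU acceleration | Runs mainly on the CPU |

In short, either the standard framework or a third-party library can do the job. For a real project, weigh development effort against the available hardware when choosing between them.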
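
The mask-based step mentioned for matting, selecting the target region and overlaying it on a new background, can be sketched with plain NumPy. This is a minimal sketch assuming the mask is already available as a binary array; `composite` and the tiny placeholder arrays are hypothetical and stand in for real images and a real segmentation output.

```python
import numpy as np

def composite(foreground, mask, background):
    """Paste the masked foreground onto a new background (all arrays share height and width)."""
    alpha = mask.astype(np.float32)[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over channels
    blended = alpha * foreground + (1.0 - alpha) * background
    return blended.astype(foreground.dtype)

h, w = 4, 6
foreground = np.full((h, w, 3), 200, dtype=np.uint8)  # placeholder for the photo containing the object
background = np.zeros((h, w, 3), dtype=np.uint8)      # placeholder for the new scene
mask = np.zeros((h, w), dtype=np.uint8)
mask[1:3, 2:5] = 1                                    # 1 marks pixels that belong to the object

augmented = composite(foreground, mask, background)
```

A soft mask with values between 0 and 1 works with the same function and gives smoother edges around the pasted object.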