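Background: there is a CSV file whose content is a list of image URLs. Each image needs to be downloaded as a file and then uploaded to a CDN in another region.
Reading the CSV file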
const handleFile = async (file: File) => {
  const reader = new FileReader()
  // Register the handler before starting the read
  reader.onload = async () => {
    // One URL per line; /\r?\n/ accepts both CRLF and LF line endings
    const allUrls = ((reader.result as string) || '')
      .split(/\r?\n/)
      .map((str) => str.trim())
      .filter((url) => !!url)
    // TODO: process allUrls
  }
  reader.readAsText(file)
}
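Parsing the file text can also be pulled out into a pure function, which is easier to test without a FileReader. A minimal sketch, assuming one URL per line and no header row; the name `parseUrlLines` is hypothetical. It accepts both CRLF and LF line endings, since a CSV saved on macOS or Linux usually uses plain LF.

```typescript
// Hypothetical helper: turn raw CSV text (one URL per line) into a clean URL list.
// Assumes there is no header row and each line holds exactly one URL.
function parseUrlLines(text: string): string[] {
  return text
    .split(/\r?\n/) // accept both Windows (CRLF) and Unix (LF) line endings
    .map((line) => line.trim()) // trim() also strips a leading BOM (U+FEFF)
    .filter((line) => line.length > 0) // skip blank lines, including a trailing newline
}

const sample = '\uFEFFhttps://example.com/1.png\r\nhttps://example.com/2.jpg\n\n  https://example.com/3.gif  \n'
console.log(parseUrlLines(sample))
```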
Fetching the image file from a URL
The HTTP library is axios: each image is requested as a Blob, which is then wrapped in a File object.
Reference: How to convert image source into a JavaScript File object
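import axios from 'axios'

// Download one image and wrap it in a File.
// `fileName` is the module-level prefix declared in the concurrency section below.
const urlToFile = async (url: string) => {
  return axios
    .get<Blob>(url, { responseType: 'blob' })
    .then((res) => {
      // Keep only the MIME type, e.g. "image/png; charset=binary" -> "image/png"
      const contentType = String(res.headers['content-type'] || '').split(';')[0].trim()
      // With responseType 'blob', res.data is already a Blob (no await needed)
      const blob = res.data
      const suffix = contentType.replace('image/', '')
      // Add a random part: many downloads can finish within the same millisecond
      const unique = `${Date.now()}_${Math.random().toString(36).slice(2, 8)}`
      return new File([blob], `${fileName}_${unique}.${suffix}`, { type: contentType })
    })
    .then((file) => {
      // The actual CDN upload is not shown in the original; it belongs here
      console.log('upload to cloud success')
      return file
    })
    .catch((err) => {
      console.error('upload to cloud error, ', err)
      // Reject with the source URL so failures can be written to the error CSV
      return Promise.reject(url)
    })
}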
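Deriving the extension with a plain `replace('image/', '')` yields suffixes such as `svg+xml` or `jpeg`, and keeps any `; charset=…` parameter unless it is stripped first. A small sketch of a more tolerant mapping; the function name, the `bin` fallback and the mapping table are assumptions, not part of the original code.

```typescript
// Hypothetical helper: derive a file extension from a Content-Type header value.
// Falls back to `fallback` when the header is missing or is not an image/* type.
function suffixFromContentType(contentType: string | undefined, fallback = 'bin'): string {
  // Drop parameters such as "; charset=utf-8" and normalise the case
  const mime = (contentType ?? '').split(';')[0].trim().toLowerCase()
  if (!mime.startsWith('image/')) return fallback
  const subtype = mime.slice('image/'.length)
  // Subtypes whose conventional extension differs from the MIME subtype
  const special = new Map<string, string>([
    ['jpeg', 'jpg'],
    ['svg+xml', 'svg'],
    ['x-icon', 'ico'],
    ['vnd.microsoft.icon', 'ico'],
  ])
  return special.get(subtype) ?? (subtype || fallback)
}

console.log(suffixFromContentType('image/svg+xml; charset=utf-8'))
```

Inside `urlToFile`, `suffixFromContentType(res.headers['content-type'])` could then replace the `replace` call.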
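Concurrent requests
Because the data set is large, the requests are issued concurrently and collected with Promise.allSettled, while the p-limit library caps how many of them run at the same time.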
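import pLimit from 'p-limit'

/** Prefix for the generated file names (used by urlToFile) */
const fileName = 'mytest'
/** Maximum number of concurrent requests */
const PROMISE_LIMIT = 20
const plimit = pLimit(PROMISE_LIMIT)

// Process one batch of `limit` URLs starting at `offset`
const deal = async (data: string[], offset: number = 0, limit: number = 100) => {
  // getUrls is not shown in the original; it presumably returns data.slice(offset, offset + limit)
  const rows = getUrls(data, offset, limit)
  // plimit queues every task but lets at most PROMISE_LIMIT of them run at once
  const promises = rows.map((url) => plimit(() => urlToFile(url)))
  // allSettled waits for every task, whether it succeeds or fails
  return Promise.allSettled(promises)
}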
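`deal` handles one slice of the list per call (`offset`, `limit`), so processing the whole file means walking the list in batches. A hypothetical helper that computes those batch boundaries, assuming the batches are processed one after another.

```typescript
// Hypothetical helper: split `total` items into [offset, end) ranges of at most `size` items.
function batchRanges(total: number, size: number): Array<[number, number]> {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError('size must be a positive integer')
  const ranges: Array<[number, number]> = []
  for (let offset = 0; offset < total; offset += size) {
    ranges.push([offset, Math.min(offset + size, total)])
  }
  return ranges
}

console.log(batchRanges(250, 100))

// Usage sketch (assuming `deal` and `allUrls` from this article):
// for (const [offset, end] of batchRanges(allUrls.length, 100)) {
//   await deal(allUrls, offset, end - offset)
// }
```

Running batches sequentially keeps each result CSV small and lets a failed run resume from a known offset.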
Downloading the result file
Write the re-hosted CDN URLs into a CSV file and download it.
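const download = (content: string, fileName = 'list.csv') => {
  // The \ufeff BOM lets Excel detect UTF-8; MIME parameters are separated by ';', not ','
  const blob = new Blob([`\ufeff${content}`], { type: 'text/csv;charset=utf-8' })
  const url = URL.createObjectURL(blob)
  const a = document.createElement('a')
  a.href = url
  a.download = fileName
  a.click()
  // Revoke on a later task so the browser has time to start the download
  setTimeout(() => URL.revokeObjectURL(url), 0)
}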
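The result file is written as one value per line. If you also want to record which source URL each CDN URL came from (an assumption; the original writes a single column), any field containing a comma, a double quote or a line break must be quoted as RFC 4180 describes. A minimal sketch; `csvField` and `toCsv` are hypothetical names.

```typescript
// Quote a single CSV field per RFC 4180: wrap it in double quotes and double any
// embedded quotes, but only when it contains a comma, a quote, CR or LF.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value
}

// Build CSV text from rows of fields; rows are separated by CRLF as RFC 4180 recommends.
function toCsv(rows: string[][]): string {
  return rows.map((row) => row.map(csvField).join(',')).join('\r\n')
}

console.log(toCsv([
  ['source', 'cdn'],
  ['https://example.com/a.png?x=1,2', 'https://cdn.example.com/a.png'],
]))
```

The resulting string can be passed to `download` as its `content` argument.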
Complete flow
// OFFSET and LIMIT are batch constants that the original does not show
const handleFile = async (file: File) => {
  const reader = new FileReader()
  reader.onload = async () => {
    const allUrls = ((reader.result as string) || '')
      .split(/\r?\n/)
      .map((str) => str.trim())
      .filter((url) => !!url)
    deal(allUrls, OFFSET, LIMIT).then((results) => {
      // Fulfilled values should be the new CDN URLs once the upload step is in place
      const success: string[] = []
      // Rejection reasons are the source URLs that failed
      const errors: string[] = []
      results.forEach((result) => {
        if (result.status === 'fulfilled') success.push(result.value)
        else errors.push(result.reason)
      })
      console.log('done >>>>>>>>>', 'success: ', success.length, 'error: ', errors.length)
      download(success.join('\n'), `success_${OFFSET}_${OFFSET + LIMIT}.csv`)
      if (errors.length) download(errors.join('\n'), `errors_${OFFSET}_${OFFSET + LIMIT}.csv`)
    })
  }
  reader.readAsText(file)
}
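The `forEach` above splits the `Promise.allSettled` results by `status`. The same step as a small typed helper (the name `partitionSettled` is hypothetical), which keeps fulfilled values and rejection reasons in input order.

```typescript
// Split Promise.allSettled results into fulfilled values and rejection reasons, preserving order.
function partitionSettled<T>(results: PromiseSettledResult<T>[]): { success: T[]; errors: unknown[] } {
  const success: T[] = []
  const errors: unknown[] = []
  for (const result of results) {
    if (result.status === 'fulfilled') success.push(result.value)
    else errors.push(result.reason)
  }
  return { success, errors }
}

// Usage: rejections carry the failing source URL, as urlToFile does in this article.
Promise.allSettled([
  Promise.resolve('https://cdn.example.com/1.png'),
  Promise.reject('https://example.com/broken.png'),
  Promise.resolve('https://cdn.example.com/2.png'),
]).then((results) => {
  const { success, errors } = partitionSettled(results)
  console.log(success.length, errors.length) // 2 1
})
```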
p-limit source code walkthrough
Installation
npm install p-limit
Usage
const pLimit = require('p-limit');

const limit = pLimit(1);

const input = [
  limit(() => fetchSomething('foo')),
  limit(() => fetchSomething('bar')),
  limit(() => doSomething())
];

(async () => {
  // Only one promise is run at once
  const result = await Promise.all(input);
  console.log(result);
})();
The complete source (the CommonJS version) is as follows:
'use strict';
const Queue = require('yocto-queue');

const pLimit = concurrency => {
  if (!((Number.isInteger(concurrency) || concurrency === Infinity) && concurrency > 0)) {
    throw new TypeError('Expected `concurrency` to be a number from 1 and up');
  }

  const queue = new Queue();
  let activeCount = 0;

  const next = () => {
    activeCount--;

    if (queue.size > 0) {
      queue.dequeue()();
    }
  };

  const run = async (fn, resolve, args) => {
    activeCount++;

    const result = (async () => fn(...args))();

    resolve(result);

    try {
      await result;
    } catch {}

    next();
  };

  const enqueue = (fn, resolve, args) => {
    queue.enqueue(run.bind(null, fn, resolve, args));

    (async () => {
      // This function needs to wait until the next microtask before comparing
      // `activeCount` to `concurrency`, because `activeCount` is updated asynchronously
      // when the run function is dequeued and called. The comparison in the if-statement
      // needs to happen asynchronously as well to get an up-to-date value for `activeCount`.
      await Promise.resolve();

      if (activeCount < concurrency && queue.size > 0) {
        queue.dequeue()();
      }
    })();
  };

  const generator = (fn, ...args) => new Promise(resolve => {
    enqueue(fn, resolve, args);
  });

  Object.defineProperties(generator, {
    activeCount: {
      get: () => activeCount
    },
    pendingCount: {
      get: () => queue.size
    },
    clearQueue: {
      value: () => {
        queue.clear();
      }
    }
  });

  return generator;
};

module.exports = pLimit;
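- `concurrency`, the argument of `pLimit`, is the maximum number of tasks allowed to run at once. The variable `activeCount` counts the async functions currently running, and the `pendingCount` getter returns `queue.size`, the number of functions still waiting.
- Each call to `pLimit` creates one queue and returns a concurrency-limited function, `generator`; every task submitted through that same `generator` shares this queue.
- `pLimit` is built on a queue (yocto-queue):
  - The queue has two methods, `enqueue` and `dequeue`: `enqueue` adds an item at the tail and `dequeue` removes one from the head.
  - Each call to `generator` pushes one function into the queue.
  - After waiting one microtask (so that `activeCount` is up to date), if `activeCount` is below `concurrency`, `dequeue` pops the function at the head of the queue and runs it.
- What gets pushed into the queue is not the original function but `run` bound to it:
  - When `run` starts, `activeCount` is incremented by 1.
  - It calls the actual async function and immediately resolves the promise returned by `generator` with that function's promise, so the caller sees the function's own value or error.
  - Once the function settles (fulfilled or rejected; errors are swallowed at this point), `run` calls `next`, which decrements `activeCount` and, if functions are still waiting, dequeues and runs the next one.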
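To check the walkthrough above, here is a minimal TypeScript re-implementation of the same idea. It is a sketch, not p-limit itself: a plain array replaces yocto-queue (`shift()` is O(n), acceptable for small queues), extra arguments are not forwarded, and the `activeCount`/`pendingCount`/`clearQueue` properties are omitted. The name `createLimit` is hypothetical.

```typescript
// Minimal limiter modelled on the p-limit source above (not the real library).
type Task = () => void

function createLimit(concurrency: number) {
  if (!((Number.isInteger(concurrency) || concurrency === Infinity) && concurrency > 0)) {
    throw new TypeError('Expected `concurrency` to be a number from 1 and up')
  }
  const queue: Task[] = [] // FIFO queue of waiting tasks (stands in for yocto-queue)
  let activeCount = 0 // number of tasks currently running

  // A task has settled: free its slot and start the next waiting task, if any
  const next = () => {
    activeCount--
    const task = queue.shift()
    if (task) task()
  }

  function limit<T>(fn: () => T | PromiseLike<T>): Promise<T> {
    return new Promise<T>((resolve) => {
      const run = async () => {
        activeCount++
        // Call fn now; a synchronous throw becomes a rejection of `result`
        const result = new Promise<T>((res) => res(fn()))
        // Hand fn's own promise to the caller so they see its value or error
        resolve(result)
        try {
          await result
        } catch {
          // The caller already receives the rejection through `result`
        }
        next()
      }
      queue.push(run)
      // Like p-limit, wait one microtask before comparing activeCount with concurrency
      void (async () => {
        await Promise.resolve()
        if (activeCount < concurrency && queue.length > 0) {
          queue.shift()!()
        }
      })()
    })
  }
  return limit
}

// Usage: five timer-based tasks, at most two running at any moment
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))
const limit = createLimit(2)
let running = 0
let peak = 0
const jobs = [30, 10, 20, 5, 15].map((ms, i) =>
  limit(async () => {
    running++
    peak = Math.max(peak, running)
    await sleep(ms)
    running--
    return i
  }),
)
Promise.all(jobs).then((values) => console.log(values, 'peak:', peak))
```

Results come back in submission order even though the tasks finish out of order, because each caller holds its own promise; only the start of each task is gated by the queue.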