使用TensorFlow.js进行人脸触摸检测第1部分:将实时网络摄像头数据与深度学习配合使用

目录

起点

将HTML5网络摄像头API与TensorFlow.js结合使用

检测脸部触摸

技术脚注

终点线

下一步是什么?我们是否可以在未经培训的情况下检测到面部触摸?


TensorFlow + JavaScript。现在,最流行,最先进的AI框架支持地球上使用最广泛的编程语言,因此,让我们在我们的web浏览器中通过深度学习实现奇迹,通过TensorFlow.jsWebGL GPU加速!

这是我们六个系列的第四篇文章:

  1. 使用TensorFlow.js在浏览器中进行深度学习入门
  2. 狗和披萨:使用TensorFlow.js在浏览器中实现计算机视觉
  3. 绒毛动物探测器:通过TensorFlow.js中的迁移学习识别浏览器中的自定义对象
  4. 使用TensorFlow.js进行人脸触摸检测第1部分:将实时网络摄像头数据与深度学习配合使用
  5. 使用TensorFlow.js进行人脸触摸检测第2部分:使用BodyPix
  6. 使用TensorFlow.js进行AI在网络摄像头中翻译手势和手语

关于支持HTML5的现代Web浏览器的最佳部分之一就是可以轻松访问各种API,例如网络摄像头和音频。随着最近影响公共卫生的COVID-19问题,一堆非常有创造力的开发人员使用它来构建一个名为donottouchyourface.com的应用程序,该应用程序可以帮助人们通过学会戒除接触面部的方法来降低患病的风险。在本文中,我们将使用到目前为止在TensorFlow.js中通过计算机视觉学到的所有知识来尝试自己构建此应用程序的版本。

起点

我们将在对象识别模型代码中添加网络摄像头功能,然后将实时捕获帧,以训练和预测面部触摸动作。如果您跟随上一篇文章,将对这段代码感到熟悉。这是结果代码将执行的操作:

  • 导入TensorFlow.jsTensorFlowtf-data.js
  • 定义触摸与非触摸类别标签
  • 为网络摄像头添加视频元素
  • 首次训练后,每200毫秒运行一次模型预测
  • 显示预测结果
  • 加载预先训练的MobileNet模型并为迁移学习做准备
  • 训练和分类图像中的自定义对象
  • 在训练过程中跳过图像和目标样本的处理,以保持它们进行多次训练

在添加实时网络摄像头功能之前,这将是该项目的起点:

<html>
    <head>
        <title>Face Touch Detection with TensorFlow.js Part 1: Using Real-Time Webcam Data with Deep Learning</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-data@2.0.0/dist/tf-data.min.js"></script>
        <style>
            img, video {
                object-fit: cover;
            }
        </style>
    </head>
    <body>
        <video autoplay playsinline muted id="webcam" width="224" height="224"></video>
        <h1 id="status">Loading...</h1>
        <script>
        let touch = [];
        let notouch = [];

        const labels = [
            "Touch!",
            "No Touch"
        ];

        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        async function predictImage() {
            if( !hasTrained ) { return; } // Skip prediction until trained
            const img = await getWebcamImage();
            let result = tf.tidy( () => {
                const input = img.reshape( [ 1, 224, 224, 3 ] );
                return model.predict( input );
            });
            img.dispose();
            let prediction = await result.data();
            result.dispose();
            // Get the index of the highest value in the prediction
            let id = prediction.indexOf( Math.max( ...prediction ) );
            setText( labels[ id ] );
        }

        function createTransferModel( model ) {
            // Create the truncated base model (remove the "top" layers, classification + bottleneck layers)
            const bottleneck = model.getLayer( "dropout" ); // This is the final layer before the conv_pred pre-trained classification layer
            const baseModel = tf.model({
                inputs: model.inputs,
                outputs: bottleneck.output
            });
            // Freeze the convolutional base
            for( const layer of baseModel.layers ) {
                layer.trainable = false;
            }
            // Add a classification head
            const newHead = tf.sequential();
            newHead.add( tf.layers.flatten( {
                inputShape: baseModel.outputs[ 0 ].shape.slice( 1 )
            } ) );
            newHead.add( tf.layers.dense( { units: 100, activation: 'relu' } ) );
            newHead.add( tf.layers.dense( { units: 100, activation: 'relu' } ) );
            newHead.add( tf.layers.dense( { units: 10, activation: 'relu' } ) );
            newHead.add( tf.layers.dense( {
                units: 2,
                kernelInitializer: 'varianceScaling',
                useBias: false,
                activation: 'softmax'
            } ) );
            // Build the new model
            const newOutput = newHead.apply( baseModel.outputs[ 0 ] );
            const newModel = tf.model( { inputs: baseModel.inputs, outputs: newOutput } );
            return newModel;
        }

        async function trainModel() {
            hasTrained = false;
            setText( "Training..." );

            // Setup training data
            const imageSamples = [];
            const targetSamples = [];
            for( let i = 0; i < touch.length; i++ ) {
                let result = touch[ i ];
                imageSamples.push( result );
                targetSamples.push( tf.tensor1d( [ 1, 0 ] ) );
            }
            for( let i = 0; i < notouch.length; i++ ) {
                let result = notouch[ i ];
                imageSamples.push( result );
                targetSamples.push( tf.tensor1d( [ 0, 1 ] ) );
            }
            const xs = tf.stack( imageSamples );
            const ys = tf.stack( targetSamples );

            // Train the model on new image samples
            model.compile( { loss: "meanSquaredError", optimizer: "adam", metrics: [ "acc" ] } );

            await model.fit( xs, ys, {
                epochs: 30,
                shuffle: true,
                callbacks: {
                    onEpochEnd: ( epoch, logs ) => {
                        console.log( "Epoch #", epoch, logs );
                    }
                }
            });
            hasTrained = true;
        }

        // Mobilenet v1 0.25 224x224 model
        const mobilenet = "https://storage.googleapis.com/tfjs-models/tfjs/mobilenet_v1_0.25_224/model.json";

        let model = null;
        let hasTrained = false;

        (async () => {
            // Load the model
            model = await tf.loadLayersModel( mobilenet );
            model = createTransferModel( model );
            // Your Code Goes Here
            // Setup prediction every 200 ms
            setInterval( predictImage, 200 );
        })();
        </script>
    </body>
</html>

HTML5网络摄像头APITensorFlow.js结合使用

有了代码段后,使用JavaScript启动网络摄像头非常简单。这是一个实用程序功能,您可以启动它并请求用户访问:

async function setupWebcam() {
    return new Promise( ( resolve, reject ) => {
        const webcamElement = document.getElementById( "webcam" );
        const navigatorAny = navigator;
        navigator.getUserMedia = navigator.getUserMedia ||
        navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
        navigatorAny.msGetUserMedia;
        if( navigator.getUserMedia ) {
            navigator.getUserMedia( { video: true },
                stream => {
                    webcamElement.srcObject = stream;
                    webcamElement.addEventListener( "loadeddata", resolve, false );
                },
            error => reject());
        }
        else {
            reject();
        }
    });
}

现在,在创建模型之后,在代码中调用setupWebcam()函数,它将开始在网页上工作。让我们使用该tf-data库初始化全局摄像头,以便我们可以使用其辅助函数并轻松地从摄像头框架创建张量。

let webcam = null;

(async () => {
    // Load the model
    model = await tf.loadLayersModel( mobilenet );
    model = createTransferModel( model );
    await setupWebcam();
    webcam = await tf.data.webcam( document.getElementById( "webcam" ) );
    // Setup prediction every 200 ms
    setInterval( predictImage, 200 );
})();

使用TensorFlow网络摄像头帮助程序捕获框架并标准化像素可以通过以下功能完成:

async function getWebcamImage() {
    const img = ( await webcam.capture() ).toFloat();
    const normalized = img.div( 127 ).sub( 1 );
    return normalized;
}

然后,让我们使用此功能来捕获图像以在另一个功能中训练数据:

async function getWebcamImage() {
    const img = ( await webcam.capture() ).toFloat();
    const normalized = img.div( 127 ).sub( 1 );
    return normalized;
}

最后,让我们在页面上的网络摄像头视频元素下方添加三个按钮,以激活示例图像捕获和模型训练:

<video autoplay playsinline muted id="webcam" width="224" height="224"></video>
<button onclick="captureSample(0)">Touch</button>
<button onclick="captureSample(1)">No Touch</button>
<button onclick="trainModel()">Train</button>
<h1 id="status">Loading...</h1>

检测脸部触摸

添加了网络摄像头功能后,我们就可以尝试进行面部触摸检测了。

在摄像头视图中打开网页并使用触摸不触摸按钮捕获不同的样本图像。似乎每个触摸非触摸捕获大约10-15个样本就足以开始很好地检测。

技术脚注

  • 因为我们可能只在很小的样本上训练我们的模型,而没有拍摄很多不同的人的照片,所以当其他人尝试您的应用程序时,受过训练的AI的准确性将会很低
  • AI可能无法很好地区分深度,并且其行为可能比人脸触摸检测更像人脸检测
  • 我们可能已经将按钮和相应的类别命名为“Touch vs. No Touch”,但是模型无法识别含义。可以对捕获的照片的任意两种变化进行训练,例如Dog vs CatCircle vs Rectangle

终点线

供您参考,下面是完整的代码:

<html>
    <head>
        <title>Face Touch Detection with TensorFlow.js Part 1: Using Real-Time Webcam Data with Deep Learning</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-data@2.0.0/dist/tf-data.min.js"></script>
        <style>
            img, video {
                object-fit: cover;
            }
        </style>
    </head>
    <body>
        <video autoplay playsinline muted id="webcam" width="224" height="224"></video>
        <button onclick="captureSample(0)">Touch</button>
        <button onclick="captureSample(1)">No Touch</button>
        <button onclick="trainModel()">Train</button>
        <h1 id="status">Loading...</h1>
        <script>
        let touch = [];
        let notouch = [];

        const labels = [
            "Touch!",
            "No Touch"
        ];

        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        async function predictImage() {
            if( !hasTrained ) { return; } // Skip prediction until trained
            const img = await getWebcamImage();
            let result = tf.tidy( () => {
                const input = img.reshape( [ 1, 224, 224, 3 ] );
                return model.predict( input );
            });
            img.dispose();
            let prediction = await result.data();
            result.dispose();
            // Get the index of the highest value in the prediction
            let id = prediction.indexOf( Math.max( ...prediction ) );
            setText( labels[ id ] );
        }

        function createTransferModel( model ) {
            // Create the truncated base model (remove the "top" layers, classification + bottleneck layers)
            // const bottleneck = model.getLayer( "conv_pw_13_relu" ); // Intercepting at the convolution layer might give better results
            const bottleneck = model.getLayer( "dropout" ); // This is the final layer before the conv_pred pre-trained classification layer
            const baseModel = tf.model({
                inputs: model.inputs,
                outputs: bottleneck.output
            });
            // Freeze the convolutional base
            for( const layer of baseModel.layers ) {
                layer.trainable = false;
            }
            // Add a classification head
            const newHead = tf.sequential();
            newHead.add( tf.layers.flatten( {
                inputShape: baseModel.outputs[ 0 ].shape.slice( 1 )
            } ) );
            newHead.add( tf.layers.dense( { units: 100, activation: 'relu' } ) );
            newHead.add( tf.layers.dense( { units: 100, activation: 'relu' } ) );
            newHead.add( tf.layers.dense( { units: 10, activation: 'relu' } ) );
            newHead.add( tf.layers.dense( {
                units: 2,
                kernelInitializer: 'varianceScaling',
                useBias: false,
                activation: 'softmax'
            } ) );
            // Build the new model
            const newOutput = newHead.apply( baseModel.outputs[ 0 ] );
            const newModel = tf.model( { inputs: baseModel.inputs, outputs: newOutput } );
            return newModel;
        }

        async function trainModel() {
            hasTrained = false;
            setText( "Training..." );

            // Setup training data
            const imageSamples = [];
            const targetSamples = [];
            for( let i = 0; i < touch.length; i++ ) {
                let result = touch[ i ];
                imageSamples.push( result );
                targetSamples.push( tf.tensor1d( [ 1, 0 ] ) );
            }
            for( let i = 0; i < notouch.length; i++ ) {
                let result = notouch[ i ];
                imageSamples.push( result );
                targetSamples.push( tf.tensor1d( [ 0, 1 ] ) );
            }
            const xs = tf.stack( imageSamples );
            const ys = tf.stack( targetSamples );

            // Train the model on new image samples
            model.compile( { loss: "meanSquaredError", optimizer: "adam", metrics: [ "acc" ] } );

            await model.fit( xs, ys, {
                epochs: 30,
                shuffle: true,
                callbacks: {
                    onEpochEnd: ( epoch, logs ) => {
                        console.log( "Epoch #", epoch, logs );
                    }
                }
            });
            hasTrained = true;
        }

        // Mobilenet v1 0.25 224x224 model
        const mobilenet = "https://storage.googleapis.com/tfjs-models/tfjs/mobilenet_v1_0.25_224/model.json";

        let model = null;
        let hasTrained = false;

        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( "loadeddata", resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        async function getWebcamImage() {
            const img = ( await webcam.capture() ).toFloat();
            const normalized = img.div( 127 ).sub( 1 );
            return normalized;
        }

        async function captureSample( category ) {
            if( category === 0 ) {
                touch.push( await getWebcamImage() );
                setText( "Captured: " + labels[ category ] + " x" + touch.length );
            }
            else {
                notouch.push( await getWebcamImage() );
                setText( "Captured: " + labels[ category ] + " x" + notouch.length );
            }
        }

        let webcam = null;

        (async () => {
            // Load the model
            model = await tf.loadLayersModel( mobilenet );
            model = createTransferModel( model );
            await setupWebcam();
            webcam = await tf.data.webcam( document.getElementById( "webcam" ) );
            // Setup prediction every 200 ms
            setInterval( predictImage, 200 );
        })();
        </script>
    </body>
</html>

下一步是什么?我们是否可以在未经培训的情况下检测到面部触摸?

这次,我们学习了如何使用浏览器的网络摄像头功能来完全训练和识别实时视频中的帧。如果用户甚至不必真正触摸自己的脸部就可以开始使用该应用程序,这会更好吗?

请继续阅读本系列的下一篇文章,我们将使用经过预先训练的BodyPix模型进行检测。

https://www.codeproject.com/Articles/5272773/Face-Touch-Detection-with-TensorFlow-js-Part-1-Usi

  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值