Udacity_Differential Privacy

4.8

4.8-1

In this concept, we're going to answer the question: how do we actually use epsilon ($\epsilon$) and delta ($\delta$)? We're going to learn how to take a query and add a certain degree of noise to make what we call a randomized mechanism. We want this randomized mechanism to satisfy a certain degree of differential privacy.

We want to augment a query like a sum, threshold, mean, or average and add a certain amount of noise to it so that we get a certain amount of differential privacy. In particular, we're going to leave behind the local differential privacy previously discussed and instead opt for global differential privacy. As I mentioned earlier, the difference between local and global is that global differential privacy adds noise to the output of a query, while local differential privacy adds noise to each data input to the query. So given that we are going for global DP, we're adding noise to the output. How much noise should we add?

We're going to add the minimum amount required to satisfy a certain level of epsilon and delta, which we will term our privacy budget for a given query. Now, in order to do this, there are two types of noise we could add, as I mentioned earlier: Gaussian noise or Laplacian noise. Generally speaking, Laplacian noise works better, but technically both are still valid and can give us varying levels of epsilon-delta privacy. In this case, we're going to focus exclusively on Laplacian. Now, to the hard question.

How much noise should we add?

The amount of noise necessary to add to the output of the query is a function of four things.

  • First, the amount of noise is dependent on the type of noise that we’re adding. We’re just going to focus on Laplacian here, so that one’s easy.
  • Second, we must take into account the sensitivity of the query that we are using to query the database. As mentioned, some queries are far more sensitive to removing a person from the database than other queries. Some queries have a very consistent sensitivity, meaning every database always has the same level of sensitivity for that query type, whereas other queries have varying levels of sensitivity that depend on the database.
  • Then of course, the two other things we must take into account are the desired epsilon and delta.

Thus, for each type of noise that we’re adding, we have a different way of calculating how much noise to add as a function of the sensitivity of the query to meet a certain epsilon-delta constraint. So to restate this, each noise type has a specific function, which tells us how much noise to add given a certain sensitivity, epsilon and delta. For Laplacian noise, this function is the following.

4.8-2

Laplacian noise takes an input parameter beta, which determines how significant the noise is. We set the beta by taking the sensitivity of the query and dividing it by the epsilon that we want to achieve. As it happens, delta is always zero for Laplacian noise, so we can just ignore it.

In other words, if we set beta to be this value when creating our Laplacian noise, then we know we will have a privacy leakage which is less than or equal to a certain amount of epsilon. Furthermore, the nice thing about Laplacian noise is that we don’t have to worry about delta because it’s always set to zero.

$\beta = \frac{sensitivity(query)}{\epsilon}$
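As a quick worked example (the numbers here are illustrative, not from the lecture): for a sum query over a database of 0/1 values, removing any one person changes the result by at most 1, so the sensitivity is 1. With a privacy budget of $\epsilon = 0.5$, this gives $\beta = \frac{1}{0.5} = 2$.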

Gaussian noise has a non-zero delta, which is why it's somewhat less desirable. Thus, we're using Laplacian for this exercise. There's a really elegant proof for why this is the case, but that proof is not necessary to know how to use Laplacian noise. Furthermore, when reading literature about differential privacy, you may have heard the term Laplacian mechanism, which refers to a function being augmented with Laplacian noise in this way, forming the mechanism M in the original differential privacy definition discussed earlier.

The thing we need to know here, however, is that we can take any query for which we have a measure of sensitivity, choose any arbitrary epsilon budget that we want to preserve, and add the appropriate amount of Laplacian noise to the output of the query. Pretty neat.

In the next project, I want you to do this yourself. First, augment a sum query with the appropriate amount of Laplacian noise so that you can satisfy a certain epsilon-delta constraint. This new sum query should automatically add the appropriate noise given an arbitrary epsilon level.

For Laplace, you can use the Laplace function np.random.laplace. After you have this mechanism working for the sum function, I then want you to do the same thing for the mean function, scaling the Laplacian noise correctly given that mean has a different level of sensitivity than sum.
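Here is a minimal sketch of what such a mechanism might look like, assuming a toy database of 0/1 entries like the ones used earlier in the course; the helper name `laplacian_mechanism` and the concrete sensitivity values are illustrative assumptions, not the course's reference solution.

```python
import numpy as np

# Toy database of 0/1 entries, standing in for the databases used earlier in the course.
db = (np.random.rand(100) > 0.5).astype(float)

def laplacian_mechanism(db, query, sensitivity, epsilon):
    """Run `query` on `db` and add Laplacian noise with scale beta = sensitivity / epsilon."""
    beta = sensitivity / epsilon
    return query(db) + np.random.laplace(loc=0.0, scale=beta)

epsilon = 0.5
# For a 0/1 database, removing any one person changes a sum by at most 1 ...
noisy_sum = laplacian_mechanism(db, np.sum, sensitivity=1.0, epsilon=epsilon)
# ... while a mean changes by only about 1 / len(db).
noisy_mean = laplacian_mechanism(db, np.mean, sensitivity=1.0 / len(db), epsilon=epsilon)
print(noisy_sum, noisy_mean)
```

Note how the mean query needs much less noise than the sum query for the same epsilon, because its sensitivity shrinks as the database grows.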

4.8-3

So Laplacian noise increases and decreases according to a scale parameter beta ($\beta$). Before I get to that, there is a wide variety of different kinds of randomized mechanisms. In this course, we're only going to go through a small handful of them, and I highly encourage you, when you finish this course, to Google around and learn more about the different kinds of differentially private randomized mechanisms that can be appropriate for different use cases.

Okay, so back to Laplacian noise. The amount of noise you're adding from a Laplacian distribution is increased or decreased according to a scale parameter beta. We choose beta based on the following formula: b, or beta, equals the sensitivity of our query, that is, the query we are adding this noise to, divided by epsilon.

Again, we're spending this epsilon for every query. So if we're querying a database, every time we do it, we're going to spend this amount of epsilon.

So the notion here is that we have a certain epsilon budget that we want to stay underneath, and by using this simple formula, we know how much noise we have to add to the output of these queries in order to make sure that we are preserving privacy.

So in other words, if we set b to this value, then we know we'll have a privacy leakage of less than or equal to epsilon. The nice thing about Laplacian noise is that it guarantees we do this with a delta that is equal to zero. So we have these four things right here.

So: type of noise, sensitivity, epsilon, and delta. Laplacian noise always has a delta of zero. If you remember, delta was the probability that we would accidentally leak more than this amount of epsilon, so the Laplacian mechanism is guaranteed not to leak more than this amount of epsilon. Now, one other question you might have: what happens if we want to query repeatedly?

Well, as it happens, if we do query repeatedly, then we can simply add the epsilons across the different queries. So if we have an epsilon budget of, say, five, we could do five queries that each leak an epsilon of one, for example. This is how the Laplacian mechanism works.
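A small sketch of this kind of budget splitting, again assuming a toy 0/1 database and a sum query with sensitivity 1 (splitting into five equal parts is just one way to spend the budget):

```python
import numpy as np

db = (np.random.rand(100) > 0.5).astype(float)

total_epsilon = 5.0
num_queries = 5
epsilon_per_query = total_epsilon / num_queries  # each query spends epsilon = 1

# Five noisy sum queries (sensitivity 1 for a 0/1 database); by simple composition,
# the total privacy leakage is at most 5 * 1 = 5 epsilon.
noisy_sums = [
    np.sum(db) + np.random.laplace(loc=0.0, scale=1.0 / epsilon_per_query)
    for _ in range(num_queries)
]
print(noisy_sums)
```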

In the next section, what I would like you to do is actually perform a sum and a mean query. You can take the sum and the mean query over the database, using the ones we used previously in the course. I want you to add a certain amount of Laplacian noise to the output, so that you stay underneath a certain level of epsilon. In the next lesson, I'll show you how I would do this. See you then.

5 Differential Privacy for Deep Learning

5.1

In the last few lessons, you might have been wondering, what does all this have to do with deep learning? Well, it turns out the same techniques that we were just studying formed the core principles for how differential privacy provides guarantees in the context of deep learning.

Previously, we defined perfect privacy as something like, a query to a database returns the same value even if we remove any person from that database. If we’re able to do that, then no person is really contributing information to the final query and their privacy is protected. We use this intuition in the description of epsilon delta.

In the context of deep learning, we have a similar standard based on these ideas, except that instead of querying a database, we're training a model. Our definition of perfect privacy would then be something like: training a model on a dataset should return the same model even if we remove any person from the training dataset. So we've replaced "querying a database" with "training a model on a dataset".

In essence, the training process is actually a query, but one should notice that this adds two points of complexity which databases didn't have.

  • First, do we always know where people are referenced in a training dataset?

In a database, every row corresponded to a person, so it was very easy to calculate the sensitivity because we could just remove individuals; we knew where all of them were. However, in a training dataset, let's say I'm training a sentiment classifier on movie reviews, I have no idea where all the people are referenced inside of that training dataset, because it's just a bunch of natural language. So in some cases, this can actually be quite a bit more challenging.

  • Secondly, neural models rarely ever train to the same state, the same location, even when they're trained on the same dataset twice.

So if I train the same deep neural network twice, even if I train over the exact same data, the model is not going to train to the same state. There's already an element of randomness in the training process.

So, how do we actually prove or create training setups where differential privacy is present?

The answer to the first question, by default, seems to be to treat each training example as a single, separate person. Strictly speaking, this is often a bit overzealous, as many examples have no relevance to people at all, while others may have multiple partial individuals contained within them. Consider an image which has multiple people in it: localizing exactly where people are referenced, and thus how much the model would change if those people were removed, could be quite challenging. But obviously, there's a technique we're about to talk about that tries to overcome this.

The answer to the second question, regarding how models rarely ever train to the same state (so how do we know what the sensitivity truly is?), has several interesting proposed solutions as well, which we'll be discussing shortly.

But first, let's suppose a new scenario within which we want to train a deep neural network. As mentioned previously, privacy-preserving technology is ultimately about protecting data owners from individuals or parties they don't trust. We only want to add as much noise as is necessary to protect these individuals: adding excess noise needlessly hurts model accuracy, while failing to add enough noise might expose someone to privacy risk.

Thus, when discussing tools with differential privacy, it’s very important to discuss it in the context of different parties who either do or do not trust each other, so that we can make sure that we’re using an appropriate technique.

5.2 Demo Intro

To ground our discussion of differentially private deep learning, let’s consider a scenario.

Let's say you work for a hospital, and you have a large collection of images of your patients. However, you don't know what's in them. You would like to use these images to develop a neural network which can automatically classify them.

However, since your images aren't labeled, they aren't sufficient to train a classifier. But, being a cunning strategist, you realize that you can reach out to 10 partner hospitals which do have annotated data.

It is your hope to train your new classifier on their datasets so you can automatically label your own. While these hospitals are interested in helping, they have privacy concerns regarding information about their own patients. Thus, you will use the following technique to train a classifier which protects the privacy of the patients in the other hospitals.

  1. So first, you'll ask each of the 10 hospitals to train a model on their own datasets, thus generating 10 different models.

  2. Second, you'll then use each of these 10 partner models to predict on your local dataset, generating 10 labels for each of your datapoints, that is, for each of your images.

  3. Then, for each local datapoint, now with 10 labels, you will perform a differentially private query to generate a final true label for each example. This query will be a max function, where max is the most frequent label across the 10 labels assigned to each individual image.

  4. We will then need to add Laplacian noise to make this differentially private to a certain epsilon-delta constraint.

  5. Finally, we will then retrain a new model on our local dataset, which now has these labels that we have automatically generated.

This will be our final differentially private model. So let's walk through these steps. I will assume you are already familiar with how to train a deep neural network and predict with it, so we'll skip steps one and two and work with example data.

We’ll focus instead on step three, namely how to perform the differentially private query for each example using toy data.
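Here is a minimal sketch of that differentially private max query on toy data; the number of classes, the randomly generated predictions, and the per-count noise scale are illustrative assumptions, not the course's reference implementation.

```python
import numpy as np

num_teachers = 10      # the 10 partner hospitals
num_examples = 1000    # unlabeled images in our local dataset
num_labels = 10        # assumed number of classes

# Toy stand-in for step 2: each row holds the 10 partner-model predictions for one image.
preds = np.random.randint(0, num_labels, size=(num_examples, num_teachers))

epsilon = 0.1
new_labels = []
for image_preds in preds:
    # Count how many partner models voted for each label.
    label_counts = np.bincount(image_preds, minlength=num_labels)
    # Add Laplacian noise to each count, then take the label with the highest noisy count.
    noisy_counts = label_counts + np.random.laplace(0.0, 1.0 / epsilon, size=num_labels)
    new_labels.append(int(np.argmax(noisy_counts)))

new_labels = np.array(new_labels)  # step 5 would retrain a local model on these labels
```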
