耗子叔 ARTS: Week 13

筱筱世家

Article link: https://blog.csdn.net/csdn_subaodong/article/details/95938044

Algorithm:

/**
 * 1089. Duplicate Zeros (Easy)
 *
 * Given a fixed-length array arr of integers, duplicate each occurrence of zero,
 * shifting the remaining elements to the right.
 * Note that elements beyond the length of the original array are not written.
 * Do the above modifications to the input array in place; do not return anything
 * from your function.
 *
 * Example 1:
 * Input: [1,0,2,3,0,4,5,0]
 * Output: null
 * Explanation: After calling your function, the input array is modified to: [1,0,0,2,3,0,0,4]
 *
 * Example 2:
 * Input: [1,2,3]
 * Output: null
 * Explanation: After calling your function, the input array is modified to: [1,2,3]
 *
 * @param arr the array to modify in place
 */

Java:

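public static void duplicateZeros(int[] arr) {
    int index = 0;              // next write position in arr
    int[] copy = arr.clone();   // snapshot of the original values, so writes never disturb reads
    for (int value : copy) {
        if (index >= arr.length) break;   // arr is full; the remaining values are dropped
        if (value == 0 && index + 1 < arr.length) {
            // there is room for both copies of the zero
            arr[index++] = 0;
            arr[index++] = 0;
        } else {
            // a non-zero value, or a zero that only fits once in the last slot
            arr[index++] = value;
        }
    }
}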
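The solution above works on a cloned copy, so it needs O(n) extra space. As a rough alternative, here is a sketch of the common two-pass approach that writes from the back and needs only O(1) extra space. The class name DuplicateZerosInPlace is made up for this example; it is not part of the original solution.

```java
import java.util.Arrays;

class DuplicateZerosInPlace {

    /** Duplicates each zero in place, using O(1) extra space. */
    static void duplicateZeros(int[] arr) {
        int n = arr.length;
        int zeros = 0;
        for (int value : arr) {
            if (value == 0) {
                zeros++;
            }
        }
        // i reads the original array from the back; j is the index the element
        // would have in the fully expanded array, which may lie beyond n.
        for (int i = n - 1, j = n + zeros - 1; i < j; i--, j--) {
            if (j < n) {
                arr[j] = arr[i];
            }
            if (arr[i] == 0) {
                j--; // a zero occupies two slots in the expanded array
                if (j < n) {
                    arr[j] = 0;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] arr = {1, 0, 2, 3, 0, 4, 5, 0};
        duplicateZeros(arr);
        System.out.println(Arrays.toString(arr)); // [1, 0, 0, 2, 3, 0, 0, 4]
    }
}
```

Writing from the back is what makes this safe: every write lands at an index greater than or equal to the current read index, so no unread value is overwritten.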

Review:

https://onezero.medium.com/google-promises-recaptcha-isn-t-exploiting-users-should-you-trust-it-ed99f1543f28

Google Promises ‘reCAPTCHA’ Isn’t Exploiting Users. Should You Trust It?

An innovative security feature to separate humans from bots online comes with some major concerns

A surprising amount of work online goes into proving you’re not a robot. It’s the basis of those “CAPTCHA” questions often seen after logging into websites: blurry photos of crosswalks, traffic lights, and store fronts that users are tasked with identifying through a series of clicks.

They come in many forms, from blurry letters that must be identified and typed into a box to branded slogans like “Comfort Plus” on the Delta website, as if the sorry state of modern air travel wasn’t already dystopian enough. The most common, however, is Google’s reCAPTCHA, which launched its third version at the end of 2018. It’s designed to drastically reduce the number of challenges you’ll have to complete to log into a website, assigning an invisible score to users depending on how “human” their behavior is. CAPTCHA, after all, is designed to weed out “bot” accounts that flood systems for nefarious ends.

But Google’s innovation has a downside: The new version monitors your every move across a website to determine whether you are, in fact, a person.

A necessary advancement?

Before we get into the “how” of this new technology, it’s useful to understand where it’s coming from. The new reCAPTCHA disrupts a relatively ancient web technology which has been harnessed for plenty of things beyond security.

CAPTCHA — which stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart” — first appeared in the late ’90s, and it was designed by a team at the early search engine AltaVista. Before CAPTCHA, it was easy for people to program bots that would automatically sign up for services and post spam comments by the thousands. AltaVista’s technology was based on a printer manual’s advice for avoiding bad optical character recognition (OCR); the iconic blurry text in a CAPTCHA was specifically designed to be difficult for a computer to read but legible for humans, thereby foiling bots.

By the early 2000s, these tests were everywhere. Then came reCAPTCHA, developed by researchers at Carnegie Mellon and purchased by Google in 2009. It used the same idea but in an innovative way: the text typed by human users would identify specific words that programs were having trouble recognizing. Essentially, programs would scan text and flag words they couldn’t recognize. Those words would then be placed next to known examples in reCAPTCHA tests — humans would verify the known words and identify the new ones.

By 2011, Google had digitized the entire archive of the New York Times through reCAPTCHA alone. People would type in text from newspaper scans one blurry CAPTCHA at a time, ultimately allowing Google to make the Times’ back catalog searchable, forever. While creating a velvet rope to keep bots off sites, Google had managed to conscript human users into doing the company’s grunt work.

With that achievement under its belt, reCAPTCHA switched to showing pictures from Google’s Street View software in 2014, as it does today. After pressing the “I’m not a robot” box, you might be prompted to recognize which of nine images contain “bicycles” or “streetlights.” Behind the scenes, Google reduced the frequency at which people were asked to complete these tests by performing behavioral analysis — reCAPTCHA can now run in the background and track how people use websites.

If a Google cookie is present on your machine, or if the way you use your mouse and keyboard on the page doesn’t seem suspiciously bot-like, you will skip the Street View test entirely. But some privacy-conscious users have complained that clearing their cookies or browsing in “Incognito Mode” drastically increases the number of reCAPTCHA tests they’re asked to complete.

Users have also pointed out that browsers competing with Google Chrome, like Firefox, require users to complete more challenges, which naturally raises a question: Is Google using reCAPTCHA to cement its own dominance?

This raises serious privacy concerns, given that Google’s revenue is primarily from its ad business, which relies on tracking data. You might worry that reCAPTCHA is essentially a secret ad tracker, hiding in plain sight just like the Facebook “Like” button embedded on web pages.

Google’s perspective

To use its latest version of reCAPTCHA, Google asks that developers include its tracking tags on as many pages of their websites as possible, in order to paint a better picture of the user. This doesn’t exist in a vacuum: Google also offers Google Analytics, for example, which helps developers and marketers understand how visitors use their website. It’s a fantastic tool, included on more than 100,000 of the top 1 million visited websites according to Built With, but it’s also part of a strategy to monitor users’ habits across the internet.

The new version of reCAPTCHA fills in the missing pieces of that picture, allowing Google to further reach into those sites that might not use its Analytics tool. When pressed on this, Google told Fast Company that it won’t capture user data from reCAPTCHA for advertising, and that the data it does collect is used for improving the service.

But that data remains sealed within a black box, even to the developers who implement the technology. The documentation for reCAPTCHA doesn’t mention user data, how users might be tracked, or where the information ends up — it simply discusses the practical parts of the implementation.

I asked Google for more information, and what its commitment is to the long-term independence of reCAPTCHA relative to its advertising business — just because the two aren’t bound together now doesn’t mean they couldn’t be in the future, after all.

A Google representative says “reCAPTCHA may only be used to fight spam and abuse” and that “the reCAPTCHA API works by collecting hardware and software information, such as device and application data, and sending these data to Google for analysis. The information collected in connection with your use of the service will be used for improving reCAPTCHA and for general security purposes. It will not be used for personalized advertising by Google.”

That’s great, and hopefully Google maintains this commitment. The problem is that there’s no reason to believe it will. The introduction of a powerful tracking technology like this is a move that should come with public scrutiny, because we’ve seen in the past how easily things can go sour. Facebook, for example, promised in 2014 that WhatsApp would remain independent, separate from its backend infrastructure — but went back on that decision after just two years. When Google acquired Nest, it promised to keep it independent, but recanted five years later, requiring owners to migrate to a Google account or lose functionality.

For the same reason Google is able to build reCAPTCHA in the first place — its vast resources and reach — we should be suspicious of where all this might lead us.

Unfortunately, as users, there’s little we can do. There’s no way to opt out of reCAPTCHA on a site you need to use, forcing you to either accept being tracked or stop using a given service altogether. If you don’t like those full-body scanners at airports, you can at least still opt-out and get a manual pat-down. But if a site has reCAPTCHA, there’s no opting out at all.

If Google intends to build tools like this with the public good in mind, rather than its bottom line, then the company must find better ways to reassure the world that it won’t change the rules when it’s convenient. If it were willing to open-source the project (as it has with many, many others), move it outside the company, or, at the very least, establish third-party oversight, perhaps we could start building that trust.

Tip:

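Database tables:

When creating a table, think through its indexes, columns, and table name and comments at the same time, and organize them according to your own conventions.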

A Summary of 28 Common Java Development Conventions and Tips

By idea, via the 后端技术精选 official account

Summary of common conventions and tips (based on the Huashan edition of the Java Development Manual, 《Java开发手册》)

1. Class names use UpperCamelCase.

For example: UserService. The exceptions are DO / BO / PO / DTO / VO.

For example: UserPO, StudentPO (terms such as PO, VO, and DTO are written in all caps).

@Data
@Builder
public class CustomBodyDTO {
    private String name;
    private String idCode;
    private String status;
}

2. If a module, interface, class, or method uses a design pattern, reflect the pattern in its name.

For example: TokenFactory, LoginProxy.

public class TokenFactory {
    public TokenDTO buildToken(LoginInfo loginInfo) {
        String token = UUID.randomUUID().toString();
        TokenDTO tokenDTO = TokenDTO.builder()
                .token(token)
                .createTime(LocalDateTime.now())
                .build();
        String redisKey = RedisKeyBuilder.buildTokenKey(token);
        redisService.setObject(redisKey, loginInfo, Timeout.ONE_DAY * 30 * 2);
        log.info("token created|loginInfo={}", loginInfo.toString());
        return tokenDTO;
    }
}

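3. Calling equals on an object can easily throw a NullPointerException.

Looking at the source, equals is defined on the Object class. If the object it is called on is null, the call throws a NullPointerException at runtime.

Source in the Object class:

    public boolean equals(Object obj) {
        return (this == obj);
    }

To avoid this, put the constant, or the object that is known to be non-null, first when comparing.

For example:

Correct: "test".equals(object);
Wrong: object.equals("test");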
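A small sketch of the rule above. The class and method names (NullSafeEquals, isTest, sameName) are invented for the example. It also shows java.util.Objects.equals, a standard JDK helper that the tip itself does not mention, for the case where both sides may be null.

```java
import java.util.Objects;

class NullSafeEquals {

    /** Constant first: safe even when value is null. */
    static boolean isTest(String value) {
        return "test".equals(value);
    }

    /** Both sides may be null: Objects.equals handles every combination. */
    static boolean sameName(String left, String right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        String missing = null;
        System.out.println(isTest(missing));            // false, no exception
        System.out.println(sameName(missing, "test"));  // false
        System.out.println(sameName(missing, null));    // true
    }
}
```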

4. When comparing values of the same wrapper type, always use equals.

For Integer, boxed values between -128 and 127 are served from IntegerCache.cache, so the same object is reused and == happens to work in that range. Outside that range, == compares two distinct objects.

So when comparing wrapper objects, use equals consistently.

The relevant source in Integer:

private static class IntegerCache {
    static final int low = -128;
    static final int high;
    static final Integer cache[];

    static {
        // high value may be configured by property
        int h = 127;
        String integerCacheHighPropValue =
            sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
        if (integerCacheHighPropValue != null) {
            try {
                int i = parseInt(integerCacheHighPropValue);
                i = Math.max(i, 127);
                // Maximum array size is Integer.MAX_VALUE
                h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
            } catch( NumberFormatException nfe) {
                // If the property cannot be parsed into an int, ignore it.
            }
        }
        high = h;

        cache = new Integer[(high - low) + 1];
        int j = low;
        for(int k = 0; k < cache.length; k++)
            cache[k] = new Integer(j++);

        // range [-128, 127] must be interned (JLS7 5.1.7)
        assert IntegerCache.high >= 127;
    }

    private IntegerCache() {}
}

public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
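To make the effect of the cache concrete, here is a minimal sketch. WrapperCompare and sameValue are names chosen for this example. The result of the second == comparison depends on JVM settings (the cache upper bound is configurable), so the comment describes the usual outcome and does not promise it.

```java
class WrapperCompare {

    /** Value comparison that does not depend on the Integer cache; a null left side is never equal. */
    static boolean sameValue(Integer left, Integer right) {
        return left != null && left.equals(right);
    }

    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127;    // inside the cached range
        Integer large1 = 1000, large2 = 1000;  // outside the default cached range

        // == compares references; true here only because both boxes come from the cache.
        System.out.println(small1 == small2);
        // Typically false with default JVM settings: two distinct objects.
        System.out.println(large1 == large2);
        // equals compares values, so it is true regardless of caching.
        System.out.println(sameValue(large1, large2));
    }
}
```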

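5. Prefer wrapper types for all fields of POJO classes. RPC method return values and parameters should also use wrapper types. Use primitive types for local variables.

Consider a real scenario: a student class with a score field of type int. If the student did not take the exam, the score should be empty, but an int defaults to 0. With a primitive type it is then unclear whether the student scored 0 or did not take the exam at all.

With the wrapper type Integer, a null value distinguishes the two cases.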
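The student-score scenario can be sketched as follows. ExamRecord and describe are hypothetical names used only for illustration.

```java
class ExamRecord {

    private Integer score; // null means "did not take the exam"

    Integer getScore() {
        return score;
    }

    void setScore(Integer score) {
        this.score = score;
    }

    /** Distinguishes "absent" from an actual score of 0. */
    String describe() {
        return score == null ? "absent" : "scored " + score;
    }

    public static void main(String[] args) {
        ExamRecord record = new ExamRecord();
        System.out.println(record.describe()); // absent
        record.setScore(0);
        System.out.println(record.describe()); // scored 0
    }
}
```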

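6. When writing a POJO class, override toString. If the POJO extends another POJO, include super.toString() in its toString method.

Overriding toString makes it easy to inspect an object's fields one by one in log output. For objects in an inheritance hierarchy, including super.toString() makes the object easier to understand and analyze.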
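A minimal sketch of this convention, assuming two made-up classes, BaseEntity and OrderEntity. The output format is just one reasonable choice.

```java
class BaseEntity {

    private final Long id;

    BaseEntity(Long id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return "BaseEntity{id=" + id + "}";
    }
}

class OrderEntity extends BaseEntity {

    private final String orderNo;

    OrderEntity(Long id, String orderNo) {
        super(id);
        this.orderNo = orderNo;
    }

    @Override
    public String toString() {
        // Include the parent's fields so the log line shows the whole object.
        return "OrderEntity{orderNo='" + orderNo + "', super=" + super.toString() + "}";
    }

    public static void main(String[] args) {
        System.out.println(new OrderEntity(1L, "A1"));
        // OrderEntity{orderNo='A1', super=BaseEntity{id=1}}
    }
}
```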

7. Do not put business logic in a POJO's getters and setters; it makes problems harder to track down.

Correct:

public class User {
    private Integer id;
    private String username;

    public Integer getId() {
        return id;
    }

    public User setId(Integer id) {
        this.id = id;
        return this;
    }

    public String getUsername() {
        return username;
    }

    public User setUsername(String username) {
        this.username = username;
        return this;
    }
}

Wrong:

public class User {
    private Integer id;
    private String username;

    public Integer getId() {
        return id;
    }

    public User setId(Integer id) {
        this.id = id;
        return this;
    }

    public String getUsername() {
        return "key-prefix-" + username;
    }

    public User setUsername(String username) {
        this.username = "key-prefix-" + username;
        return this;
    }
}

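8. final can be applied to classes, member variables, methods, and local variables.

Use the final keyword in the following cases:

Classes that must not be extended, such as String.
Fields whose reference must not be changed, such as the field variables of a POJO class.
Methods that must not be overridden, such as the setter methods of a POJO class.
Local variables that must not be reassigned at runtime.
To avoid reusing one variable across different contexts: declaring it final forces a new variable to be defined, which makes refactoring easier.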

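9. Any class that overrides equals must also override hashCode.

Examples:

1) A HashSet stores only distinct objects, and it relies on both hashCode and equals to decide whether two objects are the same, so both must be overridden.

2) When a custom object is used as a map key, it must override hashCode and equals. String, which overrides both, is well suited for use as a key.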
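As an illustration, here is a sketch of a custom key class that overrides both methods. UserKey and its fields are hypothetical. With both overrides in place, a HashSet treats two keys with the same field values as one element.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class UserKey {

    private final String tenant;
    private final long id;

    UserKey(String tenant, long id) {
        this.tenant = tenant;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UserKey)) {
            return false;
        }
        UserKey other = (UserKey) o;
        return id == other.id && Objects.equals(tenant, other.tenant);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals so equal objects share a hash code.
        return Objects.hash(tenant, id);
    }

    public static void main(String[] args) {
        Set<UserKey> keys = new HashSet<>();
        keys.add(new UserKey("acme", 1L));
        keys.add(new UserKey("acme", 1L)); // equal to the first key, so not added again
        System.out.println(keys.size()); // 1
    }
}
```

If hashCode were left at the default from Object, the two keys would usually land in different buckets and the set would keep both.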

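10. Do not remove or add elements inside a foreach loop.

To remove elements, use an Iterator. For concurrent operations, the Iterator object must be locked.

Iterator<String> iterator = list.iterator();
while (iterator.hasNext()) {
    String item = iterator.next();
    if (shouldRemove(item)) { // shouldRemove is a placeholder for your removal condition
        iterator.remove();
    }
}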
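A runnable version of the Iterator pattern, with an assumed removal condition (drop empty strings) and made-up names (SafeRemoval, removeEmpty):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SafeRemoval {

    /** Removes empty strings through the iterator, which keeps iteration state consistent. */
    static void removeEmpty(List<String> list) {
        Iterator<String> iterator = list.iterator();
        while (iterator.hasNext()) {
            if (iterator.next().isEmpty()) {
                iterator.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(Arrays.asList("a", "", "b", ""));
        removeEmpty(items);
        System.out.println(items); // [a, b]
    }
}
```

On Java 8 and later, `items.removeIf(String::isEmpty)` expresses the same thing in one call.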

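11. When using a HashMap, specify its initial capacity.

For example, suppose a HashMap needs to hold 10,000 elements. If no initial capacity is given, the map keeps resizing internally as elements are added, which hurts performance.

To reduce resizing, set the initial capacity to: (expected number of elements / load factor (0.75)) + 1.

Map<String, Object> hashMap = new HashMap<>(13334);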
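The formula can be wrapped in a small helper. This is only a sketch: MapCapacity and initialCapacityFor are names invented here, and 0.75 is HashMap's default load factor.

```java
import java.util.HashMap;
import java.util.Map;

class MapCapacity {

    private static final double DEFAULT_LOAD_FACTOR = 0.75;

    /** Initial capacity that lets a HashMap hold expectedSize entries without resizing. */
    static int initialCapacityFor(int expectedSize) {
        return (int) (expectedSize / DEFAULT_LOAD_FACTOR) + 1;
    }

    public static void main(String[] args) {
        int capacity = initialCapacityFor(10000);
        System.out.println(capacity); // 13334
        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < 10000; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size()); // 10000
    }
}
```

HashMap rounds the requested capacity up to the next power of two internally (16384 here), so the resize threshold of 12288 stays above the 10,000 expected entries.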

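12. How Map implementations handle null keys and values:

Collection	Key	Value	Notes
HashMap	null allowed	null allowed	Not thread-safe
TreeMap	null not allowed	null allowed	Not thread-safe
Hashtable	null not allowed	null not allowed	Thread-safe
ConcurrentHashMap	null not allowed	null not allowed	Thread-safe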

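13. Use the uniqueness of Set elements to deduplicate a collection quickly, instead of traversing, comparing, and deduplicating with List's contains method.

Looking at the source, HashSet passes each value on to an internal HashMap. The HashMap first uses the object's hashCode to check whether a candidate duplicate exists; if so, it confirms with equals, and if no equal element is found, the value is inserted.

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}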
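A short sketch of Set-based deduplication. It uses LinkedHashSet, a choice made for this example and not something the tip prescribes, so that first-seen order is preserved. Dedup and distinct are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class Dedup {

    /** Returns the distinct elements of input, keeping first-seen order. */
    static <T> List<T> distinct(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        System.out.println(distinct(Arrays.asList(3, 1, 3, 2, 1))); // [3, 1, 2]
    }
}
```

Each insertion into the set is a hash lookup, whereas calling List.contains for every element scans the list each time.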

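14. Do not create thread pools with Executors; use ThreadPoolExecutor directly. This makes the pool's operating rules explicit to whoever writes the code and avoids the risk of exhausting resources.

Wrong:

ExecutorService executors = Executors.newSingleThreadExecutor();
ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(5);

Understand the thread pool parameters thoroughly, then set them according to the actual machine, to prevent failures in use:

ExecutorService fixedExecutorService = new ThreadPoolExecutor(
    1,                                     // corePoolSize
    2,                                     // maximumPoolSize
    60, TimeUnit.SECONDS,                  // keep-alive time for idle non-core threads
    linkedBlockingQueue,                   // work queue
    new MyThreadFactory(),                 // thread factory
    new ThreadPoolExecutor.AbortPolicy()   // rejection policy
);

P.S. Drawbacks of creating thread pools through the Executors.new* factory methods:

For FixedThreadPool and SingleThreadPool:

The request queue may grow to Integer.MAX_VALUE entries, so a large backlog of requests can pile up and cause an OOM.

For CachedThreadPool and ScheduledThreadPool:

The number of threads that may be created is Integer.MAX_VALUE, so a huge number of threads can be created and cause an OOM.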
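A self-contained sketch of an explicitly configured pool with a bounded queue. The names BoundedPool and newBoundedPool, the "worker-" thread naming, and the sizes in main are all assumptions for illustration. Real values should come from the workload and the machine.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPool {

    /** Builds a pool whose thread count and queue length are both capped. */
    static ThreadPoolExecutor newBoundedPool(int coreSize, int maxSize, int queueCapacity) {
        AtomicInteger sequence = new AtomicInteger();
        // Named threads make thread dumps and logs easier to read.
        ThreadFactory factory =
                runnable -> new Thread(runnable, "worker-" + sequence.incrementAndGet());
        return new ThreadPoolExecutor(
                coreSize,
                maxSize,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),  // bounded: at most queueCapacity waiting tasks
                factory,
                new ThreadPoolExecutor.AbortPolicy());    // reject with an exception when saturated
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = newBoundedPool(1, 2, 100);
        try {
            Future<Integer> result = pool.submit(() -> 21 * 2);
            System.out.println(result.get()); // 42
        } finally {
            pool.shutdown();
        }
    }
}
```

With a bounded queue, overload shows up as a RejectedExecutionException that the caller can handle, not as unbounded memory growth.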

15. When working with dates, prefer LocalDateTime over Calendar, and Instant over Date.
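A brief sketch of the java.time types in use. ModernTime, dueDate, and elapsedMillis are names made up for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;

class ModernTime {

    /** LocalDateTime is immutable: plusDays returns a new value. */
    static LocalDateTime dueDate(LocalDateTime start, long days) {
        return start.plusDays(days);
    }

    /** Instant represents a point on the timeline, a natural replacement for Date timestamps. */
    static long elapsedMillis(Instant start, Instant end) {
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2019, 7, 14, 10, 0);
        System.out.println(dueDate(start, 30)); // 2019-08-13T10:00

        Instant begin = Instant.now();
        Instant end = begin.plusMillis(1500);
        System.out.println(elapsedMillis(begin, end)); // 1500
    }
}
```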

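16. Where possible, avoid try-catch inside a for loop; consider placing the try-catch outside the loop body. Note that the two forms behave differently: with the try-catch outside, the first exception ends the loop.

Correct:

try {
    for (int i = 0; i < 100; i++) {
        doSomeThing();
    }
} catch (Exception e) {
    e.printStackTrace();
}

Not recommended:

for (int i = 0; i < 100; i++) {
    try {
        doSomeThing();
    } catch (Exception e) {
        e.printStackTrace();
    }
}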

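17. Wrapping a large block of code in try-catch is irresponsible: stable code ends up inside the try-catch block too, and the boundary between stable and unstable code is lost.

Code that will not fail at runtime is usually called stable code, and code that may throw exceptions is unstable code. The unstable code is what try-catch should focus on.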

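18. Since JDK 7, objects such as streams, which must be closed to release their connection or resources, can be handled with try-with-resources.

For example:

File file = new File("*****");
try (FileInputStream fin = new FileInputStream(file)) {
    // perform the relevant operations
} catch (Exception e) {
    // handle the exception
}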
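A runnable sketch of the same mechanism. It reads from an in-memory StringReader so that it does not depend on any file path, and it wraps IOException in UncheckedIOException. Both are choices made for this example only, as are the names FirstLineReader and readFirstLine.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class FirstLineReader {

    /** Reads the first line; the reader is closed automatically when the try block exits. */
    static String readFirstLine(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstLine(new StringReader("hello\nworld"))); // hello
    }
}
```

The resource is closed whether the block returns normally or throws, which removes the need for a hand-written finally block.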

19. When using an ArrayList whose size you know in advance, specify the capacity at initialization. As new elements are added, the backing array has to be grown repeatedly (to about 1.5 times its old capacity each time) and copied.

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

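20. For scenarios such as SMS, email, phone calls, order placement, and payment, build in duplicate-request protection during development to prevent malicious order brushing and similar abuse.

21. When users can publish text that may contain sensitive words, plan a text-filtering strategy.

Consider integrating an established UGC monitoring service from the market, or an in-house UGC filtering tool, to keep users from posting malicious comments and the like.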

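22. When creating indexes, follow a naming convention:

Index type	Naming rule	Example
Primary key index	pk_<column name>; pk stands for primary key	pk_order_id
Unique index	uk_<column name>; uk stands for unique key	uk_order_id
Ordinary index	idx_<column name>; idx stands for index	idx_order_id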

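23. Before storing a piece of text, first consider how long it can be.

If the text is longer than 5,000 characters, varchar is no longer recommended; consider the text type. In that case, consider storing the data in a separate table, linked through an additional primary key id, so that queries on the other columns are not affected.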

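24. When naming a database, try to keep its name the same as the project's name.

25. When designing table structures, create a unique index on every column that has a uniqueness property.

This improves query efficiency later on. Without the guarantee of a unique index, dirty data can still easily appear in production even if the business layer has its own checks.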

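26. When writing SQL queries with ORDER BY, pay attention to the ordering of the index being used.

On building indexes, it is worth learning about the star rating of indexes, such as the three-star index. Personally, though, I think no index is optimal in every case; design it around the actual business scenario.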

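27. In MySQL, count(*) counts rows whose values are NULL, while count(column name) does not count rows where that column is NULL.

28. Choose a database storage engine based on the application scenario. If the workload is dominated by select operations, MyISAM is an option; if it involves many modifications such as insert and update, InnoDB is the preferred choice. Most internet companies today recommend InnoDB.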