Programming Pearls--Column 1

Original post, 2012-03-23 18:11:17

problem:

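Analyze the problem from the computer's point of view.

input: a file containing at most n = 10,000,000 positive integers, each less than n, with no integer appearing more than once

output: the integers as a list sorted in ascending order

constraints: at most 1 MB of memory; disk space can be treated as unlimited; the run time must stay in the range of seconds, not minutes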


solution:

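The solution is simple: use a bitmap (also called a bit vector) to represent the integers. If the number i appears in the file, set bit i of the bitmap to 1. That marks every number that occurs.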
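As a rough check of the memory budget, one bit per possible value needs n/8 bytes. The sketch below is only an illustration: the helper name `bitmap_bytes` is made up, and it assumes that "1M" means 2^20 bytes.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed to hold one bit for each value in [0, n),
// rounded up to a whole byte.
constexpr std::size_t bitmap_bytes(std::size_t n) {
    return (n + 7) / 8;
}

// n = 10,000,000 needs 1,250,000 bytes.
static_assert(bitmap_bytes(10000000) == 1250000, "one bit per value");

// Assumption: "1M" means 2^20 = 1,048,576 bytes.
static_assert(bitmap_bytes(10000000) > (std::size_t{1} << 20),
              "a single bitmap is larger than 2^20 bytes");
```

Under that assumption a single bitmap is slightly over budget, so it fits only if the limit is read loosely or the value range is split and handled in more than one pass over the input.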

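One constraint is key here: no number appears more than once. A single bit can record only whether a number is present, not how many times it occurs.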


Pseudocode:

/* phase 1: initialize set to empty */
for i = [0, n)
    bit[i] = 0

/* phase 2: insert present elements into the set */
for each i in the input file
    bit[i] = 1

/* phase 3: write sorted output */
for i = [0, n)
    if bit[i] == 1
        write i on the output file
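The three phases might look like this in C++. This is a minimal sketch, not the post's own code: `bitmap_sort` is a made-up name, the input is held in a `std::vector` for brevity (a version working under the memory limit would stream the values from the file instead), and values outside [0, n) are skipped.

```cpp
#include <vector>

// Sketch of the three phases. Assumes no value repeats; bitmap_sort is a
// hypothetical name used only for this illustration.
std::vector<int> bitmap_sort(const std::vector<int>& input, int n) {
    // Phase 1: every bit starts at 0. std::vector<bool> is a space-efficient
    // specialization that typically stores one bit per element.
    std::vector<bool> bit(n, false);

    // Phase 2: mark each value that is present.
    for (int v : input) {
        if (v >= 0 && v < n) {  // skip out-of-range values
            bit[v] = true;
        }
    }

    // Phase 3: scan the bits in index order, which yields ascending output.
    std::vector<int> sorted;
    for (int i = 0; i < n; ++i) {
        if (bit[i]) {
            sorted.push_back(i);
        }
    }
    return sorted;
}
```

The work is one pass over the input plus one pass over the n bits, with no comparisons between values.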


Exercises

1. If there were no memory limit, how would you write the code?

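#include <iostream>
#include <set>

int main() {
    // std::set keeps its keys unique and sorted, so inserting every
    // input value is enough to sort the input.
    std::set<int> integerSet;

    int i;
    while (std::cin >> i) {
        integerSet.insert(i);
    }

    // Iterating from begin() to end() visits the keys in ascending order.
    std::set<int>::iterator iter;
    for (iter = integerSet.begin(); iter != integerSet.end(); ++iter) {
        std::cout << *iter << " ";
    }
    std::cout << std::endl;

    return 0;
}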

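Why choose set rather than list? That depends on which structure suits this problem, in other words which one costs less, and that comes down to how list and set are implemented.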

Set

Sets are a kind of associative container that stores unique elements, and in which the elements themselves are the keys.

Associative containers are containers especially designed to be efficient accessing its elements by their key (unlike sequence containers, which are more efficient accessing elements by their relative or absolute position).

Internally, the elements in a set are always sorted from lower to higher following a specific strict weak ordering criterion set on container construction.

Sets are typically implemented as binary search trees.

Therefore, the main characteristics of set as an associative container are:

  • Unique element values: no two elements in the set can compare equal to each other. For a similar associative container allowing for multiple equivalent elements, see multiset.
  • The element value is the key itself. For a similar associative container where elements are accessed using a key, but map to a value different than this key, see map.
  • Elements follow a strict weak ordering at all times. Unordered associative arrays, like unordered_set, are available in implementations following TR1.

This container class supports bidirectional iterators.

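To save time I copied the above straight from the C++ Library Reference. As it shows, set does not allow duplicate keys, keeps its elements sorted, and uses a binary search tree for lookups.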
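Both properties can be seen in a few lines. The helper below is a sketch for illustration only; `sorted_unique` is a made-up name.

```cpp
#include <set>
#include <vector>

// Illustrative helper: push the values through a std::set and read them back.
std::vector<int> sorted_unique(const std::vector<int>& input) {
    std::set<int> s;
    for (int v : input) {
        s.insert(v);  // a key that is already present is left as it is
    }
    // Iteration visits the keys in ascending order.
    return std::vector<int>(s.begin(), s.end());
}
```

For the input 4, 1, 3, 1, 4 this returns 1, 3, 4. When a caller needs to know whether a value was new, `insert` returns a pair whose `second` member is false if the key already existed.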

Now look at list:

List

Lists are a kind of sequence containers. As such, their elements are ordered following a linear sequence.

List containers are implemented as doubly-linked lists; Doubly linked lists can store each of the elements they contain in different and unrelated storage locations. The ordering is kept by the association to each element of a link to the element preceding it and a link to the element following it.

This provides the following advantages to list containers:

  • Efficient insertion and removal of elements anywhere in the container (constant time).
  • Efficient moving elements and block of elements within the container or even between different containers (constant time).
  • Iterating over the elements in forward or reverse order (linear time).

Compared to other base standard sequence containers (vectors and deques), lists perform generally better in inserting, extracting and moving elements in any position within the container, and therefore also in algorithms that make intensive use of these, like sorting algorithms.

The main drawback of lists compared to these other sequence containers is that they lack direct access to the elements by their position; For example, to access the sixth element in a list one has to iterate from a known position (like the beginning or the end) to that position, which takes linear time in the distance between these. They also consume some extra memory to keep the linking information associated to each element (which may be an important factor for large lists of small-sized elements).

Storage is handled automatically by the class, allowing lists to be expanded and contracted as needed.

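list is implemented as a doubly linked list, which favors frequent insertion and removal, but that strength is beside the point for this problem. Note too that list is ordered only in the sense of keeping its elements in sequence; unlike set it does not sort them by value, so the input would still have to be sorted explicitly.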
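For comparison, doing the same job with std::list takes explicit steps. This is a sketch under the assumption that the input is already in memory; `sorted_unique_with_list` is a made-up name.

```cpp
#include <list>
#include <vector>

// Illustrative helper: sort and de-duplicate with std::list.
std::vector<int> sorted_unique_with_list(const std::vector<int>& input) {
    std::list<int> l(input.begin(), input.end());  // keeps the input order
    l.sort();    // member function; std::sort needs random-access iterators
    l.unique();  // removes consecutive duplicates, so it must follow sort()
    return std::vector<int>(l.begin(), l.end());
}
```

The result matches what a set produces, but the ordering and the uniqueness come from the calls to `sort()` and `unique()`, not from the container itself.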

Vector

Vectors are a kind of sequence containers. As such, their elements are ordered following a strict linear sequence.

Vector containers are implemented as dynamic arrays; Just as regular arrays, vector containers have their elements stored in contiguous storage locations, which means that their elements can be accessed not only using iterators but also using offsets on regular pointers to elements.

But unlike regular arrays, storage in vectors is handled automatically, allowing it to be expanded and contracted as needed.

Vectors are good at:

  • Accessing individual elements by their position index (constant time).
  • Iterating over the elements in any order (linear time).
  • Add and remove elements from its end (constant amortized time).

Compared to arrays, they provide almost the same performance for these tasks, plus they have the ability to be easily resized. Although, they usually consume more memory than arrays when their capacity is handled automatically (this is in order to accommodate for extra storage space for future growth).

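The advantage of vector, or rather its distinguishing feature, is that it manages its storage dynamically while keeping the elements contiguous.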
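With no memory limit, contiguous storage also allows the plain approach of collecting everything in a std::vector and calling std::sort. This is a sketch of an alternative to the set-based program, not something the post itself uses, and the function name is made up.

```cpp
#include <algorithm>
#include <vector>

// Illustrative alternative: take a copy of the values and sort it in place.
std::vector<int> sort_with_vector(std::vector<int> values) {
    std::sort(values.begin(), values.end());  // needs random-access iterators
    return values;
}
```

Unlike set, this keeps any duplicates, which is harmless here because the input is stated to contain none.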

2. Implement the bitset operations with bitwise operators

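First, the bitset is built on an array of integers. Because the width of int can differ between machines, I chose a fixed-width 32-bit type (int32_t) for portability.

Second, create the array of 32-bit words. How large must it be? n/32 elements, but integer division rounds down, so add 1 to leave room for any remainder.

Third, the set operation. First find the array index: since indexing starts at 0, it is simply i/32. Then, to set bit i, OR the existing array element with a mask. Which mask? Not i%32 itself, but a 1 shifted left by i%32 places, that is, 1 << (i % 32).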
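To make the index and mask arithmetic concrete, here is a small worked check for i = 70. The helper names are hypothetical and the sketch assumes 32-bit words.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration, assuming 32-bit words.

// Which array element holds bit i: i / 32, written as a shift.
constexpr std::uint32_t word_index(std::uint32_t i) {
    return i >> 5;
}

// Mask selecting bit i inside its word: 1 << (i % 32).
constexpr std::uint32_t bit_mask(std::uint32_t i) {
    return std::uint32_t{1} << (i & 0x1F);
}

// Array length for n bits: n / 32 rounds down, so add 1.
constexpr std::uint32_t words_needed(std::uint32_t n) {
    return n / 32 + 1;
}

static_assert(word_index(70) == 2, "70 / 32 == 2");
static_assert(bit_mask(70) == 0x40, "70 % 32 == 6 and 1 << 6 == 0x40");
static_assert(words_needed(10000000) == 312501, "312500 full words plus one");
```

Setting bit 70 therefore means OR-ing 0x40 into element 2 of the array.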

Then write the bitset code and test it:

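#include <cstdint>
#include <iostream>

#define MAX_LENGTH 10000000  // valid values are 0 .. MAX_LENGTH - 1
#define INT_LENGTH 32        // bits per array element
#define SHIFT 5              // i >> SHIFT == i / INT_LENGTH
#define MASK 0x1F            // i & MASK == i % INT_LENGTH

// Parentheses are required: + binds tighter than >>, so
// 1 + MAX_LENGTH >> SHIFT would mean (1 + MAX_LENGTH) >> SHIFT.
// The elements are unsigned so that setting bit 31 is well defined.
uint32_t integerArray[1 + (MAX_LENGTH >> SHIFT)];

// Turn bit i on.
void set(int32_t i) {
    integerArray[i >> SHIFT] |= (uint32_t(1) << (i & MASK));
}

// Turn bit i off.
void clear(int32_t i) {
    integerArray[i >> SHIFT] &= ~(uint32_t(1) << (i & MASK));
}

// Return 1 if bit i is on, 0 otherwise.
int test(int32_t i) {
    return (integerArray[i >> SHIFT] & (uint32_t(1) << (i & MASK))) != 0;
}

int main() {
    for (int32_t i = 0; i < MAX_LENGTH; i++) {
        clear(i);
    }
    int32_t i;
    while (std::cin >> i) {
        if (i < 0 || i >= MAX_LENGTH) {
            continue;  // ignore values the array cannot represent
        }
        set(i);
        if (test(i)) {
            std::cout << i << " is set" << std::endl;
        }
    }

    return 0;
}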

That's all for today; I'll continue when I get the chance...

