Using External Storage to Process Big Data (1)

Problem:

   We said that big data is data that cannot fit in main memory (often called RAM, for Random Access Memory) all at once. How would you handle this situation?

Solution:

  We can use a divide-and-conquer algorithm: divide the big problem into small problems, solve every small problem with the same method, and finally merge the results. In this case a different kind of storage is necessary. Disk files generally have a much larger capacity than main memory, but we should keep in mind that external storage is much slower than main memory. This speed difference means that different techniques must be used to handle it efficiently.

  Here we suppose our big data (say, many records) is in a file. We can divide the file into blocks. Data is stored on the disk in chunks called blocks, pages, or allocation units, and the disk drive always reads or writes a minimum of one block of data at a time. The block size is the number of bytes per block; here a block can be as large as your main memory can afford. We can then read the block we want into main memory. But the problem is how to find that block quickly.
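
  As a minimal sketch of reading one block at a time, the Python snippet below seeks to a block's offset and reads only that block. The names read_block and count_blocks and the 4096-byte BLOCK_SIZE are assumptions made for illustration; the real block size depends on your disk and file system.

```python
import os

BLOCK_SIZE = 4096  # assumed block size in bytes; real systems vary


def read_block(path, block_number, block_size=BLOCK_SIZE):
    """Return the bytes of one block without loading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(block_number * block_size)  # jump straight to the block's offset
        return f.read(block_size)          # shorter (or empty) at end of file


def count_blocks(path, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold the file; the last one may be partial."""
    size = os.path.getsize(path)
    return (size + block_size - 1) // block_size
```

  Only block_size bytes are in memory at any moment, so the file itself can be far larger than RAM.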


Problem: How can you find the block quickly?

Solution:

  We must keep in mind that the time to access a block is much larger than the time for any internal processing of data in main memory, so the overriding consideration in devising an external storage strategy is minimizing the number of block accesses. The usual techniques for handling this problem are hashing, indexing, and B-trees.

1 Hashing and External Storage

  The central feature in external hashing is a hash table containing block numbers, which refer to blocks in external storage. The hash table is sometimes called an index (in the sense of a book's index). It can be stored in main memory or, if it is too large, stored externally on disk, with only part of it being read into main memory at a time.

1) First, all records with keys that hash to the same value are located in the same block.

2) Second, to find a record with a particular key, the search algorithm hashes the key, uses the hash value as an index into the hash table, gets the block number at that index, and reads the block.


  To implement this scheme, we must choose the hash function and the size of the hash table with some care so that a limited number of keys hash to the same value.
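
  The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: an in-memory io.BytesIO stands in for the disk file, keys are integers, key % TABLE_SIZE is a toy hash function, records are stored as "key:value" lines, and the names build_storage and find_record are hypothetical. Full blocks are not handled here.

```python
import io

BLOCK_SIZE = 64   # assumed tiny block size so the example stays readable
TABLE_SIZE = 4    # assumed number of hash-table slots


def hash_key(key, table_size=TABLE_SIZE):
    """Toy hash function for integer keys (an assumption for this sketch)."""
    return key % table_size


def build_storage(records):
    """Pack records into one block per hash value; return (storage, hash_table)."""
    buckets = [[] for _ in range(TABLE_SIZE)]
    for key, value in records:
        buckets[hash_key(key)].append(f"{key}:{value}\n")
    storage = io.BytesIO()             # stands in for a file on disk
    hash_table = [None] * TABLE_SIZE   # slot -> block number, None if unused
    next_block = 0
    for slot, lines in enumerate(buckets):
        if not lines:
            continue
        data = "".join(lines).encode("ascii")
        if len(data) > BLOCK_SIZE:
            raise ValueError("block is full; overflow is not handled here")
        storage.write(data.ljust(BLOCK_SIZE, b" "))  # pad to a whole block
        hash_table[slot] = next_block
        next_block += 1
    return storage, hash_table


def find_record(storage, hash_table, key):
    """Hash the key, look up the block number, read that one block, scan it."""
    block_number = hash_table[hash_key(key)]
    if block_number is None:
        return None
    storage.seek(block_number * BLOCK_SIZE)
    block = storage.read(BLOCK_SIZE).decode("ascii")
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the padding at the end of the block
        stored_key, _, value = line.partition(":")
        if int(stored_key) == key:
            return value
    return None
```

  A lookup costs one block access no matter how many blocks the storage holds, which is the point of the scheme.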

For example: 

  We can put all the blocks in a directory as files and use the hash values as the block file names, so you can find a block file by its name. For instance, if your search key's hash value is 2, you can find the file 2.txt and read it into main memory, because all the keys with the same hash value are in the same block.
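
  A rough Python sketch of this lookup is shown below. It assumes integer keys, a toy hash function key % TABLE_SIZE, and one record per line in the form "key<TAB>value"; the function names and the record layout are illustrative assumptions, not part of the scheme itself.

```python
import os

TABLE_SIZE = 10  # assumed number of distinct hash values, hence block files


def block_path(directory, key):
    """The block file name is the key's hash value, e.g. hash 2 -> '2.txt'."""
    return os.path.join(directory, f"{key % TABLE_SIZE}.txt")


def find_in_block_file(directory, key):
    """Read only the one block file that can contain the key."""
    path = block_path(directory, key)
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            stored_key, _, value = line.rstrip("\n").partition("\t")
            if stored_key == str(key):
                return value
    return None
```

  Here the file system's directory plays the role of the hash table: the file name maps a hash value to the block that holds its records.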


  You may be puzzled by 11.txt in the figure above. 11.txt is the overflow block file of 1.txt, used when 1.txt is full. This is the separate chaining method of handling full blocks; of course, you can use other methods to find the overflow blocks. In separate chaining, special overflow blocks are made available; when a primary block is found to be full, the new record is inserted in the overflow block.
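
  Separate chaining with overflow block files might look like the Python sketch below. The assumptions are loud ones: MAX_RECORDS is an artificially small block capacity, records are "key<TAB>value" lines, and the overflow files are named '<hash>_ov<n>.txt', a hypothetical naming scheme chosen to avoid clashing with other hash values (it differs from the 11.txt name used in the text).

```python
import os

TABLE_SIZE = 10   # assumed number of primary blocks
MAX_RECORDS = 2   # assumed block capacity, kept tiny for illustration


def chain_paths(directory, hash_value):
    """Yield the primary block file, then its overflow files, in chain order."""
    yield os.path.join(directory, f"{hash_value}.txt")
    n = 1
    while True:
        # hypothetical overflow naming: 1_ov1.txt, 1_ov2.txt, ...
        yield os.path.join(directory, f"{hash_value}_ov{n}.txt")
        n += 1


def count_records(path):
    """Number of records in a block file; 0 if the file does not exist yet."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def insert_record(directory, key, value):
    """Append to the first block in the chain that still has room."""
    for path in chain_paths(directory, key % TABLE_SIZE):
        if count_records(path) < MAX_RECORDS:
            with open(path, "a", encoding="utf-8") as f:
                f.write(f"{key}\t{value}\n")
            return path


def find_with_overflow(directory, key):
    """Search the primary block, then each overflow block until the chain ends."""
    for path in chain_paths(directory, key % TABLE_SIZE):
        if not os.path.exists(path):
            return None  # end of the chain, key not found
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                stored_key, _, value = line.rstrip("\n").partition("\t")
                if stored_key == str(key):
                    return value
```

  Note the cost: every overflow block in the chain is one more block access, so long chains undo the benefit of hashing. That is why the hash function and table size deserve care.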
