tx Classic Interview Questions 2


WINCOL


Article link: https://blog.csdn.net/WINCOL/article/details/4800521


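Question:
class AAA
{
    int a;
    char b[5];
    short c;
    int d;
};

AAA* pA = (AAA*)0x10000000;
Q1: pA + 10 = ?
Q2: (char*) pA + 10 = ?
Q3: (int*) pA + 10 = ?
Answer (assuming sizeof(int) == 4 and sizeof(short) == 2):

1: 0x10000000 + 10 * 16 = 0x100000A0 // sizeof(AAA) is 16: b is followed by 1 padding byte so that c starts on a 2-byte boundary, and d then starts on a 4-byte boundary
2: 0x10000000 + 10 = 0x1000000A
3: 0x10000000 + 10 * 4 = 0x10000028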
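
The three results can be checked with a short sketch. It assumes a typical platform where int is 4 bytes and short is 2 bytes. The helper addrAfter is hypothetical (it is not part of the original question) and works on integer addresses, so no pointer to unmapped memory is ever formed.

```cpp
#include <cstddef>
#include <cstdint>

struct AAA
{
    int a;      // offset 0
    char b[5];  // offset 4, followed by 1 padding byte
    short c;    // offset 10 (2-byte aligned)
    int d;      // offset 12 (4-byte aligned)
};

// Hypothetical helper: the address obtained by adding n to a T* whose value is base.
template <typename T>
std::uintptr_t addrAfter(std::uintptr_t base, std::size_t n)
{
    return base + n * sizeof(T);
}
```

Under those assumptions, addrAfter<AAA>(0x10000000, 10) corresponds to pA + 10, addrAfter<char> to the second question and addrAfter<int> to the third.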

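Question 2: A set holds numbers in the range 0-1000. Implement the methods insert, erase, find, size, begin and end with the best possible performance.
Also: implement the memcpy function.

Answer:

Allocate an array of 1001 elements, so that each number maps directly to the array index of the same value, i.e.:
1->a[1]
2->a[2]
Because the same number may be inserted more than once, each slot also needs a count of how many copies are stored. The original idea was a second array b, where b[1] holds the number of copies of a[1], b[2] the number of copies of a[2], and so on; the code below folds both into a single array of counters.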
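#include <iostream>

const int ELEM_NUM = 1001;    // valid values are 0..1000
int elemArr[ELEM_NUM] = {0};  // elemArr[n] is the number of copies of n in the set

// Guard used by the functions below so that bad input never indexes out of bounds.
bool inRange(const int nNum)
{
    return nNum >= 0 && nNum < ELEM_NUM;
}

void insert(const int nNum)
{
    if (inRange(nNum))
        elemArr[nNum]++;
}

void erase(const int nNum)
{
    if (inRange(nNum) && elemArr[nNum] > 0)
    {
        elemArr[nNum]--;
    }
}

// First version of find: it only reports whether nNum is present.
// bool find(const int nNum)
// {
//     return elemArr[nNum] > 0;
// }

// Changed to:

int find(const int nNum)
{
    // Returning the count tells us both whether nNum is present and how many copies there are.
    return inRange(nNum) ? elemArr[nNum] : 0;
}

// Smallest stored value, or -1 if the set is empty.
int begin()
{
    int i = 0;
    // Test the bound first so that elemArr[ELEM_NUM] is never read.
    for (; i < ELEM_NUM && elemArr[i] == 0; i++);

    if (i == ELEM_NUM)
        return -1;
    else
        return i;
}

// Smallest stored value greater than nNum, or -1 if there is none.
int next(const int nNum)
{
    int i = nNum < 0 ? 0 : nNum + 1;
    for (; i < ELEM_NUM && elemArr[i] == 0; i++);

    if (i == ELEM_NUM)
        return -1;
    else
        return i;
}

int main()
{
    int i = 0;
    std::cout << "initiate the array please (-1 to stop):\n";
    // Stop on the -1 sentinel or on failed input; the sentinel itself is never inserted.
    while (std::cin >> i && i != -1)
    {
        insert(i);
    }

    for (int n = begin(); n != -1; n = next(n))
    {
        std::cout << n << ' ';
    }

    std::cout << "\ndelete some element (-1 to stop):\n";
    while (std::cin >> i && i != -1)
    {
        erase(i);
    }

    std::cout << "find some element (-1 to stop):\n";
    while (std::cin >> i && i != -1)
    {
        if (find(i))
            std::cout << "elem found\n";
        else
            std::cout << "sorry, nothing found\n";
    }

    return 0;
}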
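
The question also asks for size and end, which the program above does not provide. One possible way to cover all six operations is sketched below. The class name SmallIntSet and its members are illustrative choices, not part of the original answer; size is kept in a running counter so that it costs O(1), and end returns a one-past-the-last sentinel (1001) instead of -1.

```cpp
#include <cstddef>

class SmallIntSet
{
public:
    static constexpr int kMax = 1000;  // largest storable value

    // Adds one copy of v; returns false if v is out of range.
    bool insert(int v)
    {
        if (!valid(v)) return false;
        ++count_[v];
        ++size_;
        return true;
    }

    // Removes one copy of v; returns false if v is not stored.
    bool erase(int v)
    {
        if (!valid(v) || count_[v] == 0) return false;
        --count_[v];
        --size_;
        return true;
    }

    int find(int v) const { return valid(v) ? count_[v] : 0; }  // number of copies
    std::size_t size() const { return size_; }                  // total copies stored
    int begin() const { return nextFrom(0); }                   // smallest stored value
    int end() const { return kMax + 1; }                        // sentinel, never stored
    int next(int v) const { return v >= kMax ? end() : nextFrom(v + 1); }

private:
    static bool valid(int v) { return v >= 0 && v <= kMax; }

    // First stored value >= i, or end() if there is none.
    int nextFrom(int i) const
    {
        if (i < 0) i = 0;
        while (i <= kMax && count_[i] == 0) ++i;
        return i <= kMax ? i : kMax + 1;
    }

    int count_[kMax + 1] = {};
    std::size_t size_ = 0;
};
```

A loop such as for (int v = s.begin(); v != s.end(); v = s.next(v)) then visits each distinct stored value in ascending order. insert, erase, find and size are O(1), while begin and next scan at most 1001 slots.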

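The question also asks for an implementation of memcpy, which the answer above leaves out. A minimal byte-by-byte sketch follows. The name myMemcpy is hypothetical, chosen only to avoid clashing with the standard function. Like the standard memcpy, it assumes the two regions do not overlap; production implementations usually copy in larger units for speed.

```cpp
#include <cstddef>

// Copies n bytes from src to dest and returns dest.
// The regions must not overlap (use a memmove-style copy if they might).
void* myMemcpy(void* dest, const void* src, std::size_t n)
{
    unsigned char* d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    while (n--)
    {
        *d++ = *s++;
    }
    return dest;
}
```
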
＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝

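Question:
int a[]={10,20,30,40};
short *p,*q;
p=(short*)(a+1);
q=(short*)a;
a[p-q]=?
Source: http://topic.csdn.net/u/20071101/22/deb510b5-e643-4d7c-a8bc-68053c9db9f6.html

Answer (assuming sizeof(int)=4 and sizeof(short)=2):
An int element occupies 4 bytes.
A short element occupies 2 bytes.
a is an int array, so (a+1) and a are one int step apart, i.e. 4 bytes (if the address of a is x, then a+1 = x+4).
p and q are short pointers with a step of 2 bytes, so covering 4 bytes takes 2 steps, i.e. p-q=2.
Therefore a[p-q]=a[2]=30.

The result of subtracting two pointers is counted in steps, and the size of a step depends on the pointer type.
Number of steps = address difference / sizeof(type) = (address1 - address2) / sizeof(type)
char *p, p++: p advances one byte each time (sizeof(char) is always 1)
short *p, p++: p advances two bytes each time (on a machine where sizeof(short)=2)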
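
The step formula can be tried out with a small sketch. It assumes sizeof(int) is twice sizeof(short), as on most current platforms. The helper elementsBetween and the function shortStepAnswer are hypothetical names; the helper computes the same quantity as p - q from byte addresses, which avoids doing arithmetic on short pointers that alias an int array.

```cpp
#include <cstddef>

// Hypothetical helper: number of elemSize-byte steps from lo to hi.
std::ptrdiff_t elementsBetween(const void* lo, const void* hi, std::size_t elemSize)
{
    const char* l = static_cast<const char*>(lo);
    const char* h = static_cast<const char*>(hi);
    return (h - l) / static_cast<std::ptrdiff_t>(elemSize);
}

int shortStepAnswer()
{
    int a[] = {10, 20, 30, 40};
    // Distance from a to a+1 measured in short-sized steps, i.e. the value of p - q.
    std::ptrdiff_t steps = elementsBetween(a, a + 1, sizeof(short));
    return a[steps];
}
```

Measured in int-sized steps the same distance is 1, which is why the cast to short* changes the result of the subtraction.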

=========================================================

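Given 4 billion distinct, unsorted unsigned int values and then a few more numbers, how do you quickly determine whether those few numbers are among the 4 billion?

Answer: use a bitmap. Marking the 4 billion distinct numbers takes 512 MB of memory, because 512 MB contains 4.2949673 × 10⁹ bits (2^32), one for each of the roughly 4.3 billion possible values of a 32-bit unsigned int. First read through the 4 billion numbers and set the bit corresponding to each one to 1; a query then only has to check whether the corresponding bit is 1.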
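
A bitmap of this kind might look like the sketch below. The class name Bitmap and its interface are illustrative assumptions. The universe size is a constructor parameter so the idea can be tried on a small range; passing 2^32 gives the 512 MB table described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class Bitmap
{
public:
    // universe: number of distinct values to cover, i.e. values 0 .. universe-1.
    explicit Bitmap(std::uint64_t universe)
        : words_(static_cast<std::size_t>((universe + 63) / 64), 0)
    {
    }

    // Marks v as present. v must be smaller than the universe size.
    void set(std::uint32_t v)
    {
        words_[v >> 6] |= (std::uint64_t(1) << (v & 63));
    }

    // Returns true if v was marked earlier.
    bool test(std::uint32_t v) const
    {
        return ((words_[v >> 6] >> (v & 63)) & 1u) != 0;
    }

    // Memory used by the bit table, in bytes.
    std::size_t bytes() const { return words_.size() * sizeof(std::uint64_t); }

private:
    std::vector<std::uint64_t> words_;  // 64 flags per word
};
```

Each value costs exactly one bit, and both set and test are O(1), so the lookup for the extra numbers takes constant time per number once the table has been filled.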

==============================================================

#####################################################################

http://topic.csdn.net/u/20081029/22/C8FE34C1-25AB-4B94-986E-4C2FD4CAA664.html

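1. Design a program for a Rubik's cube (six faces).
2. There are ten million text messages, with duplicates, stored in a text file, one message per line. In 5 minutes, find the 10 messages that are repeated most often.
3. You have 10,000 bookmarked URLs. Given one URL, how do you find the similar URLs? (The interviewer did not explain what "similar" means.)

1. Unfold the cube into six squares and define six structures, each holding 9 cells and a face number; every cell carries a color tag.
Number the squares in the unfolded layout according to their adjacency. Every square has four functions: turn left, turn right, turn up and turn down.
Because of the adjacency, each operation triggers a related operation on the neighboring faces. For example, a left turn of one face calls the left turn of the face adjacent on its right, which
means that elements 0, 1 and 2 of the left neighbor are swapped with the current face. This recurses until all faces have been exchanged.

2. Build a red-black tree a. Traverse the messages and compute the MD5 of each one, then look each MD5 up in a: if the key already exists, increment its value by 1;
otherwise set it to 1. After the traversal, take the 10 entries with the largest values. The original post puts the cost of this last step at 10 lg n, but the tree is ordered by MD5 key rather than by count, so selecting the top 10 has to visit every distinct entry.

What this finds is still only an MD5 value, not the message itself.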
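
The counting idea can be sketched as follows. Note the substitutions: this sketch uses std::unordered_map keyed by the message text itself instead of a red-black tree keyed by MD5, which also sidesteps the problem of getting only a digest back. The function name countTopK and the Entry type are hypothetical, and the messages are assumed to be already loaded into memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::size_t> Entry;  // (message, occurrence count)

// Returns the k most frequent messages, most frequent first.
std::vector<Entry> countTopK(const std::vector<std::string>& messages, std::size_t k)
{
    // Pass 1: count every distinct message.
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& m : messages)
    {
        ++counts[m];
    }

    // Pass 2: move only the k largest counts to the front.
    std::vector<Entry> entries(counts.begin(), counts.end());
    k = std::min(k, entries.size());
    std::partial_sort(entries.begin(), entries.begin() + k, entries.end(),
                      [](const Entry& x, const Entry& y) {
                          if (x.second != y.second) return x.second > y.second;
                          return x.first < y.first;  // ties: alphabetical order
                      });
    entries.resize(k);
    return entries;
}
```

Counting is expected O(n) over all messages, and the selection step costs O(d log k) for d distinct messages, so the whole job is dominated by reading the file.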

=============================

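For question 2 you can read the file line by line, compute the MD5 of each line, and store it in a database together with a count field:
MD5 value - message text - count
Every line read becomes one update to the database; at the end, the ten rows with the largest count are the result.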

================================

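1. Message length is bounded; in China, for example, a message is 0-140 bytes long.
2. The question mentions no memory limit, so assume memory is sufficient (in the worst case the algorithm below needs a little over 1 GB).
3. Create an array of 140 multimaps (empty messages can be handled separately as a special case); the multimap at index i corresponds to strings of length i. The key is the hash of the string, and the value is the string together with its occurrence count.
4. Traverse the messages and process each one according to its length; I won't go into the details.
5. For each multimap, find the top 10 strings by occurrence count (there may be fewer than 10); heap sort can be used, with complexity O(n*logn).
6. From the set of all strings found in step 5, find the 10 with the highest occurrence counts.

(The question itself seems slightly flawed: the most frequent messages may not number exactly 10. For example, 9 strings might each appear 10 times and 2 strings each appear 9 times, with all others appearing fewer than 9 times.)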
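
The length-bucket scheme can be sketched like this. It is an illustration under assumptions, not the poster's code: it uses one std::unordered_map per length instead of a multimap keyed by hash, a bounded min-heap instead of a full heap sort, and the names offerCandidate and topKByLength are hypothetical. Messages longer than maxLen are skipped, and ties at the cut-off are broken arbitrarily, which is exactly the ambiguity pointed out in the remark above.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::string> Scored;  // (count, message)
typedef std::priority_queue<Scored, std::vector<Scored>, std::greater<Scored> > MinHeap;

// Offers one candidate to a min-heap that never holds more than k entries.
void offerCandidate(MinHeap& heap, const Scored& s, std::size_t k)
{
    if (heap.size() < k)
    {
        heap.push(s);
    }
    else if (!heap.empty() && heap.top() < s)
    {
        heap.pop();  // drop the current weakest entry
        heap.push(s);
    }
}

std::vector<Scored> topKByLength(const std::vector<std::string>& messages,
                                 std::size_t maxLen, std::size_t k)
{
    // One counting table per message length (index 0 holds empty messages).
    std::vector<std::unordered_map<std::string, std::size_t> > buckets(maxLen + 1);
    for (const std::string& m : messages)
    {
        if (m.size() <= maxLen) ++buckets[m.size()][m];
    }

    // Take the top k of every bucket, then the top k of those winners.
    MinHeap best;
    for (const auto& bucket : buckets)
    {
        MinHeap local;
        for (const auto& kv : bucket) offerCandidate(local, Scored(kv.second, kv.first), k);
        for (; !local.empty(); local.pop()) offerCandidate(best, local.top(), k);
    }

    // The heap pops the smallest first, so fill the result from the back.
    std::vector<Scored> result(best.size());
    for (std::size_t i = result.size(); i-- > 0; best.pop()) result[i] = best.top();
    return result;  // highest count first
}
```

Splitting by length keeps each table small, and because each bucket only contributes its own k winners, the buckets could be processed independently.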

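[Original] Let me offer an answer; I hope I'm not too late to the thread.
Question 1: it is just a matter of arrays and the data transformations that happen on moves up, down, left and right.
Question 3: an application of regular expressions, plus some questions of approach.

The focus is question 2. My solution is as follows:
Approach: because the data set is large and storage is constrained, use bisection wherever possible, then hash/binary structures, and avoid loops as far as possible.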

------------------------------ Logic & Steps ------------------------------
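/*
* 1. The 10 million SMS records are stored in a [text file on disk], so let n = 10,000,000 (10 million)
* 2. A message is known to be 0 - 70 characters long (at most 70 letters, 35 Chinese characters), so let ArrayList <ArrayList> SMS[70] (70 distinct lengths from 1 to 70; empty messages are counted separately)
*
* 3. Take the [last bit] of the (first) character and build for it -> ArrayList <ArrayList> bitFirstChar (values 0, 1)
* 4. Take the [last bit] of the (last) character and build for it -> ArrayList <ArrayList> bitLastChar (values 0, 1)
* 5. Take the [last bit] of the character at position ((length+1) divided by 2), i.e. (i+1)/2, and build for it -> ArrayList <Dictionary> bitMidChar (values 0, 1)
*
* 6. Examine the (first) character [byte] and build for it -> Dictionary hashFirstCharacter <char, Dictionary> (same first character grouped together)
* 7. Examine the (last) character [byte] and build for it -> Dictionary hashLastCharacter <char, Dictionary> (same last character grouped together)
* 8. Examine the character at position ((length+1) divided by 2) [byte] and build for it -> Dictionary hashMidCharacter <char, Dictionary> (same middle character grouped together)
*
* 9. Compare the full text [String] and build for it -> Dictionary hashSMS <String, long> (the final count)
*
* 10. Maintain a global Dictionary <String, long> topSMS. Either, on every update, compare the String with the smallest count in topSMS[i] against the freshly updated hashSMS[String] and replace it if it is smaller (otherwise do nothing), or check only once at the end and take the top ten. (I have not worked out which of the two costs less time.)
*/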
------------------------------ Logic & Steps ------------------------------

------------------------------ Possible Solution Code ------------------------------
/*
* TopMostSMSReader Pseudo Code (C# version)
* This is a demo for reading the most frequent SMS (or similar) strings from files. Parts of the code are pseudocode.
*
* Author: Leemax Li
* Created: 2008.11.03
* MSN: leemax@live.com ; QQ: 735291192
* Email: 1850018@qq.com
* Last Modify: 2008.11.03 by Leemax Li
*/
public class LeemaxTopMostSMSReader : IDisposable {
// global variables
private int _fetchTotal;
private Dictionary <String, long> _topSMS;
private long _topTenTotal, _emptySMSCount;
private long _totalCount;
private bool _isUnderGo;

// constructor
public LeemaxTopMostSMSReader() {
_fetchTotal = 10; // default number of sms fetched
_topSMS = new Dictionary <String, long>();
_topTenTotal = 0;
_emptySMSCount = 0;
_totalCount = 0;
_isUnderGo = false; // lock flag, preserved for further use?
}

// main entry
public Dictionary <String, long> CheckNow(String fileName, long maxLine) {
// local variables
FileHandle fileSrc = open(fileName);
String nowSMS = "";
long currentCount;
Dictionary <String, long> hashSMS;
Dictionary <char, Dictionary> hashMidCharacter;
// ..... initialize all members mentioned above
// for (bit series) ... for(character series) ... for(hashSMS) ...
// ...........................................
// ...........................................

try {
// TryLock(this); // signal sync....
_isUnderGo = true;
while ((nowSMS = fileSrc.ReadNextLine()) != null) {
_totalCount++; // total counter increment

// condition checks, return if already no valuable candidates
// if need precise number, just block these checks
if (maxLine > 0 && _topTenTotal * 10 > maxLine) return _topSMS;

// an empty sms string
if (nowSMS == EMPTY) {
_emptySMSCount++;
currentCount = _emptySMSCount;
//CheckTopSMS(___EMPTY_, currentCount); // low efficiency
continue;
}
if (SMS[nowSMS.length].bitFirstChar[0 or 1].bitLastChar[0 or 1].bitMidChar[0 or 1].hashFirstCharacter[nowSMS[0]].hashLastCharacter[nowSMS[nowSMS.length - 1]].hashMidCharacter[nowSMS[(nowSMS.length + 1) / 2]].hashSMS.HasKey(nowSMS))
{
SMS[nowSMS.length].bitFirstChar[0 or 1].bitLastChar[0 or 1].bitMidChar[0 or 1].hashFirstCharacter[nowSMS[0]].hashLastCharacter[nowSMS[nowSMS.length - 1]].hashMidCharacter[nowSMS[(nowSMS.length + 1) / 2]].hashSMS[nowSMS].Value += 1;
}
else
{
SMS[nowSMS.length].bitFirstChar[0 or 1].bitLastChar[0 or 1].bitMidChar[0 or 1].hashFirstCharacter[nowSMS[0]].hashLastCharacter[nowSMS[nowSMS.length - 1]].hashMidCharacter[nowSMS[(nowSMS.length + 1) / 2]].hashSMS.Add(nowSMS, 1);
}
currentCount = SMS[nowSMS.length].bitFirstChar[0 or 1].bitLastChar[0 or 1].bitMidChar[0 or 1].hashFirstCharacter[nowSMS[0]].hashLastCharacter[nowSMS[nowSMS.length - 1]].hashMidCharacter[nowSMS[(nowSMS.length + 1) / 2]].hashSMS[nowSMS].Value;
//CheckTopSMS(nowSMS, currentCount); // low efficiency
}
CheckOutTops();
}
catch (Exception ex) {
// blah...blah...blah...
}
finally {
// UnLock(this); // signal sync...
_isUnderGo = false;
}

return _topSMS;
}

private void CheckOutTops() {
// blah...blah...blah...blah...
}
// this is not good, should check out after all cleared out
/*
private void CheckTopSMS(String newString, long newCount) {
long currentSmallest = _topSMS[0].Value;
String currentString = _topSMS.Keys[0];
foreach (KeyValuePair <String, long> currentPair in _topSMS) {
if (currentPair.Value < currentSmallest) {
currentSmallest = currentPair.Value;
currentString = currentPair.Key;
if (currentString == newString) {
_topSMS[newString].Value = newCount;
return;
}
}
}
// no match found, check the smallest
if (currentSmallest < newCount) {
_topSMS.Remove(currentString);
_topSMS.Add(newString, newCount);
}
}
*/

// implementation of IDisposable
public void Dispose() {
// dispose all members here...
// delete .... delete .... delete ....
}

// attributes and others
public int ReturnLength
{
get { return _fetchTotal; }
set { if(!_isUnderGo) _fetchTotal = value; else throw new Exception("....."); }
}
public long CurrentLine
{
get { return _totalCount; }
}
public long CurrentEmptySMSCount
{
get { return _emptySMSCount; }
}
}
------------------------------ Possible Solution Code ------------------------------

------------------------------ analysis ------------------------------
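/*
* Following the design in Logic & Steps
* 1. N0 = 10,000,000 (10 million) (disk read time)
* 2. SMS[70] (no traversal needed, N1 = 10 million / 71 < 140846)
*
* 3. Take the [last bit] of the (first) character, bitFirstChar (CPU time, no traversal needed, N2 = N1/2 < 70423)
* 4. Take the [last bit] of the (last) character (CPU time, no traversal needed, N3 = N2/2 < 35212)
* 5. Take the [last bit] of the character at position ((length+1) divided by 2), i.e. (i+1)/2 (CPU time, no traversal needed, N4 = N3/2 < 17606)
*
* 6. Examine the (first) character [byte], hashFirstCharacter (same first character grouped together) (let M = N4)
* 7. Examine the (last) character [byte], hashLastCharacter (same last character grouped together) (let X = N4/ln(N4))
* 8. Examine the character at position ((length+1) divided by 2) [byte], hashMidCharacter (same middle character grouped together) (let Y = N4/(ln(N4)*ln(X)))
*
* 9. Compare the full text [String], hashSMS (the final count) (let Z = N4/(ln(N4)*ln(X)*ln(Y)))
*
* 10. Maintain a global topSMS and check it by traversal (checking on every update costs N0*ln(FetchCount) operations; a single final traversal costs <= N4)
*
* Time cost: O(Program): lnM*lnX*lnY*lnZ
* (the logarithms below are natural logarithms, and the value used for M equals N / 140)

N = 10000000

M = 71428.571428571428571428571428571
lnM = 11.176453228349015489585363863205

X = 6390.9873704292188700245451792576
lnX = 8.7626440534989345490873775956615

Y = 729.34462833478780320337939734221
lnY = 6.5921463615017245913923031187162

Z = 110.63841552338339979745756356137
lnZ = 4.7062673662433216260688480137905

Total Cost = lnM * lnX * lnY * lnZ
= 3038.3836675663354054154917591655
*/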
------------------------------ analysis ------------------------------

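I thought about this for half an hour and then spent ages writing it up. If it turns out I was just feeling pleased with myself, please go easy on me, considering I spent a whole afternoon (6 hours) on it.
Discussion is welcome.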

======================================

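The other questions don't interest me much, so I'll only try the second one, namely: "There are ten million text messages, with duplicates, stored in a text file, one message per line. In 5 minutes, find the 10 messages that are repeated most often."

First, at current message lengths ten million messages will not exceed 700 MB (about 350 MB on average), so a memory-mapped file is a good fit. The file can be mapped in one go (for larger data sets it can be mapped in segments). Since this avoids frequent file I/O calls and frequent small memory allocations, it greatly speeds up loading the data.
Second, group the messages by the byte value of their i-th character (i from 0 to 70), which amounts to building a tree: i is the depth in the tree and also the index of the character within the message.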
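// Tree node definition
// (BYTE and DWORD are the Windows typedefs from <windows.h>: unsigned char and a 32-bit unsigned integer)
struct TNode
{
    const BYTE* pText;       // points straight into the memory-mapped file; BYTE rather than char avoids sign problems when used as an index
    DWORD dwCount;           // counter: how many identical messages end at this node
    TNode* ChildNodes[256];  // child nodes; a byte value never exceeds 255, so there can be at most 256 children

    TNode() : pText(NULL), dwCount(0)
    {
        // initialize members
        for (int i = 0; i < 256; i++) ChildNodes[i] = NULL;
    }
    ~TNode()
    {
        // release resources
        for (int i = 0; i < 256; i++) delete ChildNodes[i];
    }
};

// pText points straight into the memory-mapped file (BYTE rather than char, to avoid sign problems)
// nIndex is the index of the current character
// Assumes every message is non-empty and terminated by '\0'
void CreateNode(TNode* pNode, const BYTE* pText, int nIndex)
{
    if(pNode->ChildNodes[pText[nIndex]] == NULL)
    {// create the child node if it does not exist yet; the TNode constructor does the initialization
     // for convenience, the new node could also be appended to an array here
        pNode->ChildNodes[pText[nIndex]] = new TNode;
    }

    if(pText[nIndex+1] == '\0')
    {// this message is complete: increment the counter and remember the message text
        pNode->ChildNodes[pText[nIndex]]->dwCount++;
        pNode->ChildNodes[pText[nIndex]]->pText = pText;
    }
    else //if(pText[nIndex] != '\0')
    {// not finished yet: go on to the next level
        CreateNode(pNode->ChildNodes[pText[nIndex]], pText, nIndex+1);
    }
}

// Build the tree from the root; pTexts is the array of messages, dwCount is the number of messages (ten million here)
void CreateRootNode(const BYTE** pTexts, DWORD dwCount)
{
    TNode RootNode;
    for(DWORD dwIndex=0;dwIndex<dwCount;dwIndex++)
    {
        CreateNode(&RootNode, pTexts[dwIndex], 0);
    }
    // sort all nodes by dwCount
    // code omitted...

    // take the first 10 nodes and display the result
    // code omitted...

}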

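This may look complicated, but the whole point is to reduce the number of comparisons. I think this short piece of code is enough to make the idea clear, so I won't say more.
Finally, sort the nodes by dwCount and take the first 10.

I think the problem comes down to two things: loading the content and comparing the messages. Memory-mapping the file solves the loading performance problem (no file I/O function calls are needed, and no small block of memory has to be allocated for every message read), and the tree effectively reduces the number of comparisons. That is the basic idea; with more time, further optimizations could be built on top of it, and the result should be good.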
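
For readers who want to try the tree idea without Windows types or a memory-mapped file, a portable sketch follows. It is an illustration under assumptions rather than the poster's implementation: children are kept in a std::map instead of a 256-slot array to save memory, message text is rebuilt from the path instead of being stored as a pointer, and the names TrieNode, trieInsert, trieCollect and trieTopK are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct TrieNode
{
    std::size_t count = 0;  // number of messages that end exactly at this node
    std::map<unsigned char, std::unique_ptr<TrieNode> > children;
};

// Walks (and creates) one node per byte of text, then counts the message at the last node.
void trieInsert(TrieNode& root, const std::string& text)
{
    TrieNode* node = &root;
    for (unsigned char ch : text)
    {
        std::unique_ptr<TrieNode>& child = node->children[ch];
        if (!child) child.reset(new TrieNode());
        node = child.get();
    }
    ++node->count;
}

// Depth-first walk that rebuilds each stored message from the path to its node.
void trieCollect(const TrieNode& node, std::string& prefix,
                 std::vector<std::pair<std::size_t, std::string> >& out)
{
    if (node.count > 0) out.push_back(std::make_pair(node.count, prefix));
    for (const auto& kv : node.children)
    {
        prefix.push_back(static_cast<char>(kv.first));
        trieCollect(*kv.second, prefix, out);
        prefix.pop_back();
    }
}

// Returns up to k (count, message) pairs, highest count first.
std::vector<std::pair<std::size_t, std::string> > trieTopK(const TrieNode& root, std::size_t k)
{
    std::vector<std::pair<std::size_t, std::string> > all;
    std::string prefix;
    trieCollect(root, prefix, all);
    std::sort(all.rbegin(), all.rend());  // sorting the reversed range yields descending order
    if (all.size() > k) all.resize(k);
    return all;
}
```

Inserting a message costs one node step per byte, so no two full messages are ever compared directly; the recursion depth in trieCollect is bounded by the longest message.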
