320SC Programming

最新推荐文章于 2024-07-27 16:11:00 发布

tyytareyou3

最新推荐文章于 2024-07-27 16:11:00 发布

阅读量12

点赞数

文章标签：人工智能

本文链接：https://blog.csdn.net/tyytareyou3/article/details/134071361

版权

Computer Science 320SC – (2023)
Programming Assignment 6
Due: Oct 29 2023 (11:59pm)
Academic Integrity
Before attempting to solve the assignment, please read the message below very carefully.
As described on https://academicintegrity.cs.auckland.ac.nz/, you must NOT
Use all or part of another student’s solution to the assignment. Changing variable names or
substituting words in a sentence does not make it your solution.
Allow someone else to complete all or part of the assignment for you.
Solicit answers for the assignment on contract-cheating websites such as Chegg.com and
Bartleby.com.
Use code from Internet sources such as StackOverflow or generative-AI tools such as ChatGPT. You are encouraged to learn from Internet sources and tools, but you need to come up
with your own implementation of the code to show your learning.
Allow another student to copy all or part of your solution to the assignment.
Do all or part of an assignment for someone else.
Share code that can lead to the solution of an assignment.
Post the assignment anywhere online or share it with anyone else. The assignment material
is copyrighted and sharing or posting them online violates our copyright.
Post your solution online on public websites. Your online solutions will encourage other
students to copy your solution. Private GitHub repositories and other private online storage drives are acceptable, and you can also share your solution privately with prospective
employers.
Reuse your own work unless discussed otherwise with the lecturer.
Leave your computers, devices, and belongings unattended — you must secure these at all
times to prevent anyone having access to your assessments or solutions.
1
Requirements
This 6th assignment lets you get familiar with randomized algorithm design and development. It is
worth 5% of your total course marks. We would like you to implement a Bloom Filter to simulate a
email spam filtering.
An excessive number of submissions (over 10) for a particular problem will accrue a 20% penalty per
that problem if you eventually solve it. Therefore, please write a bruteforce algorithm and test
your randomized algorithm with your own generated inputs at scale before submitting to
the automated marker.
We only accept Python programs that use built-in packages (i.e. packages that do not require pip
install).
1 Email spam filtering
1.1 Problem description
You are implementing a Bloom filter for email spam filtering for the University of Auckland. Given
that you know the set A of n trusted email addresses, e.g. ninh.pham@auckland.ac.nz. If the email
comes from one of these trusted addresses, it is not a spam. Otherwise, it will be a spam. Since the
server memory is very limited, you can not keep the list of trusted addresses on its memory (if an email
address requires 25 bytes on average and you have 1 billion trusted emails, it would take 25 GB to store
A in the memory).
Due to the limited resource of automarker, instead of using a string, we use a random integer in [0 . . . 2
31]
to present an email address. Also, to simpify the task, please use the following code to generate the list
of n integers that represent for n email addresses. Note that you have to use random.seed(320) to
ensure that you generate the same set of integers as automarker.
You would need to write a Python script to implement a Bloom filter given the setting of m = 8n bits and
different numbers of hash functions k = {2, 4, 6, 8}. The used hash function is: f(x) = (a·x+b) mod m
where a and b are generated using the following code:
import random
random.seed(320)
n = 100
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
for i in range(n):
x = random.randint(0, 2**31)
# Construct several Bloom filters here
2
# Testing here
for line in sys.stdin:
y = int(line)
# Check y is in the set S here
Note that you must not call x = random.randint(0, 2**31) several times since it will generate different sets S.
Given k = i, you will use i hash functions, from f1 to fi(x) = (a[i − 1] · x + b[i − 1]) mod m. For
example, given k = 2, you will use
f1(x) = (a[0] · x + b[0]) mod m = (x + 8) mod m
and
f2(x) = (a[1] · x + b[1]) mod m = (2x + 7) mod m.
1.2 Test case description
Your test will be a sequence of 2000 integers, each value per line corresponding to the email address,
containing non-spam and spam emails. Your output will be 4 integers, each per line, that count the
number of detected spam emails for each value of k = {2, 4, 6, 8}.
WX：codehelp