INFS7410 Project - Part 2
version 1.1
Preamble
The due date for this assignment is 27 October 2023 1600 Eastern Australia Standard Time.
This part of the project is worth 20% of the overall mark for INFS7410 (part 1 + part 2 = 40%). A detailed
marking sheet for this assignment is provided alongside this notebook. The project is to be completed
individually.
We recommend that you make an early start on this assignment and proceed in steps. There are several
activities you may have already tackled, including setting up the pipeline, manipulating the queries,
implementing some retrieval functions, and performing evaluation and analysis. Most of the assignment relies
on knowledge and code you should have already encountered in the computer practicals; however, there are
some hidden challenges here and there that may require some time to solve.
Aim
Project aim: The aim of this project is for you to implement several neural information retrieval methods,
evaluate them and compare them in the context of a multi-stage ranking pipeline.
The specific objectives of Part 2 are to:
Set up your infrastructure to index the collection and evaluate queries.
Implement neural information retrieval models (inference only).
Examine your ability to perform evaluation and analysis when different neural models are used.
The Information Retrieval Task: Web Passage Ranking
As in part 1 of the project, in part 2 we will consider the problem of open-domain passage ranking in answer to
web queries. In this context, users pose queries to the search engine and expect answers in the form of a
ranked list of passages (maximum 1000 passages to be retrieved).
The provided queries are actual queries submitted to the Microsoft Bing search engine. There are approximately
8.8 million passages in the collection, and the goal is to rank them based on their relevance to the queries.
What we provide you with:
Files from practical
A collection of 8.8 million text passages extracted from web pages ( collection.tsv — provided in Week
1).
PyTorch file for the ANCE model (refer to week10-prac ).
Standard DPR model; use BertModel.from_pretrained("ielabgroup/StandardBERTDR").eval() to load it.
Extra files for this project
A query dev file that contains 30 queries for you to perform retrieval experiments
( data/dev_queries.tsv ).
A query dev file that contains the same 30 queries (same query ids as the previous file), but with typos in the
query text ( data/dev_typo_queries.tsv ).
A qrel file that contains relevance judgements, which can be used to tune your methods on the dev
queries ( data/dev.qrels ).
A leaderboard system for you to evaluate how well your system performs.
A test query file that contains 60 queries for you to generate run files to submit to the leaderboard
( data/test_queries.tsv ).
This jupyter notebook, in which you will include your implementation, evaluation and report.
An hdf5 file that contains TILDEv2 pre-computed term weights for the collection. Download from this link.
Typo-aware DPR model; use BertModel.from_pretrained("ielabgroup/StandardBERT-DRaug").eval() to load it.
Put this notebook and the provided files under the same directory.
What you need to produce
You need to produce:
Correct implementations of the methods required by this project's specifications.
An explanation of the retrieval methods used, including the formulas that represent the models you
implemented and the code that implements that formula, an explanation of the evaluation settings
followed, and a discussion of the findings. Please refer to the marking sheet to understand how each of
these requirements is graded.
You are required to produce both of these within this jupyter notebook.
Required methods to implement
In Part 2 of the project, you are required to implement the following retrieval methods as two-stage ranking
pipelines (BM25 + one dense retriever). All implementations should be based on your own code (except for BM25,
where you can use the Pyserini built-in SimpleSearcher).
1. ANCE Dense Retriever: Use ANCE to re-rank BM25 top-k documents. See the practical in Week 10 for
background information.
2. Standard DPR Dense Retriever: Use standard DPR to re-rank BM25 top-k documents. See the practical in
Week 10 for background information.
3. Typo-aware DPR Dense Retriever: the typo-aware DPR is a DPR model fine-tuned with typos augmented
into its training samples. Use this model (provided in the project) to re-rank BM25 top-k
documents; inference is the same as for the standard DPR Dense Retriever.
4. TILDEv2: Use TILDEv2 to re-rank BM25 top-k documents. See the practical in Week 10 for background
information.
For TILDEv2, unlike what you did in the practical, we provide pre-computed term weights for the whole
collection (for more details, see the Initial packages and functions cell). This means re-ranking with TILDEv2
can be very fast. Use this advantage to trade off effectiveness and efficiency in your ranking
pipeline implementation.
You should have already attempted many of these implementations above as part of the computer prac
exercises.
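For reference, below is a minimal sketch of such a two-stage pipeline: BM25 via Pyserini's SimpleSearcher, followed by dense re-ranking with the standard DPR checkpoint named above. This is an illustration of the structure, not a definitive implementation: the prebuilt index name, the collection lookup from passage id to text, and the use of the [CLS] vector as the dense representation are assumptions that you should adapt to match what you did in the Week 10 practical.

import torch
from pyserini.search import SimpleSearcher
from transformers import BertModel, BertTokenizer

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')  # assumption: prebuilt MS MARCO passage index
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained("ielabgroup/StandardBERTDR").eval()

def encode(texts):
    # Encode texts and take the [CLS] vector as the dense representation (assumption; follow the prac).
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]

def rerank(query, k=100):
    # First stage: BM25 retrieves the top-k candidates.
    hits = searcher.search(query, k=k)
    # `collection` is a hypothetical dict mapping passage ids to passage text.
    passages = [collection[hit.docid] for hit in hits]
    # Second stage: dot-product similarity between query and passage embeddings.
    scores = (encode([query]) @ encode(passages).T).squeeze(0)
    order = torch.argsort(scores, descending=True)
    return [(hits[i].docid, scores[i].item()) for i in order.tolist()]

Timing the rerank call for each query (e.g. with time.perf_counter() before and after) and averaging over the dev queries gives the average query latency requested in the evaluation section below.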
Required evaluation to perform
In Part 2 of the project, you are required to perform the following evaluation: we consider two types of queries,
one of which contains typos (i.e. typographical mistakes, like writing iformation for information ), and
another with the typos resolved. An important aspect of the evaluation in this project is to compare the
retrieval behaviour of search methods on queries with and without typos (note this is the same as in project part
1).
1. For all methods, evaluate their performance on data/dev_typo_queries.tsv (queries with typos) and
data/dev_queries.tsv (the same queries, but with typos corrected), using data/dev.qrels with
four evaluation metrics (see below).
2. Report every method's effectiveness and efficiency (average query latency) on
data/dev_queries.tsv (no need for the typo queries), along with the corresponding cut-off k used for re-ranking,
in a table. Perform statistical significance analysis across the results of the methods and report it
in the table.
3. Produce a gain-loss plot that compares the most and the least effective of the four required methods
above in terms of nDCG@10 on data/dev_typo_queries.tsv (a sketch of such a plot is given after this list).
4. Comment on trends and differences observed when comparing your findings.
Does the typo-aware DPR model outperform the others on the data/dev_typo_queries.tsv
queries?
When evaluating the data/dev_queries.tsv queries, is there any indication that this model loses
its effectiveness?
Is this gain/loss statistically significant? (remember to perform a t-test as well for this task).
5. (optional) Submit your runs on data/test_queries.tsv , generated with your implemented methods
from the dev sets, to the leaderboard system (not counted in your mark for this assignment, but the
top-ranked student on the leaderboard can request a recommendation letter from Professor Guido
Zuccon). The submission link is https://infs7410.uqcloud.net/leaderboard/; for other instructions, refer to
Project 1.
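For point 3, a gain-loss plot can be produced from the per-query nDCG@10 values of the two methods being compared. The sketch below assumes two hypothetical dicts, ndcg_best and ndcg_worst, mapping each query id to its nDCG@10 score on the typo queries (per-query values can be obtained with pytrec_eval, as shown further below):

import matplotlib.pyplot as plt

# Per-query difference in nDCG@10 between the most and least effective methods,
# sorted so that losses appear on one side of the plot and gains on the other.
qids = sorted(ndcg_best.keys())
diffs = sorted(ndcg_best[q] - ndcg_worst[q] for q in qids)

plt.bar(range(len(diffs)), diffs)
plt.axhline(0, color='black', linewidth=0.8)
plt.xlabel('queries (sorted by difference)')
plt.ylabel('nDCG@10 difference (best - worst)')
plt.title('Gain-loss plot on dev_typo_queries')
plt.show()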
Regarding evaluation measures, evaluate the retrieval methods with respect to nDCG at 10 ( ndcg_cut_10 ),
reciprocal rank at 1000 ( recip_rank ), MAP ( map ) and Recall at 1000 ( recall_1000 ).
For all statistical significance analysis, use a paired t-test and distinguish between p<0.05 and p<0.01.
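As an illustration of how the metrics and the significance test can be computed, here is a hedged sketch using pytrec_eval and scipy. The load_qrels / load_run helpers and the run file names are hypothetical; the helpers should return the nested-dict format pytrec_eval expects ({qid: {docid: relevance}} for qrels and {qid: {docid: score}} for runs).

import pytrec_eval
from scipy import stats

# Passing the base measure names makes pytrec_eval compute the default cut-offs,
# which include ndcg_cut_10 and recall_1000.
measures = {'ndcg_cut', 'recip_rank', 'map', 'recall'}
qrels = load_qrels('data/dev.qrels')                             # hypothetical helper
evaluator = pytrec_eval.RelevanceEvaluator(qrels, measures)

results_a = evaluator.evaluate(load_run('run_bm25_ance.txt'))    # hypothetical run files
results_b = evaluator.evaluate(load_run('run_bm25_dpr.txt'))

# Paired t-test on per-query nDCG@10, paired by query id.
qids = sorted(results_a.keys())
ndcg_a = [results_a[q]['ndcg_cut_10'] for q in qids]
ndcg_b = [results_b[q]['ndcg_cut_10'] for q in qids]
t_stat, p_value = stats.ttest_rel(ndcg_a, ndcg_b)
print(f"mean nDCG@10: {sum(ndcg_a)/len(ndcg_a):.4f} vs {sum(ndcg_b)/len(ndcg_b):.4f}, p = {p_value:.4f}")

Repeat the test for each metric and each pair of methods, and mark results with p<0.05 and p<0.01 separately in your table.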
How to submit
You will have to submit one file:
1. A zip file containing this notebook (.ipynb) and this notebook exported as a PDF report. We should be able
to execute the code. Remember to include all your discussion and analysis in this notebook and report,
not in a separate file.
Tip: to print as a PDF, you can first save and export the notebook as HTML in Jupyter and then use the
browser's print function to save it as a PDF.
2. The file needs to be submitted via the link on the INFS7410 Blackboard site by 27 October 2023, 1600 Eastern
Australia Standard Time, unless you have been given an extension (according to UQ policy).
Initial packages and functions
Unlike the Week 10 practical, where contextualized term weights were computed with TILDEv2 on the fly, in
this project we provide an hdf5 file that contains pre-computed term weights for all the passages in the
collection.
First, pip install the h5py library:
!pip install h5py
Collecting h5py
Downloading h5py-3.4.0-cp37-cp37m-macosx_10_9_x86_64.whl (2.9 MB)
Collecting cached-property
Using cached cached_property-1.5.2-py2.py3-none-any.whl (7.6 kB)
Requirement already satisfied: numpy>=1.14.5 in /Users/s4416495/anaconda3/envs/infs7410/lib/pyth
on3.7/site-packages (from h5py) (1.21.1)
Installing collected packages: cached-property, h5py
Successfully installed cached-property-1.5.2 h5py-3.4.0
The following cell gives you an example of how to use the file to access token weights and their corresponding
token ids given a document id.
Note: make sure you have already downloaded the hdf5 file introduced above and placed it in a valid
location
In [2]:
import h5py
from transformers import BertTokenizer

f = h5py.File("tildev2_weights.hdf5", 'r')
weights_file = f['documents'][:]  # load the whole hdf5 dataset into memory.

docid = 0
token_weights, token_ids = weights_file[docid]  # pre-computed weights and token ids for this passage.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
for token_id, weight in zip(token_ids.tolist(), token_weights):
    print(f"{tokenizer.decode([token_id])}: {weight}")
presence: 3.62109375
communication: 7.53515625
amid: 5.79296875
scientific: 6.140625
minds: 6.53515625
equally: 3.400390625
important: 6.296875
success: 7.19140625
manhattan: 9.015625
project: 5.45703125
scientific: 5.1640625
intellect: 7.328125
cloud: 6.1171875
hanging: 3.318359375
impressive: 6.5234375
achievement: 6.48828125
atomic: 8.421875
researchers: 4.9375
engineers: 6.203125
what: -1.1708984375
success: 6.421875
truly: 3.67578125
meant: 4.25
hundreds: 3.19140625
thousands: 2.98828125
innocent: 5.12890625
lives: 3.029296875
ob: 2.35546875
##lite: 1.427734375
##rated: 2.828125
importance: 7.96484375
purpose: 4.69140625
quiz: 3.28515625
scientists: 5.0390625
bomb: 3.7109375
genius: 3.8828125
development: 2.55859375
solving: 3.224609375
significance: 3.90625
successful: 5.0703125
intelligence: 5.35546875
solve: 2.751953125
effect: 1.2392578125
objective: 2.2265625
research: 1.953125
_: -2.36328125
accomplish: 2.759765625
brains: 4.046875
progress: 1.6943359375
scientist: 3.0234375
Note: these token_ids include stopword ids; remember to remove stopword ids from the query tokens.
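To make the use of the pre-computed weights concrete, below is a hedged sketch of TILDEv2 scoring for a single query-passage pair. It reuses the weights_file and tokenizer objects from the cell above and assumes a hypothetical set stop_ids of stopword token ids that you need to build yourself; the exact scoring should follow the Week 10 practical.

def tildev2_score(query, docid, stop_ids):
    # Tokenize the query and drop stopword token ids, as noted above.
    q_ids = tokenizer(query, add_special_tokens=False)['input_ids']
    q_ids = [t for t in q_ids if t not in stop_ids]

    token_weights, token_ids = weights_file[docid]
    # Keep the highest pre-computed weight observed for each token id in the passage.
    best = {}
    for t, w in zip(token_ids.tolist(), token_weights):
        if t not in best or w > best[t]:
            best[t] = w
    # Score = sum of the weights of query tokens that appear in the passage.
    return sum(best.get(t, 0.0) for t in q_ids)

Because the weights are pre-computed, scoring a BM25 candidate only involves lookups per query token, which is what makes a larger re-ranking cut-off k affordable for this method.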
# Import all your python libraries and put setup code here.
Double-click to edit this markdown cell and describe the first method you are going to implement, e.g., ANCE
# Put your implementation of methods here.
When you have described and provided implementations for each method, include a table with statistical
analysis here.
For convenience, you can use tools such as https://www.tablesgenerator.com/markdown_tables to generate the
table, or, if you are using pandas, convert dataframes to markdown with
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_markdown
