EBU6610 Information Systems Management

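These notes are compiled from the lecture handouts. Highlighted parts are points likely to be examined; they have appeared verbatim in past quizzes or final exams.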

Contents

Part 1

• Information and data bias:

• Distributed systems:

• Cloud computing:

• Cloud computing and virtualisation:

• Information theory:

Part 2

• Cryptography:

• Basic cryptographic process:

• Blockchain:

Part 3

• Web Security

• Information, metadata and semantic web:

Part 4

• XML (metadata) for application integration and data transfer:

• CSV and JSON:

• HTTP request/response and methods (GET, POST, etc.).

• RESTful API:

• AJAX

Block 1 Questions

Block 4 Questions


Part 1

• Information and data bias:

Information: data processed in a human consumable form

Data bias is systematic error introduced during the collection, processing or analysis of data that makes the data deviate from a true representation of the population or phenomenon under study. It can arise from sampling methods, data collection techniques, measurement errors or human biases. Data bias can significantly affect the accuracy and validity of results, leading to incorrect conclusions and flawed decision-making.

• Difference between data, information and knowledge.

Data: raw stuff

Information: data processed in a human consumable form

Knowledge: internalised information (what-you-know)/understanding of information

• Normatively correct vs descriptively correct.

DESCRIPTIVELY CORRECT – the suggestions statistically reflect the searches made by users.

• Information is not neutral:

• Selection bias.

Selection bias can occur in distributed systems if the nodes or resources selected for analysis are not representative of the entire system. This can result in misleading performance evaluations and inefficient resource allocation.

• Annotation bias.

A model may amplify bias if it is trained on biased data (e.g. male surgeons, racial stereotypes of anger/passivity). A model can be designed adversarially so that it becomes bad at predicting some correlated features (age, gender), e.g. so that it concentrates on how language works.

• Algorithms can amplify bias – especially AI!

• Distributed systems:

-Definition: networks of interconnected computers working together to achieve a common goal.

-Advantages of distributed systems: improved fault tolerance, scalability, and better performance due to parallel processing.

-Challenges of distributed systems: synchronisation, consistency, and fault tolerance issues.

• Different definitions: Any system that works across multiple computers.

• Thinking about distributed systems:

• Byzantine Generals Problem.

Messages may go missing.

Communication in distributed systems is difficult: messages fail / messages arrive out of order / messages are “corrupted”.

• Dining Philosophers.

Certain problems arise in distributed systems (deadlock, livelock, starvation). Example: n people, (2n-1) chopsticks.

• Dijkstra algorithm.

The Dijkstra solution is like having an order in which resources can be acquired to prevent deadlock.

• The “waiter” solution is like a central broker that uses “mutex locks” to keep control of resources.

• Which is more efficient depends on the situation.
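
The ordered-acquisition idea above can be sketched with Python threads. This is only an illustration under assumptions: the function name `dine`, the number of philosophers and the number of meals are made up for the example. Every philosopher always picks up the lower-numbered chopstick first, so a circular wait (deadlock) cannot form.

```python
import threading


def dine(n_philosophers=5, meals=100):
    """Run n philosophers; return how many meals each one finished."""
    chopsticks = [threading.Lock() for _ in range(n_philosophers)]
    eaten = [0] * n_philosophers

    def philosopher(i):
        left, right = i, (i + 1) % n_philosophers
        # Global ordering on resources: always take the lower index first.
        first, second = min(left, right), max(left, right)
        for _ in range(meals):
            with chopsticks[first]:
                with chopsticks[second]:
                    eaten[i] += 1  # "eat" while holding both chopsticks

    threads = [threading.Thread(target=philosopher, args=(i,))
               for i in range(n_philosophers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten
```

If every philosopher instead took the left chopstick first, all of them could hold one chopstick and wait forever for the other; the fixed order removes that possibility.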

- What are the positives and negatives of a distributed system?

Positives: • May have no central point of failure (if well designed).

• May scale as demand increases (if well designed).

• May simplify design by having units focused on a single task.

• May simplify reuse of components.

• May simplify sharing of resources.

Negatives: • May have many points of failure (if poorly designed).

• May scale worse than “monolithic” system (if poorly designed).

• May be hard to debug.

• May be hard to know the root cause of a failure.

• May be hard to know order of events.

• Parts of system may compete for resources (starvation, deadlock).

- What can go wrong with distributed systems? - Deadlock and resource starvation.

• Deadlock: each member of a group is waiting for another to release a resource (system completely stuck, no movement).

• Livelock: some members of group release and reacquire resources but make no “progress”.

• Resource starvation: one or more members of a group never make “progress” (in this case the program might still complete but it is inefficient).

• Cloud computing:

• What is cloud computing?

The delivery of computing resources over the internet, providing on-demand access to a shared pool of configurable resources.

• Popular computing model: run on virtual machines on servers in the cloud.

• Leadership consensus and Paxos algorithm.

Achieving consensus = a distributed system acting as one entity.

Consensus problem = getting nodes in a distributed system to agree on something (value, operation, etc.).

Paxos algorithm: choose a node as leader, select a different leader if the leader fails, and have the leader log the decisions.

• Raft consensus algorithm, and the roles involved.

Leader-based approach to consensus.

Roles: Leader / Candidate / Follower

Safety guarantees:

Election safety: at most one leader can be elected in a given term.

Uses a “heartbeat” mechanism: All nodes continually receive messages from the leader.

• Cloud computing and virtualisation:

• ‘Scale up’ vs ‘scale out’.

Scale up (vertical scaling) – buy a larger machine (more memory, faster CPU) to deal with the problem.

• Scale out (horizontal scaling) – buy more machines to deal with the problem. Split the load across machines (distributed computing).

• Building data centres:

• Physical requirements: size and types.

Physical requirements

-Computing needs:

Computing power (many CPU cores) // Bandwidth (high-capacity network connection) // Storage (content storage and backups).

-Datacentre needs:

-Power (lots of it) // “Uninterruptible” power supply (UPS), own generators // Cooling (power produces heat) // Physical security: theft of machines/data // Maintenance: replacement of failed parts or upgrades

-Large server and storage farms: 1000s of servers // many TBs or PBs of data

-Used by: enterprises for server applications // Internet companies // some of the biggest DCs are owned by Google, Facebook, etc.

-Used for: data processing // web sites // business apps

• Electricity and topography.

• Failure.

- How might cloud computing scale up to meet increased demand?

Buy a larger machine, plus the points listed above under “Building data centres”.

• Data centre loading:

• Cloud service models: SaaS, PaaS and IaaS.

Software as a Service (SaaS):

• Cloud provider sells access to software/applications hosted on their cloud to users as a service.

• E.g., customer relationship management, email, database, etc.

• Typically software accessed through web browser.

• Avoid costs of installation, maintenance, patches, etc.

Platform as a Service (PaaS):

• Cloud provider offers software platform (OS and tools with an API) that allows user to develop applications.

• E.g., Google’s App-Engine

• Avoid worrying about scalability of platform (i.e., often tied with ability to automatically scale out)

Infrastructure as a Service (IaaS):

• Provider offers raw computing, storage and network (i.e., hypervisor ability to install and connect virtual machine images).

• E.g., Amazon’s Elastic Compute Cloud (EC2)

• Avoid buying servers and estimating resource needs

• It is the user's responsibility to install and configure those VMs.

• Public, private and hybrid clouds.

Public cloud

• “Rent” your computer • Mega-scale infrastructure • Vast capability • Metered usage

Private cloud

• Enterprise owned or leased • Tight security control

Hybrid cloud

• Composition of two or more clouds

• Balancing cache, webserver and database components.

• Big data and map/reduce computation, e.g. Hadoop.

Used to obtain a computational result from a very large data set.
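
A minimal sketch of the map/reduce pattern, assuming a word-count task. This is plain Python that imitates the three phases on one machine; it is not the Hadoop API, and the function names are invented for the illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # In a real cluster each document (or chunk) would be mapped on a
    # different machine; here the map calls simply run one after another.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```

Because each map call only looks at its own document and each reduce call only looks at one key, both phases can be split across many machines (scale out).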

• Hypervisors and virtualisation.

Virtualisation in computing is allowing one environment to appear as another (decoupling from the physical computing resources).

Hardware virtualisation is where hardware can be used as if it is a virtual machine (VM).

• Virtual machine: decoupling from physical computing resources and allowing one environment to appear as another.

- What are the advantages of virtualisation?

Better resource usage: • Processors, RAM and disk space are partitioned among multiple VMs. • Many VMs on a single computer.

Performance isolation: • Each VM should run independently. • My VM cannot detect details of yours.

Reduced impact of failures: • Failure in one VM does not affect remaining elements.

• Utility computing and types of virtualisation, migration and replication.

• Hypervisors, lightweight containers and microkernels.

Hypervisor or Virtual Machine Monitor is responsible for managing the VMs on a single piece of hardware.

Lightweight containers: Containers are a “lighter” approach than full virtualisation. // Allow isolated environments (containers) within a single OS. // Containers are like full VMs but require fewer resources. // Less “isolation” because of this resource sharing. // Docker is a platform for controlling containers.

Microkernel is a very lightweight way to run services within a VM. // A general-purpose operating system (Windows/Linux) has many components we might not need. // A microkernel runs an extremely small operating system (KB not MB) and boots quickly.

- What is the role of a hypervisor(or Virtual Machine Monitor)?

The hypervisor (or Virtual Machine Monitor) is responsible for managing the VMs on a single piece of hardware.

• Information theory:

Probability and entropy

Independence: if A and B are independent, P(A,B) = P(A)P(B)

Bayes’ theorem

P(B|A) is the probability of B given that A has occurred: P(B|A) = P(A|B)P(B)/P(A)

Product rule: P(B|A)P(A) = P(A,B)
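
A small worked example of Bayes' theorem, P(B|A) = P(A|B)P(B)/P(A). The numbers are invented for illustration: B is "has the condition" with P(B) = 0.01, A is "test is positive", with P(A|B) = 0.9 and P(A|not B) = 0.05.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return P(B|A) from P(A|B), P(B) and P(A)."""
    return p_a_given_b * p_b / p_a


p_b = 0.01              # prior: P(B)
p_a_given_b = 0.9       # P(A|B)
p_a_given_not_b = 0.05  # P(A|not B)

# Total probability of a positive test, summing over both cases of B.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

p_b_given_a = bayes(p_a_given_b, p_b, p_a)
print(round(p_b_given_a, 4))  # about 0.1538
```

Even with a fairly accurate test, the posterior is only about 15% because the prior P(B) is so small.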

• Probability and Bayes’ Theorem.

• Ensembles for discrete random variables.

An ensemble (for a discrete system) is, together: an observation of a discrete random variable, the possible values of the random variable, and the probabilities of each value.

X = (x, A_X, P_X): the variable, its possible values, and the probability of each value.

• Shannon information and entropy of an ensemble.

Shannon information of an outcome x: h(x) = log2(1/P(x)), measured in bits.

The entropy H(X) of an ensemble X, in bits, is H(X) = ∑ P(x) log2(1/P(x)), summing over all x in A_X.
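
The two formulas above can be checked numerically. A minimal sketch; the example distribution (0.5, 0.25, 0.25) is an assumption chosen so the answer is easy to verify by hand.

```python
import math


def shannon_information(p):
    """h(x) = log2(1/P(x)) in bits, for an outcome of probability p."""
    return math.log2(1 / p)


def entropy(probabilities):
    """H(X) = sum of P(x) * log2(1/P(x)); zero-probability terms add 0."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


print(shannon_information(0.25))   # 2.0 bits
print(entropy([0.5, 0.25, 0.25]))  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
```

Rarer outcomes carry more information, and the entropy is the average information per outcome.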

• Compressing data:

• Symbol codes – notation and definition.

Definition: A way to write all possible values of our random variable as binary strings.

Requirements: uniquely decodable, easily decodable, achieves compression.

- What is a prefix code?

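Definition: a symbol code is a prefix code if no codeword is the “prefix” of another codeword (e.g. ‘01’ is a prefix of ‘01000’, so a code containing both is not a prefix code).

- Can you identify prefix codes?

Check whether any codeword in the set is the start of another codeword: {01, 10, 010} is not a prefix code (01 is a prefix of 010); {0, 10, 11} is.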
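
The check can be written as a few lines of Python. A minimal sketch; the function name `is_prefix_code` is made up for this example.

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False  # a is a prefix of b
    return True


print(is_prefix_code(["01", "10", "010"]))  # False: "01" starts "010"
print(is_prefix_code(["0", "10", "11"]))    # True
```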

• Efficient codes and expected lengths.

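Expected length: L(X,C) = ∑ l_i p_i, where l_i is the length of the codeword for symbol i and p_i is the probability of symbol i.

Efficient codes: L(X,C) ≥ H(X), i.e. the expected code length cannot be less than the entropy (the sum of each probability times the log of its reciprocal).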

• Kraft inequality and Gibbs inequality.

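Kraft inequality: ∑ 2^(-l_i) ≤ 1. Each codeword length uses up part of a total budget of 1.

Gibbs inequality: ∑ p_i log(1/q_i) ≥ ∑ p_i log(1/p_i). Gibbs' inequality is an intermediate step, used together with the Kraft inequality, in proving L(X,C) ≥ H(X).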
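
A quick numerical check of the expected length, the entropy bound and the Kraft sum. The ensemble (probabilities 0.5, 0.25, 0.25) and the code {0, 10, 11} are assumptions picked for the illustration.

```python
import math


def entropy(probabilities):
    """H(X) in bits."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


def expected_length(probabilities, lengths):
    """L(X,C) = sum of l_i * p_i."""
    return sum(p * l for p, l in zip(probabilities, lengths))


def kraft_sum(lengths):
    """Sum of 2^(-l_i); must be <= 1 for a uniquely decodable code."""
    return sum(2 ** -l for l in lengths)


probs = [0.5, 0.25, 0.25]
lengths = [1, 2, 2]  # codewords 0, 10, 11

print(expected_length(probs, lengths))  # 1.5
print(entropy(probs))                   # 1.5, so L(X,C) >= H(X) holds with equality
print(kraft_sum(lengths))               # 1.0, the whole budget is used
print(kraft_sum([1, 1, 2]))             # 1.25 > 1, so no such code exists
```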

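• Huffman codes and stream coding.

Huffman codes: repeatedly take the two least probable symbols and merge them into a tree node, until the root has probability 1. Then work back from the root, labelling one branch 0 and the other 1, to read off each codeword. Convention used here: the larger-probability branch gets 0 and the smaller gets 1, and codewords are read from the root back to the leaf. In the worst case: H(X) ≤ L(C,X) < H(X)+1.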
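
The merging procedure can be sketched with a priority queue. This is one possible implementation, not the lecture's; the function name and the tie-breaking counter are assumptions, and ties between equal probabilities may be broken differently from a hand-drawn tree (the codeword lengths still give an optimal expected length).

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """probabilities: dict symbol -> probability. Returns symbol -> codeword."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(p, next(tiebreak), {symbol: ""})
            for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_small, _, small = heapq.heappop(heap)  # least probable node
        p_big, _, big = heapq.heappop(heap)      # next least probable node
        # Larger branch gets 0, smaller branch gets 1; the new bit goes in
        # front because we are building the tree from the leaves upward.
        merged = {s: "0" + c for s, c in big.items()}
        merged.update({s: "1" + c for s, c in small.items()})
        heapq.heappush(heap, (p_small + p_big, next(tiebreak), merged))
    return heap[0][2]


code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.25})
print(code)
```

For this ensemble the most probable symbol gets a 1-bit codeword and the other two get 2-bit codewords, giving an expected length of 1.5 bits.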

Stream codes: encode a stream of letters rather than individual letters (used by most zip files). There are many types of stream coding, e.g. arithmetic coding and Lempel-Ziv coding.

- How do you calculate the expected length of a symbol code C with ensemble X?

Take an ensemble X with alphabet A_X and probabilities P_X, and a code C for the alphabet with codeword lengths l_i. The expected length of a message is L(X,C) = ∑ p_i l_i.

• Error correction and noisy channels.

This is the kind of model we are thinking of. The transmitter sends binary. The noisy channel may cause “bit errors” (1s become 0s or 0s become 1s). We sometimes call these “flips”.

Randomly, with probability p, a bit will be “corrupted” (1 becomes 0 or 0 becomes 1).

• Channel capacity and maximising mutual information.

Channel capacity: C(Q) = max I(X;Y), maximised over the input distribution P_X, where the mutual information is I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) (the overlap between the entropies of X and Y).
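
For the bit-flip channel described above (a binary symmetric channel), the capacity can be found numerically by trying many input distributions. A sketch under assumptions: the function names and the grid search are illustrative, and `p_flip` stands for the flip probability p.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


def mutual_information(q, p_flip):
    """I(X;Y) = H(Y) - H(Y|X) when P(x=1) = q and bits flip with p_flip."""
    p_y1 = q * (1 - p_flip) + (1 - q) * p_flip  # P(y=1)
    return h2(p_y1) - h2(p_flip)                # H(Y|X) = h2(p_flip)


def capacity(p_flip, steps=1000):
    """Maximise I(X;Y) over the input distribution by a simple grid search."""
    return max(mutual_information(i / steps, p_flip) for i in range(steps + 1))


print(capacity(0.1))  # about 0.531 bits per channel use
print(1 - h2(0.1))    # closed form for this channel: 1 - H2(p)
```

The maximum is reached at the uniform input q = 0.5, which matches the closed-form result 1 - H2(p) for this channel.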

Part 2

• Cryptography:

• Types of security attacks and security requirements:

• Interruption, interception, modification and fabrication attacks.

Attacks on availability / confidentiality / integrity / authenticity respectively (cutting a message off in transit; stealing information; intercepting the sender's message and sending on a changed one; forging a message).

• Confidentiality, authentication, integrity, non-repudiation and authorisation.

Confidentiality

Exchange is protected from eavesdroppers.

Authentication

Providing proof of identity.

Integrity

Message is not modified accidentally or deliberately in transit.

Non-repudiation

Sender of message cannot deny having sent it.

Authorisation

Can an entity with a given identity access a resource?

- Do you understand the difference between symmetric and asymmetric encryption?

Symmetric: both sides use the same key, e.g. Enigma machine, book code, AES, DES

Symmetric (or private)

-Encryption relies on a single key

-Key must be kept secret

-Sender and receiver have same key

-Same key used to encrypt and decrypt.

Asymmetric (public-private key)

-Encryption relies on key pair--one public, one private

-One key can decrypt encryption from other

-Separate keys held by sender and receiver: one for encryption and the other for decryption

• Send or receive privately

• Basic cryptographic process:

• Encryption methods and keys.

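• RSA algorithm: an asymmetric cipher.

Prime numbers.

φ(x) = the number of positive integers less than x that are coprime to x.

For a prime p, φ(p) = p-1; for two distinct primes p and q, φ(pq) = (p-1)(q-1).

Modular arithmetic: a ≡ y (mod n) means a = kn + y for some integer k, i.e. y is the remainder.

Public key (e, n), private key d, with ed ≡ 1 (mod φ(n)), where n = pq.

Encryption: c = m^e (mod n)

Decryption: m' = c^d (mod n)

Check: m' = m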

- Use of public/private keys, e.g. in the RSA algorithm.

Given p, q and m: find n, find φ(n), find the public key (e, n), find d, compute c, compute m', and check whether m' = m.
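
The exam-style procedure can be followed step by step in Python. A toy sketch only: the primes p = 61, q = 53, the exponent e = 17 and the message m = 65 are assumptions chosen to keep the numbers small, and keys this size give no real security. `pow(e, -1, phi)` (modular inverse) needs Python 3.8 or later.

```python
from math import gcd


def rsa_keys(p, q, e):
    """Return the public key (e, n) and private exponent d."""
    n = p * q
    phi = (p - 1) * (q - 1)      # phi(pq) = (p-1)(q-1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = pow(e, -1, phi)          # d such that e*d = 1 (mod phi(n))
    return (e, n), d


def encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)          # c = m^e (mod n)


def decrypt(c, d, n):
    return pow(c, d, n)          # m' = c^d (mod n)


public_key, d = rsa_keys(61, 53, 17)
n = public_key[1]
m = 65
c = encrypt(m, public_key)
m_prime = decrypt(c, d, n)
print(n, d, c, m_prime)          # check that m_prime equals m
```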

• Message authentication and digital signing/certificates.

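Message authentication: the message is encoded using the sender's private key together with the receiver's public key, and decoded using the receiver's private key and the sender's public key.

Using the private key provides authentication.

Digital signing: -Use a message digest / -A digest is a very short message that proves you wrote a longer message

--Used when we want to prove who wrote a message but the message is public.

--Message digest is shorter than the message, so it is quick to check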
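
Signing a digest rather than the whole message can be sketched as follows. This is a toy illustration, not a real signature scheme: it reuses a tiny RSA key (n = 3233, e = 17, d = 2753, all assumed for the example) and shrinks the SHA-256 digest modulo n so that it fits, which a real system would never do.

```python
import hashlib

# Toy RSA key built from p = 61 and q = 53; far too small for real use.
N, E, D = 3233, 17, 2753


def digest_as_int(message: bytes) -> int:
    """Short fixed-size summary of the message, reduced to fit the toy key."""
    full_digest = hashlib.sha256(message).digest()
    return int.from_bytes(full_digest, "big") % N


def sign(message: bytes) -> int:
    """Signer applies the PRIVATE key to the digest."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone can check with the PUBLIC key; the message itself stays public."""
    return pow(signature, E, N) == digest_as_int(message)


signature = sign(b"I owe Bob ten pounds")
print(verify(b"I owe Bob ten pounds", signature))  # True
```

Only the holder of the private key could have produced a signature that verifies, which is what ties the public message to its author.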

!! Man-in-the-middle attack -> solution: X.509 certificates

• HTTPS/TLS/SSL.

Use HTTPS not HTTP for security

HTTPS security is called Transport Layer Security (TLS) – previously Secure Sockets Layer (SSL)

Contains 2 parts: signature and secure connection

• Blockchain:

• Ledger and double-entry book-keeping.

Double-entry bookkeeping: Ledger

Every dated/timed credit has a dated/timed debit

Accounting was manual: entries into a ledger (=database)

• Blockchain and distributed ledger.

Distributed Ledger: blockchain

Blockchain is: a method of storing data amongst multiple parties that ensures data integrity

Blockchain is: a distributed ledger (shared database) where every participant holds a copy

• Trust protocol and cryptocurrency.

Blockchain = a trust protocol

-mechanises trust via an electronic distributed/shared ledger

-prevents double-spending

-reduces the cost of trust

 “Cryptocoin”: token that can be put in a block added to the blockchain

-scarcity: there is a maximum number of coins that can be mined

--scarcity not sufficient: belief in value (useful as a means of exchange)

-impartial, e.g. ignores currency controls

--BUT: tracking as a public good (sanctions, terrorists, taxes)

-privacy limited by law not technology

• Verifiable accounting and nonce.

Timestamped transactions recorded in a distributed digital ledger/spreadsheet.

• Consensus mechanisms and mining.

Step 1: transaction

Step 2: verification

Steps 3 & 4: create and complete new blocks

Step 5: add the new block to the chain
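
The role of the nonce in mining can be sketched as a toy proof-of-work loop. Everything here is an assumption for illustration: the block fields, the JSON encoding and the rule "hash must start with three zero hex digits" are not taken from any real blockchain.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 of the block's contents (keys sorted for a stable encoding)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def mine(index, transactions, previous_hash, difficulty=3):
    """Try nonces until the block hash starts with `difficulty` zeros."""
    nonce = 0
    while True:
        block = {
            "index": index,
            "transactions": transactions,
            "previous_hash": previous_hash,  # links this block to the chain
            "nonce": nonce,
        }
        digest = block_hash(block)
        if digest.startswith("0" * difficulty):
            return block, digest
        nonce += 1


genesis, genesis_hash = mine(0, ["A pays B 5"], "0" * 64)
second, second_hash = mine(1, ["B pays C 2"], genesis_hash)
print(genesis["nonce"], genesis_hash)
```

Finding a valid nonce takes many attempts, but anyone can verify it with a single hash. Changing an old transaction changes that block's hash, which breaks the `previous_hash` link in every later block.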

• Proof of work, proof of stake and proof of authority.

Proof of Stake introduces advantages to honest participants by

• Introducing Penalties

• Assigning voting privileges based upon the currency associated with a participant

Proof of Stake

• Users join a validator pool

• Forgers* who validate transaction are selected through a deterministic process which may or may not involve their “stake”

• Stake in this case is defined as their level of cryptocurrency wealth or how long they have been a part of the validator pool

• Once the forgers have been selected they reach a consensus on which is the next valid block in the chain

• Range of problems associated with proof of stake

Proof of authority

• Another consensus algorithm

• Proof of authority relies on a certain set of trusted nodes, known as "authorities" who are specifically granted the ability to secure the blockchain by verifying transactions and creating new blocks.

• Validation of the transactions in new blocks by other nodes is done exactly the same way as proof of work.

• Since this consensus algorithm depends on trusted nodes, it can only really be used for private or test chains.

• Electricity and cost problems.

• Blockchain applications and NFT.

• Abuse, fraud, inefficiency and limitations.

• Misconceptions and use cases.

Part 3

Web Security

• Authentication and authorisation of users

Authentication: Verify the user is who they say they are

– Login form

– Ambient authority (e.g. HTTP cookies)

– HTTP authentication

Authorisation: Decide if a user has permission to access a resource, e.g. via:

– Access control lists (ACLs)

– Capability URLs

• Browser (HTTP or Certificate), form and open authorisation (OAuth).

Authentication types: browser (basic authentication, digest authentication, digital certificate at the client, other possibilities); form-based (create your own authentication system); open standard (OAuth: your site authenticates/authorises against someone else's).

·1.1. Basic authentication: the user provides a username and password; avoids cookies, session identifiers and a login page (the example given was a pop-up dialog). The HTTP request includes the username and password, sent unencrypted (only Base64 encoded), so HTTPS is a must.
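
To see why Base64 is not protection, the header can be built and reversed in a few lines. A minimal sketch; the credentials are the well-known example pair "Aladdin" / "open sesame", and the function name is invented.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header used by HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


header = basic_auth_header("Aladdin", "open sesame")
print(header)  # Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Anyone who sees the header can decode it: Base64 is an encoding, not encryption.
token = header.split()[-1]
print(base64.b64decode(token).decode("utf-8"))  # Aladdin:open sesame
```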

·1.2. Digest access authentication: the username and password are sent hashed rather than in the clear. The server sends a nonce (random number); the browser sends a hash of the username, password and nonce; the server performs the same hash on the username, password and nonce and compares.

·1.3. Certificate-based authentication: the client side uses a certificate signed by a trusted CA over an HTTPS connection (may also use a username and password).

Client sends a connection request, with a list of ciphers, to the server.

Server sends the agreed cipher and its certificate and public key to the client; the client checks them.

Client sends its certificate and public key to the server; the server checks them.

Client and server exchange a key, then exchange data encrypted using the session key.

Client and server forget the session key and close the connection.

·2. Form-based authentication: the website is responsible for authentication. The user types a username and password into an HTML form on a login page, and is redirected to the login page on requesting a page they are not authorised for. The client GETs a page from the server; the server redirects the client to the login page; the client POSTs the username and password from the login page; the server sends the page or an error page (usually over HTTPS).

·OAuth: authenticate and authorise against another website over HTTP. Commonly used, including by web services to access APIs (the user does not want to create an account; quick and convenient for the user; saves users from more passwords; the site stores less data).

• Two/multi-factor and hardware-based authentication.

·Two-(multi-)factor authentication: use more than one means to authenticate (username and password / physical IDs / digital certificate / physical hardware such as a dongle / others such as a PIN / query-based, such as text, SMS or email). Often triggered by context, e.g. logging in on a new device or an unusual transfer.

·Hardware-based authentication: works with physical hardware, e.g. a USB key dongle, or with challenge-response, e.g. a unique CA, or online banking security (bank card).

-Do you understand how passwords might be hacked using these methods?

Basic authentication: Internet Explorer has JavaScript to clear it / redirect to a URL on the same domain to invalidate the password

Digest access authentication: server identity is not validated / password is stored in a reversible manner on the server (not secure); an attacker cannot decode the password or respond to a different nonce

Certificate authentication: an attacker gets the username/password/certificate (may need a procedure for generating and distributing certificates)

OAuth: you lose LOTS of things if you lose access to your social media account.

Two-(multi-)factor authentication: the factors may not be independent; if a hacker has the mobile phone, they can read the SMS or message

Hardware-based authentication: the hardware goes missing; a hacker steals the login and the hardware

-How might you prevent these attacks?

• Session tracking and using cookies

·Session tracking: HTTP is stateless, but we need to keep state

  Session tracking options: cookies / URL rewriting / hidden form fields / a combination of two or three

-Cookies: files stored on the client machine by the browser and sent by the browser to the server.

-URL rewriting: the URL contains information about the session; fails on reload or back; URLs in logs are vulnerable to snooping (can be managed by forms)

-Hidden form fields: extra fields in submitted forms convey information; fails on reload or back

·HTTP cookie: two types. A session cookie is kept in memory; a persistent cookie is stored on disk and used for tracking. A cookie is attached to a particular domain and sent along with the HTTP request whenever the user makes a request to that domain.

 C sends (HTTP GET web1 web.html) to S

 S sends (HTTP RESPONSE)(Cookie web1 cookie data) to C !! the cookie can uniquely identify the client

 C sends (HTTP GET web1 page.html)(Cookie web1 cookie data) to S

Cookie properties: {Secure (server setting): only sent over HTTPS / HttpOnly (server setting): cannot be accessed from client-side APIs / third-party cookie: a cookie from a different site, possibly advertising / SameSite (client setting): newer, does not send third-party cookies} {period of validity (lifetime) / a scope (where to send it): domain or path / sent by the browser unless turned off}

Bad uses: session hijacking (using a known logged-in state to force commands to be executed), forging requests from different sites, gaining information the sender shouldn't have; these are big security problems

Servers keep user session data too

Servers keep user session data too

• Securing passwords and pwning.

Pwned: “owned”, i.e. hacked.

Password attacks: brute force, dictionary and rainbow table attacks.

·Brute-force attack: test all passwords up to a given length and see whether the encoding matches. Countered by a minimum strength requirement (which involves a minimum password length).

·Dictionary attack: exploits passwords that are simple combinations of dictionary words.

·Rainbow table attack: uses a special table (a “rainbow table”) to crack the password hashes in a database. A rainbow table is a precomputed table of the hash values of plaintext passwords used during the authentication process.

• Hash function and adding salt.

·Hash function: applications don’t store passwords in plaintext, but instead store a hash of each password (a hash function is one-way: you can't deduce the password from the hash, and similar passwords have different hash values).

Hackers: obtain the stored password hashes and look them up in / compare them with the table.

·Adding salt (a defence against hash tables): prepend or append random data to the password before hashing it, and store the salt and the hashed password together.

!! Still needs a good hash algorithm and a strong password.
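
Salting can be sketched with the Python standard library. Note the swap: instead of a single plain hash of salt + password, this example uses PBKDF2 (`hashlib.pbkdf2_hmac`), a deliberately slow salted hash; the salt length and iteration count are assumptions for the illustration.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt, stored with the hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, digest


def check_password(password, salt, stored_digest, iterations=100_000):
    """Re-hash the attempt with the stored salt and compare."""
    _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, stored_digest)


salt_1, digest_1 = hash_password("correct horse")
salt_2, digest_2 = hash_password("correct horse")
print(digest_1 == digest_2)                              # False: different salts
print(check_password("correct horse", salt_1, digest_1))  # True
```

Because two users with the same password get different stored hashes, a single precomputed table no longer cracks every account at once.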

-What is a hash collision?

  Sometimes different passwords have the same hash (rare enough that you can't simply “guess”).

• CAPTCHA and good password security practice.

·Sharing passwords / post-it notes / using the same password on different servers / phishing sites / CAPTCHA

·Sufficiently long passwords / not dictionary words / stored hashed / add salt / use CAPTCHA

• Web attacks: Injection, XSS, CSRF, etc.

• SQL injection: Malicious code executed on server. Malicious user sends code to server.

• XSS (reflected): Malicious code executed by innocent user. Malicious user sends link.

• XSS (persistent): Malicious code executed by innocent user. Malicious user gets code stored on server.

• CSRF: Malicious code executed on server. Requires user to be “authenticated”. Malicious user sends link.

·Injection: happens on the server; the user sends extra data that is interpreted as a command.

 Example: after the expected input, the attacker appends “; <code statement>”; the trailing statement injects a malicious command, e.g. to edit information / delete information / change the administrator / set the root password.

 Prevention: sanitise input / do not create SQL by adding strings together (use special languages).
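
The difference between building SQL by string concatenation and using a parameterised query can be demonstrated with an in-memory SQLite database. The table, the user names and the malicious input are invented for the example.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])
    return conn


def find_user_unsafe(conn, name):
    # BAD: user input is glued into the SQL text and parsed as SQL.
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # GOOD: the ? placeholder keeps the input as data, never as SQL.
    query = "SELECT name, role FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
malicious = "x' OR '1'='1"
print(find_user_unsafe(conn, malicious))  # every row leaks
print(find_user_safe(conn, malicious))    # [] because no user has that name
```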

·XSS (cross-site scripting): attackers add “malicious” code into data, and an innocent user downloads this data together with the malicious code and runs it. It happens on the client side, via the browser; it is similar to injection, but the commands execute on the innocent user's computer.

 -Reflected XSS: trick an innocent user into using a URL (the client provides data as part of the URL, which a server-side script uses to generate the HTML displayed back to the client; the injected data is JavaScript code).

 -Persistent XSS: the flaw is rarer but more damaging; the malicious content stays on the website permanently.

 -XSS worms: if XSS can happen, then it can be transmitted from page to page.

·CSRF (Cross-Site Request Forgery): happens on the server; the attacker borrows the user's identity to perform actions without their consent. It forces an end user to execute unwanted actions on a web application in which they’re currently authenticated. The attacker tricks the user into clicking a link and hopes they are currently logged into the site (authenticated). It works ONLY if the victim is logged into the site.

Prevention: Treat all user input as potentially malicious. / Lock down access where possible.
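
For XSS, one concrete way of treating user input as potentially malicious is to escape it before placing it in HTML. A minimal sketch using `html.escape` from the standard library; the `render_comment` function and the comment text are hypothetical.

```python
import html


def render_comment(comment):
    """Put user-supplied text into a page with HTML special characters escaped."""
    return "<p>" + html.escape(comment) + "</p>"


attack = "<script>alert(1)</script>"
print(render_comment(attack))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser now displays the text of the script instead of running it.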

• Distributed Denial of Service (DDoS).

·Stops users accessing a service

·DDoS is when a large number of sources send traffic to your website, greatly increasing the data it must handle

·Can take websites offline

-How do you identify a DDoS attack?

Traffic

-What can be used to mitigate these attacks?

Scale-out cloud systems and content distribution networks

• Core principle for protecting web systems/reducing attacks.

Minimise the “attack surface”

-What might you do to ensure a website or web service is secure?

Keep the system patched. Minimise access and know who has it (firewall, no network, workspace). Restrict physical access to systems (key loggers, access to or replacement of key systems). Know when there’s a problem (logging and monitoring, intrusion detection system). Protect yourself as a user (log out of sensitive applications after actions, avoid suspicious links, use long random passwords).

-What are 4 different ways the core principle could be applied?

• Consider splitting software across different machines to reduce the attack surface.

• Remove software your machine does not need.

• Close ports.

• Restrict access.

• Information, metadata and semantic web:

• Definitions of metadata and how information is structured.

·Definition: data about data (weakest definition). Metadata is a map: a means by which the complexity of an object is represented in a simpler form. It is a statement about a potentially informative object. (Example: a household register booklet is metadata about a person's information.)

Data are structured (except random data)

• Descriptive metadata: what is this? Why?

How can we provide more details about resources so we can find them, link them to other resources, etc.? Uses element-value pairs (e.g. for a book: type = novel, date = 2007).

Descriptive metadata/application semantics are core to information sharing (application semantics define what the data mean to an application).

• Administrative metadata: what is this? Why?

Information about the lifecycle of a resource, used for administration.

Subclass: technical metadata, e.g. information stored with a photo, so that an album can be sorted by time or place.

• User metadata: what is this? Why?

Information about individuals and their social networks.

Also called data exhaust: it is generated by our day-to-day activities. It can also be created deliberately, e.g. to help resource discovery. Examples: web server logs, online resource analytics, service data.

• The Semantic Web

A web of data: the Web links data between web pages; the Semantic Web links structured online data, making connections between datasets.

Accessible to both humans and machines

Accessible to both humans and machines

Part 4

• XML (metadata) for application integration and data transfer:

XML tags: metadata is modelled through tags

Use elements... nested information

Data types: defined (or provided) and custom

XML Schema: defines the data structure

• XML tags, nested elements, attributes and data types.

Tags are about information, not about presentation

Attributes, e.g. <ellipse cx="210" cy="45" rx="170" ry="15" style="fill:yellow"/>

Nesting: <name><fName></fName><gName></gName></name>

• Root, parent and child decomposition.

name→first name, given name...

 -What is the root element?

The root element of an XML file is the top-level element of the document. Any element can serve as the root, but there is exactly one and it contains all the other elements.

• Well-formed XML and elements.

 -What are the well-formed XML rules?

Well-formed = grammatically correct

Element names must NOT contain a space; capitalisation must be consistent (names are case sensitive); every opening tag needs a closing tag; some tags are both an opening and a closing tag, e.g. <br/>; tags must be closed in order; names cannot start with a number; don't use reserved names; names can start with an underscore; names cannot include greater-than/less-than signs; avoid ending names with a colon.

 -Practice writing XML using the rules.

An XML document must contain a prolog and a root element (easy to get wrong); expect XML error-correction questions.
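
When practising, a parser can confirm whether a document is well-formed. A small sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up. It checks the grammar only: it does not require a prolog and it does not validate against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<?xml version="1.0"?><name><fName>A</fName></name>'))  # True
print(is_well_formed("<name><fName>A</name></fName>"))  # False: closed out of order
print(is_well_formed("<Name>A</name>"))                 # False: case mismatch
print(is_well_formed("<1name>A</1name>"))               # False: starts with a number
print(is_well_formed("<a/><b/>"))                       # False: two root elements
```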

• Duplication and namespacing.

·Duplication can be correct if you have >1 of the same sort of object, e.g.:

<photographs>

<photo>my cat</photo>

<photo>cat at the window</photo>

</photographs>

Elements of the same type can be duplicated (but a book's title and an author's title cannot share one element name).

·Namespacing: qualifies a name so as to eliminate ambiguity, e.g. so that a book's title is not confused with the title of the book's author.

• Entity references and prolog.

Entity references: the characters & < > ' " are written as &amp; &lt; &gt; &apos; &quot;

• Valid XML and schema.

Rules are defined in the XML Schema, e.g. <xs:element name="Type" type="xs:string"/>; the XML file can then use <Type>A</Type>.

 -What is the difference between well-formed and valid XML?

Well-formed = grammatically correct; valid = follows the rules, including element names, attribute names, order, datatypes, how many, etc., as laid out in a business document such as an XML schema.

• CSV and JSON:

XML and JSON: serialisation (encoding) formats

• CSV: advantages and disadvantages.

CSV: comma-separated variables/values.

A solution to verbose XML.

Advantages: lets huge datasets be read into programs for statistical analysis / can come with a code book (column details, but not machine readable) / simple and short, as no metadata has to be provided

Disadvantages: no schemas, so no meaning attached to rows and columns / changes such as new rows or columns have to be handled manually / vague format: difficult to manage a value containing a comma or newline character

Modifying: CSV: write a parser to modify it, or load it into a table in a programming language

XML: extensible; can use the DOM to modify it

 -Know how to convert XML to CSV or vice-versa.
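
One way to practise the XML-to-CSV conversion is to let the standard library do it and compare with a hand-written answer. A sketch under assumptions: the `<books>` document is invented, every row element is assumed to have the same flat children, and attributes are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text, row_tag):
    """Turn each <row_tag> child of the root into one CSV row."""
    root = ET.fromstring(xml_text)
    rows = root.findall(row_tag)
    fieldnames = [child.tag for child in rows[0]]  # header from the first row
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({child.tag: child.text for child in row})
    return out.getvalue()


books = """<books>
  <book><title>Emma</title><year>1815</year></book>
  <book><title>Hello, World</title><year>2007</year></book>
</books>"""

print(xml_to_csv(books, "book"))
# title,year
# Emma,1815
# "Hello, World",2007
```

The element names become the header row, and the value containing a comma has to be quoted, which is exactly the awkward case mentioned in the CSV disadvantages.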

• JSON vs XML. JSON string and name:value pairs.

JSON: JavaScript Object Notation

!! XXE (XML External Entity, an XML security issue): an attacker uploads malicious XML or inserts hostile content into an XML file (see e.g. owasp.org). Solutions: do not serialise sensitive data / patch and use newer versions / validation / use JSON.

Disadvantages of JSON: does not support hypermedia (key property of RESTful Web APIs) / difficult to model generics or polymorphism when used for Java serialisation / security risks: injection; insecure deserialization

Similarities: simple, open, interoperable / human readable / exchange format / structured data

Differences: XML: can be managed by reusable software / many views of the same data (JSON is not a document mark-up language so does not provide this) / can define new tags (extensible, to handle images, charts, graphs...)

JSON: even easier for humans to read, as it is more restrictive

 -Know how to convert from XML to JSON or vice-versa.
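A rough sketch of the conversion, assuming Python's standard json and xml.etree.ElementTree modules. The book record and the helper names are invented, and the mapping is only one of several possible conventions: repeated child tags would overwrite each other here, and attributes are not restored on the way back.

```python
import json
import xml.etree.ElementTree as ET

XML_TEXT = '<book id="b1"><title>Dune</title><year>1965</year></book>'

def element_to_dict(element: ET.Element) -> dict:
    """Map attributes and child elements to name:value pairs."""
    # JSON has no attribute concept, so attributes become ordinary pairs.
    result = dict(element.attrib)
    for child in element:
        result[child.tag] = child.text if len(child) == 0 else element_to_dict(child)
    return result

def dict_to_element(tag: str, data: dict) -> ET.Element:
    """Inverse direction: every pair becomes a child element."""
    element = ET.Element(tag)
    for name, value in data.items():
        if isinstance(value, dict):
            element.append(dict_to_element(name, value))
        else:
            ET.SubElement(element, name).text = str(value)
    return element

root = ET.fromstring(XML_TEXT)
as_json = json.dumps({root.tag: element_to_dict(root)})
print(as_json)

back = dict_to_element("book", json.loads(as_json)["book"])
print(ET.tostring(back, encoding="unicode"))
```

The round trip turns the id attribute into an <id> element, which illustrates that JSON cannot model an attribute explicitly.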

• HTTP request/response and methods (GET, POST, etc.).

Used by SOAP as a transport protocol (to reuse existing infrastructure and avoid firewall restrictions)

HTTP: an application protocol with defined methods (GET, POST) that uses a URL/URI to identify an object

Request/response: the client sends a request message and the server sends back a response message.

Request message: Method /Path HTTP/1.1

Host:

Headers:

GET: parameters are sent in the query string

GET /path?queryone=valueone&querytwo=value+value HTTP/1.1

Host: www.host.cn (no information in the message body)

POST: used to modify server state

POST /details HTTP/1.1 (no information in the first line)

Host: www.sendDetails.cn

Content-Type: (recommended header)

Content-Length: (recommended header)

(blank line)

queryone=valueone&querytwo=value+value (message body)
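The two request messages above can be assembled as plain text. This is only a sketch using Python's standard urllib.parse.urlencode; the Content-Type value application/x-www-form-urlencoded is an assumption (the notes leave the header value blank), and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

params = {"queryone": "valueone", "querytwo": "value value"}
encoded = urlencode(params)  # the space is encoded as "+"

# GET: the parameters travel in the first line; the body is empty.
get_request = (
    f"GET /path?{encoded} HTTP/1.1\r\n"
    "Host: www.host.cn\r\n"
    "\r\n"
)

# POST: the parameters travel in the body, described by two headers.
post_request = (
    "POST /details HTTP/1.1\r\n"
    "Host: www.sendDetails.cn\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"  # assumed value
    f"Content-Length: {len(encoded.encode('ascii'))}\r\n"
    "\r\n"
    f"{encoded}"
)

print(get_request)
print(post_request)
```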

Response: successful or not (e.g. the resource has moved / the client may not be authorised)

An entire HTML page is returned in the body of the message. This leads to a browser full-page refresh (the browser loads the new page).

With AJAX, by contrast, the response is simple text or JSON.

• RESTful API:

·REST: Representational State Transfer

·API: Application Programming Interface

·Mash-up: extract data from one API and pass it to another API, e.g. using Google data to pull images from Instagram (ig). Response: Header: includes how many calls remain; Body: JSON (user, number of likes, keywords, link to the photo)

·Web service → Web APIs: the first web services were heavyweight and XML based, with an SOA/SOAP architecture. SOAP and web services were so closely linked that when people stopped using SOAP they stopped talking about web services. Then came REST.

·REST vs SOAP: most web service traffic is via REST interfaces / REST is used by Yahoo, Google, Amazon, Facebook / SOAP is still used by some large companies and is still a development and research area (e.g. in finance, for its strongly typed data and strictly enforced structure).

·Web API (!! not all web APIs are RESTful): data flows through a service (communication); interface (not tied to an operating system or programming language, application-specific); request/response message system; expressed using XML or JSON; exposed via a web server/browser (web), so discoverable; HTTP is the transport protocol

·Web API vs API: users are normally outside your organisation / not (usually) created for one client (it is difficult to change anything on the server side without hurting customers, because clients are silent partners who may be unknown to your server implementation) / public APIs should change rarely.

·Web APIs vs RESTful Web APIs: a Web API is an API made public through the Web / a REST API serves self-descriptive content, obeys the connectedness principle, uses a constrained media type and bridges the semantic gap

 -What does REST stand for(or represent)?

·REST: Representational State Transfer (not a protocol, framework or standard)

·Set of design constraints/restrictions (Fielding constraints): statelessness, hypermedia, self-descriptive messages, identification of resources.

A: REST paradigm fundamental concepts:

1. Addressable resources (information is abstracted as resources; each resource is located by a URL)

2. Uniform, restricted interface (a small common set of methods)

3. Client and server exchange resource representations (a resource can be represented using multiple formats, for different platforms: HTML for a browser, JSON for JavaScript, XML for Java/C#)

4. Stateless communications (vastly improved scalability)

5. Hypermedia As The Engine Of Application State (HATEOAS) (application state is defined/driven by hyperlinks in the representation)

• Web applications vs micro-/web services vs web APIs.

 -What is the difference between a web application, web service and web API?

Users/Clients:

Web Application: the client (C) is a browser

Web Service/API: the client (C) is an application/software; information is structured to make it easier for a computer to access dynamically.

·Web API: an API made public through the web

·User: website: people (a web crawler can also scrape the HTML) / API: computer/system (a person can still read the XML/JSON) / web service: computer (the information is better structured, so computer programs can find it more easily)

• Battle between SOAP and RESTful web APIs.

·RESTful web API: a web service API that follows the REST architectural constraints

 -What is the difference between SOAP and a REST-Type web API?

• REST application state and resource state.

Application state = client state: the page the user is currently visiting / stateless (unknown to the server) / links provide a connection between states

Resource state = server state: modified e.g. by POST (a new message results in an update to a message list); GET cannot modify it / multiple resources: home page, account page, message list etc.

The server sends a representation, which changes the client's application state.

The client sends a representation, which describes the new resource state on the server.

• Resource vs representation.

Resource: the fundamental information abstraction; can be anything, but must have a URL (a globally unique address) (thing + URL = resource)

Representation: a machine-readable document with any information about the current resource state

Server sends resource representation to client; client sends representation to server

Ideal API scenario (API client <-> Web API): 1. API client requests the main page 2. Examines the response 3. Discovers available options 4. Follows links/fills in forms 5. Accomplishes a given task!

• Using HTTP methods/verbs and response codes.

·GET: retrieve a representation of an existing resource.

·POST: the client sends the server a representation of the state it would like a resource to have.

·HTTP Request: Idempotent: same effect whether triggered/called once or repeatedly (this also holds for any safe state transition, e.g. GET). Headers: Host (HTTP/1.1)

·HTTP Response: Status/response code (200 OK; 201 Created; 303 See Other). Headers: Content-Type (MIME type); Location (redirect).
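The difference between a safe, idempotent GET and a state-changing POST can be sketched with a toy in-memory handler. Everything here is hypothetical (the handle function, the /messages path and the header values); it is not a real web server, only an illustration of resource state and response codes.

```python
messages: list[str] = []  # resource state held by the toy "server"

def handle(method: str, path: str, body: str = "") -> tuple[int, dict, str]:
    """Return (status code, headers, body) for a toy message-list API."""
    if method == "GET" and path == "/messages":
        # Safe and idempotent: only reads the resource state.
        return 200, {"Content-Type": "text/plain"}, "\n".join(messages)
    if method == "POST" and path == "/messages":
        # Not idempotent: every call adds another message.
        messages.append(body)
        return 201, {"Location": f"/messages/{len(messages)}"}, ""
    return 404, {}, ""

handle("GET", "/messages")
handle("GET", "/messages")
print(len(messages))  # repeated GETs leave the list unchanged

handle("POST", "/messages", "hello")
status, headers, _ = handle("POST", "/messages", "hello")
print(status, headers["Location"], len(messages))
```

Repeating the same POST creates a second message, which is why POST is treated as neither safe nor idempotent.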

• Hypermedia constraint and HATEOAS.

·Hypermedia constraint: the most important aspect of REST

Each application tells you how to get to the adjoining applications (adds flexibility, as companies can't change public APIs without impacting users' experience)

·HATEOAS: stands for Hypermedia As The Engine Of Application State; connects resources to each other

The API client interacts with the RESTful API via hypermedia responses

Allows the server to tell the client, in a descriptive resource message, what HTTP requests it can make

Must be able to separate data from metadata.
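A small sketch of what a hypermedia response might look like and how a client could read the available requests from it. The "data"/"links" layout, the rel names and the paths are invented for illustration and do not follow any registered media type.

```python
import json

# Hypothetical hypermedia response: data plus links (the metadata).
RESPONSE = json.dumps({
    "data": {"message": "hello"},
    "links": [
        {"rel": "self", "href": "/messages/1", "method": "GET"},
        {"rel": "list", "href": "/messages", "method": "GET"},
        {"rel": "create", "href": "/messages", "method": "POST"},
    ],
})

def available_requests(response_text: str) -> dict[str, tuple[str, str]]:
    """Separate the metadata (links) from the data and index it by relation."""
    document = json.loads(response_text)
    return {link["rel"]: (link["method"], link["href"]) for link in document["links"]}

options = available_requests(RESPONSE)
print(options["create"])
```

The client does not hard-code "/messages"; it discovers the URL and method from the response, so the server can change them later.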

• Application semantics, and descriptive, normative correctness.

JSON/XML Application Semantics: Data-interchange formats do not restrict the meaning of content. JSON/XML may show a list of books, food, music, films etc. Each will have fields/names/elements such as year, price, length. These are the application semantics.

(i.e. the meaning of the fields, e.g. microblog: text, date_posted; bird guide: colour, song)

• IANA and MIME/content/media types.

·IANA Media Types: a constraint

IANA: Internet Assigned Numbers Authority. Responsible for the global coordination of the DNS root, IP addressing and other internet protocol resources. Also controls 'standard' names, link relations and media types.

·MIME/media/content types

Webpage: text/html

Image: image/jpeg

vCard: text/vcard (business card)

Maze: application/vnd.amundsen.maze+xml

XML doc: application/atom+xml

JSON doc: application/json (BUT JSON does not support hypermedia, unless… JSON doc: application/vnd.collection+json (JSON with constraints [also lets us group similar JSON content]))
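Media types can be looked up and parsed with Python's standard library. This sketch assumes the mimetypes and email.message modules; the file names and the charset parameter are made up, and the lookup results depend on the platform's type table.

```python
import mimetypes
from email.message import Message

# Guess a media type from a file extension.
for name in ["index.html", "photo.jpeg"]:
    media_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", media_type)

# Split a Content-Type header value into media type and parameters.
header = Message()
header["Content-Type"] = "application/vnd.collection+json; charset=utf-8"
print(header.get_content_type())
print(header.get_param("charset"))
```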

·ADD: Designing an API

1. Identify semantic descriptors

2. Draw an API state diagram (not covered on EBU6610)

3. Reconcile descriptor names

4. Choose a media type (constraint)

5. API documentation/profile (describe application semantics, which helps to bridge the semantic gap) (not covered on EBU6610)

6. write the code

7. publish

• AJAX

AJAX: client-side JavaScript application running inside a web browser uses an XMLHttpRequest object to become an HTTP client.

AJAX allows information on a page to be updated:

– Without having to reload a whole new HTML page

– In response to a range of user-generated events

The server only sends back the necessary information (text, XML, JSON) IN THE BODY OF THE HTTP RESPONSE MESSAGE (Not an entire HTML page)

• Role of browser and XMLHttpRequest.

– handles making requests to web server.

– manages responses from server

• stores server’s response in the request object.

– passes events (eg button clicks) to JavaScript function or server program

• “Call back” to JavaScript when gets a response.

– returns control to user (asynchronous).

• JavaScript function to set up AJAX control.

The JavaScript function is called in response to an event

– Located inside <head><script> tags

• Callback function and update page.

Callback functions start with e.g. an if statement… these functions are called/fired every time an event happens, but frequently we are not interested in the current state/event

• JSON and AJAX.

 JSON is written in an array format

- Modern browsers access the JavaScript JSON object:

var jObj = JSON.parse(request.responseText);

document.getElementById("someId").innerHTML = jObj.totals[0]["carrots-sold"];

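 -Why is JSON often the preferred format to use in AJAX-enabled pages?

Readability and maintainability: JSON's syntax is simple, clear and easy to understand, and because it is a text-based data format it is easy to read and write. For developers this is an efficient way to exchange data.

Lightweight data exchange: JSON is a lightweight data-interchange format, meaning that it takes up less bandwidth in transmission than other formats (such as XML), which makes data transfer more efficient.

Seamless integration with JavaScript: because JSON is compatible with JavaScript data types, the front end can parse JSON data directly without extra conversion. This greatly simplifies data processing.

Extensive support and applications: almost all modern programming languages support JSON, which makes exchanging data between different systems or platforms simple and universal.

Strong data description capabilities: JSON can represent not only simple data types such as numbers, strings and booleans, but also complex data structures such as arrays and objects. This lets JSON represent a wide range of data structures accurately and meet complex data-exchange needs.

Flexible data processing: with AJAX, the front end can dynamically request and receive JSON data returned by the back end, and then conveniently use that data to update the page dynamically. This flexibility gives developers more freedom in designing and building web applications.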

• Security concerns, cross-domain requests & SOP, JSONP and CORS.

Security concerns: some older websites/books may recommend using eval() to parse JSON sent from a server. eval evaluates or executes an argument; if the argument is a JavaScript statement, eval executes it, which is potentially VERY unsafe.

Solution: parse with JSON.parse() instead of eval().

SOP (Same-Origin Policy): the browser restricts AJAX requests to the page's own origin

JSONP: a workaround for making cross-domain requests under the SOP

CORS: Cross-Origin Resource Sharing; relaxes the SOP and should be considered in place of JSONP
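What "same origin" means can be sketched as a comparison of scheme, host and port. This is a simplified model, assuming Python's standard urllib.parse; the function names and the example.com URLs are hypothetical, and real browsers apply further CORS rules (e.g. preflight requests and credentials) that are not shown.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> tuple:
    """An origin is the (scheme, host, port) triple of a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(page_url: str, request_url: str) -> bool:
    """SOP: a request is unrestricted only if both origins match."""
    return origin(page_url) == origin(request_url)

def cors_allows(page_url: str, allow_origin_header: str) -> bool:
    """Simplified check of an Access-Control-Allow-Origin response header."""
    if allow_origin_header == "*":
        return True
    return origin(allow_origin_header) == origin(page_url)

PAGE = "https://shop.example.com/cart"
print(same_origin(PAGE, "https://shop.example.com:443/api/items"))
print(same_origin(PAGE, "https://api.example.com/items"))
print(cors_allows(PAGE, "https://shop.example.com"))
```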

Block 1 Questions

1. What is the main goal of information theory? Answer: a)

a) To quantify and understand communication and information processing limits

b) To develop efficient error correction techniques

c) To design optimal compression algorithms

d) To study the behaviour of noisy channels 

2. What is the definition of channel capacity? Answer: a)

a) The maximum rate of reliable transmission over a channel

b) The amount of information that can be transmitted in a given time period

c) The resistance of a channel to noise and errors

d) The bandwidth of a communication channel

3. What is entropy in information theory? Answer: a)

a) The measure of uncertainty or randomness in a probability distribution

b) The amount of information required to represent symbols in a code

c) The rate of reliable transmission over a channel

d) The compression ratio achieved by a coding algorithm

4. Which coding technique is used for optimal information compression? Answer: a)

a) Huffman coding

b) Reed-Solomon coding

c) Parity check coding

d) Hamming coding

5. Which theorem states the maximum achievable rate of reliable communication over a noisy channel? Answer: a)

a) Shannon's channel capacity theorem

b) Source coding theorem

c) Channel coding theorem

d) Huffman coding theorem

6. What is the purpose of error correction codes? Answer: a)

a) To detect and correct errors in a communication channel

b) To compress data for efficient storage

c) To improve the signal-to-noise ratio in a channel

d) To mitigate the effects of noise in a channel

7. What is the main difference between lossless and lossy compression? Answer: d) (lossless compression does not remove data, so option b is incorrect)

a) Lossless compression achieves higher compression ratios than lossy compression.

b) Lossless compression removes some data to reduce file size, while lossy compression sacrifices some data quality for higher compression ratios.

c) Lossless compression is used for images, while lossy compression is used for audio and video.

d) Lossless compression is reversible, while lossy compression is not.

8. Which technique dynamically updates the encoding scheme based on input data? Answer: b)

a) Stream coding

b) Adaptive coding

c) Huffman coding

d) Source coding

9. Which error recovery technique involves retransmitting lost or corrupted data packets? Answer: b)

a) Forward error correction (FEC)

b) Automatic repeat request (ARQ)

c) Convolutional coding

d) Checksum computation

10. What is the main purpose of a checksum or cyclic redundancy check (CRC)? Answer: a)

a) To detect errors in a data transmission

b) To correct errors in a data transmission

c) To compress data for efficient storage

d) To encode symbols into binary representations
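As a worked illustration of question 3, entropy can be computed from symbol frequencies with the Shannon formula H = sum of p * log2(1/p), in bits per symbol. The sketch uses only Python's standard library; the function name and the sample strings are made up.

```python
import math
from collections import Counter

def entropy(message: str) -> float:
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(entropy("aaaa"))  # one certain symbol: no uncertainty
print(entropy("abab"))  # two equally likely symbols
print(entropy("abcd"))  # four equally likely symbols
```

The more evenly spread the symbols, the higher the entropy, and the more bits per symbol an optimal code needs on average.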

Block 4 Questions

  1. What is the XML root?

This is a list or collection of something, e.g. books. All XML files have a root, which is the ultimate parent element.

  2. What does verbose mean?

It means using more words than needed. XML uses a lot of characters, so it is verbose, especially if only sending small pieces.

  3. What is the difference between well-formed and valid XML?

Well-formed = grammatically correct; valid = follows the rules, including element names, attribute names, order, datatypes, how many etc., as laid out in a business document such as an XML schema.

  4. What are XML and JSON attributes?

XML has attributes as part of the name tag. JSON can't explicitly model this as an attribute, so it has to treat it in the same way as other features.

  5. What is "application semantics"?

This is the appropriate language used for an application, e.g. for a book: author, subject, genre, title; for a film: director, prizes, leadActor; for a menu: starter, soup, rice.

  6. Between POST and GET in RESTful, which is safer?

GET, as it doesn't modify the resource (the server state).

  7. What is XML namespacing?

A namespace qualifies a name so as to eliminate ambiguity, e.g. the title of a book and the title of the author. Uses : (a colon), e.g. b:title and a:title.

  8. Is JSON used exclusively in Web APIs?

No, XML can also be used.

  9. What is the client in RESTful Web APIs?

Another computer (e.g. laptop with browser).

  10. Summarise the differences, advantages and disadvantages of CSV, XML and JSON.

These are listed in the JSON slides.

CSV:

• Huge datasets are often provided as CSV files to be read into programmes for statistical analysis; they benefit from smaller file sizes and can be read by Excel.

• Possibly also a “code book” – details of columns (not machine readable).

• Simple/short – useful for an individual or within organisations/departments (do not have to provide metadata).

CSV disadvantages

• No schemas – application defines the meaning of rows (records/objects) and columns(characteristics/features).

• Have to manually handle changes such as new rows or columns.

• Type: is 1234 a number or a string containing digits? (May need integer conversion.)

Difference between CSV and XML

• CSV – would have to write a parser to add a feature to each line or to modify data, or load into e.g. a Table in a programming language, modify and rewrite back to the file.

• XML is extensible: can use the DOM (Document Object Model) to add, remove and update features; it loads the document in memory as a tree of nodes.

JSON vs XML

• JSON has increased focus and has advantages over XML (discussed later).

• XML is still used in multiple applications.

• JSON has disadvantages:

– plain JSON does not support hypermedia (key property of RESTful Web APIs)

– How do you model generics or polymorphism if using JSON for Java serialisation?

– security risks: injection; insecure deserialization.

XML & JSON

• Simple, open, interoperable.

• Human Readable (JSON even easier as more restrictive).

• Exchange format.

• Structured data.

• Reusable software to manage XML:

– JSON does not need this as simpler.

• Many views of the same data – JSON is not a document mark-up language so does not provide this.

• Extensible – XML can define new tags…documents require extensibility to handle images, charts, graphs, and other formatting types.
