【DataStructure】Another usage of Map: Concordance

Statement: I wrote this post, but most of its content is quoted from the book 【Data Structures with Java】 by John R. Hubbard.


【Description】

A concordance is a list of the words that appear in a text document, along with the numbers of the lines on which each word appears. It is just like the index of a book, except that it lists line numbers instead of page numbers. Concordances are useful for analyzing documents to find word frequencies and associations that are not evident from reading the document directly. This program builds a concordance for a text file. The run here uses a passage taken from Shakespeare’s play Julius Caesar. The resulting concordance is shown in the 【Result】 section.
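
To make the idea concrete before the full program, here is a minimal sketch of the underlying data structure: a map from each word to the line numbers where it occurs. The class name `ConcordanceSketch`, the method `build`, and the regular-expression split are assumptions made for this illustration only. The book's program, listed in the next section, stores each listing as a String and tokenizes with StringTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the book's program.
class ConcordanceSketch {
  // Maps each upper-cased word to the 1-based numbers of the lines containing it.
  static Map<String, List<Integer>> build(List<String> lines) {
    Map<String, List<Integer>> index = new HashMap<>();
    for (int i = 0; i < lines.size(); i++) {
      // Split on runs of non-letters; an assumption chosen for brevity.
      for (String word : lines.get(i).toUpperCase().split("[^A-Z]+")) {
        if (word.isEmpty()) continue;  // a leading delimiter yields an empty token
        index.computeIfAbsent(word, k -> new ArrayList<>()).add(i + 1);
      }
    }
    return index;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "I come to bury Caesar, not to praise him.",
        "The evil that men do lives after them,");
    // HashMap iteration order is unspecified, so the print order may vary.
    build(lines).forEach((word, nums) -> System.out.println(word + "=" + nums));
  }
}
```

A word that occurs twice on the same line (such as "to" in the first sample line) is recorded twice, which mirrors the behaviour of the program below.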

【Implementation】

package com.albertshao.ds.map;

//  Data Structures with Java, Second Edition
//  by John R. Hubbard
//  Copyright 2007 by McGraw-Hill

import java.io.*;
import java.util.*;

public class Concordance {
  // Maps each upper-cased word to a comma-separated listing of its line numbers.
  private Map<String,String> map = new HashMap<String,String>();
  
  // Builds the concordance by reading the given text file line by line.
  public Concordance(String file) {
    int lineNumber = 0;
    // try-with-resources closes the Scanner when the block exits
    try (Scanner input = new Scanner(new File(file))) {
      while (input.hasNextLine()) {
        String line = input.nextLine();
        ++lineNumber;
        // Split on punctuation and spaces; note the apostrophe is a delimiter too.
        StringTokenizer parser = new StringTokenizer(line,",.;:()-!?' ");
        while (parser.hasMoreTokens()) {
          String word = parser.nextToken().toUpperCase();
          String listing = map.get(word);
          if (listing == null) {
            listing = "" + lineNumber;       // first occurrence of this word
          } else {
            listing += ", " + lineNumber;    // append to the existing listing
          }
          map.put(word,listing);
        }
      }
    } catch(IOException e) {
      System.out.println(e);
    }
  }
  
  // Writes one "WORD=line, line, ..." entry per line to the given file.
  public void write(String file) {
    try (PrintWriter output = new PrintWriter(file)) {
      for (Map.Entry<String,String> entry : map.entrySet()) {
        output.println(entry);
      }
    } catch(IOException e) {
      System.out.println(e);
    }
  }
}

package com.albertshao.ds.map;

//  Data Structures with Java, Second Edition
//  by John R. Hubbard
//  Copyright 2007 by McGraw-Hill


public class TestConcordance {
  public static final String PATH = "D:\\machao\\DataStructure\\src\\com\\albertshao\\ds\\map\\";
  public static final String IN_FILE = "Shakespeare.txt";
  public static final String OUT_FILE = "Shakespeare.out";

  public static void main(String[] args) {
    Concordance c = new Concordance(PATH+IN_FILE);
    c.write(PATH+OUT_FILE);
  }
}
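
The delimiter string passed to StringTokenizer in the Concordance constructor includes the apostrophe, so contractions and possessives are split into two tokens. The following small sketch isolates that step; the class `TokenizerDemo` and its `tokens` method are hypothetical names introduced here for illustration, while the delimiter string is copied from the constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class for illustration; not part of the book's program.
class TokenizerDemo {
  // Same delimiter set as the Concordance constructor.
  static final String DELIMITERS = ",.;:()-!?' ";

  // Returns the upper-cased tokens of one line, in order of appearance.
  static List<String> tokens(String line) {
    List<String> result = new ArrayList<>();
    StringTokenizer parser = new StringTokenizer(line, DELIMITERS);
    while (parser.hasMoreTokens()) {
      result.add(parser.nextToken().toUpperCase());
    }
    return result;
  }

  public static void main(String[] args) {
    // Line 8 of the sample text: the apostrophe splits "answer'd" in two.
    System.out.println(tokens("And grievously hath Caesar answer'd it."));
    // prints [AND, GRIEVOUSLY, HATH, CAESAR, ANSWER, D, IT]
  }
}
```

This is why the result contains entries such as D=8 (from "answer'd") and S=12 (from "Caesar's").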

【Result】

The content of Shakespeare.txt:
Friends, Romans, countrymen, lend me your ears!
I come to bury Caesar, not to praise him.
The evil that men do lives after them,
The good is oft interred with their bones;
So let it be with Caesar. The noble Brutus
Hath told you Caesar was ambitious;
If it were so, it was a grievous fault;
And grievously hath Caesar answer'd it.
Here, under leave of Brutus and the rest, --
For Brutus is an honourable man;
So are they all, all honourable men.
Come I to speak in Caesar's funeral.
He was my friend, faithful and just to me.
But Brutus says he was ambitious;
And Brutus is an honourable man.
He hath brought many captives home to Rome.
Whose ransoms did the general coffers fill:
Did this in Caesar seem ambitious?
When that the poor have cried, Caesar hath wept;
Ambition should be made of sterner stuff.
Yet Brutus says he was ambitious;
And Brutus is an honourable man.
You all did see that on the Lupercal
I thrice presented him with a kingly crown,
Which he did thrice refuse: was this ambition?
Yet Brutus says he was ambitious;
And, sure, is an honourable man.
I speak not to disprove what Brutus spoke,
But here I am to speak what I do know.
You all did love him once, not without cause.
What cause withholds you, then, to mourn for him?
O judgement! thou art fled to brutish beasts,
And men have lost their reason!

The result, that is, the content of Shakespeare.out (entries appear in HashMap iteration order, which is not alphabetical):
GRIEVOUS=7
WHAT=28, 29, 31
KINGLY=24
REST=9
JUDGEMENT=32
SURE=27
CAUSE=30, 31
REFUSE=25
ME=1, 13
DO=3, 29
THEIR=4, 33
FUNERAL=12
NOT=2, 28, 30
YET=21, 26
CAESAR=2, 5, 6, 8, 12, 18, 19
LEAVE=9
THAT=3, 19, 23
COFFERS=17
HIM=2, 24, 30, 31
ARE=11
MADE=20
MY=13
CROWN=24
MOURN=31
FRIEND=13
THIS=18, 25
CAPTIVES=16
OFT=4
PRAISE=2
ROMANS=1
YOU=6, 23, 30, 31
HERE=9, 29
BURY=2
GRIEVOUSLY=8
WITHHOLDS=31
D=8
BEASTS=32
A=7, 24
O=32
LEND=1
WITHOUT=30
I=2, 12, 24, 28, 29, 29
SAYS=14, 21, 26
ANSWER=8
ON=23
CRIED=19
BUT=14, 29
STUFF=20
WEPT=19
ART=32
YOUR=1
S=12
OF=9, 20
AMBITIOUS=6, 14, 18, 21, 26
MANY=16
FLED=32
GENERAL=17
HE=13, 14, 16, 21, 25, 26
INTERRED=4
MEN=3, 11, 33
EVIL=3
FRIENDS=1
POOR=19
NOBLE=5
KNOW=29
WHOSE=17
LUPERCAL=23
BRUTISH=32
FAULT=7
THE=3, 4, 5, 9, 17, 19, 23
WERE=7
FOR=10, 31
THEY=11
THRICE=24, 25
AND=8, 9, 13, 15, 22, 27, 33
IF=7
UNDER=9
THEM=3
THEN=31
SEE=23
IN=12, 18
FILL=17
IS=4, 10, 15, 22, 27
ROME=16
IT=5, 7, 7, 8
WAS=6, 7, 13, 14, 21, 25, 26
ALL=11, 11, 23, 30
HAVE=19, 33
TOLD=6
LOST=33
ONCE=30
FAITHFUL=13
BRUTUS=5, 9, 10, 14, 15, 21, 22, 26, 28
AM=29
WITH=4, 5, 24
AN=10, 15, 22, 27
WHICH=25
HONOURABLE=10, 11, 15, 22, 27
TO=2, 2, 12, 13, 16, 28, 29, 31, 32
SPOKE=28
SHOULD=20
LIVES=3
BONES=4
BE=5, 20
AFTER=3
COUNTRYMEN=1
SPEAK=12, 28, 29
DID=17, 18, 23, 25, 30
AMBITION=20, 25
COME=2, 12
SEEM=18
REASON=33
BROUGHT=16
LOVE=30
STERNER=20
MAN=10, 15, 22, 27
WHEN=19
DISPROVE=28
PRESENTED=24
SO=5, 7, 11
HATH=6, 8, 16, 19
JUST=13
HOME=16
THOU=32
EARS=1
GOOD=4
LET=5
RANSOMS=17
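
The output above is hard to scan because a HashMap does not keep its keys in any particular order. As a possible follow-up (not from the book), the sketch below copies a few of the entries above into a TreeMap to print them alphabetically and derives each word's frequency by counting the line numbers in its listing. The class `SortedConcordance` and the method `frequency` are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-processing sketch; not part of the book's program.
class SortedConcordance {
  // Number of occurrences = number of comma-separated line numbers in a listing.
  static int frequency(String listing) {
    return listing.split(", ").length;
  }

  public static void main(String[] args) {
    // A few entries copied from the result above, in the same String format.
    Map<String, String> map = new HashMap<>();
    map.put("WHAT", "28, 29, 31");
    map.put("KINGLY", "24");
    map.put("TO", "2, 2, 12, 13, 16, 28, 29, 31, 32");

    // A TreeMap iterates its keys in natural (alphabetical) order.
    for (Map.Entry<String, String> e : new TreeMap<>(map).entrySet()) {
      System.out.println(e.getKey() + " (" + frequency(e.getValue()) + "): " + e.getValue());
    }
  }
}
```

Declaring the field in Concordance as a TreeMap instead of a HashMap would presumably give the same alphabetical ordering directly in Shakespeare.out, at the cost of slower lookups on large texts.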


