Parsing Strings with split

http://pages.cs.wisc.edu/~hasti/cs302/examples/Parsing/parseString.html


Parsing

  • parsing: dividing a string into tokens based on the given delimiters
  • token: one piece of information, a "word"
  • delimiter: one (or more) characters used to separate tokens

When strings contain multiple pieces of information (for example, when reading in data from a file on a line-by-line basis), we need to parse (i.e., divide up) each string to extract the individual pieces.
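As a minimal sketch of that line-by-line pattern, the program below reads lines from an in-memory string and splits each one on spaces. The in-memory string is an assumption made to keep the example self-contained; a real file could be opened with `Files.newBufferedReader(path)` instead. The class name `LineByLineParse` and the sample data are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineByLineParse {
    // Reads the text one line at a time and splits each line into tokens.
    // Hypothetical helper; for a real file, wrap Files.newBufferedReader(path) instead.
    static List<String[]> parseLines(String text) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split("[ ]+")); // runs of spaces count as one delimiter
            }
        } catch (IOException e) {
            // readLine declares IOException even though a StringReader will not fail
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "apple 3 0.50\nbread 1 2.25\n"; // hypothetical sample data
        for (String[] tokens : parseLines(data)) {
            System.out.println(Arrays.toString(tokens));
        }
    }
}
```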

Parsing Strings in Java

Strings in Java can be parsed using the split method of the String class. (StringTokenizer can also be used to parse a string; we won't be covering it here.) This page gives only a brief overview (and some examples) of the most common (and easiest) ways to use the split method; for more detailed information, see the Java API documentation for split.

Issues to consider when parsing a string:

  • What are the delimiters (and how many are there)?
  • How should consecutive delimiters be treated?

When there is just one character used as a delimiter

Example 1

We want to divide up a phrase into words where spaces are used to separate words. For example:

the music made   it   hard      to        concentrate
In this case, we have just one delimiter (space), and consecutive delimiters (i.e., several spaces in a row) should be treated as one delimiter. To parse this string in Java, we do:
String phrase = "the music made   it   hard      to        concentrate";
String delims = "[ ]+";
String[] tokens = phrase.split(delims);

Note that

  • the general form for specifying the delimiters that we will use is "[delim_characters]+". (This form is a kind of regular expression. You don't need to know about regular expressions; just use the template shown here.) The plus sign (+) is used to indicate that consecutive delimiters should be treated as one.
  • the split method returns an array containing the tokens (as strings).  To see what the tokens are, just use a for loop:
    for (int i = 0; i < tokens.length; i++)
        System.out.println(tokens[i]);
    
    You should find that there are seven tokens: the, music, made, it, hard, to, concentrate
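Putting the pieces of Example 1 together, a complete program might look like the sketch below (the class name `SpaceSplit` is hypothetical). It also splits the same phrase with "[ ]" (no plus sign) to show that, when consecutive delimiters are treated separately, each extra space produces an empty string in the result.

```java
class SpaceSplit {
    // Splits on spaces, treating a run of spaces as a single delimiter ("[ ]+").
    static String[] words(String phrase) {
        return phrase.split("[ ]+");
    }

    public static void main(String[] args) {
        String phrase = "the music made   it   hard      to        concentrate";
        String[] tokens = words(phrase);
        for (String token : tokens) {   // same effect as the indexed for loop above
            System.out.println(token);
        }
        System.out.println(tokens.length + " tokens");

        // Without the +, every single space is a delimiter, so runs of spaces yield "" tokens.
        String[] noPlus = phrase.split("[ ]");
        System.out.println(noPlus.length + " tokens without +");
    }
}
```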
Example 2

Suppose each string contains an employee's last name, first name, employee ID#, and the number of hours worked for each day of the week, separated by commas. So

Smith,Katie,3014,,8.25,6.5,,,10.75,8.5
represents an employee named Katie Smith, whose ID was 3014, and who worked 8.25 hours on Monday, 6.5 hours on Tuesday, 10.75 hours on Friday, and 8.5 hours on Saturday (the week starts on Sunday, and an empty field means no hours were worked that day). In this case, we have just one delimiter (comma), and consecutive delimiters (i.e., more than one comma in a row) should not be treated as one. To parse this string, we do:
String employee = "Smith,Katie,3014,,8.25,6.5,,,10.75,8.5";
String delims = "[,]";
String[] tokens = employee.split(delims);

After this code executes, the tokens array will contain ten strings (note the empty strings): "Smith", "Katie", "3014", "", "8.25", "6.5", "", "", "10.75", "8.5"
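Once the line is split, each token can be converted to the type it represents. The sketch below (hypothetical class `EmployeeHours`) assumes the field layout described above: tokens 0 to 2 are the last name, first name and ID, and every remaining token is a day's hours, where an empty string means no hours. It uses `Integer.parseInt` and `Double.parseDouble` for the conversions; for the sample line it prints "Katie Smith (ID 3014) worked 34.0 hours".

```java
class EmployeeHours {
    // Sums the daily-hours fields (index 3 onward); an empty field counts as 0 hours.
    static double totalHours(String[] tokens) {
        double total = 0.0;
        for (int i = 3; i < tokens.length; i++) {
            if (!tokens[i].isEmpty()) {
                total += Double.parseDouble(tokens[i]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String employee = "Smith,Katie,3014,,8.25,6.5,,,10.75,8.5";
        String[] tokens = employee.split("[,]");   // consecutive commas are NOT merged
        String lastName = tokens[0];
        String firstName = tokens[1];
        int id = Integer.parseInt(tokens[2]);
        System.out.println(firstName + " " + lastName + " (ID " + id + ") worked "
                + totalHours(tokens) + " hours");
    }
}
```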

There is one small wrinkle to be aware of (regardless of how consecutive delimiters are handled): if the string starts with one (or more) delimiters, then the first token will be the empty string ("").
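The sketch below (hypothetical class `SplitEdges`) shows the leading-delimiter wrinkle, one common workaround for whitespace (calling `trim()` first), and a related behavior at the other end of the string: the one-argument `split(regex)` discards trailing empty strings, while `split(regex, -1)` keeps them. This matters for data like Example 2: if the employee had not worked on Saturday, the line would end with a comma and `split("[,]")` would return nine tokens instead of ten.

```java
import java.util.Arrays;

class SplitEdges {
    public static void main(String[] args) {
        // A leading delimiter produces an empty first token.
        System.out.println(Arrays.toString(",a,b".split("[,]")));                 // [, a, b]
        // For spaces, trim() removes leading (and trailing) whitespace before splitting.
        System.out.println(Arrays.toString("  the music".trim().split("[ ]+")));  // [the, music]
        // The one-argument split discards trailing empty strings...
        System.out.println(Arrays.toString("a,b,,".split("[,]")));                // [a, b]
        // ...but a negative limit keeps them.
        System.out.println(Arrays.toString("a,b,,".split("[,]", -1)));            // [a, b, , ]
    }
}
```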

When there are several characters being used as delimiters

Example 3

Suppose we have a string containing several English sentences that uses only commas, periods, question marks, and exclamation points as punctuation. We wish to extract the individual words in the string (excluding the punctuation). In this situation we have several delimiters (the punctuation marks as well as spaces), and we want to treat consecutive delimiters as one:

String str = "This is a sentence.  This is a question, right?  Yes!  It is.";
String delims = "[ .,?!]+";
String[] tokens = str.split(delims);

All we had to do was list all the delimiter characters inside the square brackets ( [ ] ).
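As a runnable version of Example 3 (the class name `SentenceWords` is hypothetical), the program below prints "12 words: [This, is, a, sentence, This, is, a, question, right, Yes, It, is]". The final period is a delimiter at the very end of the string; it does not leave an empty last token because the one-argument `split` discards trailing empty strings.

```java
import java.util.Arrays;

class SentenceWords {
    // Spaces, periods, commas, question marks and exclamation points are all delimiters;
    // the trailing + merges runs of them (e.g. ".  " between sentences).
    static String[] words(String text) {
        return text.split("[ .,?!]+");
    }

    public static void main(String[] args) {
        String str = "This is a sentence.  This is a question, right?  Yes!  It is.";
        String[] tokens = words(str);
        System.out.println(tokens.length + " words: " + Arrays.toString(tokens));
    }
}
```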

Example 4

Suppose we are representing arithmetic expressions using strings and wish to parse out the operands (that is, use the arithmetic operators as delimiters). The arithmetic operators that we will allow are addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (^), and we will not allow parentheses (to make it a little simpler). This situation is not as straightforward as it might seem. Several characters have a special meaning when they appear inside [ ]: ^, -, [, ], the backslash itself, and two &s in a row (&&). To use one of these characters as a delimiter, we need to put \\ in front of it (in a Java string literal, \\ produces the single backslash that the regular expression needs):

String expr = "2*x^3 - 4/5*y + z^2";
String delims = "[+\\-*/\\^ ]+"; // so the delimiters are:  + - * / ^ space
String[] tokens = expr.split(delims);
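A runnable version of Example 4 follows (class `ExprOperands` is hypothetical). The split itself yields the operands 2, x, 3, 4, 5, y, z, 2. The `variables` method is an illustrative follow-up, not part of the original example: it keeps only operands that start with a letter. It skips empty tokens because an expression beginning with an operator, such as "-x + 1", would produce an empty first token (the leading-delimiter wrinkle from Example 2).

```java
import java.util.ArrayList;
import java.util.List;

class ExprOperands {
    // Delimiters are + - * / ^ and space; - and ^ are escaped with \\ inside [ ].
    static String[] operands(String expr) {
        return expr.split("[+\\-*/\\^ ]+");
    }

    // Hypothetical follow-up: collect operands that look like variable names.
    static List<String> variables(String expr) {
        List<String> vars = new ArrayList<>();
        for (String token : operands(expr)) {
            if (!token.isEmpty() && Character.isLetter(token.charAt(0))) {
                vars.add(token);
            }
        }
        return vars;
    }

    public static void main(String[] args) {
        String expr = "2*x^3 - 4/5*y + z^2";
        System.out.println(String.join(" ", operands(expr))); // 2 x 3 4 5 y z 2
        System.out.println(variables(expr));                  // [x, y, z]
    }
}
```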

General template for using split

String s = string_to_parse;
String delims = "[delimiters]+"; // use + to treat consecutive delims as one;
                                 // omit to treat consecutive delims separately
String[] tokens = s.split(delims);
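The template can also be wrapped in a small helper that builds the bracket expression from a plain list of delimiter characters and escapes the ones that are special inside [ ]. This is a minimal sketch: the method name `parse`, its parameters, and the class `SplitTemplate` are hypothetical, not part of the Java API.

```java
import java.util.Arrays;

class SplitTemplate {
    // Hypothetical helper: builds "[delims]" or "[delims]+" from the given characters,
    // putting a backslash before ^ - [ ] \ and & so they are treated literally.
    static String[] parse(String s, String delimChars, boolean mergeConsecutive) {
        if (delimChars.isEmpty()) {
            throw new IllegalArgumentException("need at least one delimiter");
        }
        StringBuilder cls = new StringBuilder("[");
        for (char c : delimChars.toCharArray()) {
            if ("^-[]\\&".indexOf(c) >= 0) {
                cls.append('\\');     // one backslash in the regex escapes the character
            }
            cls.append(c);
        }
        cls.append(']');
        if (mergeConsecutive) {
            cls.append('+');          // treat consecutive delimiters as one
        }
        return s.split(cls.toString());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("a,,b", ",", false)));  // [a, , b]
        System.out.println(Arrays.toString(parse("a,,b", ",", true)));   // [a, b]
        System.out.println(Arrays.toString(parse("2*x^3 - 4/5*y + z^2", "+-*/^ ", true)));
    }
}
```

Passing the delimiters as plain characters keeps callers from having to remember which ones need escaping; the trade-off is that the helper only supports single-character delimiters, just like the bracket template it wraps.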
