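Scala Collections
- Scala collections fall into three main families: sequences (Seq), sets (Set), and maps (Map). All of them extend Iterable.
- Scala provides both a mutable and an immutable version of nearly every collection.
- Immutable collections: scala.collection.immutable
- Mutable collections: scala.collection.mutable
- An immutable collection is one whose object cannot change: every "modification" returns a new object, much like String in Java.
- A mutable collection can be modified in place, much like StringBuilder in Java.
- Recommendation: use symbolic operators with immutable collections and named methods with mutable ones.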
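The difference between the two families can be sketched as follows. This is a minimal illustration; the object name MutabilitySketch and the sample values are made up for the example.

```scala
import scala.collection.mutable

object MutabilitySketch {
  // Immutable: :+ builds a new List and leaves the original untouched.
  val original: List[Int] = List(1, 2, 3)
  val extended: List[Int] = original :+ 4

  // Mutable: append changes the buffer in place.
  val buffer: mutable.ListBuffer[Int] = mutable.ListBuffer(1, 2, 3)
  buffer.append(4)

  def main(args: Array[String]): Unit = {
    println(original) // List(1, 2, 3)
    println(extended) // List(1, 2, 3, 4)
    println(buffer)   // ListBuffer(1, 2, 3, 4)
  }
}
```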
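Immutable collections
- List is a kind of Seq here, which differs from the concept of List in Java.
- The `1 to 3` in a for loop is a Range, which is an IndexedSeq.
- The Map hierarchy includes SortedMap, so Scala maps can keep their keys sorted.
- Difference between IndexedSeq and LinearSeq:
  - IndexedSeq locates elements by index, which is fast.
  - LinearSeq is linear, with the notion of a head and a tail, and is usually searched by traversal.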
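The points above can be sketched in a few lines. The object name SeqKindsSketch and the sample data are assumptions made for illustration only.

```scala
import scala.collection.immutable.SortedMap

object SeqKindsSketch {
  // 1 to 3 is a Range, and a Range is an IndexedSeq.
  val range: IndexedSeq[Int] = 1 to 3

  // Vector is an IndexedSeq: elements are reached directly by index.
  val indexed: Vector[Int] = Vector(10, 20, 30)
  val second: Int = indexed(1)

  // List is a LinearSeq: it decomposes naturally into head and tail.
  val linear: List[Int] = List(10, 20, 30)
  val first: Int = linear.head
  val rest: List[Int] = linear.tail

  // SortedMap iterates in key order, whatever the insertion order was.
  val sorted: SortedMap[String, Int] = SortedMap("b" -> 2, "c" -> 3, "a" -> 1)

  def main(args: Array[String]): Unit = {
    println(second)             // 20
    println(first)              // 10
    println(rest)               // List(20, 30)
    println(sorted.keys.toList) // List(a, b, c)
  }
}
```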
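Mutable collections
Arrays
Fixed-length arrays (Array)
- val arr = new Array[Int](10)
  - new is a keyword
  - [Int] specifies the element type; use Any to store arbitrary data
  - (10) is the array size; it cannot change once set, although individual elements can still be reassigned
- val arr = Array(1, 2)
  - Defines an array with initial values
  - The array object is created through the apply method, so no new is used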
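Variable-length arrays (ArrayBuffer)
- Define a variable-length array
- val arr = ArrayBuffer[Any](3, 2, 5)
- ArrayBuffer is an ordered collection
- append() adds elements and accepts a variable number of arguments
- insert adds elements at a given position
Converting between fixed-length and variable-length arrays
- arr.toBuffer converts a fixed-length array to a variable-length one (the result is a new buffer; arr itself is unchanged)
- arr.toArray converts a variable-length array to a fixed-length one (likewise, the original is unchanged)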
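The round trip between the two kinds of array can be sketched as follows. The object name ArrayConversionSketch and the values are hypothetical; the sketch only relies on toBuffer, append, insert and toArray.

```scala
import scala.collection.mutable

object ArrayConversionSketch {
  // A fixed-length array: elements can be updated, the length cannot.
  val fixed: Array[Int] = Array(1, 2, 3)
  fixed(0) = 10

  // toBuffer copies the elements into a growable buffer.
  val growable: mutable.Buffer[Int] = fixed.toBuffer
  growable.append(4)    // add at the end
  growable.insert(0, 0) // insert the value 0 at index 0

  // toArray copies the buffer back into a new fixed-length array.
  val back: Array[Int] = growable.toArray

  def main(args: Array[String]): Unit = {
    println(fixed.mkString(","))    // 10,2,3
    println(growable.mkString(",")) // 0,10,2,3,4
    println(back.mkString(","))     // 0,10,2,3,4
  }
}
```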
Multidimensional arrays
- val arr = Array.ofDim[Double](3, 4)
- The two-dimensional array holds three one-dimensional arrays, each with four elements
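As a small sketch of how such an array is read and written (the object name MatrixSketch and the stored value are made up for the example):

```scala
object MatrixSketch {
  // 3 rows, 4 columns, every cell initialised to 0.0.
  val grid: Array[Array[Double]] = Array.ofDim[Double](3, 4)

  // The first index selects the row, the second the column.
  grid(1)(2) = 7.5

  val rows: Int = grid.length
  val cols: Int = grid(0).length

  def main(args: Array[String]): Unit = {
    // Print the matrix row by row.
    for (row <- grid) println(row.mkString(" "))
  }
}
```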
List
Immutable List
- List is immutable by default
- The value of an element cannot be changed
object Test {
  def main(args: Array[String]): Unit = {
    // Create a List (ordered, duplicates allowed)
    val list: List[Int] = List(1, 2, 3, 4, 3)
    println(list) // List(1, 2, 3, 4, 3)
    list.foreach(println) // prints 1, 2, 3, 4, 3, one per line
    println(list(1)) // 2
    // list(1) = 12 // does not compile: List elements cannot be reassigned
    // Add elements (each call returns a new List)
    val list2 = list.+:(10) // prepend
    val list3 = list.:+(10) // append
    println(list)
    println(list2)
    println(list3)
    // Output:
    // List(1, 2, 3, 4, 3)
    // List(10, 1, 2, 3, 4, 3)
    // List(1, 2, 3, 4, 3, 10)
    val list4 = list.::(123)
    println(list4) // List(123, 1, 2, 3, 4, 3)
    val list5 = Nil.::(13)
    println(list5) // List(13)
    val list6 = 17 :: 28 :: 59 :: 16 :: Nil
    println(list6) // List(17, 28, 59, 16)
    // :: prepends list6 as a single element, so the result is nested
    val list7 = list6 :: list5
    println(list7) // List(List(17, 28, 59, 16), 13)
    // Flattening: ::: splices the elements of both lists into one list
    val list8 = list5 ::: list6
    println(list8) // List(13, 17, 28, 59, 16)
    val list9 = list5 ++ list6
    println(list9) // List(13, 17, 28, 59, 16)
  }
}
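Mutable List
- ListBuffer
import scala.collection.mutable.ListBuffer

object Test {
  def main(args: Array[String]): Unit = {
    // Create a mutable list
    val buffer = ListBuffer(1, 2, 3, 4)
    // Add elements
    buffer += 5
    buffer.append(6)
    buffer.insert(1, 2) // insert the value 2 at index 1
    // Print the contents
    buffer.foreach(print) // 1223456
    println()
    // Modify elements
    buffer(1) = 6
    buffer.update(1, 7)
    buffer.foreach(print) // 1723456
    println()
    // Remove elements
    buffer.-(5) // Scala 2.12: returns a new collection; buffer itself is unchanged and the result is discarded
    buffer -= 1 // removes the first element equal to 1
    buffer.remove(2) // removes the element at index 2
    buffer.foreach(print) // 72456
  }
}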
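Set
By default Scala uses the immutable Set. To use the mutable one, import scala.collection.mutable.Set.
Immutable Set
- Set is immutable by default, and its elements are unordered
- Elements cannot be duplicated
object Test {
  def main(args: Array[String]): Unit = {
    // Set is immutable by default, and its elements are unordered
    val set = Set(1, 2, 3, 4, 5, 6)
    // Duplicates are dropped
    val set1 = Set(1, 2, 3, 4, 5, 6, 3)
    // Traverse the set
    for (x <- set1) {
      println(x) // e.g. 5 1 6 2 3 4, one per line; the order is not guaranteed
    }
  }
}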
Mutable Set (mutable.Set)
import scala.collection.mutable

object Test {
  def main(args: Array[String]): Unit = {
    // Create a mutable set
    val set = mutable.Set(1, 2, 3, 4, 5)
    // Add an element in place
    set += 8
    // + returns a new Set with the element added; set itself is unchanged
    val set1 = set.+(9)
    println(set1) // Set(9, 1, 5, 2, 3, 4, 8)
    println(set) // Set(1, 5, 2, 3, 4, 8)
    // Remove an element
    set -= 5
    // Traverse
    set.foreach(print) // 12348
    println()
    println(set.mkString(",")) // 1,2,3,4,8
  }
}
Map
A Map is a hash table that stores key-value pairs.
Immutable Map
object Test {
  def main(args: Array[String]): Unit = {
    // Create an immutable map
    val map = Map("a" -> 1, "b" -> 2)
    // Access the data
    for (elem <- map.keys) {
      // get returns an Option: Some(value) when the key exists, None when it does not
      print(elem + "=" + map.get(elem).get) // a=1b=2
    }
    println()
    // Fall back to 0 when the key does not exist
    println(map.get("d").getOrElse(0)) // 0
    println(map.getOrElse("b", 0)) // 2
    // Traverse
    map.foreach(kv => println(kv))
    // (a,1)
    // (b,2)
  }
}
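Mutable Map
import scala.collection.mutable

object Test {
  def main(args: Array[String]): Unit = {
    // Create a mutable map
    val map = mutable.Map("a" -> 1, "b" -> 2)
    // Add an entry
    map += ("c" -> 3)
    // put stores 4 under "a" and returns the previous value as an Option (Some(1) here)
    val maybeInt = map.put("a", 4)
    println(maybeInt.getOrElse(0)) // 1
    // Remove an entry
    map -= "b"
    // Update an entry; the key is added if it does not exist yet, as with "d" here
    map.update("d", 5)
    map("d") = 5 // same as update
    // Traverse
    map.foreach(kv => println(kv))
  }
}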
Tuples
- A tuple bundles several, possibly unrelated, values into a single unit
- A tuple can hold at most 22 elements
object Test {
  def main(args: Array[String]): Unit = {
    // Declare a tuple
    val tuple: (Int, String, Boolean) = (20, "jx", true)
    // Access elements by position: _1, _2, ...
    println(tuple._1)
    println(tuple._2)
    println(tuple._3)
    // Access an element by zero-based index
    println(tuple.productElement(0))
    // Access the elements through an iterator
    for (elem <- tuple.productIterator) {
      println(elem)
    }
    // A key-value pair in a Map is a tuple with exactly two elements, called a pair
    val map = Map("a" -> 1, "b" -> 2)
    val map1 = Map(("a", 1), ("b", 2))
    map.foreach(tuple => println(tuple._1 + "=" + tuple._2))
    map1.foreach(tuple => println(tuple._1 + "=" + tuple._2))
    /*
     * Output:
     * 20
     * jx
     * true
     * 20
     * 20
     * jx
     * true
     * a=1
     * b=2
     * a=1
     * b=2
     */
  }
}
Common collection functions
Basic operations
object Test {
  def main(args: Array[String]): Unit = {
    val list: List[Int] = List(1, 2, 3, 4, 5, 6)
    // Length of the list
    println(list.length)
    // Size of the collection (same as length)
    println(list.size)
    // Loop over the elements
    list.foreach(println)
    // Iterator
    for (elem <- list.iterator) {
      println(elem)
    }
    // Build a string
    println(list.mkString(","))
    // Membership test
    println(list.contains(0))
  }
}
Derived collections
object Test {
  def main(args: Array[String]): Unit = {
    val list1: List[Int] = List(1, 2, 3, 4, 5, 6)
    val list2: List[Int] = List(7, 8, 9, 0, 10, 11)
    // Head of the collection
    println(list1.head)
    // Tail of the collection (everything except the head)
    println(list1.tail)
    // Last element
    println(list1.last)
    // Initial elements (everything except the last)
    println(list1.init)
    // Reverse
    println(list1.reverse)
    // Take the first (last) n elements
    println(list1.take(3))
    println(list1.takeRight(3))
    // Drop the first (last) n elements
    println(list1.drop(3))
    println(list1.dropRight(3))
    // Union (for lists this is plain concatenation)
    println(list1.union(list2))
    // Intersection
    println(list1.intersect(list2))
    // Difference
    println(list1.diff(list2))
    // Zip
    // If the two collections differ in length, only as many pairs as the shorter one has are produced
    println(list1.zip(list2))
    // Sliding window
    // 2: window size
    // 5: step
    list1.sliding(2, 5).foreach(println)
    /*
     * Output:
     * 1
     * List(2, 3, 4, 5, 6)
     * 6
     * List(1, 2, 3, 4, 5)
     * List(6, 5, 4, 3, 2, 1)
     * List(1, 2, 3)
     * List(4, 5, 6)
     * List(4, 5, 6)
     * List(1, 2, 3)
     * List(1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 10, 11)
     * List()
     * List(1, 2, 3, 4, 5, 6)
     * List((1,7), (2,8), (3,9), (4,0), (5,10), (6,11))
     * List(1, 2)
     * List(6)
     */
  }
}
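Simple computation functions
- sorted: sorts the collection by its natural ordering
- sortBy: sorts by one or more attributes, using the ordering of the attribute's type
- sortWith: function-based sorting; a comparator function supplies the custom ordering logic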
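The three variants can be compared on a list of pairs. This is a minimal sketch; the object name SortingSketch and the sample people are invented for the example.

```scala
object SortingSketch {
  // (name, age) pairs; the data is made up for illustration.
  val people: List[(String, Int)] = List(("bob", 30), ("alice", 30), ("carol", 25))

  // sorted: natural ordering of tuples, i.e. by name first, then by age.
  val byName: List[(String, Int)] = people.sorted

  // sortBy with a tuple key sorts by several attributes: age first, then name.
  val byAgeThenName: List[(String, Int)] = people.sortBy(p => (p._2, p._1))

  // sortWith: a custom comparator, here age in descending order.
  val byAgeDesc: List[(String, Int)] = people.sortWith((a, b) => a._2 > b._2)

  def main(args: Array[String]): Unit = {
    println(byName)        // List((alice,30), (bob,30), (carol,25))
    println(byAgeThenName) // List((carol,25), (alice,30), (bob,30))
    println(byAgeDesc)
  }
}
```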
object test {
  def main(args: Array[String]): Unit = {
    val list: List[Int] = List(1, 2, 3, 4, 5, 6)
    // Sum
    println(list.sum)
    // Product
    println(list.product)
    // Maximum
    println(list.max)
    // Minimum
    println(list.min)
    // Sorting
    // By element value
    println(list.sortBy(x => x))
    // By absolute value
    println(list.sortBy(x => x.abs))
    // Ascending by element value
    println(list.sortWith((x, y) => x < y))
    // Descending by element value
    println(list.sortWith((x, y) => x > y))
    /*
     * Output:
     * 21
     * 720
     * 6
     * 1
     * List(1, 2, 3, 4, 5, 6)
     * List(1, 2, 3, 4, 5, 6)
     * List(1, 2, 3, 4, 5, 6)
     * List(6, 5, 4, 3, 2, 1)
     */
  }
}
Advanced computation functions
map and related operations
object test {
  def main(args: Array[String]): Unit = {
    val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
    // filter
    // Select the even numbers
    val evenList = list.filter((elem: Int) => elem % 2 == 0)
    println(evenList)
    println(list.filter(_ % 2 == 1))
    println("===============")
    // map
    // Multiply every element by 2
    println(list.map(_ * 2))
    // println(list.map(_ * _)) // does not compile: each _ stands for a separate parameter
    println(list.map(x => x * x))
    println("===========")
    // flatten
    val nestedList: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
    // val flatList = nestedList(0) ::: nestedList(1) ::: nestedList(2)
    // println(flatList)
    println(nestedList)
    val flatList = nestedList.flatten
    println(flatList)
    println("==========")
    // flatMap
    // Split a group of strings into words and collect them into one list of words
    val strings: List[String] = List("hello world", "hello scala", "hello spark")
    val splitList: List[Array[String]] = strings.map(_.split(" "))
    println(splitList) // prints array references, not the words
    val flattenList = splitList.flatten // break up the arrays and flatten
    println(flattenList)
    println("===========")
    val flatmapList = strings.flatMap(_.split(" "))
    println(flatmapList)
    println("=============")
    // groupBy
    // Split into odd and even groups
    val groupMap: Map[Int, List[Int]] = list.groupBy(_ % 2)
    println(groupMap)
    val groupMap2: Map[String, List[Int]] = list.groupBy(data => {
      if (data % 2 == 0) "even" else "odd"
    })
    println(groupMap2)
    // Group a list of words by their first letter
    val wordList = List("China", "Americ", "asd", "fgh", "jkl")
    println(wordList.groupBy(_.charAt(0)))
  }
}
/*
 * Output:
 * List(2, 4, 6, 8)
 * List(1, 3, 5, 7, 9)
 * ===============
 * List(2, 4, 6, 8, 10, 12, 14, 16, 18)
 * List(1, 4, 9, 16, 25, 36, 49, 64, 81)
 * ===========
 * List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
 * List(1, 2, 3, 4, 5, 6, 7, 8, 9)
 * ==========
 * List([Ljava.lang.String;@3532ec19, [Ljava.lang.String;@68c4039c, [Ljava.lang.String;@ae45eb6)
 * List(hello, world, hello, scala, hello, spark)
 * ===========
 * List(hello, world, hello, scala, hello, spark)
 * =============
 * Map(1 -> List(1, 3, 5, 7, 9), 0 -> List(2, 4, 6, 8))
 * Map(odd -> List(1, 3, 5, 7, 9), even -> List(2, 4, 6, 8))
 * Map(j -> List(jkl), f -> List(fgh), A -> List(Americ), a -> List(asd), C -> List(China))
 */
reduce
object test1 {
  def main(args: Array[String]): Unit = {
    val list = List(1, 2, 3, 4)
    // reduce
    // list.reduce((a: Int, b: Int) => a + b)
    println(list.reduce(_ + _))
    println(list.reduceLeft(_ + _))
    println(list.reduceRight(_ + _))
    println("================")
    val list2 = List(3, 4, 5, 8, 10)
    println(list2.reduce(_ - _)) // (((3 - 4) - 5) - 8) - 10
    println(list2.reduceLeft(_ - _))
    println(list2.reduceRight(_ - _)) // 3 - (4 - (5 - (8 - 10)))
  }
}
/*
 * Output:
 * 10
 * 10
 * 10
 * ================
 * -24
 * -24
 * 6
 */
fold
object test1 {
  def main(args: Array[String]): Unit = {
    val list = List(1, 2, 3, 4)
    // fold: like reduce, but starts from an explicit initial value
    println(list.fold(10)(_ + _)) // 10 + 1 + 2 + 3 + 4
    println(list.foldLeft(10)(_ - _)) // (((10 - 1) - 2) - 3) - 4
    println(list.foldRight(10)(_ - _)) // 1 - (2 - (3 - (4 - 10)))
  }
}
/*
 * Output:
 * 20
 * 0
 * 8
 */
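Two practical differences between reduce and fold are worth a small sketch: fold is defined on an empty collection because it starts from the initial value, and foldLeft can accumulate into a type other than the element type. The object name FoldSketch and the sample data are assumptions made for illustration.

```scala
object FoldSketch {
  val empty: List[Int] = Nil
  val words: List[String] = List("a", "bb", "ccc")

  // fold starts from the initial value, so an empty list simply yields it.
  val emptySum: Int = empty.fold(0)(_ + _)

  // reduce has no initial value and throws on an empty list;
  // reduceOption returns None instead.
  val emptyReduce: Option[Int] = empty.reduceOption(_ + _)

  // foldLeft may accumulate into a different type than the elements:
  // String elements are folded into an Int total length here.
  val totalLength: Int = words.foldLeft(0)((acc, w) => acc + w.length)

  def main(args: Array[String]): Unit = {
    println(emptySum)    // 0
    println(emptyReduce) // None
    println(totalLength) // 6
  }
}
```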
Examples
Merging maps
import scala.collection.mutable

object test1 {
  def main(args: Array[String]): Unit = {
    val map1 = Map("a" -> 1, "b" -> 3, "c" -> 6)
    val map2 = mutable.Map("a" -> 2, "b" -> 4, "c" -> 5)
    // ++ keeps the right-hand value when a key appears in both maps
    println(map1 ++ map2)
    println(map2 ++ map1)
    // Sum the values key by key; map2 is the accumulator and is updated in place
    val stringToInt = map1.foldLeft(map2)(
      (mergedMap, kv) => {
        val key = kv._1
        val value = kv._2
        mergedMap(key) = mergedMap.getOrElse(key, 0) + value
        mergedMap
      }
    )
    println(stringToInt)
  }
  /*
   * Output:
   * Map(a -> 2, b -> 4, c -> 5)
   * Map(b -> 3, a -> 1, c -> 6)
   * Map(b -> 7, a -> 3, c -> 11)
   */
}
WordCount
object test1 {
  def main(args: Array[String]): Unit = {
    val stringList: List[String] = List(
      "hello",
      "hello world",
      "hello scala",
      "hello spark from scala",
      "hello fink from scala"
    )
    // Split the strings to get one flat list of all words
    val wordList1: List[Array[String]] = stringList.map(_.split(" "))
    val wordList2: List[String] = wordList1.flatten
    println(wordList2)
    val wordList = stringList.flatMap(_.split(" "))
    println(wordList)
    // Group identical words together
    val groupMap = wordList.groupBy(word => word)
    println(groupMap)
    // Take the length of each group to get the count per word
    val countMap = groupMap.map(kv => (kv._1, kv._2.length))
    println(countMap)
    // Convert to a List, sort by count in descending order, keep the top three
    val sortList: List[(String, Int)] = countMap.toList
      .sortWith(_._2 > _._2)
      .take(3)
    println(sortList)
  }
  /*
   * Output:
   * List(hello, hello, world, hello, scala, hello, spark, from, scala, hello, fink, from, scala)
   * List(hello, hello, world, hello, scala, hello, spark, from, scala, hello, fink, from, scala)
   * Map(fink -> List(fink), world -> List(world), spark -> List(spark), scala -> List(scala, scala, scala), from -> List(from, from), hello -> List(hello, hello, hello, hello, hello))
   * Map(fink -> 1, world -> 1, spark -> 1, scala -> 3, from -> 2, hello -> 5)
   * List((hello,5), (scala,3), (from,2))
   */
}
object test1 {
  def main(args: Array[String]): Unit = {
    // WordCount variant: each string comes with the number of times it occurred
    val tupleList: List[(String, Int)] = List(
      ("hello", 1),
      ("hello world", 2),
      ("hello scala", 3),
      ("hello spark from scala", 1),
      ("hello fink from scala", 2)
    )
    // Approach 1: expand the input back into the plain version by repeating each string count times
    val newStringList = tupleList.map(
      kv => {
        (kv._1.trim + " ") * kv._2
      }
    )
    println(newStringList)
    val wordCountList = newStringList
      .flatMap(_.split(" ")) // split on spaces
      .groupBy(word => word) // group identical words
      .map(kv => (kv._1, kv._2.size)) // count each word
      .toList
      .sortBy(_._2)(Ordering[Int].reverse)
      .take(3)
    println(wordCountList)
    // Approach 2: work directly on the pre-counted pairs
    val preCountList: List[(String, Int)] = tupleList.flatMap(
      tuple => {
        val strings: Array[String] = tuple._1.split(" ")
        strings.map(word => (word, tuple._2))
      }
    )
    println(preCountList)
    // Group the pairs by word
    val preCountMap = preCountList.groupBy(_._1)
    println(preCountMap)
    // Add up the counts of each word
    val countMap = preCountMap.mapValues(
      tupleList => tupleList.map(_._2).sum
    )
    println(countMap)
    // Convert to a List, sort, and keep the top three
    val countList = countMap.toList
      .sortWith(_._2 > _._2)
      .take(3)
    println(countList)
  }
  /*
   * Output:
   * List(hello , hello world hello world , hello scala hello scala hello scala , hello spark from scala , hello fink from scala hello fink from scala )
   * List((hello,9), (scala,6), (from,3))
   * List((hello,1), (hello,2), (world,2), (hello,3), (scala,3), (hello,1), (spark,1), (from,1), (scala,1), (hello,2), (fink,2), (from,2), (scala,2))
   * Map(fink -> List((fink,2)), world -> List((world,2)), spark -> List((spark,1)), scala -> List((scala,3), (scala,1), (scala,2)), from -> List((from,1), (from,2)), hello -> List((hello,1), (hello,2), (hello,3), (hello,1), (hello,2)))
   * Map(fink -> 2, world -> 2, spark -> 1, scala -> 6, from -> 3, hello -> 9)
   * List((hello,9), (scala,6), (from,3))
   */
}
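Queue
import scala.collection.mutable

object test1 {
  def main(args: Array[String]): Unit = {
    // Create a mutable queue
    val q = new mutable.Queue[String]()
    q.enqueue("a", "b", "c", "d", "fff")
    println(q)
    println(q.dequeue()) // removes and returns the element at the front
    /*
     * Output:
     * Queue(a, b, c, d, fff)
     * a
     */
  }
}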
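Parallel collections
Scala provides parallel collections for parallel computation on multi-core machines. Since Scala 2.13 they live in the separate scala-parallel-collections module, and .par additionally needs import scala.collection.parallel.CollectionConverters._.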
object test1 {
  def main(args: Array[String]): Unit = {
    val strings = (1 to 100).map(
      x => Thread.currentThread.getName
    )
    println(strings)
    val strings2 = (1 to 100).par.map(
      x => Thread.currentThread.getName
    )
    println(strings2)
/*
* Vector(main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main, main)
ParVector(scala-execution-context-global-20, scala-execution-context-global-20, scala-execution-context-global-20, scala-execution-context-global-20, scala-execution-context-global-20, scala-execution-context-global-20, scala-execution-context-global-30, scala-execution-context-global-30, scala-execution-context-global-30, scala-execution-context-global-20, scala-execution-context-global-30, scala-execution-context-global-20, scala-execution-context-global-26, scala-execution-context-global-26, scala-execution-context-global-26, scala-execution-context-global-29, scala-execution-context-global-28, scala-execution-context-global-25, scala-execution-context-global-33, scala-execution-context-global-33, scala-execution-context-global-33, scala-execution-context-global-31, scala-execution-context-global-32, scala-execution-context-global-34, scala-execution-context-global-22, scala-execution-context-global-22, scala-execution-context-global-22, scala-execution-context-global-22, scala-execution-context-global-22, scala-execution-context-global-22, scala-execution-context-global-22, scala-execution-context-global-34, scala-execution-context-global-34, scala-execution-context-global-34, scala-execution-context-global-34, scala-execution-context-global-34, scala-execution-context-global-34, scala-execution-context-global-25, scala-execution-context-global-25, scala-execution-context-global-25, scala-execution-context-global-32, scala-execution-context-global-32, scala-execution-context-global-32, scala-execution-context-global-31, scala-execution-context-global-31, scala-execution-context-global-31, scala-execution-context-global-31, scala-execution-context-global-31, scala-execution-context-global-31, scala-execution-context-global-31, scala-execution-context-global-21, scala-execution-context-global-21, scala-execution-context-global-21, scala-execution-context-global-32, scala-execution-context-global-32, scala-execution-context-global-32, 
scala-execution-context-global-28, scala-execution-context-global-28, scala-execution-context-global-28, scala-execution-context-global-28, scala-execution-context-global-28, scala-execution-context-global-28, scala-execution-context-global-24, scala-execution-context-global-24, scala-execution-context-global-24, scala-execution-context-global-24, scala-execution-context-global-24, scala-execution-context-global-24, scala-execution-context-global-29, scala-execution-context-global-29, scala-execution-context-global-29, scala-execution-context-global-29, scala-execution-context-global-29, scala-execution-context-global-29, scala-execution-context-global-29, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-23, scala-execution-context-global-27, scala-execution-context-global-27, scala-execution-context-global-27, scala-execution-context-global-27, scala-execution-context-global-27, scala-execution-context-global-27, scala-execution-context-global-35, scala-execution-context-global-35, scala-execution-context-global-35, scala-execution-context-global-24, scala-execution-context-global-25, scala-execution-context-global-32, scala-execution-context-global-22)
* */
  }
}