java.net.SocketTimeoutException: Read timed out

最新推荐文章于 2024-09-06 15:44:58 发布

老农民挖数据

最新推荐文章于 2024-09-06 15:44:58 发布

阅读量7.6k

点赞数

分类专栏： hadoop

本文链接：https://blog.csdn.net/shushugood/article/details/51973692

版权

hadoop 专栏收录该内容

37 篇文章 0 订阅

订阅专栏

今天在处理impala同步过程，遇到了一个报错，貌似连接满了。

直接上图：

说明：

hive.metastore.client.socket.timeout
说明: Client socket 的超时时间
默认值：20秒

原因：这里有可能是通信网络不好，网卡问题，延迟等问题，

所以通常可以设置500秒

处理步骤

hive> set hive.metastore.client.socket.timeout=500;

在所有节点执行。

然后重启程序，问题解决。

分析：

然后查看连接数状态

netstat -anp |grep 21050 |wc ,可以找一个少一点作为入口。原则上集群每个节点的连接数在150-300之间，多了有负担。

hive.exec.max.created.files
说明：所有hive运行的map与reduce任务可以产生的文件的和
默认值:100000 
hive.exec.dynamic.partition
说明：是否为自动分区
默认值：false
hive.mapred.reduce.tasks.speculative.execution
说明：是否打开推测执行
默认值：true
hive.input.format
说明：Hive默认的input format
默认值： org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
如果有问题可以使用org.apache.hadoop.hive.ql.io.HiveInputFormat
hive.exec.counters.pull.interval
说明：Hive与JobTracker拉取counter信息的时间
默认值：1000ms 
hive.script.recordreader
说明：使用脚本时默认的读取类
默认值： org.apache.hadoop.hive.ql.exec.TextRecordReader
hive.script.recordwriter
说明：使用脚本时默认的数据写入类
默认值： org.apache.hadoop.hive.ql.exec.TextRecordWriter
hive.mapjoin.check.memory.rows
说明： 内存里可以存储数据的行数
默认值： 100000
hive.mapjoin.smalltable.filesize
说明：输入小表的文件大小的阀值，如果小于该值，就采用普通的join
默认值： 25000000
hive.auto.convert.join
说明：是不是依据输入文件的大小，将Join转成普通的Map Join
默认值： false
hive.mapjoin.followby.gby.localtask.max.memory.usage
说明：map join做group by 操作时，可以使用多大的内存来存储数据，如果数据太大，则不会保存在内存里
默认值：0.55
hive.mapjoin.localtask.max.memory.usage
说明：本地任务可以使用内存的百分比
默认值： 0.90
hive.heartbeat.interval
说明：在进行MapJoin与过滤操作时，发送心跳的时间
默认值1000
hive.merge.size.per.task
说明： 合并后文件的大小
默认值： 256000000
hive.mergejob.maponly
说明： 在只有Map任务的时候 合并输出结果
默认值： true
hive.merge.mapredfiles
默认值： 在作业结束的时候是否合并小文件
说明： false
hive.merge.mapfiles
说明：Map-Only Job是否合并小文件
默认值：true
hive.hwi.listen.host
说明：Hive UI 默认的host
默认值：0.0.0.0
hive.hwi.listen.port
说明：Ui监听端口
默认值：9999
hive.exec.parallel.thread.number
说明：hive可以并行处理Job的线程数
默认值：8
hive.exec.parallel
说明：是否并行提交任务
默认值：false
hive.exec.compress.output
说明：输出使用压缩
默认值： false
hive.mapred.mode
说明： MapReduce的操作的限制模式，操作的运行在该模式下没有什么限制
默认值： nonstrict
hive.join.cache.size
说明： join操作时，可以存在内存里的条数
默认值： 25000
hive.mapjoin.cache.numrows
说明： mapjoin 存在内存里的数据量
默认值：25000
hive.join.emit.interval
说明： 有连接时Hive在输出前，缓存的时间
默认值： 1000
hive.optimize.groupby
说明：在做分组统计时，是否使用bucket table
默认值： true
hive.fileformat.check
说明：是否检测文件输入格式
默认值：true
hive.metastore.client.connect.retry.delay
说明： client 连接失败时,retry的时间间隔
默认值：1秒
hive.metastore.client.socket.timeout
说明:  Client socket 的超时时间
默认值：20秒
mapred.reduce.tasks
默认值：-1
说明：每个任务reduce的默认值
 -1 代表自动根据作业的情况来设置reduce的值 
hive.exec.reducers.bytes.per.reducer
默认值： 1000000000 （1G）
说明：每个reduce的接受的数据量
    如果送到reduce的数据为10G,那么将生成10个reduce任务 
hive.exec.reducers.max
默认值：999
说明： reduce的最大个数      
hive.exec.reducers.max
默认值：999
说明： reduce的最大个数
hive.metastore.warehouse.dir
默认值：/user/hive/warehouse
说明： 默认的数据库存放位置
hive.default.fileformat
默认值：TextFile
说明： 默认的fileformat
hive.map.aggr
默认值：true
说明： Map端聚合，相当于combiner
hive.exec.max.dynamic.partitions.pernode
默认值：100
说明：每个任务节点可以产生的最大的分区数
hive.exec.max.dynamic.partitions
默认值：1000
说明： 默认的可以创建的分区数
hive.metastore.server.max.threads
默认值：100000
说明： metastore默认的最大的处理线程数
hive.metastore.server.min.threads
默认值：200
说明： metastore默认的最小的处理线程数
 
转载请注明出处【 http://sishuok.com/forum/blogPost/list/0/6225.html】