The title is worded this way to make it easy to search for; please bear with it if it reads awkwardly.
When Sqoop imports data from MySQL into Hive and is left to create the Hive table automatically, does a MySQL DECIMAL column end up as Hive's DOUBLE type or as STRING?
First, note that Sqoop can auto-create the table and import into Hive with exactly two file formats: text file and Parquet. Formats for which Sqoop cannot auto-create the table are out of scope here.
If the import is done as a text file, DECIMAL maps to Hive's DOUBLE.
The relevant source is:
package org.apache.sqoop.hive;

import java.sql.Types;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Defines conversion between SQL types and Hive types.
 */
public final class HiveTypes {

    public static final Log LOG = LogFactory.getLog(HiveTypes.class.getName());

    private HiveTypes() { }

    /**
     * Given JDBC SQL types coming from another database, what is the best
     * mapping to a Hive-specific type?
     */
    public static String toHiveType(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER:
            case Types.SMALLINT:
                return "INT";
            case Types.VARCHAR:
            case Types.CHAR:
            case Types.LONGVARCHAR:
            case Types.NVARCHAR:
            case Types.NCHAR:
            case Types.LONGNVARCHAR:
            case Types.DATE:
            case Types.TIME:
            case Types.TIMESTAMP:
            case Types.CLOB:
                return "STRING";
            case Types.NUMERIC:
            case Types.DECIMAL:
            case Types.FLOAT:
            case Types.DOUBLE:
            case Types.REAL:
                return "DOUBLE";
            case Types.BIT:
            case Types.BOOLEAN:
                return "BOOLEAN";
            case Types.TINYINT:
                return "TINYINT";
            case Types.BIGINT:
                return "BIGINT";
            default:
                // TODO(aaron): Support BINARY, VARBINARY, LONGVARBINARY, DISTINCT,
                // BLOB, ARRAY, STRUCT, REF, JAVA_OBJECT.
                return null;
        }
    }

    /**
     * @return true if a sql type can't be translated to a precise match
     * in Hive, and we have to cast it to something more generic.
     */
    public static boolean isHiveTypeImprovised(int sqlType) {
        return sqlType == Types.DATE || sqlType == Types.TIME
            || sqlType == Types.TIMESTAMP
            || sqlType == Types.DECIMAL
            || sqlType == Types.NUMERIC;
    }
}
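As a quick sanity check, this mapping can be exercised directly, since both methods are public and static. A minimal sketch, assuming the Sqoop and commons-logging jars are on the classpath (the class name DecimalMappingCheck is mine, not from Sqoop):

import java.sql.Types;

import org.apache.sqoop.hive.HiveTypes;

public class DecimalMappingCheck {
    public static void main(String[] args) {
        // DECIMAL falls into the same branch as NUMERIC, FLOAT, DOUBLE and REAL.
        System.out.println(HiveTypes.toHiveType(Types.DECIMAL));           // DOUBLE
        // Sqoop itself flags this mapping as "improvised", i.e. a generic, lossy cast.
        System.out.println(HiveTypes.isHiveTypeImprovised(Types.DECIMAL)); // true
    }
}

Note that isHiveTypeImprovised returns true for DECIMAL: Sqoop itself acknowledges that DOUBLE is only a generic stand-in, which matters if your DECIMAL columns hold values that a double cannot represent exactly.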
If the import is done as Parquet, DECIMAL instead maps to STRING. The relevant method is:
/**
 * Resolve a database-specific type to Avro data type.
 * @param sqlType sql type
 * @return avro type
 */
public Type toAvroType(int sqlType) {
    switch (sqlType) {
        case Types.TINYINT:
        case Types.SMALLINT:
        case Types.INTEGER:
            return Type.INT;
        case Types.BIGINT:
            return Type.LONG;
        case Types.BIT:
        case Types.BOOLEAN:
            return Type.BOOLEAN;
        case Types.REAL:
            return Type.FLOAT;
        case Types.FLOAT:
        case Types.DOUBLE:
            return Type.DOUBLE;
        case Types.NUMERIC:
        case Types.DECIMAL:
            return Type.STRING;
        case Types.CHAR:
        case Types.VARCHAR:
        case Types.LONGVARCHAR:
        case Types.LONGNVARCHAR:
        case Types.NVARCHAR:
        case Types.NCHAR:
            return Type.STRING;
        case Types.DATE:
        case Types.TIME:
        case Types.TIMESTAMP:
            return Type.LONG;
        case Types.BLOB:
        case Types.BINARY:
        case Types.VARBINARY:
        case Types.LONGVARBINARY:
            return Type.BYTES;
        default:
            throw new IllegalArgumentException("Cannot convert SQL type "
                + sqlType);
    }
}
This method lives in org.apache.sqoop.manager.ConnManager.
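The same kind of spot check works here. toAvroType is an instance method on the abstract ConnManager, so the sketch below goes through a concrete subclass; MySQLManager is chosen purely for illustration, on the assumption that it inherits toAvroType unchanged in the Sqoop version quoted here:

import java.sql.Types;

import org.apache.avro.Schema.Type;
import org.apache.sqoop.SqoopOptions;
import org.apache.sqoop.manager.MySQLManager;

public class AvroMappingCheck {
    public static void main(String[] args) {
        // No database connection is needed just to resolve the type mapping.
        MySQLManager manager = new MySQLManager(new SqoopOptions());
        Type avroType = manager.toAvroType(Types.DECIMAL);
        System.out.println(avroType); // STRING
    }
}

Because the Parquet path builds the Hive table from this Avro schema (via Kite, as shown below), the STRING mapping carries straight through to the Hive column type: no precision is lost in the stored text, but arithmetic in Hive then requires an explicit cast.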
Why does the same MySQL type come out as two different Hive types? Because the two formats create the Hive table in different ways.
For a Parquet import, the table is created directly; as the comment in the code puts it: "For Parquet file, the import action will create hive table directly via kite".
The code is as follows:
/**
 * Import a table or query.
 * @return true if an import was performed, false otherwise.
 */
protected boolean importTable(SqoopOptions options, String tableName,
    HiveImport hiveImport) throws IOException, ImportException {
    String jarFile = null;

    // Generate the ORM code for the tables.
    jarFile = codeGenerator.generateORM(options, tableName);

    Path outputPath = getOutputPath(options, tableName);

    // Do the actual import.
    ImportJobContext context = new ImportJobContext(tableName, jarFile,
        options, outputPath);

    // If we're doing an incremental import, set up the
    // filtering conditions used to get the latest records.
    if (!initIncrementalConstraints(options, context)) {
        return false;
    }

    if (options.isDeleteMode()) {
        deleteTargetDir(context);
    }

    if (null != tableName) {
        manager.importTable(context);
    } else {
        manager.importQuery(context);
    }

    if (options.isAppendMode()) {
        AppendUtils app = new AppendUtils(context);
        app.append();
    } else if (options.getIncrementalMode() == SqoopOptions.IncrementalMode.DateLastModified) {
        lastModifiedMerge(options, context);
    }

    // If the user wants this table to be in Hive, perform that post-load.
    if (options.doHiveImport()) {
        // For Parquet file, the import action will create hive table directly via
        // kite. So there is no need to do hive import as a post step again.
        if (options.getFileLayout() != SqoopOptions.FileLayout.ParquetFile) {
            hiveImport.importTable(tableName, options.getHiveTableName(), false);
        }
    }

    saveIncrementalState(options);
    return true;
}
As this code shows, by the time a Parquet import reaches the Hive post-load step, the data has already been written into a Hive table created via Kite, so hiveImport.importTable(...) (the step that applies the HiveTypes mapping shown earlier) is skipped. That is why the same MySQL DECIMAL ends up as DOUBLE in one format and STRING in the other.
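To make the difference concrete, take a hypothetical MySQL column price DECIMAL(10,2) (the column name is mine, not from the source). Per the two mappings above, DESCRIBE on the auto-created Hive table should show something like:

price    double    -- text-file import (HiveTypes.toHiveType)
price    string    -- Parquet import (ConnManager.toAvroType, table created by Kite)

If the default does not suit you, the text-file path can at least be overridden with Sqoop's --map-column-hive option (for example --map-column-hive price=DECIMAL), assuming a Hive version that supports the DECIMAL type.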