The title is worded this way to make it easy to search for; please bear with it if it reads awkwardly.
When Sqoop imports data from MySQL into Hive and is left to create the Hive table automatically, does a MySQL DECIMAL column end up as Hive's DOUBLE type or as STRING?
First, note that Sqoop can auto-create the table and import into Hive with exactly two file formats: text file and Parquet. Formats for which Sqoop cannot auto-create the table are out of scope here.
If the import is done as a text file, DECIMAL maps to Hive's DOUBLE.
The relevant source is:
package org.apache.sqoop.hive;

import java.sql.Types;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Defines conversion between SQL types and Hive types.
 */
public final class HiveTypes {

    public static final Log LOG = LogFactory.getLog(HiveTypes.class.getName());

    private HiveTypes() { }

    /**
     * Given JDBC SQL types coming from another database, what is the best
     * mapping to a Hive-specific type?
     */
    public static String toHiveType(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER:
            case Types.SMALLINT:
                return "INT";
            case Types.VARCHAR:
            case Types.CHAR:
            case Types.LONGVARCHAR:
            case Types.NVARCHAR:
            case Types.NCHAR:
            case Types.LONGNVARCHAR:
            case Types.DATE:
            case Types.TIME:
            case Types.TIMESTAMP:
            case Types.CLOB:
                return "STRING";
            case Types.NUMERIC:
            case Types.DECIMAL:
            case Types.FLOAT:
            case Types.DOUBLE:
            case Types.REAL:
                return "DOUBLE";
            case Types.BIT:
            case Types.BOOLEAN:
                return "BOOLEAN";
            case Types.TINYINT:
                return "TINYINT";
            case Types.BIGINT:
                return "BIGINT";
            default:
                // TODO(aaron): Support BINARY, VARBINARY, LONGVARBINARY, DISTINCT,
                // BLOB, ARRAY, STRUCT, REF, JAVA_OBJECT.
                return null;
        }
    }

    /**
     * @return true if a sql type can't be translated to a precise match
     * in Hive, and we have to cast it to something more generic.
     */
    public static boolean isHiveTypeImprovised(int sqlType) {
        return sqlType == Types.DATE || sqlType == Types.TIME
            || sqlType == Types.TIMESTAMP
            || sqlType == Types.DECIMAL
            || sqlType == Types.NUMERIC;
    }
}
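As a quick sanity check, this mapping can be exercised directly, since both methods are public and static. A minimal sketch, assuming the Sqoop and commons-logging jars are on the classpath (the class name DecimalMappingCheck is mine, not from Sqoop):

import java.sql.Types;

import org.apache.sqoop.hive.HiveTypes;

public class DecimalMappingCheck {
    public static void main(String[] args) {
        // DECIMAL falls into the same branch as NUMERIC, FLOAT, DOUBLE and REAL.
        System.out.println(HiveTypes.toHiveType(Types.DECIMAL));           // DOUBLE
        // Sqoop itself flags this mapping as "improvised", i.e. a generic, lossy cast.
        System.out.println(HiveTypes.isHiveTypeImprovised(Types.DECIMAL)); // true
    }
}

Note that isHiveTypeImprovised returns true for DECIMAL: Sqoop itself acknowledges that DOUBLE is only a generic stand-in, which matters if your DECIMAL columns hold values that a double cannot represent exactly.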
If the import is done as Parquet, DECIMAL instead maps to STRING. The relevant method is:
/**
 * Resolve a database-specific type to Avro data type.
 * @param sqlType sql type
 * @return avro type
 */
public Type toAvroType(int sqlType) {
    switch (sqlType) {
        case Types.TINYINT:
        case Types.SMALLINT:
        case Types.INTEGER:
            return Type.INT;
        case Types.BIGINT:
            return Type.LONG;
        case Types.BIT:
        case Types.BOOLEAN:
            return Type.BOOLEAN;
        case Types.REAL:
            return Type.FLOAT;
        case Types.FLOAT:
        case Types.DOUBLE:
            return Type.DOUBLE;
        case Types.NUMERIC:
        case Types.DECIMAL:
            return Type.STRING;
        case Types.CHAR:
        case Types.VARCHAR:
        case Types.LONGVARCHAR:
        case Types.LONGNVARCHAR:
        case Types.NVARCHAR:
        case Types.NCHAR:
            return Type.STRING;
        case Types.DATE:
        case Types.TIME:
        case Types.TIMESTAMP:
            return Type.LONG;
        case Types.BLOB:
        case Types.BINARY:
        case Types.VARBINARY:
        case Types.LONGVARBINARY:
            return Type.BYTES;
        default:
            throw new IllegalArgumentException("Cannot convert SQL type "
                + sqlType);
    }
}
This method lives in org.apache.sqoop.manager.ConnManager.
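The same kind of spot check works here. toAvroType is an instance method on the abstract ConnManager, so the sketch below goes through a concrete subclass; MySQLManager is chosen purely for illustration, on the assumption that it inherits toAvroType unchanged in the Sqoop version quoted here:

import java.sql.Types;

import org.apache.avro.Schema.Type;
import org.apache.sqoop.SqoopOptions;
import org.apache.sqoop.manager.MySQLManager;

public class AvroMappingCheck {
    public static void main(String[] args) {
        // No database connection is needed just to resolve the type mapping.
        MySQLManager manager = new MySQLManager(new SqoopOptions());
        Type avroType = manager.toAvroType(Types.DECIMAL);
        System.out.println(avroType); // STRING
    }
}

Because the Parquet path builds the Hive table from this Avro schema (via Kite, as shown below), the STRING mapping carries straight through to the Hive column type: no precision is lost in the stored text, but arithmetic in Hive then requires an explicit cast.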
Why does the same MySQL type come out as two different Hive types? Because the two formats create the Hive table in different ways.
For a Parquet import, the table is created directly; as the comment in the code puts it: "For Parquet file, the import action will create hive table directly via kite".
The code is as follows:
/**
 * Import a table or query.
 * @return true if an import was performed, false otherwise.
 */
protected boolean importTable(SqoopOptions options, String tableName,
    HiveImport hiveImport) throws IOException, ImportException {
    String jarFile = null;

    // Generate the ORM code for the tables.
    jarFile = codeGenerator.generateORM(options, tableName);

    Path outputPath = getOutputPath(options, tableName);

    // Do the actual import.
    ImportJobContext context = new ImportJobContext(tableName, jarFile,
        options, outputPath);

    // If we're doing an incremental import, set up the
    // filtering conditions used to get the latest records.
    if (!initIncrementalConstraints(options, context)) {
        return false;
    }

    if (options.isDeleteMode()) {
        deleteTargetDir(context);
    }

    if (null != tableName) {
        manager.importTable(context);
    } else {
        manager.importQuery(context);
    }

    if (options.isAppendMode()) {
        AppendUtils app = new AppendUtils(context);
        app.append();
    } else if (options.getIncrementalMode() == SqoopOptions.IncrementalMode.DateLastModified) {
        lastModifiedMerge(options, context);
    }

    // If the user wants this table to be in Hive, perform that post-load.
    if (options.doHiveImport()) {
        // For Parquet file, the import action will create hive table directly via
        // kite. So there is no need to do hive import as a post step again.
        if (options.getFileLayout() != SqoopOptions.FileLayout.ParquetFile) {
            hiveImport.importTable(tableName, options.getHiveTableName(), false);
        }
    }

    saveIncrementalState(options);
    return true;
}
As this code shows, by the time a Parquet import reaches the Hive post-load step, the data has already been written into a Hive table created via Kite, so hiveImport.importTable(...) (the step that applies the HiveTypes mapping shown earlier) is skipped. That is why the same MySQL DECIMAL ends up as DOUBLE in one format and STRING in the other.
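To make the difference concrete, take a hypothetical MySQL column price DECIMAL(10,2) (the column name is mine, not from the source). Per the two mappings above, DESCRIBE on the auto-created Hive table should show something like:

price    double    -- text-file import (HiveTypes.toHiveType)
price    string    -- Parquet import (ConnManager.toAvroType, table created by Kite)

If the default does not suit you, the text-file path can at least be overridden with Sqoop's --map-column-hive option (for example --map-column-hive price=DECIMAL), assuming a Hive version that supports the DECIMAL type.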