How to instantiate a Spark Row in Java and Scala


Instantiating a Row in Scala:

It is invalid to use the native primitive interface to retrieve a value that is null; instead, a user must check isNullAt before attempting to retrieve a value that might be null.

To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala.

A Row object can be constructed by providing field values. Example:

import org.apache.spark.sql._

// Create a Row from values.
Row(value1, value2, value3, ...)
// Create a Row from a Seq of values.
Row.fromSeq(Seq(value1, value2, ...))

A value of a row can be accessed both through generic access by ordinal, which will incur boxing overhead for primitives, and through native primitive access. An example of generic access by ordinal:

import org.apache.spark.sql._

val row = Row(1, true, "a string", null)
// row: Row = [1,true,a string,null]
val firstValue = row(0)
// firstValue: Any = 1
val fourthValue = row(3)
// fourthValue: Any = null

For native primitive access, it is invalid to use the native primitive interface to retrieve a value that is null; instead, a user must check isNullAt before attempting to retrieve a value that might be null. An example of native primitive access:

// using the row from the previous example.
val firstValue = row.getInt(0)
// firstValue: Int = 1
val isNull = row.isNullAt(3)
// isNull: Boolean = true

 

 

Instantiating a Row in Java:

The following examples show how to use org.apache.spark.sql.Row in Java. They are extracted from open-source projects.
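
Before jumping into the project code, here is a minimal, self-contained Java sketch (the class name, column names, and values are illustrative, not taken from any of the projects below). It builds Rows with RowFactory.create(), attaches a schema through createDataFrame, and reads values back with the generic and native primitive accessors described in the Scala section, checking isNullAt before touching a value that might be null.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class RowCreationSketch {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("RowCreationSketch")
                .getOrCreate();

        // Create Rows with RowFactory.create(); the field order must match the schema below.
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, "v1"),
                RowFactory.create(2, null)); // null is fine here because the column is nullable

        StructType schema = new StructType(new StructField[]{
                new StructField("id", DataTypes.IntegerType, false, Metadata.empty()),
                new StructField("name", DataTypes.StringType, true, Metadata.empty())
        });

        Dataset<Row> df = spark.createDataFrame(rows, schema);
        List<Row> collected = df.collectAsList();

        Row first = collected.get(0);
        int id = first.getInt(0);          // native primitive access by ordinal
        String name = first.getAs("name"); // generic access by field name

        Row second = collected.get(1);
        // Check isNullAt before using a typed getter on a value that might be null.
        String secondName = second.isNullAt(1) ? "<null>" : second.getString(1);

        System.out.println(id + ", " + name + ", " + secondName);
        spark.stop();
    }
}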

Example 1

Project: uberscriptquery   File: SparkUtilsTest.java
@Test
public void test_getDataSetResult() {

    StructField[] structFields = new StructField[]{
            new StructField("intColumn", DataTypes.IntegerType, true, Metadata.empty()),
            new StructField("stringColumn", DataTypes.StringType, true, Metadata.empty())
    };

    StructType structType = new StructType(structFields);

    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(1, "v1"));
    rows.add(RowFactory.create(2, "v2"));

    Dataset<Row> df = sparkSession.createDataFrame(rows, structType);

    DataSetResult dataSetResult = SparkUtils.getDataSetResult(df);
    Assert.assertEquals(2, dataSetResult.getColumnNames().size());
    Assert.assertEquals(2, dataSetResult.getRows().size());
    Assert.assertEquals(new Integer(1), dataSetResult.getRows().get(0).get(0));
    Assert.assertEquals("v1", dataSetResult.getRows().get(0).get(1));
    Assert.assertEquals(new Integer(2), dataSetResult.getRows().get(1).get(0));
    Assert.assertEquals("v2", dataSetResult.getRows().get(1).get(1));
}
 

Example 2

Project: bunsen   File: Loinc.java
/**
 * Reads the LOINC multiaxial hierarchy file and converts it to a {@link HierarchicalElement}
 * dataset.
 *
 * @param spark the Spark session
 * @param loincHierarchyPath path to the multiaxial hierarchy CSV
 * @return a dataset of {@link HierarchicalElement} representing the hierarchical relationship.
 */
public static Dataset<HierarchicalElement> readMultiaxialHierarchyFile(SparkSession spark,
    String loincHierarchyPath) {

  return spark.read()
      .option("header", true)
      .csv(loincHierarchyPath)
      .select(col("IMMEDIATE_PARENT"), col("CODE"))
      .where(col("IMMEDIATE_PARENT").isNotNull()
          .and(col("IMMEDIATE_PARENT").notEqual(lit(""))))
      .where(col("CODE").isNotNull()
          .and(col("CODE").notEqual(lit(""))))
      .map((MapFunction<Row, HierarchicalElement>) row -> {

        HierarchicalElement element = new HierarchicalElement();

        element.setAncestorSystem(LOINC_CODE_SYSTEM_URI);
        element.setAncestorValue(row.getString(0));

        element.setDescendantSystem(LOINC_CODE_SYSTEM_URI);
        element.setDescendantValue(row.getString(1));

        return element;
      }, Hierarchies.getHierarchicalElementEncoder());
}
 

Example 3

Project: gaffer-doc   File: GetDataFrameOfElementsExample.java
public void getDataFrameOfElementsWithEntityGroup() {
    // ---------------------------------------------------------
    final GetDataFrameOfElements operation = new GetDataFrameOfElements.Builder()
            .view(new View.Builder()
                    .entity("entity")
                    .build())
            .build();
    // ---------------------------------------------------------

    final Dataset<Row> df = runExample(operation, null);

    // Restrict to entities involving certain vertices
    final Dataset<Row> seeded = df.filter("vertex = 1 OR vertex = 2");
    String result = seeded.showString(100, 20);
    printJava("df.filter(\"vertex = 1 OR vertex = 2\").show();");
    print("The results are:\n");
    print("```");
    print(result.substring(0, result.length() - 2));
    print("```");

    // Filter by property
    final Dataset<Row> filtered = df.filter("count > 1");
    result = filtered.showString(100, 20);
    printJava("df.filter(\"count > 1\").show();");
    print("The results are:\n");
    print("```");
    print(result.substring(0, result.length() - 2));
    print("```");
}
 

Example 4

Project: PRoST   File: JoinTree.java
public Dataset<Row> computeJoins(SQLContext sqlContext){
	// compute all the joins
	Dataset<Row> results = node.computeJoinWithChildren(sqlContext);
	// select only the requested result
	Column [] selectedColumns = new Column[node.projection.size()];
	for (int i = 0; i < selectedColumns.length; i++) {
		selectedColumns[i]= new Column(node.projection.get(i));
	}

	// if there is a filter set, apply it
	results =  filter == null ? results.select(selectedColumns) : results.filter(filter).select(selectedColumns);
	
	// if results are distinct
	if(selectDistinct) results = results.distinct();
	
	return results;
	
}
 

Example 5

Project: integrations   File: ClerkOfCourtsDemo2010.java
public static String getSubjectIdentification( Row row ) {
    String name = row.getAs( "Defendant Name" );
    String gender = row.getAs( "Gender" );
    String race = row.getAs( "Race" );
    String dob = row.getAs( "DOB" );

    StringBuilder sb = new StringBuilder();
    sb
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( name ) ) )
            .append( "|" )
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( gender ) ) )
            .append( "|" )
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( race ) ) )
            .append( "|" )
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( dob ) ) );
    return sb.toString();
}
 

Example 6

Project: Explainer   File: ExplainerUtils.java
public static List<List<Double>> constructListWithColumnNames(DataFrame dataframe,
    String[] columnNames) {

  List<Double> l;
  Row[] rows;

  List<List<Double>> list = new ArrayList<>();
  for (String name : columnNames) {
    l = new ArrayList<>();
    rows = dataframe.select(name).collect();
    for (Row r : rows) {
      l.add(Double.valueOf(r.get(0).toString()));
    }
    list.add(l);
  }
  return list;

}
 

Example 7

Project: bunsen   File: FhirEncodersTest.java
@Test
public void coding() {

  Coding expectedCoding = condition.getSeverity().getCodingFirstRep();
  Coding actualCoding = decodedCondition.getSeverity().getCodingFirstRep();

  // Codings are a nested array, so we explode them into a table of the coding
  // fields so we can easily select and compare individual fields.
  Dataset<Row> severityCodings = conditionsDataset
      .select(functions.explode(conditionsDataset.col("severity.coding"))
          .alias("coding"))
      .select("coding.*") // Pull all fields in the coding to the top level.
      .cache();

  Assert.assertEquals(expectedCoding.getCode(),
      severityCodings.select("code").head().get(0));
  Assert.assertEquals(expectedCoding.getCode(),
      actualCoding.getCode());

  Assert.assertEquals(expectedCoding.getSystem(),
      severityCodings.select("system").head().get(0));
  Assert.assertEquals(expectedCoding.getSystem(),
      actualCoding.getSystem());

  Assert.assertEquals(expectedCoding.getUserSelected(),
      severityCodings.select("userSelected").head().get(0));
  Assert.assertEquals(expectedCoding.getUserSelected(),
      actualCoding.getUserSelected());

  Assert.assertEquals(expectedCoding.getDisplay(),
      severityCodings.select("display").head().get(0));
  Assert.assertEquals(expectedCoding.getDisplay(),
      actualCoding.getDisplay());
}
 

Example 8

Project: embulk-input-parquet_hadoop   File: SparkTestBase.java
public List<Value> read() throws IOException
{
    spark.conf().set(SQLConf$.MODULE$.PARQUET_WRITE_LEGACY_FORMAT().key(), isLegacyFormat);

    Dataset<Row> dataFrame = spark.createDataFrame(data, schema).repartition(1);
    File file = new File(SparkTestBase.this.tempFolder.getRoot(), name);
    dataFrame.write().options(options).parquet(file.getPath());

    ArrayList<Value> results = new ArrayList<>();
    try (ParquetReader<Value> reader = ParquetReader
            .builder(new MessagePackReadSupport(), new Path(file.getPath()))
            .build()) {
        Value v;
        while ((v = reader.read()) != null) {
            results.add(v);
        }
    }
    return results;
}
 

Example 9

Project: rdf2x   File: RelationExtractor.java
/**
 * Map a {@link Instance} into an Iterator of all of its relations
 * represented as rows of (related URI, predicate index, type index, instance ID)
 *
 * @param instance the requested {@link Instance}
 * @return an Iterator of all of its relations represented as rows of (related URI, predicate index, type index, instance ID)
 */
private Iterable<Row> getRelatedTypeIDs(Instance instance) {
    // typeIDs representing references to the instance in each table (or a single one, if instance has a single type)
    final Long id = instance.getId();

    final List<Tuple2<Integer, Long>> instanceTypeIDs = getRelationEntityTypes(instance)
            .map(typeIndex -> new Tuple2<>(typeIndex, id))
            .collect(Collectors.toList());

    return instance.getRelations().stream()
            .flatMap(relation ->
                    instanceTypeIDs.stream()
                            .map(instanceTypeID -> RowFactory.create(
                                    relation.getObjectURI(),
                                    relation.getPredicateIndex(),
                                    instanceTypeID._1(),
                                    instanceTypeID._2()
                            ))
            ).collect(Collectors.toList());
}
 

Example 10

Project: MegaSparkDiff   File: JdbcToJdbcTest.java
private Pair<Dataset<Row>, Dataset<Row>> returnDiff(String table1, String table2)
{
    AppleTable leftAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from " + table1 + ")", "table1");

    AppleTable rightAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from " + table2 + ")", "table2");

    return SparkCompare.compareAppleTables(leftAppleTable, rightAppleTable);
}
 

Example 11

Project: stonk   File: Submiter.java
public static void main(String[] args) throws Exception {
    // load the configuration
    loadArgs(args);
    // create the Spark context
    JavaSparkContext context = buildJavaSparkContext();

    Dataset<Row> dataset = SparkDataFileConverter.extractDataFrame(taskInfo, context);
    String mlAlgoName = taskInfo.getSparkTaskAlgorithm().getName();
    MLAlgorithmDesc mlAlgoDesc = MLAlgorithmLoader.getMLAlgorithmDesc(mlAlgoName);

    if (mlAlgoDesc.getComponentsType() == ComponentType.ESTIMATOR) {
        excuteEstimator(taskInfo, dataset);
    } else if (mlAlgoDesc.getComponentsType() == ComponentType.TRANSFORMER) {
        excuteTransformer(taskInfo, dataset);
    }
}
 

Example 12

Project: MegaSparkDiff   File: SparkCompareTest.java
/**
 * Test of compareRdd method, of class SparkCompare.
 */
@Test
public void testCompareRdd() {
   
    //code to get file1 location
    String file1Path = this.getClass().getClassLoader().
            getResource("TC5NullsAndEmptyData1.txt").getPath();
    
    String file2Path = this.getClass().getClassLoader().
            getResource("TC5NullsAndEmptyData2.txt").getPath();

    Pair<Dataset<Row>, Dataset<Row>> comparisonResult = SparkCompare.compareFiles(file1Path, file2Path);

    try {
        comparisonResult.getLeft().show();
        comparisonResult.getRight().show();
    } catch (Exception e) {
        Assert.fail("Straightforward output of test results somehow failed");
    }
}
 

Example 13

Project: MegaSparkDiff   File: SparkCompareTest.java
@Test
public void testCompareJDBCtpFileAppleTablesWithDifference()
{
    AppleTable leftAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from Persons1)", "table1");

    String file1Path = this.getClass().getClassLoader().
            getResource("TC1DiffsAndDups1.txt").getPath();

    AppleTable rightAppleTable = SparkFactory.parallelizeTextSource(file1Path,"table2");

    Pair<Dataset<Row>, Dataset<Row>> pair = SparkCompare.compareAppleTables(leftAppleTable, rightAppleTable);

    //the expectation is 2 differing records in the left table and 5 in the right
    if (pair.getLeft().count() != 2)
    {
        Assert.fail("expected 2 different record in left");
    }
    if (pair.getRight().count() != 5)
    {
        Assert.fail("expected 5 different record in right");
    }
}
 

Example 14

Project: bunsen   File: Snomed.java
/**
 * Reads a Snomed relationship file and converts it to a {@link HierarchicalElement} dataset.
 *
 * @param spark the Spark session
 * @param snomedRelationshipPath path to the SNOMED relationship file
 * @return a dataset of {@link HierarchicalElement} representing the hierarchical relationship.
 */
public static Dataset<HierarchicalElement> readRelationshipFile(SparkSession spark,
    String snomedRelationshipPath) {

  return spark.read()
      .option("header", true)
      .option("delimiter", "\t")
      .csv(snomedRelationshipPath)
      .where(col("typeId").equalTo(lit(SNOMED_ISA_RELATIONSHIP_ID)))
      .where(col("active").equalTo(lit("1")))
      .select(col("destinationId"), col("sourceId"))
      .where(col("destinationId").isNotNull()
          .and(col("destinationId").notEqual(lit(""))))
      .where(col("sourceId").isNotNull()
          .and(col("sourceId").notEqual(lit(""))))
      .map((MapFunction<Row, HierarchicalElement>) row -> {

        HierarchicalElement element = new HierarchicalElement();

        element.setAncestorSystem(SNOMED_CODE_SYSTEM_URI);
        element.setAncestorValue(row.getString(0));

        element.setDescendantSystem(SNOMED_CODE_SYSTEM_URI);
        element.setDescendantValue(row.getString(1));

        return element;
      }, Hierarchies.getHierarchicalElementEncoder());
}
 

Example 15

Project: PRoST   File: VerticalPartitioningLoader.java
private TableStats calculate_stats_table(Dataset<Row> table, String tableName) {
	TableStats.Builder table_stats_builder = TableStats.newBuilder();
	
	// calculate the stats
	int table_size = (int) table.count();
	int distinct_subjects = (int) table.select(this.column_name_subject).distinct().count();
	boolean is_complex = table_size != distinct_subjects;
	
	// put them in the protobuf object
	table_stats_builder.setSize(table_size)
		.setDistinctSubjects(distinct_subjects)
		.setIsComplex(is_complex)
		.setName(tableName);
	
	return table_stats_builder.build();
}
 

Example 16

Project: Machine-Learning-End-to-Endguide-for-Java-developers   File: PCAExpt.java
public static void main(String[] args) {
	SparkSession spark = SparkSession.builder()
			.master("local[8]")
			.appName("PCAExpt")
			.getOrCreate();

	// Load and parse data
	String filePath = "/home/kchoppella/book/Chapter09/data/covtypeNorm.csv";

	// Loads data.
	Dataset<Row> inDataset = spark.read()
			.format("com.databricks.spark.csv")
			.option("header", "true")
			.option("inferSchema", true)
			.load(filePath);
	ArrayList<String> inputColsList = new ArrayList<String>(Arrays.asList(inDataset.columns()));
	
	//Make single features column for feature vectors 
	inputColsList.remove("class");
	String[] inputCols = inputColsList.parallelStream().toArray(String[]::new);
	
	//Prepare dataset for training with all features in "features" column
	VectorAssembler assembler = new VectorAssembler().setInputCols(inputCols).setOutputCol("features");
	Dataset<Row> dataset = assembler.transform(inDataset);

	PCAModel pca = new PCA()
			.setK(16)
			.setInputCol("features")
			.setOutputCol("pcaFeatures")
			.fit(dataset);

	Dataset<Row> result = pca.transform(dataset).select("pcaFeatures");
	System.out.println("Explained variance:");
	System.out.println(pca.explainedVariance());
	result.show(false);
	// $example off$
	spark.stop();
}
 

Example 17

Project: uberscriptquery   File: WriteCsvFileActionStatementExecutor.java
@Override
public Object execute(SparkSession sparkSession, ActionStatement actionStatement, CredentialProvider credentialManager) {

    String filePath = actionStatement.getParamValues().get(0).getValue().toString();
    String saveModeStr = actionStatement.getParamValues().get(1).getValue().toString();
    String dfTableName = actionStatement.getParamValues().get(2).getValue().toString();

    SaveMode saveMode = SaveMode.valueOf(saveModeStr);

    String sql = String.format("select * from %s", dfTableName);
    logger.info(String.format("Running sql [%s] to get data and then save it", sql));
    Dataset<Row> df = sparkSession.sql(sql);

    logger.info(String.format("Saving to csv %s, saveMode: %s", filePath, saveMode));
    df.coalesce(1).write().mode(saveMode).option("header", "false").csv(filePath);
    logger.info(String.format("Saved to csv %s, saveMode: %s", filePath, saveMode));
    return null;
}
 

Example 18

Project: net.jgp.labs.spark.datasources   File: ExifDirectoryRelation.java
@Override
public RDD<Row> buildScan() {
    log.debug("-> buildScan()");
    schema();

    // I have isolated the work to a method to keep the plumbing code as simple as
    // possible.
    List<PhotoMetadata> table = collectData();

    @SuppressWarnings("resource")
    JavaSparkContext sparkContext = new JavaSparkContext(sqlContext.sparkContext());
    JavaRDD<Row> rowRDD = sparkContext.parallelize(table)
            .map(photo -> SparkBeanUtils.getRowFromBean(schema, photo));

    return rowRDD.rdd();
}
 

Example 19

Project: integrations   File: DataIntegration.java
public static void main( String[] args ) throws InterruptedException {

        final String path = args[ 0 ];
        final String username = args[ 1 ];
        final String password = args[ 2 ];
        final SparkSession sparkSession = MissionControl.getSparkSession();
        final String jwtToken = MissionControl.getIdToken( username, password );
        logger.info( "Using the following idToken: Bearer {}", jwtToken );

        Dataset<Row> payload = sparkSession
                .read()
                .format( "com.databricks.spark.csv" )
                .option( "header", "true" )
                .load( path );

        Flight flight = Flight.newFlight()
                .addEntity( ENTITY_SET_TYPE )
                .to( ENTITY_SET_NAME )
                .key( ENTITY_SET_KEY )
                .addProperty( new FullQualifiedName( "iowastate.escene15" ) )
                .value( row -> get_geo( row.getAs( "NUMBER" ),
                        row.getAs( "STREET" ),
                        row.getAs( "UNIT" ),
                        row.getAs( "CITY" ),
                        row.getAs( "POSTCODE" ) ).getFormattedAddress() ).ok()
                .addProperty( new FullQualifiedName( "iowastate.escene11" ) )
                .value( row -> get_geo( row.getAs( "NUMBER" ),
                        row.getAs( "STREET" ),
                        row.getAs( "UNIT" ),
                        row.getAs( "CITY" ),
                        row.getAs( "POSTCODE" ) ) ).ok()
                .ok()
                .done();

        Shuttle shuttle = new Shuttle( RetrofitFactory.Environment.LOCAL, jwtToken );
        shuttle.launch( flight, payload );
    }
 

Example 20

Project: MegaSparkDiff   File: JdbcToJdbcTest.java
@Test
public void testCompareEqualTables()
{
    Pair<Dataset<Row>,Dataset<Row>> pair = returnDiff("Test1","Test2");

    //the expectation is that both tables are equal
    if (pair.getLeft().count() != 0)
        Assert.fail("Expected 0 differences coming from left table." +
                "  Instead, found " + pair.getLeft().count() + ".");

    if (pair.getRight().count() != 0)
        Assert.fail("Expected 0 differences coming from right table." +
                "  Instead, found " + pair.getRight().count() + ".");
}
 

Example 21

Project: MegaSparkDiff   File: JdbcToJdbcTest.java
@Test
public void testCompareTable1IsSubset()
{
    Pair<Dataset<Row>,Dataset<Row>> pair = returnDiff("Test4","Test1");

    //the expectation is that table1 is a complete subset of table2
    if (pair.getLeft().count() != 0)
        Assert.fail("Expected 0 differences coming from left table." +
                "  Instead, found " + pair.getLeft().count() + ".");

    if (pair.getRight().count() != 5)
        Assert.fail("Expected 5 differences coming from right table." +
                "  Instead, found " + pair.getRight().count() + ".");
}
 

Example 22

Project: rdf2x   File: MetadataWriter.java
/**
 * Persist predicate metadata table storing all predicates.
 */
public void writePredicateMetadata() {

    // create the schema
    List<StructField> fields = new ArrayList<>();
    fields.add(DataTypes.createStructField(PREDICATE_ID, DataTypes.IntegerType, false));
    fields.add(DataTypes.createStructField(PREDICATE_URI, DataTypes.StringType, false));
    fields.add(DataTypes.createStructField(PREDICATE_LABEL, DataTypes.StringType, true));
    StructType schema = DataTypes.createStructType(fields);

    List<Tuple2<String, String>> indexes = new ArrayList<>();
    indexes.add(new Tuple2<>(PREDICATES_TABLE_NAME, PREDICATE_URI));

    List<Tuple2<String, String>> primaryKeys = new ArrayList<>();
    primaryKeys.add(new Tuple2<>(PREDICATES_TABLE_NAME, PREDICATE_ID));


    final IndexMap<String> predicateIndex = rdfSchema.getPredicateIndex();
    final Map<String, String> uriLabels = rdfSchema.getUriLabels();
    // create table rows
    List<Row> rows = predicateIndex.getValues().stream()
            .map(uri -> {
                Object[] valueArray = new Object[]{
                        predicateIndex.getIndex(uri),
                        uri,
                        uriLabels.get(uri)
                };
                return RowFactory.create(valueArray);
            }).collect(Collectors.toList());

    // create and write the META_Predicates dataframe
    DataFrame df = sql.createDataFrame(rows, schema);
    persistor.writeDataFrame(PREDICATES_TABLE_NAME, df);
    persistor.createPrimaryKeys(primaryKeys);
    persistor.createIndexes(indexes);
    df.unpersist();
}
 

Example 23

Project: bunsen   File: ValueSetUdfsTest.java
@Test
public void testSnomedHasAncestor() {

  Dataset<Row> results = spark.sql("select id from test_snomed_cond "
      + "where in_valueset(code, 'diabetes')");

  Assert.assertEquals(1, results.count());
  Assert.assertEquals("diabetes", results.head().get(0));
}
 

Example 24

Project: integrations   File: IowaCityCallsForService.java
public static String getFirstName( Row row ) {
    String name = row.getAs( "NAME" );
    if ( StringUtils.isBlank( name ) ) {
        return null;
    }
    Matcher m = p.matcher( name );
    if ( !m.matches() ) {
        return null;
    }
    return (String) m.group( 2 );
}
 

Example 25

Project: integrations   File: IowaCityCallsForService.java
public static String getLastName( Row row ) {
    String name = row.getAs( "NAME" );
    if ( StringUtils.isBlank( name ) ) {
        return null;
    }
    Matcher m = p.matcher( name );
    if ( !m.matches() ) {
        return null;
    }
    return (String) m.group( 1 );
}
 

Example 26

Project: bunsen   File: ValueSets.java
/**
 * Writes value records to a table. This class ensures the columns and partitions are mapped
 * properly, and is a workaround similar to the problem described <a
 * href="http://stackoverflow.com/questions/35313077/pyspark-order-of-column-on-write-to-mysql-with-jdbc">here</a>.
 *
 * @param values a dataset of value records
 * @param tableName the table to write them to
 */
private static void writeValuesToTable(Dataset<Value> values, String tableName) {

  // Note the last two columns here must be the partitioned-by columns in order and in lower case
  // for Spark to properly match them to the partitions
  Dataset<Row> orderColumnDataset = values.select("system",
      "version",
      "value",
      "valueseturi",
      "valuesetversion");

  orderColumnDataset.write()
      .mode(SaveMode.ErrorIfExists)
      .insertInto(tableName);
}
 

Example 27

Project: rdf2x   File: InstanceRelationWriterTest.java
@Test
public void testWriteRelationTablesWithoutPredicateIndex() throws IOException {
    InstanceRelationWriter writer = new InstanceRelationWriter(config
            .setStorePredicate(false), jsc(), persistor, rdfSchema);
    writer.writeRelationTables(getTestRelationSchema(), getTestRelations());

    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(1L, 3L));
    rows.add(RowFactory.create(2L, 3L));

    DataFrame result = this.result.values().iterator().next();
    assertEquals("Expected schema of A_B was extracted", getExpectedSchemaOfAB(false, false), result.schema());
    assertRDDEquals("Expected rows of A_B were extracted", jsc().parallelize(rows), result.toJavaRDD());
}
 

Example 28

Project: HiveUnit   File: Tabular.java
static Tabular tabularDataset(Dataset<Row> ds){
    return new Tabular(){
        public int          numRows()                   { return (int)ds.count(); }
        public int          numCols()                   { return ds.columns().length; }
        public List<String> headers()                   { return Arrays.asList(ds.columns()) ; }
        public String val(int rowNum, int colNum) {
            int ri = rowNum-1;
            int ci = colNum-1;
            Object v = ds.collectAsList().get(ri).get(ci);
            return v == null ? "" : v.toString(); }
    };
}
 

Example 29

Project: spark-cassandra-poc   File: SparkFileLoaderUtils.java
private void writeUserViewCountResultToCassandra(List<Row> collectAsList, String tableName,
		Connection<CassandraDBContext> connection) throws QueryExecutionException {
	connection.execute(new CassandraQuery("DROP table if exists wootag." + tableName + ";"));
	connection.execute(new CassandraQuery("create table IF NOT EXISTS wootag." + tableName + " ("
			+ " user_id text, view_duration_in_second int, view_counts int,"
			+ " PRIMARY KEY ( user_id, view_duration_in_second )" + ");"));

	connection.insertRows(collectAsList, tableName,
			Arrays.asList("user_id", "view_duration_in_second", "view_counts"));
	System.out.println("Output size : " + collectAsList.size());
}
 

Example 30

Project: rdf2x   File: MetadataWriter.java
/**
 * Write metadata describing relation tables
 *
 * @param relationSchema the relation schema
 */
public void writeRelationMetadata(RelationSchema relationSchema) {
    // create the schema
    List<StructField> fields = new ArrayList<>();
    fields.add(DataTypes.createStructField(RELATIONS_NAME, DataTypes.StringType, false));
    fields.add(DataTypes.createStructField(RELATIONS_FROM_NAME, DataTypes.StringType, true));
    fields.add(DataTypes.createStructField(RELATIONS_TO_NAME, DataTypes.StringType, true));
    fields.add(DataTypes.createStructField(RELATIONS_PREDICATE_ID, DataTypes.IntegerType, true));

    // create table rows
    List<Row> rows = relationSchema.getTables().stream()
            .map(table -> {
                RelationPredicateFilter predicateFilter = table.getPredicateFilter();
                RelationEntityFilter entityFilter = table.getEntityFilter();
                Object[] valueArray = new Object[]{
                        table.getName(),
                        entityFilter == null ? null : entityFilter.getFromTypeName(),
                        entityFilter == null ? null : entityFilter.getToTypeName(),
                        predicateFilter == null ? null : rdfSchema.getPredicateIndex().getIndex(predicateFilter.getPredicateURI())
                };
                return RowFactory.create(valueArray);
            }).collect(Collectors.toList());

    StructType schema = DataTypes.createStructType(fields);

    // add index for each field
    List<Tuple2<String, String>> indexes = fields.stream()
            .map(field -> new Tuple2<>(RELATIONS_TABLE_NAME, field.name()))
            .collect(Collectors.toList());

    // create and write the META_Relations dataframe
    DataFrame df = sql.createDataFrame(rows, schema);
    persistor.writeDataFrame(RELATIONS_TABLE_NAME, df);
    persistor.createIndexes(indexes);
    df.unpersist();
}
 

Example 31

Project: integrations   File: DaneCountySheriffs.java
public static String safeDOBParse( Row row ) {
    String dob = row.getAs( "birthd" );
    if ( dob == null ) {
        return null;
    }
    if ( dob.contains( "#" ) ) {
        return null;
    }
    return bdHelper.parse( dob );
}
 

Example 32

Project: uberscriptquery   File: JdbcSqlInputStatementExecutor.java
@Override
public Dataset<Row> execute(SparkSession sparkSession, StatementAssignment statementAssignment, CredentialProvider credentialManager) {
    logger.info("Running query by sql jdbc: " + statementAssignment);
    Map<String, String> queryConfig = statementAssignment.getQueryConfig();
    String connectionString = queryConfig.get(StatementAssignment.QUERY_CONFIG_CONNECTION_STRING);
    String passwordFile = queryConfig.get(StatementAssignment.QUERY_CONFIG_PASSWORD_FILE);
    String passwordEntry = queryConfig.get(StatementAssignment.QUERY_CONFIG_PASSWORD_ENTRY);
    String password = credentialManager.getPassword(passwordFile, passwordEntry);
    if (password != null) {
        connectionString = connectionString.replace("[password]", password);
    }
    return SparkUtils.readJdbc(connectionString, statementAssignment.getQueryStatement(), sparkSession);
}
 

Example 33

Project: rdf2x   File: MetadataWriterTest.java
private JavaRDD<Row> getExpectedRowsOfMetaPredicates() {
    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/knows"), "http://example.com/knows", "Knows label"));
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/likes"), "http://example.com/likes", "Likes label"));
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/name"), "http://example.com/name", "Name label"));
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/age"), "http://example.com/age", null));
    return jsc().parallelize(rows);
}
 

Example 34

Project: net.jgp.labs.spark.datasources   File: PhotoMetadataIngestionApp.java
private boolean start() {
    SparkSession spark = SparkSession.builder()
            .appName("EXIF to Dataset")
            .master("local[*]").getOrCreate();
    
    String importDirectory = "/Users/jgp/Pictures";
    
    Dataset<Row> df = spark.read()
            .format("exif")
            .option("recursive", "true")
            .option("limit", "100000")
            .option("extensions", "jpg,jpeg")
            .load(importDirectory);
    
    // We can start analytics
    df = df
            .filter(df.col("GeoX").isNotNull())
            .filter(df.col("GeoZ").notEqual("NaN"))
            .orderBy(df.col("GeoZ").desc());
    df.collect();
    df.cache();
    System.out.println("I have imported " + df.count() + " photos.");
    df.printSchema();
    df.show(5);
    
    return true;
}
 

Example 35

Project: rdf2x   File: RelationSchemaCollectorTest.java
private DataFrame getTestRDD() {
    SQLContext sql = new SQLContext(jsc());
    List<Row> rdd = new ArrayList<>();

    // cycle one -> two -> three -> one
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 1L, uriIndex.getIndex("http://example.com/a"), 2L));
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 2L, uriIndex.getIndex("http://example.com/a"), 3L));
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 3L, uriIndex.getIndex("http://example.com/a"), 1L));

    // one -> four, four -> one
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 1L, uriIndex.getIndex("http://example.com/b"), 4L));
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/b"), 4L, uriIndex.getIndex("http://example.com/a"), 1L));

    // five -> one
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/c"), 5L, uriIndex.getIndex("http://example.com/a"), 1L));

    return sql.createDataFrame(rdd, new StructType()
            .add("predicateIndex", DataTypes.IntegerType, false)
            .add("fromTypeIndex", DataTypes.IntegerType, false)
            .add("fromID", DataTypes.LongType, false)
            .add("toTypeIndex", DataTypes.IntegerType, false)
            .add("toID", DataTypes.LongType, false)
    );
}
 

Example 36

Project: rdf2x   File: InstanceRelationWriterTest.java
private JavaRDD<Row> getExpectedRowsOfEAV() {
    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(1L, uriIndex.getIndex("http://example.com/name"), "STRING", null, "First A 1"));
    rows.add(RowFactory.create(1L, uriIndex.getIndex("http://example.com/name"), "STRING", null, "First A 2"));
    rows.add(RowFactory.create(2L, uriIndex.getIndex("http://example.com/name"), "STRING", null, "Second A"));
    rows.add(RowFactory.create(3L, uriIndex.getIndex("http://example.com/age"), "INTEGER", null, "100"));
    rows.add(RowFactory.create(3L, uriIndex.getIndex("http://example.com/name"), "STRING", "en", "First B"));
    return jsc().parallelize(rows);
}
 

Example 37

Project: bunsen   File: ConceptMaps.java
/**
 * Writes mapping records to a table. This class ensures the columns and partitions are mapped
 * properly, and is a workaround similar to the problem described <a
 * href="http://stackoverflow.com/questions/35313077/pyspark-order-of-column-on-write-to-mysql-with-jdbc">here</a>.
 *
 * @param mappings a dataset of mapping records
 * @param tableName the table to write them to
 */
private static void writeMappingsToTable(Dataset<Mapping> mappings,
    String tableName) {

  // Note the last two columns here must be the partitioned-by columns
  // in order and in lower case for Spark to properly match
  // them to the partitions.
  Dataset<Row> orderedColumnDataset =
      mappings.select("sourceValueSet",
          "targetValueSet",
          "sourceSystem",
          "sourceValue",
          "targetSystem",
          "targetValue",
          "equivalence",
          "conceptmapuri",
          "conceptmapversion");

  orderedColumnDataset
      .write()
      .insertInto(tableName);
}
 

Example 38

Project: MegaSparkDiff   File: JdbcToFileTest.java
@Test
public void testCompareJDBCTableToTextFile()
{
    SparkFactory.initializeSparkLocalMode("local[*]");

    AppleTable leftAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from Test4)", "table1");

    String file2Path = this.getClass().getClassLoader().
            getResource("Test4.txt").getPath();
    AppleTable rightAppleTable = SparkFactory.parallelizeTextSource(file2Path,"table2");

    Pair<Dataset<Row>,Dataset<Row>> pair = SparkCompare.compareAppleTables(leftAppleTable, rightAppleTable);

    //the expectation is that both tables are completely different
    if (pair.getLeft().count() != 0)
        Assert.fail("Expected 0 differences coming from left table." +
                "  Instead, found " + pair.getLeft().count() + ".");

    if (pair.getRight().count() != 1)
        Assert.fail("Expected 1 difference coming from right table." +
                "  Instead, found " + pair.getRight().count() + ".");

    SparkFactory.stopSparkContext();
}

 

References:

https://www.programcreek.com/java-api-examples/?api=org.apache.spark.sql.Row

http://spark.apache.org/docs/2.1.1/api/scala/index.html#org.apache.spark.sql.Row
