How to instantiate a Spark Row in Java and Scala


Instantiating a Row in Scala:

It is invalid to use the native primitive interface to retrieve a value that is null; instead, a user must check isNullAt before attempting to retrieve a value that might be null.

To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala.

A Row object can be constructed by providing field values. Example:

import org.apache.spark.sql._

// Create a Row from values.
Row(value1, value2, value3, ...)
// Create a Row from a Seq of values.
Row.fromSeq(Seq(value1, value2, ...))

A value of a row can be accessed both through generic access by ordinal, which will incur boxing overhead for primitives, and through native primitive access. An example of generic access by ordinal:

import org.apache.spark.sql._

val row = Row(1, true, "a string", null)
// row: Row = [1,true,a string,null]
val firstValue = row(0)
// firstValue: Any = 1
val fourthValue = row(3)
// fourthValue: Any = null

For native primitive access, it is invalid to use the native primitive interface to retrieve a value that is null; instead, a user must check isNullAt before attempting to retrieve a value that might be null. An example of native primitive access:

// using the row from the previous example.
val firstValue = row.getInt(0)
// firstValue: Int = 1
val isNull = row.isNullAt(3)
// isNull: Boolean = true

 

 

Instantiating a Row in Java:

The following examples show how to use org.apache.spark.sql.Row in Java. They are extracted from open-source projects.
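
Before jumping into the project code, here is a minimal, self-contained Java sketch (the class name, column names, and values are illustrative, not taken from any of the projects below). It builds Rows with RowFactory.create(), attaches a schema through createDataFrame, and reads values back with the generic and native primitive accessors described in the Scala section, checking isNullAt before touching a value that might be null.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class RowCreationSketch {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("RowCreationSketch")
                .getOrCreate();

        // Create Rows with RowFactory.create(); the field order must match the schema below.
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, "v1"),
                RowFactory.create(2, null)); // null is fine here because the column is nullable

        StructType schema = new StructType(new StructField[]{
                new StructField("id", DataTypes.IntegerType, false, Metadata.empty()),
                new StructField("name", DataTypes.StringType, true, Metadata.empty())
        });

        Dataset<Row> df = spark.createDataFrame(rows, schema);
        List<Row> collected = df.collectAsList();

        Row first = collected.get(0);
        int id = first.getInt(0);          // native primitive access by ordinal
        String name = first.getAs("name"); // generic access by field name

        Row second = collected.get(1);
        // Check isNullAt before using a typed getter on a value that might be null.
        String secondName = second.isNullAt(1) ? "<null>" : second.getString(1);

        System.out.println(id + ", " + name + ", " + secondName);
        spark.stop();
    }
}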

Example 1

Project: uberscriptquery   File: SparkUtilsTest.java
@Test
public void test_getDataSetResult() {

    StructField[] structFields = new StructField[]{
            new StructField("intColumn", DataTypes.IntegerType, true, Metadata.empty()),
            new StructField("stringColumn", DataTypes.StringType, true, Metadata.empty())
    };

    StructType structType = new StructType(structFields);

    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(1, "v1"));
    rows.add(RowFactory.create(2, "v2"));

    Dataset<Row> df = sparkSession.createDataFrame(rows, structType);

    DataSetResult dataSetResult = SparkUtils.getDataSetResult(df);
    Assert.assertEquals(2, dataSetResult.getColumnNames().size());
    Assert.assertEquals(2, dataSetResult.getRows().size());
    Assert.assertEquals(new Integer(1), dataSetResult.getRows().get(0).get(0));
    Assert.assertEquals("v1", dataSetResult.getRows().get(0).get(1));
    Assert.assertEquals(new Integer(2), dataSetResult.getRows().get(1).get(0));
    Assert.assertEquals("v2", dataSetResult.getRows().get(1).get(1));
}
 

Example 2

Project: bunsen   File: Loinc.java
/**
 * Reads the LOINC multiaxial hierarchy file and converts it to a {@link HierarchicalElement}
 * dataset.
 *
 * @param spark the Spark session
 * @param loincHierarchyPath path to the multiaxial hierarchy CSV
 * @return a dataset of {@link HierarchicalElement} representing the hierarchical relationship.
 */
public static Dataset<HierarchicalElement> readMultiaxialHierarchyFile(SparkSession spark,
    String loincHierarchyPath) {

  return spark.read()
      .option("header", true)
      .csv(loincHierarchyPath)
      .select(col("IMMEDIATE_PARENT"), col("CODE"))
      .where(col("IMMEDIATE_PARENT").isNotNull()
          .and(col("IMMEDIATE_PARENT").notEqual(lit(""))))
      .where(col("CODE").isNotNull()
          .and(col("CODE").notEqual(lit(""))))
      .map((MapFunction<Row, HierarchicalElement>) row -> {

        HierarchicalElement element = new HierarchicalElement();

        element.setAncestorSystem(LOINC_CODE_SYSTEM_URI);
        element.setAncestorValue(row.getString(0));

        element.setDescendantSystem(LOINC_CODE_SYSTEM_URI);
        element.setDescendantValue(row.getString(1));

        return element;
      }, Hierarchies.getHierarchicalElementEncoder());
}
 

Example 3

Project: gaffer-doc   File: GetDataFrameOfElementsExample.java
public void getDataFrameOfElementsWithEntityGroup() {
    // ---------------------------------------------------------
    final GetDataFrameOfElements operation = new GetDataFrameOfElements.Builder()
            .view(new View.Builder()
                    .entity("entity")
                    .build())
            .build();
    // ---------------------------------------------------------

    final Dataset<Row> df = runExample(operation, null);

    // Restrict to entities involving certain vertices
    final Dataset<Row> seeded = df.filter("vertex = 1 OR vertex = 2");
    String result = seeded.showString(100, 20);
    printJava("df.filter(\"vertex = 1 OR vertex = 2\").show();");
    print("The results are:\n");
    print("```");
    print(result.substring(0, result.length() - 2));
    print("```");

    // Filter by property
    final Dataset<Row> filtered = df.filter("count > 1");
    result = filtered.showString(100, 20);
    printJava("df.filter(\"count > 1\").show();");
    print("The results are:\n");
    print("```");
    print(result.substring(0, result.length() - 2));
    print("```");
}
 

Example 4

Project: PRoST   File: JoinTree.java
public Dataset<Row> computeJoins(SQLContext sqlContext){
	// compute all the joins
	Dataset<Row> results = node.computeJoinWithChildren(sqlContext);
	// select only the requested result
	Column [] selectedColumns = new Column[node.projection.size()];
	for (int i = 0; i < selectedColumns.length; i++) {
		selectedColumns[i]= new Column(node.projection.get(i));
	}

	// if there is a filter set, apply it
	results =  filter == null ? results.select(selectedColumns) : results.filter(filter).select(selectedColumns);
	
	// if results are distinct
	if(selectDistinct) results = results.distinct();
	
	return results;
	
}
 

Example 5

Project: integrations   File: ClerkOfCourtsDemo2010.java
public static String getSubjectIdentification( Row row ) {
    String name = row.getAs( "Defendant Name" );
    String gender = row.getAs( "Gender" );
    String race = row.getAs( "Race" );
    String dob = row.getAs( "DOB" );

    StringBuilder sb = new StringBuilder();
    sb
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( name ) ) )
            .append( "|" )
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( gender ) ) )
            .append( "|" )
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( race ) ) )
            .append( "|" )
            .append( encoder.encodeToString( StringUtils.getBytesUtf8( dob ) ) );
    return sb.toString();
}
 

Example 6

Project: Explainer   File: ExplainerUtils.java
public static List<List<Double>> constructListWithColumnNames(DataFrame dataframe,
    String[] columnNames) {

  List<Double> l;
  Row[] rows;

  List<List<Double>> list = new ArrayList<>();
  for (String name : columnNames) {
    l = new ArrayList<>();
    rows = dataframe.select(name).collect();
    for (Row r : rows) {
      l.add(Double.valueOf(r.get(0).toString()));
    }
    list.add(l);
  }
  return list;

}
 

Example 7

Project: bunsen   File: FhirEncodersTest.java
@Test
public void coding() {

  Coding expectedCoding = condition.getSeverity().getCodingFirstRep();
  Coding actualCoding = decodedCondition.getSeverity().getCodingFirstRep();

  // Codings are a nested array, so we explode them into a table of the coding
  // fields so we can easily select and compare individual fields.
  Dataset<Row> severityCodings = conditionsDataset
      .select(functions.explode(conditionsDataset.col("severity.coding"))
          .alias("coding"))
      .select("coding.*") // Pull all fields in the coding to the top level.
      .cache();

  Assert.assertEquals(expectedCoding.getCode(),
      severityCodings.select("code").head().get(0));
  Assert.assertEquals(expectedCoding.getCode(),
      actualCoding.getCode());

  Assert.assertEquals(expectedCoding.getSystem(),
      severityCodings.select("system").head().get(0));
  Assert.assertEquals(expectedCoding.getSystem(),
      actualCoding.getSystem());

  Assert.assertEquals(expectedCoding.getUserSelected(),
      severityCodings.select("userSelected").head().get(0));
  Assert.assertEquals(expectedCoding.getUserSelected(),
      actualCoding.getUserSelected());

  Assert.assertEquals(expectedCoding.getDisplay(),
      severityCodings.select("display").head().get(0));
  Assert.assertEquals(expectedCoding.getDisplay(),
      actualCoding.getDisplay());
}
 

Example 8

Project: embulk-input-parquet_hadoop   File: SparkTestBase.java
public List<Value> read() throws IOException
{
    spark.conf().set(SQLConf$.MODULE$.PARQUET_WRITE_LEGACY_FORMAT().key(), isLegacyFormat);

    Dataset<Row> dataFrame = spark.createDataFrame(data, schema).repartition(1);
    File file = new File(SparkTestBase.this.tempFolder.getRoot(), name);
    dataFrame.write().options(options).parquet(file.getPath());

    ArrayList<Value> results = new ArrayList<>();
    try (ParquetReader<Value> reader = ParquetReader
            .builder(new MessagePackReadSupport(), new Path(file.getPath()))
            .build()) {
        Value v;
        while ((v = reader.read()) != null) {
            results.add(v);
        }
    }
    return results;
}
 

Example 9

Project: rdf2x   File: RelationExtractor.java
/**
 * Map a {@link Instance} into an Iterator of all of its relations
 * represented as rows of (related URI, predicate index, type index, instance ID)
 *
 * @param instance the requested {@link Instance}
 * @return an Iterator of all of its relations represented as rows of (related URI, predicate index, type index, instance ID)
 */
private Iterable<Row> getRelatedTypeIDs(Instance instance) {
    // typeIDs representing references to the instance in each table (or a single one, if instance has a single type)
    final Long id = instance.getId();

    final List<Tuple2<Integer, Long>> instanceTypeIDs = getRelationEntityTypes(instance)
            .map(typeIndex -> new Tuple2<>(typeIndex, id))
            .collect(Collectors.toList());

    return instance.getRelations().stream()
            .flatMap(relation ->
                    instanceTypeIDs.stream()
                            .map(instanceTypeID -> RowFactory.create(
                                    relation.getObjectURI(),
                                    relation.getPredicateIndex(),
                                    instanceTypeID._1(),
                                    instanceTypeID._2()
                            ))
            ).collect(Collectors.toList());
}
 

Example 10

Project: MegaSparkDiff   File: JdbcToJdbcTest.java
private Pair<Dataset<Row>, Dataset<Row>> returnDiff(String table1, String table2)
{
    AppleTable leftAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from " + table1 + ")", "table1");

    AppleTable rightAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from " + table2 + ")", "table2");

    return SparkCompare.compareAppleTables(leftAppleTable, rightAppleTable);
}
 

Example 11

Project: stonk   File: Submiter.java
public static void main(String[] args) throws Exception {
    // load the configuration
    loadArgs(args);
    // create the Spark context
    JavaSparkContext context = buildJavaSparkContext();

    Dataset<Row> dataset = SparkDataFileConverter.extractDataFrame(taskInfo, context);
    String mlAlgoName = taskInfo.getSparkTaskAlgorithm().getName();
    MLAlgorithmDesc mlAlgoDesc = MLAlgorithmLoader.getMLAlgorithmDesc(mlAlgoName);

    if (mlAlgoDesc.getComponentsType() == ComponentType.ESTIMATOR) {
        excuteEstimator(taskInfo, dataset);
    } else if (mlAlgoDesc.getComponentsType() == ComponentType.TRANSFORMER) {
        excuteTransformer(taskInfo, dataset);
    }
}
 

Example 12

Project: MegaSparkDiff   File: SparkCompareTest.java
/**
 * Test of compareRdd method, of class SparkCompare.
 */
@Test
public void testCompareRdd() {
   
    //code to get file1 location
    String file1Path = this.getClass().getClassLoader().
            getResource("TC5NullsAndEmptyData1.txt").getPath();
    
    String file2Path = this.getClass().getClassLoader().
            getResource("TC5NullsAndEmptyData2.txt").getPath();

    Pair<Dataset<Row>, Dataset<Row>> comparisonResult = SparkCompare.compareFiles(file1Path, file2Path);

    try {
        comparisonResult.getLeft().show();
        comparisonResult.getRight().show();
    } catch (Exception e) {
        Assert.fail("Straightforward output of test results somehow failed");
    }
}
 

Example 13

Project: MegaSparkDiff   File: SparkCompareTest.java
@Test
public void testCompareJDBCtpFileAppleTablesWithDifference()
{
    AppleTable leftAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from Persons1)", "table1");

    String file1Path = this.getClass().getClassLoader().
            getResource("TC1DiffsAndDups1.txt").getPath();

    AppleTable rightAppleTable = SparkFactory.parallelizeTextSource(file1Path,"table2");

    Pair<Dataset<Row>, Dataset<Row>> pair = SparkCompare.compareAppleTables(leftAppleTable, rightAppleTable);

    //the expectation is 2 differing records in the left table and 5 in the right
    if (pair.getLeft().count() != 2)
    {
        Assert.fail("expected 2 different record in left");
    }
    if (pair.getRight().count() != 5)
    {
        Assert.fail("expected 5 different record in right");
    }
}
 

Example 14

Project: bunsen   File: Snomed.java
/**
 * Reads a Snomed relationship file and converts it to a {@link HierarchicalElement} dataset.
 *
 * @param spark the Spark session
 * @param snomedRelationshipPath path to the SNOMED relationship file
 * @return a dataset of {@link HierarchicalElement} representing the hierarchical relationship.
 */
public static Dataset<HierarchicalElement> readRelationshipFile(SparkSession spark,
    String snomedRelationshipPath) {

  return spark.read()
      .option("header", true)
      .option("delimiter", "\t")
      .csv(snomedRelationshipPath)
      .where(col("typeId").equalTo(lit(SNOMED_ISA_RELATIONSHIP_ID)))
      .where(col("active").equalTo(lit("1")))
      .select(col("destinationId"), col("sourceId"))
      .where(col("destinationId").isNotNull()
          .and(col("destinationId").notEqual(lit(""))))
      .where(col("sourceId").isNotNull()
          .and(col("sourceId").notEqual(lit(""))))
      .map((MapFunction<Row, HierarchicalElement>) row -> {

        HierarchicalElement element = new HierarchicalElement();

        element.setAncestorSystem(SNOMED_CODE_SYSTEM_URI);
        element.setAncestorValue(row.getString(0));

        element.setDescendantSystem(SNOMED_CODE_SYSTEM_URI);
        element.setDescendantValue(row.getString(1));

        return element;
      }, Hierarchies.getHierarchicalElementEncoder());
}
 

Example 15

Project: PRoST   File: VerticalPartitioningLoader.java
private TableStats calculate_stats_table(Dataset<Row> table, String tableName) {
	TableStats.Builder table_stats_builder = TableStats.newBuilder();
	
	// calculate the stats
	int table_size = (int) table.count();
	int distinct_subjects = (int) table.select(this.column_name_subject).distinct().count();
	boolean is_complex = table_size != distinct_subjects;
	
	// put them in the protobuf object
	table_stats_builder.setSize(table_size)
		.setDistinctSubjects(distinct_subjects)
		.setIsComplex(is_complex)
		.setName(tableName);
	
	return table_stats_builder.build();
}
 

Example 16

Project: Machine-Learning-End-to-Endguide-for-Java-developers   File: PCAExpt.java
public static void main(String[] args) {
	SparkSession spark = SparkSession.builder()
			.master("local[8]")
			.appName("PCAExpt")
			.getOrCreate();

	// Load and parse data
	String filePath = "/home/kchoppella/book/Chapter09/data/covtypeNorm.csv";

	// Loads data.
	Dataset<Row> inDataset = spark.read()
			.format("com.databricks.spark.csv")
			.option("header", "true")
			.option("inferSchema", true)
			.load(filePath);
	ArrayList<String> inputColsList = new ArrayList<String>(Arrays.asList(inDataset.columns()));
	
	//Make single features column for feature vectors 
	inputColsList.remove("class");
	String[] inputCols = inputColsList.parallelStream().toArray(String[]::new);
	
	//Prepare dataset for training with all features in "features" column
	VectorAssembler assembler = new VectorAssembler().setInputCols(inputCols).setOutputCol("features");
	Dataset<Row> dataset = assembler.transform(inDataset);

	PCAModel pca = new PCA()
			.setK(16)
			.setInputCol("features")
			.setOutputCol("pcaFeatures")
			.fit(dataset);

	Dataset<Row> result = pca.transform(dataset).select("pcaFeatures");
	System.out.println("Explained variance:");
	System.out.println(pca.explainedVariance());
	result.show(false);
	// $example off$
	spark.stop();
}
 

Example 17

Project: uberscriptquery   File: WriteCsvFileActionStatementExecutor.java
@Override
public Object execute(SparkSession sparkSession, ActionStatement actionStatement, CredentialProvider credentialManager) {

    String filePath = actionStatement.getParamValues().get(0).getValue().toString();
    String saveModeStr = actionStatement.getParamValues().get(1).getValue().toString();
    String dfTableName = actionStatement.getParamValues().get(2).getValue().toString();

    SaveMode saveMode = SaveMode.valueOf(saveModeStr);

    String sql = String.format("select * from %s", dfTableName);
    logger.info(String.format("Running sql [%s] to get data and then save it", sql));
    Dataset<Row> df = sparkSession.sql(sql);

    logger.info(String.format("Saving to csv %s, saveMode: %s", filePath, saveMode));
    df.coalesce(1).write().mode(saveMode).option("header", "false").csv(filePath);
    logger.info(String.format("Saved to csv %s, saveMode: %s", filePath, saveMode));
    return null;
}
 

Example 18

Project: net.jgp.labs.spark.datasources   File: ExifDirectoryRelation.java
@Override
public RDD<Row> buildScan() {
    log.debug("-> buildScan()");
    schema();

    // I have isolated the work to a method to keep the plumbing code as simple as
    // possible.
    List<PhotoMetadata> table = collectData();

    @SuppressWarnings("resource")
    JavaSparkContext sparkContext = new JavaSparkContext(sqlContext.sparkContext());
    JavaRDD<Row> rowRDD = sparkContext.parallelize(table)
            .map(photo -> SparkBeanUtils.getRowFromBean(schema, photo));

    return rowRDD.rdd();
}
 

Example 19

Project: integrations   File: DataIntegration.java
public static void main( String[] args ) throws InterruptedException {

        final String path = args[ 0 ];
        final String username = args[ 1 ];
        final String password = args[ 2 ];
        final SparkSession sparkSession = MissionControl.getSparkSession();
        final String jwtToken = MissionControl.getIdToken( username, password );
        logger.info( "Using the following idToken: Bearer {}", jwtToken );

        Dataset<Row> payload = sparkSession
                .read()
                .format( "com.databricks.spark.csv" )
                .option( "header", "true" )
                .load( path );

        Flight flight = Flight.newFlight()
                .addEntity( ENTITY_SET_TYPE )
                .to( ENTITY_SET_NAME )
                .key( ENTITY_SET_KEY )
                .addProperty( new FullQualifiedName( "iowastate.escene15" ) )
                .value( row -> get_geo( row.getAs( "NUMBER" ),
                        row.getAs( "STREET" ),
                        row.getAs( "UNIT" ),
                        row.getAs( "CITY" ),
                        row.getAs( "POSTCODE" ) ).getFormattedAddress() ).ok()
                .addProperty( new FullQualifiedName( "iowastate.escene11" ) )
                .value( row -> get_geo( row.getAs( "NUMBER" ),
                        row.getAs( "STREET" ),
                        row.getAs( "UNIT" ),
                        row.getAs( "CITY" ),
                        row.getAs( "POSTCODE" ) ) ).ok()
                .ok()
                .done();

        Shuttle shuttle = new Shuttle( RetrofitFactory.Environment.LOCAL, jwtToken );
        shuttle.launch( flight, payload );
    }
 

Example 20

Project: MegaSparkDiff   File: JdbcToJdbcTest.java
@Test
public void testCompareEqualTables()
{
    Pair<Dataset<Row>,Dataset<Row>> pair = returnDiff("Test1","Test2");

    //the expectation is that both tables are equal
    if (pair.getLeft().count() != 0)
        Assert.fail("Expected 0 differences coming from left table." +
                "  Instead, found " + pair.getLeft().count() + ".");

    if (pair.getRight().count() != 0)
        Assert.fail("Expected 0 differences coming from right table." +
                "  Instead, found " + pair.getRight().count() + ".");
}
 

Example 21

Project: MegaSparkDiff   File: JdbcToJdbcTest.java
@Test
public void testCompareTable1IsSubset()
{
    Pair<Dataset<Row>,Dataset<Row>> pair = returnDiff("Test4","Test1");

    //the expectation is that table1 is a complete subset of table2
    if (pair.getLeft().count() != 0)
        Assert.fail("Expected 0 differences coming from left table." +
                "  Instead, found " + pair.getLeft().count() + ".");

    if (pair.getRight().count() != 5)
        Assert.fail("Expected 5 differences coming from right table." +
                "  Instead, found " + pair.getRight().count() + ".");
}
 

Example 22

Project: rdf2x   File: MetadataWriter.java
/**
 * Persist predicate metadata table storing all predicates.
 */
public void writePredicateMetadata() {

    // create the schema
    List<StructField> fields = new ArrayList<>();
    fields.add(DataTypes.createStructField(PREDICATE_ID, DataTypes.IntegerType, false));
    fields.add(DataTypes.createStructField(PREDICATE_URI, DataTypes.StringType, false));
    fields.add(DataTypes.createStructField(PREDICATE_LABEL, DataTypes.StringType, true));
    StructType schema = DataTypes.createStructType(fields);

    List<Tuple2<String, String>> indexes = new ArrayList<>();
    indexes.add(new Tuple2<>(PREDICATES_TABLE_NAME, PREDICATE_URI));

    List<Tuple2<String, String>> primaryKeys = new ArrayList<>();
    primaryKeys.add(new Tuple2<>(PREDICATES_TABLE_NAME, PREDICATE_ID));


    final IndexMap<String> predicateIndex = rdfSchema.getPredicateIndex();
    final Map<String, String> uriLabels = rdfSchema.getUriLabels();
    // create table rows
    List<Row> rows = predicateIndex.getValues().stream()
            .map(uri -> {
                Object[] valueArray = new Object[]{
                        predicateIndex.getIndex(uri),
                        uri,
                        uriLabels.get(uri)
                };
                return RowFactory.create(valueArray);
            }).collect(Collectors.toList());

    // create and write the META_Predicates dataframe
    DataFrame df = sql.createDataFrame(rows, schema);
    persistor.writeDataFrame(PREDICATES_TABLE_NAME, df);
    persistor.createPrimaryKeys(primaryKeys);
    persistor.createIndexes(indexes);
    df.unpersist();
}
 

Example 23

Project: bunsen   File: ValueSetUdfsTest.java
@Test
public void testSnomedHasAncestor() {

  Dataset<Row> results = spark.sql("select id from test_snomed_cond "
      + "where in_valueset(code, 'diabetes')");

  Assert.assertEquals(1, results.count());
  Assert.assertEquals("diabetes", results.head().get(0));
}
 

Example 24

Project: integrations   File: IowaCityCallsForService.java
public static String getFirstName( Row row ) {
    String name = row.getAs( "NAME" );
    if ( StringUtils.isBlank( name ) ) {
        return null;
    }
    Matcher m = p.matcher( name );
    if ( !m.matches() ) {
        return null;
    }
    return (String) m.group( 2 );
}
 

Example 25

Project: integrations   File: IowaCityCallsForService.java
public static String getLastName( Row row ) {
    String name = row.getAs( "NAME" );
    if ( StringUtils.isBlank( name ) ) {
        return null;
    }
    Matcher m = p.matcher( name );
    if ( !m.matches() ) {
        return null;
    }
    return (String) m.group( 1 );
}
 

Example 26

Project: bunsen   File: ValueSets.java
/**
 * Writes value records to a table. This class ensures the columns and partitions are mapped
 * properly, and is a workaround similar to the problem described <a
 * href="http://stackoverflow.com/questions/35313077/pyspark-order-of-column-on-write-to-mysql-with-jdbc">here</a>.
 *
 * @param values a dataset of value records
 * @param tableName the table to write them to
 */
private static void writeValuesToTable(Dataset<Value> values, String tableName) {

  // Note the last two columns here must be the partitioned-by columns in order and in lower case
  // for Spark to properly match them to the partitions
  Dataset<Row> orderColumnDataset = values.select("system",
      "version",
      "value",
      "valueseturi",
      "valuesetversion");

  orderColumnDataset.write()
      .mode(SaveMode.ErrorIfExists)
      .insertInto(tableName);
}
 

Example 27

Project: rdf2x   File: InstanceRelationWriterTest.java
@Test
public void testWriteRelationTablesWithoutPredicateIndex() throws IOException {
    InstanceRelationWriter writer = new InstanceRelationWriter(config
            .setStorePredicate(false), jsc(), persistor, rdfSchema);
    writer.writeRelationTables(getTestRelationSchema(), getTestRelations());

    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(1L, 3L));
    rows.add(RowFactory.create(2L, 3L));

    DataFrame result = this.result.values().iterator().next();
    assertEquals("Expected schema of A_B was extracted", getExpectedSchemaOfAB(false, false), result.schema());
    assertRDDEquals("Expected rows of A_B were extracted", jsc().parallelize(rows), result.toJavaRDD());
}
 

Example 28

Project: HiveUnit   File: Tabular.java
static Tabular tabularDataset(Dataset<Row> ds){
    return new Tabular(){
        public int          numRows()                   { return (int)ds.count(); }
        public int          numCols()                   { return ds.columns().length; }
        public List<String> headers()                   { return Arrays.asList(ds.columns()) ; }
        public String val(int rowNum, int colNum) {
            int ri = rowNum-1;
            int ci = colNum-1;
            Object v = ds.collectAsList().get(ri).get(ci);
            return v == null ? "" : v.toString(); }
    };
}
 

Example 29

Project: spark-cassandra-poc   File: SparkFileLoaderUtils.java
private void writeUserViewCountResultToCassandra(List<Row> collectAsList, String tableName,
		Connection<CassandraDBContext> connection) throws QueryExecutionException {
	connection.execute(new CassandraQuery("DROP table if exists wootag." + tableName + ";"));
	connection.execute(new CassandraQuery("create table IF NOT EXISTS wootag." + tableName + " ("
			+ " user_id text, view_duration_in_second int, view_counts int,"
			+ " PRIMARY KEY ( user_id, view_duration_in_second )" + ");"));

	connection.insertRows(collectAsList, tableName,
			Arrays.asList("user_id", "view_duration_in_second", "view_counts"));
	System.out.println("Output size : " + collectAsList.size());
}
 

Example 30

Project: rdf2x   File: MetadataWriter.java
/**
 * Write metadata describing relation tables
 *
 * @param relationSchema the relation schema
 */
public void writeRelationMetadata(RelationSchema relationSchema) {
    // create the schema
    List<StructField> fields = new ArrayList<>();
    fields.add(DataTypes.createStructField(RELATIONS_NAME, DataTypes.StringType, false));
    fields.add(DataTypes.createStructField(RELATIONS_FROM_NAME, DataTypes.StringType, true));
    fields.add(DataTypes.createStructField(RELATIONS_TO_NAME, DataTypes.StringType, true));
    fields.add(DataTypes.createStructField(RELATIONS_PREDICATE_ID, DataTypes.IntegerType, true));

    // create table rows
    List<Row> rows = relationSchema.getTables().stream()
            .map(table -> {
                RelationPredicateFilter predicateFilter = table.getPredicateFilter();
                RelationEntityFilter entityFilter = table.getEntityFilter();
                Object[] valueArray = new Object[]{
                        table.getName(),
                        entityFilter == null ? null : entityFilter.getFromTypeName(),
                        entityFilter == null ? null : entityFilter.getToTypeName(),
                        predicateFilter == null ? null : rdfSchema.getPredicateIndex().getIndex(predicateFilter.getPredicateURI())
                };
                return RowFactory.create(valueArray);
            }).collect(Collectors.toList());

    StructType schema = DataTypes.createStructType(fields);

    // add index for each field
    List<Tuple2<String, String>> indexes = fields.stream()
            .map(field -> new Tuple2<>(RELATIONS_TABLE_NAME, field.name()))
            .collect(Collectors.toList());

    // create and write the META_Relations dataframe
    DataFrame df = sql.createDataFrame(rows, schema);
    persistor.writeDataFrame(RELATIONS_TABLE_NAME, df);
    persistor.createIndexes(indexes);
    df.unpersist();
}
 

Example 31

Project: integrations   File: DaneCountySheriffs.java
public static String safeDOBParse( Row row ) {
    String dob = row.getAs( "birthd" );
    if ( dob == null ) {
        return null;
    }
    if ( dob.contains( "#" ) ) {
        return null;
    }
    return bdHelper.parse( dob );
}
 

Example 32

Project: uberscriptquery   File: JdbcSqlInputStatementExecutor.java
@Override
public Dataset<Row> execute(SparkSession sparkSession, StatementAssignment statementAssignment, CredentialProvider credentialManager) {
    logger.info("Running query by sql jdbc: " + statementAssignment);
    Map<String, String> queryConfig = statementAssignment.getQueryConfig();
    String connectionString = queryConfig.get(StatementAssignment.QUERY_CONFIG_CONNECTION_STRING);
    String passwordFile = queryConfig.get(StatementAssignment.QUERY_CONFIG_PASSWORD_FILE);
    String passwordEntry = queryConfig.get(StatementAssignment.QUERY_CONFIG_PASSWORD_ENTRY);
    String password = credentialManager.getPassword(passwordFile, passwordEntry);
    if (password != null) {
        connectionString = connectionString.replace("[password]", password);
    }
    return SparkUtils.readJdbc(connectionString, statementAssignment.getQueryStatement(), sparkSession);
}
 

Example 33

Project: rdf2x   File: MetadataWriterTest.java
private JavaRDD<Row> getExpectedRowsOfMetaPredicates() {
    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/knows"), "http://example.com/knows", "Knows label"));
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/likes"), "http://example.com/likes", "Likes label"));
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/name"), "http://example.com/name", "Name label"));
    rows.add(RowFactory.create(predicateIndex.getIndex("http://example.com/age"), "http://example.com/age", null));
    return jsc().parallelize(rows);
}
 

Example 34

Project: net.jgp.labs.spark.datasources   File: PhotoMetadataIngestionApp.java
private boolean start() {
    SparkSession spark = SparkSession.builder()
            .appName("EXIF to Dataset")
            .master("local[*]").getOrCreate();
    
    String importDirectory = "/Users/jgp/Pictures";
    
    Dataset<Row> df = spark.read()
            .format("exif")
            .option("recursive", "true")
            .option("limit", "100000")
            .option("extensions", "jpg,jpeg")
            .load(importDirectory);
    
    // We can start analytics
    df = df
            .filter(df.col("GeoX").isNotNull())
            .filter(df.col("GeoZ").notEqual("NaN"))
            .orderBy(df.col("GeoZ").desc());
    df.collect();
    df.cache();
    System.out.println("I have imported " + df.count() + " photos.");
    df.printSchema();
    df.show(5);
    
    return true;
}
 

Example 35

Project: rdf2x   File: RelationSchemaCollectorTest.java
private DataFrame getTestRDD() {
    SQLContext sql = new SQLContext(jsc());
    List<Row> rdd = new ArrayList<>();

    // cycle one -> two -> three -> one
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 1L, uriIndex.getIndex("http://example.com/a"), 2L));
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 2L, uriIndex.getIndex("http://example.com/a"), 3L));
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 3L, uriIndex.getIndex("http://example.com/a"), 1L));

    // one -> four, four -> one
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/a"), 1L, uriIndex.getIndex("http://example.com/b"), 4L));
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/b"), 4L, uriIndex.getIndex("http://example.com/a"), 1L));

    // five -> one
    rdd.add(RowFactory.create(0, uriIndex.getIndex("http://example.com/c"), 5L, uriIndex.getIndex("http://example.com/a"), 1L));

    return sql.createDataFrame(rdd, new StructType()
            .add("predicateIndex", DataTypes.IntegerType, false)
            .add("fromTypeIndex", DataTypes.IntegerType, false)
            .add("fromID", DataTypes.LongType, false)
            .add("toTypeIndex", DataTypes.IntegerType, false)
            .add("toID", DataTypes.LongType, false)
    );
}
 

Example 36

Project: rdf2x   File: InstanceRelationWriterTest.java
private JavaRDD<Row> getExpectedRowsOfEAV() {
    List<Row> rows = new ArrayList<>();
    rows.add(RowFactory.create(1L, uriIndex.getIndex("http://example.com/name"), "STRING", null, "First A 1"));
    rows.add(RowFactory.create(1L, uriIndex.getIndex("http://example.com/name"), "STRING", null, "First A 2"));
    rows.add(RowFactory.create(2L, uriIndex.getIndex("http://example.com/name"), "STRING", null, "Second A"));
    rows.add(RowFactory.create(3L, uriIndex.getIndex("http://example.com/age"), "INTEGER", null, "100"));
    rows.add(RowFactory.create(3L, uriIndex.getIndex("http://example.com/name"), "STRING", "en", "First B"));
    return jsc().parallelize(rows);
}
 

Example 37

Project: bunsen   File: ConceptMaps.java
/**
 * Writes mapping records to a table. This class ensures the columns and partitions are mapped
 * properly, and is a workaround similar to the problem described <a
 * href="http://stackoverflow.com/questions/35313077/pyspark-order-of-column-on-write-to-mysql-with-jdbc">here</a>.
 *
 * @param mappings a dataset of mapping records
 * @param tableName the table to write them to
 */
private static void writeMappingsToTable(Dataset<Mapping> mappings,
    String tableName) {

  // Note the last two columns here must be the partitioned-by columns
  // in order and in lower case for Spark to properly match
  // them to the partitions.
  Dataset<Row> orderedColumnDataset =
      mappings.select("sourceValueSet",
          "targetValueSet",
          "sourceSystem",
          "sourceValue",
          "targetSystem",
          "targetValue",
          "equivalence",
          "conceptmapuri",
          "conceptmapversion");

  orderedColumnDataset
      .write()
      .insertInto(tableName);
}
 

Example 38

Project: MegaSparkDiff   File: JdbcToFileTest.java
@Test
public void testCompareJDBCTableToTextFile()
{
    SparkFactory.initializeSparkLocalMode("local[*]");

    AppleTable leftAppleTable = SparkFactory.parallelizeJDBCSource("org.hsqldb.jdbc.JDBCDriver",
            "jdbc:hsqldb:hsql://127.0.0.1:9001/testDb",
            "SA",
            "",
            "(select * from Test4)", "table1");

    String file2Path = this.getClass().getClassLoader().
            getResource("Test4.txt").getPath();
    AppleTable rightAppleTable = SparkFactory.parallelizeTextSource(file2Path,"table2");

    Pair<Dataset<Row>,Dataset<Row>> pair = SparkCompare.compareAppleTables(leftAppleTable, rightAppleTable);

    //the expectation is that both tables are completely different
    if (pair.getLeft().count() != 0)
        Assert.fail("Expected 0 differences coming from left table." +
                "  Instead, found " + pair.getLeft().count() + ".");

    if (pair.getRight().count() != 1)
        Assert.fail("Expected 1 difference coming from right table." +
                "  Instead, found " + pair.getRight().count() + ".");

    SparkFactory.stopSparkContext();
}

 

References:

https://www.programcreek.com/java-api-examples/?api=org.apache.spark.sql.Row

http://spark.apache.org/docs/2.1.1/api/scala/index.html#org.apache.spark.sql.Row
