Java Azure Blob query: reading data from Azure Blob with Spark

I am having an issue reading data from Azure blobs via Spark Streaming.

JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");

Code like the above works for HDFS, but it is unable to read files from an Azure blob.

https://blobstorage.blob.core.windows.net/containerid/folder1/

The above is the path shown in the Azure UI, but it doesn't work. Am I missing something, and how can I access it?

I know Event Hubs are the ideal choice for streaming data, but my current situation demands using storage rather than queues.

Solution

In order to read data from blob storage, there are two things that need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This means you also need the hadoop-azure JAR to be available on your classpath (note that there may be runtime requirements for more JARs from the Hadoop family):

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();

// Register the native Azure file system implementation for the wasb:// scheme
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");

// Storage account access key; replace "youraccount" and "yourkey" with your own values
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");

Now, reference the files using the wasb:// prefix (note the [s] is for an optional secure connection):

ssc.textFileStream("wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>");

It goes without saying that you'll need proper permissions set up from the location making the query to blob storage.
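Putting the two steps together, a minimal sketch of a streaming job might look like the following. It maps the HTTPS path from the question (account "blobstorage", container "containerid", folder "folder1") onto the wasbs:// scheme; the class name, key value, and 30-second batch interval are placeholders, not values from the original question:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BlobStreamingJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("AzureBlobStreaming");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(30));

        // Point the Hadoop layer at the native Azure file system and supply the account key
        Configuration hadoopConf = sc.hadoopConfiguration();
        hadoopConf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        hadoopConf.set("fs.azure.account.key.blobstorage.blob.core.windows.net", "yourkey");

        // Monitor the folder in the container for newly arriving text files
        JavaDStream<String> lines =
                ssc.textFileStream("wasbs://containerid@blobstorage.blob.core.windows.net/folder1/");
        lines.print();

        ssc.start();
        ssc.awaitTermination();
    }
}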
