从Spark Row 到 GenericRowWithSchema-369IT编程

admin管理员组
文章数量:1026989

从Spark Row 到 GenericRowWithSchema

Dataframe.collect () 是常用的将分布式数据载入到Driver的方法，得到的是Array[GenericRowWithSchema]类型，常常需要从GenericRowWithSchema提取数据，具体所以了解GenericRowWithSchema类型是十分有必要的。而GenericRowWithSchema继承自 org.apache.spark.sql.Row，自己本身并没有定义多少方法。所以从Row 先开始学习。

库引用:

import org.apache.spark.sql._

创建：

 // Create a Row from values.Row(value1, value2, value3, ...)// Create a Row from a Seq of values.Row.fromSeq(Seq(value1, value2, ...))

访问：两种方式，generic access方式得到的都是Any类。 native primitive access得到的是指定类。

val row = Row(1, true, "a string", null)// row: Row = [1,true,a string,null]val firstValue = row(0)// firstValue: Any = 1val fourthValue = row(3)// fourthValue: Any = nullval firstValue = row.getInt(0)// firstValue: Int = 1val isNull = row.isNullAt(3)//没有getNull，只需判断即可// isNull: Boolean = true

常用函数：

anyNull(): Boolean 是否还有null元素

fieldIndex(java.lang.String name): Int 查找域名索引

get(int i) 和apply(int i)一样,得到位置i处的值，类型为Any

getAs(int i),实际用法是getAs[T](int i) 将位置i处的值按T类型取出

getAs(java.lang.String fieldName) 不会用

get*(int i) 获取位置i处的值，并使其为*类型， *：Byte, Date, Decimal, Double, Float, JavaMap (i处需为array type), List(array type), Long, Map(map type), Seq(array type), Short, String, Structure( structure type 返回Row), Timestamp(data type),

getValuesMap(scala.collection.Seq<java.lang.String> fieldNames) 不会

mkString(String seq) 可看成是由各元素.toString后组成

mkString(java.lang.String start, java.lang.String sep, java.lang.String end) 带指定头尾

toSeq() 用所有元素组成WrappedArray类返回，调用WrappedArray.to* 可转换为其他*类型

其他：copy(), equals(java.Obejct o)，hashCode()， length() ，isNullAt(int i)， size()， toString()

val r1=Row(1, true, "a string", null,Array(1,2,3))
val r2=Row(1,2,3,4) 
r1.mkString(",")
//res3: String = 1,true,a string,null,[I@117d8e07
r1.toSeq
//WrappedArray(1, true, a string, null, [I@117d8e07)
r2.toSeq
//res5: Seq[Any] = WrappedArray(1, 2, 3, 4)
print(r1)
//[1,true,a string,null,[I@117d8e07]res6: Unit = ()
print(r2)
//[1,2,3,4]res7: Unit = ()
r1.schema
r2.schema
//res8: org.apache.spark.sql.types.StructType = null
//res9: org.apache.spark.sql.types.StructType = null

GenericRow和GenericRowwithSchema 定义如下：

class GenericRow(protected[sql] val values: Array[Any]) extends Row {/** No-arg constructor for serialization. */protected def this() = this(null)def this(size: Int) = this(new Array[Any](size))override def length: Int = values.lengthoverride def get(i: Int): Any = values(i)override def toSeq: Seq[Any] = values.clone()override def copy(): GenericRow = this
}

class GenericRowWithSchema(values: Array[Any], override val schema: StructType)extends GenericRow(values) {/** No-arg constructor for serialization. */protected def this() = this(null, null)override def fieldIndex(name: String): Int = schema.fieldIndex(name)
}

所以，要想得到GenericRowwithSchema内部的值，可调用：

GenericRowwithSchema.toSeq ：WrappedArray[SchemaType]，相应位置的元素类型由自带Schema决定

若想得到某个位置的数据则用GenericRowwithSchema.getT(位置)，如grs.getString(1)

Row.toSeq: 得到WrappedArray[Any]对象（实际上是Seq[Any]对象），再调用WrappedArray.toT 转换为其他T[Any]类型或

（推荐使用）WrappedArray.asInstanceof(Seq[T])转换为WrappedArray[T]对象(即Seq[T]对象)。注意，直接转为WrappedArray.asInstanceof(Array[T])是不行的，会报错。

WrapperedArray的相关信息参见： Scala 数组（Array, WrapperedArrary）

具体示例可见:

参考资料：

.scala

.5.1/api/java/org/apache/spark/sql/Row.html

从Spark Row 到 GenericRowWithSchema

库引用:

import org.apache.spark.sql._

创建：

 // Create a Row from values.Row(value1, value2, value3, ...)// Create a Row from a Seq of values.Row.fromSeq(Seq(value1, value2, ...))

访问：两种方式，generic access方式得到的都是Any类。 native primitive access得到的是指定类。

val row = Row(1, true, "a string", null)// row: Row = [1,true,a string,null]val firstValue = row(0)// firstValue: Any = 1val fourthValue = row(3)// fourthValue: Any = nullval firstValue = row.getInt(0)// firstValue: Int = 1val isNull = row.isNullAt(3)//没有getNull，只需判断即可// isNull: Boolean = true

常用函数：

anyNull(): Boolean 是否还有null元素

fieldIndex(java.lang.String name): Int 查找域名索引

get(int i) 和apply(int i)一样,得到位置i处的值，类型为Any

getAs(int i),实际用法是getAs[T](int i) 将位置i处的值按T类型取出

getAs(java.lang.String fieldName) 不会用

getValuesMap(scala.collection.Seq<java.lang.String> fieldNames) 不会

mkString(String seq) 可看成是由各元素.toString后组成

mkString(java.lang.String start, java.lang.String sep, java.lang.String end) 带指定头尾

toSeq() 用所有元素组成WrappedArray类返回，调用WrappedArray.to* 可转换为其他*类型

其他：copy(), equals(java.Obejct o)，hashCode()， length() ，isNullAt(int i)， size()， toString()

val r1=Row(1, true, "a string", null,Array(1,2,3))
val r2=Row(1,2,3,4) 
r1.mkString(",")
//res3: String = 1,true,a string,null,[I@117d8e07
r1.toSeq
//WrappedArray(1, true, a string, null, [I@117d8e07)
r2.toSeq
//res5: Seq[Any] = WrappedArray(1, 2, 3, 4)
print(r1)
//[1,true,a string,null,[I@117d8e07]res6: Unit = ()
print(r2)
//[1,2,3,4]res7: Unit = ()
r1.schema
r2.schema
//res8: org.apache.spark.sql.types.StructType = null
//res9: org.apache.spark.sql.types.StructType = null

GenericRow和GenericRowwithSchema 定义如下：

class GenericRow(protected[sql] val values: Array[Any]) extends Row {/** No-arg constructor for serialization. */protected def this() = this(null)def this(size: Int) = this(new Array[Any](size))override def length: Int = values.lengthoverride def get(i: Int): Any = values(i)override def toSeq: Seq[Any] = values.clone()override def copy(): GenericRow = this
}

class GenericRowWithSchema(values: Array[Any], override val schema: StructType)extends GenericRow(values) {/** No-arg constructor for serialization. */protected def this() = this(null, null)override def fieldIndex(name: String): Int = schema.fieldIndex(name)
}

所以，要想得到GenericRowwithSchema内部的值，可调用：

GenericRowwithSchema.toSeq ：WrappedArray[SchemaType]，相应位置的元素类型由自带Schema决定

若想得到某个位置的数据则用GenericRowwithSchema.getT(位置)，如grs.getString(1)

Row.toSeq: 得到WrappedArray[Any]对象（实际上是Seq[Any]对象），再调用WrappedArray.toT 转换为其他T[Any]类型或

（推荐使用）WrappedArray.asInstanceof(Seq[T])转换为WrappedArray[T]对象(即Seq[T]对象)。注意，直接转为WrappedArray.asInstanceof(Array[T])是不行的，会报错。

WrapperedArray的相关信息参见： Scala 数组（Array, WrapperedArrary）

具体示例可见:

参考资料：

.scala

.5.1/api/java/org/apache/spark/sql/Row.html

本文标签：从Spark Row 到 GenericRowWithSchema

版权声明：本文标题：从Spark Row 到 GenericRowWithSchema 内容由热心网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://it.en369.cn/IT/1694646710a254474.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

369IT编程

从Spark Row 到 GenericRowWithSchema

从Spark Row 到 GenericRowWithSchema

从Spark Row 到 GenericRowWithSchema

更多相关文章

从Spark Row 到 GenericRowWithSchema

发表评论

推荐文章

templates - Using WordPress templating for HTML emails

javascript - Loading a json file into Localstorage - Stack Overflow

javascript - Export to Excel doesn't work in edge browser - Stack Overflow

javascript - Ember Data: model and hasMany submodels not persisting - Stack Overflow

javascript - Grails: How to make a g:textfield autocomplete? - Stack Overflow

热门文章

theme development - How to properly add and access a JavaScript file in Wordpress?

nginx - Docker Compose Frontend React Application Outgoing Requests Failure - Stack Overflow

WordPress multisite REST API connection refused with android - redirects to home page when using IP address in Postman

plugins - Add spacebar in WP List Table Search

c - How to export a static library linked to an interface library? - Stack Overflow

excel - Why is a ListBox not transferring its value? - Stack Overflow

jquery - Send javascript date to webservice - Stack Overflow

c++ - Blatant null dereference inside boost::geometry - Stack Overflow

javascript - jquery UI AutoComplete - embedding the list in another container - Stack Overflow

javascript - Animate div on click to bottom and back to original position - Stack Overflow

最新文章

windows设置断电重启开机后自动输入锁屏密码登录

Windows系统设置开机默认开启数字小键盘

Windows11 开机自动同步时间（开机时间不更新问题）

windows配置开机自启动软件或脚本

【Redis】Windows设置Redis为开机自启动

程序员刚毕业，先去大厂镀金还是先去小厂攒经验？

万象2008清空boss账户密码

【Tools】GitBook简明教程

oracle exadata celldisk 闪存盘受损导致性能下降

SDUT 2138 图结构练习——BFSDFS——判断可达性

javascript - Type 'undefined' is not assignable to type 'menuItemProps[]' - Stack Overflow

javascript - VS 2015 Angular 2 import modules cannot be resolved - Stack Overflow

javascript - Get the JSON objects that are not present in another array - Stack Overflow

javascript - How to dismiss a phonegap notification programmatically - Stack Overflow

c - Solaris 10 make Error code 1 Fatal Error when trying to build python 2.7.16 - Stack Overflow

369IT编程

从Spark Row 到 GenericRowWithSchema

从Spark Row 到 GenericRowWithSchema

从Spark Row 到 GenericRowWithSchema

更多相关文章

从Spark Row 到 GenericRowWithSchema

发表评论

推荐文章

templates - Using WordPress templating for HTML emails

javascript - Loading a json file into Localstorage - Stack Overflow

javascript - Export to Excel doesn&#39;t work in edge browser - Stack Overflow

javascript - Ember Data: model and hasMany submodels not persisting - Stack Overflow

javascript - Grails: How to make a g:textfield autocomplete? - Stack Overflow

热门文章

theme development - How to properly add and access a JavaScript file in Wordpress?

nginx - Docker Compose Frontend React Application Outgoing Requests Failure - Stack Overflow

WordPress multisite REST API connection refused with android - redirects to home page when using IP address in Postman

plugins - Add spacebar in WP List Table Search

c - How to export a static library linked to an interface library? - Stack Overflow

excel - Why is a ListBox not transferring its value? - Stack Overflow

jquery - Send javascript date to webservice - Stack Overflow

c++ - Blatant null dereference inside boost::geometry - Stack Overflow

javascript - jquery UI AutoComplete - embedding the list in another container - Stack Overflow

javascript - Animate div on click to bottom and back to original position - Stack Overflow

最新文章

windows设置断电重启开机后自动输入锁屏密码登录

Windows系统设置开机默认开启数字小键盘

Windows11 开机自动同步时间（开机时间不更新问题）

windows配置开机自启动软件或脚本

【Redis】Windows设置Redis为开机自启动

程序员刚毕业，先去大厂镀金还是先去小厂攒经验？

万象2008清空boss账户密码

【Tools】GitBook简明教程

oracle exadata celldisk 闪存盘受损导致性能下降

SDUT 2138 图结构练习——BFSDFS——判断可达性

javascript - Type &#39;undefined&#39; is not assignable to type &#39;menuItemProps[]&#39; - Stack Overflow

javascript - VS 2015 Angular 2 import modules cannot be resolved - Stack Overflow

javascript - Get the JSON objects that are not present in another array - Stack Overflow

javascript - How to dismiss a phonegap notification programmatically - Stack Overflow

c - Solaris 10 make Error code 1 Fatal Error when trying to build python 2.7.16 - Stack Overflow

javascript - Export to Excel doesn't work in edge browser - Stack Overflow

javascript - Type 'undefined' is not assignable to type 'menuItemProps[]' - Stack Overflow