SparkSQL Components
- Catalyst
- Execution Core
- Query planner that translates the logical queries into actual Dataset operations
- SparkSession (2.X) or SQLContext (1.X) is defined in core
- Hive Integration
Catalyst Optimizer
- Backend-agnostic - optimizes both SQL queries and Dataset code
- Manipulation of trees of relational operators and expressions
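Because Catalyst sits behind both front ends, an equivalent query written as SQL and as Dataset operations ends up with the same optimized plan. A minimal sketch (assumes a local Spark runtime and a hypothetical "people" dataset):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")          // example master for a local run
  .appName("catalyst-demo")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data registered as a temp view
val people = Seq(("Alice", 34), ("Bob", 19)).toDF("name", "age")
people.createOrReplaceTempView("people")

// The same query, once as SQL and once as Dataset code
val viaSql = spark.sql("SELECT name FROM people WHERE age > 21")
val viaApi = people.filter($"age" > 21).select("name")

// Both go through Catalyst; explain() shows equivalent optimized plans
viaSql.explain()
viaApi.explain()
```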
Execution Core: SparkSession
Spark 1.X - SQLContext, built on top of an existing SparkContext:
val sc = new SparkContext(master, appName)
val sqlContext = new SQLContext(sc)
Spark 2.X - SparkSession, which also creates the SparkContext:
val spark = SparkSession.builder().master(master).appName(name).getOrCreate()
val sc = spark.sparkContext
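A minimal sketch of the Spark 2.x entry point in use (the master and app name are examples for a local run):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")   // example: run locally on all cores
  .appName("example")
  .getOrCreate()

// The SparkContext is created for you and exposed as a field
val sc = spark.sparkContext

// SQL and Dataset APIs hang off the same session
val df = spark.range(5).toDF("n")
df.createOrReplaceTempView("nums")
spark.sql("SELECT sum(n) AS total FROM nums").show()

spark.stop()
```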
Hive Integration
- Interoperate with Hadoop Hive tables and metastore
- Spark 1.x: use HiveContext, an extension of SQLContext
- Spark 2.x: enable Hive support when building the SparkSession
- Create, read and delete Hive tables
- Use Hive SerDe and UDFs
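A sketch of enabling Hive support in Spark 2.x (requires Hive classes on the classpath; the warehouse directory is an example value):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-demo")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse") // example path
  .enableHiveSupport()   // connects the session to the Hive metastore
  .getOrCreate()

// Create, read, and drop a Hive table through the shared metastore
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
spark.sql("SELECT COUNT(*) FROM src").show()
spark.sql("DROP TABLE src")
```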