
Spark SQL with Scala Code Examples

A concise look at Spark SQL, Apache Spark's module for structured data, with background information and Scala code examples of using Spark SQL with CSV, JSON and databases such as MySQL.

Published in: Data & Analytics

  1. Spark SQL Code Examples
  2. Background
     • Spark SQL is Spark's module for working with structured data.
     • Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar DataFrame API. It is usable from Java, Scala, Python and R.
     • It grew out of the Shark project at Berkeley.
  3. Assumptions
     These slides and examples assume you already have at least a basic understanding of Spark constructs such as RDDs, actions and transformations.
  4. Resources
     To learn more about Spark, check out supergloo's free Spark tutorials.
  5. Introduction
     • DataFrames are built on Resilient Distributed Datasets (RDDs): a DataFrame is a distributed collection of data.
     • DataFrames are composed of Row objects accompanied by a schema that describes the data type of each column.
     • A DataFrame is conceptually similar to a table in a traditional relational database.
  6. Spark SQL with CSV
     1. $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
     2. scala> val baby_names = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("baby_names.csv")
     3. scala> baby_names.registerTempTable("names")
     4. scala> val distinctYears = sqlContext.sql("select distinct Year from names")
     5. scala> distinctYears.collect.foreach(println)
  7. Spark SQL with JSON (slide 1 of 2)
     JSON used in the following examples (one object per line):
     {"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans","state": "LA", "zip": "70116" }}
     {"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI", "zip": "48116" }}
     {"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip": "08014" }}
  8. Spark SQL with JSON (slide 2 of 2)
     1. $SPARK_HOME/bin/spark-shell
     2. scala> val customers = sqlContext.read.json("customers.json")
     3. scala> customers.registerTempTable("customers")
     4. scala> val firstCityState = sqlContext.sql("SELECT first_name, address.city, address.state FROM customers")
  9. Spark SQL with JDBC MySQL (slide 1 of 2)
     Requirements:
     1. A MySQL instance
     2. The MySQL JDBC driver
  10. Spark SQL with JDBC MySQL (slide 2 of 2)
      1. $SPARK_HOME/bin/spark-shell --jars mysql-connector-java-5.1.26.jar
      2. scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/sparksql").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "baby_names").option("user", "root").option("password", "root").load()
      3. scala> dataframe_mysql.registerTempTable("names")
      4. scala> dataframe_mysql.sqlContext.sql("select * from names").collect.foreach(println)
  11. Conclusion
      For more Spark SQL and other Spark tutorials, visit http://www.supergloo.com/
  12. Credit
      Title slide image: https://flic.kr/p/8wFrUX
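The Background slide (slide 2) says Spark SQL lets you query data with either SQL or the DataFrame API. The sketch below runs one query both ways so the equivalence is visible. It is a minimal illustration, assuming Spark 2.0 or later (it uses SparkSession and createOrReplaceTempView instead of the slides' Spark 1.x sqlContext and registerTempTable), local mode, and a hypothetical BabyName record filled with made-up rows.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object SqlVersusDataFrameApi {
  // Hypothetical record type; only the "Year" column name comes from the slides.
  final case class BabyName(Year: Int, Name: String, Count: Int)

  /** Returns the distinct years computed once with SQL and once with the DataFrame API. */
  def distinctYears(spark: SparkSession): (Seq[Int], Seq[Int]) = {
    import spark.implicits._

    // Made-up sample rows; a real job would load these from a file or database.
    val names = Seq(
      BabyName(2013, "ALICE", 10),
      BabyName(2013, "BOB", 7),
      BabyName(2014, "CAROL", 12)
    ).toDF()

    // SQL: register the DataFrame under a name, then query it like a table.
    names.createOrReplaceTempView("names")
    val viaSql = spark.sql("SELECT DISTINCT Year FROM names ORDER BY Year")
      .as[Int].collect().toSeq

    // DataFrame API: the same query expressed as method calls.
    val viaApi = names.select(col("Year")).distinct().orderBy(col("Year"))
      .as[Int].collect().toSeq

    (viaSql, viaApi)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-versus-dataframe-api")
      .master("local[*]") // local mode, assumed for experimentation
      .getOrCreate()
    try {
      val (viaSql, viaApi) = distinctYears(spark)
      println(s"SQL: $viaSql, DataFrame API: $viaApi")
    } finally {
      spark.stop()
    }
  }
}
```

Both forms are planned and optimized by the same Catalyst optimizer, so the choice between them is mostly a matter of readability.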
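The Introduction slide (slide 5) describes a DataFrame as Row objects plus a schema. A minimal sketch of that idea, assuming Spark 2.0 or later and hypothetical sample rows, builds a DataFrame from an explicit StructType and then reads the data back as the underlying RDD of Row objects.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RowsAndSchema {
  // The schema: one StructField (name, data type, nullable) per column.
  val schema: StructType = StructType(Seq(
    StructField("Year", IntegerType, nullable = false),
    StructField("Name", StringType, nullable = true)
  ))

  /** Builds a DataFrame from generic Row objects plus the explicit schema above. */
  def build(spark: SparkSession): DataFrame = {
    // Hypothetical rows; each value must match the schema's column type, in order.
    val rows = Seq(Row(2013, "ALICE"), Row(2014, "BOB"))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rows-and-schema")
      .master("local[*]") // local mode, assumed for experimentation
      .getOrCreate()
    try {
      val df = build(spark)
      df.printSchema() // prints column names, types and nullability

      // The data itself is a distributed collection of Row objects.
      df.rdd.collect().foreach { row =>
        println(s"${row.getAs[Int]("Year")} -> ${row.getAs[String]("Name")}")
      }
    } finally {
      spark.stop()
    }
  }
}
```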
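From Spark 2.0 on, a CSV data source is built into Spark SQL, so the com.databricks:spark-csv package used on the CSV slide (slide 6) is no longer required. The sketch below repeats that slide's distinct-years query with the built-in reader. Because the slide's baby_names.csv is not reproduced here, it writes a small temporary file with hypothetical contents; only the Year column is taken from the slide.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.SparkSession

object CsvWithBuiltInReader {
  // Hypothetical stand-in for baby_names.csv; only the Year column comes from the slide.
  val sampleCsv: String =
    """Year,Name,Count
      |2013,ALICE,10
      |2013,BOB,7
      |2014,CAROL,12
      |""".stripMargin

  /** Writes the sample data to a temporary file and returns its path. */
  def writeSample(): Path = {
    val file = Files.createTempFile("baby_names", ".csv")
    Files.write(file, sampleCsv.getBytes(StandardCharsets.UTF_8))
    file
  }

  /** Same steps as slide 6, using the CSV reader built into Spark 2.0+. */
  def distinctYears(spark: SparkSession, csvPath: String): Seq[Int] = {
    val babyNames = spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // extra pass over the data to infer column types
      .csv(csvPath)
    babyNames.createOrReplaceTempView("names")
    spark.sql("SELECT DISTINCT Year FROM names ORDER BY Year")
      .collect()
      .map(_.getInt(0)) // inferSchema typed Year as an integer column
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-built-in-reader")
      .master("local[*]") // local mode, assumed for experimentation
      .getOrCreate()
    try {
      val path = writeSample()
      println(distinctYears(spark, path.toString))
    } finally {
      spark.stop()
    }
  }
}
```

Without inferSchema every column is read as a string, which would make getInt fail; with it, Spark makes one extra pass over the file to guess the types.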
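The JSON slides (slides 7 and 8) select nested fields with dot notation such as address.city. The sketch below, assuming Spark 2.0 or later, writes the three customer records from slide 7 to a temporary file and runs the slide's query both in SQL and with the DataFrame API. By default spark.read.json expects one complete JSON object per line, which is why each record stays on a single line.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object JsonNestedFields {
  // The three customer records from slide 7, one JSON object per line.
  val customersJson: String = Seq(
    """{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans","state": "LA", "zip": "70116" }}""",
    """{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI", "zip": "48116" }}""",
    """{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip": "08014" }}"""
  ).mkString("\n")

  /** Writes the records to a temporary file and returns its path. */
  def writeSample(): Path = {
    val file = Files.createTempFile("customers", ".json")
    Files.write(file, customersJson.getBytes(StandardCharsets.UTF_8))
    file
  }

  /** Slide 8's query in SQL: dot notation reaches into the nested "address" struct. */
  def viaSql(spark: SparkSession, path: String): DataFrame = {
    val customers = spark.read.json(path) // schema, including the nested struct, is inferred
    customers.createOrReplaceTempView("customers")
    spark.sql("SELECT first_name, address.city, address.state FROM customers")
  }

  /** The same projection with the DataFrame API. */
  def viaApi(spark: SparkSession, path: String): DataFrame =
    spark.read.json(path).select(col("first_name"), col("address.city"), col("address.state"))

  /** Collects (first_name, city, state) triples so the two results can be compared. */
  def triples(df: DataFrame): Set[(String, String, String)] =
    df.collect().map(r => (r.getString(0), r.getString(1), r.getString(2))).toSet

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-nested-fields")
      .master("local[*]") // local mode, assumed for experimentation
      .getOrCreate()
    try {
      val path = writeSample().toString
      viaSql(spark, path).show()
      viaApi(spark, path).show()
    } finally {
      spark.stop()
    }
  }
}
```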
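The JDBC slide (slide 10) passes the connection settings as reader options. An equivalent sketch for Spark 2.0 or later uses DataFrameReader.jdbc with a java.util.Properties object. The URL, table name, driver class and root/root credentials are copied from the slide; in real code the credentials would come from configuration rather than source. Running main assumes a reachable MySQL server and the Connector/J jar on the classpath (for example via --jars, as on the slide).

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcMySqlRead {
  // Connection URL and table name taken from slide 10.
  val url: String = "jdbc:mysql://localhost/sparksql"
  val table: String = "baby_names"

  /** Builds the JDBC connection properties that slide 10 passes as options. */
  def connectionProperties(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    // Connector/J 5.x driver class name, as used on the slide.
    props.setProperty("driver", "com.mysql.jdbc.Driver")
    props
  }

  /** Loads the whole table as a DataFrame; Spark reads its schema from the database. */
  def loadNames(spark: SparkSession, props: Properties): DataFrame =
    spark.read.jdbc(url, table, props)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-mysql-read")
      .master("local[*]") // local mode, assumed for experimentation
      .getOrCreate()
    try {
      val names = loadNames(spark, connectionProperties("root", "root"))
      names.createOrReplaceTempView("names")
      spark.sql("SELECT * FROM names").show()
    } finally {
      spark.stop()
    }
  }
}
```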

×