```python
!pip install pyspark
```
Collecting pyspark
Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
Collecting py4j==0.10.4 (from pyspark)
Downloading py4j-0.10.4-py2.py3-none-any.whl (186kB)
Building wheels for collected packages: pyspark
Running setup.py bdist_wheel for pyspark: started
Running setup.py bdist_wheel for pyspark: finished with status 'done'
Stored in directory: C:\Users\Dell\AppData\Local\pip\Cache\wheels\5f\0b\b3\5cb16b15d28dcc32f8e7ec91a044829642874bb7586f6e6cbe
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.4 pyspark-2.2.0
```python
from pyspark import SparkContext,SparkConf
sc=SparkContext()
```
```python
import os
```
```python
os.getcwd()
```
'C:\\Users\\Dell'
```python
os.chdir(r'C:\Users\Dell\Desktop')
```
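Backslashes in Windows paths are easy to lose or to misread as escape sequences in ordinary string literals. A minimal sketch of two safer spellings, a raw string and `pathlib.PureWindowsPath`; the directory and file names are the ones used in this notebook, and the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, no escape processing.
desktop = r'C:\Users\Dell\Desktop'

# PureWindowsPath joins parts with backslashes on any operating system.
csv_path = PureWindowsPath(desktop) / 'iris.csv'

print(str(csv_path))   # Windows-style path string
print(csv_path.name)   # file name only
```

The resulting string can be passed to `sc.textFile` or to the DataFrame reader's `load`.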
```python
os.listdir()
```
['desktop.ini',
'dump 2582017',
'Fusion Church.html',
'Fusion Church_files',
'iris.csv',
'KOG',
'NF22997109906610.ETicket.pdf',
'R Packages',
'Telegram.lnk',
'twitter_share.jpg',
'winutils.exe',
'~$avel Reimbursements.docx',
'~$thonajay.docx']
```python
#load data
data=sc.textFile(r'C:\Users\Dell\Desktop\iris.csv')
```
```python
type(data)
```
pyspark.rdd.RDD
```python
data.top(1)
```
['7.9,3.8,6.4,2,"virginica"']
```python
data.first()
```
'"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"'
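`sc.textFile` yields each line as a plain string, so the header and the quoted species names are still raw text, and `top(1)` simply picks the largest line by string comparison. A sketch of a line parser that could be passed to `data.map(...)`; `parse_line` and `is_header` are hypothetical helper names, not part of PySpark.

```python
import csv

def parse_line(line):
    """Split one CSV line into fields, honouring double-quoted values."""
    return next(csv.reader([line]))

def is_header(line):
    """True for the header row, which starts with a quoted column name."""
    return line.startswith('"Sepal.Length"')

header = '"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"'
row = '7.9,3.8,6.4,2,"virginica"'

print(parse_line(header))
print(parse_line(row))

# With the RDD loaded above, one possible use (not run here):
# rows = data.filter(lambda l: not is_header(l)).map(parse_line)
```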
```python
from pyspark.sql import SparkSession
```
```python
# Parentheses let the builder chain span several lines
spark = (SparkSession.builder
         .master("local")
         .appName("Data Exploration")
         .getOrCreate())
```
```python
#load data as Spark DataFrame
data2 = (spark.read.format("csv")
         .option("header", "true")
         .option("mode", "DROPMALFORMED")
         .load(r'C:\Users\Dell\Desktop\iris.csv'))
```
```python
type(data2)
```
pyspark.sql.dataframe.DataFrame
```python
data2.printSchema()
```
root
|-- Sepal.Length: string (nullable = true)
|-- Sepal.Width: string (nullable = true)
|-- Petal.Length: string (nullable = true)
|-- Petal.Width: string (nullable = true)
|-- Species: string (nullable = true)
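Every column comes back as `string` because the CSV reader does not infer types unless asked. A sketch, assuming the same file and session, that turns on the reader's `inferSchema` option; `read_iris` and `CSV_OPTIONS` are illustrative names, not part of the original notebook.

```python
# Options for Spark's CSV reader; values are passed as strings.
CSV_OPTIONS = {
    "header": "true",         # first line holds the column names
    "inferSchema": "true",    # extra pass over the data to choose column types
    "mode": "DROPMALFORMED",  # skip rows that cannot be parsed
}

def read_iris(spark, path):
    """Load a CSV file into a DataFrame using the options above."""
    return spark.read.format("csv").options(**CSV_OPTIONS).load(path)

# Possible use with the session created earlier (not run here):
# data2 = read_iris(spark, r'C:\Users\Dell\Desktop\iris.csv')
# data2.printSchema()
```

With inferred types, the later `CAST` expressions would not be needed for the numeric columns.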
```python
data2.columns
```
['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width',
'Species']
```python
data2.schema.names
```
['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width',
'Species']
```python
oldColumns=data2.schema.names  # current names, needed by the rename step below
newColumns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width',
'Species']
```
```python
from functools import reduce
```
```python
# Rename the columns one at a time, threading the DataFrame through reduce
data2 = reduce(
    lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]),
    range(len(oldColumns)), data2)
data2.printSchema()
data2.show()
```
root
|-- Sepal_Length: string (nullable = true)
|-- Sepal_Width: string (nullable = true)
|-- Petal_Length: string (nullable = true)
|-- Petal_Width: string (nullable = true)
|-- Species: string (nullable = true)
+------------+-----------+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Petal_Length|Petal_Width|Species|
+------------+-----------+------------+-----------+-------+
|         5.1|        3.5|         1.4|        0.2| setosa|
|         4.9|          3|         1.4|        0.2| setosa|
|         4.7|        3.2|         1.3|        0.2| setosa|
|         4.6|        3.1|         1.5|        0.2| setosa|
|           5|        3.6|         1.4|        0.2| setosa|
|         5.4|        3.9|         1.7|        0.4| setosa|
|         4.6|        3.4|         1.4|        0.3| setosa|
|           5|        3.4|         1.5|        0.2| setosa|
|         4.4|        2.9|         1.4|        0.2| setosa|
|         4.9|        3.1|         1.5|        0.1| setosa|
|         5.4|        3.7|         1.5|        0.2| setosa|
|         4.8|        3.4|         1.6|        0.2| setosa|
|         4.8|          3|         1.4|        0.1| setosa|
|         4.3|          3|         1.1|        0.1| setosa|
|         5.8|          4|         1.2|        0.2| setosa|
|         5.7|        4.4|         1.5|        0.4| setosa|
|         5.4|        3.9|         1.3|        0.4| setosa|
|         5.1|        3.5|         1.4|        0.3| setosa|
|         5.7|        3.8|         1.7|        0.3| setosa|
|         5.1|        3.8|         1.5|        0.3| setosa|
+------------+-----------+------------+-----------+-------+
only showing top 20 rows
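The `reduce` call renames one column per step. A shorter alternative sketch: derive the new names by replacing dots with underscores, then apply them in one call with `DataFrame.toDF`. `underscore_names` and `rename_all` are hypothetical helpers, not part of the original notebook.

```python
def underscore_names(names):
    """Replace dots with underscores in a list of column names."""
    return [name.replace('.', '_') for name in names]

def rename_all(df, names):
    """Return a DataFrame whose columns carry the given names, in order."""
    return df.toDF(*names)

old = ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']
print(underscore_names(old))

# Possible use on the DataFrame loaded from the CSV (not run here):
# data2 = rename_all(data2, underscore_names(data2.columns))
```

Dots are worth removing because Spark SQL expressions treat a dot in a column reference as access to a nested field unless the name is quoted with backticks.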
```python
data2.dtypes
```
[('Sepal_Length', 'string'),
('Sepal_Width', 'string'),
('Petal_Length', 'string'),
('Petal_Width', 'string'),
('Species', 'string')]
```python
data3 = data2.select('Sepal_Length', 'Sepal_Width', 'Species')
data3.cache()
data3.count()
```
150
```python
data3.show()
```
+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Species|
+------------+-----------+-------+
|         5.1|        3.5| setosa|
|         4.9|          3| setosa|
|         4.7|        3.2| setosa|
|         4.6|        3.1| setosa|
|           5|        3.6| setosa|
|         5.4|        3.9| setosa|
|         4.6|        3.4| setosa|
|           5|        3.4| setosa|
|         4.4|        2.9| setosa|
|         4.9|        3.1| setosa|
|         5.4|        3.7| setosa|
|         4.8|        3.4| setosa|
|         4.8|          3| setosa|
|         4.3|          3| setosa|
|         5.8|          4| setosa|
|         5.7|        4.4| setosa|
|         5.4|        3.9| setosa|
|         5.1|        3.5| setosa|
|         5.7|        3.8| setosa|
|         5.1|        3.8| setosa|
+------------+-----------+-------+
only showing top 20 rows
```python
data3.limit(5)
```
DataFrame[Sepal_Length: string, Sepal_Width: string, Species: string]
```python
data3.limit(5).show()
```
+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Species|
+------------+-----------+-------+
|         5.1|        3.5| setosa|
|         4.9|          3| setosa|
|         4.7|        3.2| setosa|
|         4.6|        3.1| setosa|
|           5|        3.6| setosa|
+------------+-----------+-------+
```python
data3.limit(5).limit(2).show()
```
+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Species|
+------------+-----------+-------+
|         5.1|        3.5| setosa|
|         4.9|          3| setosa|
+------------+-----------+-------+
```python
# CAST(... AS INT) drops the decimals, e.g. 5.1 becomes 5
data4=data2.selectExpr('CAST(Sepal_Length AS INT) AS Sepal_Length')
```
```python
data4
```
DataFrame[Sepal_Length: int]
```python
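# Import only the names used below; a wildcard import would shadow builtins such as sum and max
from pyspark.sql.functions import col, mean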
```
```python
data4.select('Sepal_Length').agg(mean('Sepal_Length')).show()
```
+-----------------+
|avg(Sepal_Length)|
+-----------------+
|5.386666666666667|
+-----------------+
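The mean of about 5.39 is lower than the rows shown earlier would suggest, because `CAST(... AS INT)` drops the fractional part before averaging. A plain-Python sketch of the effect on the first five `Sepal_Length` values from the table above; the `DOUBLE` expression at the end is an assumed replacement, not something the original notebook uses.

```python
from statistics import mean

# First five Sepal_Length values as read from the CSV (strings).
raw = ['5.1', '4.9', '4.7', '4.6', '5']

as_int = [int(float(v)) for v in raw]   # mimics truncation toward zero
as_float = [float(v) for v in raw]

print(as_int)                     # [5, 4, 4, 4, 5]
print(mean(as_int))               # 4.4
print(round(mean(as_float), 2))   # 4.86

# Spark expression that would keep the decimals (assumption, not from the notebook):
double_expr = 'CAST(Sepal_Length AS DOUBLE) AS Sepal_Length'
```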
```python
data5=data2.selectExpr('CAST(Sepal_Length AS INT) AS Sepal_Length',
                       'CAST(Petal_Width AS INT) AS Petal_Width',
                       'CAST(Sepal_Width AS INT) AS Sepal_Width',
                       'CAST(Petal_Length AS INT) AS Petal_Length',
                       'Species')
```
```python
data5
```
DataFrame[Sepal_Length: int, Petal_Width: int, Sepal_Width: int,
Petal_Length: int, Species: string]
```python
data5.columns
```
['Sepal_Length', 'Petal_Width', 'Sepal_Width', 'Petal_Length',
'Species']
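Writing one `CAST` expression per column by hand is error-prone. A sketch that builds the `selectExpr` arguments from a list of column names; `cast_exprs` is a hypothetical helper, and `DOUBLE` is used here as an assumption in place of the notebook's `INT` so that decimals survive.

```python
def cast_exprs(columns, sql_type, passthrough=()):
    """Build selectExpr strings that cast each column and keep its name."""
    exprs = ['CAST({0} AS {1}) AS {0}'.format(c, sql_type) for c in columns]
    return exprs + list(passthrough)

numeric = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
exprs = cast_exprs(numeric, 'DOUBLE', passthrough=['Species'])
for e in exprs:
    print(e)

# Possible use on the renamed DataFrame (not run here):
# data5 = data2.selectExpr(*exprs)
```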
```python
(data5.select('Sepal_Length', 'Species')
      .groupBy('Species')
      .agg(mean('Sepal_Length'))
      .show())
```
+----------+-----------------+
|   Species|avg(Sepal_Length)|
+----------+-----------------+
| virginica|             6.08|
|versicolor|             5.48|
|    setosa|              4.6|
+----------+-----------------+
```python
#df = data3.select(col('Sepal_Length'), data3.Sepal_Length.cast('float').alias('price'))
```

BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 

Pyspark

```python
data2.columns
```
['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']
```python
data2.schema.names
```
['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']
```python
# current column names (with dots) and their replacements (with underscores)
oldColumns=data2.schema.names
newColumns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Species']
```
```python
from functools import reduce
```
```python
# rename one column per step, feeding each renamed DataFrame into the next step
data2 = reduce(lambda data2, idx: data2.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), data2)
data2.printSchema()
data2.show()
```
root
|-- Sepal_Length: string (nullable = true)
|-- Sepal_Width: string (nullable = true)
|-- Petal_Length: string (nullable = true)
|-- Petal_Width: string (nullable = true)
|-- Species: string (nullable = true)
+------------+-----------+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Petal_Length|Petal_Width|Species|
+------------+-----------+------------+-----------+-------+
|         5.1|        3.5|         1.4|        0.2| setosa|
|         4.9|          3|         1.4|        0.2| setosa|
|         4.7|        3.2|         1.3|        0.2| setosa|
|         4.6|        3.1|         1.5|        0.2| setosa|
|           5|        3.6|         1.4|        0.2| setosa|
|         5.4|        3.9|         1.7|        0.4| setosa|
|         4.6|        3.4|         1.4|        0.3| setosa|
|           5|        3.4|         1.5|        0.2| setosa|
|         4.4|        2.9|         1.4|        0.2| setosa|
|         4.9|        3.1|         1.5|        0.1| setosa|
|         5.4|        3.7|         1.5|        0.2| setosa|
|         4.8|        3.4|         1.6|        0.2| setosa|
|         4.8|          3|         1.4|        0.1| setosa|
|         4.3|          3|         1.1|        0.1| setosa|
|         5.8|          4|         1.2|        0.2| setosa|
|         5.7|        4.4|         1.5|        0.4| setosa|
|         5.4|        3.9|         1.3|        0.4| setosa|
|         5.1|        3.5|         1.4|        0.3| setosa|
|         5.7|        3.8|         1.7|        0.3| setosa|
|         5.1|        3.8|         1.5|        0.3| setosa|
+------------+-----------+------------+-----------+-------+
only showing top 20 rows
```python
data2.dtypes
```
[('Sepal_Length', 'string'),
('Sepal_Width', 'string'),
('Petal_Length', 'string'),
('Petal_Width', 'string'),
('Species', 'string')]
```python
# keep three columns, cache the result, and count the rows
data3 = data2.select('Sepal_Length', 'Sepal_Width', 'Species')
data3.cache()
data3.count()
```
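The `reduce` call used for the renaming can be hard to read at first. The following is a minimal sketch of the same fold in plain Python, with no Spark required. The helper `rename` is a hypothetical stand-in for `DataFrame.withColumnRenamed`, and deriving `new_columns` with `str.replace` is an assumption made here for brevity; the notebook types the new names out by hand.

```python
from functools import reduce

# Column names as reported by data2.columns in the notebook.
old_columns = ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']

# Assumption: the new names are just the old ones with dots replaced by underscores.
new_columns = [name.replace('.', '_') for name in old_columns]


def rename(columns, idx):
    """Hypothetical stand-in for withColumnRenamed: return a new list with one name replaced."""
    return [new_columns[idx] if c == old_columns[idx] else c for c in columns]


# reduce passes the result of each rename into the next one, starting from old_columns.
renamed = reduce(rename, range(len(old_columns)), old_columns)
print(renamed)
```

With a real DataFrame, `data2.toDF(*new_columns)` should give the same result in a single call, which may be easier to read than the fold.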
150
```python
data3.show()
```
+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Species|
+------------+-----------+-------+
|         5.1|        3.5| setosa|
|         4.9|          3| setosa|
|         4.7|        3.2| setosa|
|         4.6|        3.1| setosa|
|           5|        3.6| setosa|
|         5.4|        3.9| setosa|
|         4.6|        3.4| setosa|
|           5|        3.4| setosa|
|         4.4|        2.9| setosa|
|         4.9|        3.1| setosa|
|         5.4|        3.7| setosa|
|         4.8|        3.4| setosa|
|         4.8|          3| setosa|
|         4.3|          3| setosa|
|         5.8|          4| setosa|
|         5.7|        4.4| setosa|
|         5.4|        3.9| setosa|
|         5.1|        3.5| setosa|
|         5.7|        3.8| setosa|
|         5.1|        3.8| setosa|
+------------+-----------+-------+
only showing top 20 rows
```python
# limit() is lazy: it returns a new DataFrame without computing anything
data3.limit(5)
```
DataFrame[Sepal_Length: string, Sepal_Width: string, Species: string]
```python
data3.limit(5).show()
```
+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Species|
+------------+-----------+-------+
|         5.1|        3.5| setosa|
|         4.9|          3| setosa|
|         4.7|        3.2| setosa|
|         4.6|        3.1| setosa|
|           5|        3.6| setosa|
+------------+-----------+-------+
```python
data3.limit(5).limit(2).show()
```
+------------+-----------+-------+
|Sepal_Length|Sepal_Width|Species|
+------------+-----------+-------+
|         5.1|        3.5| setosa|
|         4.9|          3| setosa|
+------------+-----------+-------+
```python
# note: casting to INT drops the decimals (5.1 becomes 5), which lowers the averages below
data4=data2.selectExpr('CAST(Sepal_Length AS INT) AS Sepal_Length')
```
```python
data4
```
DataFrame[Sepal_Length: int]
```python
# import only what is used; a wildcard import would shadow Python built-ins such as sum and max
from pyspark.sql.functions import col, mean
```
```python
data4.select('Sepal_Length').agg(mean('Sepal_Length')).show()
```
+-----------------+
|avg(Sepal_Length)|
+-----------------+
|5.386666666666667|
+-----------------+
```python
data5=data2.selectExpr('CAST(Sepal_Length AS INT) AS Sepal_Length','CAST(Petal_Width AS INT) AS Petal_Width','CAST(Sepal_Width AS INT) AS Sepal_Width','CAST(Petal_Length AS INT) AS Petal_Length','Species')
```
```python
data5
```
DataFrame[Sepal_Length: int, Petal_Width: int, Sepal_Width: int, Petal_Length: int, Species: string]
```python
data5.columns
```
['Sepal_Length', 'Petal_Width', 'Sepal_Width', 'Petal_Length', 'Species']
```python
# average (integer-cast) sepal length per species
data5.select('Sepal_Length','Species').groupBy('Species').agg(mean("Sepal_Length")).show()
```
+----------+-----------------+
|   Species|avg(Sepal_Length)|
+----------+-----------------+
| virginica|             6.08|
|versicolor|             5.48|
|    setosa|              4.6|
+----------+-----------------+
```python
#df = data3.select(col('Sepal_Length'),data3.Sepal_Length.cast('float').alias('price'))
```
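The averages reported above are computed after `CAST(... AS INT)`, so every measurement loses its fractional part before it is averaged. The sketch below illustrates the effect in plain Python, using only the five `Sepal_Length` values shown by `data3.limit(5).show()`. The variable names are illustrative, and `int(float(...))` is used here as an approximation of the truncation the cast appears to perform on this data.

```python
from statistics import mean

# The first five Sepal_Length values, as strings, exactly as the CSV reader loaded them.
raw_values = ['5.1', '4.9', '4.7', '4.6', '5']

# Keeping the decimals: parse each string as a float.
as_float = [float(v) for v in raw_values]

# Dropping the decimals: truncate toward zero, as the INT cast does for these values.
as_int = [int(float(v)) for v in raw_values]

print('mean with decimals:   ', mean(as_float))
print('mean after INT cast:  ', mean(as_int))
print('values after INT cast:', as_int)
```

If the decimals matter for the analysis, casting to a floating-point type instead, for example `CAST(Sepal_Length AS DOUBLE)` in `selectExpr`, would likely be the better choice.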