jupyter_labs_eda_sql_coursera_sqllite - Fatimahhg/special-potato GitHub Wiki

Assignment: SQL Notebook for Peer Assignment

Estimated time needed: 60 minutes.

Introduction

Using this Python notebook you will:

  1. Understand the Spacex DataSet
  2. Load the dataset into the corresponding table in a Db2 database
  3. Execute SQL queries to answer assignment questions

Overview of the DataSet

SpaceX has gained worldwide attention for a series of historic milestones.

It is the only private company ever to return a spacecraft from low-earth orbit, which it first accomplished in December 2010. SpaceX advertises Falcon 9 rocket launches on its website with a cost of 62 million dollars wheras other providers cost upward of 165 million dollars each, much of the savings is because Space X can reuse the first stage.

Therefore if we can determine if the first stage will land, we can determine the cost of a launch.

This information can be used if an alternate company wants to bid against SpaceX for a rocket launch.

This dataset includes a record for each payload carried during a SpaceX mission into outer space.

Download the datasets

This assignment requires you to load the spacex dataset.

In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the link below to download and save the dataset (.CSV file):

Spacex DataSet

!pip install sqlalchemy==1.3.9

Connect to the database

Let us first load the SQL extension and establish a connection with the database

%load_ext sql
import csv, sqlite3

con = sqlite3.connect("my_data1.db") cur = con.cursor()

!pip install -q pandas==1.1.5
%sql sqlite:///my_data1.db
import pandas as pd
df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/data/Spacex.csv")
#df.to_sql("SPACEXTBL", con, if_exists='replace', index=False,method="multi")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 10 columns):

Column Non-Null Count Dtype


0 Date 101 non-null object 1 Time (UTC) 101 non-null object 2 Booster_Version 101 non-null object 3 Launch_Site 101 non-null object 4 Payload 101 non-null object 5 PAYLOAD_MASS__KG_ 101 non-null int64 6 Orbit 101 non-null object 7 Customer 101 non-null object 8 Mission_Outcome 101 non-null object 9 Landing_Outcome 101 non-null object dtypes: int64(1), object(9) memory usage: 8.0+ KB

df.head(2)
  Date Time (UTC) Booster_Version Launch_Site Payload PAYLOAD_MASS__KG_ Orbit Customer Mission_Outcome Landing_Outcome
2010-06-04 18:45:00 F9 v1.0 B0003 CCAFS LC-40 Dragon Spacecraft Qualification Unit 0 LEO SpaceX Success Failure (parachute)
2010-12-08 15:43:00 F9 v1.0 B0004 CCAFS LC-40 Dragon demo flight C1, two CubeSats, barrel of... 0 LEO (ISS) NASA (COTS) NRO Success Failure (parachute)
Assignment: SQL Notebook for Peer Assignment

Estimated time needed: 60 minutes.

Introduction

Using this Python notebook you will:

Understand the Spacex DataSet Load the dataset into the corresponding table in a Db2 database Execute SQL queries to answer assignment questions Overview of the DataSet

SpaceX has gained worldwide attention for a series of historic milestones.

It is the only private company ever to return a spacecraft from low-earth orbit, which it first accomplished in December 2010. SpaceX advertises Falcon 9 rocket launches on its website with a cost of 62 million dollars wheras other providers cost upward of 165 million dollars each, much of the savings is because Space X can reuse the first stage.

Therefore if we can determine if the first stage will land, we can determine the cost of a launch.

This information can be used if an alternate company wants to bid against SpaceX for a rocket launch.

This dataset includes a record for each payload carried during a SpaceX mission into outer space.

Download the datasets

This assignment requires you to load the spacex dataset.

In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the link below to download and save the dataset (.CSV file):

Spacex DataSet

!pip install sqlalchemy==1.3.9 Connect to the database

Let us first load the SQL extension and establish a connection with the database

%load_ext sql import csv, sqlite3

con = sqlite3.connect("my_data1.db") cur = con.cursor() !pip install -q pandas==1.1.5 %sql sqlite:///my_data1.db import pandas as pd df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/data/Spacex.csv") #df.to_sql("SPACEXTBL", con, if_exists='replace', index=False,method="multi") df.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 101 entries, 0 to 100 Data columns (total 10 columns):

Column Non-Null Count Dtype


0 Date 101 non-null object 1 Time (UTC) 101 non-null object 2 Booster_Version 101 non-null object 3 Launch_Site 101 non-null object 4 Payload 101 non-null object 5 PAYLOAD_MASS__KG_ 101 non-null int64 6 Orbit 101 non-null object 7 Customer 101 non-null object 8 Mission_Outcome 101 non-null object 9 Landing_Outcome 101 non-null object dtypes: int64(1), object(9) memory usage: 8.0+ KB df.head(2) Date Time (UTC) Booster_Version Launch_Site Payload PAYLOAD_MASS__KG_ Orbit Customer Mission_Outcome Landing_Outcome 0 2010-06-04 18:45:00 F9 v1.0 B0003 CCAFS LC-40 Dragon Spacecraft Qualification Unit 0 LEO SpaceX Success Failure (parachute) 1 2010-12-08 15:43:00 F9 v1.0 B0004 CCAFS LC-40 Dragon demo flight C1, two CubeSats, barrel of... 0 LEO (ISS) NASA (COTS) NRO Success Failure (parachute) Note:This below code is added to remove blank rows from table

%sql create table SPACEXTABLE as select * from SPACEXTBL where Date is not null Tasks

Now write and execute SQL queries to solve the assignment tasks.

Note: If the column names are in mixed case enclose it in double quotes For Example "Landing_Outcome"

Task 1

Display the names of the unique launch sites in the space mission set(df['Launch_Site']) {'CCAFS LC-40', 'CCAFS SLC-40', 'KSC LC-39A', 'VAFB SLC-4E'} Task 2

Display 5 records where launch sites begin with the string 'CCA' df['Launch_Site'].value_counts() CCAFS SLC-40 34 CCAFS LC-40 26 KSC LC-39A 25 VAFB SLC-4E 16 Name: Launch_Site, dtype: int64 CCA1 = df[df['Launch_Site'].str.contains('CCAFS SLC-40')] CCA1.head() Task 3

Display the total payload mass carried by boosters launched by NASA (CRS) #by NASA (CRS) df_CRS = df[df['Customer'].str.contains('CRS')] df_Mass = df_CRS'PAYLOAD_MASS__KG_','Booster_Version' df_Mass Task 4

Display average payload mass carried by booster version F9 v1.1 df_F9 = df[df['Booster_Version'].str.contains('F9 v1.1')] df_F9['PAYLOAD_MASS__KG_'].mean() 2534.6666666666665 Task 5

List the date when the first succesful landing outcome in ground pad was acheived.

Hint:Use min function

df['Landing_Outcome'].value_counts() df_Land = df[df['Landing_Outcome'] == 'Success (ground pad)'] df_Land['Date'].min() '2015-12-22' Task 6

List the names of the boosters which have success in drone ship and have payload mass greater than 4000 but less than 6000 df_boost = df[(df['PAYLOAD_MASS__KG_'] > 4000) & (df['PAYLOAD_MASS__KG_']<6000) & (df['Landing_Outcome'] == 'Success (drone ship)')] set(df_boost['Booster_Version']) {'F9 FT B1021.2', 'F9 FT B1031.2', 'F9 FT B1022', 'F9 FT B1026'} Task 7

List the total number of successful and failure mission outcomes df['Mission_Outcome'].value_counts() Success 98 Failure (in flight) 1 Success (payload status unclear) 1 Success 1 Name: Mission_Outcome, dtype: int64 dF_success = df[df['Mission_Outcome'].str.contains('Success')] len(dF_success['Mission_Outcome']) 100 Task 8

List the names of the booster_versions which have carried the maximum payload mass. Use a subquery df_Load = df[df['PAYLOAD_MASS__KG_'] == df['PAYLOAD_MASS__KG_'].max()] print(set(df_Load['Booster_Version'])) {'F9 B5 B1048.5', 'F9 B5 B1049.5', 'F9 B5 B1049.4', 'F9 B5 B1051.3', 'F9 B5 B1056.4', 'F9 B5 B1051.4', 'F9 B5 B1060.2 ', 'F9 B5 B1058.3 ', 'F9 B5 B1048.4', 'F9 B5 B1051.6', 'F9 B5 B1049.7 ', 'F9 B5 B1060.3'} Task 9

List the records which will display the month names, failure landing_outcomes in drone ship ,booster versions, launch_site for the months in year 2015.

Note: SQLLite does not support monthnames. So you need to use substr(Date, 6,2) as month to get the months and substr(Date,0,5)='2015' for year.

import datetime df_date = df'Date','Booster_Version','Launch_Site','Landing_Outcome' df_date df_date['Date'] = pd.to_datetime(df_date['Date'] , format='%Y-%m-%d') :1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_date['Date'] = pd.to_datetime(df_date['Date'] , format='%Y-%m-%d') df_2015 = df_date[df_date['Date'].dt.strftime('%Y') == '2015'] df_month = df_2015[df_2015['Landing_Outcome'] == 'Failure (drone ship)'] df_month['Date']

January

April

13 2015-01-10 16 2015-04-14 Name: Date, dtype: datetime64[ns] Task 10

Rank the count of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between the date 2010-06-04 and 2017-03-20, in descending order. df['Landing_Outcome'].value_counts() df_new = df[(df['Date'] > '2010-06-04') & (df['Date'] < '2017-03-20') & (df['Landing_Outcome'] == 'Success (ground pad)')] df_new df_new['Landing_Outcome'].count() 3 Reference Links

Hands-on Lab : String Patterns, Sorting and Grouping

Hands-on Lab: Built-in functions

Hands-on Lab : Sub-queries and Nested SELECT Statements

Hands-on Tutorial: Accessing Databases with SQL magic

Hands-on Lab: Analyzing a real World Data Set

Author(s)

Lakshmi Holla Other Contributors

Rav Ahuja Change log

Date Version Changed by Change Description 2021-07-09 0.2 Lakshmi Holla Changes made in magic sql 2021-05-20 0.1 Lakshmi Holla Created Initial Version © IBM Corporation 2021. All rights reserved.

⚠️ **GitHub.com Fallback** ⚠️