# HW3 ‐ Importing CSV Files and Querying Data in Cloud Databases
## Overview
CSV is one of the most common formats for storing data. CSV files are supported by most databases and data analysis tools, making them a frequent choice for data import tasks. In this article, we'll walk through how to upload a CSV file and query the data in a cloud database using three different methods:
- SQL Server Management Studio (SSMS)
- Python (Jupyter Notebook)
- Databricks (Spark)
Each section walks through setup steps and example code, with tips for beginners.
## SQL Server Management Studio (SSMS)
There are several ways to upload a CSV file to a cloud database. For example, a data practitioner can use the `BULK INSERT` command to load the file directly into the server, or use the SSMS import wizard to first load the data into a local database and then transfer it to the cloud. This article focuses on the latter approach, as it is one of the simplest; a minimal `BULK INSERT` example is sketched below for comparison.
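The following is a rough sketch rather than a tested command: the file path is a placeholder, the target table must already exist with columns matching the CSV, and the `FORMAT = 'CSV'` option requires SQL Server 2017 or later.

```sql
-- Hypothetical BULK INSERT sketch; the path and table are placeholders.
BULK INSERT dbo.unemployment_raw       -- target table (must already exist)
FROM 'C:\data\unemployment.csv'        -- path as seen by the SQL Server machine
WITH (
    FORMAT = 'CSV',     -- parse the file as CSV (SQL Server 2017+)
    FIRSTROW = 2        -- skip the header row
);
```

Note that the path must be readable by the SQL Server service itself, which is why the wizard route described below is often easier for local files.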
### Importing CSV Files to a Local Instance
- Open SSMS and connect to your local SQL Server instance
- In the Object Explorer window, right-click on Databases and select New Database
- In the New Database window, name the database you are about to create
- Click OK
- Right-click on the newly created database and select Tasks > Import Data
- In the SQL Server Import and Export Wizard, select Flat File Source as the Data Source
- Choose a CSV file in the Browse window, review the data, then click Next
- In the Choose a Destination step, select Microsoft OLE DB Provider for SQL Server as the destination and pick a destination database
- Review and click Next > Finish
### Backing Up the Local Database & Uploading It to the Cloud
- Right-click on the database to upload, select Tasks > Export Data-tier Application
- In the Export Settings window, click Browse to choose where to save the .bacpac file on your local disk
- Click Next to review the summary and hit Finish
- Connect to the cloud server
- In the Object Explorer window, right-click on Databases, select Import Data-tier Application, and click Next
- In the Import Settings window, select the backup file you've exported.
- Review the Database Settings and Summary, click Finish
The database is now in the cloud, and you can query it from SSMS, as shown below.
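For example, a quick sanity check confirms the data landed intact. The table name here is the one queried later in this article; substitute your own.

```sql
-- Verify the import: total row count plus a sample of rows.
SELECT COUNT(*) AS row_count FROM [dbo].[unemployment_raw];
SELECT TOP 10 * FROM [dbo].[unemployment_raw];
```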
## Python (in Jupyter Notebook)
You can also import CSV files and query data in the cloud using Python in a Jupyter notebook. Jupyter provides an interactive environment that lets data practitioners view results as they code, making it easier to explore and debug data workflows.
Below is a code snippet demonstrating how to connect to the server and execute a SQL query with `pyodbc`.
```python
import pyodbc

# Connect to the cloud SQL Server instance; UID/PWD are placeholders.
# Depending on your setup, a newer driver name such as
# 'ODBC Driver 17 for SQL Server' may be required.
conn = pyodbc.connect(
    'DRIVER={SQL Server};SERVER=uwc-studentsql.continuum.uw.edu\\uwcbiddsql;'
    'DATABASE=gold_ytwang;UID={placeholder};PWD={placeholder}'
)

# Execute a query and print each row of the result set
cursor = conn.cursor()
cursor.execute("SELECT TOP 10 * FROM [dbo].[unemployment_raw]")
for row in cursor:
    print(row)
```
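If you prefer working with DataFrames, the same query can be pulled into pandas, reusing the `conn` object from above (a sketch assuming pandas is installed):

```python
import pandas as pd

# Load the query result into a DataFrame, reusing `conn` from above.
# pandas may warn that it officially supports SQLAlchemy connections,
# but a raw pyodbc connection works for simple reads like this.
df = pd.read_sql("SELECT TOP 10 * FROM [dbo].[unemployment_raw]", conn)
print(df.head())
```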
## Databricks (Spark)
In Databricks, you can load a CSV file with Spark, register it as a temporary view, and query it with Spark SQL. The steps below run in the Scala spark-shell or a Scala notebook:

```scala
// Load the CSV file, inferring column types from the data
val covid = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("bing_covid-19_data.csv")

// Show the first 10 rows of the loaded dataset
covid.show(10)

// Preview the inferred schema
covid.printSchema()

// Register the DataFrame as a temporary view so it can be queried with SQL
covid.createOrReplaceTempView("v_FactCovid")

// Run Spark SQL against the view
spark.sql("SELECT COUNT(*) FROM v_FactCovid WHERE admin_region_1 = 'Hawaii'").show()
```
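For readers using a Python notebook in Databricks, the same steps translate directly to PySpark (a sketch assuming the same file path and the built-in `spark` session that Databricks notebooks provide):

```python
# PySpark equivalent of the Scala steps above.
covid = (
    spark.read.format("csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("bing_covid-19_data.csv")
)

covid.show(10)                                  # first 10 rows
covid.printSchema()                             # inferred column types
covid.createOrReplaceTempView("v_FactCovid")    # expose to Spark SQL

spark.sql(
    "SELECT COUNT(*) FROM v_FactCovid WHERE admin_region_1 = 'Hawaii'"
).show()
```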
## How to Leverage These Techniques in the Final Project
For my final project, I plan to build analyses and reports using the Bing COVID-19 dataset and the Business Formation Statistics weekly data.
Below are the action items to prepare the datasets for analysis and reports:
- Import the Bing COVID-19 and Business Formation Statistics Weekly Data to the local database using SSMS
- Perform ETL, then build the data warehouse tables, views, and stored procedures in SSMS
- Once the data warehouse is built, export the Data-tier Application and import it into the cloud
- Use Python and Spark to help transform data between steps (see the sketch below)
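As a concrete example of that last item, here is a small, hypothetical pandas transformation that could run between the import and warehouse-build steps. The cleaning rules are illustrative assumptions, not part of the actual project pipeline; only the file name and the `admin_region_1` column come from the code shown earlier.

```python
import pandas as pd

# Hypothetical pre-load cleanup for the Bing COVID-19 CSV.
covid = pd.read_csv("bing_covid-19_data.csv")

# Keep only rows with a state-level region and drop exact duplicates
# (illustrative rules; adjust to the analysis actually being built).
covid = covid[covid["admin_region_1"].notna()].drop_duplicates()

covid.to_csv("bing_covid-19_clean.csv", index=False)  # ready for import
```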