Querying SQLite from Python

In the past missions, we focused on exploring the SQL syntax for retrieving data from a database. In this mission, we'll explore how to interact with a SQLite database in Python so you can start to incorporate databases in your data science workflow. SQLite is a database that doesn't require a standalone server process and stores the entire database as a file on disk. This makes it ideal for working with larger datasets that can fit on disk but not in memory. Since the Pandas library loads the entire dataset we're working with into memory, this makes SQLite a compelling alternative option for working with datasets that are larger than 8 gigabytes (which is roughly the amount of memory modern computers contain). In addition, since the entire database can be contained within a single file, some datasets are released online as a SQLite database file (using the extension .db).

You can interact with a SQLite database in two main ways:

  • using the Sqlite Python module.
  • using the SQLite shell.

In this mission, we'll focus on learning how to use the Sqlite Python module to interact with the database. Next in this course is a guided project where we explore how to use the SQLite shell to interact with the database.

2: Introduction To The Data

We'll continue to work with the dataset from the American Community Survey on college majors and job outcomes:

RankMajor_codeMajorMajor_categoryTotalSample_sizeMenWomenShareWomenEmployed
12419PETROLEUM ENGINEERINGEngineering23393620572820.1205641976
22416MINING AND MINERAL ENGINEERINGEngineering7567679770.101852640
32415METALLURGICAL ENGINEERINGEngineering85637251310.153037648
42417NAVAL ARCHITECTURE AND MARINE ENGINEERINGEngineering12581611231350.107313758
52405CHEMICAL ENGINEERINGEngineering3226028921239110210.34163125694

The full table has many more columns, 21 to be specific, than the ones displayed above and they're explained in further detail onFiveThirtyEight's Github repo.

Here are the descriptions of the columns in the above snapshot:

  • Rank - Rank by median earnings
  • Major_code - Major code
  • Major - Major description
  • Major_category - Category of major
  • Total - Total number of people with major
  • Sample_size - Sample size (unweighted) of full-time
  • Men - Male graduates
  • Women - Female graduates
  • ShareWomen - Women as share of total
  • Employed - Number employed

We have loaded the data in for years 2010-2012 for just recent college grads into the table recent_grads. The database file we'll be working with is called jobs.db.

3: Connect To The Database

From Python 2.5 and onwards, the Sqlite module has come built-in to the Python language, which means we don't need to install any separate libraries to get started. Specifically, we'll be working with the Sqlite3 Python module, which was developed to work with SQLite version 3.

We can import it into our environment using:

 

import sqlite3

Once imported, we connect to the database we want to query using the connect() function. The connect() function has a single required parameter, the database we want to connect to. Since the database we're working with is stored as a file on disk, we need to pass in the filename. The connect() function returns a Connection instance, which maintains the connection to the database we want to work with. When you're connected to a database, SQLite locks the database file and prevents any other process from connecting to the database simultaneously. This was a design decision made by the SQLite team to keep the database lightweight and avoid the complexity that arises when multiple processes are interacting with the same database.

Instructions

  • Import the Sqlite3 library into the environment.
  • Then, use the Sqlite3 functionconnect() to connect tojobs.db and assign the returned Connection instance to conn

import sqlite3
conn=sqlite3.connect("jobs.db")

 

4: Cursor Object And Tuples

Before we can execute a query, we need to express our SQL query as a string. While we use the Connection class to represent the database we're working with, we use the Cursor class to:

  • run a query against the database.
  • parse the results from the database.
  • convert the results to native Python objects.
  • store the results within the Cursor instance as a local variable.

After running a query and converting the results to a list of tuples, the Cursor instance stores the list as a local variable. Before diving into the syntax of querying the database, let's learn more about tuples.

5: Tuples

tuple is a core Python data structure used to represent a sequence of values, similar to a list. Unlike liststuples are immutable, which means they can't be modified after creation. Each row is in the results set is represented as a tuple.

To create an empty tuple, assign a pair of empty parentheses to a variable:

 

t = ()

Tuples are indexed the same way as lists, from 0 to n-1, and you access values using bracket notation.

 

t = ('Apple', 'Banana')

apple = t[0]

banana = t[1]

Tuples are faster than lists, which is helpful when working with larger databases and larger results sets. Let's now dive into how to use the Cursor instance to query the database.

6: Running A Query

We need to use the Connection instance method cursor() to return a Cursor instance corresponding to the database we want to query.

 

cursor = conn.cursor()

In the following code block, we:

  • write a basic select query that returns all of the values from the recent_grads table and store it as a string called query.
  • use the Cursor method execute() to run the query against our database.
  • return the full results set and store it as results.
  • print the first 3 tuples in the list results.

 

# SQL Query as a string

query = "select * from recent_grads;"

# Execute the query, convert the results to tuples, and store as a local variable.

cursor.execute(query)

# Fetch the full results set, as a list of tuples.

results = cursor.fetchall()

# Display the first 3 results.

print(results[0:3])

Now it's your turn!

Instructions

  • Write a query that returns all of the values in the Major column from the recent_grads table.
  • Store the full results set (a list of tuples) in majors.
  • Then, print the first 3 tuples inmajors

import sqlite3
conn = sqlite3.connect("jobs.db")
cursor = conn.cursor()

query = "select * from recent_grads;"
cursor.execute(query)
results = cursor.fetchall()
print(results[0:2])
query = "select major from recent_grads;"
majors = cursor.execute(query).fetchall()
print(majors[0:3])

7: Shortcut For Running A Query

So far, we've been running queries by creating a Cursor instance and then calling the execute method on the instance. The sqlite library actually allows us to skip creating a Cursor altogether by using the execute method within the Connection object itself. Under the hood, a Cursor instance will be created for us and our query run against the database, but this shortcut allows us to skip a step. Here's how that code looks like:

 

conn = sqlite3.connect("jobs.db")

query = "select * from recent_grads;"

conn.execute(query).fetchall()

In the above code, we didn't explicitly create a separate Cursor instance ourselves. Let's now learn how to fetch a specific number of results after a query is run.

8: Fetching A Specific Number Of Results

To make it easier to work with large results sets, the Cursor class allows you to control the number of results you want to retrieve at any given time. To return a single result (as a tuple), we use the Cursor method fetchone() and to return n results, we use the Cursor method fetchmany().

Each Cursor instance contains an internal counter which is updated every time you retrieve results. When you call the fetchone()method, the Cursor instance will return a single result and then increment its internal counter by 1. This means that if you callfetchone() again, the Cursor instance will actually return the second tuple in the results set (and increment by 1 again).

The fetchmany() method takes in an integer (n) and returns the corresponding results starting from the current position. Thefetchmany() method then increments the Cursor instance's counter by n. In the following code, we return the first 2 results using thefetchone() method, then the next 5 results using the fetchmany() method.

 

first_result = cursor.fetchone()

second_result = cursor.fetchone()

next_five_results = cursor.fetchmany(5)

Instructions

  • Write and run a query that returns the Major andMajor_category columns fromrecent_grads.
  • Then, fetch the first 5 results and store it as five_results

import sqlite3
conn = sqlite3.connect("jobs.db")
cursor = conn.cursor()
query="select major,Major_category from recent_grads;"
cursor.execute(query)
five_results=cursor.fetchmany(5)

 

conn

Connection (<class 'sqlite3.Connection'>)

<sqlite3.Connection at 0x7f3fbc6b8f10>

 five_results

list (<class 'list'>)

[('PETROLEUM ENGINEERING', 'Engineering'), ('MINING AND MINERAL ENGINEERING', 'Engineering'), ('METALLURGICAL ENGINEERING', 'Engineering'), ('NAVAL ARCHITECTURE AND MARINE ENGINEERING', 'Engineering'), ('CHEMICAL ENGINEERING', 'Engineering')]

 query

str (<class 'str'>)

'select major,Major_category from recent_grads;'

 cursor

Cursor (<class 'sqlite3.Cursor'>)

<sqlite3.Cursor at 0x7f3fbc8e5b20>

 

9: Closing The Connection

Since SQLite restricts access to the database file when we're connected to a database, we need to close the connection when we're done working with it. Closing the connection to the database allows other processes to access the database, which is important when you're in a production environment and working with other team members. In addition, if we made any changes to the database, they are automatically saved and our changes are persisted in the database file upon closing.

To close a connection to a database, use the Connection instance method close(). When you're working with multiple databases and multiple Connection instances, you want to make sure you call the close() method on the correct instance. After closing the connection, attempting to query the database using any linked Cursor instances will return the following error:

 

ProgrammingError: Cannot operate on a closed database.

Instructions

  • Close the connection to the database using the Connection instance method close()metho

 

conn = sqlite3.connect("jobs.db")
conn.close()

 

10: Practice

Let's now practice the entire workflow we've learned so far from start to finish.

Instructions

  • Connect to the databasejobs2.db, which contains the same data as jobs.db.
  • Write and execute a query that returns all of the major names (Major) in reverse alphabetical order (Z to A).
  • Assign the full result set toreverse_alphabetical.
  • Finally, close the connection to the database


conn = sqlite3.connect("jobs2.db")
query = "select Major from recent_grads order by Major desc;"
reverse_alphabetical = conn.cursor().execute(query).fetchall()

conn.close()

 

11: Next Steps

Next up in this course is a guided project where we walk through how to use the SQLite shell. The SQLite shell is similar to the IPython shell where you can write and run commands interactively.

 

转载于:https://my.oschina.net/Bettyty/blog/747364

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值