Querying SQLite from Python

最新推荐文章于 2024-09-27 11:30:45 发布

weixin_33795743

最新推荐文章于 2024-09-27 11:30:45 发布

阅读量90

点赞数

文章标签：数据库 python shell

原文链接：https://my.oschina.net/Bettyty/blog/747364

版权

2019独角兽企业重金招聘Python工程师标准>>>

In the past missions, we focused on exploring the SQL syntax for retrieving data from a database. In this mission, we'll explore how to interact with a SQLite database in Python so you can start to incorporate databases in your data science workflow. SQLite is a database that doesn't require a standalone server process and stores the entire database as a file on disk. This makes it ideal for working with larger datasets that can fit on disk but not in memory. Since the Pandas library loads the entire dataset we're working with into memory, this makes SQLite a compelling alternative option for working with datasets that are larger than 8 gigabytes (which is roughly the amount of memory modern computers contain). In addition, since the entire database can be contained within a single file, some datasets are released online as a SQLite database file (using the extension .db).

You can interact with a SQLite database in two main ways:

using the Sqlite Python module.
using the SQLite shell.

In this mission, we'll focus on learning how to use the Sqlite Python module to interact with the database. Next in this course is a guided project where we explore how to use the SQLite shell to interact with the database.

2: Introduction To The Data

We'll continue to work with the dataset from the American Community Survey on college majors and job outcomes:

Rank	Major_code	Major	Major_category	Total	Sample_size	Men	Women	ShareWomen	Employed
1	2419	PETROLEUM ENGINEERING	Engineering	2339	36	2057	282	0.120564	1976
2	2416	MINING AND MINERAL ENGINEERING	Engineering	756	7	679	77	0.101852	640
3	2415	METALLURGICAL ENGINEERING	Engineering	856	3	725	131	0.153037	648
4	2417	NAVAL ARCHITECTURE AND MARINE ENGINEERING	Engineering	1258	16	1123	135	0.107313	758
5	2405	CHEMICAL ENGINEERING	Engineering	32260	289	21239	11021	0.341631	25694

The full table has many more columns, 21 to be specific, than the ones displayed above and they're explained in further detail onFiveThirtyEight's Github repo.

Here are the descriptions of the columns in the above snapshot:

Rank - Rank by median earnings
Major_code - Major code
Major - Major description
Major_category - Category of major
Total - Total number of people with major
Sample_size - Sample size (unweighted) of full-time
Men - Male graduates
Women - Female graduates
ShareWomen - Women as share of total
Employed - Number employed

We have loaded the data in for years 2010-2012 for just recent college grads into the table recent_grads. The database file we'll be working with is called jobs.db.

3: Connect To The Database

From Python 2.5 and onwards, the Sqlite module has come built-in to the Python language, which means we don't need to install any separate libraries to get started. Specifically, we'll be working with the Sqlite3 Python module, which was developed to work with SQLite version 3.

We can import it into our environment using:

import sqlite3

Once imported, we connect to the database we want to query using the connect() function. The connect() function has a single required parameter, the database we want to connect to. Since the database we're working with is stored as a file on disk, we need to pass in the filename. The connect() function returns a Connection instance, which maintains the connection to the database we want to work with. When you're connected to a database, SQLite locks the database file and prevents any other process from connecting to the database simultaneously. This was a design decision made by the SQLite team to keep the database lightweight and avoid the complexity that arises when multiple processes are interacting with the same database.

Instructions

Import the Sqlite3 library into the environment.
Then, use the Sqlite3 functionconnect() to connect tojobs.db and assign the returned Connection instance to conn

import sqlite3
conn=sqlite3.connect("jobs.db")

4: Cursor Object And Tuples

Before we can execute a query, we need to express our SQL query as a string. While we use the Connection class to represent the database we're working with, we use the Cursor class to:

run a query against the database.
parse the results from the database.
convert the results to native Python objects.
store the results within the Cursor instance as a local variable.

After running a query and converting the results to a list of tuples, the Cursor instance stores the list as a local variable. Before diving into the syntax of querying the database, let's learn more about tuples.

5: Tuples

A tuple is a core Python data structure used to represent a sequence of values, similar to a list. Unlike lists, tuples are immutable, which means they can't be modified after creation. Each row is in the results set is represented as a tuple.

To create an empty tuple, assign a pair of empty parentheses to a variable:

t = ()

Tuples are indexed the same way as lists, from 0 to n-1, and you access values using bracket notation.

t = ('Apple', 'Banana')

apple = t[0]

banana = t[1]

Tuples are faster than lists, which is helpful when working with larger databases and larger results sets. Let's now dive into how to use the Cursor instance to query the database.

6: Running A Query

We need to use the Connection instance method cursor() to return a Cursor instance corresponding to the database we want to query.

cursor = conn.cursor()

In the following code block, we:

write a basic select query that returns all of the values from the recent_grads table and store it as a string called query.
use the Cursor method execute() to run the query against our database.
return the full results set and store it as results.
print the first 3 tuples in the list results.

# SQL Query as a string

query = "select * from recent_grads;"

# Execute the query, convert the results to tuples, and store as a local variable.

cursor.execute(query)

# Fetch the full results set, as a list of tuples.

results = cursor.fetchall()

# Display the first 3 results.

print(results[0:3])

Now it's your turn!

Instructions

Write a query that returns all of the values in the Major column from the recent_grads table.
Store the full results set (a list of tuples) in majors.
Then, print the first 3 tuples inmajors

import sqlite3
conn = sqlite3.connect("jobs.db")
cursor = conn.cursor()

query = "select * from recent_grads;"
cursor.execute(query)
results = cursor.fetchall()
print(results[0:2])
query = "select major from recent_grads;"
majors = cursor.execute(query).fetchall()
print(majors[0:3])

7: Shortcut For Running A Query

So far, we've been running queries by creating a Cursor instance and then calling the execute method on the instance. The sqlite library actually allows us to skip creating a Cursor altogether by using the execute method within the Connection object itself. Under the hood, a Cursor instance will be created for us and our query run against the database, but this shortcut allows us to skip a step. Here's how that code looks like:

conn = sqlite3.connect("jobs.db")

query = "select * from recent_grads;"

conn.execute(query).fetchall()

In the above code, we didn't explicitly create a separate Cursor instance ourselves. Let's now learn how to fetch a specific number of results after a query is run.

8: Fetching A Specific Number Of Results

To make it easier to work with large results sets, the Cursor class allows you to control the number of results you want to retrieve at any given time. To return a single result (as a tuple), we use the Cursor method fetchone() and to return n results, we use the Cursor method fetchmany().

Each Cursor instance contains an internal counter which is updated every time you retrieve results. When you call the fetchone()method, the Cursor instance will return a single result and then increment its internal counter by 1. This means that if you callfetchone() again, the Cursor instance will actually return the second tuple in the results set (and increment by 1 again).

The fetchmany() method takes in an integer (n) and returns the corresponding results starting from the current position. Thefetchmany() method then increments the Cursor instance's counter by n. In the following code, we return the first 2 results using thefetchone() method, then the next 5 results using the fetchmany() method.

first_result = cursor.fetchone()

second_result = cursor.fetchone()

next_five_results = cursor.fetchmany(5)

Instructions

Write and run a query that returns the Major andMajor_category columns fromrecent_grads.
Then, fetch the first 5 results and store it as five_results

import sqlite3
conn = sqlite3.connect("jobs.db")
cursor = conn.cursor()
query="select major,Major_category from recent_grads;"
cursor.execute(query)
five_results=cursor.fetchmany(5)

conn

Connection (<class 'sqlite3.Connection'>)

<sqlite3.Connection at 0x7f3fbc6b8f10>

five_results

list (<class 'list'>)

[('PETROLEUM ENGINEERING', 'Engineering'), ('MINING AND MINERAL ENGINEERING', 'Engineering'), ('METALLURGICAL ENGINEERING', 'Engineering'), ('NAVAL ARCHITECTURE AND MARINE ENGINEERING', 'Engineering'), ('CHEMICAL ENGINEERING', 'Engineering')]

query

str (<class 'str'>)

'select major,Major_category from recent_grads;'

cursor

Cursor (<class 'sqlite3.Cursor'>)

<sqlite3.Cursor at 0x7f3fbc8e5b20>

9: Closing The Connection

Since SQLite restricts access to the database file when we're connected to a database, we need to close the connection when we're done working with it. Closing the connection to the database allows other processes to access the database, which is important when you're in a production environment and working with other team members. In addition, if we made any changes to the database, they are automatically saved and our changes are persisted in the database file upon closing.

To close a connection to a database, use the Connection instance method close(). When you're working with multiple databases and multiple Connection instances, you want to make sure you call the close() method on the correct instance. After closing the connection, attempting to query the database using any linked Cursor instances will return the following error:

ProgrammingError: Cannot operate on a closed database.

Instructions

Close the connection to the database using the Connection instance method close()metho

conn = sqlite3.connect("jobs.db")
conn.close()

10: Practice

Let's now practice the entire workflow we've learned so far from start to finish.

Instructions

Connect to the databasejobs2.db, which contains the same data as jobs.db.
Write and execute a query that returns all of the major names (Major) in reverse alphabetical order (Z to A).
Assign the full result set toreverse_alphabetical.
Finally, close the connection to the database

conn = sqlite3.connect("jobs2.db")
query = "select Major from recent_grads order by Major desc;"
reverse_alphabetical = conn.cursor().execute(query).fetchall()

conn.close()

11: Next Steps

Next up in this course is a guided project where we walk through how to use the SQLite shell. The SQLite shell is similar to the IPython shell where you can write and run commands interactively.

转载于:https://my.oschina.net/Bettyty/blog/747364