Importing Modules

This section covers importing modules in Python: built-in modules, packages installed from the global repository (PyPI) with pip, and locally made libraries.

Understanding Pip and Importing in Python

What is `pip`?

pip is the package installer for Python. It allows you to install and manage additional libraries and dependencies that are not included in the Python standard library. To use pip, you typically run commands in the terminal.
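
pip can also be started through the interpreter that is currently running, which makes it clear which Python installation the packages go into. The following is a minimal sketch using only the standard library; it assumes pip is available for that interpreter.

```python
import subprocess
import sys

# sys.executable is the path of the running interpreter; "-m pip" runs its pip
result = subprocess.run(
    [sys.executable, "-m", "pip", "--version"],
    capture_output=True,
    text=True,
)

# On success pip prints its version and the site-packages location it manages
print(result.stdout.strip() or result.stderr.strip())
```

If several Python versions are installed, comparing this output with a plain `pip --version` in the terminal shows whether both refer to the same installation.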

Installing Packages with `pip`

To install a Python package using pip, you can use the following command in your terminal or command prompt:

Terminal

pip install <package_name>

For example, to install the requests library, you would run:

Terminal

pip install requests

To install a specific version of a package, use == followed by the version number, like so:

Terminal

pip install pandas==1.3.5
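
Once a package is installed, its version can also be read from inside Python. This is a minimal sketch, assuming Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name installed_version is only illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report on a few packages; which ones are installed depends on your environment
for name in ("requests", "pandas"):
    found = installed_version(name)
    print(name, "->", found if found else "not installed")
```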

How Importing Works in Python

When you use import in Python, the interpreter first checks whether the module is already loaded or is built into the interpreter itself. Otherwise it searches the locations listed in sys.path, in the following order:

Script Directory: It first checks the directory that contains the script being run (in an interactive session, the current working directory).
Standard Library: Next, it checks the directories that hold Python's standard library.
Installed Packages: Finally, it looks in the site-packages directory, where packages installed by pip are stored.
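
The search locations can be inspected from Python itself. The sketch below prints sys.path and reports where two modules would be loaded from; the helper name module_origin is only illustrative, and the exact paths depend on your installation.

```python
import importlib.util
import sys

# sys.path is the ordered list of locations the interpreter searches
for position, location in enumerate(sys.path):
    print(position, repr(location))


def module_origin(module_name):
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


print("sys  ->", module_origin("sys"))   # compiled into the interpreter
print("json ->", module_origin("json"))  # a file inside the standard library
```

Because the script's own directory comes early in sys.path, a local file named like a standard library module (for example random.py) would normally be found first and hide the real one.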

Types of Imports

Standard Imports: Import the entire module. For example:
Code
```
import math
```
Selective Imports: Import specific functions or classes from a module. For example:
Code
```
from math import sqrt, pi
```
Aliased Imports: Create a shorthand name for a module or function. For example:
Code
```
import pandas as pd
```
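
The three styles can be compared side by side. This short sketch uses only standard library modules (statistics stands in for a third-party package such as pandas), so it runs without installing anything.

```python
import math                   # standard import: names are reached through the module
from math import pi, sqrt     # selective import: names are used directly
import statistics as stats    # aliased import: the module gets a shorter name

print(math.floor(2.7))        # 2
print(sqrt(16))               # 4.0
print(round(pi, 2))           # 3.14
print(stats.mean([24, 27, 22, 32]))  # 26.25
```

Selective imports keep call sites short, while standard and aliased imports make it obvious which module a name comes from.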

Exercise 1: Built-in Modules

Instructions:

Open a new Python file and name it exercise1.py.
Import the math module.
Use the math.sqrt() function to find the square root of 16.
Use math.pi to print the value of Pi.
Save and run the script.

import math

# Finding square root
print("Square root of 16:", math.sqrt(16))

# Printing the value of Pi
print("Value of Pi:", math.pi)

Exercise 2: Using Aliases with Modules

Instructions:

Open a new Python file named exercise2.py.
Import the datetime module as dt.
Use dt.datetime.now() to print the current date and time.
Save and run the script.

import datetime as dt

# Printing current date and time
print("Current Date and Time:", dt.datetime.now())

Exercise 3: Installing and Using External Modules

Instructions:

Open a new Python file and name it exercise3.py.
Use pip to install the requests module by running the following command in your terminal or command prompt:
Terminal
```
pip install requests
```
In exercise3.py, import the requests module and use it to get data from a URL (for example, https://api.github.com). Print the status code of the response.
Save and run the script.

# Make sure you've run `pip install requests` in your terminal before running this code
import requests

response = requests.get("https://api.github.com")
print("Status Code:", response.status_code)
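
Network calls and missing packages are the two usual failure points in this exercise. The following is a hedged sketch of a more defensive variant; the helper name fetch_status and the 5 second timeout are assumptions made for illustration, not part of the exercise.

```python
try:
    import requests
except ImportError:  # raised when `pip install requests` has not been run
    requests = None


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request could not be made."""
    if requests is None:
        print("requests is not installed; run `pip install requests` first")
        return None
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException as error:
        # Covers connection errors, timeouts, and malformed URLs
        print("Request failed:", error)
        return None
    return response.status_code


print("Status Code:", fetch_status("https://api.github.com"))
```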

Exercise 4: Selective Imports and Installing Specific Versions

Instructions:

Install pandas and matplotlib libraries with specific versions.
- In your terminal, run the following commands to install specific versions:
  Terminal
```
pip install pandas==1.3.5
pip install matplotlib==3.5.1
```
  Did you get an error?
  - Always check that the requested version exists and supports your Python version. Older releases such as these may fail to install on a recent Python interpreter.
  - Check https://pypi.org/
Open a new Python file named exercise4.py.
From pandas, import only DataFrame and Series as df and ser.
Print the installed versions of pandas and matplotlib to verify them.
Save and run the script.

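from importlib.metadata import version  # standard library, Python 3.8+

from pandas import DataFrame as df, Series as ser
import matplotlib

# Verifying and printing versions
# (df.__module__ only yields the package name "pandas", so read the installed metadata instead)
print("Pandas version:", version("pandas"))
print("Matplotlib version:", matplotlib.__version__)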

Exercise 5: Basic Data Analysis with Pandas

Instructions:

Create a new Python file and name it exercise5.py.
Import pandas as pd.

Create a small DataFrame using the following data:

Code

data = {
    "Name": ["Aisha", "Luis", "Chen", "Amara"],
    "Age": [24, 27, 22, 32],
    "City": ["New York", "San Francisco", "Beijing", "Nairobi"]
}

Print the DataFrame and display summary statistics using .describe().
Save and run the script.

import pandas as pd

data = {
    "Name": ["Aisha", "Luis", "Chen", "Amara"],
    "Age": [24, 27, 22, 32],
    "City": ["New York", "San Francisco", "Beijing", "Nairobi"]
}

df = pd.DataFrame(data)
print("DataFrame:")
print(df)

print("\nSummary Statistics:")
print(df.describe())

Exercise 6: Data Visualization with Matplotlib

Instructions:

Create a new Python file named exercise6.py.
Import matplotlib.pyplot as plt.
Use the following data to create a bar chart showing the ages of individuals:
Code
```
names = ["Aisha", "Luis", "Chen", "Amara"]
ages = [24, 27, 22, 32]
```
Label the axes as "Names" and "Ages" and add a title "Ages of Individuals".
Display the plot using plt.show().
Save and run the script.

import matplotlib.pyplot as plt

names = ["Aisha", "Luis", "Chen", "Amara"]
ages = [24, 27, 22, 32]

plt.bar(names, ages)
plt.xlabel("Names")
plt.ylabel("Ages")
plt.title("Ages of Individuals")
plt.show()

Using a `requirements.txt` File

A requirements.txt file lists all the packages your project depends on, along with their versions. This file is helpful for sharing your project and ensuring others install the exact package versions you used.

To create a requirements.txt file with the installed packages and their versions, run:

Terminal

pip freeze > requirements.txt

Here is how the command works:

pip freeze:
- This part of the command outputs a list of installed packages in your Python environment, with each package version pinned to the specific version currently installed.
- For example, the output might look like this:
  Output
```
pandas==1.3.5
requests==2.26.0
matplotlib==3.5.1
```
- The == notation ensures that the exact versions are captured, which is crucial for creating a reproducible environment.
> (Redirect Operator):
- The > symbol redirects the output of pip freeze to a file instead of displaying it in the terminal.
- Here, we use > to create or overwrite a file called requirements.txt with the output from pip freeze.
Creating requirements.txt:
- The command pip freeze > requirements.txt will save all installed packages and their versions into requirements.txt.
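
To see where such a list comes from, the installed packages can be enumerated from Python. This is only a rough sketch of the idea, assuming Python 3.8 or newer; the real pip freeze applies extra rules (for example, it leaves out pip itself by default), so the two outputs may differ. The function name freeze_lines is illustrative.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Build 'name==version' lines for every distribution visible to this interpreter."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with damaged metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in freeze_lines():
    print(line)
```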

Why Use `requirements.txt`?

Reproducibility: Using a requirements.txt file allows others to install the same package versions, ensuring the code runs the same way.
Easy Setup: To install all packages listed in a requirements.txt file, simply use:
Terminal
```
pip install -r requirements.txt
```

Exercise 7: Creating a `requirements.txt` File

Instructions:

Install the requests and pandas packages if you haven't already, using:
Terminal
```
pip install requests pandas==1.3.5
```
In your terminal, create a requirements.txt file with the following command:
Terminal
```
pip freeze > requirements.txt
```
Open the requirements.txt file to verify that requests and pandas==1.3.5 are listed.

Explanation

The requirements.txt file should now include pandas==1.3.5 and a pinned entry for requests, along with the packages they depend on. This file allows others to replicate your environment with the same package versions.

Exercise 8: Installing from a `requirements.txt` File

Instructions:

To simulate using the requirements.txt file in a new environment, you can uninstall an installed package, like so:
Terminal
```
pip uninstall requests
```
Use the requirements.txt file to reinstall the necessary packages by running:
Terminal
```
pip install -r requirements.txt
```
Verify the installation by running a script that imports requests and pandas, ensuring they are correctly installed.
```
import requests
import pandas as pd

print("Requests and Pandas installed and working!")
```
This setup ensures that anyone using your code can quickly set up a matching environment by using pip install -r requirements.txt.
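
Beyond checking that the imports succeed, the installed versions can be compared with the pins in the file. The following is a minimal sketch under stated assumptions: it only understands exact name==version lines, compares versions as plain strings, and the function name check_pins is illustrative. The sample text reuses the example pins shown earlier rather than reading a real file.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(requirements_text):
    """Compare 'name==version' lines against what is installed.

    Returns a dict mapping each package name to 'ok', 'missing',
    or the installed version when it differs from the pin.
    """
    report = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch only understands exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
        else:
            report[name] = "ok" if installed == pinned else installed
    return report


sample = "pandas==1.3.5\nrequests==2.26.0\n"
print(check_pins(sample))
```

To check a real file, its contents could be passed in with pathlib.Path("requirements.txt").read_text().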

Exercise 9: Creating a Data Science Program Using a Provided Dataset with a Requirements File

The following dataset includes 700 entries and 11 columns, covering various user behavior metrics such as "App Usage Time," "Screen On Time," "Battery Drain," "Data Usage," and demographic data ("Age" and "Gender").

Info

Download the dataset here:
- user_behavior_dataset.csv
Documentation for each package used:
- pandas: https://pandas.pydata.org/docs/
- numpy: https://numpy.org/doc/
- matplotlib: https://matplotlib.org/stable/index.html
- seaborn: https://seaborn.pydata.org/

We are going to load this dataset and analyze key metrics like "App Usage Time (min/day)" and "Screen On Time (hours/day)".
Group the data by "Operating System" and "User Behavior Class" to perform a comparative analysis.
Visualize the distribution of "App Usage Time" by "Battery Drain" or "User Behavior Class" using histograms or box plots.

In this exercise, you’ll work with a provided dataset and specify dependencies using a requirements.txt file for easy environment setup.

Instructions:

Create a requirements.txt file: Open your terminal in the project directory and use the following commands to pin the versions and create requirements.txt:

Terminal

echo "pandas==1.3.5" > requirements.txt
echo "numpy==1.21.6" >> requirements.txt
echo "matplotlib==3.5.1" >> requirements.txt
echo "seaborn==0.11.2" >> requirements.txt

Install from requirements.txt: Once created, install dependencies directly from requirements.txt by running:
Terminal
```
pip install -r requirements.txt
```
Write the analysis program: In a Python file (exercise9.py), write a program that:
- Loads the dataset user_behavior_dataset.csv.
- Displays a summary of the dataset and calculates statistics on "App Usage Time (min/day)" and "Screen On Time (hours/day)".
- Compares "App Usage Time (min/day)" across "Gender" and visualizes the distribution using a box plot.

# Import necessary libraries for data analysis and visualization
import pandas as pd  # For data manipulation
import numpy as np  # For numerical operations
import matplotlib.pyplot as plt  # For plotting
import seaborn as sns  # For enhanced data visualization
from matplotlib.offsetbox import OffsetImage, AnnotationBbox  # For annotations on plots (if needed)

# Load the dataset
data = pd.read_csv('./user_behavior_dataset.csv')

# Display the first few rows of the dataset
print(data.head())

# Display the shape of the dataset (rows, columns)
print("Dataset shape:", data.shape)

# Show information about the dataset, including column types and non-null values
data.info()

# Display descriptive statistics for numeric columns
print(data.describe())

# Check for missing values in the dataset
print("Missing values per column:\n", data.isnull().sum())

# Plot 1: Distribution of App Usage Time
plt.figure(figsize=(10, 6))
sns.histplot(data['App Usage Time (min/day)'], kde=True, color="blue")
plt.title('Distribution of App Usage Time (min/day)')
plt.xlabel('App Usage Time (min/day)')
plt.ylabel('Frequency')

# Plot 2: Battery Drain vs Data Usage by User Behavior Class
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Battery Drain (mAh/day)', y='Data Usage (MB/day)', hue='User Behavior Class', data=data, palette='plasma')
plt.title('Battery Drain vs Data Usage by User Behavior Class')
plt.xlabel('Battery Drain (mAh/day)')
plt.ylabel('Data Usage (MB/day)')
plt.legend(title='User Behavior Class')

# Plot 3: Distribution of Battery Drain for Android vs iOS
plt.figure(figsize=(10, 6))
sns.kdeplot(data=data, x='Battery Drain (mAh/day)', hue='Operating System', palette='coolwarm', fill=True)
plt.title('Distribution of Battery Drain for Android vs iOS')
plt.xlabel('Battery Drain (mAh/day)')
plt.ylabel('Density')

# Plot 4: Number of Apps Installed by User Behavior Class
plt.figure(figsize=(10, 6))
sns.boxplot(x='User Behavior Class', y='Number of Apps Installed', data=data, palette='Set2')
plt.title('Number of Apps Installed by User Behavior Class')
plt.xlabel('User Behavior Class')
plt.ylabel('Number of Apps Installed')

# Plot 5: Pairplot of selected behavioral metrics
sns.pairplot(data[['App Usage Time (min/day)', 'Screen On Time (hours/day)', 'Battery Drain (mAh/day)', 'Data Usage (MB/day)', 'User Behavior Class']], hue='User Behavior Class', palette='husl')

# Display all plots
plt.show()

Programming for Engineers