Python's syntax is simple and close to natural language. Even novice programmers can get started quickly, lowering the learning threshold.
Python provides a wide range of standard function libraries, covering many aspects such as networking, data processing, and graphical interfaces. In addition, the powerful third-party ecosystem such as NumPy, Pandas, and TensorFlow makes Python a versatile development tool.
Python is a cross-platform language. Whether it is Windows, macOS or Linux, the same Python program can be executed, which greatly improves the flexibility of development.
Python plays an important role in multiple fields such as data science, artificial intelligence, web development, automation scripting, game development, etc., allowing developers to handle multiple needs with one language.
Python has a large global community, and both beginners and experienced developers can easily find educational resources, discussion groups, and technical support.
Python's intuitive syntax and powerful tools allow developers to turn ideas into working programs faster, shortening the product development cycle.
Anaconda is an open source Python and R programming platform designed for scientific computing, including applications such as data science, machine learning, artificial intelligence, and big data analytics.
Anaconda is suitable for users in the following fields:
The following are problems commonly encountered by users:
To update, enter `conda update conda` and `conda update anaconda` in the terminal.
Anaconda is a data science platform that integrates Python, a programming language, with a variety of built-in tools and libraries.
Anaconda provides a virtual environment (Environment) function that allows users to create multiple independent Python execution environments on the same computer. Each environment can have different Python versions and packages to avoid dependency conflicts between different projects.
# Create an environment named myenv and specify the Python version
conda create -n myenv python=3.10
# Start environment
conda activate myenv
# Exit the environment
conda deactivate
# List all environments
conda env list
# or
conda info --envs
# Export environment configuration to YAML file
conda env export > environment.yml
# Create environment from YAML file
conda env create -f environment.yml
# Delete the specified environment
conda remove -n myenv --all
Jupyter is an open source interactive computing environment that supports multiple programming languages and is mainly used for data science, machine learning, and academic research.
Jupyter is widely used in the following fields:
Run `jupyter notebook` in the terminal to start Jupyter Notebook. To set up Visual Studio Code, go to the Visual Studio Code official website and download and install the version appropriate for your operating system.
In Visual Studio Code, install the Python extension by following these steps:
Make sure Python is installed on your system; it can be downloaded and installed from the Python official website.
After the installation is complete, enter the following command in the command line to confirm that the installation is successful:
python --version
# or
python3 --version
Open your Python project or file, click the "Python" status bar in the lower right corner of Visual Studio Code, and select the appropriate Python interpreter.
Open the Python file in the editor and execute the program using the following method:
Press `Ctrl + Shift + P`, search for "Run Python File", and execute it. If you need to install a third-party package, enter the following in the built-in terminal:
pip install <package-name>
Enjoy auto-completion and powerful debugging tools through the functionality provided by Python extensions:
Here are some commonly used shortcut keys:
- `Ctrl + F5`: run the program without debugging.
- `Ctrl + Shift + P`: open the Command Palette.
- `Shift + Alt + F`: format the document.
- `` Ctrl + ` ``: toggle the integrated terminal.

If you need to pass parameters when executing a Python program, you can set this up in `launch.json`:
Modify the relevant settings in the `launch.json` file. The following example configuration includes the program path and execution parameters:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Run with Arguments",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/main.py", // program path
      "console": "integratedTerminal", // terminal type
      "args": ["arg1", "arg2", "--option", "value"] // passed arguments
    }
  ]
}
Command-line parameters can be passed in `args`, for example: `arg1` and `arg2` are positional parameters, while `--option` and `value` form a named option. Use `sys.argv` to read the parameters passed from the command line:
import sys
print("All parameters:", sys.argv)
if len(sys.argv) > 2:
    print("First parameter:", sys.argv[1])
    print("Second parameter:", sys.argv[2])
Suppose the program is invoked as:
python main.py arg1 arg2 --option value
Execution result:
All parameters: ['main.py', 'arg1', 'arg2', '--option', 'value']
First parameter: arg1
Second parameter: arg2
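Beyond reading `sys.argv` directly, the standard library's `argparse` module handles the same positional and `--option` arguments more robustly. A minimal sketch (the argument names here are illustrative, mirroring the example above):

```python
import argparse

# Build a parser matching the example invocation:
#   python main.py arg1 arg2 --option value
parser = argparse.ArgumentParser()
parser.add_argument("first")     # first positional parameter
parser.add_argument("second")    # second positional parameter
parser.add_argument("--option")  # optional named parameter

# Parse an explicit list here; in a real script, parse_args() reads sys.argv
args = parser.parse_args(["arg1", "arg2", "--option", "value"])
print("First parameter:", args.first)
print("Second parameter:", args.second)
print("--option value:", args.option)
```

Unlike raw `sys.argv`, `argparse` also generates a `--help` message and reports missing or unknown arguments automatically.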
1. Install the Python extension.
2. Start your Python project in VS Code.
3. Press `F5` or click the Debug icon in the activity bar on the left.
1. Click "Add Configuration" in the Debug panel.
2. Select Python; a `launch.json` will be generated automatically.
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal"
    }
  ]
}
1. Click next to a code line number to add a breakpoint.
2. Conditional breakpoints are supported: right-click the breakpoint and select "Edit Condition".
- `F10`: step over a function.
- `F11`: step into a function.
- `Shift + F11`: step out of a function.

1. View the current variable state in the "Variables" area of the Debug panel.
2. You can manually add specific expressions in the "Watch" area.
1. Enter Python commands in the Debug Console to check the program status in real time.
2. You can query variables, call functions, and perform other operations.
Make sure Python is installed and added to your system environment variables, then download and install Visual Studio Code.
Open Visual Studio Code, click the Extensions icon on the left, search for Python, and then install the Python extension provided by Microsoft.
Enter the following command in the terminal to confirm the installation path of Python:
which python
Or (Windows system):
where python
In Visual Studio Code, press `Ctrl + Shift + P`, then enter and select `Python: Select Interpreter`.
Select the correct Python path in the list. If it doesn't appear, enter the full path manually.
Open the terminal and execute `python --version` to confirm that the correct Python interpreter is selected.
If a specific project needs its own Python path, add a `.vscode/settings.json` file in the project root directory with the following content:
{
"python.pythonPath": "Your full path to Python"
}
Replace `Your full path to Python` with the actual path.
When developing in VS Code, whether you use Anaconda or another environment is decided mainly by the interpreter (Interpreter) setting. This keeps each project's package dependencies isolated in its own execution environment.
| Environment type | Use cases | Main advantages |
|---|---|---|
| Anaconda | Data Science, Machine Learning, Deep Learning | It is pre-installed with a large number of scientific computing libraries and has strong management of underlying binary files (such as DLL). |
| Venv | General web development, automation scripts | Lightweight, fast to start, and contains only the necessary packages for execution. |
In addition to the graphical interface, you can also use commands in the built-in terminal of VS Code to manage the environment:
- `conda env list`: list environments.
- `conda activate <environment-name>`: switch to an environment.
- `conda create -n <name> python=3.10`: create a new environment.

If PowerShell on Windows blocks the activation script, you may need to run `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser`.
sudo apt update
sudo apt install python3 python3-pip -y
Run `python3` to enter the Python interactive environment, and install modules with `pip3 install <module-name>`.
sudo apt install wget gpg -y
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > packages.microsoft.gpg
sudo install -o root -g root -m 644 packages.microsoft.gpg /usr/share/keyrings/
sudo sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/packages.microsoft.gpg] \
https://packages.microsoft.com/repos/code stable main" > /etc/apt/sources.list.d/vscode.list'
sudo apt update
sudo apt install code -y
pip --version
pip install <package-name>
For example: `pip install requests`
pip install --upgrade <package-name>
For example: `pip install --upgrade requests`
pip list
pip uninstall <package-name>
For example: `pip uninstall requests`
pip list --outdated
pip install <package-name>==<version>
For example: `pip install requests==2.26.0`
pip install -r requirements.txt
Use a `requirements.txt` file to list all required package names and versions.
pip freeze > requirements.txt
This command exports the packages and versions in the current environment to a requirements file.
pip cache purge
Clean up pip's cache folder to free up disk space.
Solution: Check whether Python has been installed, and confirm that the Python installation directory has been added to the system's PATH environment variable.
Solution: try adding the `--user` flag, for example: `pip install <package-name> --user`
Solution: confirm that the network is working, or try a mirror source, for example: `pip install <package-name> -i https://pypi.tuna.tsinghua.edu.cn/simple`
Use a virtual environment (`venv` or `virtualenv`) to avoid conflicts with globally installed packages, and use `requirements.txt` to manage dependencies.

pip caches downloaded packages in its `http-v2` directory so that future installations of the same package do not need to download it again. Use `pip cache dir` to view the current cache directory, and `pip cache purge` to clear all cached data. You can also delete the `http-v2` folder manually, but pip will then re-download the files the next time it installs the relevant packages. To bypass the cache for a single install, add the `--no-cache-dir` flag, for example: `pip install <package-name> --no-cache-dir`.
The `type()` function returns the type of an object.
x = 10
print(type(x)) # <class 'int'>
y = "hello"
print(type(y)) # <class 'str'>
`isinstance()` checks whether a variable belongs to a given type, and supports checking against multiple types at once.
x = 10
print(isinstance(x, int)) # True
print(isinstance(x, str)) # False
print(isinstance(x, (int, float))) # True
`type()` is strict and returns only the exact type, while `isinstance()` also considers inheritance relationships and is more commonly used in practice.

A Boolean in Python is a basic data type that represents two logical states: True or False. It is the basis for all conditional checks and flow control.

Booleans have exactly two values, `True` and `False`, and their type is `bool`.

a = True
b = False
print(type(a)) # Output: <class 'bool'>
Internally, `bool` is a subclass of the integer type `int`, so Booleans can participate in numeric operations: `True` is numerically equivalent to 1, and `False` to 0.

print(True + 1) # Output: 2 (1 + 1)
print(False * 5) # Output: 0 (0 * 5)
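Because `True` and `False` behave as 1 and 0, summing Boolean values counts how many conditions hold. A small sketch:

```python
scores = [88, 45, 92, 60, 73]
# Each comparison yields True (1) or False (0), so sum() counts the passes
passed = sum(score >= 60 for score in scores)
print(passed)  # Output: 4
```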
Comparison operators compare two values and return a Boolean result.
| Operator | Description | Example | Result |
|---|---|---|---|
| `==` | equal to | `10 == 10` | `True` |
| `!=` | not equal to | `5 != 10` | `True` |
| `>` | greater than | `10 > 5` | `True` |
| `<=` | less than or equal to | `5 <= 5` | `True` |
Logical operators are used to combine or modify Boolean values or Boolean expressions.
- `and`: logical AND; the result is `True` only if both sides are `True`.
- `or`: logical OR; the result is `True` if either side is `True`.
- `not`: logical NOT; inverts the Boolean value.

print(True and False) # Output: False
print(True or False) # Output: True
print(not True) # Output: False
In Python, every object has a truth value. When an object is used in a conditional, Python converts it to `True` or `False`:
- Falsy objects include: `None`, the number zero (`0`, `0.0`), the empty string (`""`), the empty list (`[]`), the empty tuple (`()`), the empty dictionary (`{}`), and the empty set (`set()`).
- Truthy objects include everything else (such as non-zero numbers and any non-empty container).

if []:  # Empty lists are treated as False
    print("This will not be executed")
else:
    print("The list is empty")

if "hello":  # Non-empty strings are considered True
    print("String is not empty")
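You can also inspect any object's truth value directly with the built-in `bool()` function:

```python
print(bool([]))        # Output: False (empty list)
print(bool("hello"))   # Output: True (non-empty string)
print(bool(0.0))       # Output: False (zero)
print(bool({"a": 1}))  # Output: True (non-empty dict)
```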
In Python, an array is a data structure used to store multiple elements of the same type. Although Python itself does not have a built-in array type, you can use `list` or the `array` module to achieve similar functionality.
`list` is Python's built-in data structure; it can store elements of many types and can also be used to simulate arrays.
my_list = [1, 2, 3, 4, 5]
print(my_list[0]) # Output: 1
If you need a true array (all elements must be of the same type), you can use the `array` module.
import array
my_array = array.array('i', [1, 2, 3, 4, 5])
print(my_array[0]) # Output: 1
Here, `'i'` indicates that the elements in the array are integers.
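Because the typecode constrains element types, appending a value of the wrong type raises a `TypeError`, unlike a plain list. A quick check:

```python
import array

my_array = array.array('i', [1, 2, 3])
try:
    my_array.append(3.5)  # floats are rejected by an 'i' (integer) array
except TypeError as e:
    print("Rejected:", e)
```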
Here are some basic operations:
- `my_array.append(6)`: append an element.
- `my_array.remove(3)`: remove an element by value.
- `len(my_array)`: get the length.

For situations that require numerical operations, `numpy` provides more powerful array support.
import numpy as np
my_numpy_array = np.array([1, 2, 3, 4, 5])
print(my_numpy_array[0]) # Output: 1
NumPy arrays support multidimensional data and vectorized operations, making them ideal for processing large amounts of data.
Python provides several ways to implement array functionality: `list` is suitable for general use, the `array` module fits cases that require elements of the same type, and `numpy` is the tool of choice for scientific computing.
For a variable such as `ret_value = []`, Python offers several simple and efficient ways to determine whether it is an empty list. Since an empty list is treated as `False`, the following methods are commonly used.
This is the most popular method: check the Boolean value of the list directly. If the list is empty, the `not` operator makes the condition evaluate to `True`.
ret_value = []
if not ret_value:
    # Execute here when the list is empty
    print("ret_value is an empty list")
else:
    # Execute here when the list is not empty
    print("ret_value is not an empty list")
The `len()` function checks whether the length of the list equals 0.
ret_value = []
if len(ret_value) == 0:
    print("ret_value is an empty list")
else:
    print("ret_value is not an empty list")
Compare the variable directly with an empty list literal `[]` using `== []`.
ret_value = []
if ret_value == []:
    print("ret_value is an empty list")
else:
    print("ret_value is not an empty list")
In Python, `list` is a dynamic data structure that makes it easy to add and remove elements.
New elements can be added using the following methods:
- `append()`: adds an element to the end of the list.
- `insert()`: inserts an element at the specified index.
- `extend()`: appends the elements of another list to an existing list.

# Example of adding elements
my_list = [1, 2, 3]
my_list.append(4) # [1, 2, 3, 4]
my_list.insert(1, 10) # [1, 10, 2, 3, 4]
my_list.extend([5, 6]) # [1, 10, 2, 3, 4, 5, 6]
Elements can be removed using:
- `pop()`: removes an element by index; by default removes the last element.
- `remove()`: removes the first matching element by value.
- `clear()`: removes all elements from the list.

# Example of removing elements
my_list = [1, 2, 3, 4, 5]
my_list.pop() # [1, 2, 3, 4]
my_list.remove(2) # [1, 3, 4]
my_list.clear() # []
For cases where all elements must be of the same type, you can use the `array` module.
The `append()` and `extend()` methods also apply to the `array` module.
import array
my_array = array.array('i', [1, 2, 3])
my_array.append(4) # [1, 2, 3, 4]
my_array.extend([5, 6]) # [1, 2, 3, 4, 5, 6]
The `remove()` and `pop()` methods can also be used with the `array` module.
# Remove element example
my_array = array.array('i', [1, 2, 3, 4])
my_array.remove(2) # [1, 3, 4]
my_array.pop() # [1, 3]
Python provides a variety of methods to grow and shrink arrays dynamically; `list` and the `array` module suit different needs. For more functionality, you can also consider `numpy`.
In Python, `list.count()` is a method dedicated to counting the number of occurrences of a specific element. Unlike `len()`, which returns the length of the list, it requires an argument to compare against.
The syntax is `list.count(value)`; it returns the number of times the value appears in the list as an integer, or 0 if the value does not exist.
fruits = ['apple', 'banana', 'apple', 'orange', 'apple']
# Count the number of occurrences of 'apple'
apple_count = fruits.count('apple')
print(apple_count) # Output: 3
# Count non-existent elements
grape_count = fruits.count('grape')
print(grape_count) # Output: 0
Depending on your needs (whether you are looking for a single element, multiple elements, or all statistics), there are different best practices:
| Need | Recommended method | Sample code |
|---|---|---|
| Count a single specific element | `list.count()` | `arr.count(10)` |
| Get the total length of the list | `len()` | `len(arr)` |
| Count the frequency of all elements | `collections.Counter` | `Counter(arr)` |
| Conditional count (e.g. greater than 5) | generator expression | `sum(1 for x in arr if x > 5)` |
If you need to know how many times every element appears in the list at once, using `Counter` is much more efficient than calling `count()` repeatedly (O(n) vs O(n^2)).
from collections import Counter
data = [1, 2, 2, 3, 3, 3, 4]
counts = Counter(data)
print(counts) # Output: Counter({3: 3, 2: 2, 1: 1, 4: 1})
print(counts[3]) # Get the number of times 3: 3
print(counts.most_common(1)) # Get the element with the most occurrences
- `if x in list` is much faster than `if list.count(x) > 0`, because `in` stops as soon as it finds the first match.
- `count()` performs only a shallow comparison: if the list contains another list, it will not count inside the sublist.
- Calling `list.count()` repeatedly causes a sharp drop in performance, so prefer `Counter` in that case.

If you want to count elements that satisfy some condition (for example, strings longer than 3 characters), you can combine a generator expression with `sum()`:
words = ['hi', 'hello', 'python', 'a', 'code']
# Count the number of words with length > 3
big_word_count = sum(1 for w in words if len(w) > 3)
print(big_word_count) # Output: 3
Suppose you have two lists whose elements are `[key, value]` pairs:

list1 = [
["a", 1],
["b", 2],
["c", 3]
]
list2 = [
["c", 30],
["a", 10],
["d", 40]
]
# Convert the lists to dicts
dict1 = {k: v for k, v in list1}
dict2 = {k: v for k, v in list2}
# Find all keys
all_keys = sorted(set(dict1.keys()) | set(dict2.keys()))
# Merge results
merged = []
for k in all_keys:
    v1 = dict1.get(k)
    v2 = dict2.get(k)
    merged.append([k, v1, v2])

for row in merged:
    print(row)
['a', 1, 10]
['b', 2, None]
['c', 3, 30]
['d', None, 40]
Keys missing from one list are filled with `None`, and each merged row has the form `[key, value1, value2]`.

In Python, if you have two lists (e.g. `selected_fields_list` and `aSqlValuesStr_list`) and need to iterate over them simultaneously, the most common and Pythonic way is the built-in `zip()` function.
The `zip()` function packs multiple iterable objects (Iterable) into a sequence of tuples; each loop iteration takes the element at the corresponding position from each list.
import re
# Assume these are the two lists you get after re.split()
selected_fields = "name,age,city"
aSqlValuesStr = "Alice,25,Taipei"
selected_fields_list = re.split(r',', selected_fields)
aSqlValuesStr_list = re.split(r',', aSqlValuesStr)
# Execute for loop
for field, value in zip(selected_fields_list, aSqlValuesStr_list):
    print(f"Field: {field}, value: {value}")
Field: name, value: Alice
Field: age, value: 25
Field: city, value: Taipei
When you use the `zip()` function, keep the following points in mind:
- `zip()` stops iterating at the end of the shortest list. If `selected_fields_list` has 3 elements and `aSqlValuesStr_list` has 4, the loop executes only 3 times.
- To pad to the longest list instead, use `itertools.zip_longest`.

# Example: lists of different lengths
list1 = [1, 2, 3]
list2 = ['a', 'b']
for item1, item2 in zip(list1, list2):
    print(item1, item2)
# The output will only be:
# 1 a
# 2 b
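As noted above, `itertools.zip_longest` pads the shorter list with a fill value instead of stopping early:

```python
from itertools import zip_longest

list1 = [1, 2, 3]
list2 = ['a', 'b']

# Missing positions are filled with fillvalue instead of being dropped
pairs = list(zip_longest(list1, list2, fillvalue=None))
print(pairs)  # Output: [(1, 'a'), (2, 'b'), (3, None)]
```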
To print a two-dimensional list with aligned columns, compute each column's maximum width and pad with `rjust()`:

data = [
[1, 23, 456],
[7890, 12, 3],
[45, 678, 9]
]
# Calculate the maximum width of each column
col_widths = [max(len(str(row[i])) for row in data) for i in range(len(data[0]))]
# Format output
for row in data:
    print(" ".join(str(val).rjust(col_widths[i]) for i, val in enumerate(row)))
   1  23 456
7890  12   3
  45 678   9
The third-party package `tabulate` can output aligned tables directly.
from tabulate import tabulate
data = [
[1, 23, 456],
[7890, 12, 3],
[45, 678, 9]
]
print(tabulate(data, tablefmt="grid"))
+------+-----+-----+
|    1 |  23 | 456 |
+------+-----+-----+
| 7890 |  12 |   3 |
+------+-----+-----+
|   45 | 678 |   9 |
+------+-----+-----+
Use `rjust()` for alignment, or `tabulate` to quickly generate neat tables.

The following example demonstrates how to print a two-dimensional list where `float` elements are always displayed with two decimal places and other elements are output as-is, while keeping the fields aligned.
data = [
[1, 23.456, 456],
[7890.1, 12, 3.5],
[45, 678.9, 9]
]
# Format each element string
formatted_data = []
for row in data:
    new_row = []
    for val in row:
        if isinstance(val, float):
            s = f"{val:.2f}"  # float -> two decimal places
        else:
            s = str(val)
        new_row.append(s)
    formatted_data.append(new_row)
# Calculate the maximum width of each column
col_widths = [max(len(row[i]) for row in formatted_data) for i in range(len(formatted_data[0]))]
# Format output
for row in formatted_data:
    print(" ".join(val.rjust(col_widths[i]) for i, val in enumerate(row)))
      1  23.46  456
7890.10     12 3.50
     45 678.90    9
`isinstance(val, float)` detects floating-point numbers, and `f"{val:.2f}"` formats a floating-point number to two decimal places.

The data structure provided here is a list of dictionaries, where each dictionary represents one time-series record (such as K-line data).
data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'},
# ...
]
If you want to collect all values from all records (`1759028400000`, `'109398.3'`, `'109364.8'`, etc.) into a single list, you can use a nested list comprehension.
Use a two-level list comprehension to iterate through each dictionary in the list, call the `.values()` method to get its values, and collect everything into one flat list.
data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'}
]
all_values = [
    value
    for record in data  # iterate over each dictionary record in the outer list
    for value in record.values()  # iterate over all values in the dictionary record
]
print(all_values)
This results in a single list containing all values:
[1759028400000, '109398.3', '109364.8', '109489.2', '109364.8', '518.7594', 1759024800000, '109305.6', '109398.3', '109496.4', '109296.0', '757.0290']
If you only need to extract a specific field (e.g. all `open` prices), a single-level list comprehension suffices:
data = [
{'time': 1759028400000, 'open': '109398.3', ...},
{'time': 1759024800000, 'open': '109305.6', ...}
]
open_prices = [record['open'] for record in data]
print(open_prices)
Output result:
['109398.3', '109305.6']
List comprehensions are a concise, one-line syntax for creating new lists from existing lists.
from datetime import datetime
data_list = [
{'videoid': 'b5HxsaM_E2Y', 'publishedat': datetime(2025, 12, 7, 3, 0, 53), 'rankno': 7, 'viewcount': 913053, 'query': 'baseball'},
{'videoid': 'FEbMCBxsoWI', 'publishedat': datetime(2025, 11, 25, 5, 28, 6), 'rankno': 13, 'viewcount': 754598, 'query': 'baseball'},
{'videoid': 'nOJUI0PGB68', 'publishedat': datetime(2025, 12, 7, 3, 7, 46), 'rankno': 14, 'viewcount': 748349, 'query': 'baseball'},
{'videoid': 'uMHXIudw_w0', 'publishedat': datetime(2025, 12, 2, 10, 1, 38), 'rankno': 8, 'viewcount': 687949, 'query': 'baseball'}
]
target_key = 'videoid'
# Use list comprehension: for each dictionary item in data_list, take out item[target_key]
video_ids = [item[target_key] for item in data_list]
print(f"Extracted key: {target_key}")
print("All videoid values:")
print(video_ids)
['b5HxsaM_E2Y', 'FEbMCBxsoWI', 'nOJUI0PGB68', 'uMHXIudw_w0']
If some dictionaries in the list may be missing the target key (e.g. some dictionaries don't have the `videoid` key), using `item[target_key]` directly will throw a `KeyError` error. You can use the dictionary's `.get()` method or a conditional to handle it safely.
data_with_missing_key = [
{'videoid': 'A1', 'query': 'football'},
{'query': 'basketball'}, # Missing 'videoid' key
{'videoid': 'C3', 'query': 'Volleyball'}
]
target_key = 'videoid'
# Option A: Use .get() to set the default value to None (or any other value)
safe_video_ids_A = [item.get(target_key) for item in data_with_missing_key]
# Output: ['A1', None, 'C3']
# Option B: Extract only values with that key
safe_video_ids_B = [item[target_key] for item in data_with_missing_key if target_key in item]
# Output: ['A1', 'C3']
print(f"\nSafe extraction results (Plan B): {safe_video_ids_B}")
The result you obtain from `onesymbollist.values()` is a Python dictionary view object, `dict_values`. Although it looks like a list, it is a dynamic view, not a standard list that can be indexed or modified.
dict_values([1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523'])
To get the contents without the `dict_values()` wrapper, the simplest and most common way is to cast the view to a list with the built-in `list()` function.
Passing the `dict_values` view to the `list()` function immediately copies all elements from the view object into a new standard list.
# Assume this is the dict_values view object you obtained
dict_values_object = your_dictionary.values() # Assume onesymbollist is a dictionary
# Step: Use list() function to convert
result_list = list(dict_values_object)
print(result_list)
[1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523']
If your goal is an immutable sequence, you can convert the view into a tuple with the `tuple()` function.
result_tuple = tuple(dict_values_object)
print(result_tuple)
(1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523')
Note that if you just want to process the values one by one in a loop, you don't need to convert to a list at all: `dict_values` is itself an iterable object (Iterable).
# Assume this is the dict_values view object you obtained
dict_values_object = your_dictionary.values()
print("Output elements one by one:")
for value in dict_values_object:
    print(value)
In summary, `list(your_dict.values())` is the most common way to get a clean, manipulable list of values.
To extract all values from a `dict_values` view object and combine them into a single string separated by commas and spaces (for example: `1763510400000, '0.00015218', ...`), combine the following steps:
1. Get the values from the `dict_values` view object.
2. Convert every value to a string.
3. Use the `", ".join()` method to concatenate the strings.

Assume your original dictionary is named `onesymbol_dict`; non-string values (such as the timestamp) must be converted to strings before `.join()` can concatenate them.
onesymbol_dict = {
'time': 1763510400000,
'open': '0.00015218',
'close': '0.00015336',
'high': '0.00015415',
'low': '0.00015067',
'volume': '1634523'
}
# Step 1 & 2: Get values and convert all values to strings
# Use list comprehensions to ensure all elements are strings
values_as_strings = [str(v) for v in onesymbol_dict.values()]
# Step 3: Use ', '.join() to join these strings
values_string = ", ".join(values_as_strings)
print(values_string)
This produces a single string containing only the values, without any outer brackets or `dict_values` label:
1763510400000, 0.00015218, 0.00015336, 0.00015415, 0.00015067, 1634523
Note that in the expected output, values other than the timestamp are wrapped in single quotes: `1763510400000, '0.00015218', ...`.
If your goal is to have string values retain single quotes in the final output, you will need to add quotes manually during concatenation. This is usually required when preparing strings for SQL statements or specific formats.
# Extra step: Handle quotes manually, assuming non-integer values require quotes
quoted_values = []
for v in onesymbol_dict.values():
    # Assume the small numeric strings need quotes
    if isinstance(v, (str, float)) or (isinstance(v, int) and v < 1000000000000):
        quoted_values.append(f"'{v}'")
    else:
        quoted_values.append(str(v))
final_quoted_string = ", ".join(quoted_values)
print(final_quoted_string)
This gives you the expected format (assuming `time` needs no quotes while the other numeric strings do):
1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523'
The `isdigit()` method can be used to check whether a string contains only numeric characters.
# Example
string = "12345"
if string.isdigit():
    print("is a number")
else:
    print("Not a number")
Note: `isdigit()` cannot handle decimal points or negative signs.
If you need to check a string with a decimal point, you can remove the decimal point before calling `isdigit()`.
# Example
string = "123.45"
if string.replace(".", "").isdigit():
    print("is a number")
else:
    print("Not a number")
This method does not work with negative numbers.
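One possible workaround (a sketch, not the only approach; the helper name `looks_numeric` is illustrative) is to strip a single leading minus sign and at most one decimal point before applying `isdigit()`:

```python
def looks_numeric(s):
    # Allow one optional leading minus sign
    if s.startswith("-"):
        s = s[1:]
    # Remove at most one decimal point, then check the rest
    return s.replace(".", "", 1).isdigit()

print(looks_numeric("-123.45"))  # Output: True
print(looks_numeric("abc"))      # Output: False
print(looks_numeric("1.2.3"))    # Output: False (two decimal points)
```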
The most common approach is to try to convert the string to a float or integer and catch an exception if the conversion fails.
# Example
string = "-123.45"
try:
    float(string)  # use int(string) instead to check for integers
    print("is a number")
except ValueError:
    print("Not a number")
Regular expressions can match numbers exactly, including integers, decimals, and negative numbers.
# Example
import re
string = "-123.45"
pattern = r"^-?\d+(\.\d+)?$"
if re.match(pattern, string):
    print("is a number")
else:
    print("Not a number")
For simple cases, use `isdigit()`. For more complex cases (such as decimals or negative numbers), it is recommended to use `try-except` or a regular expression.
f-string (formatted string literal) is a powerful and efficient string formatting method introduced in Python 3.6. It provides a concise and highly readable way to embed the values of variables and expressions into strings.
The core feature of an f-string is the `f` or `F` prefix at the beginning of the string, with braces `{}` containing the expressions to be evaluated and displayed.
You can put any variable name directly inside curly braces.
name = "Alice"
age = 30
message = f"Hello, my name is {name} and I am {age} years old."
# Output: Hello, my name is Alice and I am 30 years old.
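To include literal braces in an f-string, double them as `{{` and `}}`:

```python
name = "Alice"
# Doubled braces produce literal { }, single braces interpolate
text = f"{{name}} renders literally, {name} is interpolated."
print(text)
# Output: {name} renders literally, Alice is interpolated.
```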
The power of f-string is that you can put any valid Python expression inside the curly braces and it will be evaluated at runtime.
price = 19.99
tax_rate = 0.05
total = price * (1 + tax_rate)
# Perform calculations within f-string
result = f"The total price including tax is: {price * (1 + tax_rate):.2f} yuan."
# Output: The total price including tax is: 20.99 yuan.
# Call function
def get_status():
    return "OK"
status_msg = f"System status: {get_status()}"
# Output: System status: OK
f-strings support the same Format Specification Mini-Language as `.format()`, using a colon `:` to separate the expression from the format specifier.
| Format code | Use | Example | Output |
|---|---|---|---|
| `:.2f` | floating-point number, two decimal places | `f"{3.14159:.2f}"` | `3.14` |
| `:<10` | align left, width 10 | `f"{'Name':<10}"` | `Name` |
| `:>10` | align right, width 10 | `f"{'Value':>10}"` | `Value` |
| `:^10` | center aligned, width 10 | `f"{'Hi':^10}"` | `Hi` |
| `:,` | thousands separator | `f"{1000000:,}"` | `1,000,000` |
Starting with Python 3.8, f-strings include a convenient debugging feature: add an equals sign `=` after the variable to automatically display both the variable name and its value.
user_id = 42
is_active = True
debug_output = f"User ID is {user_id=}, Status: {is_active=}"
# Output: User ID is user_id=42, Status: is_active=True
A few caveats:
- If the f-string is delimited with single quotes `'...'`, use double quotes `"..."` inside it, and vice versa.
- To output literal braces, double them: `{{` and `}}`.
- Before Python 3.12, a backslash `\` is not allowed inside the expression part, so an f-string like `f"{'\n'}"` causes a syntax error.

Inside the braces `{}` of an f-string, you can call the `int()` function directly to convert a variable back to an integer. This is the clearest and most direct method.
a = 12  # assume the original integer
float_a = a * 2.0  # the result is 24.0 (a float)
# Convert floating point number back to integer directly in f-string
result_str = f"abc def {int(float_a)}"
print(f"Original value (float_a): {float_a}")
print(f"Formatted result: {result_str}")
You can use the integer format code `:d` or the float code `:.0f` from the formatting mini-language to control the output.
`:d` asks Python to display the value as an integer. Note that applying `:d` directly to a float raises a `ValueError`, so convert with `int()` first:
float_a = 24.0
result_d = f"abc def {int(float_a):d}"
# Output: abc def 24
`:.0f` formats the value as a floating-point number with 0 digits after the decimal point; the result is rounded to the nearest integer.
float_a = 24.0
result_0f = f"abc def {float_a:.0f}"
# Output: abc def 24
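Note that the two options can differ for non-integral values: `int()` truncates toward zero, while `:.0f` rounds to the nearest integer — a quick comparison:

```python
x = 24.7
print(f"int(): {int(x)}")   # int() truncates toward zero -> 24
print(f":.0f : {x:.0f}")    # :.0f rounds to nearest -> 25
```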
If your purpose is to ensure that the result is an absolute integer without a decimal point, it is recommended to use option 1:
result_str = f"abc def {int(a)}"
In Python, `re.split()` is a powerful tool in the `re` (regular expression) module, used to split a string by delimiters (patterns) defined as a regular expression and return the result as a list.
re.split(pattern, string, maxsplit=0, flags=0)
- `pattern`: the regular expression used to split the string.
- `string`: the target string to split.
- `maxsplit` (optional): the maximum number of splits; the default 0 means no limit.
- `flags` (optional): flags that control regex behavior (e.g. `re.IGNORECASE`).

Use a regular expression to define multiple or complex delimiters:
import re
text = "apple,banana;orange-grape"
# Use comma, semicolon or hyphen as delimiter
result = re.split(r'[;,-]', text)
print(result)
# Output: ['apple', 'banana', 'orange', 'grape']
Unlike the built-in string method `split()`, `re.split()` easily handles and collapses multiple consecutive delimiters (e.g. runs of spaces).
text = "Word1   Word2  Word3"
# Use \s+ to match one or more whitespace characters as delimiters
result = re.split(r'\s+', text)
print(result)
# Output: ['Word1', 'Word2', 'Word3']
If `maxsplit` is set, splitting is performed at most that many times, and the remainder is kept as the last element of the list.
text = "one:two:three:four"
# Only cut once
result = re.split(r':', text, maxsplit=1)
print(result)
# Output: ['one', 'two:three:four']
If the delimiter pattern is placed inside parentheses `()` (a capturing group), the delimiter itself is also included among the elements of the resulting list.
text = "2025-01-15"
# Put hyphens in parentheses so they are preserved
result = re.split(r'(-)', text)
print(result)
# Output: ['2025', '-', '01', '-', '15']
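Conversely, wrapping the delimiter in a non-capturing group `(?:...)` splits without keeping it:

```python
import re

text = "2025-01-15"
# A non-capturing group does not add the delimiter to the result
print(re.split(r'(?:-)', text))  # ['2025', '01', '15']
```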
`startswith()` is a method of Python string (`str`) objects, used to determine whether the string begins with the specified substring.
It returns `True` on a match, `False` otherwise.
str.startswith(prefix[, start[, end]])
- `prefix`: the prefix to compare; a single string or a tuple of strings.
- `start` (optional): index in the original string at which to begin the check.
- `end` (optional): index at which to stop (exclusive).

Return value: `True` if the string starts with the given prefix, otherwise `False`.
text = "Python Programming"
#Basic usage
print(text.startswith("Py")) # True
print(text.startswith("Java")) # False
#Specify range
print(text.startswith("thon", 2)) # True (starting at index 2 is "thon")
# Multiple prefixes (tuple): True if any of them matches
print(text.startswith(("Py", "Java", "C"))) # True, "Py" matches
# Not case sensitive (can be converted to lowercase first)
print(text.lower().startswith("py")) # True
url = "https://openai.com"
if url.startswith("https://"):
print("Secure connection URL")
files = ["data1.csv", "data2.csv", "readme.txt"]
csv_files = [f for f in files if f.startswith("data")]
print(csv_files) # ['data1.csv', 'data2.csv']
To determine whether a string ends with certain text, use the `endswith()` method; its syntax is the same as `startswith()`.
filename = "report.pdf"
if filename.endswith(".pdf"):
print("This is a PDF file")
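Like `startswith()`, `endswith()` accepts a tuple of suffixes, which is handy for filtering by several file extensions at once:

```python
files = ["a.jpg", "b.png", "c.txt"]
# endswith with a tuple matches any of the listed suffixes
images = [f for f in files if f.endswith((".jpg", ".png"))]
print(images)  # ['a.jpg', 'b.png']
```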
In Python, strings are immutable objects. To remove the last character, string slicing is usually used to create a new string.
text = "Hello!"
#Method 1: Use slicing
new_text = text[:-1]
print(new_text) #Output: Hello
#Method 2: Use rstrip() to remove specific ending characters
text2 = "Hello!!!"
new_text2 = text2.rstrip("!")
print(new_text2) #Output: Hello
#Method 3: Make sure it is not empty and then delete the last character
if text:
text = text[:-1]
print(text)
Output:
Hello
Hello
Hello
`text[:-1]` takes the substring from the beginning up to (but not including) the last character. `rstrip(chars)` removes the specified trailing characters (not limited to one). To remove the leading character instead, use:
text = text[1:]
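Python 3.9 added `str.removeprefix()` and `str.removesuffix()`, which remove an exact substring from the start or end (unlike the `strip()` family, which removes any characters from a set):

```python
name = "report_final.txt"
# removesuffix/removeprefix delete an exact substring, or return
# the string unchanged if the substring is absent
print(name.removesuffix(".txt"))     # report_final
print(name.removeprefix("report_"))  # final.txt
```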
In short: use `s[:-1]` for positional removal and `rstrip()` for character-based removal. Next, given a string `str1`, suppose we want the part that comes before `strA` or `strB`, whichever occurs first. For example:
str1 = "Hello World, this is a test. Stop here or continue."
strA = "Stop"
strB = "continue"
The goal is to obtain `"Hello World, this is a test. "`.
With `re.split()` you can split on multiple keywords and take the first part:
import re
def get_substring_before(text, strA, strB):
result = re.split(f"{re.escape(strA)}|{re.escape(strB)}", text, maxsplit=1)[0]
return result
str1 = "Hello World, this is a test. Stop here or continue."
strA = "Stop"
strB = "continue"
print(get_substring_before(str1, strA, strB)) # "Hello World, this is a test. "
`re.search()` can be used to match `strA` or `strB` and take the content before the match:
import re
def get_substring_before(text, strA, strB):
match = re.search(f"{re.escape(strA)}|{re.escape(strB)}", text)
return text[:match.start()] if match else text
str1 = "Hello World, this is a test. Stop here or continue."
print(get_substring_before(str1, "Stop", "continue")) # "Hello World, this is a test. "
With `find()` you can manually locate the earlier of `strA` or `strB` and slice off the part before it:
def get_substring_before(text, strA, strB):
indexA = text.find(strA)
indexB = text.find(strB)
indices = [i for i in [indexA, indexB] if i != -1]
first_index = min(indices, default=len(text))
return text[:first_index]
str1 = "Hello World, this is a test. Stop here or continue."
print(get_substring_before(str1, "Stop", "continue")) # "Hello World, this is a test. "
Summary:
- `re.split()` is the simplest, suited to a one-off split.
- `re.search()` offers more flexible regex matching.
- `find()` is the most efficient for simple string processing.

To join values of mixed types into one string, convert each to `str` first:
values = ["str1", "str2", 123, "str3", 456]
s = ", ".join(str(v) for v in values)
print(s) # Output: str1, str2, 123, str3, 456
def join_values(*args, sep=", "):
return sep.join(str(v) for v in args)
print(join_values("str1", "str2", 88, "str3"))
# Output: str1, str2, 88, str3
data = {
"str1": "hello",
"str2": "world",
"int1": 123,
"str3": "ok"
}
# Sort by key and then join
s = ", ".join(str(data[k]) for k in sorted(data.keys()))
print(s) # hello, world, 123, ok
s = ", ".join(f"{k}={v}" for k, v in data.items())
print(s)
# Output: str1=hello, str2=world, int1=123, str3=ok
rows = [
{"str1": "A", "int1": 10},
{"str1": "B", "int1": 20},
]
for row in rows:
print(", ".join(str(v) for v in row.values()))
# A, 10
# B, 20
def join_values(values, sep=", ", quoted=False):
if quoted:
# Use curly single quotes ‘ ’
return sep.join(f"‘{v}’" for v in values)
else:
return sep.join(str(v) for v in values)
values = ["str1", "str2", 88, "str3"]
print(join_values(values, quoted=False))
# Output: str1, str2, 88, str3
print(join_values(values, quoted=True))
# Output: ‘str1’, ‘str2’, ‘88’, ‘str3’
def join_args(*args, sep=", ", quoted=False):
if quoted:
return sep.join(f"‘{v}’" for v in args)
return sep.join(str(v) for v in args)
print(join_args("str1", "str2", 88, "str3", quoted=True))
# ‘str1’, ‘str2’, ‘88’, ‘str3’
data = {"str1": "hello", "str2": "world", "int1": 123}
print(join_values(data.values(), quoted=True))
# ‘hello’, ‘world’, ‘123’
def join_key_value(d, sep=", ", quoted=False):
if quoted:
return sep.join(f"{k}=‘{v}’" for k, v in d.items())
return sep.join(f"{k}={v}" for k, v in d.items())
print(join_key_value(data, quoted=True))
# str1=‘hello’, str2=‘world’, int1=‘123’
Python's `re.match` is a function in the regular expression module that matches from the beginning of a string.
On success it returns a `Match` object; otherwise it returns `None`.
re.match(pattern, string, flags=0)
Parameters:
- `pattern`: the regular expression to match.
- `string`: the string to check.
- `flags`: optional, modifies matching behavior (e.g. `re.IGNORECASE`).

Methods of the returned `Match` object:
- `group(n)`: returns the n-th captured subgroup; `n=0` returns the entire match.
- `start()`: the starting position of the match.
- `end()`: the ending position of the match.
- `span()`: the `(start, end)` range of the match.

import re
#Define a string
text = "123 Hello World!"
# Use re.match to match numbers from the beginning
match = re.match(r"(\d+)\s+(.*)", text)
if match:
print(f"Entire matching result: {match.group(0)}") # 123 Hello World!
print(f"Number part: {match.group(1)}") # 123
print(f"Text part: {match.group(2)}") # Hello World!
else:
print("match failed")
The entire matching result: 123 Hello World!
Numeric part: 123
Text part: Hello World!
`re.match` only matches at the beginning of the string; if the beginning does not match, it returns `None`. To search anywhere in the string, use `re.search` or `re.findall`. A Regular Expression (Regex for short) is a syntax for describing string-matching rules, commonly used to search, replace, or validate strings.
In Python's `re` module, the `pattern` is the core part that defines these rules.
- `\d`: matches any digit (0-9).
- `\D`: matches any non-digit.
- `\w`: matches any word character (letter, digit, underscore).
- `\W`: matches any non-word character.
- `\s`: matches any whitespace character (space, tab, etc.).
- `\S`: matches any non-whitespace character.
- `.`: matches any single character except a newline (`\n`).
- `*`: matches the preceding expression 0 or more times.
- `+`: matches the preceding expression 1 or more times.
- `?`: matches the preceding expression 0 or 1 times.
- `{n}`: matches the preceding expression exactly n times.
- `{n,}`: matches the preceding expression at least n times.
- `{n,m}`: matches the preceding expression between n and m times.
- `^`: matches the beginning of the string.
- `$`: matches the end of the string.
- `\b`: matches a word boundary (the beginning or end of a word).
- `\B`: matches a non-word boundary.
- `(...)`: groups and captures the content in parentheses.
- `|`: logical "or", e.g. `a|b` matches `a` or `b`.
- `(?:...)`: groups without capturing.
- `(?=...)`: lookahead; the match must be followed by the given content.
- `(?!...)`: negative lookahead; the match must not be followed by the given content.
- `(?<=...)`: lookbehind; the match must be preceded by the given content.
- `(?<!...)`: negative lookbehind; the match must not be preceded by the given content.

import re
# Example 1: Match content starting with a number
pattern = r"^\d+"
text = "123abc"
match = re.match(pattern, text)
if match:
print(f"Matching result: {match.group()}") # Output: 123
#Example 2: Match text after numbers
pattern = r"(\d+)\s+(.*)"
text = "123 Hello World"
match = re.match(pattern, text)
if match:
print(f"Number part: {match.group(1)}") # Output: 123
print(f"Text part: {match.group(2)}") # Output: Hello World
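A related function, `re.fullmatch()`, succeeds only when the pattern matches the entire string, which is useful for validation:

```python
import re

# fullmatch requires the whole string to match the pattern
print(bool(re.fullmatch(r"\d+", "12345")))  # True
print(bool(re.fullmatch(r"\d+", "123a5")))  # False
```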
`re.search()` searches the string for the first match of a regular expression and returns a `Match` object, or `None` if there is no match.
import re
text = "Hello 2024!"
match = re.search(r"\d+", text)
if match:
print("Number found:", match.group()) # 2024
When `re.search()` finds a match, it returns a `Match` object whose information can be accessed through the following methods:
- `group()`: the matched string.
- `start()`: the starting index of the match.
- `end()`: the ending index of the match.
- `span()`: the `(start, end)` index range.

import re
text = "Python 3.10 is great!"
match = re.search(r"\d+\.\d+", text)
if match:
print("Match content:", match.group()) # 3.10
print("Start index:", match.start()) # 7
print("End index:", match.end()) # 11
print("range:", match.span()) # (7, 11)
Use parentheses `()` to create groups, and `group(n)` to extract the corresponding matched content.
import re
text = "John Doe, Age: 25"
match = re.search(r"(\w+) (\w+), Age: (\d+)", text)
if match:
print("First name:", match.group(1)) # John
print("Last name:", match.group(2)) # Doe
print("Age:", match.group(3)) # 25
re.search()Only the first matching result is returned, whereasre.findall()All matching results will be returned.
import re
text = "Price: $10, Discount: $2, Tax: $1"
match = re.search(r"\$\d+", text)
print("re.search:", match.group()) # $10
matches = re.findall(r"\$\d+", text)
print("re.findall:", matches) # ['$10', '$2', '$1']
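When you need both the matched text and its position for every match, `re.finditer()` yields a `Match` object per hit:

```python
import re

text = "Price: $10, Discount: $2, Tax: $1"
# finditer yields Match objects, giving position as well as text
for m in re.finditer(r"\$\d+", text):
    print(m.group(), m.span())
# Output:
# $10 (7, 10)
# $2 (22, 24)
# $1 (31, 33)
```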
`re.search()` is suitable for finding the first match, and the returned `Match` object provides detailed information; for all matches, use `re.findall()`.
In a regular expression, `(...)` captures its match and stores it in `group(n)`, while the non-capturing group `(?:...)` is used only for structure: it does not consume a group number and can be slightly faster.
Using plain `()` to organize matching conditions affects `group(n)` numbering; using `(?:...)` keeps the group indices unchanged.
import re
text = "2024-03-12"
pattern = r"(\d{4})-(?:\d{2})-(\d{2})"
match = re.search(pattern, text)
print(match.group(1)) # 2024
print(match.group(2)) # 12
Using `(?:...|...)` lets the `|` operator affect what is matched without affecting group access.
import re
text = "bar123"
pattern = r"(?:foo|bar|baz)\d+"
match = re.search(pattern, text)
print(match.group()) # bar123
When parsing Chrome command-line parameters, `(?:...)` keeps the alternation from shifting the group numbers:
import re
cmdline = '--user-data-dir="C:\\Users\\moirg\\AppData\\Local\\Google\\Chrome\\User Data"'
match = re.search(r'--user-data-dir=(?:"([^"]+)"|(\S+))', cmdline)
user_data_dir = match.group(1) or match.group(2)
print(user_data_dir) # C:\Users\moirg\AppData\Local\Google\Chrome\User Data
`(?:...)` avoids shifting group indices and can modestly improve performance; it is well suited to `|` alternation and condition grouping, keeping code efficient and clear.
import datetime
now = datetime.datetime.now()
print(now)
dt = datetime.datetime(2025, 7, 2, 14, 30, 0)
print(dt)
now = datetime.datetime.now()
formatted = now.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
dt_str = "2025-07-02 14:30:00"
parsed = datetime.datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S")
print(parsed)
now = datetime.datetime.now()
delta = datetime.timedelta(days=7)
next_week = now + delta
print(next_week)
today = datetime.date.today()
print(today)
dt1 = datetime.datetime(2025, 7, 1)
dt2 = datetime.datetime(2025, 7, 2)
print(dt1 < dt2)
dt1 = datetime.datetime(2025, 7, 1, 12, 0, 0)
dt2 = datetime.datetime(2025, 7, 2, 14, 30, 0)
diff = dt2 - dt1
print(diff)
print(diff.total_seconds())
today = datetime.date.today()
print(today.weekday()) # 0 = Monday, 6 = Sunday
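The datetime values above can also round-trip through ISO-8601 strings with `isoformat()` and `fromisoformat()`:

```python
import datetime

dt = datetime.datetime(2025, 7, 2, 14, 30, 0)
s = dt.isoformat()
print(s)  # 2025-07-02T14:30:00
# fromisoformat parses the string back into an equal datetime
print(datetime.datetime.fromisoformat(s) == dt)  # True
```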
When one datetime object has no time zone (naive) and the other has a time zone (aware) and is subtracted, it will produce:
TypeError: can't subtract offset-naive and offset-aware datetimes
from datetime import datetime
def is_aware(dt):
return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None
def is_naive(dt):
return not is_aware(dt)
dt1 = datetime.now() # naive
dt2 = datetime.now().astimezone() # aware
print(is_naive(dt1), is_aware(dt1))
print(is_naive(dt2), is_aware(dt2))
Suitable for situations across time zones or when precise time calculation is required.
from datetime import datetime, timezone
# Assume sql_dt is a naive datetime read from the database, stored as UTC
sql_dt = sql_dt.replace(tzinfo=timezone.utc)  # make the SQL datetime aware
now_dt = datetime.now(timezone.utc)  # use an aware current time
diff = now_dt - sql_dt
print(diff.total_seconds())
Alternatively, strip the time zone and compare as naive values (suitable when both values are in the same local time):
sql_dt = sql_dt.replace(tzinfo=None)
now_dt = datetime.now()
diff = now_dt - sql_dt
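For conversions between real time zones (not just UTC), the standard `zoneinfo` module (Python 3.9+) provides IANA time zones; `Asia/Taipei` below is just an example key:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

utc_now = datetime.now(timezone.utc)
# astimezone converts an aware datetime to another time zone
local = utc_now.astimezone(ZoneInfo("Asia/Taipei"))
print(local.tzinfo)      # Asia/Taipei
print(local == utc_now)  # True: same instant, different zone
```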
In Python, you can use the built-in `statistics` module's `stdev()` or `pstdev()` to compute the sample or population standard deviation.
import statistics
ratios = [2.3, 2.8, 3.1, 2.5, 3.0]
# average
average = sum(ratios) / len(ratios)
#Sample standard deviation (n-1)
std_sample = statistics.stdev(ratios)
# Population standard deviation (n)
std_population = statistics.pstdev(ratios)
print(f"Average: {average:.2f}")
print(f"Sample standard deviation: {std_sample:.3f}")
print(f"Population standard deviation: {std_population:.3f}")
Average: 2.74
Sample standard deviation: 0.336
Population standard deviation: 0.301
If you prefer not to rely on the `statistics` module, you can compute it yourself from the formula:
ratios = [2.3, 2.8, 3.1, 2.5, 3.0]
average = sum(ratios) / len(ratios)
# Population standard deviation
variance = sum((x - average) ** 2 for x in ratios) / len(ratios)
std_dev = variance ** 0.5
print(f"Standard deviation: {std_dev:.3f}")
Standard deviation: 0.301
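For the sample standard deviation, the manual formula only changes the denominator to n-1:

```python
ratios = [2.3, 2.8, 3.1, 2.5, 3.0]
average = sum(ratios) / len(ratios)
# Sample variance divides by (n - 1) instead of n
variance_s = sum((x - average) ** 2 for x in ratios) / (len(ratios) - 1)
print(f"Sample standard deviation: {variance_s ** 0.5:.3f}")  # 0.336
```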
Summary:
- `statistics.stdev()`: sample standard deviation (divides by n-1).
- `statistics.pstdev()`: population standard deviation (divides by n).
- Mean: `avg = sum(data) / len(data)`.

# ANSI color code example
print("\033[31m red text\033[0m")
print("\033[32m green text\033[0m")
print("\033[33m yellow text\033[0m")
print("\033[34m blue text\033[0m")
print("\033[35m purple text\033[0m")
print("\033[36m cyan text\033[0m")
print("\033[37m white text\033[0m")
print("\033[1;31m bold red text\033[0m")
print("\033[42m green background text\033[0m")
# Format: \033[style; foreground color; background color m
# Style: 0=Default, 1=Bold, 4=Underline
# Foreground color: 30~37
# Background color: 40~47
print("\033[1;33;44m bold yellow text + blue background\033[0m")
def color_text(text, color_code):
return f"\033[{color_code}m{text}\033[0m"
print(color_text("Warning!", "1;31")) # Bold red
print(color_text("Success!", "1;32")) # Bold green
import sys
if sys.stdout.isatty():
print("Terminal may support ANSI colors")
else:
print("It may be a file or an output environment that does not support color")
import colorama
colorama.init()
print("\033[32mThis text should be green\033[0m")
Note that a try/except around `print` cannot detect ANSI support, because printing escape codes never raises; checking whether stdout is a terminal is a more reliable heuristic:
def supports_ansi():
    import sys
    # Printing escape codes never raises, so try/except cannot detect
    # support; a TTY check is a more reliable heuristic.
    return sys.stdout.isatty()
print("ANSI is supported" if supports_ansi() else "ANSI is not supported")
On Windows, the `colorama` package can be used to enable ANSI handling. In Python, the `input()` function does not support Tab auto-completion by itself.
To add it, combine `input()` with the `readline` module, which lets users press Tab for completion (similar to Bash or IPython).
import readline
# Define a list of strings that can be completed
WORDS = ['apple', 'banana', 'cherry', 'grape', 'orange', 'watermelon']
def completer(text, state):
"""Autocomplete function: compare available strings based on input prefixes"""
options = [w for w in WORDS if w.startswith(text)]
if state < len(options):
return options[state]
else:
return None
# Enable completion function
readline.set_completer(completer)
readline.parse_and_bind('tab: complete')
# User input (supports Tab)
user_input = input("Enter the name of the fruit (you can press Tab to complete): ")
print(f"What you entered is: {user_input}")
- `readline.set_completer()` registers the auto-completion logic.
- `readline.parse_and_bind('tab: complete')` makes the Tab key trigger completion.
- In `completer(text, state)`, `text` is the text typed so far and `state` is the index of the match to return (0, 1, 2, ...); return `None` when there are no more matches.

You can also build the completion list from the current context or dynamic content:
import readline
def dynamic_completer(text, state):
current_words = ['cat', 'car', 'dog', 'duck', 'deer']
options = [w for w in current_words if w.startswith(text)]
if state < len(options):
return options[state]
return None
readline.set_completer(dynamic_completer)
readline.parse_and_bind('tab: complete')
command = input("Enter the animal name: ")
print("You enter:", command)
The `readline` module is available by default on Linux / macOS. On Windows, install a replacement: pip install pyreadline3
In short, `readline` lets `input()` support Tab completion; on Windows use the `pyreadline3` package. The most common loop runs a fixed number of times or over a sequence:
# from 0 to 4
for i in range(5):
print(i) #0,1,2,3,4
# Specify the start, end, and step size
for i in range(2, 10, 2):
print(i) #2,4,6,8
Directly iterate lists, strings, dictionaries, etc.
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
print(fruit)
for ch in "hello":
print(ch)
#Iterate over dictionary
person = {"name": "Tom", "age": 25}
for key, value in person.items():
print(key, value)
The loop body executes repeatedly as long as the condition is True.
count = 0
while count < 5:
print(count)
count += 1
Control the loop process.
for i in range(10):
if i == 3:
continue # skip this time
if i == 7:
break # End early
print(i)
Loops can be nested by placing one loop inside another.
for i in range(3):
for j in range(2):
print(f"i={i}, j={j}")
A `for` or `while` loop can have an `else` clause, which runs only when the loop completes normally (without `break`).
for i in range(5):
print(i)
else:
print("Loop ends normally")
List comprehensions are a concise way to loop and build a list in a single line.
squares = [x**2 for x in range(5)]
print(squares) # [0,1,4,9,16]
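Comprehensions also accept an `if` filter, and the same syntax builds dictionaries:

```python
# Keep only the squares of even numbers
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Dictionary comprehension: number -> its square
squares_map = {x: x**2 for x in range(3)}
print(squares_map)  # {0: 0, 1: 1, 2: 4}
```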
Summary:
- `for ... in range()`: a fixed number of iterations or a numeric sequence.
- `for ... in iterable`: processing lists, strings, and dictionaries.
- `while`: loops with an indefinite count, ended by a condition.
- `break / continue / else`: controlling the loop flow.

Python uses the `try...except` statement to intercept and handle errors during program execution and prevent the program from crashing.
try:
# Code that may cause exceptions
result = 10 / 0
except ZeroDivisionError:
# Code executed when a specific exception occurs
print("The divisor cannot be zero")
except Exception as e:
#Catch all other types of exceptions
print(f"An error occurred: {e}")
else:
# Execute if no exception occurs in the try block
print("operating normally")
finally:
# Will be executed regardless of whether an exception occurs
print("Clean resources or close files")
You can use a tuple in an `except` clause to handle multiple error types at once.
try:
# Perform operations
pass
except (ValueError, TypeError):
print("The entered data type or value is incorrect")
Use the raise keyword to manually trigger exceptions based on logical requirements.
age = -1
if age < 0:
raise ValueError("Age cannot be negative")
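You can also define your own exception type by subclassing a built-in exception; `NegativeAgeError` below is a made-up name for illustration:

```python
class NegativeAgeError(ValueError):
    """Raised when an age value is negative (illustrative custom type)."""

def set_age(age):
    if age < 0:
        raise NegativeAgeError(f"Age cannot be negative: {age}")
    return age

try:
    set_age(-1)
except NegativeAgeError as e:
    print(f"Caught: {e}")  # Caught: Age cannot be negative: -1
```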
class MyClass:
def __init__(self, value):
self.value = value
def display(self):
print(f"Value: {self.value}")
obj = MyClass(10)
obj.display() # Output: Value: 10
Static methods are defined using the `@staticmethod` decorator and are bound to neither the class nor its instances; they cannot access class or instance attributes. They suit utility-style functions:
class MyClass:
@staticmethod
def add(a, b):
return a + b
result = MyClass.add(5, 3)
print(result) # Output: 8
Class methods are defined using the `@classmethod` decorator, the first parameter is the class itself (usually named `cls`), and class attributes can be accessed:
class MyClass:
count = 0
@classmethod
def increment_count(cls):
cls.count += 1
MyClass.increment_count()
print(MyClass.count) # Output: 1
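A common use of class methods is as an alternative constructor; the `Point` class here is a made-up example:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_tuple(cls, pair):
        # Build an instance from an (x, y) tuple via cls
        return cls(pair[0], pair[1])

p = Point.from_tuple((3, 4))
print(p.x, p.y)  # 3 4
```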
Python supports class inheritance. Subclasses can inherit the attributes and methods of the parent class and override the parent class methods:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
def greet(self):
print("Hello from Child!")
obj = Child()
obj.greet() # Output: Hello from Child!
Class attributes belong to the class as a whole and are shared by all objects; instance attributes belong to each individual object:
class MyClass:
class_attr = "I am a class attribute"
def __init__(self, value):
self.instance_attr = value
obj1 = MyClass(10)
obj2 = MyClass(20)
print(MyClass.class_attr) # Output: I am a class attribute
print(obj1.instance_attr) # Output: 10
print(obj2.instance_attr) # Output: 20
All classes in Python inherit from `object` by default, which is a built-in base class that provides some basic methods, such as `__str__` and `__eq__`:
class MyClass(object):
def __init__(self, value):
self.value = value
def __str__(self):
return f"MyClass with value {self.value}"
obj = MyClass(5)
print(obj) # Output: MyClass with value 5
In Python, class inheritance allows subclasses (Derived Class) to inherit the properties and methods of the parent class (Base Class) to achieve code reuse. For example:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
pass
c = Child()
c.greet() # Output: Hello from Parent!
Subclasses can override (Override) the methods of the parent class and rewrite its functions:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
def greet(self):
print("Hello from Child!")
c = Child()
c.greet() # Output: Hello from Child!
In the subclass, you can call the method of the parent class through `super()` and extend the behavior of the parent class:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
def greet(self):
super().greet()
print("Hello from Child!")
c = Child()
c.greet()
# Output:
# Hello from Parent!
# Hello from Child!
Python supports multiple inheritance, and subclasses can inherit multiple parent classes at the same time:
class Parent1:
def greet(self):
print("Hello from Parent1!")
class Parent2:
def greet(self):
print("Hello from Parent2!")
class Child(Parent1, Parent2):
pass
c = Child()
c.greet() # Output: Hello from Parent1! (according to inheritance order)
Multiple inheritance uses MRO (Method Resolution Order) to determine the resolution order of methods. This can be checked using the `__mro__` attribute:
print(Child.__mro__)
# Output: (<class '__main__.Child'>, <class '__main__.Parent1'>, <class '__main__.Parent2'>, <class 'object'>)
Use the `abc` module to define an Abstract Base Class to force subclasses to implement specific methods:
from abc import ABC, abstractmethod
class AbstractParent(ABC):
@abstractmethod
def greet(self):
pass
class Child(AbstractParent):
def greet(self):
print("Hello from Child!")
c = Child()
c.greet() # Output: Hello from Child!
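Trying to instantiate the abstract base class itself raises a `TypeError`, which is how `abc` enforces the interface:

```python
from abc import ABC, abstractmethod

class AbstractParent(ABC):
    @abstractmethod
    def greet(self):
        pass

try:
    AbstractParent()  # abstract method greet is not implemented
except TypeError as e:
    print("TypeError:", e)
```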
class ClassB:
def greet(self):
print("Hello from ClassB!")
# Dynamically create a temporary class inherited from ClassB
TempClass = type('TempClass', (ClassB,), {
'greet': lambda self: (print("Hello from TempClass!"), super(TempClass, self).greet())[0]
})
#Create instance and test
temp = TempClass()
temp.greet()
`type('TempClass', (ClassB,), {...})` dynamically creates the class; its `greet` method first prints the new message, then calls the parent class's `greet` via `super()`.
Hello from TempClass!
Hello from ClassB!
Suppose there are two classes, `AClass` and `BClass`, that have methods with the same names and parameters.
We want to switch easily between them without modifying the main program logic.
class AClass:
def process(self, data):
print(f"AClass processing: {data}")
def result(self):
return "result from AClass"
class BClass:
def process(self, data):
print(f"BClass processing: {data}")
def result(self):
return "result from BClass"
# Control which class to use via a setting
USE_A = True
# Dynamically select the class
SelectedClass = AClass if USE_A else BClass
# Create an instance and use it
obj = SelectedClass()
obj.process("Test data")
print(obj.result())
Output with USE_A = True:
AClass processing: Test data
result from AClass
With USE_A = False it would print:
BClass processing: Test data
result from BClass
def get_class(name):
mapping = {
"A": AClass,
"B": BClass
}
return mapping.get(name, AClass) # Default is AClass
# Dynamic selection
cls = get_class("B")
obj = cls()
obj.process("Test data")
In Python, you do not necessarily need an abstract class as in C++ or Java.
Python uses "duck typing": as long as objects have the same method names and behaviors, they are considered compatible.
class AClass:
def process(self, data):
print(f"AClass processing: {data}")
class BClass:
def process(self, data):
print(f"BClass processing: {data}")
def run(obj):
obj.process("data") # No need to specify the type, as long as there is this method
run(AClass())
run(BClass())
AClass processing: data
BClass processing: data
Although Python does not enforce types, if you want a clearly defined interface for team development or large projects, you can use the `abc` module.
from abc import ABC, abstractmethod
class BaseClass(ABC):
@abstractmethod
def process(self, data):
pass
class AClass(BaseClass):
def process(self, data):
print(f"AClass processing: {data}")
class BClass(BaseClass):
def process(self, data):
print(f"BClass processing: {data}")
# BaseClass() will report an error because the abstract method is not implemented
In Python, a module is a file containing program code, usually ending with the `.py` extension. Modules can define functions, classes, and variables, may contain executable code, and make reuse and program structuring easier.
Python imports modules with the `import` keyword, for example:
import math
print(math.sqrt(16)) # Output 4.0
Developers can create their own modules. For example, create a `mymodule.py`:
# mymodule.py
def greet(name):
return f"Hello, {name}!"
Use in another program file:
import mymodule
print(mymodule.greet("Alice"))
Common standard-library modules:
- `os`: operating-system operations.
- `sys`: Python interpreter information.
- `datetime`: dates and times.
- `math`: mathematical functions.
- `random`: random-number generation.

In Python, a package is a way of organizing modules: a directory containing multiple modules, whose hierarchy makes programs easier to manage and maintain.
A package directory contains an `__init__.py` file (optional since Python 3.3, but adding it helps mark the package clearly). Create a package called `mypackage`:
mypackage/
│── __init__.py
│── module1.py
└── module2.py
`module1.py` example:
def add(a, b):
return a + b
`module2.py` example:
def multiply(a, b):
return a * b
import mypackage.module1
import mypackage.module2
print(mypackage.module1.add(2, 3)) # Output 5
print(mypackage.module2.multiply(2, 3)) # Output 6
from mypackage.module1 import add
from mypackage.module2 import multiply
print(add(10, 5)) # Output 15
print(multiply(10, 5)) # Output 50
In Python, to get the current module's package name, use the special variable `__package__`. It is similar in concept to `__module__`, which gives the module's name.
mypackage/
│── __init__.py
└── submodule.py
`submodule.py` content:
print("__name__:", __name__)
print("__package__:", __package__)
print("__module__:", __module__)
If used in other programs asimport mypackage.submoduleImport, the output is roughly as follows:
__name__: mypackage.submodule
__package__: mypackage
__module__: __main__
- `__name__`: the complete module name.
- `__package__`: the package the current module belongs to; an empty string for a top-level module.
- `__module__`: used on class and function definitions to indicate which module defined them; together with `__package__` it helps confirm package paths.

There are two main ways to check a Python package's version: from the terminal (command line) or from Python code. Either can confirm whether the environment meets project requirements.
This is the fastest method and does not require entering the Python interactive environment.
- `pip show <package name>` (e.g. `pip show pandas`).
- `pip list`.
- `pip list | findstr "<package name>"` (Windows) or `pip list | grep "<package name>"` (Linux/Mac).
- `conda list <package name>`.

If you need to determine the version while the program runs, use either of the following two methods:
import pandas
print(pandas.__version__)
from importlib.metadata import version
print(version('pandas'))
| method | Instructions/Code | Applicable situations |
|---|---|---|
| pip command | `pip show` | View details such as install path, author, and dependencies. |
| pip list | `pip list` | Quick overview of all packages and versions in the current environment. |
| internal attribute | `.__version__` | Logical checks while the script is running. |
| metadata | `version()` | Standardized check without importing the whole package. |
Sometimes the problem isn't with the package, but with the Python interpreter itself. In the terminal, run `python --version`; in code, use:

import sys
print(sys.version)

A few common pitfalls:

- If the version `pip list` shows differs from the one executed in VS Code, confirm that the correct interpreter is selected.
- Checking an uninstalled package through `importlib` raises `PackageNotFoundError`.
- `pip show` is not case-sensitive, but some code logic may require the exact name.

In Python, given a class object you can read `cls.__module__` to find the name of the module that defines the class, then use `sys.modules` to get the module object, and finally read its `__package__` attribute to obtain the package name.
import sys
# Assume a category is defined in mypackage.submodule
class MyClass:
pass
# Get the name of the module to which the class belongs
module_name = MyClass.__module__
print("Module name:", module_name)
# Get module object
mod = sys.modules[module_name]
# Get the package name from the module object
print("Package name:", mod.__package__)
The module itself is an object, so you can read its `__package__` attribute directly.
import math
import mypackage.submodule as sub
# math is a standard function library module, there is no package, so __package__ is an empty string
print("math.__package__:", math.__package__)
# A module from a custom package
print("sub.__package__:", sub.__package__)
Module name: mypackage.submodule
Package name: mypackage
math.__package__:
sub.__package__: mypackage
Summary of the two approaches:

- Class → `__module__` → `sys.modules[...]` → `__package__`
- Module object → read `__package__` directly

When a Python file is executed directly, its module name is `__main__`. This means that `cls.__module__` alone cannot recover the original file name and path.
Assume your project structure is:
/project
|-- test_runner.py <-- The file you execute directly (will be treated as __main__)
|-- test/
|-- db_test.py <-- File that defines DbCmdAgent
If you execute the following code in `test_runner.py`:
from test.db_test import DbCmdAgent
agent_obj = DbCmdAgent(...) # Instantiation
# At this time cls.__module__ is still 'test.db_test' (correct module name)
However, if you define categories in `test_runner.py`:
# Contents of test_runner.py
class DbCmdAgent:
pass
agent_obj = DbCmdAgent()
#At this time cls.__module__ == '__main__' (wrong module name)
This indicates that the class you are getting information for is defined in a file that is executed as `__main__`.
The `inspect` module can obtain the file path directly. Regardless of whether the class is defined in `__main__`, you can bypass the `__module__` attribute and use the `inspect` module to get the source file that defines the class. This is a more reliable and general method.
import inspect
import os
# --- Simulated situation: a class defined in __main__ (the currently executing script) ---
class DbCmdAgent:
"""This category is defined in the currently executing main script"""
def __init__(self, data):
self.data = data
agent_obj = DbCmdAgent("some_data")
def get_class_location_robust(obj):
"""
Use inspect.getfile to bypass the __module__ == '__main__' problem.
"""
cls = type(obj)
# 1. Use inspect.getfile() to get the file path that defines the category
try:
file_path = inspect.getfile(cls)
# 2. Get the file name and directory
file_name = os.path.basename(file_path)
directory = os.path.dirname(file_path)
py_name = os.path.splitext(file_name)[0]
# 3. If __module__ is __main__, replace it with the file name to provide more context
module_name = cls.__module__
if module_name == '__main__':
module_name = py_name # Use db_test or test_runner as context
    except TypeError:
# Handle built-in types
file_path = "N/A (Built-in or C extension)"
file_name = "N/A"
directory = "N/A"
py_name = "N/A"
module_name = cls.__module__
return {
"module_name_or_main": module_name,
"py_name_no_ext": py_name,
"directory": directory,
"file_path": file_path,
}
# Execute and view the results
location_info = get_class_location_robust(agent_obj)
print("--- Category definition file information ---")
print(f"File name (.py name): {location_info['py_name_no_ext']}")
print(f"Directory path (Package): {location_info['directory']}")
print(f"Full file path: {location_info['file_path']}")
- `inspect.getfile(type(obj))`: the most reliable way to obtain the original definition file path, completely independent of whether the module is considered `__main__`.
- `__module__` only reflects the class's module namespace in the current program, not necessarily its actual name in the project structure.

The following method uses `importlib.import_module` to try to import the specified module. If that raises `ModuleNotFoundError`, it tries to import the name as a submodule of each currently loaded package.
import importlib
import sys
def safe_import(module_name):
try:
# Try to import directly
return importlib.import_module(module_name)
except ModuleNotFoundError:
# If failed, try to import the submodule from a known package
for pkg in list(sys.modules.keys()):
if pkg and not pkg.startswith("_"):
try:
return importlib.import_module(f"{pkg}.{module_name}")
except ModuleNotFoundError:
continue
raise # If still not found, throw an exception
import numpy
mod1 = safe_import("random") # It can succeed directly because it is a standard library
mod2 = safe_import("linalg") # Will try numpy.linalg
print(mod2.__name__) # Output numpy.linalg
- `sys.modules` holds every module that has been loaded.
- On failure, the `package.module_name` form is retried for each loaded package.

In Python, `sys.path` is a list of all directory paths the interpreter searches, in order, when importing a module. When you execute `import some_module`, Python checks each directory in `sys.path` until it finds a match (e.g. `some_module.py` or `some_module/__init__.py`).
`sys.path` usually consists of the following three parts, searched in this order:

1. **The script's directory**: the first entry (`sys.path[0]`) is usually the directory containing the currently executing script (e.g. `main.py`). In interactive mode it is `''`, which stands for the current working directory (CWD).
2. **`PYTHONPATH`**: directories listed in the `PYTHONPATH` environment variable are added to `sys.path`. `PYTHONPATH` is typically used to permanently add commonly used project or library directories in non-standard locations, so they need not be configured in every project.
3. **Installation defaults**: the standard library (`os`, `sys`, `json`, `socket`) installation directory, plus the directory for `pip`-installed third-party packages (e.g. `fastapi`, `numpy`, `pandas`). In a virtual environment this is usually the environment's `lib/site-packages` directory.

Besides `PYTHONPATH`, several other environment variables influence the interpreter's behavior and path lookup. The main ones, and how they relate to `sys.path`, are:
| Environment variable | Description | Relationship to `sys.path` |
|---|---|---|
| `PYTHONPATH` | Defines additional directories to add to the module search path. | Directly affects the composition of `sys.path`. |
| `PYTHONHOME` | Sets an alternative Python installation directory, mainly for embedded setups. | Indirectly affects the location of the standard library and `site-packages`. |
| `PATH` | Used by the operating system to find executables (e.g. `python.exe`). | Does not directly affect `sys.path`, but determines which Python interpreter runs. |
| `VIRTUAL_ENV` | When a virtual environment is active, points to its root directory. | Indirectly affects `sys.path` by ensuring `site-packages` comes from the virtual environment rather than the system. |
Because `sys.path` is an ordinary Python list, you can modify it dynamically while the program runs, but such modifications only last for the current interpreter session:

import sys
import os
# Add the parent directory to the search path (commonly used for testing or internal project reference)
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
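To see what the interpreter will actually search, you can simply print `sys.path`; a quick sketch:

```python
import sys

# Print the import search path in order; the first match wins during import
for i, entry in enumerate(sys.path):
    print(i, entry or "<current working directory>")

print("total entries:", len(sys.path))
```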
Module Shadowing (or Name Shadowing) is a common error or programming problem in Python. It occurs when you accidentally create a code file or variable name that conflicts with the name of a built-in module or third-party library that you are trying to import or use.
When importing, Python searches the directories in `sys.path` in order. If your project contains a file or directory with the same name as a standard module (`socket`, `json`, `random`) or a third-party library (for example, a file you created named `socket.py`), Python loads your local file first.

The result is that your local file "shadows" the standard module that should have been loaded, so your code cannot access the standard module's functionality: for example, `socket.AF_UNSPEC` will fail because the local file you loaded does not have these attributes.
| Standard module name | Local file causing the shadow | Result |
|---|---|---|
| `socket` | `socket.py` | Cannot use the standard `socket` module's network constants (such as `AF_INET`, `AI_PASSIVE`). |
| `json` | `json.py` | Cannot use the standard `json` module's `loads` or `dumps` functions. |
| `test` (project name) | `test.py` | In a unit-testing environment, may conflict with the test framework's internal logic. |
To fix shadowing, rename the conflicting file (e.g. `socket.py` → `network_handler.py`). You can also run `import sys; print(sys.path)` to inspect Python's search path and see whether your local files are loaded in preference to the standard library directory.

To get the name of the `.py` file that the currently executing `__main__` module belongs to, you can use `__main__.__file__` or `sys.argv[0]`. But in interactive mode, Jupyter, or `-c` execution these may not exist, so they require safe handling.
import os
import sys
import __main__
def get_main_py_path():
"""Returns the absolute path of .py to which main belongs. If it cannot be found, it returns None"""
# Case 1: Normal execution .py
main_file = getattr(__main__, "__file__", None)
if main_file:
return os.path.abspath(main_file)
# Case 2: Judgment from sys.argv[0]
if len(sys.argv) > 0:
argv0 = sys.argv[0]
if argv0 not in ("", "-c", "<stdin>"):
candidate = os.path.abspath(argv0)
if os.path.exists(candidate):
return candidate
# Case 3: Interactive mode, Jupyter, embed etc.
return None
# Example
path = get_main_py_path()
if path:
print("main path:", path)
print("main filename:", os.path.basename(path))
else:
print("main .py file not found (maybe running interactively or not executed from a file)")
path = get_main_py_path()
filename = os.path.basename(path) if path else None
print(filename)
- `__main__.__file__` is the most reliable source, but it does not always exist.
- `sys.argv[0]` is a fallback, but you must confirm the value is not `-c` or `<stdin>`.
To get the parameter names and their corresponding values of a function in Python, you can use the `inspect` module, which provides introspection utilities. Specifically, `inspect.signature()` can help you retrieve the names of the parameters, and you can pass the current frame's local variables to get their values.
Here is an example that demonstrates how to get the function name, parameter names, and their values:
```python
import inspect
# Sample function
def my_function(a, b, c=5):
# Get the current frame
frame = inspect.currentframe()
# Get the function name
func_name = frame.f_code.co_name
print(f"Function name: {func_name}")
# Get the parameter names and their values
args, _, _, values = inspect.getargvalues(frame)
# Print parameter names and values
for arg in args:
print(f"Parameter name: {arg}, Value: {values[arg]}")
# Call the function
my_function(1, 2)
```
### Output:
```
Function name: my_function
Parameter name: a, Value: 1
Parameter name: b, Value: 2
Parameter name: c, Value: 5
```
### Explanation:
1. **`inspect.currentframe()`**: Retrieves the current execution frame.
2. **`frame.f_code.co_name`**: Extracts the name of the current function.
3. **`inspect.getargvalues(frame)`**: Gets the argument names and their corresponding values from the frame. This function returns a tuple containing:
- `args`: List of argument names.
- `_`: Placeholder for unused information.
- `values`: Dictionary containing argument names as keys and their values.
This allows you to print both the names of the function's parameters and their values at runtime.
In Python, you can use `inspect.signature()` to get a function's parameter information, and then read each parameter's type annotation (type hint) from the `Parameter.annotation` attribute.
import inspect
def my_function(a: int, b: str, c: float = 3.14) -> bool:
return str(a) == b
sig = inspect.signature(my_function)
for name, param in sig.parameters.items():
print(f"Parameter name: {name}")
print(f"Default value: {param.default}")
print(f"Type annotation: {param.annotation}")
print()
Parameter name: a
Default value: <class 'inspect._empty'>
Type annotation: <class 'int'>

Parameter name: b
Default value: <class 'inspect._empty'>
Type annotation: <class 'str'>

Parameter name: c
Default value: 3.14
Type annotation: <class 'float'>
- `param.annotation` returns the type annotation (such as `int` or `str`); parameters without one return `inspect._empty`.
- `get_type_hints()` returns the complete type-hint dictionary and automatically resolves forward references (types written as strings).
from typing import get_type_hints
hints = get_type_hints(my_function)
print(hints)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': <class 'float'>, 'return': <class 'bool'>}
- `inspect.signature()` gives the parameter order and names.
- `param.annotation` lets you check each parameter's type annotation.
- `get_type_hints()` is the easiest way to get all type hints (including the return value) at once.

In Python, you can use an object's `__class__` attribute or the `type()` function to get the class it belongs to, and from that, the class name.
class Animal:
pass
class Dog(Animal):
pass
obj = Dog()
# Method 1: Use __class__.__name__
print(obj.__class__.__name__) # Output: Dog
# Method 2: Use type()
print(type(obj).__name__) # Output: Dog
# Method 3: Get the full module and class name
print(obj.__class__) # Output: <class '__main__.Dog'>
print(obj.__class__.__module__) # Output: __main__
Dog
Dog
<class '__main__.Dog'>
__main__
- `obj.__class__`: the class object of the instance.
- `obj.__class__.__name__`: the class name as a string.
- `type(obj)` is equivalent to `obj.__class__`.
- `__module__`: gives the name of the module where the class is defined.

To get the full "module + class name" at once, you can write:
cls = type(obj)
full_name = f"{cls.__module__}.{cls.__name__}"
print(full_name)
__main__.Dog
- `type(obj).__name__` is the simplest way to get the class name.
- For the fully qualified name, use `f"{obj.__class__.__module__}.{obj.__class__.__name__}"`.

The following sample code determines which class in an inheritance chain an attribute belongs to:
import inspect
class BaseClass:
base_attr = "I am an attribute from BaseClass"
class SubClass(BaseClass):
sub_attr = "I am an attribute from SubClass"
# Define a function that finds the class an attribute belongs to
def find_attribute_owner(cls, attr_name):
    for base in inspect.getmro(cls):  # Walk the MRO (method resolution order)
        if attr_name in base.__dict__:
            return base
    return None

# Test
sub_obj = SubClass()
for name in ("sub_attr", "base_attr"):
    owner = find_attribute_owner(sub_obj.__class__, name)
    print(f"Attribute '{name}' belongs to class: {owner.__name__}")

Each class's `__dict__` stores only the attributes defined directly on that class, which is why the MRO walk is needed (iterating `SubClass.__dict__` alone would never find `base_attr`). For the classes in the example, the output is:

Attribute 'sub_attr' belongs to class: SubClass
Attribute 'base_attr' belongs to class: BaseClass
class MyClass:
def fun1(self):
'''
Fun1 comment
'''
pass
obj = MyClass()
print(obj.fun1.__doc__)
import inspect
class MyClass:
def fun1(self):
'''
Fun1 comment
'''
pass
print(inspect.getdoc(MyClass.fun1))
In Python, you can use the built-in `warnings` module to alert users at runtime (rather than at compile time) that a function has been deprecated and to suggest a replacement.
import warnings
def old_function(x, y):
warnings.warn(
"Function old_function() is deprecated, please use new_function(x, y) instead.",
category=DeprecationWarning,
stacklevel=2
)
return x + y
def new_function(x, y):
return x + y
result = old_function(3, 4)
print(result)
DeprecationWarning: Function old_function() is deprecated, please use new_function(x, y) instead.
result = old_function(3, 4)
7
- `warnings.warn()` shows the warning at runtime without interrupting the program.
- `category=DeprecationWarning` marks it as a deprecation warning.
- `stacklevel=2` makes the warning point at the caller's location rather than inside the function.
- To turn the warning into an error, run `python -W error::DeprecationWarning your_script.py`.
Decorators can be used to allow multiple old functions to share the same deprecation prompt logic:
import warnings
from functools import wraps
def deprecated(new_func_name):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
warnings.warn(
f"Function {func.__name__}() is deprecated, please use {new_func_name}() instead.",
category=DeprecationWarning,
stacklevel=2
)
return func(*args, **kwargs)
return wrapper
return decorator
@deprecated("new_function")
def old_function(x, y):
return x + y
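Deprecation warnings are filtered out by default in many contexts, which makes them easy to miss in tests. A small sketch, with a hypothetical `old_function`, showing how to capture the warning programmatically with `warnings.catch_warnings`:

```python
import warnings

def old_function(x, y):
    # Hypothetical deprecated function, mirroring the pattern above
    warnings.warn(
        "old_function() is deprecated, use new_function() instead.",
        category=DeprecationWarning,
        stacklevel=2,
    )
    return x + y

# Capture warnings programmatically (useful in tests) instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure DeprecationWarning is not filtered out
    result = old_function(3, 4)

print(result)                       # 7
print(caught[0].category.__name__)  # DeprecationWarning
```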
In Python, both `@staticmethod` and `@classmethod` define methods that can be called without instantiating the class, but their purpose and behavior differ.

A `@staticmethod` receives no implicit first argument (neither `self` nor `cls`). It behaves like a normal function that simply lives in the class's namespace.

class MyClass:
@staticmethod
def static_method(x, y):
return x + y
# Static methods can be called without creating an instance
result = MyClass.static_method(5, 10) # Result: 15
Key point: a `@staticmethod` can access neither the class (`cls`) nor the instance (`self`).
A `@classmethod` receives the class itself (`cls`) as its first parameter, which lets it access and modify class-level state.

class MyClass:
class_variable = 0
def __init__(self, value):
self.value = value
MyClass.class_variable += 1
@classmethod
def get_class_variable(cls):
return cls.class_variable
# Create instances
obj1 = MyClass(10)
obj2 = MyClass(20)
# Call the class method
print(MyClass.get_class_variable()) # Result: 2
Key point: a `@classmethod` can access class-level state through `cls`.
| Feature | @staticmethod | @classmethod |
|---|---|---|
| First parameter | No implicit first argument | `cls` (the class itself) |
| Instance access | No | No |
| Class access | No | Yes |
| Usage | Utility functions related to the class that need neither instance nor class | Manipulating class-level data, or providing alternative constructors |
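The "alternative constructor" use in the table is the classic role of `@classmethod`; a minimal sketch with hypothetical `Point`/`Point3D` classes:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, pair):
        # Using cls (not Point) means subclasses get this constructor for free
        return cls(pair[0], pair[1])

class Point3D(Point):
    pass  # hypothetical subclass reusing the alternative constructor

p = Point.from_tuple((3, 4))
q = Point3D.from_tuple((1, 2))
print(p.x, p.y)          # 3 4
print(type(q).__name__)  # Point3D
```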
Python itself does not provide a "default static method" or a "default class method", that is, the function of automatically executing a method the first time any static or class method is called. But we can achieve similar behavior through lazy loading techniques.
You can define a static variable in the class to track the initialized state, and then execute the initialization logic when the static or class method is called for the first time.
class MyClass:
initialized = False # Static variable, tracking whether it has been initialized
@staticmethod
def init_once():
if not MyClass.initialized:
print("Initialization logic execution...")
MyClass.initialized = True
@classmethod
def class_method(cls):
cls.init_once()
print("Call class method")
@staticmethod
def static_method():
MyClass.init_once()
print("Call static method")
# The first call to a class method triggers initialization
MyClass.class_method() # Output: "Initialization logic execution..." then "Call class method"
# The second call performs no initialization
MyClass.class_method() # Output: Call class method
# The static method also skips initialization, since it has already run
MyClass.static_method() # Output: Call static method
- `initialized`: tracks whether the class has been initialized; it starts as `False`.
- `init_once()`: runs the initialization logic and sets `initialized` to `True` to prevent repeated initialization.

Although Python has no built-in "default static method" or "default class method", combining a static variable with lazy loading lets you run initialization logic automatically the first time a static or class method is called, and guarantees it runs only once.
In Python, a thread is a mechanism for concurrency: it allows a program to perform multiple tasks within a single process. This is very useful for I/O-intensive operations (such as network communication or file reading and writing), preventing the program from blocking while it waits for external operations to complete.
In the standard CPython interpreter there is a Global Interpreter Lock (GIL). The GIL ensures that only one thread executes Python bytecode at any given time. This means:

- CPU-bound tasks gain no real parallelism from threads (use the `multiprocessing` module instead).
- I/O-bound tasks still benefit, because the GIL is released while a thread waits on I/O.

Python handles threads with the standard library `threading` module. There are two main ways to create a thread:
This is the simplest and most common usage.
import threading
import time
def task(name, delay):
"""Task function to be executed by the thread"""
print(f"Thread {name}: Starting...")
time.sleep(delay) # Simulate time-consuming I/O operations
print(f"Thread {name}: Task completed.")
#Create execution thread
thread1 = threading.Thread(target=task, args=("T1", 2))
thread2 = threading.Thread(target=task, args=("T2", 4))
# Start the thread
thread1.start()
thread2.start()
# Wait for all threads to complete (block the main thread until they finish)
thread1.join()
thread2.join()
print("All execution threads have completed. The main program exits.")
Subclassing the `threading.Thread` class suits more complex scenarios, encapsulating the thread's logic in a class.
import threading
import time
class MyThread(threading.Thread):
def __init__(self, name, delay):
super().__init__()
self.name = name
self.delay = delay
def run(self):
"""
When the thread starts, the run() method is automatically called.
Here you define the tasks to be performed by the thread.
"""
print(f"Thread {self.name}: Starting...")
time.sleep(self.delay)
print(f"Thread {self.name}: Task completed.")
#Create and start the thread
thread3 = MyThread("T3", 3)
thread3.start()
thread3.join()
print("Custom execution thread has been completed.")
When multiple threads access and modify shared data, race conditions may occur. You need to use a synchronization mechanism to protect your data:
- `Lock`: the most basic synchronization primitive. A thread calls `lock.acquire()` before touching a shared resource and `lock.release()` afterwards.
- `RLock` (reentrant lock): allows the same thread to acquire the lock multiple times.
- `Semaphore`: limits how many threads may access a resource at the same time.

import threading
import time
# Shared resources
counter = 0
# Create lock
lock = threading.Lock()
def increment_counter():
global counter
# Obtain the lock to ensure that only one thread can execute this block at the same time
lock.acquire()
try:
# Critical section: race-prone without the lock
current_value = counter
time.sleep(0.001) # Simulate switching
counter = current_value + 1
finally:
# Release lock
lock.release()
threads = []
for i in range(100):
t = threading.Thread(target=increment_counter)
threads.append(t)
t.start()
for t in threads:
t.join()
print(f"Final counter value: {counter}") # If there is no lock, this value may not be 100
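`Semaphore`, mentioned above, deserves a demonstration too; a small sketch (with illustrative names) that caps concurrency at two threads and records the observed peak:

```python
import threading
import time

# A Semaphore allowing at most 2 threads into the block at once (illustrative)
sem = threading.Semaphore(2)
active = 0
peak = 0
state_lock = threading.Lock()

def limited_task():
    global active, peak
    with sem:  # blocks while 2 threads are already inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulate work while holding the semaphore
        with state_lock:
            active -= 1

threads = [threading.Thread(target=limited_task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)  # never exceeds 2
```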
Python's `threading` module can create and manage threads, but due to operating-system constraints and design philosophy, Python provides no safe, direct way to forcibly stop (kill) a thread from the outside. A forced stop could leak resources or corrupt data.

Therefore, stopping a thread must rely on a **cooperative mechanism**: the thread itself periodically checks a stop flag and exits gracefully.
This is the safest and most recommended method of thread stopping. It requires the thread to periodically check an external variable (flag) in the loop of executing the task.
import threading
import time
# Shared stop flag
stop_flag = threading.Event()
def monitored_task(name, delay):
"""
A task function that periodically checks the stop flag
"""
print(f"Thread {name}: Starting...")
i = 0
while not stop_flag.is_set(): # Check whether the flag is set
i += 1
print(f"Thread {name}: Execution step {i}")
# Simulate time-consuming operations and check regularly
time.sleep(delay)
# Here you can set a limit on the number of executions to ensure that there is no infinite loop
if i >= 5:
break
print(f"Execution thread {name}: received a stop signal or the task ended, and exited gracefully.")
# --- Main program control block ---
#Create execution thread
worker_thread = threading.Thread(target=monitored_task, args=("Worker-1", 1))
# Start the thread
worker_thread.start()
print("\nMain program: The execution thread has been started, wait 3 seconds...\n")
time.sleep(3) # Let the thread run for a while
# Send stop signal
print("\nMain program: Set stop flag...\n")
stop_flag.set() # Set Event and let is_set() return True
# Wait for the thread to finish gracefully and exit (usually quickly)
worker_thread.join()
print("\nMain program: The thread has been safely stopped and joined. The program exited.")
- `threading.Event()`: a simple synchronization primitive that maintains an internal boolean flag.
- `stop_flag.is_set()`: checks whether the flag is `True` (i.e. whether a stop signal was issued).
- `stop_flag.set()`: sets the flag to `True`, sending the stop signal.
- `while not stop_flag.is_set():`: the thread's main control loop; it keeps running as long as no stop signal has been received.
- `worker_thread.join()`: makes the main program wait for the worker thread to finish (exit gracefully) before continuing, an important step for a clean shutdown.

Although there are experimental or unsafe ways to force a stop, such as low-level `_thread` tricks or injecting an exception into a thread, these can cause resource leaks, corrupted shared state, and locks that are never released.
Therefore, in Python, you should always stick to using a coordinated flagging mechanism to stop threads.
This is the most recommended approach in multi-threaded environments. Instead of having every thread contend for the same object, each thread keeps its own independent copy. In Python you can use `threading.local()` for this.
import threading
# Create a thread area to store objects
thread_data = threading.local()
def get_service():
# If the current thread does not have its own service, create one
if not hasattr(thread_data, 'service'):
print(f"Creating a new connection for thread {threading.current_thread().name}")
thread_data.service = create_new_connection()  # placeholder: substitute your real connection factory
return thread_data.service
def task():
service = get_service()
# Perform operations...
If the object truly must be shared (such as writing to the same file or updating a global counter), you must use a `Lock`. This ensures that only one thread accesses the object at a time, avoiding race conditions.
lock = threading.Lock()
def safe_task():
with lock:
# Within this block, other threads must wait
shared_object.do_something()
---
If you want to avoid the risk of lock contention or crashes that come with multiple threads, there are two main alternatives to consider:
This is currently the most popular approach in Python (it is the core principle behind frameworks such as FastAPI). Everything runs in a single thread, and the event loop switches between tasks while they wait on I/O (API requests, database queries, and so on).
import asyncio

async def fetch_api(url):
    # A real program would use an async client such as aiohttp here;
    # asyncio.sleep simulates waiting on I/O
    await asyncio.sleep(0.1)
    return f"response from {url}"

async def main():
    # Both requests run concurrently, switching within a single thread
    results = await asyncio.gather(fetch_api("url1"), fetch_api("url2"))
    print(results)

asyncio.run(main())
Python threads are limited by the GIL (Global Interpreter Lock) and cannot truly run operations in parallel. `multiprocessing` sidesteps this by launching multiple independent Python interpreter processes.
from multiprocessing import Process
def task(name):
print(f"Process {name} is executing")
if __name__ == "__main__":
p = Process(target=task, args=('A',))
p.start()
p.join()
If you want to completely decouple tasks, you can use Celery or Redis Queue. Throw the task into the queue and let the back-end Worker (which may be multiple processes or multiple machines) pick up and execute it.
| Approach | How it works | When to use |
|---|---|---|
| Thread-local | Each thread gets its own copy | API services, database connections |
| asyncio | Single-thread task switching (asynchronous) | Highly concurrent network requests (recommended) |
| multiprocessing | Independent memory spaces | CPU-bound computation; avoids sharing conflicts entirely |
When you use Thread-Local Storage (TLS), the purpose is to protect those "non-thread-safe" objects (such as API Service, database connections). But if data needs to be exchanged between threads (for example, the results downloaded by thread A need to be processed by thread B), you need to establish a special "communication channel".
Python's `queue.Queue` is thread-safe. It is the most standard and safest way to pass information between threads, and it handles all the locking logic internally.
import threading
import queue
# Create a global queue that all execution threads can access
task_queue = queue.Queue()
def producer():
    # Produce data and put it on the queue
    data = {"video_id": "abc", "status": "pending"}
    task_queue.put(data)

def consumer():
    # Take data off the queue (blocks until an item is available)
    data = task_queue.get()
    # Process data...
    task_queue.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
task_queue.join()  # Wait until every queued item has been processed
If you need to share a large list or dictionary, you can use ordinary global variables, but every access must be paired with a `threading.Lock`.
shared_results = []
results_lock = threading.Lock()
def task():
result = "Some operation results"
#Lock before accessing shared resources
with results_lock:
shared_results.append(result)
# Automatically unlock after leaving the with block
Sometimes you don't want to share "data" but "state" (for example: telling other threads that the API has been initialized).
api_ready = threading.Event()
def initializer():
#Perform initialization
api_ready.set() # Send signal
def worker():
api_ready.wait() # Wait for the signal until the initializer calls set()
print("Start working")
| Content type | Where it lives | How to manage it |
|---|---|---|
| Tool objects (API, DB connections) | Thread-local | Each thread keeps its own copy to avoid crashes. |
| Task information (IDs, parameters) | Queue (global) | Pass through a thread-safe `queue.Queue`. |
| Computed results (statistics) | Global list/dict | Must be paired with `threading.Lock`. |
To put it simply: **every thread brings its own private tools (connections), while public information (data) goes through a Queue or behind a Lock.**
In a multi-threaded environment, a race condition occurs when several threads try to modify the same global variable or shared resource (a file, database connection, or global list) at the same time, corrupting the data. `threading.Lock` is a synchronization primitive that ensures only one thread at a time can enter a protected block of code.

The safest, recommended usage is the `with` statement. It guarantees the lock is released correctly even if an exception occurs inside the block, avoiding deadlock.
import threading
# 1. Create a lock object
my_lock = threading.Lock()
shared_counter = 0
def increment_task():
global shared_counter
# 2. Use with to automatically manage acquire() and release()
with my_lock:
# The code in this block can only be executed by one thread at a time
temp = shared_counter
temp += 1
shared_counter = temp
# Start multiple thread tests
threads = [threading.Thread(target=increment_task) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Final count: {shared_counter}")
Although not recommended, sometimes finer-grained control is needed. You must call `acquire()` manually to take the lock and call `release()` inside a `finally` block.
lock = threading.Lock()
def manual_task():
    lock.acquire()  # Acquire the lock; blocks (waits) here if it is already held
    try:
        # Execute the task
        pass
    finally:
        lock.release()  # Must be released, otherwise other threads can never proceed
`threading.Lock` is not reentrant. This means that if a thread that already holds the lock requests the same lock again, it deadlocks on itself. If a thread may need to acquire the same lock more than once, use `threading.RLock()` (a reentrant lock) instead, for example when several methods that each take the lock call one another on a shared object such as a Google API service client. Also note that excessive locking degrades performance, because threads end up "queuing for execution." When possible, prefer `queue.Queue` or the thread-local storage discussed earlier; these approaches are usually more efficient and less error-prone than frequent locking.
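The reentrancy difference can be shown in a few lines; this sketch (function names are illustrative) would deadlock on `inner()` if `rlock` were a plain `Lock`:

```python
import threading

rlock = threading.RLock()

def outer():
    with rlock:        # first acquisition by this thread
        return inner()

def inner():
    with rlock:        # same thread re-acquires: fine with RLock,
        return "done"  # would deadlock forever with a plain Lock

print(outer())  # done
```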
In Python, `async def` and `await` are the core syntax of asynchronous programming. They let the program handle other tasks instead of stalling while waiting for I/O (such as network requests or file reads), which can greatly improve performance.
When you prefix a function definition with `async`, it becomes a coroutine function. Calling it does not execute the body immediately; instead it returns a coroutine object.
async def fetch_data():
    print("Start fetching data...")
    # Simulate a time-consuming task
    return {"data": "success"}
# A direct call will only get the coroutine object and will not execute print
result = fetch_data()
print(result) # Output: <coroutine object fetch_data at ...>
`await` can only be used inside an `async def` function. It suspends the current coroutine, waits for the awaited task to complete, and returns its value. While waiting, the event loop can run other asynchronous tasks.
import asyncio
async def main():
    # Use await to run the coroutine and collect its result
    data = await fetch_data()
    print(f"Fetched result: {data}")
# Start the entry for asynchronous programs
asyncio.run(main())
| Syntax | Purpose | Notes |
|---|---|---|
| `async def` | Declares an asynchronous function | Calling it returns a coroutine object, not the result. |
| `await` | Waits for an asynchronous task to complete | Can only be written inside `async` functions. |
| `asyncio.run()` | Starts the outermost asynchronous entry point | Usually called only once per program. |
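The performance claim above is easy to demonstrate: two coroutines that each wait 0.2 seconds finish together in roughly 0.2 seconds, not 0.4, when run with `asyncio.gather`. This is a minimal sketch (the `fetch` function and its delays are illustrative):

```python
import asyncio
import time

async def fetch(delay, value):
    await asyncio.sleep(delay)  # non-blocking wait; the loop can run other tasks
    return value

async def main():
    t0 = time.perf_counter()
    # Both coroutines wait concurrently
    a, b = await asyncio.gather(fetch(0.2, "A"), fetch(0.2, "B"))
    elapsed = time.perf_counter() - t0
    return a, b, elapsed

a, b, elapsed = asyncio.run(main())
print(a, b, f"{elapsed:.2f}s")  # total is ~0.2s, not 0.4s
```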
Common pitfalls:
- If you forget `await`, you get a coroutine object instead of the result, so reading an attribute such as `result.text` raises `AttributeError`.
- Using `await` outside an `async` function raises `SyntaxError`; make sure the enclosing function is declared with `async`.
- Calling `time.sleep()` inside an `async` function blocks the entire event loop. Use `await asyncio.sleep()` instead.

Imagine you are cooking: while the water boils (an `await`), you go chop vegetables, and the sound of boiling water (an event notification) calls you back to deal with it.

`loop.run_until_complete()` is a lower-level method in the `asyncio` module that runs a coroutine until it finishes and returns its result. Since Python 3.7 the official recommendation is `asyncio.run()`, but this method is still needed in some specific situations, such as reusing an event loop or customizing startup logic.
To use `run_until_complete`, first obtain or create an event loop object, then pass the coroutine to it.
import asyncio
async def my_task():
    await asyncio.sleep(1)
    return "Task completed"
# 1. Get event loop
loop = asyncio.get_event_loop()
# 2. Execute the coroutine until completion and obtain the return value directly
result = loop.run_until_complete(my_task())
print(result) #Output: Task completed
Both can return the coroutine's result, but they manage the loop's life cycle differently:
| Feature | asyncio.run() (recommended) | loop.run_until_complete() |
|---|---|---|
| Automation | High: creates the loop, closes it, and cleans up tasks automatically. | Low: the loop's life cycle must be managed manually. |
| Reusability | Low: each call creates a new loop. | High: multiple tasks can run on the same loop. |
| Restrictions | Cannot be called from inside an already running loop. | More flexible; often used in legacy code or test environments. |
If you are in a script that is already running and want to ensure that a coroutine is executed and the value is obtained, you can use this method:
import asyncio
async def add(a, b):
    return a + b
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    # Run the first task
    val1 = loop.run_until_complete(add(10, 20))
    # Run the second task
    val2 = loop.run_until_complete(add(val1, 5))
    print(f"Final result: {val2}")
finally:
    # The loop must be closed manually
    loop.close()
If you want several coroutines to run concurrently and collect their return values together, combine it with `asyncio.gather`:
async def task(id):
    return f"result {id}"

loop = asyncio.get_event_loop()
# gather wraps several coroutines into one task; run_until_complete waits for all of them
results = loop.run_until_complete(asyncio.gather(task(1), task(2), task(3)))
print(results)  # Output: ['result 1', 'result 2', 'result 3']
Common pitfalls:
- Calling `run_until_complete` inside an `async def` function raises "This event loop is already running"; inside asynchronous functions, always use `await`.
- Remember to call `loop.close()` to release resources, otherwise the program may not exit cleanly or may emit warnings.
- `run_until_complete` returns the coroutine's `return` value; if an uncaught exception occurs in the coroutine, that exception propagates to the caller.

`asyncio.run()` is a high-level API introduced in Python 3.7 and is the most recommended way to start an asynchronous program. It automatically creates the event loop, runs the coroutine, and closes the loop when finished. Most importantly, it returns the coroutine's `return` value directly.
Simply pass the call to an `async def` function into `asyncio.run()`, and it returns the result just like a normal synchronous function.
import asyncio
async def calculate_score(name):
    print(f"Calculating the score of {name}...")
    await asyncio.sleep(1)  # Simulate a time-consuming task
    return 95
# Get the result of return directly
final_score = asyncio.run(calculate_score("Zhang Xiaoming"))
print(f"The final score is: {final_score}") # Output: 95
Usually we define a `main()` function as the entry point, collect the results of all subtasks inside it, and return everything through a single `asyncio.run(main())` call.
async def task_a():
    return "apple"

async def task_b():
    return "banana"

async def main():
    # Use gather inside main to run both tasks concurrently
    results = await asyncio.gather(task_a(), task_b())
    return results  # Returns a list
# Get the return value of main through asyncio.run
all_fruits = asyncio.run(main())
print(all_fruits) # Output: ['apple', 'banana']
| Rule | Description |
|---|---|
| Single entry point | In one execution, `asyncio.run()` is usually called only once. |
| Automatic cleanup | It automatically cancels remaining tasks and shuts down the thread pool, which is very safe. |
| No nesting | `asyncio.run()` cannot be called from inside an `async def` function. |
If you are inside an asynchronous function and want the result of another asynchronous function, use `await`, not `asyncio.run()`.
# Wrong
async def sub_task():
    return 10

async def main():
    # Raises: RuntimeError: asyncio.run() cannot be called from a running event loop
    res = asyncio.run(sub_task())

# Correct
async def main():
    res = await sub_task()  # Use await inside an asynchronous context
Tips:
- Treat `asyncio.run()` as the program's start button and place it inside the `if __name__ == "__main__":` block.
- If the coroutine raises, `asyncio.run()` re-raises the exception directly; it is a good idea to wrap the `run` call in `try...except`.
- The coroutine passed to `asyncio.run()` stays unchanged if you later embed it in a larger asynchronous program and `await` it instead.

NumPy (Numerical Python) is the most important scientific computing library in Python. It provides the efficient multi-dimensional array object `ndarray` along with a large library of mathematical functions for manipulating these arrays. It is the foundation of data science and machine learning libraries such as Pandas, Scikit-learn, and TensorFlow. Its vectorized operations run in compiled code and are far faster than plain Python `for` loops.

import numpy as np
# Create one-dimensional and two-dimensional arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([[1, 2], [3, 4]])
# Quickly create a specific array
zeros = np.zeros((3, 3)) # 3x3 matrix with all zeros
ones = np.ones((2, 4)) # 2x4 matrix of all 1s
eye = np.eye(3) # 3x3 identity matrix
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
| Function | Code example | Description |
|---|---|---|
| Shape inspection | `arr.shape` | Returns the size of each dimension, e.g. `(3, 2)`. |
| Reshaping | `arr.reshape(1, 6)` | Changes dimensions without changing the data. |
| Matrix multiplication | `np.dot(a, b)` or `a @ b` | Performs linear-algebra matrix multiplication. |
| Statistics | `np.mean()`, `np.std()` | Compute the mean, standard deviation, maximum, and minimum. |
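The operations in the table above can be tried together on one small array; a quick sketch (the sample values are arbitrary):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.shape)          # (2, 3)
print(a.reshape(3, 2))  # same data, new dimensions
print(a.mean())         # 3.5
print(a.std())          # standard deviation of all elements
print(a @ a.T)          # 2x2 matrix product of a with its transpose
```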
NumPy's slicing syntax is similar to Python List, but more powerful and supports multi-dimensional simultaneous slicing:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Get the second row (index 1)
print(arr[1, :]) # [4 5 6]
# Get the 2x2 sub-matrix in the lower right corner
print(arr[1:, 1:]) # [[5, 6], [8, 9]]
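Beyond plain slices, NumPy also supports boolean masks and fancy (integer-list) indexing, which have no List equivalent; a short sketch on the same array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Boolean mask: select all elements greater than 5 (returns a flat array)
mask = arr > 5
print(arr[mask])    # [6 7 8 9]

# Fancy indexing: pick rows 0 and 2 in one step
print(arr[[0, 2]])  # [[1 2 3] [7 8 9]]
```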
Installation: `pip install numpy` or `conda install numpy`. Verify the installed version with:
import numpy as np
print(np.__version__)
As mentioned earlier, Numba understands NumPy's array structures natively and can compile complex NumPy operations into machine code, approaching the execution speed of native C.
Numba is an open-source JIT (Just-In-Time) compiler designed to accelerate Python code that processes large amounts of data. It translates Python functions into machine code whose speed is comparable to C, C++, or Fortran, and it is especially well suited to NumPy array operations.
The easiest way to use Numba is to add the `@jit` or `@njit` decorator.
from numba import njit
import numpy as np
# @njit represents "nopython" mode, which ensures not to enter the Python parser and is the fastest
@njit
def fast_function(n):
    total = 0
    for i in range(n):
        total += i
    return total
# The first call will compile, the second call will directly execute the machine code
print(fast_function(10000000))
| Mode | Decorator | Description |
|---|---|---|
| nopython mode | `@njit` | Recommended. Bypasses the Python interpreter entirely and compiles straight to machine code; raises an error if any part of the code cannot be compiled. |
| object mode | `@jit` | Falls back to the Python interpreter for anything it cannot compile; performance gains are usually limited. |
To exploit your CPU's multiple cores, simply enable `parallel=True` and use `prange`.
from numba import njit, prange
@njit(parallel=True)
def parallel_sum(A):
    s = 0
    # prange automatically distributes loop iterations across CPU cores
    for i in prange(A.shape[0]):
        s += A[i]
    return s
Installation: `pip install numba` or `conda install numba` (the `llvmlite` library is installed automatically). Verify with:
import numba
print(numba.__version__)
Note that Numba performs poorly on `dict`, `list`, or custom objects; it is designed for numeric loops and NumPy arrays.
Pandas is a Python-based data analysis and manipulation tool designed for processing structured data, such as tabular data or time series.
import pandas as pd
# Create DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['Taipei', 'Taichung', 'Kaohsiung']}
df = pd.DataFrame(data)
# View data
print(df)
# Filter rows where Age is greater than 28
filtered_df = df[df['Age'] > 28]
print(filtered_df)
Pandas provides efficient, flexible, and intuitive operations that make it particularly well suited to data analysis and processing; it is one of the indispensable tools in data science and machine learning, and both beginners and advanced users benefit from its approachable design and extensive functionality.
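A typical analysis step beyond filtering is aggregation; this small sketch (sample data invented for illustration) sums a column per group with `groupby`:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Taipei", "Taipei", "Taichung"],
    "Sales": [100, 150, 80],
})

# Total sales per city: split by City, then sum the Sales column
summary = df.groupby("City")["Sales"].sum()
print(summary["Taipei"])    # 250
print(summary["Taichung"])  # 80
```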
First install the `googletrans` package. Enter the following command on the command line:
pip install googletrans==4.0.0-rc1
Note: make sure you install version `4.0.0-rc1`, as older versions may no longer work.
Here is an example of translating English into Traditional Chinese:
from googletrans import Translator
#Initialize the Translator object
translator = Translator()
# Translate text
text = "Hello, how are you?"
result = translator.translate(text, src="en", dest="zh-tw")
# Output translation results
print("original text:", text)
print("Translation:", result.text)
Many languages are supported; here are the common language codes:
- `en`: English
- `zh-tw`: Traditional Chinese
- `zh-cn`: Simplified Chinese
- `ja`: Japanese
- `ko`: Korean
- `fr`: French
- `de`: German

Googletrans is an unofficial Google Translate API and may stop working when Google changes its service. If you find that the translation function no longer works, consider another translation API, such as Google's official Cloud Translation API.
DeepL provides high-accuracy translation services, but requires an API key to use its developer API.
The translation tool provided by Microsoft supports multi-language translation, but requires the use of Azure API key settings.
The translation service provided by Amazon Web Services (AWS) provides efficient translation of multilingual texts and needs to be accessed through the API key provided by AWS.
LibreTranslate is an open source translation tool that can set up its own server and does not require an API key. Some third-party public servers also offer the option of using them without an API key.
TextBlob is a natural language processing tool with built-in Google Translate support. Older versions do not require an API key, but pay attention to which versions are still supported.
MyMemory provides memory-based translation. Some functions do not require an API key, but advanced use may require application.
Among Googletrans' competitors, LibreTranslate and some versions of TextBlob offer options that don't require an API key. If you need a tool that is completely free and requires no additional setup, consider these options.
OpenCC (Open Chinese Convert) is an open-source project dedicated to conversion between Simplified and Traditional Chinese. It goes beyond character-by-character substitution: crucially, it handles vocabulary-level conversion and regional differences in word usage (Mainland China, Taiwan, Hong Kong).
In Python, the most commonly used wrapper library is `opencc-python-reimplemented`.
# Installation instructions
pip install opencc-python-reimplemented
# Basic code example
from opencc import OpenCC
# Initialize; 's2t' means Simplified-to-Traditional
cc = OpenCC('s2t')
text = "人工智能改变了世界"
result = cc.convert(text)
print(result) # Output: 人工智能改變了世界
| Schema name | illustrate | Applicable situations |
|---|---|---|
| s2t | Simplified Chinese to Traditional Chinese | Convert standard characters from simple to traditional. |
| t2s | Traditional Chinese to Simplified Chinese | Standard characters are converted from traditional to simplified. |
| s2twp | Simplified Chinese to Taiwanese Traditional (with vocabulary) | Most commonly used in Taiwan. Converts「软件」to「軟體」. |
| tw2s | Taiwan Traditional Chinese to Simplified Chinese | Convert Taiwanese idioms to Mainland China simplified Chinese. |
| s2hk | Simplified to Hong Kong Traditional | Follows Hong Kong character conventions (e.g.「裏」rather than「裡」). |
If you don't want to write a program, OpenCC can also be operated directly in the terminal:
# Convert input.txt to output.txt, using Simplified to Taiwanese Traditional mode
opencc -i input.txt -o output.txt -c s2twp.json
Notes:
- Install the `reimplemented` version of the package named above.
- The conversion modes are defined by `.json` dictionary files; you can inspect the OpenCC installation directory to see how they work.

The following example shows how to query all Zhuyin (phonetic) readings of a Chinese character in Python. We use the `pypinyin` package to obtain the pinyin of each character, together with a custom correspondence table from pinyin to Zhuyin symbols.
pip install pypinyin
The following is the Python code, including the phonetic correspondence table and the program logic for querying and generating HTML results.
from pypinyin import pinyin, Style
from jinja2 import Template
# Simple correspondence table from Pinyin to Zhuyin
pinyin_to_zhuyin = {
"a": "ㄚ", "ai": "ㄞ", "an": "ㄢ", "ang": "ㄤ", "ao": "ㄠ",
"ba": "ㄅㄚ", "bai": "ㄅㄞ", "ban": "ㄅㄢ", "bang": "ㄅㄤ", "bao": "ㄅㄠ",
# Omit some correspondences and need to improve them yourself
"hao": "ㄏㄠ", "hao": "ㄏㄠ", "hǎo": "ㄏㄠˇ", "hào": "ㄏㄠˋ"
}
# Query all pinyin of Chinese characters and convert them to pinyin
def get_zhuyin(char):
    pinyins = pinyin(char, style=Style.NORMAL, heteronym=True)
    unique_pinyins = set(pinyins[0])
    zhuyins = {p: pinyin_to_zhuyin.get(p, p) for p in unique_pinyins}  # Map pinyin to Zhuyin
    return list(zhuyins.values())
# Query characters
chinese_char = '好' # Replace with any other character
zhuyin_results = get_zhuyin(chinese_char)
Notes:
- The `pinyin_to_zhuyin` dictionary maps pinyin syllables to Zhuyin symbols; you need to extend this table yourself.
- `heteronym=True` ensures that all readings of a polyphonic character are returned.

import os
# Check if it is a file
if os.path.isfile("example.txt"):
    print("File exists")

# Check whether it is a folder
if os.path.isdir("my_folder"):
    print("The folder exists")

# Check whether the file or folder exists
if os.path.exists("path/to/check"):
    print("Path exists")

from pathlib import Path

path = Path("example.txt")
if path.exists():
    print("Path exists")
if path.is_file():
    print("It is a file")
if path.is_dir():
    print("It is a folder")
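These checks are easiest to experiment with inside a throwaway directory, so the results do not depend on what happens to be on your disk; a small self-contained sketch:

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory that is removed automatically
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    f = base / "example.txt"

    assert not f.exists()   # nothing there yet
    f.write_text("hello")   # create the file
    assert f.is_file()      # now it exists and is a regular file
    assert base.is_dir()    # the containing directory is a folder
    print(f.read_text())    # hello
```

After the `with` block exits, the whole directory (and `example.txt` with it) is gone again.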
In Python, the standard and most capable modules for copying, moving, renaming, and deleting files are `shutil` (for high-level operations) and `os` (for basic path operations).
The copy operation has different functions depending on whether metadata needs to be preserved (such as creation time, permissions):
import shutil
# Copy a single file
shutil.copy2('source.txt', 'destination.txt')
#Copy the entire folder (the destination folder must not exist)
shutil.copytree('my_folder', 'backup_folder')
Moving and renaming share very similar underlying logic. `shutil.move` works across disks (partitions), while `os.rename` is typically used to rename within the same disk.
| Function | Common command | Description |
|---|---|---|
| Move a file or directory | `shutil.move(src, dst)` | Moves a file or an entire directory to a new path; works across partitions. |
| Rename a file | `os.rename(old, new)` | Changes a file or directory name. |
import shutil
import os
# Move files
shutil.move('test.txt', 'archive/test.txt')
# Rename
os.rename('old_name.txt', 'new_name.txt')
Delete operations should be used with caution, as these commands usually do not go through the recycle bin, but directly remove them permanently.
# Delete a single file
if os.path.exists('temp.txt'):
    os.remove('temp.txt')
# Forcefully delete the entire folder
shutil.rmtree('old_data_folder')
Before performing a copy or move, check whether the path exists to avoid program crashes:
Tips:
- Use `os.path.join()` or the `pathlib` module to combine paths, so the program works on both Windows (backslashes) and Linux (forward slashes).
- Wrap file operations in `try...except` to handle missing files or permission errors gracefully.
- When a file with the same name already exists at the destination, `shutil.copy` and `shutil.move` usually overwrite it directly; confirm whether a rename is needed before executing.
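These precautions can be combined into a small helper; this sketch (the `safe_copy` name is illustrative) checks the source exists and refuses to overwrite an existing destination:

```python
import os
import shutil
import tempfile

def safe_copy(src, dst):
    """Copy src to dst, but refuse to overwrite an existing destination."""
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    if os.path.exists(dst):
        raise FileExistsError(dst)
    shutil.copy2(src, dst)  # copy2 also preserves metadata

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")  # os.path.join keeps paths cross-platform
    dst = os.path.join(tmp, "b.txt")
    with open(src, "w") as fh:
        fh.write("data")

    safe_copy(src, dst)
    copied = os.path.exists(dst)

    refused = False
    try:
        safe_copy(src, dst)           # destination now exists
    except FileExistsError:
        refused = True

print(copied, refused)  # True True
```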
Python's `logging` package is a powerful built-in toolkit for managing logs in applications. Whether for recording debugging information, tracking errors, or monitoring performance, `logging` provides multi-level, multi-format logging. It supports the levels `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. The following is a basic `logging` configuration example:
import logging
# Set logger
logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
# Set Handler
console_handler = logging.StreamHandler()
file_handler = logging.FileHandler('app.log')
# Set Formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
console_handler.setFormatter(formatter)
file_handler.setFormatter(formatter)
# Add Handler to logger
logger.addHandler(console_handler)
logger.addHandler(file_handler)
# Test different levels of logs
logger.debug('This is a debug message')
logger.info('This is an information message')
logger.warning('This is a warning message')
logger.error('This is an error message')
logger.critical('This is a serious message')
The `logging` package supports the following log levels:
- `DEBUG`: the lowest level, used for debugging information.
- `INFO`: general information, such as system status.
- `WARNING`: warnings that do not stop the program.
- `ERROR`: errors that usually indicate a real problem.
- `CRITICAL`: the most serious errors, which may terminate the program.

You can use the `Formatter` class to define the log output format. For example:
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
This format will output something similar to the following:
2024-10-25 10:00:00 - my_logger - INFO - This is an information message
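The levels also act as filters: messages below the logger's configured level are dropped. This sketch (the logger name `level_demo` is illustrative) captures output in a string buffer to show `INFO` being discarded under a `WARNING` threshold:

```python
import io
import logging

buffer = io.StringIO()
logger = logging.getLogger("level_demo")
logger.setLevel(logging.WARNING)  # DEBUG and INFO are filtered out
logger.propagate = False          # keep output away from the root logger
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.info("routine status update")  # below WARNING: dropped
logger.warning("disk almost full")    # at or above WARNING: emitted

print(buffer.getvalue())  # WARNING:disk almost full
```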
The `logging` package provides a variety of log-management options, letting developers customize levels, output formats, and destinations as needed, improving application maintainability and debugging efficiency.
In Python's `logging` package, a `Handler` is the key component that decides where log records are sent. Different handlers can route logs to the console, a file, the network, or even email. `StreamHandler` is one of the most commonly used handlers; it writes logs to the console.
Common handlers in `logging` include:
- `StreamHandler`: writes logs to `stdout` or `stderr` (usually the console).
- `FileHandler`: writes logs to a file.
- `NullHandler`: discards log output, for cases where no display is needed.
- `SMTPHandler`: sends logs via email.

`StreamHandler` is the usual choice for console output; by default it sends log messages to standard error (`stderr`). Attaching a `StreamHandler` shows logs directly on the console, which is useful for real-time monitoring of system status.
If you do not want the logs to be displayed on the console, you can remove them from the LoggerStreamHandler, or use insteadNullHandlerto avoid output.
The following example shows how to configureloggingand remove the console outputStreamHandler:
import logging
# Set Logger
logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
# Add StreamHandler to output to console
console_handler = logging.StreamHandler()
logger.addHandler(console_handler)
# Test output
logger.info("This message will be displayed on the console")
# Remove console output
logger.removeHandler(console_handler)
logger.info("This message will not be displayed on the console")
Can be usedNullHandlerTo disable all output of the log:
import logging
# Set up Logger and add NullHandler
logger = logging.getLogger('my_logger')
logger.addHandler(logging.NullHandler())
# This message will not be displayed in the console
logger.info("This message will not be displayed")
HandleryesloggingA component in a suite that defines the output location. throughStreamHandler, log messages can be displayed on the console for real-time monitoring. If console output is not needed, it can be removedStreamHandleror useNullHandlerto disable logging output.
To include a thread ID in a `logging.Formatter`, use the `%(thread)d` attribute. It displays the unique integer ID the system assigns to the thread.
import logging
# Add %(thread)d to display thread ID
log_formatter = logging.Formatter(
    '%(asctime)s.%(msecs)03d %(thread)d %(levelno)s %(message)s',
    "%Y-%m-%d %H:%M:%S"
)
#Example output:
# 2025-12-20 16:30:45.123 140735612345678 20 This is a log message
Besides the integer ID, you can display the thread's name with the `%(threadName)s` attribute if that is more readable.
This code shows how to apply this format in a multi-threaded environment:
import logging
import threading
# 1. Set formatter
log_format = '%(asctime)s.%(msecs)03d [Thread:%(thread)d] %(levelno)s %(message)s'
date_format = "%Y-%m-%d %H:%M:%S"
formatter = logging.Formatter(log_format, date_format)
# 2. Set log processor
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger = logging.getLogger("ThreadTest")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# 3. Test function
def task():
    logger.info("Subthread is executing")
# Start testing
logger.info("Main thread started")
t = threading.Thread(target=task)
t.start()
t.join()
Python has built-in support for SQLite and is suitable for small applications.
import sqlite3
conn = sqlite3.connect("example.db")
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
conn.commit()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
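Two `sqlite3` features worth knowing beyond the basics above: an in-memory database (`":memory:"`) for experiments, and `sqlite3.Row` so rows can be read by column name. A short sketch:

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become dict-like objects
cur = conn.cursor()

cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
# Bulk insert with parameterized placeholders (also prevents SQL injection)
cur.executemany("INSERT INTO users (name, age) VALUES (?, ?)",
                [("Alice", 25), ("Bob", 30)])
conn.commit()

cur.execute("SELECT name, age FROM users WHERE age > ?", (26,))
row = cur.fetchone()
print(row["name"], row["age"])  # Bob 30
conn.close()
```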
Used to connect to the MySQL database.
pip install pymysql
import pymysql
conn = pymysql.connect(host="localhost", user="root", password="password", database="test")
cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
Used to connect to PostgreSQL.
pip install psycopg2
import psycopg2
conn = psycopg2.connect(dbname="testdb", user="user", password="password", host="localhost")
cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
Used to connect to Microsoft SQL Server.
pip install pyodbc
import pyodbc
conn = pyodbc.connect("DRIVER={SQL Server}; SERVER=localhost; DATABASE=test; UID=user; PWD=password")
cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
Suitable for file-based databases.
pip install pymongo
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["testdb"]
collection = db["users"]
collection.insert_one({"name": "Alice", "age": 25})
print(list(collection.find()))
Suitable for caching and high-performance Key-Value access.
pip install redis
import redis
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
r.set("name", "Alice")
print(r.get("name"))
Suitable for full-text search and analysis.
pip install elasticsearch
from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")
doc = {"name": "Alice", "age": 25}
es.index(index="users", document=doc)
print(es.search(index="users", query={"match_all": {}}))
Supports multiple SQL databases and provides ORM functions.
pip install sqlalchemy
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker
engine = create_engine("sqlite:///example.db")
Base = declarative_base()
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
session.add(User(name="Alice"))
session.commit()
print(session.query(User).all())
Summary:
- Relational databases: `sqlite3` (built-in), `pymysql` (MySQL), `psycopg2` (PostgreSQL), `pyodbc` (SQL Server).
- NoSQL: `pymongo` (MongoDB), `redis` (Redis), `elasticsearch` (Elasticsearch).
- ORM: `SQLAlchemy` (supports multiple SQL databases).

Install PyMySQL using the following command:
pip install pymysql
Connect to the MySQL server using PyMySQL:
import pymysql
# Establish connection
conn = pymysql.connect(
    host="localhost",
    user="your_user",
    password="your_password",
    database="your_database",
    charset="utf8mb4",
    cursorclass=pymysql.cursors.DictCursor  # Return rows as dictionaries
)
#Create cursor
cursor = conn.cursor()
# Query information
cursor.execute("SELECT * FROM your_table")
result = cursor.fetchall()
for row in result:
    print(row)
# Close connection
cursor.close()
conn.close()
When executing INSERT, UPDATE, and DELETE statements, changes need to be submitted:
try:
    with conn.cursor() as cursor:
        sql = "INSERT INTO users (name, age) VALUES (%s, %s)"
        cursor.execute(sql, ("Alice", 25))
    conn.commit()  # Commit the changes
except Exception as e:
    conn.rollback()  # Roll back on error
    print("An error occurred:", e)
You can use `callproc` to call a stored procedure:
with conn.cursor() as cursor:
    cursor.callproc("your_stored_procedure", (param1, param2))
    result = cursor.fetchall()
    print(result)
Use `executemany` to perform bulk inserts:
data = [("Bob", 30), ("Charlie", 28), ("David", 35)]
sql = "INSERT INTO users (name, age) VALUES (%s, %s)"
with conn.cursor() as cursor:
    cursor.executemany(sql, data)
conn.commit()
Use parameterized queries to prevent SQL injection attacks:
name = "Alice"
sql = "SELECT * FROM users WHERE name = %s"
with conn.cursor() as cursor:
    cursor.execute(sql, (name,))
    result = cursor.fetchall()
    print(result)
Make sure to close the database connection when the program ends:
conn.close()
PyMySQL automatically converts `datetime.datetime` objects to MySQL `DATETIME`, so they can be passed in directly.
import pymysql
import datetime
conn = pymysql.connect(
    host="localhost",
    user="root",
    password="pwd",
    database="testdb",
    cursorclass=pymysql.cursors.DictCursor
)
with conn:
    with conn.cursor() as cur:
        now = datetime.datetime.now()
        sql = """
            INSERT INTO logs (msg, created_at)
            VALUES (%s, %s)
        """
        cur.execute(sql, ("hello world", now))
        conn.commit()

        today = datetime.date.today()
        sql = "INSERT INTO records (start_date) VALUES (%s)"
        cur.execute(sql, (today,))
        conn.commit()
Fetched `DATETIME` fields automatically become `datetime.datetime` objects.
sql = "SELECT id, created_at FROM logs ORDER BY id DESC LIMIT 1"
cur.execute(sql)
row = cur.fetchone()
dt = row["created_at"]
print(dt, type(dt))
# For example: 2025-07-01 15:22:33
formatted = dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
start = datetime.datetime(2025, 1, 1, 0, 0, 0)
sql = """
SELECT * FROM logs
WHERE created_at >= %s
"""
cur.execute(sql, (start,))
rows = cur.fetchall()
begin = datetime.datetime(2025, 1, 1)
end = datetime.datetime(2025, 1, 31, 23, 59, 59)
sql = """
SELECT * FROM logs
WHERE created_at BETWEEN %s AND %s
"""
cur.execute(sql, (begin, end))
rows = cur.fetchall()
sql = "SELECT created_at FROM logs_str LIMIT 1"
cur.execute(sql)
row = cur.fetchone()
dt = datetime.datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S")
print(dt)
items = [
("log1", datetime.datetime.now()),
("log2", datetime.datetime.now()),
]
sql = "INSERT INTO logs (msg, created_at) VALUES (%s, %s)"
cur.executemany(sql, items)
conn.commit()
Note: if the field is stored as a string rather than `DATETIME`, parse it manually with `strptime`.

import pymysql
import datetime
# Establish database connection
conn = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='your_db',
    cursorclass=pymysql.cursors.DictCursor  # Return rows as dictionaries
)
with conn:
    with conn.cursor() as cursor:
        sql = "SELECT id, created_at FROM your_table LIMIT 1"
        cursor.execute(sql)
        result = cursor.fetchone()
        dt = result['created_at']  # Assumes created_at is a DATETIME column
        print(dt)
        print(type(dt))

        dt_str = dt.strftime("%Y-%m-%d %H:%M:%S")
        print(dt_str)

        # If created_at is stored in string format (such as CHAR / VARCHAR)
        dt = datetime.datetime.strptime(result['created_at'], "%Y-%m-%d %H:%M:%S")
        print(dt)
# If DictCursor is not set, the data will be tuple, and the value method is as follows:
cursor = conn.cursor()
cursor.execute("SELECT id, created_at FROM your_table")
row = cursor.fetchone()
dt = row[1]
import pymysql
import datetime
# Establish database connection
conn = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='your_db',
    cursorclass=pymysql.cursors.DictCursor
)
with conn:
with conn.cursor() as cursor:
sql = "SELECT created_at FROM your_table LIMIT 1"
cursor.execute(sql)
result = cursor.fetchone()
dt_from_sql = result['created_at'] # datetime type
now = datetime.datetime.now()
# Calculate time difference
diff = now - dt_from_sql
print(f"Difference in days: {diff.days}")
diff = now.date() - dt_from_sql.date()
print(f"Pure date difference number of days: {diff.days}")
importpymysql
import datetime
# Establish database connection
conn = pymysql.connect(
host='localhost',
user='root',
password='your_password',
database='your_db',
cursorclass=pymysql.cursors.DictCursor
)
with conn:
with conn.cursor() as cursor:
sql = "SELECT created_at FROM your_table LIMIT 1"
cursor.execute(sql)
result = cursor.fetchone()
dt_from_sql = result['created_at'] # datetime type
now = datetime.datetime.now()
# Calculate time difference
diff = now - dt_from_sql
hours = diff.total_seconds() / 3600
print(f"Difference in hours: {hours}")
print(f"The difference is an integer number of hours: {int(hours)}")
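The timedelta arithmetic can also be checked offline with fixed timestamps (the values below are illustrative, standing in for a row read from MySQL):

```python
import datetime

# Fixed timestamps stand in for values read from the database
dt_from_sql = datetime.datetime(2025, 1, 1, 0, 0, 0)
now = datetime.datetime(2025, 1, 2, 6, 0, 0)

diff = now - dt_from_sql              # a datetime.timedelta
hours = diff.total_seconds() / 3600   # convert to hours
print(diff.days, hours)               # 1 30.0
```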
In InnoDB, MySQL's default storage engine, the default isolation level is `REPEATABLE READ`.
This means that after the first query in a transaction, MySQL creates a snapshot for that transaction. Even if another client modifies the data a second later, as long as your transaction is still open, a second query returns the same snapshot data as the first. This guarantees repeatable reads.
This is the most standard approach: before executing the second query, call `connection.commit()`. Committing ends the current transaction, so MySQL creates a fresh snapshot on the next query and reads the latest data.
# First query
cursor.execute(sql)
result1 = cursor.fetchall()
# Force update: commit the current transaction
connection.commit()
# The second query will capture the latest information.
cursor.execute(sql)
result2 = cursor.fetchall()
If you want every SQL statement to take effect immediately and always see the latest results, enable autocommit after establishing the connection. Every `execute()` is then treated as an independent transaction.
connection.autocommit = True
# All subsequent execute(sql) will directly read the latest status of the database
You can also change the isolation level to `READ COMMITTED`. At this level, every query within the same transaction reads the latest committed data.
# Set before executing query
cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
The isolation level is bound to the `connection`'s transaction scope, not to the `cursor` object. Re-creating the cursor while reusing the same connection without committing leaves the problem in place.

In a multi-threaded Python program, sharing a single database connection across threads causes serious conflicts. Connection objects are generally not thread-safe: multiple threads sending SQL commands or reading results concurrently will corrupt data, break transactions, or drop the connection. Best practice is a connection pool that hands each thread its own connection.
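One way to give each thread its own connection is `threading.local`. The sketch below uses `sqlite3` purely as an offline stand-in for a MySQL connection, to illustrate the per-thread pattern:

```python
import sqlite3
import threading

# threading.local() gives each thread its own attribute namespace,
# so every thread lazily opens and reuses its own private connection.
tls = threading.local()
results = []

def get_conn():
    if not hasattr(tls, "conn"):
        tls.conn = sqlite3.connect(":memory:")  # one connection per thread
    return tls.conn

def task(n):
    cur = get_conn().execute("SELECT ?", (n,))
    results.append(cur.fetchone()[0])

threads = [threading.Thread(target=task, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2]
```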
`mysql-connector-python` ships a simple built-in connection pool. You create the pool up front, and threads "borrow" connections from it as needed.
import threading
import mysql.connector
from mysql.connector import pooling
# 1. Create a connection pool (global variable)
db_config = {
"host": "localhost",
"user": "root",
"password": "password",
"database": "test_db"
}
connection_pool = mysql.connector.pooling.MySQLConnectionPool(
pool_name="my_pool",
pool_size=5, # Set the pool size, which should be greater than or equal to the number of threads
**db_config
)
# 2. Execute thread task function
def thread_task(task_id):
conn=None
try:
# Get an independent connection from the pool
conn = connection_pool.get_connection()
cursor = conn.cursor()
# Perform operations
cursor.execute("SELECT NOW()")
result = cursor.fetchone()
print(f"Execution thread {task_id} reading time: {result}")
# Note: Under this isolation level, remember to commit if there are any changes.
# conn.commit()
except Exception as e:
print(f"An error occurred in thread {task_id}: {e}")
finally:
# Important: Close the cursor and connection (here close will return the connection to the pool instead of actually interrupting it)
if conn and conn.is_connected():
cursor.close()
conn.close()
# 3. Start multiple execution threads
threads = []
for i in range(3):
t = threading.Thread(target=thread_task, args=(i,))
threads.append(t)
t.start()
for t in threads:
t.join()
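The borrow-and-return mechanics of a pool can be sketched without any driver support using `queue.Queue`; here `sqlite3` again stands in for MySQL connections:

```python
import queue
import sqlite3
import threading

POOL_SIZE = 2
# Pre-create the connections. check_same_thread=False is needed because
# borrowed connections move between threads (each is still used by only
# one thread at a time).
pool = queue.Queue()
for _ in range(POOL_SIZE):
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))

results = []

def task(n):
    conn = pool.get()            # borrow: blocks until a connection is free
    try:
        results.append(conn.execute("SELECT ?", (n,)).fetchone()[0])
    finally:
        pool.put(conn)           # return the connection instead of closing it

threads = [threading.Thread(target=task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```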
If you are developing a larger application, SQLAlchemy provides more powerful and automated connection management (QueuePool). It automatically handles connection failures, recycling, and thread allocation.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
# The connection pool will be automatically enabled when creating the Engine
engine = create_engine(
"mysql+mysqlconnector://root:password@localhost/test_db",
pool_size=10,
max_overflow=20
)
# Use sessionmaker to ensure that each thread gets an independent Session
Session = sessionmaker(bind=engine)
def sqlalchemy_task():
session = Session()
try:
# Perform operations
# session.execute(...)
session.commit()
finally:
session.close() # Return the connection
Use the `with` syntax to manage cursors and connections so the connection is returned even when an exception occurs, and always `commit()` or `rollback()` before returning it; otherwise the connection may carry an unfinished transaction back into the pool and affect its next user.

To execute MySQL stored procedures in Python, you can use `mysql-connector-python` or `PyMySQL`.
pip install mysql-connector-python
import mysql.connector
# Connect to database
conn = mysql.connector.connect(
host="localhost",
user="your_user",
password="your_password",
database="your_database"
)
cursor = conn.cursor()
# Call stored procedure
cursor.callproc("your_stored_procedure", (param1, param2))
# Get results
for result in cursor.stored_results():
print(result.fetchall())
# Close connection
cursor.close()
conn.close()
To execute SQL Server stored procedures in Python, you can usepyodbc。
pip install pyodbc
import pyodbc
# Connect to SQL Server
conn = pyodbc.connect("DRIVER={SQL Server};"
"SERVER=your_server;"
"DATABASE=your_database;"
"UID=your_user;"
"PWD=your_password")
cursor = conn.cursor()
# Execute stored procedure
cursor.execute("{CALL your_stored_procedure (?, ?)}", (param1, param2))
# Get results
rows = cursor.fetchall()
for row in rows:
print(row)
# Close connection
cursor.close()
conn.close()
To execute PostgreSQL stored procedures in Python, you can usepsycopg2。
pip install psycopg2
import psycopg2
# Connect to PostgreSQL
conn = psycopg2.connect(
dbname="your_database",
user="your_user",
password="your_password",
host="localhost",
port="5432"
)
cursor = conn.cursor()
# Execute stored procedure
cursor.callproc("your_stored_procedure", (param1, param2))
# Get results
rows = cursor.fetchall()
for row in rows:
print(row)
# Close connection
cursor.close()
conn.close()
After calling a stored procedure, use `fetchall()` or `stored_results()` to retrieve the results.

`requests` is the most widely used HTTP library in Python, handy for calling REST APIs, downloading web pages, and submitting form data. It is more concise and easier to use than the built-in `urllib`.
pip install requests
import requests
#Send GET request
url = "https://api.pionex.com/api/v1/common/symbols"
response = requests.get(url)
# Check if successful
if response.status_code == 200:
data = response.json()
print("Request successful, return field:", list(data.keys()))
else:
print("Request failed:", response.status_code)
import requests
url = "https://httpbin.org/post"
payload = {"key": "value", "symbol": "BTC_USDT"}
headers = {"Content-Type": "application/json"}
res = requests.post(url, json=payload, headers=headers)
print(res.json())
| Method / attribute | Use |
|---|---|
| `requests.get()` | Retrieve data (most common) |
| `requests.post()` | Submit data or forms |
| `requests.put()` | Update data |
| `requests.delete()` | Delete data |
| `response.status_code` | HTTP status code |
| `response.json()` | Parse a JSON response |
| `response.text` | Raw text content |
| `response.raise_for_status()` | Automatically raise an exception on HTTP errors |
import requests
try:
r = requests.get("https://api.pionex.com/api/v1/unknown")
r.raise_for_status()
except requests.exceptions.HTTPError as e:
print("HTTP error:", e)
except requests.exceptions.RequestException as e:
print("General request error:", e)
For retrieving large amounts of data, `requests` can be combined with `pandas` or `asyncio`.

`raise_for_status()` is a method of the `requests` module's `Response` object. It checks whether the HTTP status code indicates an error (such as 4xx or 5xx); if the request failed, it automatically raises `requests.exceptions.HTTPError`, which makes debugging and aborting execution straightforward.
import requests
url = "https://api.pionex.com/api/v1/common/symbols"
response = requests.get(url)
# Check whether the HTTP status code is successful (200~299)
response.raise_for_status()
# If successful, JSON can be parsed safely
data = response.json()
print("The request was successful, the number of trading pairs obtained:", len(data.get("data", [])))
The request was successful and the number of trading pairs obtained: 248
import requests
try:
r = requests.get("https://api.pionex.com/api/v1/invalid_endpoint")
r.raise_for_status()
except requests.exceptions.HTTPError as e:
print("HTTP error:", e)
except requests.exceptions.RequestException as e:
print("Request exception:", e)
A 4xx or 5xx response raises `HTTPError`.
import requests
# Call API
url = "https://api.example.com/data"
response = requests.get(url)
# Confirm the response is successful
if response.status_code == 200:
print("Data obtained successfully")
else:
print(f"Error: {response.status_code}")
# Parse JSON response
data = response.json()
# Access JSON data
print(data["key1"])
print(data["key2"]["subkey"])
import requests
# API URL
url = "https://jsonplaceholder.typicode.com/posts"
# Send request
response = requests.get(url)
# Check response status and parse
if response.status_code == 200:
data = response.json()
# List the titles of each article
for post in data:
print(f"Post ID: {post['id']}, Title: {post['title']}")
else:
print(f"API call failed, status code: {response.status_code}")
try:
data = response.json()
print(data)
except ValueError:
print("The response is not in valid JSON format")
Python's built-in `json` module converts (serializes) data to JSON format;
the result can be written to a file (`dump`) and reloaded later (`load`) to implement a simple cache.

- `json.dump(obj, file)`: write a Python object to a file in JSON format.
- `json.dumps(obj)`: convert an object to a JSON string.
- `json.load(file)`: read JSON from a file back into a Python object.
- `json.loads(str)`: convert a JSON string back into a Python object.

import json
import os
cache_file = "data_cache.json"
# Simulation data source
data = {
"name": "Alice",
"age": 30,
"hobbies": ["reading", "music", "coding"]
}
#Write data to JSON file (dump)
with open(cache_file, "w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False, indent=2)
print("Data has been cached in the file.")
# Read JSON data from the file (load)
if os.path.exists(cache_file):
with open(cache_file, "r", encoding="utf-8") as f:
loaded_data = json.load(f)
print("Loaded data:", loaded_data)
The data has been cached in the file.
Loaded data: {'name': 'Alice', 'age': 30, 'hobbies': ['reading', 'music', 'coding']}
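`dumps()` and `loads()` follow the same idea but work on strings instead of files; a quick round trip:

```python
import json

data = {"name": "Alice", "age": 30, "hobbies": ["reading", "music"]}

# dumps: object -> JSON string (no file involved)
s = json.dumps(data, ensure_ascii=False)
print(type(s).__name__)  # str

# loads: JSON string -> Python object, with types restored
restored = json.loads(s)
assert restored == data
print(restored["hobbies"][0])  # reading
```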
- `ensure_ascii=False`: keeps non-ASCII characters (e.g. Chinese) from being escaped to `\uXXXX` sequences.
- `indent=2`: indents the JSON file for readability.
- `json.load()` automatically restores JSON values to the corresponding Python types: dict, list, int, str, etc.

import json, os
CACHE_PATH = "result_cache.json"
def expensive_calculation():
print("Performing complex operation...")
return [x**2 for x in range(5)]
def get_cached_result():
#Load the cache if it exists
if os.path.exists(CACHE_PATH):
with open(CACHE_PATH, "r") as f:
return json.load(f)
# Otherwise write to cache after calculation
result = expensive_calculation()
with open(CACHE_PATH, "w") as f:
json.dump(result, f)
return result
data = get_cached_result()
print("Get data:", data)
Performing complex operation...
Get data: [0, 1, 4, 9, 16]
Get data: [0, 1, 4, 9, 16]
`json.dump()` writes to a JSON file, `json.load()` reads from one, and `dumps()`/`loads()` work on strings.

Storing a series of same-shaped records as a list is the most standard and common approach: treat each data point as an element of a JSON array (square brackets `[]`).
[
{"time": 1759028400000, "open": "109398.3", "close": "109364.8", "high": "109489.2", "low": "109364.8", "volume": "518.7594"},
{"time": 1759024800000, "open": "109305.6", "close": "109398.3", "high": "109496.4", "low": "109296.0", "volume": "757.0290"},
...
]
In Python, you simply build a list of dictionaries and serialize it with `json.dumps()`.
import json
data_list = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'}
]
json_output = json.dumps(data_list, indent=4)
# print(json_output)
If each data point's `time` value is unique, you can use it as the key of the top-level object, producing a dictionary structure. This makes it easy to look up a specific data point directly by timestamp.
{
"1759028400000": {"open": "109398.3", "close": "109364.8", "high": "109489.2", "low": "109364.8", "volume": "518.7594"},
"1759024800000": {"open": "109305.6", "close": "109398.3", "high": "109496.4", "low": "109296.0", "volume": "757.0290"},
...
}
Iterate over the data list and pull each data point's `time` value out to use as the dictionary key.
import json
source_data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'}
]
data_dict = {}
for item in source_data:
# Make sure the timestamp is a string because JSON keys must be strings
time_key = str(item['time'])
# Copy or create a new dictionary and remove the 'time' field as it is now the key
value_data = item.copy()
del value_data['time']
data_dict[time_key] = value_data
json_output = json.dumps(data_dict, indent=4)
# print(json_output)
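Once the data is keyed by timestamp, a specific bar can be fetched directly after parsing; a small self-contained check (values taken from the sample above):

```python
import json

data_dict = {
    "1759028400000": {"open": "109398.3", "close": "109364.8"},
    "1759024800000": {"open": "109305.6", "close": "109398.3"},
}

# Serialize, then parse back and look a bar up directly by its timestamp key
payload = json.dumps(data_dict)
parsed = json.loads(payload)
bar = parsed["1759028400000"]
print(bar["close"])  # 109364.8
```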
In Python, the built-in `json` module is the standard way to serialize Python data structures such as lists and dictionaries into JSON. Your target structure corresponds in Python to a list of dictionaries.
Assume that the data you get from the database query is already a list, each element of which is a dictionary, and the key names are consistent with the JSON field names you want.
db_data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'},
# ... more data
]
Simply import the `json` module and use `json.dumps()` to convert the Python list into a JSON-formatted string.
import json
# Assume db_data is ready
# Use json.dumps() to serialize Python lists into JSON strings
# indent=4 is used to beautify the output and make it easier to read
json_output_string = json.dumps(db_data, indent=4)
print(json_output_string)
This will produce a JSON string that matches your goal:
[
{
"time": 1759028400000,
"open": "109398.3",
"close": "109364.8",
"high": "109489.2",
"low": "109364.8",
"volume": "518.7594"
},
{
"time": 1759024800000,
"open": "109305.6",
"close": "109398.3",
"high": "109496.4",
"low": "109296.0",
"volume": "757.0290"
}
]
If your database driver returns a list of tuples (for example `[(1759028400000, '109398.3', ...), ...]`), convert it into a list of dictionaries before serializing as above.
# Assume the field name is as follows
field_names = ['time', 'open', 'close', 'high', 'low', 'volume']
tuple_data = [
(1759028400000, '109398.3', '109364.8', '109489.2', '109364.8', '518.7594'),
#...
]
# List Comprehension
list_of_dicts = [
dict(zip(field_names, row))
for row in tuple_data
]
# Now list_of_dicts can be passed directly to json.dumps() for serialization.
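A self-contained version of the conversion, runnable as-is to confirm the resulting structure:

```python
import json

field_names = ['time', 'open', 'close', 'high', 'low', 'volume']
tuple_data = [
    (1759028400000, '109398.3', '109364.8', '109489.2', '109364.8', '518.7594'),
]

# zip pairs each field name with the matching tuple value; dict() builds the record
list_of_dicts = [dict(zip(field_names, row)) for row in tuple_data]
print(list_of_dicts[0]['time'])  # 1759028400000

json_output = json.dumps(list_of_dicts, indent=4)
```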
In Python, how you implement a Web API that outputs JSON depends on the framework you use. The closest equivalent of PHP's `echo json_encode()` is the built-in `json` module, but in a web environment the framework usually sets the headers for you.
FastAPI is the first choice for modern Python development: it automatically converts a Python `dict` or `list` to JSON and sets the correct `Content-Type`.
from fastapi import FastAPI
app = FastAPI()
@app.get("/api/stocks")
def get_stocks():
# Directly return the Python dictionary, FastAPI will automatically handle json_encode
data = ["AAPL", "TSLA", "NVDA", "GOOGL"]
return {"stocks": data}
Flask officially provides the `jsonify` function, which converts the data and also sets the response header to `application/json`.
from flask import Flask, jsonify
app = Flask(__name__)
@app.route('/api/stocks')
def get_stocks():
data = ["AAPL", "TSLA", "NVDA", "GOOGL"]
# Equivalent to PHP's echo json_encode($data) and add Header
return jsonify(data)
If you are using the full-featured Django framework, you will typically use the `JsonResponse` object.
from django.http import JsonResponse
def stock_api(request):
data = {"stocks": ["AAPL", "TSLA"]}
return JsonResponse(data)
If you just want the JSON string itself (for writing to a file or custom output), use `json.dumps()`.
import json
data = {"name": "Gemini", "age": 18}
# This is the closest to the low-level implementation of PHP json_encode()
json_string = json.dumps(data)
print(json_string)
| Function | PHP (vanilla) | Python (FastAPI / Flask) |
|---|---|---|
| Set JSON headers | `header('Content-Type: application/json')` | Handled by the framework (`return dict` or `jsonify`) |
| Object to JSON string | `json_encode($data)` | `json.dumps(data)` |
| Output and end | `echo ...; exit;` | `return ...` |

Just `return` a dictionary and the framework handles all format conversion for you.

Google APIs are mainly services provided through Google Cloud Platform (GCP), giving developers access to Google features such as search, maps, and cloud drives. Most Google APIs follow a RESTful architecture and exchange data in JSON format.
Before using any Google API, you must first create a project in the Google Cloud Console and set up an authentication method. There are two main options: an API key (for public data) and OAuth 2.0 credentials (for accessing user data).
In Python, the following libraries are officially recommended to simplify development:
- `google-api-python-client`: general-purpose API client for most Discovery-based services.
- `google-auth`: handles authentication and token management.
- `google-cloud-storage` / `google-cloud-vision`: client libraries optimized for specific cloud services.

Install them with `pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib`, then send each request by calling `execute()` on it.

Google APIs are not entirely free or unlimited. Each API has its own quota and billing scheme:
| category | Representative API | Main purpose |
|---|---|---|
| Work automation | Google Sheets / Calendar / Gmail | Automate forms, schedule, and send emails. |
| Data and storage | Cloud Storage / BigQuery | Store large files and perform big data analysis. |
| AI | Vision / Translation / Natural Language | Image recognition, text translation, and sentiment analysis. |
| multimedia | YouTube Data API | Search videos and manage channel content. |
To use Google APIs (such as YouTube, Drive, Gmail, etc.) in Python, you usually call the `build` function from the `google-api-python-client` library to create a service object. That object encapsulates all API methods and is your main communication channel with Google's servers.
To create a Service object, you need the API name, version, and authentication key (API Key) or credentials (OAuth2).
from googleapiclient.discovery import build
# Initialize YouTube Data API v3
youtube = build(
'youtube',
'v3',
developerKey='YOUR_API_KEY'
)
# Call API example
request = youtube.videos().list(part="snippet", id="dQw4w9WgXcQ")
response = request.execute()
This is the core reason for the crash you experienced earlier: the service object produced by `build()` and its internal `http` transport object are not thread-safe. In a multi-threaded environment, follow these guidelines:

- Do not share one `youtube` object and call `execute()` on it from multiple threads.
- Call `build()` inside each thread's `run()` method so every thread gets its own `googleapiclient.discovery.Resource` instance.

Calling `build()` frequently hurts performance, because the "Discovery Document" is re-downloaded each time (roughly 1-2 seconds of extra latency). You can optimize by:
# Disable exploration file downloading (if you have installed precompiled libraries or want to speed things up)
# Note: This requires that you already have relevant definitions in your environment
youtube = build('youtube', 'v3', developerKey='KEY', static_discovery=True)
The most common error when calling service methods is `HttpError`. Catch it to handle quota exhaustion or permission problems.
from googleapiclient.errors import HttpError
try:
response = youtube.channels().list(part="statistics", mine=True).execute()
except HttpError as e:
if e.resp.status == 403:
print("API quota exhausted or insufficient permissions")
elif e.resp.status == 401:
print("Authentication invalid")
else:
print(f"HTTP error: {e}")
| API functions | name parameter | version parameters |
|---|---|---|
| YouTube Data | 'youtube' | 'v3' |
| Google Drive | 'drive' | 'v3' |
| Google Sheets | 'sheets' | 'v4' |
| Gmail API | 'gmail' | 'v1' |
If you share one Google API service object (e.g. `YoutubeApi.youtube`) across multiple threads, you may hit thread-safety problems. The underlying `httplib2` or `urllib3` transport layer develops race conditions when several threads access the same connection object concurrently, crashing the program deep in the network layer. Sometimes the error occurs at the C level and the Python interpreter exits outright, which is why `try...except` cannot intercept it.
Do not create one global `youtube` object in the main thread for everyone to share. Each thread should create its own API service instance when it starts; that way every thread has its own network connection and cache, and they do not interfere with each other.
import threading
from googleapiclient.discovery import build
def get_youtube_service():
# Call this function internally in each thread to create independent objects
return build('youtube', 'v3', developerKey='YOUR_API_KEY')
def thread_task(video_id):
try:
# Create a dedicated service within the execution thread
local_youtube = get_youtube_service()
request = local_youtube.videos().list(
part="snippet,statistics",
id=video_id
)
response = request.execute()
print(f"Successfully obtained: {video_id}")
except Exception as e:
print(f"Thread error: {e}")
# Start multiple threads
t1 = threading.Thread(target=thread_task, args=("vid1",))
t2 = threading.Thread(target=thread_task, args=("vid2",))
t1.start()
t2.start()
If you do not want to create objects repeatedly, use a `threading.Lock` to ensure that only one thread calls `execute()` at a time. API calls become queued (losing the concurrency speed-up), but the program will not crash.
import threading
# Create a global lock
api_lock = threading.Lock()
def thread_task_with_lock(video_id):
# Use with to ensure the lock is automatically released
with api_lock:
try:
request = YoutubeApi.youtube.videos().list(
part="snippet",
id=video_id
)
vlist = request.execute()
except Exception as e:
print(f"Error: {e}")
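The serialization the lock provides can be demonstrated offline: the shared list below stands in for the non-thread-safe API object, and the lock guarantees only one thread at a time enters the critical section:

```python
import threading

# A lock serializes the critical section exactly as api_lock serializes execute():
# only one thread at a time may touch the shared resource.
lock = threading.Lock()
calls = []

def guarded_call(n):
    with lock:              # acquired/released automatically, even on exceptions
        calls.append(n)     # stands in for the non-thread-safe API call

threads = [threading.Thread(target=guarded_call, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 5
```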
If you are using an older version of the client library, you can pass an explicit `http` object when building the service, or give each thread its own `httplib2.Http()` instance. This is usually more complicated, so option 1 is recommended.

Also consider setting a global socket timeout, e.g. `socket.setdefaulttimeout(10)`, so that a stuck network call cannot occupy a thread forever.

Sending email uses the standard-library `smtplib` and `email` modules.
These two modules are generally built into Python and do not require additional installation.
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
# Set sender, recipient, subject and content
sender = '[email protected]'
receiver = '[email protected]'
subject = 'test email'
body = 'This is a test email sent by Python. '
# Create email content
msg = MIMEMultipart()
msg['From'] = sender
msg['To'] = receiver
msg['Subject'] = subject
msg.attach(MIMEText(body, 'plain'))
#Send using Gmail SMTP
smtp_server = 'smtp.gmail.com'
smtp_port=587
username = '[email protected]'
password = 'your_app_password' # It is recommended to use the App password instead of the general password
with smtplib.SMTP(smtp_server, smtp_port) as server:
server.starttls()
server.login(username, password)
server.send_message(msg)
print('Email has been sent')
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib

Filters can be created through the Gmail API, for example to add a label automatically when the sender matches a given address.
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os
import pickle
# Required authorization scope
SCOPES = ['https://www.googleapis.com/auth/gmail.settings.basic',
'https://www.googleapis.com/auth/gmail.modify']
def gmail_service():
creds = None
if os.path.exists("token.pickle"):
with open("token.pickle", "rb") as token:
creds = pickle.load(token)
if not creds or not creds.valid:
if creds and creds.expired and creds.refresh_token:
creds.refresh(Request())
else:
flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)
with open("token.pickle", "wb") as token:
pickle.dump(creds, token)
return build("gmail", "v1", credentials=creds)
# Create Gmail filter
def create_filter():
service = gmail_service()
filter_config = {
"criteria": {
"from": "[email protected]" # Condition: sender
},
"action": {
"addLabelIds": ["Label_123456"], # Gmail internal label ID
"removeLabelIds": ["INBOX"] # Optional: remove the inbox
}
}
result = service.users().settings().filters().create(
userId="me",
body=filter_config
).execute()
print("Created successfully, Filter ID:", result["id"])
if __name__ == "__main__":
create_filter()
`Label_123456` is a Gmail label ID, obtainable via `users().labels().list()`. Filters created through the Gmail API are triggered automatically when new mail arrives; no extra "execution" step is needed.
Use the Gmail API to search for matching messages and then label or move them in batches.
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os
import pickle
SCOPES = ['https://www.googleapis.com/auth/gmail.modify']
def gmail_service():
creds = None
if os.path.exists("token.pickle"):
with open("token.pickle", "rb") as token:
creds = pickle.load(token)
if not creds or not creds.valid:
if creds and creds.expired and creds.refresh_token:
creds.refresh(Request())
else:
flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)
with open("token.pickle", "wb") as token:
pickle.dump(creds, token)
return build("gmail", "v1", credentials=creds)
#Apply Gmail rules to existing emails
def run_rule():
service = gmail_service()
# Search criteria, such as from:[email protected]
query = "from:[email protected]"
results = service.users().messages().list(userId="me", q=query).execute()
messages = results.get("messages", [])
if not messages:
print("No emails matching the criteria")
return
for msg in messages:
service.users().messages().modify(
userId="me",
id=msg["id"],
body={
"addLabelIds": ["Label_123456"], # Add a new label
"removeLabelIds": ["INBOX"] # Remove the inbox
}
).execute()
print(f"{len(messages)} messages processed")
if __name__ == "__main__":
run_rule()
The `q` parameter accepts Gmail search operators for matching and batch modification, e.g. `from:`, `subject:`, `has:attachment`.

The most common and recommended way to build an HTTP API server in Python is to use an efficient, feature-rich web framework. Three mainstream frameworks and their characteristics are introduced below:
FastAPI is a modern, fast (powered by Starlette and Pydantic) web framework for building APIs. It natively supports asynchronous operations (async/await) and automatically generates OpenAPI (Swagger UI) files.
# Installation: pip install fastapi uvicorn
from fastapi import FastAPI
app = FastAPI()
# Define a root route (GET request)
@app.get("/")
def read_root():
return {"Hello": "World"}
# Define a route with parameters
@app.get("/items/{item_id}")
def read_item(item_id: int, q: str = None):
return {"item_id": item_id, "q": q}
# Execution server (assuming the file name is main.py)
# Execute in terminal: uvicorn main:app --reload
Flask is a micro-framework that keeps the core simple and allows developers to freely choose components such as databases and verification libraries. It is suitable for small projects or applications that require a high degree of customization.
# Installation: pip install flask
from flask import Flask, jsonify, request
app = Flask(__name__)
#Define an API route
@app.route("/data", methods=['GET'])
def get_data():
return jsonify({"message": "This is Flask API data"})
#Define a POST route
@app.route("/submit", methods=['POST'])
def post_data():
data = request.get_json()
return jsonify({"received": data}), 201
#Execute server
# if __name__ == '__main__':
# app.run(debug=True)
DRF is a powerful tool for quickly building RESTful APIs based on the fully functional Django web framework. It provides all the functionality needed for large APIs such as serializers, model view sets, authentication and permission control.
Note: The DRF sample code is longer and requires the creation of a Django project and application first, so it is omitted here, but it is the first choice for large projects.
Although the idiomatic FastAPI approach is to decorate functions directly with `@app.get()`, larger applications (especially those using dependency injection or modular routing) can encapsulate route handlers in a class, a pattern often called the "APIRouter" pattern or class-based views.
This can be achieved with `fastapi.APIRouter`: we define the route group on a router and register the class's methods as handlers.
# Installation: pip install fastapi uvicorn
from fastapi import APIRouter, FastAPI

# Step 1: Create an APIRouter instance
router = APIRouter(
    prefix="/api/v1",  # Prefix for this route group
    tags=["items"]     # Used for documentation grouping
)

# Step 2: Define the class containing the routing logic
class ItemService:
    """Handles all item-related business logic and route handlers."""
    def __init__(self, db_connection: str):
        # Simulated injected dependency (a real app would pass a DB handle)
        self.db_connection = db_connection
        # Register bound methods as handlers. Note: decorating methods with
        # @router.get inside the class body does not work, because FastAPI
        # would treat `self` as a query parameter; add_api_route with a bound
        # method avoids that.
        router.add_api_route("/", self.read_root, methods=["GET"])
        router.add_api_route("/{item_id}", self.read_item, methods=["GET"])

    def read_root(self):
        return {"message": "Hello World from ItemService Class!", "db_status": self.db_connection}

    # The {item_id} parameter comes from the URL path
    def read_item(self, item_id: int, q: str | None = None):
        return {
            "item_id": item_id,
            "query": q,
            "status": "Processing completed"
        }

# Helper: simulate acquiring a dependency
def get_db():
    # In practice this would return a database connection
    return "Database Connected Successfully"

# Step 3: Instantiate the service (which registers its routes),
# then create the FastAPI application and mount the router
service = ItemService(get_db())

app = FastAPI()
app.include_router(router)
Save the above code as main.py and run the Uvicorn server:
uvicorn main:app --reload
Requests to the root path are handled by the ItemService.read_root method, and requests to /{item_id} by the ItemService.read_item method. The key points of this pattern are:
- Use APIRouter to define the routes.
- Register class methods as the route handlers.
- Mount the router on the application with app.include_router.
The advantage is that the business logic (inside ItemService) is separated from the route definitions, which improves the modularity and testability of the code.
If you are running a Flask application in a terminal and started it using `app.run()`, the easiest way is to use the keyboard interrupt signal.
In unit testing or scenarios where you need to automatically shut down the server after performing a specific operation, you need to send the stop command through the code.
Flask's built-in development server is provided by the Werkzeug library. You can send a specific HTTP request to the server to make it shut itself down, which requires defining a special route in your application. Note that this shutdown hook (werkzeug.server.shutdown) was removed in Werkzeug 2.1, so this technique only works with older Werkzeug versions.
from flask import Flask, request
app = Flask(__name__)

# Define a secret shutdown route
@app.route('/shutdown', methods=['POST'])
def shutdown_server():
    # Ensure that only authorized users can call this endpoint
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running on the Werkzeug development server')
    func()  # Call the shutdown function
    return 'The server is shutting down...'

# ... other routes ...
# Send a POST request to http://127.0.0.1:5000/shutdown from another program or script to shut down the server.
By default `app.run()` is blocking. To continue executing other logic in your main code while the server runs in the background, you need to use the threading or multiprocessing modules.
import threading
import time
from flask import Flask

app = Flask(__name__)
# ... your routes ...

def run_server():
    # Note: debug=True should not be used in production environments
    app.run(port=5000)

# Start the server in a new thread
thread = threading.Thread(target=run_server)
thread.start()
print("Flask server has been started in the background...")
# Let the server run for a while
time.sleep(5)
# (Shutting down) There is no built-in, simple shutdown in this threaded mode.
# You must use the /shutdown route above or terminate the thread directly
# (not recommended; it may not shut down cleanly).
# Example: wait for the server thread to end (after Ctrl+C)
# thread.join()
In a production environment you would not use `app.run()`; you would use a WSGI server such as Gunicorn or uWSGI. To stop these servers, you send operating-system signals to the master process (for example, `kill -TERM <pid>` performs a graceful Gunicorn shutdown) or use your service manager, such as `systemctl stop`.
When a Flask application is not running on the default Werkzeug development server (such as in a production environment using Gunicorn, uWSGI, or another server), the most reliable way to determine the current environment is to check the **WSGI environment variables** (i.e. request.environ).
Inside a Flask route function you can inspect the request.environ dictionary for specific keys set by the WSGI server (this applies to WSGI servers such as Gunicorn) to determine the server type.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/server_info')
def get_server_info():
    # Default: assume the Werkzeug development server
    server_name = "Werkzeug (Development Server)"
    # Check for common WSGI server identifiers
    # Check Gunicorn
    if 'gunicorn.version' in request.environ:
        server_name = f"Gunicorn (version: {request.environ.get('gunicorn.version')})"
    # Check uWSGI
    elif 'uwsgi.version' in request.environ:
        # uWSGI does not always expose a standard version key, but sets specific variables
        server_name = "uWSGI"
    # Check Waitress (a common production WSGI server)
    elif 'waitress.version' in request.environ:
        server_name = f"Waitress (version: {request.environ.get('waitress.version')})"
    # Check the generic identifier set by many other WSGI servers
    elif 'SERVER_SOFTWARE' in request.environ:
        # Many servers set this standard WSGI key
        server_name = request.environ.get('SERVER_SOFTWARE')
    # If werkzeug.server.shutdown exists, it is almost certainly Werkzeug
    elif 'werkzeug.server.shutdown' in request.environ:
        server_name = "Werkzeug (Development Server, Shutdown Available)"
    return jsonify({
        "current_server": server_name,
        "is_dev_server": server_name.startswith("Werkzeug")
    })

# Start the application
# if __name__ == '__main__':
#     app.run(debug=True)
In your server shutdown logic, you can use this check directly to decide whether to execute `shutdown_func`:
from flask import request, jsonify

@app.route('/shutdown', methods=['POST'])
def shutdown_server():
    shutdown_func = request.environ.get('werkzeug.server.shutdown')
    if shutdown_func is None:
        # Non-Werkzeug environment (usually production)
        return jsonify({
            "error": "This endpoint is for development environments only. Please use operating system commands to stop the server."
        }), 400
    # Werkzeug environment
    shutdown_func()
    return jsonify({"message": "The server is shutting down..."}), 200
Although `SERVER_SOFTWARE` is a standard WSGI environment variable, not every server sets it, and its format is not standardized. Checking a key unique to a specific server (such as `gunicorn.version`) is the most reliable approach.
In Flask, the main tool for processing query-string parameters in URLs (such as abc and def in /my_api?abc=3&def=xy) is the request object from the flask module. You can use the request.args dictionary to access these parameters.
request.args is a dictionary-like object that contains all the key-value pairs appearing after the question mark (?) in the URL.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/my_api', methods=['GET'])
def get_query_parameters():
    # Access the entire query-parameter dictionary
    all_params = request.args
    print(f"All parameters: {all_params}")
    # --- Ways to obtain specific parameters ---
    # 1. Use .get() (recommended: safe, and supports default values)
    # Get the 'abc' parameter; defaults to None if it does not exist
    abc_value = request.args.get('abc')
    # Get the 'def' parameter; defaults to 'default_value' if it does not exist
    def_value = request.args.get('def', 'default_value')
    # 2. Direct [] access (not recommended: raises KeyError if the parameter is missing)
    # try:
    #     required_param = request.args['required']
    # except KeyError:
    #     required_param = "Missing"
    # --- Handling multi-valued parameters ---
    # If the URL is /my_api?item=apple&item=banana
    item_list = request.args.getlist('item')  # Returns ['apple', 'banana']
    return jsonify({
        "status": "success",
        "abc": abc_value,
        "def": def_value,
        "item_list": item_list
    })

if __name__ == '__main__':
    # Test URL 1: http://127.0.0.1:5000/my_api?abc=3&def=xy&item=A&item=B
    # Test URL 2: http://127.0.0.1:5000/my_api
    app.run(debug=True)
| Method | Use | Behavior | Example |
|---|---|---|---|
| `request.args.get('key')` | Safely get a single parameter | Returns `None` if the parameter does not exist; no error is raised. | `request.args.get('user')` |
| `request.args.get('key', 'default')` | Get a parameter with a default value | Returns the default you specify if the parameter does not exist. | `request.args.get('page', 1)` |
| `request.args['key']` | Access a parameter directly | Raises `KeyError` if the parameter does not exist, causing a 500 server error. | `request.args['id']` |
| `request.args.getlist('key')` | Get multiple parameters with the same name | Returns a list containing all values. | `request.args.getlist('filter')` |
All values obtained from request.args are strings. If a parameter should be a number or a Boolean, you need to do the type conversion manually:
# Get the parameter (a string)
num_str = request.args.get('num', '0')
# Convert to an integer
try:
    num_int = int(num_str)
except ValueError:
    num_int = 0  # Handle conversion failure
# Convert to Boolean (note: any non-empty string is truthy in Python)
bool_str = request.args.get('is_admin', 'false').lower()
is_admin = bool_str == 'true'
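For comparison, the same query-string semantics can be reproduced outside Flask with the standard library's urllib.parse; this is a minimal sketch using a made-up example URL:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL, mirroring the test URL used above
url = "http://127.0.0.1:5000/my_api?abc=3&def=xy&item=A&item=B"

# parse_qs maps every key to a list of values (all strings)
params = parse_qs(urlparse(url).query)

abc_value = params.get("abc", [None])[0]        # like request.args.get('abc')
def_value = params.get("def", ["default_value"])[0]
item_list = params.get("item", [])              # like request.args.getlist('item')
print(abc_value, def_value, item_list)
```

As with request.args, every value comes back as a string, so the same manual type conversions apply.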
In Flask, the key to distinguishing a route is its endpoint, not the function name you define in your Python code. You can use loops and Python closures to dynamically define and register routes, assigning each a unique endpoint.
We will use a factory function to create each route handler and specify a unique `endpoint` name when registering it.
from flask import Flask, jsonify

app = Flask(__name__)

# Assume this is your data source
dynamic_apis = {
    "user_info": {"method": "GET", "handler": lambda: {"data": "User info fetched"}},
    "product_list": {"method": "GET", "handler": lambda: {"data": "Products listed"}},
    "submit_form": {"method": "POST", "handler": lambda: {"data": "Form submitted"}, "methods": ['POST']},
}
filename_base = "db_agent"

# --- Core logic: route factory function ---
def create_dynamic_route(funstr, handler_func):
    """
    Create and return a route handler function (a closure).
    The function captures the values of funstr and handler_func.
    """
    def dynamic_route_handler():
        # The function name is fixed, but funstr is available at run time
        print(f"Processing request: {funstr}")
        # Execute the actual business logic
        result = handler_func()
        return jsonify(result)
    # Return the created function
    return dynamic_route_handler

# --- Dynamic registration loop ---
for funstr, api_info in dynamic_apis.items():
    # 1. Build the URL path
    url_path = f"/{filename_base}/{funstr}"
    # 2. Define the endpoint name (funstr ensures uniqueness)
    endpoint_name = f"{filename_base}_{funstr}_api"
    # 3. Get the HTTP methods (default GET)
    methods = api_info.get("methods", ['GET'])
    # 4. Create the route handler function
    handler_func = create_dynamic_route(funstr, api_info["handler"])
    # 5. Register the route
    # Use the endpoint= parameter to assign a unique name instead of the function name
    app.add_url_rule(
        url_path,
        endpoint=endpoint_name,
        view_func=handler_func,
        methods=methods
    )
    print(f"Registered route: {url_path} (Endpoint: {endpoint_name})")

if __name__ == '__main__':
    # Test an endpoint, for example: http://127.0.0.1:5000/db_agent/user_info
    app.run(debug=True)
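The reason the factory function is needed can be seen in a standalone sketch: closures defined directly inside a loop all share the final value of the loop variable (late binding), while a factory captures each value separately.

```python
# Without a factory: every handler sees the last loop value
naive = []
for name in ["user_info", "product_list"]:
    naive.append(lambda: f"Processing request: {name}")

# With a factory: each handler captures its own value
def make_handler(name):
    def handler():
        return f"Processing request: {name}"
    return handler

fixed = [make_handler(name) for name in ["user_info", "product_list"]]

print([h() for h in naive])  # both report 'product_list'
print([h() for h in fixed])  # each reports its own name
```

This is why create_dynamic_route receives funstr as an argument rather than reading the loop variable directly.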
Gradio is an open source Python library designed to make machine learning models more accessible and interactive. With a few lines of code, you can quickly build a web interface for your machine learning model for others to test and use.
The usage of Gradio is very simple. The following is a basic example showing how to create a simple text input and output interface:
import gradio as gr

# Define the function that processes the input
def greet(name):
    return "Hello, " + name + "!"

# Create the Gradio interface
iface = gr.Interface(fn=greet, inputs="text", outputs="text")
# Launch the interface
iface.launch()
The code above will create a simple web interface that allows users to enter a name and then display a personalized greeting.
Gradio provides developers with a simple and powerful tool that can quickly transform machine learning models into web applications, allowing a wider range of users to easily test and experience the results of AI. Whether used for personal projects or teamwork, Gradio is ideal for promoting and displaying models.
If you want to redirect the output of the Gradio interface to a page on an Apache HTTP server (such as `/results`), you can use Python's requests module to send the Gradio results to the server. Here are the basic steps for integrating Gradio and Apache:
First, make sure your Apache HTTP server is running and has an endpoint configured that can handle the data (e.g. /results). This endpoint can be PHP, Python, or another backend language.
Here is sample code for writing a Gradio application and redirecting its output to an HTTP server:
import gradio as gr
import requests

# Handle Gradio input and forward it to the HTTP server
def process_and_redirect(input_data):
    # Process the input data
    result = f"Processed: {input_data}"
    # Send an HTTP POST request with the processed data to the Apache server
    url = 'http://your-apache-server-address/results'  # Replace with your server address
    payload = {'result': result}
    try:
        response = requests.post(url, data=payload)
        if response.status_code == 200:
            return f"Successfully redirected to {url}."
        else:
            return f"Redirect failed. Status code: {response.status_code}"
    except Exception as e:
        return f"An error occurred: {str(e)}"

# Create the Gradio interface
iface = gr.Interface(
    fn=process_and_redirect,
    inputs="text",
    outputs="text",
    title="Gradio redirect to HTTP server"
)
iface.launch()
Your Apache server should be configured with an endpoint to handle POST requests, such as a simple PHP script to receive Gradio data:
<?php
// Handle POST requests from Gradio
if ($_SERVER["REQUEST_METHOD"] == "POST") {
    $result = $_POST['result']; // Get the 'result' parameter from the POST request
    echo "Data received from Gradio: " . htmlspecialchars($result);
}
?>
This allows you to use the Gradio application to redirect output to an Apache HTTP server and process the data server-side. Such integration enables Gradio's interactive features to be more widely used in Web environments.
The easiest way to embed the Gradio interface in an Apache page is to use an iframe tag, setting its src attribute to the URL of the Gradio server.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Gradio embedded in an Apache page</title>
</head>
<body>
    <h1>Embedded Gradio interface</h1>
    <iframe src="http://your-gradio-server-address:7860" width="100%" height="800px" frameborder="0"></iframe>
</body>
</html>
If you want to directly access the Gradio interface through Apache's URL, you can configure a reverse proxy. This eliminates the need to display the original Gradio URL.
Enable the mod_proxy and mod_proxy_http modules. Execute the following commands:
sudo a2enmod proxy
sudo a2enmod proxy_http
Or un-comment the following lines in the Apache configuration file:
#LoadModule proxy_module modules/mod_proxy.so
#LoadModule proxy_http_module modules/mod_proxy_http.so
Add a proxy location (e.g. /gradio) to your Apache configuration:
<Location "/gradio">
    ProxyPass "http://localhost:7860/"
    ProxyPassReverse "http://localhost:7860/"
</Location>
sudo systemctl restart apache2
Once completed, you can use http://your-apache-server-address/gradio to display the Gradio interface content within Apache pages.
In Apache, make sure the proxy and proxy_http modules are enabled. If they are not enabled yet, execute the following commands:
sudo a2enmod proxy
sudo a2enmod proxy_http
In the Gradio application's Apache configuration file (e.g. /etc/apache2/sites-available/yourdomain.conf), add the following settings to configure ProxyPass and ProxyPassReverse:
<VirtualHost *:80>
    ServerName yourdomain.com
    # Point the root path to the Gradio server
    ProxyPass / http://localhost:7860/
    ProxyPassReverse / http://localhost:7860/
    # Ensure that static resources are proxied correctly
    ProxyPass /static/ http://localhost:7860/static/
    ProxyPassReverse /static/ http://localhost:7860/static/
</VirtualHost>
After completing the setup, restart Apache to apply the changes:
sudo systemctl restart apache2
This configuration explicitly handles requests for Gradio static resources (such as theme.css) and should resolve the missing-style issue. If Gradio's static file path is not /static/, adjust the path accordingly.
Gradio provides the gr.DataFrame component to display or edit tabular data, such as a Pandas DataFrame or other tabular formats. Here we explain how to use the Gradio DataFrame component in applications to create interactive data tables.
If Gradio is not installed yet, you can install it using the following command:
pip install gradio
Using gr.DataFrame to display table data: here's how to display a DataFrame with Gradio. Say we have a Pandas DataFrame that needs to be shown in a Gradio app:
import gradio as gr
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Occupation': ['Engineer', 'Designer', 'Doctor']}
df = pd.DataFrame(data)

# Define a function that returns the DataFrame
def show_dataframe():
    return df

# Create the Gradio interface (no inputs; the function takes no arguments)
interface = gr.Interface(fn=show_dataframe, inputs=[], outputs=gr.DataFrame(), title="Personnel Data Table")
interface.launch()
- import gradio as gr: imports the Gradio package.
- data: a dictionary of sample data with three columns: name, age, and occupation.
- show_dataframe: a function that returns the Pandas DataFrame for display.
- gr.DataFrame(): creates the DataFrame component used for table display in the Gradio interface.
- interface.launch(): starts the Gradio application.
Using gr.DataFrame for interactive editing: if you want to allow users to edit the table, you can set interactive=True on gr.DataFrame so users can modify the table data:
interface = gr.Interface(fn=show_dataframe, inputs=[], outputs=gr.DataFrame(interactive=True), title="Editable Personnel Data Table")
The launched application will display an editable table, and users can modify the data directly on the web page.
You can use the psutil module to find the PID of the Gradio process. First, make sure psutil is installed:
pip install psutil
Then you can use the following code to find the PIDs of Gradio-related processes.
import psutil

# Search for processes whose command line contains 'gradio'
for process in psutil.process_iter(['pid', 'name', 'cmdline']):
    # cmdline may be None for some system processes
    if process.info['cmdline'] and 'gradio' in ' '.join(process.info['cmdline']):
        print("Found Gradio process PID:", process.info['pid'])
Once the PID is found, you can use the terminate() or kill() method to end the process. For example:
for process in psutil.process_iter(['pid', 'name', 'cmdline']):
    if process.info['cmdline'] and 'gradio' in ' '.join(process.info['cmdline']):
        process.kill()  # Forcefully terminate the process
        print(f"Terminated Gradio process PID: {process.info['pid']}")
The kill() method terminates the process immediately, so make sure it has no important operations in progress. This sample code terminates all matching Gradio processes.
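The matching condition can be isolated into a small helper and exercised without psutil; note the guard for processes whose cmdline is unavailable (it can be None or an empty list). The command lines below are hypothetical, for illustration only:

```python
def is_gradio_process(cmdline):
    # cmdline may be None or empty for some system processes,
    # so guard before joining the list into a string
    return bool(cmdline) and 'gradio' in ' '.join(cmdline)

# Hypothetical command lines
print(is_gradio_process(['python', 'my_gradio_app.py']))  # True
print(is_gradio_process(['python', 'server.py']))         # False
print(is_gradio_process(None))                            # False
```

Note that this substring match also catches any process whose script path merely contains "gradio", so review the matches before killing anything.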
You can use the psutil module to find processes listening on a specific port. First, make sure psutil is installed:
pip install psutil
Then, you can use the following code to find the PID of the process listening on port 7860.
import psutil

# Specify the port number to search for
target_port = 7860
pid_to_kill = None

# Search for processes listening on the specified port
for conn in psutil.net_connections(kind='inet'):
    if conn.laddr.port == target_port and conn.status == psutil.CONN_LISTEN:
        pid_to_kill = conn.pid
        break

if pid_to_kill:
    print("Found the PID of the process listening on port 7860:", pid_to_kill)
else:
    print("No process found listening on port 7860")
Once the PID is found, you can use psutil.Process's kill() method to forcefully terminate the process:
if pid_to_kill:
    process = psutil.Process(pid_to_kill)
    process.kill()  # Forcefully terminate the process
    print(f"Terminated the process listening on port 7860: {pid_to_kill}")
else:
    print("Cannot terminate the process because no PID was found")
This code will forcefully terminate any program listening on the specified port. Please confirm that the port is indeed used by Gradio to avoid accidentally terminating other services.
Gradio is primarily designed for building fast, easy-to-use web UIs for displaying machine learning models. It itself already runs on a web server, but its main purpose is to provide a human-computer interaction interface rather than a traditional RESTful API endpoint.
If you want to have RESTful API functionality in your Gradio application, the most recommended method is to embed the Gradio application into a more powerful web framework, such as **FastAPI** or **Flask**. This allows you to use the main framework to define API endpoints while mounting the Gradio UI on a subpath.
FastAPI is fast, and it shares the Starlette foundation with Gradio, which makes the embedding process very smooth.
# Install FastAPI, Uvicorn (server), and Gradio
pip install fastapi uvicorn gradio
You can use Gradio's gr.mount_gradio_app() helper, which mounts the Gradio interface onto the FastAPI application at a sub-path.
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import gradio as gr

# 1. Define the Gradio application logic
def greet(name):
    return f"Hello, {name}!"

gr_interface = gr.Interface(fn=greet, inputs="text", outputs="text")

# 2. Create the FastAPI main application
app = FastAPI(
    title="Gradio + FastAPI Server",
    description="Provides both API endpoints and the Gradio UI"
)

# 3. (API) Define your RESTful API endpoints
@app.get("/api/v1/status")
def api_status():
    return JSONResponse(content={"status": "API is operational"})

@app.get("/api/v1/model_inference")
def api_inference(data: str):
    # Here you would call your model logic
    result = f"Processed data: {data.upper()}"
    return JSONResponse(content={"result": result})

# 4. (Gradio UI) Mount the Gradio application at the /gradio path
app = gr.mount_gradio_app(app, gr_interface, path="/gradio")

# Run the server (assuming the file is named main.py)
# Execute in the terminal: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
Gradio itself also provides an experimental feature that allows you to directly call the underlying functions of the Gradio interface through URL parameters without the need for an additional web framework. This is not a standard RESTful API, but allows for simple programmatic access.
However, embedding Gradio using FastAPI/Flask is a more standard and flexible approach that provides more standardized API design, documentation, and certification.
This is the most common crawler combination, suitable for beginners, and is used to parse static web pages.
pip install requests beautifulsoup4
Usage example:
import requests
from bs4 import BeautifulSoup
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.string)
A powerful crawler framework, suitable for large crawler projects, supporting multi-threaded and distributed crawlers.
pip install scrapy
Usage example:
scrapy startproject myproject
Create a crawler module in the project and run the crawling command.
Suitable for dynamic web crawling that needs to simulate user operations, such as processing JavaScript-rendered content.
pip install selenium
Usage example:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://example.com")
print(driver.title)
driver.quit()
Another tool for processing dynamic web pages, it has higher performance than Selenium and supports multiple browsers.
pip install playwright
playwright install
Usage example:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
Based on the Python version of Puppeteer, it is specially designed for crawling dynamic web pages.
pip install pyppeteer
Usage example:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto("https://example.com")
    print(await page.title())
    await browser.close()

asyncio.run(main())
Efficient tool for sending HTTP requests, supporting asynchronous operations.
pip install httpx
Usage example:
import asyncio
import httpx

async def fetch():
    async with httpx.AsyncClient() as client:
        response = await client.get("https://example.com")
        print(response.text)

asyncio.run(fetch())
To create a simple web scraper in Python, you can use the requests library to get the page content, and BeautifulSoup to parse the HTML.
Here's an example of a basic web scraper:
import requests
from bs4 import BeautifulSoup

# URL to scrape
url = "https://example.com"

# Send a GET request
response = requests.get(url)
response.raise_for_status()  # Check for errors

# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Extract specific data (e.g., all the headings)
headings = soup.find_all("h1")

# Print the headings
for heading in headings:
    print(heading.text)
Note: You may need to install the libraries with the following commands:
pip install requests
pip install beautifulsoup4
Suppose you have HTML like the following and want to extract the text inside a span with a specific class:
<span class="xxxclass">Text to be extracted</span>
<span>other text</span>
from bs4 import BeautifulSoup

# HTML content
html_content = """
<span class="xxxclass">Text to be extracted</span>
<span>other text</span>
"""

# Parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Find the specific tag and class
span_tag = soup.find('span', class_='xxxclass')

# Extract the text value
if span_tag:
    print(span_tag.text)  # Output: Text to be extracted
else:
    print("No matching tag found")

# Find all matching tags
span_tags = soup.find_all('span', class_='xxxclass')

# Extract the text of each tag
for tag in span_tags:
    print(tag.text)
1. **Match multiple attributes at once**:
span_tag = soup.find('span', {'class': 'xxxclass', 'id': 'specific-id'})
2. **Use regular expressions to match classes**:
import re
span_tag = soup.find('span', class_=re.compile(r'^xxx'))
Selenium is an open source tool primarily used to automate web browser operations. It supports multiple browsers, including Chrome, Firefox, Safari, etc., and can be used to test web applications or scrape web data.
The following are the steps to install Selenium and a simple Python usage example:
# Install Selenium
pip install selenium
#Sample code
from selenium import webdriver
from selenium.webdriver.common.by import By
# Start WebDriver
driver = webdriver.Chrome()
driver.get("https://www.example.com")
# Find elements and perform operations
element = driver.find_element(By.TAG_NAME, "h1")
print(element.text)
# Close browser
driver.quit()
Make sure Selenium and ChromeDriver are installed:
pip install selenium
Download and install the version of ChromeDriver that matches your Chrome version.
Chrome's user folder contains personal data such as bookmarks, history, cookies, etc. You can specify a specific folder to use to start the browser.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
#Specify ChromeDriver path
chromedriver_path = "/path/to/chromedriver"
#Specify user folder
user_data_dir = "/path/to/your/user/data"
# Set Chrome options
chrome_options = Options()
chrome_options.add_argument(f"--user-data-dir={user_data_dir}")
chrome_options.add_argument("--profile-directory=Default") # or other subfolder name
# Start browser
service = Service(chromedriver_path)
driver = webdriver.Chrome(service=service, options=chrome_options)
# Open a web page
driver.get("https://example.com")
# End program
driver.quit()
Make sure the user_data_dir path is a valid and writable directory. You can confirm the profile path currently in use by opening chrome://version in Chrome. Next, make sure selenium and psutil are installed:
pip install selenium psutil
The following code scans all running Chrome processes and extracts the user-data-dir parameter:
import psutil
import re

def get_all_user_data_dirs():
    user_data_dirs = set()
    for proc in psutil.process_iter(attrs=['pid', 'name', 'cmdline']):
        try:
            if proc.info['name'] and 'chrome' in proc.info['name'].lower():
                # cmdline may be None for some processes
                cmdline = ' '.join(proc.info['cmdline'] or [])
                match = re.search(r'--user-data-dir=([^\s]+)', cmdline)
                if match:
                    user_data_dirs.add(match.group(1))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return list(user_data_dirs)

print(get_all_user_data_dirs())
Once you have found the target user-data-dir, you can use it with Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
chrome_user_data_dir = "C:\\Users\\YourUser\\AppData\\Local\\Google\\Chrome\\User Data"
options = Options()
options.add_argument(f"--user-data-dir={chrome_user_data_dir}")
service = Service("chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.google.com")
In short: use psutil to inspect the running Chrome processes to get all user-data-dir values, then start the browser with a specific user_data_dir.
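The extraction step itself is just a regular expression over the process command line; here is a self-contained check using a fabricated command line. Note that the simple `[^\s]+` pattern does not handle quoted paths containing spaces:

```python
import re

# Fabricated Chrome command line, for illustration only
cmdline = r"chrome.exe --user-data-dir=C:\ChromeData\Work --profile-directory=Default"

# Same pattern as used in get_all_user_data_dirs above
match = re.search(r'--user-data-dir=([^\s]+)', cmdline)
user_data_dir = match.group(1) if match else None
print(user_data_dir)
```

For real Chrome installations the path often contains spaces (e.g. under "Application Support" on macOS) and is quoted on the command line, so a production version of this pattern would need to handle quotes as well.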
To get the username from the Chrome user data directory, you need to access a file in that directory, typically found at:
- Windows: C:\Users\[Username]\AppData\Local\Google\Chrome\User Data\
- macOS: /Users/[Username]/Library/Application Support/Google/Chrome/
- Linux: /home/[Username]/.config/google-chrome/
These folders contain many files; you can read the Local State file to obtain basic information about the user.
Chrome's Local State file contains basic user settings from which you can obtain user information.
import json
import os

def get_chrome_user_name(user_data_dir):
    local_state_path = os.path.join(user_data_dir, 'Local State')
    # Check whether the file exists
    if not os.path.exists(local_state_path):
        return "Local State file not found"
    with open(local_state_path, 'r', encoding='utf-8') as file:
        local_state = json.load(file)
    # Get user information from Local State
    user_name = local_state.get('profile', {}).get('name', 'Unknown User')
    return user_name

# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
print(get_chrome_user_name(user_data_dir))
When you parse the Local State file as JSON, you can obtain a lot of information from it, such as:
- profile: contains the user's configuration information, such as name and avatar.
- last_version: shows the last Chrome version used.
- is_logged_in: shows whether the user is logged in.
In most cases the username is located under profile and can be extracted with the method above.
By parsing the JSON data in the Local State file, you can easily get the Chrome user's name.
Chrome's Local State file stores a lot of basic information about users. If you want to extract all usernames from it, read the file and parse its JSON content. Here are the steps.
Reading and parsing the Local State file: you can read the Local State file and parse its JSON content to extract all user data.
import json
import os

def get_all_users(user_data_dir):
    local_state_path = os.path.join(user_data_dir, 'Local State')
    # Check whether the file exists
    if not os.path.exists(local_state_path):
        return "Local State file not found"
    with open(local_state_path, 'r', encoding='utf-8') as file:
        local_state = json.load(file)
    # Get all user data from Local State
    profiles = local_state.get('profile', {}).get('info_cache', {})
    # Get all user names
    user_names = [profile.get('name', 'Unknown User') for profile in profiles.values()]
    return user_names

# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
print(get_all_users(user_data_dir))
In Chrome's Local State file, user data is usually stored under profile → info_cache, a dictionary in which each user's profile is keyed by its configuration name.
Local State file structure: the following is an example of the structure in the file:
{
    "profile": {
        "info_cache": {
            "profile1": {
                "name": "User1",
                "avatar": "path/to/avatar1.jpg"
            },
            "profile2": {
                "name": "User2",
                "avatar": "path/to/avatar2.jpg"
            }
        }
    }
}
By parsing the Local State file, you can get the names of all users in Chrome. This makes it easy to list all accounts.
In Chrome's Local State file, each user's data is contained in info_cache. From there you can extract the username and the corresponding profile subdirectory.
Read and parse the Local State file: here is a method to obtain the names and corresponding subdirectories of all users from Chrome's Local State file.
import json
import os
def get_users_and_profiles(user_data_dir):
    local_state_path = os.path.join(user_data_dir, 'Local State')
    # Check if the file exists
    if not os.path.exists(local_state_path):
        return "Local State file not found"
    with open(local_state_path, 'r', encoding='utf-8') as file:
        local_state = json.load(file)
    # Get all user data from Local State
    profiles = local_state.get('profile', {}).get('info_cache', {})
    # Get all user names and their corresponding subdirectories
    user_info = {}
    for profile_key, profile_data in profiles.items():
        user_name = profile_data.get('name', 'Unknown User')
        # The info_cache key is already the profile directory name (e.g. "Profile 1")
        profile_sub_dir = os.path.join(user_data_dir, profile_key)
        user_info[user_name] = profile_sub_dir
    return user_info
# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
users_and_profiles = get_users_and_profiles(user_data_dir)
for user_name, profile_dir in users_and_profiles.items():
    print(f"User name: {user_name}, subdirectory: {profile_dir}")
In the Local State file, all user data is located under profile in info_cache. Each user has a corresponding profile key (e.g. Profile 1, Profile 2, and so on). Each profile entry contains the user's name (name) and other relevant information.
Local State file structure: the following is an example of the structure in the Local State file:
{
"profile": {
"info_cache": {
"Profile 1": {
"name": "User1",
"avatar": "path/to/avatar1.jpg"
},
"Profile 2": {
"name": "User2",
"avatar": "path/to/avatar2.jpg"
}
}
}
}
Each profile key begins with Profile, followed by a number (e.g. Profile 1, Profile 2, and so on). By parsing the Local State file, you can get the names of all users and their corresponding profile subdirectories. This lets you easily find the location of each user's configuration data.
To get the Gmail account from the Chrome user data directory, you need to extract the relevant information from the Chrome configuration data. This is usually accomplished by parsing Chrome's user data files, specifically Google-related account data.
Each Chrome user's data lives in its own folder, usually a Profile directory under the User Data directory. If the user is signed in to a Google Account, the relevant Gmail account information can be found in the profile data.
import os
import json
def get_gmail_from_profile(user_data_dir, profile_name):
    profile_dir = os.path.join(user_data_dir, profile_name)
    accounts_file = os.path.join(profile_dir, 'Web Data')
    # Check if the file exists
    if not os.path.exists(accounts_file):
        return "Web Data file not found"
    # Try to read the Web Data file
    try:
        with open(accounts_file, 'r', encoding='utf-8') as file:
            web_data = json.load(file)
        # Extract Gmail account from data
        for row in web_data.get('accounts', []):
            if 'gmail' in row.get('email', ''):
                return row.get('email')
        return "No Gmail account found"
    except Exception as e:
        return f"Error reading Web Data file: {e}"
# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
# Assume you are using Profile 1
profile_name = 'Profile 1'
print(get_gmail_from_profile(user_data_dir, profile_name))
Web Data files contain various login-related data in Chrome, including account names and other information. In this example we look for the email field and check whether it contains a Gmail address. Note that in current Chrome versions Web Data is actually an SQLite database, so the JSON structure shown here is a simplified illustration.
The following is an example of a partial structure of a Web Data file:
{
"accounts": [
{
"email": "[email protected]",
"password": "encrypted_password_1"
},
{
"email": "[email protected]",
"password": "encrypted_password_2"
}
]
}
The email field can usually be extracted. By parsing the Web Data file in the Chrome user profile, you can retrieve a user's Gmail account: if the user is signed in to a Gmail account, the corresponding email will appear in the data.
Common web automation tools include Selenium and Playwright. These tools can simulate clicking, text input, and other user actions.
Use a tool like BeautifulSoup or Playwright to collect all the hyperlinks on the website (<a href> elements) to create a list of pages to browse.
Open a browser with Selenium or Playwright and simulate user actions such as clicking buttons, scrolling, and even triggering dynamic content.
Modern websites often contain a lot of dynamically generated content, and you can use Playwright or Selenium to execute JavaScript to ensure the page loads correctly.
Based on the crawled links, recursively visit all pages in the website and record the pages that have been visited to avoid duplication.
When browsing each page, simulate common user behaviors, such as filling out forms, submitting information, and recording the results of the operations.
from selenium import webdriver
from selenium.webdriver.common.by import By
#Initialize the browser
driver = webdriver.Chrome()
# Start browsing the website
driver.get("https://example.com")
# Grab all links
links = driver.find_elements(By.TAG_NAME, "a")
for link in links:
    href = link.get_attribute("href")
    print(f"Link found: {href}")
# Simulate click
if links:
    links[0].click()
# Close browser
driver.quit()
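The "record visited pages to avoid duplication" step above boils down to a breadth-first traversal with a visited set. A minimal sketch using only the standard library, with the page fetcher injected so the same logic works with Selenium, urllib, or anything else (fetch_links is a placeholder you supply):

```python
from collections import deque

def crawl(start_url, fetch_links, max_pages=100):
    """Breadth-first site traversal. fetch_links(url) must return the list
    of hyperlinks found on that page. Returns URLs in visit order."""
    visited = set()
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # Already processed; skip duplicates
        visited.add(url)
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                queue.append(link)
    return order
```

With Selenium, fetch_links could navigate with driver.get(url) and then collect `a.get_attribute("href")` for each anchor, as in the snippet above.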
When browsing and simulating websites, please abide by the terms of use of the relevant website to avoid causing excessive server load or violating the law.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#Initialize WebDriver
driver = webdriver.Chrome()
#Open the target website
driver.get("https://example.com")
# Wait for the button to appear and click
wait = WebDriverWait(driver, 10)
button = wait.until(EC.element_to_be_clickable((By.ID, "button_id")))
button.click()
# Wait for other elements to load
text_field = wait.until(EC.visibility_of_element_located((By.NAME, "text_field_name")))
text_field.send_keys("Test data")
# Close browser
driver.quit()
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
#Initialize WebDriver
driver = webdriver.Chrome()
#Open the target website
driver.get("https://example.com")
time.sleep(3) # Pause for 3 seconds
# Simulate button click
button = driver.find_element(By.ID, "button_id")
button.click()
time.sleep(3)
# Close browser
driver.quit()
**Note**: `time.sleep` is not recommended for formal testing, only for debugging purposes.
from selenium import webdriver
from selenium.webdriver.common.by import By
#Initialize WebDriver
driver = webdriver.Chrome()
#Open the target website
driver.get("https://example.com")
# Continue after manual confirmation
input("Press Enter to continue to the next step...")
button = driver.find_element(By.ID, "button_id")
button.click()
# Continue to perform other tests
input("Press Enter to continue to the next step...")
driver.quit()
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By

class TestExample(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome()

    def test_step_by_step(self):
        driver = self.driver
        driver.get("https://example.com")
        input("Check the page, press Enter to continue...") # Manual breakpoint
        button = driver.find_element(By.ID, "button_id")
        button.click()
        input("Check the operation results, press Enter to continue...") # Manual breakpoint

    def tearDown(self):
        self.driver.quit()

if __name__ == "__main__":
    unittest.main()
from selenium import webdriver
from selenium.webdriver.common.by import By
# Initialize Selenium
driver = webdriver.Chrome()
driver.get("Your target web page URL")
# Search for text starting with a specific line
target_text = "Target starting text"
rows = driver.find_elements(By.CSS_SELECTOR, "table tr")
# Save results
result_data = []
for row in rows:
    cells = row.find_elements(By.TAG_NAME, "td")
    if cells and cells[0].text.startswith(target_text):
        # Get the rest of the information in the same row
        result_data.append([cell.text for cell in cells])
driver.quit()
# Convert the result to HTML
html_output = "<h2>Search results</h2>\n"
for i, row_data in enumerate(result_data, start=1):
    html_output += f"<h3>Line {i}</h3>\n<ul>\n"
    for data in row_data:
        html_output += f" <li>{data}</li>\n"
    html_output += "</ul>\n"
# show results
print(html_output)
Search results
Line 1
- Target starting text 1
- Other information1
- Other information 2
Line 2
- Target starting text 2
- Other information1
- Other information 2
The startswith method is used to determine whether the text meets the condition. Results are organized with <h2> and <h3> headings and an unordered <ul> list. No <head> or <body> tags are generated, only the necessary HTML, making it easy to embed in other pages. In Selenium, find_elements(By.XPATH, xpath) is used to find all elements that match an XPath selector and returns a list. It differs from find_element(By.XPATH, xpath), which returns only the first matching element.
from selenium import webdriver
from selenium.webdriver.common.by import By
# Start browser
driver = webdriver.Chrome()
# Open web page
driver.get("https://example.com")
# Use XPath to find all elements that match the criteria
elements = driver.find_elements(By.XPATH, "//div[@class='example-class']")
# Traverse the found elements and output the content
for element in elements:
    print(element.text)
# Close browser
driver.quit()
| XPath expression | Description |
|---|---|
| //tagname | Selects all elements with the specified tag, e.g. //div selects all div elements |
| //tagname[@attribute='value'] | Selects elements by attribute value, e.g. //input[@type='text'] |
| //tagname[contains(@attribute, 'value')] | Attribute contains certain text, e.g. //div[contains(@class, 'header')] |
| //tagname[text()='text'] | Selects elements whose text matches exactly, e.g. //button[text()='Submit'] |
| //tagname[contains(text(), 'text')] | Selects elements containing specific text, e.g. //p[contains(text(), 'Welcome')] |
| //*[@id='some-id'] | Selects an element with a specific ID |
| (//tagname)[index] | Selects the index-th matching element, e.g. (//div)[1] selects the first div |
Assume the following HTML structure:
<div class="product">Product A</div>
<div class="product">Product B</div>
<div class="product">Product C</div>
We can use the following Selenium code to get all div elements with the product class:
elements = driver.find_elements(By.XPATH, "//div[@class='product']")
for element in elements:
    print(element.text)
find_elements() returns a list even if only one element is found. Use find_element() when you need a single element; otherwise use find_elements(). find_elements(By.XPATH, xpath) is a powerful and flexible search method in Selenium that can locate multiple elements on a web page and is suitable for crawlers and automated testing.
In Selenium, By.LINK_TEXT and By.PARTIAL_LINK_TEXT are used to find elements based on the text content of a hyperlink:
- By.LINK_TEXT: exact match on the complete link text.
- By.PARTIAL_LINK_TEXT: fuzzy match on part of the link text.
from selenium import webdriver
from selenium.webdriver.common.by import By
# Start browser
driver = webdriver.Chrome()
# Open web page
driver.get("https://example.com")
# Use LINK_TEXT to find hyperlinks
element = driver.find_element(By.LINK_TEXT, "Full link text")
print(element.get_attribute("href"))
# Use PARTIAL_LINK_TEXT to find hyperlinks
element_partial = driver.find_element(By.PARTIAL_LINK_TEXT, "Partial text")
print(element_partial.get_attribute("href"))
# Close browser
driver.quit()
<a href="https://example.com/page1">Full link text</a>
<a href="https://example.com/page2">Click here to learn more</a>
Suppose you want to click <a>Full link text</a>; you can use:
driver.find_element(By.LINK_TEXT, "Full link text").click()
If the hyperlink is "Click here to learn more", but we only know "Click here", you can use:
driver.find_element(By.PARTIAL_LINK_TEXT, "Click here").click()
Use By.PARTIAL_LINK_TEXT for partial matches and By.LINK_TEXT for an exact match; both apply only to <a> hyperlink elements. find_elements() returns a list, while find_element() returns only the first matching element. By.LINK_TEXT is useful for finding complete links precisely, while By.PARTIAL_LINK_TEXT is more flexible and can match part of the link text.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
from selenium.common.exceptions import TimeoutException

try:
    elems = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".item"))
    )
    print("Element found:", len(elems))
except TimeoutException:
    print("Element not found within 10 seconds")
elements = driver.find_elements(By.CSS_SELECTOR, ".item")
if not elements:
    print("Element not found")
else:
    print("Number of elements found:", len(elements))
# Switch to iframe and search again
iframe = driver.find_element(By.TAG_NAME, "iframe")
driver.switch_to.frame(iframe)
elements = driver.find_elements(By.CSS_SELECTOR, ".item")
print("Found:", len(elements))
shadow_host = driver.find_element(By.CSS_SELECTOR, "#shadowHost")
shadow_root = driver.execute_script("return arguments[0].shadowRoot", shadow_host)
elem = shadow_root.find_element(By.CSS_SELECTOR, ".target")
def safe_find_elements(driver, by, selector, timeout=10):
    try:
        return WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((by, selector))
        )
    except Exception:
        return [] # Return an empty list if nothing is found
elements = safe_find_elements(driver, By.CSS_SELECTOR, ".item")
if elements:
    print("Found:", len(elements))
else:
    print("Element not found")
from selenium import webdriver
from selenium.webdriver.common.alert import Alert
driver = webdriver.Chrome()
driver.get("URL")
alert = Alert(driver)
print(alert.text) # Get the alert message
alert.accept() # Click OK
driver.quit()
alert = Alert(driver)
print(alert.text)
alert.accept() # Click OK
# alert.dismiss() # Click to cancel
alert = Alert(driver)
print(alert.text)
alert.send_keys("Test input") #Input content
alert.accept() # Click OK
Use driver.switch_to.alert to switch to the alert box, and handle exceptions with try-except to avoid test failures.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 10).until(EC.alert_is_present())
In Selenium, you can useWebDriverWaitto check whether the warning box exists to avoid errors caused by the warning box not appearing.
from selenium import webdriver
from selenium.webdriver.common.alert import Alert
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("URL")
try:
    # Wait 5 seconds to check if there is an alert box
    WebDriverWait(driver, 5).until(EC.alert_is_present())
    alert = Alert(driver)
    print("Alert box content:", alert.text)
    alert.accept() # Click OK
except Exception:
    print("No alert box found")
driver.quit()
WebDriverWait makes sure the alert box has appeared before switching, avoiding NoAlertPresentException, and try-except prevents the test from failing when there is no alert box. Note that WebDriverWait(driver, 5).until(EC.alert_is_present()) can only detect alert boxes generated by JavaScript; it cannot detect Chrome system-level pop-ups such as "Save Passwords" or "Website Notifications".
For those, use ChromeOptions to disable system-level pop-ups, or pyautogui or the Win32 API to simulate keyboard or mouse operations. Starting Chrome with ChromeOptions lets you turn off notifications and other system-wide pop-ups.
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--disable-notifications") # Disable notifications
options.add_argument("--disable-save-password-bubble") # Disable saving password prompts
driver = webdriver.Chrome(options=options)
driver.get("URL")
If the alert box has already appeared, you can use pyautogui to simulate key presses, such as Enter or Esc, to close it.
import pyautogui
import time
time.sleep(3) # Wait for the alert box to appear
pyautogui.press("enter") # Simulate pressing the Enter key
If you are in a Windows environment, you can use pygetwindow and pywinauto to detect and close system-level pop-ups.
import pygetwindow as gw
import pywinauto
windows = gw.getWindowsWithTitle("Google Chrome") # Get the Chrome window
for win in windows:
if "system alert" in win.title.lower(): # Check whether the title contains "system alert"
win.close() # Close the window
In summary: use ChromeOptions to prevent system pop-ups, and pyautogui or the Win32 API to simulate keyboard or window operations.
Shadow DOM is part of Web Components and is used to encapsulate HTML, CSS and JavaScript so that a component's internal implementation details do not interfere with the outside. Simply put, it is a DOM structure with an isolated scope.
---
<custom-element>
#shadow-root (open)
<button>Click me</button>
</custom-element>
The button actually exists inside the shadow DOM and cannot be found with the traditional document.querySelector("button").
If the shadow root was created with mode: 'open', it can be accessed externally via element.shadowRoot; if created with mode: 'closed', element.shadowRoot returns null and the contents are not accessible. Example of creating an open shadow root:
const shadowHost = document.querySelector('#host');
const shadowRoot = shadowHost.attachShadow({ mode: 'open' });
shadowRoot.innerHTML = '<p>Hello from Shadow DOM</p>';
---
// inside shadow DOM
<div><slot name="title">Default title</slot></div>
// external use
<custom-element>
<span slot="title">Custom title</span>
</custom-element>
---
js = """
const host = document.querySelector('custom-element');
const shadow = host.shadowRoot;
return shadow.querySelector('button');
"""
button = driver.execute_script(js)
button.click()
---
After obtaining the shadowRoot, you can search within it.
Tor Browser is essentially Firefox ESR plus Tor network settings and enhanced privacy defaults. Selenium cannot control the Tor Browser GUI directly, but it can achieve the same anonymity effect as Tor Browser by driving Firefox through a Tor SOCKS proxy.
First make sure that Tor is running on this machine. The default SOCKS proxy is:
127.0.0.1:9050
If using Tor Browser, after startup it is usually:
127.0.0.1:9150
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
options = Options()
options.set_preference("network.proxy.type", 1)
options.set_preference("network.proxy.socks", "127.0.0.1")
options.set_preference("network.proxy.socks_port", 9150)
options.set_preference("network.proxy.socks_remote_dns", True)
service = Service("geckodriver.exe")
driver = webdriver.Firefox(service=service, options=options)
driver.get("https://check.torproject.org")
Open the following URL; if the page reports that you are using Tor, the setup succeeded:
https://check.torproject.org
If you want to use Firefox built into Tor Browser:
options.binary_location = r"C:\Tor Browser\Browser\firefox.exe"
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
options = Options()
options.binary_location = r"C:\Tor Browser\Browser\firefox.exe"
options.set_preference("network.proxy.type", 1)
options.set_preference("network.proxy.socks", "127.0.0.1")
options.set_preference("network.proxy.socks_port", 9150)
options.set_preference("network.proxy.socks_remote_dns", True)
service = Service("geckodriver.exe")
driver = webdriver.Firefox(service=service, options=options)
driver.get("https://check.torproject.org")
To extract emails from a Thunderbird account, you can use Python's IMAP libraries, such as the standard imaplib or the third-party imapclient. First, make sure the mail account configured in Thunderbird uses IMAP and that the server allows external applications to connect.
Use pip to install related libraries:
pip install imapclient pyzmail36
Use the following code to connect to the email server and extract important emails:
import imapclient
from pyzmail import PyzMessage
# Set up mail server and login information
IMAP_SERVER = 'imap.example.com' # Replace with your IMAP server address
EMAIL = '[email protected]'
PASSWORD = 'your_password'
# Connect to IMAP server
with imapclient.IMAPClient(IMAP_SERVER) as client:
    client.login(EMAIL, PASSWORD)
    client.select_folder('INBOX')
    # Search for messages marked as important
    messages = client.search(['FLAGGED'])
    for uid in messages:
        raw_message = client.fetch([uid], ['BODY[]'])[uid][b'BODY[]']
        message = PyzMessage.factory(raw_message)
        # Display email information
        print(f"Subject: {message.get_subject()}")
        print(f"From: {message.get_address('from')}")
        print(f"Date: {message.get_decoded_header('date')}")
Execute the above code to see if you can successfully extract emails marked as important. If you have any connection issues, you can check your server settings or perform more detailed debugging at the IMAP protocol layer.
In this way, you can successfully extract important emails from Thunderbird using Python.
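If pyzmail36 is hard to install on newer Python versions, the standard library's email module can parse the same raw bytes returned by the IMAP fetch. A sketch of the parsing step only (the raw bytes would come from client.fetch as above):

```python
from email import message_from_bytes
from email.header import decode_header, make_header

def summarize_message(raw_bytes):
    """Parse a raw RFC 822 message and return its decoded subject,
    sender and date as a dict."""
    msg = message_from_bytes(raw_bytes)
    return {
        # decode_header handles MIME-encoded (e.g. UTF-8) subject lines
        'subject': str(make_header(decode_header(msg.get('Subject', '')))),
        'from': msg.get('From', ''),
        'date': msg.get('Date', ''),
    }
```

Usage: `info = summarize_message(raw_message); print(info['subject'])`.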
Install PyAutoGUI using the following instructions:
pip install pyautogui
PyAutoGUI is an automation tool that allows simulation of mouse and keyboard operations, suitable for GUI automated testing or repetitive work.
Control mouse position and actions:
import pyautogui
# Get screen resolution
screen_width, screen_height = pyautogui.size()
print(f"Screen resolution: {screen_width}x{screen_height}")
# Move the mouse to the specified coordinates
pyautogui.moveTo(100, 100, duration=1)
# Move the mouse from the current position
pyautogui.move(50, 50, duration=1)
# Simulate mouse click
pyautogui.click(200, 200)
# Simulate mouse right click
pyautogui.rightClick()
# Simulate drag operation
pyautogui.dragTo(400, 400, duration=1)
Simulate keyboard key input:
import pyautogui
# Enter text
pyautogui.write("Hello, PyAutoGUI!", interval=0.1)
# Simulate pressing a specific key
pyautogui.press("enter")
# Press multiple keys at the same time
pyautogui.hotkey("ctrl", "c") # Copy text
Take a screenshot or find a specific image:
import pyautogui
# Capture the entire screen
screenshot = pyautogui.screenshot()
screenshot.save("screenshot.png")
# Find the image on the screen
location = pyautogui.locateOnScreen("image.png")
if location:
    print(f"Image location: {location}")
else:
    print("Image not found")
To prevent the program from executing indefinitely, you can use pyautogui.FAILSAFE:
pyautogui.FAILSAFE = True # Default is True
# Move the mouse to the upper left corner of the screen (0, 0) to stop the program immediately
If Surfshark VPN does not provide a command-line tool (such as surfshark-cli), you need to use GUI automation tools (such as pyautogui) to simulate manual operation.
Install the pyautogui module via pip install pyautogui, then use pyautogui to automate clicking the buttons in Surfshark's graphical interface to connect or disconnect the VPN.
The following example assumes that the Surfshark VPN buttons have fixed positions and uses pyautogui to do it:
import pyautogui
import time
def connect_vpn():
    # Make sure Surfshark is open
    print("Trying to connect to VPN...")
    # Simulate clicking the "Connect" button; adjust the coordinates to the actual location
    pyautogui.click(x=500, y=300) # Replace with the position of the "Connect" button
    time.sleep(5) # Wait for the connection to complete
    print("VPN connected")

def disconnect_vpn():
    # Make sure Surfshark is open
    print("Trying to disconnect from VPN...")
    # Simulate clicking the "Disconnect" button; adjust the coordinates to the actual position
    pyautogui.click(x=500, y=350) # Replace with the position of the "Disconnect" button
    time.sleep(5) # Wait for the disconnection to complete
    print("VPN disconnected")
# test
connect_vpn()
disconnect_vpn()
Adjust the x and y coordinates to your screen layout. You can open the application with subprocess, e.g. subprocess.run("start surfshark.exe", shell=True). If the coordinate method is unstable, you can use image recognition (such as pyautogui.locateOnScreen()) to find the button position and increase flexibility.
Kivy is an open source Python framework for rapid development of multi-touch applications. It is designed with cross-platform support in mind, allowing the same code to run on Windows, macOS, Linux, iOS, and Android. Kivy is particularly suitable for building GUI applications for mobile phones, tablets, and desktop devices, and it has good multi-touch support.
Kivy applications are composed of multiple widgets, which can be laid out through code or Kivy's proprietary language KV files. Below is a simple application example that displays a button that changes color when clicked.
from kivy.app import App
from kivy.uix.button import Button
class MyApp(App):
    def build(self):
        button = Button(text='Hello, Kivy!',
                        background_color=(0, 1, 0, 1)) # Green button
        # Bind a handler so the color changes when the button is clicked
        button.bind(on_press=self.change_color)
        return button

    def change_color(self, button):
        button.background_color = (1, 0, 0, 1) # Turn red when clicked

if __name__ == '__main__':
    MyApp().run()
Kivy can be installed via pip:
pip install kivy
Kivy is suitable for a variety of application scenarios, including but not limited to:
Kivy is a Python framework for building cross-platform applications, but it usually runs on a local device. To display your Kivy application remotely, you can consider the following options:
You can use VNC (Virtual Network Computing) or other remote desktop tools (such as RDP, TeamViewer, etc.) to remotely control and display Kivy applications.
For those using Linux, you can use X11 forwarding to display a graphical interface remotely:
ssh -X username@remote_host
You can use Flask or other web frameworks to expose parts of your Kivy application to remote users and display them using a web browser:
If you want to run your Kivy app in a containerized environment, you can set it up using Docker and VNC:
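A minimal sketch of such a Dockerfile, assuming the application entry point is main.py and using Xvfb plus x11vnc for the display (package names, ports, and versions are assumptions to adapt):

```dockerfile
FROM python:3.11-slim

# Virtual X server and VNC server for headless GUI display
RUN apt-get update && apt-get install -y --no-install-recommends \
        xvfb x11vnc && rm -rf /var/lib/apt/lists/*

RUN pip install kivy

COPY main.py /app/main.py
WORKDIR /app

ENV DISPLAY=:0
EXPOSE 5900

# Start a virtual display, the VNC server, then the Kivy app
CMD Xvfb :0 -screen 0 1280x720x24 & x11vnc -display :0 -forever -nopw & python main.py
```

Connecting a VNC viewer to port 5900 on the container host should then show the application window.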
This error indicates that the tickmarker module from Kivy's Garden is not installed. To fix this issue, follow the steps below:
pip install kivy-garden
garden install tickmarker
Once you've done these steps, try running your Kivy application again.
Rio UI is a user interface framework with Python at its core, designed to allow developers to write interactive applications in a simple and intuitive way. It is based on Python programming logic and presents the UI through modern front-end technology.
pip install rio-ui
import rio
app = rio.App()
@app.page("/")
def index():
    return rio.Text("Hello, Rio UI with Python!")

if __name__ == "__main__":
    app.run()
email: [email protected]