Python's syntax is simple and close to natural language. Even novice programmers can get started quickly, lowering the learning threshold.
Python provides a wide range of standard function libraries, covering many aspects such as networking, data processing, and graphical interfaces. In addition, the powerful third-party ecosystem such as NumPy, Pandas, and TensorFlow makes Python a versatile development tool.
Python is a cross-platform language. Whether it is Windows, macOS or Linux, the same Python program can be executed, which greatly improves the flexibility of development.
Python plays an important role in multiple fields such as data science, artificial intelligence, web development, automation scripting, game development, etc., allowing developers to handle multiple needs with one language.
Python has a large global community, and both beginners and experienced developers can easily find educational resources, discussion groups, and technical support.
Python's intuitive syntax and powerful tools allow developers to turn ideas into working programs faster, shortening the product development cycle.
Anaconda is an open source Python and R programming platform designed for scientific computing, including applications such as data science, machine learning, artificial intelligence, and big data analytics.
Anaconda is suitable for users in the following fields:
The following are problems commonly encountered by users:
To update, enter `conda update conda` and `conda update anaconda` in the terminal.
Anaconda is a data science platform that integrates Python, a programming language, with a variety of built-in tools and libraries.
Anaconda provides a virtual environment (Environment) function that allows users to create multiple independent Python execution environments on the same computer. Each environment can have different Python versions and packages to avoid dependency conflicts between different projects.
# Create an environment named myenv and specify the Python version
conda create -n myenv python=3.10
# Start environment
conda activate myenv
# Exit the environment
conda deactivate
# List all environments
conda env list
# or
conda info --envs
# Export environment configuration to YAML file
conda env export > environment.yml
# Create environment from YAML file
conda env create -f environment.yml
# Delete the specified environment
conda remove -n myenv --all
Jupyter is an open source interactive computing environment that supports multiple programming languages and is mainly used for data science, machine learning, and academic research.
Jupyter is widely used in the following fields:
Run `jupyter notebook` in the terminal to start Jupyter Notebook. To set up Visual Studio Code, go to the Visual Studio Code official website and download and install the version appropriate for your operating system.
In Visual Studio Code, install the Python extension by following these steps:
Make sure Python is installed on your system; it can be downloaded and installed from the Python official website.
After the installation is complete, enter the following command in the command line to confirm that the installation is successful:
python --version
# or
python3 --version
Open your Python project or file, click the "Python" status bar in the lower right corner of Visual Studio Code, and select the appropriate Python interpreter.
Open the Python file in the editor and execute the program using the following method:
Press `Ctrl + Shift + P`, search for "Run Python File", and execute it. If you need to install a third-party package, enter the following in the built-in terminal:
pip install <package-name>
Enjoy auto-completion and powerful debugging tools through the functionality provided by Python extensions:
Here are some commonly used shortcut keys:
- `Ctrl + F5`: run the program without debugging.
- `Ctrl + Shift + P`: open the Command Palette.
- `Shift + Alt + F`: format the document.
- `` Ctrl + ` ``: toggle the integrated terminal.

If you need to pass parameters when executing a Python program, you can set this up in `launch.json`:
Modify the relevant settings in the `launch.json` file. The following example configuration includes the program path and execution parameters:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Run with Arguments",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/main.py", // program path
      "console": "integratedTerminal", // terminal type
      "args": ["arg1", "arg2", "--option", "value"] // passed arguments
    }
  ]
}
Command-line parameters can be passed in `args`, for example: `arg1` and `arg2` are positional parameters, while `--option` and `value` form a named option. Use `sys.argv` to read the parameters passed from the command line:
import sys
print("All parameters:", sys.argv)
if len(sys.argv) > 2:
    print("First parameter:", sys.argv[1])
    print("Second parameter:", sys.argv[2])
Suppose the program is invoked as:
python main.py arg1 arg2 --option value
Execution result:
All parameters: ['main.py', 'arg1', 'arg2', '--option', 'value']
First parameter: arg1
Second parameter: arg2
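Beyond reading `sys.argv` directly, the standard library's `argparse` module handles the same positional and `--option` arguments more robustly. A minimal sketch (the argument names here are illustrative, mirroring the example above):

```python
import argparse

# Build a parser matching the example invocation:
#   python main.py arg1 arg2 --option value
parser = argparse.ArgumentParser()
parser.add_argument("first")     # first positional parameter
parser.add_argument("second")    # second positional parameter
parser.add_argument("--option")  # optional named parameter

# Parse an explicit list here; in a real script, parse_args() reads sys.argv
args = parser.parse_args(["arg1", "arg2", "--option", "value"])
print("First parameter:", args.first)
print("Second parameter:", args.second)
print("--option value:", args.option)
```

Unlike raw `sys.argv`, `argparse` also generates a `--help` message and reports missing or unknown arguments automatically.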
1. Install the Python extension.
2. Start your Python project in VS Code.
3. Press `F5` or click the Debug icon in the activity bar on the left.
1. Click "Add Configuration" in the Debug panel.
2. Select Python; a `launch.json` will be generated automatically.
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal"
    }
  ]
}
1. Click next to a code line number to add a breakpoint.
2. Conditional breakpoints are supported: right-click the breakpoint and select "Edit Condition".
- `F10`: step over a function.
- `F11`: step into a function.
- `Shift + F11`: step out of a function.

1. View the current variable state in the "Variables" area of the Debug panel.
2. You can manually add specific expressions in the "Watch" area.
1. Enter Python commands in the Debug Console to check the program status in real time.
2. You can query variables, call functions, and perform other operations.
Make sure Python is installed and added to your system environment variables, then download and install Visual Studio Code.
Open Visual Studio Code, click the Extensions icon on the left, search for Python, and then install the Python extension provided by Microsoft.
Enter the following command in the terminal to confirm the installation path of Python:
which python
Or (Windows system):
where python
In Visual Studio Code, press `Ctrl + Shift + P`, then enter and select `Python: Select Interpreter`.
Select the correct Python path in the list. If it doesn't appear, enter the full path manually.
Open the terminal and execute `python --version` to confirm that the correct Python interpreter is selected.
If a specific project needs its own Python path, add a `.vscode/settings.json` file in the project root directory with the following content:
{
"python.pythonPath": "Your full path to Python"
}
Replace `Your full path to Python` with the actual path.
When developing in VS Code, whether you use Anaconda or another environment is decided mainly by the interpreter (Interpreter) setting. This keeps each project's package dependencies isolated in its own execution environment.
| Environment type | Use cases | Main advantages |
|---|---|---|
| Anaconda | Data Science, Machine Learning, Deep Learning | It is pre-installed with a large number of scientific computing libraries and has strong management of underlying binary files (such as DLL). |
| Venv | General web development, automation scripts | Lightweight, fast to start, and contains only the necessary packages for execution. |
In addition to the graphical interface, you can also use commands in the built-in terminal of VS Code to manage the environment:
- `conda env list`: list environments.
- `conda activate <environment-name>`: switch to an environment.
- `conda create -n <name> python=3.10`: create a new environment.

If PowerShell on Windows blocks the activation script, you may need to run `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser`.
sudo apt update
sudo apt install python3 python3-pip -y
Run `python3` to enter the Python interactive environment, and install modules with `pip3 install <module-name>`.
sudo apt install wget gpg -y
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > packages.microsoft.gpg
sudo install -o root -g root -m 644 packages.microsoft.gpg /usr/share/keyrings/
sudo sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/packages.microsoft.gpg] \
https://packages.microsoft.com/repos/code stable main" > /etc/apt/sources.list.d/vscode.list'
sudo apt update
sudo apt install code -y
pip --version
pip install <package-name>
For example: `pip install requests`
pip install --upgrade <package-name>
For example: `pip install --upgrade requests`
pip list
pip uninstall <package-name>
For example: `pip uninstall requests`
pip list --outdated
pip install <package-name>==<version>
For example: `pip install requests==2.26.0`
pip install -r requirements.txt
Use a `requirements.txt` file to list all required package names and versions.
pip freeze > requirements.txt
This command exports the packages and versions in the current environment to a requirements file.
pip cache purge
Clean up pip's cache folder to free up disk space.
Solution: Check whether Python has been installed, and confirm that the Python installation directory has been added to the system's PATH environment variable.
Solution: try adding the `--user` flag, for example: `pip install <package-name> --user`
Solution: confirm that the network is working, or try a mirror source, for example: `pip install <package-name> -i https://pypi.tuna.tsinghua.edu.cn/simple`
Use a virtual environment (`venv` or `virtualenv`) to avoid conflicts with globally installed packages, and use `requirements.txt` to manage dependencies.

pip caches downloaded packages in its `http-v2` directory so that future installations of the same package do not need to download it again. Use `pip cache dir` to view the current cache directory, and `pip cache purge` to clear all cached data. You can also delete the `http-v2` folder manually, but pip will then re-download the files the next time it installs the relevant packages. To bypass the cache for a single install, add the `--no-cache-dir` flag, for example: `pip install <package-name> --no-cache-dir`.
The `type()` function returns the type of an object.
x = 10
print(type(x)) # <class 'int'>
y = "hello"
print(type(y)) # <class 'str'>
`isinstance()` checks whether a variable belongs to a given type, and supports checking against multiple types at once.
x = 10
print(isinstance(x, int)) # True
print(isinstance(x, str)) # False
print(isinstance(x, (int, float))) # True
`type()` is strict and returns only the exact type, while `isinstance()` also considers inheritance relationships and is more commonly used in practice.

A Boolean in Python is a basic data type that represents two logical states: True or False. It is the basis for all conditional checks and flow control.

Booleans have exactly two values, `True` and `False`, and their type is `bool`.

a = True
b = False
print(type(a)) # Output: <class 'bool'>
Internally, `bool` is a subclass of the integer type `int`, so Booleans can participate in numeric operations: `True` is numerically equivalent to 1, and `False` to 0.

print(True + 1) # Output: 2 (1 + 1)
print(False * 5) # Output: 0 (0 * 5)
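Because `True` and `False` behave as 1 and 0, summing Boolean values counts how many conditions hold. A small sketch:

```python
scores = [88, 45, 92, 60, 73]
# Each comparison yields True (1) or False (0), so sum() counts the passes
passed = sum(score >= 60 for score in scores)
print(passed)  # Output: 4
```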
Comparison operators compare two values and return a Boolean result.
| Operator | Description | Example | Result |
|---|---|---|---|
| `==` | equal to | `10 == 10` | `True` |
| `!=` | not equal to | `5 != 10` | `True` |
| `>` | greater than | `10 > 5` | `True` |
| `<=` | less than or equal to | `5 <= 5` | `True` |
Logical operators are used to combine or modify Boolean values or Boolean expressions.
- `and`: logical AND; the result is `True` only if both sides are `True`.
- `or`: logical OR; the result is `True` if either side is `True`.
- `not`: logical NOT; inverts the Boolean value.

print(True and False) # Output: False
print(True or False) # Output: True
print(not True) # Output: False
In Python, every object has a truth value. When an object is used in a conditional, Python converts it to `True` or `False`:
- Falsy objects include: `None`, the number zero (`0`, `0.0`), the empty string (`""`), the empty list (`[]`), the empty tuple (`()`), the empty dictionary (`{}`), and the empty set (`set()`).
- Truthy objects include everything else (such as non-zero numbers and any non-empty container).

if []:  # Empty lists are treated as False
    print("This will not be executed")
else:
    print("The list is empty")

if "hello":  # Non-empty strings are considered True
    print("String is not empty")
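You can also inspect any object's truth value directly with the built-in `bool()` function:

```python
print(bool([]))        # Output: False (empty list)
print(bool("hello"))   # Output: True (non-empty string)
print(bool(0.0))       # Output: False (zero)
print(bool({"a": 1}))  # Output: True (non-empty dict)
```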
In Python, an array is a data structure used to store multiple elements of the same type. Although Python itself does not have a built-in array type, you can use `list` or the `array` module to achieve similar functionality.
`list` is Python's built-in data structure; it can store elements of many types and can also be used to simulate arrays.
my_list = [1, 2, 3, 4, 5]
print(my_list[0]) # Output: 1
If you need a true array (all elements must be of the same type), you can use the `array` module.
import array
my_array = array.array('i', [1, 2, 3, 4, 5])
print(my_array[0]) # Output: 1
Here, `'i'` indicates that the elements in the array are integers.
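Because the typecode constrains element types, appending a value of the wrong type raises a `TypeError`, unlike a plain list. A quick check:

```python
import array

my_array = array.array('i', [1, 2, 3])
try:
    my_array.append(3.5)  # floats are rejected by an 'i' (integer) array
except TypeError as e:
    print("Rejected:", e)
```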
Here are some basic operations:
- `my_array.append(6)`: append an element.
- `my_array.remove(3)`: remove an element by value.
- `len(my_array)`: get the length.

For situations that require numerical operations, `numpy` provides more powerful array support.
import numpy as np
my_numpy_array = np.array([1, 2, 3, 4, 5])
print(my_numpy_array[0]) # Output: 1
NumPy arrays support multidimensional data and vectorized operations, making them ideal for processing large amounts of data.
Python provides several ways to implement array functionality: `list` is suitable for general use, the `array` module fits cases that require elements of the same type, and `numpy` is the tool of choice for scientific computing.
For a variable such as `ret_value = []`, Python offers several simple and efficient ways to determine whether it is an empty list. Since an empty list is treated as `False`, the following methods are commonly used.
This is the most popular method: check the Boolean value of the list directly. If the list is empty, the `not` operator makes the condition evaluate to `True`.
ret_value = []
if not ret_value:
    # Execute here when the list is empty
    print("ret_value is an empty list")
else:
    # Execute here when the list is not empty
    print("ret_value is not an empty list")
The `len()` function checks whether the length of the list equals 0.
ret_value = []
if len(ret_value) == 0:
    print("ret_value is an empty list")
else:
    print("ret_value is not an empty list")
Compare the variable directly with an empty list literal `[]` using `== []`.
ret_value = []
if ret_value == []:
    print("ret_value is an empty list")
else:
    print("ret_value is not an empty list")
In Python, `list` is a dynamic data structure that makes it easy to add and remove elements.
New elements can be added using the following methods:
- `append()`: adds an element to the end of the list.
- `insert()`: inserts an element at the specified index.
- `extend()`: appends the elements of another list to an existing list.

# Example of adding elements
my_list = [1, 2, 3]
my_list.append(4) # [1, 2, 3, 4]
my_list.insert(1, 10) # [1, 10, 2, 3, 4]
my_list.extend([5, 6]) # [1, 10, 2, 3, 4, 5, 6]
Elements can be removed using:
- `pop()`: removes an element by index; by default removes the last element.
- `remove()`: removes the first matching element by value.
- `clear()`: removes all elements from the list.

# Example of removing elements
my_list = [1, 2, 3, 4, 5]
my_list.pop() # [1, 2, 3, 4]
my_list.remove(2) # [1, 3, 4]
my_list.clear() # []
For cases where all elements must be of the same type, you can use the `array` module.
The `append()` and `extend()` methods also apply to the `array` module.
import array
my_array = array.array('i', [1, 2, 3])
my_array.append(4) # [1, 2, 3, 4]
my_array.extend([5, 6]) # [1, 2, 3, 4, 5, 6]
The `remove()` and `pop()` methods can also be used with the `array` module.
# Remove element example
my_array = array.array('i', [1, 2, 3, 4])
my_array.remove(2) # [1, 3, 4]
my_array.pop() # [1, 3]
Python provides a variety of methods to grow and shrink arrays dynamically; `list` and the `array` module suit different needs. For more functionality, you can also consider `numpy`.
In Python, `list.count()` is a method dedicated to counting the number of occurrences of a specific element. Unlike `len()`, which returns the length of the list, it requires an argument to compare against.
The syntax is `list.count(value)`; it returns the number of times the value appears in the list as an integer, or 0 if the value does not exist.
fruits = ['apple', 'banana', 'apple', 'orange', 'apple']
# Count the number of occurrences of 'apple'
apple_count = fruits.count('apple')
print(apple_count) # Output: 3
# Count non-existent elements
grape_count = fruits.count('grape')
print(grape_count) # Output: 0
Depending on your needs (whether you are looking for a single element, multiple elements, or all statistics), there are different best practices:
| Need | Recommended method | Sample code |
|---|---|---|
| Count a single specific element | `list.count()` | `arr.count(10)` |
| Get the total length of the list | `len()` | `len(arr)` |
| Count the frequency of all elements | `collections.Counter` | `Counter(arr)` |
| Conditional count (e.g. greater than 5) | generator expression | `sum(1 for x in arr if x > 5)` |
If you need to know how many times every element appears in the list at once, using `Counter` is much more efficient than calling `count()` repeatedly (O(n) vs O(n^2)).
from collections import Counter
data = [1, 2, 2, 3, 3, 3, 4]
counts = Counter(data)
print(counts) # Output: Counter({3: 3, 2: 2, 1: 1, 4: 1})
print(counts[3]) # Get the number of times 3: 3
print(counts.most_common(1)) # Get the element with the most occurrences
- `if x in list` is much faster than `if list.count(x) > 0`, because `in` stops as soon as it finds the first match.
- `count()` performs only a shallow comparison: if the list contains another list, it will not count inside the sublist.
- Calling `list.count()` repeatedly causes a sharp drop in performance, so prefer `Counter` in that case.

If you want to count elements that satisfy some condition (for example, strings longer than 3 characters), you can combine a generator expression with `sum()`:
words = ['hi', 'hello', 'python', 'a', 'code']
# Count the number of words with length > 3
big_word_count = sum(1 for w in words if len(w) > 3)
print(big_word_count) # Output: 3
Suppose you have two lists whose elements are `[key, value]` pairs:

list1 = [
["a", 1],
["b", 2],
["c", 3]
]
list2 = [
["c", 30],
["a", 10],
["d", 40]
]
# Convert the lists to dicts
dict1 = {k: v for k, v in list1}
dict2 = {k: v for k, v in list2}
# Find all keys
all_keys = sorted(set(dict1.keys()) | set(dict2.keys()))
# Merge results
merged = []
for k in all_keys:
    v1 = dict1.get(k)
    v2 = dict2.get(k)
    merged.append([k, v1, v2])

for row in merged:
    print(row)
['a', 1, 10]
['b', 2, None]
['c', 3, 30]
['d', None, 40]
Keys missing from one list are filled with `None`, and each merged row has the form `[key, value1, value2]`.

In Python, if you have two lists (e.g. `selected_fields_list` and `aSqlValuesStr_list`) and need to iterate over them simultaneously, the most common and Pythonic way is the built-in `zip()` function.
The `zip()` function packs multiple iterable objects (Iterable) into a sequence of tuples; each loop iteration takes the element at the corresponding position from each list.
import re
# Assume these are the two lists you get after re.split()
selected_fields = "name,age,city"
aSqlValuesStr = "Alice,25,Taipei"
selected_fields_list = re.split(r',', selected_fields)
aSqlValuesStr_list = re.split(r',', aSqlValuesStr)
# Execute for loop
for field, value in zip(selected_fields_list, aSqlValuesStr_list):
    print(f"Field: {field}, value: {value}")
Field: name, value: Alice
Field: age, value: 25
Field: city, value: Taipei
When you use the `zip()` function, keep the following points in mind:
- `zip()` stops iterating at the end of the shortest list. If `selected_fields_list` has 3 elements and `aSqlValuesStr_list` has 4, the loop executes only 3 times.
- To pad to the longest list instead, use `itertools.zip_longest`.

# Example: lists of different lengths
list1 = [1, 2, 3]
list2 = ['a', 'b']
for item1, item2 in zip(list1, list2):
    print(item1, item2)
# The output will only be:
# 1 a
# 2 b
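As noted above, `itertools.zip_longest` pads the shorter list with a fill value instead of stopping early:

```python
from itertools import zip_longest

list1 = [1, 2, 3]
list2 = ['a', 'b']

# Missing positions are filled with fillvalue instead of being dropped
pairs = list(zip_longest(list1, list2, fillvalue=None))
print(pairs)  # Output: [(1, 'a'), (2, 'b'), (3, None)]
```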
To print a two-dimensional list with aligned columns, compute each column's maximum width and pad with `rjust()`:

data = [
[1, 23, 456],
[7890, 12, 3],
[45, 678, 9]
]
# Calculate the maximum width of each column
col_widths = [max(len(str(row[i])) for row in data) for i in range(len(data[0]))]
# Format output
for row in data:
    print(" ".join(str(val).rjust(col_widths[i]) for i, val in enumerate(row)))
   1  23 456
7890  12   3
  45 678   9
The third-party package `tabulate` can output aligned tables directly.
from tabulate import tabulate
data = [
[1, 23, 456],
[7890, 12, 3],
[45, 678, 9]
]
print(tabulate(data, tablefmt="grid"))
+------+-----+-----+
|    1 |  23 | 456 |
+------+-----+-----+
| 7890 |  12 |   3 |
+------+-----+-----+
|   45 | 678 |   9 |
+------+-----+-----+
Use `rjust()` for alignment, or `tabulate` to quickly generate neat tables.

The following example demonstrates how to print a two-dimensional list where `float` elements are always displayed with two decimal places and other elements are output as-is, while keeping the fields aligned.
data = [
[1, 23.456, 456],
[7890.1, 12, 3.5],
[45, 678.9, 9]
]
# Format each element string
formatted_data = []
for row in data:
    new_row = []
    for val in row:
        if isinstance(val, float):
            s = f"{val:.2f}"  # float -> two decimal places
        else:
            s = str(val)
        new_row.append(s)
    formatted_data.append(new_row)
# Calculate the maximum width of each column
col_widths = [max(len(row[i]) for row in formatted_data) for i in range(len(formatted_data[0]))]
# Format output
for row in formatted_data:
    print(" ".join(val.rjust(col_widths[i]) for i, val in enumerate(row)))
      1  23.46  456
7890.10     12 3.50
     45 678.90    9
`isinstance(val, float)` detects floating-point numbers, and `f"{val:.2f}"` formats a floating-point number to two decimal places.

The data structure provided here is a list of dictionaries, where each dictionary represents one time-series record (such as K-line data).
data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'},
# ...
]
If you want to collect all values from all records (`1759028400000`, `'109398.3'`, `'109364.8'`, etc.) into a single list, you can use a nested list comprehension.
Use a two-level list comprehension to iterate through each dictionary in the list, call the `.values()` method to get its values, and collect everything into one flat list.
data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'}
]
all_values = [
    value
    for record in data  # iterate over each dictionary record in the outer list
    for value in record.values()  # iterate over all values in the dictionary record
]
print(all_values)
This results in a single list containing all values:
[1759028400000, '109398.3', '109364.8', '109489.2', '109364.8', '518.7594', 1759024800000, '109305.6', '109398.3', '109496.4', '109296.0', '757.0290']
If you only need to extract a specific field (e.g. all `open` prices), a single-level list comprehension suffices:
data = [
{'time': 1759028400000, 'open': '109398.3', ...},
{'time': 1759024800000, 'open': '109305.6', ...}
]
open_prices = [record['open'] for record in data]
print(open_prices)
Output result:
['109398.3', '109305.6']
List comprehensions are a concise, one-line syntax for creating new lists from existing lists.
from datetime import datetime
data_list = [
{'videoid': 'b5HxsaM_E2Y', 'publishedat': datetime(2025, 12, 7, 3, 0, 53), 'rankno': 7, 'viewcount': 913053, 'query': 'baseball'},
{'videoid': 'FEbMCBxsoWI', 'publishedat': datetime(2025, 11, 25, 5, 28, 6), 'rankno': 13, 'viewcount': 754598, 'query': 'baseball'},
{'videoid': 'nOJUI0PGB68', 'publishedat': datetime(2025, 12, 7, 3, 7, 46), 'rankno': 14, 'viewcount': 748349, 'query': 'baseball'},
{'videoid': 'uMHXIudw_w0', 'publishedat': datetime(2025, 12, 2, 10, 1, 38), 'rankno': 8, 'viewcount': 687949, 'query': 'baseball'}
]
target_key = 'videoid'
# Use list comprehension: for each dictionary item in data_list, take out item[target_key]
video_ids = [item[target_key] for item in data_list]
print(f"Extracted key: {target_key}")
print("All videoid values:")
print(video_ids)
['b5HxsaM_E2Y', 'FEbMCBxsoWI', 'nOJUI0PGB68', 'uMHXIudw_w0']
If some dictionaries in the list may be missing the target key (e.g. some dictionaries don't have the `videoid` key), using `item[target_key]` directly will throw a `KeyError` error. You can use the dictionary's `.get()` method or a conditional to handle it safely.
data_with_missing_key = [
{'videoid': 'A1', 'query': 'football'},
{'query': 'basketball'}, # Missing 'videoid' key
{'videoid': 'C3', 'query': 'Volleyball'}
]
target_key = 'videoid'
# Option A: Use .get() to set the default value to None (or any other value)
safe_video_ids_A = [item.get(target_key) for item in data_with_missing_key]
# Output: ['A1', None, 'C3']
# Option B: Extract only values with that key
safe_video_ids_B = [item[target_key] for item in data_with_missing_key if target_key in item]
# Output: ['A1', 'C3']
print(f"\nSafe extraction results (Plan B): {safe_video_ids_B}")
The result you obtain from `onesymbollist.values()` is a Python dictionary view object, `dict_values`. Although it looks like a list, it is a dynamic view, not a standard list that can be indexed or modified.
dict_values([1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523'])
To get the contents without the `dict_values()` wrapper, the simplest and most common way is to cast the view to a list with the built-in `list()` function.
Passing the `dict_values` view to the `list()` function immediately copies all elements from the view object into a new standard list.
# Assume this is the dict_values view object you obtained
dict_values_object = your_dictionary.values() # Assume onesymbollist is a dictionary
# Step: Use list() function to convert
result_list = list(dict_values_object)
print(result_list)
[1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523']
If your goal is an immutable sequence, you can convert the view into a tuple with the `tuple()` function.
result_tuple = tuple(dict_values_object)
print(result_tuple)
(1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523')
Note that if you just want to process the values one by one in a loop, you don't need to convert to a list at all: `dict_values` is itself an iterable object (Iterable).
# Assume this is the dict_values view object you obtained
dict_values_object = your_dictionary.values()
print("Output elements one by one:")
for value in dict_values_object:
    print(value)
In summary, `list(your_dict.values())` is the most common way to get a clean, manipulable list of values.
To extract all values from a `dict_values` view object and combine them into a single string separated by commas and spaces (for example: `1763510400000, '0.00015218', ...`), combine the following steps:
1. Get the values from the `dict_values` view object.
2. Convert every value to a string.
3. Use the `", ".join()` method to concatenate the strings.

Assume your original dictionary is named `onesymbol_dict`; non-string values (such as the timestamp) must be converted to strings before `.join()` can concatenate them.
onesymbol_dict = {
'time': 1763510400000,
'open': '0.00015218',
'close': '0.00015336',
'high': '0.00015415',
'low': '0.00015067',
'volume': '1634523'
}
# Step 1 & 2: Get values and convert all values to strings
# Use list comprehensions to ensure all elements are strings
values_as_strings = [str(v) for v in onesymbol_dict.values()]
# Step 3: Use ', '.join() to join these strings
values_string = ", ".join(values_as_strings)
print(values_string)
This produces a single string containing only the values, without any outer brackets or `dict_values` label:
1763510400000, 0.00015218, 0.00015336, 0.00015415, 0.00015067, 1634523
Note that in the expected output, values other than the timestamp are wrapped in single quotes: `1763510400000, '0.00015218', ...`.
If your goal is to have string values retain single quotes in the final output, you will need to add quotes manually during concatenation. This is usually required when preparing strings for SQL statements or specific formats.
# Extra step: Handle quotes manually, assuming non-integer values require quotes
quoted_values = []
for v in onesymbol_dict.values():
    # Assume the small numeric strings need quotes
    if isinstance(v, (str, float)) or (isinstance(v, int) and v < 1000000000000):
        quoted_values.append(f"'{v}'")
    else:
        quoted_values.append(str(v))
final_quoted_string = ", ".join(quoted_values)
print(final_quoted_string)
This gives you the expected format (assuming `time` needs no quotes while the other numeric strings do):
1763510400000, '0.00015218', '0.00015336', '0.00015415', '0.00015067', '1634523'
The `isdigit()` method can be used to check whether a string contains only numeric characters.
# Example
string = "12345"
if string.isdigit():
    print("is a number")
else:
    print("Not a number")
Note: `isdigit()` cannot handle decimal points or negative signs.
If you need to check a string with a decimal point, you can remove the decimal point before calling `isdigit()`.
# Example
string = "123.45"
if string.replace(".", "").isdigit():
    print("is a number")
else:
    print("Not a number")
This method does not work with negative numbers.
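One possible workaround (a sketch, not the only approach; the helper name `looks_numeric` is illustrative) is to strip a single leading minus sign and at most one decimal point before applying `isdigit()`:

```python
def looks_numeric(s):
    # Allow one optional leading minus sign
    if s.startswith("-"):
        s = s[1:]
    # Remove at most one decimal point, then check the rest
    return s.replace(".", "", 1).isdigit()

print(looks_numeric("-123.45"))  # Output: True
print(looks_numeric("abc"))      # Output: False
print(looks_numeric("1.2.3"))    # Output: False (two decimal points)
```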
The most common approach is to try to convert the string to a float or integer and catch an exception if the conversion fails.
# Example
string = "-123.45"
try:
    float(string)  # use int(string) instead to check for integers
    print("is a number")
except ValueError:
    print("Not a number")
Regular expressions can match numbers exactly, including integers, decimals, and negative numbers.
# Example
import re
string = "-123.45"
pattern = r"^-?\d+(\.\d+)?$"
if re.match(pattern, string):
    print("is a number")
else:
    print("Not a number")
For simple cases, use `isdigit()`. For more complex cases (such as decimals or negative numbers), it is recommended to use `try-except` or a regular expression.
f-string (formatted string literal) is a powerful and efficient string formatting method introduced in Python 3.6. It provides a concise and highly readable way to embed the values of variables and expressions into strings.
The core feature of an f-string is the `f` or `F` prefix at the beginning of the string, with braces `{}` containing the expressions to be evaluated and displayed.
You can put any variable name directly inside curly braces.
name = "Alice"
age = 30
message = f"Hello, my name is {name} and I am {age} years old."
# Output: Hello, my name is Alice and I am 30 years old.
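To include literal braces in an f-string, double them as `{{` and `}}`:

```python
name = "Alice"
# Doubled braces produce literal { }, single braces interpolate
text = f"{{name}} renders literally, {name} is interpolated."
print(text)
# Output: {name} renders literally, Alice is interpolated.
```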
The power of f-string is that you can put any valid Python expression inside the curly braces and it will be evaluated at runtime.
price = 19.99
tax_rate = 0.05
total = price * (1 + tax_rate)
# Perform calculations within f-string
result = f"The total price including tax is: {price * (1 + tax_rate):.2f} yuan."
# Output: The total price including tax is: 20.99 yuan.
# Call function
def get_status():
    return "OK"
status_msg = f"System status: {get_status()}"
# Output: System status: OK
f-strings support the same Format Specification Mini-Language as `.format()`, using a colon `:` to separate the expression from the format specifier.
| Format code | Use | Example | Output |
|---|---|---|---|
| `:.2f` | floating-point number, two decimal places | `f"{3.14159:.2f}"` | `3.14` |
| `:<10` | align left, width 10 | `f"{'Name':<10}"` | `Name` |
| `:>10` | align right, width 10 | `f"{'Value':>10}"` | `Value` |
| `:^10` | center aligned, width 10 | `f"{'Hi':^10}"` | `Hi` |
| `:,` | thousands separator | `f"{1000000:,}"` | `1,000,000` |
Starting with Python 3.8, f-strings include a convenient debugging feature: add an equals sign `=` after the variable to automatically display both the variable name and its value.
user_id = 42
is_active = True
debug_output = f"User ID is {user_id=}, Status: {is_active=}"
# Output: User ID is user_id=42, Status: is_active=True
A few caveats:
- If the f-string is delimited with single quotes `'...'`, use double quotes `"..."` inside it, and vice versa.
- To output literal braces, double them: `{{` and `}}`.
- Before Python 3.12, a backslash `\` is not allowed inside the expression part, so an f-string like `f"{'\n'}"` causes a syntax error.

Inside the braces `{}` of an f-string, you can call the `int()` function directly to convert a variable back to an integer. This is the clearest and most direct method.
a = 12  # assume the original integer
float_a = a * 2.0  # the result is 24.0 (a float)
# Convert floating point number back to integer directly in f-string
result_str = f"abc def {int(float_a)}"
print(f"Original value (float_a): {float_a}")
print(f"Formatted result: {result_str}")
You can use the integer format code `:d` or the float code `:.0f` from the formatting mini-language to control the output.
`:d` asks Python to display the value as an integer. Note that applying `:d` directly to a float raises a `ValueError`, so convert with `int()` first:
float_a = 24.0
result_d = f"abc def {int(float_a):d}"
# Output: abc def 24
`:.0f` formats the value as a floating-point number with 0 digits after the decimal point; the result is rounded to the nearest integer.
float_a = 24.0
result_0f = f"abc def {float_a:.0f}"
# Output: abc def 24
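Note that the two options can differ for non-integral values: `int()` truncates toward zero, while `:.0f` rounds to the nearest integer — a quick comparison:

```python
x = 24.7
print(f"int(): {int(x)}")   # int() truncates toward zero -> 24
print(f":.0f : {x:.0f}")    # :.0f rounds to nearest -> 25
```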
If your purpose is to ensure that the result is an absolute integer without a decimal point, it is recommended to use option 1:
result_str = f"abc def {int(a)}"
In Python, `re.split()` is a powerful tool in the `re` (regular expression) module, used to split a string by delimiters (patterns) defined as a regular expression and return the result as a list.
re.split(pattern, string, maxsplit=0, flags=0)
- `pattern`: the regular expression used to split the string.
- `string`: the target string to split.
- `maxsplit` (optional): the maximum number of splits; the default 0 means no limit.
- `flags` (optional): flags that control regex behavior (e.g. `re.IGNORECASE`).

Use a regular expression to define multiple or complex delimiters:
import re
text = "apple,banana;orange-grape"
# Use comma, semicolon or hyphen as delimiter
result = re.split(r'[;,-]', text)
print(result)
# Output: ['apple', 'banana', 'orange', 'grape']
Unlike the built-in string method `split()`, `re.split()` easily handles and collapses multiple consecutive delimiters (e.g. runs of spaces).
text = "Word1   Word2  Word3"
# Use \s+ to match one or more whitespace characters as delimiters
result = re.split(r'\s+', text)
print(result)
# Output: ['Word1', 'Word2', 'Word3']
If `maxsplit` is set, splitting is performed at most that many times, and the remainder is kept as the last element of the list.
text = "one:two:three:four"
# Only cut once
result = re.split(r':', text, maxsplit=1)
print(result)
# Output: ['one', 'two:three:four']
If the delimiter pattern is placed inside parentheses `()` (a capturing group), the delimiter itself is also included among the elements of the resulting list.
text = "2025-01-15"
# Put hyphens in parentheses so they are preserved
result = re.split(r'(-)', text)
print(result)
# Output: ['2025', '-', '01', '-', '15']
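Conversely, wrapping the delimiter in a non-capturing group `(?:...)` splits without keeping it:

```python
import re

text = "2025-01-15"
# A non-capturing group does not add the delimiter to the result
print(re.split(r'(?:-)', text))  # ['2025', '01', '15']
```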
`startswith()` is a method of Python string (`str`) objects, used to determine whether the string begins with the specified substring.
It returns `True` on a match, `False` otherwise.
str.startswith(prefix[, start[, end]])
- `prefix`: the prefix to compare; a single string or a tuple of strings.
- `start` (optional): index in the original string at which to begin the check.
- `end` (optional): index at which to stop (exclusive).

Return value: `True` if the string starts with the given prefix, otherwise `False`.
text = "Python Programming"
#Basic usage
print(text.startswith("Py")) # True
print(text.startswith("Java")) # False
#Specify range
print(text.startswith("thon", 2)) # True (starting at index 2 is "thon")
# Multiple prefixes (tuple): True if any of them matches
print(text.startswith(("Py", "Java", "C"))) # True, "Py" matches
# Not case sensitive (can be converted to lowercase first)
print(text.lower().startswith("py")) # True
url = "https://openai.com"
if url.startswith("https://"):
print("Secure connection URL")
files = ["data1.csv", "data2.csv", "readme.txt"]
csv_files = [f for f in files if f.startswith("data")]
print(csv_files) # ['data1.csv', 'data2.csv']
To determine whether a string ends with certain text, use the `endswith()` method; its syntax is the same as `startswith()`.
filename = "report.pdf"
if filename.endswith(".pdf"):
print("This is a PDF file")
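Like `startswith()`, `endswith()` accepts a tuple of suffixes, which is handy for filtering by several file extensions at once:

```python
files = ["a.jpg", "b.png", "c.txt"]
# endswith with a tuple matches any of the listed suffixes
images = [f for f in files if f.endswith((".jpg", ".png"))]
print(images)  # ['a.jpg', 'b.png']
```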
In Python, strings are immutable objects. To remove the last character, string slicing is usually used to create a new string.
text = "Hello!"
#Method 1: Use slicing
new_text = text[:-1]
print(new_text) #Output: Hello
#Method 2: Use rstrip() to remove specific ending characters
text2 = "Hello!!!"
new_text2 = text2.rstrip("!")
print(new_text2) #Output: Hello
#Method 3: Make sure it is not empty and then delete the last character
if text:
text = text[:-1]
print(text)
Output:
Hello
Hello
Hello
`text[:-1]` takes the substring from the beginning up to (but not including) the last character. `rstrip(chars)` removes the specified trailing characters (not limited to one). To remove the leading character instead, use:
text = text[1:]
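Python 3.9 added `str.removeprefix()` and `str.removesuffix()`, which remove an exact substring from the start or end (unlike the `strip()` family, which removes any characters from a set):

```python
name = "report_final.txt"
# removesuffix/removeprefix delete an exact substring, or return
# the string unchanged if the substring is absent
print(name.removesuffix(".txt"))     # report_final
print(name.removeprefix("report_"))  # final.txt
```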
In short: use `s[:-1]` for positional removal and `rstrip()` for character-based removal. Next, given a string `str1`, suppose we want the part that comes before `strA` or `strB`, whichever occurs first. For example:
str1 = "Hello World, this is a test. Stop here or continue."
strA = "Stop"
strB = "continue"
The goal is to obtain `"Hello World, this is a test. "`.
With `re.split()` you can split on multiple keywords and take the first part:
import re
def get_substring_before(text, strA, strB):
result = re.split(f"{re.escape(strA)}|{re.escape(strB)}", text, maxsplit=1)[0]
return result
str1 = "Hello World, this is a test. Stop here or continue."
strA = "Stop"
strB = "continue"
print(get_substring_before(str1, strA, strB)) # "Hello World, this is a test. "
`re.search()` can be used to match `strA` or `strB` and take the content before the match:
import re
def get_substring_before(text, strA, strB):
match = re.search(f"{re.escape(strA)}|{re.escape(strB)}", text)
return text[:match.start()] if match else text
str1 = "Hello World, this is a test. Stop here or continue."
print(get_substring_before(str1, "Stop", "continue")) # "Hello World, this is a test. "
With `find()` you can manually locate the earlier of `strA` or `strB` and slice off the part before it:
def get_substring_before(text, strA, strB):
indexA = text.find(strA)
indexB = text.find(strB)
indices = [i for i in [indexA, indexB] if i != -1]
first_index = min(indices, default=len(text))
return text[:first_index]
str1 = "Hello World, this is a test. Stop here or continue."
print(get_substring_before(str1, "Stop", "continue")) # "Hello World, this is a test. "
Summary:
- `re.split()` is the simplest, suited to a one-off split.
- `re.search()` offers more flexible regex matching.
- `find()` is the most efficient for simple string processing.

To join values of mixed types into one string, convert each to `str` first:
values = ["str1", "str2", 123, "str3", 456]
s = ", ".join(str(v) for v in values)
print(s) # Output: str1, str2, 123, str3, 456
def join_values(*args, sep=", "):
return sep.join(str(v) for v in args)
print(join_values("str1", "str2", 88, "str3"))
# Output: str1, str2, 88, str3
data = {
"str1": "hello",
"str2": "world",
"int1": 123,
"str3": "ok"
}
# Sort by key and then join
s = ", ".join(str(data[k]) for k in sorted(data.keys()))
print(s) # hello, world, 123, ok
s = ", ".join(f"{k}={v}" for k, v in data.items())
print(s)
# Output: str1=hello, str2=world, int1=123, str3=ok
rows = [
{"str1": "A", "int1": 10},
{"str1": "B", "int1": 20},
]
for row in rows:
print(", ".join(str(v) for v in row.values()))
# A, 10
# B, 20
def join_values(values, sep=", ", quoted=False):
if quoted:
# Use curly single quotes ‘ ’
return sep.join(f"‘{v}’" for v in values)
else:
return sep.join(str(v) for v in values)
values = ["str1", "str2", 88, "str3"]
print(join_values(values, quoted=False))
# Output: str1, str2, 88, str3
print(join_values(values, quoted=True))
# Output: ‘str1’, ‘str2’, ‘88’, ‘str3’
def join_args(*args, sep=", ", quoted=False):
if quoted:
return sep.join(f"‘{v}’" for v in args)
return sep.join(str(v) for v in args)
print(join_args("str1", "str2", 88, "str3", quoted=True))
# ‘str1’, ‘str2’, ‘88’, ‘str3’
data = {"str1": "hello", "str2": "world", "int1": 123}
print(join_values(data.values(), quoted=True))
# ‘hello’, ‘world’, ‘123’
def join_key_value(d, sep=", ", quoted=False):
if quoted:
return sep.join(f"{k}=‘{v}’" for k, v in d.items())
return sep.join(f"{k}={v}" for k, v in d.items())
print(join_key_value(data, quoted=True))
# str1=‘hello’, str2=‘world’, int1=‘123’
Python's `re.match` is a function in the regular expression module that matches from the beginning of a string.
On success it returns a `Match` object; otherwise it returns `None`.
re.match(pattern, string, flags=0)
Parameters:
- `pattern`: the regular expression to match.
- `string`: the string to check.
- `flags`: optional, modifies matching behavior (e.g. `re.IGNORECASE`).

Methods of the returned `Match` object:
- `group(n)`: returns the n-th captured subgroup; `n=0` returns the entire match.
- `start()`: the starting position of the match.
- `end()`: the ending position of the match.
- `span()`: the `(start, end)` range of the match.

import re
#Define a string
text = "123 Hello World!"
# Use re.match to match numbers from the beginning
match = re.match(r"(\d+)\s+(.*)", text)
if match:
print(f"Entire matching result: {match.group(0)}") # 123 Hello World!
print(f"Number part: {match.group(1)}") # 123
print(f"Text part: {match.group(2)}") # Hello World!
else:
print("match failed")
The entire matching result: 123 Hello World!
Numeric part: 123
Text part: Hello World!
`re.match` only matches at the beginning of the string; if the beginning does not match, it returns `None`. To search anywhere in the string, use `re.search` or `re.findall`. A Regular Expression (Regex for short) is a syntax for describing string-matching rules, commonly used to search, replace, or validate strings.
In Python's `re` module, the `pattern` is the core part that defines these rules.
- `\d`: matches any digit (0-9).
- `\D`: matches any non-digit.
- `\w`: matches any word character (letter, digit, underscore).
- `\W`: matches any non-word character.
- `\s`: matches any whitespace character (space, tab, etc.).
- `\S`: matches any non-whitespace character.
- `.`: matches any single character except a newline (`\n`).
- `*`: matches the preceding expression 0 or more times.
- `+`: matches the preceding expression 1 or more times.
- `?`: matches the preceding expression 0 or 1 times.
- `{n}`: matches the preceding expression exactly n times.
- `{n,}`: matches the preceding expression at least n times.
- `{n,m}`: matches the preceding expression between n and m times.
- `^`: matches the beginning of the string.
- `$`: matches the end of the string.
- `\b`: matches a word boundary (the beginning or end of a word).
- `\B`: matches a non-word boundary.
- `(...)`: groups and captures the content in parentheses.
- `|`: logical "or", e.g. `a|b` matches `a` or `b`.
- `(?:...)`: groups without capturing.
- `(?=...)`: lookahead; the match must be followed by the given content.
- `(?!...)`: negative lookahead; the match must not be followed by the given content.
- `(?<=...)`: lookbehind; the match must be preceded by the given content.
- `(?<!...)`: negative lookbehind; the match must not be preceded by the given content.

import re
# Example 1: Match content starting with a number
pattern = r"^\d+"
text = "123abc"
match = re.match(pattern, text)
if match:
print(f"Matching result: {match.group()}") # Output: 123
#Example 2: Match text after numbers
pattern = r"(\d+)\s+(.*)"
text = "123 Hello World"
match = re.match(pattern, text)
if match:
print(f"Number part: {match.group(1)}") # Output: 123
print(f"Text part: {match.group(2)}") # Output: Hello World
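A related function, `re.fullmatch()`, succeeds only when the pattern matches the entire string, which is useful for validation:

```python
import re

# fullmatch requires the whole string to match the pattern
print(bool(re.fullmatch(r"\d+", "12345")))  # True
print(bool(re.fullmatch(r"\d+", "123a5")))  # False
```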
`re.search()` searches the string for the first match of a regular expression and returns a `Match` object, or `None` if there is no match.
import re
text = "Hello 2024!"
match = re.search(r"\d+", text)
if match:
print("Number found:", match.group()) # 2024
When `re.search()` finds a match, it returns a `Match` object whose information can be accessed through the following methods:
- `group()`: the matched string.
- `start()`: the starting index of the match.
- `end()`: the ending index of the match.
- `span()`: the `(start, end)` index range.

import re
text = "Python 3.10 is great!"
match = re.search(r"\d+\.\d+", text)
if match:
print("Match content:", match.group()) # 3.10
print("Start index:", match.start()) # 7
print("End index:", match.end()) # 11
print("range:", match.span()) # (7, 11)
Use parentheses `()` to create groups, and `group(n)` to extract the corresponding matched content.
import re
text = "John Doe, Age: 25"
match = re.search(r"(\w+) (\w+), Age: (\d+)", text)
if match:
print("First name:", match.group(1)) # John
print("Last name:", match.group(2)) # Doe
print("Age:", match.group(3)) # 25
re.search()Only the first matching result is returned, whereasre.findall()All matching results will be returned.
import re
text = "Price: $10, Discount: $2, Tax: $1"
match = re.search(r"\$\d+", text)
print("re.search:", match.group()) # $10
matches = re.findall(r"\$\d+", text)
print("re.findall:", matches) # ['$10', '$2', '$1']
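When you need both the matched text and its position for every match, `re.finditer()` yields a `Match` object per hit:

```python
import re

text = "Price: $10, Discount: $2, Tax: $1"
# finditer yields Match objects, giving position as well as text
for m in re.finditer(r"\$\d+", text):
    print(m.group(), m.span())
# Output:
# $10 (7, 10)
# $2 (22, 24)
# $1 (31, 33)
```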
`re.search()` is suitable for finding the first match, and the returned `Match` object provides detailed information; for all matches, use `re.findall()`.
In a regular expression, `(...)` captures its match and stores it in `group(n)`, while the non-capturing group `(?:...)` is used only for structure: it does not consume a group number and can be slightly faster.
Using plain `()` to organize matching conditions affects `group(n)` numbering; using `(?:...)` keeps the group indices unchanged.
import re
text = "2024-03-12"
pattern = r"(\d{4})-(?:\d{2})-(\d{2})"
match = re.search(pattern, text)
print(match.group(1)) # 2024
print(match.group(2)) # 12
Using `(?:...|...)` lets the `|` operator affect what is matched without affecting group access.
import re
text = "bar123"
pattern = r"(?:foo|bar|baz)\d+"
match = re.search(pattern, text)
print(match.group()) # bar123
When parsing Chrome command-line parameters, `(?:...)` keeps the alternation from shifting the group numbers:
import re
cmdline = '--user-data-dir="C:\\Users\\moirg\\AppData\\Local\\Google\\Chrome\\User Data"'
match = re.search(r'--user-data-dir=(?:"([^"]+)"|(\S+))', cmdline)
user_data_dir = match.group(1) or match.group(2)
print(user_data_dir) # C:\Users\moirg\AppData\Local\Google\Chrome\User Data
`(?:...)` avoids shifting group indices and can modestly improve performance; it is well suited to `|` alternation and condition grouping, keeping code efficient and clear.
import datetime
now = datetime.datetime.now()
print(now)
dt = datetime.datetime(2025, 7, 2, 14, 30, 0)
print(dt)
now = datetime.datetime.now()
formatted = now.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
dt_str = "2025-07-02 14:30:00"
parsed = datetime.datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S")
print(parsed)
now = datetime.datetime.now()
delta = datetime.timedelta(days=7)
next_week = now + delta
print(next_week)
today = datetime.date.today()
print(today)
dt1 = datetime.datetime(2025, 7, 1)
dt2 = datetime.datetime(2025, 7, 2)
print(dt1 < dt2)
dt1 = datetime.datetime(2025, 7, 1, 12, 0, 0)
dt2 = datetime.datetime(2025, 7, 2, 14, 30, 0)
diff = dt2 - dt1
print(diff)
print(diff.total_seconds())
today = datetime.date.today()
print(today.weekday()) # 0 = Monday, 6 = Sunday
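The datetime values above can also round-trip through ISO-8601 strings with `isoformat()` and `fromisoformat()`:

```python
import datetime

dt = datetime.datetime(2025, 7, 2, 14, 30, 0)
s = dt.isoformat()
print(s)  # 2025-07-02T14:30:00
# fromisoformat parses the string back into an equal datetime
print(datetime.datetime.fromisoformat(s) == dt)  # True
```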
When one datetime object has no time zone (naive) and the other has a time zone (aware) and is subtracted, it will produce:
TypeError: can't subtract offset-naive and offset-aware datetimes
from datetime import datetime
def is_aware(dt):
return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None
def is_naive(dt):
return not is_aware(dt)
dt1 = datetime.now() # naive
dt2 = datetime.now().astimezone() # aware
print(is_naive(dt1), is_aware(dt1))
print(is_naive(dt2), is_aware(dt2))
Suitable for situations across time zones or when precise time calculation is required.
from datetime import datetime, timezone
# Assume sql_dt is a naive datetime read from the database, stored as UTC
sql_dt = sql_dt.replace(tzinfo=timezone.utc)  # make the SQL datetime aware
now_dt = datetime.now(timezone.utc)  # use an aware current time
diff = now_dt - sql_dt
print(diff.total_seconds())
Alternatively, strip the time zone and compare as naive values (suitable when both values are in the same local time):
sql_dt = sql_dt.replace(tzinfo=None)
now_dt = datetime.now()
diff = now_dt - sql_dt
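For conversions between real time zones (not just UTC), the standard `zoneinfo` module (Python 3.9+) provides IANA time zones; `Asia/Taipei` below is just an example key:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

utc_now = datetime.now(timezone.utc)
# astimezone converts an aware datetime to another time zone
local = utc_now.astimezone(ZoneInfo("Asia/Taipei"))
print(local.tzinfo)      # Asia/Taipei
print(local == utc_now)  # True: same instant, different zone
```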
In Python, you can use the built-in `statistics` module's `stdev()` or `pstdev()` to compute the sample or population standard deviation.
import statistics
ratios = [2.3, 2.8, 3.1, 2.5, 3.0]
# average
average = sum(ratios) / len(ratios)
#Sample standard deviation (n-1)
std_sample = statistics.stdev(ratios)
# Population standard deviation (n)
std_population = statistics.pstdev(ratios)
print(f"Average: {average:.2f}")
print(f"Sample standard deviation: {std_sample:.3f}")
print(f"Population standard deviation: {std_population:.3f}")
Average: 2.74
Sample standard deviation: 0.336
Population standard deviation: 0.301
If you prefer not to rely on the `statistics` module, you can compute it yourself from the formula:
ratios = [2.3, 2.8, 3.1, 2.5, 3.0]
average = sum(ratios) / len(ratios)
# Population standard deviation
variance = sum((x - average) ** 2 for x in ratios) / len(ratios)
std_dev = variance ** 0.5
print(f"Standard deviation: {std_dev:.3f}")
Standard deviation: 0.301
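For the sample standard deviation, the manual formula only changes the denominator to n-1:

```python
ratios = [2.3, 2.8, 3.1, 2.5, 3.0]
average = sum(ratios) / len(ratios)
# Sample variance divides by (n - 1) instead of n
variance_s = sum((x - average) ** 2 for x in ratios) / (len(ratios) - 1)
print(f"Sample standard deviation: {variance_s ** 0.5:.3f}")  # 0.336
```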
Summary:
- `statistics.stdev()`: sample standard deviation (divides by n-1).
- `statistics.pstdev()`: population standard deviation (divides by n).
- Mean: `avg = sum(data) / len(data)`.

# ANSI color code example
print("\033[31m red text\033[0m")
print("\033[32m green text\033[0m")
print("\033[33m yellow text\033[0m")
print("\033[34m blue text\033[0m")
print("\033[35m purple text\033[0m")
print("\033[36m cyan text\033[0m")
print("\033[37m white text\033[0m")
print("\033[1;31m bold red text\033[0m")
print("\033[42m green background text\033[0m")
# Format: \033[style; foreground color; background color m
# Style: 0=Default, 1=Bold, 4=Underline
# Foreground color: 30~37
# Background color: 40~47
print("\033[1;33;44m bold yellow text + blue background\033[0m")
def color_text(text, color_code):
return f"\033[{color_code}m{text}\033[0m"
print(color_text("Warning!", "1;31")) # Bold red
print(color_text("Success!", "1;32")) # Bold green
import sys
if sys.stdout.isatty():
print("Terminal may support ANSI colors")
else:
print("It may be a file or an output environment that does not support color")
import colorama
colorama.init()
print("\033[32mThis text should be green\033[0m")
Note that a try/except around `print` cannot detect ANSI support, because printing escape codes never raises; checking whether stdout is a terminal is a more reliable heuristic:
def supports_ansi():
    import sys
    # Printing escape codes never raises, so try/except cannot detect
    # support; a TTY check is a more reliable heuristic.
    return sys.stdout.isatty()
print("ANSI is supported" if supports_ansi() else "ANSI is not supported")
On Windows, the `colorama` package can be used to enable ANSI handling. In Python, the `input()` function does not support Tab auto-completion by itself.
To add it, combine `input()` with the `readline` module, which lets users press Tab for completion (similar to Bash or IPython).
import readline
# Define a list of strings that can be completed
WORDS = ['apple', 'banana', 'cherry', 'grape', 'orange', 'watermelon']
def completer(text, state):
"""Autocomplete function: compare available strings based on input prefixes"""
options = [w for w in WORDS if w.startswith(text)]
if state < len(options):
return options[state]
else:
return None
# Enable completion function
readline.set_completer(completer)
readline.parse_and_bind('tab: complete')
# User input (supports Tab)
user_input = input("Enter the name of the fruit (you can press Tab to complete): ")
print(f"What you entered is: {user_input}")
- `readline.set_completer()` registers the auto-completion logic.
- `readline.parse_and_bind('tab: complete')` makes the Tab key trigger completion.
- In `completer(text, state)`, `text` is the text typed so far and `state` is the index of the match to return (0, 1, 2, ...); return `None` when there are no more matches.

You can also build the completion list from the current context or dynamic content:
import readline
def dynamic_completer(text, state):
current_words = ['cat', 'car', 'dog', 'duck', 'deer']
options = [w for w in current_words if w.startswith(text)]
if state < len(options):
return options[state]
return None
readline.set_completer(dynamic_completer)
readline.parse_and_bind('tab: complete')
command = input("Enter the animal name: ")
print("You enter:", command)
The `readline` module is available by default on Linux / macOS. On Windows, install a replacement: pip install pyreadline3
In short, `readline` lets `input()` support Tab completion; on Windows use the `pyreadline3` package. The most common loop runs a fixed number of times or over a sequence:
# from 0 to 4
for i in range(5):
print(i) #0,1,2,3,4
# Specify the start, end, and step size
for i in range(2, 10, 2):
print(i) #2,4,6,8
Directly iterate lists, strings, dictionaries, etc.
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
print(fruit)
for ch in "hello":
print(ch)
#Iterate over dictionary
person = {"name": "Tom", "age": 25}
for key, value in person.items():
print(key, value)
The loop body executes repeatedly as long as the condition is True.
count = 0
while count < 5:
print(count)
count += 1
Control the loop process.
for i in range(10):
if i == 3:
continue # skip this time
if i == 7:
break # End early
print(i)
Loops can be nested by placing one loop inside another.
for i in range(3):
for j in range(2):
print(f"i={i}, j={j}")
A `for` or `while` loop can have an `else` clause, which runs only when the loop completes normally (without `break`).
for i in range(5):
print(i)
else:
print("Loop ends normally")
List comprehensions are a concise way to loop and build a list in a single line.
squares = [x**2 for x in range(5)]
print(squares) # [0,1,4,9,16]
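Comprehensions also accept an `if` filter, and the same syntax builds dictionaries:

```python
# Keep only the squares of even numbers
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Dictionary comprehension: number -> its square
squares_map = {x: x**2 for x in range(3)}
print(squares_map)  # {0: 0, 1: 1, 2: 4}
```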
Summary:
- `for ... in range()`: a fixed number of iterations or a numeric sequence.
- `for ... in iterable`: processing lists, strings, and dictionaries.
- `while`: loops with an indefinite count, ended by a condition.
- `break / continue / else`: controlling the loop flow.

Python uses the `try...except` statement to intercept and handle errors during program execution and prevent the program from crashing.
try:
# Code that may cause exceptions
result = 10 / 0
except ZeroDivisionError:
# Code executed when a specific exception occurs
print("The divisor cannot be zero")
except Exception as e:
#Catch all other types of exceptions
print(f"An error occurred: {e}")
else:
# Execute if no exception occurs in the try block
print("operating normally")
finally:
# Will be executed regardless of whether an exception occurs
print("Clean resources or close files")
You can use a tuple in an `except` clause to handle multiple error types at once.
try:
# Perform operations
pass
except (ValueError, TypeError):
print("The entered data type or value is incorrect")
Use the raise keyword to manually trigger exceptions based on logical requirements.
age = -1
if age < 0:
raise ValueError("Age cannot be negative")
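You can also define your own exception type by subclassing a built-in exception; `NegativeAgeError` below is a made-up name for illustration:

```python
class NegativeAgeError(ValueError):
    """Raised when an age value is negative (illustrative custom type)."""

def set_age(age):
    if age < 0:
        raise NegativeAgeError(f"Age cannot be negative: {age}")
    return age

try:
    set_age(-1)
except NegativeAgeError as e:
    print(f"Caught: {e}")  # Caught: Age cannot be negative: -1
```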
class MyClass:
def __init__(self, value):
self.value = value
def display(self):
print(f"Value: {self.value}")
obj = MyClass(10)
obj.display() # Output: Value: 10
Static methods are defined using the `@staticmethod` decorator and are bound to neither the class nor its instances; they cannot access class or instance attributes. They suit utility-style functions:
class MyClass:
@staticmethod
def add(a, b):
return a + b
result = MyClass.add(5, 3)
print(result) # Output: 8
Class methods are defined using the `@classmethod` decorator, the first parameter is the class itself (usually named `cls`), and class attributes can be accessed:
class MyClass:
count = 0
@classmethod
def increment_count(cls):
cls.count += 1
MyClass.increment_count()
print(MyClass.count) # Output: 1
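A common use of class methods is as an alternative constructor; the `Point` class here is a made-up example:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_tuple(cls, pair):
        # Build an instance from an (x, y) tuple via cls
        return cls(pair[0], pair[1])

p = Point.from_tuple((3, 4))
print(p.x, p.y)  # 3 4
```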
Python supports class inheritance. Subclasses can inherit the attributes and methods of the parent class and override the parent class methods:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
def greet(self):
print("Hello from Child!")
obj = Child()
obj.greet() # Output: Hello from Child!
Class attributes belong to the class as a whole and are shared by all objects; instance attributes belong to each individual object:
class MyClass:
class_attr = "I am a class attribute"
def __init__(self, value):
self.instance_attr = value
obj1 = MyClass(10)
obj2 = MyClass(20)
print(MyClass.class_attr) # Output: I am a class attribute
print(obj1.instance_attr) # Output: 10
print(obj2.instance_attr) # Output: 20
All classes in Python inherit from `object` by default, which is a built-in base class that provides some basic methods, such as `__str__` and `__eq__`:
class MyClass(object):
def __init__(self, value):
self.value = value
def __str__(self):
return f"MyClass with value {self.value}"
obj = MyClass(5)
print(obj) # Output: MyClass with value 5
In Python, class inheritance allows subclasses (Derived Class) to inherit the properties and methods of the parent class (Base Class) to achieve code reuse. For example:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
pass
c = Child()
c.greet() # Output: Hello from Parent!
Subclasses can override (Override) the methods of the parent class and rewrite its functions:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
def greet(self):
print("Hello from Child!")
c = Child()
c.greet() # Output: Hello from Child!
In the subclass, you can call the method of the parent class through `super()` and extend the behavior of the parent class:
class Parent:
def greet(self):
print("Hello from Parent!")
class Child(Parent):
def greet(self):
super().greet()
print("Hello from Child!")
c = Child()
c.greet()
# Output:
# Hello from Parent!
# Hello from Child!
Python supports multiple inheritance, and subclasses can inherit multiple parent classes at the same time:
class Parent1:
def greet(self):
print("Hello from Parent1!")
class Parent2:
def greet(self):
print("Hello from Parent2!")
class Child(Parent1, Parent2):
pass
c = Child()
c.greet() # Output: Hello from Parent1! (according to inheritance order)
Multiple inheritance uses MRO (Method Resolution Order) to determine the resolution order of methods. This can be checked using the `__mro__` attribute:
print(Child.__mro__)
# Output: (<class '__main__.Child'>, <class '__main__.Parent1'>, <class '__main__.Parent2'>, <class 'object'>)
Use the `abc` module to define an Abstract Base Class to force subclasses to implement specific methods:
from abc import ABC, abstractmethod
class AbstractParent(ABC):
@abstractmethod
def greet(self):
pass
class Child(AbstractParent):
def greet(self):
print("Hello from Child!")
c = Child()
c.greet() # Output: Hello from Child!
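Trying to instantiate the abstract base class itself raises a `TypeError`, which is how `abc` enforces the interface:

```python
from abc import ABC, abstractmethod

class AbstractParent(ABC):
    @abstractmethod
    def greet(self):
        pass

try:
    AbstractParent()  # abstract method greet is not implemented
except TypeError as e:
    print("TypeError:", e)
```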
class ClassB:
def greet(self):
print("Hello from ClassB!")
# Dynamically create a temporary class inherited from ClassB
TempClass = type('TempClass', (ClassB,), {
'greet': lambda self: (print("Hello from TempClass!"), super(TempClass, self).greet())[0]
})
#Create instance and test
temp = TempClass()
temp.greet()
`type('TempClass', (ClassB,), {...})` dynamically creates the class; its `greet` method first prints the new message, then calls the parent class's `greet` via `super()`.
Hello from TempClass!
Hello from ClassB!
Suppose there are two classes, `AClass` and `BClass`, that have methods with the same names and parameters.
We want to switch easily between them without modifying the main program logic.
class AClass:
def process(self, data):
print(f"AClass processing: {data}")
def result(self):
return "result from AClass"
class BClass:
def process(self, data):
print(f"BClass processing: {data}")
def result(self):
return "result from BClass"
# Control which class to use via a setting
USE_A = True
# Dynamically select the class
SelectedClass = AClass if USE_A else BClass
# Create an instance and use it
obj = SelectedClass()
obj.process("Test data")
print(obj.result())
Output with USE_A = True:
AClass processing: Test data
result from AClass
With USE_A = False it would print:
BClass processing: Test data
result from BClass
def get_class(name):
mapping = {
"A": AClass,
"B": BClass
}
return mapping.get(name, AClass) # Default is AClass
# Dynamic selection
cls = get_class("B")
obj = cls()
obj.process("Test data")
In Python, you do not necessarily need an abstract class as in C++ or Java.
Python uses "duck typing": as long as objects have the same method names and behaviors, they are considered compatible.
class AClass:
def process(self, data):
print(f"AClass processing: {data}")
class BClass:
def process(self, data):
print(f"BClass processing: {data}")
def run(obj):
obj.process("data") # No need to specify the type, as long as there is this method
run(AClass())
run(BClass())
AClass processing: data
BClass processing: data
Although Python does not enforce types, if you want a clearly defined interface for team development or large projects, you can use the `abc` module.
from abc import ABC, abstractmethod
class BaseClass(ABC):
@abstractmethod
def process(self, data):
pass
class AClass(BaseClass):
def process(self, data):
print(f"AClass processing: {data}")
class BClass(BaseClass):
def process(self, data):
print(f"BClass processing: {data}")
# BaseClass() will report an error because the abstract method is not implemented
In Python, a module is a file containing program code, usually ending with the `.py` extension. Modules can define functions, classes, and variables, may contain executable code, and make reuse and program structuring easier.
Python imports modules with the `import` keyword, for example:
import math
print(math.sqrt(16)) # Output 4.0
Developers can create their own modules. For example, create a `mymodule.py`:
# mymodule.py
def greet(name):
return f"Hello, {name}!"
Use in another program file:
import mymodule
print(mymodule.greet("Alice"))
Common standard-library modules:
- `os`: operating-system operations.
- `sys`: Python interpreter information.
- `datetime`: dates and times.
- `math`: mathematical functions.
- `random`: random-number generation.

In Python, a package is a way of organizing modules: a directory containing multiple modules, whose hierarchy makes programs easier to manage and maintain.
A package directory contains an `__init__.py` file (optional since Python 3.3, but adding it helps mark the package clearly). Create a package called `mypackage`:
mypackage/
│── __init__.py
│── module1.py
└── module2.py
`module1.py` example:
def add(a, b):
return a + b
`module2.py` example:
def multiply(a, b):
return a * b
import mypackage.module1
import mypackage.module2
print(mypackage.module1.add(2, 3)) # Output 5
print(mypackage.module2.multiply(2, 3)) # Output 6
from mypackage.module1 import add
from mypackage.module2 import multiply
print(add(10, 5)) # Output 15
print(multiply(10, 5)) # Output 50
In Python, to get the current module's package name, use the special variable `__package__`. It is similar in concept to `__module__`, which gives the module's name.
mypackage/
│── __init__.py
└── submodule.py
`submodule.py` content:
print("__name__:", __name__)
print("__package__:", __package__)
print("__module__:", __module__)
If used in other programs asimport mypackage.submoduleImport, the output is roughly as follows:
__name__: mypackage.submodule
__package__: mypackage
__module__: __main__
- `__name__`: the complete module name.
- `__package__`: the package the current module belongs to; an empty string for a top-level module.
- `__module__`: used on class and function definitions to indicate which module defined them; together with `__package__` it helps confirm package paths.

There are two main ways to check a Python package's version: from the terminal (command line) or from Python code. Either can confirm whether the environment meets project requirements.
This is the fastest method and does not require entering the Python interactive environment.
- `pip show <package name>` (e.g. `pip show pandas`).
- `pip list`.
- `pip list | findstr "<package name>"` (Windows) or `pip list | grep "<package name>"` (Linux/Mac).
- `conda list <package name>`.

If you need to determine the version while the program runs, use either of the following two methods:
import pandas
print(pandas.__version__)
from importlib.metadata import version
print(version('pandas'))
| method | Instructions/Code | Applicable situations |
|---|---|---|
| pip command | `pip show` | View details such as install path, author, and dependencies. |
| pip list | `pip list` | Quick overview of all packages and versions in the current environment. |
| internal attribute | `.__version__` | Logical checks while the script is running. |
| metadata | `version()` | Standardized check without importing the whole package. |
Sometimes the problem isn't with the package, but with the Python interpreter itself. In the terminal, run `python --version`; in code, use:

import sys
print(sys.version)

A few common pitfalls:

- If the version `pip list` shows differs from the one executed in VS Code, confirm that the correct interpreter is selected.
- Checking an uninstalled package through `importlib` raises `PackageNotFoundError`.
- `pip show` is not case-sensitive, but some code logic may require the exact name.

In Python, given a class object you can read `cls.__module__` to find the name of the module that defines the class, then use `sys.modules` to get the module object, and finally read its `__package__` attribute to obtain the package name.
import sys
# Assume a category is defined in mypackage.submodule
class MyClass:
pass
# Get the name of the module to which the class belongs
module_name = MyClass.__module__
print("Module name:", module_name)
# Get module object
mod = sys.modules[module_name]
# Get the package name from the module object
print("Package name:", mod.__package__)
The module itself is an object, so you can read its `__package__` attribute directly.
import math
import mypackage.submodule as sub
# math is a standard function library module, there is no package, so __package__ is an empty string
print("math.__package__:", math.__package__)
# A module from a custom package
print("sub.__package__:", sub.__package__)
Module name: mypackage.submodule
Package name: mypackage
math.__package__:
sub.__package__: mypackage
Summary of the two approaches:

- Class → `__module__` → `sys.modules[...]` → `__package__`
- Module object → read `__package__` directly

When a Python file is executed directly, its module name is `__main__`. This means that `cls.__module__` alone cannot recover the original file name and path.
Assume your project structure is:
/project
|-- test_runner.py <-- The file you execute directly (will be treated as __main__)
|-- test/
|-- db_test.py <-- File that defines DbCmdAgent
If you execute the following code in `test_runner.py`:
from test.db_test import DbCmdAgent
agent_obj = DbCmdAgent(...) # Instantiation
# At this time cls.__module__ is still 'test.db_test' (correct module name)
However, if you define categories in `test_runner.py`:
# Contents of test_runner.py
class DbCmdAgent:
pass
agent_obj = DbCmdAgent()
#At this time cls.__module__ == '__main__' (wrong module name)
This indicates that the class you are getting information for is defined in a file that is executed as `__main__`.
The `inspect` module can obtain the file path directly. Regardless of whether the class is defined in `__main__`, you can bypass the `__module__` attribute and use the `inspect` module to get the source file that defines the class. This is a more reliable and general method.
import inspect
import os
# --- Simulated situation: a class defined in __main__ (the currently executing script) ---
class DbCmdAgent:
"""This category is defined in the currently executing main script"""
def __init__(self, data):
self.data = data
agent_obj = DbCmdAgent("some_data")
def get_class_location_robust(obj):
"""
Use inspect.getfile to bypass the __module__ == '__main__' problem.
"""
cls = type(obj)
# 1. Use inspect.getfile() to get the file path that defines the category
try:
file_path = inspect.getfile(cls)
# 2. Get the file name and directory
file_name = os.path.basename(file_path)
directory = os.path.dirname(file_path)
py_name = os.path.splitext(file_name)[0]
# 3. If __module__ is __main__, replace it with the file name to provide more context
module_name = cls.__module__
if module_name == '__main__':
module_name = py_name # Use db_test or test_runner as context
    except TypeError:
# Handle built-in types
file_path = "N/A (Built-in or C extension)"
file_name = "N/A"
directory = "N/A"
py_name = "N/A"
module_name = cls.__module__
return {
"module_name_or_main": module_name,
"py_name_no_ext": py_name,
"directory": directory,
"file_path": file_path,
}
# Execute and view the results
location_info = get_class_location_robust(agent_obj)
print("--- Category definition file information ---")
print(f"File name (.py name): {location_info['py_name_no_ext']}")
print(f"Directory path (Package): {location_info['directory']}")
print(f"Full file path: {location_info['file_path']}")
- `inspect.getfile(type(obj))`: the most reliable way to obtain the original definition file path, completely independent of whether the module is considered `__main__`.
- `__module__` only reflects the class's module namespace in the current program, not necessarily its actual name in the project structure.

The following method uses `importlib.import_module` to try to import the specified module. If that raises `ModuleNotFoundError`, it tries to import the name as a submodule of each currently loaded package.
import importlib
import sys
def safe_import(module_name):
try:
# Try to import directly
return importlib.import_module(module_name)
except ModuleNotFoundError:
# If failed, try to import the submodule from a known package
for pkg in list(sys.modules.keys()):
if pkg and not pkg.startswith("_"):
try:
return importlib.import_module(f"{pkg}.{module_name}")
except ModuleNotFoundError:
continue
raise # If still not found, throw an exception
import numpy
mod1 = safe_import("random") # It can succeed directly because it is a standard library
mod2 = safe_import("linalg") # Will try numpy.linalg
print(mod2.__name__) # Output numpy.linalg
- `sys.modules` holds every module that has been loaded.
- On failure, the `package.module_name` form is retried for each loaded package.

In Python, `sys.path` is a list of all directory paths the interpreter searches, in order, when importing a module. When you execute `import some_module`, Python checks each directory in `sys.path` until it finds a match (e.g. `some_module.py` or `some_module/__init__.py`).
`sys.path` usually consists of the following three parts, searched in this order:

1. **The script's directory**: the first entry (`sys.path[0]`) is usually the directory containing the currently executing script (e.g. `main.py`). In interactive mode it is `''`, which stands for the current working directory (CWD).
2. **`PYTHONPATH`**: directories listed in the `PYTHONPATH` environment variable are added to `sys.path`. `PYTHONPATH` is typically used to permanently add commonly used project or library directories in non-standard locations, so they need not be configured in every project.
3. **Installation defaults**: the standard library (`os`, `sys`, `json`, `socket`) installation directory, plus the directory for `pip`-installed third-party packages (e.g. `fastapi`, `numpy`, `pandas`). In a virtual environment this is usually the environment's `lib/site-packages` directory.

Besides `PYTHONPATH`, several other environment variables influence the interpreter's behavior and path lookup. The main ones, and how they relate to `sys.path`, are:
| Environment variable | Description | Relationship to `sys.path` |
|---|---|---|
| `PYTHONPATH` | Defines additional directories to add to the module search path. | Directly affects the composition of `sys.path`. |
| `PYTHONHOME` | Sets an alternative Python installation directory, mainly for embedded setups. | Indirectly affects the location of the standard library and `site-packages`. |
| `PATH` | Used by the operating system to find executables (e.g. `python.exe`). | Does not directly affect `sys.path`, but determines which Python interpreter runs. |
| `VIRTUAL_ENV` | When a virtual environment is active, points to its root directory. | Indirectly affects `sys.path` by ensuring `site-packages` comes from the virtual environment rather than the system. |
Because `sys.path` is an ordinary Python list, you can modify it dynamically while the program runs, but such modifications only last for the current interpreter session:

import sys
import os
# Add the parent directory to the search path (commonly used for testing or internal project reference)
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
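To see what the interpreter will actually search, you can simply print `sys.path`; a quick sketch:

```python
import sys

# Print the import search path in order; the first match wins during import
for i, entry in enumerate(sys.path):
    print(i, entry or "<current working directory>")

print("total entries:", len(sys.path))
```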
Module Shadowing (or Name Shadowing) is a common error or programming problem in Python. It occurs when you accidentally create a code file or variable name that conflicts with the name of a built-in module or third-party library that you are trying to import or use.
When importing, Python searches the directories in `sys.path` in order. If your project contains a file or directory with the same name as a standard module (`socket`, `json`, `random`) or a third-party library (for example, a file you created named `socket.py`), Python loads your local file first.

The result is that your local file "shadows" the standard module that should have been loaded, so your code cannot access the standard module's functionality: for example, `socket.AF_UNSPEC` will fail because the local file you loaded does not have these attributes.
| Standard module name | Local file causing the shadow | Result |
|---|---|---|
| `socket` | `socket.py` | Cannot use the standard `socket` module's network constants (such as `AF_INET`, `AI_PASSIVE`). |
| `json` | `json.py` | Cannot use the standard `json` module's `loads` or `dumps` functions. |
| `test` (project name) | `test.py` | In a unit-testing environment, may conflict with the test framework's internal logic. |
To fix shadowing, rename the conflicting file (e.g. `socket.py` → `network_handler.py`). You can also run `import sys; print(sys.path)` to inspect Python's search path and see whether your local files are loaded in preference to the standard library directory.

To get the name of the `.py` file that the currently executing `__main__` module belongs to, you can use `__main__.__file__` or `sys.argv[0]`. But in interactive mode, Jupyter, or `-c` execution these may not exist, so they require safe handling.
import os
import sys
import __main__
def get_main_py_path():
"""Returns the absolute path of .py to which main belongs. If it cannot be found, it returns None"""
# Case 1: Normal execution .py
main_file = getattr(__main__, "__file__", None)
if main_file:
return os.path.abspath(main_file)
# Case 2: Judgment from sys.argv[0]
if len(sys.argv) > 0:
argv0 = sys.argv[0]
if argv0 not in ("", "-c", "<stdin>"):
candidate = os.path.abspath(argv0)
if os.path.exists(candidate):
return candidate
# Case 3: Interactive mode, Jupyter, embed etc.
return None
# Example
path = get_main_py_path()
if path:
print("main path:", path)
print("main filename:", os.path.basename(path))
else:
print("main .py file not found (maybe running interactively or not executed from a file)")
path = get_main_py_path()
filename = os.path.basename(path) if path else None
print(filename)
- `__main__.__file__` is the most reliable source, but it does not always exist.
- `sys.argv[0]` is a fallback, but you must confirm the value is not `-c` or `<stdin>`.
To get the parameter names and their corresponding values of a function in Python, you can use the `inspect` module, which provides introspection utilities. Specifically, `inspect.signature()` can help you retrieve the names of the parameters, and you can pass the current frame's local variables to get their values.
Here is an example that demonstrates how to get the function name, parameter names, and their values:
```python
import inspect
# Sample function
def my_function(a, b, c=5):
# Get the current frame
frame = inspect.currentframe()
# Get the function name
func_name = frame.f_code.co_name
print(f"Function name: {func_name}")
# Get the parameter names and their values
args, _, _, values = inspect.getargvalues(frame)
# Print parameter names and values
for arg in args:
print(f"Parameter name: {arg}, Value: {values[arg]}")
# Call the function
my_function(1, 2)
```
### Output:
```
Function name: my_function
Parameter name: a, Value: 1
Parameter name: b, Value: 2
Parameter name: c, Value: 5
```
### Explanation:
1. **`inspect.currentframe()`**: Retrieves the current execution frame.
2. **`frame.f_code.co_name`**: Extracts the name of the current function.
3. **`inspect.getargvalues(frame)`**: Gets the argument names and their corresponding values from the frame. This function returns a tuple containing:
- `args`: List of argument names.
- `_`: Placeholder for unused information.
- `values`: Dictionary containing argument names as keys and their values.
This allows you to print both the names of the function's parameters and their values at runtime.
In Python, you can use `inspect.signature()` to get a function's parameter information, and then read each parameter's type annotation (type hint) from the `Parameter.annotation` attribute.
import inspect
def my_function(a: int, b: str, c: float = 3.14) -> bool:
return str(a) == b
sig = inspect.signature(my_function)
for name, param in sig.parameters.items():
print(f"Parameter name: {name}")
print(f"Default value: {param.default}")
print(f"Type annotation: {param.annotation}")
print()
Parameter name: a
Default value: <class 'inspect._empty'>
Type annotation: <class 'int'>

Parameter name: b
Default value: <class 'inspect._empty'>
Type annotation: <class 'str'>

Parameter name: c
Default value: 3.14
Type annotation: <class 'float'>
- `param.annotation` returns the type annotation (such as `int` or `str`); parameters without one return `inspect._empty`.
- `get_type_hints()` returns the complete type-hint dictionary and automatically resolves forward references (types written as strings).
from typing import get_type_hints
hints = get_type_hints(my_function)
print(hints)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': <class 'float'>, 'return': <class 'bool'>}
- `inspect.signature()` gives the parameter order and names.
- `param.annotation` lets you check each parameter's type annotation.
- `get_type_hints()` is the easiest way to get all type hints (including the return value) at once.

In Python, you can use an object's `__class__` attribute or the `type()` function to get the class it belongs to, and from that, the class name.
class Animal:
pass
class Dog(Animal):
pass
obj = Dog()
# Method 1: Use __class__.__name__
print(obj.__class__.__name__) # Output: Dog
# Method 2: Use type()
print(type(obj).__name__) # Output: Dog
# Method 3: Get the full module and class name
print(obj.__class__) # Output: <class '__main__.Dog'>
print(obj.__class__.__module__) # Output: __main__
Dog
Dog
<class '__main__.Dog'>
__main__
- `obj.__class__`: the class object of the instance.
- `obj.__class__.__name__`: the class name as a string.
- `type(obj)` is equivalent to `obj.__class__`.
- `__module__`: gives the name of the module where the class is defined.

To get the full "module + class name" at once, you can write:
cls = type(obj)
full_name = f"{cls.__module__}.{cls.__name__}"
print(full_name)
__main__.Dog
- `type(obj).__name__` is the simplest way to get the class name.
- For the fully qualified name, use `f"{obj.__class__.__module__}.{obj.__class__.__name__}"`.

The following sample code determines which class in an inheritance chain an attribute belongs to:
import inspect
class BaseClass:
base_attr = "I am an attribute from BaseClass"
class SubClass(BaseClass):
sub_attr = "I am an attribute from SubClass"
# Define a function that finds the class an attribute belongs to
def find_attribute_owner(cls, attr_name):
    for base in inspect.getmro(cls):  # Walk the MRO (method resolution order)
        if attr_name in base.__dict__:
            return base
    return None

# Test
sub_obj = SubClass()
for name in ("sub_attr", "base_attr"):
    owner = find_attribute_owner(sub_obj.__class__, name)
    print(f"Attribute '{name}' belongs to class: {owner.__name__}")

Each class's `__dict__` stores only the attributes defined directly on that class, which is why the MRO walk is needed (iterating `SubClass.__dict__` alone would never find `base_attr`). For the classes in the example, the output is:

Attribute 'sub_attr' belongs to class: SubClass
Attribute 'base_attr' belongs to class: BaseClass
class MyClass:
def fun1(self):
'''
Fun1 comment
'''
pass
obj = MyClass()
print(obj.fun1.__doc__)
import inspect
class MyClass:
def fun1(self):
'''
Fun1 comment
'''
pass
print(inspect.getdoc(MyClass.fun1))
In Python, you can use the built-in `warnings` module to alert users at runtime (rather than at compile time) that a function has been deprecated and to suggest a replacement.
import warnings
def old_function(x, y):
warnings.warn(
"Function old_function() is deprecated, please use new_function(x, y) instead.",
category=DeprecationWarning,
stacklevel=2
)
return x + y
def new_function(x, y):
return x + y
result = old_function(3, 4)
print(result)
DeprecationWarning: Function old_function() is deprecated, please use new_function(x, y) instead.
result = old_function(3, 4)
7
- `warnings.warn()` shows the warning at runtime without interrupting the program.
- `category=DeprecationWarning` marks it as a deprecation warning.
- `stacklevel=2` makes the warning point at the caller's location rather than inside the function.
- To turn the warning into an error, run `python -W error::DeprecationWarning your_script.py`.
Decorators can be used to allow multiple old functions to share the same deprecation prompt logic:
import warnings
from functools import wraps
def deprecated(new_func_name):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
warnings.warn(
f"Function {func.__name__}() is deprecated, please use {new_func_name}() instead.",
category=DeprecationWarning,
stacklevel=2
)
return func(*args, **kwargs)
return wrapper
return decorator
@deprecated("new_function")
def old_function(x, y):
return x + y
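Deprecation warnings are filtered out by default in many contexts, which makes them easy to miss in tests. A small sketch, with a hypothetical `old_function`, showing how to capture the warning programmatically with `warnings.catch_warnings`:

```python
import warnings

def old_function(x, y):
    # Hypothetical deprecated function, mirroring the pattern above
    warnings.warn(
        "old_function() is deprecated, use new_function() instead.",
        category=DeprecationWarning,
        stacklevel=2,
    )
    return x + y

# Capture warnings programmatically (useful in tests) instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure DeprecationWarning is not filtered out
    result = old_function(3, 4)

print(result)                       # 7
print(caught[0].category.__name__)  # DeprecationWarning
```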
In Python, both `@staticmethod` and `@classmethod` define methods that can be called without instantiating the class, but their purpose and behavior differ.

A `@staticmethod` receives no implicit first argument (neither `self` nor `cls`). It behaves like a normal function that simply lives in the class's namespace.

class MyClass:
@staticmethod
def static_method(x, y):
return x + y
# Static methods can be called without creating an instance
result = MyClass.static_method(5, 10) # Result: 15
Key point: a `@staticmethod` can access neither the class (`cls`) nor the instance (`self`).
A `@classmethod` receives the class itself (`cls`) as its first parameter, which lets it access and modify class-level state.

class MyClass:
class_variable = 0
def __init__(self, value):
self.value = value
MyClass.class_variable += 1
@classmethod
def get_class_variable(cls):
return cls.class_variable
# Create instances
obj1 = MyClass(10)
obj2 = MyClass(20)
# Call the class method
print(MyClass.get_class_variable()) # Result: 2
Key point: a `@classmethod` can access class-level state through `cls`.
| Feature | @staticmethod | @classmethod |
|---|---|---|
| First parameter | No implicit first argument | `cls` (the class itself) |
| Instance access | No | No |
| Class access | No | Yes |
| Usage | Utility functions related to the class that need neither instance nor class | Manipulating class-level data, or providing alternative constructors |
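The "alternative constructor" use in the table is the classic role of `@classmethod`; a minimal sketch with hypothetical `Point`/`Point3D` classes:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, pair):
        # Using cls (not Point) means subclasses get this constructor for free
        return cls(pair[0], pair[1])

class Point3D(Point):
    pass  # hypothetical subclass reusing the alternative constructor

p = Point.from_tuple((3, 4))
q = Point3D.from_tuple((1, 2))
print(p.x, p.y)          # 3 4
print(type(q).__name__)  # Point3D
```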
Python itself does not provide a "default static method" or a "default class method", that is, the function of automatically executing a method the first time any static or class method is called. But we can achieve similar behavior through lazy loading techniques.
You can define a static variable in the class to track the initialized state, and then execute the initialization logic when the static or class method is called for the first time.
class MyClass:
initialized = False # Static variable, tracking whether it has been initialized
@staticmethod
def init_once():
if not MyClass.initialized:
print("Initialization logic execution...")
MyClass.initialized = True
@classmethod
def class_method(cls):
cls.init_once()
print("Call class method")
@staticmethod
def static_method():
MyClass.init_once()
print("Call static method")
# The first call to a class method triggers initialization
MyClass.class_method() # Output: "Initialization logic execution..." then "Call class method"
# The second call performs no initialization
MyClass.class_method() # Output: Call class method
# The static method also skips initialization, since it has already run
MyClass.static_method() # Output: Call static method
- `initialized`: tracks whether the class has been initialized; it starts as `False`.
- `init_once()`: runs the initialization logic and sets `initialized` to `True` to prevent repeated initialization.

Although Python has no built-in "default static method" or "default class method", combining a static variable with lazy loading lets you run initialization logic automatically the first time a static or class method is called, and guarantees it runs only once.
In Python, a thread is a mechanism for concurrency: it allows a program to perform multiple tasks within a single process. This is very useful for I/O-intensive operations (such as network communication or file reading and writing), preventing the program from blocking while it waits for external operations to complete.
In the standard CPython interpreter there is a Global Interpreter Lock (GIL). The GIL ensures that only one thread executes Python bytecode at any given time. This means:

- CPU-bound tasks gain no real parallelism from threads (use the `multiprocessing` module instead).
- I/O-bound tasks still benefit, because the GIL is released while a thread waits on I/O.

Python handles threads with the standard library `threading` module. There are two main ways to create a thread:
This is the simplest and most common usage.
import threading
import time
def task(name, delay):
"""Task function to be executed by the thread"""
print(f"Thread {name}: Starting...")
time.sleep(delay) # Simulate time-consuming I/O operations
print(f"Thread {name}: Task completed.")
#Create execution thread
thread1 = threading.Thread(target=task, args=("T1", 2))
thread2 = threading.Thread(target=task, args=("T2", 4))
# Start the thread
thread1.start()
thread2.start()
# Wait for all threads to complete (block the main thread until they finish)
thread1.join()
thread2.join()
print("All execution threads have completed. The main program exits.")
Subclassing the `threading.Thread` class suits more complex scenarios, encapsulating the thread's logic in a class.
import threading
import time
class MyThread(threading.Thread):
def __init__(self, name, delay):
super().__init__()
self.name = name
self.delay = delay
def run(self):
"""
When the thread starts, the run() method is automatically called.
Here you define the tasks to be performed by the thread.
"""
print(f"Thread {self.name}: Starting...")
time.sleep(self.delay)
print(f"Thread {self.name}: Task completed.")
#Create and start the thread
thread3 = MyThread("T3", 3)
thread3.start()
thread3.join()
print("Custom execution thread has been completed.")
When multiple threads access and modify shared data, race conditions may occur. You need to use a synchronization mechanism to protect your data:
- `Lock`: the most basic synchronization primitive. A thread calls `lock.acquire()` before touching a shared resource and `lock.release()` afterwards.
- `RLock` (reentrant lock): allows the same thread to acquire the lock multiple times.
- `Semaphore`: limits how many threads may access a resource at the same time.

import threading
import time
# Shared resources
counter = 0
# Create lock
lock = threading.Lock()
def increment_counter():
global counter
# Obtain the lock to ensure that only one thread can execute this block at the same time
lock.acquire()
try:
# Critical section: race-prone without the lock
current_value = counter
time.sleep(0.001) # Simulate switching
counter = current_value + 1
finally:
# Release lock
lock.release()
threads = []
for i in range(100):
t = threading.Thread(target=increment_counter)
threads.append(t)
t.start()
for t in threads:
t.join()
print(f"Final counter value: {counter}") # If there is no lock, this value may not be 100
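`Semaphore`, mentioned above, deserves a demonstration too; a small sketch (with illustrative names) that caps concurrency at two threads and records the observed peak:

```python
import threading
import time

# A Semaphore allowing at most 2 threads into the block at once (illustrative)
sem = threading.Semaphore(2)
active = 0
peak = 0
state_lock = threading.Lock()

def limited_task():
    global active, peak
    with sem:  # blocks while 2 threads are already inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulate work while holding the semaphore
        with state_lock:
            active -= 1

threads = [threading.Thread(target=limited_task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)  # never exceeds 2
```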
Python's `threading` module can create and manage threads, but due to operating-system constraints and design philosophy, Python provides no safe, direct way to forcibly stop (kill) a thread from the outside. A forced stop could leak resources or corrupt data.

Therefore, stopping a thread must rely on a **cooperative mechanism**: the thread itself periodically checks a stop flag and exits gracefully.
This is the safest and most recommended method of thread stopping. It requires the thread to periodically check an external variable (flag) in the loop of executing the task.
import threading
import time
# Shared stop flag
stop_flag = threading.Event()
def monitored_task(name, delay):
"""
A task function that periodically checks the stop flag
"""
print(f"Thread {name}: Starting...")
i = 0
while not stop_flag.is_set(): # Check whether the flag is set
i += 1
print(f"Thread {name}: Execution step {i}")
# Simulate time-consuming operations and check regularly
time.sleep(delay)
# Here you can set a limit on the number of executions to ensure that there is no infinite loop
if i >= 5:
break
print(f"Execution thread {name}: received a stop signal or the task ended, and exited gracefully.")
# --- Main program control block ---
#Create execution thread
worker_thread = threading.Thread(target=monitored_task, args=("Worker-1", 1))
# Start the thread
worker_thread.start()
print("\nMain program: The execution thread has been started, wait 3 seconds...\n")
time.sleep(3) # Let the thread run for a while
# Send stop signal
print("\nMain program: Set stop flag...\n")
stop_flag.set() # Set Event and let is_set() return True
# Wait for the thread to finish gracefully and exit (usually quickly)
worker_thread.join()
print("\nMain program: The thread has been safely stopped and joined. The program exited.")
- `threading.Event()`: a simple synchronization primitive that maintains an internal boolean flag.
- `stop_flag.is_set()`: checks whether the flag is `True` (i.e. whether a stop signal was issued).
- `stop_flag.set()`: sets the flag to `True`, sending the stop signal.
- `while not stop_flag.is_set():`: the thread's main control loop; it keeps running as long as no stop signal has been received.
- `worker_thread.join()`: makes the main program wait for the worker thread to finish (exit gracefully) before continuing, an important step for a clean shutdown.

Although there are experimental or unsafe ways to force a stop, such as low-level `_thread` tricks or injecting an exception into a thread, these can cause resource leaks, corrupted shared state, and locks that are never released.
Therefore, in Python, you should always stick to using a coordinated flagging mechanism to stop threads.
This is the most recommended approach in multi-threaded environments. Instead of having every thread contend for the same object, each thread keeps its own independent copy. In Python you can use `threading.local()` for this.
import threading
# Create a thread area to store objects
thread_data = threading.local()
def get_service():
# If the current thread does not have its own service, create one
if not hasattr(thread_data, 'service'):
print(f"Creating a new connection for thread {threading.current_thread().name}")
thread_data.service = create_new_connection()  # placeholder: substitute your real connection factory
return thread_data.service
def task():
service = get_service()
# Perform operations...
If the object truly must be shared (such as writing to the same file or updating a global counter), you must use a `Lock`. This ensures that only one thread accesses the object at a time, avoiding race conditions.
lock = threading.Lock()
def safe_task():
with lock:
# Within this block, other threads must wait
shared_object.do_something()
---
If you want to avoid the risk of lock contention or crashes that come with multiple threads, there are two main alternatives to consider:
This is currently the most popular approach in Python (it is the core principle behind frameworks such as FastAPI). Everything runs in a single thread, and the event loop switches between tasks while they wait on I/O (API requests, database queries, and so on).
import asyncio

async def fetch_api(url):
    # A real program would use an async client such as aiohttp here;
    # asyncio.sleep simulates waiting on I/O
    await asyncio.sleep(0.1)
    return f"response from {url}"

async def main():
    # Both requests run concurrently, switching within a single thread
    results = await asyncio.gather(fetch_api("url1"), fetch_api("url2"))
    print(results)

asyncio.run(main())
Python threads are limited by the GIL (Global Interpreter Lock) and cannot truly run operations in parallel. `multiprocessing` sidesteps this by launching multiple independent Python interpreter processes.
from multiprocessing import Process
def task(name):
print(f"Process {name} is executing")
if __name__ == "__main__":
p = Process(target=task, args=('A',))
p.start()
p.join()
If you want to completely decouple tasks, you can use Celery or Redis Queue. Throw the task into the queue and let the back-end Worker (which may be multiple processes or multiple machines) pick up and execute it.
| Approach | How it works | When to use |
|---|---|---|
| Thread-local | Each thread gets its own copy | API services, database connections |
| asyncio | Single-thread task switching (asynchronous) | Highly concurrent network requests (recommended) |
| multiprocessing | Independent memory spaces | CPU-bound computation; avoids sharing conflicts entirely |
When you use Thread-Local Storage (TLS), the purpose is to protect those "non-thread-safe" objects (such as API Service, database connections). But if data needs to be exchanged between threads (for example, the results downloaded by thread A need to be processed by thread B), you need to establish a special "communication channel".
Python's `queue.Queue` is thread-safe. It is the most standard and safest way to pass information between threads, and it handles all the locking logic internally.
import threading
import queue
# Create a global queue that all execution threads can access
task_queue = queue.Queue()
def producer():
    # Produce data and put it on the queue
    data = {"video_id": "abc", "status": "pending"}
    task_queue.put(data)

def consumer():
    # Take data off the queue (blocks until an item is available)
    data = task_queue.get()
    # Process data...
    task_queue.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
task_queue.join()  # Wait until every queued item has been processed
If you need to share a large list or dictionary, you can use ordinary global variables, but every access must be paired with a `threading.Lock`.
shared_results = []
results_lock = threading.Lock()
def task():
result = "Some operation results"
#Lock before accessing shared resources
with results_lock:
shared_results.append(result)
# Automatically unlock after leaving the with block
Sometimes you don't want to share "data" but "state" (for example: telling other threads that the API has been initialized).
api_ready = threading.Event()
def initializer():
#Perform initialization
api_ready.set() # Send signal
def worker():
api_ready.wait() # Wait for the signal until the initializer calls set()
print("Start working")
| Content type | Where it lives | How to manage it |
|---|---|---|
| Tool objects (API, DB connections) | Thread-local | Each thread keeps its own copy to avoid crashes. |
| Task information (IDs, parameters) | Queue (global) | Pass through a thread-safe `queue.Queue`. |
| Computed results (statistics) | Global list/dict | Must be paired with `threading.Lock`. |
To put it simply: **every thread brings its own private tools (connections), while public information (data) goes through a Queue or behind a Lock.**
In a multi-threaded environment, a race condition occurs when several threads try to modify the same global variable or shared resource (a file, database connection, or global list) at the same time, corrupting the data. `threading.Lock` is a synchronization primitive that ensures only one thread at a time can enter a protected block of code.

The safest, recommended usage is the `with` statement. It guarantees the lock is released correctly even if an exception occurs inside the block, avoiding deadlock.
import threading
# 1. Create a lock object
my_lock = threading.Lock()
shared_counter = 0
def increment_task():
global shared_counter
# 2. Use with to automatically manage acquire() and release()
with my_lock:
# The code in this block can only be executed by one thread at a time
temp = shared_counter
temp += 1
shared_counter = temp
# Start multiple thread tests
threads = [threading.Thread(target=increment_task) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Final count: {shared_counter}")
Although not recommended, sometimes finer-grained control is needed. You must call `acquire()` manually to take the lock and call `release()` inside a `finally` block.
lock = threading.Lock()
def manual_task():
    lock.acquire()  # Acquire the lock; blocks (waits) here if it is already held
    try:
        # Execute the task
        pass
    finally:
        lock.release()  # Must be released, otherwise other threads can never proceed
`threading.Lock` is not reentrant. This means that if a thread that already holds the lock requests the same lock again, it deadlocks on itself. If a thread may need to acquire the same lock more than once, use `threading.RLock()` (a reentrant lock) instead, for example when several methods that each take the lock call one another on a shared object such as a Google API service client. Also note that excessive locking degrades performance, because threads end up "queuing for execution." When possible, prefer `queue.Queue` or the thread-local storage discussed earlier; these approaches are usually more efficient and less error-prone than frequent locking.
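The reentrancy difference can be shown in a few lines; this sketch (function names are illustrative) would deadlock on `inner()` if `rlock` were a plain `Lock`:

```python
import threading

rlock = threading.RLock()

def outer():
    with rlock:        # first acquisition by this thread
        return inner()

def inner():
    with rlock:        # same thread re-acquires: fine with RLock,
        return "done"  # would deadlock forever with a plain Lock

print(outer())  # done
```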
In Python, `async def` and `await` are the core syntax of asynchronous programming. They let the program handle other tasks instead of stalling while waiting for I/O (such as network requests or file reads), which can greatly improve performance.
When you prefix a function definition with `async`, it becomes a coroutine function. Calling it does not execute the body immediately; instead it returns a coroutine object.
async def fetch_data():
    print("Start fetching data...")
    # Simulate a time-consuming task
    return {"data": "success"}
# A direct call will only get the coroutine object and will not execute print
result = fetch_data()
print(result) # Output: <coroutine object fetch_data at ...>
`await` can only be used inside an `async def` function. It suspends the current coroutine, waits for the awaited task to complete, and returns its value. While waiting, the event loop can run other asynchronous tasks.
import asyncio
async def main():
    # Use await to run the coroutine and collect its result
    data = await fetch_data()
    print(f"Fetched result: {data}")
# Start the entry for asynchronous programs
asyncio.run(main())
| Syntax | Purpose | Notes |
|---|---|---|
| `async def` | Declares an asynchronous function | Calling it returns a coroutine object, not the result. |
| `await` | Waits for an asynchronous task to complete | Can only be written inside `async` functions. |
| `asyncio.run()` | Starts the outermost asynchronous entry point | Usually called only once per program. |
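The performance claim above is easy to demonstrate: two coroutines that each wait 0.2 seconds finish together in roughly 0.2 seconds, not 0.4, when run with `asyncio.gather`. This is a minimal sketch (the `fetch` function and its delays are illustrative):

```python
import asyncio
import time

async def fetch(delay, value):
    await asyncio.sleep(delay)  # non-blocking wait; the loop can run other tasks
    return value

async def main():
    t0 = time.perf_counter()
    # Both coroutines wait concurrently
    a, b = await asyncio.gather(fetch(0.2, "A"), fetch(0.2, "B"))
    elapsed = time.perf_counter() - t0
    return a, b, elapsed

a, b, elapsed = asyncio.run(main())
print(a, b, f"{elapsed:.2f}s")  # total is ~0.2s, not 0.4s
```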
Common pitfalls:
- If you forget `await`, you get a coroutine object instead of the result, so reading an attribute such as `result.text` raises `AttributeError`.
- Using `await` outside an `async` function raises `SyntaxError`; make sure the enclosing function is declared with `async`.
- Calling `time.sleep()` inside an `async` function blocks the entire event loop. Use `await asyncio.sleep()` instead.

Imagine you are cooking: while the water boils (an `await`), you go chop vegetables, and the sound of boiling water (an event notification) calls you back to deal with it.

`loop.run_until_complete()` is a lower-level method in the `asyncio` module that runs a coroutine until it finishes and returns its result. Since Python 3.7 the official recommendation is `asyncio.run()`, but this method is still needed in some specific situations, such as reusing an event loop or customizing startup logic.
To use `run_until_complete`, first obtain or create an event loop object, then pass the coroutine to it.
import asyncio
async def my_task():
    await asyncio.sleep(1)
    return "Task completed"
# 1. Get event loop
loop = asyncio.get_event_loop()
# 2. Execute the coroutine until completion and obtain the return value directly
result = loop.run_until_complete(my_task())
print(result) #Output: Task completed
Both can return the coroutine's result, but they manage the loop's life cycle differently:
| Feature | asyncio.run() (recommended) | loop.run_until_complete() |
|---|---|---|
| Automation | High: creates the loop, closes it, and cleans up tasks automatically. | Low: the loop's life cycle must be managed manually. |
| Reusability | Low: each call creates a new loop. | High: multiple tasks can run on the same loop. |
| Restrictions | Cannot be called from inside an already running loop. | More flexible; often used in legacy code or test environments. |
If you are in a script that is already running and want to ensure that a coroutine is executed and the value is obtained, you can use this method:
import asyncio
async def add(a, b):
    return a + b
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    # Run the first task
    val1 = loop.run_until_complete(add(10, 20))
    # Run the second task
    val2 = loop.run_until_complete(add(val1, 5))
    print(f"Final result: {val2}")
finally:
    # The loop must be closed manually
    loop.close()
If you want several coroutines to run concurrently and collect their return values together, combine it with `asyncio.gather`:
async def task(id):
    return f"result {id}"

loop = asyncio.get_event_loop()
# gather wraps several coroutines into one task; run_until_complete waits for all of them
results = loop.run_until_complete(asyncio.gather(task(1), task(2), task(3)))
print(results)  # Output: ['result 1', 'result 2', 'result 3']
Common pitfalls:
- Calling `run_until_complete` inside an `async def` function raises "This event loop is already running"; inside asynchronous functions, always use `await`.
- Remember to call `loop.close()` to release resources, otherwise the program may not exit cleanly or may emit warnings.
- `run_until_complete` returns the coroutine's `return` value; if an uncaught exception occurs in the coroutine, that exception propagates to the caller.

`asyncio.run()` is a high-level API introduced in Python 3.7 and is the most recommended way to start an asynchronous program. It automatically creates the event loop, runs the coroutine, and closes the loop when finished. Most importantly, it returns the coroutine's `return` value directly.
Simply pass the call to an `async def` function into `asyncio.run()`, and it returns the result just like a normal synchronous function.
import asyncio
async def calculate_score(name):
    print(f"Calculating the score of {name}...")
    await asyncio.sleep(1)  # Simulate a time-consuming task
    return 95
# Get the result of return directly
final_score = asyncio.run(calculate_score("Zhang Xiaoming"))
print(f"The final score is: {final_score}") # Output: 95
Usually we define a `main()` function as the entry point, collect the results of all subtasks inside it, and return everything through a single `asyncio.run(main())` call.
async def task_a():
    return "apple"

async def task_b():
    return "banana"

async def main():
    # Use gather inside main to run both tasks concurrently
    results = await asyncio.gather(task_a(), task_b())
    return results  # Returns a list
# Get the return value of main through asyncio.run
all_fruits = asyncio.run(main())
print(all_fruits) # Output: ['apple', 'banana']
| Rule | Description |
|---|---|
| Single entry point | In one execution, `asyncio.run()` is usually called only once. |
| Automatic cleanup | It automatically cancels remaining tasks and shuts down the thread pool, which is very safe. |
| No nesting | `asyncio.run()` cannot be called from inside an `async def` function. |
If you are inside an asynchronous function and want the result of another asynchronous function, use `await`, not `asyncio.run()`.
# Wrong
async def sub_task():
    return 10

async def main():
    # Raises: RuntimeError: asyncio.run() cannot be called from a running event loop
    res = asyncio.run(sub_task())

# Correct
async def main():
    res = await sub_task()  # Use await inside an asynchronous context
Tips:
- Treat `asyncio.run()` as the program's start button and place it inside the `if __name__ == "__main__":` block.
- If the coroutine raises, `asyncio.run()` re-raises the exception directly; it is a good idea to wrap the `run` call in `try...except`.
- The coroutine passed to `asyncio.run()` stays unchanged if you later embed it in a larger asynchronous program and `await` it instead.

NumPy (Numerical Python) is the most important scientific computing library in Python. It provides the efficient multi-dimensional array object `ndarray` along with a large library of mathematical functions for manipulating these arrays. It is the foundation of data science and machine learning libraries such as Pandas, Scikit-learn, and TensorFlow. Its vectorized operations run in compiled code and are far faster than plain Python `for` loops.

import numpy as np
# Create one-dimensional and two-dimensional arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([[1, 2], [3, 4]])
# Quickly create a specific array
zeros = np.zeros((3, 3)) # 3x3 matrix with all zeros
ones = np.ones((2, 4)) # 2x4 matrix of all 1s
eye = np.eye(3) # 3x3 identity matrix
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
| Function | Code example | Description |
|---|---|---|
| Shape inspection | `arr.shape` | Returns the size of each dimension, e.g. `(3, 2)`. |
| Reshaping | `arr.reshape(1, 6)` | Changes dimensions without changing the data. |
| Matrix multiplication | `np.dot(a, b)` or `a @ b` | Performs linear-algebra matrix multiplication. |
| Statistics | `np.mean()`, `np.std()` | Compute the mean, standard deviation, maximum, and minimum. |
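The operations in the table above can be tried together on one small array; a quick sketch (the sample values are arbitrary):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.shape)          # (2, 3)
print(a.reshape(3, 2))  # same data, new dimensions
print(a.mean())         # 3.5
print(a.std())          # standard deviation of all elements
print(a @ a.T)          # 2x2 matrix product of a with its transpose
```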
NumPy's slicing syntax is similar to Python List, but more powerful and supports multi-dimensional simultaneous slicing:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Get the second row (index 1)
print(arr[1, :]) # [4 5 6]
# Get the 2x2 sub-matrix in the lower right corner
print(arr[1:, 1:]) # [[5, 6], [8, 9]]
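Beyond plain slices, NumPy also supports boolean masks and fancy (integer-list) indexing, which have no List equivalent; a short sketch on the same array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Boolean mask: select all elements greater than 5 (returns a flat array)
mask = arr > 5
print(arr[mask])    # [6 7 8 9]

# Fancy indexing: pick rows 0 and 2 in one step
print(arr[[0, 2]])  # [[1 2 3] [7 8 9]]
```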
Installation: `pip install numpy` or `conda install numpy`. Verify the installed version with:
import numpy as np
print(np.__version__)
As mentioned earlier, Numba understands NumPy's array structures natively and can compile complex NumPy operations into machine code, approaching the execution speed of native C.
Numba is an open-source JIT (Just-In-Time) compiler designed to accelerate Python code that processes large amounts of data. It translates Python functions into machine code whose speed is comparable to C, C++, or Fortran, and it is especially well suited to NumPy array operations.
The easiest way to use Numba is to add the `@jit` or `@njit` decorator.
from numba import njit
import numpy as np
# @njit represents "nopython" mode, which ensures not to enter the Python parser and is the fastest
@njit
def fast_function(n):
    total = 0
    for i in range(n):
        total += i
    return total
# The first call will compile, the second call will directly execute the machine code
print(fast_function(10000000))
| Mode | Decorator | Description |
|---|---|---|
| nopython mode | `@njit` | Recommended. Bypasses the Python interpreter entirely and compiles straight to machine code; raises an error if any part of the code cannot be compiled. |
| object mode | `@jit` | Falls back to the Python interpreter for anything it cannot compile; performance gains are usually limited. |
To exploit your CPU's multiple cores, simply enable `parallel=True` and use `prange`.
from numba import njit, prange
@njit(parallel=True)
def parallel_sum(A):
    s = 0
    # prange automatically distributes loop iterations across CPU cores
    for i in prange(A.shape[0]):
        s += A[i]
    return s
Installation: `pip install numba` or `conda install numba` (the `llvmlite` library is installed automatically). Verify with:
import numba
print(numba.__version__)
Note that Numba performs poorly on `dict`, `list`, or custom objects; it is designed for numeric loops and NumPy arrays.
Pandas is a Python-based data analysis and manipulation tool designed for processing structured data, such as tabular data or time series.
import pandas as pd
# Create DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['Taipei', 'Taichung', 'Kaohsiung']}
df = pd.DataFrame(data)
# View data
print(df)
# Filter rows where Age is greater than 28
filtered_df = df[df['Age'] > 28]
print(filtered_df)
Pandas provides efficient, flexible, and intuitive operations that make it particularly well suited to data analysis and processing; it is one of the indispensable tools in data science and machine learning, and both beginners and advanced users benefit from its approachable design and extensive functionality.
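A typical analysis step beyond filtering is aggregation; this small sketch (sample data invented for illustration) sums a column per group with `groupby`:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Taipei", "Taipei", "Taichung"],
    "Sales": [100, 150, 80],
})

# Total sales per city: split by City, then sum the Sales column
summary = df.groupby("City")["Sales"].sum()
print(summary["Taipei"])    # 250
print(summary["Taichung"])  # 80
```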
First install the `googletrans` package. Enter the following command on the command line:
pip install googletrans==4.0.0-rc1
Note: make sure you install version `4.0.0-rc1`, as older versions may no longer work.
Here is an example of translating English into Traditional Chinese:
from googletrans import Translator
#Initialize the Translator object
translator = Translator()
# Translate text
text = "Hello, how are you?"
result = translator.translate(text, src="en", dest="zh-tw")
# Output translation results
print("original text:", text)
print("Translation:", result.text)
Many languages are supported; here are the common language codes:
- `en`: English
- `zh-tw`: Traditional Chinese
- `zh-cn`: Simplified Chinese
- `ja`: Japanese
- `ko`: Korean
- `fr`: French
- `de`: German

Googletrans is an unofficial Google Translate API and may stop working when Google changes its service. If you find that the translation function no longer works, consider another translation API, such as Google's official Cloud Translation API.
DeepL provides high-accuracy translation services, but requires an API key to use its developer API.
The translation tool provided by Microsoft supports multi-language translation, but requires the use of Azure API key settings.
The translation service provided by Amazon Web Services (AWS) provides efficient translation of multilingual texts and needs to be accessed through the API key provided by AWS.
LibreTranslate is an open source translation tool that can set up its own server and does not require an API key. Some third-party public servers also offer the option of using them without an API key.
TextBlob is a natural language processing tool with built-in Google Translate support. Older versions do not require an API key, but pay attention to which versions are still supported.
MyMemory provides memory-based translation. Some functions do not require an API key, but advanced use may require application.
Among Googletrans' competitors, LibreTranslate and some versions of TextBlob offer options that don't require an API key. If you need a tool that is completely free and requires no additional setup, consider these options.
OpenCC (Open Chinese Convert) is an open-source project dedicated to conversion between Simplified and Traditional Chinese. It goes beyond character-by-character substitution: crucially, it handles vocabulary-level conversion and regional differences in word usage (Mainland China, Taiwan, Hong Kong).
In Python, the most commonly used wrapper library is `opencc-python-reimplemented`.
# Installation instructions
pip install opencc-python-reimplemented
# Basic code example
from opencc import OpenCC
# Initialize; 's2t' means Simplified-to-Traditional
cc = OpenCC('s2t')
text = "人工智能改变了世界"
result = cc.convert(text)
print(result) # Output: 人工智能改變了世界
| Schema name | illustrate | Applicable situations |
|---|---|---|
| s2t | Simplified Chinese to Traditional Chinese | Convert standard characters from simple to traditional. |
| t2s | Traditional Chinese to Simplified Chinese | Standard characters are converted from traditional to simplified. |
| s2twp | Simplified Chinese to Taiwanese Traditional (with vocabulary) | Most commonly used in Taiwan. Converts「软件」to「軟體」. |
| tw2s | Taiwan Traditional Chinese to Simplified Chinese | Convert Taiwanese idioms to Mainland China simplified Chinese. |
| s2hk | Simplified to Hong Kong Traditional | Follows Hong Kong character conventions (e.g.「裏」rather than「裡」). |
If you don't want to write a program, OpenCC can also be operated directly in the terminal:
# Convert input.txt to output.txt, using Simplified to Taiwanese Traditional mode
opencc -i input.txt -o output.txt -c s2twp.json
Notes:
- Install the `reimplemented` version of the package named above.
- The conversion modes are defined by `.json` dictionary files; you can inspect the OpenCC installation directory to see how they work.

The following example shows how to query all Zhuyin (phonetic) readings of a Chinese character in Python. We use the `pypinyin` package to obtain the pinyin of each character, together with a custom correspondence table from pinyin to Zhuyin symbols.
pip install pypinyin
The following is the Python code, including the phonetic correspondence table and the program logic for querying and generating HTML results.
from pypinyin import pinyin, Style
from jinja2 import Template
# Simple correspondence table from Pinyin to Zhuyin
pinyin_to_zhuyin = {
"a": "ㄚ", "ai": "ㄞ", "an": "ㄢ", "ang": "ㄤ", "ao": "ㄠ",
"ba": "ㄅㄚ", "bai": "ㄅㄞ", "ban": "ㄅㄢ", "bang": "ㄅㄤ", "bao": "ㄅㄠ",
# Omit some correspondences and need to improve them yourself
"hao": "ㄏㄠ", "hao": "ㄏㄠ", "hǎo": "ㄏㄠˇ", "hào": "ㄏㄠˋ"
}
# Query all pinyin of Chinese characters and convert them to pinyin
def get_zhuyin(char):
    pinyins = pinyin(char, style=Style.NORMAL, heteronym=True)
    unique_pinyins = set(pinyins[0])
    zhuyins = {p: pinyin_to_zhuyin.get(p, p) for p in unique_pinyins}  # Map pinyin to Zhuyin
    return list(zhuyins.values())
# Query characters
chinese_char = '好' # Replace with any other character
zhuyin_results = get_zhuyin(chinese_char)
Notes:
- The `pinyin_to_zhuyin` dictionary maps pinyin syllables to Zhuyin symbols; you need to extend this table yourself.
- `heteronym=True` ensures that all readings of a polyphonic character are returned.

import os
# Check if it is a file
if os.path.isfile("example.txt"):
    print("File exists")

# Check whether it is a folder
if os.path.isdir("my_folder"):
    print("The folder exists")

# Check whether the file or folder exists
if os.path.exists("path/to/check"):
    print("Path exists")

from pathlib import Path

path = Path("example.txt")
if path.exists():
    print("Path exists")
if path.is_file():
    print("It is a file")
if path.is_dir():
    print("It is a folder")
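These checks are easiest to experiment with inside a throwaway directory, so the results do not depend on what happens to be on your disk; a small self-contained sketch:

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory that is removed automatically
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    f = base / "example.txt"

    assert not f.exists()   # nothing there yet
    f.write_text("hello")   # create the file
    assert f.is_file()      # now it exists and is a regular file
    assert base.is_dir()    # the containing directory is a folder
    print(f.read_text())    # hello
```

After the `with` block exits, the whole directory (and `example.txt` with it) is gone again.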
In Python, the standard and most capable modules for copying, moving, renaming, and deleting files are `shutil` (for high-level operations) and `os` (for basic path operations).
The copy operation has different functions depending on whether metadata needs to be preserved (such as creation time, permissions):
import shutil
# Copy a single file
shutil.copy2('source.txt', 'destination.txt')
#Copy the entire folder (the destination folder must not exist)
shutil.copytree('my_folder', 'backup_folder')
Moving and renaming share very similar underlying logic. `shutil.move` works across disks (partitions), while `os.rename` is typically used to rename within the same disk.
| Function | Common command | Description |
|---|---|---|
| Move a file or directory | `shutil.move(src, dst)` | Moves a file or an entire directory to a new path; works across partitions. |
| Rename a file | `os.rename(old, new)` | Changes a file or directory name. |
import shutil
import os
# Move files
shutil.move('test.txt', 'archive/test.txt')
# Rename
os.rename('old_name.txt', 'new_name.txt')
Delete operations should be used with caution, as these commands usually do not go through the recycle bin, but directly remove them permanently.
# Delete a single file
if os.path.exists('temp.txt'):
    os.remove('temp.txt')
# Forcefully delete the entire folder
shutil.rmtree('old_data_folder')
Before performing a copy or move, check whether the path exists to avoid program crashes:
Tips:
- Use `os.path.join()` or the `pathlib` module to combine paths, so the program works on both Windows (backslashes) and Linux (forward slashes).
- Wrap file operations in `try...except` to handle missing files or permission errors gracefully.
- When a file with the same name already exists at the destination, `shutil.copy` and `shutil.move` usually overwrite it directly; confirm whether a rename is needed before executing.
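These precautions can be combined into a small helper; this sketch (the `safe_copy` name is illustrative) checks the source exists and refuses to overwrite an existing destination:

```python
import os
import shutil
import tempfile

def safe_copy(src, dst):
    """Copy src to dst, but refuse to overwrite an existing destination."""
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    if os.path.exists(dst):
        raise FileExistsError(dst)
    shutil.copy2(src, dst)  # copy2 also preserves metadata

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")  # os.path.join keeps paths cross-platform
    dst = os.path.join(tmp, "b.txt")
    with open(src, "w") as fh:
        fh.write("data")

    safe_copy(src, dst)
    copied = os.path.exists(dst)

    refused = False
    try:
        safe_copy(src, dst)           # destination now exists
    except FileExistsError:
        refused = True

print(copied, refused)  # True True
```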
Python's `logging` package is a powerful built-in toolkit for managing logs in applications. Whether for recording debugging information, tracking errors, or monitoring performance, `logging` provides multi-level, multi-format logging. It supports the levels `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. The following is a basic `logging` configuration example:
import logging
# Set logger
logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
# Set Handler
console_handler = logging.StreamHandler()
file_handler = logging.FileHandler('app.log')
# Set Formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
console_handler.setFormatter(formatter)
file_handler.setFormatter(formatter)
# Add Handler to logger
logger.addHandler(console_handler)
logger.addHandler(file_handler)
# Test different levels of logs
logger.debug('This is a debug message')
logger.info('This is an information message')
logger.warning('This is a warning message')
logger.error('This is an error message')
logger.critical('This is a serious message')
The `logging` package supports the following log levels:
- `DEBUG`: the lowest level, used for debugging information.
- `INFO`: general information, such as system status.
- `WARNING`: warnings that do not stop the program.
- `ERROR`: errors that usually indicate a real problem.
- `CRITICAL`: the most serious errors, which may terminate the program.

You can use the `Formatter` class to define the log output format. For example:
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
This format will output something similar to the following:
2024-10-25 10:00:00 - my_logger - INFO - This is an information message
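The levels also act as filters: messages below the logger's configured level are dropped. This sketch (the logger name `level_demo` is illustrative) captures output in a string buffer to show `INFO` being discarded under a `WARNING` threshold:

```python
import io
import logging

buffer = io.StringIO()
logger = logging.getLogger("level_demo")
logger.setLevel(logging.WARNING)  # DEBUG and INFO are filtered out
logger.propagate = False          # keep output away from the root logger
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.info("routine status update")  # below WARNING: dropped
logger.warning("disk almost full")    # at or above WARNING: emitted

print(buffer.getvalue())  # WARNING:disk almost full
```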
The `logging` package provides a variety of log-management options, letting developers customize levels, output formats, and destinations as needed, improving application maintainability and debugging efficiency.
In Python's `logging` package, a `Handler` is the key component that decides where log records are sent. Different handlers can route logs to the console, a file, the network, or even email. `StreamHandler` is one of the most commonly used handlers; it writes logs to the console.
Common handlers in `logging` include:
- `StreamHandler`: writes logs to `stdout` or `stderr` (usually the console).
- `FileHandler`: writes logs to a file.
- `NullHandler`: discards log output, for cases where no display is needed.
- `SMTPHandler`: sends logs via email.

`StreamHandler` is the usual choice for console output; by default it sends log messages to standard error (`stderr`). Attaching a `StreamHandler` shows logs directly on the console, which is useful for real-time monitoring of system status.
If you do not want the logs to be displayed on the console, you can remove them from the LoggerStreamHandler, or use insteadNullHandlerto avoid output.
The following example shows how to configureloggingand remove the console outputStreamHandler:
import logging
# Set Logger
logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
# Add StreamHandler to output to console
console_handler = logging.StreamHandler()
logger.addHandler(console_handler)
# Test output
logger.info("This message will be displayed on the console")
# Remove console output
logger.removeHandler(console_handler)
logger.info("This message will not be displayed on the console")
Can be usedNullHandlerTo disable all output of the log:
import logging
# Set up Logger and add NullHandler
logger = logging.getLogger('my_logger')
logger.addHandler(logging.NullHandler())
# This message will not be displayed in the console
logger.info("This message will not be displayed")
HandleryesloggingA component in a suite that defines the output location. throughStreamHandler, log messages can be displayed on the console for real-time monitoring. If console output is not needed, it can be removedStreamHandleror useNullHandlerto disable logging output.
To include a thread ID in a `logging.Formatter`, use the `%(thread)d` attribute. It displays the unique integer ID the system assigns to the thread.
import logging
# Add %(thread)d to display thread ID
log_formatter = logging.Formatter(
    '%(asctime)s.%(msecs)03d %(thread)d %(levelno)s %(message)s',
    "%Y-%m-%d %H:%M:%S"
)
#Example output:
# 2025-12-20 16:30:45.123 140735612345678 20 This is a log message
Besides the integer ID, you can display the thread's name with the `%(threadName)s` attribute if that is more readable.
This code shows how to apply this format in a multi-threaded environment:
import logging
import threading
# 1. Set formatter
log_format = '%(asctime)s.%(msecs)03d [Thread:%(thread)d] %(levelno)s %(message)s'
date_format = "%Y-%m-%d %H:%M:%S"
formatter = logging.Formatter(log_format, date_format)
# 2. Set log processor
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger = logging.getLogger("ThreadTest")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# 3. Test function
def task():
    logger.info("Subthread is executing")
# Start testing
logger.info("Main thread started")
t = threading.Thread(target=task)
t.start()
t.join()
Python has built-in support for SQLite and is suitable for small applications.
import sqlite3
conn = sqlite3.connect("example.db")
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
conn.commit()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
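Two `sqlite3` features worth knowing beyond the basics above: an in-memory database (`":memory:"`) for experiments, and `sqlite3.Row` so rows can be read by column name. A short sketch:

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become dict-like objects
cur = conn.cursor()

cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
# Bulk insert with parameterized placeholders (also prevents SQL injection)
cur.executemany("INSERT INTO users (name, age) VALUES (?, ?)",
                [("Alice", 25), ("Bob", 30)])
conn.commit()

cur.execute("SELECT name, age FROM users WHERE age > ?", (26,))
row = cur.fetchone()
print(row["name"], row["age"])  # Bob 30
conn.close()
```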
Used to connect to the MySQL database.
pip install pymysql
import pymysql
conn = pymysql.connect(host="localhost", user="root", password="password", database="test")
cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
Used to connect to PostgreSQL.
pip install psycopg2
import psycopg2
conn = psycopg2.connect(dbname="testdb", user="user", password="password", host="localhost")
cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
Used to connect to Microsoft SQL Server.
pip install pyodbc
import pyodbc
conn = pyodbc.connect("DRIVER={SQL Server}; SERVER=localhost; DATABASE=test; UID=user; PWD=password")
cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
cursor.close()
conn.close()
Suitable for file-based databases.
pip install pymongo
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["testdb"]
collection = db["users"]
collection.insert_one({"name": "Alice", "age": 25})
print(list(collection.find()))
Suitable for caching and high-performance Key-Value access.
pip install redis
import redis
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
r.set("name", "Alice")
print(r.get("name"))
Suitable for full-text search and analysis.
pip install elasticsearch
from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")
doc = {"name": "Alice", "age": 25}
es.index(index="users", document=doc)
print(es.search(index="users", query={"match_all": {}}))
Supports multiple SQL databases and provides ORM functions.
pip install sqlalchemy
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker
engine = create_engine("sqlite:///example.db")
Base = declarative_base()
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
session.add(User(name="Alice"))
session.commit()
print(session.query(User).all())
Summary:
- Relational databases: `sqlite3` (built-in), `pymysql` (MySQL), `psycopg2` (PostgreSQL), `pyodbc` (SQL Server).
- NoSQL: `pymongo` (MongoDB), `redis` (Redis), `elasticsearch` (Elasticsearch).
- ORM: `SQLAlchemy` (supports multiple SQL databases).

Install PyMySQL using the following command:
pip install pymysql
Connect to the MySQL server using PyMySQL:
import pymysql
# Establish connection
conn = pymysql.connect(
    host="localhost",
    user="your_user",
    password="your_password",
    database="your_database",
    charset="utf8mb4",
    cursorclass=pymysql.cursors.DictCursor  # Return rows as dictionaries
)
#Create cursor
cursor = conn.cursor()
# Query information
cursor.execute("SELECT * FROM your_table")
result = cursor.fetchall()
for row in result:
    print(row)
# Close connection
cursor.close()
conn.close()
When executing INSERT, UPDATE, and DELETE statements, changes need to be submitted:
try:
    with conn.cursor() as cursor:
        sql = "INSERT INTO users (name, age) VALUES (%s, %s)"
        cursor.execute(sql, ("Alice", 25))
    conn.commit()  # Commit the changes
except Exception as e:
    conn.rollback()  # Roll back on error
    print("An error occurred:", e)
You can use `callproc` to call a stored procedure:
with conn.cursor() as cursor:
    cursor.callproc("your_stored_procedure", (param1, param2))
    result = cursor.fetchall()
    print(result)
Use `executemany` to perform bulk inserts:
data = [("Bob", 30), ("Charlie", 28), ("David", 35)]
sql = "INSERT INTO users (name, age) VALUES (%s, %s)"
with conn.cursor() as cursor:
    cursor.executemany(sql, data)
conn.commit()
Use parameterized queries to prevent SQL injection attacks:
name = "Alice"
sql = "SELECT * FROM users WHERE name = %s"
with conn.cursor() as cursor:
    cursor.execute(sql, (name,))
    result = cursor.fetchall()
    print(result)
Make sure to close the database connection when the program ends:
conn.close()
PyMySQL automatically converts `datetime.datetime` objects to MySQL `DATETIME`, so they can be passed in directly.
import pymysql
import datetime
conn = pymysql.connect(
    host="localhost",
    user="root",
    password="pwd",
    database="testdb",
    cursorclass=pymysql.cursors.DictCursor
)
with conn:
    with conn.cursor() as cur:
        now = datetime.datetime.now()
        sql = """
            INSERT INTO logs (msg, created_at)
            VALUES (%s, %s)
        """
        cur.execute(sql, ("hello world", now))
        conn.commit()

        today = datetime.date.today()
        sql = "INSERT INTO records (start_date) VALUES (%s)"
        cur.execute(sql, (today,))
        conn.commit()
Fetched `DATETIME` fields automatically become `datetime.datetime` objects.
sql = "SELECT id, created_at FROM logs ORDER BY id DESC LIMIT 1"
cur.execute(sql)
row = cur.fetchone()
dt = row["created_at"]
print(dt, type(dt))
# For example: 2025-07-01 15:22:33
formatted = dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
start = datetime.datetime(2025, 1, 1, 0, 0, 0)
sql = """
SELECT * FROM logs
WHERE created_at >= %s
"""
cur.execute(sql, (start,))
rows = cur.fetchall()
begin = datetime.datetime(2025, 1, 1)
end = datetime.datetime(2025, 1, 31, 23, 59, 59)
sql = """
SELECT * FROM logs
WHERE created_at BETWEEN %s AND %s
"""
cur.execute(sql, (begin, end))
rows = cur.fetchall()
sql = "SELECT created_at FROM logs_str LIMIT 1"
cur.execute(sql)
row = cur.fetchone()
dt = datetime.datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S")
print(dt)
items = [
("log1", datetime.datetime.now()),
("log2", datetime.datetime.now()),
]
sql = "INSERT INTO logs (msg, created_at) VALUES (%s, %s)"
cur.executemany(sql, items)
conn.commit()
Note: if the field is stored as a string rather than `DATETIME`, parse it manually with `strptime`.

import pymysql
import datetime
# Establish database connection
conn = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='your_db',
    cursorclass=pymysql.cursors.DictCursor  # Return rows as dictionaries
)
with conn:
    with conn.cursor() as cursor:
        sql = "SELECT id, created_at FROM your_table LIMIT 1"
        cursor.execute(sql)
        result = cursor.fetchone()
        dt = result['created_at']  # Assumes created_at is a DATETIME column
        print(dt)
        print(type(dt))

        dt_str = dt.strftime("%Y-%m-%d %H:%M:%S")
        print(dt_str)

        # If created_at is stored in string format (such as CHAR / VARCHAR)
        dt = datetime.datetime.strptime(result['created_at'], "%Y-%m-%d %H:%M:%S")
        print(dt)
# If DictCursor is not set, the data will be tuple, and the value method is as follows:
cursor = conn.cursor()
cursor.execute("SELECT id, created_at FROM your_table")
row = cursor.fetchone()
dt = row[1]
import pymysql
import datetime
# Establish database connection
conn = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='your_db',
    cursorclass=pymysql.cursors.DictCursor
)
with conn:
with conn.cursor() as cursor:
sql = "SELECT created_at FROM your_table LIMIT 1"
cursor.execute(sql)
result = cursor.fetchone()
dt_from_sql = result['created_at'] # datetime type
now = datetime.datetime.now()
# Calculate time difference
diff = now - dt_from_sql
print(f"Difference in days: {diff.days}")
diff = now.date() - dt_from_sql.date()
print(f"Pure date difference number of days: {diff.days}")
importpymysql
import datetime
# Establish database connection
conn = pymysql.connect(
host='localhost',
user='root',
password='your_password',
database='your_db',
cursorclass=pymysql.cursors.DictCursor
)
with conn:
with conn.cursor() as cursor:
sql = "SELECT created_at FROM your_table LIMIT 1"
cursor.execute(sql)
result = cursor.fetchone()
dt_from_sql = result['created_at'] # datetime type
now = datetime.datetime.now()
# Calculate time difference
diff = now - dt_from_sql
hours = diff.total_seconds() / 3600
print(f"Difference in hours: {hours}")
print(f"The difference is an integer number of hours: {int(hours)}")
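The timedelta arithmetic can also be checked offline with fixed timestamps (the values below are illustrative, standing in for a row read from MySQL):

```python
import datetime

# Fixed timestamps stand in for values read from the database
dt_from_sql = datetime.datetime(2025, 1, 1, 0, 0, 0)
now = datetime.datetime(2025, 1, 2, 6, 0, 0)

diff = now - dt_from_sql              # a datetime.timedelta
hours = diff.total_seconds() / 3600   # convert to hours
print(diff.days, hours)               # 1 30.0
```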
In InnoDB, MySQL's default storage engine, the default isolation level is `REPEATABLE READ`.
This means that after the first query in a transaction, MySQL creates a snapshot for that transaction. Even if another client modifies the data a second later, as long as your transaction is still open, a second query returns the same snapshot data as the first. This guarantees repeatable reads.
This is the most standard approach: before executing the second query, call `connection.commit()`. Committing ends the current transaction, so MySQL creates a fresh snapshot on the next query and reads the latest data.
# First query
cursor.execute(sql)
result1 = cursor.fetchall()
# Force update: commit the current transaction
connection.commit()
# The second query will capture the latest information.
cursor.execute(sql)
result2 = cursor.fetchall()
If you want every SQL statement to take effect immediately and always see the latest results, enable autocommit after establishing the connection. Every `execute()` is then treated as an independent transaction.
connection.autocommit = True
# All subsequent execute(sql) will directly read the latest status of the database
You can also change the isolation level to `READ COMMITTED`. At this level, every query within the same transaction reads the latest committed data.
# Set before executing query
cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
The isolation level is bound to the `connection`'s transaction scope, not to the `cursor` object. Re-creating the cursor while reusing the same connection without committing leaves the problem in place.

In a multi-threaded Python program, sharing a single database connection across threads causes serious conflicts. Connection objects are generally not thread-safe: multiple threads sending SQL commands or reading results concurrently will corrupt data, break transactions, or drop the connection. Best practice is a connection pool that hands each thread its own connection.
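One way to give each thread its own connection is `threading.local`. The sketch below uses `sqlite3` purely as an offline stand-in for a MySQL connection, to illustrate the per-thread pattern:

```python
import sqlite3
import threading

# threading.local() gives each thread its own attribute namespace,
# so every thread lazily opens and reuses its own private connection.
tls = threading.local()
results = []

def get_conn():
    if not hasattr(tls, "conn"):
        tls.conn = sqlite3.connect(":memory:")  # one connection per thread
    return tls.conn

def task(n):
    cur = get_conn().execute("SELECT ?", (n,))
    results.append(cur.fetchone()[0])

threads = [threading.Thread(target=task, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2]
```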
`mysql-connector-python` ships a simple built-in connection pool. You create the pool up front, and threads "borrow" connections from it as needed.
import threading
import mysql.connector
from mysql.connector import pooling
# 1. Create a connection pool (global variable)
db_config = {
"host": "localhost",
"user": "root",
"password": "password",
"database": "test_db"
}
connection_pool = mysql.connector.pooling.MySQLConnectionPool(
pool_name="my_pool",
pool_size=5, # Set the pool size, which should be greater than or equal to the number of threads
**db_config
)
# 2. Execute thread task function
def thread_task(task_id):
conn=None
try:
# Get an independent connection from the pool
conn = connection_pool.get_connection()
cursor = conn.cursor()
# Perform operations
cursor.execute("SELECT NOW()")
result = cursor.fetchone()
print(f"Execution thread {task_id} reading time: {result}")
# Note: Under this isolation level, remember to commit if there are any changes.
# conn.commit()
except Exception as e:
print(f"An error occurred in thread {task_id}: {e}")
finally:
# Important: Close the cursor and connection (here close will return the connection to the pool instead of actually interrupting it)
if conn and conn.is_connected():
cursor.close()
conn.close()
# 3. Start multiple execution threads
threads = []
for i in range(3):
t = threading.Thread(target=thread_task, args=(i,))
threads.append(t)
t.start()
for t in threads:
t.join()
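The borrow-and-return mechanics of a pool can be sketched without any driver support using `queue.Queue`; here `sqlite3` again stands in for MySQL connections:

```python
import queue
import sqlite3
import threading

POOL_SIZE = 2
# Pre-create the connections. check_same_thread=False is needed because
# borrowed connections move between threads (each is still used by only
# one thread at a time).
pool = queue.Queue()
for _ in range(POOL_SIZE):
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))

results = []

def task(n):
    conn = pool.get()            # borrow: blocks until a connection is free
    try:
        results.append(conn.execute("SELECT ?", (n,)).fetchone()[0])
    finally:
        pool.put(conn)           # return the connection instead of closing it

threads = [threading.Thread(target=task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```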
If you are developing a larger application, SQLAlchemy provides more powerful and automated connection management (QueuePool). It automatically handles connection failures, recycling, and thread allocation.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
# The connection pool will be automatically enabled when creating the Engine
engine = create_engine(
"mysql+mysqlconnector://root:password@localhost/test_db",
pool_size=10,
max_overflow=20
)
# Use sessionmaker to ensure that each thread gets an independent Session
Session = sessionmaker(bind=engine)
def sqlalchemy_task():
session = Session()
try:
# Perform operations
# session.execute(...)
session.commit()
finally:
session.close() # Return the connection
Use the `with` syntax to manage cursors and connections so the connection is returned even when an exception occurs, and always `commit()` or `rollback()` before returning it; otherwise the connection may carry an unfinished transaction back into the pool and affect its next user.

To execute MySQL stored procedures in Python, you can use `mysql-connector-python` or `PyMySQL`.
pip install mysql-connector-python
import mysql.connector
# Connect to database
conn = mysql.connector.connect(
host="localhost",
user="your_user",
password="your_password",
database="your_database"
)
cursor = conn.cursor()
# Call stored procedure
cursor.callproc("your_stored_procedure", (param1, param2))
# Get results
for result in cursor.stored_results():
print(result.fetchall())
# Close connection
cursor.close()
conn.close()
To execute SQL Server stored procedures in Python, you can usepyodbc。
pip install pyodbc
import pyodbc
# Connect to SQL Server
conn = pyodbc.connect("DRIVER={SQL Server};"
"SERVER=your_server;"
"DATABASE=your_database;"
"UID=your_user;"
"PWD=your_password")
cursor = conn.cursor()
# Execute stored procedure
cursor.execute("{CALL your_stored_procedure (?, ?)}", (param1, param2))
# Get results
rows = cursor.fetchall()
for row in rows:
print(row)
# Close connection
cursor.close()
conn.close()
To execute PostgreSQL stored procedures in Python, you can usepsycopg2。
pip install psycopg2
import psycopg2
# Connect to PostgreSQL
conn = psycopg2.connect(
dbname="your_database",
user="your_user",
password="your_password",
host="localhost",
port="5432"
)
cursor = conn.cursor()
# Execute stored procedure
cursor.callproc("your_stored_procedure", (param1, param2))
# Get results
rows = cursor.fetchall()
for row in rows:
print(row)
# Close connection
cursor.close()
conn.close()
After calling a stored procedure, use `fetchall()` or `stored_results()` to retrieve the results.

`requests` is the most widely used HTTP library in Python, handy for calling REST APIs, downloading web pages, and submitting form data. It is more concise and easier to use than the built-in `urllib`.
pip install requests
import requests
#Send GET request
url = "https://api.pionex.com/api/v1/common/symbols"
response = requests.get(url)
# Check if successful
if response.status_code == 200:
data = response.json()
print("Request successful, return field:", list(data.keys()))
else:
print("Request failed:", response.status_code)
import requests
url = "https://httpbin.org/post"
payload = {"key": "value", "symbol": "BTC_USDT"}
headers = {"Content-Type": "application/json"}
res = requests.post(url, json=payload, headers=headers)
print(res.json())
| Method / attribute | Use |
|---|---|
| `requests.get()` | Retrieve data (most common) |
| `requests.post()` | Submit data or forms |
| `requests.put()` | Update data |
| `requests.delete()` | Delete data |
| `response.status_code` | HTTP status code |
| `response.json()` | Parse a JSON response |
| `response.text` | Raw text content |
| `response.raise_for_status()` | Automatically raise an exception on HTTP errors |
import requests
try:
r = requests.get("https://api.pionex.com/api/v1/unknown")
r.raise_for_status()
except requests.exceptions.HTTPError as e:
print("HTTP error:", e)
except requests.exceptions.RequestException as e:
print("General request error:", e)
For retrieving large amounts of data, `requests` can be combined with `pandas` or `asyncio`.

`raise_for_status()` is a method of the `requests` module's `Response` object. It checks whether the HTTP status code indicates an error (such as 4xx or 5xx); if the request failed, it automatically raises `requests.exceptions.HTTPError`, which makes debugging and aborting execution straightforward.
import requests
url = "https://api.pionex.com/api/v1/common/symbols"
response = requests.get(url)
# Check whether the HTTP status code is successful (200~299)
response.raise_for_status()
# If successful, JSON can be parsed safely
data = response.json()
print("The request was successful, the number of trading pairs obtained:", len(data.get("data", [])))
The request was successful and the number of trading pairs obtained: 248
import requests
try:
r = requests.get("https://api.pionex.com/api/v1/invalid_endpoint")
r.raise_for_status()
except requests.exceptions.HTTPError as e:
print("HTTP error:", e)
except requests.exceptions.RequestException as e:
print("Request exception:", e)
A 4xx or 5xx response raises `HTTPError`.
import requests
# Call API
url = "https://api.example.com/data"
response = requests.get(url)
# Confirm the response is successful
if response.status_code == 200:
print("Data obtained successfully")
else:
print(f"Error: {response.status_code}")
# Parse JSON response
data = response.json()
# Access JSON data
print(data["key1"])
print(data["key2"]["subkey"])
import requests
# API URL
url = "https://jsonplaceholder.typicode.com/posts"
# Send request
response = requests.get(url)
# Check response status and parse
if response.status_code == 200:
data = response.json()
# List the titles of each article
for post in data:
print(f"Post ID: {post['id']}, Title: {post['title']}")
else:
print(f"API call failed, status code: {response.status_code}")
try:
data = response.json()
print(data)
except ValueError:
print("The response is not in valid JSON format")
Python's built-in `json` module converts (serializes) data to JSON format;
the result can be written to a file (`dump`) and reloaded later (`load`) to implement a simple cache.

- `json.dump(obj, file)`: write a Python object to a file in JSON format.
- `json.dumps(obj)`: convert an object to a JSON string.
- `json.load(file)`: read JSON from a file back into a Python object.
- `json.loads(str)`: convert a JSON string back into a Python object.

import json
import os
cache_file = "data_cache.json"
# Simulation data source
data = {
"name": "Alice",
"age": 30,
"hobbies": ["reading", "music", "coding"]
}
#Write data to JSON file (dump)
with open(cache_file, "w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False, indent=2)
print("Data has been cached in the file.")
# Read JSON data from the file (load)
if os.path.exists(cache_file):
with open(cache_file, "r", encoding="utf-8") as f:
loaded_data = json.load(f)
print("Loaded data:", loaded_data)
The data has been cached in the file.
Loaded data: {'name': 'Alice', 'age': 30, 'hobbies': ['reading', 'music', 'coding']}
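`dumps()` and `loads()` follow the same idea but work on strings instead of files; a quick round trip:

```python
import json

data = {"name": "Alice", "age": 30, "hobbies": ["reading", "music"]}

# dumps: object -> JSON string (no file involved)
s = json.dumps(data, ensure_ascii=False)
print(type(s).__name__)  # str

# loads: JSON string -> Python object, with types restored
restored = json.loads(s)
assert restored == data
print(restored["hobbies"][0])  # reading
```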
- `ensure_ascii=False`: keeps non-ASCII characters (e.g. Chinese) from being escaped to `\uXXXX` sequences.
- `indent=2`: indents the JSON file for readability.
- `json.load()` automatically restores JSON values to the corresponding Python types: dict, list, int, str, etc.

import json, os
CACHE_PATH = "result_cache.json"
def expensive_calculation():
print("Performing complex operation...")
return [x**2 for x in range(5)]
def get_cached_result():
#Load the cache if it exists
if os.path.exists(CACHE_PATH):
with open(CACHE_PATH, "r") as f:
return json.load(f)
# Otherwise write to cache after calculation
result = expensive_calculation()
with open(CACHE_PATH, "w") as f:
json.dump(result, f)
return result
data = get_cached_result()
print("Get data:", data)
Performing complex operation...
Get data: [0, 1, 4, 9, 16]
Get data: [0, 1, 4, 9, 16]
`json.dump()` writes to a JSON file, `json.load()` reads from one, and `dumps()`/`loads()` work on strings.

Storing a series of same-shaped records as a list is the most standard and common approach: treat each data point as an element of a JSON array (square brackets `[]`).
[
{"time": 1759028400000, "open": "109398.3", "close": "109364.8", "high": "109489.2", "low": "109364.8", "volume": "518.7594"},
{"time": 1759024800000, "open": "109305.6", "close": "109398.3", "high": "109496.4", "low": "109296.0", "volume": "757.0290"},
...
]
In Python, you simply build a list of dictionaries and serialize it with `json.dumps()`.
import json
data_list = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'}
]
json_output = json.dumps(data_list, indent=4)
# print(json_output)
If each data point's `time` value is unique, you can use it as the key of the top-level object, producing a dictionary structure. This makes it easy to look up a specific data point directly by timestamp.
{
"1759028400000": {"open": "109398.3", "close": "109364.8", "high": "109489.2", "low": "109364.8", "volume": "518.7594"},
"1759024800000": {"open": "109305.6", "close": "109398.3", "high": "109496.4", "low": "109296.0", "volume": "757.0290"},
...
}
Iterate over the data list and pull each data point's `time` value out to use as the dictionary key.
import json
source_data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'}
]
data_dict = {}
for item in source_data:
# Make sure the timestamp is a string because JSON keys must be strings
time_key = str(item['time'])
# Copy or create a new dictionary and remove the 'time' field as it is now the key
value_data = item.copy()
del value_data['time']
data_dict[time_key] = value_data
json_output = json.dumps(data_dict, indent=4)
# print(json_output)
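Once the data is keyed by timestamp, a specific bar can be fetched directly after parsing; a small self-contained check (values taken from the sample above):

```python
import json

data_dict = {
    "1759028400000": {"open": "109398.3", "close": "109364.8"},
    "1759024800000": {"open": "109305.6", "close": "109398.3"},
}

# Serialize, then parse back and look a bar up directly by its timestamp key
payload = json.dumps(data_dict)
parsed = json.loads(payload)
bar = parsed["1759028400000"]
print(bar["close"])  # 109364.8
```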
In Python, the built-in `json` module is the standard way to serialize Python data structures such as lists and dictionaries into JSON. Your target structure corresponds in Python to a list of dictionaries.
Assume that the data you get from the database query is already a list, each element of which is a dictionary, and the key names are consistent with the JSON field names you want.
db_data = [
{'time': 1759028400000, 'open': '109398.3', 'close': '109364.8', 'high': '109489.2', 'low': '109364.8', 'volume': '518.7594'},
{'time': 1759024800000, 'open': '109305.6', 'close': '109398.3', 'high': '109496.4', 'low': '109296.0', 'volume': '757.0290'},
# ... more data
]
Simply import the `json` module and use `json.dumps()` to convert the Python list into a JSON-formatted string.
import json
# Assume db_data is ready
# Use json.dumps() to serialize Python lists into JSON strings
# indent=4 is used to beautify the output and make it easier to read
json_output_string = json.dumps(db_data, indent=4)
print(json_output_string)
This will produce a JSON string that matches your goal:
[
{
"time": 1759028400000,
"open": "109398.3",
"close": "109364.8",
"high": "109489.2",
"low": "109364.8",
"volume": "518.7594"
},
{
"time": 1759024800000,
"open": "109305.6",
"close": "109398.3",
"high": "109496.4",
"low": "109296.0",
"volume": "757.0290"
}
]
If your database driver returns a list of tuples (for example `[(1759028400000, '109398.3', ...), ...]`), convert it into a list of dictionaries before serializing as above.
# Assume the field name is as follows
field_names = ['time', 'open', 'close', 'high', 'low', 'volume']
tuple_data = [
(1759028400000, '109398.3', '109364.8', '109489.2', '109364.8', '518.7594'),
#...
]
# List Comprehension
list_of_dicts = [
dict(zip(field_names, row))
for row in tuple_data
]
# Now list_of_dicts can be passed directly to json.dumps() for serialization.
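A self-contained version of the conversion, runnable as-is to confirm the resulting structure:

```python
import json

field_names = ['time', 'open', 'close', 'high', 'low', 'volume']
tuple_data = [
    (1759028400000, '109398.3', '109364.8', '109489.2', '109364.8', '518.7594'),
]

# zip pairs each field name with the matching tuple value; dict() builds the record
list_of_dicts = [dict(zip(field_names, row)) for row in tuple_data]
print(list_of_dicts[0]['time'])  # 1759028400000

json_output = json.dumps(list_of_dicts, indent=4)
```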
In Python, how you implement a Web API that outputs JSON depends on the framework you use. The closest equivalent of PHP's `echo json_encode()` is the built-in `json` module, but in a web environment the framework usually sets the headers for you.
FastAPI is the first choice for modern Python development: it automatically converts a Python `dict` or `list` to JSON and sets the correct `Content-Type`.
from fastapi import FastAPI
app = FastAPI()
@app.get("/api/stocks")
def get_stocks():
# Directly return the Python dictionary, FastAPI will automatically handle json_encode
data = ["AAPL", "TSLA", "NVDA", "GOOGL"]
return {"stocks": data}
Flask officially provides the `jsonify` function, which converts the data and also sets the response header to `application/json`.
from flask import Flask, jsonify
app = Flask(__name__)
@app.route('/api/stocks')
def get_stocks():
data = ["AAPL", "TSLA", "NVDA", "GOOGL"]
# Equivalent to PHP's echo json_encode($data) and add Header
return jsonify(data)
If you are using the full-featured Django framework, you will typically use the `JsonResponse` object.
from django.http import JsonResponse
def stock_api(request):
data = {"stocks": ["AAPL", "TSLA"]}
return JsonResponse(data)
If you just want the JSON string itself (for writing to a file or custom output), use `json.dumps()`.
import json
data = {"name": "Gemini", "age": 18}
# This is the closest to the low-level implementation of PHP json_encode()
json_string = json.dumps(data)
print(json_string)
| Function | PHP (vanilla) | Python (FastAPI / Flask) |
|---|---|---|
| Set JSON headers | `header('Content-Type: application/json')` | Handled by the framework (`return dict` or `jsonify`) |
| Object to JSON string | `json_encode($data)` | `json.dumps(data)` |
| Output and end | `echo ...; exit;` | `return ...` |

Just `return` a dictionary and the framework handles all format conversion for you.

Google APIs are mainly services provided through Google Cloud Platform (GCP), giving developers access to Google features such as search, maps, and cloud drives. Most Google APIs follow a RESTful architecture and exchange data in JSON format.
Before using any Google API, you must first create a project in the Google Cloud Console and set up an authentication method. There are two main options: an API key (for public data) and OAuth 2.0 credentials (for accessing user data).
In Python, the following libraries are officially recommended to simplify development:
- `google-api-python-client`: general-purpose API client for most Discovery-based services.
- `google-auth`: handles authentication and token management.
- `google-cloud-storage` / `google-cloud-vision`: client libraries optimized for specific cloud services.

Install them with `pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib`, then send each request by calling `execute()` on it.

Google APIs are not entirely free or unlimited. Each API has its own quota and billing scheme:
| category | Representative API | Main purpose |
|---|---|---|
| Work automation | Google Sheets / Calendar / Gmail | Automate forms, schedule, and send emails. |
| Data and storage | Cloud Storage / BigQuery | Store large files and perform big data analysis. |
| AI | Vision / Translation / Natural Language | Image recognition, text translation, and sentiment analysis. |
| multimedia | YouTube Data API | Search videos and manage channel content. |
To use Google APIs (such as YouTube, Drive, Gmail, etc.) in Python, you usually call the `build` function from the `google-api-python-client` library to create a service object. That object encapsulates all API methods and is your main communication channel with Google's servers.
To create a Service object, you need the API name, version, and authentication key (API Key) or credentials (OAuth2).
from googleapiclient.discovery import build
# Initialize YouTube Data API v3
youtube = build(
'youtube',
'v3',
developerKey='YOUR_API_KEY'
)
# Call API example
request = youtube.videos().list(part="snippet", id="dQw4w9WgXcQ")
response = request.execute()
This is the core reason for the crash you experienced earlier: the service object produced by `build()` and its internal `http` transport object are not thread-safe. In a multi-threaded environment, follow these guidelines:

- Do not share one `youtube` object and call `execute()` on it from multiple threads.
- Call `build()` inside each thread's `run()` method so every thread gets its own `googleapiclient.discovery.Resource` instance.

Calling `build()` frequently hurts performance, because the "Discovery Document" is re-downloaded each time (roughly 1-2 seconds of extra latency). You can optimize by:
# Disable exploration file downloading (if you have installed precompiled libraries or want to speed things up)
# Note: This requires that you already have relevant definitions in your environment
youtube = build('youtube', 'v3', developerKey='KEY', static_discovery=True)
The most common error when calling service methods is `HttpError`. Catch it to handle quota exhaustion or permission problems.
from googleapiclient.errors import HttpError
try:
response = youtube.channels().list(part="statistics", mine=True).execute()
except HttpError as e:
if e.resp.status == 403:
print("API quota exhausted or insufficient permissions")
elif e.resp.status == 401:
print("Authentication invalid")
else:
print(f"HTTP error: {e}")
| API functions | name parameter | version parameters |
|---|---|---|
| YouTube Data | 'youtube' | 'v3' |
| Google Drive | 'drive' | 'v3' |
| Google Sheets | 'sheets' | 'v4' |
| Gmail API | 'gmail' | 'v1' |
If you share one Google API service object (e.g. `YoutubeApi.youtube`) across multiple threads, you may hit thread-safety problems. The underlying `httplib2` or `urllib3` transport layer develops race conditions when several threads access the same connection object concurrently, crashing the program deep in the network layer. Sometimes the error occurs at the C level and the Python interpreter exits outright, which is why `try...except` cannot intercept it.
Do not create one global `youtube` object in the main thread for everyone to share. Each thread should create its own API service instance when it starts; that way every thread has its own network connection and cache, and they do not interfere with each other.
import threading
from googleapiclient.discovery import build
def get_youtube_service():
# Call this function internally in each thread to create independent objects
return build('youtube', 'v3', developerKey='YOUR_API_KEY')
def thread_task(video_id):
try:
# Create a dedicated service within the execution thread
local_youtube = get_youtube_service()
request = local_youtube.videos().list(
part="snippet,statistics",
id=video_id
)
response = request.execute()
print(f"Successfully obtained: {video_id}")
except Exception as e:
print(f"Thread error: {e}")
# Start multiple threads
t1 = threading.Thread(target=thread_task, args=("vid1",))
t2 = threading.Thread(target=thread_task, args=("vid2",))
t1.start()
t2.start()
If you do not want to create objects repeatedly, use a `threading.Lock` to ensure that only one thread calls `execute()` at a time. API calls become queued (losing the concurrency speed-up), but the program will not crash.
import threading
# Create a global lock
api_lock = threading.Lock()
def thread_task_with_lock(video_id):
# Use with to ensure the lock is automatically released
with api_lock:
try:
request = YoutubeApi.youtube.videos().list(
part="snippet",
id=video_id
)
vlist = request.execute()
except Exception as e:
print(f"Error: {e}")
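The serialization the lock provides can be demonstrated offline: the shared list below stands in for the non-thread-safe API object, and the lock guarantees only one thread at a time enters the critical section:

```python
import threading

# A lock serializes the critical section exactly as api_lock serializes execute():
# only one thread at a time may touch the shared resource.
lock = threading.Lock()
calls = []

def guarded_call(n):
    with lock:              # acquired/released automatically, even on exceptions
        calls.append(n)     # stands in for the non-thread-safe API call

threads = [threading.Thread(target=guarded_call, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 5
```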
If you are using an older version of the client library, you can pass an explicit `http` object when building the service, or give each thread its own `httplib2.Http()` instance. This is usually more complicated, so option 1 is recommended.

Also consider setting a global socket timeout, e.g. `socket.setdefaulttimeout(10)`, so that a stuck network call cannot occupy a thread forever.

Sending email uses the standard-library `smtplib` and `email` modules.
These two modules are generally built into Python and do not require additional installation.
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
# Set sender, recipient, subject and content
sender = '[email protected]'
receiver = '[email protected]'
subject = 'test email'
body = 'This is a test email sent by Python. '
# Create email content
msg = MIMEMultipart()
msg['From'] = sender
msg['To'] = receiver
msg['Subject'] = subject
msg.attach(MIMEText(body, 'plain'))
#Send using Gmail SMTP
smtp_server = 'smtp.gmail.com'
smtp_port=587
username = '[email protected]'
password = 'your_app_password' # It is recommended to use the App password instead of the general password
with smtplib.SMTP(smtp_server, smtp_port) as server:
server.starttls()
server.login(username, password)
server.send_message(msg)
print('Email has been sent')
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib

Filters can be created through the Gmail API, for example to add a label automatically when the sender matches a given address.
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os
import pickle
# Required authorization scope
SCOPES = ['https://www.googleapis.com/auth/gmail.settings.basic',
'https://www.googleapis.com/auth/gmail.modify']
def gmail_service():
creds = None
if os.path.exists("token.pickle"):
with open("token.pickle", "rb") as token:
creds = pickle.load(token)
if not creds or not creds.valid:
if creds and creds.expired and creds.refresh_token:
creds.refresh(Request())
else:
flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)
with open("token.pickle", "wb") as token:
pickle.dump(creds, token)
return build("gmail", "v1", credentials=creds)
# Create Gmail filter
def create_filter():
service = gmail_service()
filter_config = {
"criteria": {
"from": "[email protected]" # Condition: sender
},
"action": {
"addLabelIds": ["Label_123456"], # Gmail internal label ID
"removeLabelIds": ["INBOX"] # Optional: remove the inbox
}
}
result = service.users().settings().filters().create(
userId="me",
body=filter_config
).execute()
print("Created successfully, Filter ID:", result["id"])
if __name__ == "__main__":
create_filter()
`Label_123456` is a Gmail label ID, obtainable via `users().labels().list()`. Filters created through the Gmail API are triggered automatically when new mail arrives; no extra "execution" step is needed.
Use the Gmail API to search for matching messages and then label or move them in batches.
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os
import pickle
SCOPES = ['https://www.googleapis.com/auth/gmail.modify']
def gmail_service():
creds = None
if os.path.exists("token.pickle"):
with open("token.pickle", "rb") as token:
creds = pickle.load(token)
if not creds or not creds.valid:
if creds and creds.expired and creds.refresh_token:
creds.refresh(Request())
else:
flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)
with open("token.pickle", "wb") as token:
pickle.dump(creds, token)
return build("gmail", "v1", credentials=creds)
#Apply Gmail rules to existing emails
def run_rule():
service = gmail_service()
# Search criteria, such as from:[email protected]
query = "from:[email protected]"
results = service.users().messages().list(userId="me", q=query).execute()
messages = results.get("messages", [])
if not messages:
print("No emails matching the criteria")
return
for msg in messages:
service.users().messages().modify(
userId="me",
id=msg["id"],
body={
"addLabelIds": ["Label_123456"], # Add a new label
"removeLabelIds": ["INBOX"] # Remove the inbox
}
).execute()
print(f"{len(messages)} messages processed")
if __name__ == "__main__":
run_rule()
The `q` parameter accepts Gmail search operators for matching and batch modification, e.g. `from:`, `subject:`, `has:attachment`.

The most common and recommended way to build an HTTP API server in Python is to use an efficient, feature-rich web framework. Three mainstream frameworks and their characteristics are introduced below:
FastAPI is a modern, fast (powered by Starlette and Pydantic) web framework for building APIs. It natively supports asynchronous operations (async/await) and automatically generates OpenAPI (Swagger UI) files.
# Installation: pip install fastapi uvicorn
from fastapi import FastAPI
app = FastAPI()
# Define a root route (GET request)
@app.get("/")
def read_root():
return {"Hello": "World"}
# Define a route with parameters
@app.get("/items/{item_id}")
def read_item(item_id: int, q: str = None):
return {"item_id": item_id, "q": q}
# Execution server (assuming the file name is main.py)
# Execute in terminal: uvicorn main:app --reload
Flask is a micro-framework that keeps the core simple and allows developers to freely choose components such as databases and verification libraries. It is suitable for small projects or applications that require a high degree of customization.
# Installation: pip install flask
from flask import Flask, jsonify, request
app = Flask(__name__)
#Define an API route
@app.route("/data", methods=['GET'])
def get_data():
return jsonify({"message": "This is Flask API data"})
#Define a POST route
@app.route("/submit", methods=['POST'])
def post_data():
data = request.get_json()
return jsonify({"received": data}), 201
#Execute server
# if __name__ == '__main__':
# app.run(debug=True)
DRF is a powerful tool for quickly building RESTful APIs based on the fully functional Django web framework. It provides all the functionality needed for large APIs such as serializers, model view sets, authentication and permission control.
Note: The DRF sample code is longer and requires the creation of a Django project and application first, so it is omitted here, but it is the first choice for large projects.
Although the idiomatic FastAPI approach is to decorate functions directly with `@app.get()`, larger applications (especially those using dependency injection or modular routing) can encapsulate route handlers in a class, a pattern often called the "APIRouter" pattern or class-based views.
This can be achieved with `fastapi.APIRouter`: we define the route group on a router and register the class's methods as handlers.
# Installation: pip install fastapi uvicorn
from fastapi import APIRouter, FastAPI

# Step 1: Create an APIRouter instance
router = APIRouter(
    prefix="/api/v1",  # Prefix for this route group
    tags=["items"]     # Used for documentation grouping
)

# Step 2: Define the class containing the routing logic
class ItemService:
    """Handles all item-related business logic and route handlers."""
    def __init__(self, db_connection: str):
        # Simulated injected dependency (a real app would pass a DB handle)
        self.db_connection = db_connection
        # Register bound methods as handlers. Note: decorating methods with
        # @router.get inside the class body does not work, because FastAPI
        # would treat `self` as a query parameter; add_api_route with a bound
        # method avoids that.
        router.add_api_route("/", self.read_root, methods=["GET"])
        router.add_api_route("/{item_id}", self.read_item, methods=["GET"])

    def read_root(self):
        return {"message": "Hello World from ItemService Class!", "db_status": self.db_connection}

    # The {item_id} parameter comes from the URL path
    def read_item(self, item_id: int, q: str | None = None):
        return {
            "item_id": item_id,
            "query": q,
            "status": "Processing completed"
        }

# Helper: simulate acquiring a dependency
def get_db():
    # In practice this would return a database connection
    return "Database Connected Successfully"

# Step 3: Instantiate the service (which registers its routes),
# then create the FastAPI application and mount the router
service = ItemService(get_db())

app = FastAPI()
app.include_router(router)
Save the above code as main.py and run the Uvicorn server:
uvicorn main:app --reload
Requests to the root path are handled by the ItemService.read_root method, and requests to /{item_id} by the ItemService.read_item method. The key points of this pattern are:
- Use APIRouter to define the routes.
- Register class methods as the route handlers.
- Mount the router on the application with app.include_router.
The advantage is that the business logic (inside ItemService) is separated from the route definitions, which improves the modularity and testability of the code.
If you are running a Flask application in a terminal and started it using `app.run()`, the easiest way is to use the keyboard interrupt signal.
In unit testing or scenarios where you need to automatically shut down the server after performing a specific operation, you need to send the stop command through the code.
Flask's built-in development server is provided by the Werkzeug library. You can send a specific HTTP request to the server to make it shut itself down, which requires defining a special route in your application. Note that this shutdown hook (werkzeug.server.shutdown) was removed in Werkzeug 2.1, so this technique only works with older Werkzeug versions.
from flask import Flask, request
app = Flask(__name__)

# Define a secret shutdown route
@app.route('/shutdown', methods=['POST'])
def shutdown_server():
    # Ensure that only authorized users can call this endpoint
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running on the Werkzeug development server')
    func()  # Call the shutdown function
    return 'The server is shutting down...'

# ... other routes ...
# Send a POST request to http://127.0.0.1:5000/shutdown from another program or script to shut down the server.
By default `app.run()` is blocking. To continue executing other logic in your main code while the server runs in the background, you need to use the threading or multiprocessing modules.
import threading
import time
from flask import Flask

app = Flask(__name__)
# ... your routes ...

def run_server():
    # Note: debug=True should not be used in production environments
    app.run(port=5000)

# Start the server in a new thread
thread = threading.Thread(target=run_server)
thread.start()
print("Flask server has been started in the background...")
# Let the server run for a while
time.sleep(5)
# (Shutting down) There is no built-in, simple shutdown in this threaded mode.
# You must use the /shutdown route above or terminate the thread directly
# (not recommended; it may not shut down cleanly).
# Example: wait for the server thread to end (after Ctrl+C)
# thread.join()
In a production environment you would not use `app.run()`; you would use a WSGI server such as Gunicorn or uWSGI. To stop these servers, you send operating-system signals to the master process (for example, `kill -TERM <pid>` performs a graceful Gunicorn shutdown) or use your service manager, such as `systemctl stop`.
When a Flask application is not running on the default Werkzeug development server (such as in a production environment using Gunicorn, uWSGI, or another server), the most reliable way to determine the current environment is to check the **WSGI environment variables** (i.e. request.environ).
Inside a Flask route function you can inspect the request.environ dictionary for specific keys set by the WSGI server (this applies to WSGI servers such as Gunicorn) to determine the server type.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/server_info')
def get_server_info():
    # Default: assume the Werkzeug development server
    server_name = "Werkzeug (Development Server)"
    # Check for common WSGI server identifiers
    # Check Gunicorn
    if 'gunicorn.version' in request.environ:
        server_name = f"Gunicorn (version: {request.environ.get('gunicorn.version')})"
    # Check uWSGI
    elif 'uwsgi.version' in request.environ:
        # uWSGI does not always expose a standard version key, but sets specific variables
        server_name = "uWSGI"
    # Check Waitress (a common production WSGI server)
    elif 'waitress.version' in request.environ:
        server_name = f"Waitress (version: {request.environ.get('waitress.version')})"
    # Check the generic identifier set by many other WSGI servers
    elif 'SERVER_SOFTWARE' in request.environ:
        # Many servers set this standard WSGI key
        server_name = request.environ.get('SERVER_SOFTWARE')
    # If werkzeug.server.shutdown exists, it is almost certainly Werkzeug
    elif 'werkzeug.server.shutdown' in request.environ:
        server_name = "Werkzeug (Development Server, Shutdown Available)"
    return jsonify({
        "current_server": server_name,
        "is_dev_server": server_name.startswith("Werkzeug")
    })

# Start the application
# if __name__ == '__main__':
#     app.run(debug=True)
In your server shutdown logic, you can use this check directly to decide whether to execute `shutdown_func`:
from flask import request, jsonify

@app.route('/shutdown', methods=['POST'])
def shutdown_server():
    shutdown_func = request.environ.get('werkzeug.server.shutdown')
    if shutdown_func is None:
        # Non-Werkzeug environment (usually production)
        return jsonify({
            "error": "This endpoint is for development environments only. Please use operating system commands to stop the server."
        }), 400
    # Werkzeug environment
    shutdown_func()
    return jsonify({"message": "The server is shutting down..."}), 200
Although `SERVER_SOFTWARE` is a standard WSGI environment variable, not every server sets it, and its format is not standardized. Checking a key unique to a specific server (such as `gunicorn.version`) is the most reliable approach.
In Flask, the main tool for processing query-string parameters in URLs (such as abc and def in /my_api?abc=3&def=xy) is the request object from the flask module. You can use the request.args dictionary to access these parameters.
request.args is a dictionary-like object that contains all the key-value pairs appearing after the question mark (?) in the URL.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/my_api', methods=['GET'])
def get_query_parameters():
    # Access the entire query-parameter dictionary
    all_params = request.args
    print(f"All parameters: {all_params}")
    # --- Ways to obtain specific parameters ---
    # 1. Use .get() (recommended: safe, and supports default values)
    # Get the 'abc' parameter; defaults to None if it does not exist
    abc_value = request.args.get('abc')
    # Get the 'def' parameter; defaults to 'default_value' if it does not exist
    def_value = request.args.get('def', 'default_value')
    # 2. Direct [] access (not recommended: raises KeyError if the parameter is missing)
    # try:
    #     required_param = request.args['required']
    # except KeyError:
    #     required_param = "Missing"
    # --- Handling multi-valued parameters ---
    # If the URL is /my_api?item=apple&item=banana
    item_list = request.args.getlist('item')  # Returns ['apple', 'banana']
    return jsonify({
        "status": "success",
        "abc": abc_value,
        "def": def_value,
        "item_list": item_list
    })

if __name__ == '__main__':
    # Test URL 1: http://127.0.0.1:5000/my_api?abc=3&def=xy&item=A&item=B
    # Test URL 2: http://127.0.0.1:5000/my_api
    app.run(debug=True)
| Method | Use | Behavior | Example |
|---|---|---|---|
| `request.args.get('key')` | Safely get a single parameter | Returns `None` if the parameter does not exist; no error is raised. | `request.args.get('user')` |
| `request.args.get('key', 'default')` | Get a parameter with a default value | Returns the default you specify if the parameter does not exist. | `request.args.get('page', 1)` |
| `request.args['key']` | Access a parameter directly | Raises `KeyError` if the parameter does not exist, causing a 500 server error. | `request.args['id']` |
| `request.args.getlist('key')` | Get multiple parameters with the same name | Returns a list containing all values. | `request.args.getlist('filter')` |
All values obtained from request.args are strings. If a parameter should be a number or a Boolean, you need to do the type conversion manually:
# Get the parameter (a string)
num_str = request.args.get('num', '0')
# Convert to an integer
try:
    num_int = int(num_str)
except ValueError:
    num_int = 0  # Handle conversion failure
# Convert to Boolean (note: any non-empty string is truthy in Python)
bool_str = request.args.get('is_admin', 'false').lower()
is_admin = bool_str == 'true'
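For comparison, the same query-string semantics can be reproduced outside Flask with the standard library's urllib.parse; this is a minimal sketch using a made-up example URL:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL, mirroring the test URL used above
url = "http://127.0.0.1:5000/my_api?abc=3&def=xy&item=A&item=B"

# parse_qs maps every key to a list of values (all strings)
params = parse_qs(urlparse(url).query)

abc_value = params.get("abc", [None])[0]        # like request.args.get('abc')
def_value = params.get("def", ["default_value"])[0]
item_list = params.get("item", [])              # like request.args.getlist('item')
print(abc_value, def_value, item_list)
```

As with request.args, every value comes back as a string, so the same manual type conversions apply.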
In Flask, the key to distinguishing a route is its endpoint, not the function name you define in your Python code. You can use loops and Python closures to dynamically define and register routes, assigning each a unique endpoint.
We will use a factory function to create each route handler and specify a unique `endpoint` name when registering it.
from flask import Flask, jsonify

app = Flask(__name__)

# Assume this is your data source
dynamic_apis = {
    "user_info": {"method": "GET", "handler": lambda: {"data": "User info fetched"}},
    "product_list": {"method": "GET", "handler": lambda: {"data": "Products listed"}},
    "submit_form": {"method": "POST", "handler": lambda: {"data": "Form submitted"}, "methods": ['POST']},
}
filename_base = "db_agent"

# --- Core logic: route factory function ---
def create_dynamic_route(funstr, handler_func):
    """
    Create and return a route handler function (a closure).
    The function captures the values of funstr and handler_func.
    """
    def dynamic_route_handler():
        # The function name is fixed, but funstr is available at run time
        print(f"Processing request: {funstr}")
        # Execute the actual business logic
        result = handler_func()
        return jsonify(result)
    # Return the created function
    return dynamic_route_handler

# --- Dynamic registration loop ---
for funstr, api_info in dynamic_apis.items():
    # 1. Build the URL path
    url_path = f"/{filename_base}/{funstr}"
    # 2. Define the endpoint name (funstr ensures uniqueness)
    endpoint_name = f"{filename_base}_{funstr}_api"
    # 3. Get the HTTP methods (default GET)
    methods = api_info.get("methods", ['GET'])
    # 4. Create the route handler function
    handler_func = create_dynamic_route(funstr, api_info["handler"])
    # 5. Register the route
    # Use the endpoint= parameter to assign a unique name instead of the function name
    app.add_url_rule(
        url_path,
        endpoint=endpoint_name,
        view_func=handler_func,
        methods=methods
    )
    print(f"Registered route: {url_path} (Endpoint: {endpoint_name})")

if __name__ == '__main__':
    # Test an endpoint, for example: http://127.0.0.1:5000/db_agent/user_info
    app.run(debug=True)
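The reason the factory function is needed can be seen in a standalone sketch: closures defined directly inside a loop all share the final value of the loop variable (late binding), while a factory captures each value separately.

```python
# Without a factory: every handler sees the last loop value
naive = []
for name in ["user_info", "product_list"]:
    naive.append(lambda: f"Processing request: {name}")

# With a factory: each handler captures its own value
def make_handler(name):
    def handler():
        return f"Processing request: {name}"
    return handler

fixed = [make_handler(name) for name in ["user_info", "product_list"]]

print([h() for h in naive])  # both report 'product_list'
print([h() for h in fixed])  # each reports its own name
```

This is why create_dynamic_route receives funstr as an argument rather than reading the loop variable directly.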
Gradio is an open source Python library designed to make machine learning models more accessible and interactive. With a few lines of code, you can quickly build a web interface for your machine learning model for others to test and use.
The usage of Gradio is very simple. The following is a basic example showing how to create a simple text input and output interface:
import gradio as gr

# Define the function that processes the input
def greet(name):
    return "Hello, " + name + "!"

# Create the Gradio interface
iface = gr.Interface(fn=greet, inputs="text", outputs="text")
# Launch the interface
iface.launch()
The code above will create a simple web interface that allows users to enter a name and then display a personalized greeting.
Gradio provides developers with a simple and powerful tool that can quickly transform machine learning models into web applications, allowing a wider range of users to easily test and experience the results of AI. Whether used for personal projects or teamwork, Gradio is ideal for promoting and displaying models.
If you want to redirect the output of the Gradio interface to a page on an Apache HTTP server (such as `/results`), you can use Python's requests module to send the Gradio results to the server. Here are the basic steps for integrating Gradio and Apache:
First, make sure your Apache HTTP server is running and has an endpoint configured that can handle the data (e.g. /results). This endpoint can be PHP, Python, or another backend language.
Here is sample code for writing a Gradio application and redirecting its output to an HTTP server:
import gradio as gr
import requests

# Handle Gradio input and forward it to the HTTP server
def process_and_redirect(input_data):
    # Process the input data
    result = f"Processed: {input_data}"
    # Send an HTTP POST request with the processed data to the Apache server
    url = 'http://your-apache-server-address/results'  # Replace with your server address
    payload = {'result': result}
    try:
        response = requests.post(url, data=payload)
        if response.status_code == 200:
            return f"Successfully redirected to {url}."
        else:
            return f"Redirect failed. Status code: {response.status_code}"
    except Exception as e:
        return f"An error occurred: {str(e)}"

# Create the Gradio interface
iface = gr.Interface(
    fn=process_and_redirect,
    inputs="text",
    outputs="text",
    title="Gradio redirect to HTTP server"
)
iface.launch()
Your Apache server should be configured with an endpoint to handle POST requests, such as a simple PHP script to receive Gradio data:
<?php
// Handle POST requests from Gradio
if ($_SERVER["REQUEST_METHOD"] == "POST") {
    $result = $_POST['result']; // Get the 'result' parameter from the POST request
    echo "Data received from Gradio: " . htmlspecialchars($result);
}
?>
This allows you to use the Gradio application to redirect output to an Apache HTTP server and process the data server-side. Such integration enables Gradio's interactive features to be more widely used in Web environments.
The easiest way to embed the Gradio interface in an Apache page is to use an iframe tag, setting its src attribute to the URL of the Gradio server.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Gradio embedded in an Apache page</title>
</head>
<body>
    <h1>Embedded Gradio interface</h1>
    <iframe src="http://your-gradio-server-address:7860" width="100%" height="800px" frameborder="0"></iframe>
</body>
</html>
If you want to directly access the Gradio interface through Apache's URL, you can configure a reverse proxy. This eliminates the need to display the original Gradio URL.
Enable the mod_proxy and mod_proxy_http modules. Execute the following commands:
sudo a2enmod proxy
sudo a2enmod proxy_http
Or un-comment the following lines in the Apache configuration file:
#LoadModule proxy_module modules/mod_proxy.so
#LoadModule proxy_http_module modules/mod_proxy_http.so
Add a proxy location (e.g. /gradio) to your Apache configuration:
<Location "/gradio">
    ProxyPass "http://localhost:7860/"
    ProxyPassReverse "http://localhost:7860/"
</Location>
sudo systemctl restart apache2
Once completed, you can use http://your-apache-server-address/gradio to display the Gradio interface content within Apache pages.
In Apache, make sure the proxy and proxy_http modules are enabled. If they are not enabled yet, execute the following commands:
sudo a2enmod proxy
sudo a2enmod proxy_http
In the Gradio application's Apache configuration file (e.g. /etc/apache2/sites-available/yourdomain.conf), add the following settings to configure ProxyPass and ProxyPassReverse:
<VirtualHost *:80>
    ServerName yourdomain.com
    # Point the root path to the Gradio server
    ProxyPass / http://localhost:7860/
    ProxyPassReverse / http://localhost:7860/
    # Ensure that static resources are proxied correctly
    ProxyPass /static/ http://localhost:7860/static/
    ProxyPassReverse /static/ http://localhost:7860/static/
</VirtualHost>
After completing the setup, restart Apache to apply the changes:
sudo systemctl restart apache2
This configuration explicitly handles requests for Gradio static resources (such as theme.css) and should resolve the missing-style issue. If Gradio's static file path is not /static/, adjust the path accordingly.
Gradio provides the gr.DataFrame component to display or edit tabular data, such as a Pandas DataFrame or other tabular formats. Here we explain how to use the Gradio DataFrame component in applications to create interactive data tables.
If Gradio is not installed yet, you can install it using the following command:
pip install gradio
Using gr.DataFrame to display table data: here's how to display a DataFrame with Gradio. Say we have a Pandas DataFrame that needs to be shown in a Gradio app:
import gradio as gr
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Occupation': ['Engineer', 'Designer', 'Doctor']}
df = pd.DataFrame(data)

# Define a function that returns the DataFrame
def show_dataframe():
    return df

# Create the Gradio interface (no inputs; the function takes no arguments)
interface = gr.Interface(fn=show_dataframe, inputs=[], outputs=gr.DataFrame(), title="Personnel Data Table")
interface.launch()
- import gradio as gr: imports the Gradio package.
- data: a dictionary of sample data with three columns: name, age, and occupation.
- show_dataframe: a function that returns the Pandas DataFrame for display.
- gr.DataFrame(): creates the DataFrame component used for table display in the Gradio interface.
- interface.launch(): starts the Gradio application.
Using gr.DataFrame for interactive editing: if you want to allow users to edit the table, you can set interactive=True on gr.DataFrame so users can modify the table data:
interface = gr.Interface(fn=show_dataframe, inputs=[], outputs=gr.DataFrame(interactive=True), title="Editable Personnel Data Table")
The launched application will display an editable table, and users can modify the data directly on the web page.
You can use the psutil module to find the PID of the Gradio process. First, make sure psutil is installed:
pip install psutil
Then you can use the following code to find the PIDs of Gradio-related processes.
import psutil

# Search for processes whose command line contains 'gradio'
for process in psutil.process_iter(['pid', 'name', 'cmdline']):
    # cmdline may be None for some system processes
    if process.info['cmdline'] and 'gradio' in ' '.join(process.info['cmdline']):
        print("Found Gradio process PID:", process.info['pid'])
Once the PID is found, you can use the terminate() or kill() method to end the process. For example:
for process in psutil.process_iter(['pid', 'name', 'cmdline']):
    if process.info['cmdline'] and 'gradio' in ' '.join(process.info['cmdline']):
        process.kill()  # Forcefully terminate the process
        print(f"Terminated Gradio process PID: {process.info['pid']}")
The kill() method terminates the process immediately, so make sure it has no important operations in progress. This sample code terminates all matching Gradio processes.
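The matching condition can be isolated into a small helper and exercised without psutil; note the guard for processes whose cmdline is unavailable (it can be None or an empty list). The command lines below are hypothetical, for illustration only:

```python
def is_gradio_process(cmdline):
    # cmdline may be None or empty for some system processes,
    # so guard before joining the list into a string
    return bool(cmdline) and 'gradio' in ' '.join(cmdline)

# Hypothetical command lines
print(is_gradio_process(['python', 'my_gradio_app.py']))  # True
print(is_gradio_process(['python', 'server.py']))         # False
print(is_gradio_process(None))                            # False
```

Note that this substring match also catches any process whose script path merely contains "gradio", so review the matches before killing anything.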
You can use the psutil module to find processes listening on a specific port. First, make sure psutil is installed:
pip install psutil
Then, you can use the following code to find the PID of the process listening on port 7860.
import psutil

# Specify the port number to search for
target_port = 7860
pid_to_kill = None

# Search for processes listening on the specified port
for conn in psutil.net_connections(kind='inet'):
    if conn.laddr.port == target_port and conn.status == psutil.CONN_LISTEN:
        pid_to_kill = conn.pid
        break

if pid_to_kill:
    print("Found the PID of the process listening on port 7860:", pid_to_kill)
else:
    print("No process found listening on port 7860")
Once the PID is found, you can use psutil.Process's kill() method to forcefully terminate the process:
if pid_to_kill:
    process = psutil.Process(pid_to_kill)
    process.kill()  # Forcefully terminate the process
    print(f"Terminated the process listening on port 7860: {pid_to_kill}")
else:
    print("Cannot terminate the process because no PID was found")
This code will forcefully terminate any program listening on the specified port. Please confirm that the port is indeed used by Gradio to avoid accidentally terminating other services.
Gradio is primarily designed for building fast, easy-to-use web UIs for displaying machine learning models. It itself already runs on a web server, but its main purpose is to provide a human-computer interaction interface rather than a traditional RESTful API endpoint.
If you want to have RESTful API functionality in your Gradio application, the most recommended method is to embed the Gradio application into a more powerful web framework, such as **FastAPI** or **Flask**. This allows you to use the main framework to define API endpoints while mounting the Gradio UI on a subpath.
FastAPI is fast, and it shares the Starlette foundation with Gradio, which makes the embedding process very smooth.
# Install FastAPI, Uvicorn (server), and Gradio
pip install fastapi uvicorn gradio
You can use Gradio's gr.mount_gradio_app() helper, which mounts the Gradio interface onto the FastAPI application at a sub-path.
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import gradio as gr

# 1. Define the Gradio application logic
def greet(name):
    return f"Hello, {name}!"

gr_interface = gr.Interface(fn=greet, inputs="text", outputs="text")

# 2. Create the FastAPI main application
app = FastAPI(
    title="Gradio + FastAPI Server",
    description="Provides both API endpoints and the Gradio UI"
)

# 3. (API) Define your RESTful API endpoints
@app.get("/api/v1/status")
def api_status():
    return JSONResponse(content={"status": "API is operational"})

@app.get("/api/v1/model_inference")
def api_inference(data: str):
    # Here you would call your model logic
    result = f"Processed data: {data.upper()}"
    return JSONResponse(content={"result": result})

# 4. (Gradio UI) Mount the Gradio application at the /gradio path
app = gr.mount_gradio_app(app, gr_interface, path="/gradio")

# Run the server (assuming the file is named main.py)
# Execute in the terminal: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
Gradio itself also provides an experimental feature that allows you to directly call the underlying functions of the Gradio interface through URL parameters without the need for an additional web framework. This is not a standard RESTful API, but allows for simple programmatic access.
However, embedding Gradio using FastAPI/Flask is a more standard and flexible approach that provides more standardized API design, documentation, and certification.
This is the most common crawler combination, suitable for beginners, and is used to parse static web pages.
pip install requests beautifulsoup4
Usage example:
import requests
from bs4 import BeautifulSoup
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.string)
A powerful crawler framework, suitable for large crawler projects, supporting multi-threaded and distributed crawlers.
pip install scrapy
Usage example:
scrapy startproject myproject
Create a crawler module in the project and run the crawling command.
Suitable for dynamic web crawling that needs to simulate user operations, such as processing JavaScript-rendered content.
pip install selenium
Usage example:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://example.com")
print(driver.title)
driver.quit()
Another tool for processing dynamic web pages, it has higher performance than Selenium and supports multiple browsers.
pip install playwright
playwright install
Usage example:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
Based on the Python version of Puppeteer, it is specially designed for crawling dynamic web pages.
pip install pyppeteer
Usage example:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto("https://example.com")
    print(await page.title())
    await browser.close()

asyncio.run(main())
Efficient tool for sending HTTP requests, supporting asynchronous operations.
pip install httpx
Usage example:
import asyncio
import httpx

async def fetch():
    async with httpx.AsyncClient() as client:
        response = await client.get("https://example.com")
        print(response.text)

asyncio.run(fetch())
To create a simple web scraper in Python, you can use the requests library to get the page content, and BeautifulSoup to parse the HTML.
Here's an example of a basic web scraper:
import requests
from bs4 import BeautifulSoup

# URL to scrape
url = "https://example.com"

# Send a GET request
response = requests.get(url)
response.raise_for_status()  # Check for errors

# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Extract specific data (e.g., all the headings)
headings = soup.find_all("h1")

# Print the headings
for heading in headings:
    print(heading.text)
Note: You may need to install the libraries with the following commands:
pip install requests
pip install beautifulsoup4
Suppose you have HTML like the following and want to extract the text inside a span with a specific class:
<span class="xxxclass">Text to be extracted</span>
<span>other text</span>
from bs4 import BeautifulSoup

# HTML content
html_content = """
<span class="xxxclass">Text to be extracted</span>
<span>other text</span>
"""

# Parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Find the specific tag and class
span_tag = soup.find('span', class_='xxxclass')

# Extract the text value
if span_tag:
    print(span_tag.text)  # Output: Text to be extracted
else:
    print("No matching tag found")

# Find all matching tags
span_tags = soup.find_all('span', class_='xxxclass')

# Extract the text of each tag
for tag in span_tags:
    print(tag.text)
1. **Match multiple attributes at once**:
span_tag = soup.find('span', {'class': 'xxxclass', 'id': 'specific-id'})
2. **Use regular expressions to match classes**:
import re
span_tag = soup.find('span', class_=re.compile(r'^xxx'))
Selenium is an open source tool primarily used to automate web browser operations. It supports multiple browsers, including Chrome, Firefox, Safari, etc., and can be used to test web applications or scrape web data.
The following are the steps to install Selenium and a simple Python usage example:
# Install Selenium
pip install selenium
#Sample code
from selenium import webdriver
from selenium.webdriver.common.by import By
# Start WebDriver
driver = webdriver.Chrome()
driver.get("https://www.example.com")
# Find elements and perform operations
element = driver.find_element(By.TAG_NAME, "h1")
print(element.text)
# Close browser
driver.quit()
Make sure Selenium and ChromeDriver are installed:
pip install selenium
Download and install the version of ChromeDriver that matches your Chrome version.
Chrome's user folder contains personal data such as bookmarks, history, cookies, etc. You can specify a specific folder to use to start the browser.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
#Specify ChromeDriver path
chromedriver_path = "/path/to/chromedriver"
#Specify user folder
user_data_dir = "/path/to/your/user/data"
# Set Chrome options
chrome_options = Options()
chrome_options.add_argument(f"--user-data-dir={user_data_dir}")
chrome_options.add_argument("--profile-directory=Default") # or other subfolder name
# Start browser
service = Service(chromedriver_path)
driver = webdriver.Chrome(service=service, options=chrome_options)
# Open a web page
driver.get("https://example.com")
# End program
driver.quit()
Make sure the user_data_dir path is a valid and writable directory. You can confirm the profile path currently in use by opening chrome://version in Chrome. Next, make sure selenium and psutil are installed:
pip install selenium psutil
The following code scans all running Chrome processes and extracts the user-data-dir parameter:
import psutil
import re

def get_all_user_data_dirs():
    user_data_dirs = set()
    for proc in psutil.process_iter(attrs=['pid', 'name', 'cmdline']):
        try:
            if proc.info['name'] and 'chrome' in proc.info['name'].lower():
                # cmdline may be None for some processes
                cmdline = ' '.join(proc.info['cmdline'] or [])
                match = re.search(r'--user-data-dir=([^\s]+)', cmdline)
                if match:
                    user_data_dirs.add(match.group(1))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return list(user_data_dirs)

print(get_all_user_data_dirs())
Once you have found the target user-data-dir, you can use it with Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
chrome_user_data_dir = "C:\\Users\\YourUser\\AppData\\Local\\Google\\Chrome\\User Data"
options = Options()
options.add_argument(f"--user-data-dir={chrome_user_data_dir}")
service = Service("chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.google.com")
In short: use psutil to inspect the running Chrome processes to get all user-data-dir values, then start the browser with a specific user_data_dir.
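The extraction step itself is just a regular expression over the process command line; here is a self-contained check using a fabricated command line. Note that the simple `[^\s]+` pattern does not handle quoted paths containing spaces:

```python
import re

# Fabricated Chrome command line, for illustration only
cmdline = r"chrome.exe --user-data-dir=C:\ChromeData\Work --profile-directory=Default"

# Same pattern as used in get_all_user_data_dirs above
match = re.search(r'--user-data-dir=([^\s]+)', cmdline)
user_data_dir = match.group(1) if match else None
print(user_data_dir)
```

For real Chrome installations the path often contains spaces (e.g. under "Application Support" on macOS) and is quoted on the command line, so a production version of this pattern would need to handle quotes as well.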
To get the username from the Chrome user data directory, you need to access a file in that directory, typically found at:
- Windows: C:\Users\[Username]\AppData\Local\Google\Chrome\User Data\
- macOS: /Users/[Username]/Library/Application Support/Google/Chrome/
- Linux: /home/[Username]/.config/google-chrome/
These folders contain many files; you can read the Local State file to obtain basic information about the user.
Chrome's Local State file contains basic user settings from which you can obtain user information.
import json
import os

def get_chrome_user_name(user_data_dir):
    local_state_path = os.path.join(user_data_dir, 'Local State')
    # Check whether the file exists
    if not os.path.exists(local_state_path):
        return "Local State file not found"
    with open(local_state_path, 'r', encoding='utf-8') as file:
        local_state = json.load(file)
    # Get user information from Local State
    user_name = local_state.get('profile', {}).get('name', 'Unknown User')
    return user_name

# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
print(get_chrome_user_name(user_data_dir))
When you parse the Local State file as JSON, you can obtain a lot of information from it, such as:
- profile: contains the user's configuration information, such as name and avatar.
- last_version: shows the last Chrome version used.
- is_logged_in: shows whether the user is logged in.
In most cases the username is located under profile and can be extracted with the method above.
By parsing the JSON data in the Local State file, you can easily get the Chrome user's name.
Chrome's Local State file stores a lot of basic information about users. If you want to extract all usernames from it, read the file and parse its JSON content. Here are the steps.
Reading and parsing the Local State file: you can read the Local State file and parse its JSON content to extract all user data.
import json
import os

def get_all_users(user_data_dir):
    local_state_path = os.path.join(user_data_dir, 'Local State')
    # Check whether the file exists
    if not os.path.exists(local_state_path):
        return "Local State file not found"
    with open(local_state_path, 'r', encoding='utf-8') as file:
        local_state = json.load(file)
    # Get all user data from Local State
    profiles = local_state.get('profile', {}).get('info_cache', {})
    # Get all user names
    user_names = [profile.get('name', 'Unknown User') for profile in profiles.values()]
    return user_names

# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
print(get_all_users(user_data_dir))
In Chrome's Local State file, user data is usually stored under profile → info_cache, a dictionary in which each user's profile is keyed by its configuration name.
Local State file structure: the following is an example of the structure in the file:
{
    "profile": {
        "info_cache": {
            "profile1": {
                "name": "User1",
                "avatar": "path/to/avatar1.jpg"
            },
            "profile2": {
                "name": "User2",
                "avatar": "path/to/avatar2.jpg"
            }
        }
    }
}
By parsing the Local State file, you can get the names of all users in Chrome. This makes it easy to list all accounts.
In Chrome's Local State file, each user's data is contained in info_cache. From there you can extract the username and the corresponding profile subdirectory.
Read and parse the Local State file: here is a method to obtain the names and corresponding subdirectories of all users from Chrome's Local State file.
import json
import os
def get_users_and_profiles(user_data_dir):
    local_state_path = os.path.join(user_data_dir, 'Local State')
    # Check if the file exists
    if not os.path.exists(local_state_path):
        return "Local State file not found"
    with open(local_state_path, 'r', encoding='utf-8') as file:
        local_state = json.load(file)
    # Get all user data from Local State
    profiles = local_state.get('profile', {}).get('info_cache', {})
    # Get all user names and their corresponding subdirectories
    user_info = {}
    for profile_key, profile_data in profiles.items():
        user_name = profile_data.get('name', 'Unknown User')
        # The info_cache key is already the profile directory name (e.g. "Profile 1")
        profile_sub_dir = os.path.join(user_data_dir, profile_key)
        user_info[user_name] = profile_sub_dir
    return user_info
# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
users_and_profiles = get_users_and_profiles(user_data_dir)
for user_name, profile_dir in users_and_profiles.items():
    print(f"User name: {user_name}, subdirectory: {profile_dir}")
In the Local State file, all user data is located under profile in info_cache. Each user has a corresponding profile key (e.g. Profile 1, Profile 2, and so on). Each profile entry contains the user's name (name) and other relevant information.
Local State file structure: the following is an example of the structure in the Local State file:
{
"profile": {
"info_cache": {
"Profile 1": {
"name": "User1",
"avatar": "path/to/avatar1.jpg"
},
"Profile 2": {
"name": "User2",
"avatar": "path/to/avatar2.jpg"
}
}
}
}
Each profile key begins with Profile, followed by a number (e.g. Profile 1, Profile 2, and so on). By parsing the Local State file, you can get the names of all users and their corresponding profile subdirectories. This lets you easily find the location of each user's configuration data.
To get the Gmail account from the Chrome user data directory, you need to extract the relevant information from the Chrome configuration data. This is usually accomplished by parsing Chrome's user data files, specifically Google-related account data.
Each Chrome user's data lives in its own folder, usually a Profile directory under the User Data directory. If the user is signed in to a Google Account, the relevant Gmail account information can be found in the profile data.
import os
import json
def get_gmail_from_profile(user_data_dir, profile_name):
    profile_dir = os.path.join(user_data_dir, profile_name)
    accounts_file = os.path.join(profile_dir, 'Web Data')
    # Check if the file exists
    if not os.path.exists(accounts_file):
        return "Web Data file not found"
    # Try to read the Web Data file
    try:
        with open(accounts_file, 'r', encoding='utf-8') as file:
            web_data = json.load(file)
        # Extract Gmail account from data
        for row in web_data.get('accounts', []):
            if 'gmail' in row.get('email', ''):
                return row.get('email')
        return "No Gmail account found"
    except Exception as e:
        return f"Error reading Web Data file: {e}"
# Example: Chrome user data directory path
user_data_dir = r'C:\Users\YourUserName\AppData\Local\Google\Chrome\User Data'
# Assume you are using Profile 1
profile_name = 'Profile 1'
print(get_gmail_from_profile(user_data_dir, profile_name))
Web Data files contain various login-related data in Chrome, including account names and other information. In this example we look for the email field and check whether it contains a Gmail address. Note that in current Chrome versions Web Data is actually an SQLite database, so the JSON structure shown here is a simplified illustration.
The following is an example of a partial structure of a Web Data file:
{
"accounts": [
{
"email": "[email protected]",
"password": "encrypted_password_1"
},
{
"email": "[email protected]",
"password": "encrypted_password_2"
}
]
}
The email field can usually be extracted. By parsing the Web Data file in the Chrome user profile, you can retrieve a user's Gmail account: if the user is signed in to a Gmail account, the corresponding email will appear in the data.
Common web automation tools include Selenium and Playwright. These tools can simulate clicking, text input, and other user actions.
Use a tool like BeautifulSoup or Playwright to collect all the hyperlinks on the website (<a href> elements) to create a list of pages to browse.
Open a browser with Selenium or Playwright and simulate user actions such as clicking buttons, scrolling, and even triggering dynamic content.
Modern websites often contain a lot of dynamically generated content, and you can use Playwright or Selenium to execute JavaScript to ensure the page loads correctly.
Based on the crawled links, recursively visit all pages in the website and record the pages that have been visited to avoid duplication.
When browsing each page, simulate common user behaviors, such as filling out forms, submitting information, and recording the results of the operations.
from selenium import webdriver
from selenium.webdriver.common.by import By
#Initialize the browser
driver = webdriver.Chrome()
# Start browsing the website
driver.get("https://example.com")
# Grab all links
links = driver.find_elements(By.TAG_NAME, "a")
for link in links:
    href = link.get_attribute("href")
    print(f"Link found: {href}")
# Simulate click
if links:
    links[0].click()
# Close browser
driver.quit()
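The "record visited pages to avoid duplication" step above boils down to a breadth-first traversal with a visited set. A minimal sketch using only the standard library, with the page fetcher injected so the same logic works with Selenium, urllib, or anything else (fetch_links is a placeholder you supply):

```python
from collections import deque

def crawl(start_url, fetch_links, max_pages=100):
    """Breadth-first site traversal. fetch_links(url) must return the list
    of hyperlinks found on that page. Returns URLs in visit order."""
    visited = set()
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # Already processed; skip duplicates
        visited.add(url)
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                queue.append(link)
    return order
```

With Selenium, fetch_links could navigate with driver.get(url) and then collect `a.get_attribute("href")` for each anchor, as in the snippet above.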
When browsing and simulating websites, please abide by the terms of use of the relevant website to avoid causing excessive server load or violating the law.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#Initialize WebDriver
driver = webdriver.Chrome()
#Open the target website
driver.get("https://example.com")
# Wait for the button to appear and click
wait = WebDriverWait(driver, 10)
button = wait.until(EC.element_to_be_clickable((By.ID, "button_id")))
button.click()
# Wait for other elements to load
text_field = wait.until(EC.visibility_of_element_located((By.NAME, "text_field_name")))
text_field.send_keys("Test data")
# Close browser
driver.quit()
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
#Initialize WebDriver
driver = webdriver.Chrome()
#Open the target website
driver.get("https://example.com")
time.sleep(3) # Pause for 3 seconds
# Simulate button click
button = driver.find_element(By.ID, "button_id")
button.click()
time.sleep(3)
# Close browser
driver.quit()
**Note**: `time.sleep` is not recommended for formal testing, only for debugging purposes.
from selenium import webdriver
from selenium.webdriver.common.by import By
#Initialize WebDriver
driver = webdriver.Chrome()
#Open the target website
driver.get("https://example.com")
# Continue after manual confirmation
input("Press Enter to continue to the next step...")
button = driver.find_element(By.ID, "button_id")
button.click()
# Continue to perform other tests
input("Press Enter to continue to the next step...")
driver.quit()
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By

class TestExample(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome()

    def test_step_by_step(self):
        driver = self.driver
        driver.get("https://example.com")
        input("Check the page, press Enter to continue...") # Manual breakpoint
        button = driver.find_element(By.ID, "button_id")
        button.click()
        input("Check the operation results, press Enter to continue...") # Manual breakpoint

    def tearDown(self):
        self.driver.quit()

if __name__ == "__main__":
    unittest.main()
from selenium import webdriver
from selenium.webdriver.common.by import By
# Initialize Selenium
driver = webdriver.Chrome()
driver.get("Your target web page URL")
# Search for text starting with a specific line
target_text = "Target starting text"
rows = driver.find_elements(By.CSS_SELECTOR, "table tr")
# Save results
result_data = []
for row in rows:
    cells = row.find_elements(By.TAG_NAME, "td")
    if cells and cells[0].text.startswith(target_text):
        # Get the rest of the information in the same row
        result_data.append([cell.text for cell in cells])
driver.quit()
# Convert the result to HTML
html_output = "<h2>Search results</h2>\n"
for i, row_data in enumerate(result_data, start=1):
    html_output += f"<h3>Line {i}</h3>\n<ul>\n"
    for data in row_data:
        html_output += f" <li>{data}</li>\n"
    html_output += "</ul>\n"
# show results
print(html_output)
Search results
Line 1
- Target starting text 1
- Other information1
- Other information 2
Line 2
- Target starting text 2
- Other information1
- Other information 2
The startswith method is used to determine whether the text meets the condition. Results are organized with <h2> and <h3> headings and an unordered <ul> list. No <head> or <body> tags are generated, only the necessary HTML, making it easy to embed in other pages. In Selenium, find_elements(By.XPATH, xpath) is used to find all elements that match an XPath selector and returns a list. It differs from find_element(By.XPATH, xpath), which returns only the first matching element.
from selenium import webdriver
from selenium.webdriver.common.by import By
# Start browser
driver = webdriver.Chrome()
# Open web page
driver.get("https://example.com")
# Use XPath to find all elements that match the criteria
elements = driver.find_elements(By.XPATH, "//div[@class='example-class']")
# Traverse the found elements and output the content
for element in elements:
    print(element.text)
# Close browser
driver.quit()
| XPath expression | Description |
|---|---|
| //tagname | Selects all elements with the specified tag, e.g. //div selects all div elements |
| //tagname[@attribute='value'] | Selects elements by attribute value, e.g. //input[@type='text'] |
| //tagname[contains(@attribute, 'value')] | Attribute contains certain text, e.g. //div[contains(@class, 'header')] |
| //tagname[text()='text'] | Selects elements whose text matches exactly, e.g. //button[text()='Submit'] |
| //tagname[contains(text(), 'text')] | Selects elements containing specific text, e.g. //p[contains(text(), 'Welcome')] |
| //*[@id='some-id'] | Selects an element with a specific ID |
| (//tagname)[index] | Selects the index-th matching element, e.g. (//div)[1] selects the first div |
Assume the following HTML structure:
<div class="product">Product A</div>
<div class="product">Product B</div>
<div class="product">Product C</div>
We can use the following Selenium code to get all div elements with the product class:
elements = driver.find_elements(By.XPATH, "//div[@class='product']")
for element in elements:
    print(element.text)
find_elements() returns a list even if only one element is found. Use find_element() when you need a single element; otherwise use find_elements(). find_elements(By.XPATH, xpath) is a powerful and flexible search method in Selenium that can locate multiple elements on a web page and is suitable for crawlers and automated testing.
In Selenium, By.LINK_TEXT and By.PARTIAL_LINK_TEXT are used to find elements based on the text content of a hyperlink:
- By.LINK_TEXT: exact match on the complete link text.
- By.PARTIAL_LINK_TEXT: fuzzy match on part of the link text.
from selenium import webdriver
from selenium.webdriver.common.by import By
# Start browser
driver = webdriver.Chrome()
# Open web page
driver.get("https://example.com")
# Use LINK_TEXT to find hyperlinks
element = driver.find_element(By.LINK_TEXT, "Full link text")
print(element.get_attribute("href"))
# Use PARTIAL_LINK_TEXT to find hyperlinks
element_partial = driver.find_element(By.PARTIAL_LINK_TEXT, "Partial text")
print(element_partial.get_attribute("href"))
# Close browser
driver.quit()
<a href="https://example.com/page1">Full link text</a>
<a href="https://example.com/page2">Click here to learn more</a>
Suppose you want to click <a>Full link text</a>; you can use:
driver.find_element(By.LINK_TEXT, "Full link text").click()
If the hyperlink is "Click here to learn more", but we only know "Click here", you can use:
driver.find_element(By.PARTIAL_LINK_TEXT, "Click here").click()
Use By.PARTIAL_LINK_TEXT for partial matches and By.LINK_TEXT for an exact match; both apply only to <a> hyperlink elements. find_elements() returns a list, while find_element() returns only the first matching element. By.LINK_TEXT is useful for finding complete links precisely, while By.PARTIAL_LINK_TEXT is more flexible and can match part of the link text.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
from selenium.common.exceptions import TimeoutException

try:
    elems = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".item"))
    )
    print("Element found:", len(elems))
except TimeoutException:
    print("Element not found within 10 seconds")
elements = driver.find_elements(By.CSS_SELECTOR, ".item")
if not elements:
    print("Element not found")
else:
    print("Number of elements found:", len(elements))
# Switch to iframe and search again
iframe = driver.find_element(By.TAG_NAME, "iframe")
driver.switch_to.frame(iframe)
elements = driver.find_elements(By.CSS_SELECTOR, ".item")
print("Found:", len(elements))
shadow_host = driver.find_element(By.CSS_SELECTOR, "#shadowHost")
shadow_root = driver.execute_script("return arguments[0].shadowRoot", shadow_host)
elem = shadow_root.find_element(By.CSS_SELECTOR, ".target")
def safe_find_elements(driver, by, selector, timeout=10):
    try:
        return WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((by, selector))
        )
    except Exception:
        return [] # Return an empty list if nothing is found
elements = safe_find_elements(driver, By.CSS_SELECTOR, ".item")
if elements:
    print("Found:", len(elements))
else:
    print("Element not found")
from selenium import webdriver
from selenium.webdriver.common.alert import Alert
driver = webdriver.Chrome()
driver.get("URL")
alert = Alert(driver)
print(alert.text) # Get the alert message
alert.accept() # Click OK
driver.quit()
alert = Alert(driver)
print(alert.text)
alert.accept() # Click OK
# alert.dismiss() # Click to cancel
alert = Alert(driver)
print(alert.text)
alert.send_keys("Test input") #Input content
alert.accept() # Click OK
Use driver.switch_to.alert to switch to the alert box, and handle exceptions with try-except to avoid test failures.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 10).until(EC.alert_is_present())
In Selenium, you can useWebDriverWaitto check whether the warning box exists to avoid errors caused by the warning box not appearing.
from selenium import webdriver
from selenium.webdriver.common.alert import Alert
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("URL")
try:
    # Wait 5 seconds to check if there is an alert box
    WebDriverWait(driver, 5).until(EC.alert_is_present())
    alert = Alert(driver)
    print("Alert box content:", alert.text)
    alert.accept() # Click OK
except Exception:
    print("No alert box found")
driver.quit()
WebDriverWait makes sure the alert box has appeared before switching, avoiding NoAlertPresentException, and try-except prevents the test from failing when there is no alert box. Note that WebDriverWait(driver, 5).until(EC.alert_is_present()) can only detect alert boxes generated by JavaScript; it cannot detect Chrome system-level pop-ups such as "Save Passwords" or "Website Notifications".
For those, use ChromeOptions to disable system-level pop-ups, or pyautogui or the Win32 API to simulate keyboard or mouse operations. Starting Chrome with ChromeOptions lets you turn off notifications and other system-wide pop-ups.
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--disable-notifications") # Disable notifications
options.add_argument("--disable-save-password-bubble") # Disable saving password prompts
driver = webdriver.Chrome(options=options)
driver.get("URL")
If the alert box has already appeared, you can use pyautogui to simulate key presses, such as Enter or Esc, to close it.
import pyautogui
import time
time.sleep(3) # Wait for the alert box to appear
pyautogui.press("enter") # Simulate pressing the Enter key
If you are in a Windows environment, you can use pygetwindow and pywinauto to detect and close system-level pop-ups.
import pygetwindow as gw
import pywinauto
windows = gw.getWindowsWithTitle("Google Chrome") # Get the Chrome window
for win in windows:
if "system alert" in win.title.lower(): # Check whether the title contains "system alert"
win.close() # Close the window
In summary: use ChromeOptions to prevent system pop-ups, and pyautogui or the Win32 API to simulate keyboard or window operations.
Shadow DOM is part of Web Components and is used to encapsulate HTML, CSS and JavaScript so that a component's internal implementation details do not interfere with the outside. Simply put, it is a DOM structure with an isolated scope.
---
<custom-element>
#shadow-root (open)
<button>Click me</button>
</custom-element>
The button actually exists inside the shadow DOM and cannot be found with the traditional document.querySelector("button").
If the shadow root was created with mode: 'open', it can be accessed externally via element.shadowRoot; if created with mode: 'closed', element.shadowRoot returns null and the contents are not accessible. Example of creating an open shadow root:
const shadowHost = document.querySelector('#host');
const shadowRoot = shadowHost.attachShadow({ mode: 'open' });
shadowRoot.innerHTML = '<p>Hello from Shadow DOM</p>';
---
// inside shadow DOM
<div><slot name="title">Default title</slot></div>
// external use
<custom-element>
<span slot="title">Custom title</span>
</custom-element>
---
js = """
const host = document.querySelector('custom-element');
const shadow = host.shadowRoot;
return shadow.querySelector('button');
"""
button = driver.execute_script(js)
button.click()
---
After obtaining the shadowRoot, you can search within it.
Tor Browser is essentially Firefox ESR plus Tor network settings and enhanced privacy defaults. Selenium cannot control the Tor Browser GUI directly, but it can achieve the same anonymity effect as Tor Browser by driving Firefox through a Tor SOCKS proxy.
First make sure that Tor is running on this machine. The default SOCKS proxy is:
127.0.0.1:9050
If using Tor Browser, after startup it is usually:
127.0.0.1:9150
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
options = Options()
options.set_preference("network.proxy.type", 1)
options.set_preference("network.proxy.socks", "127.0.0.1")
options.set_preference("network.proxy.socks_port", 9150)
options.set_preference("network.proxy.socks_remote_dns", True)
service = Service("geckodriver.exe")
driver = webdriver.Firefox(service=service, options=options)
driver.get("https://check.torproject.org")
Open the following URL; if the page reports that you are using Tor, the setup succeeded:
https://check.torproject.org
If you want to use Firefox built into Tor Browser:
options.binary_location = r"C:\Tor Browser\Browser\firefox.exe"
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
options = Options()
options.binary_location = r"C:\Tor Browser\Browser\firefox.exe"
options.set_preference("network.proxy.type", 1)
options.set_preference("network.proxy.socks", "127.0.0.1")
options.set_preference("network.proxy.socks_port", 9150)
options.set_preference("network.proxy.socks_remote_dns", True)
service = Service("geckodriver.exe")
driver = webdriver.Firefox(service=service, options=options)
driver.get("https://check.torproject.org")
To extract emails from a Thunderbird account, you can use Python's IMAP libraries, such as the standard imaplib or the third-party imapclient. First, make sure the mail account configured in Thunderbird uses IMAP and that the server allows external applications to connect.
Use pip to install related libraries:
pip install imapclient pyzmail36
Use the following code to connect to the email server and extract important emails:
import imapclient
from pyzmail import PyzMessage
# Set up mail server and login information
IMAP_SERVER = 'imap.example.com' # Replace with your IMAP server address
EMAIL = '[email protected]'
PASSWORD = 'your_password'
# Connect to IMAP server
with imapclient.IMAPClient(IMAP_SERVER) as client:
    client.login(EMAIL, PASSWORD)
    client.select_folder('INBOX')
    # Search for messages marked as important
    messages = client.search(['FLAGGED'])
    for uid in messages:
        raw_message = client.fetch([uid], ['BODY[]'])[uid][b'BODY[]']
        message = PyzMessage.factory(raw_message)
        # Display email information
        print(f"Subject: {message.get_subject()}")
        print(f"From: {message.get_address('from')}")
        print(f"Date: {message.get_decoded_header('date')}")
Execute the above code to see if you can successfully extract emails marked as important. If you have any connection issues, you can check your server settings or perform more detailed debugging at the IMAP protocol layer.
In this way, you can successfully extract important emails from Thunderbird using Python.
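If pyzmail36 is hard to install on newer Python versions, the standard library's email module can parse the same raw bytes returned by the IMAP fetch. A sketch of the parsing step only (the raw bytes would come from client.fetch as above):

```python
from email import message_from_bytes
from email.header import decode_header, make_header

def summarize_message(raw_bytes):
    """Parse a raw RFC 822 message and return its decoded subject,
    sender and date as a dict."""
    msg = message_from_bytes(raw_bytes)
    return {
        # decode_header handles MIME-encoded (e.g. UTF-8) subject lines
        'subject': str(make_header(decode_header(msg.get('Subject', '')))),
        'from': msg.get('From', ''),
        'date': msg.get('Date', ''),
    }
```

Usage: `info = summarize_message(raw_message); print(info['subject'])`.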
Install PyAutoGUI using the following instructions:
pip install pyautogui
PyAutoGUI is an automation tool that allows simulation of mouse and keyboard operations, suitable for GUI automated testing or repetitive work.
Control mouse position and actions:
import pyautogui
# Get screen resolution
screen_width, screen_height = pyautogui.size()
print(f"Screen resolution: {screen_width}x{screen_height}")
# Move the mouse to the specified coordinates
pyautogui.moveTo(100, 100, duration=1)
# Move the mouse from the current position
pyautogui.move(50, 50, duration=1)
# Simulate mouse click
pyautogui.click(200, 200)
# Simulate mouse right click
pyautogui.rightClick()
# Simulate drag operation
pyautogui.dragTo(400, 400, duration=1)
Simulate keyboard key input:
import pyautogui
# Enter text
pyautogui.write("Hello, PyAutoGUI!", interval=0.1)
# Simulate pressing a specific key
pyautogui.press("enter")
# Press multiple keys at the same time
pyautogui.hotkey("ctrl", "c") # Copy text
Take a screenshot or find a specific image:
import pyautogui
# Capture the entire screen
screenshot = pyautogui.screenshot()
screenshot.save("screenshot.png")
# Find the image on the screen
location = pyautogui.locateOnScreen("image.png")
if location:
    print(f"Image location: {location}")
else:
    print("Image not found")
To prevent the program from executing indefinitely, you can use pyautogui.FAILSAFE:
pyautogui.FAILSAFE = True # Default is True
# Move the mouse to the upper left corner of the screen (0, 0) to stop the program immediately
If Surfshark VPN does not provide a command-line tool (such as surfshark-cli), you need to use GUI automation tools (such as pyautogui) to simulate manual operation.
Install the pyautogui module via pip install pyautogui, then use pyautogui to automate clicking the buttons in Surfshark's graphical interface to connect or disconnect the VPN.
The following example assumes that the Surfshark VPN buttons have fixed positions and uses pyautogui to do it:
import pyautogui
import time
def connect_vpn():
    # Make sure Surfshark is open
    print("Trying to connect to VPN...")
    # Simulate clicking the "Connect" button; adjust the coordinates to the actual location
    pyautogui.click(x=500, y=300) # Replace with the position of the "Connect" button
    time.sleep(5) # Wait for the connection to complete
    print("VPN connected")

def disconnect_vpn():
    # Make sure Surfshark is open
    print("Trying to disconnect from VPN...")
    # Simulate clicking the "Disconnect" button; adjust the coordinates to the actual position
    pyautogui.click(x=500, y=350) # Replace with the position of the "Disconnect" button
    time.sleep(5) # Wait for the disconnection to complete
    print("VPN disconnected")
# test
connect_vpn()
disconnect_vpn()
Adjust the x and y coordinates to your screen layout. You can open the application with subprocess, e.g. subprocess.run("start surfshark.exe", shell=True). If the coordinate method is unstable, you can use image recognition (such as pyautogui.locateOnScreen()) to find the button position and increase flexibility.
Kivy is an open source Python framework for rapid development of multi-touch applications. It is designed with cross-platform support in mind, allowing the same code to run on Windows, macOS, Linux, iOS, and Android. Kivy is particularly suitable for building GUI applications for mobile phones, tablets, and desktop devices, and it has good multi-touch support.
Kivy applications are composed of multiple widgets, which can be laid out through code or Kivy's proprietary language KV files. Below is a simple application example that displays a button that changes color when clicked.
from kivy.app import App
from kivy.uix.button import Button
class MyApp(App):
    def build(self):
        button = Button(text='Hello, Kivy!',
                        background_color=(0, 1, 0, 1)) # Green button
        # Bind a handler so the color changes when the button is clicked
        button.bind(on_press=self.change_color)
        return button

    def change_color(self, button):
        button.background_color = (1, 0, 0, 1) # Turn red when clicked

if __name__ == '__main__':
    MyApp().run()
Kivy can be installed via pip:
pip install kivy
Kivy is suitable for a variety of application scenarios, including but not limited to:
Kivy is a Python framework for building cross-platform applications, but it usually runs on a local device. To display your Kivy application remotely, you can consider the following options:
You can use VNC (Virtual Network Computing) or other remote desktop tools (such as RDP, TeamViewer, etc.) to remotely control and display Kivy applications.
For those using Linux, you can use X11 forwarding to display a graphical interface remotely:
ssh -X username@remote_host
You can use Flask or other web frameworks to expose parts of your Kivy application to remote users and display them using a web browser:
If you want to run your Kivy app in a containerized environment, you can set it up using Docker and VNC:
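A minimal sketch of such a Dockerfile, assuming the application entry point is main.py and using Xvfb plus x11vnc for the display (package names, ports, and versions are assumptions to adapt):

```dockerfile
FROM python:3.11-slim

# Virtual X server and VNC server for headless GUI display
RUN apt-get update && apt-get install -y --no-install-recommends \
        xvfb x11vnc && rm -rf /var/lib/apt/lists/*

RUN pip install kivy

COPY main.py /app/main.py
WORKDIR /app

ENV DISPLAY=:0
EXPOSE 5900

# Start a virtual display, the VNC server, then the Kivy app
CMD Xvfb :0 -screen 0 1280x720x24 & x11vnc -display :0 -forever -nopw & python main.py
```

Connecting a VNC viewer to port 5900 on the container host should then show the application window.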
This error indicates that the tickmarker module from Kivy's Garden is not installed. To fix this issue, follow the steps below:
pip install kivy-garden
garden install tickmarker
Once you've done these steps, try running your Kivy application again.
Rio UI is a user interface framework with Python at its core, designed to allow developers to write interactive applications in a simple and intuitive way. It is based on Python programming logic and presents the UI through modern front-end technology.
pip install rio-ui
import rio
app = rio.App()
@app.page("/")
def index():
    return rio.Text("Hello, Rio UI with Python!")

if __name__ == "__main__":
    app.run()
email: [email protected]