Python Usage and High Performance Tips Summary


1. Confusing operations


This section compares some of Python's more confusing operations.

1.1 Random sampling with and without replacement

import random
random.choices(seq, k=1) # list of length k, sampled with replacement
random.sample(seq, k) # list of length k, sampled without replacement
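
The difference shows up as soon as k is larger than the population. The following is a minimal sketch; the names deck, with_replacement and without_replacement are illustrative, and the seed is fixed only to make the run repeatable.

```python
import random

random.seed(0)  # fixed seed, only so that repeated runs give the same draw
deck = ['a', 'b', 'c']

# With replacement: k may exceed len(deck), and elements may repeat.
with_replacement = random.choices(deck, k=5)

# Without replacement: every position is drawn at most once.
without_replacement = random.sample(deck, k=3)

print(len(with_replacement))        # 5
print(sorted(without_replacement))  # ['a', 'b', 'c']

# Asking for more elements than the population holds raises ValueError.
try:
    random.sample(deck, k=4)
    sample_failed = False
except ValueError:
    sample_failed = True
```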

1.2 Parameters of the lambda function

func = lambda y: x + y # x is looked up each time the function is called
func = lambda y, x=x: x + y # x is bound once, when the function is defined
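
The classic place where this matters is a lambda created in a loop. A small sketch (the names late and early are illustrative):

```python
# Every lambda in `late` looks up x when it is called, after the loop has ended.
late = [lambda y: x + y for x in range(3)]

# Every lambda in `early` stores the value x had when it was created.
early = [lambda y, x=x: x + y for x in range(3)]

late_results = [f(10) for f in late]
early_results = [f(10) for f in early]

print(late_results)   # [12, 12, 12]
print(early_results)  # [10, 11, 12]
```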

1.3 copy and deepcopy

import copy
y = copy.copy(x) # only the topmost level is copied
y = copy.deepcopy(x) # Copy all nested parts

Copying is easily confused with variable aliasing.

a = [1, 2, [3, 4]]

# Alias.
b_alias = a  
assert b_alias == a and b_alias is a

# Shallow copy.
b_shallow_copy = a[:]  
assert b_shallow_copy == a and b_shallow_copy is not a and b_shallow_copy[2] is a[2]

# Deep copy.
import copy
b_deep_copy = copy.deepcopy(a)  
assert b_deep_copy == a and b_deep_copy is not a and b_deep_copy[2] is not a[2]
Changes made through the alias affect the original object, because both names refer to the same list. A shallow copy is a new list whose elements are references to the same objects as the elements of the original, so mutating a nested list is visible through both. A deep copy is made recursively, so changes to it never affect the original.
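
To see the three behaviours side by side, one can mutate the original and then inspect each variant. A minimal sketch with illustrative names:

```python
import copy

a = [1, 2, [3, 4]]
alias = a
shallow = copy.copy(a)
deep = copy.deepcopy(a)

a[0] = 100      # replaces a top-level element: only the alias sees it
a[2].append(5)  # mutates the nested list: the alias and the shallow copy see it

print(alias)    # [100, 2, [3, 4, 5]]
print(shallow)  # [1, 2, [3, 4, 5]]
print(deep)     # [1, 2, [3, 4]]
```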

1.4 == and is

x == y # whether the two references have the same value
x is y # whether the two references point to the same object
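
A short sketch of the distinction, using two equal but separate lists:

```python
x = [1, 2, 3]
y = [1, 2, 3]  # a different list object with the same contents
z = x          # another name for the same object

print(x == y, x is y)  # True False
print(x == z, x is z)  # True True
```

For this reason `is` is mainly used for singletons such as None, while `==` is used to compare values.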

1.5 Determining the type

type(a) == int # ignores inheritance: False when a is an instance of a subclass of int
isinstance(a, int) # respects inheritance: True for int and for its subclasses
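
Since bool is a subclass of int, it makes a convenient illustration (the variable name flag is illustrative):

```python
flag = True  # bool is a subclass of int

print(type(flag) == int)              # False
print(isinstance(flag, int))          # True
print(isinstance(3.0, (int, float)))  # True: a tuple of types is also accepted
```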

1.6 String searching

str.find(sub, start=None, end=None); str.rfind(...)     # Return -1 if not found
str.index(sub, start=None, end=None); str.rindex(...)   # Raise a ValueError exception if not found
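
A small sketch of both behaviours on an illustrative string:

```python
text = 'hello world'

print(text.find('o'))    # 4
print(text.rfind('o'))   # 7
print(text.find('xyz'))  # -1

try:
    text.index('xyz')
    index_raised = False
except ValueError:
    index_raised = True
print(index_raised)      # True
```

Note that -1 is truthy, so the result of find should be compared with -1 explicitly rather than used directly as a condition.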

1.7 List backward indexing

This is just a matter of habit. Forward indexing starts from 0; if you want backward indexing to start from 0 as well, use ~ (~i equals -i - 1).

print(a[-1], a[-2], a[-3])
print(a[~0], a[~1], a[~2])
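
One place where ~ reads naturally is comparing a sequence with its mirror image. The helper is_palindrome below is a hypothetical example, not something from the standard library.

```python
a = [10, 20, 30, 40]

# ~i is -i - 1, so a[~i] is the i-th element counted from the end.
mirror_ok = all(a[~i] == a[-i - 1] for i in range(len(a)))

def is_palindrome(seq):
    """Compare the i-th element from the front with the i-th from the back."""
    return all(seq[i] == seq[~i] for i in range(len(seq) // 2))

print(is_palindrome('level'))  # True
print(is_palindrome('python')) # False
```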

2. C/C++ User's Guide


Many Python users migrated from C/C++, and there are some differences in syntax and code style between the two languages, which are briefly described in this section.

2.1 Very Large Numbers and Very Small Numbers

Whereas the C/C++ convention is to define a very large constant, Python has inf and -inf:

a = float('inf')
b = float('-inf')

2.2 Boolean values

While the C/C++ convention is to use non-zero and 0 values for true and false, Python recommends using True and False directly for Boolean values.

a = True
b = False

2.3 Determining Null

The C/C++ convention for checking null pointers is if (a) and if (!a); the Python equivalent for None is

if x is None:
    pass

If you use if not x, other falsy objects (such as 0 and empty strings, lists, tuples, and dictionaries) are treated the same way as None.
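
The function describe below is a hypothetical helper that separates the two checks:

```python
def describe(x):
    """Distinguish a missing value from an empty or zero one."""
    if x is None:
        return 'missing'
    if not x:
        return 'empty or zero'
    return 'has a value'

print(describe(None))  # missing
print(describe([]))    # empty or zero
print(describe(0))     # empty or zero
print(describe('a'))   # has a value
```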

2.4 Swapping values

The C/C++ convention is to swap values through a temporary variable. With Python's tuple packing and unpacking, you can do this in one step.

a, b = b, a

2.5 Comparing

The C/C++ convention is to join two conditions with &&. With Python, a chained comparison does this in one step.

if 0 < a < 5:
    pass

2.6 Set and Get Class Members

The C/C++ convention is to make class members private and access their values through a series of Set and Get functions. While Python can define the corresponding getter, setter, and deleter via @property, @name.setter, and @name.deleter, we should avoid unnecessary abstraction: access through a property can be 4 - 5 times slower than direct attribute access.
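
A property is still worthwhile when the setter has real work to do, such as validation. The class Circle below is a hypothetical sketch of that case; when no such logic is needed, a plain attribute is simpler.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius  # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        # The validation is the only reason this is a property at all.
        if value < 0:
            raise ValueError('radius must be non-negative')
        self._radius = value

c = Circle(2)
c.radius = 3     # looks like plain attribute access to the caller
print(c.radius)  # 3
```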

2.7 Input and output parameters of functions

It is customary in C/C++ to list both input and output parameters as arguments to a function, and to change the value of an output parameter through a pointer. The return value of the function is a status code, and the caller checks it to determine whether the call succeeded. In Python, results are returned directly and there is no need for the caller to check a status code: the function raises an exception when it encounters an error.
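
In Python the same idea might look like the sketch below, where parse_ratio is a hypothetical function that returns its result and lets exceptions signal failure:

```python
def parse_ratio(text):
    """Parse 'a/b' and return a / b; bad input raises an exception."""
    numerator, denominator = text.split('/')
    return int(numerator) / int(denominator)

value = parse_ratio('3/4')
print(value)  # 0.75

try:
    parse_ratio('1/0')
    error_name = None
except (ValueError, ZeroDivisionError) as exc:
    error_name = type(exc).__name__
print(error_name)  # ZeroDivisionError
```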

2.8 Reading Files

Reading a file in Python is much simpler than in C/C++. The opened file is an iterable object that returns one line at a time.

with open(file_path, 'rt', encoding='utf-8') as f:
   for line in f:
       print(line) # The \n at the end is preserved

2.9 Joining file paths

Python's os.path.join automatically adds a / or \ separator between paths, depending on the operating system.

import os
os.path.join('usr', 'lib', 'local')

2.10 Parsing command-line options

While Python can use sys.argv to parse command-line options directly, as in C/C++, the ArgumentParser utility under argparse is more convenient and powerful.
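
A minimal sketch of an argparse-based command line follows. The option names (path, --limit, --verbose) and the description are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description='Print the first lines of a file.')
parser.add_argument('path', help='input file')
parser.add_argument('-n', '--limit', type=int, default=10, help='maximum number of lines')
parser.add_argument('-v', '--verbose', action='store_true', help='print extra detail')

# An explicit list keeps the sketch runnable anywhere;
# parser.parse_args() with no argument reads sys.argv[1:] instead.
args = parser.parse_args(['data.txt', '--limit', '5'])
print(args.path, args.limit, args.verbose)  # data.txt 5 False
```

Type conversion, default values, and the generated --help text come for free, which is most of the advantage over reading sys.argv by hand.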

2.11 Calling External Commands

While Python can use os.system to invoke external commands directly, as in C/C++, subprocess.check_output lets you choose whether or not to run the command through the shell, and returns the output of the external command.

import subprocess
# If the external command returns a non-zero value, throw a subprocess.CalledProcessError exception
result = subprocess.check_output(['cmd', 'arg1', 'arg2']).decode('utf-8')  
# Collect both standard output and standard errors
result = subprocess.check_output(['cmd', 'arg1', 'arg2'], stderr=subprocess.STDOUT).decode('utf-8')  
# Execute a shell command (pipes, redirects, etc.); use shlex.quote() to quote any arguments inserted into the command string
result = subprocess.check_output('grep python | wc > out', shell=True).decode('utf-8')
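
Because 'cmd' above is only a placeholder, here is a runnable sketch that uses the current Python interpreter (sys.executable) as the external command. That choice is an assumption made purely so the example works on any machine.

```python
import subprocess
import sys

# Successful command: the captured standard output is returned as bytes.
out = subprocess.check_output([sys.executable, '-c', 'print("hello")']).decode('utf-8')
print(out.strip())  # hello

# Failing command: a non-zero exit status raises CalledProcessError.
try:
    subprocess.check_output([sys.executable, '-c', 'import sys; sys.exit(3)'])
    status = 0
except subprocess.CalledProcessError as exc:
    status = exc.returncode
print(status)  # 3
```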

2.12 Do not reinvent the wheel

Don't reinvent the wheel. Python is described as "batteries included", which means that its standard library provides solutions to many common problems.

3. Common tools


3.1 Reading and writing CSV files

import csv
# Read and write without header
with open(name, 'rt', encoding='utf-8', newline='') as f: # newline='' disables Python's newline translation so that csv handles line endings itself
    for row in csv.reader(f):
        print(row[0], row[1]) # CSV reads all data as str
with open(name, mode='wt', newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(['symbol', 'change'])

# Read and write with header
with open(name, mode='rt', newline='') as f:
    for row in csv.DictReader(f):
        print(row['symbol'], row['change'])
with open(name, mode='wt', newline='') as f:
    header = ['symbol', 'change']
    f_csv = csv.DictWriter(f, header)
    f_csv.writeheader()
    f_csv.writerow({'symbol': xx, 'change': xx})

When a field in the CSV file is too large, reading fails with _csv.Error: field larger than field limit (131072). Fix this by raising the limit:

import sys
csv.field_size_limit(sys.maxsize)

csv can also read tab-separated data:

f = csv.reader(f, delimiter='\t')
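
The round trip below is a sketch that uses an in-memory io.StringIO buffer in place of a real file, so that it can be run as is. The symbol 'AA' and the value -0.18 are made-up sample data.

```python
import csv
import io

buffer = io.StringIO(newline='')  # stands in for open(name, 'wt', newline='')
writer = csv.DictWriter(buffer, fieldnames=['symbol', 'change'])
writer.writeheader()
writer.writerow({'symbol': 'AA', 'change': -0.18})

buffer.seek(0)  # rewind, then read the same data back
rows = list(csv.DictReader(buffer))
print(rows[0]['symbol'], rows[0]['change'])  # AA -0.18

# Every value comes back as str, so numbers must be converted explicitly.
change = float(rows[0]['change'])
```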

3.2 Iterator tools

A number of iterator tools are defined in itertools, such as the subsequence tool.

import itertools
itertools.islice(iterable, stop)
itertools.islice(iterable, start, stop[, step]) # stop may be None to run to the end
# islice('ABCDEF', 2, None) -> C, D, E, F

itertools.filterfalse(predicate, iterable) # keep only the elements for which predicate is False
# filterfalse(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6

itertools.takewhile(predicate, iterable) # yield elements until predicate is first False, then stop
# takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 1, 4

itertools.dropwhile(predicate, iterable) # skip elements until predicate is first False, then yield the rest
# dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6, 4, 1

itertools.compress(iterable, selectors) # select based on whether each element of selectors is True or False
# compress('ABCDEF', [1, 0, 1, 0, 1, 1]) -> A, C, E, F
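
These tools are lazy, so they also work on iterators that never end, where ordinary slicing is impossible. A small sketch using itertools.count() as the endless source:

```python
import itertools

# Take the elements at positions 2, 4 and 6 of an endless counter.
sliced = list(itertools.islice(itertools.count(), 2, 8, 2))
print(sliced)  # [2, 4, 6]

# Stop as soon as the predicate fails; the rest of the counter is never touched.
head = list(itertools.takewhile(lambda n: n < 3, itertools.count()))
print(head)    # [0, 1, 2]
```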

Sequence sorting.

sorted(iterable, key=None, reverse=False)

itertools.groupby(iterable, key=None) # group by value, iterable needs to be sorted first
# groupby(sorted([1, 4, 6, 4, 1])) -> (1, iter1), (4, iter4), (6, iter6)

itertools.permutations(iterable, r=None) # Permutations; each result is a tuple
# permutations('ABCD', 2) -> AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC

itertools.combinations(iterable, r=None) # Combinations, return value is Tuple
itertools.combinations_with_replacement(...)
# combinations('ABCD', 2) -> AB, AC, AD, BC, BD, CD

Merging multiple sequences.

itertools.chain(*iterables) # Multiple sequences directly concatenated
# chain('ABC', 'DEF') -> A, B, C, D, E, F

import heapq
heapq.merge(*iterables, key=None, reverse=False) # Merge several already-sorted sequences into one sorted sequence
# merge('ABF', 'CDE') -> A, B, C, D, E, F

zip(*iterables) # Stop when the shortest sequence is exhausted, the result can only be consumed once
itertools.zip_longest(*iterables, fillvalue=None) # Stop when the longest sequence is exhausted, the result can only be consumed once
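
A few of the tools above in action, as a sketch on made-up data. Note that groupby only groups adjacent elements, which is why its input is sorted by the same key first.

```python
import heapq
import itertools

words = ['banana', 'apple', 'cherry', 'avocado', 'blueberry']

# Sort by the grouping key, then collect each run of equal keys.
ordered = sorted(words, key=lambda w: w[0])
by_initial = {key: list(group)
              for key, group in itertools.groupby(ordered, key=lambda w: w[0])}
print(by_initial)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

merged = list(heapq.merge([1, 4, 7], [2, 3, 9]))
print(merged)  # [1, 2, 3, 4, 7, 9]

padded = list(itertools.zip_longest('AB', 'xyz', fillvalue='-'))
print(padded)  # [('A', 'x'), ('B', 'y'), ('-', 'z')]
```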

3.3 Counters

A counter counts the number of occurrences of each element in an iterable object.

import collections
# Create
counter = collections.Counter(iterable)

# Frequency
counter[key] # number of occurrences of key (0 if key is absent)
# Return the n most frequent elements and their counts; if n is None, return all elements
counter.most_common(n=None)

# Insert/Update
counter.update(iterable)
counter1 + counter2; counter1 - counter2 # add or subtract counters

# Check whether two sequences contain the same elements with the same counts
collections.Counter(list1) == collections.Counter(list2)
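
A concrete run on an illustrative string:

```python
import collections

counter = collections.Counter('abracadabra')
print(counter['a'])            # 5
print(counter['z'])            # 0: a missing key counts as zero, no KeyError
print(counter.most_common(2))  # [('a', 5), ('b', 2)]

counter.update('aaa')          # adds to the existing counts
print(counter['a'])            # 8

same_letters = collections.Counter('listen') == collections.Counter('silent')
print(same_letters)            # True
```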

3.4 Dict with default values

When accessing a non-existent Key, defaultdict will set it to some default value.

import collections
collections.defaultdict(type) # When a dict[key] is accessed for the first time, type is called without arguments, providing an initial value for the dict[key].
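
A common use is grouping values without first checking whether the key exists. A sketch with made-up data:

```python
import collections

pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

groups = collections.defaultdict(list)  # list() supplies [] for each new key
for kind, name in pairs:
    groups[kind].append(name)

print(dict(groups))  # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
```

Be aware that merely reading groups[key] for a missing key also inserts it; use `key in groups` or groups.get(key) to test without inserting.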

3.5 Ordered Dict

import collections
collections.OrderedDict(items=None) # Preserves insertion order when iterating (the built-in dict also does since Python 3.7)

4. High Performance Programming and Debugging


4.1 Outputting error and warning messages

Outputting messages to standard error

import sys
sys.stderr.write('')

Emitting warning messages

import warnings
warnings.warn(message, category=UserWarning)  
# Common values of category include DeprecationWarning, SyntaxWarning, RuntimeWarning, ResourceWarning, FutureWarning

Control the output of warning messages

$ python -W all # Output all warnings, equivalent to setting warnings.simplefilter('always')
$ python -W ignore # Ignore all warnings, equivalent to setting warnings.simplefilter('ignore')
$ python -W error # Convert all warnings to exceptions, equivalent to setting warnings.simplefilter('error')
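
The same filters can be set from inside a program. The sketch below defines a hypothetical function old_api that emits a DeprecationWarning, and captures the warning with warnings.catch_warnings so that it can be inspected.

```python
import warnings

def old_api():
    # stacklevel=2 attributes the warning to the caller rather than to this line
    warnings.warn('old_api is deprecated', DeprecationWarning, stacklevel=2)
    return 42

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # same effect as running with -W all
    result = old_api()

print(result, caught[0].category.__name__)  # 42 DeprecationWarning
```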

4.2 Testing in code

Sometimes, for debugging purposes, we want to add extra code, usually print statements. It can be written as:

# in the debug part of the code
if __debug__:
    pass

Once debugging is over, this part of the code is skipped by running Python with the -O option on the command line:

$ python -O main.py

4.3 Code style checking

Using pylint, you can run a number of code style and static checks to catch errors before running the code:

pylint main.py

4.4 Code time consumption

Profile the time consumption of a whole program

$ python -m cProfile main.py

Measure the time consumption of a block of code

# block definition
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timeblock(label):
    tic = perf_counter()
    try:
        yield
    finally:
        toc = perf_counter()
        print('%s : %s' % (label, toc - tic))

# Code block time consumption test
with timeblock('counting'):
    pass

Some principles of performance optimization

  1. Focus on optimizing where performance bottlenecks occur, not on the entire code.
  2. Avoid using global variables. Local variables are faster to look up than global variables, and moving code that runs at global (module) scope into a function typically makes it 15-30% faster.
  3. Avoid repeated . attribute lookups. It is faster to use from module import name, and to copy a frequently accessed class member such as self.member into a local variable.
  4. Use built-in data structures as much as possible. str, list, set, dict, etc. are implemented in C and run quickly.
  5. Avoid creating unnecessary intermediate variables, and avoid unnecessary calls to copy.deepcopy().
  6. String concatenation, e.g. a + ':' + b + ':' + c, creates many useless intermediate strings; ':'.join([a, b, c]) is much more efficient. Also consider whether concatenation is necessary at all; for example, print(':'.join([a, b, c])) is less efficient than print(a, b, c, sep=':').
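
Claims like these are best checked on your own machine. The sketch below uses timeit to compare the two concatenation styles from principle 6; the function names and the test data are illustrative, and no timings are quoted because they vary with hardware and Python version.

```python
import timeit

parts = ['a', 'b', 'c'] * 100

def concat_plus():
    out = ''
    for p in parts:
        out += p + ':'  # builds a new string on every iteration
    return out

def concat_join():
    return ':'.join(parts) + ':'  # builds the result in one pass

# Both functions must produce the same string for the comparison to be fair.
same_output = concat_plus() == concat_join()

timings = {func.__name__: timeit.timeit(func, number=1000)
           for func in (concat_plus, concat_join)}
for name, seconds in timings.items():
    print(name, seconds)
```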

5. Other Python tricks


5.1 argmin and argmax

items = [2, 1, 3, 4]
argmin = min(range(len(items)), key=items.__getitem__)

argmax works the same way, with max in place of min.
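
For completeness, a sketch of both, plus an equivalent form based on enumerate (the name argmax_alt is illustrative):

```python
items = [2, 1, 3, 4]

argmin = min(range(len(items)), key=items.__getitem__)
argmax = max(range(len(items)), key=items.__getitem__)
print(argmin, argmax)  # 1 3

# Same result, pairing each index with its value.
argmax_alt = max(enumerate(items), key=lambda pair: pair[1])[0]
print(argmax_alt)      # 3
```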

5.2 Transposing two-dimensional lists

A = [['a11', 'a12'], ['a21', 'a22'], ['a31', 'a32']]
A_transpose = list(zip(*A)) # list of tuple
A_transpose = list(list(col) for col in zip(*A)) # list of list

5.3 Reshaping a one-dimensional list into a two-dimensional list

A = [1, 2, 3, 4, 5, 6]

# Preferred.
list(zip(*[iter(A)] * 2))
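
The idiom works because [iter(A)] * 2 holds the same iterator twice, so zip pulls consecutive elements from it in turn. The sketch below spells that out and shows what happens when the length is not a multiple of the chunk size; the variable names are illustrative.

```python
import itertools

A = [1, 2, 3, 4, 5, 6, 7]

it = iter(A)
pairs = list(zip(it, it))  # both arguments are the same iterator
print(pairs)   # [(1, 2), (3, 4), (5, 6)]  (the leftover 7 is dropped)

# zip_longest keeps the incomplete final chunk and pads it.
padded = list(itertools.zip_longest(*[iter(A)] * 2))
print(padded)  # [(1, 2), (3, 4), (5, 6), (7, None)]
```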