Why?

  • economic efficiency ( do more, faster, with less )
  • a major goal of the code refactoring/optimization phase ( build first, optimize later )

Aspects of performance

Metrics ( monitoring )

  • availability: downtime, crash time, GC pause time
  • response time: average, p95 quantile
  • throughput: queries per second, degree of concurrency
  • resource usage: CPU idle, memory used, disk usage & IO, network IO, instance counts
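
A minimal sketch of collecting these metrics from a Go service with the Prometheus client library ( metric names, port and paths are arbitrary choices ):

    package main

    import (
        "net/http"
        "time"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // reqDuration records response time; averages and the p95 quantile
    // are derived from the histogram buckets at query time.
    var reqDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "Request latency distribution.",
        Buckets: prometheus.DefBuckets,
    })

    // reqTotal records throughput ( queries per second after rate() ).
    var reqTotal = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of handled requests.",
    })

    func handler(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        defer func() {
            reqDuration.Observe(time.Since(start).Seconds())
            reqTotal.Inc()
        }()
        w.Write([]byte("ok"))
    }

    func main() {
        prometheus.MustRegister(reqDuration, reqTotal)
        http.HandleFunc("/", handler)
        http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus
        http.ListenAndServe(":8080", nil)
    }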

Profiling analysis ( what costs how much )

request cost distribution over phases:

  • browser rendering time
  • external network latency
  • internal network latency
  • business backend execution time
  • database execution time

resource usage distribution over lines of code:

  • memory allocation & release ( memory leak detection )
  • file handle allocation & release ( file handle leaks )
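
For Go services, per-function CPU and heap cost can be sampled with the standard net/http/pprof package; a minimal sketch ( port 6060 is only a convention ):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
    )

    func main() {
        // Expose CPU, heap, goroutine and block profiles on a side port;
        // inspect them with `go tool pprof http://localhost:6060/debug/pprof/heap`
        // or .../profile?seconds=30 for a CPU profile.
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        select {} // the real application work would run here
    }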

Dependency analysis ( what depends on what )

  • dependency definition & management ( package management tools, artifact repos )
  • dependency loop detection ( the graph must be a DAG to build and run; see the sketch after this list )
  • conflicts resolution
  • visualization of dependencies
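
A sketch of dependency loop detection as a three-color depth-first search in Go ( the package names are made up ):

    package main

    import "fmt"

    // hasCycle reports whether the dependency graph ( package -> direct deps )
    // contains a loop.
    func hasCycle(deps map[string][]string) bool {
        const (
            white = iota // not visited
            gray         // on the current DFS path
            black        // fully explored
        )
        color := map[string]int{}

        var visit func(n string) bool
        visit = func(n string) bool {
            color[n] = gray
            for _, d := range deps[n] {
                switch color[d] {
                case gray:
                    return true // back edge: a loop
                case white:
                    if visit(d) {
                        return true
                    }
                }
            }
            color[n] = black
            return false
        }

        for n := range deps {
            if color[n] == white && visit(n) {
                return true
            }
        }
        return false
    }

    func main() {
        deps := map[string][]string{
            "app": {"lib"}, "lib": {"util"}, "util": {"app"}, // deliberate loop
        }
        fmt.Println(hasCycle(deps)) // true
    }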

Benchmarking (comparison of changes)

  • set up test group and control group (baseline)
  • input & output standardization & quantification
  • statistical comparison of metrics across groups and runs
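
A sketch of a test-vs-baseline benchmark with Go's testing package ( json.Marshal stands in for the code under change ); running it with -count on both the baseline and the modified code and feeding the outputs to benchstat gives the statistical comparison:

    package codec

    import (
        "encoding/json"
        "testing"
    )

    type payload struct {
        ID   int    `json:"id"`
        Name string `json:"name"`
    }

    // Run on the baseline commit and on the change, e.g.
    //   go test -bench=Marshal -count=10 > old.txt   ( or new.txt )
    //   benchstat old.txt new.txt
    func BenchmarkMarshal(b *testing.B) {
        p := payload{ID: 1, Name: "example"}
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            if _, err := json.Marshal(p); err != nil {
                b.Fatal(err)
            }
        }
    }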

Common performance bottlenecks

single points of failure: any “master” node without a backup or an automatic leader-election mechanism

Database slow queries

  • no index
  • vague or incorrect index usage ( e.g., functions on indexed columns, leading-wildcard LIKEs )
  • too many data rows
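
A hedged sketch of diagnosing and fixing a missing index from Go with database/sql ( the orders table, user_id column and index name are hypothetical ):

    package store

    import "database/sql"

    // checkAndIndex inspects a slow query's plan and adds an index for
    // its filter predicate.
    func checkAndIndex(db *sql.DB) error {
        // EXPLAIN shows whether the query falls back to a full table scan
        // ( no usable key ) instead of an index lookup.
        rows, err := db.Query("EXPLAIN SELECT * FROM orders WHERE user_id = ?", 42)
        if err != nil {
            return err
        }
        rows.Close()

        // A narrow index on the filtered column turns the scan into a lookup.
        _, err = db.Exec("CREATE INDEX idx_orders_user_id ON orders (user_id)")
        return err
    }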

Business logic defects

  • unnecessary nested loops or recursion
  • unnecessary synchronous and/or sequential execution flow ( the N+1 query issue; see the sketch below )
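
A sketch of the N+1 issue and its batched fix in Go ( the orders schema is hypothetical ):

    package store

    import (
        "database/sql"
        "fmt"
        "strings"
    )

    // N+1 pattern: one query per user, issued sequentially in a loop.
    func ordersNPlusOne(db *sql.DB, userIDs []int) error {
        for _, id := range userIDs {
            rows, err := db.Query("SELECT id, amount FROM orders WHERE user_id = ?", id)
            if err != nil {
                return err
            }
            rows.Close() // row handling elided
        }
        return nil
    }

    // Batched alternative: a single IN (...) query fetches everything at once.
    func ordersBatched(db *sql.DB, userIDs []int) error {
        placeholders := make([]string, len(userIDs))
        args := make([]interface{}, len(userIDs))
        for i, id := range userIDs {
            placeholders[i] = "?"
            args[i] = id
        }
        rows, err := db.Query(
            fmt.Sprintf("SELECT id, amount FROM orders WHERE user_id IN (%s)",
                strings.Join(placeholders, ", ")),
            args...,
        )
        if err != nil {
            return err
        }
        return rows.Close()
    }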

How to improve performance

Code level

  • add specific and proper indexes
  • use batch execution
  • use connection pools ( with random sleep/jitter on reconnects to avoid thundering herds; see the pool sketch after this list )
  • guard read with cache middlewares
  • throttle write with message queues
  • use statics and consts when possible
  • proactively clean up costly objects
  • reduce synchronous logging
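
A sketch of the connection-pool item above using database/sql's built-in pool ( the driver and the limits are assumptions to be tuned per workload ):

    package store

    import (
        "database/sql"
        "time"

        _ "github.com/go-sql-driver/mysql" // driver choice is an assumption
    )

    // NewPool configures the connection pool so that bursts of requests
    // reuse connections instead of opening one per query.
    func NewPool(dsn string) (*sql.DB, error) {
        db, err := sql.Open("mysql", dsn)
        if err != nil {
            return nil, err
        }
        db.SetMaxOpenConns(50)                  // cap concurrent connections to protect the database
        db.SetMaxIdleConns(10)                  // keep warm connections for reuse
        db.SetConnMaxLifetime(30 * time.Minute) // recycle before server-side timeouts bite
        return db, nil
    }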

Arch level

  • parallelize and run work asynchronously when possible ( see the sketch after this list )
  • scale service instances up and down with traffic & load changes
  • partition data
  • read and write flow separation
  • hot and cold data separation
  • build up performance environments for testing
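
A sketch of the parallelization item above using goroutines with a bounded semaphore ( the limit of 8 is arbitrary ):

    package main

    import (
        "fmt"
        "sync"
    )

    // fetchAll runs the per-item work in parallel instead of sequentially;
    // the semaphore channel bounds concurrency so downstream services are
    // not overwhelmed when the input grows.
    func fetchAll(items []string, worker func(string) string) []string {
        results := make([]string, len(items))
        sem := make(chan struct{}, 8) // at most 8 in flight
        var wg sync.WaitGroup

        for i, it := range items {
            wg.Add(1)
            go func(i int, it string) {
                defer wg.Done()
                sem <- struct{}{}
                defer func() { <-sem }()
                results[i] = worker(it)
            }(i, it)
        }
        wg.Wait()
        return results
    }

    func main() {
        out := fetchAll([]string{"a", "b", "c"}, func(s string) string {
            return s + "!" // stand-in for a network call
        })
        fmt.Println(out)
    }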

Build up performance environments for testing

  • real request recording & playback ( sketched after this list )
  • containerization and provisioning
  • test management for benchmarking
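
A sketch of request recording as an HTTP middleware in Go ( file path and record separator are arbitrary ); a replayer would parse the same file and re-issue the requests against the performance environment:

    package main

    import (
        "bufio"
        "log"
        "net/http"
        "net/http/httputil"
        "os"
    )

    // record appends every incoming request, in wire format, to a file
    // so it can be replayed later.
    func record(next http.Handler, logPath string) http.Handler {
        f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
        if err != nil {
            log.Fatal(err)
        }
        w := bufio.NewWriter(f)
        return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
            if dump, err := httputil.DumpRequest(r, true); err == nil {
                w.Write(dump)
                w.WriteString("\n---\n") // record separator for the replayer
                w.Flush()
            }
            next.ServeHTTP(rw, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        http.ListenAndServe(":8080", record(mux, "requests.log"))
    }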

Tools

  • metrics collection: Filebeat, Fluentd for logs; Prometheus for key-value metrics
  • metrics aggregation: Logstash, Fluentd
  • frontend solutions: Kibana, Grafana
  • visualization diagrams: histograms for comparison; flame graphs for analysis; force-directed graphs and Circos for relationships
  • HTTP request load & stress tests: Gatling, ab ( a bare-bones load generator is sketched after this list )
  • profiling: pprof for Go
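
A bare-bones load generator in the spirit of ab, sketched in Go ( target URL, request count and concurrency are placeholders ):

    package main

    import (
        "fmt"
        "net/http"
        "sort"
        "sync"
        "time"
    )

    func main() {
        const (
            url         = "http://localhost:8080/"
            total       = 200
            concurrency = 10
        )

        latencies := make([]time.Duration, total)
        sem := make(chan struct{}, concurrency)
        var wg sync.WaitGroup

        for i := 0; i < total; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                sem <- struct{}{}
                defer func() { <-sem }()

                start := time.Now()
                if resp, err := http.Get(url); err == nil {
                    resp.Body.Close()
                }
                latencies[i] = time.Since(start)
            }(i)
        }
        wg.Wait()

        sort.Slice(latencies, func(a, b int) bool { return latencies[a] < latencies[b] })
        fmt.Println("p95 latency:", latencies[total*95/100])
    }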

Design patterns & mindsets

  • epoll in a loop ( servers )
  • async & await ( avoiding callback hell )
  • divide & conquer ( map then reduce )
  • trading time for space ( e.g., external file sorts ), or space for time ( any kind of index )
  • event based ( fire and forget; see the sketch after this list )
  • master-minions ( one coordinator, many workers )
  • separate list from detail
  • in distributed systems: pick two among consistency, availability, and partition tolerance ( CAP )
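
A sketch of the event-based, fire-and-forget pattern with a buffered channel and a background worker ( a real system would put a message queue in the middle ):

    package main

    import (
        "fmt"
        "time"
    )

    type Event struct{ Name string }

    func main() {
        events := make(chan Event, 1024) // buffer absorbs bursts

        // background consumer: the hot path never waits for it
        go func() {
            for e := range events {
                fmt.Println("handled:", e.Name) // stand-in for the slow side effect
            }
        }()

        // hot path: emit and move on
        for i := 0; i < 3; i++ {
            events <- Event{Name: fmt.Sprintf("user-signup-%d", i)}
        }

        time.Sleep(100 * time.Millisecond) // demo only: let the worker drain
        close(events)
    }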