Fundamentally, a function is a relationship (mapping) between the values of one set and the values of another set, written $f: X \to Y$.

A function can map a set to itself. For example, $f: \mathbb{R} \to \mathbb{R}$ with $f(x) = x^2$ maps the real numbers to the real numbers.

The set you are mapping *from* is called the **domain**.

The set that is being mapped *to* is called the **codomain**.

The **range** is the subset of the codomain which the function actually maps to (a function doesn't necessarily map to *every* value in the codomain; when it does, the range equals the codomain).

Functions which map to scalars, e.g. $f: \mathbb{R}^n \to \mathbb{R}$, are called **scalar-valued** or **real-valued** functions.

Functions which map to vectors, e.g. $f: \mathbb{R}^n \to \mathbb{R}^m$, are called **vector-valued** functions.
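As a minimal sketch of the distinction (the function names here are illustrative, not from the text):

```python
import numpy as np

# Scalar-valued: maps a vector in R^2 to a single real number.
def f_scalar(v):
    return v[0] ** 2 + v[1] ** 2

# Vector-valued: maps a vector in R^2 to another vector in R^2.
def f_vector(v):
    return np.array([v[1], -v[0]])

v = np.array([3.0, 4.0])
print(f_scalar(v))  # a scalar: 25.0
print(f_vector(v))  # a vector: [ 4. -3.]
```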

An identity function maps something to itself:

$$I: X \to X$$

That is, for every $x \in X$, we have $I(x) = x$.

Say we have a function $f: X \to Y$.

We say $f$ is **invertible** if and only if there exists a function $f^{-1}: Y \to X$ such that $f^{-1} \circ f$ is the identity on $X$ and $f \circ f^{-1}$ is the identity on $Y$, where $\circ$ denotes **function composition**, i.e. $(f^{-1} \circ f)(x) = f^{-1}(f(x))$.

The inverse of a function is *unique*. An invertible function is both *surjective* and *injective* (described below), that is, for every $y \in Y$ there is a unique $x \in X$ such that $f(x) = y$.
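A quick sketch of invertibility, using a hypothetical function pair (these particular functions are my own example, not from the text):

```python
# f doubles its input and shifts it; f_inv undoes exactly those steps.
def f(x):
    return 2 * x + 1

def f_inv(y):
    return (y - 1) / 2

# Composing in either order gives back the original value (the identity).
for x in [0, 1, -3.5, 10]:
    assert f_inv(f(x)) == x   # (f_inv . f)(x) = x
    assert f(f_inv(x)) == x   # (f . f_inv)(x) = x
```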

A **surjective** function, also called "onto", is a function in which every element of the codomain is mapped to by *at least* one element of the domain.

This is equivalent to saying that the range equals the codomain: $\text{range}(f) = \text{codomain}(f)$.

An **injective** function, also called "one-to-one", is a function in which every element of the codomain is mapped to by *at most* one element of the domain.

That is, not all elements of the codomain need to be mapped to, but those that are have exactly *one* corresponding element of the domain.

A function can be both surjective and injective (a **bijective** function), which just means that for every element of the codomain there is *exactly* one corresponding element of the domain.

As mentioned before, the inverse of a function is both surjective and injective!
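For finite sets, these properties can be checked directly by inspecting which codomain elements get hit. A small sketch (the helper names are illustrative, not from the text):

```python
def is_surjective(mapping, codomain):
    # surjective: every codomain element is hit by at least one input
    return set(mapping.values()) == set(codomain)

def is_injective(mapping):
    # injective: no codomain element is hit by more than one input
    values = list(mapping.values())
    return len(values) == len(set(values))

f = {1: 'a', 2: 'b', 3: 'c'}  # bijective onto {'a', 'b', 'c'}
g = {1: 'a', 2: 'a', 3: 'b'}  # neither injective nor surjective onto it

assert is_surjective(f, {'a', 'b', 'c'}) and is_injective(f)
assert not is_surjective(g, {'a', 'b', 'c'}) and not is_injective(g)
```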

A convex function is a continuous function whose value at the midpoint of every interval in its domain does not exceed the arithmetic mean of its values at the ends of the interval. (Convex Function. Weisstein, Eric W. Wolfram MathWorld)

A **convex** region is one in which any two points in the region can be joined by a straight line that does not leave the region.

Which is to say that a convex function has one minimum: any local minimum is a global minimum, and for a strictly convex function it is unique (and it is the only position where the derivative is 0).

More formally, a twice-differentiable function is convex if its second derivative is non-negative everywhere, and *strictly* convex if its second derivative is positive everywhere. A function can also be convex only on a range $[a, b]$ rather than over its whole domain.

In higher dimensions, these derivatives aren't scalar values, so we instead define convexity in terms of the *Hessian* $H$ (the matrix of second-order partial derivatives): a function is convex if its Hessian is *positive semidefinite* (notated $H \succeq 0$) and *strictly* convex if its Hessian is *positive definite* (notated $H \succ 0$).
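One practical way to check these conditions (a sketch, using the standard fact that a symmetric matrix is positive semidefinite iff all its eigenvalues are $\geq 0$, and positive definite iff they are all $> 0$):

```python
import numpy as np

# f(x, y) = x^2 + y^2 is convex; its Hessian is constant:
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

# eigvalsh computes eigenvalues of a symmetric matrix.
eigenvalues = np.linalg.eigvalsh(H)
print((eigenvalues >= 0).all())  # True -> positive semidefinite -> convex
print((eigenvalues > 0).all())   # True -> positive definite -> strictly convex
```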

**Transcendental** functions are those that are not polynomial (more generally, not algebraic), e.g. $e^x$, $\log(x)$, $\sin(x)$.

Logarithms are frequently encountered. They have many useful properties, such as turning multiplication into addition: $\log(xy) = \log(x) + \log(y)$.

Multiplying many small numbers is problematic with computers, leading to underflow errors. Logarithms are commonly used to turn this kind of multiplication into addition and avoid underflow errors.
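A minimal demonstration of the underflow problem and the log trick (the specific values are illustrative):

```python
import math

probs = [1e-300] * 4  # many very small numbers, e.g. probabilities

# Direct multiplication underflows: 1e-1200 is below the smallest
# representable positive float, so the result collapses to 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 due to underflow

# Summing logs instead keeps the result comfortably representable.
log_product = sum(math.log(p) for p in probs)
print(log_product)  # about -2763.1, i.e. 4 * log(1e-300)
```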

Note that because the logarithm is monotonically increasing, maximizing $\log(f(x))$ is equivalent to maximizing $f(x)$, which is part of why this substitution is so common.

Often you may see a distinction made between solving a problem **analytically** (sometimes **algebraically** is used) and solving a problem **numerically**.

Solving a problem analytically means you can exploit properties of the objects and equations, e.g. through methods from calculus, avoiding substituting numerical values for the variables you are manipulating (that is, you only need to manipulate symbols). If a problem may be solved analytically, the resulting solution is called a **closed form** solution (or the **analytic** solution) and is an exact solution.

Not all problems can be solved analytically; generally more complex mathematical models have no closed form solution. These problems are also often the ones of most interest. Such problems need to be *approximated* numerically, which involves evaluating the equations many times by substituting different numerical values for variables. The result is an approximate (**numerical**) solution.
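To make the distinction concrete, here is a sketch contrasting the two approaches on a problem that happens to have both: solving $x^2 = 2$ (the iterative method used here is Newton's method, chosen for illustration):

```python
# Analytically, x^2 = 2 has the closed-form solution x = sqrt(2).
analytic = 2 ** 0.5

# Numerically, we approximate the same root by evaluating the
# equation repeatedly with different candidate values.
def newton_sqrt2(iterations=10):
    x = 1.0  # initial guess
    for _ in range(iterations):
        x = x - (x**2 - 2) / (2 * x)  # Newton update for f(x) = x^2 - 2
    return x

numeric = newton_sqrt2()
print(abs(analytic - numeric))  # the approximation converges to the exact value
```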

You'll often see a caveat with algorithms that they only work for linear models. On the other hand, some models are touted for their capacity to represent nonlinear relationships.

A **linear model** is a model which takes the general form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Note that this function does not need to produce a literal line. The "linear" constraint does not apply to the predictor variables $x_1, \dots, x_n$, which may be transformed, e.g. into $x_1^2$ or $\log(x_2)$.

"Linear" refers to the parameters; i.e. the function must be "linear in the parameters", meaning that the parameters $\beta_0, \dots, \beta_n$ appear only raised to the first power and are not multiplied or divided by one another.

A **nonlinear model** includes parameter terms such as $\beta_1^2$ or $\beta_1 \beta_2$ (i.e. terms that are *not* linear) or transcendental functions of the parameters, e.g. $e^{\beta_1 x_1}$.
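A sketch of why "linear in the parameters" is the thing that matters: a model with a squared *predictor* is still linear in its parameters, so it can be fit with ordinary least squares (the data here is synthetic, chosen for illustration):

```python
import numpy as np

# Synthetic data from y = 1 + 2x + 3x^2 (true parameters: 1, 2, 3).
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix: columns for the intercept, x, and the transformed x^2.
# The curve is not a line, but the model is linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # recovers approximately [1. 2. 3.]
```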

Many artificial intelligence and machine learning algorithms are based on or benefit from some kind of *metric*. In this context the term has a concrete definition.

The typical case for metrics is around similarity. Say you have a bunch of random variables $v_1, v_2, \dots, v_n$ and you want to know how similar two of them are.

How do we define "similar"?

We'll use a distance function $\mu(v_1, v_2)$, which should satisfy the following properties:

- *reflexivity*: $\mu(v,v)=0$ for all $v$
- *symmetry*: $\mu(v_1,v_2)=\mu(v_2, v_1)$ for all $v_1, v_2$
- *triangle inequality*: $\mu(v_1, v_2) \leq \mu(v_1, v_3) + \mu(v_3, v_2)$ for all $v_1, v_2, v_3$

If all these are satisfied, we say that $\mu$ is a **metric**.

If only reflexivity and symmetry are satisfied, we have a **semi-metric** instead.

So we can create a *feature* from a metric such that the lower the distance (metric), the higher the probability.
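A sketch tying this together: Euclidean distance is a standard example of a metric, and one common (illustrative) way to turn a distance into a similarity-style feature is $e^{-\mu(v_1, v_2)}$, which is my choice here rather than anything specified in the text:

```python
import math

# Euclidean distance between two points (a metric).
def mu(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# A decreasing transform of distance: closer points score nearer to 1.
def similarity_feature(v1, v2):
    return math.exp(-mu(v1, v2))

a, b, c = (0.0, 0.0), (3.0, 4.0), (1.0, 1.0)
assert mu(a, a) == 0                    # reflexivity
assert mu(a, b) == mu(b, a)             # symmetry
assert mu(a, b) <= mu(a, c) + mu(c, b)  # triangle inequality

print(similarity_feature(a, a))  # 1.0: identical points, maximal feature
print(similarity_feature(a, b) < similarity_feature(a, c))  # True: b is farther
```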

- Weisstein, Eric W. "Convex Function." Wolfram MathWorld.