
Introduction


In this chapter, we will cover:



Measures of Central Tendency and Location


Tip

  • A population is defined as all members of a specified group.
  • A parameter describes the characteristics of a population.
  • A sample is a subset drawn from a population.

A sample statistic describes a characteristic of a sample.

For example, all stocks listed on a country’s exchange constitute a population. If 30 stocks are selected from the listed stocks, those 30 stocks form a sample.

Sample statistics, such as measures of central tendency, measures of dispersion, skewness, and kurtosis, help make probabilistic statements about investment returns.


Measures of Central Tendency


Arithmetic Mean
The sample mean is the arithmetic mean calculated for a sample. It is expressed as:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where n is the number of observations in the sample.

A drawback of the arithmetic mean is that it is sensitive to extreme values (outliers). It can be pulled sharply upward or downward by extremely large or small observations, respectively.

Median
The median is the midpoint of a data set that has been sorted into ascending or descending order. With an even number of observations, it is the mean of the two middle values.

Compared with the mean, the median is less affected by extreme values (outliers).
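
The different sensitivity of the mean and the median to outliers can be sketched with a few lines of Python. The return figures below are hypothetical and chosen only for illustration; `statistics.mean` and `statistics.median` come from the standard library.

```python
from statistics import mean, median

# Hypothetical annual returns in percent (illustrative data, not from the notes).
returns = [2.0, 3.0, 4.0, 5.0, 6.0]

# The same data with one extreme observation appended.
returns_with_outlier = returns + [40.0]

mean_before = mean(returns)
median_before = median(returns)
mean_after = mean(returns_with_outlier)
median_after = median(returns_with_outlier)

print(mean_before, median_before)  # 4.0 4.0
print(mean_after, median_after)    # 10.0 4.5
```

The single outlier moves the mean from 4.0 to 10.0, while the median moves only from 4.0 to 4.5.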

Mode
The mode is the most frequently occurring value in a distribution.

When working with continuous data such as stock returns, a modal interval is often used instead of a mode. The data are divided into bins, and the bin with the highest frequency is the modal interval.
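
A minimal sketch of finding a modal interval, assuming equal-width bins of the form [k × width, (k + 1) × width). The function name `modal_interval`, the bin width, and the return data are hypothetical choices made for this illustration.

```python
import math
from collections import Counter


def modal_interval(values, bin_width):
    """Return (lower, upper) bounds of the equal-width bin with the most observations.

    Assumption: bin k covers [k * bin_width, (k + 1) * bin_width).
    If several bins tie, the one encountered first in the data is returned.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    k, _ = max(counts.items(), key=lambda item: item[1])
    return (k * bin_width, (k + 1) * bin_width)


# Hypothetical monthly returns in percent.
returns = [-1.5, 0.2, 0.7, 1.1, 1.4, 1.8, 2.3, 3.9]
print(modal_interval(returns, 1.0))  # (1.0, 2.0)
```

Three of the eight returns fall between 1% and 2%, so that bin is the modal interval for this bin width. A different bin width can produce a different modal interval.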

Dealing with Outliers


When data contains outliers, there are three options to deal with the extreme values:

Option 1: Do nothing; use the data without any adjustment.
Option 2: Delete all the outliers.
Option 3: Replace the outliers with another value.

Tip

A trimmed mean excludes a stated percentage of the lowest and highest values and then calculates the arithmetic mean of the remaining values.

A winsorized mean assigns a stated percentage of the lowest values equal to one specified low value and a stated percentage of the highest values equal to one specified high value, and then computes a mean from the restated data.
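
The two definitions above can be sketched as follows. This is one possible convention, not the only one: `tail_fraction` is a hypothetical parameter giving the fraction of observations treated in each tail, and the winsorized version assumes the "specified" low and high values are the nearest observations that remain after trimming.

```python
def trimmed_mean(values, tail_fraction):
    """Mean after dropping the lowest and highest tail_fraction of observations."""
    if not 0 <= tail_fraction < 0.5:
        raise ValueError("tail_fraction must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * tail_fraction)  # observations dropped from each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, tail_fraction):
    """Mean after replacing each tail with the nearest non-tail observation."""
    if not 0 <= tail_fraction < 0.5:
        raise ValueError("tail_fraction must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)
    k = int(n * tail_fraction)
    low, high = data[k], data[n - k - 1]  # assumed replacement values
    restated = [min(max(x, low), high) for x in data]
    return sum(restated) / n


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data))       # 14.5 (plain arithmetic mean)
print(trimmed_mean(data, 0.10))    # 5.5
print(winsorized_mean(data, 0.10)) # 5.5
```

With 10% treated in each tail, the trimmed mean drops 1 and 100, while the winsorized mean restates them as 2 and 9. Both results sit far closer to the bulk of the data than the plain mean of 14.5.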

Measures of Location


Quartiles, Quintiles, Deciles, and Percentiles

A quantile is a value at or below which a stated fraction of the data lies.

The formula for the position of the y-th percentile in a data set with n observations sorted in ascending order is:

$$L_y = (n + 1)\frac{y}{100}$$
Tip

  • When $L_y$ is a whole number, the location corresponds to an actual observation.
  • When $L_y$ is not a whole number, it lies between the two closest integers (one above and one below), and we use linear interpolation between the observations at those two positions to determine $P_y$.

The interquartile range (IQR) is the difference between the third and first quartiles.

Example

Consider the data set:
47 35 37 32 40 39 36 34 35 31 44

  1. Find the 75th percentile.
  2. Find the first quartile and the third quartile.
  3. Calculate the interquartile range.
  4. Find the fifth decile.
  5. Find the sixth decile.

Solution
31, 32, 34, 35, 35, 36, 37, 39, 40, 44, 47

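  1. $L_{75} = \frac{3}{4} \times 12 = 9$ → 40 is the value.
  2. First quartile = 34 (from $L_{25} = \frac{25}{100} \times 12 = 3$) and third quartile = 40.
  3. IQR = 40 - 34 = 6
  4. $L_{50} = \frac{50}{100} \times 12 = 6$ → 36 is the value.
  5. $L_{60} = \frac{60}{100} \times 12 = 7.2$
    Therefore, $P_{60} = 37 + 0.2(39 - 37) = 37.4$.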
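
The worked solution can be checked with a short Python sketch of the (n + 1) × y / 100 location rule with linear interpolation. The function name `percentile` is hypothetical, and clamping locations that fall outside the data to the smallest or largest observation is an assumption added here, not something stated in the notes.

```python
import math


def percentile(values, y):
    """Value at the y-th percentile using the location L_y = (n + 1) * y / 100."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) * y / 100      # 1-based position in the sorted data
    lower = math.floor(location)
    fraction = location - lower
    if lower < 1:                     # assumption: clamp below the first observation
        return data[0]
    if lower >= n:                    # assumption: clamp above the last observation
        return data[-1]
    # Linear interpolation between the observations at positions lower and lower + 1.
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


data = [47, 35, 37, 32, 40, 39, 36, 34, 35, 31, 44]
q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1
print(q1, q3, iqr)
print(percentile(data, 50))
print(round(percentile(data, 60), 2))
```

This gives 34 and 40 for the quartiles, an interquartile range of 6, 36 for the fifth decile, and approximately 37.4 for the sixth decile, matching the solution. Statistical libraries often default to a different location rule, so their results can differ slightly from this one.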

Box and Whiskers Plot


A box and whiskers plot is used to visualize the dispersion of data across quartiles.

Pasted image 20250907212946.png

There are several variations of the box and whiskers plot. Sometimes the whiskers may be a function of the interquartile range instead of the highest and lowest values.

Quantiles in Investment Practice




Measures of Dispersion


Measures of central tendency tell us where the investment results (expected returns) are centered.

However, to evaluate an investment we also need to know how returns are dispersed around the mean. Measures of dispersion describe the variability of outcomes around the mean.

Range


The range is the difference between the maximum and minimum values in a data set.

It is expressed as: Range = Maximum value - Minimum value

Another way to specify the range is to mention the actual minimum and maximum values.

The range is easy to compute; however, it does not tell us much about how the data is distributed.

Mean Absolute Deviation


The mean absolute deviation (MAD) is the average of the absolute values of deviations from the mean. It is expressed as:

$$MAD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$

where $\bar{X}$ is the sample mean and n is the number of observations in the sample.
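
A minimal sketch of the MAD calculation, using a small hypothetical data set. The function name is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Average absolute deviation of the observations from their mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


# Hypothetical observations: the mean is 6 and the absolute deviations are 4, 2, 0, 2, 4.
observations = [2, 4, 6, 8, 10]
print(mean_absolute_deviation(observations))  # 2.4
```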

Sample Variance and Sample Standard Deviation


Sample variance applies when we are dealing with a subset, or sample, of the total population. It is expressed as:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the sample mean and n is the number of observations in the sample.

Sample standard deviation is defined as the positive square root of the sample variance.
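
The sample variance and sample standard deviation can be sketched as below, with the standard library's `statistics.variance` (which also divides by n - 1) used as a cross-check. The data and function names are hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical observations: squared deviations are 16, 4, 0, 4, 16 (sum 40).
observations = [2, 4, 6, 8, 10]
print(sample_variance(observations))      # 10.0
print(statistics.variance(observations))  # cross-check from the standard library
print(sample_std(observations))           # square root of 10
```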

Downside Deviation and Coefficient of Variation


Variance and standard deviation of returns take account of returns above and below the mean, but often investors are concerned only with downside risk, for example returns below the mean.

The target downside deviation, or target semi-deviation, is a measure of the risk of being below a given target. It is calculated as the square root of the average squared deviations from the target, but it includes only those observations below the target (B).

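The sample target semi-deviation can be calculated as:

$$s_{\text{Target}} = \sqrt{\sum_{\text{for all } X_i \le B} \frac{(X_i - B)^2}{n - 1}}$$

where B is the target and n is the total number of observations in the sample.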
Tip

For a target at or below the sample mean, the target downside deviation will be no greater than the standard deviation, because deviations above the target are ignored. As the target is increased, the target downside deviation will increase.
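
A minimal sketch of the target downside deviation, assuming the divisor n - 1 uses the total number of observations rather than only those at or below the target. The returns and the targets of 0% and 3% are hypothetical.

```python
import math


def target_downside_deviation(values, target):
    """Square root of squared shortfalls below the target, divided by n - 1.

    Assumption: n is the total number of observations, not only those
    at or below the target.
    """
    n = len(values)
    shortfall_sq = sum((x - target) ** 2 for x in values if x <= target)
    return math.sqrt(shortfall_sq / (n - 1))


# Hypothetical returns in percent.
returns = [-4.0, -2.0, 2.0, 6.0, 10.0]
low_target = target_downside_deviation(returns, 0.0)   # uses -4 and -2 only
high_target = target_downside_deviation(returns, 3.0)  # uses -4, -2 and 2
print(low_target, high_target)
```

With a target of 0%, only two returns count and the result is the square root of 20 / 4 = 5. Raising the target to 3% brings a third return into the sum and enlarges every shortfall, so the measure rises.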

Coefficient of variation expresses how much dispersion exists relative to the mean of a distribution and allows for direct comparison of dispersion across different data sets, even if the means are drastically different from one another.

It is used in investment analysis to compare relative risks. When evaluating investments, a lower coefficient of variation is better because it indicates less risk per unit of mean return.

Coefficient of variation is expressed as: $CV = \frac{s}{\bar{X}}$, where s is the sample standard deviation of a set of observations and $\bar{X}$ is the sample mean.
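
A short sketch comparing the coefficient of variation of two hypothetical return series whose means differ widely. `statistics.stdev` is the standard library's sample standard deviation (divisor n - 1).

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)


# Hypothetical return series in percent.
fund_a = [4, 6, 8]      # mean 6, sample standard deviation 2
fund_b = [10, 20, 30]   # mean 20, sample standard deviation 10

cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)
print(cv_a, cv_b)
```

Fund B has the higher mean return, but its coefficient of variation (0.5) exceeds that of fund A (about 0.33), so it carries more dispersion per unit of mean return.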



Measures of Shape of a Distribution


Mean and variance may not adequately describe an investment’s distribution of returns. To reveal other important characteristics of the distribution, we must look beyond measures of central tendency, location, and dispersion. One such characteristic is the degree of symmetry in return distributions.

Types of Distribution:

Tip

Investors prefer positive skewness because a positively skewed return distribution has a higher chance of very large returns and a mean return that lies above the median.

Excess kurtosis = kurtosis – 3

An excess kurtosis with an absolute value greater than 1 is considered significant.
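
The excess kurtosis rule can be sketched in Python. The notes do not give a kurtosis formula, so this example assumes the simple moment ratio (fourth central moment divided by the squared second central moment, both with divisor n). Other estimators apply small-sample corrections and give somewhat different values.

```python
def excess_kurtosis(values):
    """Moment-ratio kurtosis minus 3 (assumed estimator, divisor n)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - x_bar) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


def is_significant(excess):
    """Apply the rule of thumb: absolute excess kurtosis greater than 1."""
    return abs(excess) > 1


# Hypothetical, evenly spread observations (a flat distribution with thin tails).
observations = [1, 2, 3, 4, 5]
excess = excess_kurtosis(observations)
print(round(excess, 2), is_significant(excess))
```

For this flat data set the moment ratio is 6.8 / 4 = 1.7, so the excess kurtosis is about -1.3 and passes the significance rule on the negative side.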

Pasted image 20250907215029.png



Correlation Between Two Variables


Scatter Plot


A scatter plot is a type of graph used to visualize the joint variation in two numerical variables. It is constructed with the x-axis representing one variable and the y-axis representing the other variable.

Dots are drawn to indicate the values of the two variables at different points in time. The pattern of a scatter plot may indicate no relationship, a linear relationship, or a non-linear relationship between the two variables.

In case of a linear relationship,

Covariance and Correlation


Covariance is a measure of how two variables move together. The formula for computing the sample covariance of X and Y is

$$s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

The problem with covariance is that it can range from $-\infty$ to $+\infty$, which makes it difficult to interpret.

Correlation is a standardized measure of the linear relationship between two variables, with values ranging from -1 to +1. The sample correlation coefficient can be calculated as

$$r_{XY} = \frac{s_{XY}}{s_X \times s_Y}$$
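
A minimal sketch of the sample covariance and correlation calculations, using two short hypothetical series. The function names are illustrative, and `statistics.stdev` supplies the sample standard deviations.

```python
from statistics import mean, stdev


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both series need the same number of observations")
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)


def sample_correlation(xs, ys):
    """Sample covariance scaled by the two sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
print(cov_xy)          # 1.5
print(round(r_xy, 4))  # 0.7746
```

The covariance of 1.5 depends on the units of both series, whereas the correlation of about 0.77 is unit-free and always lies between -1 and +1.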

Properties of Correlation


Correlation ranges from -1 to +1.

Limitations of Correlation Analysis


The correlation analysis has certain limitations: