
What is FinTech?
An industry composed of companies that use new technology and innovation to compete with traditional financial institutions and intermediaries in the financial marketplace.

6 core functions of the financial system (article: “Form Follows Function” by Crane and Bodie)
 * Clearing and settling payments → Blockchain, PayPal, Klarna
 * Pooling of resources and divisibility (e.g. shares) → Crowdfunding
 * Transferring economic resources across time, borders and industries → Robo-advising, Bitcoin
 * Risk management → ETFs, robo-advising
 * Price information (aids decentralized decision making) → credit-score models based on ML (e.g. grocery-purchase data)
 * Dealing with incentive problems (ex ante and ex post asymmetric information) → Smart contracts

Why is fintech happening now?
 * The financial system is inefficient (the unit cost of financial intermediation has stayed constant at ~2% for 130 years!)
 * Humans have a terrible track record in asset management
 * Computing power
 * Explosion of useful and available data (Big data)

Strategies for incumbent banks and other companies in relation to fintech?
 * Do nothing/wait
 * Acquire fintech startups
 * Convert to a fintech
 * Partner with fintechs

What is an ERD?
= “Entity Relationship Diagram” → describes the relationships between the different tables in a database

What is a key in an ERD?
 * Minimal subset of attributes that acts as a unique identifier for the tuples in a relation (if two tuples agree on the values of the key, they must be the same tuple!)
 * Primary key (unique within a table)
 * Foreign key (references a key in another table)
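A minimal sketch of the two key types using Python's built-in sqlite3 (the table and column names are made up for illustration): the primary key uniquely identifies rows within a table, and the foreign key links rows to another table.

```python
import sqlite3

# In-memory database; schema is purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- primary key: unique within this table
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE accounts (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        balance     REAL NOT NULL,
        -- foreign key: ties each account row to a row in customers
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO accounts VALUES (10, 1, 500.0)")

# An account pointing at a non-existent customer violates the foreign key:
try:
    conn.execute("INSERT INTO accounts VALUES (11, 99, 0.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note that the rejected insert never reaches the table, which is exactly the integrity guarantee the foreign key provides.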

In which order are the SQL statements evaluated?
 1. FROM
 2. WHERE
 3. GROUP BY
 4. HAVING
 5. SELECT
(The statement is written SELECT-first but evaluated FROM-first.)
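A small sketch with Python's built-in sqlite3 (the table and data are made up) showing why this order matters: WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("ann", 50.0), ("ann", 70.0), ("bob", 20.0), ("bob", 5.0), ("cat", 200.0)],
)

# Logical evaluation order:
#   1. FROM payments            -- pick the source table
#   2. WHERE amount > 10        -- filter rows (bob's 5.0 is dropped here)
#   3. GROUP BY customer        -- form one group per customer
#   4. HAVING SUM(amount) > 100 -- filter groups by an aggregate
#   5. SELECT ...               -- finally project the output columns
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM payments
    WHERE amount > 10
    GROUP BY customer
    HAVING SUM(amount) > 100
""").fetchall()
print(rows)  # ann (120) and cat (200) pass; bob (20 after WHERE) fails HAVING
```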

“What is Machine Learning?”
Machine learning is the study of algorithms that:
Improve their performance P, at some task T, with experience E.
We want the algorithm to generalize from its experience; the underlying probability distribution is unknown.

“What is statistical learning?”
Statistical learning theory deals with the problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, bioinformatics and baseball.

“What is Supervised learning?”
Supervised learning algorithms model the relationships and dependencies between the target prediction output and the input features, so that we can predict the output values for new data based on the relationships learned from previous data sets.

What is Unsupervised learning?
The computer is trained with unlabeled data. Only the inputs/predictors, Xi, are observed.
These algorithms are particularly useful when the human expert doesn’t know what to look for in the data (e.g. market segmentation, topic modeling). In the supervised case, by contrast, we give the computer both the variables and the labels, and it picks the best model given these inputs.

“What is clustering (in unsupervised learning context)”?
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
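A minimal k-means sketch in pure Python (the toy 2-D points are made up) illustrating this idea of grouping by similarity: points are repeatedly assigned to their nearest centroid, and each centroid moves to the mean of its assigned points.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point
        labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids

# Two visually obvious groups: one near (0,0), one near (10,10)
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels, cents = kmeans(pts, k=2)
print(labels)
```

The three points near the origin end up sharing one label and the three points near (10, 10) the other, with no labels supplied up front.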

“What is topic modeling?”
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.

“How do we produce a model in machine learning in the supervised case?”
 Assuming we have an observed set of training data, we run a statistical method to estimate the model. We can use either
 1. Parametric methods
 2. Nonparametric methods
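A rough contrast of the two approaches on invented toy data: the parametric method assumes a fixed functional form (a line, so only two parameters to estimate), while the nonparametric method (here k-nearest neighbours) assumes no form and simply averages nearby observations.

```python
def fit_linear(xs, ys):
    """Parametric: assume y ≈ a + b*x and estimate just two parameters
    by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def knn_predict(xs, ys, x, k=3):
    """Nonparametric: no fixed functional form; average the y-values
    of the k training points nearest to x."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]          # data generated exactly by y = 2x

lin = fit_linear(xs, ys)
print(lin(6))                  # the fitted line extrapolates: 12.0
print(knn_predict(xs, ys, 6))  # k-NN averages the 3 nearest ys: (10+8+6)/3 = 8.0
```

The difference at x = 6 shows the tradeoff: the parametric model extrapolates confidently from its assumed form, while the nonparametric model can only interpolate from the data it has seen.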

What is a regression problem?
Regression problems (in machine learning): “Regression: the goal is to predict continuous (quantitative) values, e.g. home prices.”

What is a classification problem?
Classification problems: “Classification: the goal is to predict discrete values, e.g. {1, 0}, {True, False}, {spam, not spam}.” These values/factors/classes/categories are sometimes coded numerically but are always unordered: we cannot order {tea, coffee} on a continuous scale.

Limitations of machine learning/statistical learning
ML algorithms can be really good at prediction, but they are not really built for parameter estimation or inference. This means it is hard to draw conclusions about which variable had a certain impact on the prediction and why/why not. The models are often complex and very hard to interpret (black box).

Describe the machine learning process (8 steps)
 1. Raw data
 2. Feature engineering (cleaning, new variables, etc.)
 3. Split into training and test sets
 4. Build the model on the training data
 5. Evaluate on the test set
 6. Tune hyperparameters
 7. Repeat steps 4–6
 8. Use the chosen model
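The steps above can be sketched end-to-end in pure Python; the toy data, the k-NN classifier and the hyperparameter grid below are all made up for illustration.

```python
import random

def knn_classify(train, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)  # majority vote for binary {0, 1} labels

def accuracy(train, test, k):
    hits = sum(knn_classify(train, x, k) == y for x, y in test)
    return hits / len(test)

# 1-2. "Raw data" + feature engineering (here: already-clean toy data, label = x > 5)
data = [(x, int(x > 5)) for x in range(20)]

# 3. Split into training and test sets
rng = random.Random(0)
rng.shuffle(data)
train, test = data[:15], data[15:]

# 4-7. Build the model for each hyperparameter value, evaluate, keep the best
scores = {k: accuracy(train, test, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

# 8. Use the chosen model
print(best_k, scores[best_k])
```

In practice the tuning loop would use a separate validation set (or cross-validation, below) rather than re-using the test set, so the final test error stays an honest estimate.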

The Bias–Variance Tradeoff
 * More complicated (flexible) models → bias goes down but variance goes up
 * There is also a tradeoff between flexibility and interpretability


“What are resampling methods?”
 * Tools that involve repeatedly drawing samples from a training set and refitting a model of interest on each sample, in order to obtain more information about the fitted model
 * Model assessment: estimate test error rates
 * Model selection: select the appropriate level of model flexibility
Examples of resampling methods:
 * Cross-validation (CV): randomly splitting the data into training and validation (testing) parts. We then use the training part to build each possible model (i.e. the different combinations of variables) and choose the model that gives the lowest error rate when applied to the validation data.
 * Bootstrapping: obtain distinct data sets by repeatedly sampling observations from the original data set with replacement. Each of these “bootstrap data sets” is the same size as the original data set, so some observations may appear more than once in a given bootstrap data set and some not at all.
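Both schemes can be sketched in a few lines of pure Python; the fold count, seed and toy data below are arbitrary choices for illustration.

```python
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k roughly equal folds; each fold serves once
    as the validation part while the remaining folds form the training part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

def bootstrap_sample(data, seed=0):
    """One bootstrap data set: sample with replacement, same size as the original."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]

data = list(range(10))

# Cross-validation: every observation lands in exactly one validation fold
for train, val in kfold_indices(len(data), k=5):
    assert set(train) | set(val) == set(range(10))

# Bootstrap: same size as the original, but observations can repeat or be absent
boot = bootstrap_sample(data)
print(len(boot), len(set(boot)))  # 10 drawn; usually fewer than 10 distinct
```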

