1. What is FinTech?
    An industry composed of companies that use new technology and innovation to compete in the marketplace of traditional financial institutions and intermediaries.
  2. 6 core functions of the financial system (article: “Form Follows Function” by Crane and Bodie)
    • Clearing and settling payments - Blockchain, PayPal, Klarna
    • Pooling of resources and divisibility (ex. shares) - Crowdfunding
    • Transfer of economic resources across time, borders and industries - Robo-advising, Bitcoin
    • Risk management - ETFs, robo-advising
    • Price information (aids decentralized decision making) - credit score models based on ML (groceries)
    • Dealing with incentive problems (ex ante and ex post asymmetric info) - Smart contracts
  3. Why is fintech happening now?
    • The financial system is inefficient (unit cost of financial intermediation constant at 2% for 130 years!)
    • Humans have a terrible track record in asset management
    • Computing power
    • Explosion of useful and available data (Big data)
  4. Strategies for incumbent banks and other companies in relation to fintech?
    • Do nothing/wait
    • Acquisition of fintech start-ups
    • Convert to fintech
    • Partner with fintech
  5. What is an ERD?
    = “Entity Relationship Diagram” → describes the relationships between different tables
  6. What is a key in an ERD?
    • Minimal subset of attributes that acts as a unique identifier for tuples in a relation (if two tuples agree on the values of the key, then they must be the same tuple!)
    • Primary key (within a table)
    • Foreign Key (between tables)
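
A minimal sketch of how the two kinds of key behave, using Python's built-in sqlite3 module. The tables (customer, account) and their columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE account ("
    " account_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
    " balance REAL NOT NULL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO account VALUES (10, 1, 250.0)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Reusing a primary key value is rejected: the key must identify exactly one tuple.
duplicate_ok = try_insert("INSERT INTO customer VALUES (1, 'Bob')")
# A foreign key pointing at a customer that does not exist is rejected too.
orphan_ok = try_insert("INSERT INTO account VALUES (11, 99, 5.0)")
print(duplicate_ok, orphan_ok)
```

The primary key constrains rows within one table, while the foreign key constrains how rows in one table may refer to rows in another.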
  7. In which order are the SQL statements evaluated?
    • SELECT (5)
    • FROM (1)
    • WHERE (2)
    • GROUP BY (3)
    • HAVING (4)
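
The order can be checked on a toy example. This is a sketch using Python's built-in sqlite3 module; the payments table and its values are made up, and the ORDER BY clause is not part of the card (it is added only to make the output order stable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customer TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("A", 50.0), ("A", 70.0), ("B", 200.0), ("B", -30.0), ("C", 20.0)],
)

query = """
    SELECT customer, SUM(amount) AS total  -- (5) computed last, per surviving group
    FROM payments                          -- (1) choose the source table
    WHERE amount > 0                       -- (2) drop individual rows first
    GROUP BY customer                      -- (3) collapse remaining rows into groups
    HAVING SUM(amount) > 100               -- (4) drop whole groups
    ORDER BY customer                      -- not on the card; stable output only
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('A', 120.0), ('B', 200.0)]
```

Because WHERE runs before GROUP BY, the -30 payment is removed before the sum is taken, so B totals 200 rather than 170. Customer C is removed later, at the HAVING step, because its group total is only 20.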
  8. “What is Machine Learning?”
    Machine learning is the study of algorithms that:

    Improve their performance P, at some task T, with experience E.

    The goal is for the algorithm to generalize from its experience to new, unseen data. The underlying probability distribution of the data is unknown.
  9. “What is statistical learning?”
    Statistical learning theory deals with the problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, bioinformatics and baseball.
  10. “What is Supervised learning?”
    Supervised learning algorithms model the relationships and dependencies between the input features and a labeled target output, so that the output can be predicted for new data based on the relationships learned from previous data.
  11. What is Unsupervised learning?
    The computer is trained with unlabeled data. Only the inputs/predictors, Xi, are observed.

    These algorithms are particularly useful in cases where the human expert doesn’t know what to look for in the data (market segmentation, topic modeling). By contrast, in the supervised learning case we give the computer both the variables and the labels, and it picks the best model given these inputs.
  12. “What is clustering (in unsupervised learning context)”?
    Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
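
A minimal sketch of clustering using k-means (Lloyd's algorithm) in one dimension. The card does not name an algorithm, so k-means is chosen here only as one common example; the spending values and the starting centers are made up.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm in one dimension, starting from the given centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


spend = [12, 15, 14, 90, 95, 88, 13, 92]  # hypothetical monthly spend per customer
centers, clusters = kmeans_1d(spend, centers=[0.0, 100.0])
print(centers)   # [13.5, 91.25]
print(clusters)  # [[12, 15, 14, 13], [90, 95, 88, 92]]
```

No labels are supplied: the two groups (low and high spenders) emerge only from how similar the values are to each other.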
  13. “What is topic modeling?”
    In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
  14. “How do we produce a model in machine learning in the supervised case?”
    • Assuming we have an observed set of training data, we run a statistical method to estimate the model. We can use either
    • 1. Parametric methods
    • 2. Non-parametric methods
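
To make the distinction concrete, here is a small sketch contrasting one parametric method (a least-squares line) with one non-parametric method (k-nearest-neighbours regression). Both are picked only as familiar examples, and the toy data are made up.

```python
def fit_line(xs, ys):
    """Parametric: assume y = a + b*x and estimate the two parameters by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(xs, ys, x_new, k=2):
    """Non-parametric: no functional form assumed; average the k nearest training targets."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]  # toy data lying exactly on y = 2x

a, b = fit_line(xs, ys)
line_pred = a + b * 6.0                   # 12.0: the assumed form extrapolates
knn_pred = knn_predict(xs, ys, 6.0, k=2)  # 9.0: average of the targets at x=5 and x=4
print(line_pred, knn_pred)
```

The parametric model reduces estimation to two numbers (a and b), while the non-parametric one keeps the whole training set and lets the data determine the shape of the prediction.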
  15. What is a regression problem?
    Regression problems (in machine learning): the goal is to predict continuous or whole-number values, e.g. home prices.
  16. What is a classification problem?
    Classification problems: the goal is to predict discrete values, e.g. {1,0}, {True, False}, {spam, not spam}. These values/factors/classes/categories are sometimes coded numerically but are treated as unordered. We cannot order tea and coffee in a continuous way.
  17. Limitations of machine learning/statistical learning
    ML algorithms can be very good at prediction, but they are not really built for estimating parameters or for inference. This means that it is hard to draw conclusions about which variable had a certain impact on the prediction and why/why not. The models are often complex and very hard to interpret (black box).
  18. Describe the machine learning process (8 steps)
    • 1. Raw data
    • 2. Feature engineering (clean, new variables etc.)
    • 3. Split into training and test
    • 4. Build model on training data
    • 5. Evaluate on test set
    • 6. Tune hyperparameters
    • 7. Repeat steps 4-6
    • 8. Use the chosen model
  19. The Bias-Variance Tradeoff
    More complicated (more flexible) models → bias goes down but variance goes up

    • Trade-off between flexibility and interpretability
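
The trade-off is usually stated through the standard decomposition of the expected test error at a point $x_0$. The notation is an assumption rather than something taken from the card: $y = f(x) + \varepsilon$ with zero-mean noise $\varepsilon$, and $\hat{f}$ is the fitted model.

```latex
\[
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \operatorname{Var}\big(\hat{f}(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
\]
```

A more flexible model typically lowers the squared-bias term and raises the variance term; the last term is irreducible noise that no model can remove.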
  20. “What are resampling methods?”
    • Tools that involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain more information about the fitted model
    • Model Assessment: estimate test error rates
    • Model Selection: select the appropriate level of model flexibility

    • Example of resampling methods:
    • Cross Validation (CV): randomly splitting the data into training and validation (testing) parts
    • We then use the training part to build each possible model (i.e. the different combinations of variables) and choose the model that gave the lowest error rate when applied to the validation data
    • Bootstrapping
    • Obtain distinct data sets by repeatedly sampling observations from the original data set with replacement. Each of these “bootstrap data sets” is created by sampling with replacement, and is the same size as our original data set. As a result some observations may appear more than once in a given bootstrap data set and some not at all
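
Both resampling methods can be sketched with the standard library. The data are made up, the "model" is deliberately trivial (predict the training mean), and the cross-validation shown is the k-fold variant, a common extension of the single random split described above.

```python
import random
import statistics

random.seed(1)
data = [random.gauss(50, 10) for _ in range(100)]  # made-up sample


def k_fold_indices(n, k):
    """Randomly partition the indices 0..n-1 into k validation folds."""
    idx = list(range(n))
    random.shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Cross-validation: fit on the training part, measure error on the held-out part.
fold_errors = []
for fold in k_fold_indices(len(data), k=5):
    held_out = set(fold)
    train = [data[i] for i in range(len(data)) if i not in held_out]
    prediction = statistics.mean(train)  # the trivial "model"
    fold_errors.append(statistics.mean((data[i] - prediction) ** 2 for i in fold))
cv_mse = statistics.mean(fold_errors)  # estimate of the test error

# Bootstrap: resample with replacement, same size as the original, refit each time.
boot_means = []
for _ in range(1000):
    sample = random.choices(data, k=len(data))  # sampling with replacement
    boot_means.append(statistics.mean(sample))
se_of_mean = statistics.stdev(boot_means)  # bootstrap standard error of the mean

print(round(cv_mse, 1), round(se_of_mean, 2))
```

Cross-validation is used here for model assessment (an error estimate), while the bootstrap is used to gauge how much a fitted quantity would vary across samples.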