Task 1 – Theory and Understanding 30 pts

Home

Get App

Take Quiz

Create

Explain Data information and knowledge

Data are just values and signals that are stored in our database. By themselves these values dont have a meaning but if we combine them together with some type of schema or with metadata suitable for our values&signals we can say that the data has meaning and therefore should be classified as information instead.Knowledge is what we can get from using the information by for example draw statistics from it to identify trends or other relevant insights the information can provide when used properly.
What does ACID stand for?

ACID is a set of principles used to guarantee the reliability of database transactions.atomicity, consistency, isolation, durability

Atomicity: All or nothing in that when a "chain" of actions are supposed to be preformed either all of these actions are preformed or non are.This makes sure that the database changes as expected or not at all.

Consistency: Changes made must bring the database to a valid stat meaning this is follow all rules set up for the database. By doing this we can make sure that it moves from one valid state to another.

Isolation: This means that transactions preformed on the data should not cause issues for other transactions from for example other users, this could be achieved with locks and helps with consistency and completeness.

Durability: This makes sure data from the database does not disappear and is saved correctly so that is does not vanish even though there might be something like a database crash. Normally this is achieved through backups in a secure storage.
What are "prepared statments" in JDBC? What are they used for?

"The JDBC Statement are used as a Statement object to execute a SQL statement; CallableStatement, and PreparedStatement interfaces define the methods and properties that enable you to send SQL or PL/SQL commands and receive data from your database.They also define methods that help bridge data type differences between Java and SQL data types used in a database."https://www.tutorialspoint.com/jdbc/jdbc-statements.htmPrepared statements can take input parameters and are good for statements containing operations we intend to use may times.
1.4 Explain the concept of a "full table scan"

This searches a table in a sequential order row by row and the columns are checked for the validity of some condition. " Full table scans [2] are usually the slowest method of scanning a table due to the heavy amount of I/O reads required from the disk which consists of multiple seeks as well as costly disk to memory transfers."
Explain what an index is in databases
The first column is the Search key that contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly. * Note: The data may or may not be stored in sorted order.
- It is a data structure technique which is used to quickly locate and access the data in a database.Indexing makes columns faster to query by creating pointers to where data is stored within a database.
- There are two types of databases indexes:
- 1. Clustered
- 2. Non-clustered
The first column is the Search key that contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly. * Note: The data may or may not be stored in sorted order.

The second column is the Data Reference or Pointer which contains a set of pointers holding the address of the disk block where that particular key value can be found.
What is an index seek?
1.5 Difference of well-formedness and validity?

We can add a DTD, this is like a structure that the XML file to be validated must match. We say what elements and attributes must be in the xml file in order for us to validate it. If the XML file has a DD and does not follow that structure it will not be valid.
Describe map reduce
Map shuffle reduce , example count words with 100 documents and 10 machines.
- The data is first mapped to the machines, they split the data as evenly as possible between the machines. Each machine then starts to process the data.
- The 100 docs are split over the machines so that each one take 10 docs each and then count.the result from counts are then called keys and these keys are sorted/classified merged so that the computers can count each classification.
What does the CRUD acronym stad for?

CRUD is an acronym that stands for Create, Read, Update, and Delete.

These are the four most basic operations that can be performed with most traditional database systems and they are the backbone for interacting with any database.

The first letter of CRUD, ‘C’, refers to CREATE aka add, insert.

The second letter of CRUD , ‘R’, refers to SELECT (data retrieval) operation. The word ‘read’ retrieves data or record-set from a listed table(s). SQL uses the SELECT command to retrieve the data.

The third letter of CRUD, ‘U’, refers to Update operation. Using the Update keyword, SQL brings a change to an existing record(s) of the table.

The last letter of the CRUD operation is ‘D’ and it refers to removing a record from a table. SQL uses the SQL DELETE command to delete the record(s) from the table.

https://www.sqlshack.com/crud-operations-in-sql-server/
Replication in DBMS?
example of two models.
- Data Replication is the process of storing data in more than one site or node. It is useful in improving the availability of data. It is simply copying data from a database from one server to another server so that all the users can share the same data without any inconsistency. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.
- '
- Master-slave replication
- Master-master replication
- Master-slave replication
- assumes that you have a single Master server that accepts both reads and writes and one or more read-only Slave servers. Data from the master server are asynchronously replicated to Slave servers.
- Master-master replication
- This type of replication assumes that you have two or more master nodes that can accept both read and write requests. In addition, you can have multiple slave nodes for each of your masters. The replication between master nodes is asynchronous.
https://dzone.com/articles/pros-and-cons-of-mysql-replication-types
Name and describe big V´s
- Volume - large amounts of data
- Velocity - speed of new incoming data
- variety - data comes in many diffrent form and formats
- Veracity - data usually contains data we are actually not intrested in so this has to be cleaned away.
- Volatility - Big data volatility refers to how long is data valid and how long should it be stored. In this world of real time data you need to determine at what point is data no longer relevant to the current analysis.
various Pros and Cons of SQL as well as NoSQL?
SQL Pros:
- It is highly suitable for relational databases.
- Has a predefined schema which is helpful in many cases.
- Normalization can be greatly used here, thus it also helps in removing redundancy and organizing data in a better way.
- Transactions in SQL databases are ACID compliant, thereby guarantees security and stability.
- Follows well-defined standards like ISI and ANSI which are accepted worldwide.
- Code-free.
- Unbeatable speed in retrieving database records with great ease.
- Uses single standardized language i.e SQL across different RDBMS.
- SQL Cons:
- The process of interfacing is complex.
- As SQL is an object, it occupies space.
- Handling Big data is very costly as you will have to increase the hardware for scaling.
- When a table is dropped, the view becomes inactive.
- NoSQL Pros:
- Capable of handling big data.
- As it is schema-less and table free, it offers a high level of flexibility with data models.
- It is a low-cost database and the open source NoSQL databases provide very affordable solutions to small enterprises.
- Easier and low-cost scalability. You don’t need to increase the hardware for scaling. You just need to add more servers to the pool as NoSQL is schema-free and built on distributed systems.
- Detailed database modeling is not required here. Hence it saves time and effort.
- NoSQL Cons:
- The benefits of NoSQL come at the cost of relaxing ACID properties. NoSQL offers only eventual consistency.
- Relatively less community support.
- Lacks standardization, unlike SQL, which in turn creates some issues during migration.
- Inter-operability is also a concern in the case of NoSQL databases.
When to Use NoSQL?
- Given below are the use cases where you should prefer using NoSQL databases:
- To handle a huge volume of structured, semi-structured and unstructured data.
- Where there is a need to follow modern software development practices like Agile Scrum and if you need to deliver prototypes or fast applications.
- If you prefer object-oriented programming.
- If your relational database is not capable enough to scale up to your traffic at an acceptable cost.
- If you want to have an efficient, scale-out architecture in place of an expensive and monolithic architecture.
- If you have local data transactions that need not be very durable.
- If you are going with schema-less data and want to include new fields without any ceremony.
- When your priority is easy scalability and availability.When to Avoid NoSQL?
- Enlisted below are some pointers that would guide you on when to avoid NoSQL.
- If you are required to perform complex and dynamic querying and reporting, then you should avoid using NoSQL as it has a limited query functionality. For such requirements, you should prefer SQL only.
- NoSQL also lacks in the ability to perform dynamic operations. It can’t guarantee ACID properties. In such cases like financial transactions, etc., you may go with SQL databases.
- You should also avoid NoSQL if your application needs run-time flexibility.
- If consistency is a must and if there aren’t going to be any large-scale changes in terms of the data volume, then going with the SQL database is a better option.
One should also keep in mind that NoSQL databases won’t support structured query language. The querying language may vary from one database to another.
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
Structured Data
- Data that is the easiest to search and organize, because it is usually contained in rows and columns and its elements can be mapped into fixed pre-defined fields, is known as structured data. Think about what data you might store in an Excel spreadsheet and you have an example of structured data.
- In structured data, entities can be grouped together to form relations (‘customers’ that are also ‘satisfied with the service). This makes structured data easy to store, analyze and search and until recently was the only data easily usable for businesses. Often used with SQL.
Unstructured Data

A much bigger percentage of all the data is our world is unstructured data. Unstructured data is data that cannot be contained in a row-column database and doesn’t have an associated data model. Think of the text of an email message. The lack of structure made unstructured data more difficult to search, manage and analyse, which is why companies have widely discarded unstructured data, until the recent proliferation of artificial intelligence and machine learning algorithms made it easier to process.

Other examples of unstructured data include photos, video and audio files, text files, social media content, satellite imagery, presentations, PDFs, open-ended survey responses, websites and call center transcripts/recordings.

Instead of spreadsheets or relational databases, unstructured data is usually stored in data lakes, NoSQL databases, applications and data warehouses.

Semi-Structured Data

Beyond structured and unstructured data, there is a third category, which basically is a mix between both of them. The type of data defined as semi-structured data has some defining or consistent characteristics but doesn’t conform to a structure as rigid as is expected with a relational database. Therefore, there are some organizational properties such as semantic tags or metadata to make it easier to organize, but there’s still fluidity in the data.

Email messages are a good example. While the actual content is unstructured, it does contain structured data such as name and email address of sender and recipient, time sent, etc.
What is a database index in a relational database?
What (specific) problem does
it solve?

Provide an example database table, and a query on this table that you would
expect to be improved by an index.

Further, provide an example for a query on the
same table that would not be improved by the index. (5 pts).
It is a data structure technique which is used to quickly locate and access the data in a database.Indexing makes columns faster to query by creating pointers to where data is stored within a database.

Author

ccc

352199

Card Set

Task 1 – Theory and Understanding 30 pts

Description

Task 1 – Theory and Understanding 30 pts

Updated

2020-06-04T21:13:22Z