Interview Questions About Data & Data Warehouse & SQL
Hello after a long time :)
As long as I work and in the light of the knowledge I have acquired, I tried to compile sample questions that may be encountered by those who want to pursue a career in this field during job interviews.
QUESTIONS
Today, many Data Warehouse projects can fail. What could be the root causes of these failures?
- Needs analysis is not done or not done enough,
- Lack of experienced resources,
- Lack of competent data warehouse architects,
- No competent ETL developer,
- Lack of competent BI developer,
- Lack of competent BI analysts,
- Not knowing the competencies and “best practices” of the database or databases used,
the ETL product, etc., No specific report inventory
- Report-based improvements for fast printouts,
- Business units have not clarified their needs on their side,
- The source analysis has not been done or is done insufficiently,
- The relationships of the tables — this relationship may be a PK-PK relationship or a logical relationship — are not sufficiently determined,
- Lack of documentation for source systems,
- Deficiencies in the analysis of business processes in different systems,
- Insufficient data quality, ETL architecture is not planned,
- Not enough time to devote to ETL development,
- Developing ETL jobs in the production environment, not in the test environment, Commissioning without performance tests,
- Creating models in which tables in different layers — for example, ODS or STG — are used directly in reports and connected with tables in the DM layer,
- The data in the sources are not standardized, to be used in processes as it is without increasing data quality,
- Adding DWH to the beginning or end of the source table and calling it DWH table,
- Creating separate reports for each interval instead of the time dimension structure for the calculations made for different time windows,
- Not knowing Fact and Dim structures, looking at every figure as fact, every verbal data as dim,
- Keeping unnecessary columns on the table, saying that it is necessary for the future,
- Deleting the data deleted from the source from the DWH environment with the initial transfer,
- Keeping personal or critical information in random tables without taking into account legal regulations such as KVKK / GDPR,
- Using fields such as TCKN, VKN as keys,
- Considering the data warehouse as a calculation center, performing the necessary operations on other systems at DWH, and then transferring them back to the relevant system,
-Managers who do not have knowledge of data management concepts underestimate this whole process and think as if it is a simple task.
When working in the ETL field, how can we combine data from different sources?
Cannot be combined without transform; however, a separate server may not be required for this job. Transform job is shared between Extract and Load. I understand that the transform is transferred to the load stage, as a result, the information received from the source can be processed in some way while being loaded to the target.
What would be your solution as a result of deleting the data deleted from the source from the DWH environment?
Backup can be made. We can transfer all information coming from a different source under a different schema and do DWH studies under a different schema.
Is it more advantageous to write code for ETL or use a program designed for this job?
You may think that the answer is obvious from the arrival of the problem, but the situation is not quite as expected. Both have advantages over the location. The important thing is to reveal these advantages and disadvantages if any.
What are the differences between the “delete” and “truncate” commands?
The main difference is that the delete command deletes one row from the table, while truncate deletes them all. Another important difference is that after using delete, it is used to undo when a problem occurs. Such a thing cannot be done when using the truncate command. Also, truncate is pretty fast.
What is “Denormalization”? Can you explain briefly?
Of course, by the way, it is a very broad subject. Available forms. We can express it briefly as follows. Denormalization is the process by which we access the database from parent forms to subforms.
Can it be said that NULL values are equal to zero?
No, it cannot be said. “Zero” is a certain number format. NULL represents the absence of a character due to its unknown or absent character.
How to create an empty table over the existing one?
Using the following sample command:
Select * into employeecopy from employee where 1 = 2
I tried to answer the questions as best I could. I hope that will be useful …
Best regards.