I admit it - I've a lot to learn when it comes to the internals of big data technologies. Even though I've worked on many data science projects at Goibibo, my knowledge of the principles involved is very limited. So when this project happened, where we needed to build some data pipelines for business needs, I found the perfect opportunity to explore the Hadoop and Hive ecosystem. This blog post is about a new tool that I came across, HPL/SQL.
Years back, the database world was ruled by relational databases, chief amongst them being Oracle. And it had this great language for writing queries - PL/SQL. From the Oracle website,
Big data systems
Years back, the database world was ruled by relational databases, chief amongst them being Oracle. And it had this great language for writing queries - PL/SQL. From the Oracle website,
PL/SQL is a procedural language designed specifically to embrace SQL statements within its syntax. PL/SQL program units are compiled by the Oracle Database server and stored inside the database. And at run-time, both PL/SQL and SQL run within the same server process, bringing optimal efficiency. PL/SQL automatically inherits the robustness, security, and portability of the Oracle Database.However, the explosion in information technology requirements led to a humongous amount of data being available for analysis. This gave rise to two other classes of databases:
Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them.NOSQL database
NoSQL (originally referring to "non SQL") database provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
So depending on the problem at hand, engineers now had a host of RDBMS (Oracle, MySQL, Aurora etc), NOSQL (Cassandra, MongoDB, Redis etc), and Big Data systems (Hadoop) to choose from.
This gave rise to a complex challenge - most data analysis earlier could be powered by the same RDBMS alone - example, businesses could build intelligence using stored procedures and triggers in PL/SQL on Oracle. In the world of multiple databases, engineers now have to learn multiple tools for data collection, analysis and so on - example Spark, Mapreduce, Hive and so on.
Different ecosystems and data analysis tools started coming out with their approaches to solve this problem. For example, connecting to multiple databases, custom SQL that is compatible with a covered set of databases and so on. However, the coverage has not been big so far.
Different ecosystems and data analysis tools started coming out with their approaches to solve this problem. For example, connecting to multiple databases, custom SQL that is compatible with a covered set of databases and so on. However, the coverage has not been big so far.
To solve this complex problem, Hive comes with a language/tool called as HPL/SQL. The promise of HPL/SQL is simple - it is a hybrid and heterogeneous language that understands syntax of most procedural SQL dialect, and can be used with any databases, running existing Oracle PL/SQL code on Apache Hive and Microsoft SQL Server, or running Transact-SQL on Oracle, Cloudera Impala or Amazon Redshift. So, you can use it as a language to write new procedural SQL with Hive, and you can use it as a tool to execute existing PL/SQL and logic that you have written for other databases.
Compared to writing code in Python, Java or Scala, HPL/SQL is much more friendly for the BI/SQL expert to run, and allows them to hit the ground running faster. It offers features like
- Control Flow Statements
- Built-in Functions
- Stored Procedures, Functions and Packages
- Exception and Condition Handling
- On-the-fly SQL Conversion
So, if you are exploring an efficient way to implement ETL processes in Hadoop, give HPL/SQL a try. Since Hive 2.0, HPL/SQL comes built in with Hive. All you need to do is,
When executing commands directly:
hplsql -e "SELECT * FROM src LIMIT 1"
When running from a file:
hplsql -f script.sql
You can go through the documentation to get more details if required.
I want to know if it supports inserting data by partition.
ReplyDeleteinsert into tablename partition(partition_name) from select * from table;
but it is error .
Function:not found partition
wish you see that
Deletei don`t konw why my name is unknown .
ReplyDeletethis is my email : 10372756845aa@gmail.com
The control, stockpiling, and correspondence of advanced data require capable utilization of calculations. Software engineering encourages the use of these calculations.ExcelR Data Science Courses
ReplyDeleteAwesome blog. I enjoyed reading your articles. This is truly a great read for me. Keep up the good work!
ReplyDeletedata analytics course
data science course
business analytics course
Excellent! I love to post a comment that "The content of your post is awesome" Great work!
ReplyDeletebest data analytics courses in mumbai
"Thank you for sharing wonderful information with us.Really useful for everyone data scientist courses
ReplyDelete"
If you are interested in data science, you may want to check out its training in data science course in india
ReplyDeleteI've read this post and if I could I desire to suggest you some interesting things or suggestions. Perhaps you could write next articles referring to this article. I want to read more things about it!
ReplyDeleteData Science courses
Aivivu chuyên vé máy bay, tham khảo
ReplyDeletemua ve may bay tu han quoc ve viet nam
giá vé máy bay vinh hồ chí minh
mua vé máy bay đi hà nội
vé máy bay hải phòng nha trang
vé máy bay đi Mỹ giá rẻ 2021
I have bookmarked your website because this site contains valuable information in it. I am really happy with articles quality and presentation. Thanks a lot for keeping great stuff. I am very much thankful for this site.
ReplyDeletedata scientist training and placement
This is a splendid website! I"m extremely content with the remarks!.
ReplyDeletedata scientist training and placement in hyderabad
Thanks for posting the best information and the blog is very good .digital marketing institute in hyderabad
ReplyDeleteI wanted to leave a little comment to support you and wish you a good continuation. Wishing you the best of luck for all your blogging efforts.
ReplyDeletedata analytics courses in hyderabad with placements
In this article, the author writes about HPL/SQL, a hybrid and heterogeneous language that understands syntax of most procedural SQL dialects and can be used with any databases. I appreciate sharing such an important insights here with the readers. Thank you.
ReplyDeleteIs iim skills fake?
Your writing is very useful and instructive. I frequently read these articles to expand my knowledge and better my comprehension. I appreciate you sharing.
ReplyDeleteData Analytics Courses in Agra
This is a great and intriguing article. I had been seeking this type of information and found it enjoyable to read. Please continue to publish, and thank you for sharing.
ReplyDeletedaa Analytics courses in leeds
The blog post on HPL/SQL is very informative and insightful. Thanks for sharing valuable information.
ReplyDeleteDigital Marketing Courses in Italy
Thank you for sharing in depth knowledge and insights on Procedural SQL on Hadoop, NoSQL and RDBMS.
ReplyDeleteAdwords marketing
such an informative blog post. great work
ReplyDeleteDigital marketing business
really well written article, great work on the blog post
ReplyDeleteGST Certification Course