
Apache Spark 2.0 - CSV to Oracle Table

A quick exercise: load a CSV file with a header row into an Oracle table using Spark. The spark session used below is the one created automatically when launching spark-shell.

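// Credentials passed to the Oracle JDBC driver
val prop = new java.util.Properties
prop.setProperty("user", "username")
prop.setProperty("password", "password")

// Read the CSV: first line is the header, fields are quoted with ' and escaped with ^
val vd = spark.read.option("header", "true").option("quote", "'").option("escape", "^").csv("file:///C:/tmp/test.csv")
// "overwrite" replaces the target table if it already exists; host:port/name addresses a service name
vd.write.mode("overwrite").jdbc("jdbc:oracle:thin:@host:port/service_name", "table_name", prop)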
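The connection URL is the part most likely to need adjusting. The Oracle thin driver accepts a service-name form (host:port/service_name, usually written with a leading //) and a SID form (host:port:SID). Below is a minimal sketch of a helper that builds both, plus the connection properties. The object name OracleJdbc and its method names are hypothetical, made up for illustration; the "driver" property assumes the Oracle JDBC jar is already on the spark-shell classpath (for example via --jars).

```scala
import java.util.Properties

// Hypothetical helper, not part of Spark or the Oracle driver.
object OracleJdbc {
  // Service-name form: jdbc:oracle:thin:@//host:port/service_name
  def serviceUrl(host: String, port: Int, service: String): String =
    s"jdbc:oracle:thin:@//$host:$port/$service"

  // SID form: jdbc:oracle:thin:@host:port:SID
  def sidUrl(host: String, port: Int, sid: String): String =
    s"jdbc:oracle:thin:@$host:$port:$sid"

  // Connection properties for DataFrameWriter.jdbc(url, table, properties)
  def props(user: String, password: String): Properties = {
    val p = new Properties()
    p.setProperty("user", user)
    p.setProperty("password", password)
    // Names the JDBC driver class explicitly (assumes the Oracle jar is available)
    p.setProperty("driver", "oracle.jdbc.OracleDriver")
    p
  }
}
```

With this pasted into spark-shell, the write could look like vd.write.mode("overwrite").jdbc(OracleJdbc.serviceUrl("dbhost", 1521, "service_name"), "table_name", OracleJdbc.props("username", "password")), where dbhost, 1521 and service_name are placeholders for your own environment.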
