Hello geeks, welcome back to another article on the Python PyMy library. In this article, we will learn why we use this library and how to install it, and at the end we will go through some examples to understand it better.
PyMy Module
So, the PyMy module gives us a programming interface to interact with Redshift. It offers Python users an extensive way to work with Redshift from within the Python programming language: it builds the connection between the user and Redshift through API calls, and it also provides functionality for creating tables and for sending and receiving data from the cloud warehouse. Now that we understand what the PyMy library is and why it is used, it's time to learn how to use it. But before that, let's look at its installation.
Installation
To install it, run the following command in the CLI:
pip install pymy
How to use the pymy Library?
Step 1: Setting up Environment Variables
Once the installation is complete, let's set the environment variables with the Redshift credentials as described below, where {INSTANCE} is a placeholder for the name of your instance.
export MYSQL_{INSTANCE}_DATABASE=""
export MYSQL_{INSTANCE}_USERNAME=""
export MYSQL_{INSTANCE}_HOST=""
export MYSQL_{INSTANCE}_PORT=""
export MYSQL_{INSTANCE}_PASSWORD=""
[ Note: Make sure that your IP address is authorized to access the MySQL cluster/instance. ]
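If you prefer to set these credentials from Python for a quick local test rather than exporting them in the shell, a minimal sketch could look like the following. The instance name ANALYTICS and the values are only hypothetical examples; the variable names simply mirror the exports above.
# a minimal sketch: setting the credentials from Python instead of the shell
# "ANALYTICS" is a hypothetical instance name; replace it and the values with your own
import os

instance = "ANALYTICS"
os.environ["MYSQL_" + instance + "_DATABASE"] = "my_database"
os.environ["MYSQL_" + instance + "_USERNAME"] = "my_username"
os.environ["MYSQL_" + instance + "_HOST"] = "my_host"
os.environ["MYSQL_" + instance + "_PORT"] = "5439"  # e.g. the default Redshift port
os.environ["MYSQL_" + instance + "_PASSWORD"] = "my_password"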
Step 2: Creating Schema and Table
After setting up the credentials, let’s see the syntax for creating a table.
pymy.create_table({INSTANCE}, data, primary_key=(<column_name>), types=None)
Now, create_table() has several parameters that control the schema of the table. Let's understand them; a usage sketch follows the list below.
- Instance: This parameter identifies the set of environment variables to use, so it must match the {INSTANCE} name used in the environment variables we set earlier.
- Data: It contains the whole information about the table we are creating. This information must be structured in a specific way so that the data can be parsed and sent to the cloud successfully. The parameter consists of the following information:
- Name of Redshift Schema
- Name of Redshift Table
- Column Names
- Row / Column Values
- Primary Key: This parameter accepts the column name that we want to set as the primary key.
- Types: This parameter sets the datatype of each column. If we don't define it explicitly, create_table() detects the datatypes automatically. To set them manually, we pass a dictionary whose keys are the column names and whose values are the datatypes.
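To make these parameters concrete, here is a small sketch of a create_table() call with an explicit primary key and manually declared column types, under the assumptions described above. The datatype names are illustrative, and {INSTANCE} is the usual placeholder for the instance name.
# sketch: creating a table with an explicit primary key and column types
import pymy

data = {
    "table_name": 'animal.dog',
    "columns_name": ['name', 'size'],
    "rows": [['Pif', 'big'], ['Milou', 'small']]
}

# 'name' is set as the primary key, and the column datatypes are declared
# manually through a dictionary mapping column names to datatypes (illustrative values)
pymy.create_table({INSTANCE}, data,
                  primary_key=('name'),
                  types={'name': 'VARCHAR(50)', 'size': 'VARCHAR(10)'})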
Step 3: Sending data to Redshift
Once we have created the table, let’s see how we can send data to the table. To do that, we will use the following syntax:
pymy.send_to_redshift(instance, data, replace=True, batch_size=1000, types=None, primary_key=(), create_boolean=False)
- Replace: This parameter tells whether to replace the contents of the table or append the data to the existing table.
- Batch Size: This parameter allows us to send the data in batches. We can set the batch size as per our convenience; otherwise, we can simply leave the default. A sketch using these two parameters follows below.
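As a quick sketch of these two parameters, appending data in batches rather than replacing the table could look like this. The row values here are illustrative, and {INSTANCE} is again the placeholder for the instance name.
# sketch: appending rows in batches instead of replacing the table
import pymy

# illustrative rows to append to an existing table
data = {
    "table_name": 'animal.dog',
    "columns_name": ['name', 'size'],
    "rows": [['Rex', 'big'], ['Idefix', 'small']]
}

# replace=False appends to the existing table instead of replacing it,
# and batch_size=500 sends the rows in batches of 500
pymy.send_to_redshift({INSTANCE}, data, replace=False, batch_size=500)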
Now that we understand how to create a table and send data, let's go through a couple of examples to see the workflow as a whole.
Example 1: Creating a Schema and Table
# importing the pymy library
import pymy

# creating a schema and table with some values
# (the schema is 'animal' and the table is 'dog')
data = {
    "table_name": 'animal.dog',
    "columns_name": ['name', 'size'],
    "rows": [['Pif', 'big'], ['Milou', 'small']]
}

# {INSTANCE} is the placeholder for the instance name used in the environment variables
pymy.create_table({INSTANCE}, data, primary_key=(), types=None)
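With this call, the table dog is created inside the animal schema with the columns name and size. Since primary_key is left empty and types is None, the column datatypes are detected automatically, as described in Step 2.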
Example 2: Sending Data to the Database
# importing the pymy library
import pymy

# the same schema and table as in Example 1
data = {
    "table_name": 'animal.dog',
    "columns_name": ['name', 'size'],
    "rows": [['Pif', 'big'], ['Milou', 'small']]
}

# sending the data to the animal.dog table
pymy.send_to_redshift({INSTANCE}, data)
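Note that replace defaults to True in the signature shown earlier, so this call replaces the contents of animal.dog with the rows in data; pass replace=False if you want to append to the existing table instead.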
Conclusion
So, in this article we have seen how to use the PyMy library, which provides a programming interface to interact with Redshift. We looked at the functions the library offers for operations such as creating a table and sending data to the schema we created.
I hope this article has helped you. Thank You.