When MySQL is Slow: Replacing MySQL with PrestoDB 0.246
email: stephen <at> gigashock *dot* com
Intro
I found it fairly straightforward to move data from MySQL to PrestoDB. PrestoDB is excellent for read-only queries with large datasets. PrestoDB differs from Starburst Presto. If you're using Starburst Presto, see this post.
Why in the world do I need PrestoDB?
The truth is: I probably don’t need to use a query engine this powerful. PrestoDB is for clients that have terabytes of data with billions of rows distributed across hundreds of servers.
This project is quite small in comparison; the database I'm using is only ~11 gigabytes with ~18 million rows and is stored on one server.
The dateset is, however, large enough to make my out-of-the-box MySQL database choke under the pressure of a "LIKE '%value%'" query. Running these queries in MySQL takes about 3x longer compared to PrestoDB and fully consumes the capacity of my server's hardware in the meantime.
PrestoDB quickly handles the same "LIKE '%value%'" query out-of-the-box and with ease, on the same hardware I am using with MySQL.
Tutorial
Minio Python3 s3cmd-2.0.1 openjdk-11-jre-headless
$ sudo apt update
Install Python3:
$ sudo apt install python3
$ sudo apt-get install python-dateutil
$ sudo ln -s /usr/bin/python3 /usr/bin/python
I'm using vim as a text editor:
$ sudo apt-get install vim
Install screen to make things easier:
$ sudo apt-get install screen
Install Java:
$ sudo apt install openjdk-11-jre-headless
$ mkdir ~/Desktop/source
$ cd ~/Desktop/source
Download and Install Minio
$ wget https://dl.min.io/server/minio/release/linux-amd64/archive/minio_0.0.20210116021944_amd64.deb
$ sudo dpkg -i minio_0.0.20210116021944_amd64.deb
$ sudo mkdir /mnt/data
$ sudo chown -R user /mnt/data && sudo chmod u+rxw /mnt/data
$ vim ~/.bashrc
Add the following lines:
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
$ source ~/.bashrc
$ screen
Press ENTER
$ minio server /mnt/data
Detach your screen (ctrl + a + d)
$ wget https://sourceforge.net/projects/s3tools/files/s3cmd/2.0.1/s3cmd-2.0.1.tar.gz
$ tar xzf s3cmd-2.0.1.tar.gz
$ touch ~/.s3cfg
$ vim ~/.s3cfg
# Setup endpoint host_base = localhost:9000 host_bucket = localhost:9000 bucket_location = us-east-1 use_https = False # access keys access_key = minioadmin secret_key = minioadmin # enable s3 signature apis signature_v2 = False
$ cd s3cmd-2.0.1
$ ./s3cmd mb s3://data
$ ./s3cmd ls s3://
$ cd ~/Desktop/source/
$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.246/presto-server-0.246.tar.gz
$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar
$ sudo chmod +x presto-cli-0.246-executable.jar
$ tar xzf presto-server-0.246.tar.gz
$ mv presto-server-0.246 presto
$ cd presto
$ mkdir etc
$ cd etc
$ mkdir catalog
$ touch config.properties
$ vim config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=3GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
$ touch jvm.config
$ vim jvm.config
-server
-Xmx5G
-XX:+UseG1GC
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-Djdk.attach.allowAttachSelf=true
$ touch node.properties
$ vim node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/mnt/data
$ cd catalog
$ touch minio.properties
$ vim minio.properties
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://data/
hive.s3.aws-access-key=minioadmin
hive.s3.aws-secret-key=minioadmin
hive.s3.endpoint=http://localhost:9000
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.allow-drop-table=true
Use your MySQL credentials here.
$ touch mysql.properties
$ vim mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://127.0.0.1/?useSSL=false&allowPublicKeyRetrieval=true
connection-user=stephen
connection-password=pa55w0rd!awaeqewqeqweqwAAA
$ screen
$ ~/Desktop/source/presto/bin/launcher run
$ cd ~/Desktop/source
$ ./presto-cli-0.246-executable.jar
presto> create schema minio.books;
presto> use minio.books;
Import your MySQL table into Presto:
presto:books> create table minio.books.list as select * from mysql.books.book;
presto:books> select count(*) from list;
Benchmarks
PrestoDB is 2.5x faster than MySQL in this test.
presto> select count(*) from book where BINARY title like '%Red%';
presto> select count(*) from list where title like '%Red%';
Possible Errors
presto> create table minio.books.author as select * from mysql.books.author;
Query 20210204_070906_00000_7d4wf failed: Unsupported Hive type: varchar(16777215). Supported VARCHAR types: VARCHAR(<=65535), VARCHAR.
Error:
2021-02-03T01:48:57.973+0900 ERROR main com.facebook.presto.server.PrestoServer No factory for connector mysql java.lang.IllegalArgumentException: No factory for connector mysql at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
Exception in thread "main" java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object)'
$ mv $HIVE_HOME/lib/guava-19.0.jar $HIVE_HOME
/lib/guava-19.0.jar.bak
$ cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/