When MySQL is Slow: Replacing MySQL with PrestoDB 0.246
email: stephen <at> gigashock *dot* com
Intro
I found it fairly straightforward to move data from MySQL to PrestoDB. PrestoDB is excellent for read-only queries with large datasets. PrestoDB differs from Starburst Presto. If you're using Starburst Presto, see this post.
Why in the world do I need PrestoDB?
The truth is: I probably don’t need to use a query engine this powerful. PrestoDB is for clients that have terabytes of data with billions of rows distributed across hundreds of servers.
This project is quite small in comparison; the database I'm using is only ~11 gigabytes with ~18 million rows and is stored on one server.
The dateset is, however, large enough to make my out-of-the-box MySQL database choke under the pressure of a "LIKE '%value%'" query. Running these queries in MySQL takes about 3x longer compared to PrestoDB and fully consumes the capacity of my server's hardware in the meantime.
PrestoDB quickly handles the same "LIKE '%value%'" query out-of-the-box and with ease, on the same hardware I am using with MySQL.
Tutorial
Minio Python3 s3cmd-2.0.1 openjdk-11-jre-headless
$ sudo apt update Install Python3:
$ sudo apt install python3$ sudo apt-get install python-dateutil$ sudo ln -s /usr/bin/python3 /usr/bin/python
I'm using vim as a text editor:
$ sudo apt-get install vim
Install screen to make things easier:
$ sudo apt-get install screenInstall Java:
$ sudo apt install openjdk-11-jre-headless$ mkdir ~/Desktop/source$ cd ~/Desktop/sourceDownload and Install Minio
$ wget https://dl.min.io/server/minio/release/linux-amd64/archive/minio_0.0.20210116021944_amd64.deb$ sudo dpkg -i minio_0.0.20210116021944_amd64.deb$ sudo mkdir /mnt/data$ sudo chown -R user /mnt/data && sudo chmod u+rxw /mnt/data$ vim ~/.bashrcAdd the following lines:
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin$ source ~/.bashrc$ screenPress ENTER
$ minio server /mnt/dataDetach your screen (ctrl + a + d)
$ wget https://sourceforge.net/projects/s3tools/files/s3cmd/2.0.1/s3cmd-2.0.1.tar.gz$ tar xzf s3cmd-2.0.1.tar.gz$ touch ~/.s3cfg
$ vim ~/.s3cfg
# Setup endpoint host_base = localhost:9000 host_bucket = localhost:9000 bucket_location = us-east-1 use_https = False # access keys access_key = minioadmin secret_key = minioadmin # enable s3 signature apis signature_v2 = False
$ cd s3cmd-2.0.1
$ ./s3cmd mb s3://data
$ ./s3cmd ls s3://
$ cd ~/Desktop/source/$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.246/presto-server-0.246.tar.gz$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar$ sudo chmod +x presto-cli-0.246-executable.jar$ tar xzf presto-server-0.246.tar.gz$ mv presto-server-0.246 presto$ cd presto $ mkdir etc$ cd etc$ mkdir catalog$ touch config.properties$ vim config.propertiescoordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=3GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080$ touch jvm.config$ vim jvm.config-server
-Xmx5G
-XX:+UseG1GC
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-Djdk.attach.allowAttachSelf=true$ touch node.properties$ vim node.propertiesnode.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/mnt/data$ cd catalog$ touch minio.properties$ vim minio.propertiesconnector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://data/
hive.s3.aws-access-key=minioadmin
hive.s3.aws-secret-key=minioadmin
hive.s3.endpoint=http://localhost:9000
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.allow-drop-table=trueUse your MySQL credentials here.
$ touch mysql.properties$ vim mysql.propertiesconnector.name=mysql
connection-url=jdbc:mysql://127.0.0.1/?useSSL=false&allowPublicKeyRetrieval=true
connection-user=stephen
connection-password=pa55w0rd!awaeqewqeqweqwAAA$ screen$ ~/Desktop/source/presto/bin/launcher run$ cd ~/Desktop/source$ ./presto-cli-0.246-executable.jarpresto> create schema minio.books;presto> use minio.books;Import your MySQL table into Presto:
presto:books> create table minio.books.list as select * from mysql.books.book;presto:books> select count(*) from list;
Benchmarks
PrestoDB is 2.5x faster than MySQL in this test.
presto> select count(*) from book where BINARY title like '%Red%';presto> select count(*) from list where title like '%Red%';
Possible Errors
presto> create table minio.books.author as select * from mysql.books.author;
Query 20210204_070906_00000_7d4wf failed: Unsupported Hive type: varchar(16777215). Supported VARCHAR types: VARCHAR(<=65535), VARCHAR.Error:
2021-02-03T01:48:57.973+0900 ERROR main com.facebook.presto.server.PrestoServer No factory for connector mysql java.lang.IllegalArgumentException: No factory for connector mysql at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)Exception in thread "main" java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object)'
$ mv $HIVE_HOME/lib/guava-19.0.jar $HIVE_HOME
/lib/guava-19.0.jar.bak$ cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/