折腾:
【记录】Sequel中新建mysql数据库并新建表和相应字段
期间,在写pyspider的resultworker之前,先要去搞清楚:
python中如何操作mysql
也参考看了些帖子:
Pyspider把抓取的结果存入mysql数据库 | 拈花古佛
Pyspider实例之抓取数据并保存到MySQL数据库 – CSDN博客
漫谈 Pyspider 网络爬虫的实践 – 工具资源 – 掘金
实现Pyspider爬虫结果的自定义ResultWorker – 简书
【MySQL.Connector】Python下读取数据库信息 – 程序园
但是写法不一,还是去找最通用的,最好用的库和写法
【已解决】Python3中选择合适的mysql的连接器或驱动
先去安装PyMySQL:
<code>➜ AutocarData which python /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/bin/python ➜ AutocarData pipenv install pymysql Installing pymysql… Looking in indexes: https://pypi.python.org/simple Collecting pymysql Using cached https://files.pythonhosted.org/packages/e5/07/c0f249aa0b7b0517b5843eeab689b9ccc6a6bb0536fc9d95e65901e6f2ac/PyMySQL-0.8.0-py2.py3-none-any.whl Installing collected packages: pymysql Successfully installed pymysql-0.8.0 Adding pymysql to Pipfile's [packages]… Pipfile.lock (625834) out of date, updating to (b95d9c)… Locking [dev-packages] dependencies… Locking [packages] dependencies… Updated Pipfile.lock (b95d9c)! Installing dependencies from Pipfile.lock (b95d9c)… 🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 1/1 — 00:00:00 ➜ AutocarData pipenv graph PyMySQL==0.8.0 pyspider==0.3.10 - chardet [required: >=2.2, installed: 3.0.4] - click [required: >=3.3, installed: 6.7] - cssselect [required: >=0.9, installed: 1.0.3] - Flask [required: >=0.10, installed: 0.12.2] - click [required: >=2.0, installed: 6.7] - itsdangerous [required: >=0.21, installed: 0.24] - Jinja2 [required: >=2.4, installed: 2.10] - MarkupSafe [required: >=0.23, installed: 1.0] - Werkzeug [required: >=0.7, installed: 0.14.1] - Flask-Login [required: >=0.2.11, installed: 0.4.1] - Flask [required: Any, installed: 0.12.2] - click [required: >=2.0, installed: 6.7] - itsdangerous [required: >=0.21, installed: 0.24] - Jinja2 [required: >=2.4, installed: 2.10] - MarkupSafe [required: >=0.23, installed: 1.0] - Werkzeug [required: >=0.7, installed: 0.14.1] - Jinja2 [required: >=2.7, installed: 2.10] - MarkupSafe [required: >=0.23, installed: 1.0] - lxml [required: Any, installed: 4.2.1] - pycurl [required: Any, installed: 7.43.0.1] - pyquery [required: Any, installed: 1.4.0] - cssselect [required: >0.7.9, installed: 1.0.3] - lxml [required: >=2.1, installed: 4.2.1] - requests [required: >=2.2, installed: 2.18.4] - certifi [required: >=2017.4.17, installed: 2018.4.16] - chardet [required: >=3.0.2,<3.1.0, installed: 3.0.4] - idna [required: <2.7,>=2.5, installed: 2.6] - urllib3 [required: >=1.21.1,<1.23, installed: 1.22] - six [required: >=1.5.0, installed: 1.11.0] - tblib [required: >=1.3.0, installed: 1.3.2] - tornado [required: >=3.2,<=4.5.3, installed: 4.5.3] - u-msgpack-python [required: >=1.6, installed: 2.5.0] - wsgidav [required: >=2.0.0, installed: 2.3.0] - defusedxml [required: Any, installed: 0.5.0] </code>
然后去写代码测试效果:
期间,关于:
http://pymysql.readthedocs.io/en/latest/modules/connections.html
中的charset的是utf-8 还是utf8,需要去搞清楚:
【已解决】mysql中支持的字符编码字符集以及utf8的写法
然后继续写代码,想要给connect加上catch error,但是不知道是哪些error:
【已解决】pymysql中connect异常时有哪些错误类型
然后继续写代码。
然后继续去试试删除table,新建table(如果不存在的话)
而对于创建table的sql语法如何写,可以参考Sequel中的:

<code>CREATE TABLE `autohome_car_info` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `cityDealerPrice` int(11) unsigned NOT NULL DEFAULT '0', `msrpPrice` int(11) unsigned NOT NULL DEFAULT '0', `mainBrand` char(20) NOT NULL DEFAULT '', `subBrand` varchar(20) NOT NULL DEFAULT '', `brandSerie` varchar(20) NOT NULL DEFAULT '', `brandSerieId` varchar(15) NOT NULL DEFAULT '', `model` varchar(50) NOT NULL DEFAULT '', `modelId` varchar(15) NOT NULL DEFAULT '', `modelStatus` char(5) NOT NULL DEFAULT '', `url` varchar(200) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; </code>
当然测试期间,可以去改个table 的名字,避免把之前的冲突了
参考:

去搜搜:
IF NOT EXISTS
mysql IF NOT EXISTS
MySQL: Insert record if not exists in table – Stack Overflow
MySQL :: MySQL 5.5 Reference Manual :: 17.4.1.6 Replication of CREATE … IF NOT EXISTS Statements
MYSQL IF NOT EXISTS AND IF EXISTS – CSDN博客
mysql CREATE table IF NOT EXISTS
MySQL :: MySQL 8.0 Reference Manual :: 13.1.18 CREATE TABLE Syntax
MySQL对CREATE TABLE IF NOT EXISTS SELECT的处理 – CSDN博客
然后创建数据库的代码:
<code>#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Author: Crifan Li
# Project: autohomeBrandData
# Function: implement custom result worker for autohome car data
import pymysql
import pymysql.cursors
class MysqlDb:
config = {
'host': '127.0.0.1',
'port': 3306,
'user': 'root',
'password': 'crifan_mysql',
'database': 'AutohomeResultdb',
'charset': "utf8"
}
connection = None
isConnected = False
def __init__(self):
"""init mysql"""
if not self.isConnected:
self.isConnected = self.connect()
print("Connect mysql return", self.isConnected)
def connect(self):
try:
self.connection = pymysql.connect(**self.config, cursorclass=pymysql.cursors.DictCursor)
print("connect mysql ok, self.connection=", self.connection)
return True
except pymysql.Error as err:
print("Connect mysql with config=", self.config, " error=", err)
return False
def createTable(self, newTablename):
if self.connection is None:
print("Please connect mysql first before create table")
return False
cursor = self.connection.cursor()
print("cursor=", cursor)
createTableSql = """CREATE TABLE `%s` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`cityDealerPrice` int(11) unsigned NOT NULL DEFAULT '0',
`msrpPrice` int(11) unsigned NOT NULL DEFAULT '0',
`mainBrand` char(20) NOT NULL DEFAULT '',
`subBrand` varchar(20) NOT NULL DEFAULT '',
`brandSerie` varchar(20) NOT NULL DEFAULT '',
`brandSerieId` varchar(15) NOT NULL DEFAULT '',
`model` varchar(50) NOT NULL DEFAULT '',
`modelId` varchar(15) NOT NULL DEFAULT '',
`modelStatus` char(5) NOT NULL DEFAULT '',
`url` varchar(200) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;""" % (newTablename)
print("createTableSql=", createTableSql)
try:
cursor.execute(createTableSql)
self.connection.commit()
return True
except pymysql.Error as err:
print("Create table using sql=", createTableSql, "error=", err)
return False
def test():
"""test mysql"""
mysqlObj = MysqlDb()
print("mysqlObj=", mysqlObj)
# testTablename = "autohome_car_info"
testTablename = "tbl_car_info_test"
createTableOk = mysqlObj.createTable(testTablename)
print("createTable", testTablename, "return", createTableOk)
if __name__ == '__main__':
test()
</code>运行正常:

去看看,果然创建了数据库了:

接着去drop table
调试期间,用sql:
<code>CREATE TABLE `%s` IF NOT EXISTS ( </code>
结果出错:
<code>error= (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IF NOT EXISTS (\n `id` int(11) unsigned NOT NULL AUTO_INCREMENT,\n `cityDealerPr' at line 1") </code>
发现是自己写错位置了,换成:
<code>CREATE TABLE IF NOT EXISTS `%s` ( </code>
就可以了。
不过对于,去创建已经存在的table的话,会出现warning:
<code>/Users/crifan/.virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pymysql/cursors.py:322: Warning: (1050, "Table 'tbl_car_info_test' already exists") self._do_get_result() </code>
不过去加了:
<code>except pymysql.Warning as warn:
print("Create table using sql=", createTableSql, "Warning=", warn)
return True
</code>却没有执行到。

所以去掉吧。
然后就可以正常的继续去drop掉table了:

然后用代码:
<code>
# def insert(self, **valueDict):
def insert(self, tablename, valueDict):
"""
inset dict value into mysql table
makesure the value is dict, and its keys is the key in the table
"""
if self.connection is None:
print("Please connect mysql first before insert value into table")
return False
cursor = self.connection.cursor()
print("cursor=", cursor)
dictKeyList = valueDict.keys()
dictValueList = valueDict.values()
print("dictKeyList=", dictKeyList, "dictValueList=", dictValueList)
keyListSql = ", ".join(self.quoteIdentifier(eachKey) for eachKey in dictKeyList)
print("keyListSql=", keyListSql)
# valueListSql = ", ".join(eachValue for eachValue in dictValueList)
valueListSql = ""
formattedDictValueList = []
for eachValue in dictValueList:
print("eachValue=", eachValue)
eachValueInSql = ""
valueType = type(eachValue)
print("valueType=", valueType)
if valueType is str:
eachValueInSql = '"%s"' % eachValue
elif valueType is int:
eachValueInSql = '%d' % eachValue
# TODO: add more type formatting if necessary
print("eachValueInSql=", eachValueInSql)
formattedDictValueList.append(eachValueInSql)
valueListSql = ", ".join(eachValue for eachValue in formattedDictValueList)
print("valueListSql=", valueListSql)
# insertSql = """INSERT INTO %s (id, url, title, type, thumb, count, temperature, images, tags, post_time) values (%s,%s,%s,%s,%s,%s,%s,%s,%s)"""
insertSql = """INSERT INTO %s (%s) VALUES (%s)""" % (tablename, keyListSql, valueListSql)
print("insertSql=", insertSql)
try:
cursor.execute(insertSql)
self.connection.commit()
return True
except pymysql.Error as err:
print("Insert into table using sql=", insertSql, "error=", err)
return False
</code># 4. test insert value dict
valueDict = {
“url”: “https://www.autohome.com.cn/spec/5872/#pvareaid=2042128”, #车型url
“mainBrand”: “宝马”, #品牌
“subBrand”: “华晨宝马”, #子品牌
“brandSerie”: “宝马3系”, #车系
“brandSerieId”: “66”, #车系ID
“model”: “2010款 320i 豪华型”, #车型
“modelId”: “5872”, #车型ID
“modelStatus”: “停售”, #车型状态
“cityDealerPrice”: 325000, #经销商参考价
“msrpPrice”: 375000 # 厂商指导价
}
print(“valueDict=”, valueDict)
insertOk = mysqlObj.insert(testTablename, valueDict)
print(“insertOk=”, insertOk)
可以正常的插入值:

【总结】
最后用代码:
<code>#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Project: autohomeBrandData
# Function: implement custom result worker for autohome car data
# Author: Crifan Li
# Date: 20180512
# Note:
# If you want to modify to your mysql and table, you need:
# (1) change change MysqlDb config to your mysql config
# (2) change CurrentTableName to your table name
# (3) change CreateTableSqlTemplate to your sql to create new mysql table fields
# (4) before use this ResultWorker, run py file to execute testMysqlDb, to init db and create table
# (5) if your table field contain more type, edit insert to add more type for "TODO: add more type formatting if necessary"
import pymysql
import pymysql.cursors
CurrentTableName = "tbl_autohome_car_info"
CreateTableSqlTemplate = """CREATE TABLE IF NOT EXISTS `%s` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '自增,主键',
`cityDealerPrice` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '经销商参考价',
`msrpPrice` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '厂商指导价',
`mainBrand` char(20) NOT NULL DEFAULT '' COMMENT '品牌',
`subBrand` varchar(20) NOT NULL DEFAULT '' COMMENT '子品牌',
`brandSerie` varchar(20) NOT NULL DEFAULT '' COMMENT '车系',
`brandSerieId` varchar(15) NOT NULL DEFAULT '' COMMENT '车系ID',
`model` varchar(50) NOT NULL DEFAULT '' COMMENT '车型',
`modelId` varchar(15) NOT NULL DEFAULT '' COMMENT '车型ID',
`modelStatus` char(5) NOT NULL DEFAULT '' COMMENT '车型状态',
`url` varchar(200) NOT NULL DEFAULT '' COMMENT '车型url',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;"""
class MysqlDb:
config = {
'host': '127.0.0.1',
'port': 3306,
'user': 'root',
'password': 'crifan_mysql',
'database': 'AutohomeResultdb',
'charset': "utf8"
}
defaultTableName = CurrentTableName
connection = None
def __init__(self):
"""init mysql"""
# 1. connect db first
if self.connection is None:
isConnected = self.connect()
print("Connect mysql return %s" % isConnected)
# 2. create table for db
createTableOk = self.createTable(self.defaultTableName)
print("Create table %s return %s" %(self.defaultTableName, createTableOk))
def connect(self):
try:
self.connection = pymysql.connect(**self.config, cursorclass=pymysql.cursors.DictCursor)
print("connect mysql ok, self.connection=", self.connection)
return True
except pymysql.Error as err:
print("Connect mysql with config=", self.config, " error=", err)
return False
def quoteIdentifier(self, identifier):
"""
for mysql, it better to quote identifier xxx using backticks to `xxx`
in case, identifier:
contain special char, such as space
or same with system reserved words, like select
"""
quotedIdentifier = "`%s`" % identifier
# print("quotedIdentifier=", quotedIdentifier)
return quotedIdentifier
def executeSql(self, sqlStr, actionDescription=""):
print("executeSql: sqlStr=%s, actionDescription=%s" % (sqlStr, actionDescription))
if self.connection is None:
print("Please connect mysql first before %s" % actionDescription)
return False
cursor = self.connection.cursor()
print("cursor=", cursor)
try:
cursor.execute(sqlStr)
self.connection.commit()
return True
except pymysql.Error as err:
print("Execute sql %s occur error %s for %s" % (sqlStr, err, actionDescription))
return False
def createTable(self, newTablename):
print("createTable: newTablename=", newTablename)
createTableSql = CreateTableSqlTemplate % (newTablename)
print("createTableSql=", createTableSql)
return self.executeSql(sqlStr=createTableSql, actionDescription=("Create table %s" % newTablename))
def dropTable(self, existedTablename):
print("dropTable: existedTablename=", existedTablename)
dropTableSql = "DROP TABLE IF EXISTS %s" % (existedTablename)
print("dropTableSql=", dropTableSql)
return self.executeSql(sqlStr=dropTableSql, actionDescription=("Drop table %s" % existedTablename))
# def insert(self, **valueDict):
def insert(self, valueDict, tablename=defaultTableName):
"""
inset dict value into mysql table
makesure the value is dict, and its keys is the key in the table
"""
print("insert: valueDict=%s, tablename=%s" % (valueDict, tablename))
dictKeyList = valueDict.keys()
dictValueList = valueDict.values()
print("dictKeyList=", dictKeyList, "dictValueList=", dictValueList)
keyListSql = ", ".join(self.quoteIdentifier(eachKey) for eachKey in dictKeyList)
print("keyListSql=", keyListSql)
# valueListSql = ", ".join(eachValue for eachValue in dictValueList)
valueListSql = ""
formattedDictValueList = []
for eachValue in dictValueList:
# print("eachValue=", eachValue)
eachValueInSql = ""
valueType = type(eachValue)
# print("valueType=", valueType)
if valueType is str:
eachValueInSql = '"%s"' % eachValue
elif valueType is int:
eachValueInSql = '%d' % eachValue
# TODO: add more type formatting if necessary
print("eachValueInSql=", eachValueInSql)
formattedDictValueList.append(eachValueInSql)
valueListSql = ", ".join(eachValue for eachValue in formattedDictValueList)
print("valueListSql=", valueListSql)
insertSql = """INSERT INTO %s (%s) VALUES (%s)""" % (tablename, keyListSql, valueListSql)
print("insertSql=", insertSql)
# INSERT INTO tbl_car_info_test (`url`, `mainBrand`, `subBrand`, `brandSerie`, `brandSerieId`, `model`, `modelId`, `modelStatus`, `cityDealerPrice`, `msrpPrice`) VALUES ("https://www.autohome.com.cn/spec/5872/#pvareaid=2042128", "宝马", "华晨宝马", "宝马3系", "66", "2010款 320i 豪华型", "5872", "停售", 325000, 375000)
return self.executeSql(sqlStr=insertSql, actionDescription=("Insert value to table %s" % tablename))
def delete(self, modelId, tablename=defaultTableName):
"""
delete item from car model id for existing table of autohome car info
"""
print("delete: modelId=%s, tablename=%s" % (modelId, tablename))
deleteSql = """DELETE FROM %s WHERE modelId = %s""" % (tablename, modelId)
print("deleteSql=", deleteSql)
return self.executeSql(sqlStr=deleteSql, actionDescription=("Delete value from table %s by model id %s" % (tablename, modelId)))
def testMysqlDb():
"""test mysql"""
testDropTable = True
testCreateTable = True
testInsertValue = True
testDeleteValue = True
# 1.test connect mysql
mysqlObj = MysqlDb()
print("mysqlObj=", mysqlObj)
# testTablename = "autohome_car_info"
# testTablename = "tbl_car_info_test"
testTablename = CurrentTableName
print("testTablename=", testTablename)
if testDropTable:
# 2. test drop table
dropTableOk = mysqlObj.dropTable(testTablename)
print("dropTable", testTablename, "return", dropTableOk)
if testCreateTable:
# 3. test create table
createTableOk = mysqlObj.createTable(testTablename)
print("createTable", testTablename, "return", createTableOk)
if testInsertValue:
# 4. test insert value dict
valueDict = {
"url": "https://www.autohome.com.cn/spec/5872/#pvareaid=2042128", #车型url
"mainBrand": "宝马", #品牌
"subBrand": "华晨宝马", #子品牌
"brandSerie": "宝马3系", #车系
"brandSerieId": "66", #车系ID
"model": "2010款 320i 豪华型", #车型
"modelId": "5872", #车型ID
"modelStatus": "停售", #车型状态
"cityDealerPrice": 325000, #经销商参考价
"msrpPrice": 375000 # 厂商指导价
}
print("valueDict=", valueDict)
insertOk = mysqlObj.insert(valueDict=valueDict, tablename=testTablename)
print("insertOk=", insertOk)
if testDeleteValue:
toDeleteModelId = "5872"
deleteOk = mysqlObj.delete(modelId=toDeleteModelId, tablename=testTablename)
print("deleteOk=", deleteOk)
if __name__ == '__main__':
testMysqlDb()
</code>可以实现在Python中去操作mysql了。
转载请注明:在路上 » 【已解决】Python中如何操作mysql