[SQL] Saving Crawled Data to a DB
Crawl the Chicago magazine sandwich ranking and save it to a DB
- Ranking
- Menu
- Cafe name
- Price
- Cafe address
In [39]:
from bs4 import BeautifulSoup
from urllib.request import urlopen, Request
from user_agent import generate_user_agent
import re
from urllib.parse import urljoin
import mysql.connector
import pandas as pd
Crawling Code
In [ ]:
url = 'https://www.chicagomag.com/chicago-magazine/november-2012/best-sandwiches-chicago/'
req = Request(url=url, headers={'User-Agent': generate_user_agent()})
page = urlopen(req)
soup = BeautifulSoup(page, 'html.parser')
sammy = soup.find_all('div', class_='sammy')

for idx, each_menu in enumerate(sammy):
    # Rank, menu name, and cafe name come from the list page
    rank = each_menu.find(class_='sammyRank').string
    menu, cafe = each_menu.find(class_='sammyListing').get_text().split('\n')[:2]

    # Follow the link to each cafe's detail page (relative URLs need urljoin)
    cafe_url = each_menu.find(class_='sammyListing').find('a')['href']
    cafe_url = urljoin('https://www.chicagomag.com', cafe_url)
    req = Request(url=cafe_url, headers={'User-Agent': generate_user_agent()})
    page = urlopen(req)
    soup = BeautifulSoup(page, 'html.parser')

    # The 'addy' paragraph holds "<price>. <address>, <phone>, ..."
    text = soup.find('p', 'addy').get_text()
    text = re.split('.,', text)[0]          # cut at the first comma
    price = re.search(r'\$\d+\.(\d+)?', text).group()
    address = text[len(price) + 1:]
    print(rank, menu, cafe, price, address)
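The loop fires one network request per cafe, so a single transient failure kills the whole run. A minimal sketch of a fetch helper with a retry and a short back-off (the helper name and retry counts are my own, not from the original code):

import time
from urllib.error import URLError

def fetch_soup(target_url, retries=3, delay=1.0):
    # Fetch and parse a page, retrying on transient network errors
    for attempt in range(retries):
        try:
            req = Request(url=target_url, headers={'User-Agent': generate_user_agent()})
            return BeautifulSoup(urlopen(req), 'html.parser')
        except URLError:
            if attempt == retries - 1:
                raise                # give up after the last attempt
            time.sleep(delay)        # back off briefly before retrying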
DB Connection
In [85]:
conn = mysql.connector.connect(
    host="database-1.cccbm3nnjzrq.ap-northeast-2.rds.amazonaws.com",
    port=3306,
    user="robot",
    password="1234",
    database="amrbase",
)
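Hardcoding credentials is risky once a notebook is published, as here. A sketch using environment variables instead; the variable names (MYSQL_HOST and friends) are placeholders of my choosing:

import os

conn = mysql.connector.connect(
    host=os.environ['MYSQL_HOST'],                    # e.g. the RDS endpoint
    port=int(os.environ.get('MYSQL_PORT', 3306)),
    user=os.environ['MYSQL_USER'],
    password=os.environ['MYSQL_PASSWORD'],
    database=os.environ['MYSQL_DATABASE'],
)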
In [ ]:
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE sandwich (
        ranking INT,
        cafe    VARCHAR(32),
        menu    VARCHAR(64),
        price   FLOAT,
        address VARCHAR(64),
        PRIMARY KEY (ranking)
    )
''')
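Re-running this cell raises an error once the table exists. A common guard (my addition, not in the original) is to drop or conditionally create first:

cursor.execute('DROP TABLE IF EXISTS sandwich')   # start clean on every re-run
# ...or keep existing rows instead: CREATE TABLE IF NOT EXISTS sandwich (...)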
Crawling & INSERT
In [ ]:
url = 'https://www.chicagomag.com/chicago-magazine/november-2012/best-sandwiches-chicago/'
req = Request(url=url, headers={'User-Agent': generate_user_agent()})
page = urlopen(req)
soup = BeautifulSoup(page, 'html.parser')
sammy = soup.find_all('div', class_='sammy')

for idx, each_menu in enumerate(sammy):
    rank = each_menu.find(class_='sammyRank').string
    menu, cafe = each_menu.find(class_='sammyListing').get_text().split('\n')[:2]

    cafe_url = each_menu.find(class_='sammyListing').find('a')['href']
    cafe_url = urljoin('https://www.chicagomag.com', cafe_url)
    req = Request(url=cafe_url, headers={'User-Agent': generate_user_agent()})
    page = urlopen(req)
    soup = BeautifulSoup(page, 'html.parser')

    text = soup.find('p', 'addy').get_text()
    text = re.split('.,', text)[0]
    price = re.search(r'\$\d+\.(\d+)?', text).group()
    address = text[len(price) + 1:]
    price = float(price[1:])    # strip '$' and convert for the FLOAT column

    # %s placeholders let the connector escape each value safely
    sql_insert = 'INSERT INTO sandwich VALUES (%s, %s, %s, %s, %s)'
    cursor.execute(sql_insert, (rank, cafe, menu, price, address))
    print(rank, cafe, menu, price, address)

conn.commit()    # persist all inserts in one transaction
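Executing one INSERT per row is fine at this scale; for larger batches the rows could be accumulated and sent in a single call with executemany (a variation of mine, not the original approach):

rows = []    # append (rank, cafe, menu, price, address) inside the crawl loop
cursor.executemany('INSERT INTO sandwich VALUES (%s, %s, %s, %s, %s)', rows)
conn.commit()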
Checking the Results
In [ ]:
cursor.execute('SELECT * FROM sandwich')
result = cursor.fetchall()
for row in result:
    print(row)
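pandas was imported at the top but never used; as a side note, the same check can come back as a DataFrame via pd.read_sql (a sketch; mysql-connector connections are accepted directly, though recent pandas versions warn in favor of SQLAlchemy):

df = pd.read_sql('SELECT * FROM sandwich', conn)
df.head()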
Modifying with UPDATE after an INSERT
- UPDATE query
sql_update = 'update sandwich2 set price=%s, address=%s where ranking=%s'
cursor.execute(sql_update, (price, address, rank))
- %s
  - Not the same as Python string formatting
  - The data values are bound to the placeholders in order
In [81]:
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE sandwich2 (
        ranking INT,
        cafe    VARCHAR(32),
        menu    VARCHAR(64),
        price   FLOAT,
        address VARCHAR(64),
        PRIMARY KEY (ranking)
    )
''')
In [ ]:
url = 'https://www.chicagomag.com/chicago-magazine/november-2012/best-sandwiches-chicago/'
req = Request(url=url, headers={'User-Agent': generate_user_agent()})
page = urlopen(req)
soup = BeautifulSoup(page, 'html.parser')
sammy = soup.find_all('div', class_='sammy')

for idx, each_menu in enumerate(sammy):
    rank = each_menu.find(class_='sammyRank').string
    menu, cafe = each_menu.find(class_='sammyListing').get_text().split('\n')[:2]

    # INSERT the columns available from the list page first
    sql_insert = 'INSERT INTO sandwich2 (ranking, cafe, menu) VALUES (%s, %s, %s)'
    cursor.execute(sql_insert, (int(rank), cafe, menu))

    cafe_url = each_menu.find(class_='sammyListing').find('a')['href']
    cafe_url = urljoin('https://www.chicagomag.com', cafe_url)
    req = Request(url=cafe_url, headers={'User-Agent': generate_user_agent()})
    page = urlopen(req)
    soup = BeautifulSoup(page, 'html.parser')

    text = soup.find('p', 'addy').get_text()
    text = re.split('.,', text)[0]
    price = re.search(r'\$\d+\.(\d+)?', text).group()
    address = text[len(price) + 1:]
    price = float(price[1:])

    # Then UPDATE the same row with the price and address from the detail page
    sql_update = 'update sandwich2 set price=%s, address=%s where ranking=%s'
    cursor.execute(sql_update, (price, address, rank))
    print(rank, cafe, menu, price, address)

conn.commit()
In [ ]:
cursor.execute('SELECT * FROM sandwich2')
result = cursor.fetchall()
for row in result:
    print(row)
In [101]:
conn.close()