@shaobaobaoer 2018-02-10T03:45:42.000000Z · 3717 words · 845 views

Python Crawler: Tarot Reader

Tags: tarot, crawler


0x00 Abstract

Last night I asked myself: why not write a Python crawler to practice what I have learned these days?
It should provide the following functions:

OK, let's do it!

0x01 Which Website

So, which website should we choose?
When you search for "tarot" on Google, what do you find? Yes! The first result is always free-tarot-reading.net, which I think attracts many well-known tarot readers.
Its free online tarot reading is quite appealing. OK, click the first entry:

Universal 6 Card Spread by LT

Did you spot the secret in the URL?
https://www.free-tarot-reading.net/readings/135433044
Look at the end of the URL: each reading is exposed under a plain numeric ID. I call this a "path leak", though that may not be a standard term.
What happens if we put a different number at the end of the URL?
The reading IDs look random, but they still follow some rules. I'm not the admin of that website, but we can make our crawler strong enough to pick out the useful information.
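Trying other numbers is easy to automate. Below is a minimal sketch, assuming readings stay reachable under the same `/readings/<id>` path; `reading_urls()` and `BASE_URL` are hypothetical names, and 135432416 is simply the start number the crawler in section 0x03 uses.

```python
# Hypothetical helper: enumerate candidate reading URLs from a numeric ID range.
BASE_URL = "https://www.free-tarot-reading.net/readings/%s"


def reading_urls(start, count, step=1):
    """Yield (reading_id, url) pairs for `count` IDs starting at `start`."""
    for offset in range(0, count * step, step):
        reading_id = start + offset
        yield reading_id, BASE_URL % reading_id


for reading_id, url in reading_urls(135432416, 3):
    print(reading_id, url)
```

Most of these IDs will not be Universal 6 Card Spread readings, which is why the crawler has to check every page it loads.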

0x02 SQL Construction

OK, our target is clear: the Universal 6 Card Spread.
- It uses the 22 Major Arcana (numbered 0 to 21)
- Each card has its own meaning in each of the 6 positions

So we create a database named tarot and set up 22 tables, one per card, each holding a position (id) and the card's meaning in that position (text).
big_arcana_dict is a dictionary whose keys are card numbers and whose values are card names. You can do it like this:

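```python
import pymysql


def database_create():
    # big_arcana_dict: card number (0-21) -> card name, defined elsewhere in the project
    # PyMySQL 1.0+ only accepts keyword arguments in connect()
    db = pymysql.connect(host="localhost", user="root", password="", database="tarot")
    cursor = db.cursor()
    for i in range(0, 22):  # 22 Major Arcana, numbered 0-21
        # Backticks allow card names containing spaces to be used as table names
        sql = "CREATE TABLE IF NOT EXISTS `%s` (id INT NOT NULL, text VARCHAR(1000))" % big_arcana_dict[i]
        cursor.execute(sql)
    cursor.close()
    db.close()
```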
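The post does not show big_arcana_dict itself. Here is a hypothetical version, assuming the site prints the usual Rider-Waite card names (the author's actual spelling may differ, e.g. "Judgement" vs "Judgment"). Because several names contain spaces, they need quoting when used as MySQL table names; `quoted_table()` is a hypothetical helper for that.

```python
# Hypothetical big_arcana_dict: the 22 Major Arcana in Rider-Waite order.
big_arcana_dict = {
    0: "The Fool", 1: "The Magician", 2: "The High Priestess",
    3: "The Empress", 4: "The Emperor", 5: "The Hierophant",
    6: "The Lovers", 7: "The Chariot", 8: "Strength",
    9: "The Hermit", 10: "Wheel of Fortune", 11: "Justice",
    12: "The Hanged Man", 13: "Death", 14: "Temperance",
    15: "The Devil", 16: "The Tower", 17: "The Star",
    18: "The Moon", 19: "The Sun", 20: "Judgement",
    21: "The World",
}


def quoted_table(card_name):
    """Return a MySQL-quoted identifier; embedded backticks are doubled."""
    return "`%s`" % card_name.replace("`", "``")


print(len(big_arcana_dict))               # 22 tables, one per card
print(quoted_table(big_arcana_dict[2]))   # `The High Priestess`
```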

0x03 Start the Crawler

As a fan of Selenium WebDriver, I chose it as the engine of my crawler.

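```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By


def wdriver(a):
    # xpath_list (the XPath of each of the 6 positions) and errorhander()
    # are defined elsewhere in the full project
    x = webdriver.Firefox()
    startnum = 135432416
    url = "https://www.free-tarot-reading.net/readings/%s" % (startnum + a)
    try:
        x.get(url)  # load the page once; all six positions are on it
        print("[+] Get payload %s" % (startnum + a), time.ctime())
        for i in range(0, 6):
            try:
                # find_element_by_xpath() was removed in Selenium 4; use By.XPATH
                content_1 = x.find_element(By.XPATH, xpath_list[i]).text
                print(content_1)
                stringmaker(content_1)  # check and store the text (section 0x04)
            except Exception:
                errorhander("wdriver error")  # one missing position should not stop the rest
    except Exception:
        errorhander("wdriver error")  # the page itself failed to load
    finally:
        x.quit()  # quit() ends the browser session; close() only closes the window
    time.sleep(5)  # pause between readings to go easy on the site
```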
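wdriver() logs an error and moves on when a page fails to load. If you would rather retry a few times first, a small wrapper does it. This is a sketch; `load_with_retry()` and its parameters are hypothetical and not part of Selenium.

```python
import time


def load_with_retry(load, url, attempts=3, delay=5.0):
    """Call load(url), retrying up to `attempts` times with `delay` seconds between tries.

    Returns whatever load() returns; re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return load(url)
        except Exception as exc:  # e.g. a page-load timeout raised by the driver
            print("[-] attempt %d/%d failed for %s: %s" % (attempt, attempts, url, exc))
            if attempt == attempts:
                raise  # out of retries: let the caller handle it
            time.sleep(delay)
```

Inside wdriver(), `x.get(url)` would become `load_with_retry(x.get, url)`, since `x.get` is just a function that takes a URL.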

0x04 Handling the Content

To make sure the information is usable, we should check it.
Here is what we need to do:

- Escape single quotes and newlines (as \' and \n) so the text can be placed directly into an SQL statement
- Check the header of the content: if it is not a card reading for the Universal 6 Card Spread, discard it
- Recheck the card name: if the card is not one of the Major Arcana, the information is still useless

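```python
# Called as stringmaker(content_1) for each position text scraped by wdriver().
# errorhander(), big_arcana_dict and database_write() are defined elsewhere in the project.
def stringmaker(string):
    string = string.split("\n")
    card = ""
    text = ""
    # Check 1: the header must read "Card N ..." where N is a spread position (1-6)
    head = string[0]
    if "Card" in head and len(head) > 5 and head[5].isdigit() and 1 <= int(head[5]) <= 6:
        id = int(head[5])
    else:
        errorhander("no id")
        return  # not a Universal 6 Card Spread reading: skip it
    # Check 2: the second line must name one of the 22 Major Arcana
    if len(string) > 1:
        for i in range(0, 22):
            if big_arcana_dict[i] in string[1]:
                card = big_arcana_dict[i]
                break
    if card == "":
        errorhander("no card")
        return  # not a Major Arcana card: the text is useless
    # Escape \ and ' and encode newlines as \n so the text can be pasted
    # into the INSERT statement built by database_write()
    for i in range(2, len(string)):
        text += string[i].replace("\\", "\\\\").replace("'", "\\'") + "\\n"
    if text == "":
        errorhander("no text")
        return
    database_write(id, card, text)
```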
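To test the checks without a browser or a database, they can be pulled into a pure function. This is a sketch that assumes each scraped block starts with a "Card N" line followed by a line naming the card, as stringmaker() expects. `parse_reading()` is a hypothetical name, and unlike stringmaker() it does no SQL escaping, leaving that to a parameterized query.

```python
import re

HEADER = re.compile(r"Card\s+(\d+)")  # e.g. "Card 3 ..." at the start of a block


def parse_reading(raw, card_names):
    """Run the checks from section 0x04 on one scraped block of text.

    Returns (position, card_name, body), or None if the block is not usable.
    """
    lines = raw.split("\n")
    if len(lines) < 3:
        return None
    match = HEADER.match(lines[0].strip())
    if match is None:
        return None                       # not a card-reading block
    position = int(match.group(1))
    if not 1 <= position <= 6:            # the Universal 6 Card Spread has 6 positions
        return None
    card = next((name for name in card_names if name in lines[1]), None)
    if card is None:
        return None                       # not one of the Major Arcana
    body = "\n".join(lines[2:]).strip()
    return (position, card, body) if body else None
```

wdriver() could then call `parse_reading(content_1, big_arcana_dict.values())` and only store results that are not None.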

0x05 Writing Into SQL

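We should check whether the information is a duplicate before writing it.
With pymysql, we first look up the card's table for that position and only insert when nothing is stored there yet.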

```python
import time

import pymysql


def database_write(id, tables_name, text):
    # Only insert if this card/position pair is not stored yet
    if database_avoid_cycle(tables_name, id):
        db = pymysql.connect(host="localhost", user="root", password="", database="tarot")
        cursor = db.cursor()
        # text was already escaped by stringmaker(); the table name comes from big_arcana_dict
        sql = "INSERT INTO `%s` VALUES (%s, '%s')" % (tables_name, id, text)
        try:
            cursor.execute(sql)
            db.commit()
            print("[+] Uploaded data:", id, tables_name, time.ctime())
        except Exception:
            db.rollback()
            print("[-] Error while inserting into", tables_name)
        finally:
            db.close()


def database_avoid_cycle(table, id):
    """Return True if `table` has no row for position `id` yet, i.e. it is safe to insert."""
    db = pymysql.connect(host="localhost", user="root", password="", database="tarot")
    cursor = db.cursor()
    sql = "SELECT * FROM `%s` WHERE id=%s LIMIT 0,1" % (table, id)
    try:
        cursor.execute(sql)
        if cursor.fetchall():
            print("[-] Failed to upload: the data already exists")
            return False
        return True
    except Exception:
        print("[-] error")
        return True  # as before, still attempt the insert if the lookup fails
    finally:
        db.close()  # runs on every path, so the connection is never leaked
```
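For comparison, the same "check, then insert" logic can use parameterized queries, which makes the manual \' and \n escaping from section 0x04 unnecessary. The sketch below swaps in Python's built-in sqlite3 module so it runs without a MySQL server; with PyMySQL the placeholder would be %s instead of ?. `save_reading()` is a hypothetical name.

```python
import sqlite3


def save_reading(conn, table, position, text):
    """Insert (position, text) into `table` unless that position is already stored.

    Returns True if a row was written. Table names cannot be bound as parameters,
    so `table` must come from a trusted list such as big_arcana_dict.
    """
    quoted = '"%s"' % table.replace('"', '""')  # standard SQL identifier quoting
    row = conn.execute("SELECT 1 FROM %s WHERE id = ? LIMIT 1" % quoted, (position,)).fetchone()
    if row is not None:
        print("[-] %s, position %d already stored; skipping" % (table, position))
        return False
    with conn:  # commits on success, rolls back if the INSERT raises
        conn.execute("INSERT INTO %s (id, text) VALUES (?, ?)" % quoted, (position, text))
    return True


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "The Fool" (id INTEGER NOT NULL, text TEXT)')
print(save_reading(conn, "The Fool", 1, "A leap of faith.\nIt's a fresh start."))  # True
print(save_reading(conn, "The Fool", 1, "duplicate"))                             # False
```

Quotes and newlines are stored exactly as scraped, because the driver passes the values separately from the SQL text.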

0x06 Strengthen the Crawler!

Many other things are still left for me to finish.
The tarot reading function should be added within a week if everything goes smoothly.

I won't cover the other operations in this article.
You can view or copy the whole code on my GitHub:
Github href

For learning only, not for commercial use.
