22 December 2017 • 3 mins read
This is the third week update and I am a bit delayed as I was not with my laptop for a few days. I went to my Grandparent’s place and came back becoming fatty. 😛
So, this week I faced many many many challenges and atlast I am happy for some contribution to my dream project in the KWoC’17 i.e, athityakumar/tvseries. I will now share these things today in this blog post.
So, let me start by saying what the project is about and what are things to be done.
TV Series is a tool that scrapes Episode Synopsis’ of popular TV Series’ from websites like Wikipedia / IMDb and shows it all in one single place, with a better user-friendly navigation UI. Website is accessible at https://athityakumar.github.io/tvseries/index.html
We need to add some web series to the website which can be done by web scraping using ruby. We provide some basic details like the reference links and write a ruby_scrapper function which scraps all the data from those links and generate all the necessary files (JSON and HTML files).
First of all, I have gather all the details of the series and enter into the index.json
The next step is to write a scraper function (in ruby) for the particular web series which scraps the data and store it in the respective file. In this case, we will write suits_scraper function which scraps all the data (episodes’ synopsis) and i will store it in suits.json.
But this is not an easy task. For this, I took a course Learn Ruby and made myself familiar with ruby. I read many blog posts and tried out web scraping using ruby like login, copying the data from website etc. Well, I thought that I was prepared by then but seriously NO! because I faced lots and lots and lots of errors but, my mentor Athitya Kumar was so supportive. He clarified my each and every single doubt of mine.
Whatever the doubt may be, he clarified me and helped me a lot. After many days of serious debugging, atlast the scraper worked and the required files are generated. You can see that in the below picture.
Later on few inbuit functions like generate_html, generate_sitemap will do the rest of things i.e, suits.html is created in the series folder. Also add an image in the images/series with the same filename (like suits.png). And that’s it, only testing is pending now.
Apart from adding Suits, the scraper functions of the remaining series also got executed and few series got updated. Finally after a struggle for 18 days, I was able to add one series via PR #54 and I slept very peacefully that night. 😄
I would like to thank my mentor once again because all this was impossible without him. Thank you brother, Athitya Kumar. Now that I am familiar with the codebase and all the errors, it is a little bit comfortable for me to add few more series to the project by the end of KWoC and later on.