Publishing data visualizations on the web - full documentation of solution evaluation and technical issues
Recap
Here's the thing: this website is built with WordPress and was originally hosted on GoDaddy, on a plan I had signed for three years. After about a year and a half, I decided to move the website to SiteGround. To keep the GoDaddy host from sitting idle, I wanted to do some side projects and try building a data dashboard on the website. The GoDaddy host can simply be used to run automated programs or store data.
I'm not an expert in host and server management, but I assume that automatically fetching and processing data takes a fair amount of host resources. So it is probably a good idea to separate the host that does the data processing from the host that serves the website. That way, if the former accidentally uses too many resources, the blog doesn't go down with it.
Another benefit concerns data backup. Much of the data used for the dashboard is public and free. Even if it is lost, it can be downloaded again, so there is no need to back it up at all. As for code, GitHub plus a copy on the local machine is enough of a backup. My website data, by contrast, is backed up automatically every day. Better safe than sorry.
Subject of this article
Running a data dashboard project on your own computer, the so-called local side, is fairly easy. Putting the project on a website makes things much more complicated. Below I cover the difficulties I ran into early on, the solutions I found that worked, and the approach I finally settled on.
At the time of writing, my data dashboard project has some ideas behind it but no concrete results yet. If there is new progress or a new solution, I will update this article, and I look forward to your comments and feedback.
Also, the term "local machine" sounds a bit engineer-nerdy, but for convenience I will keep calling the computer I work on the "local machine", as opposed to the website's "host". The two are contrasting concepts.
Project planning
To start a side project, you need some ideas of your own. For my first topic, I want to make an interactive dashboard of agricultural product market prices. The government open data platform provides the data in JSON format, and historical data is also easy to obtain.
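To give a rough idea of what working with such a JSON source could look like in Python, here is a minimal sketch. The sample payload and its field names (`date`, `crop`, `market`, `avg_price`, `volume_kg`) are made up for illustration; the real open data platform has its own schema, which I am not reproducing here.

```python
import json
from collections import defaultdict

# Hypothetical sample payload. The real feed's field names will differ.
SAMPLE_JSON = """
[
  {"date": "2021-01-04", "crop": "cabbage", "market": "A", "avg_price": 18.5, "volume_kg": 1200},
  {"date": "2021-01-04", "crop": "cabbage", "market": "B", "avg_price": 20.5, "volume_kg": 800},
  {"date": "2021-01-04", "crop": "banana", "market": "A", "avg_price": 30.0, "volume_kg": 500}
]
"""


def volume_weighted_prices(records):
    """Return the volume-weighted average price per crop across markets."""
    totals = defaultdict(lambda: [0.0, 0.0])  # crop -> [sum(price * volume), sum(volume)]
    for row in records:
        totals[row["crop"]][0] += row["avg_price"] * row["volume_kg"]
        totals[row["crop"]][1] += row["volume_kg"]
    return {crop: value / volume for crop, (value, volume) in totals.items() if volume > 0}


records = json.loads(SAMPLE_JSON)
prices = volume_weighted_prices(records)
print(prices)
```

In a real run the JSON text would come from an HTTP request (for example with `urllib.request.urlopen`) instead of a string in the script.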
Although the data for my first project is easy to obtain, I doubt the second and third projects will be so lucky; some data will only be obtainable with a crawler. I assume the worst case will happen, so that the SOP I build keeps enough flexibility and extensibility and can be reused directly for other small projects later.
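For the crawler case, the job is mostly pulling values out of HTML instead of reading ready-made JSON. Below is a small sketch using only Python's standard library. The HTML snippet is invented for illustration, and a real page would need its own parsing rules (and usually a dedicated scraping library).

```python
from html.parser import HTMLParser

# Hypothetical page fragment. A real site's markup will look different.
SAMPLE_HTML = """
<table>
  <tr><th>Crop</th><th>Price</th></tr>
  <tr><td>cabbage</td><td>18.5</td></tr>
  <tr><td>banana</td><td>30.0</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collect the text of <td> cells, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


parser = TableCellParser()
parser.feed(SAMPLE_HTML)
rows = [(name.strip(), float(price)) for name, price in parser.rows]
print(rows)
```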
Python or R
The best-known programming languages for data analysis are Python and R. Both can also do data visualization, and both have plenty of online resources and packages, so on these points they are evenly matched. As for learning cost, I know a little of both, so that was not my main consideration. In the end I chose Python without hesitation, for its generality.
What do I mean by generality? First, as mentioned above, sometimes we need to write crawlers to collect information from the Internet, and crawling is one of Python's strengths. R has packages for this too, but the reviews I have seen online are not very favorable. Some jobs are better left to the specialist.
Second is breadth of use. R is usually used only for statistics or data analysis. I have heard of people writing Snake or Minesweeper in R, but please don't spend your talent in strange places. Python, by contrast, is used not only for data analysis but also by engineers to write apps and web pages, so it covers more ground and has more potential.
Python clearly won both rounds, so I chose it without any suspense, and the rest of this article focuses on Python.
Where to put your work
I planned from the beginning to put it on my own website, so I never really weighed this question, but I am writing it down to make the thinking process clearer for everyone.
If you want to put your work on your own website, you of course need a website first, and you have to deal with the hassle of website management and host operation. If you don't have a website, consider free or paid hosting space. With R, you can upload to RPubs, like the COVID-19 situation overview my colleague made. I haven't looked into the Python side, but I believe similar resources exist.
Traffic control
If you put your dashboard on a web space like RPubs, you don't have to care about traffic at all. But if, like me, you want to put it on your own host, you must pay attention to it.
In some cases the data files you use may be very large, tens or even hundreds of MB. If every visitor's browser has to download that file, the traffic your host sends out adds up to a scary number. If one day your work accidentally becomes popular and a crowd of netizens makes a pilgrimage to it, your hosting traffic will blow up first, and then you will get a letter of concern from the hosting provider asking whether you would like to pay for a plan upgrade. If you play dead, your website may simply be shut down.
Ah, you say your hosting plan advertises unlimited traffic? Take that with a smile and a grain of salt.
Overall, I think traffic is the most troublesome part of planning this SOP, and it also affects the choice of visualization tool.
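To put rough numbers on this, here is a back-of-the-envelope calculation. The file size and visit counts are invented for illustration, and I am assuming every visit downloads the whole file once and that 1 GB is 1000 MB.

```python
def monthly_traffic_gb(file_size_mb, downloads_per_day, days=30):
    """Estimate outbound traffic per month in GB (1 GB = 1000 MB)."""
    return file_size_mb * downloads_per_day * days / 1000


# Hypothetical scenarios for a 50 MB data file.
quiet_month = monthly_traffic_gb(file_size_mb=50, downloads_per_day=20)
viral_month = monthly_traffic_gb(file_size_mb=50, downloads_per_day=2000)

print(f"quiet: {quiet_month} GB, viral: {viral_month} GB")
```

Under these assumptions, 20 visits a day is about 30 GB a month, while 2000 visits a day is about 3000 GB a month.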
Choose a visualization
Although Python's own visualization packages, such as Plotly, would work, once traffic is taken into account the solution I expected to be ideal was Power BI. As it turned out, things did not go as planned.
Power BI
Power BI is Microsoft's visualization tool, so I won't introduce it here.
My original idea was to write the Python program, put it on the host, let it run automatically every day, and store the data in the host's database. Power BI would connect to that database and refresh its data source automatically once a day, storing the refreshed data on Microsoft's side. Since the refresh happens only once a day, a 100 MB file amounts to only about 3 GB a month (100 MB × 30 days), so traffic is not a problem at all. Finally, I would build an interactive dashboard in Power BI and use the Publish to web (public) function to embed it in the web page as an iframe.
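The "update every day and store in the database" step could be sketched roughly as follows. This is only an illustration under my own assumptions: it uses SQLite from the standard library as a stand-in for whatever database the host actually provides, and the table and column names are hypothetical.

```python
import sqlite3


def store_daily_rows(conn, rows):
    """Insert or overwrite one day's rows so that reruns do not create duplicates."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS market_prices (
               date TEXT NOT NULL,
               crop TEXT NOT NULL,
               avg_price REAL NOT NULL,
               PRIMARY KEY (date, crop)
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO market_prices (date, crop, avg_price) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# In-memory database for the demo; a real job would open a persistent database.
conn = sqlite3.connect(":memory:")
store_daily_rows(conn, [("2021-01-04", "cabbage", 19.3), ("2021-01-04", "banana", 30.0)])
# Running the job again for the same day overwrites instead of duplicating.
store_daily_rows(conn, [("2021-01-04", "cabbage", 19.5)])

row_count = conn.execute("SELECT COUNT(*) FROM market_prices").fetchone()[0]
cabbage_price = conn.execute(
    "SELECT avg_price FROM market_prices WHERE crop = ?", ("cabbage",)
).fetchone()[0]
print(row_count, cabbage_price)
```

The daily scheduling itself lives outside the script, for example in a cron job if the host offers one.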
Go to the ARON HACK website to see the full article "Publishing Data Visual Interactive Charts on the Web - Full Record of Solution Evaluation and Technical Issues"